Commit Graph

160922 Commits

Author SHA1 Message Date
OverMighty 232953f996 [AArch64] Add pattern for SQDML*Lv1i32_indexed
There was no pattern to fold into these instructions. This patch adds
the pattern obtained from the following ACLE intrinsics so that they
generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and
sqadd/sqsub instructions:
 - vqdmlalh_s16, vqdmlslh_s16
 - vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16,
   vqdmlslh_laneq_s16 (when the lane index is 0)

It also modifies the result of the existing pattern for the latter, when
the lane index is not 0, to use the v1i32_indexed instructions instead
of the v4i16_indexed ones.

Fixes #49997.

Differential Revision: https://reviews.llvm.org/D131700
2022-08-17 12:00:47 +01:00
Rainer Orth d9993484ee [Sparc] Don't use SunStyleELFSectionSwitchSyntax
As discussed in D85414 <https://reviews.llvm.org/D85414>, two tests
currently `FAIL` on Sparc since that backend uses the Sun assembler syntax
for the `.section` directive, controlled by
`SunStyleELFSectionSwitchSyntax`.

Instead of adapting the affected tests, this patch changes that default.
The internal assembler still accepts both forms as input, only the output
syntax is affected.

Current support for the Sun syntax is cursory at best: the built-in
assembler cannot even assemble some of the directives emitted by GCC, and
the set supported by the Solaris assembler is even larger: SPARC Assembly
Language Reference Manual, 3.4 Pseudo-Op Attributes
<https://docs.oracle.com/cd/E37838_01/html/E61063/gmabi.html#scrolltoc>.

A few Sparc test cases need to be adjusted. At the same time, the patch
fixes the failures from D85414 <https://reviews.llvm.org/D85414>.

Tested on `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D85415
2022-08-17 12:59:29 +02:00
Simon Pilgrim 1d522a39f7 [TTI] Remove getInstructionThroughput cost helper.
Pulled out of D79483 - we can just as easily use getUserCost directly
2022-08-17 11:41:47 +01:00
Simon Pilgrim 594c5b1a42 [SLP] Update TODO comment about shuffle mask decoding
This is handled in ShuffleVectorInst/getShuffleCost - getInstructionThroughput is (slowly) being removed.
2022-08-17 11:41:46 +01:00
Zain Jaffal f61f99a105
[instcombine] Optimise for zero initialisation of product given fast flags are enabled
Currently, clang ignores the 0 initialisation in finite math
For example:

```
double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
  f_prod *= arr[i];
 }
```
Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop.

Reviewed By: fhahn, spatel

Differential Revision: https://reviews.llvm.org/D131672
2022-08-17 11:12:15 +01:00
Andre Vieira 49223e0a2d [TypePromotion] Don't promote PHI + ZExt if wider than RegisterBitWidth
Differential Revision: https://reviews.llvm.org/D131966
2022-08-17 09:54:15 +01:00
Graham Hunter 70d35443dc [LAA] Handle forked pointers with add/sub instructions
Handle cases where a forked pointer has an add or sub instruction
before reaching a select.

Reviewed By: fhahn
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130278
2022-08-17 09:51:13 +01:00
Craig Topper d27c147aaa [RISCV] Allow lowerSELECT to fold integer setcc with FP select.
We'd pick it up in DAG combine later even if we didn't handle it here.
No test changes because we get it in DAG combine anyway.
2022-08-16 21:28:54 -07:00
Vitaly Buka 16fecdfa70 Revert "[AArch64] Add `foldCSELOfCSEl` DAG combine"
Breaks ubsan on buildbot, details in D125504

This reverts commit 6f9423ef06.
2022-08-16 20:29:37 -07:00
Craig Topper ba1fb54821 [RISCV] Reuse existing VT variable instead of calling getValueType() repeatedly. NFC 2022-08-16 19:56:55 -07:00
Monk Chiang 0af4651c0f [RISCV] Add scheduling class for vector pseudo segment instructions.
Add scheduling resource for vector segment load/store instructions in D128886.
I miss to add scheduling resource for pseudo segment instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130222
2022-08-16 17:54:47 -07:00
Martin Sebor a7a1be11e6 [InstCombine] convert second std::min argument to same type as first
Ensure both arguments to std::min have the same type in all data models.
2022-08-16 17:34:33 -06:00
Eli Friedman cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
Craig Topper 53ce22e429 Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.

Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:51:07 -07:00
Craig Topper 2dfa4b6475 Revert "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine."
This reverts commit 1380b21ceb.

I mixed up N0 and N1 and didn't do what I intended.
2022-08-16 15:47:01 -07:00
Martin Sebor 345514e991 [InstCombine] Add support for strlcpy folding
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130666
2022-08-16 16:43:40 -06:00
Craig Topper 1380b21ceb [RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine.
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
2022-08-16 15:40:09 -07:00
Craig Topper b5a18de651 [RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)).
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferrable. (addi X, -1) can be compressed, sub with x0 on
the LHS is never compressible.
2022-08-16 14:49:52 -07:00
Martin Sebor e858f5120d [InstCombine] Remove assumptions about int having 32 bits
Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D131731
2022-08-16 15:35:08 -06:00
Craig Topper de6fd16971 [RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12.
We still need to materialize the constant in a register and we
may not be removing all uses of the original constant so it may
increase code size.
2022-08-16 14:11:31 -07:00
Craig Topper 4184edc691 [RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc.
This introduce an xori in some cases. I don't believe it was the
intention of the original patch. This was an accident because
nonan FP equality compares also use SETEQ/SETNE.

Also pass the correct type to getSetCCInverse.
2022-08-16 13:00:36 -07:00
Craig Topper c7e58836e8 [RISCV] Minor cleanups to performSUBCombine. NFC
-Rename variable NnzC -> N0C.
-Use SelectionDAG::getSetCC to reduce code.
-Use SDValue::getOperand instead of operator-> and SDNode::getOperand.

Initial steps to add another similar combine to this code.
2022-08-16 12:59:16 -07:00
Sanjay Patel ce081776b2 [FlattenCFG] avoid crash on malformed code
We don't have a dominator tree in this pass, so we
can't bail out sooner by checking for unreachable
code, but this is a minimal fix for the example in
issue #56875.
2022-08-16 15:11:00 -04:00
Nicolas Miller ccfabfbb1f Fix subrange liveness checking at rematerialization
This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead.

The code to check whether an instruction can be rematerialized at a given point or not was already checking for subranges to ensure that subregisters are live, but only when the instruction being moved was using a subregister, this patch changes that so the subranges are checked even when the moved instruction uses the full register.

This patch also adds a case to the original test for the subrange checking that trigger the issue described above.

The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278

And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209

Essentially the greedy register allocator attempts to move the following instruction:

```
%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec
```

From `@3440` into the body of a loop `@16312`, but `%3078` has the following live ranges:

```
%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r  L0000000000000003 [2224r,3440r:0) 0@2224r  L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r
```

So `@16312e` `%3078.sub1` is alive but `%3078.sub0` is dead, so this instruction being moved there leads to invalid memory accesses as `3078.sub0` ends up being trashed and the result of this instruction is used as part of an address calculation for a load.

On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908, this turned out to be because on gfx908 instead of moving the shift instruction into the loop, its value is spilled into an ACC register, gfx906 doesn't have ACC registers and for gfx90a ACC registers are used like regular vector registers and so aren't used for spilling.

With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of moving the instruction in the loop.

Original Author: npmiller

Reviewed by: rampitec

Submitted by: rampitec

Differential Revision: https://reviews.llvm.org/D131884
2022-08-16 10:50:09 -07:00
Simon Pilgrim 08d153d806 [ValueTracking] computeKnownBits - attempt to use a branch condition feeding a phi to improve known bits range (PR38280)
If computeKnownBits encounters a phi node, and we fail to determine any known bits through direct analysis, see if the incoming value is part of a branch condition feeding the phi.

Handle cases where icmp(IncomingValue PRED Constant) is driving a branch instruction feeding that phi node - at the moment this only handles EQ/ULT/ULE predicate cases as they are the most straightforward to handle and most likely for branch-loop 'max upper bound' cases - we can extend this if/when necessary.

I investigated a more general icmp(LHS PRED RHS) KnownBits system, but the hard limits we put on value tracking depth through phi nodes meant that we were mainly catching constants anyhow.

Fixes the pointless vectorization in PR38280 / Issue #37628 (excessive unrolling still needs handling though)

Differential Revision: https://reviews.llvm.org/D131838
2022-08-16 16:54:44 +01:00
Arthur Eubanks 9181ce623f [Windows] Put init_seg(compiler/lib) in llvm.global_ctors
Currently we treat initializers with init_seg(compiler/lib) as similar
to any other init_seg, they simply have a global variable in the proper
section (".CRT$XCC" for compiler/".CRT$XCL" for lib) and are added to
llvm.used. However, this doesn't match with how LLVM sees normal (or
init_seg(user)) initializers via llvm.global_ctors. This
causes issues like incorrect init_seg(compiler) vs init_seg(user)
ordering due to GlobalOpt evaluating constructors, and the
ability to remove init_seg(compiler/lib) initializers at all.

Currently we use 'A' for priorities less than 200. Use 200 for
init_seg(compiler) (".CRT$XCC") and 400 for init_seg(lib) (".CRT$XCL"),
which do not append the priority to the section name. Priorities
between 200 and 400 use ".CRT$XCC${Priority}". This allows for
some wiggle room for people/future extensions that want to add
initializers between compiler and lib.

Fixes #56922

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D131910
2022-08-16 08:16:18 -07:00
Danila Malyutin 451497a030 [RS4GC] Handle vectors of pointers in non-live clobbering
Fix crash when trying to unconditionally cast alloca type to PointerType

Differential Revision: https://reviews.llvm.org/D131146
2022-08-16 17:47:30 +03:00
Steve Merritt ec60fca752 [CodeView] Use non-qualified names for static local variables
Static variables declared within a routine or lexical block should
be emitted with a non-qualified name.  This allows the variables to
be visible to the Visual Studio watch window.

Differential Revision: https://reviews.llvm.org/D131400
2022-08-16 10:33:43 -04:00
Alexey Bataev 65c7cecb13 [SLP]Fix PR51320: Try to vectorize single store operands.
Currently, we try to vectorize values, feeding into stores, only if
slp-vectorize-hor-store option is provided. We can safely enable
vectorization of the value operand of a single store in the basic block,
if the operand value is used only in store.
It should enable extra vectorization and should not increase compile
time significantly.
Fixes https://github.com/llvm/llvm-project/issues/51320

Differential Revision: https://reviews.llvm.org/D131894
2022-08-16 07:25:21 -07:00
David Spickett b812db1464 [LLVM][Debuginfod] Add missing thread include
One of our silent bots is currently failing:
https://lab.llvm.org/staging/#/builders/171/builds/169

With:
<...>/Debuginfod.cpp:298:23: error: no type named 'sleep_for' in namespace 'std::this_thread'
    std::this_thread::sleep_for(Interval);
    ~~~~~~~~~~~~~~~~~~^

Add missing thread include to that file,
which is what all the other users of sleep_for do.

I think we are seeing this now because we disabled
llvm threading for this builder. Maybe debuginfod should account
for that but that's for another time.
2022-08-16 13:56:23 +00:00
Kevin P. Neal 7f768371a1 Fix build error: [FPEnv][EarlyCSE] Support for CSE when exception behavior is "ignore" or "maytrap" and the rounding mode is known.
This should fix these build bot errors:

Step 6 (build-check-mlir-build-only) failure: build (failure)
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(129): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1386): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1388): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.
2022-08-16 08:47:36 -04:00
Kevin P. Neal 05ac82de40 [FPEnv][EarlyCSE] Support for CSE when exception behavior is "ignore" or "maytrap" and the rounding mode is known.
Previously we would only CSE constrained FP intrinsics in the default
floating point environment. Exception behavior of "strict" is still not
allowed since we are not allowed to remove any traps in that case.

There are no restrictions on CSE across function calls inside a function.

Differential Revision: https://reviews.llvm.org/D112256
2022-08-16 08:31:42 -04:00
Karl Meakin 6f9423ef06 [AArch64] Add `foldCSELOfCSEl` DAG combine
Differential Revision: https://reviews.llvm.org/D125504
2022-08-16 12:49:11 +01:00
Zain Jaffal 7155ed4289
[AArch64] Add support for 256-bit non temporal loads
Currenlty all temporal loads are mapped to `LDP` or `LDR`. This patch will map all the non temporal 256-bit loads into `LDNP`. Future patches should address other non-temporal loads.

Reviewed By: fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D131773
2022-08-16 12:19:36 +01:00
Victor Campos 784da8a722 [ARM] Simplify the creation of escaped build attribute values
There is an existing mechanism to escape strings, therefore the
functions created to escape Tag_also_compatible_with values are not
really needed. We can simply use the pre-existing utilities.

Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D131680
2022-08-16 11:49:33 +01:00
Victor Campos 08c6840f25 [ARM] Parse Tag_also_compatible_with attribute
The ARM Attribute Parser used to parse the value of also_compatible_with
as it is, disregarding the way it is encoded.

This patch does a context aware parsing of the also_compatible_with
attribute. Additionally, some error handling is also done for incorrect
cases.

Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D130913
2022-08-16 11:22:56 +01:00
Bing1 Yu 807b8cb06c [X86] Fix a lowering issue of mask.compress which has undef float passthrough
Previously, LegaizeDAG didn't check mask.compress's passthrough might be float, and this lead to getConstant crash since it doesn't support fp

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131947
2022-08-16 17:54:45 +08:00
Andre Vieira c6b5a13b7a [TypePromotion] Only search for PHI + ZExt promotion of Integers
Differential Revision: https://reviews.llvm.org/D131948
2022-08-16 10:15:32 +01:00
Max Kazantsev ebabd6bf18 Return "[SCEV] Use context to strengthen flags of BinOps"
This reverts commit 354fa0b480.

Returning as is. The patch was reverted due to a miscompile, but
this patch is not causing it. This patch made it possible to infer
some nuw flags in code guarded by `false` condition, and then someone
else to managed to propagate the flag from dead code outside.

Returning the patch to be able to reproduce the issue.
2022-08-16 14:12:36 +07:00
gonglingqin a9d46d9af3 [LoongArch] Add codegen support for fabs
Differential Revision: https://reviews.llvm.org/D131871
2022-08-16 14:41:27 +08:00
Weining Lu d1f36da9e0 [LoongArch] Encode LoongArch specific ELF e_flags to binary by LoongArchTargetStreamer
Reference: https://github.com/loongson/LoongArch-Documentation
The last commit hash (main branch) is:
99016636af64d02dee05e39974d4c1e55875c45b

Note:
There are several PRs [1][2][3] that may affect the e_flags.
After they got closed or merged, we should update the implementation here accordingly.

[1] https://github.com/loongson/LoongArch-Documentation/pull/33
[2] https://github.com/loongson/LoongArch-Documentation/pull/47
[2] https://github.com/loongson/LoongArch-Documentation/pull/61

Differential Revision: https://reviews.llvm.org/D130239
2022-08-16 13:41:50 +08:00
Kazu Hirata eca990702d [ExecutionEngine] Fix a warning
This patch fixes the warning:

  llvm/lib/ExecutionEngine/JITLink/ELF_i386.cpp:66:11: error: unused
  type alias 'Base' [-Werror,-Wunused-local-typedef]
2022-08-15 20:33:10 -07:00
wanglian fbc4c26e9a [SelectionDAG][NFC] Fix return type when used isConstantIntBuildVectorOrConstantInt
and isConstantFPBuildVectorOrConstantFP

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D131870
2022-08-16 10:07:24 +08:00
Kshitij Jain 29fe204b4e Re-apply "[JITLink] Introduce ELF/i386 backend " with correct authorship.
I (lhames) accidentally pushed 5f300397c6 on
Kshitij Jain's behalf without updating the patch author first (my apologies
Kshitij!).

Re-applying with correct authorship.

https://reviews.llvm.org/D131347
2022-08-15 18:44:43 -07:00
Lang Hames 73600b7c8a Revert "[JITLink] Introduce ELF/i386 backend support for JITLink."
This reverts commit 5f300397c6.

No functional issues, I just failed to correctly set authorship on the patch.
2022-08-15 18:44:43 -07:00
Lang Hames 5f300397c6 [JITLink] Introduce ELF/i386 backend support for JITLink.
This initial ELF/i386 JITLink backend enables JIT-linking of minimal ELF i386
object files. No relocations are supported yet.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D131347
2022-08-15 18:35:51 -07:00
Rahman Lavaee df2213f345 [EHStreamer] Omit @LPStart when function has no landing pads
When no landing pads exist for a function, `@LPStart` is undefined and must be omitted.

EH table is generally not emitted for functions without landing pads, except when the personality function is uknown (`!isNoOpWithoutInvoke(classifyEHPersonality(Per))`). In that case, we must omit `@LPStart` even when machine function splitting is enabled.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131626
2022-08-15 17:09:46 -07:00
Aiden Grossman 24cdf97d63 [mlgo] Add ability to create feature-gated development features in regalloc advisor
Currently there is no way to add in development features to the ML
regalloc evict advisor which is useful to have when working on feature
engineering/improving the current model. This patch adds in the ability
to add in development features to the ML regalloc evict advisor which
are gated by a runtime flag and not added in at all if not compiled in
LLVM development mode. This sets the stage for future work where we are
planning on upstreaming some of the newer features that we are currently
experimenting with.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D131209
2022-08-15 16:01:37 -07:00
Vitaly Buka e0e960923f [AArch64] Fix signed integer overflow in CSINC case
Followup to D131815, which overlflows on different
values.
2022-08-15 15:04:20 -07:00
Martin Sebor 65967708d2 [InstCombine] Adjust snprintf folding of constant strings (PR #56598)
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130494
2022-08-15 15:59:21 -06:00