Commit Graph

28315 Commits

Author SHA1 Message Date
Nikita Popov bb15861e14 [MemCpyOpt] Relax libcall checks
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.

This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.

Differential Revision: https://reviews.llvm.org/D106769
2021-08-04 21:17:51 +02:00
Giorgis Georgakoudis 29a3e3dd7b [OpenMPOpt] Expand SPMDization with guarding for target parallel regions
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region, thus must be not be guarded. This check is implemented using the ParallelLevels AA.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106892
2021-08-04 11:49:24 -07:00
Alexey Bataev 214f99b27c Revert "[SLP]Do not emit extra shuffle for insertelements vectorization."
This reverts commit 871ea69803 to fix the
problem if the first vector is not just undef.
2021-08-04 11:28:59 -07:00
Dawid Jurczak 238139be09 [DSE][NFC] Clean up DeadStoreElimination from unused variables
Differential Revision: https://reviews.llvm.org/D106446
2021-08-04 19:44:40 +02:00
Petr Hosek 6660cec568 [InstrProfiling] Emit bias variable eagerly
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.

Differential Revision: https://reviews.llvm.org/D107377
2021-08-04 10:17:08 -07:00
Sander de Smalen fe6ae81ef3 [InstCombine] Fix vscale zext/sext optimization when vscale_range is unbounded.
According to the LangRef, a (vscale_range) value of 0 means unbounded.

This patch additionally cleans up the test file vscale_sext_and_zext.ll.
2021-08-04 17:17:37 +01:00
Chris Jackson 21ee38e24f [DebugInfo][LSR] Avoid crashes on large integer inputs
SCEV-based salvaging in LSR translates SCEVs to DIExpressions. SCEVs may
contain very large integers but the translation does not support
integers greater than 64 bits. This patch adds checks to ensure
conversions of these large integers is not attempted. A regression test
is added to ensure no such translation is attempted.

Reviewed by: StephenTozer

PR: https://bugs.llvm.org/show_bug.cgi?id=51329

Differential Revision: https://reviews.llvm.org/D107438
2021-08-04 15:51:22 +01:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Sjoerd Meijer 30fbb06979 [FuncSpec] Support specialising recursive functions
This adds support for specialising recursive functions. For example:

    int Global = 1;
    void recursiveFunc(int *arg) {
      if (*arg < 4) {
        print(*arg);
        recursiveFunc(*arg + 1);
      }
    }
    void main() {
      recursiveFunc(&Global);
    }

After 3 iterations of function specialisation, followed by inlining of the
specialised versions of recursiveFunc, the main function looks like this:

    void main() {
      print(1);
      print(2);
      print(3);
    }

To support this, the following has been added:
- Update the solver and state of the new specialised functions,
- An optimisation to propagate constant stack values after each iteration of
  function specialisation, which is necessary for the next iteration to
  recognise the constant values and trigger.

Specialising recursive functions is (at the moment) controlled by option
-func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
the default is -func-specialization-max-iters=1, but for the example above we
would need to use -func-specialization-max-iters=3. Future work is to see if we
can increase the default, or improve the cost-model/heuristics to control
compile-times.

Differential Revision: https://reviews.llvm.org/D106426
2021-08-04 08:07:04 +01:00
Shimin Cui 2d9759c790 [GlobalOpt] Fix the load types when OptimizeGlobalAddressOfMalloc
Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may not be true any more (as seen in the test case), and we miss the code to handle this, This is to fix that.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107397
2021-08-03 19:22:53 -04:00
Craig Topper b818da27ab [SimplifyCFG] Enable switch to lookup table for more types.
This transform has been restricted to legal types since
https://reviews.llvm.org/rG65df808f6254617b9eee931d00e95d900610b660
in 2012.

This is particularly restrictive on RISCV64 which only has i64
as a legal integer type. i32 is a very common type in code
generated from C, but we won't form a lookup table with it.
This also effects other common types like i8/i16 types on ARM,
AArch64, RISCV, etc.

This patch proposes to allow power of 2 types larger than 8 bit, if
they will fit in the largest legal integer type in DataLayout.
These types are common in C code so generally well handled in
the backends.

We could probably do this for other types like i24 and rely on
alignment and padding to allow the backend to use a single wider
load. This isn't my main concern right now and it will need more
tests.

We could also allow larger types up to some limit and let the
backend split into multiple loads, but we need to define that
limit. It's also not my main concern right now.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D107233
2021-08-03 15:35:16 -07:00
Alexey Bataev 871ea69803 [SLP]Do not emit extra shuffle for insertelements vectorization.
If the vectorized insertelements instructions form indentity subvector
(the subvector at the beginning of the long vector), it is just enough
to extend the vector itself, no need to generate inserting subvector
shuffle.

Differential Revision: https://reviews.llvm.org/D107344
2021-08-03 13:18:41 -07:00
Alexey Bataev 7d9d926a18 Revert "[SLP]Improve graph reordering."
This reverts commit e408d1dfab and
2 other (4b25c11321 and
c2deb2afaf) related to fix the problem with the
reordering shuffles.
2021-08-03 12:13:43 -07:00
Sami Tolvanen 7ce1c4da77 ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Relands 700d07f8ce with -msvc targets
fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-08-03 11:35:30 -07:00
Dylan Fleming 3943a74666 [InstCombine] Fixed select + masked load fold failure
Fixed type assertion failure caused by trying to fold a masked load with a
select where the select condition is a scalar value

Reviewed By: sdesmalen, lebedev.ri

Differential Revision: https://reviews.llvm.org/D107372
2021-08-03 19:06:12 +01:00
Philip Reames 223835f08b [runtimeunroll] A bit of style cleanup to simplify a following change [NFC]
Use for-range, use the idiomatic pattern for non-loop values, etc..
2021-08-03 10:28:46 -07:00
Krishna 946fd4ea65 Revert "[InstCombine] Remove nnan requirement for transformation to fabs from select"
This reverts commit 6180ce2e2a.
2021-08-03 18:08:11 +05:30
Krishna d99260641b [InstCombine] Fold phi ( inttoptr/ptrtoint x ) to phi (x)
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support to specific cases where this optimization works.

In this patch, we focus on phi-node operands with inttoptr casts.
We know that ptrtoint( inttoptr( ptrtoint x) ) is same as ptrtoint (x).
So, we want to remove this roundtrip cast which goes through phi-node.

Reviewed By: aqjune

Differential Revision: https://reviews.llvm.org/D106289
2021-08-03 17:52:59 +05:30
Krishna 6180ce2e2a [InstCombine] Remove nnan requirement for transformation to fabs from select
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).

Differential Revision: https://reviews.llvm.org/D106872
2021-08-03 17:52:58 +05:30
David Sherwood 0156f91f3b [NFC] Rename enable-strict-reductions to force-ordered-reductions
I'm renaming the flag because a future patch will add a new
enableOrderedReductions() TTI interface and so the meaning of this
flag will change to be one of forcing the target to enable/disable
them. Also, since other places in LoopVectorize.cpp use the word
'Ordered' instead of 'strict' I changed the flag to match.

Differential Revision: https://reviews.llvm.org/D107264
2021-08-03 09:33:01 +01:00
Shimin Cui 7ce98cf56e [GlobalOpt] Fix the assert for stored once non-pointer to global address
This is to fix the assert @bjope reported due to the code change of https://reviews.llvm.org/D106589. The test case from @bjope is also included.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107302
2021-08-02 19:23:29 -04:00
Roman Lebedev 6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Roman Lebedev 4ba3326f17
[InstCombine] `vector_reduce_{or,and}(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
This allows the expansion logic to actually trigger if the argument
was extended from i1 element type, like the rest of the reductions expect.

Alive2 agrees:
https://alive2.llvm.org/ce/z/wcfews (or zext)
https://alive2.llvm.org/ce/z/FCXNFx (or sext)
https://alive2.llvm.org/ce/z/f26zUY (and zext)
https://alive2.llvm.org/ce/z/jprViN (and sext)
2021-08-03 00:54:35 +03:00
Roman Lebedev 554fc9ad0a
[InstCombine] `vector_reduce_smax(?ext(<n x i1>))` --> `?ext(vector_reduce_{and,or}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/3oqir9 (self)
https://alive2.llvm.org/ce/z/6cuI5m (zext)
https://alive2.llvm.org/ce/z/4FL8rD (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Roman Lebedev f47b7b6d10
[InstCombine] `vector_reduce_smin(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/noXtZ8 (self)
https://alive2.llvm.org/ce/z/JNrN6C (zext)
https://alive2.llvm.org/ce/z/58snuN (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-03 00:29:06 +03:00
Nikita Popov c7770574f9 Revert "[unroll] Move multiple exit costing into consumer pass [NFC]"
This reverts commit 76940577e4.

This causes Transforms/LoopUnroll/ARM/multi-blocks.ll to fail.
2021-08-02 22:23:34 +02:00
Roman Lebedev b9b7162b8b
[InstCombine] `vector_reduce_umax(?ext(<n x i1>))` --> `?ext(vector_reduce_or(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/NbBaeT (self)
https://alive2.llvm.org/ce/z/iEaig4 (zext)
https://alive2.llvm.org/ce/z/meGb3y (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:23 +03:00
Roman Lebedev 0c13798056
[InstCombine] `vector_reduce_umin(?ext(<n x i1>))` --> `?ext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/XxUScW (self)
https://alive2.llvm.org/ce/z/3usTF- (zext)
https://alive2.llvm.org/ce/z/GVxwQz (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 23:02:22 +03:00
Philip Reames 76940577e4 [unroll] Move multiple exit costing into consumer pass [NFC]
This aligns the multiple exit costing with all the other cost decisions.  Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.
2021-08-02 12:46:23 -07:00
Nikita Popov 380b8a603c [DFAJumpThreading] Use SmallPtrSet for Visited (NFC)
This set is only used for contains checks, so there is no need to
use std::set.
2021-08-02 21:30:25 +02:00
Nikita Popov 3f7aea1a37 [DFAJumpThreading] Use insert return value (NFC)
Rather than find + insert. Also use range based for loop.
2021-08-02 21:21:21 +02:00
Nikita Popov 84602f98c6 [DFAJumpThreading] Remove unnecessary includes (NFC)
This file uses neither unordered_map nor unordered_set.
2021-08-02 21:13:30 +02:00
Nikita Popov e97524cba2 [DFAJumpThreading] Mark DT as preserved in LegacyPM
It is marked as preserved in NewPM, but not LegacyPM.
2021-08-02 21:13:30 +02:00
Roman Lebedev 469793efa7
[InstCombine] `vector_reduce_mul(?ext(<n x i1>))` --> `zext(vector_reduce_and(<n x i1>))` (PR51259)
Alive2 agrees:
https://alive2.llvm.org/ce/z/PDansB (self)
https://alive2.llvm.org/ce/z/55D-Xc (zext)
https://alive2.llvm.org/ce/z/LxG3-r (sext)

We already handle `vector_reduce_and(<n x i1>)`,
so let's just combine into the already-handled pattern
and let the existing fold do the rest.
2021-08-02 21:57:51 +03:00
Philip Reames 9016beaa24 [unrollruntime] Pull out a helper function for readability and eventual reuse [nfc] 2021-08-02 11:47:27 -07:00
Philip Reames ebc4c4e3b0 [unroll] Add clarifying comment
The option to not preserve LCSSA is in fact not tested at all in upstream.  I was tempted to just remove the code entirely, but realized I didn't need to for my actual goal.
2021-08-02 10:44:56 -07:00
Florian Hahn bb725c9803
[VPlan] Use defined and ops VPValues to print VPInterleaveRecipe.
This patch updates VPInterleaveRecipe::print to print the actual defined
VPValues for load groups and the store VPValue operands for store
groups.

The IR references may become outdated while transforming the VPlan and
the defined and stored VPValues always are up-to-date.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D107223
2021-08-02 18:36:36 +01:00
Roman Lebedev 1e801439be
[InstCombine] `xor` reduction w/ i1 elt type is a parity check
For i1 element type, `xor` and `add` are interchangeable
(https://alive2.llvm.org/ce/z/e77hhQ), so we should treat it just like
an `add` reduction and consistently transform them both:
https://alive2.llvm.org/ce/z/MjCm5W (self)
https://alive2.llvm.org/ce/z/kgqF4M (skipped zext)
https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)

Though, let's emit the IR that is similar to the one we produce for
`vector_reduce_add(<n x i1>)`.

See https://bugs.llvm.org/show_bug.cgi?id=51259
2021-08-02 20:21:37 +03:00
Florian Mayer 66b4aafa2e [hwasan] Detect use after scope within function.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105201
2021-08-02 11:34:12 +01:00
Rosie Sumpter f117ed542f [LoopFlatten] Fix missed LoopFlatten opportunity
When the limit of the inner loop is a known integer, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.

Differential Revision: https://reviews.llvm.org/D105802
2021-08-02 11:09:54 +01:00
Shimin Cui 732b05555c [GlobalOpt] support ConstantExpr use of global address for OptimizeGlobalAddressOfMalloc
I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106589
2021-07-31 18:42:02 -04:00
Sanjay Patel f2a322bfcf [SROA] prevent crash on large memset length (PR50910)
I don't know much about this pass, but we need a stronger
check on the memset length arg to avoid an assert. The
current code was added with D59000.
The test is reduced from:
https://llvm.org/PR50910

Differential Revision: https://reviews.llvm.org/D106462
2021-07-31 14:07:30 -04:00
Sanjay Patel a22c99c3c1 [InstCombine] canonicalize cmp-of-bitcast-of-vector-cmp to use zero constant
We can invert a compare constant and preserve the logic
as shown in this sampling:
https://alive2.llvm.org/ce/z/YAXbfs
(In theory, we could deal with non-all-ones/zero as well,
but it doesn't seem worthwhile.)

I noticed this as a part of the x86 codegen difference in
https://llvm.org/PR51259 - it ends up using "test"
instead of "not + cmp" in that example.

This pattern also shows up in https://llvm.org/PR41312
and https://llvm.org/PR50798 .

Differential Revision: https://reviews.llvm.org/D107170
2021-07-31 13:31:12 -04:00
Florian Mayer b5b023638a Revert "[hwasan] Detect use after scope within function."
This reverts commit 84705ed913.
2021-07-30 22:32:04 +01:00
Brendon Cahoon c4c379d633 [LoopStrengthReduction] Fix pointer extend asserts
Additional asserts were added to ScalarEvolution to enforce
pointer/int type rules. An assert is triggered when the LSR pass
attempts to extend a pointer SCEV in GenerateTruncates.

This patch changes GenerateTruncates to exit early if the Formaula
contains a ScaledReg or BaseReg with a pointer type.

Differential Revision: https://reviews.llvm.org/D107185
2021-07-30 17:24:08 -04:00
Fangrui Song a1532ed275 [InstrProfiling] Make CountersPtr in __profd_ relative
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.

```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo

# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo

# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```

(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059  # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```

The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.

Differential Revision: https://reviews.llvm.org/D104556
2021-07-30 11:52:18 -07:00
Simon Pilgrim afc6b09dee [InstCombine] getMaskedTypeForICmpPair - remove dead code. NFCI.
Ok should be true at this point, so the early-out is dead - replace with an assert.
2021-07-30 19:23:05 +01:00
Alexey Bataev 95e5d401ae [SLP]Improve splats vectorization.
Replace insertelement instructions for splats with just single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.

Differential Revision: https://reviews.llvm.org/D107104
2021-07-30 10:17:45 -07:00
Kazu Hirata e76ddfa9ef [Transforms] Remove HasValueForBlock (NFC)
The function seems to be unused for at least one year.
2021-07-30 08:56:49 -07:00
Dylan Fleming a7a39ec886 [SVE] Add folds for sign and zero extends of vscale
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105994
2021-07-30 16:02:50 +01:00
Florian Mayer 84705ed913 [hwasan] Detect use after scope within function.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D105201
2021-07-30 13:59:36 +01:00
Alexey Bataev 4b25c11321 [SLP]Fix an assertion for the size of user nodes.
For the nodes with reused scalars the user may be not only of the size
of the final shuffle but also of the size of the scalars themselves,
need to check for this. It is safe to just modify the check here, since
the order of the scalars themselves is preserved, only indeces of the
reused scalars are changed. So, the users with the same size as the
number of scalars in the node, will not be affected, they still will get
the operands in the required order.

Reported by @mstorsjo in D105020.

Differential Revision: https://reviews.llvm.org/D107080
2021-07-30 05:46:44 -07:00
Alexey Bataev f4fb854811 [SLP]Do not consider deleted instruction as external users.
If the instruction was previously deleted, it should not be treated as
an external user. This fixes cost estimation and removes dead
extractelement instructions.

Differential Revision: https://reviews.llvm.org/D107106
2021-07-30 05:37:43 -07:00
Alexey Bataev c2deb2afaf [SLP]Fix a crash in gathered loads analysis.
Need to check that the minimum acceptable vector factor is at least 2,
not 0, to avoid compiler crash during gathered loads analysis.

Differential Revision: https://reviews.llvm.org/D107058
2021-07-30 05:19:17 -07:00
Joseph Huber cd0dd8ece8 [OpenMP] Adding flags for disabling the following optimizations: Deglobalization SPMDization State machine rewrites Folding
This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
 - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
 - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
 - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
 - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D106802
2021-07-29 19:28:31 -04:00
Andy Kaylor b4d945bacd Fixing an infinite loop problem in InstCombine
Patch by Mohammad Fawaz

This issues started happening after
b373b5990d
Basically, if the memcpy is volatile, the collectUsers() function should
return false, just like we do for volatile loads.

Differential Revision: https://reviews.llvm.org/D106950
2021-07-29 12:57:17 -07:00
Dawid Jurczak 5c315bee8c [DSE] Transform memset + malloc --> calloc (PR25892)
After this change DSE can eliminate malloc + memset and emit calloc.
It's https://reviews.llvm.org/D101440 follow-up.

Differential Revision: https://reviews.llvm.org/D103009
2021-07-29 18:34:10 +02:00
Rosie Sumpter fab5659c79 Revert "[LoopFlatten] Fix missed LoopFlatten opportunity"
This reverts commit 2df8bf9339.

Reverting because it causes an assertion failure.
2021-07-29 15:52:45 +01:00
Jeremy Morse 2537120c87 Follow-up to D105207, only salvage affine SCEVs to avoid a crash
SCEVToIterCountExpr only expects to be fed affine expressions, but
DbgRewriteSalvageableDVIs is feeding it non-affine induction variables.
Following this up with an obvious fix, will add test coverage too if
this avoids D105207 being reverted.
2021-07-29 11:48:08 +01:00
Rosie Sumpter 2df8bf9339 [LoopFlatten] Fix missed LoopFlatten opportunity
When the trip count of the inner loop is a constant, the InstCombine
pass now causes the transformation e.g. imcp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.

Differential Revision: https://reviews.llvm.org/D105802
2021-07-29 09:47:41 +01:00
Joseph Huber adbaa39dfc [Attributor] Change function internalization to not replace uses in internalized callers
The current implementation of function internalization creats a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of eachother. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106931
2021-07-28 18:57:28 -04:00
Chris Jackson 0ba8595287 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
Reapply commit d675b594f4 that was
reverted due to buildbot failures. A simple fix has been applied to
remove an assertion.

Differential Revision: https://reviews.llvm.org/D105207
2021-07-28 23:04:59 +01:00
Sjoerd Meijer bc43078fe8 [LoopFlatten] Fix bug where SCEVCouldNotCompute object is used
The SCEV method getBackedgeTakenCount() returns a SCEVCouldNotCompute
object if the backedge-taken count is unpredictable. This fix ensures
there is no longer an attempt to use such an object to find the trip
count.

Patch by: Rosie Sumpter.

Differential Revision: https://reviews.llvm.org/D106970
2021-07-28 18:35:08 +01:00
Jeroen Dobbelaere 03b8c69d06 [PredicateInfo] Use Intrinsic::getDeclaration now that it handles unnamed types.
This is a second attempt to fix the EXPENSIVE_CHECKS issue that was mentioned  In D91661#2875179 by @jroelofs.

(The first attempt was in D105983)

D91661 more or less completely reverted D49126 and by doing so also removed the cleanup logic of the created declarations and calls.
This patch is a replacement for D91661 (which must itself be reverted first). It replaces the custom declaration creation with the
generic version and shows the test impact. It also tracks the number of NamedValues to detect if a new prototype was added instead
of looking at the available users of a prototype.

Reviewed By: jroelofs

Differential Revision: https://reviews.llvm.org/D106147
2021-07-28 19:30:29 +02:00
Jeroen Dobbelaere dc5570d149 Revert "Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types."
This reverts commit 77080a1eb6.

This change introduced issues detected with EXPENSIVE_CHECKS. Reverting to restore the
needed function cleanup. A next patch will then just improve on the name mangling.
2021-07-28 19:30:29 +02:00
Fangrui Song 6da3d8b19c [llvm] Replace LLVM_ATTRIBUTE_NORETURN with C++11 [[noreturn]]
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.

Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
2021-07-28 09:31:14 -07:00
Chris Jackson 3992896043 Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR"
Reverted due to buildbot failures.
This reverts commit d675b594f4.
2021-07-28 16:44:54 +01:00
Chris Jackson d675b594f4 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
Reapply commit 796b84d26f that was
reverted due to reports of crashes. A minor change now guards against
getVariableLocationOperand() returning a nullptr.

Differential Revision: https://reviews.llvm.org/D106659
2021-07-28 16:28:46 +01:00
Sanjay Patel 5b83261c15 [DivRemPairs] make sure we have a valid CFG for hoisting division
This transform was added with e38b7e8948
and as shown in:
https://llvm.org/PR51241
...it could crash without an extra check of the blocks.

There might be a more compact way to write this constraint,
but we can't just count the successors/predecessors without
affecting a test that includes a switch instruction.
2021-07-28 11:09:12 -04:00
Alexey Bataev 3ad6437fcc [SLP]Fix build on MacOS, NFC. 2021-07-28 06:33:13 -07:00
Alexey Bataev e408d1dfab [SLP]Improve graph reordering.
Reworked reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reordarable nodes (loads, stores,
extractelements,extractvalues) and then fully rebuilding the graph in
the best order. This was not effecient, since it required an extra
memory and time for building/rebuilding tree, double the use of the
scheduling budget, which could lead to missing vectorization due to
exausted scheduling resources.

Patch provide 2-way approach for graph reodering problem. At first, all
reordering is done in-place, it doe not required tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to bottom) rotates the whole graph, similarly to the previous
implementation. Compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle smaller subgraph when buildiong operands for the graph
nodes with lasrger vectorization factor, we can rotate just subgraph,
not the whole graph.

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reorder in the best fashion, they are reordered and their user too.
It allows to remove double shuffles to the same ordering of the operands in
many cases and just reorder the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows to remove extra shuffle because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link time are almost the same, though in some cases it
should be better (we're not doing an extra instruction scheduling
anymore) + we may vectorize more code for the large basic blocks again
because of saving scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-07-28 05:49:06 -07:00
Florian Hahn c07dd2b885
[LV] Move recurrence backedge fixup code to VPlan::execute (NFC).
As suggested in D105008, move the code that fixes up the backedge value
for first order recurrences to VPlan::execute.

Now all that remains in fixFirstOrderRecurrences is the code responsible
for creating the exit values in the middle block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D106244
2021-07-28 13:32:40 +01:00
David Green 41cedb1c9a [LV][ARM] Tighten up MLA reduction costing
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.

 - The Arm implementation of getExtendedAddReductionCost is altered to
   only provide costs for legal or smaller types. Larger than legal types
   need to be split, which currently does not work very well, especially
   for predicated reductions where the predicate may be legal but needs to
   be split. Currently we limit it to legal or smaller input types.
 - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext))
   is a pattern that can come up, and can be treated the same as
   reduce(mul(ext, ext)) providing the extension types match.
 - And it has been adjusted to not count the ext in reduce(mul(ext, ext))
   as part of a reduce(mul) pattern.

Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.

Differential Revision: https://reviews.llvm.org/D106166
2021-07-28 12:50:58 +01:00
Chris Jackson 04b94c7cae Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR"
Crashes were reported on the upstreamm revision:
https://reviews.llvm.org/D105207

This reverts commit 796b84d26f.
2021-07-28 10:05:54 +01:00
Jose M Monsalve Diaz 5ab6aedda9 [OpenMP] Folding threadLimit and numThreads when single value in kernels
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033
2021-07-27 21:47:12 -04:00
Johannes Doerfert 3dca83961c Reapply "[Attributor] Disable simplification AAs if a callback is present""
This reapplies commit cbb709e251 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.

This reverts commit aa27430a62.
2021-07-27 19:14:50 -05:00
Johannes Doerfert aa27430a62 Revert "[Attributor] Disable simplification AAs if a callback is present"
This reverts commit cbb709e251 as it
breaks the tests, which was not supposed to happen. Investigating now.
2021-07-27 18:09:42 -05:00
Johannes Doerfert fd520e75f1 [Attributor] Verify `checkForAllUses` return value properly
Also do not emit more than one remark after Heap2Stack failed.
2021-07-27 17:50:27 -05:00
Johannes Doerfert cbb709e251 [Attributor] Disable simplification AAs if a callback is present
AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for a IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.
2021-07-27 17:50:26 -05:00
Benjamin Kramer 05815c9f63 Remove unused include that's also a layering violation. NFC. 2021-07-27 21:21:55 +02:00
Enna1 1ee6559ef6 [ASAN] NFC: Remove redundant variable
`StackAlignment` has only one use: `StackAlignment = std::max(StackAlignment, AI.getAlignment());` So it is redundant.

Reviewed By: vitalybuka, MTC

Differential Revision: https://reviews.llvm.org/D106741
2021-07-27 12:02:37 -07:00
Adam Nemet d87d3615f7 [Matrix] Fix shape for factored transpose
The shape of the input is C x R.

Differential Revision: https://reviews.llvm.org/D106722
2021-07-27 11:36:13 -07:00
Adam Nemet bf7eb48454 [Matrix] RAUW should only replace an instruction in ShapeMap if supportsShapeInfo
As an instruction is replaced in optimizeTransposes RAUW will replace it in
the ShapeMap (ShapeMap is ValueMap so that uses are updated).  In
finalizeLowering however we skip updating uses if they are in the ShapeMap
since they will be lowered separately at which point we pick up the lowered
operands.

In the testcase what happened was that since we replaced the doubled-transpose
with the shuffle, it ended up in the ShapeMap.  As we lowered the
columnwise-load the use in the shuffle was not updated.  Then as we removed
the original columnwise-load we changed that to an undef.  I.e. we ended up
with:

```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
                                   ^^^^^
                                  <i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```

Besides the fix itself, I have fortified this last bit.  As we change uses to
undef when removing instruction we track the undefed instruction to make sure
we eventually remove those too.  This would have caught the issue at compile
time.

Differential Revision: https://reviews.llvm.org/D106714
2021-07-27 11:36:13 -07:00
Alexey Zhikhartsev 02077da7e7 Add jump-threading optimization for deterministic finite automata
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99205
2021-07-27 14:34:04 -04:00
Andy Kaylor b373b5990d Enabling the copy-constant-to-alloca optimization in more instances
Patch by Mohammad Fawaz

This patch allows lifetime calls to be ignored (and later erased) if we
know that the copy-constant-to-alloca optimization is going to happen.
The case that is missed is when the global variable is in a different address
space than the alloca (as shown in the example added to the lit test.)

This used to work before 6da31fa4a6

Differential Revision: https://reviews.llvm.org/D106573
2021-07-27 10:11:43 -07:00
David Sherwood a5dd6c6cf9 [LoopVectorize] Don't interleave scalar ordered reductions for inner loops
Consider the following loop:

  void foo(float *dst, float *src, int N) {
    for (int i = 0; i < N; i++) {
      dst[i] = 0.0;
      for (int j = 0; j < N; j++) {
        dst[i] += src[(i * N) + j];
      }
    }
  }

When we are not building with -Ofast we may attempt to vectorise the
inner loop using ordered reductions instead. In addition we also try
to select an appropriate interleave count for the inner loop. However,
when choosing a VF=1 the inner loop will be scalar and there is existing
code in selectInterleaveCount that limits the interleave count to 2
for reductions due to concerns about increasing the critical path.
For ordered reductions this problem is even worse due to the additional
data dependency, and so I've added code to simply disable interleaving
for scalar ordered reductions for now.

Test added here:

  Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll

Differential Revision: https://reviews.llvm.org/D106646
2021-07-27 17:41:01 +01:00
Anna Thomas 8ee5759fd5 Strip undef implying attributes when moving calls
When hoisting/moving calls to locations, we strip unknown metadata. Such calls are usually marked `speculatable`, i.e. they are guaranteed to not cause undefined behaviour when run anywhere. So, we should strip attributes that can cause immediate undefined behaviour if those attributes are not valid in the context where the call is moved to.

This patch introduces such an API and uses it in relevant passes. See
updated tests.

Fix for PR50744.

Reviewed By: nikic, jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D104641
2021-07-27 10:57:05 -04:00
Tres Popp d9e3449aa8 Handle unused variable when assertions are disabled 2021-07-27 15:43:06 +02:00
Chris Jackson 796b84d26f [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
This reapplies commit 76f3ffb2b2 that was
reverted due to buildbot failures.

- Update lit tests with REQUIRES condition.
- Abandon salvage attempt if SCEVUnknown::getValue() returns nullptr.

Differential Revision: https://reviews.llvm.org/D105207
2021-07-27 14:22:09 +01:00
Chris Jackson 1930c4410d [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
This reverts commit 76f3ffb2b2 because
of a failure on sanitixer-X86-64-linux-autoconf.
2021-07-27 13:36:56 +01:00
Chris Jackson 76f3ffb2b2 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
This patch extends salvaging of debuginfo in the Loop Strength Reduction
(LSR) pass by translating Scalar Evaluations (SCEV) into DIExpressions.
The method is as follows:
- Cache dbg.value intrinsics that are salvageable.
- Obtain a loop Induction Variable (IV) from ScalarExpressionExpander or
  the loop header.
- Translate the IV SCEV into an expression that recovers the current
  loop iteration count. Combine this with the dbg.value's location
  op SCEV to create a DIExpression that salvages the value.

Review by: jmorse

Differential Revision: https://reviews.llvm.org/D105207
2021-07-27 13:00:36 +01:00
Sander de Smalen d7dd12aee3 [LV] Disable Scalable VFs when tail folding is enabled b/c of low tripcount.
The loop vectorizer may decide to use tail folding when the trip-count
is low. When that happens, scalable VFs are no longer a candidate,
since tail folding/predication is not yet supported for scalable vectors.

This can be re-enabled in a future patch.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D106657
2021-07-27 11:37:21 +01:00
Rosie Sumpter 491ac28028 [LoopFlatten] Use SCEV and Loop APIs to identify increment and trip count
Replace pattern-matching with existing SCEV and Loop APIs as a more
robust way of identifying the loop increment and trip count. Also
rename 'Limit' as 'TripCount' to be consistent with terminology.

Differential Revision: https://reviews.llvm.org/D106580
2021-07-27 08:42:59 +01:00
Johannes Doerfert 70b75f62fc [OpenMP] Try to simplify all loads in device code
Eliminating loads/stores in the device code is worth the extra effort,
especially for the new device runtime.

At the same time we do not compute AAExecutionDomain for non-device code
anymore, there is no point.

Differential Revision: https://reviews.llvm.org/D106845
2021-07-27 01:44:15 -05:00
Johannes Doerfert c55e18824d [Attributor][FIX] Copy all members in the assignment operator
Also improve debug output slightly.
2021-07-27 01:44:13 -05:00
Johannes Doerfert d4bfce5521 [Attributor] Utilize the InstSimplify interface to simplify instructions
When we simplify at least one operand in the Attributor simplification
we can use the InstSimplify to work on the simplified operands. This
allows us to avoid duplication of the logic.

Depends on D106189

Differential Revision: https://reviews.llvm.org/D106190
2021-07-27 00:56:23 -05:00
Chuanqi Xu 0237dbfdd3 [Coroutine] Record the elided coroutines
Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D105606
2021-07-27 13:14:09 +08:00
wlei f0d41b58da [CSSPGO] Tweak ICP threshold in top-down inliner
This change slightly relaxed the current ICP threshold in top-down inliner, specifically always allow one ICP for it. It shows some perf improvements on SPEC and our internal benchmarks. Also renamed the previous flag. We can also try to turn off PGO ICP in the future.

Reviewed By: wenlei, hoy, wmi

Differential Revision: https://reviews.llvm.org/D106588
2021-07-26 21:49:20 -07:00
Johannes Doerfert 25a3130d89 [Local] Do not introduce a new `llvm.trap` before `unreachable`
This is the second attempt to remove the `llvm.trap` insertion after
https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6
reverted the first one. It is not clear what the exact issue was back
then and it might already be gone by now, it has been >5 years after
all.

Replaces D106299.

Differential Revision: https://reviews.llvm.org/D106308
2021-07-26 23:33:36 -05:00
Johannes Doerfert 41bd26dff9 [Attributor] Delete dead stores
D106185 allows us to determine if a store is needed easily. Using that
knowledge we can start to delete dead stores.

In AAIsDead we now track more state as an instruction can be dead (= the
old optimisitc state) or just "removable". A store instruction can be
removable while being very much alive, e.g., if it stores a constant
into an alloca or internal global. If we would pretend it was dead
instead of only removablewe we would ignore it when we determine what
values a load can see, so that is not what we want.

Differential Revision: https://reviews.llvm.org/D106188
2021-07-26 23:33:36 -05:00
Johannes Doerfert adddd3dbda [Attributor] Introduce getPotentialCopiesOfStoredValue and use it
This patch introduces `getPotentialCopiesOfStoredValue` which uses
AAPointerInfo to determine all "aliases" or "potential copies" of a
value that is stored into memory. This operation can fail but if it
succeeds it means we can visit all "uses" of a value even if it is
temporarily stored in memory.

There are two users for the function:
  1) `Attributor::checkForAllUses` which will now ignore the value use
     in a store if all "potential copies" can be identified and instead
     be visited. This allows various AAs, including AAPointerInfo
     itself, to look through memory.
  2) `AANoCapture` which uses a custom use tracking through the
     CaptureTracker interface and therefore needs to be thought
     explicitly.

Differential Revision: https://reviews.llvm.org/D106185
2021-07-26 23:33:36 -05:00
Jun Ma 958dddf7df [NFC][InstCombine] Fix typo 2021-07-27 11:33:10 +08:00
Shilei Tian e97e0a4fad [AbstractAttributor] Fold __kmpc_parallel_level if possible
Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible.
Note that `__kmpc_parallel_level` doesn't take activeness into consideration,
based on current `deviceRTLs`, its return value can be such as 0, 1, 2, instead
of 0, 129, 130, etc. that also indicate activeness.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106154
2021-07-26 22:46:19 -04:00
Johannes Doerfert be2b569646 [OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass
While rewriteDeviceCodeStateMachine should probably be folded into
buildCustomStateMachine, we at least need the optimization to happen.
This was not reliably the case in the CGSCC pass but in the Module pass
it seems to work reliably.

This also ports a test to the new kernel encoding (target_init/deinit),
and makes sure we cannot run the kernel in SPMD mode.

Differential Revision: https://reviews.llvm.org/D106345
2021-07-26 21:26:07 -05:00
Johannes Doerfert e6f3e648c9 [Attributor][FIX] Do not return CHANGED unconditionally
This caused us to rerun AAMemoryBehaviorFloating::updateImpl over and
over again. Unfortunately it turned out to be hard to reproduce the
behavior in a reasonable way.
2021-07-26 21:22:02 -05:00
Johannes Doerfert 8befd05aad [Attributor][FIX] Track change status for AAIsDead properly
If we add a new live edge we need to indicate a change or otherwise the
new live block is not shown to users. Similarly, new known dead ends and
a changed `ToBeExploredFrom` set need to cause us to return CHANGED.
2021-07-26 21:21:59 -05:00
Roman Lebedev 1901c98dd8
[SimplifyCFG] SwitchToLookupTable(): don't increase ret count
The very next SimplifyCFG pass invocation will tail-merge these two ret's
anyways, there is not much point in creating more work for ourselves.
2021-07-26 23:29:55 +03:00
Roman Lebedev 08efc2e68d
[SimplifyCFG] Drop support for simplifying cond branch to two (different) ret's
Nowadays, simplifycfg pass already tail-merges all the ret blocks together
before doing anything, and it should not increase the count of ret's,
so this is dead code.
2021-07-26 23:29:52 +03:00
Roman Lebedev 7c5f104e45
[SimplifyCFG] Drop support for duplicating ret's into uncond predecessors
This functionality existed only under a default-off flag,
and simplifycfg nowadays prefers to not increase the count of ret's.
2021-07-26 23:29:21 +03:00
Sander de Smalen 13ccb09725 [LV] Don't let ForceTargetInstructionCost override Invalid cost.
Invalid costs can be used to avoid vectorization with a given VF, which is
used for scalable vectors to avoid things that the code-generator cannot
handle. If we override the cost using the -force-target-instruction-cost
option of the LV, we would override this mechanism, rendering the flag useless.

This change ensures the cost is only overriden when the original cost that
was calculated is valid. That allows the flag to be used in combination
with the -scalable-vectorization option.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106677
2021-07-26 20:27:49 +01:00
Reid Kleckner d56e698552 [SimplifyCFG] Remove stale comment after d7378259aa, NFC 2021-07-26 12:25:29 -07:00
Joseph Huber e757a3b05f [OpenMP][NFC] Remove unncessary capture in RAII struct
Summary:
There was an unnecessary variable assigned to the information cache when we
only need it in the constructor to extract the function declaration.
2021-07-26 15:05:55 -04:00
Eli Friedman 5c486ce04d [LLVM IR] Allow volatile stores to trap.
Proposed alternative to D105338.

This is ugly, but short-term I think it's the best way forward: first,
let's formalize the hacks into a coherent model. Then we can consider
extensions of that model (we could have different flavors of volatile
with different rules).

Differential Revision: https://reviews.llvm.org/D106309
2021-07-26 10:51:00 -07:00
Florian Hahn 6d753b0751
[LAA] Remove RuntimeCheckingPtrGroup::RtCheck member (NFC).
This patch removes RtCheck from RuntimeCheckingPtrGroup to make it
possible to construct RuntimeCheckingPtrGroup objects without a
RuntimePointerChecking object. This should make it easier to
re-use the code to generate runtime checks, e.g. in D102834.

RtCheck was only used to access the pointer info for a given index.
Instead, the start and end expressions can be passed directly.

For code-gen, we also need to know the address space to use. This can
also be explicitly passed at construction.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105481
2021-07-26 17:38:10 +01:00
Sander de Smalen b9051ba848 [LV] Remove assert that VF cannot be scalable in setCostBasedWideningDecision.
Scalarization for scalable vectors is not (yet) supported, so the
LV discards a VF when scalarization is chosen as the widening
decision. It should therefore not assert that the VF is not scalable
when it computes the decision to scalarize.

The code can get here when both the interleave-cost, gather/scatter cost
and scalarization-cost are all illegal. This may e.g. happen for SVE
when the VF=1, to avoid generating `<vscale x 1 x eltty>` types that
the code-generator cannot yet handle.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106656
2021-07-26 17:11:45 +01:00
Nikita Popov f921bf6049 [MergeICmps] Collect block instructions once (NFC)
Collect the relevant instructions for a given BCECmpBlock once
on construction, rather than repeating this logic in multiple
places.
2021-07-26 18:07:20 +02:00
Nikita Popov c691651c53 [MergeICmps] Try to fix MSVC build failure
Apparently this fails to line up the types -- try to sidestep the
issue entirely by writing the code in a more reasonable way: Walk
over the operands and perform a set lookup, rather than walking
over the set and performing an operand scan.
2021-07-26 17:31:27 +02:00
Sanjay Patel 87d604ffe4 [SimplifyLibCalls] avoid crash on pointer math
We could try harder to screen out libcalls by
function signature (and that would be a much larger
change than for sprintf alone), but that might make
the transition to type-less pointers more difficult.

https://llvm.org/PR51200
2021-07-26 11:08:45 -04:00
Sanjay Patel d8260269c3 [SimplifyLibCalls] reduce code duplication; NFC 2021-07-26 11:08:45 -04:00
Nikita Popov 0d3807b365 [MergeICmps] Separate out BCECmp and use Optional (NFC)
Separate out the BCECmp part from BCECmpBlock, which just stores
the comparison atoms without the branch instruction. At the same
time switch the code to return Optional<> rather than objects in
invalid state and partially constructed objects.
2021-07-26 17:06:43 +02:00
Sander de Smalen 981e9dce54 [LV] Don't assume isScalarAfterVectorization if one of the uses needs widening.
This fixes an issue that was found in D105199, where a GEP instruction
is used both as the address of a store, as well as the value of a store.
For the former, the value is scalar after vectorization, but the latter
(as value) requires widening.

Other code in that function seems to prevent similar cases from happening,
but it seems this case was missed.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106164
2021-07-26 16:01:55 +01:00
Florian Hahn 7a1e73f0b9
Recommit "[VPlan] Add recipe for first-order rec phis, make splicing explicit."
This reverts the revert commit b1777b04dc.

The patch originally got reverted due to a crash:
https://bugs.chromium.org/p/chromium/issues/detail?id=1232798#c2

The underlying issue was that we were not using the stored values from
the modified memory recipes, but the out-of-date values directly from
the IR (accessed via the VPlan). This should be fixed in d995d6376. A
reduced version of the reproducer has been added in 93664503be.
2021-07-26 15:50:30 +01:00
Nikita Popov 33146857e9 [IR] Consider non-willreturn as side effect (PR50511)
This adjusts mayHaveSideEffect() to return true for !willReturn()
instructions. Just like other side-effects, non-willreturn calls
(aka "divergence") cannot be removed and cannot be reordered relative
to other side effects. This fixes a number of bugs where
non-willreturn calls are either incorrectly dropped or moved. In
particular, it also fixes the last open problem in
https://bugs.llvm.org/show_bug.cgi?id=50511.

I performed a cursory review of all current mayHaveSideEffect()
uses, which convinced me that these are indeed the desired default
semantics. Places that do not want to consider non-willreturn as a
sideeffect generally do not want mayHaveSideEffect() semantics at
all. I identified two such cases, which are addressed by D106591
and D106742. Finally, there is a use in SCEV for which we don't
really have an appropriate API right now -- what it wants is
basically "would this be considered forward progress". I've just
spelled out the previous semantics there.

Differential Revision: https://reviews.llvm.org/D106749
2021-07-26 16:35:14 +02:00
Alexey Bataev 6ca48efcf6 [SLP]Fix costs calculations.
Need to fix several cost-related problems. The final type may be defined
incorrectly because of to early definition (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost related calculations and the shuffle of the final insertelements
vectors should be calculated as a cost of single vector permutations
+ costs of two vector permutations for other n-1 incoming vectors.

Differential Revision: https://reviews.llvm.org/D106578
2021-07-26 07:14:03 -07:00
Nikita Popov ffb3277b00 [SimplifyCFG] Improve store speculation check
isSafeToSpeculateStore() looks for a preceding store to the same
location to make sure that introducing a new store of the same
value is safe. It currently bails on intervening mayHaveSideEffect()
instructions. However, I believe just checking mayWriteToMemory()
is sufficient there -- we just need to make sure that we know which
value was stored, we don't care if we can unwind in the meantime.

While looking into this, I started having some doubts about the
correctness of the transform with regard to thread safety. While
we don't try to hoist non-simple stores, I believe we also need
to make sure that the preceding store is simple as well. Otherwise
we could introduce a spurious non-atomic write after an atomic write
-- under our memory model this would result in a subsequent undef
atomic read, even if the second write stores the same value as the
first.

Example: https://alive2.llvm.org/ce/z/q_3YAL

Differential Revision: https://reviews.llvm.org/D106742
2021-07-26 15:01:00 +02:00
Kerry McLaughlin e484e1ae03 [SVE] Fix casts to <FixedVectorType> in truncateToMinimalBitwidths
Fixes more casts to `<FixedVectorType>` for the cases where the
instruction is a Insert/ExtractElementInst.

For fixed-width, this part of truncateToMinimalBitWidths is tested by
AArch64/type-shrinkage-insertelt.ll. I attempted to write a test case for this part
of truncateToMinimalBitWidths which uses scalable vectors, but was unable to add
one. The tests in type-shrinkage-insertelt.ll rely on scalarization to create extract
element instructions for instance, which is not possible for scalable vectors.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106163
2021-07-26 13:44:51 +01:00
Alexey Bataev d7cb2a0796 Revert "[SLP]Fix costs calculations."
This reverts commit a053afed49 to fix
buildbots.
2021-07-26 05:42:34 -07:00
Alexey Bataev a053afed49 [SLP]Fix costs calculations.
Need to fix several cost-related problems. The final type may be defined
incorrectly because of to early definition (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost related calculations and the shuffle of the final insertelements
vectors should be calculated as a cost of single vector permutations
+ costs of two vector permutations for other n-1 incoming vectors.

Differential Revision: https://reviews.llvm.org/D106578
2021-07-26 04:37:22 -07:00
Florian Hahn d995d63767
[VPlan] Use stored value from recipes for interleave groups.
Instead of getting the VPValue for the stored IR values through the
current plan, use the stored value of the recipes directly.

This way, the correct VPValues are used if the store recipes have been
modified in the VPlan and the IR value is not correct any longer. This
can happen, e.g. due to D105008.
2021-07-26 12:05:23 +01:00
Dylan Fleming 20b0fa91c9 [SVE] Add support for folding for select + masked loads
Add folds to instcombine to support the removal of select instruction when the masked_load is guaranteed to zero the same lanes, i.e. select(mask, mload(,,mask,0), 0) -> mload(,,mask,0).

Patch originally authored by @paulwalker-arm

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106376
2021-07-26 11:58:41 +01:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
Roman Lebedev c2dacb1cd3
[SimplifyCFG] Fold branch to common dest: if branch is unpredictable, prefer to speculate
This is consistent with the two other usages of prof md in this pass.
2021-07-26 02:57:19 +03:00
Roman Lebedev 59a5964e03
[SimplifyCFG] Don't speculatively execute BB[s] if they are predictably not taken
Same as D106650, but for `FoldTwoEntryPHINode()`

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D106717
2021-07-26 02:55:15 +03:00
Roman Lebedev e58ce35f7b
[SimplifyCFG] Don't speculatively execute BB if it's predictably not taken
If the branch isn't `unpredictable`, and it is predicted to *not* branch
to the block we are considering speculatively executing,
then it seems counter-productive to execute the code that is predicted not to be executed.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D106650
2021-07-26 02:55:14 +03:00
Nico Weber b1777b04dc Revert "[VPlan] Add recipe for first-order rec phis, make splicing explicit."
Makes clang crash: https://reviews.llvm.org/D105008#2903350
This reverts commit d2a73fb44e.

Also revert a minor formatting follow-up:
This reverts commit 82834a6732.
2021-07-25 17:39:28 -04:00
Joseph Huber 58725c12bb [OpenMP] Introduce RAII to protect certain RTL calls from DCE
This patch introduces a new RAII struct that will temporarily make an OpenMP
RTL function have external linkage. This is done before the attributor is
invoked to prevent it from incorrectly removing some function definitions that
we will use later. For example, if we determine all calls to one function are
dead, because it has internal linkage it can safely be removed. Later when we
try to get an instance to that function to modify the source using
`getOrCreateRuntimeFunction` we will then get an empty declaration for that
function that won't be defined anywhere. This patch prevents this from
occurring.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106707
2021-07-25 14:15:47 -04:00
Nikita Popov 087a8eea35 [Attributes] Clean up handling of UB implying attributes (NFC)
Rather than adding methods for dropping these attributes in
various places, add a function that returns an AttrBuilder with
these attributes, which can then be used with existing methods
for dropping attributes. This is with an eye on D104641, which
also needs to drop them from returns, not just parameters.

Also be more explicit about the semantics of the method in the
documentation. Refer to UB rather than Undef, which is what this
is actually about.
2021-07-25 18:21:13 +02:00
Krishna Kariya 7bd361200a [InstCombine] Fix PR47960 - Incorrect transformation of fabs with nnan flag
Bug Fix for PR: https://llvm.org/PR47960

This patch makes sure that the fast math flag used in the 'select'
instruction is the same as the 'fabs' instruction after the transformation.

Differential Revision: https://reviews.llvm.org/D101727
2021-07-25 10:43:33 -04:00
hyeongyu kim aca5aeb752 [InstCombine] Add freezeAllUsesOfArgument to visitFreeze
In D106041, a freeze was added before the branch condition to solve the miscompilation problem of SimpleLoopUnswitch.
However, I found that the added freeze disturbed other optimizations in the following situations.
```
arg.fr = freeze(arg)
use(arg.fr)
...
use(arg)
```
It is a problem that occurred when arg and arg.fr were recognized as different values.
Therefore, changing to use arg.fr instead of arg throughout the function eliminates the above problem.
Thus, I add a function that changes all uses of arg to freeze(arg) to visitFreeze of InstCombine.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D106233
2021-07-24 18:08:58 +09:00
Kuter Dinel 96709823ec [AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997
2021-07-24 06:07:15 +03:00
Kuter Dinel 0cd964ff25 [Attributor][FIX] checkForAllInstructions, correctly handle declarations
checkForAllInstructions was not handling declarations correctly.
It should have been returning false when it gets called on a declaration

The patch also fixes a test case for AAFunctionReachability for it to be able
to pass after the changes to the checkForAllinstructions.

Differential Revision: https://reviews.llvm.org/D106625
2021-07-24 02:21:29 +03:00
Roman Lebedev 943f85123b
[NFC][SimplifyCFG] Make 'conditional block' handling more straight-forward
This will simplify making use of profile weights
to not perform the speculation when obviously unprofitable.
2021-07-24 00:18:27 +03:00
Roman Lebedev 418dba0606
[NFC][SimplifyCFG] FoldTwoEntryPHINode(): make better use of GetIfCondition() returning dom block 2021-07-24 00:18:26 +03:00
Roman Lebedev 2aa2fdeed9
[NFC][BasicBlockUtils] Refactor GetIfCondition() to return the branch, not it's condition
Otherwise e.g. the FoldTwoEntryPHINode() has to do a lot of legwork
to re-deduce what is the dominant block (i.e. for which block
is this branch the terminator).
2021-07-24 00:18:26 +03:00
Nikita Popov f502683750 [MergeICmps] Relax sinking check
The check for sinking instructions past the load + cmp sequence
currently checks for side-effects, which includes writing to memory
and unwinding. However, I don't believe we care about sinking the
instructions past an unwind (as they don't have any side-effects
themselves).

Differential Revision: https://reviews.llvm.org/D106591
2021-07-23 22:16:11 +02:00
Shilei Tian ae69f46867 [AbstractAttributor] Refine logic to indicate pessimistic fixed point when folding `__kmpc_is_spmd_exec_mode`
Since we are using assumed information now, the logic should be refined to avoid
unncessary assertion.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106630
2021-07-23 13:36:47 -04:00
Giorgis Georgakoudis f97de4cb0b [OpenMPOpt] Move dedup runtime calls after init for target regions
Deduplication in OpenMPOpt finds redundant OpenMP runtime calls and replaces them with a single call placed in the earliest safe location in the IR. When deduplication happens in a target region this patch makes sure replacement calls are put after target_init.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106556
2021-07-23 05:54:01 -07:00
Dawid Jurczak bc536c7101 Revert "[DSE] Transform memset + malloc --> calloc (PR25892)"
This reverts commit 43234b1595.

Reason: We should detect that we are implementing 'calloc' and bail out.
2021-07-23 11:51:59 +02:00
Vitaly Buka fef86a380a [hwasan] Fix uninitialized DisableOptimization 2021-07-23 02:25:33 -07:00
Serge Pavlov 1c64b5dc5e [ConstantFolding] Fold constrained arithmetic intrinsics
Constfold constrained variants of operations fadd, fsub, fmul, fdiv,
frem, fma and fmuladd.

The change also sets up some means to support for removal of unused
constrained intrinsics. They are declared as accessing memory to model
interaction with floating point environment, so they were not removed,
as they have side effect. Now constrained intrinsics that have
"fpexcept.ignore" as exception behavior are removed if they have no uses.
As for intrinsics that have exception behavior other than "fpexcept.ignore",
they can be removed if it is known that they do not raise floating point
exceptions. It happens when doing constant folding, attributes of such
intrinsic are changed so that the intrinsic is not claimed as accessing
memory.

Differential Revision: https://reviews.llvm.org/D102673
2021-07-23 14:39:51 +07:00
Johannes Doerfert 6ca969353c [Attributor] If provided, only look at simplification callbacks not IR
A simplification callback can mean that the IR value is modified beyond
the apparent IR semantics. That is, a `i1 true` could be replaced by an
`i1 false` based on high-level domain-specific information. If a user
provides a simplification callback we will not look at the IR but
instead give up if the callback returns a nullptr.
2021-07-22 23:57:37 -05:00
Giorgis Georgakoudis eaab880e45 [Attributor][Fix] Add overrides for AA2HS analysis 2021-07-22 18:20:14 -07:00
Giorgis Georgakoudis f8c40ed8f8 [OpenMP] Use AAHeapToStack/AAHeapToShared analysis in SPMDization
SPMDization D102307 detects incompatible OpenMP runtime calls to abort converting a target region to SPMD mode. Calls to memory allocation/de-allocation routines kmpc_alloc_shared, kmpc_free_shared are incompatible unless they are removed by AAHeapToStack/AAHeapToShared analysis. This patch extends SPMDization detection to include AAHeapToStack/AAHeapToShared analysis results for enlarging the scope of possible SPMDized regions detected.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105634
2021-07-22 18:08:37 -07:00
Hongtao Yu ab5ac659c8 [CSSPGO] Fix a typo in SampleContextTracker
Fixing a typo in SampleContextTracker to use debug name when debug linkage name is no present. This should only affect C programs.

Saw 0.6% perf win on Cinder which is mostly C code.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D106599
2021-07-22 16:44:50 -07:00
Florian Mayer 96c63492cb [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-22 16:20:27 -07:00
Roman Lebedev d7378259aa
[SimplifyCFG] SimplifyCondBranchToTwoReturns(): really only deal with different ret blocks
This function is called when some predecessor of an empty return block
ends with a conditional branch, with both successors being empty ret blocks.

Now, because of the way SimplifyCFG works, it might happen to simplify
one of the blocks in a way that makes a conditional branch
into an unconditional one, since it's destinations are now identical,
but it might not have actually simplified said conditional branch
into an unconditional one yet.

So, we have to check that ourselves first,
especially now that SimplifyCFG aggressively tail-merges
all ret and resume blocks.

Even if it was an unconditional branch already,
`SimplifyCFGOpt::simplifyReturn()` doesn't call `FoldReturnIntoUncondBranch()`
by default.
2021-07-23 00:36:59 +03:00
Roman Lebedev 7ef6f01909
[SimplifyCFG] FoldTwoEntryPHINode(): bailout on inverted logical and/or (PR51149)
The logical (select) form of and/or will now be a source of problems.
We don't really account for it's inverted form, yet it exists,
and presumably we should treat it just like non-inverted form:
https://alive2.llvm.org/ce/z/BU9AXk

https://bugs.llvm.org/show_bug.cgi?id=51149 reports a reportedly-serious
perf regression that will hopefully be mitigated by this.
2021-07-22 22:19:34 +03:00
Fangrui Song 3b181568db [Matrix] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build after D106457. NFC 2021-07-22 11:33:02 -07:00
Adam Nemet ce5b1320a7 [Matrix] Fix miscompile for NT matmul if the transpose has other use
We should only add the fake lowering entry for the matrix remark if the
transpose is not lowered on its own.  `MapVector::insert` is used to insert
the entry during proper lowering which does not overwrite the fake entry in
the map.

We actually had test coverage for this but the reference output code was
wrong; it was storing undef rather than the transposed column.

Also add an assert that would have caught this.

Differential Revision: https://reviews.llvm.org/D106457
2021-07-22 10:45:56 -07:00
Shilei Tian 1a7f779022 [OpenMPOpt] Add support for BooleanStateWithSetVector
D101977 added `BooleanStateWithPtrSetVector` to store pointers to a set meanwhile
tracking boolean state. One of the limitation is that it can only store pointer.
We might want it to store other types of values, such as integer for parallel
level. This patch generalizes the idea and create `BooleanStateWithSetVector`.
`BooleanStateWithPtrSetVector` therefore becomes a type alias of `BooleanStateWithSetVector`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106149
2021-07-22 13:12:29 -04:00
Kazu Hirata f6413d8aaa [Transforms] Remove getOrCreateInitFunction (NFC)
The last use was removed on Jan 16, 2019 in commit
81101de585.
2021-07-22 06:30:39 -07:00
Caroline Concatto 5a4de84d55 [LoopVectorize] Fix crash for predicated instruction with scalable VF
This patch avoids computing discounts for predicated instructions  when the
VF is scalable.
There is no support for vectorization of loops with division because the
vectorizer cannot guarantee that zero divisions will not happen.

This loop now does not use VF scalable

```
for (long long i = 0; i < n; i++)
    if (cond[i])
      a[i] /= b[i];
```

Differential Revision: https://reviews.llvm.org/D101916
2021-07-22 12:48:27 +01:00
Florian Mayer 789a4a2e5c Revert "[hwasan] Use stack safety analysis."
This reverts commit bde9415fef.
2021-07-22 12:16:16 +01:00
Dawid Jurczak 11338e998d [LoopIdiom] Transform memmove-like loop into memmove (PR46179)
The purpose of patch is to learn Loop idiom recognition pass how to recognize simple memmove patterns
in similar way like GCC: https://godbolt.org/z/fh95e83od
LoopIdiomRecognize already has machinery for memset and memcpy recognition, patch tries to extend exisiting capabilities with minimal effort.

Differential Revision: https://reviews.llvm.org/D104464
2021-07-22 13:05:43 +02:00
Florian Mayer bde9415fef [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-22 12:04:54 +01:00
Simon Pilgrim 1c9bec727a [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

Differential Revision: https://reviews.llvm.org/D106450
2021-07-22 10:58:51 +01:00
Johannes Doerfert 0c0eb76782 [Attributor][FIX] Improve call graph updating
If we remove a non-intrinsic instruction we need to tell the (old) call
graph about it. This caused problems with some features down the line as
they allowed to removed calls more aggressively.
2021-07-22 00:07:56 -05:00
Johannes Doerfert 94d3b59c56 [Attributor][FIX] Do not introduce multiple instances of SSA values
If we have a recursive function we could create multiple instantiations
of an SSA value, one per recursive invocation of the function. This is a
problem as we use SSA value equality in various places. The basic idea
follows from this test:

```
static int r(int c, int *a) {
  int X;
  return c ? r(false, &X) : a == &X;
}

int test(int c) {
  return r(c, undef);
}
```

If we look through the argument `a` we will end up with `X`. Using SSA
value equality we will fold `a == &X` to true and return true even
though it should have been false because `a` and `&X` are from different
instantiations of the function.

Various tests for this  have been placed in value-simplify-instances.ll
and this commit fixes them all by avoiding to produce simplified values
that could be non-unique at runtime. Thus, the result of a simplify
value call will always be unique at runtime or the original value, both
do not allow to accidentally compare two instances of a value with each
other and conclude they are equal statically (pointer equivalence) while
they are unequal at runtime.
2021-07-22 00:07:55 -05:00
Johannes Doerfert c819266ecc [Attributor] Improve the Attributor::getAssumedConstant interface
Similar to Attributor::getAssumedSimplified we need to allow IRPs
directly to get the right simplification callback (and context).
2021-07-22 00:07:55 -05:00
Johannes Doerfert c4b1fe05dd [OpenMP][FIX] Use name + type checks not only name checks for calls
A call that is analyzed in an optimization needs to be verified against
the name and type of the runtime function to avoid that we look at
arguments that do not exist (anymore). This can happen if the signature
was rewritten. Since we will not set RFI.Declaration if the type doesn't
match we can use it (if it's not null) to determine if the signature is
as expected.

Differential Revision: https://reviews.llvm.org/D106341
2021-07-21 22:51:05 -05:00
Johannes Doerfert c7781a0978 [Attributor][NFC] Clang format 2021-07-21 22:51:05 -05:00
Joseph Huber 16206d17cd [OpenMP] Strip NoInline from known OpenMP runtime functions
This patch strips the NoInline attribute from known OpenMP runtime functions.
This is done so that we can denote certain runtime functions as NoInline to
ensure their call sites are intact so they can be checked by OpenMPOpt. We
don't wan't this noinline attribute to remain for any functions after OpenMPOpt
has been run however.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106482
2021-07-21 21:18:26 -04:00
Joseph Huber 196fe994b8 [OpenMP] Fold `__kmpc_is_generic_main_thread_id` if possible
This patch adds the ability to fold `__kmpc_is_generic_main_thread_id` if we
know for a fact that it is executed by the initial thread using
AAExecutionDomain. This combined with folding `__kmpc_is_spmd_exec_mode` will
allow us to fully fold `__kmpc_is_generic_main_thread`.

Depends on D106438 D106437

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106439
2021-07-21 21:18:22 -04:00
Joseph Huber 4a66860424 [OpenMP] Add an option to disable function internalization
Function internalization can sometimes occur in situations where we want to
keep the call sites intact. This patch adds an option to disable function
internalization and prevents the device runtime from being internalized while
creating the bitcode library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106438
2021-07-21 21:18:18 -04:00
Joseph Huber 7d57639264 [OpenMP] Add new execution mode for SPMD execution with Generic semantics
Qualified kernels can be transformed from generic-mode to SPMD mode using an
optimization in OpenMPOpt. This patch introduces a new execution mode to
indicate kernels that have been transformed from generic-mode to SPMD-mode.
These kernels have SPMD-mode execution, but need generic-mode semantics for
scheduling the blocks and threads. Without this far too few blocks will be
scheduled for a generic region as SPMD mode expects the trip count to be
divided by the number of threads.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D106460
2021-07-21 20:57:28 -04:00
Fangrui Song 7b78956224 [sanitizer] Place module_ctor/module_dtor in llvm.used
This removes an abuse of ELF linker behaviors while keeping Mach-O/COFF linker
behaviors unchanged.

ELF: when module_ctor is in a comdat, this patch removes reliance on a linker
abuse (an SHT_INIT_ARRAY in a section group retains the whole group) by using
SHF_GNU_RETAIN. No linker behavior difference when module_ctor is not in a comdat.

Mach-O: module_ctor gets `N_NO_DEAD_STRIP`. No linker behavior difference
because module_ctor is already referenced by a `S_MOD_INIT_FUNC_POINTERS`
section (GC root).

PE/COFF: no-op. SanitizerCoverage already appends module_ctor to `llvm.used`.
Other sanitizers: llvm.used for local linkage is not implemented in
`TargetLoweringObjectFileCOFF::emitLinkerDirectives` (once implemented or
switched to a non-local linkage, COFF can use module_ctor in comdat (i.e.
generalize ELF-specific rL301586)).

There is no object file size difference.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D106246
2021-07-21 14:03:26 -07:00
Nikita Popov aa5adc0c1c [SimplifyCFG] Fix if conversion with opaque pointers
We need to make sure that the value types are the same. Otherwise
we both may not have the necessary dereferenceability implication,
nor can we directly form the desired select pattern.

Without opaque pointers this is enforced implicitly through the
pointer comparison.
2021-07-21 22:24:07 +02:00
Sanjay Patel f14495dc75 [SROA] avoid crash on memset with constant expression length
https://llvm.org/PR50888
2021-07-21 15:20:28 -04:00
Giorgis Georgakoudis 3f71b425b2 [Attributor] Preserve BBs and instructions added in AA manifests
Manifesting AbstractAttributes may add new BBs in the IR. This patch provides an interface to register those BBs in the Attributor so that those BBs and containing instructions are not deleted as dead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106383
2021-07-21 11:27:00 -07:00
Giorgis Georgakoudis b0e06e1fc0 [Attributor][NFC] Modify isAssumedHeapToStack for const argument
There is no need for a non-const argument interface and the const argument modification covers existing and upcoming use cases.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106418
2021-07-21 10:28:21 -07:00
Arthur Eubanks 8bc298d041 [NewPM][Inliner] Check if deleted function is in current SCC
In weird cases, the inliner will inline internal recursive functions,
sometimes causing them to have no more uses, in which case the
inliner will mark the function to be deleted. The function is
actually deleted after the call to
updateCGAndAnalysisManagerForCGSCCPass(). In
updateCGAndAnalysisManagerForCGSCCPass(), UR.UpdatedC may be set to
the SCC containing the function to be deleted. Then the inliner calls
CG.removeDeadFunction() which can cause that SCC to be deleted, even
though it's still stored in UR.UpdatedC.

We could potentially check in the wrappers/pass managers if UR.UpdatedC
is in UR.InvalidatedSCCs before doing anything with it, but it's safer
to do this as close to possible to the call to CG.removeDeadFunction()
to avoid issues with allocating a new SCC in the same address as
the deleted one.

It's hard to find a small test case since we need to have recursive
internal functions be reachable from non-internal functions, yet they
need to become non-recursive and not referenced by other functions when
inlined.

Similar to https://reviews.llvm.org/D106306.

Fixes PR50788.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D106405
2021-07-21 08:47:45 -07:00
Kazu Hirata ba2dd12d4f [InstCombine] Remove CreateOverflowTuple (NFC)
The last use was removed On Jun 3, 2020 in commit
2a6c871596.
2021-07-21 07:07:53 -07:00
David Green 72dc5cab4f [LV] Make use of PatternMatchers in getReductionPatternCost. NFC
Pulled out of D106166, this modifies getReductionPatternCost to use
PatternMatchers, hopefully simplifying the code a little.
2021-07-21 11:34:30 +01:00
Rosie Sumpter 44c9adb414 [LoopFlatten][LoopInfo] Use Loop to identify latch compare instruction
Make getLatchCmpInst non-static and use it in LoopFlatten as a more
robust way of identifying the compare.

Differential Revision: https://reviews.llvm.org/D106256
2021-07-21 10:14:18 +01:00
Vitaly Buka a4904ebb88 [NFC][hwasan] Remove "pragma GCC poison"
With ifdefs they make code less readable.
2021-07-20 19:10:05 -07:00
Vitaly Buka cd4d244757 [NFC][hwasan] Simplify expression 2021-07-20 19:10:05 -07:00
Sami Tolvanen e901e581ef Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 700d07f8ce.

Reverting due to a ThinLTO+CFI breakage on -msvc targets.
2021-07-20 13:59:46 -07:00
Fangrui Song 3924877932 [IR] Rename `comdat noduplicates` to `comdat nodeduplicate`
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.

In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics.  The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106319
2021-07-20 12:47:10 -07:00
Nikita Popov 0c794abff1 [ThinTLOBitcodeWriter] Fix unused variable warning (NFC) 2021-07-20 20:19:47 +02:00
Nikita Popov ea014c5bbf [Inline] Fix noalias addition on simplified instructions (PR50589)
When adding noalias/alias.scope metadata, we analyze the instructions
of the original callee, and then place metadata on the corresponding
inlined instructions in the caller as provided by VMap. However, this
assumes that this actually a clone of the instruction, rather than
the result of simplification. If simplification occurred, the
instruction that VMap points to may not have any relationship as far
as ModRef behavior is concerned.

Fix this by tracking simplified instructions during cloning and then
only processing instructions that have not been simplified. This is
done with an additional map form original to cloned instruction,
into which we only insert if no simplification is performed. The
mapping in VMap can then be compared to this map. If they're the
same, the instruction hasn't been simplified. (I originally wanted
to only track a set of simplified instructions, but that wouldn't
work if the instruction only gets simplified afterwards, e.g. based
on rewritten phis.)

Fixes https://bugs.llvm.org/show_bug.cgi?id=50589.

Differential Revision: https://reviews.llvm.org/D106242
2021-07-20 19:52:41 +02:00
Sami Tolvanen 700d07f8ce ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them. This version uses module inline assembly
to avoid issues with LowerTypeTestsModule.

Relands commmit 8e3b5cb39e with arch
specific tests fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-07-20 10:30:02 -07:00
Giorgis Georgakoudis e8439ec893 [OpenMP] Set RequiresFullRuntime false in SPMDization
SPMDization in D102307 does not change the RequiresFullRuntime argument of kmpc_target_init/deinit calls. However, the constraints of SPMDization detection for converting a target region to SPMD mode should guarantee that the region does not require full runtime support. Hence, this patch sets RequiresFullRuntime to false for improved execution performance.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105556
2021-07-20 09:54:51 -07:00
Shimin Cui 0b043bb39b This patch extends the OptimizeGlobalAddressOfMalloc to handle the null check of global pointer variables. It is disabled with https://reviews.llvm.org/rGb7cd291c1542aee12c9e9fde6c411314a163a8ea. This PR is to reenable it while fixing the original problem reported. The fix is to set the store value correctly when creating store for the new created global init bool symbol.
Reviewed By: efriedma

Differential Revision:  https://reviews.llvm.org/D102711
2021-07-20 12:27:26 -04:00
David Green 4272e64acd [LV] Change interface of getReductionPatternCost to return Optional
Currently the Instruction cost of getReductionPatternCost returns an
Invalid cost to specify "did not find the pattern". This changes that to
return an Optional with None specifying not found, allowing Invalid to
mean an infinite cost as is used elsewhere.

Differential Revision: https://reviews.llvm.org/D106140
2021-07-20 16:44:50 +01:00
Kazu Hirata 4a30a5c8d9 [SampleProfile] Remove ProfileIsValid (NFC)
The last use was removed on Jan 22, 2021 in commit
c9cd9a0066.
2021-07-20 08:07:04 -07:00
Caroline Concatto cf78995c4a [NFC][LoopVectorizer] Remove VF.isScalable() assertion from collectInstsToScalarize and getInstructionCost
This patch removes the assertion when VF is scalable and replaces
getKnownMinValue() by getFixedValue(),  so it still guards the code against
scalable vector types.
The assertions were used to guarantee that getknownMinValue were not used for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D106359
2021-07-20 15:56:30 +01:00
Johannes Doerfert d62bbbebbf [Attributor] Initialize effectively unused value to appease UBSAN 2021-07-20 09:18:51 -05:00
Florian Hahn 82834a6732
[VPlan] Fix formatting glitch from d2a73fb44e. 2021-07-20 16:16:30 +02:00
Florian Hahn d2a73fb44e
[VPlan] Add recipe for first-order rec phis, make splicing explicit.
This patch adds a VPFirstOrderRecurrencePHIRecipe, to further untangle
VPWidenPHIRecipe into distinct recipes for distinct use cases/lowering.
See D104989 for a new recipe for reduction phis.

This patch also introduces a new `FirstOrderRecurrenceSplice`
VPInstruction opcode, which is used to make the forming of the vector
recurrence value explicit in VPlan. This more accurately models def-uses
in VPlan and also simplifies code-generation. Now, the vector recurrence
values are created at the right place during VPlan-codegeneration,
rather than during post-VPlan fixups.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D105008
2021-07-20 16:14:17 +02:00