Commit Graph

10733 Commits

Author SHA1 Message Date
Augie Fackler e59c88294b MemoryBuiltins: trailing , on collection literal
This was probably bugging more than is reasonable, but it makes merging
changes in this file slightly less annoying to have the trailing comma
here. I only noticed this because Rust is currently carrying a patch to
this file and it kept making life a little difficult.
2021-08-19 17:59:23 +02:00
David Sherwood f4122398e7 [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64
I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:

  -force-ordered-reductions=true|false

I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:

  Transforms/LoopVectorize/AArch64/strict-fadd.ll
  Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

Differential Revision: https://reviews.llvm.org/D106653
2021-08-19 09:29:40 +01:00
Peter Collingbourne 6f85225ef3 StackLifetime: Remove asserts for multiple lifetime intrinsics.
According to the langref, it is valid to have multiple consecutive
lifetime start or end intrinsics on the same object.

For llvm.lifetime.start:
"If ptr [...] is a stack object that is already alive, it simply
fills all bytes of the object with poison."

For llvm.lifetime.end:
"Calling llvm.lifetime.end on an already dead alloca is no-op."

However, we currently fail an assertion in such cases. I've observed
the assertion failure when the loop vectorization pass duplicates
the intrinsic.

We can conservatively handle these intrinsics by ignoring all but
the first one, which can be implemented by removing the assertions.

Differential Revision: https://reviews.llvm.org/D108337
2021-08-18 18:45:28 -07:00
Arthur Eubanks 7557d6c896 [NFC] Cleanup calls to CallBase::getAttribute() 2021-08-18 09:39:33 -07:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Mark Danial 4018d25da8 LoopNest Analysis expansion to return instructions that prevent a Loop
Nest from being perfect

Expand LoopNestAnalysis to return the full list of instructions that
cause a loop nest to be imperfect. This is useful for other passes to
know if they should continue for in the inner loops.
Added New function getInterveningInstructions
that returns a small vector with the instructions that prevent a loop
for being perfect. Also added a couple of helper functions to reduce
code duplication.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D107773
2021-08-17 22:25:49 +00:00
Nikita Popov 735a590471 [MemorySSA] Remove -enable-mssa-loop-dependency option
This option has been enabled by default for quite a while now.
The practical impact of removing the option is that MSSA use
cannot be disabled in default pipelines (both LPM and NPM) and
in manual LPM invocations. NPM can still choose to enable/disable
MSSA using loop vs loop-mssa.

The next step will be to require MSSA for LICM and drop the
AST-based implementation entirely.

Differential Revision: https://reviews.llvm.org/D108075
2021-08-16 20:59:37 +02:00
Paul Robinson 94b4598d77 [PS4] stp[n]cpy not available on PS4 2021-08-16 09:06:52 -07:00
Sanjay Patel ca637014f1 [Analysis][SimplifyLibCalls] improve function signature check for memcmp
This would assert/crash as shown in:
https://llvm.org/PR50850

The matching for bcmp/bcopy should probably also be updated,
but that's another patch.
2021-08-15 16:11:26 -04:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Arthur Eubanks a0c42ca56c [NFC] Remove AttributeList::hasParamAttribute()
It's the same as AttributeList::hasParamAttr().
2021-08-13 10:58:21 -07:00
Roman Lebedev 0dc6b597db
Revert "[SCEV] Remove premature assert. PR46786"
Since then, the SCEV pointer handling as been improved,
so the assertion should now hold.

This reverts commit b96114c1e1,
relanding the assertion from commit 141e845da5.
2021-08-13 17:50:22 +03:00
Usman Nadeem a7c4e9b1f7 [InstSimplify] Eliminate vector reverse of a splat vector
experimental.vector.reverse(splat(X)) -> splat(X)

Differential Revision: https://reviews.llvm.org/D107793

Change-Id: Id29ba88fd669ff8686712e96b1bdc46dda5b853c
2021-08-11 11:27:58 -07:00
Mircea Trofin 510402c2c8 [NFC][MLGO] 'Use' variable used for asserts 2021-08-10 19:55:17 -07:00
Christopher Di Bella c874dd5362 [llvm][clang][NFC] updates inline licence info
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.

Differential Revision: https://reviews.llvm.org/D107528
2021-08-11 02:48:53 +00:00
Fangrui Song 76093b1739 [InlineAdvisor] Add single quotes around caller/callee names
Clang diagnostics refer to identifier names in quotes.
This patch makes inline remarks conform to the convention.
New behavior:

```
% clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c
a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline]
int bar(int a) { return foo(a); }
                        ^
```

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D107791
2021-08-10 11:51:31 -07:00
Sanjay Patel e260e10c4a [InstSimplify] fold min/max with limit constant
This is already done within InstCombine:
https://alive2.llvm.org/ce/z/MiGE22

...but leaving it out of analysis makes it
harder to avoid infinite loops there.
2021-08-10 10:57:25 -04:00
Sanjay Patel 188832f419 Revert "[InstSimplify] fold min/max with limit constant; NFC"
This reverts commit f43859b437.
This is not NFC, so I'll try again without that mistake in the commit message.
2021-08-10 10:50:09 -04:00
Sanjay Patel f43859b437 [InstSimplify] fold min/max with limit constant; NFC
This is already done within InstCombine:
https://alive2.llvm.org/ce/z/MiGE22

...but leaving it out of analysis makes it
harder to avoid infinite loops there.
2021-08-10 10:43:07 -04:00
Dorit Nuzman 67278b8a90 [LV] Support Interleaved Store Group With Gaps
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).

The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.

Reviewed by: Ayal Zaks

Differential Revision: https://reviews.llvm.org/D104750
2021-08-08 10:32:02 +03:00
Zheng Chen 30b0c455b1 [LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize
fix an assertion due to mismatch type for Numerator and CacheLineSize in loop cache analysis pass.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D107618
2021-08-06 16:51:09 +00:00
Mircea Trofin ae1a2a09e4 [NFC][MLGO] Make logging more robust
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same nr of entries

2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.

Differential Revision: https://reviews.llvm.org/D107594
2021-08-06 04:44:52 -07:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Ryan Prichard 623cf3dfdf Mark getc_unlocked as unavailable by default
Before D45736, getc_unlocked was available by default, but turned off
for non-Cygwin/non-MinGW Windows. D45736 then added 9 more unlocked
functions, which were unavailable by default, but it also:
 * left getc_unlocked enabled by default,
 * removed the disabling line for Windows, and
 * added code to enable getc_unlocked for GNU, Android, and OSX.

For consistency, make getc_unlocked unavailable by default. Maybe this
was the intent of D45736 anyway.

Reviewed By: MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D107527
2021-08-05 16:35:02 -07:00
Bardia Mahjour 0e08891ec1 [DA] control compile-time spent by MIV tests
Function exploreDirections() in DependenceAnalysis implements a recursive
algorithm for refining direction vectors. This algorithm has worst-case
complexity of O(3^(n+1)) where n is the number of common loop levels.
In this patch I'm adding a threshold to control the amount of time we
spend in doing MIV tests (which most of the time end up resulting in over
pessimistic direction vectors anyway).

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D107159
2021-08-05 09:50:11 -04:00
Nathan Lanza 5848166369 Disable LibFuncs for stpcpy and stpncpy for Android < 21
These functions don't exist in android API levels < 21. A change in
llvm-12 (rG6dbf0cfcf789) caused Oz builds to emit this symbol assuming
it's available and thus is causing link errors. Simply disable it here.

Differential Revision: https://reviews.llvm.org/D107509
2021-08-04 22:48:41 -04:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Jacob Hegna b16c37fa2c [MLGO] Update the current model url for the Oz inliner model. 2021-08-04 03:09:00 +00:00
Roman Lebedev 6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Chang-Sun Lin, Jr b58eda39eb [ValueTracking] Fix computeConstantRange to use "may" instead of "always" semantics for llvm.assume
ValueTracking should allow for value ranges that may satisfy
llvm.assume, instead of restricting the ranges only to values that
will always satisfy the condition.

Differential Revision: https://reviews.llvm.org/D107298
2021-08-02 22:20:17 +02:00
Sanjay Patel 7f55557765 [Analysis] improve function signature checking for snprintf
The check for size_t parameter 1 was already here for snprintf_chk,
but it wasn't applied to regular snprintf. This could lead to
mismatching and eventually crashing as shown in:
https://llvm.org/PR50885
2021-07-31 15:17:20 -04:00
Kerry McLaughlin 9d35594993 Reland "[LV] Use lookThroughAnd with logical reductions"
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:

  for.body:
    %phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
    %mask = and i32 %phi, 255
    %gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
    %load = load i8, i8* %gep
    %ext = zext i8 %load to i32
    %and = and i32 %mask, %ext
    ...

^ this will generate an and reduction intrinsic such as the following:
    call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)

The same example for an add instruction would create an intrinsic of type i8:
    call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)

This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.

Reviewed By: david-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D105632
2021-07-30 18:04:09 +01:00
Sander de Smalen 84a4caeb84 [InstSimplify] Don't assume parent function when simplifying llvm.vscale.
D106850 introduced a simplification for llvm.vscale by looking at the
surrounding function's vscale_range attributes. The call that's being
simplified may not yet have been inserted into the IR. This happens for
example during function cloning.

This patch fixes the issue by checking if the instruction is in a
parent basic block.
2021-07-29 20:08:08 +01:00
Jun Ma e2fe26e77b [NFC][InstSimplify] Use more intuitive variable names. 2021-07-29 13:55:47 +08:00
Wenlei He 1a8087adaf [ThinLTO] Disallow importing for functions with indir branch to block address
We don't allowing inlining for functions with blockaddress with uses other than strictly callbr. This is because if the blockaddress escapes the function via a global variable, inlining may lead to an invalid cross-function reference.

We check against such cases during inlining, however the check can fail for ThinLTO post-link because CFG simplification can incorrectly removes blocks based on wrong block reachability.

When we import a function with blockaddress taken in a global variable but without importing that variable, we won't go through value mapping to reflect the real address-taken-ness of the cloned blocks. For the imported clone, this leads to blocks reachable from indirect branch through global variable being incorrectly treated as unreachable and removed by SimplifyCFG.

Since inlining for such cases shouldn't be allowed in the first place, I'm marking them as ineligible for importing during pre-link to save the problem of missing address-taken-ness of imported clone as well as bad DCE and inlining.

Differential Revision: https://reviews.llvm.org/D106930
2021-07-28 18:02:48 -07:00
Jun Ma ca0fe3447f [InstSimplify] Simplify llvm.vscale when vscale_range attribute exists
Reduce llvm.vscale to constant based on vscale_range attribute.

Differential Revision: https://reviews.llvm.org/D106850
2021-07-28 21:41:52 +08:00
Mircea Trofin 935dea2cb2 [MLGO] fix silly LLVM_DEBUG misuse 2021-07-27 15:10:28 -07:00
Mircea Trofin eb76ca573d [NFC][MLGO] Debug messages for what inline advisor is selected
We already have an indication (error) if the desired inline advisor
cannot be enabled, but we don't have a positive indication. Added
LLVM_DEBUG messages for the latter.
2021-07-27 15:05:39 -07:00
Anna Thomas 68ffed12b7 [IVDescriptors] Fix bug in checkOrderedReduction
The Exit instruction passed in for checking if it's an ordered reduction need not be
an FPAdd operation. We need to bail out at that point instead of
assuming it is an FPAdd (and hence has two operands). See added testcase.
It crashes without the patch because the Exit instruction is a phi with
exactly one operand.
This latent bug was exposed by 95346ba which added support for
multi-exit loops for vectorization.

Reviewed-By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D106843
2021-07-27 09:31:44 -04:00
Johannes Doerfert 75636868e2 [InstSimplify] Expose generic interface for replaced operand simplification
Users, especially the Attributor, might replace multiple operands at
once. The actual implementation of simplifyWithOpReplaced is able to
handle that just fine, the interface was simply not allowing to replace
more than one operand at a time. This is exposing a more generic
interface without intended changes for existing code.

Differential Revision: https://reviews.llvm.org/D106189
2021-07-27 00:56:12 -05:00
Philip Reames f82f39b9cf [SCEV] Add a comment about invariant in howManyLessThans 2021-07-26 16:39:26 -07:00
Eli Friedman 5c486ce04d [LLVM IR] Allow volatile stores to trap.
Proposed alternative to D105338.

This is ugly, but short-term I think it's the best way forward: first,
let's formalize the hacks into a coherent model. Then we can consider
extensions of that model (we could have different flavors of volatile
with different rules).

Differential Revision: https://reviews.llvm.org/D106309
2021-07-26 10:51:00 -07:00
Florian Hahn 6d753b0751
[LAA] Remove RuntimeCheckingPtrGroup::RtCheck member (NFC).
This patch removes RtCheck from RuntimeCheckingPtrGroup to make it
possible to construct RuntimeCheckingPtrGroup objects without a
RuntimePointerChecking object. This should make it easier to
re-use the code to generate runtime checks, e.g. in D102834.

RtCheck was only used to access the pointer info for a given index.
Instead, the start and end expressions can be passed directly.

For code-gen, we also need to know the address space to use. This can
also be explicitly passed at construction.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D105481
2021-07-26 17:38:10 +01:00
Nikita Popov 33146857e9 [IR] Consider non-willreturn as side effect (PR50511)
This adjusts mayHaveSideEffect() to return true for !willReturn()
instructions. Just like other side-effects, non-willreturn calls
(aka "divergence") cannot be removed and cannot be reordered relative
to other side effects. This fixes a number of bugs where
non-willreturn calls are either incorrectly dropped or moved. In
particular, it also fixes the last open problem in
https://bugs.llvm.org/show_bug.cgi?id=50511.

I performed a cursory review of all current mayHaveSideEffect()
uses, which convinced me that these are indeed the desired default
semantics. Places that do not want to consider non-willreturn as a
sideeffect generally do not want mayHaveSideEffect() semantics at
all. I identified two such cases, which are addressed by D106591
and D106742. Finally, there is a use in SCEV for which we don't
really have an appropriate API right now -- what it wants is
basically "would this be considered forward progress". I've just
spelled out the previous semantics there.

Differential Revision: https://reviews.llvm.org/D106749
2021-07-26 16:35:14 +02:00
Paul Walker 8a8d01d58c [NFC] Change VFShape so it contains an ElementCount rather than seperate VF and IsScalable properties.
Differential Revision: https://reviews.llvm.org/D106750
2021-07-26 12:25:46 +01:00
Philipp Krones 46c0366877 [Inliner] Make the CallPenalty configurable
Tests with multiple benchmarks, like Embench [1], showed that the
CallPenalty magic number has the most influence on inlining decisions
when optimizing for size.

On the other hand, there was no good default value for this parameter.
Some benchmarks profited strongly from a reduced call penalty. On
example is the picojpeg benchmark compiled for RISC-V, which got 6%
smaller with a CallPenalty of 10 instead of 12. Other benchmarks
increased in size, like matmult.

This commit makes the compromise of turning the magic number constant of
CallPenalty into a configurable value. This introduces the flag
`--inline-call-penalty`. With that flag users can fine tune the inliner
to their needs.

The CallPenalty constant was also used for loops. This commit replaces
the CallPenalty constant with a new LoopPenalty constant that is now
used instead.

This is a slimmed down version of https://reviews.llvm.org/D30899

[1]: https://github.com/embench/embench-iot

Differential Revision: https://reviews.llvm.org/D105976
2021-07-26 12:07:49 +01:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
Liqiang Tao 4bdfea2c51 [llvm][Inline] Add interface to return cost-benefit stuff
Return cost-benefit stuff which is computed by cost-benefit analysis.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D105349
2021-07-25 20:18:19 +08:00
Philip Reames ec43def700 Style tweaks for SCEV's computeMaxBECountForLT [NFC] 2021-07-23 17:19:45 -07:00