Commit Graph

138742 Commits

Author SHA1 Message Date
Bjorn Pettersson b43235a76c [DebugInfo] Fix DwarfExpression::addConstantFP for float on big-endian
The byte swapping, when dealing with 4 byte (float) FP constants
in DwarfExpression::addConstantFP, added in commit ef8992b9f0
was not correct. It always performed byte swapping using an
uint64_t value. When dealing with 4 byte values the 4 interesting
bytes ended up in the big end of the uint64_t, but later we emitted
the 4 bytes at the little end. So we ended up with zeroes being
emitted and faulty debug information.

This patch simplifies things a bit, IMHO. Using the APInt
representation throughout the function, instead of looking at
the internal representation using getRawBytes and without using
reinterpret_cast etc. And using API.byteSwap() should result in
correct byte swapping independent of APInt being 4 or 8 bytes.

Differential Revision: https://reviews.llvm.org/D86272
2020-08-20 11:48:05 +02:00
David Stenberg ca688ae497 Revert "[LoopUnswitch] Fix incorrect Modified status"
This reverts commit dfd447c220.

After I pushed this commit, llvm-sphinx-docs started failing, due to:

  Warning, treated as error:
  extension 'recommonmark' has no setup() function;
  is it really a Sphinx extension module?

I don't see how this commit may have caused that, but I'm still
reverting it since I don't know how to proceed with that
troubleshooting.
2020-08-20 11:14:23 +02:00
Evgeny Leviant d5b701b972 [ThinLTO] Import globals recursively
Differential revision: https://reviews.llvm.org/D73698
2020-08-20 12:13:43 +03:00
Sebastian Neubauer b8d1994778 [AMDGPU] Add A16/G16 to InstCombine
When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target hardware supports it.

An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.

Differential Revision: https://reviews.llvm.org/D85887
2020-08-20 10:51:49 +02:00
Konstantin Schwarz 7497b861f4 [GlobalISel][IRTranslator] Support PHI instructions in landingpad blocks
The check for the landingpad instructions was overly restrictive. In optimimized builds PHI nodes can appear
before the landingpad instructions, resulting in a fallback to SelectionDAG.

This change relaxes the check to allow PHI nodes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86141
2020-08-20 10:49:31 +02:00
Georgii Rymar a6436b0b3a [yaml2obj] - Make the 'Machine' key optional.
Currently we have to set 'Machine' to something in our
YAML descriptions. Usually we use 'EM_X86_64' for 64-bit targets
and 'EM_386' for 32-bit targets. At the same time, in fact, in most
cases our tests do not need a machine type and we can use
'EM_NONE'.

This is cleaner, because avoids the need of using a particular machine.

In this patch I've made the 'Machine' key optional (the default value,
when it is not specified is `EM_NONE`) and removed it (where possible)
from yaml2obj, obj2yaml and llvm-readobj tests.

There are few tests left where I decided not to remove it, because
I didn't want to touch CHECK lines or doing anything more complex
than a removing a "Machine: *" line and formatting lines around.

Differential revision: https://reviews.llvm.org/D86202
2020-08-20 11:40:51 +03:00
Bevin Hansson 1a995a0af3 [ADT] Move FixedPoint.h from Clang to LLVM.
This patch moves FixedPointSemantics and APFixedPoint
from Clang to LLVM ADT.

This will make it easier to use the fixed-point
classes in LLVM for constructing an IR builder for
fixed-point and for reusing the APFixedPoint class
for constant evaluation purposes.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html

Reviewed By: leonardchan, rjmccall

Differential Revision: https://reviews.llvm.org/D85312
2020-08-20 10:29:45 +02:00
dfukalov 33e2f69a24 [AMDGPU][LoopUnroll] Increase BB size to analyze for complete unroll.
The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no
information about a loop body BB cost. In some cases, e.g. for simple loop
```
for(int i=0; i<32; ++i){
  D = Arr2[i*8 + C1];
  Arr1[i*64 + C2] += C3 * D;
  Arr1[i*64 + C2 + 2048] += C4 * D;
}

```
current default parameter value is not enough to run deeper cost analyze so the
loop is not completely unrolled.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D86248
2020-08-20 10:41:47 +03:00
Yvan Roux 0459f29e8b [ARM][MachineOutliner] Add default mode.
Use the stack to save and restore the link register when there is no
available register to do it.

Differential Revision: https://reviews.llvm.org/D76069
2020-08-20 09:25:33 +02:00
David Stenberg dfd447c220 [LoopUnswitch] Fix incorrect Modified status
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.

This was caught using the check introduced by D80916.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86085
2020-08-20 09:04:16 +02:00
Johannes Doerfert 012819f301 [Attributor][FIX] Update the call graph properly when internalizing functions
The internal version is now part of the SCC, make sure to perform this
update.
2020-08-20 01:44:58 -05:00
Johannes Doerfert 3edea15f9a [Attributor] Simplify comparison against constant null pointer
Comparison against null is a common pattern that usually is followed by
error handling code and the likes. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D86145
2020-08-20 01:44:58 -05:00
Johannes Doerfert d01ad217ba [Attributor][FIX] Do not use cyclic arguments for `nonnull`
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
2020-08-20 01:44:58 -05:00
Johannes Doerfert a49dae0e38 [Attributor][AAIsDead][NFC] Skip uninteresting instructions early 2020-08-20 01:44:58 -05:00
Johannes Doerfert 08f33756e6 [Attributor][NFC] Extract functionality into own member 2020-08-20 01:44:58 -05:00
Qiu Chaofan 131b3b9ed4 [PowerPC] Support constrained scalar fptosi/fptoui
This patch adds support for constrained scalar fp to int operations on
PowerPC. Besides, this fixes the FP exception bit of quad-precision
convert & truncate instructions.

Reviewed By: steven.zhang, uweigand

Differential Revision: https://reviews.llvm.org/D81537
2020-08-20 13:29:43 +08:00
Johannes Doerfert 1de70a724e Revert "[OpenMPOpt] ICV tracking for calls"
This commits breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the SCC
extend. See: https://reviews.llvm.org/D85544#2227611

This reverts commit b0b32e6490.
2020-08-20 00:00:35 -05:00
Craig Topper 8750d54cea [X86][AutoUpgrade] Simplify string management in UpgradeDataLayoutString a bit. NFCI
We don't need a std::string for a literal string, we can use a
StringRef.

The addition of StringRefs produces a Twine that we can just call
str() without converting to a SmallString ourselves. Twine will
do that internally.
2020-08-19 17:48:11 -07:00
Matt Arsenault 31adc28d24 GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources
This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.
2020-08-19 18:53:24 -04:00
Petr Hosek 1ed1e16ab8 [CMake] Fix an issue where get_system_libname creates an empty regex capture on windows
Fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1119478

Patch By: haampie

Differential Revision: https://reviews.llvm.org/D86245
2020-08-19 14:33:52 -07:00
Kyungwoo Lee 7a028fe702 Force Remove Attribute
-force-attribute adds an attribute to function via command-line.
However, there was no counter-part to remove an attribute.  This patch
adds -force-remove-attribute that removes an attribute from function.

Differential Revision: https://reviews.llvm.org/D85586
2020-08-19 17:30:13 -04:00
Sanjay Patel 6f3511a01a [ValueTracking] define/use max recursion depth in header
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191

But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.

Differential Revision: https://reviews.llvm.org/D86113
2020-08-19 16:56:59 -04:00
Hiroshi Yamauchi 28ccc52c40 [X86] Add feature for Fast Short REP MOV (FSRM) for Icelake or newer.
Differential Revision: https://reviews.llvm.org/D85989
2020-08-19 13:39:42 -07:00
Sourabh Singh Tomar ef8992b9f0 Re-apply "[DebugInfo] Emit DW_OP_implicit_value for Floating point constants"
This patch was reverted in 7c182663a8 due to some failures
observed on PCC based machines. Failures were due to Endianness issue and
long double representation issues.

Patch is revised to address Endianness issue. Furthermore, support
for emission of `DW_OP_implicit_value` for `long double` has been removed
(since it was unclean at the moment). Planning to handle this in
a clean way soon!

For more context, please refer to following review link.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83560
2020-08-20 01:39:42 +05:30
Sourabh Singh Tomar 9937872c02 Revert "[DebugInfo] Emit DW_OP_implicit_value for Floating point constants"
This reverts commit 15801f1619.
arc's land messed up! It removed the new commit message and took it
from revision.
2020-08-20 01:28:03 +05:30
Raul Tambre e887d0e89b [AArch64][GlobalISel] Handle rtcGPR64RegClassID in AArch64RegisterBankInfo::getRegBankFromRegClass()
TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16
and X17, as it's the last matching class. This in turn gets passed to
AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable.

It seems sensible to handle this case, so copies from X16 and X17 work.
Copying from X17 is used in inline assembly in libunwind for pointer
authentication.

Differential Revision: https://reviews.llvm.org/D85720
2020-08-19 12:52:30 -07:00
Sourabh Singh Tomar 15801f1619 [DebugInfo] Emit DW_OP_implicit_value for Floating point constants
llvm is missing support for DW_OP_implicit_value operation.
DW_OP_implicit_value op is indispensable for cases such as
optimized out long double variables.

For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions

Consider the following example:
```
int main() {
        long double ld = 3.14;
        printf("dummy\n");
        ld *= ld;
        return 0;
}
```
when compiled with tunk `clang` as
`clang test.c -g -O1` produces following location description
of variable `ld`:
```
DW_AT_location        (0x00000000:
                     [0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
                  DW_AT_name    ("ld")
```
Here one may notice that this representation is incorrect(DWARF4
stack could only hold integers(and only up to the size of address)).
Here the variable size itself is `128` bit.
GDB and LLDB confirms this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```

GCC represents/uses DW_OP_implicit_value in these sort of situations.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is most appropriate
in this case.

Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html

GDB seems happy after this patch:(LLDB doesn't have support
for DW_OP_implicit_value)
```
(gdb) p ld
p ld
$1 = 3.14000000000000012434
```

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83560
2020-08-20 01:20:40 +05:30
Hiroshi Yamauchi ab401a8c8a [PGO][PGSO][LV] Fix loop not vectorized issue under profile guided size opts.
D81345 appears to accidentally disables vectorization when explicitly
enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to
the vectorization with versioning-for-unit-stride for PGSO.

Differential Revision: https://reviews.llvm.org/D85784
2020-08-19 12:13:34 -07:00
Matt Arsenault adbcc8e733 GlobalISel: Add TargetLowering member to LegalizerHelper 2020-08-19 14:50:35 -04:00
Florian Hahn c0cbe6453a [DSE] Remove dead argument from removePartiallyOverlappedStores (NFC).
The argument is unused and can be removed.
2020-08-19 19:33:52 +01:00
Matt Arsenault d64ad3f051 GlobalISel: Don't check for verifier enforced constraint
Loads are always required to have a single memory operand.
2020-08-19 14:15:38 -04:00
Matt Arsenault 9e8d59a9b8 AMDGPU/GlobalISel: Remove hack for combines forming illegal extloads
Previously we weren't adding the LegalizerInfo to the post-legalizer
combiner. Since that's fixed, we don't need to try to filter out the
one case that was breaking.
2020-08-19 14:15:38 -04:00
Matt Arsenault e95c08432a GlobalISel: Use Register 2020-08-19 13:45:31 -04:00
Petr Hosek 8e4acb82f7 [CMake] Fix OCaml build failure because of absolute path in system libs
D85820 introduced a full path in the LLVM_SYSTEM_LIBS property of the
LLVMSupport target, which made the OCaml bindings fail to build, since
they use -l [system_lib] flags for every lib in LLVM_SYSTEM_LIBS, which
cannot work with absolute paths.

This patch solves the issue in a similar vain as ZLIB does it: it adds
the full library path to imported_libs, and adds a stripped down version
without directories, lib prefix and lib suffix to system_libs

In the future we should probably make some changes to LLVM_SYSTEM_LIBS,
since both zlib and ncurses do not necessarily have to be system libs
anymore due to the find_package / find_library bits introduced in
D85820 and D79219.

Patch By: haampie

Differential Revision: https://reviews.llvm.org/D86134
2020-08-19 10:33:03 -07:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the contructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Jessica Paquette d25b12bdc3 [GlobalISel] Add combine for (x & mask) -> x when (x & mask) == x
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.

This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.

In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.

e.g.

```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```

Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.

Differential Revision: https://reviews.llvm.org/D85463
2020-08-19 10:20:57 -07:00
Hamilton Tobon Mosquera bd2fa1819b [OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of __tgt_target_data_begin_mapper
canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null it the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and instead of moving the "wait" it creates it at that point.

Differential Revision: https://reviews.llvm.org/D86155
2020-08-19 11:42:22 -05:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the contructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Sanjay Patel c8d711adae [InstCombine] reduce code duplication; NFC 2020-08-19 12:05:12 -04:00
Benjamin Kramer b98e25b6d7 Make helpers static. NFC. 2020-08-19 16:00:03 +02:00
Roman Lebedev 3d76a133c7
Revert "[InstCombine] Lower infinite combine loop detection thresholds"
And as being reported by Florian Hahn, there's a hit
in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto,
so reverting until addressed.

This reverts commit 71e0b82c9f.
2020-08-19 16:53:30 +03:00
Simon Pilgrim 057bdd63a4 [X86][AVX] lowerShuffleWithVPMOV - minor refactor to more closely match lowerShuffleAsVTRUNC
Replace isBuildVectorAllZeros check by using the Zeroable bitmask instead.
2020-08-19 14:34:32 +01:00
Simon Pilgrim 9fee2bad6d [X86] lowerShuffleWithVPMOV - remove unnecessary shuffle commutation. NFCI.
canonicalizeShuffleMaskWithCommute should have already ensured the lower elements are from V1, we do have test coverage for this already.
2020-08-19 13:28:59 +01:00
Simon Pilgrim b61cef3a92 [X86][AVX] getAVX512TruncNode - don't truncate from illegal vector widths.
Thanks to @fhahn for the test case.
2020-08-19 13:00:26 +01:00
Roman Lebedev 71e0b82c9f
[InstCombine] Lower infinite combine loop detection thresholds
It's been a month since 2f3862eb9f,
and no new bug reports about the threshold were filled,
so let's bump it again and wait again.
2020-08-19 14:37:57 +03:00
Simon Pilgrim 80a0dc59b7 [X86][AVX] computeKnownBitsForTargetNode - add VTRUNC/VTRUNCS/VTRUNCUS known zero upper elements handling.
Like many of the AVX512 conversion ops, the VTRUNC ops guarantee the upper destination elements are zero.
2020-08-19 11:39:27 +01:00
Simon Pilgrim 46fc9a0dfc [X86][AVX] Fold store(extract_element(vtrunc)) to truncated store
Add handling for storing the extracted lower (truncated bits) element from a X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly.

Differential Revision: https://reviews.llvm.org/D86158
2020-08-19 11:10:20 +01:00
sstefan1 b0b32e6490 [OpenMPOpt] ICV tracking for calls
Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.

Differential Revision: https://reviews.llvm.org/D85544
2020-08-19 11:43:12 +02:00
Meera Nakrani 545de56f87 [ARM] Enabled VMLAV and Add instructions to use VMLAVA
Used InstCombine to enable VMLAV and Add instructions to generate VMLAVA instead with tests.
2020-08-19 08:36:49 +00:00
luxufan 6c5039a10f [RISCV] add the assemble and disassemble support of Zvlsseg instructions
This implements the assemble and disassemble support of RISCV Vector
extension Zvlsseg instructions, base on the 0.9 spec version.

Reviewed  by HsiangKai

Differential Revision: https://reviews.llvm.org/D84416
2020-08-19 16:22:25 +08:00
Florian Hahn 1a55fbceaa [DSE,MemorySSA] Use NumRedundantStores instead of NumNoopStores.
Legacy DSE uses NumRedundantStores, while MemorySSA DSE uses
NumNoopStores. We should just use the same counter.
2020-08-19 08:50:33 +01:00
Ronak Chauhan fdf71d486c Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit cacfb02d28.

Reverting due to buildbot failures.
2020-08-19 13:12:29 +05:30
David Sherwood 3f36561f69 [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorLoads
In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the code in that
function to use TypeSize instead of unsigned for tracking the remaining
load amount. In addition, I've changed the load loop to use the new
IncrementPointer helper function for updating the addresses in each
iteration, since this handles scalable vector types.

Also, I've added report_fatal_errors in GenWidenVectorExtLoads,
TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores,
since these functions currently use a sequence of element-by-element
scalar loads/stores. In a similar vein, I've also added a fatal error
report in FindMemType for the case when we decide to return the element
type for a scalable vector type.

I've added new tests in

  CodeGen/AArch64/sve-split-load.ll
  CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll

for the changes in GenWidenVectorLoads.

Differential Revision: https://reviews.llvm.org/D85909
2020-08-19 07:54:32 +01:00
Yaxun (Sam) Liu 7546b29e76 [HIP] Support target id by --offload-arch
This patch introduces support of target id by
-offload-arch.

Differential Revision: https://reviews.llvm.org/D60620
2020-08-18 23:43:53 -04:00
Ronak Chauhan cacfb02d28 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D80713
2020-08-19 08:49:07 +05:30
Changpeng Fang e7081d117a AMDGPU: Implement waterfall loop for MIMG instructions with 256-bit SRsrc
Summary:
  When the resource descriptor is of vgpr, we need a waterfall loop
to read into a sgpr. In this patchm we generalized the  implementation
to work for any regster class sizes, and extend the work to MIMG
instructions.

Fixes: SWDEV-223405

Reviewers:
  arsenm, nhaehnle

Differential Revision:
  https://reviews.llvm.org/D82603
2020-08-18 16:27:36 -07:00
Roman Lebedev 2f01785857
[NFC][InstCombine] Aggregate reconstruction: use plain map
Now that we no longer require for this map to have stable iteration order,
we no longer need to pay for keeping the iteration order stable,
so switch from `SmallMapVector` to `SmallDenseMap`.
2020-08-19 01:09:25 +03:00
Roman Lebedev 78bd4231bf
[InstCombine] PHI-aware aggregate reconstruction: properly handle duplicate predecessors
While it may seem like we can just "deduplicate" the case where
some basic block happens to be a predecessor more than once,
which happens for e.g. switches, that is not correct thing to do.
We must actually add a PHI operand for each predecessor.

This was initially reported to me by David Major
as a clang crash during gecko build for android.
2020-08-19 01:00:42 +03:00
Amara Emerson ed35344524 Use std::make_tuple instead of initializer lists to make a bot happy:
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux
2020-08-18 14:55:52 -07:00
Craig Topper 9028c03ce6 [X86] Fix the Predicates on MMX_PSHUFWri/PSHUFWmi to include SSE1 in addition to MMX.
These instructions weren't in the initial version of MMX, but
were added when SSE1 was introduced. We already have the intrinsic
named correctly to include sse and the frontened header enforces
sse. We have one place in the backend where we DAG combine to
this intrinsic, but that's also qualified. So don't know of anything
currently broken unless someone writes their own IR and doesn't
set the sse feature.
2020-08-18 14:28:26 -07:00
David Blaikie 1870b52f0c Recommit "PR44685: DebugInfo: Handle address-use-invalid type units referencing non-type units"
Originally committed as be3ef93bf5.
Reverted by b4bffdbadf due to bot
failures:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/17380/testReport/junit/LLVM/DebugInfo_X86/addr_tu_to_non_tu_ll/
http://45.33.8.238/win/22216/step_11.txt

MacOS failure due to testing Split DWARF which isn't compatible with
MachO.
Windows failure due to testing type units which aren't enabled on
Windows.

Fix both of these by applying an explicit x86 linux triple to the test.
2020-08-18 13:43:28 -07:00
Eli Friedman be944c85f3 [AArch64][SVE] Add patterns for integer mla/mls.
We probably want to introduce pseudo-instructions at some point, like
we have for binary operations, but this seems okay for now.

One thing I'm not sure about is whether we should be doing this as a
DAGCombine instead of directly pattern-matching it. I don't see any big
downside to doing it this way, though.

Differential Revision: https://reviews.llvm.org/D85681
2020-08-18 12:51:16 -07:00
Eli Friedman bb18532399 [AArch64][SVE] Allow llvm.aarch64.sve.st2/3/4 with vectors of pointers.
This isn't necessaary for ACLE, but could be useful in other situations.
And the change is simple.

Differential Revision: https://reviews.llvm.org/D85251
2020-08-18 12:51:16 -07:00
Jessica Paquette bf36e90295 [GlobalISel][CallLowering] NFC: Unify flag-setting from CallBase + AttributeList
It's annoying to have to maintain multiple, nearly identical chains of if
statements which all set the same attributes.

Add a helper function, `addFlagsUsingAttrFn` which performs the attribute
setting.

Then, use wrappers for that function in `lowerCall` and `setArgFlags`.

(Note that the flag-setting code in `setArgFlags` was missing the returned
attribute. There's no selection for this yet, so no test. It's an example of
the kind of thing this lets us avoid, though.)

Differential Revision: https://reviews.llvm.org/D86159
2020-08-18 11:07:33 -07:00
Jessica Paquette f29e6277ad [GlobalISel][CallLowering] Don't tail call with non-forwarded explicit sret
Similar to this commit:

faf8065a99

Testcase is pretty much the same as

test/CodeGen/AArch64/tailcall-explicit-sret.ll

Except it uses i64 (since we don't handle the i1024 return values yet), and
doesn't have indirect tail call testcases (because we can't translate those
yet).

Differential Revision: https://reviews.llvm.org/D86148
2020-08-18 11:06:57 -07:00
Matt Arsenault 5a15f6628e GlobalISel: Implement fewerElementsVector for G_INSERT_VECTOR_ELT
Add unit tests since AMDGPU will only trigger this for gigantic
vectors, and won't use the annoying odd sized breakdown case.
2020-08-18 13:51:19 -04:00
David Blaikie f7a49d2aa6 [WIP][DebugInfo] Lazily parse debug_loclist offsets
Parsing DWARFv5 debug_loclist offsets when a CU is parsed is weighing
down memory usage of symbolizers that don't need to parse this data at
all. There's not much benefit to caching these anyway - since they are
O(1) lookup and reading once you know where the offset list starts (and
can do bounds checking with the offset list size too).

In general, I think it might be time to start paying down some of the
technical debt of loc/loclist/range/rnglist parsing to try to unify it a
bit more.

eg:

* Currently DWARFUnit has: RangeSection, RangeSectionBase, LocSection,
  LocSectionBase, LocTable, RngListTable, LoclistTableHeader (be nice if
  these were all wrapped up in two variables - one for loclists, one for
  rnglists)

* rnglists and loclists are handled differently (see:
  LoclistTableHeader, but no RnglistTableHeader)

* maybe all these types could be less stateful - lazily parse what they
  need to, even reparsing rather than caching because it doesn't seem
  too expensive, for instance. (though admittedly so long as it's
  constantcost/overead per compilatiton that's probably adequate)

* Maybe implementing and using a DWARFDataExtractor that can be
  sub-ranged (so we could slice it up to just the single contribution) -
  though maybe that's not so useful because loc/ranges need to refer to
  it by absolute, not contribution-relative mechanisms

Differential Revision: https://reviews.llvm.org/D86110
2020-08-18 10:49:39 -07:00
Amara Emerson 04a6ea5d77 [GlobalISel] Add a combine for sext_inreg(load x), c --> sextload x
This is restricted to single use loads, which if we fold to sextloads we can
find more optimal addressing modes on AArch64.

This also fixes an overload the MachineFunction::getMachineMemOperand() method
which was incorrectly using the MF alignment instead of the MMO alignment.

Differential Revision: https://reviews.llvm.org/D85966
2020-08-18 10:42:15 -07:00
Amara Emerson 40e269ea6d [GlobalISel] Add a combine for ashr(shl x, c), c --> sext_inreg x, c'
By detecting this sign extend pattern early, we can uncover opportunities for
more optimizations.

Differential Revision: https://reviews.llvm.org/D85965
2020-08-18 10:42:15 -07:00
Simon Pilgrim 11ff5176c4 [X86][AVX] lowerShuffleWithVPMOV - add non-VLX support.
We can efficiently handle non-VLX cases now that we have the getAVX512TruncNode helper.
2020-08-18 17:51:14 +01:00
Fangrui Song c466c5fa7e [ARM] Fix build after D86087 2020-08-18 09:20:32 -07:00
David Green 3471520b1f [ARM] Allow tail predication of VLDn
VLD2/4 instructions cannot be predicated, so we cannot tail predicate
them from autovec. From intrinsics though, they should be valid as they
will just end up loading extra values into off vector lanes, not
effecting the on lanes. The same is true for loads in general where so
long as we are not using the other vector lanes, an unpredicated load
can be converted to a predicated one.

This marks VLD2 and VLD4 instructions as validForTailPredication and
allows any unpredicated load in tail predication loop, which seems to be
valid given the other checks we have.

Differential Revision: https://reviews.llvm.org/D86022
2020-08-18 17:15:45 +01:00
Sam Tebbs 31f02ac60a [ARM] Use mov operand if the mov cannot be moved while tail predicating
There are some cases where the instruction that sets up the iteration
count for a tail predicated loop cannot be moved before the dlstp,
stopping tail predication entirely. This patch checks if the mov operand
can be used and if so, uses that instead.

Differential Revision: https://reviews.llvm.org/D86087
2020-08-18 17:10:29 +01:00
Jamie Schmeiser 645c6856a6 [NFC] Add raw_ostream parameter to printIR routines
This is a non-functional-change to generalize the printIR routines so that
the output can be saved and manipulated rather than being directly output
to dbgs(). This is a prerequisite change for many upcoming changes that
allow new ways of examining changes made to the IR in the new pass manager.

Reviewed By: aeubanks (Arthur Eubanks)

Differential Revision: https://reviews.llvm.org/D85999
2020-08-18 16:05:27 +00:00
Jessica Paquette 224a8c639e [GlobalISel][CallLowering] Look through call parameters for flags
We weren't looking through the parameters on calls at all.

E.g., say you had

```
declare i32 @zext(i32 zeroext %x)

...
%y = call i32 @zext(i32 %something)
...

```

At the point of the call, we wouldn't know that the %something should have the
zeroext attribute.

This sets flags in about the same way as
TargetLoweringBase::ArgListEntry::setAttributes.

Differential Revision: https://reviews.llvm.org/D86125
2020-08-18 08:48:56 -07:00
jasonliu f48eced390 [XCOFF] emit .rename for .lcomm when necessary
Summary:

This is a follow up for D82481. For .lcomm directive, although it's
not necessary to have .rename emitted, it's still desirable to do
it so that we do not see internal 'Rename..' gets print out in
symbol table. And we could have consistent naming between TC entry
and .lcomm. And also have consistent naming between IR and final
object file.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D86075
2020-08-18 15:32:45 +00:00
Simon Pilgrim abd33bf5ef [X86][AVX] lowerShuffleWithPERMV - pad 128/256-bit shuffles on non-VLX targets
Allow non-VLX targets to use 512-bits VPERMV/VPERMV3 for 128/256-bit shuffles.

TBH I'm not sure these targets actually exist in the wild, but we're testing for them and its good test coverage for shuffle lowering/combines across different subvector widths.
2020-08-18 15:46:02 +01:00
Simon Pilgrim 011bf4fd96 [X86][AVX] lowerShuffleWithVTRUNC - extend to support v16i16/v32i8 binary shuffles.
This requires a few additional SrcVT vs DstVT padding cases in getAVX512TruncNode.
2020-08-18 15:30:02 +01:00
Simon Pilgrim d5621b83a5 [X86][AVX] lowerShuffleWithVTRUNC - pull out TRUNCATE/VTRUNC creation into helper code. NFCI.
Prep work toward adding v16i16/v32i8 support for lowerShuffleWithVTRUNC and improving lowerShuffleWithVPMOV.
2020-08-18 14:52:42 +01:00
Matt Arsenault 2f5f5febf3 AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize
Previously, it would successfully select and assert if not HSA or PAL
when expanding the pseudoinstruction. We don't need the
pseudoinstruction anymore since we know the total size after
legalization.
2020-08-18 09:28:01 -04:00
Matt Arsenault 3ba7777b94 AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT
The code to determine the value size was overcomplicated and only
correct in the case where the result register already had a register
class assigned. We can always take the size directly from the
register's type.
2020-08-18 09:28:01 -04:00
Sanjay Patel 139da9c4d7 [InstCombine] fold fabs of select with negated operand
This is the FP example shown in:
https://bugs.llvm.org/PR39474
2020-08-18 09:23:07 -04:00
Georgii Rymar bd7daf5ceb [yaml2obj] - Don't crash when `FileHeader` declares an empty `Flags` key in specific situations.
We currently call the `llvm_unreachable` for the following YAML:

```
--- !ELF
FileHeader:
  Class:   ELFCLASS32
  Data:    ELFDATA2LSB
  Type:    ET_REL
  Machine: EM_NONE
  Flags:   [ ]
```

it happens because the `Flags` key is present, though `EM_NONE` is a
machine type that has no known `EF_*` values and we call `llvm_unreachable` by mistake.

Differential revision: https://reviews.llvm.org/D86138
2020-08-18 16:09:28 +03:00
Simon Pilgrim 7db5124736 [X86][AVX] lowerShuffleWithVTRUNC - avoid unnecessary division in element counts. NFCI.
(256 / SrcEltBits) == ((2 * EltSizeInBits * NumElts) / (EltSizeInBits * Scale)) == (2 * (NumElts / Scale)) == NumSrcElts
2020-08-18 13:48:22 +01:00
Nico Weber b4bffdbadf Revert "PR44685: DebugInfo: Handle address-use-invalid type units referencing non-type units"
This reverts commit be3ef93bf5.
Test fails on macOS and Windows, e.g. http://45.33.8.238/win/22216/step_11.txt
2020-08-18 08:40:36 -04:00
Ronak Chauhan e760e85680 [llvm-objdump][AMDGPU] Detect CPU string
AMDGPU ISA isn't backwards compatible and hence -mcpu must always be specified during disassembly.
However, the AMDGPU target CPU is stored in e_flags in the ELF object.

This patch allows targets to implement CPU string detection, and also implements it for AMDGPU by looking at e_flags.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D84519
2020-08-18 17:43:16 +05:30
Paul Walker 9f63dc3265 [SVE] Fix shift-by-imm patterns used by asr, lsl & lsr intrinsics.
Right shift patterns will no longer incorrectly accept a shift
amount of zero.  At the same time they will allow larger shift
amounts that are now saturated to their upper bound.

Patterns have been extended to enable immediate forms for shifts
taking an arbitrary predicate.

This patch also unifies the code path for immediate parsing so the
i64 based shifts are no longer treated specially.

Differential Revision: https://reviews.llvm.org/D86084
2020-08-18 11:41:26 +01:00
Paul Walker cb5cc47a65 [SVE] Lower fixed length vector ISD::SPLAT_VECTOR operations.
Also strengthens the CHECK lines for scalable vector splat tests.

Differential Revision: https://reviews.llvm.org/D86070
2020-08-18 11:19:43 +01:00
Simon Pilgrim d2057a8015 [X86][AVX] Lower v16i8/v8i16 binary shuffles using VTRUNC/TRUNCATE
This patch adds lowerShuffleWithVTRUNC to handle basic binary shuffles that can be lowered either as a pure ISD::TRUNCATE or a X86ISD::VTRUNC (with undef/zero values in the remaining upper elements).

We concat the binary sources together into a single 256-bit source vector. To avoid regressions we perform this after we've tried to lower with PACKS/PACKUS which typically does a cleaner job than a concat.

For non-AVX512VL cases we have to canonicalize VTRUNC cases to use a 512-bit source vectors (inserting undefs/zeros in the upper elements as necessary), truncate and then (possibly) extract the 128-bit result.

This should address the last regressions in D66004

Differential Revision: https://reviews.llvm.org/D86093
2020-08-18 11:11:58 +01:00
Shinji Okumura 5e361e2aa4 [Attributor] Deduce noundef attribute
This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85184
2020-08-18 18:05:54 +09:00
David Blaikie be3ef93bf5 PR44685: DebugInfo: Handle address-use-invalid type units referencing non-type units
Theory was that we should never reach a non-type unit (eg: type in an
anonymous namespace) when we're already in the invalid "encountered an
address-use, so stop emitting types for now, until we throw out the
whole type tree to restart emitting in non-type unit" state. But that's
not the case (prior commit cleaned up one reason this wasn't exposed
sooner - but also makes it easier to test/demonstrate this issue)
2020-08-17 21:42:00 -07:00
David Blaikie 24c3dabef4 DebugInfo: Emit class template parameters first, before members
This reads more like what you'd expect the DWARF to look like (from the
lexical order of C++ - template parameters come before members, etc),
and also happens to make it easier to tickle (& thus test) a bug related
to type units and Split DWARF I'm about to fix.
2020-08-17 21:42:00 -07:00
Johannes Doerfert 8abd69aa9e [Attributor] Bail early if AAMemoryLocation cannot derive anything
Before this change we looked through all memory operations in a function
even if the first was an unknown call that could do anything. This did
cost a lot of time but there is little use to do so. We also avoid
creating AAs for things that we would have looked at in case no other AA
will; that is the reason for the test changes.

Running only the attributor-cgscc pass on a IR version of
`llvm-test-suite/MultiSource/Applications/SPASS/clause.c` reduced the
time we spend in `AAMemoryLocation::update` from 4% total to
0.9% (disclaimer: no accurate measurements).
2020-08-17 23:36:36 -05:00
Johannes Doerfert 1d99c3d707 [Attributor] We (should) keep the CG updated so we can mark it as preserved 2020-08-17 23:36:36 -05:00
Johannes Doerfert 858c75f7d1 [Attributor][NFC] Directly return proper type to avoid casts 2020-08-17 23:36:36 -05:00
Johannes Doerfert b27bdf955a [Attributor][FIX] Handle function pointers properly in AANonNull
Before we tired to create a dominator tree for a declaration when we
wanted to determine if the function pointer is `nonnull`. We now avoid
looking at global values if `Value::getPointerDereferenceableBytes` not
already determined `nonnull`.
2020-08-17 23:36:35 -05:00
Harmen Stoppels a52173a3e5 Use find_library for ncurses
Currently it is hard to avoid having LLVM link to the system install of
ncurses, since it uses check_library_exists to find e.g. libtinfo and
not find_library or find_package.

With this change the ncurses lib is found with find_library, which also
considers CMAKE_PREFIX_PATH. This solves an issue for the spack package
manager, where we want to use the zlib installed by spack, and spack
provides the CMAKE_PREFIX_PATH for it.

This is a similar change as https://reviews.llvm.org/D79219, which just
landed in master.

Differential revision: https://reviews.llvm.org/D85820
2020-08-17 19:52:52 -07:00
Amy Kwan c7ec3a7e33 [PowerPC] Implement Vector Extract Mask builtins in LLVM/Clang
This patch implements the vec_extractm function prototypes in altivec.h in
order to utilize the vector extract with mask instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82675
2020-08-17 21:14:17 -05:00
Hamilton Tobon Mosquera 496f8e5b36 [OpenMPOpt][HideMemTransfersLatency] Split __tgt_target_data_begin_mapper into its "issue" and "wait" counterparts.
WIP that tries to hide the latency of runtime calls that involve host to
device memory transfers by splitting them into their "issue" and "wait"
versions. The "issue" is moved upwards as much as possible. The "wait" is
moved downards as much as possible. The "issue" issues the memory transfer
asynchronously, returning a handle. The "wait" waits in the returned
handle for the memory transfer to finish. We still lack of the movement.
2020-08-17 20:56:10 -05:00
Aditya Kumar 370330f084 NFC: [GVNHoist] Outline functions from the class
Reviewers: sebpop
Reviewed By: hiraditya

Differential Revision: https://reviews.llvm.org/D86032
2020-08-17 17:40:04 -07:00
Craig Topper b673dfbb9a [X86] When manually creating intrinsic nodes in X86ISelLowering, make sure we use getTargetConstant and pointer type for the intrinsic ID.
Doesn't really matter in practice but that's how the nodes are
normally created by SelectionDAGBuilder. So we should match.

Found by temporarily hacking type checks into isel table.
2020-08-17 17:25:53 -07:00
Craig Topper 2ffa5d218f [X86] Rename INTR_TYPE_4OP to INTR_TYPE_4OP_IMM8 and truncate immediates to MVT::i8
This makes sure VPTERNLOG is generated with MVT::i8 immediate
as its SDNode declaration in X86InstrFragmentsSIMD.td declares.
2020-08-17 17:25:52 -07:00
Craig Topper bc244f08cf [X86] Truncate immediate to i8 for INTR_TYPE_3OP_IMM8
This is used for DBPSADBW which has a i32 immediate for its
intrinsic and an i8 immediate in tablegen isel patterns.
2020-08-17 17:25:51 -07:00
Craig Topper ab7151f1cf [X86] Make PreprocessISelDAG create X86ISD::VRNDSCALE nodes with i32 constants instead of i8.
This is the type declared in X86InstrFragmentsSIMD.td. ISel pattern
matching doesn't check so it doesn't matter in practice. Maybe for
SelectionDAG CSE it would matter.
2020-08-17 17:25:51 -07:00
Mircea Trofin 62fc44ca3c [MLInliner] In development mode, obtain the output specs from a file
Different training algorithms may produce models that, besides the main
policy output (i.e. inline/don't inline), produce additional outputs
that are necessary for the next training stage. To facilitate this, in
development mode, we require the training policy infrastructure produce
a description of the outputs that are interesting to it, in the form of
a JSON file. We special-case the first entry in the JSON file as the
inlining decision - we care about its value, so we can guide inlining
during training - but treat the rest as opaque data that we just copy
over to the training log.

Differential Revision: https://reviews.llvm.org/D85674
2020-08-17 16:56:47 -07:00
Hongtao Yu 819b2d9c79 [llvm-objdump] Symbolize binary addresses for low-noisy asm diff.
When diffing disassembly dump of two binaries, I see lots of noises from mismatched jump target addresses and global data references, which unnecessarily causes diffs on every function, making it impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise.
In this change, a local branch target is modeled as a label and the branch target operand will simply be printed as a label. Local labels are collected by a separate pre-decoding pass beforehand. A global data memory operand will be printed as a global symbol instead of the raw data address. Unfortunately, due to the way the disassembler is set up and to be less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable from checking code quality point of view since on most targets an instruction can have at most one memory operand.

So far only the X86 disassemblers are supported.

Test Plan:

llvm-objdump -d  --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               	push	rax
               	mov	dword ptr [rsp + 4], 0
               	mov	dword ptr [rsp], 0
               	mov	eax, dword ptr [rsp]
               	cmp	eax, dword ptr [rip + 4112]  # 202182 <g>
               	jge	0x20117e <_start+0x25>
               	call	0x201158 <foo>
               	inc	dword ptr [rsp]
               	jmp	0x201169 <_start+0x10>
               	xor	eax, eax
               	pop	rcx
               	ret
```

llvm-objdump -d  **--symbolize-operands** --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               	push	rax
               	mov	dword ptr [rsp + 4], 0
               	mov	dword ptr [rsp], 0
<L1>:
               	mov	eax, dword ptr [rsp]
               	cmp	eax, dword ptr  <g>
               	jge	 <L0>
               	call	 <foo>
               	inc	dword ptr [rsp]
               	jmp	 <L1>
<L0>:
               	xor	eax, eax
               	pop	rcx
               	ret
```

Note that the jump instructions like `jge 0x20117e <_start+0x25>` without this work is printed as a real target address and an offset from the leading symbol. With a change in the optimizer that adds/deletes an instruction, the address and offset may shift for targets placed after the instruction. This will be a problem when diffing the disassembly from two optimizers where there are unnecessary false positives due to such branch target address changes. With `--symbolize-operand`, a label is printed for a branch target instead to reduce the false positives. Similarly, the disassemble of PC-relative global variable references is also prone to instruction insertion/deletion.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D84191
2020-08-17 16:55:12 -07:00
Johannes Doerfert 19bd4ef157 [Attributor] Properly use the call site argument position 2020-08-17 18:21:09 -05:00
Johannes Doerfert 5dfc207c53 [Attributor][FIX] Do not request an AANonNull for non-pointer types 2020-08-17 18:21:08 -05:00
Kazushi (Jam) Marukawa 68cb29eff1 [VE] Modify ISelLoweirng following clang-tidy
Modify case style of function names following clang-tidy.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D86076
2020-08-18 07:43:19 +09:00
Roman Lebedev 03127f795b
[InstCombine] PHI-aware aggregate reconstruction: correctly detect "use" basic block
While the original implementation added in D85787 / ae7f08812e
is not incorrect, it is known to be suboptimal.

In particular, it is not incorrect to use the basic block
in which the original `insertvalue` instruction is located
as the merge point, that is not necessarily optimal,
as `@test6` shows.

We should look at all the AggElts, and, if they are all defined
in the same basic block, then that is the basic block we should use.

On RawSpeed library, this catches +4% (+50) more cases.
On vanilla LLVM test-suits, this catches +12% (+92) more cases.
2020-08-18 00:45:18 +03:00
Roman Lebedev f4f673e0e3
[NFC][InstCombine] PHI-aware aggregate reconstruction: don't capture UseBB in lambdas, take it as argument
In a following patch, UseBB will be detected later,
so capturing it is potentially error-prone (capture by ref vs by val).

Also, parametrized UseBB will likely be needed
for multiple levels of PHI indirections later on anyways.
2020-08-18 00:45:18 +03:00
Roman Lebedev 4973ca3eac
[NFC][InstCombine] PHI-aware aggregate reconstruction: insert PHI node manually
This is NFC at the moment, because right now we always insert the PHI
into the same basic block in which the original `insertvalue` instruction
is, but that will change.

Also, fixes addition of the suffix to the value names.
2020-08-18 00:45:17 +03:00
Matt Arsenault a128292b90 GlobalISel: Make type for lower action more consistently optional
Some of the lower implementations were relying on this, however the
type was not set depending on which form .lower* helper form you were
using. For instance, if you used an unconditonal lower(), the type was
never set. Most of the lower actions do not benefit from a type
parameter, and just expand in terms of the original operation's types.

However, some lowerings could benefit from an additional type hint to
combine a promotion and an expansion. An example of this is for
add/sub sat. The DAG integer legalization tries to use smarter
expansions directly when promoting the integer type, and doesn't
always produce the same instruction with a wider type.

Treat this as an optional hint argument, that only means something for
specific lower actions. It may be useful to generalize this mechanism
to pass a full list of type indexes and desired types, but I haven't
run into a case like that yet.
2020-08-17 16:24:55 -04:00
diggerlin 2f0d755d81 [AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d
SUMMARY:

1. This patch provided API for decoding the traceback table info and unit test for the these API.

2. Another patchs will do the following things:
2.1 added a new option --traceback-table to decode the trace back table information for xcoff object file when
using llvm-objdump to disassemble the xcoff objfile.

2.2 print out the  traceback table information for llvm-objdump.

Reviewers:  Jason liu, Hubert Tong, James Henderson

Differential Revision: https://reviews.llvm.org/D81585
2020-08-17 16:23:47 -04:00
Florian Hahn 4cc20aa743 [DSE,MemorySSA] Skip access already dominated by a killing def.
If we already found a killing def (= a def that completely overwrites
the location) that dominates an access, we can skip processing it
further.

This does not help with compile-time, but increases the number of memory
accesses we can process with the same scan budget, leading to more
stores being eliminated.

Improvements with this change

Same hash: 203 (filtered out)
Remaining: 34
Metric: dse.NumFastStores

Program                                        base    dom     diff
 test-suite...rolangs-C++/family/family.test     2.00    4.00  100.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   172.00  229.00  33.1%
 test-suite...ks/Prolangs-C/agrep/agrep.test    10.00   12.00  20.0%
 test-suite...oxyApps-C++/miniFE/miniFE.test    44.00   51.00  15.9%
 test-suite...marks/7zip/7zip-benchmark.test   1285.00 1474.00 14.7%
 test-suite...006/450.soplex/450.soplex.test   254.00  289.00  13.8%
 test-suite...006/447.dealII/447.dealII.test   2466.00 2798.00 13.5%
 test-suite...000/197.parser/197.parser.test     9.00   10.00  11.1%
 test-suite.../Benchmarks/nbench/nbench.test    85.00   91.00   7.1%
 test-suite...ce/Applications/siod/siod.test    68.00   72.00   5.9%
 test-suite...ications/JM/lencod/lencod.test   786.00  824.00   4.8%
 test-suite...6/464.h264ref/464.h264ref.test   765.00  798.00   4.3%
 test-suite.../Benchmarks/Ptrdist/bc/bc.test   105.00  109.00   3.8%
 test-suite...lications/obsequi/Obsequi.test    29.00   28.00  -3.4%
 test-suite...3.xalancbmk/483.xalancbmk.test   1322.00 1367.00  3.4%
 test-suite...chmarks/MallocBench/gs/gs.test   118.00  122.00   3.4%
 test-suite...T2006/401.bzip2/401.bzip2.test    60.00   62.00   3.3%
 test-suite...6/482.sphinx3/482.sphinx3.test    30.00   31.00   3.3%
 test-suite...rks/tramp3d-v4/tramp3d-v4.test   862.00  887.00   2.9%
 test-suite...telecomm-gsm/telecomm-gsm.test    78.00   80.00   2.6%
 test-suite...ediabench/gsm/toast/toast.test    78.00   80.00   2.6%
 test-suite.../Applications/SPASS/SPASS.test   163.00  167.00   2.5%
 test-suite...lications/ClamAV/clamscan.test   240.00  245.00   2.1%
 test-suite...006/453.povray/453.povray.test   1392.00 1419.00  1.9%
 test-suite...000/255.vortex/255.vortex.test   211.00  215.00   1.9%
 test-suite...:: External/Povray/povray.test   1295.00 1317.00  1.7%
 test-suite...lications/sqlite3/sqlite3.test   175.00  177.00   1.1%
 test-suite...T2000/256.bzip2/256.bzip2.test    99.00  100.00   1.0%
 test-suite...0/253.perlbmk/253.perlbmk.test   629.00  635.00   1.0%
 test-suite.../CINT2006/403.gcc/403.gcc.test   1183.00 1194.00  0.9%
 test-suite.../CINT2000/176.gcc/176.gcc.test   647.00  653.00   0.9%
 test-suite...ications/JM/ldecod/ldecod.test   512.00  516.00   0.8%
 test-suite...0.perlbench/400.perlbench.test   1026.00 1034.00  0.8%
 test-suite...-typeset/consumer-typeset.test   1876.00 1877.00  0.1%
 Geomean difference                                             7.3%
2020-08-17 20:54:48 +01:00
Alexandre Ganea 98e01f56b0 Revert "Re-Re-land: [CodeView] Add full repro to LF_BUILDINFO record"
This reverts commit a3036b3863.

As requested in: https://reviews.llvm.org/D80833#2221866
Bug report: https://crbug.com/1117026
2020-08-17 15:49:18 -04:00
Matt Arsenault a9ee0589a8 AMDGPU/GlobalISel: Match global saddr addressing mode 2020-08-17 15:48:06 -04:00
Sanjay Patel f925fd3304 [DAGCombiner] give magic number a name in getStoreMergeCandidates; NFC 2020-08-17 15:37:55 -04:00
Sanjay Patel 046b4a550a [DAGCombiner] reduce code duplication in getStoreMergeCandidates; NFC 2020-08-17 15:37:55 -04:00
Sanjay Patel 20c85fd1ab [DAGCombiner] simplify bool return in getStoreMergeCandidates; NFC 2020-08-17 15:37:55 -04:00
Sanjay Patel 52cd8f1ecb [DAGCombiner] clean up getStoreMergeCandidates(); NFC
1. Move bailouts and local var declarations.
2. Convert if-chain to switch on StoreSource with unreachable default.
2020-08-17 15:37:54 -04:00
Sanjay Patel 27708db3e3 [DAGCombiner] convert StoreSource if-chain to switch; NFC
The "isa" checks were less constrained because they allow
target constants, but the later matching code would bail
out on those anyway, so this should be slightly more
efficient.
2020-08-17 15:37:54 -04:00
Tyker a79e604462 [AssumeBundles] Fix Bug in Assume Queries
this bug was causing miscompile.
now clang cant properly selfhost with -mllvm --enable-knowledge-retention

Reviewed By: jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D83507
2020-08-17 21:36:53 +02:00
Matt Arsenault e1a2f4713c AMDGPU: Match global saddr addressing mode
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
2020-08-17 15:28:14 -04:00
Stanislav Mekhanoshin 24182f14b6 [AMDGPU] Define spill opcodes for all AGPR sizes
Since we have defined all these sizes I believe we shall be
able to spill these as well.

Differential Revision: https://reviews.llvm.org/D86098
2020-08-17 12:17:23 -07:00
Dávid Bolvanský 0f14b2e6cb Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 50c743fa71. Patch will be split to smaller ones.
2020-08-17 20:44:33 +02:00
Jonas Devlieghere 295eb54deb [llvm] Don't create the directory hierarchy in the FileCollector...
... if the collected file doesn't exists.

This fixes the situation where LLDB can't create a file when capturing a
reproducer because the parent path doesn't exist, but can during replay
because the file collector created the directory hierarchy even though
the file doesn't exist.

This is covered by the lldb reproducer test suite.
2020-08-17 11:21:39 -07:00
Florian Hahn df4756ec6c [DSE,MemorySSA] Check for underlying objects first.
isWriteAtEndOfFunction needs to check all memory uses of Def, which is
much more expensive than getting the underlying objects in practice.
Switch the call order, as recommended by the TODO, which was added as
per an earlier review.

This shaves off a bit of compile-time.
2020-08-17 18:52:18 +01:00
Matt Arsenault a275acc4a9 GlobalISel: Early continue to reduce loop indentation 2020-08-17 13:51:08 -04:00
Florian Hahn 139810449b [DSE,MemorySSA] Account for ScanLimit == 0 on entry.
Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.

In some cases, this brings a 2x speedup in terms of compile-time.
2020-08-17 17:55:14 +01:00
Aditya Kumar cb6e6936db NFC: [GVNHoist] Hoist loop invariant code and rename variables for readability
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86031
2020-08-17 09:43:34 -07:00
Matt Arsenault c8a9872259 AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset
We may have an SGPR->VGPR copy if a totally uniform pointer
calculation is used for a VGPR pointer operand.

Also hack around a bug in MUBUF matching which would incorrectly use
MUBUF for global when flat was requested. This should really be a
predicate on the parent pattern, but the DAG always checked this
manually inside the complex pattern.
2020-08-17 12:31:38 -04:00
Steven Perron eed6476a87 Reset PAL metadata when AMDGPU traget stream finishes
If the same stream object is used for multiple compiles, the PAL metadata from eariler compilations will leak into later one.  See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC.

No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side.  Let me know if there is a good way to test this.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D85667
2020-08-17 10:56:11 -04:00
Matt Arsenault 5b53b17cd3 DAG: Add missing comment for transform 2020-08-17 10:01:12 -04:00
Matt Arsenault c7b9cd31bf AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping 2020-08-17 09:53:26 -04:00
Matt Arsenault af162ac785 AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics
This should use the default mapping and insert a copy to the vcc bank,
and not try to insert a readfirstlane.
2020-08-17 09:53:25 -04:00
Matt Arsenault da3f357de6 AMDGPU: Don't look at dbg users for foldable operands
These would have always failed to fold, so checking them or adding
them to the fold candidates is useless.
2020-08-17 09:53:25 -04:00
Matt Arsenault 924f31bc3c GlobalISel: Remove unnecessary check for copy type
COPY isn't allowed to change the type, but can mix no type with type.
2020-08-17 09:19:25 -04:00
Matt Arsenault 66ffa0e91f AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo 2020-08-17 09:19:22 -04:00
Matt Arsenault e0375dbcb3 AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics
Global instructions have the signed offsets.
2020-08-17 09:19:15 -04:00
Alex Zinenko 874aef875d [llvm] support graceful failure of DataLayout parsing
Existing implementation always aborts on syntax errors in a DataLayout
description. While this is meaningful for consuming textual IR modules, it is
inconvenient for users that may need fine-grained control over the layout from,
e.g., command-line options. Propagate errors through the parsing functions and
only abort in the top-level parsing function instead.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D85650
2020-08-17 15:10:37 +02:00
Kai Nacke c2ae7934c8 [SystemZ/ZOS]__(de)register_frame are not available on z/OS.
The functions `__register_frame`/`__deregister_frame` are not
available on z/OS, so add a guard to not use them.

Reviewed By: lhames, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D84787
2020-08-17 09:00:09 -04:00
Sam Elliott 3f7068ad98 [RISCV] Enable the use of the old mucounteren name
The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which
has the same numeric CSR value as `mucounteren` from 1.09.1. This patch
enables the use of the old `mucounteren` name.

Patch by Yuichi Sugiyama.

Reviewed By: lenary, jrtc27, pzheng

Differential Revision: https://reviews.llvm.org/D85067
2020-08-17 13:11:49 +01:00
Sam Elliott 5f9ecc5d85 [RISCV] Indirect branch generation in position independent code
This fixes the "Unable to insert indirect branch" fatal error sometimes
seen when generating position-independent code.

Patch by msizanoen1

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D84833
2020-08-17 13:09:26 +01:00
Sanjay Patel e6b6787d01 [InstCombine] fold abs(X)/X to cmp+select
The backend can convert the select-of-constants to
bit-hack shift+logic if desirable.

https://alive2.llvm.org/ce/z/pgJT6E

  define i8 @src(i8 %x) {
  %0:
    %a = abs i8 %x, 1
    %d = sdiv i8 %x, %a
    ret i8 %d
  }
  =>
  define i8 @tgt(i8 %x) {
  %0:
    %cond = icmp sgt i8 %x, 255
    %r = select i1 %cond, i8 1, i8 255
    ret i8 %r
  }
  Transformation seems to be correct!
2020-08-17 08:01:28 -04:00
Sanjay Patel 6cd4a6f6b2 [InstCombine] reduce code duplication; NFC 2020-08-17 08:01:27 -04:00
Simon Pilgrim c1f6ce0c73 [DemandedBits] Improve accuracy of Add propagator
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.

I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.

This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.

Known A:     0_______0_______0_______0_______
Known B:     0_______0_______0_______0_______
AOut:        00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch:   00000000001111111000000000000000

Committed on behalf of: @rrika (Erika)

Differential Revision: https://reviews.llvm.org/D72423
2020-08-17 12:54:09 +01:00
Simon Pilgrim 1d2ede87ea [X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases
Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported.

We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004
2020-08-17 11:58:51 +01:00
Cullen Rhodes 2ccde3c96b [InlineCost] Fix scalable vectors in visitAlloca
Discovered as part of the VLS type work (see D85128).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85848
2020-08-17 10:34:27 +00:00
Vitaly Buka 3b348d9102 [NFC][StackSafety] Move out sort from the loop 2020-08-17 03:30:14 -07:00
Kazushi (Jam) Marukawa 40f1e7e804 [VE] Support f128
Support f128 using VE instructions.  Update regression tests.
I've noticed there is no load or store i128 test, so I add them too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D86035
2020-08-17 17:26:52 +09:00
Craig Topper a206f85091 [X86] Reject dirflag in inline asm constraints other than clobber.
Fixes the crash from PR47195.
2020-08-16 23:33:45 -07:00
Chen Zheng 4d52ebb9b9 [PowerPC] Make StartMI ignore COPY like instructions.
Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D85659
2020-08-17 02:12:30 -04:00
Yonghong Song aa61e43040 [InstCombine] Fix a compilation bug
With gcc 6.3.0, I hit the following compilation bug.
  ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
   };
    ^
  cc1plus: all warnings being treated as errors

The error is introduced by Commit ae7f08812e ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
2020-08-16 21:56:42 -07:00
Vitaly Buka e10e7829bf [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-16 18:05:52 -07:00
Roman Lebedev 0ec1f0f332
[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically 2020-08-16 23:37:36 +03:00
Roman Lebedev ae7f08812e
[InstCombine] Aggregate reconstruction simplification (PR47060)
This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a  `call`.

Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.

I think, this is a missing fold in InstCombine, so i've implemented it.

I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
   If it was extracted from the aggregate with identical type,
   from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
   then we can just use said original source aggregate directly,
   instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
   we need be PHI-aware - we might have different source aggregate when coming
   from each predecessor.

I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.

On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, less instructions and blocks total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85787
2020-08-16 23:27:56 +03:00
Simon Pilgrim f25d47b7ed [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types
We can now enable this for AVX1 targets can now assist with canonicalizeShuffleMaskWithHorizOp cleanup.

There's still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.
2020-08-16 15:00:41 +01:00
Sanjay Patel 3ffb751f3d [InstCombine] fold copysign with fabs/fneg operand
We already get this in the backend, but we need to do
it in IR too to consistently get yet more copysign
transforms.
2020-08-16 08:53:47 -04:00
Sanjay Patel 3fed67b7e6 [InstCombine] reduce code duplication; NFC 2020-08-16 08:53:47 -04:00
Vitaly Buka 47552a614a [StackSafety] Change how callee searched in index
Handle other than local linkage types.
2020-08-16 04:37:19 -07:00
Simon Pilgrim dca7eb7d60 [X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp
Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain, which is particularly useful for recognising more cases of where we're performing multiple HOPs that can be merged and pre-AVX where we don't have good blend/unary target shuffle support.
2020-08-16 12:26:27 +01:00
Simon Pilgrim c27baa54b7 [X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.
Split the isRepeatedTargetShuffleMask into a wrapper variant that takes a MVT describing the mask width, and an internal version that just needs the raw mask element bit size.

This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.
2020-08-16 11:51:44 +01:00
Fady Ghanim aaa93a681b [OpenMP][OMPBuilder] Adding support for `omp single`
This adds support for generating `omp single`, and necessary calls for
`copyprivate` clause.

Differential Revision: https://reviews.llvm.org/D85617
2020-08-16 01:15:16 -04:00
Wenlei He 577e58bcc7 [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.

This is a resubmit of https://reviews.llvm.org/D83743
2020-08-15 20:17:21 -07:00
Lang Hames a49b05bb61 [JITLink][MachO] Use correct symbol scope when N_PEXT is set and N_EXT unset.
MachOLinkGraphBuilder has been treating these as hidden, but they should be
treated as local.

Symbols with N_PEXT set and N_EXT unset are produced when hidden symbols are
run through 'ld -r' without passing -keep_private_externs. They will show up
under 'nm -m' as "was private extern", hence the name of the test cases.

Testcase commited as relocatable object to ensure that the test suite doesn't
depend on having 'ld -r' available.
2020-08-15 15:53:33 -07:00
Amara Emerson 7006bb69ef [GlobalISel] Enable copy-propagation in post-legalizer combiner.
This cleans up copies that the legalizer or other combines leave around. They
can occasionally end up escaping as moves.

Differential Revision: https://reviews.llvm.org/D85964
2020-08-15 13:44:30 -07:00
Matt Arsenault 04a288f0f0 GlobalISel: Remove unnecessary llvm:: 2020-08-15 12:12:50 -04:00
Matt Arsenault f0af434b79 AMDGPU: Remove register class params from flat memory patterns 2020-08-15 12:12:33 -04:00
Matt Arsenault a7455652c0 AMDGPU: Fix global atomic saddr operand class 2020-08-15 12:12:28 -04:00
Matt Arsenault 625db2fe5b AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
2020-08-15 12:12:24 -04:00
Matt Arsenault e5077b5c2a AMDGPU: Fix matching wrong offsets for global atomic loads
These used signed offsets with a different size.
2020-08-15 12:12:17 -04:00
Matt Arsenault 8cb022982a AMDGPU: Remove redundant FLAT complex patterns
These were identical to the non-atomic cases. I'm not sure why these
were ever separated.
2020-08-15 12:12:01 -04:00
Matt Arsenault 47af1ac69a AMDGPU: Correct definitions for global saddr instructions
The VGPR component is a 32-bit offset, not 64-bits.

I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.
2020-08-15 12:11:57 -04:00
Matt Arsenault 79298a5067 AMDGPU: Remove SIFixupVectorISel pass
This was only used for matching the saddr addressing mode of global
instructions, but this was not implemented correctly. The instruction
definitions aren't even correct, and are defined as using a 64-bit
VGPR component. Eliminate this pass to enable correcting the
instruction definitions. A new matching implementation can work in
GlobalISel or relying on DAG divergence information for the base
address.
2020-08-15 12:11:51 -04:00
Luofan Chen 266949b2bc [Attributor][NFC] Format code 2020-08-16 00:00:45 +08:00
Luofan Chen b7448a348b [Attributor][NFC] Use indexes instead of iterator
When adding elements when iterating, the iterator will become
valid, which could cause errors. This fixes the issue by using
indexes instead of iterator.
2020-08-15 23:09:46 +08:00
Cyndy Ishida 85d381eb02 [TextAPI] update DriverKit string value
String value differed from downstream, where upstream doesn't depend on
casing difference.
<rdar://problem/67106257>
2020-08-15 06:44:30 -07:00
Xing GUO 030df8242f [MachOYAML] Move EmitFunc to an inner scope. NFC. 2020-08-15 21:10:03 +08:00
Luofan Chen 87a85f3d57 [Attributor] Use internalized version of non-exact functions
This patch internalize non-exact functions and replaces of their uses
with the internalized version. Doing this enables the analysis of
non-exact functions.

We can do this because some non-exact functions with the same name
whose linkage is `linkonce_odr` or `weak_odr` should have the same
semantics, so we can safely internalize and replace use of them (the
result of the other version of this function should be the same.).
Note that not all functions can be internalized, e.g., function with
`linkonce` or `weak` linkage.

For now when specified in commandline, we internalize all functions
that meet the requirements without calculating the cost of such
internalzation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84167
2020-08-15 20:23:38 +08:00
Xing GUO 4a0b95dc5e [DWARFYAML] Simplify isEmpty(). NFC. 2020-08-15 20:10:29 +08:00
Dávid Bolvanský f134fc4f1b Reland "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)" 2020-08-15 12:14:57 +02:00
Martin Storsjö 3e7403a134 Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
This reverts commit 6dbf0cfcf7.

That commit caused failed assertions, e.g. like this:

$ cat sprintf-strcpy.c
char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); }

$ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu
clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value*,
llvm::Value::ReplaceMetadataUses): Assertion `New->getType() ==
getType() && "replaceAllUses of value with new value of different
type!"' failed.
2020-08-15 09:35:11 +03:00
Philip Reames 6b2105456a [Statepoint] Remove code related to inline operand bundles
This code becomes dead for valid IR after 48f4312 and a96fc46.  The reason for the test change is that the verifier reports the first verification error encountered, in some non-specified visit order.  By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports.  And in one case, the checked for error is no longer possible with the bundle representation, so I simply delete the file.
2020-08-14 20:29:41 -07:00
Philip Reames 48f4312d4e Remove inline gc arguments from statepoints
The "gc-live" operand bundles were recently added, and all tests have been updated to use that format.  A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibile requirement.

This is an extension to a96fc46.  "gc-live" hadn't been implemented at the point that patch was initially posted.
2020-08-14 19:44:24 -07:00
Stanislav Mekhanoshin 43a38dc251 [AMDGPU] Fix MAI ld/st hazard handling
It did not process hazard for ds_permute because it does not
load or store even though it is DS.

Differential Revision: https://reviews.llvm.org/D86003
2020-08-14 17:07:37 -07:00
Dávid Bolvanský f62de7c9c7 [SLC] Transform strncpy(dst, "text", C) to memcpy(dst, "text\0\0\0", C) for C <= 128 only
Transformation creates big strings for big C values, so bail out for C > 128.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86004
2020-08-15 01:53:32 +02:00
Gui Andrade 05e3ab41e4 [MSAN] Avoid dangling ActualFnStart when replacing instruction
This would be a problem if the entire instrumented function was a call
to
e.g. memcpy

Use FnPrologueEnd Instruction* instead of ActualFnStart BB*

Differential Revision: https://reviews.llvm.org/D86001
2020-08-14 23:50:38 +00:00
Cameron McInally 92593f9e77 [SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors.
Differential Revision: https://reviews.llvm.org/D85982
2020-08-14 18:47:22 -05:00
Christopher Tetreault 416a6a85b1 [SVE] Remove calls to VectorType::getNumElements from AggressiveInstCombine
Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82218
2020-08-14 16:40:34 -07:00
Philip Reames a96fc4638b Remove deopt and gc transition arguments from gc.statepoint intrinsic
(Forgot to land this a couple of weeks back.)

In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic.

The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy.

Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones.

Differential Revision: https://reviews.llvm.org/D80892
2020-08-14 16:07:40 -07:00
Fangrui Song 58f5966d5b Fix TargetSubtargetInfo derivatives after D85165 2020-08-14 15:50:53 -07:00
Craig Topper c7a0b2684f [X86][MC][Target] Initial backend support a tune CPU to support -mtune
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.

This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.

One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.

I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.

Differential Revision: https://reviews.llvm.org/D85165
2020-08-14 15:31:50 -07:00
Jordan Rupprecht 38884641f2 Temporarily revert "[SCEVExpander] Add helper to clean up instrs inserted while expanding."
This reverts commit 7829c33084. The assertion is triggering on some internal code. A reduced test case is in progress.
2020-08-14 14:52:37 -07:00
Dávid Bolvanský 6dbf0cfcf7 [SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)
Transform sprintf(dst, "%s", str) -> strcpy(dst, str) if result is unused
Avoid sprintf(dest, "%s", str) -> llvm.memcpy(align 1 dest, align 1 str, strlen(str)+1) if optimizing for size.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85963
2020-08-14 23:48:53 +02:00
Gui Andrade 36ebabc153 [MSAN] Convert ActualFnStart to be a particular Instruction *, not BB
This allows us to add addtional instrumentation before the function start,
without splitting the first BB.

Differential Revision: https://reviews.llvm.org/D85985
2020-08-14 21:43:56 +00:00
Gui Andrade 97de0188dd [MSAN] Reintroduce libatomic load/store instrumentation
Have the front-end use the `nounwind` attribute on atomic libcalls.
This prevents us from seeing `invoke __atomic_load` in MSAN, which
is problematic as it has no successor for instrumentation to be added.
2020-08-14 20:31:10 +00:00
Xiangling Liao f759b4e43b [AIX] Generate unique module id based on Pid and timestamp
A unique module id, which is a part of sinit and sterm function names, is
necessary to be unique. However, `getUniqueModuleId` will fail if there is
no strong external symbol within a module. We turn to use Pid and timestamp
when this happens.

Differential Revision: https://reviews.llvm.org/D85527
2020-08-14 16:22:50 -04:00
Vitaly Buka fc4fd89852 [StackSafety] Use ValueInfo in ParamAccess::Call
This avoid GUID lookup in Index.findSummaryInModule.
Follow up for D81242.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85269
2020-08-14 12:42:44 -07:00
Greg McGary eef41efe00 [MachO] Add skeletal support for DriverKit platform
Define the platform ID = 10, and simple mappings between platform ID & name.

Reviewed By: MaskRay, cishida

Differential Revision: https://reviews.llvm.org/D85594
2020-08-14 12:36:43 -07:00
Haowei Wu ee5d07e6ce Remove unnecessary HEADER_DIRS in lib/InterfaceStub/CMakeLists.txt
This change removes unnecessary HEADER_DIRS from //llvm/lib/
InterfaceStub/CMakeLists.txt file.

Differential Revision: https://reviews.llvm.org/D85936
2020-08-14 11:22:50 -07:00
Matt Arsenault 5c5e6d951e TableGen/GlobalISel: Partially handle immAllOnesV/immAllZerosV
These should really match either G_BUILD_VECTOR or
G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing
mechanism for matching alternative opcodes. There is GIM_SwitchOpcode,
but it seems to assume it's oly only used for matcher optimization.

I could also omit any opcode check and rely on the matcher directly
checking the opcode, but the table optimizer currently assumes there
has to be an opcode check.

Also doesn't try to handle undef elements like the DAG version.
2020-08-14 13:55:30 -04:00
Simon Pilgrim e9eb2dc332 [X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))
This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining

Another step towards PR41813

Recommit of rG9bd97d036398 with fixed offset adjustments
2020-08-14 18:43:19 +01:00
Matt Arsenault 40a142fa57 AMDGPU/GlobalISel: Match andn2/orn2 for more types
Unfortunately this ends up not working as expected on targets with
16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform
16-bit ops to i32.

The vector case annoyingly requires switching the checked opcode,
since constants for vectors aren't directly handled.

I also need to think more carefully about whether this is valid for i1.
2020-08-14 13:18:03 -04:00
Jordan Rupprecht fd9187f746 [NFC] Silence variables unused in release builds 2020-08-14 08:35:58 -07:00
Denis Antrushin 1c80a6ce5f [Statepoints] FixupStatepoint: properly set isKill on spilled register.
When spilling statepoint meta arg register it is incorrect to blindly
mark it as killed - it may be used in non-meta args (e.g., as call
parameter).
2020-08-14 22:19:20 +07:00
Matt Morehouse 891b2be85d Revert "[NFC][StackSafety] Move out sort from the loop"
This reverts commit 0426e28419 due to ASan
buildbot failure.
2020-08-14 08:17:35 -07:00
Johannes Doerfert 9240e48a58 [OpenMP][OMPIRBuilder] Use the source (=directory + filename) for locations
Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85938
2020-08-14 08:59:25 -05:00
Denis Antrushin 5f6bee77fa [Statepoints] Spill GC Ptr regs in FixupStatepoints.
Extend FixupStatepointCallerSaved pass with ability to spill
statepoint GC pointer arguments (optionally allowing them on CSRs).
Special handling is required for invoke statepoints, because at MI
level single landing pad may be shared by multiple statepoints, so
we must ensure we spill landing pad's live-ins into the same stack
slots.

Full statepoint refactoring change set is available at D81603.

Reviewed By: skatkov

Differential Revision: https://reviews.llvm.org/D81647
2020-08-14 20:21:19 +07:00
Kazushi (Jam) Marukawa 2f01af764b [VE] Remove obsolete I8/I16 register classes
Remove I8/I16 register classes which are prepared to implement previously
to implement VE ABI.  However, it is possible to implement VE ABI correctly
without them.  Therefore, removing them now.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85905
2020-08-14 21:52:22 +09:00
Shinji Okumura 5f55a8193c [Attributor] Implement AAPotentialValues
This patch provides an implementation of `AAPotentialValues`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85632
2020-08-14 20:51:14 +09:00
Vitaly Buka 4c30d4b4e5 [NFC][StackSafety] Change map key comparison 2020-08-14 04:23:15 -07:00
Vitaly Buka 0426e28419 [NFC][StackSafety] Move out sort from the loop 2020-08-14 04:19:10 -07:00
Stefan Gränitz 9a47bcae7c [ORC][NFC] Refactor loop to determine name of init symbol in IRMaterializationUnit
This loop caused me a little headache once, because I didn't see the assigned variable is a member. The refactored version appears more readable to me.

Differential Revision: https://reviews.llvm.org/D85922
2020-08-14 11:34:44 +02:00
Sam Parker eb82d58f83 [NFC][ARM] Port MaybeCall into ARMTTImpl method
Renamed to maybeLoweredToCall.
2020-08-14 10:23:20 +01:00
Vitaly Buka 798eb71c3a [NFC][StackSafety] Dedup callees 2020-08-14 01:14:52 -07:00
Sebastian Neubauer 9aa0ff77bd [AMDGPU] Enable .rodata for amdpal os
PAL recently got support for multiple ELF sections and relocations,
therefore we can now use .rodata sections instead of forcing constants
into .text.

Differential Revision: https://reviews.llvm.org/D85895
2020-08-14 09:05:48 +02:00
David Sherwood 6c7957c990 [SVE] Fix bug in SVEIntrinsicOpts::optimizePTest
The code wasn't taking into account that the two operands
passed to ptest could be identical and was trying to erase
them twice.

Differential Revision: https://reviews.llvm.org/D85892
2020-08-14 07:57:21 +01:00
Sam Parker 725400f993 [NFCI][SimpleLoopUnswitch] Adjust CostKind query
When getUserCost was transitioned to use an explicit CostKind,
TCK_CodeSize was used even though the original kind was implicitly
SizeAndLatency so restore this behaviour. We now only query for
CodeSize when optimising for minsize.

I expect this to not change anything as, I think all, targets will
currently return the same value for CodeSize and SizeLatency. Indeed
I see no changes in the test suite for Arm, AArch64 and X86.

Differential Revision: https://reviews.llvm.org/D85829
2020-08-14 07:54:20 +01:00
Igor Kudrin 95fad44e34 [DebugInfo] Avoid an infinite loop with a truncated pre-v5 .debug_str_offsets.dwo.
dumpStringOffsetsSection() expects the size of a contribution to be
correctly aligned. The patch adds the corresponding verifications for
pre-v5 cases.

Differential Revision: https://reviews.llvm.org/D85739
2020-08-14 13:11:37 +07:00
Arthur Eubanks 48cd5b72b1 Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"
This reverts commit ab9fc8bae8.

Incorrect transformation if the result is used.
Causes breakages, e.g.
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/8193/
2020-08-13 21:05:03 -07:00
Peter Collingbourne c201f27225 hwasan: Emit the globals note even when globals are uninstrumented.
This lets us support the scenario where a binary is linked from a mix
of object files with both instrumented and non-instrumented globals.
This is likely to occur on Android where the decision of whether to use
instrumented globals is based on the API level, which is user-facing.

Previously, in this scenario, it was possible for the comdat from
one of the object files with non-instrumented globals to be selected,
and since this comdat did not contain the note it would mean that the
note would be missing in the linked binary and the globals' shadow
memory would be left uninitialized, leading to a tag mismatch failure
at runtime when accessing one of the instrumented globals.

It is harmless to include the note when targeting a runtime that does
not support instrumenting globals because it will just be ignored.

Differential Revision: https://reviews.llvm.org/D85871
2020-08-13 16:33:22 -07:00
Yuanfang Chen a5ed20b549 [NewPM][CodeGen] Add machine code verification callback
D83608 need this.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D85916
2020-08-13 16:13:01 -07:00
Ben Dunbobbin 4cb016cd2d [X86][ELF] Prefer lowering MC_GlobalAddress operands to .Lfoo$local for STV_DEFAULT only
This patch restricts the behaviour of referencing via .Lfoo$local
local aliases, introduced in https://reviews.llvm.org/D73230, to
STV_DEFAULT globals only.

Hidden symbols via --fvisiblity=hidden (https://gcc.gnu.org/wiki/Visibility)
is an important scenario.

Benefits:

- Improves the size of object files by using fewer STT_SECTION symbols.

- The code reads a bit better (it was not obvious to me without going
  back to the code reviews why the canBenefitFromLocalAlias function
  currently doesn't consider visibility).

- There is also a side benefit in restoring the effectiveness of the
  --wrap linker option and making the behavior of --wrap consistent
  between LTO and normal builds for references within a translation-unit.
  Note: this --wrap behavior (which is specific to LLD) should not be
  considered reliable. See comments on https://reviews.llvm.org/D73230
  for more.

Differential Revision: https://reviews.llvm.org/D85782
2020-08-14 00:09:15 +01:00
Arthur Eubanks 41f49736a9 [ConstProp] Handle insertelement constants
Previously ConstantFoldExtractElementInstruction() would only work with
insertelement instructions, not contants. This properly handles
insertelement constants as well.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85865
2020-08-13 15:59:17 -07:00
Dávid Bolvanský ab9fc8bae8 [SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)
Solves 46489
2020-08-14 00:05:55 +02:00
David Green 0c390c22a5 Revert "[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz"
This reverts commit 18279a54b5 as it is
causing some chromium android test problems.
2020-08-13 22:40:36 +01:00
Austin Kerbow 7d1cb187fb [AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded
If we need a scratch register for the spill don't use the same scratch
register that is being used for the MBUF offset.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85772
2020-08-13 14:12:00 -07:00
Thomas Lively d53d952810 [WebAssembly] Allow inlining functions with different features
Allow inlining only when the Callee has a subset of the Caller's
features. In principle, we should be able to inline regardless of any
features because WebAssembly supports features at module granularity,
not function granularity, but without this restriction it would be
possible for a module to "forget" about features if all the functions
that used them were inlined.

Requested in PR46812.

Differential Revision: https://reviews.llvm.org/D85494
2020-08-13 13:57:43 -07:00
Dávid Bolvanský 5ef2287d36 [SLC] Optimize strncpy(a, a, C) to memcpy(a, a000, C)
Solves PR47154
2020-08-13 22:22:51 +02:00
Cameron McInally 21810b0e14 [SVE] Lower fixed length vector integer UMIN/UMAX
Differential Revision: https://reviews.llvm.org/D85926
2020-08-13 14:48:36 -05:00
Stefan Gränitz 5bcd32b744 [ORC][NFC] Fix typo in comment 2020-08-13 21:14:20 +02:00
Stefan Gränitz f12db8cf75 [ORC] cloneToNewContext() can work with a const-ref to ThreadSafeModule 2020-08-13 21:01:21 +02:00
Haowei Wu d650cbc349 [elfabi] Move llvm-elfabi related code to InterfaceStub library
This change moves elfabi related code to llvm/InterfaceStub library
so it can be shared by multiple llvm tools without causing cyclic
dependencies.

Differential Revision: https://reviews.llvm.org/D85678
2020-08-13 11:51:44 -07:00
Stanislav Mekhanoshin 0462aef5f3 [AMDGPU] Inhibit SDWA if target instruction has FI
Differential Revision: https://reviews.llvm.org/D85918
2020-08-13 11:34:28 -07:00
Stanislav Mekhanoshin d25cb5a8a2 [AMDGPU] Fix misleading SDWA verifier error. NFC.
The old error from GFX9 shall be updated to GFX9+.
2020-08-13 11:32:17 -07:00
Aditya Kumar 1a8c9cd1d9 Fix PR45442: Bail out when MemorySSA information is not available
Reviewers: sebpop, uabelho, fhahn
Reviewed by: fhahn

Differential Revision: https://reviews.llvm.org/D85881
2020-08-13 11:25:58 -07:00
Lang Hames adaadbfeac [JITLink][MachO] Return an error when MachO TLV relocations are encountered.
MachO TLV relocations aren't supported yet. Error out rather than falling
through to llvm_unreachable.
2020-08-13 11:19:35 -07:00
Sameer Arora 8d58eb11f9 [llvm-libtool-darwin] Refactor ArchiveWriter
Refactoring function `writeArchive` in ArchiveWriter. Added a new
function `writeArchiveBuffer` that returns the archive in a memory
buffer instead of writing it out to the disk. This refactor is necessary
so as to allow `llvm-libtool-darwin` to write universal files containing
archives.

Reviewed by jhenderson, MaskRay, smeenai

Differential Revision: https://reviews.llvm.org/D84858
2020-08-13 10:56:30 -07:00
Dávid Bolvanský 50c743fa71 [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 19:54:27 +02:00
David Green 2632c625ed [ARM] Mark VMINNMA/VMAXNMA as commutative
These operations take Qda and Rn register operands, which are
commutative so long as the instruction is not predicated.

Differential Revision: https://reviews.llvm.org/D85813
2020-08-13 18:01:11 +01:00
Cameron McInally e1a87f0a9b [SVE] Lower fixed length vector integer SMIN/SMAX
Differential Revision: https://reviews.llvm.org/D85855
2020-08-13 11:41:20 -05:00
Aditya Kumar 44716856db Fix PR45442: Bail out when MemorySSA information is not available 2020-08-13 09:31:18 -07:00
Bjorn Pettersson 11446b02c7 [VectorCombine] Fix for non-zero addrspace when creating vector load from scalar load
This is a fixup to commit 43bdac2906, to make sure the
address space from the original load pointer is retained in the
vector pointer.

Resolves problem with
  Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
due to address space mismatch.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85912
2020-08-13 18:25:32 +02:00
Serguei Katkov 98ba0a5ffe [InstCombine] Handle gc.relocate(null) in one iteration
InstCombine adds users of transformed instruction to working list to
process on the same iteration. However gc.relocate may have a hidden
user (next gc.relocate) which is connected through gc.statepoint intrinsic and
there is no direct def-use chain between them.

In this case if the next gc.relocation is already processed it will not be added
to worklist and will not be able to be processed on the same iteration.
Let's we have the following case:
A = gc.relocate(null)
B = statepoint(A)
C = gc.relocate(B, hidden(A))
If C is already considered then after replacement of A with null, statepoint B
instruction will be added to the queue but not C.
C can be processed only on the next iteration.

If the chain of relocation is pretty long the many iteration may be required.
This change is to reduce the number of iteration to meet the latest changes
related to reducing infinite loop threshold.

This is a quick (not best) fix. In the follow up patches I plan to move gc relocation
handling into statepoint handler. This should also help to remove unused gc live
entries in statepoint bundle.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D75598
2020-08-13 23:16:27 +07:00
Fangrui Song 7f8c49b016 [llvm-objdump] Change symbol name/PLT decoding errors to warnings
If the referenced symbol of a J[U]MP_SLOT is invalid (e.g. symbol index 0), llvm-objdump -d will bail out:

```
error: 'a': st_name (0x326600) is past the end of the string table of size 0x7
```

where 0x326600 is the st_name field of the first entry past the end of .symtab

Change it to a warning to continue dumping.
`X86/plt.test` uses a prebuilt executable, so I pick `ELF/AArch64/plt.test`
which has a YAML input and can be easily modified.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85623
2020-08-13 08:13:42 -07:00
Simon Pilgrim 63863451d1 Fix unused variable warning. NFC.
Reduce the dyn_cast<> to a isa<> as that's all non-assert builds require, and move the cast<> inside the assert.
2020-08-13 15:43:20 +01:00
Simon Pilgrim cd3b850a4c rG9bd97d0363987b582 - Revert "[X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))"
This reverts commit 9bd97d0363.

Seeing some codegen issues in internal testing.
2020-08-13 15:21:15 +01:00
Matt Arsenault c7191e3185 DAG: Don't pass 0 alignment value to allowsMisalignedMemoryAccesses
I think not unconditionally passing getDstAlign is broken, but leave
that for another change.
2020-08-13 09:33:17 -04:00
David Stenberg e8ebebb0bd [InstCombine] Fix incorrect Modified status
When removing instructions from unreachable blocks, and only debug info
intrinsics were removed, InstCombine could incorrectly return a false
Modified status.

This is fixed by making removeAllNonTerminatorAndEHPadInstructions()
also return how many debug info intrinsics that were removed, and take
that into account.

This was caught using the check introduced by D80916.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D85839
2020-08-13 15:10:41 +02:00
Carl Ritson d538c5837a [AMDGPU] Fix missed SI_RETURN_TO_EPILOG in pre-emit peephole
SIPreEmitPeephole does not process all terminators, which means
it can fail to handle SI_RETURN_TO_EPILOG if immediately preceeded
by a branch to the early exit block.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D85872
2020-08-13 21:52:41 +09:00
Dávid Bolvanský f9264995a6 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 44587e2f7e. Sanitizer tests need to be updated.
2020-08-13 14:37:40 +02:00
Dávid Bolvanský 44587e2f7e [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 14:23:58 +02:00
Simon Pilgrim a31d20e67e [X86][SSE] IsElementEquivalent - add HOP(X,X) support
For HADD/HSUB/PACKS ops with repeated operands the lower/upper half element of each lane are known to be equivalent
2020-08-13 12:42:59 +01:00
Paul Walker e63cc8105a [SVE] Lower fixed length vector integer shifts.
Differential Revision: https://reviews.llvm.org/D85724
2020-08-13 12:35:47 +01:00
Kerry McLaughlin 30af595f05 [SVE][CodeGen] Legalisation of EXTRACT_VECTOR_ELT for scalable vectors
This patch changes SplitVecOp_EXTRACT_VECTOR_ELT to work correctly
for scalable vectors and also fixes an a bug in DAGCombiner where
the scalable property is dropped in visitTRUNCATE when attempting
to fold an extract + a truncate.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85754
2020-08-13 12:32:59 +01:00
Anna Welker 9eb9ba076a [ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters
Fix to include non-predicated version of write-back gather in special case
treatment for deducting the instruction type.
(This is fixing https://reviews.llvm.org/D85138 for corner cases)

Differential Revision: https://reviews.llvm.org/D85889
2020-08-13 12:24:19 +01:00
Simon Pilgrim 8a41a1f567 BranchFolding.cpp - removes includes already included by BranchFolding.h. NFC. 2020-08-13 12:14:31 +01:00
Florian Hahn 3b0878a370 [DSE,MSSA] Fix crash when using tryToMergePartialOverlappingStores.
We are re-using tryToMergePartialOverlappingStores, which requires
earlier to domiante Later. In the long run,
tryToMergeParialOverlappingStores should be re-written using MemorySSA.

Fixes PR46513.
2020-08-13 12:07:56 +01:00
Paul Walker 130098228d [SVE] Lower fixed length vector integer ISD::SETCC operations.
Differential Revision: https://reviews.llvm.org/D85831
2020-08-13 12:01:56 +01:00
Dávid Bolvanský a0485421d2 Revert "[BPI] Improve static heuristics for integer comparisons"
This reverts commit 385c9d673f.
2020-08-13 12:59:15 +02:00
Paul Walker 9e04895258 [SVE] Lower fixed length integer extend operations.
Differential Revision: https://reviews.llvm.org/D85640
2020-08-13 11:54:53 +01:00
Dávid Bolvanský 385c9d673f [BPI] Improve static heuristics for integer comparisons
Similarly as for pointers, even for integers a == b is usually false.

GCC also uses this heuristic.

Reviewed By: ebrevnov

Differential Revision: https://reviews.llvm.org/D85781
2020-08-13 12:45:40 +02:00
Simon Pilgrim ebfa410433 SplitKit.cpp - removes includes already included by SplitKit.h. NFC.
Don't duplicate includes already provided by the module header.
2020-08-13 11:43:28 +01:00
Simon Pilgrim c4c1267cad DwarfDebug.cpp - removes includes already included by DwarfDebug.h. NFC.
Don't duplicate includes already provided by the module header.
2020-08-13 11:43:28 +01:00
Xing GUO b7d5d1ec64 [DWARFYAML] Replace InitialLength with Format and Length. NFC.
This change replaces the InitialLength of pub-tables with Format and
Length. All the InitialLength fields have been removed.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85880
2020-08-13 18:39:03 +08:00
David Sherwood 6af1677161 [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorStores
In DAGTypeLegalizer::GenWidenVectorStores the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the main loop in
that function to use TypeSize instead of unsigned for tracking the
remaining store amount and offset increment. In addition, I've changed
the loop to use the new IncrementPointer helper function for updating
the addresses in each iteration, since this handles scalable vector
types.

Whilst fixing this function I also fixed a minor issue in
IncrementPointer whereby we were not adding the no-unsigned-wrap flag
for the add instruction in the same way as the fixed width case does.

Also, I've added a report_fatal_error in GenWidenVectorTruncStores,
since this code currently uses a sequence of element-by-element scalar
stores.

I've added new tests in

  CodeGen/AArch64/sve-intrinsics-stores.ll
  CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

for the changes in GenWidenVectorStores.

Differential Revision: https://reviews.llvm.org/D84937
2020-08-13 11:07:17 +01:00
David Sherwood 3ec3fcb97a [CodeGen] In narrowExtractedVectorLoad bail out for scalable vectors
In narrowExtractedVectorLoad there is an optimisation that tries to
combine extract_subvector with a narrowing vector load. At the moment
this produces warnings due to the incorrect calls to
getVectorNumElements() for scalable vector types. I've got this
working for scalable vectors too when the extract subvector index
is a multiple of the minimum number of elements. I have added a
new variant of the function:

  MachineFunction::getMachineMemOperand

that copies an existing MachineMemOperand, but replaces the pointer
info with a null version since we cannot currently represent scaled
offsets.

I've added a new test for this particular case in:

  CodeGen/AArch64/sve-extract-subvector.ll

Differential Revision: https://reviews.llvm.org/D83950
2020-08-13 10:46:18 +01:00
Ali Tamur 0581c0b0ee Revert "[SCEV] Look through single value PHIs."
This reverts commit e441b7a7a0.

This patch causes a compile error in tensorflow opensource project. The stack trace looks like:

Point of crash:
llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35

(gdb) ptype *this
type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop]

(gdb) p *this
$1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1,
  DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const*>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8,
        CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true}

(gdb) p *this->DenseBlockSet->CurArray
$2 = (const void *) 0xfffffffffffffffe

I will try to get a case from tensorflow or use creduce to get a small case.
2020-08-12 23:13:24 -07:00
Nadav Rotem d54c252bc8 [Clang options] Optimize optionMatches() runtime by removing mallocs
The method optionMatches() constructs 9865 std::string instances when comparing different
options. Many of these instances exceed the size of the internal storage and force memory
allocations. This patch adds an early exit check that eliminates most of the string allocations
while keeping the code simple.

Example inputs:
Prefix: /, Name: Fr
Prefix: -, Name: Fr
Prefix: -, Name: fsanitize-address-field-padding=
Prefix: -, Name: fsanitize-address-globals-dead-stripping
Prefix: -, Name: fsanitize-address-poison-custom-array-cookie
Prefix: -, Name: fsanitize-address-use-after-scope
Prefix: -, Name: fsanitize-address-use-odr-indicator
Prefix: -, Name: fsanitize-blacklist=

Differential Revision: D85538
2020-08-12 23:07:07 -07:00
Aditya Kumar f902a7eccf [HotColdSplit] Fix variable name spelling 2020-08-12 22:50:08 -07:00
Ruiling Song 18b1e67523 [AMDGPU] Fix crash when dag-combining bitcast
From the code after the 'break', they are processing 64bit scalar and
vector bitcast. So I think the break-condition should be (cond1 || cond2)
This means we only execute following code if (64bit and dest-is-vector).

Also remove a previous fix which is not needed with this new fix.
(introduced in: 1349a04ef5)

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85804
2020-08-13 10:23:13 +08:00
Albion Fung 3136cbe29e [PowerPC] Implement Vector Shift Builtins
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.

Differential Revision: https://reviews.llvm.org/D83338
2020-08-12 18:26:58 -05:00
Nikita Popov eba5f5f798 [ValueTracking] Add abs intrinsics support to computeConstantRange()
Implementation is the same as for SPF_ABS.
2020-08-12 22:28:46 +02:00
Nikita Popov e2040d38a1 [ValueTracking] Support min/max intrinsics in computeConstantRange()
The implementation is the same as for the SPF_* case.
2020-08-12 22:07:29 +02:00
Sanjay Patel 23bd33c6ac [InstCombine] prefer xor with -1 because 'not' is easier to understand (PR32706)
This is a retry of rL300977 which was reverted because of infinite loops.
We have fixed all of the known places where that would happen, but there's
still a chance that this patch will cause infinite loops.

This matches the demanded bits behavior in the DAG and should fix:
https://bugs.llvm.org/show_bug.cgi?id=32706

Differential Revision: https://reviews.llvm.org/D32255
2020-08-12 15:50:33 -04:00
Roman Lebedev d6f0600c96
[NFC][InstCombine] Add FIXME's for getLogBase2() / visitUDivOperand()
These are not correctness issues.

In visitUDivOperand(), if the (potential) divisor is undef, then udiv is
already UB, so it is not incorrect to keep undef as shift amount.

But, that is suboptimal.
We could instead simply drop that select, picking the other operand.

Afterwards, getLogBase2() could assert that there is no undef in divisor.
2020-08-12 22:06:54 +03:00
Roman Lebedev 12d93a27e7
[InstCombine] Sanitize undef vector constant to 1 in X*(2^C) with X << C (PR47133)
While x*undef is undef, shift-by-undef is poison,
which we must avoid introducing.

Also log2(iN undef) is *NOT* iN undef, because log2(iN undef) u< N.

See https://bugs.llvm.org/show_bug.cgi?id=47133
2020-08-12 22:06:53 +03:00
Francesco Petrogalli c561f4d2ec [SVE][VLS] Don't combine logical AND.
Testing is performed when targeting 128, 256 and 512-bit wide vectors.

For 128-bit vectors, the original behavior of using NEON instructions is
preserved.

Differential Revision: https://reviews.llvm.org/D85479
2020-08-12 20:00:07 +01:00
Amara Emerson 2ff14957e8 [GlobalISel] Implement bit-test switch table optimization.
This is mostly a straight port from SelectionDAG. We re-use the actual bit-test
analysis part from SwitchLoweringUtils, which was factored out earlier to
support jump-tables.

Differential Revision: https://reviews.llvm.org/D85233
2020-08-12 11:31:39 -07:00
Simon Pilgrim 39de63aef9 Fix signed/unsigned comparison warnings. NFC. 2020-08-12 19:22:13 +01:00
Craig Topper a7a06ded8b Recommit "[InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms" and its follow up patches
This recommits the following patches now that D85684 has landed

1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
2020-08-12 10:45:27 -07:00
David Green 1bb3488685 [ARM] Predicated VFMA patterns
Similar to the Two op + select patterns that were added recently, this
adds some patterns for select + fma to turn them into predicated
operations.

Differential Revision: https://reviews.llvm.org/D85824
2020-08-12 18:35:01 +01:00
Simon Pilgrim 13d6cf0951 [X86][SSE] Pull out BUILD_VECTOR operand equivalence tests. NFC.
Pull out element equivalence code from isShuffleEquivalent/isTargetShuffleEquivalent, I've also removed many of the index modulos where possible.

First step toward simply adding some additional equivalence tests.
2020-08-12 18:20:18 +01:00
Craig Topper 5f7cdb2eff [X86][GlobalISel] Legalize G_ICMP results to s8.
We need to produce a setcc instruction which has an 8-bit result.
This gets rid of a bunch of cases that were using the s1->s8/s16/s32/s64
handling in selectZExt.

I'm not very familiar with GlobalISel yet so I'm not yet sure
the best way to do things. I'd especially like feedback on the
best way to handle the currently split 32-bit and 64-bit mode
handling.

Differential Revision: https://reviews.llvm.org/D85814
2020-08-12 10:13:59 -07:00
Cameron McInally ce2c991061 [SVE] Lower fixed length FP minnum/maxnum
Lower fixed length MINNUM/MAXNUM to scalable vectors. Cherry-picked from D71767 with added tests.

Differential Revision: https://reviews.llvm.org/D85744
2020-08-12 12:02:52 -05:00
Ilya Leoshkevich f5a252ed68 [SanitizerCoverage] Use zeroext for cmp parameters on all targets
Commit 9385aaa848 ("[sancov] Fix PR33732") added zeroext to
__sanitizer_cov_trace(_const)?_cmp[1248] parameters for x86_64 only,
however, it is useful on other targets, in particular, on SystemZ: it
fixes swap-cmp.test.

Therefore, use it on all targets. This is safe: if target ABI does not
require zero extension for a particular parameter, zeroext is simply
ignored. A similar change has been implemeted as part of commit
3bc439bdff ("[MSan] Add instrumentation for SystemZ"), and there were
no problems with it.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D85689
2020-08-12 18:38:12 +02:00
Krzysztof Parzyszek a2dc19b81b [Hexagon] Return scalar size in getMinVectorRegisterBitWidth() when no HVX
This fixes https://llvm.org/PR47128.
2020-08-12 10:13:58 -05:00
Anna Welker 4fe5615eab [ARM][MVE] Enable tail predication for loops containing MVE gather/scatters
Widen the scope of memory operations that are allowed to be tail predicated
to include gathers and scatters, such that loops that are auto-vectorized
with the option -enable-arm-maskedgatscat (and actually end up containing
an MVE gather or scatter) can be tail predicated.

Differential Revision: https://reviews.llvm.org/D85138
2020-08-12 15:32:37 +01:00
Matt Arsenault e14474a39a AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd
Remove the intermediate transform in the DAG path. I believe this is
the last non-deprecated intrinsic that needs handling.
2020-08-12 10:04:53 -04:00
Matt Arsenault 701228c411 AMDGPU: Handle intrinsics in performMemSDNodeCombine
This avoids a possible regression in a future patch
2020-08-12 10:04:53 -04:00
Xing GUO e891b6a75d [DWARFYAML] Make the address size of compilation units optional.
This patch makes the 'AddrSize' field optional. If the address size is
missing, yaml2obj will infer it from the object file.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85805
2020-08-12 21:47:32 +08:00
Xing GUO 386d5af04b [MachOYAML] Simplify the section data emitting function. NFC.
This patch helps simplify some codes in writeSectionData() function.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D85821
2020-08-12 21:46:43 +08:00
Sanjay Patel cc892fd9f4 [VectorCombine] early exit if target has no vector registers
Based on post-commit discussion in:
D81766

Other vectorization passes (SLP and Loop) use this TTI API similarly.
2020-08-12 09:22:31 -04:00
Sanjay Patel 912c09e845 [InstCombine] eliminate a pointer cast around insertelement
I'm not sure if this solves PR46839 completely, but reducing the casting should help:
https://bugs.llvm.org/show_bug.cgi?id=46839

Differential Revision: https://reviews.llvm.org/D85647
2020-08-12 09:08:17 -04:00
Kai Nacke bca1b8ed99 [SystemZ/ZOS] Implement computeHostNumPhysicalCores
On z/OS, the information is stored in the Common System Data Area
(CSD). It is the number of CPs allocated to the current LPAR.

Reviewers: aganea, hubert.reinterpertcast, MaskRay

Reviewed By: hubert.reinterpertcast

Differential Revision: https://reviews.llvm.org/D85531
2020-08-12 08:31:33 -04:00
Sam Parker ea8448e361 [LoopUnroll] Adjust CostKind query
When TTI was updated to use an explicit cost, TCK_CodeSize was used
although the default implicit cost would have been the hand-wavey
cost of size and latency. So, revert back to this behaviour. This is
not expected to have (much) impact on targets since most (all?) of
them return the same value for SizeAndLatency and CodeSize.

When optimising for size, the logic has been changed to query
CodeSize costs instead of SizeAndLatency.

This patch also adds a testing option in the unroller so that
OptSize thresholds can be specified.

Differential Revision: https://reviews.llvm.org/D85723
2020-08-12 12:56:09 +01:00
Simon Pilgrim 9bd97d0363 [X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))
This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining

Another step towards PR41813
2020-08-12 12:16:36 +01:00
Simon Pilgrim a0c2c6aa42 [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types
Only do this for AVX2+ targets as we still get some regressions on AVX1 without PERMPD/PERMQ
2020-08-12 11:31:05 +01:00
Cullen Rhodes 511d5aaca3 [Transforms][SROA] Skip uses of allocas where the type is scalable
When visiting load and store instructions in SROA skip scalable vectors.
This is relevant in the implementation of the 'arm_sve_vector_bits'
attribute that is used to define VLS types, where an alloca of a
fixed-length vector could be bitcasted to scalable. See D85128 for more
information.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85725
2020-08-12 09:35:48 +00:00
Florian Hahn e441b7a7a0 [SCEV] Look through single value PHIs.
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.

This increases precision of the analysis in some cases.

Reviewed By: mkazantsev, bmahjour

Differential Revision: https://reviews.llvm.org/D71539
2020-08-12 10:03:42 +01:00
Igor Kudrin 9ceb192e14 [llvm-dwarfdump] Avoid crashing if an abbreviation offset is invalid.
Note that DWARFUnit::getAbbreviations() returns nullptr if the
abbreviations could not be read, but callers used the returned
pointer without checking.

Differential Revision: https://reviews.llvm.org/D85738
2020-08-12 16:01:53 +07:00
Sjoerd Meijer 6716e7868e [ARM][MVE] tail-predication: overflow checks for backedge taken count.
This pick ups the work on the overflow checks for get.active.lane.mask,
which ensure that it is safe to insert the VCTP intrinisc that enables
tail-predication. For a 2d auto-correlation kernel and its inner loop j:

  M = Size - i;
  for (j = 0; j < M; j++)
    Sum += Input[j] * Input[j+i];

For this inner loop, the SCEV backedge taken count (BTC) expression is:

  (-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body>

and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC
cannot be max" could not be determined. So overflow behaviour had to be assumed
in the loop tripcount expression that uses the BTC. As a result
tail-predication had to be forced (with an option) for this case.

This change solves that by using ScalarEvolution's helper
getConstantMaxBackedgeTakenCount which is able to determine the range of BTC,
thus can determine it is safe, so that we no longer need to force tail-predication
as reflected in the changed test cases.

Differential Revision: https://reviews.llvm.org/D85737
2020-08-12 09:32:26 +01:00
David Sherwood 88bbd30736 [SVE][CodeGen] Fix issues with EXTRACT_SUBVECTOR when using scalable FP vectors
In this patch I have fixed two issues:

1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value. Also, I
have updated the documentation for EXTRACT_SUBVECTOR describing what type
the constant index should be and we now enforce this when creating the
node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.

The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:

  test/CodeGen/AArch64/sve-extract-subvector.ll

Differential Revision: https://reviews.llvm.org/D85516
2020-08-12 08:35:46 +01:00
Kazushi (Jam) Marukawa 5d549219df [VE] Change to promote i32 AND/OR/XOR operations
VE has only 64 bits AND/OR/XOR instructions.  We pretended that VE has 32 bits
instructions also, but doing it increase the number of generated instructions.
Therefore, we decide to promote 32 bits operations and use only 64 bits
instructions in back end.  We also avoid pretending that VE has 32 bits LEA
instruction.  Update regression tests also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85726
2020-08-12 16:23:50 +09:00
Craig Topper 6b3dc96e59 [X86][GlobalISel] Replace a misuse of SUBREG_TO_REG with INSERT_SUBREG.
SUBREG_TO_REG is supposed to be used when we know the producing
instruction already zeroed the bits we're extending. But that's
not the case here. So INSERT_SUBREG with an IMPLICIT_DEF is the
correct thing to use.
2020-08-11 23:51:02 -07:00
Kyungwoo Lee d73be5af0a [NFC] Factor out hasForceAttributes
This is a preparation for https://reviews.llvm.org/D85586.

Differential Revision: https://reviews.llvm.org/D85793
2020-08-12 02:16:57 -04:00
Petr Hosek 31e5f7120b [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-08-11 20:22:11 -07:00
Jordan Rupprecht 1a67522d3e [NFC] Inline variable only used in debug builds 2020-08-11 19:38:01 -07:00
Sanjay Patel b0b95dab1c [VectorCombine] add safety check for 0-width register
Based on post-commit discussion in D81766, Hexagon sets this to "0".
I'll see if I can come up with a test, but making the obvious
code fix first to unblock that target.
2020-08-11 20:30:02 -04:00
Thomas Lively 2985c02f79 [WebAssembly][AsmParser] Name missing features in error message
Rather than just saying that some feature is missing, report the exact
features to make the error message more useful and actionable.

Differential Revision: https://reviews.llvm.org/D85795
2020-08-11 17:26:14 -07:00
Vedant Kumar 30c1633386 Revert "[Instruction] Add updateLocationAfterHoist helper"
This reverts commit 4a646ca9e2.

This is causing some bots to fail with "!dbg attachment points at wrong
subprogram for function", like:

http://lab.llvm.org:8011/builders/sanitizer-windows/builds/67958/steps/stage%201%20check/logs/stdio
2020-08-11 14:54:09 -07:00
Amy Huang 54b6cca0f2 [globalopt] Change so that emitting fragments doesn't use the type size of DIVariables
When turning on -debug-info-kind=constructor we ran into a "fragment covers
entire variable" error during thinlto. The fragment is currently always
emitted if there is no type size, but sometimes the variable has a
forward declared struct type which doesn't have a size.

This changes the code to get the type size from the GlobalVariable instead.

Differential Revision: https://reviews.llvm.org/D85572
2020-08-11 14:50:56 -07:00
Kazu Hirata cfdc96714b [Instcombine] Fix uses of undef (PR46940)
Without this patch, we attempt to distribute And over Xor even in
unsafe circumstances like so:

  undef & (true ^ true)  ==>  (undef & true) ^ (undef & true)

and evaluate it to undef instead of false.  Note that "true ^ true"
may show up implicitly with one true being part of a PHI node.

This patch fixes the problem by teaching SimplifyUsingDistributiveLaws
to not use undef as part of simplifications.

Reviewers: spatel, aqjune, nikic, lebedev.ri, fhahn, jdoerfert

Differential Revision: https://reviews.llvm.org/D85687
2020-08-11 14:13:32 -07:00
Vedant Kumar 4a646ca9e2 [Instruction] Add updateLocationAfterHoist helper
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.

Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).

For more context, see the discussion in https://reviews.llvm.org/D60913.

Differential Revision: https://reviews.llvm.org/D85670
2020-08-11 14:05:20 -07:00
Jian Cai 277873ce0f [AARCH64] [MC] add memtag as an alias of mte architecture extension
Add memtag as an alis of met architectture extesion to be consistent
with GNU as.

LINK:https://sourceware.org/bugzilla/show_bug.cgi?id=26339

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D85620
2020-08-11 13:28:47 -07:00
Nikita Popov 06d567059e [InstSimplify] Respect CanUseUndef in more places
Similar to what we do in IIQ, add an isUndefValue() helper that
checks for undef values while respective CanUseUndef. This makes
it much easier to search for places that don't respect the flag
yet.
2020-08-11 21:53:33 +02:00
Thomas Lively 1a69f02397 [WebAssembly][NFC] Replace WASM with standard Wasm
The officially specified abbreviation for WebAssembly is Wasm and the
spec explicitly calls out WASM as being an incorrect spelling. This
patch fixes a few comments and error messages to use the
spec-compliant abbreviation.

Differential Revision: https://reviews.llvm.org/D85764
2020-08-11 12:27:59 -07:00
diggerlin e9ac1495e2 [AIX][XCOFF] change the operand of branch instruction from symbol name to qualified symbol name for function declarations
SUMMARY:

1. in the patch  , remove setting storageclass in function .getXCOFFSection and construct function of class MCSectionXCOFF
there are

XCOFF::StorageMappingClass MappingClass;
XCOFF::SymbolType Type;
XCOFF::StorageClass StorageClass;
in the MCSectionXCOFF class,
these attribute only used in the XCOFFObjectWriter, (asm path do not need the StorageClass)

we need get the value of StorageClass, Type,MappingClass before we invoke the getXCOFFSection every time.

actually , we can get the StorageClass of the MCSectionXCOFF  from it's delegated symbol.

2. we also change the oprand of branch instruction from symbol name to qualify symbol name.
for example change
bl .foo
extern .foo
to
bl .foo[PR]
extern .foo[PR]

3. and if there is reference indirect call a function bar.
we also add
  extern .bar[PR]

Reviewers:  Jason liu, Xiangling Liao

Differential Revision: https://reviews.llvm.org/D84765
2020-08-11 15:26:19 -04:00
Yuanfang Chen 39617aaed9 NFC. Constify MachineVerifier::verify parameter 2020-08-11 11:59:45 -07:00
Dávid Bolvanský d68a2859ab [BPI] Teach BPI about bcmp function
bcmp is similar to memcmp
2020-08-11 20:44:53 +02:00
Jessica Paquette bebe6a6449 [GlobalISel] Combine (logic_op (op x...), (op y...)) -> (op (logic_op x, y))
This implements

```
(logic_op (op x...), (op y...)) -> (op (logic_op x, y))
```

when `op` is an extend, a shift, or an and.

This is similar to `DAGCombiner::hoistLogicOpWithSameOpcodeHands`
(with a bunch of missing cases, e.g. G_TRUNC, G_BITCAST, etc.)

This is implemented so it works both pre and post-legalization.

This also adds a general way to add a series of instructions in a combine.
(`applyBuildInstructionSteps`).

Differential Revision: https://reviews.llvm.org/D85050
2020-08-11 10:40:06 -07:00
Simon Pilgrim 2655bd51d6 [X86][SSE] combineShuffleWithHorizOp - canonicalize SHUFFLE(HOP(X,Y),HOP(Y,X)) -> SHUFFLE(HOP(X,Y))
Attempt to canonicalize binary shuffles of HOPs with commuted operands to an unary shuffle.
2020-08-11 18:13:03 +01:00
Nikita Popov d110d4aaff [InstSimplify] Forbid undef folds in expandBinOp
This is the replacement for D84250 based on D84792. As we recursively
fold with the same value twice, we need to disable undef folds,
to prevent an undef from being folded to two different values.

Reverting rG00f3579aea6e3d4a4b7464c3db47294f71cef9e4 and using the
test case from https://reviews.llvm.org/D83360#2145793, it no longer
performs the incorrect fold.

Differential Revision: https://reviews.llvm.org/D85684
2020-08-11 18:39:24 +02:00
Eric Christopher 8155cb27a2 Fold Opcode into assert uses to fix an unused variable warning without asserts. 2020-08-11 09:30:51 -07:00
Xing GUO 45a4f4c806 [DWARFYAML] Teach yaml2obj emit the correct line table program.
The following issues are addressed in this patch.

1. The operands of DW_LNE_set_discriminator should be an ULEB128 number
   rather than an address.
2. Test the emitted opcodes.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85717
2020-08-12 00:18:54 +08:00
Simon Pilgrim fe1f36986b [X86][SSE] combineShuffleWithHorizOp - avoid unnecessary subtraction. NFCI.
We can safely replace ((M - NumElts) % NumEltsPerLane) with (M % NumEltsPerLane) as the modulo result will be the same.
2020-08-11 17:07:32 +01:00
Matt Arsenault 0dc4c36d3a AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane
Fixup the special case constant bus handling pre-gfx10.
2020-08-11 11:56:16 -04:00
Whitney Tsang aa994d9867 [NFC][LoopUnrollAndJam] Use BasicBlock::replacePhiUsesWith instead of
static function updatePHIBlocks.

Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D85673
2020-08-11 15:35:14 +00:00
Jay Foad fa2b836ea3 [GlobalISel] Add G_ABS
This is equivalent to the new llvm.abs intrinsic added by D84125 with
is_int_min_poison=0.

Differential Revision: https://reviews.llvm.org/D85718
2020-08-11 16:34:37 +01:00
Sanjay Patel 1470ce4a76 [InstSimplify] fold min/max with matching min/max operands
I think this is the last remaining translation of an existing
instcombine transform for the corresponding cmp+sel idiom.

This interpretation is more general though - we can remove
mismatched signed/unsigned combinations in addition to the
more obvious cases.

min/max(X, Y) must produce X or Y as the result, so this is
just another clause in the existing transform that was already
matching a min/max of min/max.
2020-08-11 11:23:15 -04:00
Simon Pilgrim 91d59cbf1b [X86][SSE] Add HADD/SUB support to combineHorizOpWithShuffle
Handles some HOP(SHUFFLE,SHUFFLE) patterns and sets us up to improve some of the cases mentioned in PR41813.
2020-08-11 16:14:14 +01:00
Matt Arsenault 076305568c AMDGPU/GlobalISel: Prepare for more custom load lowerings
Slight restructuring of the code to avoid formatting changes when more
cases are handled here.
2020-08-11 11:09:05 -04:00
David Stenberg e2f3240472 [DebugInfo] Allow GNU macro extension to be emitted
Allow the GNU .debug_macro extension to be emitted for DWARF versions
earlier than 5. The extension is basically what became DWARF 5's format,
except that a DW_AT_GNU_macros attribute is emitted, and some entries
like the strx entries are missing. In this patch I emit GNU's indirect
entries, which are the same as DWARF 5's strp entries.

This patch adds the extension behind a hidden LLVM flag,
-use-gnu-debug-macro. I would later want to enable it by default when
tuning for GDB and targeting DWARF versions earlier than 5.

The size of a Clang 8.0 binary built with RelWithDebInfo and the flags
"-gdwarf-4 -fdebug-macro" reduces from 1533 MB to 1349 MB with
.debug_macro (compared to 1296 MB without -fdebug-macro).

Reviewed By: SouraVX, dblaikie

Differential Revision: https://reviews.llvm.org/D82975
2020-08-11 17:00:25 +02:00
David Stenberg bb640645f5 [DebugInfo] Simplify DwarfDebug::emitMacro
Broken out from a review comment on D82975. This is an NFC expect for
that the Macinfo macro string is now emitted using a single emitBytes()
invocation, so it can be done using a single string directive.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D83557
2020-08-11 17:00:25 +02:00
Benjamin Kramer d287a5a33f [GlobalISel] Remove unused variable. NFC. 2020-08-11 16:56:45 +02:00
Xing GUO 1d4bc08ce4 [DWARFYAML] Let the address size of line tables inferred from the object file.
Currently, the line table uses the first compilation unit's address size
as its address size. It's not the right behavior. The address size should be
inferred from the target machine.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D85707
2020-08-11 22:45:55 +08:00
Matt Arsenault e2f1b48f86 GlobalISel: Implement bitcast action for G_INSERT_VECTOR_ELT
This mirrors the support for the equivalent extracts. This also
creates a huge mess that would be greatly improved if we had any bit
operation combines.
2020-08-11 10:39:14 -04:00
Dinar Temirbulatov b1600d8b89 [NFC] Guard the cost report block of debug outputs with NDEBUG and
switch to SmallString, this is part of D57779.
2020-08-11 16:34:47 +02:00
Matt Arsenault 53f21e0fb7 TableGen/GlobalISel: Hack the operand order for atomic_store
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.

Some work started in 20aafa3156 to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is the pretty much the last
operation which matters which isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.

Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure what it was originally
testing matters anymore.
2020-08-11 10:22:44 -04:00
Pavel Labath bb91c9fe7b [cmake] Make gtest macro definitions a part the library interface
These definitions are needed by any file which uses gtest. Previously we
were adding them in the add_unittest function, but over time we've
accumulated libraries (which don't go through add_unittest) building on
gtest and this has resulted in proliferation of the definitions.

Making this a part of the library interface enables them to be managed
centrally. This follows a patch for -Wno-suggest-override (D84554) which
took a similar approach.

Differential Revision: https://reviews.llvm.org/D84748
2020-08-11 15:22:44 +02:00
Florian Hahn 3483c28c5b [SCEV] ] If RHS >= Start, simplify (Start smax RHS) to RHS for trip counts.
This is the max version of D85046.

This change causes binary changes in 44 out of 237 benchmarks (out of
MultiSource/SPEC2000/SPEC2006)

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85189
2020-08-11 13:20:24 +01:00
Kerry McLaughlin 455ed56d48 [SVE][CodeGen] Legalisation of INSERT_VECTOR_ELT for scalable vectors
When the result type of insertelement needs to be split,
SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a
stack temporary, store the element at the location of the stack
temporary plus the index, and reload the Lo/Hi parts.

This patch does the following to ensure this works for scalable vectors:
 - Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary
 - Adds an IsScalable flag to getMemBasePlusOffset() and scales the
    offset by VScale when this is true
 - Ensures the immediate is clamped correctly by clampDynamicVectorIndex
    so that we don't try to use an out of range index

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D84874
2020-08-11 12:57:28 +01:00
David Stenberg 91bd9db2cd [DebugInfo] Allow GNU macro extension to be read
Allow the GNU .debug_macro extension to be parsed and printed by
llvm-dwarfdump. In an upcoming patch support will be added for emitting
that format also.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D82974
2020-08-11 13:30:52 +02:00
David Stenberg a73008c1ae [DebugInfo] Refactor .debug_macro checks. NFCI
Move the Dwarf version checks that determine if the .debug_macro section
should be emitted, into a DwarfDebug member. This is a preparatory
refactoring for allowing the GNU .debug_macro extension, which is a
precursor to the DWARF 5 format, to be emitted by LLVM for earlier DWARF
versions.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D82971
2020-08-11 13:30:52 +02:00
Benjamin Kramer 8134c2c7ff [AutoUpgrade] Simplify code
No need to set the name on an instruction that's going away, just move
it from the old instruction to the new one.
2020-08-11 13:22:58 +02:00
Kerry McLaughlin 85c7e89f3b [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00
Benjamin Kramer 1de173c049 [X86][FPEnv] Fix a use after free
Found by asan!
2020-08-11 13:00:47 +02:00
Kazushi (Jam) Marukawa 59703f1736 [VE] Update bit operations
Change bitreverse/bswap/ctlz/ctpop/cttz regression tests to support i128
and signext/zeroext i32 types.  This patch also change the way to support
i32 types using 64 bits VE instructions.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85712
2020-08-11 19:42:12 +09:00
Paul Walker b6c7b7fa31 [SVE] Add ISD nodes for predicated integer extend inreg operations.
These are useful instructions when lowering fixed length vector
extends, so I've broken this patch out as kind of NFC like work.

Differential Revision: https://reviews.llvm.org/D85546
2020-08-11 11:39:26 +01:00
Simon Pilgrim 49016eeab6 [X86] Rename combineVectorPackWithShuffle -> combineHorizOpWithShuffle. NFC.
The plan is to use this for (F)HADD/SUB opcodes as well as PACKs - similar to how we use combineShuffleWithHorizOp
2020-08-11 11:38:43 +01:00
Paul Walker d542feb8e4 [SVE] Lower fixed length vector integer subtract operations.
Differential Revision: https://reviews.llvm.org/D85665
2020-08-11 11:32:12 +01:00
Kai Nacke d6f710fd46 [NFC] Fix typo in comment.
Twelvth -> Twelfth
2020-08-11 05:27:56 -04:00
Kai Nacke b3aece0531 [SystemZ/ZOS] Add binary format goff and operating system zos to the triple
Adds the binary format goff and the operating system zos to the triple
class. goff is selected as default binary format if zos is choosen as
operating system. No further functionality is added.

Reviewers: efriedma, tahonermann, hubert.reinterpertcast, MaskRay

Reviewed By: efriedma, tahonermann, hubert.reinterpertcast

Differential Revision: https://reviews.llvm.org/D82081
2020-08-11 05:26:26 -04:00
Florian Hahn 0b774acf11 [SLP] Make sure instructions are ordered when computing spill cost.
The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.

The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.

This seems to have limited practical impact, .e.g on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D82444
2020-08-11 11:18:12 +02:00
Dávid Bolvanský c2f0101310 [InstCombine] ~(~X + Y) -> X - Y
Proof:
https://alive2.llvm.org/ce/z/4xharr

Solves PR47051

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85593
2020-08-11 11:05:42 +02:00
Florian Hahn 7829c33084 [SCEVExpander] Add helper to clean up instrs inserted while expanding.
SCEVExpander already tracks which instructions have been inserted n
InsertedValues/InsertedPostIncValues. This patch adds an additional
vector to collect the instructions in insertion order. This can then be
used to remove exactly the instructions inserted by the expander.

This replaces ExpandedValuesCleaner, which in some cases might remove
values not inserted by the expander (e.g. if a value was dead before
insertion and is then used during expansion).

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D84327
2020-08-11 09:30:31 +01:00
Sam Parker 8f92f3c2ea [RDA] Fix DBG_VALUE issues
We skip debug instructions in RDA so we cannot attempt to look them
up in our instruction map without causing a crash. But some of the
methods select the last instruction in the block and this
instruction may be a debug instruction... So, use getLastNonDebugInstr
instead of calling back on a MachineBasicBlock.

MachineBasicBlock iterators have also been updated to use
instructionsWithoutDebug so we can avoid the manual checks for debug
instructions.

Differential Revision: https://reviews.llvm.org/D85658
2020-08-11 09:03:09 +01:00
Juneyoung Lee 63b5b92bc9 [LazyValueInfo] Let getEdgeValueLocal look into freeze instructions
This patch makes getEdgeValueLocal more precise when a freeze instruction is
given, by adding support for freeze into constantFoldUser

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84629
2020-08-11 16:39:34 +09:00
Shinji Okumura 06eee8748f [Attributor][NFC] Connect AAPotentialValues with AAValueSimplify
This patch enables `AAValueSimplify` to use information from `AAPotentialValues`

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85668
2020-08-11 15:52:02 +09:00
Craig Topper 9201efb3b9 [X86] Custom match X86ISD::VPTERNLOG in X86ISelDAGToDAG in order to reduce isel patterns.
By factoring out the end of tryVPTERNLOG, we can use the same code
to directly match X86ISD::VPTERNLOG. This allows us to remove
around 3-4K worth of X86GenDAGISel.inc.
2020-08-10 23:15:58 -07:00
QingShan Zhang 61ede38da0 [CodeGen] Expand float operand for STRICT_FSETCC/STRICT_FSETCCS
This patch is the continue work of https://reviews.llvm.org/D69281
to implement the way that expands STRICT_FSETCC/STRICT_FSETCCS.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D81906
2020-08-11 05:55:00 +00:00
Haowei Wu db91320a89 Revert "Move ELFObjHandler to TextAPI library"
This reverts commit e6f8ba12e6 due
to build failures.
2020-08-10 21:31:29 -07:00
Haowei Wu e6f8ba12e6 Move ELFObjHandler to TextAPI library
This change moves ELFObjHandler to llvm/TextAPI library so it can
be used by different llvm tools.
2020-08-10 21:23:39 -07:00
Wang, Pengfei 9512525947 [X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics.
When we use mask compare intrinsics under strict FP option, the masked
elements shouldn't raise any exception. So, we cann't replace the
intrinsic with a full compare + "and" operation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D85385
2020-08-11 10:28:41 +08:00
Xing GUO 3c5758964c [macho2yaml] Refactor the DWARF section dumpers.
This patch refactors the DWARF section dumpers. When dumping a DWARF
section, if the DWARF parser fails to parse the section, we will dump it
as a raw content section. This patch also fixes a bug in
DWARFYAML::Data::isEmpty(). Finally, a test case that tests dumping the
__debug_aranges section is added.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D85506
2020-08-11 10:18:34 +08:00
Lang Hames 6fd30f0669 [llvm-jitlink] Update llvm-jitlink to use TargetProcessControl. 2020-08-10 17:19:48 -07:00
Johannes Doerfert fa5d22a045 [OpenMP][NFC] Reuse OMPIRBuilder `struct ident_t` handling in Clang
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D80735
2020-08-10 17:13:26 -05:00
jasonliu 20abff0481 [XCOFF][AIX] Use TE storage mapping class when large code model is enabled
Summary:
Use TE SMC instead of TC SMC in large code model mode,
so that large code model TOC entries could get placed after all
the small code model TOC entries, which reduces the chance of TOC overflow.

Reviewed By: Xiangling_L

Differential Revision: https://reviews.llvm.org/D85455
2020-08-10 19:52:10 +00:00
Puyan Lotfi 7bc03f5553 [MachineOutliner][AArch64] WA for multiple stack fixup cases in MachineOutliner.
In cases where MachineOutliner candidates either are:

  * noreturn
  * have calls with no available LR or free regs
  * Don't use SP

we can end up hitting stack fixup code for the caller and the callee for
a FrameID of MachineOutlinerDefault. This triggers the assert:

  `assert(OF.FrameConstructionID != MachineOutlinerDefault &&
          "Can only fix up stack references once");`

in AArch64InstrInfo.cpp. This assert exists for now because a lot of the
fixup code is not tested to handle fixing up more than once and needs
some better checks and enhancements to avoid potentially generating
illegal code.

I've filed a Bugzilla report to track this until these cases are handled
by the AArch64 MachineOutliner: https://bugs.llvm.org/show_bug.cgi?id=46767

This diff detects cases that will cause these multiple stack fixups and
prune the Candidates from `RepeatedSequenceLocs`.

    Differential Revision: https://reviews.llvm.org/D83923
2020-08-10 15:43:30 -04:00
Wei Mi 4cd8e9b169 [SampleFDO] Stop letting findCalleeFunctionSamples return unrelated profiles
for invoke instructions.

We see a warning of "No debug information found in function foo: Function
profile not used" in a case. The function foo is called by an invoke
instruction. It has no debug information because it has attribute((nodebug))
in the definition. It shouldn't have profile instance in the sample profile
but compiler thinks it does, that turns out to be a compiler bug in
findCalleeFunctionSamples. The bug is exposed when sample-profile-merge-inlinee
is enabled recently.

Currently in findCalleeFunctionSamples, CalleeName is unset and is empty for
invoke instruction. For empty CalleeName, findFunctionSamplesAt will treat
the call as an indirect call and will return any inline instance profile at
the same location as the instruction. That leads to a wrong profile being
returned to function foo.

The patch set CalleeName when the instruction is an invoke.

Differential Revision: https://reviews.llvm.org/D85664
2020-08-10 12:41:09 -07:00
Thomas Lively 514445e035 [WebAssembly][ConstantFolding] Fold fp-to-int truncation intrinsics
Constant fold both the trapping and saturating versions of the
WebAssembly truncation intrinsics. The tests are adapted from the
WebAssembly spec tests for the corresponding instructions.

Requested in PR46982.

Differential Revision: https://reviews.llvm.org/D85392
2020-08-10 12:40:05 -07:00
Stanislav Mekhanoshin 08803f0e62 Unbundle KILL bundles in VirtRegRewriter
SplitKit forms invalid COPY subreg bundles without a leading
BUNDLE instruction. That manifests itself in post-RA scheduler
counting instruction and asserting on "Instruction count mismatch".

The bundle shall be undone by VirtRegRewriter::expandCopyBundle(),
but it does not because VirtRegRewriter::handleIdentityCopy() can
turn COPY bundle into a KILL bundle.

Process KILLs as well.

Differential Revision: https://reviews.llvm.org/D85484
2020-08-10 11:58:37 -07:00
Matt Arsenault 6fe6b29c29 AMDGPU: Fix assertion in performSHLPtrCombine for 64-bit pointers 2020-08-10 13:46:52 -04:00
Matt Arsenault 68fab44acf AMDGPU: Fix visiting physreg dest users when folding immediate copies
This can fold the immediate into the physical destination, but this
should not look for further users of the register. Fixes regression
introduced by 766cb615a3.
2020-08-10 13:46:51 -04:00
Alexandre Ganea a3036b3863 Re-Re-land: [CodeView] Add full repro to LF_BUILDINFO record
This patch adds the missing information to the LF_BUILDINFO record, which allows for rebuilding a .CPP without any external dependency but the .OBJ itself (other than the compiler).

Some external tools that we are using (Recode, Live++) are extracting the information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO stores a full path to the compiler, the PWD (CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variables). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.

For more information see PR36198 and D43002.

Differential Revision: https://reviews.llvm.org/D80833
2020-08-10 13:36:30 -04:00
Craig Topper 96dfc783b2 [BreakFalseDeps][X86] Move operand loop out of X86's getUndefRegClearance and put in the pass.
X86 is the only user of this interface in tree. Previously the
X86 pass would loop over operands looking for one undef operand for
the pass to fix. But there could theoretically be multiple operands
to fix. So it makes more sense for the pass to do the looping and
ask the target if an operand needs to be fixed.
2020-08-10 10:32:29 -07:00
Wouter van Oortmerssen 582fd474dd [WebAssembly] wasm64: fix memory.init operand types
I had assumed they would all become in i64, but this is not necessary as long as data segments stay 32-bit, see:
https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md

Differential Revision: https://reviews.llvm.org/D85552
2020-08-10 10:15:20 -07:00
Mircea Trofin 211117b660 [NFC][MLInliner] remove curly braces for a few sinle-line loops 2020-08-10 09:32:21 -07:00
Mircea Trofin d5c81be3ca [NFC][MLInliner] Set up the logger outside the development mode advisor
This allows us to subsequently configure the logger for the case when we
use a model evaluator and want to log additional outputs.

Differential Revision: https://reviews.llvm.org/D85577
2020-08-10 09:22:17 -07:00
Fangrui Song 3b21a07fd7 [PGO] Delete dead comdat renaming code related to GlobalAlias. NFC
A GlobalAlias is an address-taken user of its aliased function.
canRenameComdatFunc has excluded such cases.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D85597
2020-08-10 09:02:04 -07:00
Simon Pilgrim 9a368d2b00 [X86][SSE] shuffle(hop,hop) - canonicalize unary hop(x,x) shuffle masks
If a shuffle is referring to both the lower and upper half lanes of an unary horizontal op, then canonicalize the mask to only refer to the lower half.
2020-08-10 16:09:27 +01:00
jasonliu 7866442b3f [XCOFF] Adjust .rename emission sequence
Summary:
AIX assembler does not generate correct relocation when .rename
appear between tc entry label and .tc directive.
So only emit .rename after .tc/.comm or other linkage is emitted.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D85317
2020-08-10 14:48:24 +00:00
Xiangling Liao 6ef801aa6b [AIX] Static init frontend recovery and backend support
On the frontend side, this patch recovers AIX static init implementation to
use the linkage type and function names Clang chooses for sinit related function.

On the backend side, this patch sets correct linkage and function names on aliases
created for sinit/sterm functions.

Differential Revision: https://reviews.llvm.org/D84534
2020-08-10 10:10:49 -04:00
Simon Pilgrim 07e673a02b [X86][SSE] Pull out shuffle(hop,hop) combine into combineShuffleWithHorizOp helper. NFC. 2020-08-10 15:08:57 +01:00
Stefan Pintilie 81883ca074 [PowerPC] Add option to control PCRel GOT indirect linker optimization
Add a hidden option to the compiler to control a the PC Relative GOT indirect
linker optimization.

If this option is set to false the compiler will no loger produce the
relocations required by the linker to perform the optimization.

Reviewed By: nemanjai, NeHuang, #powerpc

Differential Revision: https://reviews.llvm.org/D85377
2020-08-10 09:07:17 -05:00
Sam Parker 4f9f4b21e0 [ARM] Unrestrict Armv8-a IT when at minsize
IT blocks with more than one instruction were performance deprecated in Armv8
but that doesn't mean we should follow that advise when optimising for size.

Differential Revision: https://reviews.llvm.org/D85638
2020-08-10 14:59:53 +01:00
James Henderson ca05601cd2 [DebugInfo] Don't error for zero-length arange entries
Although the DWARF specification states that .debug_aranges entries
can't have length zero, these can occur in the wild. There's no
particular reason to enforce this part of the spec, since functionally
they have no impact. The patch removes the error and introduces a new
warning for premature terminator entries which does not stop parsing.

This is a relanding of cb3a598c87, adding the missing obj2yaml part
that was needed.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46805. See also
https://reviews.llvm.org/D71932 which originally introduced the error.

Reviewed by: ikudrin, dblaikie, Higuoxing

Differential Revision: https://reviews.llvm.org/D85313
2020-08-10 14:57:52 +01:00
Simon Pilgrim e6dc2c8ce7 [X86][SSE] combineTargetShuffle - rearrange shuffle(hop,hop) matching to delay shuffle mask manipulation. NFC.
Check that we're shuffling hadd/pack ops first before altering shuffle masks.

First step towards adding extra functionality, plus it avoids costly shuffle mask manipulation if not necessary.
2020-08-10 14:13:19 +01:00
Matt Arsenault 40188f807d AMDGPU/GlobalISel: Don't try to handle undef source operand
This is now illegal MIR
2020-08-10 08:49:43 -04:00
Matt Arsenault f9c279b057 PeepholeOptimizer: Use Register 2020-08-10 08:49:36 -04:00
Matt Arsenault 0bbf4bb8db GlobalISel: Remove redundant check for empty blocks 2020-08-10 08:46:30 -04:00
Matt Arsenault a0ec81f70d AMDGPU/GlobalISel: Merge load/store select cases 2020-08-10 08:46:26 -04:00
Matt Arsenault c8b17874e5 AMDGPU/GlobalISel: Fix typo 2020-08-10 08:41:17 -04:00
Matt Arsenault 9533f0ea68 AMDGPU/GlobalISel: Use nicer form of buildInstr 2020-08-10 08:41:07 -04:00
Nico Weber bc5d68dd8a Revert "[DebugInfo] Don't error for zero-length arange entries"
This reverts commit cb3a598c87.
Breaks build of check-llvm dep obj2yaml everywhere.
2020-08-10 08:20:35 -04:00
Sanjay Patel bebca662d4 [InstCombine] rearrange code for readability; NFC
The code comment refers to the path where we change the
size of the integer type, so handle that first, otherwise
deal with the general case.
2020-08-10 08:07:29 -04:00
James Henderson cb3a598c87 [DebugInfo] Don't error for zero-length arange entries
Although the DWARF specification states that .debug_aranges entries
can't have length zero, these can occur in the wild. There's no
particular reason to enforce this part of the spec, since functionally
they have no impact. The patch removes the error and introduces a new
warning for premature terminator entries which does not stop parsing.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46805. See also
https://reviews.llvm.org/D71932 which originally introduced the error.

Reviewed by: ikudrin, dblaikie

Differential Revision: https://reviews.llvm.org/D85313
2020-08-10 12:48:31 +01:00
Florian Hahn 8393b9fd1f [LoopInterchange] Move instructions from preheader to outer loop header.
Instructions defined in the original inner loop preheader may depend on
values defined in the outer loop header, but the inner loop header will
become the entry block in the loop nest. Move the instructions from the
preheader to the outer loop header, so we do not break dominance. We
also have to check for unsafe instructions in the preheader. If there
are no unsafe instructions, all instructions should be movable.

Currently we move all instructions except the terminator and rely on
LICM to hoist out invariant instructions later.

Fixes PR45743
2020-08-10 12:41:33 +01:00
Florian Hahn 54cb552b96 [LoopInterchange] Form LCSSA phis for values in orig outer loop header.
Values defined in the outer loop header could be used in the inner loop
latch. In that case, we need to create LCSSA phis for them, because after
interchanging they will be defined in the new inner loop and used in the
new outer loop.
2020-08-10 11:33:19 +01:00
Qiu Chaofan dbcfbffc7a [PowerPC] Add intrinsic to read or set FPSCR register
This patch introduces two intrinsics: llvm.ppc.setflm and
llvm.ppc.readflm. They read from or write to FPSCR register
(floating-point status & control) which contains rounding mode and
exception status.

To ensure correctness of program, we need to prevent FP operations from
being moved across these intrinsics (mffs/mtfsf instruction), so here I
set them as scheduling boundaries. We can relax such restriction if
FPSCR is modeled well in the future.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D84914
2020-08-10 18:27:45 +08:00
Simon Pilgrim c0c3b9a25f [ScalarizeMaskedMemIntrin] Scalarize constant mask expandload as shuffle(build_vector,pass_through)
As noticed on D66004, scalarization of an expandload with a constant mask as a chain of irregular loads+inserts makes it tricky to optimize before lowering, resulting in difficulties in merging loads etc.

This patch instead scalarizes the expansion to a build_vector(load0, load1, undef, load2,....) style pattern and then performs a blend shuffle with the pass through vector. This allows us to more easily make use of all the build_vector combines, merging of consecutive loads etc.

Differential Revision: https://reviews.llvm.org/D85416
2020-08-10 11:05:57 +01:00
Igor Kudrin d400606f8c [DebugInfo] Fix initialization of DwarfCompileUnit::LabelBegin.
This also fixes the condition in the assertion in
DwarfCompileUnit::getLabelBegin() because it checked something unrelated
to the returned value.

Differential Revision: https://reviews.llvm.org/D85437
2020-08-10 15:57:21 +07:00
Petar Avramovic 0d58d9e8fb AMDGPU/GlobalISel: Lower G_FREM
Add custom lower for G_FREM.

Differential Revision: https://reviews.llvm.org/D84324
2020-08-10 10:10:46 +02:00
Vitaly Buka 1970eefb17 [NFC][StackSafety] Add a couple of early returns 2020-08-09 23:42:09 -07:00
Vitaly Buka 8d91ce8f58 [NFC][StackSafety] Count dataflow inputs 2020-08-09 23:32:41 -07:00
Vitaly Buka dee812a297 [StackSafety] Fix union which produces wrapped sets 2020-08-09 23:20:17 -07:00
Vitaly Buka a6feeb1c6b [NFC][StackSafety] Avoid assert in getBaseObjec 2020-08-09 23:20:17 -07:00
Juneyoung Lee ef018cb65c [BuildLibCalls] Add noundef to standard I/O functions
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.

With this patch, the functions cannot return undef or poison anymore as well.
According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified
value; 3.19.3 says unspecified value is a valid value of the relevant type,
and using unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when e.g. it is used at branch condition).

— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).

In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85345
2020-08-10 10:58:25 +09:00
Vitaly Buka 3a34228bff [StackSafety] Don't keep FullSet in index
Optimization. Missing record is enterpreted as FullSet anyway.
2020-08-09 15:01:46 -07:00
Vitaly Buka 654266bea9 [StackSafety] Use getSignedMin() to serialize ranges
Almost NFC as it's important only for full sets which should not
be serialized at all.
2020-08-09 14:53:13 -07:00
Piotr Sobczak 62d8b8a225 Fix 64-bit copy to SCC
Fix 64-bit copy to SCC by restricting the pattern resulting
in such a copy to subtargets supporting 64-bit scalar compare,
and mapping the copy to S_CMP_LG_U64.

Before introducing the S_CSELECT pattern with explicit SCC
(0045786f14), there was no need
for handling 64-bit copy to SCC ($scc = COPY sreg_64).

The proposed handling to read only the low bits was however
based on a false premise that it is only one bit that matters,
while in fact the copy source might be a vector of booleans and
all bits need to be considered.

The practical problem of mapping the 64-bit copy to SCC is that
the natural instruction to use (S_CMP_LG_U64) is not available
on old hardware. Fix it by restricting the problematic pattern
to subtargets supporting the instruction (hasScalarCompareEq64).

Differential Revision: https://reviews.llvm.org/D85207
2020-08-09 20:50:30 +02:00
Florian Hahn d236e1c7b6 [InstSimplify/NewGVN] Add option to control the use of undef.
Making use of undef is not safe if the simplification result is not used
to replace all uses of the result. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.

This patch adds an option to SimplifyQuery to disable the use of undef.

Note that I've only guarded uses if isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84792
2020-08-09 19:16:56 +01:00
Florian Hahn 23817cbd0b [SCEVExpander] Make sure cast properly dominates Builder's IP.
The selected cast must properly dominate the Builder's IP, so we cannot
re-use the cast, if it matches the builder's IP.
2020-08-09 16:51:19 +01:00
Aditya Kumar 53ac144848 [HotColdSplit] Add options for splitting cold functions in separate section
Add support for (if enabled) splitting cold functions into a separate section
in order to further boost locality of hot code.

Authored by: rjf (Ruijie Fang)
Reviewed by: hiraditya,rcorcs,vsk

Differential Revision: https://reviews.llvm.org/D85331
2020-08-09 08:48:12 -07:00
Sanjay Patel 43bdac2906 [VectorCombine] try to create vector loads from scalar loads
This patch was adjusted to match the most basic pattern that starts with an insertelement
(so there's no extract created here). Hopefully, that removes any concern about
interfering with other passes. Ie, the transform should almost always be profitable.

We could make an argument that this could be part of canonicalization, but we
conservatively try not to create vector ops from scalar ops in passes like instcombine.

If the transform is not profitable, the backend should be able to re-scalarize the load.

Differential Revision: https://reviews.llvm.org/D81766
2020-08-09 09:05:06 -04:00
Florian Hahn c70f0b9d4a [SCEVExpander] Avoid re-using existing casts if it means updating users.
Currently the SCEVExpander tries to re-use existing casts, even if they
are not exactly at the insertion point it was asked to create the cast.
To do so in some case, it creates a new cast at the insertion point and
updates all users to use the new cast.

This behavior is problematic, because it changes the IR outside of the
instructions created during the expansion. Therefore we cannot
completely undo all changes made during expansion.

This re-use should be only an extra optimization, so only using the new
cast in the expanded instructions should not be a correctness issue.
There are many cases equivalent instructions are created during
expansion.

This patch also adjusts findInsertPointAfter to skip instructions
inserted during expansion. This enables re-using existing casts without
the renaming any uses, by picking a better insertion point.

Reviewed By: efriedma, lebedev.ri

Differential Revision: https://reviews.llvm.org/D84399
2020-08-09 13:25:17 +01:00
David Green 186a7f81e8 [ARM] Add VADDV and VMLAV patterns for v16i16
This adds patterns for v16i16's vecreduce, using all the existing code
to go via an i32 VADDV/VMLAV and truncating the result.

Differential Revision: https://reviews.llvm.org/D85452
2020-08-09 11:09:49 +01:00
David Green 8590e5abad [ARM] Allow vecreduce_add in tail predicated loops
This allows vecreduce_add in loops so that we can tailpredicate them.

Differential Revision: https://reviews.llvm.org/D85454
2020-08-09 10:57:17 +01:00
David Green 296faa91ed [ARM] Some formatting and predicate VRHADD patterns. NFC
This formats some of the MVE patterns, and adds a missing
Predicates = [HasMVEInt] to some VRHADD patterns I noticed
as going through. Although I don't believe NEON would ever
use the patterns (as it would use ADDL and VSHRN instead)
they should ideally be predicated on having MVE instructions.
2020-08-09 10:07:52 +01:00
Craig Topper bc8be30540 [X86][GlobalISel] Remove unneeded code for handling zext i8->16, i8->i64, i16->i64, i32->i64.
These all seem to be handled by tablegen pattern imports.
2020-08-09 00:26:15 -07:00
Craig Topper fdfdee98ac [DAGCombiner] Teach SimplifySetCC SETUGE X, SINTMIN -> SETLT X, 0 and SETULE X, SINTMAX -> SETGT X, -1.
These aren't the canonical forms we'd get from InstCombine, but
we do have X86 tests for them. Recognizing them is pretty cheap.

While there make use of APInt:isSignedMinValue/isSignedMaxValue
instead of creating a new APInt to compare with. Also use
SelectionDAG::getAllOnesConstant helper to hide the all ones
APInt creation.
2020-08-08 22:27:16 -07:00
Petr Hosek a4d78d23c5 Revert "[CMake] Simplify CMake handling for zlib"
This reverts commit ccbc1485b5 which
is still failing on the Windows MLIR bots.
2020-08-08 17:08:23 -07:00
Petr Hosek ccbc1485b5 [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-08-08 16:44:08 -07:00
Thomas Lively cc612c2908 [WebAssembly] Fix FastISel address calculation bug
Fixes PR47040, in which an assertion was improperly triggered during
FastISel's address computation. The issue was that an `Address` set to
be relative to the FrameIndex with offset zero was incorrectly
considered to have an unset base. When the left hand side of an add
set the Address to be 0 off the FrameIndex, the right side would not
detect that the Address base had already been set and could try to set
the Address to be relative to a register instead, triggering an
assertion.

This patch fixes the issue by explicitly tracking whether an `Address`
has been set rather than interpreting an offset of zero to mean the
`Address` has not been set.

Differential Revision: https://reviews.llvm.org/D85581
2020-08-08 15:23:11 -07:00
Craig Topper d3153b5ca2 [X86] Remove a DCI.isBeforeLegalize() call from combineVSelectWithAllOnesOrZeros.
This was blocking isTypeLegal call so that we could do a particular
transform on illegal types before type legalization. But the we
create a target specific node using that type. We shouldn't do
that if the type isn't legal. So I think we should just always
make sure the type is legal.

I suspect that in order to get the condition VT to not be a vector
of i1 we already completed type legalization anyway so this probably
doesn't matter much in practice.
2020-08-08 14:19:13 -07:00
Craig Topper 966a58e329 [X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP. 2020-08-08 13:11:47 -07:00
Dávid Bolvanský c814eca3e4 [AArch64RegisterInfo] Supress new warning 2020-08-08 21:47:01 +02:00
Craig Topper 815a9b256b [X86] Remove isSafeToClobberEFLAGS helper and just inline it into the call sites.
This is just a thin wrapper around computeRegisterLivness which
we can just call directly. The only real difference is that
isSafeToClobberEFLAGS returns a bool and computeRegisterLivness
returns an enum. So we need to check for the specific enum value
that isSafeToClobberEFLAGS was hiding.

I've also adjusted which sites pass an explicit value for
Neighborhood since the default for computeRegisterLivness is 10.
2020-08-08 12:31:58 -07:00
Craig Topper 8d3ae64b04 Recommit "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
I messed up the bug numbers in the commit message before

Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR46315.
2020-08-08 11:53:14 -07:00
Craig Topper 761f568420 Revert "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
This reverts commit 44b260cb0a.

I messed up the bug number in the commit message so I'm reverting
to fix it.
2020-08-08 11:53:14 -07:00
Simon Pilgrim cc15380f10 [X86][SSE] combineTargetShuffle - use scaleShuffleMask helper to widen shuffle mask. NFCI.
Use scaleShuffleMask helper for the shuffle(hadd,hadd) canonicalization.
2020-08-08 19:36:18 +01:00
Craig Topper 44b260cb0a [X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places
Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR43014
2020-08-08 11:29:41 -07:00
Simon Pilgrim f13e92d4b2 [InstCombine] Use CreateVectorSplat(ElementCount) variant directly
This was introduced at rGe20223672100, and the CreateVectorSplat(unsigned NumElements) variant calls it internally
2020-08-08 19:26:02 +01:00
Roman Lebedev e492f0e03b
[SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.

While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based-one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.

On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.

Though, the run-time performance impact appears to be within the noise.
2020-08-08 20:00:28 +03:00
Roman Lebedev 1f452ac1d7
[NFC][SimplifyCFG] Rewrite isCleanupBlockEmpty() to be iterator_range-based 2020-08-08 20:00:28 +03:00
Roman Lebedev a587bf3eb0
[NFC][SimplifyCFG] Count the number of invokes turned into calls due to empty cleanup blocks 2020-08-08 20:00:27 +03:00
Sanjay Patel f22ac1d15b [DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division, part 2
Follow-up to D82716 / rGea71ba11ab11
We do not have the fabs removal fold in IR yet for the case
where the sqrt operand is repeated, so that's another potential
improvement.
2020-08-08 10:38:06 -04:00
Benjamin Kramer 38537307e5 lib/CodeGen doesn't depend on lib/Passes. 2020-08-08 13:40:24 +02:00
Juneyoung Lee b6d9add71b [InstCombine] Optimize select(freeze(icmp eq/ne x, y), x, y)
This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y)
to x or y.
This was needed to resolve slowdown after D84940 is applied.

I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear.
This patch conservatively writes the pattern in a separate function,
foldSelectWithFrozenICmp.

The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85533
2020-08-08 15:22:29 +09:00
Craig Topper 514b00c439 [X86] Limit the scope of the min/max canonicalization in combineSelect
Previously the transform was doing these two canonicalizations
(x > y) ? x : y -> (x >= y) ? x : y
(x < y) ? x : y -> (x <= y) ? x : y

But those don't seem to be useful generally. And they actively
pessimize the cases in PR47049.

This patch limits it to
(x > 0) ? x : 0 -> (x >= 0) ? x : 0
(x < -1) ? x : -1 -> (x <= -1) ? x : -1

These are the cases mentioned in the comments as the motivation
for the canonicalization. These allow the CMOV to use the S
flag from the compare thus improving opportunities to use a TEST
or the flags from an arithmetic instruction.
2020-08-07 22:51:49 -07:00
Keno Fischer c58674df14 [X86] Don't produce bad x86andp nodes for i1 vectors
In D85499, I attempted to fix this same issue by canonicalizing
andnp for i1 vectors, but since there was some opposition to such
a change, this commit just fixes the bug by using two different
forms depending on which kind of vector type is in use. We can
then always decide to switch the canonical forms later.

Description of the original bug:
We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, x).
However, it does so by attempting to create an i64 vector with the number
of elements obtained by truncating division by 64 from the bitwidth. This is
bad for mask vectors like v8i1, since that division is just zero. Besides,
we don't want i64 vectors anyway. For i1 vectors, switch the pattern
to (andnp (not cond), x), which is the canonical form for `kandn`
on mask registers.

Fixes https://github.com/JuliaLang/julia/issues/36955.

Differential Revision: https://reviews.llvm.org/D85553
2020-08-07 20:05:47 -04:00
Yuanfang Chen f5b5ccf2a6 Reland "Revert "[NewPM][CodeGen] Introduce machine pass and machine pass manager""
This relands commit 320eab2d55.

The test failed because it was looking for x86-linux target
unconditionally. Now it gets the default target.
2020-08-07 16:40:49 -07:00
Matt Arsenault 3c0597a9e4 AMDGPU: Avoid explicitly listing all the memory nodes 2020-08-07 19:22:46 -04:00
Vitaly Buka 648228bcc3 [NFC][StackSafety] Fix statistics 2020-08-07 16:18:52 -07:00
Arthur Eubanks 7abef41674 [NewPM] Print 'Skipping pass' as pass instrumentation
If OptNoneInstrumentation prints it instead, 'Skipping pass' will print for even required passes.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85493
2020-08-07 15:02:02 -07:00
Mircea Trofin 64372d93bc [NFC][MLInliner] Refactor logging implementation
This prepares it for logging externally-specified outputs.

Differential Revision: https://reviews.llvm.org/D85451
2020-08-07 14:56:56 -07:00
Vitaly Buka 7547508b7a Revert "[StackSafety] Skip ambiguous lifetime analysis"
This reverts commit 0b2616a804.

Crashes with safe-stack.
2020-08-07 14:02:50 -07:00
Vitaly Buka 7d4996033b [StackSafety,NFC] Add Stats counters 2020-08-07 14:02:50 -07:00
Gui Andrade 17ff170e3a Revert "[MSAN] Instrument libatomic load/store calls"
Problems with instrumenting atomic_load when the call has no successor,
blocking compiler roll

This reverts commit 33d239513c.
2020-08-07 19:45:51 +00:00
Yuanfang Chen 320eab2d55 Revert "[NewPM][CodeGen] Introduce machine pass and machine pass manager"
This reverts commit 911565d108.

Broke some non-Linux bots.
2020-08-07 11:59:58 -07:00
Jianzhou Zhao aedaa077f5 Reduce dropTriviallyDeadConstantArrays cumulative time percentage from 17% to 4%
The history of dropTriviallyDeadConstantArrays is like this. Because the appending linkage uses too much memory (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150105/251381.html), dropTriviallyDeadConstantArrays was introduced (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to release unused constant arrays. Recently, dropTriviallyDeadConstantArrays was improved (https://reviews.llvm.org/rG81f385b0c6ea37dd7195a65be162c75bbdef29d2) to reduce its quadratic cost.

Our recent LTO profiling shows that when a target is large, 15-20% of time cost is from the SetVector::insert called by dropTriviallyDeadConstantArrays.

A large application has hundreds or thousands of modules; each module calls dropTriviallyDeadConstantArrays once for cleaning up tens of thousands of ConstantArrays a module has. In those ConstantArrays, usually around 5 can be deleted; a very very few deleted ConstantArrays reference other ConstantArrays: less than 10 out of millions.

Given this, the cost of SetVector::insert is mainly from the construction of WorkList from ArrayConstants. This motivated the fix that iterates ArrayConstants directly, and uses WorkList only when necessary.

Our evaluation shows that
1) The cumulative time percentage of dropTriviallyDeadConstantArrays is reduced from 15-17% to 4-6%.
2) For targets with LTO time > 20min, the time reduction is about 20%.
3) No observable performance impact for build without using LTO.

{F12506218}
{F12506221}

Reviewed By: mehdi_amini, tejohnson, jdoerfert

Differential Revision: https://reviews.llvm.org/D85379
2020-08-07 11:36:30 -07:00
Arthur Eubanks 1bf4629f11 [PPC] Rename bool-ret-to-int -> ppc-bool-ret-to-int
Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D85391
2020-08-07 11:27:05 -07:00
Arthur Eubanks 2b5502c350 [NFC] Use value initializer for OVERLAPPED
To fix
../llvm/lib/Support/Windows/Path.inc(1265,21): warning: missing field
'InternalHigh' initializer [-Wmissing-field-initializers]
  OVERLAPPED OV = {0};

Differential Revision: https://reviews.llvm.org/D85480
2020-08-07 11:18:33 -07:00
Vang Thao 04bd5b5286 [AMDGPU] Fix not rescheduling without clustering
Regions are sometimes skipped which should be rescheduled without memory op
clustering. RegionIdx is not incremented when iterating over regions that
are flagged to be skipped, causing the index to be incorrect.

Thanks to Vang Thao for discovering this bug!

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D85498
2020-08-07 11:15:58 -07:00
Yuanfang Chen 911565d108 [NewPM][CodeGen] Introduce machine pass and machine pass manager
machine pass could define four methods:
- `PreservedAnalyses run(MachineFunction &, MachineFunctionAnalysisManager &)`
- `Error doInitialization(Module &, MachineFunctionAnalysisManager &)`
- `Error doFinalization(Module &, MachineFunctionAnalysisManager &)`
- `Error run(Module &, MachineFunctionAnalysisManager &)`

machine pass manger:
- MachineFunctionAnalysisManager:
  Basically an AnalysisManager<MachineFunction> augmented with the ability to
  register and query IR analyses
- MachineFunctionPassManager: support only two methods, `addPass` and `run`

Reviewed By: arsenm, asbirlea, aeubanks

Differential Revision: https://reviews.llvm.org/D67687
2020-08-07 11:00:31 -07:00
Yuanfang Chen 954bd9c861 [NewPM] Only verify loop for nonskipped user loop pass
No verification for pass mangers since it is not needed.
No verification for skipped loop pass since the asserted condition is not used.

Add a BeforeNonSkippedPass callback for this. The callback needs more
inputs than its parameters to work so the callback is added on-the-fly.

Reviewed By: aeubanks, asbirlea

Differential Revision: https://reviews.llvm.org/D84977
2020-08-07 11:00:31 -07:00
Mitch Phillips 382df1c674 Revert "Reland D64327 [MC][ELF] Allow STT_SECTION referencing SHF_MERGE on REL targets"
This reverts commit b497665d98.

Spent some time trying to reproduce this locally, reverting in a
desparate attempt to fix the sanitizer buildbot:
 - http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/28828

I don't know exactly why or how this patch breaks the bots, but it seems
pretty concrete that it's the culprit.
2020-08-07 10:56:33 -07:00
Amy Kwan 98eccec3ae [PowerPC] Add Vector Extract/Expand/Count with Mask, Move to VSR Mask Instruction Definitions and MC Tests
This patch adds the instruction definitions and assembly/disassembly tests for
the following set of instructions:

Vector Extract [byte | half | word | doubleword | quad] with mask
Vector Expand [byte | half | word | doubleword | quad] with mask
Move to VSR [byte | byte immediate | half | word | doubleword | quad] with mask
Vector Count Mask Bits [byte | half | word | doubleword]

Differential Revision: https://reviews.llvm.org/D83724
2020-08-07 11:02:08 -05:00
Kamau Bridgeman d8c6d083c9 [PowerPC][PCRelative] Set TLS unsupported with PC relative memops
Introduce a fatal error if any thread local storage code is compiled
using pc relative memory operations as well as a hidden override
option `-enable-ppc-pcrel-tls` so that this support can be incrementally
added if possible.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D85448
2020-08-07 10:56:24 -05:00
Jay Foad ffe1edfc53 [NFC][GVN] Fix "avaliable" typos
Differential Revision: https://reviews.llvm.org/D85520
2020-08-07 14:22:24 +01:00
Bevin Hansson 5de6c56f7e [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
Summary:
This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat,
which perform signed and unsigned saturating left shift,
respectively.

These are useful for implementing the Embedded-C fixed point
support in Clang, originally discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html
and
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html

Reviewers: leonardchan, craig.topper, bjope, jdoerfert

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83216
2020-08-07 15:09:24 +02:00
Simon Pilgrim 66a163f328 [DAG] GetDemandedBits - remove custom AND handling.
As mentioned on D85463, we should be using SimplifyMultipleUseDemandedBits (which is the default fallback).

The minor regression in illegal-bitfield-loadstore.ll will be addressed properly by D77804.
2020-08-07 12:55:47 +01:00
Simon Pilgrim fcefb53222 Remove unreachable break. NFC 2020-08-07 12:37:49 +01:00
Kazushi (Jam) Marukawa 63bc5d7863 [VE] Change to expand multiply related instructions
Change to expand MULHU/MULHS/UMUL_LOHI/SMUL_LOHI for i32 and i64 since
those instructions are not available on Aurora SX VE.  Some of them
are used in expansion of i128 multiply, so need to modify them to
support i128.  Then, update basic arithmetic regression tests of
i128 and signed/unsigned i32 typed integer values.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85490
2020-08-07 18:22:25 +09:00
Igor Kudrin 1eade73d8b [DebugInfo] Remove DwarfUnit::getDwarfVersion(). NFC.
This helper method was used only in one place, which can easily use the
direct call.

Differential revision: https://reviews.llvm.org/D85438
2020-08-07 15:55:44 +07:00
Igor Kudrin b6b0ff18a3 [DebugInfo] Clean up DIEUnit. NFC.
This removes members of the DIEUnit class which were used only in unit
tests. Note also that child classes shadowed some of these methods,
namely, getDwarfVersion() was overridden in DwartfUnit and getLength()
was overridden in DwarfCompileUnit.

Differential Revision: https://reviews.llvm.org/D85436
2020-08-07 15:55:44 +07:00
Shinji Okumura c575ba28de [Attributor] AAPotentialValues Interface
This is a split patch of D80991.
This patch introduces AAPotentialValues and its interface only.
For more detail of AAPotentialValues abstract attribute, see the original patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83283
2020-08-07 17:35:12 +09:00
Christian Kühnel f3cc4df51d Revert "[CMake] Simplify CMake handling for zlib"
This reverts commit 1adc494bce.
This patch broke the Windows compilation on buildbot and pre-merge testing:
http://lab.llvm.org:8011/builders/mlir-windows/builds/5945
https://buildkite.com/llvm-project/llvm-master-build/builds/780
2020-08-07 09:36:49 +02:00
QingShan Zhang 2b2bfdb474 [NFC] Add the stats for load/store cluster
We have the stats for MacroFusion but miss it for load/store cluster.
2020-08-07 07:09:48 +00:00
David Sherwood 0905d9f31e [SVE][CodeGen] Fix bug with store of unpacked FP scalable vectors
Fixed an incorrect pattern in lib/Target/AArch64/AArch64SVEInstrInfo.td
for storing out <vscale x 2 x f32> unpacked scalable vectors. Added
a couple of tests to

  test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll

Differential Revision: https://reviews.llvm.org/D85441
2020-08-07 07:19:09 +01:00
biplmish cce1b0e891 [PowerPC] Implement Vector Extract Low/High Order Builtins in LLVM/Clang
This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D84622
2020-08-07 01:02:29 -05:00
QingShan Zhang 55de46f3b2 [PowerPC] Support constrained fp operation for setcc
The constrained fp operation fcmp was added by https://reviews.llvm.org/D69281.
This patch is trying to add the support for PowerPC backend.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D81727
2020-08-07 05:16:36 +00:00
QingShan Zhang 3359ea62ed [Scheduling] Create the missing dependency edges for store cluster
If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa
as they both depend on the base register "reg"

     +-------+
+---->  reg  |
|    +---+---+
|        ^
|        |
|        |
|        |
|    +---+---+
|    |  SUa  |  Load 0(reg)
|    +---+---+
|        ^
|        |
|        |
|    +---+---+
+----+  SUb  |  Load 4(reg)
     +-------+

But if it is store cluster, we need to create it as follow shows to avoid the instruction store
depend on scheduled in-between SUb and SUa.

     +-------+
+---->  reg  |
|    +---+---+
|        ^
|        |         Missing       +-------+
|        | +-------------------->+   y   |
|        | |                     +---+---+
|    +---+-+-+                       ^
|    |  SUa  |  Store x 0(reg)       |
|    +---+---+                       |
|        ^                           |
|        |  +------------------------+
|        |  |
|    +---+--++
+----+  SUb  |  Store y 4(reg)
     +-------+

Reviewed By: evandro, arsenm, rampitec, foad, fhahn

Differential Revision: https://reviews.llvm.org/D72031
2020-08-07 04:58:03 +00:00
Vitaly Buka 7fb9de2c6f [StackSafety,NFC] Fix tests in debug 2020-08-06 20:46:39 -07:00
Shinji Okumura f13f2e16f0 [Attributor] Check violation of returned position nonnull and noundef attribute in AAUndefinedBehavior
This patch is a follow up of D84733.
If a function has noundef attribute in returned position, instructions that return undef or poison value cause UB.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85178
2020-08-07 12:02:42 +09:00
Vitaly Buka 58b95c9b2b [StackSafety,NFC] Add debug counters 2020-08-06 19:24:02 -07:00
Vitaly Buka 92dcf12b2f [StackSafety,NFC] Use CHECK-EMPTY in tests 2020-08-06 19:19:51 -07:00
Vitaly Buka faeeed6f52 [LLParser,NFC] Simplify forward GV refs update
Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85238
2020-08-06 19:18:51 -07:00
Vitaly Buka 0b2616a804 [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-06 19:10:33 -07:00
Vitaly Buka 5c6d9b2bbf [LTO,NFC] Skip generateParamAccessSummary when empty
addGlobalValueSummary can check newly added FunctionSummary
and set HasParamAccess to mark that generateParamAccessSummary
is needed.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D85182
2020-08-06 19:01:19 -07:00
Kazushi (Jam) Marukawa f92e0d9384 [VE] Optimize trunc related instructions
Change to not generate truncate instructions if all use of a truncate
operation don't care about higher bits.  For example, an i32 add
instruction doesn't care about higher 32 bits in 64 bit registers.
Updates regression tests also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85418
2020-08-07 09:21:05 +09:00
Jessica Paquette c8a282bcf7 [GlobalISel] Fix computing known bits for loads with range metadata
In GlobalISel, if you have a load into a small type with a range, you'll hit
an assert if you try to compute known bits on it starting at a larger type.

e.g.

```
%x:_(s8) = G_LOAD %whatever(p0) :: (load 1 ... !range !n)
...
%y:_(s32) = G_SOMETHING %x
```

When we walk through G_SOMETHING and hit the load, the width of our known bits
is 32. However, the width of the range is going to be 8. This will cause us
to hit an assert.

To fix this, make computeKnownBitsFromRangeMetadata zero extend or truncate
the range type to match the bitwidth of the known bits we're calculating.

Add a testcase in CodeGen/GlobalISel/KnownBitsTest.cpp to reflect that this
works now.

https://reviews.llvm.org/D85375
2020-08-06 16:47:07 -07:00
Matt Arsenault 1ad051dd8c GlobalISel: Implement lower for G_INSERT_VECTOR_ELT 2020-08-06 19:29:17 -04:00
Yonghong Song c50f5dece9 BPF: fix libLLVMBPFCodeGen.so build failure
Buildbot reported a build failure when building shared
library libLLVMBPFCodeGen.so with unknown reference to
"createCFGSimplificationPass".

Commit 87cba43402 ("BPF: add a SimplifyCFG IR pass during
generic Scalar/IPO optimization") added an IR pass SimplifyCFG
by BPF target. The commit called function
createCFGSimplificationPass() defined in "Scalar" library.
Add this library in Target/BPF/LLVMBuild.txt so
shared library build can succeed.
2020-08-06 15:27:15 -07:00
Matt Arsenault 87b2af8140 AMDGPU/GlobalISel: Enable s_{and|or}n2_{b32|b64} patterns 2020-08-06 18:00:38 -04:00
Roman Lebedev be02adfad7
[InstCombine] Fold (x + C1) * (-1<<C2) --> (-C1 - x) * (1<<C2)
Negator knows how to do this, but the one-use reasoning is getting
a bit muddy here, we don't really want to increase instruction count,
so we need to both lie that "IsNegation" and have an one-use check
on the outermost LHS value.
2020-08-06 23:40:16 +03:00
Roman Lebedev 0c1c756a31
[InstCombine] Generalize %x * (-1<<C) --> (-%x) * (1<<C) fold
Multiplication is commutative, and either of operands can be negative,
so if the RHS is a negated power-of-two, we should try to make it
true power-of-two (which will allow us to turn it into a left-shift),
by trying to sink the negation down into LHS op.

But, we shouldn't re-invent the logic for sinking negation,
let's just use Negator for that.

Tests and original patch by: Simon Pilgrim @RKSimon!

Differential Revision: https://reviews.llvm.org/D85446
2020-08-06 23:39:53 +03:00
Roman Lebedev 7ce76b06ec
[InstCombine] Fold sdiv exact X, -1<<C --> -(ashr exact X, C)
While that does increases instruction count,
shift is obviously better than a division.

Name: base
Pre: (1<<C1) >= 0
%o0 = shl i8 1, C1
%r = sdiv exact i8 C0, %o0
  =>
%r = ashr exact i8 C0, C1

Name: neg
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0
  =>
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0

Name: reverse
Pre: C1 != 0 && C1 u< 8
%t0 = ashr exact i8 C0, C1
%r = sub i8 0, %t0
  =>
%o0 = shl i8 -1, C1
%r = sdiv exact i8 C0, %o0

https://rise4fun.com/Alive/MRplf
2020-08-06 23:37:16 +03:00
Roman Lebedev 47aec80e4a
[NFC][InstCombine] Negator: add a comment about negating exact arithmentic shift 2020-08-06 23:37:16 +03:00
Roman Lebedev 442cb88f53
[InstCombine] Generalize sdiv exact X, 1<<C --> ashr exact X, C fold to handle non-splat vectors 2020-08-06 23:37:15 +03:00
Craig Topper ffc248f3b8 [LegalTypes] Move VSELECT node creation out of WidenVSELECTAndMask and push to 2 of the 3 callers.
One of the callers only wants the condition, but the vselect can
be simplified by getNode making it hard or impossible to retrieve
the condition.

Instead, return the condition and make the other 2 callers
responsible for creating the vselect node using the condition.
Rename the function to WidenVSELECTMask accordingly.

Differential Revision: https://reviews.llvm.org/D85468
2020-08-06 13:18:16 -07:00
Yonghong Song 87cba43402 BPF: add a SimplifyCFG IR pass during generic Scalar/IPO optimization
The following bpf linux kernel selftest failed with latest
llvm:
  $ ./test_progs -n 7/10
  ...
  The sequence of 8193 jumps is too complex.
  verification time 126272 usec
  stack depth 320
  processed 114799 insns (limit 1000000)
  ...
  libbpf: failed to load object 'pyperf600_nounroll.o'
  test_bpf_verif_scale:FAIL:110
  #7/10 pyperf600_nounroll.o:FAIL
  #7 bpf_verif_scale:FAIL

After some investigation, I found the following llvm patch
  https://reviews.llvm.org/D84108
is responsible. The patch disabled hoisting common instructions
in SimplifyCFG by default. Later on, the code changes and a
SimplifyCFG phase with hoisting on cannot do the work any more.

A test is provided to demonstrate the problem.
The IR before simplifyCFG looks like:
  for.cond:
    %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
    %cmp = icmp ult i32 %i.0, 6
    br i1 %cmp, label %for.body, label %for.cond.cleanup

  for.cond.cleanup:
    %2 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
    %cmp2 = icmp eq i8* %2, null
    %conv = zext i1 %cmp2 to i32
    call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %1) #3
    call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3
    ret i32 %conv

  for.body:
    %3 = load i8*, i8** %frame_ptr, align 8, !tbaa !2
    %tobool.not = icmp eq i8* %3, null
    br i1 %tobool.not, label %for.inc, label %land.lhs.true

The first two insns of `for.cond.cleanup` and `for.body`, load and
icmp, can be hoisted to `for.cond` block. With Patch D84108, the
optimization is delayed. But unfortunately, later on loop rotation
added addition phi nodes to `for.body` and hoisting cannot
be done any more.

Note such a hoisting is beneficial to bpf programs as
bpf verifier does path sensitive analysis and verification.
The hoisting preverts reloading from stack which will assume
conservative value and increase exploited insns. In this case,
it caused verifier failure.

To fix this problem, I added an IR pass from bpf target
to performance additional simplifycfg with hoisting common inst
enabled.

Differential Revision: https://reviews.llvm.org/D85434
2020-08-06 13:16:00 -07:00
Snehasish Kumar 8d943a928d [NFC] Rename BBSectionsPrepare -> BasicBlockSections.
Rename the BBSectionsPrepare pass as suggested by the review comment in
https://reviews.llvm.org/D85368.

Differential Revision: https://reviews.llvm.org/D85380
2020-08-06 13:12:06 -07:00
Sanjay Patel 250a167c41 [InstSimplify] avoid crashing by trying to rem-by-zero
Bug was noted in the post-commit comments for:
rGe8760bb9a8a3
2020-08-06 16:06:31 -04:00
Anton Afanasyev a7478fab6c [SLP] Fix order of `insertelement`/`insertvalue` seed operands
Summary:
This patch takes the indices operands of `insertelement`/`insertvalue`
into account while generation of seed elements for `findBuildAggregate()`.
This function has kept the original order of `insert`s before.
Also this patch optimizes `findBuildAggregate()` preventing it from
redundant temporary vector allocations and its multiple reversing.

Fixes llvm.org/pr44067

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83779
2020-08-06 22:09:24 +03:00
dfukalov 4ccc38813e [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.
Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.

Fixed line endings in test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84995
2020-08-06 21:43:27 +03:00
Matt Arsenault e00201539f GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT
Use the same basic strategy as LegalizeVectorTypes. Try to index into
smaller pieces if there's a constant index, and otherwise fall back to
a stack temporary.
2020-08-06 14:33:16 -04:00
Matt Arsenault 1a0c0944c6 AMDGPU: Define raw/struct variants of buffer atomic fadd
Somehow the new FP atomic buffer intrinsics ended up using the legacy
style for buffer intrinsics.
2020-08-06 13:36:19 -04:00
Matt Arsenault eae9c54148 AArch64/GlobalISel: Fix verifier error after selecting returnaddress
This was caching the wrong register to re-use later.
2020-08-06 13:18:05 -04:00
Mircea Trofin ca7973cf18 [NFC]{MLInliner] Point out the tests' model dependencies 2020-08-06 09:57:26 -07:00
Matt Arsenault 90eb7d5283 AMDGPU: Fix spilling of 96-bit AGPRs 2020-08-06 12:42:07 -04:00
Matt Arsenault 56270d1d42 AMDGPU/GlobalISel: Start trying to handle AGPR bank
Try to use AGPR banks for the various merge/unmerge type
operations. Previously these would introduce copies to VGPR.
2020-08-06 12:39:50 -04:00
Matt Arsenault 34040a4f61 GlobalISel: Define InvalidRegBankID enum value 2020-08-06 12:39:49 -04:00
Mircea Trofin 87fb7aa137 [llvm][MLInliner] Don't log 'mandatory' events
We don't want mandatory events in the training log. We do want to handle
them, to keep the native size accounting accurate, but that's all.

Fixed the code, also expanded the test to capture this.

Differential Revision: https://reviews.llvm.org/D85373
2020-08-06 09:04:15 -07:00
Matt Arsenault 63cdc9a49f AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
These intrinsics are missing mangling for both the pointer and data
type.
2020-08-06 11:09:08 -04:00
Matt Arsenault 63c4be53cf AMDGPU/GlobalISel: Try to promote to use packed saturating add/sub
This produces worse results right now for i8 vectors, but that should
be addressed when we actually try to optimize packed vectors.
2020-08-06 11:08:45 -04:00
Matt Arsenault dcf3ffb0a8 AMDGPU/GlobalISel: Move frame index selection to patterns
Doesn't really save any code until global value is handled too.
2020-08-06 10:42:15 -04:00
Matt Arsenault d188a608bd AMDGPU: Fix code duplication between the selectors
Not sure this is the right place for this helper.
2020-08-06 10:42:15 -04:00
jasonliu e5062a6caf [XCOFF][AIX] Put each jump table in an independent section if -ffunction-sections is specified
If a function is in a unique section, putting all jump tables in
 .rodata will prevent functions that have a jump table to get
garbage collect by the linker.
Therefore, we need to put jump table into a unique section as well.

Reviewed By: Xiangling_L

Differential Revision: https://reviews.llvm.org/D84761
2020-08-06 14:31:04 +00:00
Matt Arsenault 5a503521e7 AMDGPU/GlobalISel: Implement expansion for rsq.clamp
Not sure why we handle this removed instruction on newer subtargets
for this one and no others, but maintain compatibility with the DAG.
2020-08-06 10:23:25 -04:00
Matt Arsenault c015cbc68b AMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops 2020-08-06 10:07:22 -04:00
Matt Arsenault 28124a0a63 AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering
We really need to put this undef padding stuff into a helper
somewhere, but leave that for when this is moved to generic code.
2020-08-06 09:55:35 -04:00
Matt Arsenault 6c7f640bf7 AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses 2020-08-06 09:50:36 -04:00
Matt Arsenault 37894ba661 AMDGPU/GlobalISel: Make s16 phi legal
If we were to have an operation with an s16 def that needs to be
executed in a waterfall loop, not having s16 legal would place an
avoidable burden on RegBankSelect to widen it.
2020-08-06 09:41:14 -04:00
Matt Arsenault 5316256709 AMDGPU/GlobalISel: Fix assert on copy to vcc
This was trying to constrain a physical register. By the verifier's
understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so
don't try to handle physregs.
2020-08-06 09:41:14 -04:00
Raphael Isemann 1de43bd6df Revert "PDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI."
This reverts commit 87c5437afd.

The commit includes several headers in the middle of a function, which
breaks pretty much everything.
2020-08-06 15:15:43 +02:00
Xing GUO 40506d5e2f [DWARFYAML][debug_info] Make the 'Values' field optional.
This patch makes the 'Values' field optional. This is useful when we
handcraft the terminating entry of DIEs.

```
debug_info:
  - Version:  4
    ...
    Entries:
      - AbbrCode: 1
        Values:
          - Value: 0x1234
      - AbbrCode: 0 ## Termination
```

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D85397
2020-08-06 20:43:52 +08:00
Petar Avramovic d893278bba [GlobalISel][InlineAsm] Fix matching input constraint to physreg
Add given input and mark it as tied.
Doesn't create additional copy compared to
matching input constraint to virtual register.

Differential Revision: https://reviews.llvm.org/D85122
2020-08-06 14:35:51 +02:00
Simon Pilgrim 3d10050e37 BitstreamRemarkParser.h - remove unnecessary includes. NFCI.
Remove unused includes, moving to the lib header or cpp file as necessary.
2020-08-06 13:17:53 +01:00
Simon Pilgrim 807467009d [X86] getX86MaskVec - replace mask limit from NumElts < 8 with NumElts <= 4
As noted on PR46885, the number of mask elements should always be a power of 2, so to fix the static analyzer warning we are better off replacing the condition to <= 4, and I've added a pow2 assertion as well.
2020-08-06 11:46:19 +01:00
Simon Pilgrim 87c5437afd PDBExtras.h - remove unnecessary raw_ostream forward declaration. NFCI.
We already need to include raw_ostream.h, also add missing StringRef.h and cstdint implicit dependencies.

Remove unnecessary includes from PDBExtras.cpp
2020-08-06 11:28:42 +01:00
Paul Walker 0d33a8ef5b [SVE] Lower scalable vector mul operations.
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.

Differential Revision: https://reviews.llvm.org/D85328
2020-08-06 11:15:35 +01:00
Paul Walker 3ed59b775d [SVE] Implement lowering for fixed length vector multiplication.
NOTE: Also uses SVE code generation for NEON size vectors, instead
of expanding i64 based vector multiplications.

Differential Revision: https://reviews.llvm.org/D85327
2020-08-06 11:01:39 +01:00
Juneyoung Lee c771087161 [InstCombine] Fold freeze(undef) into a proper constant
This is a simple patch that folds freeze(undef) into a proper constant after inspecting its uses.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84948
2020-08-06 18:40:04 +09:00
David Green 745bf6cf44 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-06 10:10:50 +01:00
Roman Lebedev a512c89476
[NFC][InstCombine] Refactor '(-NSW x) pred x' fold 2020-08-06 11:50:36 +03:00
Roman Lebedev 141357663e
[InstCombine] (-NSW x) u<= x --> x s<=0 (PR39480)
Name: (-x) u<= x  -->  x s<= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ule i8 %neg_x, %x
  =>
%r = icmp sle i8 %x, 0

https://rise4fun.com/Alive/V22

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:36 +03:00
Roman Lebedev 132be1f502
[InstCombine] (-NSW x) u< x --> x s< 0 (PR39480)
Name: (-x) u< x  -->  x s< 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ult i8 %neg_x, %x
  =>
%r = icmp slt i8 %x, 0

https://rise4fun.com/Alive/zSuf

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:36 +03:00
Roman Lebedev 0e1241a3c9
[InstCombine] (-NSW x) u>= x --> x s>= 0 (PR39480)
Name: (-x) u>= x  -->  x s>= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp uge i8 %neg_x, %x
  =>
%r = icmp sge i8 %x, 0

https://rise4fun.com/Alive/LLHd

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev 16c642fa39
[InstCombine] (-NSW x) u> x --> x s> 0 (PR39480)
Name: (-x) u> x  -->  x s> 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ugt i8 %neg_x, %x
  =>
%r = icmp sgt i8 %x, 0

https://rise4fun.com/Alive/Raea

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev 59387c0dd7
[InstCombine] (-NSW x) s<= x --> x s>= 0 (PR39480)
Name: (-x) s<= x  -->  x >= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sle i8 %neg_x, %x
  =>
%r = icmp sge i8 %x, 0

https://rise4fun.com/Alive/91k

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev 01a6c4bd26
[InstCombine] (-NSW x) s< x --> x s> 0 (PR39480)
Name: (-x) s< x  -->  x > 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp slt i8 %neg_x, %x
  =>
%r = icmp sgt i8 %x, 0

https://rise4fun.com/Alive/3IXb

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:35 +03:00
Roman Lebedev 3885207651
[InstCombine] (-NSW x) s>= x --> x s<= 0 (PR39480)
Name: (-x) s>= x  -->  x s<= 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sge i8 %neg_x, %x
  =>
%r = icmp sle i8 %x, 0

https://rise4fun.com/Alive/Hdip

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:34 +03:00
Roman Lebedev 8878b79cfe
[InstCombine] (-NSW x) ==/!= x --> x ==/!= 0 (PR39480)
Name: (-x) == x  -->  x == 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp eq i8 %neg_x, %x
  =>
%r = icmp eq i8 %x, 0

Name: (-x) != x  -->  x != 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp ne i8 %neg_x, %x
  =>
%r = icmp ne i8 %x, 0

https://rise4fun.com/Alive/4slH

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:34 +03:00
Roman Lebedev 5060f5682b
[InstCombine] (-NSW x) s> x --> x s< 0 (PR39480)
Name: (-x) s> x  -->  x s< 0
%neg_x = sub nsw i8 0, %x ; %x must not be INT_MIN
%r = icmp sgt i8 %neg_x, %x
  =>
%r = icmp slt i8 %x, 0

https://rise4fun.com/Alive/ZslD

https://bugs.llvm.org/show_bug.cgi?id=39480
2020-08-06 11:50:34 +03:00
Xing GUO 4357986b41 [DWARFYAML][debug_info] Pull out dwarf::FormParams from DWARFYAML::Unit.
Unit.Format, Unit.Version and Unit.AddrSize are replaced with
dwarf::FormParams in D84496 to get rid of unnecessary functions
getOffsetSize() and getRefSize(). However, that change makes it
difficult to make AddrSize optional (Optional<uint8_t>). This change
pulls out dwarf::FormParams from DWARFYAML::Unit and use it as a helper
struct in DWARFYAML::emitDebugInfo().

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D85296
2020-08-06 16:39:00 +08:00
Craig Topper 504a197fe5 [X86] Rename X86::getImpliedFeatures to X86::updateImpliedFeatures and pass clang's StringMap directly to it.
No point in building a vector of StringRefs for clang to apply to the
StringMap. Just pass the StringMap and modify it directly.
2020-08-06 00:20:46 -07:00
Martin Storsjö f5e6fbac24 [AArch64] [Windows] Error out on unsupported symbol locations
These might occur in seemingly generic assembly. Previously when
targeting COFF, they were silently ignored, which certainly won't
give the right result. Instead clearly error out, to make it clear
that the assembly needs to be adjusted for this target.

Also change a preexisting report_fatal_error into a proper error
message, pointing out the offending source instruction. This isn't
strictly an internal error, as it can be triggered by user input.

Differential Revision: https://reviews.llvm.org/D85242
2020-08-06 09:23:46 +03:00
Martin Storsjö 5eedc01a82 [ARM, AArch64] Fix a comment typo. NFC. 2020-08-06 09:23:45 +03:00
Chuanqi Xu 92f1f1e40d [Coroutines] Use to collect lifetime marker of in CoroFrame Differential Revision: https://reviews.llvm.org/D85279 2020-08-06 14:21:55 +08:00
Craig Topper 0215ae9735 [X86] Remove incomplete custom handling of i128 sdivrem/udivrem on Windows.
We need to have special handling of i128 div/rem on Windows due
to a weird calling convention needed for the libcall. There was
also some code that made it look like we do the same for sdivrem/udiv,
but the code didn't account for multiple return values of those
functions so couldn't possibly work. I think this code never
triggers because we don't have libcall names defined for those
functions by default so DAGCombine never creates DIVREM nodes.
2020-08-05 23:01:07 -07:00
Lang Hames ba8683f292 [JITLink][MachO][AArch64] More PAGEOFF12 relocation fixes.
Correctly sign extend the addend, and fix implicit shift operand decoding
(it incorrectly returned 0 for some cases), and check that the initial
encoded immediate is 0.
2020-08-05 21:09:45 -07:00
Matt Arsenault 0ee1eba581 AMDGPU: Remove ATOMIC_PK_FADD
The f32 and v2f16 cases should be handled the same way.
2020-08-05 22:00:52 -04:00
Ruiling Song 5ddc8b49ba [AMDGPU] add buffer_atomic_swap for float
The functionality is used when calling imageAtomicExhange() on float
type imageBuffer in Graphics shaders.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85187
2020-08-06 09:45:48 +08:00
Juneyoung Lee 9f717d7b94 [JumpThreading] Allow duplicating a basic block into preds when its branch condition is freeze(phi)
This is the last JumpThreading patch for getting the performance numbers shown at
https://reviews.llvm.org/D84940#2184653 .

This patch makes ProcessBlock call ProcessBranchOnPHI when the branch condition
is freeze(phi) as well (originally it calls the function when the condition is
phi only).

Since what ProcessBranchOnPHI does is to duplicate the basic block into
predecessors if profitable, it is still valid when the condition is freeze(phi)
too.

```
    p = phi [a, pred1] [b, pred2]
    p.fr = freeze p
    br p.fr, ...
=>
  pred1:
    p.fr = freeze a
    br p.fr, ...
  pred2:
    p.fr2 = freeze b
    br p.fr2, ...
```

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85029
2020-08-06 09:51:17 +09:00
Petr Hosek 1adc494bce [CMake] Simplify CMake handling for zlib
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.

This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.

Differential Revision: https://reviews.llvm.org/D79219
2020-08-05 16:07:11 -07:00
Craig Topper 08b2d0a963 [X86] Disable copy elision in LowerMemArgument for scalarized vectors when the loc VT is a different size than the original element.
For example a v4f16 argument is scalarized to 4 i32 values. So
the values are spread out instead of being packed tightly like
in the original vector.

Fixes PR47000.
2020-08-05 15:44:54 -07:00
Greg Clayton e1de85f9f4 Add verification for DW_AT_decl_file and DW_AT_call_file.
LTO builds have been creating invalid DWARF and one of the errors was a file index that was out of bounds. "llvm-dwarfdump --verify" will check all file indexes for line tables already, but there are no checks for the validity of file indexes in attributes.

The verification will verify if there is a DW_AT_decl_file/DW_AT_call_file that:
- there is a line table for the compile unit
- the file index is valid
- the encoding is appropriate

Tests are added that test all of the above conditions.

Differential Revision: https://reviews.llvm.org/D84817
2020-08-05 15:30:13 -07:00
Sanjay Patel c66169136f [InstCombine] fold icmp with 'mul nsw/nuw' and constant operands
This also removes a more specific fold that only handled icmp with 0.

https://rise4fun.com/Alive/sdM9

  Name: mul nsw with icmp eq
  Pre: (C1 != 0) && (C2 % C1) == 0
  %a = mul nsw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = icmp eq i8 %x, C2 / C1

  Name: mul nuw with icmp eq
  Pre: (C1 != 0) && (C2 %u C1) == 0
  %a = mul nuw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = icmp eq i8 %x, C2 /u C1

  Name: mul nsw with icmp ne
  Pre: (C1 != 0) && (C2 % C1) == 0
  %a = mul nsw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = icmp ne i8 %x, C2 / C1

  Name: mul nuw with icmp ne
  Pre: (C1 != 0) && (C2 %u C1) == 0
  %a = mul nuw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = icmp ne i8 %x, C2 /u C1
2020-08-05 17:29:32 -04:00
Stanislav Mekhanoshin 0bcda1a261 [AMDGPU] Scavenge temp reg for AGPR spill
Differential Revision: https://reviews.llvm.org/D85234
2020-08-05 13:29:19 -07:00
Rahman Lavaee 20a568c29d [Propeller]: Use a descriptive temporary symbol name for the end of the basic block.
This patch changes the functionality of AsmPrinter to name the basic block end labels as LBB_END${i}_${j}, with ${i} being the identifier for the function and ${j} being the identifier for the basic block. The new naming scheme is consistent with how basic block labels are named (.LBB${i}_{j}), and how function end symbol are named (.Lfunc_end${i}) and helps to write stronger tests for the upcoming patch for BB-Info section (as proposed in https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html). The end label is used with basicblock-labels (BB-Info section in future) and basicblock-sections to compute the size of basic blocks and basic block sections, respectively. For BB sections, the section containing the entry basic block will not have a BB end label since it already gets the function end-label.
This label is cached for every basic block (CachedEndMCSymbol) like the label for the basic block (CachedMCSymbol).

Differential Revision: https://reviews.llvm.org/D83885
2020-08-05 13:17:19 -07:00
Matt Arsenault ec8c172d01 AMDGPU: Correct prolog SP initialization logic
Having callees that will read SP is not the only reason we need to
reference the stack pointer.
2020-08-05 15:47:53 -04:00
Stanislav Mekhanoshin ea7d0e2996 [AMDGPU] gfx1031 target
Differential Revision: https://reviews.llvm.org/D85337
2020-08-05 12:36:26 -07:00
Arthur Eubanks 9e6a1e5781 [NewPM][LoopRotate] Rename rotate -> loop-rotate
To match legacy pass name.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D85338
2020-08-05 12:25:01 -07:00
Matt Arsenault 83eaf5d55d AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node
This is redundant with the other no return buffer atomic node, and we
don't really need a separate type profile for it.
2020-08-05 15:16:51 -04:00
Roman Lebedev f3056dcc02
[InstCombine] Negator: -(cond ? x : -x) --> cond ? -x : x
We were errneously only doing that for old-style abs/nabs,
but we have no such legality check on the condition of the select.

https://rise4fun.com/Alive/xBHS
2020-08-05 21:47:30 +03:00
Matt Arsenault 43c0c9252a AMDGPU: Refactor buffer atomic intrinsic lowering
Move raw/struct buffer atomic lowering to separate functions. This
avoids a long nested switch, and simplifies a future patch.
2020-08-05 14:44:55 -04:00
Matt Arsenault 3e52667433 AMDGPU: Fix verifier error with undef source producing s_bitset*
This needs to preserve the undef flag.
2020-08-05 14:42:20 -04:00
Sanjay Patel e8760bb9a8 [InstSimplify] fold icmp with mul nsw and constant operands
https://rise4fun.com/Alive/slvl

  Name: mul nsw with icmp eq
  Pre: (C2 % C1) != 0
  %a = mul nsw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = false

  Name: mul nsw with icmp ne
  Pre: (C2 % C1) != 0
  %a = mul nsw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = true

Follow-up to the 'nuw' variation added with:
rGf879c9b79621
2020-08-05 14:38:39 -04:00
Sanjay Patel f879c9b796 [InstSimplify] fold icmp with mul nuw and constant operands
https://rise4fun.com/Alive/pZEr

  Name: mul nuw with icmp eq
  Pre: (C2 %u C1) != 0
  %a = mul nuw i8 %x, C1
  %r = icmp eq i8 %a, C2
    =>
  %r = false

  Name: mul nuw with icmp ne
  Pre: (C2 %u C1) != 0
  %a = mul nuw i8 %x, C1
  %r = icmp ne i8 %a, C2
    =>
  %r = true

There are potentially several other transforms we need to add based on:
D51625
...but it doesn't look like there was follow-up to that patch.
2020-08-05 14:32:17 -04:00
Evgenii Stepanov f2c0423995 [msan] Remove readnone and friends from call sites.
MSan removes readnone/readonly and similar attributes from callees,
because after MSan instrumentation those attributes no longer apply.

This change removes the attributes from call sites, as well.

Failing to do this may cause DSE of paramTLS stores before calls to
readonly/readnone functions.

Differential Revision: https://reviews.llvm.org/D85259
2020-08-05 10:34:45 -07:00
Simon Pilgrim b60f998859 [X86][SSE] Fold 128-bit PACK(EXTEND(X),EXTEND(Y)) -> CONCAT(X,Y) subvectors
This is seen in the sub-128-bit vector trunc(ext()) of comparison results

Fixes pr46585.ll regression in D66004
2020-08-05 18:27:40 +01:00
Jordan Rupprecht 3c39db0c44 Revert "[LoopVectorizer] Inloop vector reductions"
This reverts commit e9761688e4. It breaks the build:

```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
  return ReductionOperations;
```
2020-08-05 10:24:15 -07:00
Mircea Trofin b18c41c66f [TFUtils] Expose untyped accessor to evaluation result tensors
These were implementation detail, but become necessary for generic data
copying.

Also added const variations to them, and move assignment, since we had a
move ctor (and the move assignment helps in a subsequent patch).

Differential Revision: https://reviews.llvm.org/D85262
2020-08-05 10:22:45 -07:00
David Green e9761688e4 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-05 18:14:05 +01:00
Roman Lebedev a05ec856a3
[NFC][InstCombine] Negator: include all the needed headers, IWYU 2020-08-05 20:12:36 +03:00
Roman Lebedev 3a3c9519e2
[InstCombine] Negator: 0 - (X + Y) --> (-X) - Y iff a single operand negated
This was the most obvious regression in
f5df5cd5586ae9cfb2d9e53704dfc76f47aff149.f5df5cd5586ae9cfb2d9e53704dfc76f47aff149

We really don't want to do this if the original/outermost subtraction
isn't a negation, and therefore doesn't go away - just sinking negation
isn't a win. We are actually appear to be missing folds so hoist it.

https://rise4fun.com/Alive/tiVe
2020-08-05 20:01:13 +03:00
Lang Hames 47cfffe893 [JITLink][AArch64] Handle addends on PAGE21 / PAGEOFF12 relocations. 2020-08-05 08:50:46 -07:00
Lang Hames d561d1bf96 [JITLink][AArch64] Improve debug output for addend relocations. 2020-08-05 08:50:46 -07:00
Sanjay Patel bd2c88b253 [InstSimplify] reduce code duplication in simplifyICmpWithMinMax(); NFC 2020-08-05 11:39:28 -04:00
Simon Pilgrim 6a06c7a0a7 [X86] isHorizontalBinOp - only update LHS/RHS references on success
We've had issues in the past where isHorizontalBinOp calls would affect later combines as the LHS/RHS references had been commuted but still failed to match.
2020-08-05 15:09:52 +01:00
Simon Pilgrim a57bfb44bc [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for integer types 2020-08-05 15:09:51 +01:00
Georgii Rymar 6ae5b9e405 [llvm-readobj] - Make decode_relrs() don't return Expected<>. NFCI.
The `decode_relrs` helper is declared as:

`Expected<std::vector<Elf_Rel>> decode_relrs(Elf_Relr_Range relrs) const;`

it never returns an error though and hence can be simplified to return
a vector.

Differential revision: https://reviews.llvm.org/D85302
2020-08-05 17:05:47 +03:00
Denis Antrushin d21ce40821 [Statepoints] Operand folding in presense of tied registers.
Implement proper folding of statepoint meta operands (deopt and GC)
when statepoint uses tied registers.
For deopt operands it is just about properly preserving tiedness
in new instruction.
For tied GC operands folding is a little bit more tricky.
We can fold tied GC operands only from InlineSpiller, because it knows
how to properly reload tied def after it was turned into memory operand.
Other users (e.g. peephole) cannot properly fold such operands as they
do not know how (or when) to reload them from memory.
We do this by un-tieing operand we want to fold in InlineSpiller
and allowing to fold only untied operands in foldPatchpoint.
2020-08-05 20:18:28 +07:00
Roman Lebedev f5df5cd558
Recommit "[InstCombine] Negator: -(X << C) --> X * (-1 << C)"
This reverts commit ac70b37a00
which reverted commit 8aeb2fe13a
because codegen tests got broken and i needed time to investigate.

This shows some regressions in tests, but they are all around GEP's,
so i'm not really sure how important those are.

https://rise4fun.com/Alive/1Gn
2020-08-05 15:59:13 +03:00
Sam Parker f2675ab45f [ARM][CostModel] Implement getCFInstrCost
As with other targets, set the throughput cost of control-flow
instructions to free so that we don't miss out of vectorization
opportunities.

Differential Revision: https://reviews.llvm.org/D85283
2020-08-05 12:44:51 +01:00
Hans Wennborg 3ab01550b6 Revert "[CMake] Simplify CMake handling for zlib"
This quietly disabled use of zlib on Windows even when building with
-DLLVM_ENABLE_ZLIB=FORCE_ON.

> Rather than handling zlib handling manually, use find_package from CMake
> to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
> HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
> set to YES, which requires the distributor to explicitly select whether
> zlib is enabled or not. This simplifies the CMake handling and usage in
> the rest of the tooling.
>
> This is a reland of abb0075 with all followup changes and fixes that
> should address issues that were reported in PR44780.
>
> Differential Revision: https://reviews.llvm.org/D79219

This reverts commit 10b1b4a231 and follow-ups
64d99cc6ab and
f9fec0447e.
2020-08-05 12:31:44 +02:00
Paul Walker 927fc536ca [SVE] Add lowering for fixed length vector and, or & xor operations.
Since there are no ill effects when performing these operations
with undefined elements, they are lowered to the already supported
unpredicated scalable vector equivalents.

Differential Revision: https://reviews.llvm.org/D85117
2020-08-05 11:28:34 +01:00
Simon Pilgrim 4aaf301fb8 [DAG] Fold vector (aext (load x)) -> (zext (truncate (zextload x)))
We currently don't do anything to fold any_extend vector loads as no target has such an instruction.

Instead I've added support for folding to a zextload, SimplifyDemandedBits does a good job of adjusting the zext(truncate(()) stages as required later on.

We still need the custom scalar extload handling instead of using the tryToFoldExtOfLoad helper as it has different legality tests - we can probably tweak that to reduce most of the code duplication.

Fixes the regression I mentioned in rG99a971cadff7

Differential Revision: https://reviews.llvm.org/D85129
2020-08-05 11:22:23 +01:00
Georgii Rymar f97019ad6e [llvm-readobj/elf] - Add a testing for --stackmap and refine the implementation.
Currently, we only test the `--stackmap` option here:
https://github.com/llvm/llvm-project/blob/master/llvm/test/Object/stackmap-dump.test
it uses a precompiled MachO binary currently and I've found no tests for this option for ELF.

The implementation also has issues. For example, it might assert on a wrong version
of the .llvm-stackmaps section. Or it might crash on an empty or truncated section.

This patch introduces a new tools/llvm-readobj/ELF test file as well as implements a few
basic checks to catch simple crashes/issues

It also eliminates `unwrapOrError` calls in `printStackMap()`.

Differential revision: https://reviews.llvm.org/D85208
2020-08-05 13:09:04 +03:00
David Turner ba0e71432a Do not map read-only data memory sections with EXECUTE flags.
The code in SectionMemoryManager.cpp unnecessarily maps
read-only data sections with the READ+EXECUTE flags. This is
undesirable from a security stand-point.

Moreover, on the Fuchsia platform, which is now very strict
about mapping pages with the EXECUTE permission, this simply
fails, because the section's pages were initially allocated
with only the READ+WRITE flags.

A more detailed description of the issue can be found in this
public SwiftShader bug:

  https://issuetracker.google.com/issues/154586551

This patch just restrict the mapping to the READ flag for ROData
sections. Code sections are still mapped with READ+EXECUTE as
expected.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D78574
2020-08-05 10:51:48 +02:00
Sander de Smalen f2916636f8 [AArch64][SVE] Disable tail calls if callee does not preserve SVE regs.
This fixes an issue triggered by the following code, where emitEpilogue
got confused when trying to restore the SVE registers after the call,
whereas the call to bar() is implemented as a TCReturn:

  int non_sve();
  int sve(svint32_t x) { return non_sve(); }

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84869
2020-08-05 09:38:54 +01:00
Jay Foad 8cbf4a17ac [AMDGPU] Propagate fast math flags in frem lowering
Differential Revision: https://reviews.llvm.org/D84518
2020-08-05 09:09:38 +01:00
Jay Foad 04cf4a5a65 [AMDGPU] Lower frem f16
Without this it would fail to select on subtargets that have 16-bit
instructions.

Differential Revision: https://reviews.llvm.org/D84517
2020-08-05 09:08:40 +01:00
Juneyoung Lee e0d99e9aaf [JumpThreading] Consider freeze as a zero-cost instruction
This is a simple patch that makes freeze as a zero-cost instruction, as bitcast already is.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85023
2020-08-05 14:42:36 +09:00
Evgeniy Brevnov 02a629daad [BPI][NFC] Unify handling of normal and SCC based loops
This is one more NFC part extracted from D79485. Normal and SCC based loops have very different representation and have to be handled separatly each time we deal with loops. D79485 is going to introduce much more extensive use of loops what will be problematic with out this change.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84838
2020-08-05 11:19:24 +07:00
Matt Arsenault 93cebb190a GlobalISel: Use buildAnyExtOrTrunc 2020-08-04 22:04:04 -04:00
Matt Arsenault 1ea182ce79 GlobalISel: Simplify code
This cannot be a vector of pointers, so using getScalarSizeInBits just
added a bit extra noise.
2020-08-04 22:03:59 -04:00
Matt Arsenault 8f65c933c4 GlobalISel: Fix redundant variable and shadowing 2020-08-04 22:03:55 -04:00
Matt Arsenault 54615ec48f GlobalISel: Move load/store lowering to separate functions 2020-08-04 22:03:51 -04:00
Zequan Wu e3df947175 [llvm-cov] reset executation count to 0 after wrapped segment
Fix the bug: https://bugs.llvm.org/show_bug.cgi?id=36979. It also fixes this bug: https://bugs.llvm.org/show_bug.cgi?id=35404, which I think is caused by the same problem.

Differential Revision: https://reviews.llvm.org/D85036
2020-08-04 18:38:44 -07:00
Fangrui Song 0c7af8c83b [X86] Optimize getImpliedDisabledFeatures & getImpliedEnabledFeatures after D83273
Previously the time complexity is O(|number of paths from the root to an
implied feature| * CPU_FWATURE_MAX) where CPU_FEATURE_MAX is 92.

The number of paths can be large (theoretically exponential).

For an inline asm statement, there is a code path
`clang::Parser::ParseAsmStatement -> clang::Sema::ActOnGCCAsmStmt -> ASTContext::getFunctionFeatureMap`
leading to potentially many calls of getImpliedEnabledFeatures (41 for my -march=native case).

We should improve the performance a bit in case the number of inline asm
statements is large (Linux kernel builds).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D85257
2020-08-04 17:50:06 -07:00
Mircea Trofin 90b9c49ca6 [llvm] Expose type and element count-related APIs on TensorSpec
Added a mechanism to check the element type, get the total element
count, and the size of an element.

Differential Revision: https://reviews.llvm.org/D85250
2020-08-04 17:32:16 -07:00
Roman Lebedev ac70b37a00
Revert "[InstCombine] Negator: -(X << C) --> X * (-1 << C)"
Breaks codegen tests, will recommit later.

This reverts commit 8aeb2fe13a.
2020-08-05 03:19:38 +03:00
Roman Lebedev 8aeb2fe13a
[InstCombine] Negator: -(X << C) --> X * (-1 << C)
This shows some regressions in tests, but they are all around GEP's,
so i'm not really sure how important those are.

https://rise4fun.com/Alive/1Gn
2020-08-05 03:13:14 +03:00
Krzysztof Parzyszek 06d425737b [RDF] Add operator<<(raw_ostream&, RegisterAggr), NFC 2020-08-04 18:40:07 -05:00
Krzysztof Parzyszek 9521704553 [RDF] Use hash-based containers, cache extra information
This improves performance.
2020-08-04 18:36:49 -05:00
Yonghong Song 00602ee7ef BPF: simplify IR generation for __builtin_btf_type_id()
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), previously IR builtin
looks like
   if (obj is a lvalue)
     llvm.bpf.btf.type.id(obj.ptr, 1, flag)  !type
   else
     llvm.bpf.btf.type.id(obj, 0, flag)  !type
The purpose of the 2nd argument is to differentiate
   __builtin_btf_type_id(obj, flag) where obj is a lvalue
vs.
   __builtin_btf_type_id(obj.ptr, flag)

Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
  - obj is the same .e.g., nullptr
  - flag is the same
  - metadata type is different, e.g., typedef of struct "s"
    and strust "s".
In the above, we don't want CSE since their metadata is different.

This patch change IR builtin to
   llvm.bpf.btf.type.id(seq_num, flag)  !type
and seq_num is always increasing. This will prevent potential
llvm CSE.

Also report an error if the type name is empty for
remote relocation since remote relocation needs non-empty
type name to do relocation against vmlinux.

Differential Revision: https://reviews.llvm.org/D85174
2020-08-04 16:29:42 -07:00
Krzysztof Parzyszek 4b25f67299 [RDF] Really remove remaining uses of PhysicalRegisterInfo::normalize 2020-08-04 18:23:38 -05:00
Krzysztof Parzyszek f0f467aeec [RDF] Cache register aliases in PhysicalRegisterInfo
This improves performance of PhysicalRegisterInfo::makeRegRef.
2020-08-04 18:10:00 -05:00
Krzysztof Parzyszek 47fe1b63f4 [RDF] Lower the sorting complexity in RDFLiveness::getAllReachingDefs
The sorting is needed, because reaching defs are (logically) ordered,
but are not collected in that order. This change will break up the
single call to std::sort into a series of smaller sorts, each of which
should use a cheaper comparison function than the original.
2020-08-04 18:06:37 -05:00
Adrian Prantl bf82ff61a6 Teach SROA to handle allocas with more than one dbg.declare.
It is technically legal for optimizations to create an alloca that is
used by more than one dbg.declare, if one or both of them are inlined
instances of aliasing variables.

Differential Revision: https://reviews.llvm.org/D85172
2020-08-04 15:54:51 -07:00
Arthur Eubanks f50b3ff02e [Hexagon] Use InstSimplify instead of ConstantProp
This is the last remaining use of ConstantProp, migrate it to InstSimplify in the goal of removing ConstantProp.

Add -hexagon-instsimplify option to enable skipping of instsimplify in
tests that can't handle the extra optimization.

Differential Revision: https://reviews.llvm.org/D85047
2020-08-04 15:42:39 -07:00
Eli Friedman 4a47f1c4ce [SelectionDAG][SVE] Support scalable vectors in getConstantFP()
Differential Revision: https://reviews.llvm.org/D85249
2020-08-04 15:32:43 -07:00
Krzysztof Parzyszek 09897b146a [RDF] Remove uses of RDFRegisters::normalize (deprecate)
This function has been reduced to an identity function for some time.
2020-08-04 17:02:12 -05:00
Matt Arsenault 486e84dfa4 AMDGPU/GlobalISel: Use live in helper function for returnaddress 2020-08-04 17:36:01 -04:00
Mircea Trofin 65b6dbf939 [llvm][NFC] Moved implementation of TrainingLogger outside of its decl
Also renamed a method - printTensor - to print; and added comments.
2020-08-04 14:35:35 -07:00
Matt Arsenault 89011fc3c9 AMDGPU/GlobalISel: Select llvm.returnaddress 2020-08-04 17:14:38 -04:00
Matt Arsenault f8fb7835d6 GlobalISel: Add utilty for getting function argument live ins
Get the argument register and ensure there's a copy to the virtual
register. AMDGPU and AArch64 have similarish code to get the livein
value, and I also want to use this in multiple places.

This is a bit more aggressive about setting the register class than
the original function, but that's probably OK.

I think we're missing a few verifier checks for function live ins. I
noticed AArch64's calling convention code is not actually adding
liveins to functions, only the entry block (which apparently might not
matter that much?). There should probably be a verifier check that
entry block live ins are also live into the function. We also might
need a verifier check that the copy to the livein virtual register is
in the entry block.
2020-08-04 16:55:55 -04:00
Eli Friedman 95efea4b93 [AArch64][SVE] Widen narrow sdiv/udiv operations.
The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit
integers.  If we see an 8-bit or 16-bit divide, widen the operands to 32
bits, and narrow the result.

Differential Revision: https://reviews.llvm.org/D85170
2020-08-04 13:22:15 -07:00
Ilya Leoshkevich 153df1373e [SanitizerCoverage] Fix types of __stop* and __start* symbols
If a section is supposed to hold elements of type T, then the
corresponding CreateSecStartEnd()'s Ty parameter represents T*.
Forwarding it to GlobalVariable constructor causes the resulting
GlobalVariable's type to be T*, and its SSA value type to be T**, which
is one indirection too many. This issue is mostly masked by pointer
casts, however, the global variable still gets an incorrect alignment,
which causes SystemZ to choose wrong instructions to access the
section.
2020-08-04 21:53:27 +02:00
Cameron McInally 0f2b47b6da [FastISel] Don't transform FSUB(-0, X) -> FNEG(X) in FastISel
This corresponds with the SelectionDAGISel change in D84056.

Also, rename some poorly named tests in CodeGen/X86/fast-isel-fneg.ll with NFC.

Differential Revision: https://reviews.llvm.org/D85149
2020-08-04 14:42:53 -05:00
Yonghong Song 6d218b4adb BPF: support type exist/size and enum exist/value relocations
Four new CO-RE relocations are introduced:
  - TYPE_EXISTENCE: whether a typedef/record/enum type exists
  - TYPE_SIZE: the size of a typedef/record/enum type
  - ENUM_VALUE_EXISTENCE: whether an enum value of an enum type exists
  - ENUM_VALUE: the enum value of an enum type

These additional relocations will make CO-RE bpf programs
more adaptive for potential kernel internal data structure
changes.

Differential Revision: https://reviews.llvm.org/D83878
2020-08-04 12:35:39 -07:00
Matt Arsenault 3e16e2152c GlobalISel: Handle llvm.localescape
This one is pretty easy and shrinks the list of unhandled
intrinsics. I'm not sure how relevant the insert point is. Using the
insert position of EntryBuilder will place this after
constants. SelectionDAG seems to end up emitting these after argument
copies and before anything else, but I don't think it really
matters. This also ends up emitting these in the opposite order from
SelectionDAG, but I don't think that matters either.

This also needs a fix to stop the later passes dropping this as a dead
instruction. DeadMachineInstructionElim's version of isDead special
cases LOCAL_ESCAPE for some reason, and I'm not sure why it's excluded
from MachineInstr::isLabel (or why isDead doesn't check it).

I also noticed DeadMachineInstructionElim never considers inline asm
as dead, but GlobalISel will drop asm with no constraints.
2020-08-04 15:19:02 -04:00
Bardia Mahjour 3c0f347002 [NFC][LV] Vectorized Loop Skeleton Refactoring
This patch tries to improve readability and maintenance
of createVectorizedLoopSkeleton by reorganizing some lines,
updating some of the comments and breaking it up into
smaller logical units.

Reviewed By: pjeeva01

Differential Revision: https://reviews.llvm.org/D83824
2020-08-04 14:50:57 -04:00
Xavier Denis 29fe3fe615 [InstSimplify] Peephole optimization for icmp (urem X, Y), X
This revision adds the following peephole optimization
and it's negation:

    %a = urem i64 %x, %y
    %b = icmp ule i64 %a, %x
    ====>
    %b = true

With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.

This pattern occurs in the bound checks of Rust code, the program

    const N: usize = 3;
    const T = u8;

    pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
        let len = slice.len() / N;
        slice.split_at(len * N)
    }

the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.

ref: https://github.com/rust-lang/rust/issues/74938

Differential Revision: https://reviews.llvm.org/D85092
2020-08-04 20:48:37 +02:00
Nikita Popov 4564974504 [SCCP] Propagate inequalities
Teach SCCP to create notconstant lattice values from inequality
assumes and nonnull metadata, and update getConstant() to make
use of them. Additionally isOverdefined() needs to be changed to
consider notconstant an overdefined value.

Handling inequality branches is delayed until our branch on undef
story in other passes has been improved.

Differential Revision: https://reviews.llvm.org/D83643
2020-08-04 20:20:52 +02:00
David Blaikie e31cfc4cd3 Fix -Wconstant-conversion warning with explicit cast
Introduced by fd6584a220

Following similar use of casts in AsmParser.cpp, for instance - ideally
this type would use unsigned chars as they're more representative of raw
data and don't get confused around implementation defined choices of
char's signedness, but this is what it is & the signed/unsigned
conversions are (so far as I understand) safe/bit preserving in this
usage and what's intended, given the API design here.
2020-08-04 10:41:27 -07:00
Xing GUO 12605bfd1f [DWARFYAML] Fix unintialized value Is64BitAddrSize. NFC.
This patch fixes the undefined behavior that reported by ubsan.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/44524/
2020-08-05 00:28:17 +08:00
Matt Arsenault 0de547ed4a AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES
Fixes verifier error with SGPR unmerges with 96-bit result types.
2020-08-04 12:27:34 -04:00
Cameron McInally 23adbac9ee [GlobalISel] Don't transform FSUB(-0, X) -> FNEG(X) in GlobalISel.
This patch stops unconditionally transforming FSUB(-0, X) into an FNEG(X) while building the MIR.

This corresponds with the SelectionDAGISel change in D84056.

Differential Revision: https://reviews.llvm.org/D85139
2020-08-04 11:27:09 -05:00
Sanjay Patel a16882047a [InstSimplify] refactor min/max folds with shared operand; NFC 2020-08-04 12:21:05 -04:00
Fangrui Song 593e196297 [llvm-symbolizer] Switch command line parsing from llvm::cl to OptTable
for the advantage outlined by D83639 ([OptTable] Support grouped short options)

Some behavior changes:

* -i={0,false} is removed. Use --no-inlines instead.
* --demangle={0,false} is removed. Use --no-demangle instead
* -untag-addresses={0,false} is removed. Use --no-untag-addresses instead

Added a higher level API OptTable::parseArgs which handles optional
initial options populated from an environment variable, expands response
files recursively, and parses options.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D83530
2020-08-04 08:53:15 -07:00
Nemanja Ivanovic 14d726acd6 [PowerPC] Don't remove single swap between the load and store
The swap removal pass looks to remove swaps when a loaded value is swapped, some
number of lane-insensitive operations are performed and then the value is
swapped again and stored.

However, in a situation where we load the value, swap it and then store it
without swapping again, the pass erroneously removes the single swap. The
reason is that both checks in the same equivalence class:

- load feeds a swap
- swap feeds a store

pass. However, there is no check that the two swaps are actually a single swap.
This patch just fixes that.

Differential revision: https://reviews.llvm.org/D84785
2020-08-04 10:38:15 -05:00
Jay Foad 28e322ea93 [PowerPC] Custom lowering for funnel shifts
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.

Differential Revision: https://reviews.llvm.org/D83948
2020-08-04 16:30:49 +01:00
Jay Foad 8ec8ad868d [AMDGPU] Use fma for lowering frem
This gives shorter f64 code and perhaps better accuracy.

Differential Revision: https://reviews.llvm.org/D84516
2020-08-04 16:18:23 +01:00
Simon Pilgrim 6f0da46d53 [X86] getFauxShuffleMask - drop unnecessary computeKnownBits OR(X,Y) shuffle decoding.
Now that rG47cea9e82dda941e lets us aggressively decode multi-use shuffles for the OR(SHUFFLE(),SHUFFLE()) case we don't need the computeKnownBits variant any more.
2020-08-04 15:57:47 +01:00
Nemanja Ivanovic 62a933b72c [Support][PPC] Fix bot failures due to cd53ded557
Commit https://reviews.llvm.org/rGcd53ded557c3 attempts to fix the
computation in computeHostNumPhysicalCores() to respect Affinity.
However, the GLIBC wrapper of the affinity system call fails with
a default size of cpu_set_t on systems that have more than 1024 CPUs.
This just fixes the computation on such large machines.
2020-08-04 09:00:49 -05:00
Simon Pilgrim 051f293b78 [X86] Remove unused canScaleShuffleElements helper
The only use was removed at rG36750ba5bd0e9e72

Thanks to @nemanjai for the heads up
2020-08-04 14:51:23 +01:00
Simon Pilgrim 36750ba5bd [X86][AVX] isHorizontalBinOp - relax lane-crossing limits for AVX1-only targets.
Permit lane-crossing post shuffles on AVX1 targets as long as every element comes from the same source lane, which for v8f32/v4f64 cases can be efficiently lowered with the LowerShuffleAsLanePermuteAnd* style methods.
2020-08-04 14:27:01 +01:00
Sanjay Patel 04e45ae1c6 [InstSimplify] fold nested min/max intrinsics with constant operands
This is based on the existing code for the non-intrinsic idioms
in InstCombine.

The vector constant constraint is non-obvious: undefs should be
ok in the outer call, but they can't propagate safely from the
inner call in all cases. Example:

https://alive2.llvm.org/ce/z/-2bVbM
  define <2 x i8> @src(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    %m2 = umin <2 x i8> { 9, 9 }, %m
    ret <2 x i8> %m2
  }
  =>
  define <2 x i8> @tgt(<2 x i8> %x) {
  %0:
    %m = umin <2 x i8> %x, { 7, undef }
    ret <2 x i8> %m
  }
  Transformation doesn't verify!
  ERROR: Value mismatch

  Example:
  <2 x i8> %x = < undef, undef >

  Source:
  <2 x i8> %m = < #x00 (0)	[based on undef value], #x00 (0) >
  <2 x i8> %m2 = < #x00 (0), #x00 (0) >

  Target:
  <2 x i8> %m = < #x07 (7), #x10 (16) >
  Source value: < #x00 (0), #x00 (0) >
  Target value: < #x07 (7), #x10 (16) >
2020-08-04 08:44:48 -04:00
Sanjay Patel 20c71e55aa [InstSimplify] reduce code for min/max analysis; NFC
This should probably be moved up to some common area eventually
when there's another user.
2020-08-04 08:02:33 -04:00
Sander de Smalen bb3344c7d8 [AArch64][SVE] Add missing unwind info for SVE registers.
This patch adds a CFI entry for each SVE callee saved register
that needs unwind info at an offset from the CFA. The offset is
a DWARF expression because the offset is partly scalable.

The CFI entries only cover a subset of the SVE callee-saves and
only encodes the lower 64-bits, thus implementing the lowest
common denominator ABI. Existing unwinders may support VG but
only restore the lower 64-bits.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84044
2020-08-04 11:47:06 +01:00
Sander de Smalen fd6584a220 [AArch64][SVE] Fix CFA calculation in presence of SVE objects.
The CFA is calculated as (SP/FP + offset), but when there are
SVE objects on the stack the SP offset is partly scalable and
should instead be expressed as the DWARF expression:

     SP + offset + scalable_offset * VG

where VG is the Vector Granule register, containing the
number of 64bits 'granules' in a scalable vector.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84043
2020-08-04 11:47:06 +01:00
Paul Walker 4be13b15d6 [SVE] Replace remaining _MERGE_OP1 nodes with _PRED variants.
This is the final bit of work to relax the register allocation
requirements when code generating normal LLVM IR, which rarely
care about the result of inactive lanes. By using _PRED nodes
we can make better use of SVE's reversed instructions.

Also removes a redundant parameter from the min/max tests.

Differential Revision: https://reviews.llvm.org/D85142
2020-08-04 11:19:17 +01:00
Juneyoung Lee e734e8286b [JumpThreading] Remove cast's constraint
As discussed in D84949, this removes the constraint to cast since it does not
cause compile time degradation.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D85188
2020-08-04 19:09:25 +09:00
David Green 3c7e7d40a9 [BasicAA] Enable -basic-aa-recphi by default
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.

Differential Revision: https://reviews.llvm.org/D82998
2020-08-04 10:43:42 +01:00
Meera Nakrani 20283ff491 [ARM] Generated SSAT and USAT instructions with shift
Added patterns so that both SSAT and USAT instructions are generated with shifts. Added corresponding regression tests.

Differential Review: https://reviews.llvm.org/D85120
2020-08-04 09:38:17 +00:00
Simon Pilgrim 47cea9e82d Revert rG66e7dce714fab "Revert "[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero.""
[X86][SSE] Shuffle combine blends to OR(X,Y) if the relevant elements are known zero (REAPPLIED)

This allows us to remove the (depth violating) code in getFauxShuffleMask where we were combining the OR(SHUFFLE,SHUFFLE) shuffle inputs as well, and not just the OR().

This is a minor step toward being able to shuffle combine from/to SELECT/BLENDV as a faux shuffle.

Reapplied with fixed signed/unsigned comparisons.
2020-08-04 10:32:39 +01:00
Florian Hahn f7658241cb [AArch64] Consider instruction-level contract FMFs in combiner patterns.
Currently, instruction level fast math flags are not considered when
generating patterns for the machine combiner.

This currently leads to some missed opportunities to generate FMAs in
combination with `#pragma clang fp contract (fast)`.

For example, when building the example below with -O3 for AArch64, no
FMADD is generated. If built with -O2 and the DAGCombiner is used
instead of the MachineCombiner for FMAs, an FMADD is generated.

With this patch, the same code is generated in both cases.

    float madd_contract(float a, float b, float c) {
    #pragma clang fp contract (fast)
      return (a * b) + c;
    }

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84930
2020-08-04 10:25:16 +01:00
Qiu Chaofan 6a78a8dd37 [NFC] [PowerPC] Refactor fp/int conversion lowering
For FP_TO_INT and INT_TO_FP lowering, we have direct-move and
non-direct-move methods. But they share some conversion logic, so we can
reduce redundant code by introducing new methods.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81818
2020-08-04 15:48:16 +08:00
Juneyoung Lee 6f97103b56 [JumpThreading] Don't limit the type of an operand
Compared to the optimized code with branch conditions never frozen,
limiting the type of freeze's operand causes generation of suboptimal code in
some cases.
I would like to suggest removing the constraint, as this patch does.
If the number of freeze instructions becomes significant, this can be revisited.

Differential Revision: https://reviews.llvm.org/D84949
2020-08-04 16:21:58 +09:00
Fangrui Song b959906cb9 [PGO] Use multiple comdat groups for COFF
D84723 caused multiple definition issues (related to comdat) on Windows:
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/67465
2020-08-03 21:33:16 -07:00
Wang, Pengfei 6bc7ea2d8d [X86][AVX512] Fix build fail after D81548
Test function mask_cmp_128 failed during ISEL
LLVM ERROR: Cannot select: t37: v8i1 = X86ISD::KSHIFTL t48, TargetConstant:i8<4>
due to v8i1 only available under AVX512DQ.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D84922
2020-08-04 12:31:04 +08:00
Chen Zheng 45c46d180e [PowerPC] mark r+i as legal address mode for vector type after pwr9
Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D84735
2020-08-04 00:02:37 -04:00
Carl Ritson 57899934ea [AMDGPU] Make GCNRegBankReassign assign based on subreg banks
When scavenging consider the sub-register of the source operand
to determine the bank of a candidate register (not just sub0).
Without this it is possible to introduce an infinite loop,
e.g. $sgpr15_sgpr16_sgpr17 can be assigned for a conflict between
$sgpr0 and SGPR_96:sub1.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84910
2020-08-04 12:54:44 +09:00
Fangrui Song e56626e438 [PGO] Move __profc_ and __profvp_ from their own comdat groups to __profd_'s comdat group
D68041 placed `__profc_`,  `__profd_` and (if exists) `__profvp_` in different comdat groups.
There are some issues:

* Cost: one or two additional section headers (`.group` section(s)): 64 or 128 bytes on ELF64.
* `__profc_`,  `__profd_` and (if exists) `__profvp_` should be retained or
  discarded. Placing them into separate comdat groups is conceptually inferior.
* If the prevailing group does not include `__profvp_` (value profiling not
  used) but a non-prevailing group from another translation unit has `__profvp_`
  (the function is inlined into another and triggers value profiling), there
  will be a stray `__profvp_` if --gc-sections is not enabled.
  This has been fixed by 3d6f53018f.

Actually, we can reuse an existing symbol (we choose `__profd_`) as the group
signature to avoid a string in the string table (the sole reason that D68041
could improve code size is that `__profv_` was an otherwise unused symbol which
wasted string table space). This saves one or two section headers.

For a -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_INSTRUMENTED=IR build, `ninja
clang lld`, the patch has saved 10.5MiB (2.2%) for the total .o size.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D84723
2020-08-03 20:35:50 -07:00
Max Kazantsev 7647c2716e [SimpleLoopUnswitch][NFC] Add option to always drop make.implicit metadata in non-trivial unswitching and save compile time
We might want this if we find out that using of MustExecute analysis is too expensive.
By default we do the analysis because its complexity does not exceed the complexity
of whole loop copying in unswitching. Follow-up for D84925.

Differential Revision: https://reviews.llvm.org/D85001
Reviewed By: asbirlea
2020-08-04 10:16:40 +07:00