Commit Graph

69776 Commits

Author SHA1 Message Date
Evgenii Stepanov 2a3723ef11 [memtag] Plug in stack safety analysis.
Summary:
Run StackSafetyAnalysis at the end of the IR pipeline and annotate
proven safe allocas with !stack-safe metadata. Do not instrument such
allocas in the AArch64StackTagging pass.

Reviewers: pcc, vitalybuka, ostannard

Reviewed By: vitalybuka

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73513
2020-03-16 16:35:25 -07:00
Sriraman Tallam df082ac45a Basic Block Sections support in LLVM.
This is the second patch in a series of patches to enable basic block
sections support.

This patch adds support for:

* Creating direct jumps at the end of basic blocks that have fall
through instructions.
* New pass, bbsections-prepare, that analyzes placement of basic blocks
in sections.
* Actual placing of a basic block in a unique section with special
handling of exception handling blocks.
* Supports placing a subset of basic blocks in a unique section.
* Support for MIR serialization and deserialization with basic block
sections.

Parent patch : D68063
Differential Revision: https://reviews.llvm.org/D73674
2020-03-16 16:06:54 -07:00
Craig Topper 378b1e6080 [X86] Assign avx512bf16 instructions to the SSEPackedSingle ExeDomain. 2020-03-16 14:07:01 -07:00
Nico Weber 623cb95eb3 Revert "[InstSimplify] Simplify calls with "returned" attribute"
This reverts commit 45555c3819.
Causes clang crashes in some causes, see comments on
https://reviews.llvm.org/D75815 for details (including
repro steps).
2020-03-16 15:21:30 -04:00
Francesco Petrogalli 0f2b68d9c7 Implement IR intrinsics for gather prefetch.
Summary:
Intrinsics and relative codegen has been implemented for the following
SVE instructions:

1. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>] -> 32-bit          scaled offset
2. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>] -> 32-bit unpacked scaled offset
3. PRF<T> <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]        -> 64-bit          scaled offset
4. PRF<T> <prfop>, <Pg>, [<Zn>.S{, #<imm>}]       -> 32-bit element
5. PRF<T> <prfop>, <Pg>, [<Zn>.D{, #<imm>}]       -> 64-bit element

The instructions are associated the following intrinsics, respectively:

1. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx4vi32(
          i8* %base,
          <vscale x 4 x i32> %offset,
          <vscale x 4 x i1> %Pg,
          i32 %prfop)

2. void @llvm.aarch64.sve.gather.prf<T>.scaled.<mod>.nx2vi32(
          i8* %base,
          <vscale x 2 x i32> %offset,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

3. void @llvm.aarch64.sve.gather.prf<T>.scaled.nx2vi64(
          i8* %base,
          <vscale x 2 x i64> %offset,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

4. void @llvm.aarch64.sve.gather.prf<T>.nx4vi32(
          <vscale x 4 x i32> %bases,
          i64 %imm,
          <vscale x 4 x i1> %Pg,
          i32 %prfop)

5. void @llvm.aarch64.sve.gather.prf<T>.nx2vi64(
          <vscale x 2 x i64> %bases,
          i64 %imm,
          <vscale x 2 x i1> %Pg,
          i32 %prfop)

The intrinsics are the IR counterpart of the following SVE ACLE functions:

* void svprf<T>(svbool_t pg, const void *base, svprfop op)
* void svprf<T>_vnum(svbool_t pg, const void *base, int64_t vnum, svprfop op)
* void svprf<T>_gather[_u32base](svbool_t pg, svuint32_t bases, svprfop op)
* void svprf<T>_gather[_u64base](svbool_t pg, svuint64_t bases, svprfop op)
* void svprf<T>_gather_[s32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[u32]offset(svbool_t pg, const void *base, svint32_t offsets, svprfop op)
* void svprf<T>_gather_[s64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather_[u64]offset(svbool_t pg, const void *base, svint64_t offsets, svprfop op)
* void svprf<T>_gather[_u32base]_offset(svbool_t pg, svuint32_t bases, int64_t offset, svprfop op)
* void svprf<T>_gather[_u64base]_offset(svbool_t pg, svuint64_t bases,int64_t offset, svprfop op)

Reviewers: andwar, sdesmalen, efriedma, rengolin

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75580
2020-03-16 18:52:35 +00:00
Huihui Zhang 0616e9964b [InstSimplify][SVE] Fix SimplifyGEPInst for scalable vector.
Summary:
Skip folds that rely on DataLayout::getTypeAllocSize(). For scalable
vector, only minimal type alloc size is known at compile-time.

Reviewers: sdesmalen, efriedma, spatel, apazos

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75892
2020-03-16 11:46:12 -07:00
Matt Arsenault b0bdb186f5 Utils: Always set alignment when expanding mem intrinsics
This was creating natural aligned loads and stores, which may not be
the case. The target could request a wider type load with less
alignment.
2020-03-16 14:34:29 -04:00
Nico Weber 9e48422035 Revert "[llvm-objdump] Display locations of variables alongside disassembly"
Makes tests fail on Windows, see https://reviews.llvm.org/D70720#1924542

This reverts commit 3a5ddedadb, and
follow-ups:
f4cb9c919e
042eb0482a
c0cf5f5da9
18649f4813
f62b898c1f
2020-03-16 14:04:25 -04:00
Simon Pilgrim ebb181cf40 [X86] matchScalarReduction - add support for partial reductions
Add optional support for opt-in partial reduction cases by providing an optional partial mask to indicate which elements have been extracted for the scalar reduction.
2020-03-16 18:01:02 +00:00
Matt Arsenault 80b627d69d AMDGPU/GlobalISel: Fix handling of G_ANYEXT with s1 source
We were letting G_ANYEXT with a vcc register bank through, which was
incorrect and would select to an invalid copy. Fix this up like G_ZEXT
and G_SEXT. Also drop old code to fixup the non-boolean case in
RegBankSelect. We now have to perform that expansion during selection,
so there's no benefit to doing it during RegBankSelect.
2020-03-16 12:59:54 -04:00
Matt Arsenault c460dc6eeb AMDGPU/GlobalISel: Fix some illegal scalar argument types
Fixes integers that don't evenly divide to i32 pieces. We should
probably extract some of the code in the legalizer to start handling
argument breakdowns. I'm dissatisfied with the argument lowering's
handling of vectors for example, and we should not be producing the
weird G_EXTRACTs we do now.
2020-03-16 12:51:23 -04:00
Matt Arsenault 84386b2d8a AMDGPU: Drop special case f64 fround lowering
The result is better if ftrunc is emitted and separately legalized
when unavailable.
2020-03-16 12:09:30 -04:00
Matt Arsenault 19a0350187 GlobalISel: Fix round lowering
I used the implementation for floor instead of round. It also turns
out the OpenCL builtin library wasn't using the round builtin, but
implemented the expanded form.
2020-03-16 11:37:30 -04:00
Dominik Montada 8ff2dcb18b [GlobalISel] add additional lowering support for G_INSERT
Summary: Add lowering support for inserting pointers or scalars into scalars, vectors or pointers

Reviewers: arsenm, dsanders

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75994
2020-03-16 16:27:17 +01:00
Guillaume Chatelet 4efec6e1c0 Revert "Disable memcpy-inline-fails.ll for windows"
This reverts commit adc2e250a1.
2020-03-16 16:03:39 +01:00
Matt Arsenault 57d896e838 AMDGPU/GlobalISel: Make some large merges legal
We allow up to 1024-bit registers, so we should support merges all the
way to the maximum.
2020-03-16 10:49:10 -04:00
Fangrui Song 536ba6373f [Object] Change ELFObjectFile<ELFT>::getFileFormatName() to use BFD names
Follow-up for D74433

What the function returns are almost standard BFD names, except that "ELF" is
in uppercase instead of lowercase.

This patch changes "ELF" to "elf" and changes ARM/AArch64 to use their BFD names.
MIPS and PPC64 have endianness differences as well, but this patch does not intend to address them.

Advantages:

* llvm-objdump: the "file format " line matches GNU objdump on ARM/AArch64 objects
* "file format " line can be extracted and fed into llvm-objcopy -O literally.
  (https://github.com/ClangBuiltLinux/linux/issues/779 has such a use case)

Affected tools: llvm-readobj, llvm-objdump, llvm-dwarfdump, MCJIT (internal implementation detail, not exposed)

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D76046
2020-03-16 07:42:04 -07:00
Simon Pilgrim 2b3b453a82 [TargetLowering] Only demand a funnelshift's modulo amount bits
ISD::FSHL/FSHR shift amount values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.
2020-03-16 13:52:17 +00:00
Dominik Montada c0241f150d [GlobalISel] combine G_TRUNC with G_MERGE_VALUES
Summary:
Truncating the result of a merge means that most likely we could have done without merge in the first place and just used the input merge inputs directly. This can be done in three cases:

1. If the truncation result is smaller than the merge source, we can use the source in the trunc directly
2. If the sizes are the same, we can replace the register or use a copy
3. If the truncation size is a multiple of the merge source size, we can build a smaller merge

This gets rid of most of the larger, hard-to-legalize merges.

Reviewers: qcolombet, aditya_nandakumar, aemerson, paquette, arsenm, Petar.Avramovic

Reviewed By: arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75915
2020-03-16 14:42:01 +01:00
Juneyoung Lee 7aecf2323c [ExpandMemCmp] Correctly set alignment of generated loads
Summary:
This is a part of the series of efforts for correcting alignment of memory operations.
(Another related bugs: https://bugs.llvm.org/show_bug.cgi?id=44388 , https://bugs.llvm.org/show_bug.cgi?id=44543 )

This fixes https://bugs.llvm.org/show_bug.cgi?id=43880 by giving default alignment of loads to 1.

The test CodeGen/AArch64/bcmp-inline-small.ll should have been changed; it was introduced by https://reviews.llvm.org/D64805 . I talked with @evandro, and confirmed that the test is okay to be changed.
Other two tests from PowerPC needed changes as well, but fixes were straightforward.

Reviewers: courbet

Reviewed By: courbet

Subscribers: nlopes, gchatelet, wuzish, nemanjai, kristof.beyls, hiraditya, steven.zhang, danielkiss, llvm-commits, evandro

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76113
2020-03-16 22:39:48 +09:00
Juneyoung Lee acdcd23b7b Add tests to ExpandMemCmp/X86/memcmp.ll before submitting D76113 2020-03-16 22:19:37 +09:00
Guillaume Chatelet adc2e250a1 Disable memcpy-inline-fails.ll for windows 2020-03-16 14:10:08 +01:00
Georgii Rymar 46c34447f8 [yaml2obj][test] - Fix comments in ELF/program-header-address.yaml test. NFC.
This addresses post-commit comments suggested by James Henderson for D76131.
2020-03-16 16:07:10 +03:00
Oliver Stannard f4cb9c919e Disable llvm-objdump --debug-vars tests on Windows
These tests passed in my Windows 10 VM, but are failing on Windows bots
with errors which look related to unicode encodings. Disable the tests
on Windows for now.
2020-03-16 12:50:01 +00:00
Jonas Paulsson 132f25bcca [SystemZ] Avoid scalarization of [SU]INT_TO_FP ISD-nodes.
The type legalizer will scalarize vector conversions from integer to floating
point if the source element size is less than that of the result.

This is avoided now by inserting a zero/sign-extension of the source vector
before type legalization.

Review: Ulrich Weigand

Differential revision: https://reviews.llvm.org/D75978
2020-03-16 13:07:42 +01:00
Oliver Stannard 2878c66938 Don't run PowerPC objdump tests when PowerPC backend not built 2020-03-16 12:00:33 +00:00
Oliver Stannard 161f70eae6 Don't run ARM objdump tests when ARM backend not built 2020-03-16 11:24:19 +00:00
David Stenberg 02b6a3c349 [DebugInfo] Handle generic type DW_OP_convert ops in dsymutil
Summary:
This is a preparatory change for allowing LLVM to emit DW_OP_convert
operations converting to the generic type.

If DW_OP_convert's operand is 0, it converts the top of the stack to the
generic type, as specified by DWARFv5 section 2.5.1.6:

"[...] takes one operand, which is an unsigned LEB128 integer that
 represents the offset of a debugging information entry in the current
 compilation unit, or value 0 which represents the generic type."

This adds support for such operations to dsymutil.

Reviewers: aprantl, markus, friss, JDevlieghere

Reviewed By: aprantl, JDevlieghere

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D76142
2020-03-16 12:16:37 +01:00
Oliver Stannard 3a5ddedadb [llvm-objdump] Display locations of variables alongside disassembly
This adds the --debug-vars option to llvm-objdump, which prints
locations (registers/memory) of source-level variables alongside the
disassembly based on DWARF info. A vertical line is printed for each
live-range, with a label at the top giving the variable name and
location, and the position and length of the line indicating the program
counter range in which it is valid.

Currently, this only works for object files, not executables or shared
libraries.

Differential revision: https://reviews.llvm.org/D70720
2020-03-16 10:54:40 +00:00
David Stenberg c93652517c [DebugInfo] Handle generic type DW_OP_convert ops in llvm-dwarfdump
Summary:
This is a preparatory change for allowing LLVM to emit DW_OP_convert
operations converting to the generic type.

If DW_OP_convert's operand is 0, it converts the top of stack to the
generic type, as specified by DWARFv5 section 2.5.1.6:

"[...] takes one operand, which is an unsigned LEB128 integer that
 represents the offset of a debugging information entry in the current
 compilation unit, or value 0 which represents the generic type."

This adds support for such operations to llvm-dwarfdump.

Reviewers: aprantl, markus, jdoerfert, jhenderson

Reviewed By: aprantl

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D76141
2020-03-16 11:24:01 +01:00
Rui Ueyama a2923b2a1e Implement CET Shadow Stack (Intel Controlflow Enforcement Technology) support on Windows
Patch by Petr Penzin.

Windows support for CET is limited to shadow stack, which is enabled
by setting a PE bit in the linker.

Docs:

MSVC linker flag:
https://docs.microsoft.com/en-us/cpp/build/reference/cetcompat?view=vs-2019

IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT PE bit:
https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#extended-dll-characteristics

Differential Revision: https://reviews.llvm.org/D70606
2020-03-16 17:51:32 +09:00
Georgii Rymar 2005c60a6b [obj2yaml][test] - Simplify call-graph-profile-section.yaml. NFCI.
Use yaml2obj -D to simplify.
We started to use it for other tests recently.
2020-03-16 11:47:10 +03:00
Shengchen Kan d2b522f173 [NFC][X86] Simplify test cases for branch align 2020-03-16 16:30:29 +08:00
Simon Atanasyan e0ab0e6a28 [MIPS] Implement PUL.PS and PUU.PS instructions
Patch by Michael Roe.

Differential Revision: https://reviews.llvm.org/D75812
2020-03-16 09:39:47 +03:00
Serguei Katkov ad643d5e93 [Verifier] Remove invalid verifier check
According to LangRef for unordered atomic memory transfer intrinsics
"The first three arguments are the same as they are in the @llvm.memcpy intrinsic, with the added constraint that
 len is required to be a positive integer multiple of the element_size. If len is not a positive integer multiple
 of element_size, then the behaviour of the intrinsic is undefined."

So the len is not multiple of element size is just an undefined behavior and verifier should not complain about that
as undefined behavior is allowed in LLVM IR.

This change removes the verifier check for this condition

Reviewers: reames
Reviewed By: reames
Subscribers: dantrushin, hiraditya, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D76116
2020-03-16 12:00:08 +07:00
Juneyoung Lee 6ad63606ea [CodeGenPrepare] Freeze condition when transforming select to br
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not frozen, instsimplify or the later pipeline can potentially exploit undefined behavior.

The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).

Reviewers: jdoerfert, spatel, lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76179
2020-03-16 12:46:20 +09:00
Juneyoung Lee 4ffe3ac729 Revert "[CodeGenPrepare] Freeze condition when transforming select to br"
This reverts commit 10aa7ea951.
2020-03-16 12:45:54 +09:00
Philip Reames a79863f2f7 Support prefix padding for alignment purposes (Relaxable instructions only)
Now that D75203 has landed and baked for a few days, extend the basic approach to prefix padding as well. The patch itself is fairly straight forward.

For the moment, this patch adds the functional support and some basic testing there of, but defaults to not enabling prefix padding. I want to be able to phrase a separate patch which adds the target specific reasoning and test it cleanly. I haven't decided whether I want to common it with the nop logic or not.

Differential Revision: https://reviews.llvm.org/D75300
2020-03-15 19:53:41 -07:00
QingShan Zhang f84beee9b8 [NFC][Test] Add three tests to verify the behavior of a*b-c*d if there is multi-uses 2020-03-16 01:58:49 +00:00
Fangrui Song ecd6d7254e [test] llvm/test/: change llvm-objdump single-dash long options to double-dash options
As announced here: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131786.html

Grouped option syntax (POSIX Utility Conventions) does not play well with -long-option
A subsequent change will reject -long-option.
2020-03-15 17:46:23 -07:00
Craig Topper b2da1ddaef [X86] Add a non-zero cost for truncating v32i16->v32i8 on avx512bw. 2020-03-15 17:18:46 -07:00
Fangrui Song 6ed18eaa77 [llvm-objdump][test] Change llvm-objdump tests to use double dash options 2020-03-15 16:01:26 -07:00
Fangrui Song b1cdada023 [llvm-objdump][test] Move {AArch64,ARM}/* to ELF/ARM/ or MachO/ARM/ and {AMDGPU,Hexagon,Mips,powerPC}/ to ELF/ 2020-03-15 15:18:33 -07:00
Fangrui Song d385133249 [llvm-objdump][test] Move {AArch64,X86}/macho-* to MachO/ 2020-03-15 15:05:12 -07:00
Matt Arsenault ce33926342 AMDGPU/GlobalISel: Remove -global-isel-abort=0 from some tests 2020-03-15 17:22:34 -04:00
Matt Arsenault fe6037172b AMDGPU/GlobalISel: Add more tests for G_SADDE/G_SSUBE
These don't work, but add baseline tests.
2020-03-15 16:54:40 -04:00
Matt Arsenault 79cda46e49 AMDGPU/GlobalISel: Add baseline test for mul 2020-03-15 14:53:51 -04:00
Matt Arsenault de5b2cfdd4 AMDGPU/GlobalISel: Add baseline test for mul 2020-03-15 14:50:36 -04:00
Simon Pilgrim 3ffb5ef7b0 [PowerPC] Regenerate rotate tests 2020-03-15 18:29:00 +00:00
Simon Pilgrim 1ec395523d [Thumb2] Regenerate rotate tests 2020-03-15 18:28:54 +00:00
Simon Pilgrim 775bf62698 [SystemZ] Regenerate rotate/shift tests 2020-03-15 16:42:46 +00:00
Simon Pilgrim 5641804298 [DAG] MatchRotate - Add funnel shift by variable support
Followup to D75114, this patch reuses the existing MatchRotate ROTL/ROTR rotation pattern code to also recognize the more general FSHL/FSHR funnel shift patterns when we have variable shift amounts, matched with MatchFunnelPosNeg which acts in an (almost) equivalent manner to MatchRotatePosNeg.
2020-03-15 11:50:45 +00:00
Florian Hahn 650f363bd7 [ValueLattice] Add singlecrfromundef lattice value.
This patch adds a new singlecrfromundef lattice value, indicating a
single element constant range which was merge with undef at some point.
Merging it with another constant range results in overdefined, as we
won't be able to replace all users with a single value.

This patch uses a ConstantRange instead of a Constant*, because regular
integer constants are represented as single element constant ranges as
well and this allows the existing code working without additional
changes.

Reviewers: efriedma, nikic, reames, davide

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75845
2020-03-15 11:23:46 +00:00
Juneyoung Lee 27f303924e Be more strict when checking existence of foo 2020-03-15 12:02:19 +09:00
Juneyoung Lee 10aa7ea951 [CodeGenPrepare] Freeze condition when transforming select to br
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not freezed, instsimplify or the later pipeline can potentially exploit undefined behavior.

The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).

Reviewers: jdoerfert, spatel, lebedev.ri, efriedma

Reviewed By: lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76179
2020-03-15 11:10:46 +09:00
Lang Hames 1e66710d39 [JITLink][AArch64] Fix incorrect capitalization in a testcase name. 2020-03-14 18:54:40 -07:00
Lang Hames 9c9eb60b4b [JITLink][MachO] Re-apply b64afadf30, MachO linker-private support, with fixes.
Global symbols with linker-private prefixes should be resolvable across object
boundaries, but internal symbols with linker-private prefixes should not.
2020-03-14 18:36:15 -07:00
Lang Hames a7d187d9c0 Revert "[JITLink][MachO] Treat linker private symbols as hidden rather than private."
This reverts commit b64afadf30.

Reverting while I investigate bot failures.
2020-03-14 16:52:25 -07:00
Craig Topper 1ffc507405 [X86] Add avx512f only command lines to the vector add/sub saturation tests. NFC
Gives us coverage of splitting the v32i16/v64i8 when we have
avx512f and not avx512bw.

Considering making v32i16/v64i8 a legal type on avx512f which
needs this test coverage.
2020-03-14 16:50:44 -07:00
Lang Hames b64afadf30 [JITLink][MachO] Treat linker private symbols as hidden rather than private.
Linker-private symbols should be resolvable across object file boundaries.
2020-03-14 16:33:15 -07:00
Florian Hahn b8b8f04c0d [ValueLattice] Go to overdefined in getRange() for full ranges.
This is was split off 4878aa36d4,
as it can go in separately.
2020-03-14 19:50:15 +00:00
Florian Hahn 4878aa36d4 [ValueLattice] Add new state for undef constants.
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.

The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces  the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.

Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.

Reviewers: efriedma, reames, davide, nikic

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75120
2020-03-14 17:19:59 +00:00
Georgii Rymar b236b4cb43 [yaml2obj] - Set a default value for `PAddr` property of a program header to a value of `VAddr`
`PAddr` corresponds to `p_paddr` of a program header, which is the segment's physical
address for systems in which physical addressing is relevant. `p_paddr` is often equal
to `p_vaddr`, which is the virtual address of a segment.

This patch changes the default for `PAddr` from 0 to a value of `VAddr`.

Differential revision: https://reviews.llvm.org/D76131
2020-03-14 17:44:57 +03:00
Martin Storsjö 97c7be9028 [llvm-dlltool] Add a testcase to show the kind of weak external used for import library aliases. NFC. 2020-03-14 14:00:36 +02:00
Shengchen Kan e6f1dd40bd [X86] Disable nop padding before instruction following a prefix
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: LuoYuanke

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76052
2020-03-14 13:15:30 +08:00
Diogo Sampaio 83cdb654e4 [AArch64][Fix] LdSt optimization generate premature stack-popping
Summary:
When moving add and sub to memory operand instructions,
aarch64-ldst-opt would prematurally pop the stack pointer,
before memory instructions that do access the stack using
indirect loads.
e.g.
```
int foo(int offset){
    int local[4] = {0};
    return local[offset];
}
```
would generate:
```
sub     sp, sp, #16            ; Push the stack
mov     x8, sp                 ; Save stack in register
stp     xzr, xzr, [sp], #16    ; Zero initialize stack, and post-increment, making it invalid
------ If an exception goes here, the stack value might be corrupted
ldr     w0, [x8, w0, sxtw #2]  ; Access correct position, but it is not guarded by SP
```

Reviewers: fhahn, foad, thegameg, eli.friedman, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, hiraditya, danielkiss, llvm-commits, simon_tatham

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75755
2020-03-14 02:03:10 +00:00
Craig Topper 755e00876c [X86] Remove isel patterns for X86VBroadcast+trunc+extload. Replace with DAG combines.
This is a little more complicated than I'd like it to be. We have
to manually match a trunc+srl+load pattern that generic DAG
combine won't do for us due to isTypeDesirableForOp.
2020-03-13 18:12:16 -07:00
Eli Friedman 65fc706ddf [SCEV] Add support for GEPs over scalable vectors.
Because we have to use a ConstantExpr at some point, the canonical form
isn't set in stone, but this seems reasonable.

The pretty sizeof(<vscale x 4 x i32>) dumping is a relic of ancient
LLVM; I didn't have to touch that code. :)

Differential Revision: https://reviews.llvm.org/D75887
2020-03-13 16:12:45 -07:00
Akira Hatanaka c6f1713c46 [ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't
a tail call

This reapplies the patch in https://reviews.llvm.org/rG1f5b471b8bf4,
which was reverted because it was causing crashes.

https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2

Check that HasSafePathToCall is true before checking the call is a tail
call.

Original commit message:

Previosly ARC optimizer removed the autoreleaseRV/retainRV pair in the
following code, which caused the object returned by @something to be
placed in the autorelease pool because the call to @something isn't a
tail call:

```
  %call = call i8* @something(...)
  %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call)
  %3 = call i8* @objc_autoreleaseReturnValue(i8* %2)
  ret i8* %3
```

Fix the bug by checking whether @something is a tail call.

rdar://problem/59275894
2020-03-13 13:52:14 -07:00
Stanislav Mekhanoshin c262b69dcc [AMDGPU] Fix endcf collapse
Only collapse inner endcf if the outer one belongs to SI_IF.
If it does belong to SI_ELSE then mask being restored in fact
a partial inverse of what we need.

Differential Revision: https://reviews.llvm.org/D76154
2020-03-13 13:50:21 -07:00
Martin Storsjö 8f540dad61 [COFF] Assign unique names to autogenerated .weak.<name>.default symbols
These symbols need to be external (MSVC tools error out if a weak
external points at a symbol that isn't external; this was tried before
but had to be reverted in bc5b7217dc,
and this was originally explicitly fixed in
732eeaf2a9).

If multiple object files have weak symbols with defaults, their
defaults could cause linker errors due to duplicate definitions,
unless the names of the defaults are unique.

GNU binutils handles this by appending the name of another symbol
from the same object file to the name of the default symbol. Try
to implement something similar; before writing the object file,
locate a symbol that should have a unique name and use the name of
that one for making the weak defaults unique.

Differential Revision: https://reviews.llvm.org/D75989
2020-03-13 22:44:55 +02:00
Matt Arsenault 015b640be4 AMDGPU: Add flag to used fixed function ABI
Pass all arguments to every function, rather than only passing the
minimum set of inputs needed for the call graph.
2020-03-13 13:27:05 -07:00
Alexey Zhikhartsev f71abec661 [LoopInterchange] Fix interchanging contents of preheader BBs
Summary:
Previously LCSSA was getting broken by placing instructions into the
(newly) inner *header* instead of the *pre*header.

Fixes PR43474

Reviewers: fhahn

Reviewed By: fhahn

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75943
2020-03-13 15:59:37 -04:00
Matt Arsenault bb8622094d AMDGPU: Don't handle kernarg.segment.ptr in functions
Just lower this to null. Pass implicitarg.ptr in its place in the
argument list.
2020-03-13 12:51:12 -07:00
Nico Weber f82b32a51e Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit 5aa5c943f7.
Causes clang to assert, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1061533#c4
for a repro.
2020-03-13 15:37:44 -04:00
Stanislav Mekhanoshin 32e90cbcd1 [AMDGPU] Disable endcf collapse
There are some functional regressions and I suspect our
scopes are not as perfectly enclosed as I expected.
Disable it for now.

Differential Revision: https://reviews.llvm.org/D76148
2020-03-13 12:33:22 -07:00
Reid Kleckner 478b06e687 Revert "[ObjC][ARC] Check the basic block size before calling DominatorTree::dominate"
This reverts commit 5c3117b0a9

This should not be necessary after
7593a480db, and Florian Hahn has confirmed
that the problem no longer reproduces with this patch.

I happened to notice this code because the FIXME talks about
OrderedBasicBlock.

Reviewed By: fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D76075
2020-03-13 11:57:55 -07:00
Simon Pilgrim 05c0d34918 [X86][SSE] Prefer trunc(movd(x)) to pextrb(x,0)
If we're extracting the 0'th index of a v16i8 vector we're better off using MOVD than PEXTRB, unless we're storing the value or we require the implicit zero extension of PEXTRB.

The biggest perf diff is on SLM targets where MOVD (uops=1, lat=3 tp=1) is notably faster than PEXTRB (uops=2, lat=5, tp=4).

This matches what we already do for PEXTRW.

Differential Revision: https://reviews.llvm.org/D76138
2020-03-13 18:43:04 +00:00
Sanjay Patel 89b19e8959 [SimplifyCFG] add test for chain of empty block conditional branches; NFC 2020-03-13 14:39:31 -04:00
Huihui Zhang fc1f205745 [SLPVectorizer][SVE] Bail out early for scalable vector.
Summary:
SLPVectorizer try to vectorize list of scalar instructions of the same type,
instructions already vectorized are rejected through isValidElementType().

Without this patch, tryToVectorizeList() will first try to determine vectorization
factor of a list of Instructions before checking whether each instruction has unsupported
type or not. For instructions already vectorized for SVE, it will crash at getVectorElementSize(),
where it try to return a fixed size.

This patch make sure invalid element types are rejected before trying to get vectorization
factor. This make sure we are not trying to vectorize instructions already vectorized.

Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76017
2020-03-13 11:23:31 -07:00
Sanjay Patel afc4dcee83 [SimplifyCFG] regenerate complete test checks; NFC 2020-03-13 14:12:28 -04:00
Sanjay Patel 7fe0e70ecc [SimplifyCFG] regenerate test checks; NFC 2020-03-13 14:12:28 -04:00
Florian Hahn e30c257811 [CVP,SCCP] Precommit test for D75055.
Test case for PR44949.
2020-03-13 17:53:39 +00:00
Philip Reames 1b86ad27a7 Use 15 byte long nops on modern Intel processors
Back in D42616, we switched our default nop length from 15 to 10 bytes because some platforms have painful decode stalls when encountering multiple instruction prefixes. (10 byte long nops come from the fact that prefixes are used to pad after 8 bytes, and some platforms have issues w/more than two prefixes.)

Based on Agner's guides, it appears to be the case that modern Intel (SandyBridge and later) can decode an arbitrary number of prefixes without issue. Intel's guide only provides up to 9 bytes; I read that as providing a safe default for all their chips. Older chips and Atom series have serious decode stalls. I can't find a conclusive reference beyond those two.

Differential Revision: https://reviews.llvm.org/D75945
2020-03-13 10:51:09 -07:00
Simon Cook a26bd4ec16 [TableGen] Support combining AssemblerPredicates with ORs
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).

AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.

To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.

This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html

Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.

At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it are unnecessary.

Differential Revision: https://reviews.llvm.org/D74338
2020-03-13 17:13:51 +00:00
Florian Hahn 0c5b6e2ea5 Recommit "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This patch should fix the cause of the stage2 failures and
PR45185.

This reverts the revert commit c52f839e72.
2020-03-13 17:03:22 +00:00
Simon Pilgrim a2db388dce [CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations 2020-03-13 16:51:13 +00:00
Simon Pilgrim ec3218dbee [X86] Add cttz/ctlz tests for i686 with CMOV target 2020-03-13 16:51:13 +00:00
Tyker 2543567c41 [AssumeBundles] filter usefull attriutes to preserve
Summary:
This patch will filter attributes to only preserve those that are usefull.
In the case of NoAlias it is filtered out not because it isn't usefull
but because it is incorrect to preserve it as it is only valdi for the
duration of the function.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: jdoerfert, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75828
2020-03-13 17:35:47 +01:00
Tyker 69375fd0a3 [AssumeBundles] Preserve Information in the inliner
Summary:
during inling Create and insert an llvm.assume with attributes to preserve them.
to prevent any changes for now generation of llvm.assume is under a flag disabled by default.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75825
2020-03-13 17:35:47 +01:00
omarahmed1111 b285b333dc [Attributor] Detect possibly unbounded cycles in functions
This patch add mayContainUnboundedCycle helper function which checks whether a function has any cycle which we don't know if it is bounded or not.
Loops with maximum trip count are considered bounded, any other cycle not.
It also contains some fixed tests and some added tests contain bounded and
unbounded loops and non-loop cycles.

Reviewed By: jdoerfert, uenoku, baziotis

Differential Revision: https://reviews.llvm.org/D74691
2020-03-13 11:17:33 -05:00
Pankaj Gode bf990530ae [Attributor] Improve noalias preservation using reachability
Resolution for below fixme:
(ii) Check whether the value is captured in the scope using AANoCapture.
FIXME: This is conservative though, it is better to look at CFG and
             check only uses possibly executed before this callsite.

Propagates caller argument's noalias attribute to callee.

Reviewed by: jdoerfert, uenoku

Reviewers: jdoerfert, sstefan1, uenoku

Subscribers: uenoku, sstefan1, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D71617
2020-03-13 21:09:08 +05:30
Fangrui Song 7b74b0d4e5 [llvm-objdump] --syms: print 'u' for STB_GNU_UNIQUE
GCC when configured with --enable-gnu-unique (default on glibc>=2.11)
emits STB_GNU_UNIQUE for certain objects which are otherwise emitted as
STT_OBJECT, such as an inline function's static local variable or its
guard variable, and a static data member of a template.

Clang does not implement -fgnu-unique.

Implementing it as a binding is strange and the feature itself is
considered by some as a misfeature.

Reviewed By: grimar, jhenderson

Differential Revision: https://reviews.llvm.org/D75797
2020-03-13 08:04:09 -07:00
Fangrui Song e799405e53 [llvm-objdump] --syms: print 'i' for STT_GNU_IFUNC
Reviewed By: grimar, Higuoxing, jhenderson

Differential Revision: https://reviews.llvm.org/D75793
2020-03-13 08:02:36 -07:00
Fangrui Song 0bd3da5bfa [llvm-objdump][test] Reorganize ELF --syms tests
Merge symbol-table-elf.test and common-symbol-elf.test, and add some
more tests (invalid st_type, STT_COMMON, STT_GNU_IFUNC, STT_HIOS, STT_LOPROC, SHN_UNDEF, SHN_ABS, SHN_COMMON, STB_GNU_UNIQUE, invalid binding, etc) to test/llvm-objdump/ELF/symbol-table.test

The naming follows test/llvm-{readobj,objcopy}/ELF .

Some discrepancy from GNU objdump:

* STT_COMMON: can be produced with `ld.bfd -r -z common`, but it almost never exists in practice
* STT_GNU_IFUNC: will be fixed by D75793
* STB_GNU_UNIQUE: will be fixed by D75797
* STT_TLS: GNU objdump does not print 'O'
* unknown binding: GNU objdump does not print 'g'. This probably does not matter.
* A reserved symbol index is displayed as *ABS* in GNU objdump. It is not clear what we should print.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D75796
2020-03-13 08:00:59 -07:00
Nico Weber 86eb2c3991 Revert "[ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't"
This reverts commit 1f5b471b8b.
Causes asserts when building code with arc. See
https://bugs.chromium.org/p/chromium/issues/detail?id=1061289#c2
for a full repro. Will post a creduced repro once creduce is done
running.
2020-03-13 10:16:02 -04:00
Clement Courbet ffe3515aa7 [ExpandMemCmp][NFC] Add more tests. 2020-03-13 15:06:52 +01:00
Andrzej Warzynski a0c15ed460 [AArch64][SVE] Add the @llvm.aarch64.sve.dup.x intrinsic
Summary:
This intrinsic implements the unpredicated duplication of scalar values
and is mapped to (through ISD::SPLAT_VECTOR):
  * DUP <Zd>.<T>, #<imm>
  * DUP <Zd>.<T>, <R><n|SP>

Reviewed by: sdesmalen

Differential Revision: https://reviews.llvm.org/D75900
2020-03-13 12:40:22 +00:00
David Green 2c6c169dbd [ARM] Optimise ASRL/LSRL to smaller shifts using demand bits.
The ASRL/LSRL long shifts are generated from 64bit shifts. Once we have
them, it might turn out that enough of the 64bit result was not required
that we can use a smaller shift to perform the same result. As the
smaller shift can in general be folded in more way, such as into add
instructions in one of the test cases here, we can use the demand bit
analysis to prefer the smaller shifts where we can.

Differential Revision: https://reviews.llvm.org/D75371
2020-03-13 10:09:03 +00:00
Georgii Rymar 6f3de2e53d [yaml2obj][obj2yaml][test] - Add base tests for relocation addends.
We had no test for `Addend` field of a relocation. Though the
current behavior is not ideal and might need to be fixed.

This patch adds 2 test cases to document the current
behavior and add a few FIXMEs. These FIXME are fixed in the
follow-up: https://reviews.llvm.org/D75527

Differential revision: https://reviews.llvm.org/D75528
2020-03-13 13:07:46 +03:00
David Green f67d93dc23 [ARM] Constant long shift combines
This changes the way that asrl and lsrl intrinsics are lowered, going
via a the ISEL ASRL and LSLL nodes instead of straight to machine nodes.
On top of that, it adds some constant folds for long shifts, in case it
turns out that the shift amount was either constant or 0.

Differential Revision: https://reviews.llvm.org/D75553
2020-03-13 08:54:59 +00:00
Juneyoung Lee c39cb1c0dd [CodeGenPrepare] Expand freeze conversion to support fcmp and icmp with null
Summary:
This is a simple patch that expands https://reviews.llvm.org/D75859 to pointer comparison and fcmp

Checked with Alive2

Reviewers: reames, jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76048
2020-03-13 17:21:33 +09:00
Juneyoung Lee 48b901b0e1 Add tests to Transforms/CodeGenPrepare/X86/freeze-cmp.ll before commiting D76048 2020-03-13 17:18:42 +09:00
Craig Topper 09c8f38924 [X86] Add isel patterns for X86VBroadcast with i16 truncates from i16->i64 zextload/extload.
We can form vpbroadcastw with a folded load.

We had patterns for i16->i32 zextload/extload, but nothing prevents
i64 from occuring.

I'd like to move this all to DAG combine to fix more cases, but
this is trivial fix to minimize test diffs when moving to a combine.
2020-03-13 00:10:48 -07:00
Craig Topper 51a4c6125c [X86] Add test cases for failures to form vbroadcastw due to isTypeDesirableForOp preventing load shrinking to i16.
These are based on existing test cases but use i64 instead of i32.
Some of these end up with i64 zextload/extloads from i16 that we
don't have isel patterns for.

Some of the other cases fail because isTypeDesirableForOp prevents
shrinking the (trunc (i64 (srl (load)))) directly. So we try
to shrink based on the (i64 (srl (load))) but we need 64 - shift_amount
to be a power of 2 to do that shrink.
2020-03-12 23:20:05 -07:00
Johannes Doerfert a198adb490 [Attributor] IPO across definition boundary of a function marked alwaysinline
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D75590
2020-03-13 01:06:12 -05:00
Johannes Doerfert 40815a4957 Revert "[Attributor] Enable test with update check lines"
This reverts commit 13def55b3f.

This broke a buildbot, will investigate.
2020-03-13 00:59:47 -05:00
Johannes Doerfert 1c9c23d60e [OpenMP][Opt][NFC] Add test case for known runtime function attributes
This test somehow did not make it in before.
2020-03-13 00:28:14 -05:00
Johannes Doerfert 13def55b3f [Attributor] Enable test with update check lines
The test disabled in 528a6a1d4c is enabled
again with the check lines for 9708279c72.
2020-03-12 23:24:15 -05:00
Arlo Siemsen 1478ed69d3 Add support for SHA256 source file checksums in debug info
LLVM currently supports CSK_MD5 and CSK_SHA1 source file checksums in
debug info. This change adds support for CSK_SHA256 checksums.

The SHA256 checksums are supported by the CodeView debug format.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D75785
2020-03-12 16:32:05 -07:00
Huihui Zhang f4f2706572 [ConstantFold][SVE] Fix constant folding for scalable vector compare instruction.
Summary:
Do not iterate on scalable vector. Also do not return constant scalable vector
from ConstantInt::get().
Fix result type by using getElementCount() instead of getNumElements().

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73753
2020-03-12 16:15:38 -07:00
Matt Arsenault ccc6e780c8 AMDGPU: Directly annotate functions if they have calls
Currently we infer whether the flat-scratch-init kernel input should
be enabled based on calls. Move this handling, so we can decide if the
full set of ABI inputs is needed in kernels. Ideally we would have an
analysis of some sort, rather than the function attributes.
2020-03-12 19:10:59 -04:00
Lang Hames 7266a8bfeb [ORC] Enable exception handling in JIT'd code when using LLJIT on Darwin.
This patch enables exception handling in code added to LLJIT on Darwin by
adding an orc::EHFrameRegistrationPlugin instance to the ObjectLinkingLayer
(which is currently used on Darwin only).
2020-03-12 15:33:56 -07:00
Stanislav Mekhanoshin a73528649c [AMDGPU] Simplify exec copies
The patch removes late endcf handling and only leaves the
related portion with redundant exec mask copy elimination.

Differential Revision: https://reviews.llvm.org/D76095
2020-03-12 14:54:19 -07:00
Huihui Zhang 118abf2017 [SVE] Update API ConstantVector::getSplat() to use ElementCount.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.

This change is needed for D73753.

Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74386
2020-03-12 13:22:41 -07:00
Simon Pilgrim e91feeed21 [AMDGPU] Add ISD::FSHR -> ALIGNBIT support
This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions.

This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default.

Differential Revision: https://reviews.llvm.org/D76070
2020-03-12 20:16:57 +00:00
Simon Pilgrim 2a2d242017 [DAGCombine] foldVSelectOfConstants - ensure constants are same type
Fix bug identified by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21167, foldVSelectOfConstants must ensure that the 2 build vectors have scalars of the same type before trying to compare APInt values.
2020-03-12 20:02:05 +00:00
Thomas Lively 4e589e6c26 [WebAssembly] Fix SIMD shift unrolling to avoid assertion failure
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76043
2020-03-12 12:20:14 -07:00
David Green 05334de679 [ARM] Long shift tests. NFC 2020-03-12 19:01:49 +00:00
Florian Hahn c52f839e72 Revert "[SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)"
This commit is likely causing clang-with-lto-ubuntu to fail
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/16052

Also causes PR45185.

This reverts commit f1ac5d2263.
2020-03-12 18:49:11 +00:00
Stanislav Mekhanoshin 360aff0493 [AMDGPU] Simplify nested SI_END_CF
This is to replace the optimization from the SIOptimizeExecMaskingPreRA.
We have less opportunities in the control flow lowering because many
VGPR copies are still in place and will be removed later, but we know
for sure an instruction is SI_END_CF and not just an arbitrary S_OR_B64
with EXEC.

The subsequent change needs to convert s_and_saveexec into s_and and
address new TODO lines in tests, then code block guarded by the
-amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer
will be removed.

Differential Revision: https://reviews.llvm.org/D76033
2020-03-12 11:25:07 -07:00
Zarko Todorovski d688312660 [PowerPC][AIX] Implement formal arguments passed in stack memory.
This patch is the callee side counterpart for https://reviews.llvm.org/D73209.
It removes the fatal error when we pass more formal arguments than available
registers.

Differential Revision: https://reviews.llvm.org/D74225
2020-03-12 11:48:00 -04:00
Xiangling Liao 3e53bf5781 [PowerPC32] Fix the `setcc` inconsistent result type problem
Summary:
On 32-bit PPC target[AIX and BE], when we convert an `i64` to `f32`, a `setcc` operand expansion is needed. The expansion will set the result type of expanded `setcc` operation based on if the subtarget use CRBits or not. If the subtarget does use the CRBits, like AIX and BE, then it will set the result type to `i1`, leading to an inconsistency with original `setcc` result type[i32].
And the reason why it crashed underneath is because we don't set result type of setcc consistent in those two places.

This patch fixes this problem by setting original setcc opnode result type also with `getSetCCResultType`  interface.

Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L

Reviewed By: sfertile

Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75702
2020-03-12 10:50:37 -04:00
Sean Fertile 8b39341fb0 [PowerPC][AIX] Fix printing of program counter for AIX assembly.
Program counter on AIX is the dollar-sign.

Differential Revision:https://reviews.llvm.org/D75627
2020-03-12 10:37:18 -04:00
Andrzej Warzynski 46b9f14d71 [AArch64][SVE] Add intrinsics for non-temporal scatters/gathers
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
  * aarch64_sve_ldnt1_gather_index
  * aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.

As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.

The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
  * GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
  * SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1

The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75601
2020-03-12 13:55:56 +00:00
Sanjay Patel a66dc755db [InstSimplify] simplify FP ops harder with FMF (part 2)
This is part of the IR sibling for:
D75576

Related transform committed with:
rG8ec71585719d
2020-03-12 09:53:20 -04:00
Sanjay Patel 8ec7158571 [InstSimplify] simplify FP ops harder with FMF
This is part of the IR sibling for:
D75576

(I'm splitting part of the transform as a separate commit
to reduce risk. I don't know of any bugs that might be
exposed by this improved folding, but it's hard to see
those in advance...)
2020-03-12 09:13:28 -04:00
Simon Pilgrim fa8ce7c0fa [AMDGPU] Add some funnel shift intrinsic test coverage 2020-03-12 12:53:57 +00:00
Nico Weber e51d4df4b2 Use `grep -F` instead of deprecated fgrep.
(In addition to the deprecation bit, this is useful on Windows
where people might have grep but not fgrep.)
2020-03-12 08:34:38 -04:00
Sanjay Patel d748e759d5 [InstSimplify] add tests for FP poison; NFC
Adapted from codegen tests seen in D75576.
2020-03-12 08:32:04 -04:00
Florian Hahn f1ac5d2263 [SCCP] Use ValueLatticeElement instead of LatticeVal (NFCI)
This patch switches SCCP to use ValueLatticeElement for lattice values,
instead of the local LatticeVal, as first step to enable integer range support.

This patch does not make use of constant ranges for additional operations
and the only difference for now is that integer constants are represented by
single element ranges. To preserve the existing behavior, the following helpers
are used

* isConstant(LV): returns true when LV is either a constant or a constant range with a single element. This should return true in the same cases where LV.isConstant() returned true previously.
* getConstant(LV): returns a constant if LV is either a constant or a constant range with a single element. This should return a constant in the same cases as LV.getConstant() previously.
* getConstantInt(LV): same as getConstant, but additionally casted to ConstantInt.

Reviewers: davide, efriedma, mssimpso

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D60582
2020-03-12 12:03:06 +00:00
Simon Tatham 3f8e714e2f [ARM,MVE] Add intrinsics and isel for MVE fused multiply-add.
Summary:
This adds the ACLE intrinsic family for the VFMA and VFMS
instructions, which perform fused multiply-add on vectors of floats.

I've represented the unpredicated versions in IR using the cross-
platform `@llvm.fma` IR intrinsic. We already had isel rules to
convert one of those into a vector VFMA in the simplest possible way;
but we didn't have rules to detect a negated argument and turn it into
VFMS, or rules to detect a splat argument and turn it into one of the
two vector/scalar forms of the instruction. Now we have all of those.

The predicated form uses a target-specific intrinsic as usual, but
I've stuck to just one, for a predicated FMA. The subtraction and
splat versions are code-generated by passing an fneg or a splat as one
of its operands, the same way as the unpredicated version.

In arm_mve_defs.h, I've had to introduce a tiny extra piece of
infrastructure: a record `id` for use in codegen dags which implements
the identity function. (Just because you can't declare a Tablegen
value of type dag which is //only// a `$varname`: you have to wrap it
in something. Now I can write `(id $varname)` to get the same effect.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75998
2020-03-12 11:13:50 +00:00
Max Kazantsev 3dc6e53c97 [LoopPeel] Turn incorrect assert into a check
Summary:
This patch replaces incorrectt assert with a check. Previously it asserts that
if SCEV cannot prove `isKnownPredicate(A != B)`, then it should be able to prove
`isKnownPredicate(A == B)`.

Both these fact may be not provable. It is shown in the provided test:

Could not prove: `{-294,+,-2}<%bb1> !=  0`
Asserting: `{-294,+,-2}<%bb1> == 0`

Obviously, this SCEV is not equal to zero, but 0 is in its range so we cannot
also prove that it is not zero.

Instead of assert, we should be checking the required conditions explicitly.

Reviewers: lebedev.ri, fhahn, sanjoy, fedor.sergeev
Reviewed By: lebedev.ri
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76050
2020-03-12 17:23:07 +07:00
Qiu Chaofan 096d545376 [PowerPC] Add strict-fp intrinsic to FP arithmetic
This patch adds basic strict-fp intrinsics support to PowerPC backend,
including basic arithmetic operations (add/sub/mul/div).

Reviewed By: steven.zhang, andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D63916
2020-03-12 17:02:54 +08:00
Clement Courbet 4edd050c7e [ExpandMemCmp][NFC] Add more tests. 2020-03-12 08:49:51 +01:00
Shengchen Kan 3a503ce663 [X86] Reduce the number of emitted fragments due to branch align
Summary:
Currently, a BoundaryAlign fragment may be inserted after the branch
that needs to be aligned to truncate the current fragment, this fragment is
unused at most of time. To avoid that, we can insert a new empty Data
fragment instead. Non-relaxable instruction is usually emitted into Data
fragment, so the inserted empty Data fragment will be reused at a high
possibility.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames, LuoYuanke

Subscribers: llvm-commits, dexonsmith, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75438
2020-03-12 15:37:35 +08:00
Juneyoung Lee 629cf3c1c5 Apply update_test_check.py to CodeGenPrepare/X86/freeze-icmp.ll test 2020-03-12 16:37:16 +09:00
Djordje Todorovic 3b984641a7 [DebugInfo] Fix build failure on the mingw
Add the workaround for the X86::MOV16ri when describing call site
parameters.
2020-03-12 08:18:01 +01:00
QingShan Zhang 518292dbdf [PowerPC] Add the MacroFusion support for Power8
This patch is intend to implement the missing P8 MacroFusion for LLVM
according to Power8 User's Manual Section 10.1.12 Instruction Fusion

Differential Revision: https://reviews.llvm.org/D70651
2020-03-12 05:15:41 +00:00
Philip Reames 8fffa40400 [GC] Remove redundant entiries in stackmap section (and test it this time)
This is a reimplementation of the optimization removed in D75964. The actual spill/fill optimization is handled by D76013, this one just worries about reducing the stackmap section size itself by eliminating redundant entries. As noted in the comments, we could go a lot further here, but avoiding the degenerate invoke case as we did before is probably "enough" in practice.

Differential Revision: https://reviews.llvm.org/D76021
2020-03-11 21:24:48 -07:00
Bill Wendling 6aebf0ee56 Specify branch probabilities for callbr dests
Summary:
callbr's indirect branches aren't expected to be taken, so reduce their
probabilities to 0 while increasing the default destination to 1. This
allows some code improvements through block placement.

Reviewers: nickdesaulniers

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72656
2020-03-11 20:33:48 -07:00
Kang Zhang c4cbc58062 [NFC][PowerPC] Add a new MIR file te test ppc-early-ret pass 2020-03-12 03:18:56 +00:00
Lang Hames c700e0317c [JITLink] Read symbol linkage from the correct field.
MachO symbol linkage is described by the desc field of the nlist entry, not the
type field.
2020-03-11 20:04:54 -07:00
Adrian Prantl d5180ea134 Add debug info support for Swift/Clang APINotes.
In order for dsymutil to collect .apinotes files (which capture
attributes such as nullability, Swift import names, and availability),
I want to propose adding an apinotes: field to DIModule that gets
translated into a DW_AT_LLVM_apinotes (path) nested inside
DW_TAG_module. This will be primarily used by LLDB to indirectly
extract the Swift names of Clang declarations that were deserialized
from DWARF.

<rdar://problem/59514626>

Differential Revision: https://reviews.llvm.org/D75585
2020-03-11 18:47:30 -07:00
Stanislav Mekhanoshin d4757a6cf1 [AMDGPU] pre-commit collapse-endcf.mir. NFC.
Pre commit test before D76033.
2020-03-11 16:15:29 -07:00
Tyker 70c0a9675d [AssumeBundles] Enforce constraints on the operand bundle of llvm.assume
Summary: Add verification that operand bundles on an llvm.assume are well formed to the verify pass.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75269
2020-03-11 23:53:48 +01:00
Huihui Zhang 8f52573962 [InstSimplify][SVE] Fix SimplifyInsert/ExtractElementInst for scalable vector.
Summary:
For scalable vector, index out-of-bound can not be determined at compile-time.
The same apply for VectorUtil findScalarElement().

Add test cases to check the functionality of SimplifyInsert/ExtractElementInst for scalable vector.

Reviewers: sdesmalen, efriedma, spatel, apazos

Reviewed By: efriedma

Subscribers: cameron.mcinally, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75782
2020-03-11 15:09:56 -07:00
Adrian Prantl e4e7e44765 Add an SDK attribute to DICompileUnit
This is part of PR44213 https://bugs.llvm.org/show_bug.cgi?id=44213

When importing (system) Clang modules, LLDB needs to know which SDK
(e.g., MacOSX, iPhoneSimulator, ...) they came from. While the sysroot
attribute contains the absolute path to the SDK, this doesn't work
well when the debugger is run on a different machine than the
compiler, and the SDKs are installed in different directories. It thus
makes sense to just store the name of the SDK instead of the absolute
path, so it can be found relative to LLDB.

rdar://problem/51645582

Differential Revision: https://reviews.llvm.org/D75646
2020-03-11 14:14:06 -07:00
Jin Lin a0cacb6054 Fix conflict value for metadata "Objective-C Garbage Collection" in the mix of swift and Objective-C bitcode
Summary:
The change is to fix conflict value for metadata "Objective-C Garbage Collection" in the mix of swift and Objective-C bitcode.
The purpose is to provide the support of LTO for swift and Objective-C mixed project.

Reviewers: rjmccall, ahatanak, steven_wu

Reviewed By: rjmccall, steven_wu

Subscribers: manmanren, mehdi_amini, hiraditya, dexonsmith, llvm-commits, jinlin

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71219
2020-03-11 13:26:06 -07:00
Sanjay Patel fae900921b [InstCombine] reduce demand-limited bool math to logic
The cmp math test is inspired by memcmp() patterns seen in D75840.
I know there's at least 1 related fold we can do here if both
values are sext'd, but I'm not seeing a way to generalize further.

We have some other bool math patterns that we want to reduce, but
that might require fixing the bogus transforms noted in D72396.

Alive proof translations of the regression tests:
https://rise4fun.com/Alive/zGWi

  Name: demand add 1
  %xz = zext i1 %x to i32
  %ys = sext i1 %y to i32
  %sub = add i32 %xz, %ys
  %r = lshr i32 %sub, 31
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = zext i1 %and to i32

  Name: demand add 2
  %xz = zext i1 %x to i5
  %ys = sext i1 %y to i5
  %sub = add i5 %xz, %ys
  %r = and i5 %sub, 16
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = select i1 %and, i5 -16, i5 0

  Name: demand add 3
  %xz = zext i1 %x to i8
  %ys = sext i1 %y to i8
  %a = add i8 %ys, %xz
  %r = ashr i8 %a, 7
  =>
  %notx = xor i1 %x, 1
  %and = and i1 %y, %notx
  %r = sext i1 %and to i8

  Name: cmp math
  %gt = icmp ugt i32 %x, %y
  %lt = icmp ult i32 %x, %y
  %xz = zext i1 %gt to i32
  %yz = zext i1 %lt to i32
  %s = sub i32 %xz, %yz
  %r = lshr i32 %s, 31
  =>
  %r = zext i1 %lt to i32

Differential Revision: https://reviews.llvm.org/D75961
2020-03-11 15:45:58 -04:00