120049 Commits

Author SHA1 Message Date
Thomas Lively 2a47e03ee4 [WebAssembly][FastISel] Do not assume naive CmpInst lowering
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=40172. See
test/CodeGen/WebAssembly/PR40172.ll for an explanation.

Reviewers: dschuff, aheejin

Subscribers: nikic, llvm-commits, sunfish, jgravelle-google, sbc100

Differential Revision: https://reviews.llvm.org/D56457

llvm-svn: 351127
2019-01-14 22:03:43 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.
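
The identity behind this expansion can be checked on scalars. The following is a minimal sketch, not the backend code: the function names are hypothetical, and 8-bit values stand in for vector lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reference semantics: unsigned saturating subtraction clamps at zero.
std::uint8_t usubsat_reference(std::uint8_t a, std::uint8_t b) {
  return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}

// Expanded form: umax(a, b) - b.
// If a >= b this is a - b; otherwise it is b - b == 0.
std::uint8_t usubsat_via_umax(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(std::max(a, b) - b);
}

// Exhaustively compares both forms over every pair of 8-bit inputs.
void check_all_u8_inputs() {
  for (int a = 0; a <= 255; ++a)
    for (int b = 0; b <= 255; ++b)
      assert(usubsat_via_umax(static_cast<std::uint8_t>(a),
                              static_cast<std::uint8_t>(b)) ==
             usubsat_reference(static_cast<std::uint8_t>(a),
                               static_cast<std::uint8_t>(b)));
}
```

Because neither step needs a per-lane branch, the same two operations apply lane-wise to a whole vector.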

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
James Y Knight 544fa425c9 [opaque pointer types] Update GetElementPtr creation APIs to
consistently accept a pointee-type argument.

Note: this also adds a new C API and soft-deprecates the old C API.

Differential Revision: https://reviews.llvm.org/D56559

llvm-svn: 351124
2019-01-14 21:39:35 +00:00
James Y Knight 84c1dbde08 [opaque pointer types] Update LoadInst creation APIs to consistently
accept a return-type argument.

Note: this also adds a new C API and soft-deprecates the old C API.

Differential Revision: https://reviews.llvm.org/D56558

llvm-svn: 351123
2019-01-14 21:37:53 +00:00
James Y Knight eb2c4af1bf [opaque pointer types] Update InvokeInst creation APIs to consistently
accept a callee-type argument.

Note: this also adds a new C API and soft-deprecates the old C API.

Differential Revision: https://reviews.llvm.org/D56557

llvm-svn: 351122
2019-01-14 21:37:48 +00:00
James Y Knight f956390954 [opaque pointer types] Update CallInst creation APIs to consistently
accept a callee-type argument.

Note: this also adds a new C API and soft-deprecates the old C API.

Differential Revision: https://reviews.llvm.org/D56556

llvm-svn: 351121
2019-01-14 21:37:42 +00:00
Jonathan Metzman e159a0dd1a [SanitizerCoverage][NFC] Use appendToUsed instead of include
Summary:
Use appendToUsed instead of include to ensure that
SanitizerCoverage's constructors are not stripped.

Also, use isOSBinFormatCOFF() to determine if target
binary format is COFF.

Reviewers: pcc

Reviewed By: pcc

Subscribers: hiraditya

Differential Revision: https://reviews.llvm.org/D56369

llvm-svn: 351118
2019-01-14 21:02:02 +00:00
Craig Topper 9906f77f82 [X86] Silence a -Wparentheses warning on gcc. NFC
llvm-svn: 351111
2019-01-14 19:44:02 +00:00
Simon Pilgrim bfe2ee453a [X86][SSSE3] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306)
If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.

llvm-svn: 351103
2019-01-14 19:07:26 +00:00
David Callahan 957795973b Ignore PhiNodes when mapping sample profile data
Summary: Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored.

Reviewers: danielcdh, twoh, Kader, wmi

Reviewed By: wmi

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55094

llvm-svn: 351102
2019-01-14 19:05:59 +00:00
David Callahan b1853a6a94 Revert "Merge branch 'arcpatch-D55094'"
This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing
changes made to f1309ffebf718d16aec4fab83380556c660e2825.

unintended merge pushed

llvm-svn: 351095
2019-01-14 18:49:27 +00:00
Sanjay Patel b23ff7a0e2 [x86] lower extracted add/sub to horizontal vector math
add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0

This is the integer sibling to D56011.

There's an additional restriction to only do this transform in the
case where we don't have extra extracts from the source vector. Without
that, we can fail to match larger horizontal patterns that are more 
beneficial than this minimal case. An improvement to the more general 
h-op lowering may allow us to remove the restriction here in a follow-up.
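
As a rough illustration of the pattern above, here is a scalar model of a 4-lane integer horizontal add. It is a sketch under assumptions: `V4` and the function names are hypothetical, and the lane layout follows the PHADDD-style convention (low half from pairs of the first operand, high half from pairs of the second).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;

// Models the lane layout of an integer horizontal add:
// adjacent pairs of a fill the low half, adjacent pairs of b the high half.
V4 hadd(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Left-hand side of the transform: add (extractelt X, 0), (extractelt X, 1).
std::uint32_t add_of_extracts(const V4 &x) { return x[0] + x[1]; }

// Right-hand side of the transform: extractelt (hadd X, X), 0.
std::uint32_t extract_of_hadd(const V4 &x) { return hadd(x, x)[0]; }

// Both sides must agree for any input vector.
void check_equivalence(const V4 &x) {
  assert(add_of_extracts(x) == extract_of_hadd(x));
}
```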

llvm-svn: 351093
2019-01-14 18:44:02 +00:00
David Callahan 1b46231764 Merge branch 'arcpatch-D55094'
llvm-svn: 351092
2019-01-14 18:35:43 +00:00
Amara Emerson e07cdb107e Revert "[VFS] Allow multiple RealFileSystem instances with independent CWDs."
This reverts commit r351079, r351069 and r351050 as it broke the greendragon bots on macOS.

llvm-svn: 351091
2019-01-14 18:32:09 +00:00
Tom Stellard d0a7676087 cmake: Don't install plugins used for examples or tests
Summary:
This patch drops install targets for LLVMHello.so,
TestPlugin.so, and BugpointPasses.so.

Reviewers: chandlerc, beanz, thakis, philip.pfaffe

Reviewed By: chandlerc

Subscribers: SquallATF, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D55965

llvm-svn: 351087
2019-01-14 18:25:35 +00:00
Dan Gohman bbb548d85f [WebAssembly] Remove old intrinsics
This removes the old grow_memory and mem.grow-style intrinsics, leaving just
the memory.grow-style intrinsics.

Differential Revision: https://reviews.llvm.org/D56645

llvm-svn: 351084
2019-01-14 18:23:45 +00:00
Adrian Prantl fa2e35838c Reapply r345008 "Split MachinePipeliner code into header and cpp files"
Split MachinePipeliner code into header and cpp files to allow
inheritance from SwingSchedulerDAG.

This reapplies https://reviews.llvm.org/D56084 after moving the
implementation of the dump functions into the .cpp files. This fixes a
linker error when building with Clang modules enabled and local
submodule visibility disabled.

Original patch by Lama Saba <lama.saba@intel.com>!

llvm-svn: 351077
2019-01-14 17:24:11 +00:00
James Y Knight 68729f94ee Remove NameLen argument from newly-introduced IR C APIs.
Normally, changing the function signatures of C APIs is disallowed,
but as these two are brand new last week, and haven't been released
yet, it is okay in this instance.

As per discussion in D56556, we will not add NameLen arguments to IR
building APIs, for the following reasons:

1. We do not want to deprecate all of the IR building APIs, just to add a
NameLen argument to each one.

2. Consistency is important, so adding it just to new ones is unfortunate.

3. The IR names are completely optional, useful for readability of IR
only. There is no value in ever supporting nul bytes.

Differential Revision: https://reviews.llvm.org/D56669

llvm-svn: 351076
2019-01-14 17:16:55 +00:00
Nirav Dave 3badfe74a2 Reland "Refactor GetRegistersForValue. NFCI."
Remove overly strict class membership check.

llvm-svn: 351074
2019-01-14 17:09:45 +00:00
Simon Pilgrim a1bd4a6ba4 [DAGCombiner] Add (sub_sat x, x) -> 0 combine
llvm-svn: 351073
2019-01-14 15:43:34 +00:00
Simon Pilgrim fa1f518748 [DAGCombiner] Enable sub saturation constant folding
llvm-svn: 351072
2019-01-14 15:28:53 +00:00
Simon Pilgrim 7fc6882374 [DAGCombiner] Add add/sub saturation undef handling
Match ConstantFolding.cpp:
(add_sat x, undef) -> -1
(sub_sat x, undef) -> 0
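
One way to see why these folds are sound: folding undef is valid when some concrete value of the undef operand yields the folded result. The sketch below checks such witnesses on 8-bit scalars. The helper names are hypothetical, and this is a model of the arithmetic only, not the DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>

std::uint8_t uadd_sat(std::uint8_t a, std::uint8_t b) {
  unsigned s = unsigned(a) + unsigned(b);
  return static_cast<std::uint8_t>(s > 255u ? 255u : s);
}

std::uint8_t usub_sat(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a > b ? a - b : 0);
}

// Clamps a wide intermediate result into the signed 8-bit range.
std::int8_t clamp_i8(int v) {
  return static_cast<std::int8_t>(v < -128 ? -128 : (v > 127 ? 127 : v));
}

std::int8_t sadd_sat(std::int8_t a, std::int8_t b) { return clamp_i8(int(a) + int(b)); }
std::int8_t ssub_sat(std::int8_t a, std::int8_t b) { return clamp_i8(int(a) - int(b)); }

// For every x, picks a witness for the undef operand that produces the fold.
void check_undef_witnesses() {
  for (int v = 0; v <= 255; ++v) {
    std::uint8_t ux = static_cast<std::uint8_t>(v);
    std::int8_t sx = static_cast<std::int8_t>(v - 128);
    assert(uadd_sat(ux, 0xFF) == 0xFF);                         // undef = all ones
    assert(sadd_sat(sx, static_cast<std::int8_t>(~sx)) == -1);  // undef = ~x
    assert(usub_sat(ux, ux) == 0);                              // undef = x
    assert(ssub_sat(sx, sx) == 0);                              // undef = x
  }
}
```

The last two assertions also cover the `(sub_sat x, x) -> 0` identity.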

llvm-svn: 351070
2019-01-14 14:16:24 +00:00
Sam McCall 7a99727c62 [VFS] Fix unused variable warning. NFC
llvm-svn: 351069
2019-01-14 14:13:24 +00:00
Simon Pilgrim cfa5f06dde [DAGCombiner] Enable add saturation constant folding
llvm-svn: 351060
2019-01-14 12:34:31 +00:00
Aleksandar Beserminji 4c4c0377ca [mips] Optimize shifts for types larger than GPR size (mips2/mips3)
With this patch, shifts are lowered to optimal number of instructions
necessary to shift types larger than the general purpose register size.

This resolves PR/32293.

Thanks to Kyle Butt for reporting the issue!

Differential Revision: https://reviews.llvm.org/D56320

llvm-svn: 351059
2019-01-14 12:28:51 +00:00
Jeremy Morse f216da7ee0 [DebugInfo] Remove un-necessary logic from HoistThenElseCodeToIf
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However this left extra code hanging
around HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.

This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
insts location).

Differential Revision: https://reviews.llvm.org/D55272

llvm-svn: 351058
2019-01-14 12:13:12 +00:00
Simon Pilgrim 67610926fc [DAGCombiner] Add add saturation constant folding tests.
Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now.

llvm-svn: 351057
2019-01-14 12:12:42 +00:00
Diana Picus 8987d00653 [ARM GlobalISel] Import MOVi32imm into GlobalISel
Make it possible for TableGen to produce code for selecting MOVi32imm.
This allows reasonably recent ARM targets to select a lot more constants
than before.

We achieve this by adding GISelPredicateCode to arm_i32imm. It's
impossible to use the exact same code for both DAGISel and GlobalISel,
since one uses "Subtarget->" and the other "STI." to refer to the
subtarget. Moreover, in GlobalISel we don't have ready access to the
MachineFunction, so we need to add a bit of code for obtaining it from
the instruction that we're selecting. This is also the reason why it
needs to remain a PatLeaf instead of the more specific IntImmLeaf.

llvm-svn: 351056
2019-01-14 12:04:08 +00:00
Simon Pilgrim 3d42815cd8 [SelectionDAG] Add type sanity assertions for add/sub saturation node creation.
llvm-svn: 351055
2019-01-14 11:56:59 +00:00
David Stuttard f77079f892 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Work around for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
2019-01-14 11:55:24 +00:00
Sam McCall c2b310aedf [VFS] Allow multiple RealFileSystem instances with independent CWDs.
Summary:
Previously only one RealFileSystem instance was available, and its working
directory is shared with the process. This doesn't work well for multithreaded
programs that want to work with relative paths - the vfs::FileSystem is assumed
to provide the working directory, but a thread cannot control this exclusively.

The new vfs::createPhysicalFileSystem() factory copies the process's working
directory initially, and then allows it to be independently modified.

This implementation records the working directory path, and glues it to relative
paths to provide the correct absolute path to the sys::fs:: functions.
This will give different results in unusual situations (e.g. the CWD is moved).

The main alternative is the use of openat(), fstatat(), etc to ask the OS to
resolve paths relative to a directory handle which can be kept open. This is
more robust. There are two reasons not to do this initially:
1. these functions are not available on all supported Unixes, and are somewhere
   between difficult and unavailable on Windows. So we need a path-based
   fallback anyway.
2. this would mean also adding support at the llvm::sys::fs level, which is a
   larger project. My clearest idea is an OS-specific `BaseDirectory` object
   that can be optionally passed to functions there. Eventually this could be
   backed by either paths or a fd where openat() is supported.
   This is a large project, and demonstrating here that a path-based fallback
   works is a useful prerequisite.

There is some subtlety to the path-manipulation mechanism:
  - when setting the working directory, both Specified=makeAbsolute(path) and
    Resolved=realpath(path) are recorded. These may differ in the presence of
    symlinks.
  - getCurrentWorkingDirectory() and makeAbsolute() use Specified - this is
    similar to the behavior of $PWD and sys::path::current_path
  - IO operations like openFileForRead use Resolved. This is similar to the
    behavior of an openat() based implementation, that doesn't see changes
    in symlinks.
There may still be combinations of operations and FS states that yield unhelpful
behavior. This is hard to avoid with symlinks and FS abstractions :(
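
The Specified/Resolved split described above can be sketched with plain strings. This is a toy model under assumptions: `ToyWorkingDir`, `joinPath`, and `pathForIO` are hypothetical names, only POSIX-style paths are handled, and nothing here touches the real vfs or sys::fs APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical record of a working directory, kept in two forms.
struct ToyWorkingDir {
  std::string specified; // absolute path as the caller named it
  std::string resolved;  // the same directory with symlinks resolved
};

bool isAbsolute(const std::string &p) { return !p.empty() && p[0] == '/'; }

std::string joinPath(const std::string &dir, const std::string &rel) {
  return (!dir.empty() && dir.back() == '/') ? dir + rel : dir + "/" + rel;
}

// User-visible paths are built from Specified, similar to $PWD.
std::string makeAbsolute(const ToyWorkingDir &wd, const std::string &path) {
  assert(isAbsolute(wd.specified));
  return isAbsolute(path) ? path : joinPath(wd.specified, path);
}

// Paths handed to IO calls are built from Resolved, so later symlink
// changes do not redirect them.
std::string pathForIO(const ToyWorkingDir &wd, const std::string &path) {
  assert(isAbsolute(wd.resolved));
  return isAbsolute(path) ? path : joinPath(wd.resolved, path);
}
```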

The caching behavior of the current working directory is removed in this patch.
getRealFileSystem() is now specified to link to the process CWD, so the caching
is incorrect.
The user who needed this so far is clangd, which will immediately switch to
createPhysicalFileSystem().

Reviewers: ilya-biryukov, bkramer, labath

Subscribers: ioeric, kadircet, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D56545

llvm-svn: 351050
2019-01-14 10:56:35 +00:00
Francis Visoiu Mistrih b7cef81fd3 Replace "no-frame-pointer-*" function attributes with "frame-pointer"
Part of the effort to refactor frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) no frames use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"
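
A minimal sketch of how the two legacy attributes could collapse into one three-valued setting. The enum and function names are hypothetical, and the precedence (the "all" attribute wins over the non-leaf one) is an assumption of this example, not taken from the patch.

```cpp
#include <cassert>
#include <string>

enum class FramePointerKind { None, NonLeaf, All };

// Assumed precedence: "no-frame-pointer-elim" implies the non-leaf case too.
FramePointerKind fromLegacyAttrs(bool noFramePointerElim,
                                 bool noFramePointerElimNonLeaf) {
  if (noFramePointerElim)
    return FramePointerKind::All;
  if (noFramePointerElimNonLeaf)
    return FramePointerKind::NonLeaf;
  return FramePointerKind::None;
}

// Spelling of the single "frame-pointer" attribute value.
std::string toAttrValue(FramePointerKind kind) {
  switch (kind) {
  case FramePointerKind::None:
    return "none";
  case FramePointerKind::NonLeaf:
    return "non-leaf";
  case FramePointerKind::All:
    return "all";
  }
  assert(false && "unhandled FramePointerKind");
  return "";
}
```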

Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.

"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".

tests are mostly updated with

// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"

// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"

Patch by Yuanfang Chen (tabloid.adroit)!

Differential Revision: https://reviews.llvm.org/D56351

llvm-svn: 351049
2019-01-14 10:55:55 +00:00
Petar Avramovic 7d370a36bb [MIPS GlobalISel] Add pre legalizer combiner pass
Introduce GlobalISel pre legalizer pass for MIPS.
It will be used to cope with instructions that require
combining before legalization.

Differential Revision: https://reviews.llvm.org/D56269

llvm-svn: 351046
2019-01-14 10:27:05 +00:00
Max Kazantsev 1f73310e1e [BasicBlockUtils] Generalize DeleteDeadBlock to deal with multiple dead blocks
Utility function `DeleteDeadBlock` expects that all predecessors of a block being
deleted are already deleted, with the exception of single-block loop. It makes it
hard to use for deletion of a set of blocks that may contain cyclic dependencies.
There is no correct order of invocations of this function that does not produce
dangling pointers on already deleted blocks.

This patch introduces a generalized version of this function `DeleteDeadBlocks`
that allows us to remove multiple blocks at once, even if there are cycles among
them. The only requirement is that no block being deleted should have a predecessor
that is not being deleted. 

The logic of `DeleteDeadBlocks` is as follows:
  for each block
    create relevant DT updates;
    remove all instructions (replace with undef if needed);
    replace terminator with unreachable;
  apply DT updates;
  for each block
    delete block;

Therefore, `DeleteDeadBlock` becomes a particular case of
the general algorithm called for a single block.
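
The two-phase structure above can be sketched on a toy CFG. This is an illustration only: `ToyCFG` is a hypothetical stand-in for LLVM's basic blocks, clearing the successor set stands in for "remove instructions and replace the terminator with unreachable", and dominator tree updates are omitted.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy CFG: block id -> successor ids. Not LLVM's BasicBlock.
struct ToyCFG {
  std::map<int, std::set<int>> succs;

  std::set<int> predecessors(int block) const {
    std::set<int> preds;
    for (const auto &entry : succs)
      if (entry.second.count(block))
        preds.insert(entry.first);
    return preds;
  }
};

void deleteDeadBlocks(ToyCFG &cfg, const std::set<int> &dead) {
  // Requirement from the description: no dead block has a live predecessor.
  for (int b : dead)
    for (int p : cfg.predecessors(b))
      assert(dead.count(p) && "dead block has a live predecessor");

  // Phase 1: detach every dead block first, so cycles among dead blocks
  // never leave an edge pointing at an already erased block.
  for (int b : dead)
    cfg.succs[b].clear();

  // Phase 2: erase the now isolated blocks.
  for (int b : dead)
    cfg.succs.erase(b);
}
```

Calling it with a single-element set corresponds to the single-block case.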

Differential Revision: https://reviews.llvm.org/D56120
Reviewed By: skatkov

llvm-svn: 351045
2019-01-14 10:26:26 +00:00
Thomas Preud'homme bc5e6ee87a Add support for prefix-only CLI options
Summary:
Add support for options that always prefix their value, giving an error
if the value is in the next argument or if the option is given a value
assignment (ie. opt=val). This is the desired behavior for the -D option
of FileCheck for instance.
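
The accepted and rejected forms can be sketched as a small check. This is not the CommandLine library implementation: `parsePrefixOnly` is a hypothetical helper, and returning false for the rejected forms stands in for the error the commit describes.

```cpp
#include <cassert>
#include <string>

// Returns true and stores the value only for the "-Dvalue" form.
bool parsePrefixOnly(const std::string &arg, const std::string &flag,
                     std::string &value) {
  assert(!flag.empty());
  if (arg.compare(0, flag.size(), flag) != 0)
    return false; // a different option
  std::string rest = arg.substr(flag.size());
  if (rest.empty())
    return false; // the value would have to come from the next argument
  if (rest[0] == '=')
    return false; // value assignment form (opt=val)
  value = rest;
  return true;
}
```

Note that `-DVAR=1` is still accepted, because the `=` there is part of the value rather than a separator directly after the option name.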

Copyright:
- Linaro (changes in version 2 of revision D55940)
- GraphCore (changes in later versions and introduced when creating
  D56549)

Reviewers: jdenny

Subscribers: llvm-commits, probinson, kristina, hiraditya,
JonChesterfield

Differential Revision: https://reviews.llvm.org/D56549

llvm-svn: 351038
2019-01-14 09:28:53 +00:00
Craig Topper e7b4ea4726 [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead.
Fixes PR40259

llvm-svn: 351035
2019-01-14 08:46:45 +00:00
Craig Topper ab077dda72 [X86] Update type profile for DBPSADBW to indicate the immediate is an i8 not just any int.
Removes some type checks from X86GenDAGISel.inc

llvm-svn: 351033
2019-01-14 02:59:08 +00:00
Craig Topper c8cd85588b [X86] Remove unused intrinsic handlers. NFC
llvm-svn: 351032
2019-01-14 01:56:59 +00:00
Craig Topper 075fcc1151 [X86] Remove FPCLASS intrinsic handler. Use INTR_TYPE_2OP instead. NFC
llvm-svn: 351031
2019-01-14 01:44:09 +00:00
Craig Topper 3f3b8ef442 [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector.
The input mask can be represented with an AND in IR.

Fixes PR40258

llvm-svn: 351028
2019-01-14 00:03:50 +00:00
Simon Pilgrim 56ba1db933 [DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)
NOTE: We need more powerful signed overflow detection in computeOverflowKind
llvm-svn: 351026
2019-01-13 22:08:26 +00:00
Simon Pilgrim 888fa8680c Fix unused variable warning. NFCI.
llvm-svn: 351025
2019-01-13 21:53:12 +00:00
Simon Pilgrim 897d4c6fe9 [DAGCombiner] Some very basic add/sub saturation combines.
Handle combines with zero and constant canonicalization for adds.

llvm-svn: 351024
2019-01-13 21:50:24 +00:00
Craig Topper 4978de36e4 [LegalizeDAG] Remove 'NeedInvert' code from expansion of BR_CC. Replace with an assert.
I accidentally triggered this code while doing some experiments and it doesn't look like it could possibly work.

It calls 'getNOT' on a node that should be a CondCode.

I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC?

llvm-svn: 351022
2019-01-13 19:33:30 +00:00
Nikita Popov 0400e50445 [X86] Rename overly verbose method; NFC
As suggested on D56636.

llvm-svn: 351021
2019-01-13 16:41:26 +00:00
Craig Topper 31156bbdb9 [X86] Add more ISD nodes to handle masked versions of VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128 which only produce 2 result elements and zeroes the upper elements.
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.

Fixes another case from PR34877

llvm-svn: 351018
2019-01-13 02:59:59 +00:00
Craig Topper 4561edbec0 [X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128 which only produces 2 result elements and zeroes the upper elements.
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.

Fixes another case from PR34877.

llvm-svn: 351017
2019-01-13 02:59:57 +00:00
Benjamin Kramer b17d2136ea Give helper classes/functions local linkage. NFC.
llvm-svn: 351016
2019-01-12 18:36:22 +00:00
Simon Pilgrim a0069ba0db [X86] More aggressive shuffle mask widening in combineExtractWithShuffle
Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through.

llvm-svn: 351013
2019-01-12 16:38:56 +00:00
Sanjay Patel 7d65fe5cd5 [LoopVectorizer] give more advice in remark about failure to vectorize call
Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it.

Differential Revision: https://reviews.llvm.org/D56551

llvm-svn: 351010
2019-01-12 15:27:15 +00:00
Sanjay Patel 625d5aef62 [DAGCombiner] fold insert_subvector of insert_subvector
This pattern:

    t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
  t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>

...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.

In the affected tests here, it looks like it just makes RA wiggle. But we might 
as well squash this to prevent it interfering with other pattern-matching.

Differential Revision:
https://reviews.llvm.org/D56604

llvm-svn: 351008
2019-01-12 15:12:28 +00:00
Simon Pilgrim 0d92c4debc Use getShiftAmountTy for shift amounts.
llvm-svn: 351005
2019-01-12 12:00:43 +00:00
Simon Atanasyan 789f4154db [ORC][MIPS] Fill delay-slot after `jr` instruction
The MIPS `jr` instruction uses a delay slot. To avoid executing an
arbitrary instruction, we should either fill the delay slot with a `nop`
instruction or swap the `jr` instruction with the logically preceding
instruction. This fix implements the second method to generate slightly
more efficient code.

llvm-svn: 351001
2019-01-12 11:12:08 +00:00
Simon Atanasyan f903f782e7 [ORC][MIPS] Setup t9 register and call function through this register
The MIPS ABI states that every function must be called through jalr $t9. In
other words, a function expects that the t9 register points to the beginning
of its code. A function uses this register to calculate the offset to the
Global Offset Table and save it to the `gp` register.
```
lui   $gp, %hi(_gp_disp)
addiu $gp, %lo(_gp_disp)
addu  $gp, $gp, $t9
```

If `$t9`, and as a result `$gp`, points to the wrong place, the following code
loads an incorrect value from the GOT and passes control to invalid code.
```
lw    $v0,%call16(foo)($gp)
jalr  $t9
```

OrcMips32 and OrcMips64 writeResolverCode methods pass control to the
resolved address, but do not set up `$t9` before the call. The `t9` register
holds the address of the beginning of the `resolver` code, so any attempt to
call routines via the GOT fails.

This change fixes the problem. The `OrcLazy/hidden-visibility.ll` test
starts to pass correctly. Before the change it fails on MIPS because the
`exitOnLazyCallThroughFailure` called from the resolver code could not
call libc routine `exit` via GOT.

Differential Revision: http://reviews.llvm.org/D56058

llvm-svn: 351000
2019-01-12 11:12:04 +00:00
Simon Pilgrim a21e2bd682 [X86] Improve vXi64 ISD::ABS codegen with SSE41+
Make use of vblendvpd to select on the signbit
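
A scalar sketch of the idea for a single 64-bit lane: compute the negation, then pick between the value and its negation using only the sign bit, which is the role the blend plays in the vector code. The function name is hypothetical and this is not the lowering code itself.

```cpp
#include <cassert>
#include <cstdint>

// abs(x) as select(signbit(x), 0 - x, x), computed on the unsigned bit pattern
// so the negation wraps instead of overflowing.
std::int64_t abs_via_signbit_select(std::int64_t x) {
  std::uint64_t ux = static_cast<std::uint64_t>(x);
  std::uint64_t negated = 0 - ux;
  bool signBitSet = (ux >> 63) != 0;
  return static_cast<std::int64_t>(signBitSet ? negated : ux);
}

// Small usage check on ordinary values.
void check_abs_examples() {
  assert(abs_via_signbit_select(-5) == 5);
  assert(abs_via_signbit_select(7) == 7);
  assert(abs_via_signbit_select(0) == 0);
}
```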

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350999
2019-01-12 10:28:12 +00:00
Simon Pilgrim ca0de0363b [X86][AARCH64] Improve ISD::ABS support
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350998
2019-01-12 09:59:32 +00:00
Nikita Popov 5f393eb5da Reapply "[DemandedBits] Use SetVector for Worklist"
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions to be recomputed very often. To avoid this, switch
to a SetVector.
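
The effect of the switch can be sketched with a toy container that pairs a vector (for order) with a set (for membership). `ToySetVector` is a hypothetical stand-in written for this example; it is not LLVM's SetVector and ignores the inline-storage aspect mentioned below.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Worklist that keeps insertion order but refuses duplicates.
template <typename T> class ToySetVector {
  std::vector<T> order;
  std::set<T> members;

public:
  // Returns false if the item is already queued.
  bool insert(const T &item) {
    if (!members.insert(item).second)
      return false;
    order.push_back(item);
    return true;
  }

  bool empty() const { return order.empty(); }
  std::size_t size() const { return order.size(); }

  // Removes and returns the most recently queued item.
  T pop_back_val() {
    assert(!order.empty());
    T item = order.back();
    order.pop_back();
    members.erase(item);
    return item;
  }
};
```

With a plain vector, the second insertion of an already queued instruction would add another entry and trigger another recomputation.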

Reapplying with a smaller number of inline elements in the
SmallSetVector, to avoid running into the SmallDenseMap issue
described in D56455.

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350997
2019-01-12 09:09:15 +00:00
Craig Topper 90fe6edcba [X86] Remove X86ISD::SELECT as it's no longer used by any of our intrinsic lowering.
llvm-svn: 350995
2019-01-12 08:15:54 +00:00
Craig Topper 33b2cf50e3 [X86] Add ISD node for masked version of CVTPS2PH.
The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.

This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.

Fixes another instruction for PR34877.

llvm-svn: 350994
2019-01-12 08:05:12 +00:00
Alex Bradbury 61aa940074 [RISCV] Introduce codegen patterns for RV64M-only instructions
As discussed on llvm-dev
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have
to be careful when trying to select the *w RV64M instructions. i32 is not a
legal type for RV64 in the RISC-V backend, so operations have been promoted by
the time they reach instruction selection. Information about whether the
operation was originally a 32-bit operation has been lost, and it's easy to
write incorrect patterns.

Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will
produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being
selected (and so save instructions to sext/zext the input operands).

Differential Revision: https://reviews.llvm.org/D53230

llvm-svn: 350993
2019-01-12 07:43:06 +00:00
Alex Bradbury d05eae7a7b [RISCV] Add patterns for RV64I SLLW/SRLW/SRAW instructions
This restores support for selecting the SLLW/SRLW/SRAW instructions, which was
removed in rL348067 as the previous patterns made some unsafe assumptions.
Also see the related llvm-dev discussion
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>

Ultimately I didn't introduce a custom SelectionDAG node, but instead added a
DAG combine that inserts an AssertZext i5 on the shift amount for an i32
variable-length shift and also added an ANY_EXTEND DAG-combine which will
instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the
opportunity to safely select SLLW/SRLW/SRAW.

There are obviously different ways of addressing this (a number discussed in
the llvm-dev thread), so I'd welcome further feedback and comments.

Note that there are now some cases in
test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is
selected even though sra/srl/sll could be used without any extra instructions.
Given both are semantically equivalent, there doesn't seem a good reason to
prefer one vs the other. Given that would require more logic to still select
sra/srl/sll in those cases, I've left it preferring the *w variants.

Differential Revision: https://reviews.llvm.org/D56264

llvm-svn: 350992
2019-01-12 07:32:31 +00:00
Craig Topper a69d903204 [X86] Remove unnecessary code from getMaskNode.
We no longer need to extend mask scalars before bitcasting them to vXi1. This was only needed for the truncate intrinsics. And was really a bug in our lowering of them.

llvm-svn: 350991
2019-01-12 06:13:44 +00:00
Craig Topper bf61525e8c [X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not avx512dq, use v16i1 as the intermediate mask type instead of v8i1.
We still use i8 for the load/store type. So we need to convert to/from i16 around the mask type.

By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.

llvm-svn: 350989
2019-01-12 02:22:10 +00:00
Craig Topper 8695e6dfc4 [X86] Change some patterns that select MOVZX16rm8 to instead select MOVZX32rm8 and extract the subregister.
This should be a shorter encoding and is consistent with what we do for zext i8->i16

llvm-svn: 350988
2019-01-12 02:22:06 +00:00
Evandro Menezes 7bc55e4075 [ARM] Fix typo
Fix typo in r350952.

llvm-svn: 350986
2019-01-12 01:06:43 +00:00
Craig Topper abe6ef8d09 [X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.

This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.

This fixes some of PR34877

llvm-svn: 350985
2019-01-12 00:55:27 +00:00
Evandro Menezes 7d7e3256cd [AArch64] Improve Exynos predicates
Expand the predicate using shifted arithmetic and logic instructions to also
consider the respective not shifted instructions.

llvm-svn: 350976
2019-01-11 22:39:47 +00:00
Nikita Popov 9f6e9cf71b [ConstantFolding] Fold undef for integer intrinsics
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.

This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.

The undef behavior follows what InstSimplify does for the general case
of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.

Differential Revision: https://reviews.llvm.org/D55950

llvm-svn: 350971
2019-01-11 21:18:00 +00:00
Nirav Dave 6b7f5aac72 [X86] Fix incomplete handling of register-assigned variables in parsing.
Teach x86 assembly operand parsing to distinguish between assembler
variables assigned to named registers and those assigned to immediate
values.

Reviewers: rnk, nickdesaulniers, void

Subscribers: hiraditya, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D56287

llvm-svn: 350966
2019-01-11 20:17:36 +00:00
Evandro Menezes 0c14c87d00 [AArch64] Add pipeline model for Exynos M4
Add the scheduling and cost model for Exynos M4.

llvm-svn: 350960
2019-01-11 19:36:25 +00:00
Evandro Menezes 0674762112 [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Pirama Arumuga Nainar cc07dabdaa [Legalizer] Use correct ValueType of SELECT_CC node during Float promotion
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.

Fix PR40273.

Reviewers: efriedma, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56566

llvm-svn: 350951
2019-01-11 18:46:02 +00:00
Teresa Johnson 290a839891 [LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).

The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.

This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.

Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.

Reviewers: pcc

Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D53890

llvm-svn: 350948
2019-01-11 18:31:57 +00:00
Vedant Kumar ee10ef737e [MergeFunc] Erase unused duplicate functions if they are discardable
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.

Differential Revision: https://reviews.llvm.org/D56574

llvm-svn: 350939
2019-01-11 17:56:35 +00:00
Vedant Kumar 08fe7e02fb [MergeFunc] Use Instruction::getFunction as a cleanup, NFC
llvm-svn: 350938
2019-01-11 17:56:21 +00:00
Ehsan Amiri f452f116d2 [Jump Threading] Unfold a select insn that feeds a switch via a phi node
Currently when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare/ phi and compare) we unfold the select 
statement. This results in threading the conditional branch later on. Similar
opportunity exists when a select (with a constant in one branch) feeds a 
switch (via a phi node). The patch unfolds select under this condition. 
A testcase is provided.

llvm-svn: 350931
2019-01-11 15:52:57 +00:00
Sanjay Patel 40cd4b77e9 [x86] allow insert/extract when matching horizontal ops
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.

There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.

llvm-svn: 350928
2019-01-11 14:27:59 +00:00
Martin Storsjo 114ad37c1d Revert "[SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI."
This reverts commit r350841, as it actually had functional changes
and broke compilation. See PR40290.

llvm-svn: 350921
2019-01-11 07:31:17 +00:00
Craig Topper b97885cc2e [X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector.
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.

Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.

This fixes some of the regressions from D350800.

llvm-svn: 350918
2019-01-11 05:44:56 +00:00
Heejin Ahn e73c7a1ab2 [WebAssembly] Fix stack pointer store check in RegStackify
Summary:
We now use __stack_pointer global and global.get/global.set instruction.
This fixes the checking routine for stack_pointer writes accordingly.

This also fixes the existing __stack_pointer test in reg-stackify.ll:
That test used to pass not because of __stack_pointer clashes but
because the function `stackpointer_callee` was not marked as `readnone`,
so it was assumed to possibly write to memory arbitrarily, and
`global.set` instruction was marked as `mayStore` in the .td definition,
so they were identified as intervening writes. After we added `readnone`
to its attribute, this test fails without this patch.

Reviewers: dschuff, sunfish

Subscribers: jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D56094

llvm-svn: 350906
2019-01-10 23:12:07 +00:00
Anton Korobeynikov 0681d6bc90 [MSP430] Minor fixes/improvements for assembler/disassembler
* Teach AsmParser to recognize @rn in destination operand as 0(rn).
* Do not let the Disassembler decode instructions whose size exceeds
  the number of input bytes.
* Fix UB in MSP430MCCodeEmitter.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56547

llvm-svn: 350903
2019-01-10 22:59:50 +00:00
Anton Korobeynikov 29ffb6d558 [MSP430] Add missing instruction forms
* Add missing mm, [r|m]n, [r|m]p instruction forms.
* Fix bit16mc instruction.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56546

llvm-svn: 350902
2019-01-10 22:54:53 +00:00
Thomas Lively 64a39a1c4e [WebAssembly] Add unimplemented-simd128 subtarget feature
Summary:
This is a third attempt, but this time we have vetted it on Windows
first. The previous errors were due to an uninitialized class member.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56560

llvm-svn: 350901
2019-01-10 22:32:11 +00:00
Gerolf Hoflehner cb7d968f73 [MachineCombiner][NFC] Prevent dereferencing past-the-end object in an MRI container
llvm-svn: 350896
2019-01-10 21:53:13 +00:00
Alina Sbirlea e41f4b39e5 [MemorySSA] Disable checkClobberSanity for SkipSelfWalker.
Sanity will fail for this, since we're exploring getting a clobber
further than the sanity check expects.
Ideally we need to teach the sanity check to differentiate between the
two walkers based on the SkipSelf bool in the query.

llvm-svn: 350895
2019-01-10 21:47:15 +00:00
Matt Davis 9cd9f41f0e [GVN] Update BlockRPONumber prior to use.
Summary:
The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional. In short, we found cases where that map was being accessed with blocks that had not yet been added to that structure. For context, I've kept the wall of text below describing what we are trying to fix by always ensuring an updated BlockRPONumber.

== Backstory ==

I was investigating an ICE (segfault accessing a DenseMap item).  This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018).

After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`.  Our test case that triggered this warning and periodic crash is rather involved.  But the problematic line looks to be:

GVN.cpp: Line 2197

```
     if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&
```

To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.

My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above).  In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm.

1) There is no sequence point for the `>=` operation.

2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`).

3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) .  A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid.  The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.

The assembly generated by the Windows host compiler does show that the first access to the map via `operator[]` produces a pointer to an unsigned int, and that pointer is stored on the stack. If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid.
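
To make points 1) to 3) concrete, here is a minimal self-contained sketch. `GrowingMap` is a hypothetical toy container, not `llvm::DenseMap`; it only shares the property that `operator[]` may insert and reallocate. `rpoNotBefore` is likewise an illustrative name. The sketch shows the defensive pattern of copying each looked-up value into a local before the next lookup; it is not the code that was committed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a DenseMap-like container (hypothetical, not LLVM's class).
// operator[] inserts a zero value for a missing key, and that insertion may
// reallocate the storage, invalidating references returned earlier.
class GrowingMap {
  std::vector<std::pair<int, unsigned>> Buckets;

public:
  unsigned &operator[](int Key) {
    for (auto &KV : Buckets)
      if (KV.first == Key)
        return KV.second;
    Buckets.emplace_back(Key, 0u); // may reallocate Buckets
    assert(!Buckets.empty());
    return Buckets.back().second;
  }

  std::size_t size() const { return Buckets.size(); }
};

// Safe form of `RPO[P] >= RPO[Current]`: each result is copied into a local
// before the next lookup runs, so no reference into the container is held
// across a call that might grow it.
bool rpoNotBefore(GrowingMap &RPO, int P, int Current) {
  const unsigned PNum = RPO[P];
  const unsigned CurNum = RPO[Current];
  return PNum >= CurNum;
}
```

Copying into locals only removes the dangling-reference hazard. A key that was never inserted still compares as 0, which is why populating the map before it is queried matters as well.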

Reviewers: t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D55974

llvm-svn: 350880
2019-01-10 19:56:03 +00:00
Alina Sbirlea cae12edaaa Use MemorySSA in LICM to do sinking and hoisting.
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.

Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.

Reviewers: sanjoy, davide, gberry, george.burgess.iv

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D40375

llvm-svn: 350879
2019-01-10 19:29:04 +00:00
Craig Topper 844f989608 [X86] Call SimplifyDemandedBits on conditions of X86ISD::SHRUNKBLEND
This extends combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created.

This should help some of the regressions from D56387

Differential Revision: https://reviews.llvm.org/D56421

llvm-svn: 350875
2019-01-10 19:05:34 +00:00
Craig Topper 350e6e9d7c [X86] Simplify the BRCOND handling for FCMP_UNE.
Despite what the comment says, FCMP_UNE would be an OR not an AND. In the lowering code the first branch created still goes to the original destination. The second branch was exchanged to go to where the subsequent unconditional branch went. This is different than what we do for FCMP_OEQ where both branches that we create go to the original unconditional branch.

As far as I can tell, I think this means we don't need to exchange the branch target with the unconditional branch for FCMP_UNE at all.

Differential Revision: https://reviews.llvm.org/D56309

llvm-svn: 350873
2019-01-10 19:02:14 +00:00
Sanjay Patel 9b368f39a9 [DAGCombiner] simplify code; NFC
llvm-svn: 350844
2019-01-10 16:47:42 +00:00
Nirav Dave cd18977add [SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI.
llvm-svn: 350841
2019-01-10 16:25:47 +00:00
Nirav Dave 4817c0e46c [SelectionDAGBuilder] Fix formatting. NFC.
llvm-svn: 350839
2019-01-10 16:22:19 +00:00
Neil Henning e85d45a699 [AMDGPU] Fix dwordx3/southern-islands failures.
This commit fixes the dwordx3/southern-islands failures that were found
in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not
generating the dwordx3 variants of load/store instructions that were
added to the ISA after southern islands.

Differential Revision: https://reviews.llvm.org/D56434

llvm-svn: 350838
2019-01-10 16:21:08 +00:00
Nirav Dave 57f2c14860 [SelectionDAGBuilder] Refactor visitInlineAsm. NFC.
llvm-svn: 350837
2019-01-10 16:18:18 +00:00
James Y Knight 62df5eed16 [opaque pointer types] Remove some calls to generic Type subtype accessors.
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).

I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.

llvm-svn: 350835
2019-01-10 16:07:20 +00:00
Alex Bradbury 6f302b8a69 [RISCV][MC] Add support for evaluating constant symbols as immediates
This further improves compatibility with GNU as, allowing input such as the
following to be assembled:

.equ CONST, 0x123456
li a0, CONST
addi a0, a0, %lo(CONST)

.equ CONST, 1
slli a0, a0, CONST

Note that we don't have perfect compatibility with gas, as it will avoid
emitting a relocation in this case:

addi a0, a0, %lo(CONST2)
.equ CONST2, 0x123456

Thanks to Shiva Chen for suggesting a better way to approach this during review.

Differential Revision: https://reviews.llvm.org/D52298

llvm-svn: 350831
2019-01-10 15:33:17 +00:00
Sanjay Patel 87ae1460f7 [x86] fix remaining miscompile bug in horizontal binop matching (PR40243)
When we use the partial-matching function on a 128-bit chunk, we must 
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.

This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243

llvm-svn: 350830
2019-01-10 15:27:23 +00:00
Sanjay Patel ed5cfc6792 [x86] fix horizontal binop matching for 256-bit vectors (PR40243)
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the 
existing (old) horizontal op matching function because it does not model the way 
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op). 
A potential follow-up change for that is discussed in the bug report - see also D56490.

This generally duplicates a lot of the existing matching code, but we can't just remove 
that without introducing regressions, so the existing code is renamed and used less often. 
Follow-ups may try to reduce that overlap.

Differential Revision: https://reviews.llvm.org/D56450

llvm-svn: 350826
2019-01-10 15:04:52 +00:00
Bryan Chan 7ce5775e62 [AArch64] Fix operation actions for FP16 vector intrinsics
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.

Reviewers: t.p.northover, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56296

llvm-svn: 350825
2019-01-10 15:02:37 +00:00
Andrea Di Biagio 97ed076dd1 [MCA] Fix wrong definition of ResourceUnitMask in DefaultResourceStrategy.
Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It
should have been a 64 bit quantity instead. That means, ResourceUnitMask was
always implicitly truncated to a 32 bit quantity.
This issue has been found by inspection. Surprisingly, that bug was latent, and
it never negatively affected any existing upstream targets.
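
A minimal sketch of the truncation, assuming `unsigned` is 32 bits wide on the host. The struct names are hypothetical and only mirror the shape of the field described above; they are not the llvm-mca classes.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the old definition: a 64-bit mask stored in a 'const unsigned'
// member silently loses its upper bits (assuming a 32-bit unsigned).
struct NarrowStrategy { // hypothetical name
  const unsigned ResourceUnitMask;
  explicit NarrowStrategy(std::uint64_t Mask)
      : ResourceUnitMask(static_cast<unsigned>(Mask)) {}
};

// Mirrors the corrected definition: the member is a 64-bit quantity.
struct WideStrategy { // hypothetical name
  const std::uint64_t ResourceUnitMask;
  explicit WideStrategy(std::uint64_t Mask) : ResourceUnitMask(Mask) {
    assert(ResourceUnitMask == Mask && "no bits may be dropped");
  }
};
```

A mask only differs between the two once a bit at position 32 or above is set, which would be consistent with the bug staying latent on targets with few resource units.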

This patch fixes  the wrong definition of ResourceUnitMask, and adds a bunch of
extra debug prints to help debugging potential issues related to invalid
processor resource masks.

llvm-svn: 350820
2019-01-10 13:59:13 +00:00
Sam Parker 2088a7536c [ARM] Fix for verifier buildbot
Copy the MachineOperand first and then change the flags instead of
making a copy.

llvm-svn: 350811
2019-01-10 10:47:23 +00:00
Fedor Sergeev b7871405fa [LoopUnroll] add parsing for unroll parameters in -passes pipeline
Allow specifying loop unrolling with optional parameters explicitly
spelled out in the -passes pipeline specification.
This introduces a somewhat generic way of parsing pass parameters via
FUNCTION_PASS_PARAMETRIZED pass registration.

Syntax of parametrized unroll pass name is as follows:
   'unroll<' parameter-list '>'

Where parameter-list is a ';'-separated list of parameter names and optlevel
   optlevel: 'O[0-3]'
   parameter: { 'partial' | 'peeling' | 'runtime' | 'upperbound' }
   negated:  'no-' parameter

Example:
   -passes=loop(unroll<O3;runtime;no-upperbound>)

    this invokes LoopUnrollPass configured with OptLevel=3,
    Runtime, no UpperBound, everything else by default.
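
The grammar above can be sketched as a small standalone parser. This is an illustration only: `UnrollParams` and `parseUnrollParams` are hypothetical names, the `-1` "not mentioned" encoding is an assumption, and the real parsing lives behind the pass registration rather than in this form.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical container for the parsed parameters. -1 means the item was
// not mentioned, so the pass default would apply.
struct UnrollParams {
  int OptLevel = -1;
  int Partial = -1, Peeling = -1, Runtime = -1, UpperBound = -1;
};

// Parses the text between '<' and '>', e.g. "O3;runtime;no-upperbound".
// Returns false on an unknown or empty item.
bool parseUnrollParams(const std::string &Text, UnrollParams &Out) {
  std::istringstream In(Text);
  std::string Item;
  while (std::getline(In, Item, ';')) {
    // optlevel: 'O[0-3]'
    if (Item.size() == 2 && Item[0] == 'O' && Item[1] >= '0' && Item[1] <= '3') {
      Out.OptLevel = Item[1] - '0';
      assert(Out.OptLevel >= 0 && Out.OptLevel <= 3);
      continue;
    }
    // negated: 'no-' parameter
    int Value = 1;
    if (Item.compare(0, 3, "no-") == 0) {
      Value = 0;
      Item.erase(0, 3);
    }
    if (Item == "partial")
      Out.Partial = Value;
    else if (Item == "peeling")
      Out.Peeling = Value;
    else if (Item == "runtime")
      Out.Runtime = Value;
    else if (Item == "upperbound")
      Out.UpperBound = Value;
    else
      return false;
  }
  return true;
}
```

For the example string, this yields OptLevel 3, Runtime enabled, UpperBound disabled, and the remaining parameters untouched.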

llvm-svn: 350808
2019-01-10 10:01:53 +00:00
Sam Parker 7208221452 [ARM] Size reduce teq to eors
Add t2TEQrr to the map of instructions which can be reduced down into
a T1 instruction. This is a special case because TEQ just sets the
CPSR and doesn't write to a GPR, which is not the case for EOR. So,
we need to ensure that the EOR can write to the first operand.

Differential Revision: https://reviews.llvm.org/D56255

llvm-svn: 350801
2019-01-10 08:36:33 +00:00
Craig Topper 5d20eb240f [X86] Disable DomainReassignment pass when AVX512BW is disabled to avoid injecting VK32/VK64 references into the MachineIR
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that.

Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since it's a little scary.

The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.

Fixes PR39741

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56460

llvm-svn: 350800
2019-01-10 07:43:54 +00:00
Zi Xuan Wu 64c956eea8 Recommit "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel"
This re-commits r350685.

Differential Revision: https://reviews.llvm.org/D55686

llvm-svn: 350799
2019-01-10 06:20:14 +00:00
Mandeep Singh Grang 859cb2e35d [AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.

Reviewers: rnk, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56037

llvm-svn: 350798
2019-01-10 04:59:44 +00:00
Thomas Lively fdd4999b86 Revert "[WebAssembly] Add simd128-unimplemented subtarget feature"
This reverts rL350791.

llvm-svn: 350795
2019-01-10 04:09:25 +00:00
Stanislav Mekhanoshin d3757d3f3a [AMDGPU] Separate feature dot-insts
Differential Revision: https://reviews.llvm.org/D56524

llvm-svn: 350793
2019-01-10 03:25:20 +00:00
Thomas Lively eb6f9abd41 [WebAssembly] Add simd128-unimplemented subtarget feature
This is a second attempt at r350778, which was reverted in
r350789. The only change is that the unimplemented-simd128 feature has
been renamed simd128-unimplemented, since naming it
unimplemented-simd128 somehow made the simd128 feature flag enable the
unimplemented-simd128 feature on Windows.

llvm-svn: 350791
2019-01-10 02:55:52 +00:00
Thomas Lively fdca5fab60 Revert "[WebAssembly] Add unimplemented-simd128 subtarget feature"
This reverts rL350778.

llvm-svn: 350789
2019-01-10 01:37:44 +00:00
Craig Topper c38c9c120f [X86] After turning VSELECT into SHRUNKBLEND, make sure we push the VSELECT into the worklist so it can be deleted.
Found while trying to figure out why my second version of D56421 worked better than the first version. We weren't deleting the vselect in a timely fashion and that caused SimplifyDemandedBits to see an additional user.

The new version doesn't have this problem so this fix isn't needed there, but seemed like the right thing to do.

llvm-svn: 350781
2019-01-10 00:14:27 +00:00
Thomas Lively 2eeade1814 [WebAssembly] Add unimplemented-simd128 subtarget feature
Summary:
This replaces the old ad-hoc -wasm-enable-unimplemented-simd
flag. Also makes the new unimplemented-simd128 feature imply the
simd128 feature.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton

Differential Revision: https://reviews.llvm.org/D56501

llvm-svn: 350778
2019-01-09 23:59:37 +00:00
Evandro Menezes 224d831bed [llvm-mca] Display masks in hex
Display the resources masks as hexadecimal.  Otherwise, NFC.

llvm-svn: 350777
2019-01-09 23:57:15 +00:00
Eli Friedman d4e7a0d83c [SimplifyLibCalls] Fix memchr expansion for constant strings.
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]".  The expansion
was missing the conversion to unsigned char.
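
A minimal sketch of the conversion in question, assuming 8-bit `char` and an ASCII execution character set. `findByte` and `findByteExplicit` are hypothetical helper names used only for illustration; they are not the SimplifyLibCalls expansion.

```cpp
#include <cassert>
#include <cstring>

// Offset of the first byte equal to (unsigned char)C, or -1 if absent.
// std::memchr performs the conversion to unsigned char itself.
long findByte(const char *Buf, std::size_t Len, int C) {
  assert(Buf != nullptr);
  const void *Hit = std::memchr(Buf, C, Len);
  return Hit ? static_cast<long>(static_cast<const char *>(Hit) - Buf) : -1;
}

// Hand-written equivalent that makes the conversion explicit. Comparing the
// bytes against the unconverted int C is the kind of mismatch fixed here.
long findByteExplicit(const char *Buf, std::size_t Len, int C) {
  assert(Buf != nullptr);
  const unsigned char Needle = static_cast<unsigned char>(C);
  for (std::size_t I = 0; I != Len; ++I)
    if (static_cast<unsigned char>(Buf[I]) == Needle)
      return static_cast<long>(I);
  return -1;
}
```

With these helpers, searching "xyzA" for 0x141 finds the 'A' at offset 3, because 0x141 converts to 0x41.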

Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 .

Differential Revision: https://reviews.llvm.org/D55947

llvm-svn: 350775
2019-01-09 23:39:26 +00:00
David Major 30ba0a0c95 Don't require a null terminator when loading objects
When a null terminator is required and the file size is a multiple of the system page size, MemoryBuffer will prefer pread() over mmap(), which can result in excessive memory usage.
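
A minimal sketch of the condition described above, under the assumption that the page size is a power of two and that bytes past end-of-file in the last mapped page read as zero. Both function names are hypothetical and do not reproduce MemoryBuffer's internal logic.

```cpp
#include <cassert>
#include <cstdint>

// True when a mapping of the file would already end in a zero byte: the file
// does not fill its last page, so the tail of that page is zero-filled.
bool mmapGivesFreeTerminator(std::uint64_t FileSize, std::uint64_t PageSize) {
  assert(PageSize != 0 && (PageSize & (PageSize - 1)) == 0 &&
         "page size assumed to be a power of two");
  return (FileSize & (PageSize - 1)) != 0;
}

// If a terminator is required and the file ends exactly on a page boundary,
// mmap cannot provide it, so the caller falls back to reading the file.
bool canUseMmap(std::uint64_t FileSize, std::uint64_t PageSize,
                bool RequiresNullTerminator) {
  return !RequiresNullTerminator || mmapGivesFreeTerminator(FileSize, PageSize);
}
```

Dropping the terminator requirement makes the second function return true regardless of file size, which is the effect this change relies on.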

Patch by Mike Hommey!

Differential Revision: https://reviews.llvm.org/D56475

llvm-svn: 350774
2019-01-09 23:36:32 +00:00
Heejin Ahn 569f090922 [WebAssembly] Print a debug message at the start of each pass
Summary:
Looks like many passes print their pass description as a debug message at
the start of each pass, so added that to (mostly newly added) other
passes as well.

Reviewers: dschuff

Subscribers: jgravelle-google, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56142

llvm-svn: 350771
2019-01-09 23:05:21 +00:00
Easwaran Raman b45994b843 Refactor synthetic profile count computation. NFC.
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).

Reviewers: davidxl

Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D56464

llvm-svn: 350755
2019-01-09 20:10:27 +00:00
Francis Visoiu Mistrih ac6454a7f6 [CodeGen] Ignore return sext/zext attributes of unused results for tail calls
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.

However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.

Differential Revision: https://reviews.llvm.org/D56486

llvm-svn: 350753
2019-01-09 19:46:15 +00:00
Easwaran Raman ed279752f0 [Inliner] Assert that the computed inline threshold is non-negative.
Reviewers: chandlerc

Subscribers: haicheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56409

llvm-svn: 350751
2019-01-09 19:26:17 +00:00
David Callahan 3ef0f4447d refactor BlockFrequencyInfo::view to take a title parameter
Summary: Allow a non-default title for this debugging aid

Reviewers: twoh, Kader, modocache

Reviewed By: twoh

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56499

llvm-svn: 350749
2019-01-09 19:12:38 +00:00
Thomas Lively edb54b22d3 [WebAssembly] Standardize order of SIMD bitselect arguments
Summary:
For some reason the backend assumed that the condition mask would be
the first argument to the LLVM intrinsic, but everywhere else the
condition mask is the third argument.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56412

llvm-svn: 350746
2019-01-09 18:13:11 +00:00
Aleksandar Beserminji 8abf680424 [mips][micromips] Emit 16bit NOPs by default
Emit 16bit NOPs by default.
Use 32bit NOPs in delay slots where necessary.

Differential Revision: https://reviews.llvm.org/D55323

llvm-svn: 350733
2019-01-09 15:58:02 +00:00
Valery Pykhtin b7a459547d Revert "[AMDGPU] Fix DPP combiner"
This reverts commit e3e2923a39cbec3b3bc3a7d3f0e9a77a4115080e, svn revision rL350721

llvm-svn: 350730
2019-01-09 15:21:53 +00:00
Kristof Beyls c650ff77eb Initial AArch64 SLH implementation.
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.

Differential Revision: https://reviews.llvm.org/D55929

llvm-svn: 350729
2019-01-09 15:13:34 +00:00
Valery Pykhtin 1e0b5c719b [AMDGPU] Fix DPP combiner
Fixed issue with identity values and other cases, f32/f16 identity values to be added later. fma/mac instructions are disabled for now.
Test is fully reworked, added comments. Other fixes:

1. dpp move with uses and old reg initializer should be in the same BB.
2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
3. Added add, subrev, and, or instructions to the old folding function.
4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.

Differential revision: https://reviews.llvm.org/D55444

llvm-svn: 350721
2019-01-09 13:43:32 +00:00
Florian Hahn 9697d2a764 Revert r350647: "[NewPM] Port tsan"
This patch breaks thread sanitizer on some macOS builders, e.g.
http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/52725/

llvm-svn: 350719
2019-01-09 13:32:16 +00:00
Simon Pilgrim 5a7132ff0f [X86] Enable combining shuffles to PACKSS/PACKUS for 256/512-bit vectors
llvm-svn: 350716
2019-01-09 13:23:28 +00:00
Anton Korobeynikov c18e90369d [MSP430] Optimize 'shl x, 8[+ N] -> swpb(zext(x)) [<< N]' for i16
Perform additional simplification to reduce shift amount.
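
A minimal sketch of the arithmetic identity behind the pattern, reading `zext(x)` as zero-extending the low byte of `x` to 16 bits. The function names are hypothetical, and the code models the values only, not the MSP430 lowering itself.

```cpp
#include <cassert>
#include <cstdint>

// Models SWPB: swap the two bytes of a 16-bit value.
std::uint16_t swpb(std::uint16_t V) {
  return static_cast<std::uint16_t>((V << 8) | (V >> 8));
}

// Reference: a plain 16-bit left shift.
std::uint16_t shlDirect(std::uint16_t X, unsigned Amt) {
  assert(Amt < 16);
  return static_cast<std::uint16_t>(X << Amt);
}

// shl x, 8 + N computed as swpb(zext(low byte of x)) << N.
std::uint16_t shlViaSwpb(std::uint16_t X, unsigned Amt) {
  assert(Amt >= 8 && Amt < 16);
  const std::uint16_t LowByte = X & 0xFF; // upper byte would be shifted out
  return static_cast<std::uint16_t>(swpb(LowByte) << (Amt - 8));
}
```

Only N single-bit steps remain after the byte swap, which is where the reduced shift amount comes from.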

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56016

llvm-svn: 350712
2019-01-09 13:03:01 +00:00
Anton Korobeynikov 9222ed4485 [MSP430] Fix crash while lowering llvm.stacksave/stackrestore
Perform the usual expansion of stacksave / restore intrinsics.
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54890

llvm-svn: 350710
2019-01-09 12:52:15 +00:00
Diogo N. Sampaio 1eb31c8e94 [AArch64] Move feature predctrl to predres
Follow up patch of rL350385, for adding predres
command line option. This patch renames the
feature as to keep it aligned with the option
passed by/to clang

Differential Revision: https://reviews.llvm.org/D56484

llvm-svn: 350702
2019-01-09 11:24:15 +00:00
Simon Pilgrim 7ee86e8e81 [X86] Fix gcc7 -Wunused-but-set-variable warning. NFCI.
llvm-svn: 350701
2019-01-09 11:18:49 +00:00
David Stenberg 33b192d72b [DebugInfo] Omit location list entries with empty ranges
Summary:
This fixes PR39710. In that case we emitted a location list looking like
this:

.Ldebug_loc0:
        .quad   .Lfunc_begin0-.Lfunc_begin0
        .quad   .Lfunc_begin0-.Lfunc_begin0
        .short  1                       # Loc expr size
        .byte   85                      # DW_OP_reg5
        .quad   .Lfunc_begin0-.Lfunc_begin0
        .quad   .Lfunc_end0-.Lfunc_begin0
        .short  1                       # Loc expr size
        .byte   85                      # super-register DW_OP_reg5
        .quad   0
        .quad   0

As seen, the first entry's beginning and ending addresses evaluate to 0,
which meant that the entry inadvertently became an "end of list" entry,
resulting in the location list ending sooner than expected.

To fix this, omit all entries with empty ranges. Location list entries
with empty ranges do not have any effect, as specified by DWARF, so we
might as well drop them:

"A location list entry (but not a base address selection or end of list
 entry) whose beginning and ending addresses are equal has no effect
 because the size of the range covered by such an entry is zero."
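
A minimal sketch of the filtering step, assuming a simplified entry with begin/end offsets and a single expression byte. `LocEntry` and `dropEmptyRanges` are hypothetical names and do not correspond to the DwarfDebug data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified location list entry (hypothetical): offsets are relative to the
// function start, Op is the one-byte location expression.
struct LocEntry {
  std::uint64_t Begin;
  std::uint64_t End;
  std::uint8_t Op;
};

// Keeps only entries that cover at least one byte. An entry with
// Begin == End has no effect, and when both values are 0 it would be
// indistinguishable from the end-of-list marker.
std::vector<LocEntry> dropEmptyRanges(const std::vector<LocEntry> &Entries) {
  std::vector<LocEntry> Kept;
  for (const LocEntry &E : Entries) {
    assert(E.Begin <= E.End && "ranges are assumed to be well formed");
    if (E.Begin != E.End)
      Kept.push_back(E);
  }
  return Kept;
}
```

Applied to the list above, the first entry is dropped and the second (function begin to function end) survives.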

Reviewers: davide, aprantl, dblaikie

Reviewed By: aprantl

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D55919

llvm-svn: 350698
2019-01-09 09:58:59 +00:00
Matt Arsenault 3dddb163dd GlobalISel: Implement fewerElements for implicit_def
llvm-svn: 350697
2019-01-09 07:51:52 +00:00
Matt Arsenault befee402ff GlobalISel: Implement widenScalar for implicit_def
llvm-svn: 350695
2019-01-09 07:34:14 +00:00
Max Kazantsev 4615a505f8 [IPT] Drop cache less eagerly in GVN and LoopSafetyInfo
Current strategy of dropping `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal structures: `OrderedInstructions`
that is needed to be invalidated whenever the contents changes, and the map
with first special instructions in block. This second map does not need an
update if we add/remove a non-special instruction because it cannot
affect the contents of this map.

This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.

Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma

llvm-svn: 350694
2019-01-09 07:28:13 +00:00
Zi Xuan Wu f2a75eef41 Revert "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel"
This reverts commit r350685.

See compile assert in compiler-rt.

llvm-svn: 350693
2019-01-09 06:12:24 +00:00
Hiroshi Inoue dad8c6a1c9 [NFC] fix trivial typos in comments
llvm-svn: 350690
2019-01-09 05:11:10 +00:00
Craig Topper 2fa8e2d8a8 [X86] Correct the MaskVT for avx512 gather/scatter intrinsics to use the min of the number of index and data elements.
When the result type is v2i64/v2f64 and the index element size is i32, the index vector has two unused elements making the type v4i32. The mask VT should match the number of memory accesses that will be made.

This is consistent with the isel patterns used for the target independent gather/scatter intrinsic.

llvm-svn: 350687
2019-01-09 04:21:12 +00:00
Zi Xuan Wu 9479f6d72e [PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel
Bad machine code: Illegal virtual register for instruction

function: TestULE
basic block: %bb.0 entry (0x1000a39b158)
instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc
operand 1: %1:vsfrc

Fix assert about missing match between fcmp instruction and register class.
We should use the VSX-related cmp instruction xvcmpudp instead of fcmpu when VSX is enabled.

add -verifymachineinstrs option into related test cases to enable the verify pass.


Differential Revision: https://reviews.llvm.org/D55686

llvm-svn: 350685
2019-01-09 02:31:10 +00:00
Stanislav Mekhanoshin ed0d6c60af Remove check for single use in ShrinkDemandedConstant
This moves the check for single use from the general ShrinkDemandedConstant
to the BE because of the AArch64 regression after D56289/rL350475.

After several hours of experiments I did not come up with a testcase
failing on any other targets if check is not performed.

Moreover, a direct call to ShrinkDemandedConstant is not really needed
and is superseded by SimplifyDemandedBits.

Differential Revision: https://reviews.llvm.org/D56406

llvm-svn: 350684
2019-01-09 02:24:22 +00:00
Matt Arsenault 0ad1b71fe3 RegisterCoalescer: Assume CR_Replace for SubRangeJoin
Currently it's possible for the following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.

During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.

Fixes most of the testcases in bugs 39542 and 39602

llvm-svn: 350678
2019-01-08 23:22:18 +00:00
Matt Arsenault 2c807410fd RegisterCoalescer: Defer clearing implicit_def lanes
We can't go back and recover the lanes if it turns
out the implicit_def really can't be erased.

Assume all lanes are valid if an unresolved conflict
is encountered. There aren't any tests where this
seems to matter either way, but this seems like a
safer option.

Fixes bug 39602

llvm-svn: 350676
2019-01-08 23:10:47 +00:00
Sanjay Patel d023dd60e9 [InstCombine] canonicalize another raw IR rotate pattern to funnel shift
This is matching the equivalent of the DAG expansion, 
so it should never end up with worse perf than the 
original code even if the target doesn't have a rotate
instruction.
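
For illustration, a sketch of why a shift-and-or rotate and a funnel shift
with both data operands equal compute the same value. `fshl32` and
`rotlRaw32` are hypothetical helper names written in plain C++; they model
the semantics on 32-bit values and are not the InstCombine code.

```cpp
#include <cstdint>

// Funnel shift left on 32 bits: concatenate Hi:Lo into a 64-bit value,
// shift left by Amt modulo 32, and return the top 32 bits.
std::uint32_t fshl32(std::uint32_t Hi, std::uint32_t Lo, std::uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi; // avoid shifting Lo right by 32, which is undefined in C++
  return (Hi << Amt) | (Lo >> (32 - Amt));
}

// A "raw" rotate written with two shifts and an or. Both shift amounts
// are masked so that no shift by 32 or more can occur.
std::uint32_t rotlRaw32(std::uint32_t X, std::uint32_t Amt) {
  return (X << (Amt & 31)) | (X >> ((32 - Amt) & 31));
}
```

Under these definitions `rotlRaw32(X, Amt)` equals `fshl32(X, X, Amt)` for
every amount, which is the equivalence the canonicalization relies on.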

llvm-svn: 350672
2019-01-08 22:39:55 +00:00
Rong Xu 016220549d [PGO] Use SourceFileName rather than module name in PGOFuncName
In LTO or Thin-lto mode (through the linker plugin), the module
names are temp file names, which differ between
compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() are before LTO.

Differential Revision: https://reviews.llvm.org/D56327

llvm-svn: 350671
2019-01-08 22:39:47 +00:00
Heejin Ahn 321d522038 [WebAssembly] Rename StoreResults to MemIntrinsicResults
Summary:
The StoreResults pass does not optimize store instructions anymore because
store instructions don't return result values anymore. Now this pass is
used solely for memory intrinsics, so update the pass name accordingly
and fix outdated pass descriptions as well.

This patch does not change any meaningful behavior, but it is not marked as
NFC because it changes a comment check line in a test case.

Reviewers: dschuff

Subscribers: mgorny, sbc100, jgravelle-google, sunfiish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56093

llvm-svn: 350669
2019-01-08 22:35:18 +00:00
Rong Xu d236b0aa93 [PGO] Revert r350442 to fix commit message.
Will re-commit it using the correct commit message.

llvm-svn: 350667
2019-01-08 22:33:29 +00:00
Evandro Menezes 39c97bf6cd [AArch64] Adjust the cost model for Exynos
Improve the modeling of ALU instructions.

llvm-svn: 350663
2019-01-08 22:29:58 +00:00
Evandro Menezes 5d780093fd [llvm-mca] Improve debugging (NFC)
llvm-svn: 350661
2019-01-08 22:29:38 +00:00
Zachary Turner 2fe4900525 [llvm-undname] Add support for demangling msvc's noexcept types.
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions.  This patch
makes llvm-undname properly demangle them.

Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769

llvm-svn: 350656
2019-01-08 21:05:51 +00:00
Zachary Turner 4e83923d83 Don't write #include "Windows/WindowsSupport.h" from the Windows dir.
This generates -Wnonportable-include-dir warnings, and doesn't need
to be there. It seems this was just checked in by accident.

llvm-svn: 350655
2019-01-08 21:05:34 +00:00
Adrian Prantl 8a753a2e5a Revert "Revert "Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files""""
This reverts commit D56084.

llvm-svn: 350654
2019-01-08 21:05:10 +00:00
Philip Pfaffe 82f995db75 [NewPM] Port tsan
A straightforward port of tsan to the new PM, following the same path
as D55647.

Differential Revision: https://reviews.llvm.org/D56433

llvm-svn: 350647
2019-01-08 19:21:57 +00:00
Paul Robinson 7402fd9a35 Rename DIFlagFixedEnum to DIFlagEnumClass. NFC
llvm-svn: 350641
2019-01-08 17:52:29 +00:00
Anna Thomas 2dfa412efe [UnrollRuntime] Fix domTree failures in multiexit unrolling
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).

Reviewers: reames, mzolotukhin, mkazantsev, hfinkel

Reviewed by: brzycki, kuhar

Subscribers: zzheng, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56284

llvm-svn: 350640
2019-01-08 17:16:25 +00:00
Yonghong Song 0d99031de0 [BPF] Fix .BTF.ext reloc type assignment issue
Commit f1db33c5c1a9 ("[BPF] Disable relocation for .BTF.ext section")
assigned relocation type R_BPF_NONE if the fixup type
is FK_Data_4 and the symbol is temporary.
The reason is we use FK_Data_4 as a fixup type
for insn offsets in .BTF.ext section.

Just checking whether the symbol is temporary is not enough.
For example, .debug_info may reference some strings whose
fixup is FK_Data_4 with a temporary symbol as well.

To truly reflect the case for .BTF.ext section,
this patch further checks that the section associated with the symbol
must be SHF_ALLOC and SHF_EXECINSTR, i.e., in the text section.
This fixed the above-mentioned problem.
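
The resulting condition can be sketched as a small predicate. This is an
illustration under assumptions, not the BPF backend code: `FixupKind`,
`SymbolInfo` and `usesNoneReloc` are hypothetical names, and only the two
flag values are taken from the ELF specification.

```cpp
#include <cstdint>

// ELF section flag values (SHF_ALLOC and SHF_EXECINSTR in the ELF spec).
constexpr std::uint64_t ShfAlloc = 0x2;
constexpr std::uint64_t ShfExecInstr = 0x4;

// Hypothetical, simplified fixup kinds.
enum class FixupKind { Data4, Data8, Other };

// Hypothetical summary of the symbol a fixup refers to.
struct SymbolInfo {
  bool IsTemporary;
  std::uint64_t SectionFlags; // flags of the section the symbol lives in
};

// True only for a 4-byte data fixup against a temporary symbol whose
// section is both allocatable and executable, i.e. a text section.
bool usesNoneReloc(FixupKind Kind, const SymbolInfo &Sym) {
  if (Kind != FixupKind::Data4 || !Sym.IsTemporary)
    return false;
  const std::uint64_t TextFlags = ShfAlloc | ShfExecInstr;
  return (Sym.SectionFlags & TextFlags) == TextFlags;
}
```

A temporary symbol in a string section (no SHF_ALLOC, no SHF_EXECINSTR)
fails the last test, so a .debug_info reference like the one described
above would keep its normal relocation.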

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 350637
2019-01-08 16:36:06 +00:00
Florian Hahn c1ece1b41b [MachineVerifier] Include offending register in allocatable live-in error msg.
This patch adds a convenience report() method for physical registers and
uses it to print the offending register with the 'MBB has allocatable
live-in' error.

Reviewers: MatzeB, rtereshin, dsanders

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D55946

llvm-svn: 350630
2019-01-08 15:16:23 +00:00
Petr Pavlu bf4fdecc51 [GlobalISel] Fix choice of instruction selector for AArch64 at -O0 with -global-isel=0
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.

Fixes PR40131.

Differential Revision: https://reviews.llvm.org/D56266

llvm-svn: 350626
2019-01-08 14:19:06 +00:00
Philip Pfaffe efb5ad1c58 [DA][NewPM] Add a printerpass and port the testsuite
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.

Differential Revision: https://reviews.llvm.org/D56386

llvm-svn: 350624
2019-01-08 14:06:58 +00:00
Francis Visoiu Mistrih 7a6d7672c1 [X86][Darwin] Emit compact-unwind for register-sized stack adjustments
For stack frames of the size of a register in x86, a code size optimization
emits "push rax/eax" instead of "sub" for stack allocation. For example:

foo:
  .cfi_startproc
BB#0:
  pushq %rax
Ltmp0:
  .cfi_def_cfa_offset 16
  ...
  .cfi_endproc

However, we are falling back to DWARF in this case because we cannot
encode %rax as a saved register.

This requirement is wrong, since we don't care about the contents of
%rax, it is the equivalent of a sub.

In order to specify that we care about the contents of %rax, we would
need a .cfi_offset %rax, <offset>.

It's also overzealous in the case where there are pushes for callee saved
registers followed by a "push rax/eax" instead of "sub", in which case we should
also be able to encode the callee saved regs and everything else using compact
unwind.

Patch authored by Bruno Cardoso Lopes.

Differential Revision: https://reviews.llvm.org/D13793

llvm-svn: 350623
2019-01-08 13:53:15 +00:00
Lama Saba 32f08399eb Revert "Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files"""
This reverts commit rL350497.
The reported remaining issues seem to be unrelated to modules or this change.
More info: https://reviews.llvm.org/D56084

llvm-svn: 350621
2019-01-08 13:30:36 +00:00
Tim Northover 964eea7ad2 AArch64: avoid splitting vector truncating stores.
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.

Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.

llvm-svn: 350620
2019-01-08 13:30:27 +00:00
Benjamin Kramer a480523ce9 [GlobalISel] Fix unused variable warning in Release builds.
llvm-svn: 350618
2019-01-08 12:54:26 +00:00
Sam Parker 53000a74a5 [ARM] Add missing patterns for DSP muls
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable us to remove the unnecessary sxth.

Differential Revision: https://reviews.llvm.org/D55992

llvm-svn: 350613
2019-01-08 10:12:36 +00:00
Matt Arsenault c765240060 AMDGPU/GlobalISel: Introduce vcc reg bank
I'm not entirely sure this is the correct thing
to do with the global isel philosophy, but I think
this is necessary to handle how differently SGPRs
are used normally vs. from a condition.

For example, it makes sense to allow a copy
from a VGPR to an SGPR, but it makes no sense
to allow a copy from VGPRs to SGPRs used as
select mask.

This avoids regbankselecting strange code with
a truncate feeding directly into a condition field.
Now a copy is forced from sgpr(s1) to vcc, which is
more sensible to handle.

Some of these issues could probably be avoided by making enough
operations resulting in i1 illegal. I think we can't avoid
this register bank for legality.

For example, an i1 AND where one source is from a truncate, and
one source is a compare needs some kind of copy inserted to
make sure both are in condition registers.

llvm-svn: 350611
2019-01-08 06:30:53 +00:00
Thomas Lively 6a87ddac9a [WebAssembly] Massive instruction renaming
Summary:
An automated renaming of all the instructions listed at
https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329
as well as some similarly-named identifiers.

Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, eraman, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56338

llvm-svn: 350609
2019-01-08 06:25:55 +00:00
Robert Widmann 616ed17221 [LLVM-C] Allow For Creating a BasicBlock without a Parent Function
Summary: Add a utility function for creating a basic block without a parent function.  A useful operation for compilers that need to synthesize and conditionally insert code without having to bother with appending and immediately unlinking a block.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56279

llvm-svn: 350608
2019-01-08 06:24:19 +00:00
Robert Widmann 40dc48be0e [LLVM-C] Allow Specifying Signedness in Int Cast
Summary: Fix an old outstanding problem with the int cast builder binding always assuming the cast is signed by introducing a new LLVMBuildIntCast2 operation and deprecating the old prototype.
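
To show why the signedness flag matters, here is a sketch of widening an
8-bit pattern to 32 bits in plain C++. `widenInt8` is a hypothetical helper
for illustration only; it does not call the LLVM-C API and models just the
sign-extend versus zero-extend choice.

```cpp
#include <cstdint>

// Widen the 8-bit pattern Bits to 32 bits. When IsSigned is true the top
// bit is treated as a sign bit (sign extension), otherwise the value is
// zero extended.
std::int32_t widenInt8(std::uint8_t Bits, bool IsSigned) {
  std::int32_t Value = static_cast<std::int32_t>(Bits); // zero extension
  if (IsSigned && Bits >= 0x80)
    Value -= 256; // reinterpret the pattern as a negative two's complement value
  return Value;
}
```

The pattern 0xFF widens to -1 when treated as signed and to 255 when
treated as unsigned, so a binding that always assumes a signed cast gives
the wrong result for unsigned sources.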

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56280

llvm-svn: 350607
2019-01-08 06:23:22 +00:00
Mandeep Singh Grang f286bee9fe [MC] [AArch64] Support resolving signed fixups for :abs_g0_s: etc.
Summary: This patch is a follow-up to D55896.

Reviewers: efriedma, mstorsjo

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56029

llvm-svn: 350606
2019-01-08 04:48:00 +00:00
Chris Kennelly a97cad4642 [NFC] Remove empty line as a test commit.
llvm-svn: 350605
2019-01-08 04:04:51 +00:00
Matt Arsenault a1515d2d33 AMDGPU/GlobalISel: Legalize concat_vectors
llvm-svn: 350598
2019-01-08 01:30:02 +00:00
Matt Arsenault 376f2ef2f0 Fix typos
llvm-svn: 350597
2019-01-08 01:25:47 +00:00
Heejin Ahn e95056d69c [WebAssembly] Move CFG-changing passes before RegStackify
Summary:
FixIrreducibleControlFlow and LateEHPrepare both possibly modify CFG and
create new registers. There seems to be no reason these passes go after
register-related optimization passes (PrepareForLiveIntervals,
OptimizeLiveIntervals, StoreResults, RegStackify, and RegColoring), and
this also possibly creates new optimization opportunities. I think we
should put all current and future optimization passes before RegStackify
(and related passes) unless there's a reason not to.

Reviewers: kripken

Subscribers: dschuff, sbc100, sunfish, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D56356

llvm-svn: 350596
2019-01-08 01:25:12 +00:00
Matt Arsenault adc40baa29 RegBankSelect: Fix copy insertion point for terminators
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.

I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).

Also legalize brcond for AMDGPU.

llvm-svn: 350595
2019-01-08 01:22:47 +00:00
Heejin Ahn 8e2bac8e7f [WebAssembly] Use 'I' multiclass template for br_table (NFC)
Summary:
We don't need to explicitly use `NI` anymore because we now don't use
`let` statements within the definitions.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56376

llvm-svn: 350594
2019-01-08 01:15:15 +00:00
Matt Arsenault ae6f1e07fc AMDGPU/GlobalISel: Disallow VGPR->SCC copies
This fixes using scalar adds when only the carry in is a VGPR
using greedy regbankselect.

llvm-svn: 350593
2019-01-08 01:13:20 +00:00
Matt Arsenault 68c668a5f3 AMDGPU/GlobalISel: RegBankSelect for carry-in
I'm not sure we should be allowing the truncate
to s1 for the inputs. It may be necessary to
create a new VCC reg bank.

llvm-svn: 350592
2019-01-08 01:09:09 +00:00
Matt Arsenault 2cc15b67b7 AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out
llvm-svn: 350589
2019-01-08 01:03:58 +00:00
Matt Arsenault 299302fbe7 AMDGPU/GlobalISel: InstrMapping for G_UNMERGE_VALUES
llvm-svn: 350588
2019-01-08 00:46:19 +00:00
Wei Mi 2645fd0ece [RegisterCoalescer] dst register's live interval needs to be updated when
merging a src register in ToBeUpdated set.

This is to fix PR40061, which is related to https://reviews.llvm.org/rL339035.

In https://reviews.llvm.org/rL339035, live interval of source pseudo register
in rematerialized copy may be saved in ToBeUpdated set and its update may be
postponed.

In PR40061, %t2 = %t1 is rematerialized and %t1 is added into toBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains live interval larger than necessary.
Because %t3 is not in toBeUpdated set, its live interval is not updated after
register coalescing and it will break some assumption in regalloc.

The patch requires the live interval of destination register in a merge to be
updated if the source register is in ToBeUpdated.

Differential revision: https://reviews.llvm.org/D55867

llvm-svn: 350586
2019-01-08 00:26:11 +00:00
Davide Italiano bf1fdb852f [Verifier] Reject invalid type for DILocalVariable.
Reviewers: aprantl

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56414

llvm-svn: 350578
2019-01-07 23:09:09 +00:00
Craig Topper 486313b5f7 Recommit r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics."
The MSVC limit we hit on AutoUpgrade.cpp has been worked around for now.

llvm-svn: 350567
2019-01-07 21:00:32 +00:00
Martin Storsjo 93a7137c0a [ObjectYAML] [COFF] Support multiple symbols with the same name
Differential Revision: https://reviews.llvm.org/D56294

llvm-svn: 350566
2019-01-07 20:55:33 +00:00
Craig Topper 81fe1fbf4a [X86][AutoUpgrade] Make some tweaks to reduce the number of nested if/else in the intrinsic upgrade code to avoid an MSVC compiler limit.
MSVC has a nesting limit of around 110-130. An if/else if/else if counts against this nesting level. The autoupgrade code consists of a long chain of these checking matches against strings.

This commit moves some code to a helper function to move out a large if/else chain that was inside of one of the blocks into a separate function. There are more of these we could move or we could change some to lookup tables.

I've also merged together a few similar blocks in the outer chain. This should buy us some margin for a little bit.

llvm-svn: 350564
2019-01-07 20:13:45 +00:00
Craig Topper fad1589f39 Revert r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics."
The AutoUpgrade.cpp if/else cascade hit an MSVC limit again.

llvm-svn: 350562
2019-01-07 19:39:05 +00:00
Alina Sbirlea 12bbb4fe8d [MemorySSA] Add SkipSelfWalker.
Summary: Add implementation of SkipSelfWalker.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D56285

llvm-svn: 350561
2019-01-07 19:38:47 +00:00
Craig Topper 826f44b550 [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24.
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when it's not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.

Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.

This patch changes AMDGPU's simplifyI24 to use GetDemandedBits to handle the multiple-use simplifications, and then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.

Differential Revision: https://reviews.llvm.org/D56087

llvm-svn: 350560
2019-01-07 19:30:43 +00:00
Alina Sbirlea bc8aa24c2f [MemorySSA] Refactor CachingWalker.
Summary:
Refactor caching walker to make creating a walker that skips the
starting access straightforward.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits, jfb

Differential Revision: https://reviews.llvm.org/D55957

llvm-svn: 350558
2019-01-07 19:22:37 +00:00
Craig Topper 9c4f7e9147 [X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics.
Differential Revision: https://reviews.llvm.org/D56377

llvm-svn: 350554
2019-01-07 19:10:12 +00:00
Diogo N. Sampaio f192cdb5c9 [ARM] ComputeKnownBits to handle extract vectors
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.

Differential revision: https://reviews.llvm.org/D56098

llvm-svn: 350553
2019-01-07 19:01:47 +00:00
Alina Sbirlea f723020456 [MemorySSA] Extend the clobber walker with the option to skip the starting access.
Summary:
The option enables loop transformations to hoist accesses that do not
have clobbers in the loop. If the clobber query skips the starting
access, the result may be outside the loop instead of the header Phi.

Adding the walker that uses this option in a separate patch.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D55944

llvm-svn: 350551
2019-01-07 18:40:27 +00:00
Nikita Popov 8dd19ed3ec Revert "[DemandedBits] Use SetVector for Worklist"
This reverts commit r350547.

Seeing assertion failures on clang tests.

llvm-svn: 350549
2019-01-07 18:15:11 +00:00
Nikita Popov 353d92decb [DemandedBits] Use SetVector for Worklist
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions to be recomputed very often. To avoid this, switch
to a SetVector.
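
A rough sketch of the data structure idea, assuming a standalone stand-in
rather than `llvm::SetVector` itself: `DedupWorklist` is a hypothetical
class that pairs a vector (for order) with a set (for membership), so an
item that is already queued is not queued a second time.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical worklist that keeps insertion order and rejects duplicates.
template <typename T> class DedupWorklist {
  std::vector<T> Order;           // items in insertion order
  std::unordered_set<T> Members;  // items currently queued

public:
  // Queues V unless it is already queued. Returns true if it was added.
  bool insert(const T &V) {
    if (!Members.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }

  // Removes and returns the most recently queued item.
  T pop_back_val() {
    T V = Order.back();
    Order.pop_back();
    Members.erase(V);
    return V;
  }

  bool empty() const { return Order.empty(); }
  std::size_t size() const { return Order.size(); }
};
```

With a plain vector the second `insert` of the same instruction would grow
the worklist. Here it is a no-op until the item has been popped.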

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350547
2019-01-07 18:03:36 +00:00
Rhys Perry f77e2e8406 AMDGPU: test for uniformity of branch instruction, not its condition
Summary:
If a divergent branch instruction is marked as divergent by propagation
rule 2 in DivergencePropagator::exploreSyncDependency() and its condition
is uniform, that branch would incorrectly be assumed to be uniform.

Reviewers: arsenm, tstellar

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D56331

llvm-svn: 350532
2019-01-07 15:52:28 +00:00
Alexandre Ganea 90f4b94da3 [CodeView] More appropriate name and type for a Microsoft precompiled headers parameter. NFC
llvm-svn: 350520
2019-01-07 13:53:16 +00:00
Matt Arsenault 7a27c1886f AMDGPU: Remove v16i8 from register classes
llvm-svn: 350518
2019-01-07 13:31:55 +00:00
Matt Arsenault 369acb8470 AMDGPU: Remove VS/SV mappings from select
These would violate the constant bus restriction

llvm-svn: 350517
2019-01-07 13:21:36 +00:00
Chandler Carruth 90c09232a2 [CallSite removal] Move the rest of IR implementation code away from
`CallSite`.

With this change, the remaining `CallSite` usages are just for
implementing the wrapper type itself.

This does update the C API but leaves the names of that API alone and
only updates their implementation.

Differential Revision: https://reviews.llvm.org/D56184

llvm-svn: 350509
2019-01-07 07:31:49 +00:00
Chandler Carruth 57578aaf96 [CallSite removal] Port `IndirectCallSiteVisitor` to use `CallBase` and
update client code.

Also rename it to use the more generic term `call` instead of something
that could be confused with a particular type.

Differential Revision: https://reviews.llvm.org/D56183

llvm-svn: 350508
2019-01-07 07:15:51 +00:00
Chandler Carruth fee1a04d04 [CallSite removal] Move the verifier to use `CallBase` instead of the
`CallSite` wrapper.

Mostly mechanical, but I've tried to tidy up code where it made sense to
do so.

Differential Revision: https://reviews.llvm.org/D56143

llvm-svn: 350507
2019-01-07 07:02:34 +00:00
Chandler Carruth 363ac68374 [CallSite removal] Migrate all Alias Analysis APIs to use the newly
minted `CallBase` class instead of the `CallSite` wrapper.

This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.

Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.

Thanks for the review from Saleem!

Differential Revision: https://reviews.llvm.org/D55641

llvm-svn: 350503
2019-01-07 05:42:51 +00:00
Craig Topper 6ffeeb705f [X86] Add support for matching vector funnel shift to AVX512VBMI2 instructions.
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56361

llvm-svn: 350498
2019-01-06 18:10:18 +00:00
Lama Saba f385c21f79 Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files""
This reverts commit rL350493.
Issues related to modules still appear in http://green.lab.llvm.org/green/job/lldb-cmake

llvm-svn: 350497
2019-01-06 16:39:14 +00:00
Sanjay Patel 8f12e8f3f6 [x86] explicitly set cost of integer add/sub
There are no test changes here in the existing cost model
regression tests because integer add/sub have a default
legal cost of 1 already. This would break, however, if
we custom lower those ops because the default cost model
assumes that custom-lowered ops are more expensive.

This is similar to the change in rL350403. See discussion
in D56011 for more details. When we enhance that patch to
handle integer ops, we need this cost model change to avoid
unintended diffs here from the custom lowering.

llvm-svn: 350496
2019-01-06 16:21:42 +00:00
Lama Saba ea9d555b83 Resubmit rL345008 "Split MachinePipeliner code into header and cpp files"
Resubmitted in rL345290 and reverted in rL350345 due to failures in
http://green.lab.llvm.org/green/job/lldb-cmake/
Resubmitting after a workaround to lldb-cmake failure was
committed in rL350346, more info in https://reviews.llvm.org/D56084

llvm-svn: 350493
2019-01-06 15:45:40 +00:00
Craig Topper 57fc891c1b [LegalizeVectorOps] Add FSHL/FSHR to the list of vector operations that should be handled.
The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too.

llvm-svn: 350490
2019-01-06 07:06:35 +00:00
Craig Topper 1187991bcf [X86][AsmParser] Don't allow X86::DX in CheckBaseRegAndIndexRegAndScale.
This was here because out and in instructions allow '(%dx)' even though it's not a memory reference. To handle this we build a special operand for the DX register reference before we get to the call to CheckBaseRegAndIndexRegAndScale. So we no longer need this special case.

llvm-svn: 350483
2019-01-05 23:30:28 +00:00
Craig Topper d0ba531a0c [X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8)))) on KNL.
llvm-svn: 350481
2019-01-05 22:42:58 +00:00
Craig Topper 46f8b4a11e [X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8.
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.

llvm-svn: 350480
2019-01-05 21:40:07 +00:00
Stanislav Mekhanoshin 35a3a3bd11 Added single use check to ShrinkDemandedConstant
Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
source node to demanded bits only even if there are other uses.

Differential Revision: https://reviews.llvm.org/D56289

llvm-svn: 350475
2019-01-05 19:20:00 +00:00
Craig Topper 3f48dbf72e [X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available.
llvm-svn: 350473
2019-01-05 18:48:11 +00:00
Nikita Popov 65038515ee [InstCombine] Relax cttz/ctlz with select on zero
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
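
The simplification can be modelled in plain C++ as follows. This is a
sketch with hypothetical helper names, not the InstCombine code:
`cttzZeroDefined` plays the role of cttz(x, false) and `cttzZeroUndef` the
role of cttz(x, true), whose result is only meaningful for non-zero input.

```cpp
#include <cstdint>

// Models cttz(x, true): the caller must guarantee X != 0.
unsigned cttzZeroUndef(std::uint32_t X) {
  unsigned N = 0;
  while ((X & 1u) == 0) { // would not terminate for X == 0
    X >>= 1;
    ++N;
  }
  return N;
}

// Models cttz(x, false): zero is a defined input and yields the bit width.
unsigned cttzZeroDefined(std::uint32_t X) {
  if (X == 0)
    return 32;
  return cttzZeroUndef(X);
}

// The pattern before the transform: x != 0 ? cttz(x, false) : y.
unsigned selectBefore(std::uint32_t X, unsigned Y) {
  return X != 0 ? cttzZeroDefined(X) : Y;
}

// The pattern after the transform: x != 0 ? cttz(x, true) : y.
unsigned selectAfter(std::uint32_t X, unsigned Y) {
  return X != 0 ? cttzZeroUndef(X) : Y;
}
```

Because the cttz arm is only chosen when x is non-zero, the zero-input
case of the relaxed form is never observed and both selects agree.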

Differential Revision: https://reviews.llvm.org/D55786

llvm-svn: 350463
2019-01-05 09:48:16 +00:00
Easwaran Raman 366a873f14 [Inliner] Optimize shouldBeDeferred
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.
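
A small sketch of why the early exit depends on that assumption. The
function names and the plain integer costs are hypothetical and only
illustrate the reasoning; this is not the inliner code.

```cpp
#include <vector>

// Sums all costs first, then compares against the threshold.
bool reachesThresholdFullSum(const std::vector<int> &Costs, int Threshold) {
  int Total = 0;
  for (int C : Costs)
    Total += C;
  return Total >= Threshold;
}

// Stops as soon as the running total reaches the threshold. This gives the
// same answer as the full sum only if the running total never decreases,
// i.e. if every cost is non-negative.
bool reachesThresholdEarlyExit(const std::vector<int> &Costs, int Threshold) {
  int Total = 0;
  for (int C : Costs) {
    Total += C;
    if (Total >= Threshold)
      return true; // early exit
  }
  return false;
}
```

For costs {5, 10, 3} and threshold 12 both versions agree. For costs
{20, -15} the early exit reports true after the first element while the
full sum (5) stays below 12, which is the kind of difference that keeps
the change from being strictly NFC.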

llvm-svn: 350456
2019-01-05 02:26:29 +00:00
Craig Topper 45ec002e25 [X86] Require second operand of X86vshiftuniform to be an integer. NFC
We don't need to require the first operand to be an integer because we already said it was the same type as the result which we also constrained to an integer.

llvm-svn: 350455
2019-01-05 01:40:29 +00:00
Evgeniy Stepanov 0184c53cbd Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)""
This reapplies commit r348983.

llvm-svn: 350448
2019-01-05 00:44:58 +00:00
Rong Xu b5fa0a89b2 [PGO] Use SourceFileName rather than module name in PGOFuncName
In LTO or Thin-lto mode (through the linker plugin), the module
names are temp file names, which differ between
compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() are before LTO.

llvm-svn: 350442
2019-01-04 22:54:03 +00:00
Nikita Popov c35b4a37ba [X86] Fix warning; NFC
llvm-svn: 350437
2019-01-04 21:41:35 +00:00
Vyacheslav Zakharin 0a6f86c54b Update the pr_datasz of .note.gnu.property section.
Patch by Xiang Zhang.

Differential Revision: https://reviews.llvm.org/D56080

llvm-svn: 350436
2019-01-04 21:25:01 +00:00
Nikita Popov 6658fce4fc [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Evandro Menezes 9f53bea536 [AArch64] Adjust the cost model for Exynos M3
Improve the modeling of ASIMD loads and stores.

llvm-svn: 350434
2019-01-04 21:02:25 +00:00
Craig Topper cfeb1cf9af [X86] Add INSERT_SUBVECTOR to ComputeNumSignBits
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.

My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.

Differential Revision: https://reviews.llvm.org/D56283

llvm-svn: 350432
2019-01-04 20:50:59 +00:00
Peter Collingbourne 87f477b5e4 hwasan: Implement lazy thread initialization for the interceptor ABI.
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.

The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.

This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.

Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.

Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.

Differential Revision: https://reviews.llvm.org/D56038

llvm-svn: 350429
2019-01-04 19:27:04 +00:00
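Illustration for 87f477b5e4: a minimal sketch of the lazy per-thread initialization pattern described above, assuming a plain C++ thread_local pointer in place of the sanitizer TLS slot. ThreadState, TlsSlot, threadEnter and recordFrame are hypothetical names; threadEnter only plays the role that the message assigns to __hwasan_thread_enter and is not the runtime's implementation.

```cpp
#include <cassert>

// Hypothetical per-thread state; stands in for the data behind the TLS slot.
struct ThreadState {
  int RingBufferEntries;
};

thread_local ThreadState *TlsSlot = nullptr; // null until first use on a thread
thread_local ThreadState Storage = {0};      // backing storage for this thread
thread_local int InitCalls = 0;              // counts slow-path entries

// Cold path: plays the role of the per-thread initialization function.
void threadEnter() {
  ++InitCalls;
  TlsSlot = &Storage;
}

// Hot path: one null check, then reload the slot after initialization.
ThreadState &getThreadState() {
  if (TlsSlot == nullptr)
    threadEnter();
  assert(TlsSlot != nullptr);
  return *TlsSlot;
}

// A "prologue" that needs the per-thread ring buffer.
void recordFrame() { ++getThreadState().RingBufferEntries; }
```

Because the check lives in every accessor, a thread started by uninstrumented code is initialized the first time it runs instrumented code, which is why no thread-creation interceptor is needed in this model.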
Teresa Johnson 853b962416 [ThinLTO] Handle chains of aliases
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.

Fix by adding a pass that canonicalizes aliases, which ensures each
alias is a direct alias of the base object.

Reviewers: pcc, davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54507

llvm-svn: 350423
2019-01-04 19:04:54 +00:00
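Illustration for 853b962416: a rough sketch of what flattening alias chains means, assuming aliases are modeled as a name-to-aliasee map. AliasMap, resolveBase and canonicalizeAliases are hypothetical helpers for this sketch and do not reflect the actual pass.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a module's alias table: alias -> immediate aliasee.
using AliasMap = std::unordered_map<std::string, std::string>;

// Follow the chain from Name until reaching something that is not an alias.
std::string resolveBase(const AliasMap &Aliases, std::string Name) {
  std::size_t Steps = 0;
  for (auto It = Aliases.find(Name); It != Aliases.end();
       It = Aliases.find(Name)) {
    assert(++Steps <= Aliases.size() && "alias chain must be acyclic");
    Name = It->second;
  }
  return Name;
}

// Rewrite every alias so that it points directly at its base object.
void canonicalizeAliases(AliasMap &Aliases) {
  AliasMap Flattened;
  for (const auto &Entry : Aliases)
    Flattened[Entry.first] = resolveBase(Aliases, Entry.first);
  Aliases = std::move(Flattened);
}
```

After canonicalization, a consumer that only looks one level deep (as the summaries are said to assume) sees the base object for every alias.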
Sanjay Patel 6153565511 [x86] lower extracted fadd/fsub to horizontal vector math; 2nd try
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.

Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.

We can extend this to integer ops in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D56011

llvm-svn: 350421
2019-01-04 17:48:13 +00:00
Vedant Kumar a1778df474 [CodeExtractor] Do not extract unsafe lifetime markers
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):

```
               entry:
              +------------+
              | x = alloca |
              | y = alloca |
              +------------+
             /              \
   lhs:                      rhs:
  +-------------------+     +-------------------+
  | lifetime_start(x) |     | lifetime_start(x) |
  | use(x)            |     | lifetime_start(y) |
  | lifetime_end(x)   |     | use(x, y)         |
  | lifetime_start(y) |     | lifetime_end(y)   |
  | use(y)            |     | lifetime_end(x)   |
  | lifetime_end(y)   |     +-------------------+
  +-------------------+
```

Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.

This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.

Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.

Fixes llvm.org/PR39671 (rdar://45939472).

Differential Revision: https://reviews.llvm.org/D55967

llvm-svn: 350420
2019-01-04 17:43:22 +00:00
Sanjay Patel 722466e1f1 [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.

llvm-svn: 350419
2019-01-04 17:38:12 +00:00
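Illustration for 722466e1f1: one plausible reading of a "narrowing rotate pattern" is an 8-bit rotate computed in wider arithmetic and then truncated. The sketch below compares that shape with a single funnel-shift-style helper. fshl8 and rotl8ViaWide are hypothetical functions written for this note, not LLVM intrinsics or APIs.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift left on 8-bit values: concatenate Hi:Lo, shift left by
// Amt modulo 8, and keep the high 8 bits.
uint8_t fshl8(uint8_t Hi, uint8_t Lo, unsigned Amt) {
  Amt %= 8;
  if (Amt == 0)
    return Hi;
  return static_cast<uint8_t>((Hi << Amt) | (Lo >> (8 - Amt)));
}

// The longer shape: widen, mask both shift amounts, shift both ways,
// or the halves together, and truncate back to 8 bits.
uint8_t rotl8ViaWide(uint8_t X, unsigned Amt) {
  uint32_t Wide = X;
  unsigned Left = Amt & 7;
  unsigned Right = (0u - Amt) & 7;
  return static_cast<uint8_t>((Wide << Left) | (Wide >> Right));
}
```

A rotate is the special case of a funnel shift where both inputs are the same value, so fshl8(X, X, Amt) matches rotl8ViaWide(X, Amt) for every input.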
John Brawn 39ac159c24 [LICM] Adjust how moving the re-hoist point works
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.

Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.

Differential Revision: https://reviews.llvm.org/D55266

llvm-svn: 350408
2019-01-04 17:12:09 +00:00
Nirav Dave 1468d6e1c5 Undo r350355 "[X86] Remove terrible DX Register parsing hack in parse operand. NFCI."
Add missing test case and update comments.

llvm-svn: 350406
2019-01-04 17:11:15 +00:00
Simon Pilgrim c2054144ee [CostModel][X86] Fix SSE1 FADD/FSUB costs
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4

Add the other costs so that we're not relying on the default "is legal/custom" cost logic.

llvm-svn: 350403
2019-01-04 16:55:57 +00:00
Ranjeet Singh 107dd2565c Revert patches 348835 and 348571 because they're
causing code size performance regressions.

llvm-svn: 350402
2019-01-04 16:39:10 +00:00
Simon Pilgrim 9f4dea8c06 [X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine
Repeat of the generic SimplifyDemandedBits shift combine

llvm-svn: 350399
2019-01-04 15:43:43 +00:00
Andrea Di Biagio 3f4b54850f [MCA] Improved handling of in-order issue/dispatch resources.
Added field 'MustIssueImmediately' to the instruction descriptor of instructions
that only consume in-order issue/dispatch processor resources.
This speeds up queries from the hardware Scheduler, and gives an average ~5%
speedup on a release build.

No functional change intended.

llvm-svn: 350397
2019-01-04 15:08:38 +00:00
Florian Hahn 7902405c42 [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset
GetPointerBaseWithConstantOffset includes this code, where ByteOffset
and GEPOffset are both of type llvm::APInt:

  ByteOffset += GEPOffset.getSExtValue();

The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t, so the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.

Changing it to

  ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());

resolves the issue and explicitly performs the sign extension
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.

See also
 https://reviews.llvm.org/D24729
 https://reviews.llvm.org/D24772

This commit must be merged after D38662 in order for the test to pass.

Patch by Michael Ferguson <mpfergu@gmail.com>.

Reviewers: reames, sanjoy, hfinkel

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D38501

llvm-svn: 350395
2019-01-04 14:53:22 +00:00
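Illustration for 7902405c42: a small model of why adding a negative 64-bit offset through an unsigned 64-bit overload goes wrong once the accumulator is wider than 64 bits. Wide128 is a hypothetical two-word integer standing in for a wide APInt; it is not the APInt implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit two's-complement integer (low word, high word).
struct Wide128 {
  uint64_t Lo;
  uint64_t Hi;
};

// Mirrors an unsigned overload: the 64-bit operand is zero-extended.
void addZeroExtended(Wide128 &Acc, uint64_t Rhs) {
  uint64_t OldLo = Acc.Lo;
  Acc.Lo += Rhs;
  if (Acc.Lo < OldLo)
    ++Acc.Hi; // carry out of the low word
}

// Sign-extends the operand to the full width before adding.
void addSignExtended(Wide128 &Acc, int64_t Rhs) {
  uint64_t RhsHi = Rhs < 0 ? ~uint64_t{0} : uint64_t{0};
  uint64_t OldLo = Acc.Lo;
  Acc.Lo += static_cast<uint64_t>(Rhs);
  Acc.Hi += RhsHi + (Acc.Lo < OldLo ? 1 : 0);
}
```

Starting from 5 and adding -1, the zero-extended path yields 2^64 + 4 (high word 1), while the sign-extended path yields 4.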
Andrea Di Biagio 7bec693433 [MCA] Store extra information about processor resources in the ResourceManager.
Method ResourceManager::use() is responsible for updating the internal state of
used processor resources, as well as notifying resource groups that contain used
resources.

Before this patch, method 'use()' didn't know how to quickly obtain the set of
groups that contain a particular resource unit. It had to discover groups by
performing a potentially slow search (done by iterating over the set of processor
resource descriptors).

With this patch, the relationship between resource units and groups is stored in
the ResourceManager. That means, method 'use()' no longer has to search for
groups. This gives an average speedup of ~4-5% on a release build.

This patch also adds extra code comments in ResourceManager.h to better describe
the resource mask layout, and how resource indices are computed from resource
masks.

llvm-svn: 350387
2019-01-04 12:31:14 +00:00
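Illustration for 7bec693433: a sketch of trading a per-query search for a stored unit-to-groups relationship. It assumes each group is a bitmask over unit indices; ResourceGroupsSketch and its members are hypothetical and do not mirror the ResourceManager's real mask layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical model: each group is a bitmask over resource-unit indices.
class ResourceGroupsSketch {
  std::vector<uint64_t> GroupMasks;
  std::vector<std::vector<std::size_t>> GroupsOfUnit; // built once up front

public:
  ResourceGroupsSketch(std::vector<uint64_t> Masks, std::size_t NumUnits)
      : GroupMasks(std::move(Masks)), GroupsOfUnit(NumUnits) {
    assert(NumUnits <= 64 && "one bit per unit in a 64-bit mask");
    for (std::size_t G = 0; G < GroupMasks.size(); ++G)
      for (std::size_t U = 0; U < NumUnits; ++U)
        if (GroupMasks[G] & (uint64_t{1} << U))
          GroupsOfUnit[U].push_back(G);
  }

  // Before: scan every group descriptor on each query.
  std::vector<std::size_t> findGroupsSlow(std::size_t Unit) const {
    std::vector<std::size_t> Result;
    for (std::size_t G = 0; G < GroupMasks.size(); ++G)
      if (GroupMasks[G] & (uint64_t{1} << Unit))
        Result.push_back(G);
    return Result;
  }

  // After: return the relationship that was stored at construction time.
  const std::vector<std::size_t> &findGroupsFast(std::size_t Unit) const {
    return GroupsOfUnit[Unit];
  }
};
```

Both queries return the same groups; the second one avoids the scan on every call.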
Richard Trieu e1fef949ae [WebAssembly] Split the checking from the sorting logic.
Move the check for -1 and identical values outside the vector sorting code.
Compare functions need to be able to compare identical elements to be
conforming.

llvm-svn: 350379
2019-01-04 06:49:24 +00:00
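Illustration for e1fef949ae: a sketch of keeping validity checks out of a sort comparator. std::sort may call the comparator on equal elements, so the comparator must simply return false for them; the checks run as a separate step. lessThan, isValidSorted and sortAndCheck are hypothetical names, and the meaning of -1 is taken from the message only as "a value that must not appear".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Conforming comparator: a strict ordering that tolerates equal inputs.
bool lessThan(int A, int B) { return A < B; }

// Checks done outside the comparator: no -1 and no identical values.
// Expects V to be sorted so that duplicates are adjacent.
bool isValidSorted(const std::vector<int> &V) {
  assert(std::is_sorted(V.begin(), V.end(), lessThan));
  if (std::find(V.begin(), V.end(), -1) != V.end())
    return false;
  return std::adjacent_find(V.begin(), V.end()) == V.end();
}

// Sort first, then validate.
bool sortAndCheck(std::vector<int> &V) {
  std::sort(V.begin(), V.end(), lessThan);
  return isValidSorted(V);
}
```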
Xin Tong 47beee2f3f [memcpyopt] Remove a few unnecessary isVolatile() checks. NFC
We already checked for isSimple() on the store.

llvm-svn: 350378
2019-01-04 02:13:22 +00:00
Craig Topper 6265a15f2e [X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used.
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.

Differential Revision: https://reviews.llvm.org/D56246

llvm-svn: 350374
2019-01-04 00:10:58 +00:00
Sanjay Patel 26ce9c38a7 revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math
There are non-codegen tests that need to be updated with this code change.

llvm-svn: 350373
2019-01-04 00:02:02 +00:00
Sanjay Patel ef4afca2ad [x86] lower extracted fadd/fsub to horizontal vector math
This would show up if we fix horizontal reductions to narrow as they go along, 
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger 
horizontal sequences.

We can extend this to integer ops in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D56011

llvm-svn: 350369
2019-01-03 23:16:19 +00:00
Heejin Ahn 777d01c756 [WebAssembly] Optimize Irreducible Control Flow
Summary:
Irreducible control flow is not that rare, e.g. it happens in malloc and
3 other places in the libc portions linked in to a hello world program.
This patch improves how we handle that code: it emits a br_table to
dispatch to only the minimal necessary number of blocks. This reduces
the size of malloc by 33%, and makes it comparable in size to asm2wasm's
malloc output.

Added some tests, and verified this passes the emscripten-wasm tests run
on the waterfall (binaryen2, wasmobj2, other).

Reviewers: aheejin, sunfish

Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits

Differential Revision: https://reviews.llvm.org/D55467

Patch by Alon Zakai (kripken)

llvm-svn: 350367
2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen 820c6263d9 [WebAssembly] Fixed disassembler not knowing about new brlist operand
Summary:
The previously introduced new operand type for br_table didn't have
a disassembler implementation, causing an assert.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56227

llvm-svn: 350366
2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen 9843295608 [WebAssembly] Made InstPrinter more robust
Summary:
Instead of asserting on certain kinds of malformed instructions, it
now still prints, but adds an annotation indicating the
problem, and/or indicates invalid_type etc.

We're using the InstPrinter from many contexts that can't always
guarantee values are within range (e.g. the disassembler), where having
output is more valuable than asserting.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56223

llvm-svn: 350365
2019-01-03 22:59:59 +00:00
Nirav Dave 8de916d1a4 [X86] Remove terrible DX Register parsing hack in parse operand. NFCI.
Fold hack special casing of (%dx) operand parsing into the related
hack for out*/in* instruction parsing.

llvm-svn: 350355
2019-01-03 21:46:30 +00:00
Sanjay Patel 9633d76a40 [DAGCombiner][x86] scalarize binop followed by extractelement
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:

// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)

We want to have this in the DAG too because as we can see in some of the test diffs (reductions), 
the pattern may not be visible in IR.

Given that this is already an IR canonicalization, any backend that would prefer a vector op over 
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.

Differential Revision: https://reviews.llvm.org/D55722

llvm-svn: 350354
2019-01-03 21:31:16 +00:00
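Illustration for 9633d76a40: the fold "extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)" written out for a four-lane integer add. Vec4 and both helpers are hypothetical and only demonstrate that the two forms compute the same value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;

// Vector form: perform the whole binop, then extract one lane.
int extractOfVectorAdd(const Vec4 &X, const Vec4 &Y, std::size_t Index) {
  assert(Index < 4);
  Vec4 Sum{};
  for (std::size_t I = 0; I < Sum.size(); ++I)
    Sum[I] = X[I] + Y[I];
  return Sum[Index];
}

// Scalarized form: extract the lane from each input, then do one scalar op.
int addOfExtracts(const Vec4 &X, const Vec4 &Y, std::size_t Index) {
  assert(Index < 4);
  return X[Index] + Y[Index];
}
```

The scalarized form does one addition instead of four, which is the motivation when only a single lane of the result is used.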
Alexander Timofeev 993e2798fd [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Detailed description: SIFoldOperands::foldInstOperand iterates over the
operand uses, calling a function that changes the def-use iterators along the
way. As a result, the loop exits immediately when the def-use iterator is
changed. Hence, the operand is folded into only the very first use
instruction. This keeps the VGPR live across the whole basic block and increases
register pressure significantly. The performance drop observed in the SHOC
DeviceMemory test is caused by this bug.

Proposed fix: collect the uses into a separate container for further processing
in another loop.

Testing: make check-llvm
SHOC performance test.

Reviewers: rampitec, ronlieb

Differential Revision: https://reviews.llvm.org/D56161

llvm-svn: 350350
2019-01-03 19:55:32 +00:00
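Illustration for 993e2798fd: a sketch of the "collect first, then process" pattern from the proposed fix, assuming a std::list as the use list. Use and foldAllUses are hypothetical; folding is modeled as removing the entry from the list, which is the kind of mutation that would invalidate an iterator held by the outer loop.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical use record: one instruction that still reads the register.
struct Use {
  int InstrId;
};

// Fold every use. Folding removes the use from the list, so the list is
// never mutated while it is being iterated.
int foldAllUses(std::list<Use> &UseList) {
  // Step 1: snapshot the uses into a separate container.
  std::vector<int> Worklist;
  for (const Use &U : UseList)
    Worklist.push_back(U.InstrId);

  // Step 2: process the snapshot; the original list may now change freely.
  int NumFolded = 0;
  for (int Id : Worklist) {
    UseList.remove_if([Id](const Use &U) { return U.InstrId == Id; });
    ++NumFolded;
  }
  return NumFolded;
}
```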
Anna Thomas a470aa6701 [UnrollRuntime] Move the DomTree verification under expensive checks
Suggested by Hal as done in r349871.

llvm-svn: 350349
2019-01-03 19:43:33 +00:00
Stefan Granitz a9b7ca472d Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files""
This reverts commit r350290.

llvm-svn: 350345
2019-01-03 19:09:24 +00:00
Kristina Brooks e434280f3d [MCStreamer] Use report_fatal_error in EmitRawTextImpl
Use report_fatal_error in MCStreamer::EmitRawTextImpl instead of
using errs() and explain the rationale behind it not being
llvm_unreachable() to save confusion for any future maintainers.

Differential Revision: https://reviews.llvm.org/D56245

llvm-svn: 350342
2019-01-03 18:42:31 +00:00
Anna Thomas 0785e7307e [UnrollRuntime] Add DomTree verification under debug mode
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify dom
tree at a state where it is much smaller and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info every time to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.

llvm-svn: 350334
2019-01-03 17:44:44 +00:00
Evandro Menezes 0f67746c92 [AArch64] Add new scheduling predicates
Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode.

llvm-svn: 350332
2019-01-03 17:28:09 +00:00
Andrea Di Biagio b284054b26 [MCA] Improve code comment and reuse a helper function in ResourceManager. NFCI
llvm-svn: 350322
2019-01-03 14:47:46 +00:00
Alex Bradbury 2ba76be882 [RISCV][MC] Accept %lo and %pcrel_lo on operands to li
This matches GNU assembler behaviour.

llvm-svn: 350321
2019-01-03 14:41:41 +00:00
Philip Pfaffe b39a97c8f6 [NewPM] Port Msan
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.

Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
  library functions, for which it inserts declarations as necessary.
- Update tests.

Caveats:
- There is one test that I dropped, because it specifically tested the
  definition of the ctor.

Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka

Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji

Differential Revision: https://reviews.llvm.org/D55647

llvm-svn: 350305
2019-01-03 13:42:44 +00:00
Simon Pilgrim c2aadfaaad [SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123)
Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics

llvm-svn: 350300
2019-01-03 12:18:23 +00:00
Diogo N. Sampaio 8786a946d8 [ARM] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.

This patch also renames FeatureSpecRestrict to FeatureSB.

Reviewed By: olista01, LukeCheeseman

Differential Revision: https://reviews.llvm.org/D55990

llvm-svn: 350299
2019-01-03 12:09:12 +00:00
Simon Pilgrim d824f99a6c [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)
Costs for real SSE2 instructions

llvm-svn: 350295
2019-01-03 11:38:42 +00:00
Piotr Sobczak 3abef8f9ea [AMDGPU] Change section name with metadata access
Summary:
The commit rL348922 introduced a means to set Metadata
section kind for a global variable, if its explicit section
name was prefixed with ".AMDGPU.metadata.".

This patch changes that prefix to ".AMDGPU.comment.",
as "metadata" in the section name might lead to
ambiguity with metadata used by AMD PAL runtime.

Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668

Reviewers: kzhuravl

Reviewed By: kzhuravl

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D56197

llvm-svn: 350292
2019-01-03 11:22:58 +00:00
Lama Saba 4d752a88e8 Resubmit rL345008 "Split MachinePipeliner code into header and cpp files"
The commit caused unclear failures in http://green.lab.llvm.org/green//job/lldb-cmake/
Will revert if the error reappears.

Differential Revision: https://reviews.llvm.org/D56084

llvm-svn: 350290
2019-01-03 10:03:54 +00:00
Markus Lavin 72b9deb21f [CodeGen] Skip over dbg-instr in twoaddr pass
A DBG_VALUE between a two-address instruction and a following COPY
would prevent rescheduleMIBelowKill optimization inside
TwoAddressInstructionPass.

Differential Revision: https://reviews.llvm.org/D55987

llvm-svn: 350289
2019-01-03 08:36:06 +00:00
Martin Storsjo 74e7d26090 [llvm-readobj] [COFF] Print the symbol index for relocations
There can be multiple local symbols with the same name (for e.g.
comdat sections), and thus the symbol name itself isn't enough
to disambiguate symbols.

Differential Revision: https://reviews.llvm.org/D56140

llvm-svn: 350288
2019-01-03 08:08:23 +00:00
Kristina Brooks bbbec9daa4 Don't go over 80 chars in MCStreamer.cpp. NFC.
Fixing up style issues around the area to prepare for
a larger differential.

llvm-svn: 350286
2019-01-03 06:06:38 +00:00
QingShan Zhang f24ec7bdd0 [Power9] Enable the Out-of-Order scheduling model for P9 hw
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI scheduler algorithm, we still use the in-order scheduling model
because MicroOpBufferSize isn't set. The MI scheduler takes that to mean the hw cannot buffer
the op, so a pending instruction can be scheduled only once all the available instructions
have issued. That is not true for our P9 hw.

This patch enables the out-of-order scheduling model. The buffer size of 44 is
taken from the P9 hw spec, and the perf tests indicate that this value won't hurt cpu2017.

With this patch, 3 benchmarks improve by over 3% and 1 degrades by over 3%. The details are as follows:

x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%

The GEOMEAN across all the C/C++ benchmarks in spec2017 improves by about 0.18%.

Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810

llvm-svn: 350285
2019-01-03 05:04:18 +00:00
Pete Cooper 697281df42 Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.

Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.

This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.

rdar://problem/47005143

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D56235

llvm-svn: 350284
2019-01-03 01:38:08 +00:00
Robert Widmann 7882b283cd [LLVM-C] Expand LLVMRelocMode
Summary: Add read[only|write] PIC relocation models to the C API and teach the TargetMachine API about it.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56187

llvm-svn: 350279
2019-01-03 00:33:44 +00:00
Craig Topper df5304d8de [X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL.
The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX.

llvm-svn: 350272
2019-01-02 23:24:08 +00:00
Wouter van Oortmerssen ad72f68501 [WebAssembly] made assembler parse block_type
Summary:
This was previously ignored and an incorrect value generated.

Also fixed Disassembler's handling of block_type.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56092

llvm-svn: 350270
2019-01-02 23:23:51 +00:00
Xin Tong 33e3b4b9b3 [ThinLTO] Scan all variants of vague symbol for reachability.
Summary:
An alias can make one (but not all) variants live; we still need to scan all the others if this symbol is reachable
from somewhere else.

Reviewers: tejohnson, grimar

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D56117

llvm-svn: 350269
2019-01-02 23:18:20 +00:00
Pete Cooper 8d58048024 Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef.
The caller to EraseInstruction had this conditional:

    // ARC calls with null are no-ops. Delete them.
    if (IsNullOrUndef(Arg))

but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.

This adds support for both of these cases.

rdar://problem/47003805

llvm-svn: 350261
2019-01-02 21:00:02 +00:00
Nikita Popov cc6ef7f153 [BDCE] Remove instructions without demanded bits
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.

Differential Revision: https://reviews.llvm.org/D56185

llvm-svn: 350257
2019-01-02 20:02:14 +00:00
Pawel Bylica 119aa8fa5f Format AggressiveInstCombine.cpp. NFC
llvm-svn: 350255
2019-01-02 19:51:46 +00:00
Craig Topper 9d4860ec4e [X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.

This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.

I had to avoid selecting DEC if the result isn't used. This will become a SUB immediate which will be turned into a CMP later by optimizeCompareInstr. This led to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.

This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.

Differential Revision: https://reviews.llvm.org/D55975

llvm-svn: 350245
2019-01-02 19:01:05 +00:00
Zachary Turner ba797b6dae [MS Demangler] Add a flag for dumping types without tag specifier.
Sometimes it's useful to be able to output demangled names without
tag specifiers like "struct", "class", etc.  This patch adds a
flag enabling this.

llvm-svn: 350241
2019-01-02 18:33:12 +00:00
Craig Topper 8dd7bd2cd7 [DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.

Improves the test case from PR38217. There may be additional opportunities after this.

Differential Revision: https://reviews.llvm.org/D56145

llvm-svn: 350239
2019-01-02 18:19:07 +00:00
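Illustration for 8dd7bd2cd7: a sketch of why handling DIV and REM together is attractive. Unsigned 32-bit division by 3 can be done with a multiply and a shift, and the remainder then falls out of the same quotient. The constant 0xAAAAAAAB with a shift of 33 is the standard magic pair for dividing a 32-bit unsigned value by 3; DivRem and divRemBy3 are hypothetical names and this is not the DAG combiner's code.

```cpp
#include <cassert>
#include <cstdint>

struct DivRem {
  uint32_t Quot;
  uint32_t Rem;
};

// One expansion serves both results: the quotient comes from a widening
// multiply and shift, and the remainder reuses that quotient.
DivRem divRemBy3(uint32_t X) {
  uint32_t Q = static_cast<uint32_t>((uint64_t{X} * 0xAAAAAAABull) >> 33);
  return {Q, X - Q * 3};
}
```

If the two results were expanded independently, the multiply-and-shift sequence could be emitted twice in slightly different shapes and fail to merge.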
Craig Topper 3109f3a4ab [LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.

By creating the extract with the correct scalar type when we legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.

This fixes the regression on X86 in D56156.

Differential Revision: https://reviews.llvm.org/D56176

llvm-svn: 350236
2019-01-02 17:58:30 +00:00
Craig Topper c562fae02b [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits then it doesn't matter which one we extend from, so we can sext from x's msb instead.

The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.

Differential Revision: https://reviews.llvm.org/D56156

llvm-svn: 350235
2019-01-02 17:58:27 +00:00
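Illustration for c562fae02b: a sketch showing that sign-extending from any of a value's sign bits gives the same result as extending from its most significant bit. signExtendInReg is a hypothetical helper modeling sext_in_reg on a 32-bit register; the final conversion to int32_t assumes the usual two's-complement behavior.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low FromBits bits of X to 32 bits, ignoring the rest.
int32_t signExtendInReg(uint32_t X, unsigned FromBits) {
  assert(FromBits >= 1 && FromBits <= 32);
  uint32_t SignBit = uint32_t{1} << (FromBits - 1);
  uint32_t Low = FromBits == 32 ? X : (X & ((uint32_t{1} << FromBits) - 1));
  // Flip the sign bit, then subtract it back out to propagate it upward.
  return static_cast<int32_t>((Low ^ SignBit) - SignBit);
}
```

Take the 8-bit value 0xF5 (-11), whose top four bits are all copies of the sign, placed under arbitrary upper bits as an any-extend would leave it. Extending from bit 4, 5, 6 or 7 yields -11 each time, while extending from bit 3 (not a sign bit) yields 5.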
Wei Mi ecc89b76cb [PowerPC] Remove SeenUse check when optimizing conditional branch in
PPCPreEmitPeephole pass.

PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.

When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.

The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.

Differential Revision: https://reviews.llvm.org/D56041

llvm-svn: 350223
2019-01-02 17:07:23 +00:00
Simon Pilgrim d8125726d5 [X86] Support SHLD/SHRD masked shift-counts (PR34641)
Peek through shift modulo masks while matching double shift patterns.

I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.

Differential Revision: https://reviews.llvm.org/D56199

llvm-svn: 350222
2019-01-02 17:05:37 +00:00
Hal Finkel 4f2381440d [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).

Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
though C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
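
The overflow described above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not the BasicAA code: the helper names and the sample values are assumptions chosen for illustration. It computes C2*Scale once with 32-bit wraparound and once with 64-bit intermediates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: C2 * Scale computed with 32-bit wraparound.
// The multiply is done on unsigned values so the wraparound is well defined.
inline uint32_t scaledOffset32(uint32_t C2, uint32_t Scale) {
  return C2 * Scale;
}

// Hypothetical helper: the same product computed with 64-bit intermediates.
inline uint64_t scaledOffset64(uint32_t C2, uint32_t Scale) {
  return static_cast<uint64_t>(C2) * static_cast<uint64_t>(Scale);
}

// Illustrative check: the two widths agree only while the product fits
// in 32 bits.
inline bool productFitsIn32Bits(uint32_t C2, uint32_t Scale) {
  bool Fits = scaledOffset64(C2, Scale) <= UINT32_MAX;
  assert(!Fits || scaledOffset32(C2, Scale) == scaledOffset64(C2, Scale));
  return Fits;
}
```

With C2 = 0x40000000 and Scale = 4, the 64-bit product is 2^32 while the 32-bit product wraps to 0, so any later reasoning based on the narrow value would start from a wrong base offset.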

After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).

As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.

Patch by me with contributions from Michael Ferguson (mferguson@cray.com).

Differential Revision: https://reviews.llvm.org/D38662

llvm-svn: 350220
2019-01-02 16:28:09 +00:00
Philip Pfaffe 6bc98ad7e8 Extend Module::getOrInsertGlobal to control the construction of the
GlobalVariable

Summary:
Extend Module::getOrInsertGlobal to accept a callback for creating a new
GlobalVariable if necessary instead of calling the GV constructor
directly using default arguments. Additionally overload
getOrInsertGlobal for the previous default behavior.
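
The shape of such an API can be sketched without LLVM. Everything in the block below (`Global`, `SymbolTable`, `getOrInsert`, `CreateGlobal`) is a hypothetical stand-in chosen for illustration, not the actual Module interface: a get-or-insert that defers construction to a caller-supplied callback, plus an overload that keeps the default-constructing behavior.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for a global variable; not the LLVM class.
struct Global {
  std::string Name;
  bool IsConstant;
};

class SymbolTable {
  std::map<std::string, Global> Globals;

public:
  // Returns the existing entry, or inserts the one built by CreateGlobal.
  Global &getOrInsert(const std::string &Name,
                      const std::function<Global()> &CreateGlobal) {
    auto It = Globals.find(Name);
    if (It != Globals.end())
      return It->second; // The callback is not invoked for existing names.
    auto Result = Globals.emplace(Name, CreateGlobal());
    assert(Result.second && "name was just checked to be absent");
    return Result.first->second;
  }

  // Overload that keeps the previous behavior: construct with defaults.
  Global &getOrInsert(const std::string &Name) {
    return getOrInsert(Name, [&Name] { return Global{Name, false}; });
  }
};
```

The callback runs only when the name is missing, so callers can customize construction without paying for it on lookups that hit.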

Reviewers: chandlerc

Subscribers: hiraditya, llvm-commits, bollu

Differential Revision: https://reviews.llvm.org/D56130

llvm-svn: 350219
2019-01-02 15:41:47 +00:00
Andrea Di Biagio 0682afbaee [MCA] Minor refactoring of method DefaultResourceStrategy::select. NFCI
Common code used by the default resource strategy to select pipeline resources
has been moved to a helper function.

The new selection logic has been slightly rewritten to get rid of a redundant
zero check on the `ReadyMask` value. Before this patch, method select internally
called function `PowerOf2Floor` to compute the next ready pipeline resource.
However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input
value. By construction, `ReadyMask` can never be zero. This patch replaces the
call to `PowerOf2Floor` with an equivalent block of code which avoids the
redundant zero check. This gives a minor 3-3.5% speedup on a release build.
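
The difference between the two forms can be sketched as follows. This is an illustration under assumptions, not the MCA code: the function names are hypothetical, and `__builtin_clzll` is a GCC/Clang builtin whose result is undefined for a zero argument.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that tolerates a zero input, at the cost of a branch.
inline uint64_t highestBitChecked(uint64_t Mask) {
  if (Mask == 0)
    return 0;
  return uint64_t(1) << (63 - __builtin_clzll(Mask));
}

// Same computation when the caller guarantees a nonzero mask.
// __builtin_clzll is undefined for zero, which is why the guarantee matters.
inline uint64_t highestBitNonZero(uint64_t ReadyMask) {
  assert(ReadyMask != 0 && "ReadyMask is nonzero by construction");
  return uint64_t(1) << (63 - __builtin_clzll(ReadyMask));
}
```

Both functions return the largest power of two not exceeding the mask; the second simply relies on the invariant instead of testing for it in release builds.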

No functional change intended.

llvm-svn: 350218
2019-01-02 15:40:52 +00:00
Piotr Sobczak 378131bae0 [AMDGPU] Handle OR as operand of raw load/store
Summary:
Use isBaseWithConstantOffset() which handles OR as an operand
to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store.

Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a

Reviewers: nhaehnle, mareko, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55999

llvm-svn: 350208
2019-01-02 09:47:41 +00:00
Craig Topper f7cc7e3201 [X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL.
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.

llvm-svn: 350206
2019-01-02 06:40:11 +00:00
Craig Topper d4db122483 [X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO.
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.

Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll

llvm-svn: 350205
2019-01-02 05:46:03 +00:00
Sanjay Patel 654e6aabb9 [InstCombine] canonicalize raw IR rotate patterns to funnel shift
The final piece of IR-level analysis to allow this was committed with:
rL350188

Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.

The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal to or better than the original code even if the
target does not have an actual rotate instruction.
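
For readers unfamiliar with the relationship between the two forms, here is a small scalar model. It is a sketch with hypothetical function names: `fshl32` models a 32-bit funnel shift left (concatenate, shift by the amount modulo 32, keep the high half), and a rotate is the special case where both inputs are the same value.

```cpp
#include <cassert>
#include <cstdint>

// Raw rotate-left written with shifts and an OR, the kind of pattern that
// gets canonicalized. Masking both counts keeps every shift below 32.
inline uint32_t rotlRaw(uint32_t X, uint32_t Amt) {
  return (X << (Amt & 31)) | (X >> ((32 - Amt) & 31));
}

// Model of a 32-bit funnel shift left: concatenate Hi:Lo, shift left by
// Amt modulo 32, and keep the upper 32 bits.
inline uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>((Concat << (Amt % 32)) >> 32);
}

// A rotate is a funnel shift whose two inputs are the same value.
inline uint32_t rotlViaFunnelShift(uint32_t X, uint32_t Amt) {
  uint32_t Result = fshl32(X, X, Amt);
  assert(Result == rotlRaw(X, Amt));
  return Result;
}
```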

llvm-svn: 350199
2019-01-01 21:51:39 +00:00
Craig Topper 00b390a000 [X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code.
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.

This is inspired by the helper functions used by ARM and AArch64 for the same purpose.

The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.

llvm-svn: 350198
2019-01-01 19:34:11 +00:00
Robert Widmann db5b537f1e [LLVM-C] bool -> LLVMBool
llvm-svn: 350197
2019-01-01 19:03:37 +00:00
Robert Widmann 5d1dfa3eb6 [LLVM-C] Add Accessors for Discarding Value Names in the IR
Summary: Add accessors so the performance improvement from this setting is accessible to third parties.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56179

llvm-svn: 350196
2019-01-01 18:56:51 +00:00
Sanjay Patel 738a863648 [x86] move/rename helper for horizontal op codegen; NFC
Preliminary commit as suggested in D56011.

llvm-svn: 350193
2019-01-01 16:08:36 +00:00
Nikita Popov bc9986e9ad Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead. This case is checked separately instead.

The previous attempt to land this led to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demonstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
Ayonam Ray e00606a1b2 Reversing the commit in revision 350186. Revision causes regression in 4
tests.

llvm-svn: 350187
2019-01-01 07:28:55 +00:00
Ayonam Ray c471bb2e67 Omit range checks from jump tables when lowering switches with unreachable
default

During the lowering of a switch that would result in the generation of a jump
table, a range check is performed before indexing into the jump table, for the
switch value being outside the jump table range and a conditional branch is
inserted to jump to the default block. In case the default block is
unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
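
At the source level, the two lowerings correspond roughly to the following sketch. The table contents and function names are hypothetical; the point is only that the range check exists to reach the default block, so it has no purpose when the default cannot be reached.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical jump table with three cases.
constexpr int CaseResults[] = {10, 20, 30};
constexpr std::size_t NumCases = sizeof(CaseResults) / sizeof(CaseResults[0]);

// Usual lowering: a range check guards the table index and branches to the
// default block when the value is out of range.
inline int dispatchWithRangeCheck(std::size_t Value, int DefaultResult) {
  if (Value >= NumCases)
    return DefaultResult;
  return CaseResults[Value];
}

// Unreachable default: the value is in range by assumption, so the table
// is indexed directly. The assert only documents the precondition.
inline int dispatchUnreachableDefault(std::size_t Value) {
  assert(Value < NumCases && "default is unreachable by assumption");
  return CaseResults[Value];
}
```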

Review Reference: D52002

llvm-svn: 350186
2019-01-01 06:37:50 +00:00
Chen Zheng 4952e668f8 [InstCombine] canonicalize MUL with NEG operand
-X * Y --> -(X * Y)
X * -Y --> -(X * Y)
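
Both rewrites hold in wrapping two's-complement arithmetic, including when the product overflows. A quick way to convince oneself is the sketch below, which uses unsigned 32-bit values so that wraparound is well defined; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// -X * Y, using unsigned arithmetic so wraparound is well defined.
inline uint32_t negateThenMultiply(uint32_t X, uint32_t Y) {
  return (0u - X) * Y;
}

// -(X * Y), the canonical form.
inline uint32_t multiplyThenNegate(uint32_t X, uint32_t Y) {
  return 0u - (X * Y);
}

// Illustrative check that both source forms agree with the canonical one.
inline uint32_t canonicalNegMul(uint32_t X, uint32_t Y) {
  uint32_t Result = multiplyThenNegate(X, Y);
  assert(Result == negateThenMultiply(X, Y)); // -X * Y case
  assert(Result == negateThenMultiply(Y, X)); // X * -Y case
  return Result;
}
```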

Differential Revision: https://reviews.llvm.org/D55961

llvm-svn: 350185
2019-01-01 01:09:20 +00:00
Craig Topper ed3ffae4a4 [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits.
Differential Revision: https://reviews.llvm.org/D56168

llvm-svn: 350179
2018-12-31 19:09:30 +00:00
Craig Topper bb0873cf46 [X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode.
Differential Revision: https://reviews.llvm.org/D56169

llvm-svn: 350178
2018-12-31 19:09:27 +00:00
Simon Pilgrim f2b9d10477 Keep tablegen commands in alphabetical order. NFCI.
Mentioned on D56167.

llvm-svn: 350176
2018-12-31 14:51:53 +00:00
Martin Storsjo 74d93f9b24 [AArch64] Accept "sve" as arch feature in assembler
Differential Revision: https://reviews.llvm.org/D56128

llvm-svn: 350174
2018-12-31 10:22:04 +00:00
Alexander Potapenko cea4f83371 [MSan] Handle llvm.is.constant intrinsic
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.

llvm-svn: 350173
2018-12-31 09:42:23 +00:00
Craig Topper 802c4979ae [DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform.
Found while trying out some other changes so I don't really have a test case.

llvm-svn: 350172
2018-12-31 05:40:46 +00:00
Martin Storsjo 2018777836 [AArch64] Implement the .arch_extension directive
Differential Revision: https://reviews.llvm.org/D56131

llvm-svn: 350169
2018-12-30 21:06:32 +00:00
Kang Zhang 9d78c60bf4 [PowerPC] Fix machine verify pass error for PATCHPOINT pseudo instruction that bad machine code
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will 
be reserved later, but the verifier doesn't know that).

So this patch calls setUsesTOCBasePtr before emitting the code for the pseudo, so the verifier knows
X2 is a reserved register.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D56148

llvm-svn: 350165
2018-12-30 15:13:51 +00:00
David Bolvansky 90004149cc [NFC] Fixed extra semicolon warning

llvm-svn: 350162
2018-12-30 13:18:17 +00:00
Kang Zhang 4aa6453767 [PowerPC] Fix ADDE, SUBE do not know how to promote operator
Summary:
This patch is created to fix the Bugzilla bug 39815:
https://bugs.llvm.org/show_bug.cgi?id=39815 

This patch adds support for promoting the integer result of the ADDE and SUBE instructions.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D56119

llvm-svn: 350161
2018-12-30 07:48:09 +00:00
Craig Topper a32e353afa [X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1.
This seems to be getting in the way more than it's helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.

Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.

llvm-svn: 350159
2018-12-30 03:05:07 +00:00
Craig Topper f237ce159e [X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from v16i16/v32i8 to v4i64 when v4i64 needs splitting.
This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.

We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.
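
The compare-and-unpack step can be pictured on a single lane. The sketch below is a scalar model under the assumption that the compare produces an all-ones mask for negative inputs and that the unpack places this mask above the original 32 bits; the function names are hypothetical and no actual vector instructions are used.

```cpp
#include <cassert>
#include <cstdint>

// Scalar stand-in for one pcmpgt lane: all ones if 0 > X, else zero.
inline uint32_t signMask(int32_t X) {
  return (0 > X) ? 0xFFFFFFFFu : 0u;
}

// Scalar stand-in for the unpack step: the original 32 bits form the low
// half and the comparison mask forms the high half of the 64-bit result.
inline uint64_t signExtendViaCompare(int32_t X) {
  uint64_t Low = static_cast<uint32_t>(X);
  uint64_t High = signMask(X);
  uint64_t Result = (High << 32) | Low;
  assert(Result == static_cast<uint64_t>(static_cast<int64_t>(X)));
  return Result;
}
```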

llvm-svn: 350158
2018-12-30 02:30:34 +00:00
Nemanja Ivanovic 0dad994a10 [PowerPC][NFC] Macro for register set defs for the Asm Parser
We have some unfortunate code in the back end that defines a bunch of register
sets for the Asm Parser. Every time another class is needed in the parser, we
have to add another one of those definitions with explicit lists of registers.
This NFC patch simply provides macros to use to condense that code a little bit.

Differential revision: https://reviews.llvm.org/D54433

llvm-svn: 350156
2018-12-29 16:13:11 +00:00
Nemanja Ivanovic 0f7715afe1 [PowerPC] Complete the custom legalization of vector int to fp conversion
A recent patch has added custom legalization of vector conversions of
v2i16 -> v2f64. This just rounds it out for other types where the input vector
has an illegal (narrower) type than the result vector. Specifically, this will
handle the following conversions:

v2i8 -> v2f64
v4i8 -> v4f32
v4i16 -> v4f32

Differential revision: https://reviews.llvm.org/D54663

llvm-svn: 350155
2018-12-29 13:40:48 +00:00
Nemanja Ivanovic 3c7ac649ec [PowerPC] Fix CR Bit spill pseudo expansion
The current CRBIT spill pseudo-op expansion creates a KILL instruction
that kills the CRBIT and defines the enclosing CR field. However, this
paints a false picture to the register allocator that all bits in the CR
field are killed so copies of other bits out of the field become dead and
removable.
This changes the expansion to preserve the KILL flag on the CRBIT as an
implicit use and to treat the CR field as an undef input.

Thanks to Hal Finkel for the review and Uli Weigand for implementation input.

Differential revision: https://reviews.llvm.org/D55996

llvm-svn: 350153
2018-12-29 11:43:54 +00:00
Simon Atanasyan a6424e7c4e [mips] Show an error on attempt to use 64-bit PC-relative relocation
The following code requests 64-bit PC-relative relocations unsupported
by MIPS ABI. Now it triggers an assertion. It's better to show an error
message.
```
foo:
  .quad bar - foo
```

llvm-svn: 350152
2018-12-29 10:10:02 +00:00
Simon Atanasyan b243d8d42a [mips] Show a regular error message on attempt to use one byte relocation
llvm-svn: 350151
2018-12-29 10:09:55 +00:00
Max Kazantsev 201534d753 Drop SE cache early because loop parent can change in LoopSimplifyCFG
llvm-svn: 350145
2018-12-29 04:26:22 +00:00
Heejin Ahn 4d98dfb67d [WebAssembly] Fix comments in ExplicitLocals (NFC)
llvm-svn: 350144
2018-12-29 02:42:04 +00:00
Richard Trieu a87b70d1db Add vtable anchor to classes.
llvm-svn: 350142
2018-12-29 02:02:13 +00:00
Craig Topper 0a6cec6f9f [X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization.
This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.

llvm-svn: 350141
2018-12-29 01:17:11 +00:00
Craig Topper f814d28eb3 [X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply.
Previously we emitted a multiply and some masking that was supposed to be matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.

Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll

Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.

I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization

llvm-svn: 350134
2018-12-28 19:19:39 +00:00
Anna Thomas 98743fa77a [UnrollRuntime] NFC: Add comment and verify LCSSA
Added -verify-loop-lcssa to test cases.
Updated comments in ConnectProlog.

llvm-svn: 350131
2018-12-28 18:52:16 +00:00
Diogo N. Sampaio 9123f82cc4 [AArch64] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.

This patch also moves to FeatureSB the old FeatureSpecRestrict.

Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman	

Differential Revision: https://reviews.llvm.org/D55921

llvm-svn: 350126
2018-12-28 17:14:58 +00:00
Hiroshi Inoue 1ea98f040e [PowerPC] handle ISD::TRUNCATE in BitPermutationSelector
This is the last one in a series of patches to support better code generation for bitfield insert.
BitPermutationSelector already supports ISD::ZERO_EXTEND but not TRUNCATE.
This patch adds support for ISD::TRUNCATE in BitPermutationSelector.

For example, for this test case:
struct s64b {
  int a:4;
  int b:16;
  int c:24;
};
void bitfieldinsert64b(struct s64b *p, unsigned char v) {
  p->b = v;
}

the selection DAG looks like:

t14: i32,ch = load<(load 4 from %ir.0)> t0, t2, undef:i64
       t18: i32 = and t14, Constant:i32<-1048561>
            t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t22: i64 = AssertZext t4, ValueType:ch:i8
        t23: i32 = truncate t22
      t16: i32 = shl nuw nsw t23, Constant:i32<4>
    t19: i32 = or t18, t16
  t20: ch = store<(store 4 into %ir.0)> t14:1, t19, t2, undef:i64

By handling truncate in the BitPermutationSelector, we can use information from AssertZext when selecting t19 and skip the mask operation corresponding to t18.
So the generated sequences with and without this patch are

without this patch
	rlwinm 5, 5, 0, 28, 11 # corresponding to t18
	rlwimi 5, 4, 4, 20, 27
with this patch
	rlwimi 5, 4, 4, 12, 27
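
The constants in the DAG can be checked with ordinary bit arithmetic. The sketch below assumes the field `b` occupies 16 bits starting at bit 4 (consistent with the shift by 4 in t16); the names `FieldShift`, `FieldMask`, and `insertFieldB` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: field b occupies 16 bits starting at bit 4, matching the
// shl by 4 in the DAG above.
constexpr uint32_t FieldShift = 4;
constexpr uint32_t FieldMask = 0xFFFFu << FieldShift; // 0x000FFFF0

// Mirrors nodes t18 (and), t16 (shl), and t19 (or) from the DAG.
inline uint32_t insertFieldB(uint32_t Word, uint8_t V) {
  uint32_t Cleared = Word & ~FieldMask;                      // t18
  uint32_t Shifted = static_cast<uint32_t>(V) << FieldShift; // t16
  // V is zero-extended from 8 bits, so Shifted never leaves the field.
  assert((Shifted & ~FieldMask) == 0);
  return Cleared | Shifted;                                  // t19
}
```

The constant -1048561 in t18 is `~FieldMask` read as a signed 32-bit value. Because the upper bits of the shifted value are known to be zero, inserting the whole 16-bit range in one step both clears and sets the field, which is consistent with the single rlwimi in the output above.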

Differential Revision: https://reviews.llvm.org/D49076

llvm-svn: 350118
2018-12-28 08:00:39 +00:00
Max Kazantsev 530ff8f3cc Temporarily disable term folding in LoopSimplifyCFG, add tests
llvm-svn: 350117
2018-12-28 06:22:39 +00:00
Max Kazantsev 80e4b40f3e [LoopSimplifyCFG] Delete dead blocks in RPO
Deletion of dead blocks in arbitrary order may lead to failure
of assertion in `DeleteDeadBlock` that requires that we have
deleted all predecessors before we can delete the current block.
We should instead delete them in RPO order.
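
As a reminder of what the ordering provides, here is a minimal sketch of computing a reverse post-order over a small successor graph. The `Graph` alias and function names are hypothetical and this is not the LoopSimplifyCFG code; on an acyclic graph, every block appears after all of its predecessors in the resulting order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG representation: Succs[B] lists the successors of block B.
using Graph = std::vector<std::vector<int>>;

// Depth-first walk that records each block after all of its successors.
static void postOrderVisit(const Graph &Succs, int Block,
                           std::vector<bool> &Seen, std::vector<int> &Out) {
  Seen[Block] = true;
  for (int S : Succs[Block])
    if (!Seen[S])
      postOrderVisit(Succs, S, Seen, Out);
  Out.push_back(Block);
}

// Reverse post-order from Entry: reverse the post-order walk.
inline std::vector<int> reversePostOrder(const Graph &Succs, int Entry) {
  assert(Entry >= 0 && static_cast<std::size_t>(Entry) < Succs.size());
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Order;
  postOrderVisit(Succs, Entry, Seen, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For a diamond with edges 0->1, 0->2, 1->3, and 2->3, the join block 3 comes last, after both of its predecessors.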

llvm-svn: 350116
2018-12-28 06:08:51 +00:00
QingShan Zhang f2d9df61c7 [PowerPC] Remove the implicit use of the register if it is replaced by Imm
If we are changing the MI operand from Reg to Imm, we also need to handle its implicit use, if any.

Differential Revision: https://reviews.llvm.org/D56078

llvm-svn: 350115
2018-12-28 03:38:09 +00:00
Zi Xuan Wu 5187444345 [NFC] clang-format functions related to r350113
llvm-svn: 350114
2018-12-28 02:45:17 +00:00
Zi Xuan Wu a02a3feecf [PowerPC] Fix assert from machine verify pass that atomic pseudo expanding causes mismatched register class
Atomic value operands smaller than 4 bytes need to be masked,
and the related operations to calculate the new value can be done in 32-bit gprc.
So just use gprc for the mask and value calculation.

Differential Revision: https://reviews.llvm.org/D56077

llvm-svn: 350113
2018-12-28 02:12:55 +00:00
Chen Zheng 5ede950df9 [PowerPC] fix register class after converting X-FORM instruction to D-FORM instruction
Differential Revision: https://reviews.llvm.org/D55806

llvm-svn: 350111
2018-12-28 01:02:35 +00:00
Chandler Carruth 05b5bd8b85 [CallSite removal] Add and flesh out APIs on the new `CallBase` base class that previously were only available on the `CallSite` wrapper.
Summary:
This will make migrating code easier and generally seems like a good collection
of API improvements.

Some of these APIs seem like more consistent / better naming of existing
ones. I've retained the old names for migration simplicity and am just
adding the new ones in this commit. I'll try to garbage collect these
once CallSite is gone.

Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55638

llvm-svn: 350109
2018-12-27 23:40:17 +00:00
Craig Topper 787ad92bf6 [X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it.
Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2.

This seems to give some improvements in our ability to use SimplifyDemandedBits.

llvm-svn: 350084
2018-12-27 03:37:04 +00:00
Craig Topper a8f07e51f9 [X86] Factor the core code out of LowerSETCC into a helper that can create CMP/BT/PTEST/KORTEST etc. without making an X86ISD::SETCC node. NFCI
Make each of the helper functions only return their comparison node and the condition code. Leave X86ISD::SETCC creation to the LowerSETCC function itself.

Looking into whether we can use this code directly in BRCOND and SELECT lowering instead of going through LowerSETCC which creates an X86ISD::SETCC node we need to look through.

llvm-svn: 350082
2018-12-27 01:50:40 +00:00
Craig Topper 4f1ef9fc0f [X86] Merge getBitTestCondition into LowerAndToBT. Don't create X86ISD::SETCC node in the merged function. NFCI
Only one of the 3 callers of LowerAndToBT needs the SETCC node. Two of them have to look through it to find the operands they really need. Instead create it after the one call that needs it.

LowerAndToBT now returns both the BT node and the X86 specific condition code separately.

llvm-svn: 350081
2018-12-27 01:50:38 +00:00
Wouter van Oortmerssen f227621036 [WebAssembly] Added basic support for if/else/end_if in MC layer.
Summary:
These instructions are currently unused in our backend, but for
completeness it is good to support them, so they can be used with
the assembler in hand-written code.

Tests are very basic, signature support missing much like other blocks.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55973

llvm-svn: 350079
2018-12-26 22:55:26 +00:00
Wouter van Oortmerssen 29c6ce5879 [WebAssembly] Make assembler check for proper nesting of control flow.
Summary:
It does so using a simple nesting stack, and gives clear errors upon
violation. This is unique to wasm, since most CPUs do not have
any nested constructs.
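
A nesting stack of this kind can be sketched in a few lines. The block below is an illustration only: the mnemonics handled, the error strings, and the names `openerFor` and `checkNesting` are assumptions, not the assembler's actual implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: maps "end_block" to "block" and "end_loop" to "loop".
inline std::string openerFor(const std::string &EndOp) {
  assert(EndOp.compare(0, 4, "end_") == 0 && "expected an end_* mnemonic");
  return EndOp.substr(4);
}

// Returns an empty string if the sequence nests properly, else an error.
// The mnemonics and messages here are illustrative only.
inline std::string checkNesting(const std::vector<std::string> &Ops) {
  std::vector<std::string> Stack;
  for (const std::string &Op : Ops) {
    if (Op == "block" || Op == "loop") {
      Stack.push_back(Op);
    } else if (Op == "end_block" || Op == "end_loop") {
      if (Stack.empty())
        return Op + " without a matching opener";
      if (Stack.back() != openerFor(Op))
        return Op + " does not close " + Stack.back();
      Stack.pop_back();
    }
  }
  // The end-of-file check: anything still open is an error.
  if (!Stack.empty())
    return "unterminated " + Stack.back() + " at end of file";
  return "";
}
```

The final check is the reason an end-of-file hook is needed: an unterminated construct can only be detected once the input is exhausted.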

Had to add an end of file check to the general assembler for this.

Note: if/else/end instructions are not currently supported in our
tablegen defs, so these tests will be enabled in a follow-up.
They already pass the nesting check.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55797

llvm-svn: 350078
2018-12-26 22:46:18 +00:00
Heejin Ahn ce1d50f9d7 [WebAssembly] Delete an unnecessary line in RegStackify
`OneUseInst` is set outside of the loop before and `OneUse` does not
change throughout the loop, so this line is not necessary.

llvm-svn: 350076
2018-12-26 22:33:35 +00:00
Heejin Ahn 99d3946398 [WebAssembly] Fix typos in comments in RegStackify (NFC)
llvm-svn: 350075
2018-12-26 22:27:46 +00:00
Craig Topper c9a6000755 [LoopIdiomRecognize] Add CTTZ support
Summary:
The existing LIR recognizes CTLZ when the input variable is shifted right until it is zero (the Shift-Until-Zero idiom).

This commit:
1. Augments the Shift-Until-Zero idiom to recognize CTTZ when the input variable is shifted left.
2. Prepares for BitScan idiom recognition.
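
The left-shifting variant of the idiom and its closed form can be compared directly. This is a hedged sketch with hypothetical function names; `__builtin_ctz` is a GCC/Clang builtin that is undefined for zero, so that input is handled separately.

```cpp
#include <cassert>
#include <cstdint>

// The loop form of the idiom: shift left until the value becomes zero.
inline unsigned shiftLeftUntilZero(uint32_t X) {
  unsigned Count = 0;
  while (X != 0) {
    X <<= 1;
    ++Count;
  }
  return Count;
}

// Closed form using a count-trailing-zeros builtin (GCC/Clang specific).
// __builtin_ctz is undefined for zero, so that case is handled separately.
inline unsigned shiftCountViaCttz(uint32_t X) {
  unsigned Count = (X == 0) ? 0 : 32 - __builtin_ctz(X);
  assert(Count == shiftLeftUntilZero(X));
  return Count;
}
```

The loop runs until the lowest set bit has been shifted out of the top, which takes 32 minus the number of trailing zeros iterations for a nonzero 32-bit input.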

Patch by Yuanfang Chen (tabloid.adroit)

Reviewers: craig.topper, evstupac

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55876

llvm-svn: 350074
2018-12-26 21:59:48 +00:00
Reid Kleckner c168c6f86f [codeview] Check if the 'this' type of a method is a pointer
Fixes crash reported after r347354 for frontends that don't always emit
'this' pointers for methods. Now we will silently produce debug info
that makes functions like this look like static methods, which seems
reasonable.

llvm-svn: 350073
2018-12-26 21:52:17 +00:00
Justin Lebar 49fac56ea3 [NVPTX] Allow libcalls that are defined in the current module.
The patch adds a possibility to make library calls on NVPTX.

An important point about library functions: they must be defined within
the current module. This should guarantee that we produce valid PTX
assembly (without calls to undefined functions). Anyone who wants to use
the libcalls will probably have to link against compiler-rt or another
implementation.

Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.

Also, there was an issue with the DAG during legalisation. When we expand
an instruction into a libcall, the inner call chain isn't "integrated"
into the outer chain. Since the last "data-flow" (call retval load) node is
located earlier in the call chain than the CALLSEQ_END node, the latter
becomes a leaf and therefore a dead node (and is removed quite quickly).
The solution proposed here relies on additional data-flow pseudo nodes
(ProxyReg) whose only purpose is to keep CALLSEQ_END alive during the
legalisation and instruction selection phases; we remove the pseudo
instructions before the register scheduling phase.

Patch by Denys Zariaiev!

Differential Revision: https://reviews.llvm.org/D34708

llvm-svn: 350069
2018-12-26 19:12:31 +00:00
Max Kazantsev 28298e9647 [NFC] Use utility function for guards detection
llvm-svn: 350064
2018-12-26 08:22:25 +00:00
Petar Avramovic 09dff33349 [MIPS GlobalISel] Select G_SELECT
Add widen scalar for type index 1 (i1 condition) for G_SELECT.
Select G_SELECT for pointer, s32(integer) and smaller low level
types on MIPS32.

Differential Revision: https://reviews.llvm.org/D56001

llvm-svn: 350063
2018-12-25 14:42:30 +00:00
Max Kazantsev 9b25bf3960 [NFC] Reuse variables instead of re-calling getParent
llvm-svn: 350062
2018-12-25 07:20:06 +00:00
Kang Zhang d501a1e596 [PowerPC] Fix the bug of ISD::ADDE to set its second return type to glue
Summary:
This patch fixes the bug introduced by rL341634.
In that commit, the return type of ISD::ADDE is
14224: SDVTList VTs = DAG.getVTList(MVT::i64, MVT::i64),
but in fact, the second return type of ISD::ADDE should be
MVT::Glue, not MVT::i64.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D55977

llvm-svn: 350061
2018-12-25 03:29:51 +00:00
Craig Topper 0229da8f07 [X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ.
This is an alternative to what I attempted in D56057.

GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.
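
Why an extension feeding such a multiply can be bypassed is easy to see in a scalar model. The sketch below assumes the usual lane semantics (the multiply reads only the low 32 bits of each 64-bit lane); the function names are hypothetical and no vector instructions are used.

```cpp
#include <cassert>
#include <cstdint>

// Model of one PMULUDQ lane: multiply the low 32 bits of each 64-bit input.
inline uint64_t pmuludqLane(uint64_t A, uint64_t B) {
  return (A & 0xFFFFFFFFull) * (B & 0xFFFFFFFFull);
}

// Model of SIGN_EXTEND_INREG from 32 bits within a 64-bit lane.
inline uint64_t signExtendInReg32(uint64_t X) {
  uint64_t Low = X & 0xFFFFFFFFull;
  return (Low & 0x80000000ull) ? (Low | 0xFFFFFFFF00000000ull) : Low;
}

// Only the low 32 bits of each operand are demanded, so the extension can
// be bypassed without changing the product.
inline uint64_t multiplyBypassingExtend(uint64_t A, uint64_t B) {
  uint64_t Result = pmuludqLane(A, B);
  assert(Result == pmuludqLane(signExtendInReg32(A), B));
  return Result;
}
```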

I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.

Based on a patch that Simon Pilgrim sent me in email.

Fixes PR40142.

llvm-svn: 350059
2018-12-24 19:40:20 +00:00
Eugene Leviant 4dc3a3f746 [HWASAN] Instrument memory intrinsics by default
Differential revision: https://reviews.llvm.org/D55926

llvm-svn: 350055
2018-12-24 16:02:48 +00:00
Max Kazantsev 0b455c2b71 Revert rL350048 and rL350050
These patches have broken almost all buildbots on test
DebugInfo/X86/addr_comments.ll. Reverting to green.

llvm-svn: 350052
2018-12-24 10:30:04 +00:00
David Blaikie 5353e64935 Fix build - follow-up to r350048 which broke headerless (v4) address pool
llvm-svn: 350050
2018-12-24 07:56:40 +00:00
Max Kazantsev edabb9ae56 [LoopSimplifyCFG] Delete dead exiting edges
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.

Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev

llvm-svn: 350049
2018-12-24 07:41:33 +00:00
David Blaikie e20bf9ab91 DebugInfo: Use assembly label arithmetic for address pool size for easier reading/editing
llvm-svn: 350048
2018-12-24 07:35:10 +00:00
David Blaikie d671eb7e7c DebugInfo: Add assembly comments for debug_addr contribution header fields
llvm-svn: 350047
2018-12-24 07:09:50 +00:00
David Blaikie b917c3a41a llvm-dwarfdump: Skip address index info (and dump only the address, if found) when non-verbose dumping addrx forms
There are still a few bugs here, demonstrated with FIXITs in the test.

llvm-svn: 350046
2018-12-24 06:52:31 +00:00
Max Kazantsev 347c583772 Return "[LoopSimplifyCFG] Delete dead in-loop blocks"
The underlying bug that caused the revert should be fixed by rL348567.

Differential Revision: https://reviews.llvm.org/D54023

llvm-svn: 350045
2018-12-24 06:06:17 +00:00
George Burgess IV 7e12875c89 [LoopIdioms] More LocationSize::precise annotations; NFC
Both of these places reference memset-like loops. Memset is precise.

Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350044
2018-12-24 05:55:50 +00:00
Craig Topper 0adc3fe9e7 [X86] Remove unused variables left after r350041. NFC
llvm-svn: 350043
2018-12-24 05:45:45 +00:00
George Burgess IV 610c76534f [SelectionDAGBuilder] Use ::precise LocationSizes; NFC
More migration so we can disable the implicit int -> LocationSize
conversion.

All of these are either scatter/gather'ed vector instructions, or direct
loads. Hence, they're all precise.

Perhaps if we see way more getTypeStoreSize calls, we can make a
getTypeStoreLocationSize (or similar) as a wrapper that applies this
::precise. Doesn't appear that it's a good idea to make getTypeStoreSize
return a LocationSize itself, however.

llvm-svn: 350042
2018-12-24 05:34:21 +00:00
Craig Topper d8217b23ff [X86] Move the optimization that turns 'CMP (AND+IMM64), 0' into SRL/SHL+TEST to X86ISelDAGToDAG.
This cleans more code out of EmitTest.

llvm-svn: 350041
2018-12-24 05:27:13 +00:00
Craig Topper e8c50fc6af [X86] Remove the ANDN check from EmitTest.
Remove the TESTmr isel patterns and add another postprocessing combine for TESTrr+ANDrm->TESTmr. We already have a postprocessing combine for TESTrr+ANDrr->TESTrr. With this we can give ANDN a chance to match first. And clean it up during post processing if we ended up with just a regular AND.

This is another step towards my plan to gut EmitTest and do more flag handling during isel matching or by using optimizeCompare.

llvm-svn: 350038
2018-12-24 01:10:13 +00:00
Sanjay Patel 93f1074677 [DAGCombiner] limit shuffle to extend transform (PR40146)
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.

This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146

llvm-svn: 350034
2018-12-23 20:48:31 +00:00
Sanjay Patel 9933574ac3 [DAGCombiner] allow hoisting vector bitwise logic ahead of extends
llvm-svn: 350032
2018-12-23 19:58:16 +00:00
Simon Atanasyan 4553cdafe1 [ORC] Rename register in the OrcMips64 resolver code comments. NFC
The `fp` and `s8` register names are synonyms. But `fp` better reflects
a purpose of the register.

llvm-svn: 350023
2018-12-23 12:05:04 +00:00
Simon Atanasyan 15b68b87d5 [ORC] clang-format OrcMips32 and OrcMips64 code. NFC
llvm-svn: 350022
2018-12-23 12:05:00 +00:00
Simon Atanasyan da29981707 [ORC] Remove redundant instruction from MIPS resolver code. NFC
It's redundant to restore the `$a3` register twice.

llvm-svn: 350021
2018-12-23 12:04:55 +00:00
George Burgess IV 5e4a03a089 [MemCpyOpt] Use LocationSize instead of ints; NFC
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

srcSize is derived from the size of an alloca, and we quit out if the
size of that is > the size of the thing we're copying to. Hence, we
should always copy everything over, so these sizes are precise.

Don't make srcSize itself a LocationSize, since optionality isn't
helpful, and we do some comparisons against other sizes elsewhere in
that function.

llvm-svn: 350019
2018-12-23 06:40:39 +00:00
Craig Topper 006bac6880 [X86] Return false from hasAndNotCompare if the comparison value is a constant.
We won't end up using an ANDN instruction in this case so we should generate the same code we do for pre-BMI targets.

llvm-svn: 350018
2018-12-23 05:52:55 +00:00
George Burgess IV 69952979da [MemoryLocation] Use LocationSize instead of ints; NFC
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.

This one sadly isn't *super* small, but all of the changes here are
either to:
- libfuncs that are passed a constant size (memcpy, memset, ...)
- instructions that store/load a constant size

So they have to be precise

llvm-svn: 350017
2018-12-23 03:36:44 +00:00
George Burgess IV 8c5413f3f7 [Loads] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

This tries to find literal loads/stores of the given type, so this has
to be precise.

llvm-svn: 350016
2018-12-23 03:10:56 +00:00
George Burgess IV 1329cf1791 [Lint] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350015
2018-12-23 02:50:08 +00:00
George Burgess IV 685e781d55 [AAEval] Use LocationSize instead of ints; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350014
2018-12-23 02:39:58 +00:00
Craig Topper 3cc92a28ce [X86] Fix an old FIXME about folding the zero constant into the OR instruction we use for sequentially consistent fence in 32-bit mode without SSE2.
llvm-svn: 350013
2018-12-23 01:54:43 +00:00
David Blaikie 2a38c17b34 DebugInfo: Accurately propagate the section used by a relocation when accessing ranges defined by low/high_pc
This is difficult/not possible to test in LLVM, but is visible as a
crash in LLD when parsing DWARF to generate gdb-index.

This function is called by llvm-dwarfdump when parsing high_pc for
non-verbose output (to print the actual high_pc rather than the low_pc
relative value), but in that case llvm-dwarfdump doesn't print section
names (if it did, it would hit this problem).

We could add some other features to llvm-dwarfdump to expose this, but
nothing really springs to my mind. I will add a test to lld, though.

llvm-svn: 350010
2018-12-22 22:20:40 +00:00
David Blaikie 25179613f6 llvm-dwarfdump: Dump the section name/number for addr attributes
llvm-svn: 350009
2018-12-22 20:34:58 +00:00
George Burgess IV 640be69249 [Analysis] More LocationSize cleanup; NFC
Keeping these patches super small so they're easily post-commit
verifiable, as requested in D44748.

llvm-svn: 350008
2018-12-22 18:23:21 +00:00
Sanjay Patel 4b537aaf6d [DAGCombiner] allow narrowing of add followed by truncate
trunc (add X, C ) --> add (trunc X), C'

If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).

This change used to show regressions for x86, but those are gone after D55494. 
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) 
that does almost the same thing.

Differential Revision: https://reviews.llvm.org/D55866

llvm-svn: 350006
2018-12-22 17:10:31 +00:00
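The narrowing described above relies on truncation commuting with wrapping addition. A minimal Python sketch of that identity follows; the helper names and the 64-bit/32-bit widths are illustrative assumptions, not code from DAGCombiner.

```python
def trunc(value: int, bits: int) -> int:
    """Keep only the low `bits` bits, like an integer truncate."""
    return value & ((1 << bits) - 1)


def add(a: int, b: int, bits: int) -> int:
    """Wrapping (modular) add at the given bit width."""
    return trunc(a + b, bits)


def trunc_of_wide_add(x: int, c: int, wide: int = 64, narrow: int = 32) -> int:
    """trunc (add X, C): add in the wide type, then drop the top bits."""
    return trunc(add(x, c, wide), narrow)


def narrow_add(x: int, c: int, narrow: int = 32) -> int:
    """add (trunc X), C': truncate both operands first, add in the narrow type."""
    return add(trunc(x, narrow), trunc(c, narrow), narrow)
```

Both forms agree because the low bits of a sum depend only on the low bits of its operands.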
Sanjay Patel 52c02d70e2 [x86] add load fold patterns for movddup with vzext_load
The missed load folding noticed in D55898 is visible independent of that change 
either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build 
vector becomes a broadcast first; movddup is not produced until we get into isel 
via tablegen patterns).

Differential Revision: https://reviews.llvm.org/D55936

llvm-svn: 350005
2018-12-22 16:59:02 +00:00
David Blaikie 9efb0153f0 llvm-dwarfdump: Remove extraneous space between '(' and 'indexed'
When dumping string or address indexes

llvm-svn: 349997
2018-12-22 08:43:08 +00:00
David Blaikie c04d2bf22a llvm-dwarfdump: Print the section name/number for addr_index attributes
(addr attributes coming shortly)

llvm-svn: 349996
2018-12-22 08:33:55 +00:00
David Blaikie 87ae80fb2f DebugInfo: Refactor named section dumping into a reusable helper
Currently the section name (& possibly number) is only printed on
addresses in ranges - but no reason it couldn't also be displayed on
other addresses (like low/high PC).

Refactor in that direction by pulling out the section lookup and name
ambiguity dumping logic into a reusable helper.

llvm-svn: 349995
2018-12-22 08:23:10 +00:00
David Blaikie e4e0b9f48f DebugInfo: Remove extra attribute lookup
llvm-svn: 349985
2018-12-22 02:24:13 +00:00
Craig Topper 1f02ac3451 [X86] FixupLEAs, reduce number of calls to getOperand and use X86::AddrBaseReg/AddrIndexReg, etc. instead of hardcoded constants.
Makes the code a little more readable.

llvm-svn: 349983
2018-12-22 01:34:47 +00:00
Justin Lebar 7f41fe3a58 [NVPTX] Reduce stack size in NVPTXAsmPrinter::doInitialization().
NVPTXAsmPrinter::doInitialization() was creating an NVPTXSubtarget on
the stack.  This object is huge, about 80kb.  Also it's slow to create.
And it's all redundant; we have one in NVPTXTargetMachine anyway!

llvm-svn: 349982
2018-12-22 01:30:37 +00:00
David Blaikie 219c6bd388 libDebugInfo: Refactor error handling in range list parsing
Propagate the llvm::Error a little further up. This is NFC for
llvm-dwarfdump in this change, but allows ld.lld to emit more precise
error messages about which object and archive the erroneous DWARF is in.

llvm-svn: 349978
2018-12-22 00:31:02 +00:00
Reid Kleckner 98bbd07cc3 [MC] Enable .file support on COFF and diagnose it on unsupported targets
Summary:
The "single parameter" .file directive appears to be an ELF-only feature
that is intended to insert the main source filename into the string
table.

I noticed that if you assemble an ELF .s file for COFF, typically it
will assert right away on a .file directive near the top of the file. My
first change was to make this emit a proper error in the asm parser so
that we don't assert so easily.

However, COFF actually does have some support for this directive, and if
you emit an object file, llvm-mc does not assert. When emitting a COFF
object, MC will take those file names and create "debug" symbol table
entries for them. I'm not familiar with these kinds of symbol table
entries, and I'm not aware of any users of them, but @compnerd added
them a while ago. They don't introduce absolute paths, and most main
source file paths are short enough that this extra entry shouldn't cause
any problems, so I enabled the flag in MCAsmInfoCOFF that indicates that
it's supported.

This has the side effect of adding an extra debug symbol to every object
produced by clang, which is a pretty big functional change. My question
is, should we keep the functionality or remove it in the name of symbol
table minimalism?

Reviewers: mstorsjo, compnerd

Subscribers: hiraditya, compnerd, llvm-commits

Differential Revision: https://reviews.llvm.org/D55900

llvm-svn: 349976
2018-12-21 23:35:48 +00:00
Mircea Trofin 499a66ecc0 Silence warning in assert introduced in rL349973.
Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56030

llvm-svn: 349975
2018-12-21 23:02:10 +00:00
Mircea Trofin b53eeb6f4c [llvm] API for encoding/decoding DWARF discriminators.
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)

The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.

Reviewers: davidxl, danielcdh, wmi, dblaikie

Reviewed By: dblaikie

Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D55681

llvm-svn: 349973
2018-12-21 22:48:50 +00:00
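As a rough illustration of packing three unsigned components into 32 bits while reporting overflow to the caller, here is a Python sketch. The fixed field widths and the function names are hypothetical; the encoding used by LLVM itself is described in the linked llvm-dev post and differs from this layout.

```python
from typing import Optional, Tuple

# Hypothetical fixed field widths (they sum to 32); not LLVM's actual layout.
BASE_BITS, DUP_BITS, COPY_BITS = 12, 12, 8


def encode_discriminator(base: int, dup_factor: int, copy_index: int) -> Optional[int]:
    """Pack all three components together; return None if any one overflows."""
    fields = ((base, BASE_BITS), (dup_factor, DUP_BITS), (copy_index, COPY_BITS))
    if any(value < 0 or value >= (1 << bits) for value, bits in fields):
        return None  # overflow is communicated back instead of silently wrapping
    return base | (dup_factor << BASE_BITS) | (copy_index << (BASE_BITS + DUP_BITS))


def decode_discriminator(encoded: int) -> Tuple[int, int, int]:
    """Recover (base, duplication factor, copy index) from a packed value."""
    base = encoded & ((1 << BASE_BITS) - 1)
    dup_factor = (encoded >> BASE_BITS) & ((1 << DUP_BITS) - 1)
    copy_index = (encoded >> (BASE_BITS + DUP_BITS)) & ((1 << COPY_BITS) - 1)
    return base, dup_factor, copy_index
```

Because the encoder takes all three components at once, a new discriminator can be built from an existing one by decoding it, changing only the base, and encoding again.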
David Blaikie c3f30a7fc6 Reapply: DebugInfo: Assume an absence of ranges or high_pc on a CU means the CU is empty (devoid of code addresses)
Originally committed in r349333, reverted in r349353.

GCC emitted these unconditionally on/before 4.4/March 2012
Clang emitted these unconditionally on/before 3.5/March 2014

This improves performance when parsing CUs (especially those using split
DWARF) that contain no code ranges (such as the mini CUs that may be
created by ThinLTO importing - though generally they should be/are
avoided, especially for Split DWARF because it produces a lot of very
small CUs, which don't scale well in a bunch of other ways too
(including size)).

The revert was due to a (Google internal) test that had some checked in old
object files missing DW_AT_ranges. That's since been fixed.

llvm-svn: 349968
2018-12-21 22:25:01 +00:00
Vedant Kumar b264d69de7 [IR] Add Instruction::isLifetimeStartOrEnd, NFC
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.

This was suggested as a cleanup in D55967.

Differential Revision: https://reviews.llvm.org/D56019

llvm-svn: 349964
2018-12-21 21:49:40 +00:00
Craig Topper e58cd9cbc6 [X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops.
This fixes the patterns that have or/and as a root. 'and' is handled differently since they usually have a CMP wrapped around them.

I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know.

Differential Revision: https://reviews.llvm.org/D55813

llvm-svn: 349962
2018-12-21 21:42:43 +00:00
Sanjay Patel 47a6129e26 [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC
llvm-svn: 349958
2018-12-21 21:26:30 +00:00
Craig Topper 62ec024d3b [X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used.
The BEXTR instruction documents the SF bit as undefined.

The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed.

Fixes PR40060

Differential Revision: https://reviews.llvm.org/D55807

llvm-svn: 349956
2018-12-21 21:16:26 +00:00
Changpeng Fang 6f539294b5 AMDGPU: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing.
Summary:
  Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing.
This is because the M0 field is unsigned.

This patch achieves a similar goal to https://reviews.llvm.org/D55241, but keeps the optimization
if the base is known unsigned.

Reviewers:
  arsemn

Differential Revision:
  https://reviews.llvm.org/D55568

llvm-svn: 349951
2018-12-21 20:57:34 +00:00
Armando Montanez 4cc2113114 [TextAPI][elfabi] Fix YAML support for weak symbols
Weak symbols are supposed to be supported in the ELF TextAPI
implementation, but the YAML handler didn't read or write the `Weak`
member of ELFSymbol. This change adds the YAML mapping and updates tests
to ensure correct behavior.

Differential Revision: https://reviews.llvm.org/D56020

llvm-svn: 349950
2018-12-21 20:45:58 +00:00
Reid Kleckner 84a2d29681 [BasicAA] Fix AA bug on dynamic allocas and stackrestore
Summary:
BasicAA has special logic for unescaped allocas, which normally applies
equally well to dynamic and static allocas. However, llvm.stackrestore
has the power to end the lifetime of dynamic allocas, without referring
to them directly.

stackrestore is already marked with the most conservative memory
modification attributes, but because the alloca is not escaped, the
normal logic produces incorrect results. I think BasicAA needs a special
case here to teach it about the relationship between dynamic allocas and
stackrestore.

Fixes PR40118

Reviewers: gbiv, efriedma, george.burgess.iv

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55969

llvm-svn: 349945
2018-12-21 19:59:03 +00:00
Anna Thomas 18be3cb606 [RuntimeUnrolling] NFC: Add TODO and comments in connectProlog
Currently, runtime unrolling does not support loops where multiple
exiting blocks exit to the latchExit. Added TODO and other code
clarifications for ConnectProlog code.

llvm-svn: 349944
2018-12-21 19:45:05 +00:00
Sanjay Patel 80187b8a17 [x86] add movddup specialization for build vector lowering (PR37502)
This is admittedly a narrow fix for the problem:
https://bugs.llvm.org/show_bug.cgi?id=37502
...but as the XOP restriction shows, it's a maze to get this right. 
In the motivating example, note that we have movddup before SSE4.1 and 
again with AVX2. That's because insertps isn't available pre-SSE41 and 
vbroadcast is (more generally) available with AVX2 (and the splat is 
reduced to movddup via isel pattern).

Differential Revision: https://reviews.llvm.org/D55898

llvm-svn: 349937
2018-12-21 18:48:32 +00:00
Florian Hahn 8c9f865e3d [ARM] Set Defs = [CPSR] for COPY_STRUCT_BYVAL, as it clobbers CPSR.
Fixes PR35023.

Reviewers: MatzeB, t.p.northover, sunfish, qcolombet, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D55909

llvm-svn: 349935
2018-12-21 18:07:10 +00:00
Jessica Paquette 453ab1db5b [GlobalISel][AArch64] Add support for widening G_FCEIL
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.

This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.

llvm-svn: 349927
2018-12-21 17:05:26 +00:00
Evandro Menezes 96c11eceb2 [AArch64] Refactor Exynos predicate (NFC)
Change order of conditions in predicate.

llvm-svn: 349918
2018-12-21 15:51:34 +00:00
Simon Pilgrim aa53fc15b7 [XCore] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349915
2018-12-21 15:35:32 +00:00
Simon Pilgrim 7787a02b23 [Sparc] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349914
2018-12-21 15:32:36 +00:00
Simon Pilgrim 3c157d3fa3 [AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349912
2018-12-21 15:29:47 +00:00
Simon Pilgrim ca8bca2ad3 [WebAssembly] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349911
2018-12-21 15:25:37 +00:00
Simon Pilgrim d800ee4861 [ARM] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349909
2018-12-21 15:15:38 +00:00
Simon Pilgrim 148957f336 [AArch64] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349908
2018-12-21 15:05:10 +00:00
Simon Pilgrim 911dce2f30 [SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349907
2018-12-21 14:56:18 +00:00
Simon Pilgrim 2482c51e99 [SystemZ] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349906
2018-12-21 14:50:54 +00:00
Simon Pilgrim d43bdc715c [Lanai] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349905
2018-12-21 14:48:35 +00:00
Simon Pilgrim af1ab22a76 [PPC] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349903
2018-12-21 14:32:39 +00:00
Simon Pilgrim 57733507fe [X86] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the old KnownBits output parameter version.

llvm-svn: 349902
2018-12-21 14:25:14 +00:00
Fedor Sergeev 2d94c2265e [NewPM] -print-module-scope -print-after now prints module even after invalidated Loop/SCC
-print-after IR printing generally can not print the IR unit (Loop or SCC)
which has just been invalidated by the pass. However, when working in -print-module-scope
mode even if Loop was invalidated there is still a valid module that we can print.

Since we can not access invalidated IR unit from AfterPassInvalidated instrumentation
point we can remember the module to be printed *before* pass. This change introduces
BeforePass instrumentation that stores all the information required for module printing
into the stack and then after pass (in AfterPassInvalidated) just print whatever
has been placed on stack.

Reviewed By: philip.pfaffe
Differential Revision: https://reviews.llvm.org/D55278

llvm-svn: 349896
2018-12-21 11:49:05 +00:00
Luke Cheeseman 41a9e53500 [Dwarf/AArch64] Return address signing B key dwarf support
- When signing return addresses with -msign-return-address=<scope>{+<key>},
  either the A key instructions or the B key instructions can be used. To
  correctly authenticate the return address, the unwinder/debugger must know
  which key was used to sign the return address.
- When an exception is thrown or a break point reached, it may be necessary to
  unwind the stack. To accomplish this, the unwinder/debugger must be able to
  first authenticate the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
  inclusion of a 'B' character. Functions that are signed using the B key
  variant of the instructions should have an FDE whose associated CIE has a 'B'
  in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
  high level language into assembly and then, as a second step, into an object
  file. To achieve this, I have introduced a new assembly directive
  '.cfi_b_key_frame ', that tells the assembler the current frame uses return
  address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
  augmentation string.

Differential Revision: https://reviews.llvm.org/D51798

llvm-svn: 349895
2018-12-21 10:45:08 +00:00
Simon Pilgrim 5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
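For reference, the per-lane behaviour of signed saturating add and subtract can be sketched in Python as follows. The function names mirror the generic node names, but the code is only an illustration and assumes 8-bit lanes by default (PADDS/PSUBS also exist for 16-bit lanes).

```python
def _clamp_signed(value: int, bits: int) -> int:
    """Clamp to the signed range representable in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def sadd_sat(a: int, b: int, bits: int = 8) -> int:
    """Signed saturating add of one lane: clamp instead of wrapping."""
    return _clamp_signed(a + b, bits)


def ssub_sat(a: int, b: int, bits: int = 8) -> int:
    """Signed saturating subtract of one lane: clamp instead of wrapping."""
    return _clamp_signed(a - b, bits)
```

For 8-bit lanes, 100 + 100 saturates to 127 and -100 - 100 saturates to -128, whereas a wrapping add or subtract would change sign.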
Thomas Lively b6dac89c87 [WebAssembly] Fix invalid machine instrs in -O0, verify in tests
Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55956

llvm-svn: 349889
2018-12-21 06:58:15 +00:00
Matt Arsenault 3eae3c4590 AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote
llvm-svn: 349882
2018-12-21 03:20:54 +00:00
Matt Arsenault f4c21c575a AMDGPU/GlobalISel: RegBankSelect for some fp ops
llvm-svn: 349880
2018-12-21 03:14:45 +00:00
Matt Arsenault bee2ad7185 AMDGPU/GlobalISel: Redo legality for build_vector
It seems better to avoid using the callback if possible since
there are coverage assertions which are disabled if this is used.

Also fix missing tests. Only test the legal cases since it seems
legalization for build_vector is quite lacking.

llvm-svn: 349878
2018-12-21 03:03:11 +00:00
Reid Kleckner b894ecf903 [memcpyopt] Add debug logs when forwarding memcpy src to dst
llvm-svn: 349873
2018-12-21 01:41:20 +00:00
Eli Friedman 3af2f53456 [LoopUnroll] Don't verify domtree by default with +Asserts.
This verification is linear in the size of the function, so it can cause
a quadratic compile-time explosion in a function with many loops to
unroll.

Differential Revision: https://reviews.llvm.org/D54732

llvm-svn: 349871
2018-12-21 01:28:49 +00:00
Craig Topper 54f1a7be13 [X86] Refactor hasNoCarryFlagUses and hasNoSignFlagUses in X86ISelDAGToDAG.cpp to translate opcode to condition code using the helpers in X86InstrInfo.cpp.
This shortens the switches in X86ISelDAGToDAG.cpp to only need to check condition code instead of a list of opcodes.

This also fixes a bug where the memory forms of SETcc were missing from hasNoCarryFlagUses.

llvm-svn: 349868
2018-12-21 01:14:25 +00:00
Craig Topper e0cff10289 [X86] Add memory forms of some SETCC instructions to hasNoCarryFlagUses.
Found while working on another patch

llvm-svn: 349867
2018-12-21 01:14:23 +00:00
Eli Friedman b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Tom Stellard 2f44fbe936 cmake: Remove add_llvm_loadable_module()
Summary:
This function is very similar to add_llvm_library(),  so this patch merges it
into add_llvm_library() and replaces all calls to add_llvm_loadable_module(lib ...)
with add_llvm_library(lib MODULE ...)

Reviewers: philip.pfaffe, beanz, chandlerc

Reviewed By: philip.pfaffe

Subscribers: chapuni, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D51748

llvm-svn: 349839
2018-12-20 22:04:08 +00:00
Jessica Paquette a6b9c68a85 [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.

Add it to the switch, and add a test verifying that this happens.

llvm-svn: 349822
2018-12-20 21:14:15 +00:00
David Blaikie b3c56af49b DebugInfo: Fix for missing comp_dir handling with r349207
When deciding lazily whether a CU would be split or non-split I
accidentally dropped some handling for the line tables comp_dir (by
doing it lazily it was too late to be handled properly by the MC line
table code).

Move that bit of the code back to the non-lazy place.

llvm-svn: 349819
2018-12-20 20:46:55 +00:00
Eli Friedman 48397102d0 [MC] [AArch64] Correctly resolve ":abs_g1:3" etc.
We have to treat constructs like this as if they were "symbolic", to use
the correct codepath to resolve them.  This mostly only affects movz
etc. because the other uses of classifySymbolRef conservatively treat
everything that isn't a constant as if it were a symbol.

Differential Revision: https://reviews.llvm.org/D55906

llvm-svn: 349800
2018-12-20 19:46:14 +00:00
Eli Friedman 4648209e16 [MC] [AArch64] Support resolving fixups for abs_g0 etc.
This requires a bit more code than other fixups, to distinguish between
abs_g0/abs_g1/etc.  Actually, I think some of the other fixups are
missing some checks, but I won't try to address that here.

I haven't seen any real-world code that uses a construct like this, but
it clearly should work, and we're considering using it in the
implementation of localescape/localrecover on Windows (see
https://reviews.llvm.org/D53540). I've verified that binutils produces
the same code as llvm-mc for the testcase.

This currently doesn't include support for the *_s variants (that
requires a bit more work to set the opcode).

Differential Revision: https://reviews.llvm.org/D55896

llvm-svn: 349799
2018-12-20 19:38:07 +00:00
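The abs_gN modifiers in the two commits above each select one 16-bit group of an absolute value for a MOVZ/MOVK immediate. A small Python sketch of that selection is below; the function name abs_g and the check_overflow flag (standing in for the difference between the checking and "_nc" forms) are illustrative assumptions, and only unsigned 64-bit values are modelled.

```python
def abs_g(value: int, group: int, check_overflow: bool = True) -> int:
    """Return bits [16*group, 16*group + 15] of an unsigned 64-bit value.

    With check_overflow set, reject values that have bits set above the
    selected group, which is roughly what the checking modifier forms do.
    """
    if not 0 <= group <= 3:
        raise ValueError("group must be 0..3")
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 unsigned bits")
    if check_overflow and value >> (16 * (group + 1)):
        raise OverflowError("value does not fit below the end of the selected group")
    return (value >> (16 * group)) & 0xFFFF
```

Under this model a constant operand such as ":abs_g1:3" resolves to an immediate of 0, since 3 has no bits in the second 16-bit group.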
Simon Pilgrim 2a25360ae3 [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.

Clang counterpart: https://reviews.llvm.org/D55937

Differential Revision: https://reviews.llvm.org/D55938

llvm-svn: 349795
2018-12-20 19:01:07 +00:00
Florian Hahn ef307b8c26 [LAA] Avoid generating RT checks for known deps preventing vectorization.
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.

This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.

Reviewers: Ayal, anemet, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D55798

llvm-svn: 349794
2018-12-20 18:49:09 +00:00
Michael Trent f44d830e2d Add PLATFORM constants for iOS, tvOS, and watchOS simulators
Summary:
Add PLATFORM constants for iOS, tvOS, and watchOS simulators, as well
as human readable names for these constants, to the Mach-O file format
header files. 

rdar://46854119

Reviewers: ab, davide

Reviewed By: ab, davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55905

llvm-svn: 349779
2018-12-20 17:51:17 +00:00
Yonghong Song 821c93d556 [BPF] Disable relocation for .BTF.ext section
Build llvm with assertions on, and then build bcc against this llvm.
Run any bcc tool with debug=8 (turning on -g for clang compilation), and
you will get the following assertion error:
  /home/yhs/work/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:888:
  void llvm::RuntimeDyldELF::resolveBPFRelocation(const llvm::SectionEntry&, uint64_t,
    uint64_t, uint32_t, int64_t): Assertion `Value <= (4294967295U)' failed.

The .BTF.ext ELF section uses Fixup's to get the instruction
offsets. The data width of the Fixup is 4 bytes since we only need
the insn offset within the section.

This caused the above error, though, since R_BPF_64_32 expects a
4-byte value and the Runtime Dyld tried to resolve the actual
insn address, which is 8 bytes.

Actually, the offset within the section is all we need.
Therefore, there is no need to perform any kind of relocation
for the .BTF.ext section, and such a relocation would actually
produce an incorrect result.

This patch changed BPFELFObjectWriter::getRelocType() such that
for Fixup Kind FK_Data_4, if the relocation Target is a temporary
symbol, let us skip the relocation (ELF::R_BPF_NONE).

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 349778
2018-12-20 17:40:23 +00:00
Brock Wyma b17464e4e8 [CodeView] Emit global variables within lexical scopes to limit visibility
Emit static locals within the correct lexical scope so variables with the same
name will not confuse the debugger into getting the wrong value.

Differential Revision: https://reviews.llvm.org/D55336

llvm-svn: 349777
2018-12-20 17:33:45 +00:00
Michael Kruse 199427100b [InstCombine] Preserve access-group metadata.
Preserve llvm.access.group metadata when combining store instructions.
This was forgotten in r349725.

Fixes llvm.org/PR40117

llvm-svn: 349774
2018-12-20 17:11:02 +00:00
Krzysztof Parzyszek 30c42e2ab6 [Hexagon] Add patterns for funnel shifts
llvm-svn: 349770
2018-12-20 16:39:20 +00:00
Simon Pilgrim b208255fe0 [SelectionDAGBuilder] Enable funnel shift building to custom rotates
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.

AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.

Differential Revision: https://reviews.llvm.org/D55747

llvm-svn: 349765
2018-12-20 14:56:44 +00:00
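The funnel-shift-to-rotate mapping rests on the fact that a funnel shift whose two data operands are the same value is a rotate, with the shift amount taken modulo the bit width. A Python sketch of that equivalence, with illustrative helper names and a 32-bit default width:

```python
def fshl(a: int, b: int, s: int, bits: int = 32) -> int:
    """Funnel shift left: concatenate a:b, shift left by s mod bits, keep the high half."""
    s %= bits
    mask = (1 << bits) - 1
    concat = ((a & mask) << bits) | (b & mask)
    return ((concat << s) >> bits) & mask


def rotl(x: int, s: int, bits: int = 32) -> int:
    """Rotate left by s mod bits."""
    s %= bits
    mask = (1 << bits) - 1
    x &= mask
    return ((x << s) | (x >> (bits - s))) & mask
```

With both operands equal, fshl(x, x, s) matches rotl(x, s) for every shift amount, including amounts at or above the bit width.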
Alex Bradbury eb3a64a4da [RISCV] Properly evaluate fixup_riscv_pcrel_lo12
This is an update to D43157 to correctly handle fixup_riscv_pcrel_lo12.

Notable changes:

Rebased onto trunk
Handle and test S-type
Test case pcrel-hilo.s is merged into relocations.s

D43157 description:
VK_RISCV_PCREL_LO has to be handled specially. The MCExpr inside is
actually the location of an auipc instruction with a VK_RISCV_PCREL_HI fixup
pointing to the real target.

Differential Revision: https://reviews.llvm.org/D54029
Patch by Chih-Mao Chen and Michael Spencer.

llvm-svn: 349764
2018-12-20 14:52:15 +00:00
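The special handling exists because the low 12 bits must be computed from the address of the auipc instruction, not from the instruction that carries the lo12 fixup. A Python sketch of the usual hi20/lo12 split is below; the function names are hypothetical and the arithmetic assumes a 32-bit signed offset.

```python
from typing import Tuple


def split_pcrel(target: int, auipc_pc: int) -> Tuple[int, int]:
    """Split (target - auipc_pc) into a 20-bit auipc immediate and a signed 12-bit low part."""
    offset = target - auipc_pc  # always relative to the auipc, not the lo12 user
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF  # +0x800 compensates for lo12 sign extension
    lo12 = offset & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000  # the low part is sign-extended by the consuming instruction
    return hi20, lo12


def recombine(auipc_pc: int, hi20: int, lo12: int) -> int:
    """Model auipc followed by an I-type or S-type instruction using lo12."""
    upper = hi20 << 12
    if upper >= 1 << 31:
        upper -= 1 << 32  # auipc sign-extends its shifted immediate
    return auipc_pc + upper + lo12
```

Recombining the two parts with the auipc address gives back the original target, regardless of where the second instruction sits.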
Simon Pilgrim 09c081176a [X86][AVX512] Don't custom lower v16i8 rotations.
As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI.

llvm-svn: 349763
2018-12-20 14:38:35 +00:00
Ulrich Weigand 380bece7af [SystemZ] "Generic" vector assembler instructions should clobber CC
There are several vector instructions which may or may not set the
condition code register, depending on the value of an argument.

For codegen, we use two versions of the instruction, one that sets
CC and one that doesn't, which hard-code appropriate values of that
argument.  But we also have a "generic" version of the instruction
that is used for the assembler/disassembler.  These generic versions
should always be considered to clobber CC just to be safe.

llvm-svn: 349761
2018-12-20 14:24:17 +00:00
Ulrich Weigand 44d37ae38c [SystemZ] Make better use of VLLEZ
This patch fixes two deficiencies in current code that recognizes
the VLLEZ idiom:

- For the floating-point versions, we have ISel patterns that match
  on a bitconvert as the top node.  In more complex cases, that
  bitconvert may already have been merged into something else.
  Fix the patterns to match the inner nodes instead.

- For the 64-bit integer versions, depending on the surrounding code,
  we may get either a DAG tree based on JOIN_DWORDS or one based on
  INSERT_VECTOR_ELT.  Use a PatFrags to simply match both variants.

llvm-svn: 349749
2018-12-20 13:05:03 +00:00
Ulrich Weigand 8bb46b0f01 [SystemZ] Make better use of VGEF/VGEG
Current code in SystemZDAGToDAGISel::tryGather refuses to perform
any transformation if the Load SDNode has more than one use.  This
(erroneously) counts uses of the chain result, which prevents the
optimization in many cases unnecessarily.  Fixed by this patch.

llvm-svn: 349748
2018-12-20 13:01:20 +00:00
Clement Courbet 36a3480385 Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Update PPC IR following the GEP->bitcast to bitcast->GEP->bitcast change.

llvm-svn: 349747
2018-12-20 13:01:04 +00:00
Ulrich Weigand f43b510015 [SystemZ] Make better use of VLDEB
We already have special code (DAG combine support for FP_ROUND)
to recognize cases where we can use a vector version of VLEDB to
perform two floating-point truncates in parallel, but equivalent
support for VLDEB (vector floating-point extends) has been
missing so far.  This patch adds corresponding DAG combine
support for FP_EXTEND.

llvm-svn: 349746
2018-12-20 12:59:05 +00:00
George Rimar 6367d7a6d1 [yaml2obj/obj2yaml] - Support dumping/parsing ABI version.
These tools were assuming ABI version is 0,
that is not always true.

Patch teaches them to work with that field.

Differential revision: https://reviews.llvm.org/D55884

llvm-svn: 349737
2018-12-20 10:43:49 +00:00
Piotr Sobczak deaacc17fe [InstCombine][AMDGPU] Handle more buffer intrinsics
Summary:
Include the following intrinsics in the InstCombine
simplification:

* amdgcn_raw_buffer_load
* amdgcn_raw_buffer_load_format
* amdgcn_struct_buffer_load
* amdgcn_struct_buffer_load_format

Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55882

llvm-svn: 349735
2018-12-20 10:08:18 +00:00
Alexander Potapenko 0e3b85a730 [MSan] Don't emit __msan_instrument_asm_load() calls
LLVM treats void* pointers passed to assembly routines as pointers to
sized types.
We used to emit calls to __msan_instrument_asm_load() for every such
void*, which sometimes led to false positives.
A less error-prone (and truly "conservative") approach is to unpoison
only assembly output arguments.

llvm-svn: 349734
2018-12-20 10:05:00 +00:00
Clement Courbet e22cf4d7cb Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Forgot to update PowerPC tests for the GEP->bitcast change.

llvm-svn: 349733
2018-12-20 09:58:33 +00:00
Clement Courbet d4bd3eb85d [NFC] Fix trailing semicolon after function.
lib/Analysis/VectorUtils.cpp:482:2: warning: extra ‘;’ [-Wpedantic]

llvm-svn: 349732
2018-12-20 09:20:07 +00:00
Clement Courbet 1bb6e1b0f2 [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.
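The overlapping-load idea can be sketched in Python. This is only an illustration of the technique, not the ExpandMemcmp pass itself; the helper name `memcmp_two_loads` and its parameters are hypothetical. The second load starts at `n - load_size`, so it re-reads some bytes the first load already covered.

```python
def memcmp_two_loads(a: bytes, b: bytes, n: int, load_size: int) -> int:
    """Compare the first n bytes of a and b using two loads of load_size bytes.

    Requires load_size < n <= 2 * load_size. The second load starts at
    n - load_size, so it overlaps the first whenever n < 2 * load_size.
    Returns -1, 0 or 1 (memcmp itself only guarantees the sign).
    """
    if not (load_size < n <= 2 * load_size):
        raise ValueError("n must satisfy load_size < n <= 2 * load_size")
    for offset in (0, n - load_size):
        # A big-endian interpretation makes integer order match byte order.
        x = int.from_bytes(a[offset:offset + load_size], "big")
        y = int.from_bytes(b[offset:offset + load_size], "big")
        if x != y:
            return -1 if x < y else 1
    return 0
```

For a 7-byte compare, `memcmp_two_loads(a, b, 7, 4)` reads bytes 0..3 and 3..6. If the first load compares equal, the overlapping byte is already known to match, so the second load decides the result.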

Reviewers: spatel, gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55263

llvm-svn: 349731
2018-12-20 09:13:47 +00:00
Eugene Leviant 2d98eb1b2e [HWASAN] Add support for memory intrinsics
Differential revision: https://reviews.llvm.org/D55117

llvm-svn: 349728
2018-12-20 09:04:33 +00:00
Kang Zhang ca8db48974 [PowerPC] Implement the isSelectSupported() target hook
Summary:
PowerPC has scalar selects (isel) and vector mask selects (xxsel). However,
PowerPC does not have vector CR selects, so it does not support scalar
condition selects on vectors.
In addition to implementing this hook, isSelectSupported() should return false
when the SelectSupportKind is ScalarCondVectorVal, so that predictable selects
are converted into branch sequences.

Reviewed By: steven.zhang,  hfinkel

Differential Revision: https://reviews.llvm.org/D55754

llvm-svn: 349727
2018-12-20 06:19:59 +00:00
Craig Topper bd788ce5db [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.
llvm-svn: 349726
2018-12-20 05:28:06 +00:00
Michael Kruse 978ba61536 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not a loop identifier. It is
neither unique (there's even a regression test assigning the same LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it is just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding any need to ever add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carried by this loop).

This intentionally avoids any kind of "ID". Loops that are cloned or have
their attributes modified retain the llvm.loop.parallel_accesses
attribute. Access instructions that are cloned point to the same access
group. It is not necessary for each access to have its own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

llvm-svn: 349725
2018-12-20 04:58:07 +00:00
Thomas Lively feb18fe927 [WebAssembly] Emit a splat for v128 IMPLICIT_DEF
Summary:
This is a code size savings and is also important to get runnable code
while engines do not support v128.const.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55910

llvm-svn: 349724
2018-12-20 04:20:32 +00:00
Amara Emerson 321bfb210a Fix build errors introduced by r349712 on aarch64 bots.
llvm-svn: 349723
2018-12-20 03:27:42 +00:00
Thomas Lively 8dbf29af95 [WebAssembly] Gate unimplemented SIMD ops on flag
Summary:
Gates v128.const, f32x4.sqrt, f32x4.div, i8x16.extract_lane_u, and
i16x8.extract_lane_u on the --wasm-enable-unimplemented-simd flag,
since these ops are not implemented yet in V8.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55904

llvm-svn: 349720
2018-12-20 02:10:22 +00:00
Matt Arsenault 4339883710 AMDGPU: Make i1/i64/v2i32 and/or/xor legal
The 64-bit types do depend on the register bank,
but that's another issue to deal with later.

llvm-svn: 349716
2018-12-20 01:35:49 +00:00
Matt Arsenault 8cc98bee8a AMDGPU/GlobalISel: Fix ValueMapping tables for i1
This was incorrectly selecting SGPR for any i1 values,
e.g. G_TRUNC to i1 from a VGPR was still an SGPR.

llvm-svn: 349715
2018-12-20 01:33:43 +00:00
Craig Topper 9ca2f5605e [X86] Disable custom widening of signed/unsigned add/sub saturation intrinsics under -x86-experimental-vector-widening-legalization.
Generic legalization should take care of this.

llvm-svn: 349714
2018-12-20 01:32:06 +00:00
Amara Emerson 8cb186ce17 [AArch64][GlobalISel] Implement selection of G_MERGE of two s32s into s64.
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.

Until then, add selection support anyway, since this pattern accounts for a
significant proportion of our current fallbacks on CTMark.

rdar://46491420

llvm-svn: 349712
2018-12-20 01:11:04 +00:00
Matt Arsenault dff33c38e1 AMDGPU/GlobalISel: RegBankSelect for fp conversions
llvm-svn: 349709
2018-12-20 00:37:02 +00:00
Matt Arsenault 36d4092173 AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg
llvm-svn: 349708
2018-12-20 00:33:49 +00:00
Vitaly Buka 07a55f27dc [asan] Undo special treatment of linkonce_odr and weak_odr
Summary:
On non-Windows these are already removed by ShouldInstrumentGlobal.
On Windows we will wait until we get actual issues with that.

Reviewers: pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55899

llvm-svn: 349707
2018-12-20 00:30:27 +00:00
Vitaly Buka d414e1bbb5 [asan] Prevent folding of globals with redzones
Summary:
ICF is prevented by removing unnamed_addr and local_unnamed_addr for all
sanitized globals.
Also, in general unnamed_addr is not valid here, as the address is now
important for the ODR violation detector and redzone poisoning.

Before the patch ICF on globals caused:
1. false ODR reports when we register a global at the same address more than once
2. global buffer overflows if we fold a variable of a smaller type inside one of
a larger type. Then the smaller one will poison a redzone which overlaps with the larger one.

Reviewers: eugenis, pcc

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55857

llvm-svn: 349706
2018-12-20 00:30:18 +00:00
Matt Davis 87b2268c0c [DwarfExpression] Fix a typo in a doxygen comment. NFC.
llvm-svn: 349703
2018-12-20 00:01:57 +00:00
Craig Topper 217b3b20d8 [X86] Remove TLI variable from ReplaceNodeResults. NFC
We're already in X86TargetLowering which is a derived class of TargetLowering. We can just call methods directly.

llvm-svn: 349695
2018-12-19 23:13:03 +00:00
Rhys Perry 3931ad38b9 AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts
Reviewers: arsenm, tstellar

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55058

llvm-svn: 349694
2018-12-19 22:53:33 +00:00
Eli Friedman a69084ffa8 [CodeGenPrepare] Fix bad IR created by large offset GEP splitting.
Creating the IR builder, then modifying the CFG, leads to an IRBuilder
where the BB and insertion point are inconsistent, so new instructions
have the wrong parent.

Modified an existing test because the test wasn't covering anything
useful (the "invoke" was not actually an invoke by the time we hit the
code in question).

Differential Revision: https://reviews.llvm.org/D55729

llvm-svn: 349693
2018-12-19 22:52:04 +00:00
Rhys Perry 972273d1d3 Fix test commit
Seems that was actually an eight-space tab...

llvm-svn: 349690
2018-12-19 22:33:42 +00:00
Rhys Perry 111bf831de Test commit
Replace tab with 4 spaces.

llvm-svn: 349689
2018-12-19 22:26:51 +00:00
Evandro Menezes 374ccf6768 [AArch64] Improve Exynos predicates
Expand the predicate `ExynosResetPred` to include all forms of immediate
moves.

llvm-svn: 349686
2018-12-19 22:24:36 +00:00
Evandro Menezes ff827d737a [AArch64] Use canonical copy idiom
Use only the canonical form of the alias for register transfers in the
`IsCopyIdiomPred` predicate.

llvm-svn: 349685
2018-12-19 22:24:31 +00:00
Nikita Popov 3817ee7908 Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This reverts commit r349674. It causes a failure in
test-suite enc-3des.execution_time.

llvm-svn: 349684
2018-12-19 22:09:02 +00:00
Reid Kleckner ed3ef41711 [llvm-ar] Simplify string table get-or-insert pattern with .insert, NFC
llvm-svn: 349681
2018-12-19 20:54:06 +00:00
Nikita Popov 649e125451 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
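A small Python check can illustrate why a dead use lets a rotate become a shift. The commit message does not show the motivating IR, so the 8-bit width, the rotate amount and the demanded mask below are assumptions chosen only for illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1

def rotl(x: int, k: int) -> int:
    """Rotate an 8-bit value left by k, written as shl | lshr."""
    return ((x << k) | (x >> (BITS - k))) & MASK

def shl(x: int, k: int) -> int:
    """Plain 8-bit left shift."""
    return (x << k) & MASK

K = 3            # assumed rotate amount
DEMANDED = 0xF8  # assumed: the user only reads the top five bits

def rotate_matches_shift() -> bool:
    # The lshr half of the rotate only produces bits 0..2, none of which
    # are demanded, so that use of x is dead and can be replaced by zero.
    return all((rotl(x, K) & DEMANDED) == (shl(x, K) & DEMANDED)
               for x in range(1 << BITS))
```

Here the `or` instruction itself stays live, but one of its operands contributes no demanded bits, which is the situation the patch describes.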

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead. This case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore,
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits of bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00
Craig Topper d16da2b479 [X86] Remove a bunch of 'else' after returns in reduceVMULWidth. NFC
This reduces indentation and makes it obvious this function always returns something.

llvm-svn: 349671
2018-12-19 19:39:34 +00:00
David Blaikie ac69af7ad6 llvm-dwarfdump: Improve/fix pretty printing of array dimensions
This is to address post-commit feedback from Paul Robinson on r348954.

The original commit misinterprets count and upper bound as the same thing (I thought I saw GCC producing an upper bound the same as Clang's count, but GCC correctly produces an upper bound that's one less than the count (in C, that is, where arrays are zero indexed)).

I want to preserve the C-like output for the common case, so in the absence of a lower bound the count (or one greater than the upper bound) is rendered between []. In the trickier cases, where a lower bound is specified, a half-open range is used (e.g. lower bound 1, count 2 would be "[1, 3)"), and any unknown parts use a '?' (e.g. "[1, ?)" or "[?, 7)" or "[?, ? + 3)").
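The rendering rules above can be sketched roughly as follows. This is a Python model of the described output, not the llvm-dwarfdump code; `render_dimension`, the `UNKNOWN` marker, and the "[]" output for a dimension with no information are all assumptions made for illustration.

```python
UNKNOWN = "?"  # hypothetical marker: attribute present, value not a constant

def render_dimension(lower=None, count=None, upper=None) -> str:
    """Render one array dimension.

    Each argument is None (attribute absent), an int, or UNKNOWN.
    """
    def is_known(value) -> bool:
        return isinstance(value, int)

    if lower is None:
        # Common C-like case: print the element count between [].
        if is_known(count):
            return f"[{count}]"
        if is_known(upper):
            return f"[{upper + 1}]"
        # Assumption: nothing known at all is shown as an empty dimension.
        return "[]" if count is None and upper is None else "[?]"

    # A lower bound is specified: print a half-open range [start, end).
    if is_known(count):
        end = str(lower + count) if is_known(lower) else f"? + {count}"
    elif is_known(upper):
        end = str(upper + 1)
    else:
        end = "?"
    start = str(lower) if is_known(lower) else "?"
    return f"[{start}, {end})"
```

With these assumptions, `render_dimension(lower=1, count=2)` gives "[1, 3)" and `render_dimension(lower=UNKNOWN, count=3)` gives "[?, ? + 3)", matching the examples in the message.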

Reviewers: aprantl, probinson, JDevlieghere

Differential Revision: https://reviews.llvm.org/D55721

llvm-svn: 349670
2018-12-19 19:34:24 +00:00
Matthew Voss 62fcfc5adb [ThinLTO] Remove dllimport attribute from locally defined symbols
Summary:
The LTO/ThinLTO driver currently creates invalid bitcode by setting 
symbols marked dllimport as dso_local. The compiler often has access 
to the definition (often dllexport) and the declaration (often 
dllimport) of an object at link-time, leading to a conflicting 
declaration. This patch resolves the inconsistency by removing the
dllimport attribute.

Reviewers: tejohnson, pcc, rnk, echristo

Reviewed By: rnk

Subscribers: dmikulin, wristow, mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D55627

llvm-svn: 349667
2018-12-19 19:07:45 +00:00
Jessica Paquette 3560e93dc1 [GlobalISel][AArch64] Add support for @llvm.ceil
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.

It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.

llvm-svn: 349664
2018-12-19 19:01:36 +00:00
Craig Topper 84a00bd98a [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing
The (cmp (and X, Y), 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM instructions like BLSR or BLSI.

This patch removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but it's probably not perfect.

Differential Revision: https://reviews.llvm.org/D55870

llvm-svn: 349661
2018-12-19 18:49:13 +00:00
Craig Topper 291470347a [X86] Fix assert fails in pass X86AvoidSFBPass
Fixes https://bugs.llvm.org/show_bug.cgi?id=38743

The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in BlockingStoresDispSizeMap.
But it currently looks only at the previous one, which will miss some cases and result in an assertion failure.

This patch refines the function to check all previous layouts until it finds the uncontained one, so all redundant stores will be removed.

Patch by Pengfei Wang

Differential Revision: https://reviews.llvm.org/D55642

llvm-svn: 349660
2018-12-19 18:45:57 +00:00
Evandro Menezes 5d409b2278 [AArch64] Improve the Exynos M3 pipeline model
llvm-svn: 349652
2018-12-19 17:37:51 +00:00
Anton Afanasyev ce28791e20 Test commit
Fix typos.

llvm-svn: 349644
2018-12-19 17:18:40 +00:00
Sanjay Patel 798c5982a0 [ValueTracking] remove unused parameters from helper functions; NFC
llvm-svn: 349641
2018-12-19 16:49:18 +00:00
Yonghong Song 7b410ac352 [BPF] Generate BTF DebugInfo under BPF target
This patch implements BTF (BPF Type Format).
BTF is the debug info format for BPF, introduced
in the Linux patch below:
  69b693f0ae (diff-06fb1c8825f653d7e539058b72c83332)
and further extended several times, e.g.,
  https://www.spinics.net/lists/netdev/msg534640.html
  https://www.spinics.net/lists/netdev/msg538464.html
  https://www.spinics.net/lists/netdev/msg540246.html

The main advantages of implementing this in LLVM are:
   . better integration/deployment as no extra tools are needed.
   . bpf JIT based compilation (like bcc, bpftrace, etc.) can get
     BTF without much extra effort.
   . BTF line_info needs selective source codes, which can be
     easily retrieved when inside the compiler.

This patch implements BTF generation by registering a BPF
specific DebugHandler in BPFAsmPrinter.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D55752

llvm-svn: 349640
2018-12-19 16:40:25 +00:00
Peter Wu f0ad811b54 [Object] Deduplicate long archive member names
Summary:
Import libraries as created by llvm-dlltool always use the same archive
member name for every object file (namely, the DLL library name). Ensure
that long names are not repeatedly stored in the string table.
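A minimal sketch of the deduplication, assuming the GNU-style long-name table in which each name is stored once, terminated by "/\n", and referenced from member headers as "/<offset>". The function name and return shape are hypothetical and do not mirror the LLVM archive writer.

```python
def build_string_table(member_names):
    """Return (table, refs) for a GNU-style archive long-name table.

    table is the raw bytes of the table; refs holds one "/<offset>"
    reference per member, in input order.
    """
    table = bytearray()
    offsets = {}  # name -> offset of its single copy in the table
    refs = []
    for name in member_names:
        offset = offsets.get(name)
        if offset is None:
            # First occurrence: append the name and remember where it starts.
            offset = len(table)
            offsets[name] = offset
            table += name.encode() + b"/\n"
        refs.append(f"/{offset}")
    return bytes(table), refs
```

For an import library where every member carries the same DLL name, all references point at offset 0 and the table holds the name only once.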

Reviewed By: ruiu

Differential Revision: https://reviews.llvm.org/D55860

llvm-svn: 349637
2018-12-19 16:15:05 +00:00
Simon Pilgrim 7bfbf3caa4 [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (llvm)
Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence.

I'm intending to deal with the signed saturation equivalents as well.

Clang counterpart: https://reviews.llvm.org/D55879

Differential Revision: https://reviews.llvm.org/D55855

llvm-svn: 349630
2018-12-19 14:43:36 +00:00
Simon Pilgrim 2ae3a91656 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.
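The bitwise identity behind this fold can be checked exhaustively for a small bit width. The sketch below covers only fully defined constants; it says nothing about the undef cases the patch is concerned with, and the function name is hypothetical.

```python
def fold_holds(bits: int = 4) -> bool:
    """Check (X & c1) | c2 == (X | c2) & (c1 | c2) for every value of the width."""
    values = range(1 << bits)
    return all(((x & c1) | c2) == ((x | c2) & (c1 | c2))
               for x in values for c1 in values for c2 in values)
```

The equality follows from distributing `| c2` over `&`, so the exhaustive check is a sanity test rather than a proof technique.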

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349629
2018-12-19 14:09:38 +00:00
Simon Pilgrim 47ff0431e9 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 1 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349628
2018-12-19 14:09:09 +00:00
Simon Pilgrim 6c95bea072 [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect, as the output should at least guarantee that the new upper bits are set to zero.

SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.

Thanks to @dmgreen for catching this.

Differential Revision: https://reviews.llvm.org/D55883

llvm-svn: 349625
2018-12-19 13:37:59 +00:00
Nico Weber f7cf1a1a73 Let TableGen write output only if it changed, instead of doing so in cmake, attempt 2
This relands r330742:
"""
Let TableGen write output only if it changed, instead of doing so in cmake.

Removes one subprocess and one temp file from the build for each tablegen
invocation.

No intended behavior change.
"""

In particular, if you see rebuilds after this change that you didn't see
before this change, that's unintended and it's fine to revert this change
again (but let me know).

r330742 got reverted because some people reported that llvm-tblgen ran on every
build after it.  This could happen if the depfile output got deleted without
deleting the main .inc output. To fix, make TableGen always write the depfile,
but keep writing the main .inc output only if it has changed. This matches what
we did in cmake before.
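The write policy described here can be sketched in Python. TableGen is a C++ tool, so this is only a model of the behavior; the function names and the split into two helpers are assumptions.

```python
def write_if_changed(path: str, data: bytes) -> bool:
    """Write data to path unless the file already holds exactly that content.

    Returns True if the file was (re)written.
    """
    try:
        with open(path, "rb") as f:
            if f.read() == data:
                return False  # unchanged: leave the file and its mtime alone
    except FileNotFoundError:
        pass
    with open(path, "wb") as f:
        f.write(data)
    return True

def emit_outputs(inc_path: str, inc_data: bytes,
                 dep_path: str, dep_data: bytes) -> bool:
    """Always write the depfile; write the main output only if it changed."""
    # Writing the depfile unconditionally means a deleted depfile is
    # regenerated even when the main .inc output is left untouched.
    with open(dep_path, "wb") as f:
        f.write(dep_data)
    return write_if_changed(inc_path, inc_data)
```

Leaving an unchanged output untouched keeps its timestamp stable, so targets that depend on it are not rebuilt.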

Differential Revision: https://reviews.llvm.org/D55842

llvm-svn: 349624
2018-12-19 13:35:53 +00:00
Nicolai Haehnle 8d5e974076 AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1
Summary:
Using HI here makes no logical sense, since the dword is only
32 bits to begin with.

Current Mesa master does not look at the relocation type at all,
so this change is fine. Future Mesa will rely on this, however.

Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1

Reviewers: arsenm, rampitec, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55369

llvm-svn: 349620
2018-12-19 11:55:03 +00:00
Simon Pilgrim 2072b5afbe [SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.

This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.

I've updated SelectionDAG::simplifyShift to demonstrate its use.

Differential Revision: https://reviews.llvm.org/D55819

llvm-svn: 349616
2018-12-19 10:41:06 +00:00
Carl Ritson c521ac3a44 AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.

Irreducible loop test has been extended based on a CTS failure detected for GFX9.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D55602

llvm-svn: 349611
2018-12-19 10:17:49 +00:00
Diana Picus 6c35a1e5af [ARM GlobalISel] Support G_CONSTANT for Thumb2
All we have to do is mark it as legal.

This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.

llvm-svn: 349610
2018-12-19 09:55:10 +00:00
Matt Arsenault b110e2277c AMDGPU/GlobalISel: Regbankselect for fsub
llvm-svn: 349608
2018-12-19 09:07:58 +00:00
Martin Storsjo e84a0b5a9e [llvm-objcopy] Initial COFF support
This is an initial implementation of no-op passthrough copying of COFF
with objcopy.

Differential Revision: https://reviews.llvm.org/D54939

llvm-svn: 349605
2018-12-19 07:24:38 +00:00
Kewen Lin a6247e7cf4 [PowerPC]Exploit P9 vabsdu for unsigned vselect patterns
For types v4i32/v8i16/v16i8, do the following transforms:
  (vselect (setcc a, b, setugt), (sub a, b), (sub b, a)) -> (vabsd a, b)
  (vselect (setcc a, b, setuge), (sub a, b), (sub b, a)) -> (vabsd a, b)
  (vselect (setcc a, b, setult), (sub b, a), (sub a, b)) -> (vabsd a, b)
  (vselect (setcc a, b, setule), (sub b, a), (sub a, b)) -> (vabsd a, b)
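Each of the four select forms computes an unsigned absolute difference, which can be checked exhaustively for one lane width. The sketch below models a single 8-bit lane (as in v16i8) with wrapping subtraction; the helper names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1

def sub(a: int, b: int) -> int:
    """Wrapping 8-bit subtraction, as a vector lane would compute it."""
    return (a - b) & MASK

def absdiff(a: int, b: int) -> int:
    """Unsigned absolute difference of two lane values."""
    return max(a, b) - min(a, b)

def select_forms(a: int, b: int):
    """The four vselect patterns from the message, evaluated on one lane."""
    return (
        sub(a, b) if a > b else sub(b, a),    # setugt
        sub(a, b) if a >= b else sub(b, a),   # setuge
        sub(b, a) if a < b else sub(a, b),    # setult
        sub(b, a) if a <= b else sub(a, b),   # setule
    )

def all_forms_are_absdiff() -> bool:
    return all(form == absdiff(a, b)
               for a in range(1 << BITS)
               for b in range(1 << BITS)
               for form in select_forms(a, b))
```

In every form the selected subtraction is the one that does not wrap, which is why all four collapse to the same operation.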

Differential Revision: https://reviews.llvm.org/D55812

llvm-svn: 349599
2018-12-19 03:04:07 +00:00
Evandro Menezes f03c45d582 [AArch64] Simplify the Exynos M3 pipeline model
llvm-svn: 349569
2018-12-18 23:19:57 +00:00
Evandro Menezes 4e39fa4474 [AArch64] Fix instructions order (NFC)
llvm-svn: 349568
2018-12-18 23:19:55 +00:00
Yonghong Song 61b189e06f [DebugInfo] Move several private headers to include directory
This patch moved the following files in lib/CodeGen/AsmPrinter/
  AsmPrinterHandler.h
  DbgEntityHistoryCalculator.h
  DebugHandlerBase.h
to include/llvm/CodeGen directory.

Such a change will enable Target to extend DebugHandlerBase
and emit Target specific debug info sections.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D55755

llvm-svn: 349564
2018-12-18 23:10:17 +00:00
Pete Cooper a3e0be109c Preserve the linkage for objc* intrinsics as clang will set them to weak_external in some cases
Clang uses weak linkage for objc runtime functions when they are not available on the platform.

The intrinsic has this linkage so we just need to pass that on to the runtime call.

llvm-svn: 349559
2018-12-18 22:42:08 +00:00
Pete Cooper d0ffdf8782 Add nonlazybind to objc_retain/objc_release when converting from intrinsics.
For performance reasons, clang sets nonlazybind on these functions.  Now that we
are using intrinsics instead of runtime calls, we should set this attribute when
creating the runtime functions.

llvm-svn: 349558
2018-12-18 22:31:34 +00:00
Florian Hahn 485f2826ba [LAA] Introduce enum for vectorization safety status (NFC).
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.

Reviewers: anemet, anna, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D54892

llvm-svn: 349556
2018-12-18 22:25:11 +00:00
Vitaly Buka 4e4920694c [asan] Restore ODR-violation detection on vtables
Summary:
unnamed_addr is still useful for detecting ODR violations on vtables.

Still, unnamed_addr with lld and --icf=safe or --icf=all can trigger false
reports, which can be avoided with --icf=none or by using private aliases
with -fsanitize-address-use-odr-indicator.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55799

llvm-svn: 349555
2018-12-18 22:23:30 +00:00
Pete Cooper f86db5ce9e Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG.
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISels.  Also, we want to eventually support nonlazybind and weak linkage coming
from the front-end, which we can't do in SelectionDAG.

llvm-svn: 349552
2018-12-18 22:20:03 +00:00
Martin Storsjo df20c666d6 [AArch64] Avoid crashing on .seh directives in assembly
Differential Revision: https://reviews.llvm.org/D55670

llvm-svn: 349549
2018-12-18 22:10:17 +00:00
Kuba Mracek 3760fc9f3d [asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them
Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list.

Differential Revision: https://reviews.llvm.org/D55794

llvm-svn: 349544
2018-12-18 21:20:17 +00:00
Evandro Menezes 3753c25b8c [llvm-mca] Dump mask in hex
Dump the resources masks as hexadecimal.

llvm-svn: 349536
2018-12-18 20:45:50 +00:00