Commit Graph

Dylan McKay d770da9834 [AVR] Rewrite the CBRRdK instruction as an alias of ANDIRdK
The CBR instruction is just an ANDI instruction with the immediate
complemented.
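
For illustration only (not part of the original patch), the identity the alias encodes can be sketched in C++ on 8-bit values:

```
#include <cstdint>

// CBR Rd, K clears the bits of Rd that are set in K, which is exactly
// ANDI Rd, ~K.
constexpr uint8_t andi(uint8_t rd, uint8_t k) { return rd & k; }
constexpr uint8_t cbr(uint8_t rd, uint8_t k) {
  return andi(rd, static_cast<uint8_t>(~k));
}
static_assert(cbr(0xFF, 0x0F) == 0xF0, "CBR == ANDI with complemented mask");
```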

Because of this, prior to this change TableGen would warn due to a
decoding conflict.

This commit fixes the existing compilation warning:

  ===============
  [423/492] Building AVRGenDisassemblerTables.inc...
  Decoding Conflict:
                  0111............
                  01..............
                  ................
          ANDIRdK 0111____________
          CBRRdK 0111____________
  ================

After this commit, there are no more decoding conflicts in the AVR
backend's instruction definitions.

Thanks to Eli F for pointing me toward `t2_so_imm_not` as an example of
how to perform a complement in an instruction alias.

Fixes Bugzilla PR38802.

llvm-svn: 351526
2019-01-18 07:31:34 +00:00
Hsiangkai Wang 66609a8255 [CodeGen] Fix bugs in LiveDebugVariables when debug labels are generated.
Remove DBG_LABELs in LiveDebugVariables and generate them in
VirtRegRewriter.

This bug is reported in
https://bugs.chromium.org/p/chromium/issues/detail?id=898152.

Differential Revision: https://reviews.llvm.org/D54465

llvm-svn: 351525
2019-01-18 07:17:09 +00:00
Dylan McKay 7203e00b5e [AVR] Expand 8/16-bit multiplication to libcalls on MCUs that don't have hardware MUL
This change modifies the LLVM ISel lowering settings so that
8-bit/16-bit multiplication is expanded to calls into the compiler
runtime library if the MCU being targeted does not support
multiplication in hardware.

Before this, MUL instructions would be generated on CPUs like the
ATtiny85, triggering a CPU reset due to an illegal instruction at
runtime.

First raised in https://github.com/avr-rust/rust/issues/124.

llvm-svn: 351523
2019-01-18 06:10:41 +00:00
Max Kazantsev b4cd50be56 Re-enable terminator folding in LoopSimplifyCFG: underlying bugs fixed
llvm-svn: 351520
2019-01-18 04:57:32 +00:00
Thomas Lively c6795e07f0 [WebAssembly] Add languages from debug info to producers section
Reviewers: aheejin, dschuff, sbc100

Subscribers: aprantl, jgravelle-google, hiraditya, sunfish

Differential Revision: https://reviews.llvm.org/D56889

llvm-svn: 351507
2019-01-18 02:47:48 +00:00
Vedant Kumar f529b50702 [HotColdSplit] Allow outlining with live outputs
Prior to r348205, extracting code regions with live output values was
disabled because of a miscompilation (PR39433). Lift the restriction as
PR39433 has been addressed.

Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
with a full build of iOS (with hot/cold splitting enabled).

As a drive-by, remove an errant TODO.

llvm-svn: 351492
2019-01-17 22:36:05 +00:00
Vedant Kumar b70e20db62 [HotColdSplit] Consider resume instructions to be cold
Resuming exception unwinding is roughly as unlikely as throwing an
exception.

Tested on LNT+externals (in particular, the C++ EH regression tests
provide end-to-end test coverage), as well as with a full build of iOS.

llvm-svn: 351491
2019-01-17 22:35:47 +00:00
Vladimir Stefanovic 3daf8bc96a [mips] Emit .reloc R_{MICRO}MIPS_JALR along with j(al)r(c) $25
The callee address is added as an optional operand (MCSymbol) in
AdjustInstrPostInstrSelection() and then used by asm printer to insert:
'.reloc tmplabel, R_MIPS_JALR, symbol
tmplabel:'.
Controlled with '-mips-jalr-reloc', default is true.

Differential revision: https://reviews.llvm.org/D56694

llvm-svn: 351485
2019-01-17 21:50:37 +00:00
Vedant Kumar 4541be0686 [HotColdSplit] Relax requirement that the cold sink block be extractable
Relaxing this requirement creates opportunities to split code dominated
by an EH pad.

Tested on LNT+externals.

llvm-svn: 351483
2019-01-17 21:42:36 +00:00
Vedant Kumar 32a014d048 [HotColdSplit] Simplify tests by lowering their splitting thresholds
This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
peppered throughout the test cases. They are no longer needed to force
splitting to occur.

llvm-svn: 351480
2019-01-17 21:29:34 +00:00
Wei Mi 3bcccdfe38 [SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink
If the sample profile has no inlining hierarchy information included, we say
the sample profile is flattened. For a flattened profile, in the ThinLTO postlink
phase, SampleProfileLoader's hot function inlining and profile annotation will
do nothing, so it is better to save the effort of reading in the profile and running
the sample profile loader pass. This is helpful for reducing compile time when
the flattened profile is huge.

Differential Revision: https://reviews.llvm.org/D54819

llvm-svn: 351476
2019-01-17 20:48:34 +00:00
Reid Kleckner edd653bc07 [InstCombine] Don't sink dynamic allocas
Summary:
InstCombine's sinking algorithm only thinks about memory. It doesn't
think about non-memory constraints like stack object lifetime. It can
sink dynamic allocas across a stacksave call, which may be used with
stackrestore, which can incorrectly reduce the lifetime of the dynamic
alloca.

Fixes PR40365

Reviewers: hfinkel, efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56872

llvm-svn: 351475
2019-01-17 20:46:53 +00:00
Erik Pilkington 5094e5ef8b NFC: Make the copies of the demangler byte-for-byte identical
With this patch, the copies of the files ItaniumDemangle.h,
StringView.h, and Utility.h are kept byte-for-byte in sync between
libcxxabi and llvm. All differences (namespaces, fallthrough, and
unreachable macros) are defined in each copy's DemanglerConfig.h.

This patch also adds a script to copy changes from libcxxabi
(cp-to-llvm.sh), and a README.txt explaining the situation.

Differential revision: https://reviews.llvm.org/D53538

llvm-svn: 351474
2019-01-17 20:37:51 +00:00
Sanjin Sijaric 4d1450298c Fix the buildbot failure introduced by r351404
EXPENSIVE_CHECKS buildbots are failing due to r351404.

Add x1 as live in to the funclet basic block for SEH funclets, as well as
-verify-machineinstrs to the test case that triggered the failure.

llvm-svn: 351472
2019-01-17 20:24:14 +00:00
Wouter van Oortmerssen f3b762a0b6 [WebAssembly] Fixed objdump not parsing function headers.
Summary:
objdump was interpreting the function header containing the locals
declaration as instructions. To parse these without injecting target-specific
code into objdump, MCDisassembler::onSymbolStart was added, to be implemented
by the WebAssembly implementation.

WasmObjectFile now returns a code offset for the "address" of a symbol,
rather than the index. This is also more in-line with what other
targets do.

Also ensured that the AsmParser correctly puts each function
in its own segment to enable this test case.

Reviewers: sbc100, dschuff

Subscribers: jgravelle-google, aheejin, sunfish, rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D56684

llvm-svn: 351460
2019-01-17 18:14:09 +00:00
Teresa Johnson 8d86f1ba47 Revert "[ThinLTO] Add summary entries for index-based WPD"
Mistaken commit of something still under review!

This reverts commit r351453.

llvm-svn: 351455
2019-01-17 16:05:04 +00:00
Teresa Johnson 4fcf3b1621 [ThinLTO] Add summary entries for index-based WPD
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).

Also added are the necessary bitcode records, and the corresponding
assembly support.

The index-based WPD will be sent as a follow-on.

Depends on D53890.

Reviewers: pcc

Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54815

llvm-svn: 351453
2019-01-17 15:49:03 +00:00
James Henderson ce5b5b486a Move demangling function from llvm-objdump to Demangle library
This allows it to be used in an upcoming llvm-readobj change.

A small change in internal behaviour of the function is to always call
the microsoftDemangle function if the string does not have an itanium
encoding prefix, rather than only if it starts with '?'. This is
harmless because the microsoftDemangle function does the same check
already.
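
As a rough C++ sketch of that dispatch (with stub demangler functions for illustration; the real Demangle library entry points have different signatures):

```
#include <string>

// Stubs for illustration only; the real demangler entry points differ.
static std::string demangleItanium(const std::string &Name) { return Name; }
static std::string demangleMicrosoft(const std::string &Name) { return Name; }

// Itanium-mangled names start with "_Z" (possibly with extra leading
// underscores on some platforms); everything else is handed to the Microsoft
// demangler, which rejects non-matching input on its own.
static bool hasItaniumPrefix(const std::string &Name) {
  return Name.find("_Z") == 0 || Name.find("__Z") == 0;
}

std::string demangle(const std::string &Name) {
  return hasItaniumPrefix(Name) ? demangleItanium(Name)
                                : demangleMicrosoft(Name);
}
```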

Reviewed by: grimar, erik.pilkington

Differential Revision: https://reviews.llvm.org/D56721

llvm-svn: 351448
2019-01-17 15:18:44 +00:00
Max Kazantsev 61a8d3fb33 [LoopSimplifyCFG] Form LCSSA when a parent loop becomes a sibling
During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the
parent loop may stop being reachable from the child loop, and therefore they become
siblings. If the former child loop had uses of some values from its former parent loop,
now such uses will require LCSSA Phis, even if they weren't needed before. So we must
form LCSSA for all loops that stopped being ancestors of the current loop in this case.

Differential Revision: https://reviews.llvm.org/D56144
Reviewed By: fedor.sergeev

llvm-svn: 351434
2019-01-17 12:51:10 +00:00
Max Kazantsev 8b134169f5 [LoopSimplifyCFG] Fix order of deletion of complex dead subloops
Function `DeleteDeadBlock` requires that all predecessors of a block
being deleted have already been deleted, with the exception of a
single-block loop. When we use it for removal of dead subloops that
contain more than one block, we may not fulfill this requirement and
fail an assertion.

This patch replaces invocation of `DeleteDeadBlock` with a generalized
version `DeleteDeadBlocks` that is able to deal with multiple dead blocks,
even if they contain some cycles.

Differential Revision: https://reviews.llvm.org/D56121
Reviewed By: fedor.sergeev

llvm-svn: 351433
2019-01-17 12:25:40 +00:00
Matt Arsenault 0cb08e448a Allow FP types for atomicrmw xchg
llvm-svn: 351427
2019-01-17 10:49:01 +00:00
Benjamin Kramer bd13c9787f [MC] Remove unused variable
llvm-svn: 351426
2019-01-17 10:25:18 +00:00
Diana Picus 639e0661ce Fix capitalization. NFC
llvm-svn: 351425
2019-01-17 10:11:59 +00:00
Diana Picus d5c2499aec [ARM GlobalISel] Allow calls to varargs functions
Allow varargs functions to be called, both in arm and thumb mode. This
boils down to choosing the correct calling convention, which we can
easily test by making sure arm_aapcscc is used instead of
arm_aapcs_vfpcc when the callee is variadic.

llvm-svn: 351424
2019-01-17 10:11:55 +00:00
Alex Bradbury 07f1c62371 [RISCV] Add codegen support for RV64A
In order to support codegen for RV64A, this patch:
* Introduces masked atomics intrinsics for atomicrmw operations and cmpxchg
  that use the i64 type. These are ultimately lowered to masked operations
  using lr.w/sc.w, but we need to use these alternate intrinsics for RV64
  because i32 is not legal
* Modifies RISCVExpandPseudoInsts.cpp to handle PseudoAtomicLoadNand64 and
  PseudoCmpXchg64
* Modifies the AtomicExpandPass hooks in RISCVTargetLowering to sext/trunc as
  needed for RV64 and to select the i64 intrinsic IDs when necessary
* Adds appropriate patterns to RISCVInstrInfoA.td
* Updates test/CodeGen/RISCV/atomic-*.ll to show RV64A support

This ends up being a fairly mechanical change, as the logic for RV32A is
effectively reused.

Differential Revision: https://reviews.llvm.org/D53233

llvm-svn: 351422
2019-01-17 10:04:39 +00:00
Sanjin Sijaric b694030647 [ARM64][Windows] Share unwind codes between epilogues
There are cases where we have multiple epilogues that have the exact same unwind
code sequence.  In that case, the epilogues can share the same unwind codes in
the .xdata section.  This should get us past the assert "SEH unwind data
splitting not yet implemented" in many cases.

We still need to add support for generating multiple .pdata/.xdata sections for
those functions that need to be split into fragments.

Differential Revision: https://reviews.llvm.org/D56813

llvm-svn: 351421
2019-01-17 09:45:17 +00:00
Max Kazantsev ee61308595 [NFC] Factor out some local vars
llvm-svn: 351416
2019-01-17 06:20:42 +00:00
Thomas Lively cbda16eb8e [WebAssembly] Parse llvm.ident into producers section
llvm-svn: 351413
2019-01-17 02:29:55 +00:00
Vedant Kumar a9906c1e5e [MergeFunc] Prevent silent miscompile of vararg functions
The function merging pass miscompiles identical vararg functions. The
forwarding thunk it emits doesn't forward the full variable-length list
of arguments. Disable merging for vararg functions for now.

I've filed llvm.org/PR40345 to track the issue.

rdar://47326238

llvm-svn: 351411
2019-01-17 02:15:05 +00:00
Thomas Lively 3cfcc94c09 Revert "[WebAssembly] Parse llvm.ident into producers section"
This reverts commit eccdbba3a02a33e13b5262e92200a33e2ead873d.

llvm-svn: 351410
2019-01-17 00:39:49 +00:00
Vedant Kumar e21ab22115 [FunctionComparator] Consider tail call kinds
Essentially, do not treat `call` and `musttail call` as the same thing.
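
A minimal C++ sketch of the idea (not the actual FunctionComparator code), using the existing CallInst::getTailCallKind() accessor:

```
#include "llvm/IR/Instructions.h"

// Sketch only: on top of comparing callees, arguments, attributes, etc., the
// comparator must also require matching tail-call kinds, so that a plain
// `call` never compares equal to a `musttail call`.
static bool sameTailCallKind(const llvm::CallInst &L, const llvm::CallInst &R) {
  return L.getTailCallKind() == R.getTailCallKind();
}
```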

As a drive-by, fold CallInst and InvokeInst handling together using the
CallSite helper.

Differential Revision: https://reviews.llvm.org/D56815

llvm-svn: 351405
2019-01-17 00:29:14 +00:00
Sanjin Sijaric 685565ae9a [SEH] [ARM64] Retrieve the frame pointer from SEH funclets
The Windows ARM64 runtime passes the establisher frame to funclets as the first
argument.

llvm-svn: 351404
2019-01-17 00:24:38 +00:00
Thomas Lively a56c23c5ba [WebAssembly] Parse llvm.ident into producers section
Summary:
Everything before the word "version" is the tool, and everything after
the word "version" is the version.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56742

llvm-svn: 351399
2019-01-16 23:46:14 +00:00
Wei Mi 79c4408aa2 Fix a mistake in rL351392.
PGOInstrGen should be initialized to "" instead of false.

llvm-svn: 351397
2019-01-16 23:31:40 +00:00
Jonas Devlieghere 669edb5ce5 [AsmPrinter] Collapse .loc 0 0 directives
Currently we do not always collapse subsequent .loc 0 0 directives. The
reason is that we were checking for a PrevInstLoc which is not set when
we emit a line-0 record. We should only check the LastAsmLine, which
seems to be created exactly for this purpose.

  // When we emit a line-0 record, we don't update PrevInstLoc; so look at
  // the last line number actually emitted, to see if it was line 0.
  unsigned LastAsmLine =
    Asm->OutStreamer->getContext().getCurrentDwarfLoc().getLine();

Differential revision: https://reviews.llvm.org/D56767

llvm-svn: 351395
2019-01-16 23:26:29 +00:00
Wei Mi c876e3d42b [PGO] Make pgo related options in opt more consistent.
Currently we have pgo options defined in PassManagerBuilder.cpp only for
instrument pgo, but not for sample pgo. We also have pgo options defined
in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
pgo. They have some inconsistency.

To make the options more consistent and make tests writing easier, the
patch let old pass manager to share the same pgo options with new pass
manager in opt, and removes the options in PassManagerBuilder.cpp.

Differential Revision: https://reviews.llvm.org/D56749

llvm-svn: 351392
2019-01-16 23:19:02 +00:00
Sam Clegg 3a2028f4b0 [WebAssembly] Remove expected failure from known_gcc_test_failures.txt. NFC.
Differential Revision: https://reviews.llvm.org/D56809

llvm-svn: 351388
2019-01-16 22:26:59 +00:00
Reid Kleckner ca16e9d8a3 [X86] Sink complex MCU CC helper to .cpp file from .h file, NFC
llvm-svn: 351384
2019-01-16 22:05:36 +00:00
Craig Topper 59abdf5f3f [X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv
Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out-of-range shift amount as undefined behavior and will constant fold to undef, while the intrinsics are defined to return 0 for out-of-range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits.
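
For reference, the intrinsic's per-element semantics can be sketched in C++ (illustration only, not from the patch):

```
#include <cstdint>

// Per-element behaviour of the variable-shift intrinsics: an out-of-range
// shift amount yields 0, whereas ISD::SHL/SRL would treat it as undefined.
constexpr uint32_t psllv_elt(uint32_t x, uint32_t amt) {
  return amt < 32 ? (x << amt) : 0;
}
constexpr uint32_t psrlv_elt(uint32_t x, uint32_t amt) {
  return amt < 32 ? (x >> amt) : 0;
}
static_assert(psllv_elt(1, 40) == 0, "out-of-range shift amounts produce 0");
```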

This was previously believed safe because undefs frequently get turned into 0, either from the constant pool or from a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, it is ignored in the detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in-bounds shift amounts are the same, we might fold it to a single-element broadcast from the constant pool. This would not put 0s in the elements with out-of-bounds shift amounts.

We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change.

Patch by zhutianyang and Craig Topper

Differential Revision: https://reviews.llvm.org/D56695

llvm-svn: 351381
2019-01-16 21:46:32 +00:00
Craig Topper 5ea3120718 [X86] Use X86ISD::BLENDV for blendv intrinsics. Replace vselect with blendv just before isel table lookup. Remove vselect isel patterns.
This cleans up the duplication we have with both intrinsic isel patterns and vselect isel patterns. This should also allow the intrinsics to get SimplifyDemandedBits support for the condition.

I've switched the canonical pattern in isel to use the X86ISD::BLENDV node instead of VSELECT, since it always seemed weird to move from BLENDV, with its relaxed rules on condition bits, to VSELECT, which has strict rules about all bits of the condition element being the same. It is more correct to go from VSELECT to BLENDV.

Differential Revision: https://reviews.llvm.org/D56771

llvm-svn: 351380
2019-01-16 21:46:28 +00:00
Changpeng Fang fe9269f804 AMDGPU: Adjust the chain for loads writing to the HI part of a register.
Summary:
  For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part
of the register to maintain the appropriate order.

Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D56454

llvm-svn: 351379
2019-01-16 21:32:53 +00:00
Craig Topper e5b7cc8aa0 [X86] Add a one use check to the setcc inversion code in combineVSelectWithAllOnesOrZeros
If we're going to generate a new inverted setcc, we should make sure we will be able to remove the old setcc.

Differential Revision: https://reviews.llvm.org/D56765

llvm-svn: 351378
2019-01-16 21:29:29 +00:00
Mandeep Singh Grang 33c49c0c82 [COFF, ARM64] Implement support for SEH extensions __try/__except/__finally
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.

We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.

Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric

Reviewed By: rnk, efriedma

Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53540

llvm-svn: 351370
2019-01-16 19:52:59 +00:00
Krzysztof Parzyszek 4121eaf0a5 [Hexagon] Do not promote terminator instructions in Hexagon loop idioms
llvm-svn: 351369
2019-01-16 19:40:27 +00:00
Andrea Di Biagio c5f0f5309e [X86][BtVer2] Update latency of horizontal operations.
On Jaguar, horizontal adds/subs have local forwarding disabled.
That means we pay a compulsory extra cycle in the write-back stage, and the value
is not available until the end of that stage.

This patch changes the latency of horizontal operations by adding an extra
cycle. With this patch, latency numbers now match what is reported by perf.

I plan to send another patch to also 'fix' the latency of shuffle operations (on
Jaguar, local forwarding is disabled for vector shuffles too).

Differential Revision: https://reviews.llvm.org/D56777

llvm-svn: 351366
2019-01-16 18:18:01 +00:00
Simon Pilgrim 5a2bbe267a [X86] getFauxShuffleMask - bail for non-byte aligned shuffle types
Remove the existing assertion and just return false for unexpected shuffle value types (<X x i1> mainly....).

Found while updating combineX86ShufflesRecursively to run within SimplifyDemandedVectorElts/SimplifyDemandedBits.

llvm-svn: 351365
2019-01-16 18:15:31 +00:00
Jeremy Morse 7dcea5ae3b [DebugInfo] Allow creation of DBG_VALUEs in blocks where the operand is not used
dbg.value intrinsics can appear in blocks where their operand is not used,
meaning the operand never receives an SDNode, and thus no DBG_VALUE will
be created. Get around this by looking to see whether the operand has already
been allocated a virtual register. This allows dbg.values of Phi nodes and
values that are used across basic blocks to successfully be translated into
DBG_VALUEs.

Differential Revision: https://reviews.llvm.org/D56678

llvm-svn: 351358
2019-01-16 17:25:27 +00:00
Simon Pilgrim 524daea429 [X86] Add combineX86ShufflesRecursively helper. NFCI.
combineX86ShufflesRecursively is pretty cumbersome with a lot of arguments that only matter later in recursion.

This commit adds a wrapper version that only takes the initial root Op to simplify calls that don't need to worry about these.

An early, cleanup step towards merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts/SimplifyDemandedBits.

llvm-svn: 351352
2019-01-16 16:01:42 +00:00
Marek Olsak c5cec5e1fa AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52944

llvm-svn: 351351
2019-01-16 15:43:53 +00:00
Alexey Bataev 18809a6bbb [SLP] Fix PR40310: The reduction nodes should stay scalar.
Summary:
Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
nodes during regular vectorization. This may happen inside loops, when
there are some vectorizable PHIs. The patch fixes this by checking whether
the node is the reduction node; if so, it must not be vectorized, it must
be gathered.

Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56783

llvm-svn: 351349
2019-01-16 15:39:52 +00:00
Sanjay Patel 0dbecd05ed [x86] lower shuffle of extracts to AVX2 vperm instructions
I was trying to prevent shuffle regressions while matching more horizontal ops 
and ended up here:
  shuf (extract X, 0), (extract X, 4), Mask --> extract (shuf X, undef, Mask'), 0

The affected tests were added for:
https://bugs.llvm.org/show_bug.cgi?id=34380

This patch won't change the examples in the bug report itself, but we should be 
able to extend this to catch more types.

Differential Revision: https://reviews.llvm.org/D56756

llvm-svn: 351346
2019-01-16 14:15:18 +00:00
Anton Korobeynikov cbdb4effae [MSP430] Emit a separate section for every interrupt vector
This is LLVM part of D56663

Linker scripts shipped by TI require to have every
interrupt vector in a separate section with a specific name:

 SECTIONS
 {
   __interrupt_vector_XX   : { KEEP (*(__interrupt_vector_XX )) } > VECTXX
   ...
 }

Following this requirement, emit a section for every vector
which contains the address of the interrupt handler:

  .section  __interrupt_vector_XX,"ax",@progbits
  .word %isr%

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56664

llvm-svn: 351345
2019-01-16 14:03:41 +00:00
Gabor Buella 3ec170c85a Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker
For the given test, SROA detects a possible replacement and creates a correct alloca. After that, SROA adds lifetime markers for this new alloca. The function getNewAllocaSlicePtr tries to deduce the pointer type based on the original alloca (which is split) in order to use it later in the lifetime intrinsic.

For the test, we ended up with the following code (rA is the initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation):

```
%rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]*    <----- this one causing the assertion and is an extra bitcast
%5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
```

The isAllocaPromotable code assumes that a user of the alloca may feed into a lifetime marker through a bitcast, but that must be the only bitcast and it must be to the i8* type. In the test it is not an i8* type, so the check returns false and the assertion fires.

As we are creating a pointer, which will be used in lifetime markers only, the proposed fix is to create a bitcast to i8* immediately to avoid extra bitcast creation.

The test is greatly simplified to just reproduce the assertion.

Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>

Reviewers: chandlerc, craig.topper

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D55934

llvm-svn: 351325
2019-01-16 12:06:17 +00:00
Philip Pfaffe 81101de585 [MSan] Apply the ctor creation scheme of TSan
Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan.

Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Differential Revision: https://reviews.llvm.org/D56734

llvm-svn: 351322
2019-01-16 11:14:07 +00:00
Florian Hahn e94470f1cc [SelectionDAG] Update check in createOperands to reflect max() is a valid value.
The value returned by max() is the last valid value, adjust the
comparison accordingly.

The code added in D55073 creates TokenFactors with max() operands.

Reviewers: aemerson, efriedma, RKSimon, craig.topper

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D56738

llvm-svn: 351318
2019-01-16 10:06:04 +00:00
Pavel Labath 1ad53ca2b0 [Support] Remove error return value from one overload of fs::make_absolute
Summary:
The version of make_absolute which accepted a specific directory to use
as the "base" for the computation could never fail, even though it
returned a std::error_code. The reason for that seems to be historical
-- the CWD flavour (which can fail due to failure to retrieve CWD) was
there first, and the new version was implemented by extending that.

This removes the error return value from the non-CWD overload and
reimplements the CWD version on top of that. This enables us to remove
some dead code where people were pessimistically trying to handle the
errors returned from this function.
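
A usage sketch of the two overloads after this change (shapes assumed from the description above, not copied from the patch):

```
#include "llvm/ADT/SmallString.h"
#include "llvm/Support/FileSystem.h"

using namespace llvm;

void example() {
  // Overload with an explicit base directory: cannot fail, so it no longer
  // returns a std::error_code (assumed post-change signature).
  SmallString<128> A("relative/file.txt");
  sys::fs::make_absolute("/some/base/dir", A);

  // CWD-based overload: can still fail (e.g. if retrieving the CWD fails),
  // so it keeps the std::error_code return value.
  SmallString<128> B("other.txt");
  if (std::error_code EC = sys::fs::make_absolute(B)) {
    (void)EC; // handle the error
  }
}
```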

Reviewers: zturner, sammccall

Subscribers: hiraditya, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D56599

llvm-svn: 351317
2019-01-16 09:55:32 +00:00
Philip Pfaffe 685c76d7a3 [NewPM][TSan] Reiterate the TSan port
Summary:
Second iteration of D56433 which got reverted in rL350719. The problem
in the previous version was that we dropped the thunk calling the tsan init
function. The new version keeps the thunk which should appease dyld, but is not
actually OK wrt. the current semantics of function passes. Hence, add a
helper to insert the functions only on the first time. The helper
allows hooking into the insertion to be able to append them to the
global ctors list.

Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan

Subscribers: hiraditya, bollu, llvm-commits

Differential Revision: https://reviews.llvm.org/D56538

llvm-svn: 351314
2019-01-16 09:28:01 +00:00
Sam Parker dd8cd6d26b [DAGCombine] Fix ReduceLoadWidth for shifted offsets
ReduceLoadWidth can trigger when a shifted mask is used, and this
requires that the function return a shl node to correct for the
offset. However, the way that this was implemented meant that the
returned result could be an existing node, which would be incorrect.
This fixes the method of inserting the new node and replacing uses.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 351310
2019-01-16 08:40:12 +00:00
Dan Gohman 9299637d3c [WebAssembly] COWS has been renamed to WASI.
llvm-svn: 351297
2019-01-16 05:23:52 +00:00
Tom Stellard 3d36e5c3e6 Only promote args when function attributes are compatible
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments.  This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.

The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.
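
A hypothetical C++ illustration (not from the patch, using GCC/Clang vector extensions) of the kind of mismatch being guarded against:

```
// Hypothetical example: the callee's ABI depends on a target feature (AVX2)
// that the caller does not have enabled.
typedef int v8si __attribute__((vector_size(32)));

__attribute__((target("avx2"), noinline))
static int sum8(const v8si *p) {   // candidate for argument promotion
  int r = 0;
  for (int i = 0; i < 8; ++i)
    r += (*p)[i];
  return r;
}

int caller(const v8si *p) {        // compiled without AVX2
  // Promoting `p` to a by-value v8si would require the caller to pass a
  // 256-bit vector in a register class it is not allowed to use.
  return sum8(p);
}
```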

This is a very conservative fix for PR37358.  Ideally we would have a more
sophisticated check for ABI compatibility rather than checking if the
attributes are compatible for inlining.

Reviewers: echristo, chandlerc, eli.friedman, craig.topper

Reviewed By: echristo, chandlerc

Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits

Differential Revision: https://reviews.llvm.org/D53554

llvm-svn: 351296
2019-01-16 05:15:31 +00:00
Serguei Katkov a5b0e5585b [InstCombine]Avoid introduction of unaligned mem access
InstCombine is able to transform a mem transfer intrinsic into a lone store or a store/load pair.
This might result in the generation of an unaligned atomic load/store, which the backend
will later transform into a libcall. That is not an evident gain, so it is better to keep the
intrinsic as-is and handle it in the backend.

Reviewers: reames, anna, apilipenko, mkazantsev
Reviewed By: reames
Subscribers: t.p.northover, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56582

llvm-svn: 351295
2019-01-16 04:36:26 +00:00
Sam Clegg 56c587adfd [WebAssembly] Store section alignment as a power of 2
This change bumps the version number of the wasm object file
metadata.

See https://github.com/WebAssembly/tool-conventions/pull/92

Differential Revision: https://reviews.llvm.org/D56758

llvm-svn: 351285
2019-01-16 01:34:48 +00:00
Aditya Nandakumar 500e3ead9f [GISel]: Add support for CSEing continuously during GISel passes.
https://reviews.llvm.org/D52803

This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.

llvm-svn: 351283
2019-01-16 00:40:37 +00:00
Mandeep Singh Grang 436735c3fe [EH] Rename llvm.x86.seh.recoverfp intrinsic to llvm.eh.recoverfp
Summary:
Make recoverfp intrinsic target-independent so that it can be implemented for AArch64, etc.
Refer D53541 for the context. Clang counterpart D56748.

Reviewers: rnk, efriedma

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56747

llvm-svn: 351281
2019-01-16 00:37:13 +00:00
Craig Topper 0e420e6a62 [X86] Rename SHRUNKBLEND ISD node to BLENDV.
That's really what it is. If we didn't use intrinsics for BLENDVPS/BLENDVPD/PBLENDVB all the way to isel, this is the node we would use.

llvm-svn: 351278
2019-01-16 00:20:30 +00:00
Craig Topper 34ac509ac8 [X86] Add avx512 scatter intrinsics that use a vXi1 mask instead of a scalar integer.
We're trying to have the vXi1 types in IR as much as possible. This prevents the need for bitcasts when the producer of the mask was already a vXi1 value like an icmp. The bitcasts can be subject to code motion and interfere with basic block at a time isel in bad ways.

llvm-svn: 351275
2019-01-15 23:36:25 +00:00
Changpeng Fang 20fe3d2f35 AMDGPU: Raise the priority of MAD24 in instruction selection.
Summary:
  We have seen performance regressions when v_add3 is generated. The major reason is that the v_mad pattern
is broken when v_add3 is generated. We also see increased register pressure. While we cannot properly
estimate register pressure during instruction selection, we can give mad a higher priority.

In this work, we raise the priority for mad24 in selection and resolve the performance regression.

Reviewers:
  rampitec

Differential Revision:
  https://reviews.llvm.org/D56745

llvm-svn: 351273
2019-01-15 23:12:36 +00:00
Jonas Devlieghere 1a0ce65ad3 [VFS] Move RedirectingFileSystem interface into header (NFC)
This moves the RedirectingFileSystem into the header so it can be
extended. This is needed in LLDB we need a way to obtain the external
path to deal with FILE* and file descriptor APIs.

Discussion on the mailing list:
http://lists.llvm.org/pipermail/llvm-dev/2018-November/127755.html

Differential revision: https://reviews.llvm.org/D54277

llvm-svn: 351265
2019-01-15 22:36:41 +00:00
Roman Lebedev fb4eed381d X86DAGToDAGISel::matchBitExtract() with truncation (PR36419)
Summary:
Previously, in D54095, I added support for extraction of `lshr` from `X` if we are to produce `BEXTR`.
That was good, but the fix was partial; there was still [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]].

That pattern can also appear, roughly, when you have a large (64-bit) storage and then consume bits from it.
It is not unexpected that you will then be doing further computations in 32-bit width,
and that is where the current code breaks, as the tests show.

The basic idea/pattern here is following:
1. We have `i64` input
2. We perform `i64` right-shift on it.
3. We `trunc`ate that shifted value
4. We do all further work (masking) in `i32`

Since we see `trunc`ation and not `lshr`, we give up, and stop trying to extract that right-shift.
BUT. The mask is `i32`, therefore we can extend both of the operands of the masking (`and`) to `i64`
and truncate the result after masking: https://rise4fun.com/Alive/K4B
```
Name: @bextr64_32_b1 -> @bextr64_32_b0
  %shiftedval = lshr i64 %val, %numskipbits
  %truncshiftedval = trunc i64 %shiftedval to i32
  %widenumlowbits1 = zext i8 %numlowbits to i32
  %notmask1 = shl nsw i32 -1, %widenumlowbits1
  %mask1 = xor i32 %notmask1, -1
  %res = and i32 %truncshiftedval, %mask1
=>
  %shiftedval = lshr i64 %val, %numskipbits
  %widenumlowbits = zext i8 %numlowbits to i64
  %notmask = shl nsw i64 -1, %widenumlowbits
  %mask = xor i64 %notmask, -1
  %wideres = and i64 %shiftedval, %mask
  %res = trunc i64 %wideres to i32
```

Thus, we are again able to extract that `lshr` into `BEXTR`'s control.

Now, the perf (via `llvm-exegesis`) of the snippet suggests that it is not a good idea:
```
$ cat /tmp/old.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RSI
# LLVM-EXEGESIS-LIVEIN EDX
# LLVM-EXEGESIS-LIVEIN RDI
movq %rsi, %rcx
shrq %cl, %rdi
shll $8, %edx
bextrl %edx, %edi, %eax
$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-1e0082.o
---
mode:            latency
key:
  instructions:
    - 'MOV64rr RCX RSI'
    - 'SHR64rCL RDI RDI'
    - 'SHL32ri EDX EDX i_0x8'
    - 'BEXTR32rr EAX EDI EDX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 0.6638, per_snippet_value: 2.6552 }
error:           ''
info:            ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
$ cat /tmp/old.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-43e346.o
---
mode:            uops
key:
  instructions:
    - 'MOV64rr RCX RSI'
    - 'SHR64rCL RDI RDI'
    - 'SHL32ri EDX EDX i_0x8'
    - 'BEXTR32rr EAX EDI EDX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: PdFPU0, value: 0, per_snippet_value: 0 }
  - { key: PdFPU1, value: 0, per_snippet_value: 0 }
  - { key: PdFPU2, value: 0, per_snippet_value: 0 }
  - { key: PdFPU3, value: 0, per_snippet_value: 0 }
  - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error:           ''
info:            ''
assembled_snippet: 4889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C74889F148D3EFC1E208C4E268F7C7C3
...
```
vs
```
$ cat /tmp/new.s
# bextr64_32_b1
# LLVM-EXEGESIS-LIVEIN RDX
# LLVM-EXEGESIS-LIVEIN SIL
# LLVM-EXEGESIS-LIVEIN RDI
shlq $8, %rdx
movzbl %sil, %eax
orq %rdx, %rax
bextrq %rax, %rdi, %rax
$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=latency -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8944f1.o
---
mode:            latency
key:
  instructions:
    - 'SHL64ri RDX RDX i_0x8'
    - 'MOVZX32rr8 EAX SIL'
    - 'OR64rr RAX RAX RDX'
    - 'BEXTR64rr RAX RDI RAX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: latency, value: 0.7454, per_snippet_value: 2.9816 }
error:           ''
info:            ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
$ cat /tmp/new.s | ./bin/llvm-exegesis -mode=uops -snippets-file=-
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-da403c.o
---
mode:            uops
key:
  instructions:
    - 'SHL64ri RDX RDX i_0x8'
    - 'MOVZX32rr8 EAX SIL'
    - 'OR64rr RAX RAX RDX'
    - 'BEXTR64rr RAX RDI RAX'
  config:          ''
  register_initial_values: []
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: PdFPU0, value: 0, per_snippet_value: 0 }
  - { key: PdFPU1, value: 0, per_snippet_value: 0 }
  - { key: PdFPU2, value: 0, per_snippet_value: 0 }
  - { key: PdFPU3, value: 0, per_snippet_value: 0 }
  - { key: NumMicroOps, value: 1.2571, per_snippet_value: 5.0284 }
error:           ''
info:            ''
assembled_snippet: 48C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C748C1E208400FB6C64809D0C4E2F8F7C7C3
...
```
^ latency increased (worse).

Except //maybe// not really.
Like with all synthetic benchmarks, they //may// be misleading.

Let's take a look on some actual real-world hotpath.
In this case it's 'my' [[ https://github.com/darktable-org/rawspeed | RawSpeed ]]'s `BitStream<>::peekBitsNoFill()`, in [[ e3316dc851/src/librawspeed/decompressors/VC5Decompressor.cpp (L814) | GoPro VC5 decompressor ]]:
```
raw.pixls.us-unique/GoPro/HERO6 Black$ /usr/src/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-clangs1-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR
RUNNING: /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmplwbKEM
2018-12-22 21:23:03
Running /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench
Run on (8 X 4012.81 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 3.41, 2.41, 2.03
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean           40 ms         40 ms        128   0.322244          7.96974        12M       37.4457M        298.534M      3.12047       24.8778   0.040465
GOPR9172.GPR/threads:8/real_time_median         39 ms         39 ms        128   0.312606          7.99155        12M        38.387M        306.788M      3.19891       25.5656   0.039115
GOPR9172.GPR/threads:8/real_time_stddev          4 ms          3 ms        128  0.0271557         0.130575          0        2.4941M        21.3909M     0.207842       1.78257   3.81081m
RUNNING: /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 GOPR9172.GPR --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpWAkan9
2018-12-22 21:23:08
Running /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Run on (8 X 4013.1 MHz CPU s)
CPU Caches:
  L1 Data 16K (x8)
  L1 Instruction 64K (x4)
  L2 Unified 2048K (x4)
  L3 Unified 8192K (x1)
Load Average: 3.78, 2.50, 2.06
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_mean           39 ms         39 ms        128   0.311533          7.97323        12M       38.6828M        308.471M      3.22356        25.706  0.0390928
GOPR9172.GPR/threads:8/real_time_median         38 ms         38 ms        128   0.304231          7.99005        12M       39.4437M        315.527M      3.28698        26.294  0.0380316
GOPR9172.GPR/threads:8/real_time_stddev          3 ms          3 ms        128  0.0229149         0.133814          0       2.26225M        19.1421M     0.188521       1.59517   3.13671m
Comparing /home/lebedevri/rawspeed/build-clangs1-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-clangs1-new/src/utilities/rsbench/rsbench
Benchmark                                                 Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------
GOPR9172.GPR/threads:8/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
GOPR9172.GPR/threads:8/real_time_mean                  -0.0339         -0.0316            40            39            40            39
GOPR9172.GPR/threads:8/real_time_median                -0.0277         -0.0274            39            38            39            38
GOPR9172.GPR/threads:8/real_time_stddev                -0.1769         -0.1267             4             3             3             3
```
I.e. this results in //roughly// -3% improvements in perf.

While this will help [[ https://bugs.llvm.org/show_bug.cgi?id=36419 | PR36419 ]], it won't address it fully.

Reviewers: RKSimon, craig.topper, andreadb, spatel

Reviewed By: craig.topper

Subscribers: courbet, llvm-commits

Differential Revision: https://reviews.llvm.org/D56052

llvm-svn: 351253
2019-01-15 21:31:18 +00:00
David Callahan d129d3e93f treat invoke like call
Summary:
InvokeInst should be treated like CallInst and
assigned a separate discriminator. This is particularly
important when an Invoke is converted to a Call
during compilation and so can invalidate sample profile
data collected with different link-time optimizations.

Reviewers: twoh, Kader, danielcdh, wmi

Reviewed By: wmi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56491

llvm-svn: 351251
2019-01-15 21:26:51 +00:00
Matt Morehouse 19ff35c481 [SanitizerCoverage] Don't create comdat for interposable functions.
Summary:
Comdat groups override weak symbol behavior, allowing the linker to keep
the comdats for weak symbols in favor of comdats for strong symbols.

Fixes the issue described in:
https://bugs.chromium.org/p/chromium/issues/detail?id=918662

Reviewers: eugenis, pcc, rnk

Reviewed By: pcc, rnk

Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56516

llvm-svn: 351247
2019-01-15 21:21:01 +00:00
Craig Topper 82015b633b [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar
In keeping with our general direction of having the vXi1 type present in IR, this patch converts the mask argument for avx512 gather to vXi1. This can avoid k-register to GPR to k-register transitions late in codegen.

I left the existing intrinsics behind because they have many out of tree users such as ISPC. They generate their own code and don't go through the autoupgrade path which only works for bitcode and ll parsing. Ideally we will get them to migrate to target independent intrinsics, but it might be easier for them to migrate to these new intrinsics.

I'll work on scatter and gatherpf/scatterpf next.

Differential Revision: https://reviews.llvm.org/D56527

llvm-svn: 351234
2019-01-15 20:12:33 +00:00
Anton Korobeynikov c9e9e28487 [MSP430] Recognize '{' as a line separator
msp430-as supports multiple assembly statements on the same line
separated by a '{' character.

llvm-svn: 351233
2019-01-15 20:10:46 +00:00
Craig Topper 99fcbf67d0 [Nios2] Remove Nios2 backend
As mentioned in http://lists.llvm.org/pipermail/llvm-dev/2019-January/129121.html, this backend is incomplete and has not been maintained in several months.

Differential Revision: https://reviews.llvm.org/D56691

llvm-svn: 351231
2019-01-15 19:59:19 +00:00
Nikita Popov d3b86b79fa Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.
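
The scalar identity behind the expansion, as a small C++ illustration (not part of the patch): for unsigned values, a saturating subtract is umax(a, b) - b, applied element-wise for vectors.

```
#include <algorithm>
#include <cstdint>

// usubsat(a, b) == max(a, b) - b:
//   if a >= b, this is a - b; otherwise max(a, b) == b and the result is 0.
// (std::max is constexpr since C++14.)
constexpr uint32_t usubsat(uint32_t a, uint32_t b) {
  return std::max(a, b) - b;
}
static_assert(usubsat(5, 7) == 0, "saturates to zero");
static_assert(usubsat(7, 5) == 2, "normal subtraction when no underflow");
```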

Reapplying with updated SLPVectorizer tests.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351219
2019-01-15 18:43:41 +00:00
Yury Delendik be24c02003 [WebAssembly] Fix updating/moving DBG_VALUEs in RegStackify
Summary:
As described in PR40209, there can be issues in DBG_VALUE handling when multiple defs are present in a BB. This patch
adds logic to detect the DBG_VALUEs related to a def, and localizes register updates and movement to the DBG_VALUEs found.

Reviewers: aheejin

Subscribers: mgorny, dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56401

llvm-svn: 351216
2019-01-15 18:14:12 +00:00
David Callahan dee001207e We can improve the performance (generally) by memoizing the action to map a debug location to its function summary.
Summary:
Here are timings (as reported by "opt -time-passes") for
sample-profile pass for some files holding hot functions from a major
service. Average 17% reduction. Delta column is 100*(old-new)/old.

```
Old    New    Delta
0.0537 0.0538 -0.2%
0.8155 0.6522 20.0%
0.0779 0.0751  3.6%
0.0727 0.0913 -25.6%
0.1622 0.1302 19.7%
0.0627 0.0594  5.3%
0.0766 0.0744  2.9%
0.6426 0.4387 31.7%
0.3521 0.2776 21.2%
0.3549 0.2721 23.3%
0.0912 0.0904  0.9%
0.1236 0.1059 14.3%
0.0854 0.0866 -1.4%
0.0757 0.0722  4.6%
0.1293 0.1147 11.3%
0.1354 0.1122 17.1%
0.0767 0.0770 -0.4%
0.1135 0.0968 14.7%
0.0524 0.0608 -16.0%
0.1279 0.1106 13.5%
==========
3.6820 3.0520 17.1% Total
```

Reviewers: twoh, Kader, danielcdh, wmi

Reviewed By: wmi

Subscribers: dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D56435

llvm-svn: 351211
2019-01-15 17:45:54 +00:00
Nirav Dave fcb4a492db [SelectionDAG] Check membership of register in class for single
register constraints. NFCI.

Now that X86's ST(7) constraints are fixed this check can be
reinstated.

llvm-svn: 351207
2019-01-15 17:09:23 +00:00
Nirav Dave dbc41bac0c [X86] Fix register class for assembly constraints to ST(7). NFCI.
Modify getRegForInlineAsmConstraint to return a special singleton
register class when a constraint references ST(7), rather than RFP80, of
which ST(7) is not a member.

llvm-svn: 351206
2019-01-15 17:09:14 +00:00
Simon Pilgrim b8f08c8d7b [X86] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306)
If we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.

We were already doing this for SSSE3 targets as we have PSHUFB, but its worth doing for all targets.

llvm-svn: 351203
2019-01-15 16:56:55 +00:00
Sanjay Patel fad5bdaf95 [DAGCombiner] reduce buildvec of zexted extracted element to shuffle
The motivating case for this is shown in the first regression test. We are 
transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'.

That's a special-case for a more general pattern as shown here. In all tests, 
we're avoiding the vector-scalar-vector moves in favor of vector ops.

We aren't producing optimal shuffle code in some cases though, so the patch is
limited to reduce regressions.

Differential Revision: https://reviews.llvm.org/D56281

llvm-svn: 351198
2019-01-15 16:11:05 +00:00
Lang Hames 199a00c3a2 Revert r351138 "[ORC] Move ORC Core symbol map and set types into their own
header: CoreTypes.h."

This commit broke some bots. Reverting while I investigate.

llvm-svn: 351195
2019-01-15 15:21:13 +00:00
Zaara Syeda b7dff9c9af [SimpleLoopUnswitch] Increment stats counter for unswitching switch instruction
Increment the statistics counter NumSwitches in unswitchNontrivialInvariants() when
unswitching a non-trivial switch instruction. This fixes a bug where
NumBranches was incremented even for switch instructions.
There is no functional change in this patch.

Differential Revision: https://reviews.llvm.org/D56408

llvm-svn: 351193
2019-01-15 15:08:01 +00:00
Florian Hahn 4094f34f78 [InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs.
Otherwise instcombine gets stuck in a cycle. The canonicalization was
added in D55961.

This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400

llvm-svn: 351187
2019-01-15 11:18:21 +00:00
Max Kazantsev f8a0e0ddf0 [NFC] Remove some code duplication
llvm-svn: 351185
2019-01-15 11:16:14 +00:00
Max Kazantsev 80242ee87e [NFC] Remove obsolete enum RangeCheckKind
llvm-svn: 351183
2019-01-15 10:48:45 +00:00
Max Kazantsev 78a5435284 [NFC] Decrease if nest
llvm-svn: 351180
2019-01-15 10:01:46 +00:00
Max Kazantsev a78dc4d6c8 [NFC] Move some functions to LoopUtils
llvm-svn: 351179
2019-01-15 09:51:34 +00:00
Dan Gohman 1839dfd6d4 [WebAssembly] Support multilibs for wasm32 and add a wasm OS that uses it
This adds support for multilib paths for wasm32 targets, following
[Debian's Multiarch conventions], and also adds an experimental OS name in
order to test it.

[Debian's Multiarch conventions]: https://wiki.debian.org/Multiarch/

Differential Revision: https://reviews.llvm.org/D56553

llvm-svn: 351163
2019-01-15 06:58:13 +00:00
Thomas Lively 6bf2b40051 [WebAssembly] Expand SIMD shifts while V8's implementation disagrees
Summary:
V8 currently implements SIMD shifts as taking an immediate operation,
which disagrees with the spec proposal and the toolchain
implementation. As a stopgap measure to get things working, unroll all
vector shifts. Since this is a temporary measure, there are no tests.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56520

llvm-svn: 351151
2019-01-15 02:16:03 +00:00
Marek Olsak 33eb4d947d AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.

And we can also get the mask from and i1, or i1, xor i1.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52060

llvm-svn: 351150
2019-01-15 02:13:18 +00:00
Evandro Menezes f793fe1402 [AArch64] Adjust the feature set for Exynos
Enable the fusion of arithmetic and logic instructions for Exynos M4.

llvm-svn: 351149
2019-01-15 01:53:49 +00:00
Reid Kleckner fe5e5dcab0 [X86] Avoid clobbering ESP/RSP in the epilogue.
Summary:
In r345197 ESP and RSP were added to GR32_TC/GR64_TC, allowing them to
be used for tail calls, but this also caused `findDeadCallerSavedReg` to
think they were acceptable targets for clobbering. Filter them out.

Fixes PR40289.

Patch by Geoffry Song!

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D56617

llvm-svn: 351146
2019-01-15 01:24:18 +00:00
Eli Friedman 295e346da4 [EarlyIfConversion] Don't if-convert unconditional branches.
A block ending in an unconditional branch can have two successors if one
is a landing pad.  In practice, I think this only has an effect on
Windows because landing pads are never empty for Itanium unwinding.

(Alternatively, I could add a check to
AArch64InstrInfo::canInsertSelect, but this seems more obvious.)

Differential Revision: https://reviews.llvm.org/D56468

llvm-svn: 351142
2019-01-15 00:19:46 +00:00
Eli Friedman 33aecc8182 [AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .
Otherwise, with D56544, the intrinsic will be expanded to an integer
csel, which is probably not what the user expected.  This matches the
general convention of using "v1" types to represent scalar integer
operations in vector registers.

While I'm here, also add some error checking so we don't generate
illegal ABS nodes.

Differential Revision: https://reviews.llvm.org/D56616

llvm-svn: 351141
2019-01-15 00:15:24 +00:00
Evandro Menezes bf59cb02c3 [AArch64] Add new target feature to fuse arithmetic and logic operations
This feature enables the fusion of some arithmetic and logic instructions
together.

Differential revision: https://reviews.llvm.org/D56572

llvm-svn: 351139
2019-01-14 23:54:36 +00:00
Lang Hames ed2df18a48 [ORC] Move ORC Core symbol map and set types into their own header: CoreTypes.h.
This will allow other utilities (including a future RuntimeDyld replacement) to
use these types without pulling in the major Core types (JITDylib, etc.).

llvm-svn: 351138
2019-01-14 23:49:13 +00:00
Benjamin Kramer 2fc8ede082 [X86] Fix unused variable warning in Release builds. NFC.
llvm-svn: 351136
2019-01-14 23:29:54 +00:00
Nikita Popov 5885eec35a Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test, due to the cost model
changes. Reverting for now.

llvm-svn: 351129
2019-01-14 22:18:39 +00:00
Lang Hames 46f0a97e2c [Object] Return a symbol_iterator, rather than a basic_symbol_iterator, from
MachOObjectFile::getSymbolByIndex.

ObjectFile derivatives should prefer symbol_iterator/SymbolRef over
basic_symbol_iterator/BasicSymbolRef where possible, as the former
retain their link to the ObjectFile (rather than a SymbolicFile) and provide
more functionality.

No test for this: Existing code is working, and we don't have (m)any libObject
unit tests. I'll think about how we can test more systematically going forward.

llvm-svn: 351128
2019-01-14 22:05:12 +00:00
Thomas Lively 2a47e03ee4 [WebAssembly][FastISel] Do not assume naive CmpInst lowering
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=40172. See
test/CodeGen/WebAssembly/PR40172.ll for an explanation.

Reviewers: dschuff, aheejin

Subscribers: nikic, llvm-commits, sunfish, jgravelle-google, sbc100

Differential Revision: https://reviews.llvm.org/D56457

llvm-svn: 351127
2019-01-14 22:03:43 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
James Y Knight 544fa425c9 [opaque pointer types] Update GetElementPtr creation APIs to
consistently accept a pointee-type argument.

Note: this also adds a new C API and soft-deprecates the old C API.
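
For illustration only (not from the patch), a minimal C++ sketch of creating a GEP through IRBuilder with the pointee type passed explicitly, which is the form that opaque pointer types require:

```
#include "llvm/IR/IRBuilder.h"

using namespace llvm;

// Builds: %elt = getelementptr i32, i32* %Ptr, i64 1
// The element type (i32) is passed explicitly rather than being derived
// from the pointer operand's type.
static Value *buildElementPtr(IRBuilder<> &B, Value *Ptr) {
  Type *EltTy = B.getInt32Ty();
  return B.CreateGEP(EltTy, Ptr, B.getInt64(1), "elt");
}
```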

Differential Revision: https://reviews.llvm.org/D56559

llvm-svn: 351124
2019-01-14 21:39:35 +00:00
James Y Knight 84c1dbde08 [opaque pointer types] Update LoadInst creation APIs to consistently
accept a return-type argument.

Note: this also adds a new C API and soft-deprecates the old C API.

Differential Revision: https://reviews.llvm.org/D56558

llvm-svn: 351123
2019-01-14 21:37:53 +00:00
James Y Knight eb2c4af1bf [opaque pointer types] Update InvokeInst creation APIs to consistently
accept a callee-type argument.

Note: this also adds a new C API and soft-deprecates the old C API.

Differential Revision: https://reviews.llvm.org/D56557

llvm-svn: 351122
2019-01-14 21:37:48 +00:00
James Y Knight f956390954 [opaque pointer types] Update CallInst creation APIs to consistently
accept a callee-type argument.

Note: this also adds a new C API and soft-deprecates the old C API.

Differential Revision: https://reviews.llvm.org/D56556

llvm-svn: 351121
2019-01-14 21:37:42 +00:00
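As a hedged illustration of the direction taken by the [opaque pointer types] API updates above, here is a minimal C++ sketch (the helper name and types are illustrative, not from the patches) in which the element, result, and callee types are passed explicitly rather than derived from a pointer operand's pointee type:

```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Build a GEP, a load, and a call while spelling out the types explicitly,
// so nothing needs to be peeled off a pointer's pointee type.
Value *loadAndCall(IRBuilder<> &B, Value *Ptr, Value *Idx,
                   FunctionType *CalleeTy, Value *Callee) {
  Type *I32 = B.getInt32Ty();
  Value *Addr = B.CreateGEP(I32, Ptr, Idx, "addr"); // explicit element type
  Value *Val = B.CreateLoad(I32, Addr, "val");      // explicit result type
  return B.CreateCall(CalleeTy, Callee, {Val});     // explicit callee type
}
```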
Jonathan Metzman e159a0dd1a [SanitizerCoverage][NFC] Use appendToUsed instead of include
Summary:
Use appendToUsed instead of include to ensure that
SanitizerCoverage's constructors are not stripped.

Also, use isOSBinFormatCOFF() to determine if target
binary format is COFF.

Reviewers: pcc

Reviewed By: pcc

Subscribers: hiraditya

Differential Revision: https://reviews.llvm.org/D56369

llvm-svn: 351118
2019-01-14 21:02:02 +00:00
Craig Topper 9906f77f82 [X86] Silence a -Wparentheses warning on gcc. NFC
llvm-svn: 351111
2019-01-14 19:44:02 +00:00
Simon Pilgrim bfe2ee453a [X86][SSSE3] Bailout of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306)
If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.

llvm-svn: 351103
2019-01-14 19:07:26 +00:00
David Callahan 957795973b Ignore PhiNodes when mapping sample profile data
Summary: Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored.

Reviewers: danielcdh, twoh, Kader, wmi

Reviewed By: wmi

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D55094

llvm-svn: 351102
2019-01-14 19:05:59 +00:00
David Callahan b1853a6a94 Revert "Merge branch 'arcpatch-D55094'"
This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing
changes made to f1309ffebf718d16aec4fab83380556c660e2825.

An unintended merge was pushed.

llvm-svn: 351095
2019-01-14 18:49:27 +00:00
Sanjay Patel b23ff7a0e2 [x86] lower extracted add/sub to horizontal vector math
add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0

This is the integer sibling to D56011.

There's an additional restriction to only do this transform in the
case where we don't have extra extracts from the source vector. Without
that, we can fail to match larger horizontal patterns that are more 
beneficial than this minimal case. An improvement to the more general 
h-op lowering may allow us to remove the restriction here in a follow-up.

llvm-svn: 351093
2019-01-14 18:44:02 +00:00
David Callahan 1b46231764 Merge branch 'arcpatch-D55094'
llvm-svn: 351092
2019-01-14 18:35:43 +00:00
Amara Emerson e07cdb107e Revert "[VFS] Allow multiple RealFileSystem instances with independent CWDs."
This reverts commits r351079, r351069 and r351050 as they broke the greendragon bots on macOS.

llvm-svn: 351091
2019-01-14 18:32:09 +00:00
Tom Stellard d0a7676087 cmake: Don't install plugins used for examples or tests
Summary:
This patch drops install targets for LLVMHello.so,
TestPlugin.so, and BugpointPasses.so.

Reviewers: chandlerc, beanz, thakis, philip.pfaffe

Reviewed By: chandlerc

Subscribers: SquallATF, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D55965

llvm-svn: 351087
2019-01-14 18:25:35 +00:00
Dan Gohman bbb548d85f [WebAssembly] Remove old intrinsics
This removes the old grow_memory and mem.grow-style intrinsics, leaving just
the memory.grow-style intrinsics.

Differential Revision: https://reviews.llvm.org/D56645

llvm-svn: 351084
2019-01-14 18:23:45 +00:00
Adrian Prantl fa2e35838c Reapply r345008 "Split MachinePipeliner code into header and cpp files"
Split MachinePipeliner code into header and cpp files to allow
inheritance from SwingSchedulerDAG.

This reapplies https://reviews.llvm.org/D56084 after moving the
implementation of the dump functions into the .cpp files. This fixes a
linker error when building with Clang modules enabled and local
submodule visibility disabled.

Original patch by Lama Saba <lama.saba@intel.com>!

llvm-svn: 351077
2019-01-14 17:24:11 +00:00
James Y Knight 68729f94ee Remove NameLen argument from newly-introduced IR C APIs.
Normally, changing the function signatures of C APIs is disallowed,
but as these two were added only last week and haven't been released
yet, it is okay in this instance.

As per discussion in D56556, we will not add NameLen arguments to IR
building APIs, for the following reasons:

1. We do not want to deprecate all of the IR building APIs, just to add a
NameLen argument to each one.

2. Consistency is important, so adding it just to new ones is unfortunate.

3. The IR names are completely optional, useful for readability of IR
only. There is no value in ever supporting nul bytes.

Differential Revision: https://reviews.llvm.org/D56669

llvm-svn: 351076
2019-01-14 17:16:55 +00:00
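A small hedged illustration of the design point above (the helper is hypothetical): the C IR-building APIs keep taking a plain nul-terminated name with no separate length parameter.

```
#include "llvm-c/Core.h"

// The name is optional and only affects IR readability; an empty string works.
LLVMValueRef buildSum(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS) {
  return LLVMBuildAdd(B, LHS, RHS, "sum");
}
```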
Nirav Dave 3badfe74a2 Reland "Refactor GetRegistersForValue. NFCI."
Remove the overly strict class membership check.

llvm-svn: 351074
2019-01-14 17:09:45 +00:00
Simon Pilgrim a1bd4a6ba4 [DAGCombiner] Add (sub_sat x, x) -> 0 combine
llvm-svn: 351073
2019-01-14 15:43:34 +00:00
Simon Pilgrim fa1f518748 [DAGCombiner] Enable sub saturation constant folding
llvm-svn: 351072
2019-01-14 15:28:53 +00:00
Simon Pilgrim 7fc6882374 [DAGCombiner] Add add/sub saturation undef handling
Match ConstantFolding.cpp:
(add_sat x, undef) -> -1
(sub_sat x, undef) -> 0

llvm-svn: 351070
2019-01-14 14:16:24 +00:00
Sam McCall 7a99727c62 [VFS] Fix unused variable warning. NFC
llvm-svn: 351069
2019-01-14 14:13:24 +00:00
Simon Pilgrim cfa5f06dde [DAGCombiner] Enable add saturation constant folding
llvm-svn: 351060
2019-01-14 12:34:31 +00:00
Aleksandar Beserminji 4c4c0377ca [mips] Optimize shifts for types larger than GPR size (mips2/mips3)
With this patch, shifts are lowered to the optimal number of instructions
necessary to shift types larger than the general-purpose register size.

This resolves PR/32293.

Thanks to Kyle Butt for reporting the issue!

Differential Revision: https://reviews.llvm.org/D56320

llvm-svn: 351059
2019-01-14 12:28:51 +00:00
Jeremy Morse f216da7ee0 [DebugInfo] Remove un-necessary logic from HoistThenElseCodeToIf
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However, this left extra code hanging
around in HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.

This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
insts location).

Differential Revision: https://reviews.llvm.org/D55272

llvm-svn: 351058
2019-01-14 12:13:12 +00:00
Simon Pilgrim 67610926fc [DAGCombiner] Add add saturation constant folding tests.
Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now.

llvm-svn: 351057
2019-01-14 12:12:42 +00:00
Diana Picus 8987d00653 [ARM GlobalISel] Import MOVi32imm into GlobalISel
Make it possible for TableGen to produce code for selecting MOVi32imm.
This allows reasonably recent ARM targets to select a lot more constants
than before.

We achieve this by adding GISelPredicateCode to arm_i32imm. It's
impossible to use the exact same code for both DAGISel and GlobalISel,
since one uses "Subtarget->" and the other "STI." to refer to the
subtarget. Moreover, in GlobalISel we don't have ready access to the
MachineFunction, so we need to add a bit of code for obtaining it from
the instruction that we're selecting. This is also the reason why it
needs to remain a PatLeaf instead of the more specific IntImmLeaf.

llvm-svn: 351056
2019-01-14 12:04:08 +00:00
Simon Pilgrim 3d42815cd8 [SelectionDAG] Add type sanity assertions for add/sub saturation node creation.
llvm-svn: 351055
2019-01-14 11:56:59 +00:00
David Stuttard f77079f892 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized, which is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (a later TODO
is to re-enable this and handle it correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This resubmission of the change also includes a slight modification in
SIISelLowering.cpp to work around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Workaround for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
2019-01-14 11:55:24 +00:00
Sam McCall c2b310aedf [VFS] Allow multiple RealFileSystem instances with independent CWDs.
Summary:
Previously only one RealFileSystem instance was available, and its working
directory is shared with the process. This doesn't work well for multithreaded
programs that want to work with relative paths - the vfs::FileSystem is assumed
to provide the working directory, but a thread cannot control this exclusively.

The new vfs::createPhysicalFileSystem() factory copies the process's working
directory initially, and then allows it to be independently modified.

This implementation records the working directory path, and glues it to relative
paths to provide the correct absolute path to the sys::fs:: functions.
This will give different results in unusual situations (e.g. the CWD is moved).

The main alternative is the use of openat(), fstatat(), etc to ask the OS to
resolve paths relative to a directory handle which can be kept open. This is
more robust. There are two reasons not to do this initially:
1. these functions are not available on all supported Unixes, and are somewhere
   between difficult and unavailable on Windows. So we need a path-based
   fallback anyway.
2. this would mean also adding support at the llvm::sys::fs level, which is a
   larger project. My clearest idea is an OS-specific `BaseDirectory` object
   that can be optionally passed to functions there. Eventually this could be
   backed by either paths or a fd where openat() is supported.
   This is a large project, and demonstrating here that a path-based fallback
   works is a useful prerequisite.

There is some subtlety to the path-manipulation mechanism:
  - when setting the working directory, both Specified=makeAbsolute(path) and
    Resolved=realpath(path) are recorded. These may differ in the presence of
    symlinks.
  - getCurrentWorkingDirectory() and makeAbsolute() use Specified - this is
    similar to the behavior of $PWD and sys::path::current_path
  - IO operations like openFileForRead use Resolved. This is similar to the
    behavior of an openat() based implementation, that doesn't see changes
    in symlinks.
There may still be combinations of operations and FS states that yield unhelpful
behavior. This is hard to avoid with symlinks and FS abstractions :(

The caching behavior of the current working directory is removed in this patch.
getRealFileSystem() is now specified to link to the process CWD, so the caching
is incorrect.
The user who needed this so far is clangd, which will immediately switch to
createPhysicalFileSystem().

Reviewers: ilya-biryukov, bkramer, labath

Subscribers: ioeric, kadircet, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D56545

llvm-svn: 351050
2019-01-14 10:56:35 +00:00
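A hedged usage sketch of the new factory described above (file names, helper name, and error handling are illustrative): each worker can own a physical filesystem with its own working directory instead of sharing the process CWD.

```
#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/VirtualFileSystem.h"
using namespace llvm;

ErrorOr<std::unique_ptr<MemoryBuffer>> readRelative(StringRef Dir,
                                                    StringRef RelPath) {
  // Independent CWD: changing it here affects neither the process nor other threads.
  auto FS = vfs::createPhysicalFileSystem();
  if (std::error_code EC = FS->setCurrentWorkingDirectory(Dir))
    return EC;
  return FS->getBufferForFile(RelPath); // resolved against Dir, not the process CWD
}
```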
Francis Visoiu Mistrih b7cef81fd3 Replace "no-frame-pointer-*" function attributes with "frame-pointer"
Part of the effort to refactor frame pointer code generation. We used
to use two function attributes, "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf", to represent three kinds of frame
pointer usage: (all) all frames use the frame pointer, (non-leaf) only
non-leaf frames use the frame pointer, and (none) no frames use the
frame pointer. This CL makes the idea explicit by using a single enum
function attribute, "frame-pointer".

Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.

"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".

tests are mostly updated with

// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"

// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"

Patch by Yuanfang Chen (tabloid.adroit)!

Differential Revision: https://reviews.llvm.org/D56351

llvm-svn: 351049
2019-01-14 10:55:55 +00:00
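A short hedged sketch of setting the new attribute on a function programmatically (the helper name is illustrative; the values are those described above: "all", "non-leaf", or "none"):

```
#include "llvm/IR/Function.h"
using namespace llvm;

// Mark a function so that every frame keeps the frame pointer.
void keepFramePointer(Function &F) {
  F.addFnAttr("frame-pointer", "all"); // or "non-leaf" / "none"
}
```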
Petar Avramovic 7d370a36bb [MIPS GlobalISel] Add pre legalizer combiner pass
Introduce GlobalISel pre legalizer pass for MIPS.
It will be used to cope with instructions that require
combining before legalization.

Differential Revision: https://reviews.llvm.org/D56269

llvm-svn: 351046
2019-01-14 10:27:05 +00:00
Max Kazantsev 1f73310e1e [BasicBlockUtils] Generalize DeleteDeadBlock to deal with multiple dead blocks
The utility function `DeleteDeadBlock` expects that all predecessors of a block being
deleted are already deleted, with the exception of a single-block loop. This makes it
hard to use for deleting a set of blocks that may contain cyclic dependencies:
there is no correct order of invocations of this function that does not produce
dangling pointers to already-deleted blocks.

This patch introduces a generalized version of this function `DeleteDeadBlocks`
that allows us to remove multiple blocks at once, even if there are cycles among
them. The only requirement is that no block being deleted should have a predecessor
that is not being deleted. 

The logic of `DeleteDeadBlocks` is as follows:
  for each block
    create relevant DT updates;
    remove all instructions (replace with undef if needed);
    replace terminator with unreachable;
  apply DT updates;
  for each block
    delete block;

Therefore, `DeleteDeadBlock` becomes a particular case of
the general algorithm called for a single block.

Differential Revision: https://reviews.llvm.org/D56120
Reviewed By: skatkov

llvm-svn: 351045
2019-01-14 10:26:26 +00:00
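A hedged usage sketch of the new utility (assuming the ArrayRef-taking signature with an optional DomTreeUpdater, as found in later trees; the wrapper name is illustrative): the caller collects the whole dead region first so that no surviving block is a predecessor of a deleted one.

```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
using namespace llvm;

// Remove a dead (possibly cyclic) set of blocks in one shot. Precondition from
// the commit above: every predecessor of a block in DeadBlocks is itself dead.
void removeDeadBlocks(ArrayRef<BasicBlock *> DeadBlocks) {
  // Pass a DomTreeUpdater as the second argument to keep the dominator tree in sync.
  DeleteDeadBlocks(DeadBlocks);
}
```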
Thomas Preud'homme bc5e6ee87a Add support for prefix-only CLI options
Summary:
Add support for options that always prefix their value, giving an error
if the value is in the next argument or if the option is given a value
assignment (ie. opt=val). This is the desired behavior for the -D option
of FileCheck for instance.

Copyright:
- Linaro (changes in version 2 of revision D55940)
- GraphCore (changes in later versions and introduced when creating
  D56549)

Reviewers: jdenny

Subscribers: llvm-commits, probinson, kristina, hiraditya,
JonChesterfield

Differential Revision: https://reviews.llvm.org/D56549

llvm-svn: 351038
2019-01-14 09:28:53 +00:00
Craig Topper e7b4ea4726 [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead.
Fixes PR40259

llvm-svn: 351035
2019-01-14 08:46:45 +00:00
Craig Topper ab077dda72 [X86] Update type profile for DBPSADBW to indicate the immediate is an i8 not just any int.
Removes some type checks from X86GenDAGISel.inc

llvm-svn: 351033
2019-01-14 02:59:08 +00:00
Craig Topper c8cd85588b [X86] Remove unused intrinsic handlers. NFC
llvm-svn: 351032
2019-01-14 01:56:59 +00:00
Craig Topper 075fcc1151 [X86] Remove FPCLASS intrinsic handler. Use INTR_TYPE_2OP instead. NFC
llvm-svn: 351031
2019-01-14 01:44:09 +00:00
Craig Topper 3f3b8ef442 [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector.
The input mask can be represented with an AND in IR.

Fixes PR40258

llvm-svn: 351028
2019-01-14 00:03:50 +00:00
Simon Pilgrim 56ba1db933 [DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)
NOTE: We need more powerful signed overflow detection in computeOverflowKind
llvm-svn: 351026
2019-01-13 22:08:26 +00:00
Simon Pilgrim 888fa8680c Fix unused variable warning. NFCI.
llvm-svn: 351025
2019-01-13 21:53:12 +00:00
Simon Pilgrim 897d4c6fe9 [DAGCombiner] Some very basic add/sub saturation combines.
Handle combines with zero and constant canonicalization for adds.

llvm-svn: 351024
2019-01-13 21:50:24 +00:00
Craig Topper 4978de36e4 [LegalizeDAG] Remove 'NeedInvert' code from expansion of BR_CC. Replace with an assert.
I accidentally triggered this code while doing some experiments and it doesn't look like it could possibly work.

It calls 'getNOT' on a node that should be a CondCode.

I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC?

llvm-svn: 351022
2019-01-13 19:33:30 +00:00
Nikita Popov 0400e50445 [X86] Rename overly verbose method; NFC
As suggested on D56636.

llvm-svn: 351021
2019-01-13 16:41:26 +00:00
Craig Topper 31156bbdb9 [X86] Add more ISD nodes to handle masked versions of VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128 which only produce 2 result elements and zeroes the upper elements.
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.

Fixes another case from PR34877

llvm-svn: 351018
2019-01-13 02:59:59 +00:00
Craig Topper 4561edbec0 [X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128 which only produces 2 result elements and zeroes the upper elements.
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.

Fixes another case from PR34877.

llvm-svn: 351017
2019-01-13 02:59:57 +00:00
Benjamin Kramer b17d2136ea Give helper classes/functions local linkage. NFC.
llvm-svn: 351016
2019-01-12 18:36:22 +00:00
Simon Pilgrim a0069ba0db [X86] More aggressive shuffle mask widening in combineExtractWithShuffle
Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through.

llvm-svn: 351013
2019-01-12 16:38:56 +00:00
Sanjay Patel 7d65fe5cd5 [LoopVectorizer] give more advice in remark about failure to vectorize call
Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it.

Differential Revision: https://reviews.llvm.org/D56551

llvm-svn: 351010
2019-01-12 15:27:15 +00:00
Sanjay Patel 625d5aef62 [DAGCombiner] fold insert_subvector of insert_subvector
This pattern:

    t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
  t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>

...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.

In the affected tests here, it looks like it just makes RA wiggle. But we might 
as well squash this to prevent it interfering with other pattern-matching.

Differential Revision:
https://reviews.llvm.org/D56604

llvm-svn: 351008
2019-01-12 15:12:28 +00:00
Simon Pilgrim 0d92c4debc Use getShiftAmountTy for shift amounts.
llvm-svn: 351005
2019-01-12 12:00:43 +00:00
Simon Atanasyan 789f4154db [ORC][MIPS] Fill delay-slot after `jr` instruction
The MIPS `jr` instruction uses a delay slot. To avoid executing an
arbitrary instruction there, we should either fill the delay slot with a
`nop` or swap the `jr` instruction with the logically preceding
instruction. This fix implements the second method to generate slightly
more efficient code.

llvm-svn: 351001
2019-01-12 11:12:08 +00:00
Simon Atanasyan f903f782e7 [ORC][MIPS] Setup t9 register and call function through this register
The MIPS ABI states that every function must be called through jalr $t9. In
other words, a function expects that the $t9 register points to the beginning
of its code. A function uses this register to calculate the offset to the
Global Offset Table and saves it to the `gp` register.
```
lui   $gp, %hi(_gp_disp)
addiu $gp, %lo(_gp_disp)
addu  $gp, $gp, $t9
```

If `$t9`, and as a result `$gp`, points to the wrong place, the following code
loads an incorrect value from the GOT and passes control to invalid code.
```
lw    $v0,%call16(foo)($gp)
jalr  $t9
```

The OrcMips32 and OrcMips64 writeResolverCode methods pass control to the
resolved address, but do not set up `$t9` before the call. The `$t9` register
holds the address of the beginning of the resolver code, so any attempt to
call routines via the GOT fails.

This change fixes the problem. The `OrcLazy/hidden-visibility.ll` test
now passes correctly. Before the change it failed on MIPS because
`exitOnLazyCallThroughFailure`, called from the resolver code, could not
call the libc routine `exit` via the GOT.

Differential Revision: http://reviews.llvm.org/D56058

llvm-svn: 351000
2019-01-12 11:12:04 +00:00
Simon Pilgrim a21e2bd682 [X86] Improve vXi64 ISD::ABS codegen with SSE41+
Make use of vblendvpd to select on the signbit

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350999
2019-01-12 10:28:12 +00:00
Simon Pilgrim ca0de0363b [X86][AARCH64] Improve ISD::ABS support
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350998
2019-01-12 09:59:32 +00:00
Nikita Popov 5f393eb5da Reapply "[DemandedBits] Use SetVector for Worklist"
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions to be recomputed very often. To avoid this, switch
to a SetVector.

Reapplying with a smaller number of inline elements in the
SmallSetVector, to avoid running into the SmallDenseMap issue
described in D56455.

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350997
2019-01-12 09:09:15 +00:00
Craig Topper 90fe6edcba [X86] Remove X86ISD::SELECT as its no longer used by any of our intrinsic lowering.
llvm-svn: 350995
2019-01-12 08:15:54 +00:00
Craig Topper 33b2cf50e3 [X86] Add ISD node for masked version of CVTPS2PH.
The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.

This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.

Fixes another instruction for PR34877.

llvm-svn: 350994
2019-01-12 08:05:12 +00:00
Alex Bradbury 61aa940074 [RISCV] Introduce codegen patterns for RV64M-only instructions
As discussed on llvm-dev
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have
to be careful when trying to select the *w RV64M instructions. i32 is not a
legal type for RV64 in the RISC-V backend, so operations have been promoted by
the time they reach instruction selection. Information about whether the
operation was originally a 32-bit operations has been lost, and it's easy to
write incorrect patterns.

Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will
produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being
selected (and so save instructions to sext/zext the input operands).

Differential Revision: https://reviews.llvm.org/D53230

llvm-svn: 350993
2019-01-12 07:43:06 +00:00
Alex Bradbury d05eae7a7b [RISCV] Add patterns for RV64I SLLW/SRLW/SRAW instructions
This restores support for selecting the SLLW/SRLW/SRAW instructions, which was
removed in rL348067 as the previous patterns made some unsafe assumptions.
Also see the related llvm-dev discussion
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>

Ultimately I didn't introduce a custom SelectionDAG node, but instead added a
DAG combine that inserts an AssertZext i5 on the shift amount for an i32
variable-length shift and also added an ANY_EXTEND DAG-combine which will
instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the
opportunity to safely select SLLW/SRLW/SRAW.

There are obviously different ways of addressing this (a number discussed in
the llvm-dev thread), so I'd welcome further feedback and comments.

Note that there are now some cases in
test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is
selected even though sra/srl/sll could be used without any extra instructions.
Given both are semantically equivalent, there doesn't seem to be a good reason to
prefer one over the other. Given that it would require more logic to still select
sra/srl/sll in those cases, I've left it preferring the *w variants.

Differential Revision: https://reviews.llvm.org/D56264

llvm-svn: 350992
2019-01-12 07:32:31 +00:00
Craig Topper a69d903204 [X86] Remove unnecessary code from getMaskNode.
We no longer need to extend mask scalars before bitcasting them to vXi1. This was only needed for the truncate intrinsics. And was really a bug in our lowering of them.

llvm-svn: 350991
2019-01-12 06:13:44 +00:00
Craig Topper bf61525e8c [X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not avx512dq, use v16i1 as the intermediate mask type instead of v8i1.
We still use i8 for the load/store type. So we need to convert to/from i16 around the mask type.

By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.

llvm-svn: 350989
2019-01-12 02:22:10 +00:00
Craig Topper 8695e6dfc4 [X86] Change some patterns that select MOVZX16rm8 to instead select MOVZX32rm8 and extract the subregister.
This should be a shorter encoding and is consistent with what we do for zext i8->i16

llvm-svn: 350988
2019-01-12 02:22:06 +00:00
Evandro Menezes 7bc55e4075 [ARM] Fix typo
Fix typo in r350952.

llvm-svn: 350986
2019-01-12 01:06:43 +00:00
Craig Topper abe6ef8d09 [X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.

This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.

This fixes some of PR34877

llvm-svn: 350985
2019-01-12 00:55:27 +00:00
Evandro Menezes 7d7e3256cd [AArch64] Improve Exynos predicates
Expand the predicate for the shifted arithmetic and logic instructions to also
consider the respective non-shifted instructions.

llvm-svn: 350976
2019-01-11 22:39:47 +00:00
Nikita Popov 9f6e9cf71b [ConstantFolding] Fold undef for integer intrinsics
This fixes https://bugs.llvm.org/show_bug.cgi?id=40110.

This implements handling of undef operands for integer intrinsics in
ConstantFolding, in particular for the bitcounting intrinsics (ctpop,
cttz, ctlz), the with.overflow intrinsics, the saturating math
intrinsics and the funnel shift intrinsics.

The undef behavior follows what InstSimplify does for the general case
of non-constant operands. For the bitcount intrinsics (where
InstSimplify doesn't do undef handling -- there cannot be a combination
of an undef + non-constant operand) I'm using a 0 result if the intrinsic
is defined for zero and undef otherwise.

Differential Revision: https://reviews.llvm.org/D55950

llvm-svn: 350971
2019-01-11 21:18:00 +00:00
Nirav Dave 6b7f5aac72 [X86] Fix incomplete handling of register-assigned variables in parsing.
Teach x86 assembly operand parsing to distinguish between assembler
variable assigned to named registers and those assigned to immediate
values.

Reviewers: rnk, nickdesaulniers, void

Subscribers: hiraditya, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D56287

llvm-svn: 350966
2019-01-11 20:17:36 +00:00
Evandro Menezes 0c14c87d00 [AArch64] Add pipeline model for Exynos M4
Add the scheduling and cost model for Exynos M4.

llvm-svn: 350960
2019-01-11 19:36:25 +00:00
Evandro Menezes 0674762112 [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Pirama Arumuga Nainar cc07dabdaa [Legalizer] Use correct ValueType of SELECT_CC node during Float promotion
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.

Fix PR40273.

Reviewers: efriedma, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56566

llvm-svn: 350951
2019-01-11 18:46:02 +00:00
Teresa Johnson 290a839891 [LTO] Record whether LTOUnit splitting is enabled in index
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).

The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.

This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.

Note that I have also changed the ThinLTOBitcodeWriter to gate the
module splitting on the value of this flag.

Reviewers: pcc

Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D53890

llvm-svn: 350948
2019-01-11 18:31:57 +00:00
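A minimal hedged sketch of how a frontend might record the flag described above in the module (the flag name comes from the commit; the helper name and the choice of error behavior are illustrative):

```
#include "llvm/IR/Module.h"
using namespace llvm;

// Record whether the module was built with LTO-unit splitting enabled, so the
// LTO link can verify that all inputs agree.
void recordSplitLTOUnit(Module &M, bool EnableSplitLTOUnit) {
  M.addModuleFlag(Module::Error, "EnableSplitLTOUnit",
                  EnableSplitLTOUnit ? 1 : 0);
}
```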
Vedant Kumar ee10ef737e [MergeFunc] Erase unused duplicate functions if they are discardable
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.

Differential Revision: https://reviews.llvm.org/D56574

llvm-svn: 350939
2019-01-11 17:56:35 +00:00
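A hedged sketch of the relaxed check (the helper is hypothetical): instead of requiring local linkage, any duplicate whose linkage is discardable-if-unused can be erased once it has no remaining uses.

```
#include "llvm/IR/Function.h"
using namespace llvm;

// True if an unused duplicate of F can simply be deleted.
bool canEraseUnusedDuplicate(const Function &F) {
  // Covers internal/private as before, plus linkonce, linkonce_odr,
  // and available_externally linkage.
  return F.isDiscardableIfUnused() && F.use_empty();
}
```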
Vedant Kumar 08fe7e02fb [MergeFunc] Use Instruction::getFunction as a cleanup, NFC
llvm-svn: 350938
2019-01-11 17:56:21 +00:00
Ehsan Amiri f452f116d2 [Jump Threading] Unfold a select insn that feeds a switch via a phi node
Currently, when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare, or a phi and a compare), we unfold the select
statement. This results in threading the conditional branch later on. A similar
opportunity exists when a select (with a constant in one branch) feeds a
switch (via a phi node). The patch unfolds the select under this condition.
A testcase is provided.

llvm-svn: 350931
2019-01-11 15:52:57 +00:00
Sanjay Patel 40cd4b77e9 [x86] allow insert/extract when matching horizontal ops
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.

There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.

llvm-svn: 350928
2019-01-11 14:27:59 +00:00
Martin Storsjo 114ad37c1d Revert "[SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI."
This reverts commit r350841, as it actually had functional changes
and broke compilation. See PR40290.

llvm-svn: 350921
2019-01-11 07:31:17 +00:00
Craig Topper b97885cc2e [X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector.
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.

Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.

This fixes some of the regressions from r350800.

llvm-svn: 350918
2019-01-11 05:44:56 +00:00
Heejin Ahn e73c7a1ab2 [WebAssembly] Fix stack pointer store check in RegStackify
Summary:
We now use the __stack_pointer global and the global.get/global.set
instructions. This fixes the checking routine for stack pointer writes
accordingly.

This also fixes the existing __stack_pointer test in reg-stackify.ll:
That test used to pass not because of __stack_pointer clashes but
because the function `stackpointer_callee` was not marked as `readnone`,
so it was assumed to possibly write to memory arbitrarily, and the
`global.set` instruction was marked as `mayStore` in the .td definition,
so they were identified as intervening writes. After we added `readnone`
to its attributes, this test fails without this patch.

Reviewers: dschuff, sunfish

Subscribers: jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D56094

llvm-svn: 350906
2019-01-10 23:12:07 +00:00
Anton Korobeynikov 0681d6bc90 [MSP430] Minor fixes/improvements for assembler/disassembler
* Teach AsmParser to recognize @rn in the destination operand as 0(rn).
* Do not allow the disassembler to decode instructions whose size is larger
  than the number of input bytes.
* Fix UB in MSP430MCCodeEmitter.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56547

llvm-svn: 350903
2019-01-10 22:59:50 +00:00
Anton Korobeynikov 29ffb6d558 [MSP430] Add missing instruction forms
* Add missing mm, [r|m]n, [r|m]p instruction forms.
* Fix bit16mc instruction.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56546

llvm-svn: 350902
2019-01-10 22:54:53 +00:00
Thomas Lively 64a39a1c4e [WebAssembly] Add unimplemented-simd128 subtarget feature
Summary:
This is a third attempt, but this time we have vetted it on Windows
first. The previous errors were due to an uninitialized class member.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56560

llvm-svn: 350901
2019-01-10 22:32:11 +00:00
Gerolf Hoflehner cb7d968f73 [MachineCombiner][NFC] Prevent dereferencing past-the-end object in an MRI container
llvm-svn: 350896
2019-01-10 21:53:13 +00:00
Alina Sbirlea e41f4b39e5 [MemorySSA] Disable checkClobberSanity for SkipSelfWalker.
Sanity will fail for this, since we're exploring getting a clobber
further than the sanity check expects.
Ideally we need to teach the sanity check to differentiate between the
two walkers based on the SkipSelf bool in the query.

llvm-svn: 350895
2019-01-10 21:47:15 +00:00
Matt Davis 9cd9f41f0e [GVN] Update BlockRPONumber prior to use.
Summary:
The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional.  In short we found cases where that map was being accessed with blocks that had not yet been added to that structure.  For context, I've kept the wall of text below,  to what we are trying to fix, by always ensuring a updated BlockRPONumber.

== Backstory ==

I was investigating an ICE (segfault accessing a DenseMap item). This failure happened non-deterministically, for no apparent reason, and only on a Windows build of LLVM (from October 2018).

After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`.  Our test case that triggered this warning and periodic crash is rather involved.  But the problematic line looks to be:

GVN.cpp: Line 2197

```
     if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&
```

To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.

My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above).  In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm.

1) There is no sequence point for the `>=` operation.

2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`).

3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) .  A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid.  The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.

The assembly generated by the Windows host compiler does show that the first access to the map via `operator[]` produces a pointer to an unsigned int, and that pointer is stored on the stack. If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid.

Reviewers: t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D55974

llvm-svn: 350880
2019-01-10 19:56:03 +00:00
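A hedged illustration of the sequence-point workaround discussed in the summary above (not the final fix, which keeps BlockRPONumber updated before use; the helper name is illustrative): copy both values out before comparing, so a rehash triggered by the second operator[] cannot invalidate the first result.

```
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/BasicBlock.h"
using namespace llvm;

bool rpoNotLess(DenseMap<BasicBlock *, uint32_t> &BlockRPONumber,
                BasicBlock *P, BasicBlock *CurrentBlock) {
  // Each operator[] may insert and grow the map; taking the values by copy
  // forces a sequence point between the two lookups.
  uint32_t PNum = BlockRPONumber[P];
  uint32_t CurNum = BlockRPONumber[CurrentBlock];
  return PNum >= CurNum;
}
```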
Alina Sbirlea cae12edaaa Use MemorySSA in LICM to do sinking and hoisting.
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.

Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.

Reviewers: sanjoy, davide, gberry, george.burgess.iv

Subscribers: llvm-commits, Prazek

Differential Revision: https://reviews.llvm.org/D40375

llvm-svn: 350879
2019-01-10 19:29:04 +00:00
Craig Topper 844f989608 [X86] Call SimplifyDemandedBits on conditions of X86ISD::SHRUNKBLEND
This extends to combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDS that have already been created.

This should help some of the regressions from D56387

Differential Revision: https://reviews.llvm.org/D56421

llvm-svn: 350875
2019-01-10 19:05:34 +00:00
Craig Topper 350e6e9d7c [X86] Simplify the BRCOND handling for FCMP_UNE.
Despite what the comment says, FCMP_UNE would be an OR not an AND. In the lowering code the first branch created still goes to the original destination. The second branch was exchanged to go to where the subsequent unconditional branch went. This is different than what we do for FCMP_OEQ where both branches that we create go to the original unconditional branch.

As far as I can tell, I think this means we don't need to exchange the branch target with the unconditional branch for FCMP_UNE at all.

Differential Revision: https://reviews.llvm.org/D56309

llvm-svn: 350873
2019-01-10 19:02:14 +00:00
Sanjay Patel 9b368f39a9 [DAGCombiner] simplify code; NFC
llvm-svn: 350844
2019-01-10 16:47:42 +00:00
Nirav Dave cd18977add [SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI.
llvm-svn: 350841
2019-01-10 16:25:47 +00:00
Nirav Dave 4817c0e46c [SelectionDAGBuilder] Fix formatting. NFC.
llvm-svn: 350839
2019-01-10 16:22:19 +00:00
Neil Henning e85d45a699 [AMDGPU] Fix dwordx3/southern-islands failures.
This commit fixes the dwordx3/southern-islands failures that were found
in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not
generating the dwordx3 variants of load/store instructions that were
added to the ISA after southern islands.

Differential Revision: https://reviews.llvm.org/D56434

llvm-svn: 350838
2019-01-10 16:21:08 +00:00
Nirav Dave 57f2c14860 [SelectionDAGBuilder] Refactor visitInlineAsm. NFC.
llvm-svn: 350837
2019-01-10 16:18:18 +00:00
James Y Knight 62df5eed16 [opaque pointer types] Remove some calls to generic Type subtype accessors.
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).

I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.

llvm-svn: 350835
2019-01-10 16:07:20 +00:00
Alex Bradbury 6f302b8a69 [RISCV][MC] Add support for evaluating constant symbols as immediates
This further improves compatibility with GNU as, allowing input such as the
following to be assembled:

.equ CONST, 0x123456
li a0, CONST
addi a0, a0, %lo(CONST)

.equ CONST, 1
slli a0, a0, CONST

Note that we don't have perfect compatibility with gas, as it will avoid
emitting a relocation in this case:

addi a0, a0, %lo(CONST2)
.equ CONST2, 0x123456

Thanks to Shiva Chen for suggesting a better way to approach this during review.

Differential Revision: https://reviews.llvm.org/D52298

llvm-svn: 350831
2019-01-10 15:33:17 +00:00
Sanjay Patel 87ae1460f7 [x86] fix remaining miscompile bug in horizontal binop matching (PR40243)
When we use the partial-matching function on a 128-bit chunk, we must 
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.

This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243

llvm-svn: 350830
2019-01-10 15:27:23 +00:00
Sanjay Patel ed5cfc6792 [x86] fix horizontal binop matching for 256-bit vectors (PR40243)
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the 
existing (old) horizontal op matching function because it does not model the way 
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op). 
A potential follow-up change for that is discussed in the bug report - see also D56490.

This generally duplicates a lot of the existing matching code, but we can't just remove 
that without introducing regressions, so the existing code is renamed and used less often. 
Follow-ups may try to reduce that overlap.

Differential Revision: https://reviews.llvm.org/D56450

llvm-svn: 350826
2019-01-10 15:04:52 +00:00
Bryan Chan 7ce5775e62 [AArch64] Fix operation actions for FP16 vector intrinsics
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.

Reviewers: t.p.northover, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56296

llvm-svn: 350825
2019-01-10 15:02:37 +00:00
Andrea Di Biagio 97ed076dd1 [MCA] Fix wrong definition of ResourceUnitMask in DefaultResourceStrategy.
Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It
should have been a 64-bit quantity instead. That means ResourceUnitMask was
always implicitly truncated to a 32-bit quantity.
This issue was found by inspection. Surprisingly, the bug was latent and
never negatively affected any existing upstream targets.

This patch fixes the wrong definition of ResourceUnitMask, and adds a number of
extra debug prints to help debug potential issues related to invalid
processor resource masks.

llvm-svn: 350820
2019-01-10 13:59:13 +00:00
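A tiny illustration of the bug class described above (the struct name is illustrative): a mask meant to hold up to 64 resource-unit bits must be stored as a 64-bit integer, otherwise the upper 32 bits are silently dropped.

```
#include <cstdint>

struct ResourceStrategySketch {
  // Before: 'const unsigned ResourceUnitMask;' -- implicitly truncated to 32 bits.
  // After: a full 64-bit mask, one bit per processor resource unit.
  uint64_t ResourceUnitMask;
};
```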
Sam Parker 2088a7536c [ARM] Fix for verifier buildbot
Copy the MachineOperand first and then change the flags instead of
making a copy.

llvm-svn: 350811
2019-01-10 10:47:23 +00:00
Fedor Sergeev b7871405fa [LoopUnroll] add parsing for unroll parameters in -passes pipeline
Allow specifying loop unrolling with optional parameters explicitly
spelled out in the -passes pipeline specification.
This introduces a somewhat generic way of specifying parameter parsing via
FUNCTION_PASS_PARAMETRIZED pass registration.

Syntax of parametrized unroll pass name is as follows:
   'unroll<' parameter-list '>'

Where parameter-list is a ';'-separated list of parameter names and an optlevel
   optlevel: 'O[0-3]'
   parameter: { 'partial' | 'peeling' | 'runtime' | 'upperbound' }
   negated:  'no-' parameter

Example:
   -passes=loop(unroll<O3;runtime;no-upperbound>)

    this invokes LoopUnrollPass configured with OptLevel=3,
    Runtime, no UpperBound, everything else by default.

llvm-svn: 350808
2019-01-10 10:01:53 +00:00
Sam Parker 7208221452 [ARM] Size reduce teq to eors
Add t2TEQrr to the map of instructions which can be reduced down into
a T1 instruction. This is a special case because TEQ just sets the
CPSR and doesn't write to a GPR, which is not the case for EOR. So,
we need to ensure that the EOR can write to the first operand.

Differential Revision: https://reviews.llvm.org/D56255

llvm-svn: 350801
2019-01-10 08:36:33 +00:00
Craig Topper 5d20eb240f [X86] Disable DomainReassignment pass when AVX512BW is disabled to avoid injecting VK32/VK64 references into the MachineIR
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalent sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. Apparently this mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference. Or if we don't ever spill it. But there's no guarantee of that.

Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since it's a little scary.

The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.

Fixes PR39741

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56460

llvm-svn: 350800
2019-01-10 07:43:54 +00:00
Zi Xuan Wu 64c956eea8 Recommit "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel"
This re-commit r350685.

Differential Revision: https://reviews.llvm.org/D55686

llvm-svn: 350799
2019-01-10 06:20:14 +00:00
Mandeep Singh Grang 859cb2e35d [AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.

Reviewers: rnk, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56037

llvm-svn: 350798
2019-01-10 04:59:44 +00:00
Thomas Lively fdd4999b86 Revert "[WebAssembly] Add simd128-unimplemented subtarget feature"
This reverts rL350791.

llvm-svn: 350795
2019-01-10 04:09:25 +00:00
Stanislav Mekhanoshin d3757d3f3a [AMDGPU] Separate feature dot-insts
Differential Revision: https://reviews.llvm.org/D56524

llvm-svn: 350793
2019-01-10 03:25:20 +00:00
Thomas Lively eb6f9abd41 [WebAssembly] Add simd128-unimplemented subtarget feature
This is a second attempt at r350778, which was reverted in
r350789. The only change is that the unimplemented-simd128 feature has
been renamed simd128-unimplemented, since naming it
unimplemented-simd128 somehow made the simd128 feature flag enable the
unimplemented-simd128 feature on Windows.

llvm-svn: 350791
2019-01-10 02:55:52 +00:00
Thomas Lively fdca5fab60 Revert "[WebAssembly] Add unimplemented-simd128 subtarget feature"
This reverts L350778.

llvm-svn: 350789
2019-01-10 01:37:44 +00:00
Craig Topper c38c9c120f [X86] After turning VSELECT into SHRUNKBLEND, make we push the VSELECT into the worklist so it can be deleted.
Found while trying to figure out why my second version of D56421 worked better than the first version. We weren't deleting the vselect in a timely fashion and that caused SimplfyDemandedBit to see an additional user.

The new version doesn't have this problem so this fix isn't needed there, but seemed like the right thing to do.

llvm-svn: 350781
2019-01-10 00:14:27 +00:00
Thomas Lively 2eeade1814 [WebAssembly] Add unimplemented-simd128 subtarget feature
Summary:
This replaces the old ad-hoc -wasm-enable-unimplemented-simd
flag. Also makes the new unimplemented-simd128 feature imply the
simd128 feature.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton

Differential Revision: https://reviews.llvm.org/D56501

llvm-svn: 350778
2019-01-09 23:59:37 +00:00
Evandro Menezes 224d831bed [llvm-mca] Display masks in hex
Display the resource masks as hexadecimal. Otherwise, NFC.

llvm-svn: 350777
2019-01-09 23:57:15 +00:00
Eli Friedman d4e7a0d83c [SimplifyLibCalls] Fix memchr expansion for constant strings.
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]".  The expansion
was missing the conversion to unsigned char.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 .

Differential Revision: https://reviews.llvm.org/D55947

llvm-svn: 350775
2019-01-09 23:39:26 +00:00
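A hedged illustration of the C-standard behavior the expansion has to match (the helper is hypothetical): the searched-for value is compared as an unsigned char, so an expansion of memchr over a known buffer must truncate the needle the same way.

```
#include <cstddef>

// Equivalent of memchr(S, C, N) != nullptr for a known buffer: note the
// explicit conversion of C to unsigned char before comparing.
bool containsByte(const unsigned char *S, std::size_t N, int C) {
  unsigned char Needle = static_cast<unsigned char>(C);
  for (std::size_t I = 0; I != N; ++I)
    if (S[I] == Needle)
      return true;
  return false;
}
```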
David Major 30ba0a0c95 Don't require a null terminator when loading objects
When a null terminator is required and the file size is a multiple of the system page size, MemoryBuffer will prefer pread() over mmap(), which can result in excessive memory usage.

Patch by Mike Hommey!

Differential Revision: https://reviews.llvm.org/D56475

llvm-svn: 350774
2019-01-09 23:36:32 +00:00
Heejin Ahn 569f090922 [WebAssembly] Print a debug message at the start of each pass
Summary:
Looks like many passes print their pass description as a debug message at
the start of the pass, so this adds that to (mostly newly added) other
passes as well.

Reviewers: dschuff

Subscribers: jgravelle-google, sbc100, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56142

llvm-svn: 350771
2019-01-09 23:05:21 +00:00
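A hedged sketch of the convention referred to above (the pass name, helper, and strings are illustrative): each pass prints its description with LLVM_DEBUG when it starts running, visible under -debug-only.

```
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#define DEBUG_TYPE "wasm-example-pass"
using namespace llvm;

// Typically called at the top of runOnFunction / runOnMachineFunction.
static void announcePass(StringRef Description, StringRef FuncName) {
  LLVM_DEBUG(dbgs() << "********** " << Description << " **********\n"
                    << "********** Function: " << FuncName << '\n');
}
```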
Easwaran Raman b45994b843 Refactor synthetic profile count computation. NFC.
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).

Reviewers: davidxl

Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits

Differential Revision: https://reviews.llvm.org/D56464

llvm-svn: 350755
2019-01-09 20:10:27 +00:00
Francis Visoiu Mistrih ac6454a7f6 [CodeGen] Ignore return sext/zext attributes of unused results for tail calls
If the caller's return type does not have a zeroext attribute but the
callee does a tail call zeroext, we won't consider the tail call during
CodeGenPrepare because the attributes don't match.

However, if the result of the tail call has no uses, it makes sense to
drop the sext/zext attributes.

Differential Revision: https://reviews.llvm.org/D56486

llvm-svn: 350753
2019-01-09 19:46:15 +00:00
Easwaran Raman ed279752f0 [Inliner] Assert that the computed inline threshold is non-negative.
Reviewers: chandlerc

Subscribers: haicheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56409

llvm-svn: 350751
2019-01-09 19:26:17 +00:00
David Callahan 3ef0f4447d refactor BlockFrequencyInfo::view to take a title parameter
Summary: Allow a non-default title for this debugging aide.

Reviewers: twoh, Kader, modocache

Reviewed By: twoh

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56499

llvm-svn: 350749
2019-01-09 19:12:38 +00:00
Thomas Lively edb54b22d3 [WebAssembly] Standardize order of SIMD bitselect arguments
Summary:
For some reason the backend assumed that the condition mask would be
the first argument to the LLVM intrinsic, but everywhere else the
condition mask is the third argument.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56412

llvm-svn: 350746
2019-01-09 18:13:11 +00:00
Aleksandar Beserminji 8abf680424 [mips][micromips] Emit 16-bit NOPs by default
Emit 16-bit NOPs by default.
Use 32-bit NOPs in delay slots where necessary.

Differential Revision: https://reviews.llvm.org/D55323

llvm-svn: 350733
2019-01-09 15:58:02 +00:00
Valery Pykhtin b7a459547d Revert "[AMDGPU] Fix DPP combiner"
This reverts commit e3e2923a39cbec3b3bc3a7d3f0e9a77a4115080e, svn revision rL350721

llvm-svn: 350730
2019-01-09 15:21:53 +00:00
Kristof Beyls c650ff77eb Initial AArch64 SLH implementation.
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.

Differential Revision: https://reviews.llvm.org/D55929

llvm-svn: 350729
2019-01-09 15:13:34 +00:00
Valery Pykhtin 1e0b5c719b [AMDGPU] Fix DPP combiner
Fixed an issue with identity values and other cases; f32/f16 identity values are to be added later. The fma/mac instructions are disabled for now.
The test is fully reworked and comments were added. Other fixes:

1. dpp move with uses and old reg initializer should be in the same BB.
2. bound_ctrl:0 is only considered when bank_mask and row_mask are fully enabled (0xF). Otherwise the old register value is checked for identity.
3. Added add, subrev, and, or instructions to the old folding function.
4. Kill flag is cleared for the src0 (DPP register) as it may be copied into more than one user.

Differential revision: https://reviews.llvm.org/D55444

llvm-svn: 350721
2019-01-09 13:43:32 +00:00
Florian Hahn 9697d2a764 Revert r350647: "[NewPM] Port tsan"
This patch breaks thread sanitizer on some macOS builders, e.g.
http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/52725/

llvm-svn: 350719
2019-01-09 13:32:16 +00:00
Simon Pilgrim 5a7132ff0f [X86] Enable combining shuffles to PACKSS/PACKUS for 256/512-bit vectors
llvm-svn: 350716
2019-01-09 13:23:28 +00:00
Anton Korobeynikov c18e90369d [MSP430] Optimize 'shl x, 8[+ N] -> swpb(zext(x)) [<< N]' for i16
Perform additional simplification to reduce shift amount.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56016

llvm-svn: 350712
2019-01-09 13:03:01 +00:00
Anton Korobeynikov 9222ed4485 [MSP430] Fix crash while lowering llvm.stacksave/stackrestore
Perform the usual expansion of stacksave / restore intrinsics.
Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54890

llvm-svn: 350710
2019-01-09 12:52:15 +00:00
Diogo N. Sampaio 1eb31c8e94 [AArch64] Move feature predctrl to predres
Follow-up patch to rL350385, adding the predres
command line option. This patch renames the
feature to keep it aligned with the option
passed by/to clang.

Differential Revision: https://reviews.llvm.org/D56484

llvm-svn: 350702
2019-01-09 11:24:15 +00:00
Simon Pilgrim 7ee86e8e81 [X86] Fix gcc7 -Wunused-but-set-variable warning. NFCI.
llvm-svn: 350701
2019-01-09 11:18:49 +00:00
David Stenberg 33b192d72b [DebugInfo] Omit location list entries with empty ranges
Summary:
This fixes PR39710. In that case we emitted a location list looking like
this:

.Ldebug_loc0:
        .quad   .Lfunc_begin0-.Lfunc_begin0
        .quad   .Lfunc_begin0-.Lfunc_begin0
        .short  1                       # Loc expr size
        .byte   85                      # DW_OP_reg5
        .quad   .Lfunc_begin0-.Lfunc_begin0
        .quad   .Lfunc_end0-.Lfunc_begin0
        .short  1                       # Loc expr size
        .byte   85                      # super-register DW_OP_reg5
        .quad   0
        .quad   0

As seen, the first entry's beginning and ending addresses evaluate to 0,
which meant that the entry inadvertently became an "end of list" entry,
resulting in the location list ending sooner than expected.

To fix this, omit all entries with empty ranges. Location list entries
with empty ranges do not have any effect, as specified by DWARF, so we
might as well drop them:

"A location list entry (but not a base address selection or end of list
 entry) whose beginning and ending addresses are equal has no effect
 because the size of the range covered by such an entry is zero."

Reviewers: davide, aprantl, dblaikie

Reviewed By: aprantl

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D55919

llvm-svn: 350698
2019-01-09 09:58:59 +00:00
Matt Arsenault 3dddb163dd GlobalISel: Implement fewerElements for implicit_def
llvm-svn: 350697
2019-01-09 07:51:52 +00:00
Matt Arsenault befee402ff GlobalISel: Implement widenScalar for implicit_def
llvm-svn: 350695
2019-01-09 07:34:14 +00:00
Max Kazantsev 4615a505f8 [IPT] Drop cache less eagerly in GVN and LoopSafetyInfo
The current strategy of dropping the `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has two internal structures: `OrderedInstructions`,
which needs to be invalidated whenever the contents change, and the map
of the first special instruction in each block. This second map does not need an
update if we add/remove a non-special instruction because that cannot
affect the contents of this map.

This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.

Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma

llvm-svn: 350694
2019-01-09 07:28:13 +00:00
Zi Xuan Wu f2a75eef41 Revert "[PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel"
This reverts commit r350685.

See the compile-time assertion failure in compiler-rt.

llvm-svn: 350693
2019-01-09 06:12:24 +00:00
Hiroshi Inoue dad8c6a1c9 [NFC] fix trivial typos in comments
llvm-svn: 350690
2019-01-09 05:11:10 +00:00
Craig Topper 2fa8e2d8a8 [X86] Correct the MaskVT for avx512 gather/scatter intrinsics to use the min of the number of index and data elements.
When the result type is v2i64/v2f64 and the index element size is i32, the index vector has two unused elements making the type v4i32. The mask VT should match the number of memory accesses that will be made.

This is consistent with the isel patterns used for the target independent gather/scatter intrinsic.

llvm-svn: 350687
2019-01-09 04:21:12 +00:00
Zi Xuan Wu 9479f6d72e [PowerPC] Fix assert from machine verify pass that unmatched register class about fcmp selection in fast-isel
Bad machine code: Illegal virtual register for instruction

function: TestULE
basic block: %bb.0 entry (0x1000a39b158)
instruction: %2:crrc = FCMPUD %1:vsfrc, %3:f8rc
operand 1: %1:vsfrc

Fix an assert about a missing match between the fcmp instruction and its register class.
We should use the VSX-related compare instruction xvcmpudp instead of fcmpu when VSX is enabled.

Add the -verify-machineinstrs option to the related test cases to enable the verify pass.


Differential Revision: https://reviews.llvm.org/D55686

llvm-svn: 350685
2019-01-09 02:31:10 +00:00
Stanislav Mekhanoshin ed0d6c60af Remove check for single use in ShrinkDemandedConstant
This moves the single-use check from the general ShrinkDemandedConstant
to the BE because of the AArch64 regression after D56289/rL350475.

After several hours of experiments I did not come up with a testcase
failing on any other targets if the check is not performed.

Moreover, a direct call to ShrinkDemandedConstant is not really needed
and is superseded by SimplifyDemandedBits.

Differential Revision: https://reviews.llvm.org/D56406

llvm-svn: 350684
2019-01-09 02:24:22 +00:00
Matt Arsenault 0ad1b71fe3 RegisterCoalescer: Assume CR_Replace for SubRangeJoin
Currently it's possible for the following
check on V.WriteLanes (which is not really meaningful
during SubRangeJoin) to pass for one half of the pair,
and then fall through to one of the impossible
or unresolved states. This then fails as inconsistent
on the other half.

During the main range join, the check between V.WriteLanes
and OtherV.ValidLanes must have passed, meaning this
should be a CR_Replace.

Fixes most of the testcases in bugs 39542 and 39602

llvm-svn: 350678
2019-01-08 23:22:18 +00:00
Matt Arsenault 2c807410fd RegisterCoalescer: Defer clearing implicit_def lanes
We can't go back and recover the lanes if it turns
out the implicit_def really can't be erased.

Assume all lanes are valid if an unresolved conflict
is encountered. There aren't any tests where this
seems to matter either way, but this seems like a
safer option.

Fixes bug 39602

llvm-svn: 350676
2019-01-08 23:10:47 +00:00
Sanjay Patel d023dd60e9 [InstCombine] canonicalize another raw IR rotate pattern to funnel shift
This is matching the equivalent of the DAG expansion, 
so it should never end up with worse perf than the 
original code even if the target doesn't have a rotate
instruction.
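As a rough sketch (assuming a 32-bit rotate-left with in-range shift amounts; the pattern actually matched also accounts for shift-amount masking), the raw IR and its funnel-shift form look like:

```
define i32 @rotl_raw(i32 %x, i32 %y) {
  %sub = sub i32 32, %y
  %shl = shl i32 %x, %y
  %shr = lshr i32 %x, %sub
  %r = or i32 %shl, %shr
  ret i32 %r
}

declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @rotl_canonical(i32 %x, i32 %y) {
  ; A funnel shift with both inputs equal to %x is a rotate left by %y.
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %y)
  ret i32 %r
}
```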

llvm-svn: 350672
2019-01-08 22:39:55 +00:00
Rong Xu 016220549d [PGO] Use SourceFileName rather module name in PGOFuncName
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.

Differential Revision: https://reviews.llvm.org/D56327

llvm-svn: 350671
2019-01-08 22:39:47 +00:00
Heejin Ahn 321d522038 [WebAssembly] Rename StoreResults to MemIntrinsicResults
Summary:
The StoreResults pass does not optimize store instructions anymore because
store instructions don't return result values anymore. Now this pass is
used solely for memory intrinsics, so update the pass name accordingly
and fix outdated pass descriptions as well.

This patch does not change any meaningful behavior, but not marked as
NFC because it changes a comment check line in a test case.

Reviewers: dschuff

Subscribers: mgorny, sbc100, jgravelle-google, sunfiish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56093

llvm-svn: 350669
2019-01-08 22:35:18 +00:00
Rong Xu d236b0aa93 [PGO] Revert r350442 to fix commit message.
Will re-commit it using the correct commit message.

llvm-svn: 350667
2019-01-08 22:33:29 +00:00
Evandro Menezes 39c97bf6cd [AArch64] Adjust the cost model for Exynos
Improve the modeling of ALU instructions.

llvm-svn: 350663
2019-01-08 22:29:58 +00:00
Evandro Menezes 5d780093fd [llvm-mca] Improve debugging (NFC)
llvm-svn: 350661
2019-01-08 22:29:38 +00:00
Zachary Turner 2fe4900525 [llvm-undname] Add support for demangling msvc's noexcept types.
Starting in C++17, MSVC introduced a new mangling for function
parameters that are themselves noexcept functions.  This patch
makes llvm-undname properly demangle them.

Patch by Zachary Henkel
Differential Revision: https://reviews.llvm.org/D55769

llvm-svn: 350656
2019-01-08 21:05:51 +00:00
Zachary Turner 4e83923d83 Don't write #include "Windows/WindowsSupport.h" from the Windows dir.
This generates -Wnonportable-include-dir warnings, and doesn't need
to be there. It seems this was just checked in by accident.

llvm-svn: 350655
2019-01-08 21:05:34 +00:00
Adrian Prantl 8a753a2e5a Revert "Revert "Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files""""
This reverts commit D56084.

llvm-svn: 350654
2019-01-08 21:05:10 +00:00
Philip Pfaffe 82f995db75 [NewPM] Port tsan
A straightforward port of tsan to the new PM, following the same path
as D55647.

Differential Revision: https://reviews.llvm.org/D56433

llvm-svn: 350647
2019-01-08 19:21:57 +00:00
Paul Robinson 7402fd9a35 Rename DIFlagFixedEnum to DIFlagEnumClass. NFC
llvm-svn: 350641
2019-01-08 17:52:29 +00:00
Anna Thomas 2dfa412efe [UnrollRuntime] Fix domTree failures in multiexit unrolling
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling in the multi-exit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).

Reviewers: reames, mzolotukhin, mkazantsev, hfinkel

Reviewed by: brzycki, kuhar

Subscribers: zzheng, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56284

llvm-svn: 350640
2019-01-08 17:16:25 +00:00
Yonghong Song 0d99031de0 [BPF] Fix .BTF.ext reloc type assigment issue
Commit f1db33c5c1a9 ("[BPF] Disable relocation for .BTF.ext section")
assigned relocation type R_BPF_NONE if the fixup type
is FK_Data_4 and the symbol is temporary.
The reason is we use FK_Data_4 as a fixup type
for insn offsets in .BTF.ext section.

Just checking whether the symbol is temporary is not enough.
For example, .debug_info may reference some strings whose
fixup is FK_Data_4 with a temporary symbol as well.

To truly reflect the case for the .BTF.ext section,
this patch further checks that the section associated with the symbol
must be SHF_ALLOC and SHF_EXECINSTR, i.e., in the text section.
This fixed the above-mentioned problem.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 350637
2019-01-08 16:36:06 +00:00
Florian Hahn c1ece1b41b [MachineVerifier] Include offending register in allocatable live-in error msg.
This patch adds a convenience report() method for physical registers and
uses it to print the offending register with the 'MBB has allocatable
live-in' error.

Reviewers: MatzeB, rtereshin, dsanders

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D55946

llvm-svn: 350630
2019-01-08 15:16:23 +00:00
Petr Pavlu bf4fdecc51 [GlobalISel] Fix choice of instruction selector for AArch64 at -O0 with -global-isel=0
Commit rL347861 introduced an unintentional change in the behaviour when
compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
disabling GlobalISel resulted in using FastISel but an updated condition
in the commit changed it to using SelectionDAG. The patch fixes this
condition and slightly better organizes the code that chooses the
instruction selector.

Fixes PR40131.

Differential Revision: https://reviews.llvm.org/D56266

llvm-svn: 350626
2019-01-08 14:19:06 +00:00
Philip Pfaffe efb5ad1c58 [DA][NewPM] Add a printerpass and port the testsuite
The new-pm version of DA is untested. Testing requires a printer, so
add that and use it in the existing DA tests.

Differential Revision: https://reviews.llvm.org/D56386

llvm-svn: 350624
2019-01-08 14:06:58 +00:00
Francis Visoiu Mistrih 7a6d7672c1 [X86][Darwin] Emit compact-unwind for register-sized stack adjustments
For stack frames on the size of a register in x86, a code size optimization
emits "push rax/eax" instead of "sub" for stack allocation. For example:

foo:
  .cfi_startproc
BB#0:
  pushq %rax
Ltmp0:
  .cfi_def_cfa_offset 16
  ...
  .cfi_endproc

However, we are falling back to DWARF in this case because we cannot
encode %rax as a saved register.

This requirement is wrong, since we don't care about the contents of
%rax, it is the equivalent of a sub.

In order to specify that we care about the contents of %rax, we would
need a .cfi_offset %rax, <offset>.

It's also overzealous in the case where there are pushes for callee saved
registers followed by a "push rax/eax" instead of "sub", in which case we should
also be able to encode the callee saved regs and everything else using compact
unwind.

Patch authored by Bruno Cardoso Lopes.

Differential Revision: https://reviews.llvm.org/D13793

llvm-svn: 350623
2019-01-08 13:53:15 +00:00
Lama Saba 32f08399eb Revert "Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files"""
This reverts commit rL350497.
The reported remaining issues seem to be unrelated to modules or this change.
More info: https://reviews.llvm.org/D56084

llvm-svn: 350621
2019-01-08 13:30:36 +00:00
Tim Northover 964eea7ad2 AArch64: avoid splitting vector truncating stores.
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.

Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64 bits, and
so they don't actually benefit from being split up anyway; this patch just
disables that transformation.

llvm-svn: 350620
2019-01-08 13:30:27 +00:00
Benjamin Kramer a480523ce9 [GlobalISel] Fix unused variable warning in Release builds.
llvm-svn: 350618
2019-01-08 12:54:26 +00:00
Sam Parker 53000a74a5 [ARM] Add missing patterns for DSP muls
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable us to remove the unnecessary sxth.
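A sketch of the kind of extra case this catches (a hypothetical 16x16->32 multiply where one operand is sign-extended in-register rather than through an explicit sext of an i16 value):

```
define i32 @mul16x16(i32 %a, i16 %b) {
  ; sext_inreg of %a from i16: shift left then arithmetic shift right.
  %shl = shl i32 %a, 16
  %lo_a = ashr i32 %shl, 16
  %ext_b = sext i16 %b to i32
  %mul = mul i32 %lo_a, %ext_b
  ret i32 %mul
}
```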

Differential Revision: https://reviews.llvm.org/D55992

llvm-svn: 350613
2019-01-08 10:12:36 +00:00
Matt Arsenault c765240060 AMDGPU/GlobalISel: Introduce vcc reg bank
I'm not entirely sure this is the correct thing
to do with the global isel philosophy, but I think
this is necessary to handle how differently SGPRs
are used normally vs. from a condition.

For example, it makes sense to allow a copy
from a VGPR to an SGPR, but it makes no sense
to allow a copy from VGPRs to SGPRs used as
select mask.

This avoids regbankselecting strange code with
a truncate feeding directly into a condition field.
Now a copy is forced from sgpr(s1) to vcc, which is
more sensible to handle.

Some of these issues could probably be avoided by making enough
operations resulting in i1 illegal. I think we can't avoid
this register bank for legality.

For example, an i1 and where one source is from a truncate, and
one source is a compare needs some kind of copy inserted to
make sure both are in condition registers.

llvm-svn: 350611
2019-01-08 06:30:53 +00:00
Thomas Lively 6a87ddac9a [WebAssembly] Massive instruction renaming
Summary:
An automated renaming of all the instructions listed at
https://github.com/WebAssembly/spec/issues/884#issuecomment-426433329
as well as some similarly-named identifiers.

Reviewers: aheejin, dschuff, aardappel

Subscribers: sbc100, jgravelle-google, eraman, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56338

llvm-svn: 350609
2019-01-08 06:25:55 +00:00
Robert Widmann 616ed17221 [LLVM-C] Allow For Creating a BasicBlock without a Parent Function
Summary: Add a utility function for creating a basic block without a parent function.  A useful operation for compilers that need to synthesize and conditionally insert code without having to bother with appending and immediately unlinking a block.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56279

llvm-svn: 350608
2019-01-08 06:24:19 +00:00
Robert Widmann 40dc48be0e [LLVM-C] Allow Specifying Signedness in Int Cast
Summary: Fix an old outstanding problem with the int cast builder binding always assuming the cast is signed by introducing a new LLVMBuildIntCast2 operation and deprecating the old prototype.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56280

llvm-svn: 350607
2019-01-08 06:23:22 +00:00
Mandeep Singh Grang f286bee9fe [MC] [AArch64] Support resolving signed fixups for :abs_g0_s: etc.
Summary: This patch is a follow-up to D55896.

Reviewers: efriedma, mstorsjo

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56029

llvm-svn: 350606
2019-01-08 04:48:00 +00:00
Chris Kennelly a97cad4642 [NFC] Remove empty line as a test commit.
llvm-svn: 350605
2019-01-08 04:04:51 +00:00
Matt Arsenault a1515d2d33 AMDGPU/GlobalISel: Legalize concat_vectors
llvm-svn: 350598
2019-01-08 01:30:02 +00:00
Matt Arsenault 376f2ef2f0 Fix typos
llvm-svn: 350597
2019-01-08 01:25:47 +00:00
Heejin Ahn e95056d69c [WebAssembly] Move CFG-changing passes before RegStackify
Summary:
FixIrreducibleControlFlow and LateEHPrepare both possibly modify the CFG and
create new registers. There seems to be no reason these passes go after
register-related optimization passes (PrepareForLiveIntervals,
OptimizeLiveIntervals, StoreResults, RegStackify, and RegColoring), and
this also possibly creates new optimization opportunities. I think we
should put all current and future optimization passes before RegStackify
(and related passes) unless there's a reason not to.

Reviewers: kripken

Subscribers: dschuff, sbc100, sunfish, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D56356

llvm-svn: 350596
2019-01-08 01:25:12 +00:00
Matt Arsenault adc40baa29 RegBankSelect: Fix copy insertion point for terminators
If a copy was needed to handle the condition of brcond, it was being
inserted before the defining instruction. Add tests for iterator edge
cases.

I find the existing code here suspect for the case where it's looking
for terminators that modify the register. It's going to insert a copy
in the middle of the terminators, which isn't allowed (it might be
necessary to have a COPY_terminator if anybody actually needs this).

Also legalize brcond for AMDGPU.

llvm-svn: 350595
2019-01-08 01:22:47 +00:00
Heejin Ahn 8e2bac8e7f [WebAssembly] Use 'I' multiclass template for br_table (NFC)
Summary:
We don't need to explicitly use `NI` anymore because we now don't use
`let` statements within the definitions.

Reviewers: aardappel

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56376

llvm-svn: 350594
2019-01-08 01:15:15 +00:00
Matt Arsenault ae6f1e07fc AMDGPU/GlobalISel: Disallow VGPR->SCC copies
This fixes using scalar adds when only the carry in is a VGPR
using greedy regbankselect.

llvm-svn: 350593
2019-01-08 01:13:20 +00:00
Matt Arsenault 68c668a5f3 AMDGPU/GlobalISel: RegBankSelect for carry-in
I'm not sure we should be allowing the truncate
to s1 for the inputs. It may be necessary to
create a new VCC reg bank.

llvm-svn: 350592
2019-01-08 01:09:09 +00:00
Matt Arsenault 2cc15b67b7 AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out
llvm-svn: 350589
2019-01-08 01:03:58 +00:00
Matt Arsenault 299302fbe7 AMDGPU/GlobalISel: InstrMapping for G_UNMERGE_VALUES
llvm-svn: 350588
2019-01-08 00:46:19 +00:00
Wei Mi 2645fd0ece [RegisterCoalescer] dst register's live interval needs to be updated when
merging a src register in ToBeUpdated set.

This is to fix PR40061 related with https://reviews.llvm.org/rL339035.

In https://reviews.llvm.org/rL339035, the live interval of the source pseudo register
in a rematerialized copy may be saved in the ToBeUpdated set and its update may be
postponed.

In PR40061, %t2 = %t1 is rematerialized and %t1 is added into the toBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains a live interval larger than necessary.
Because %t3 is not in the toBeUpdated set, its live interval is not updated after
register coalescing and it breaks some assumptions in regalloc.

The patch requires the live interval of the destination register in a merge to be
updated if the source register is in ToBeUpdated.

Differential revision: https://reviews.llvm.org/D55867

llvm-svn: 350586
2019-01-08 00:26:11 +00:00
Davide Italiano bf1fdb852f [Verifier] Reject invalid type for DILocalVariable.
Reviewers: aprantl

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56414

llvm-svn: 350578
2019-01-07 23:09:09 +00:00
Craig Topper 486313b5f7 Recommit r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics."
The MSVC limit we hit on AutoUpgrade.cpp has been worked around for now.

llvm-svn: 350567
2019-01-07 21:00:32 +00:00
Martin Storsjo 93a7137c0a [ObjectYAML] [COFF] Support multiple symbols with the same name
Differential Revision: https://reviews.llvm.org/D56294

llvm-svn: 350566
2019-01-07 20:55:33 +00:00
Craig Topper 81fe1fbf4a [X86][AutoUpgrade] Make some tweaks to reduce the number of nested if/else in the intrinsic upgrade code to avoid an MSVC compiler limit.
MSVC has a nesting limit of around 110-130, and each if/else if in a chain counts against this nesting limit. The autoupgrade code consists of a long chain of these checking matches against strings.

This commit moves a large if/else chain that was inside one of the blocks out into a separate helper function. There are more of these we could move, or we could change some to lookup tables.

I've also merged together a few similar blocks in the outer chain. This should buy us some margin for a little bit.

llvm-svn: 350564
2019-01-07 20:13:45 +00:00
Craig Topper fad1589f39 Revert r350554 "[X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics."
The AutoUpgrade.cpp if/else cascade hit an MSVC limit again.

llvm-svn: 350562
2019-01-07 19:39:05 +00:00
Alina Sbirlea 12bbb4fe8d [MemorySSA] Add SkipSelfWalker.
Summary: Add implementation of SkipSelfWalker.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D56285

llvm-svn: 350561
2019-01-07 19:38:47 +00:00
Craig Topper 826f44b550 [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24.
As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.

Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.

This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.

Differential Revision: https://reviews.llvm.org/D56087

llvm-svn: 350560
2019-01-07 19:30:43 +00:00
Alina Sbirlea bc8aa24c2f [MemorySSA] Refactor CachingWalker.
Summary:
Refactor caching walker to make creating a walker that skips the
starting access straightforward.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits, jfb

Differential Revision: https://reviews.llvm.org/D55957

llvm-svn: 350558
2019-01-07 19:22:37 +00:00
Craig Topper 9c4f7e9147 [X86] Remove AVX512VBMI2 concat and shift intrinsics. Replace with target independent funnel shift intrinsics.
Differential Revision: https://reviews.llvm.org/D56377

llvm-svn: 350554
2019-01-07 19:10:12 +00:00
Diogo N. Sampaio f192cdb5c9 [ARM] ComputeKnownBits to handle extract vectors
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.

Differential revision: https://reviews.llvm.org/D56098

llvm-svn: 350553
2019-01-07 19:01:47 +00:00
Alina Sbirlea f723020456 [MemorySSA] Extend the clobber walker with the option to skip the starting access.
Summary:
The option enables loop transformations to hoist accesses that do not
have clobbers in the loop. If the clobber query skips the starting
access, the result may be outside the loop instead of the header Phi.

Adding the walker that uses this option in a separate patch.

Reviewers: george.burgess.iv

Subscribers: sanjoy, jlebar, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D55944

llvm-svn: 350551
2019-01-07 18:40:27 +00:00
Nikita Popov 8dd19ed3ec Revert "[DemandedBits] Use SetVector for Worklist"
This reverts commit r350547.

Seeing assertion failures on clang tests.

llvm-svn: 350549
2019-01-07 18:15:11 +00:00
Nikita Popov 353d92decb [DemandedBits] Use SetVector for Worklist
DemandedBits currently uses a simple vector for the worklist, which
means that instructions may be inserted multiple times into it.
Especially in combination with the deep lattice, this may cause
instructions to be recomputed very often. To avoid this, switch
to a SetVector.

Differential Revision: https://reviews.llvm.org/D56362

llvm-svn: 350547
2019-01-07 18:03:36 +00:00
Rhys Perry f77e2e8406 AMDGPU: test for uniformity of branch instruction, not its condition
Summary:
If a divergent branch instruction is marked as divergent by propagation
rule 2 in DivergencePropagator::exploreSyncDependency() and its condition
is uniform, that branch would incorrectly be assumed to be uniform.

Reviewers: arsenm, tstellar

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D56331

llvm-svn: 350532
2019-01-07 15:52:28 +00:00
Alexandre Ganea 90f4b94da3 [CodeView] More appropriate name and type for a Microsoft precompiled headers parameter. NFC
llvm-svn: 350520
2019-01-07 13:53:16 +00:00
Matt Arsenault 7a27c1886f AMDGPU: Remove v16i8 from register classes
llvm-svn: 350518
2019-01-07 13:31:55 +00:00
Matt Arsenault 369acb8470 AMDGPU: Remove VS/SV mappings from select
These would violate the constant bus restriction

llvm-svn: 350517
2019-01-07 13:21:36 +00:00
Chandler Carruth 90c09232a2 [CallSite removal] Move the rest of IR implementation code away from
`CallSite`.

With this change, the remaining `CallSite` usages are just for
implementing the wrapper type itself.

This does update the C API but leaves the names of that API alone and
only updates their implementation.

Differential Revision: https://reviews.llvm.org/D56184

llvm-svn: 350509
2019-01-07 07:31:49 +00:00
Chandler Carruth 57578aaf96 [CallSite removal] Port `IndirectCallSiteVisitor` to use `CallBase` and
update client code.

Also rename it to use the more generic term `call` instead of something
that could be confused with a praticular type.

Differential Revision: https://reviews.llvm.org/D56183

llvm-svn: 350508
2019-01-07 07:15:51 +00:00
Chandler Carruth fee1a04d04 [CallSite removal] Move the verifier to use `CallBase` instead of the
`CallSite` wrapper.

Mostly mechanical, but I've tried to tidy up code where it made sense to
do so.

Differential Revision: https://reviews.llvm.org/D56143

llvm-svn: 350507
2019-01-07 07:02:34 +00:00
Chandler Carruth 363ac68374 [CallSite removal] Migrate all Alias Analysis APIs to use the newly
minted `CallBase` class instead of the `CallSite` wrapper.

This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.

Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.

Thanks for the review from Saleem!

Differential Revision: https://reviews.llvm.org/D55641

llvm-svn: 350503
2019-01-07 05:42:51 +00:00
Craig Topper 6ffeeb705f [X86] Add support for matching vector funnel shift to AVX512VBMI2 instructions.
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.
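A minimal IR sketch of the two forms (v8i64 chosen arbitrarily; which instruction a given pattern maps to is up to isel):

```
declare <8 x i64> @llvm.fshl.v8i64(<8 x i64>, <8 x i64>, <8 x i64>)

define <8 x i64> @fshl_by_imm(<8 x i64> %a, <8 x i64> %b) {
  ; Uniform constant shift amount: candidate for the immediate form.
  %r = call <8 x i64> @llvm.fshl.v8i64(<8 x i64> %a, <8 x i64> %b,
          <8 x i64> <i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7, i64 7>)
  ret <8 x i64> %r
}

define <8 x i64> @fshl_by_var(<8 x i64> %a, <8 x i64> %b, <8 x i64> %amt) {
  ; Per-element shift amounts: candidate for the variable-vector form.
  %r = call <8 x i64> @llvm.fshl.v8i64(<8 x i64> %a, <8 x i64> %b, <8 x i64> %amt)
  ret <8 x i64> %r
}
```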

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56361

llvm-svn: 350498
2019-01-06 18:10:18 +00:00
Lama Saba f385c21f79 Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files""
This reverts commit rL350493.
Issues related to modules still appear in http://green.lab.llvm.org/green/job/lldb-cmake

llvm-svn: 350497
2019-01-06 16:39:14 +00:00
Sanjay Patel 8f12e8f3f6 [x86] explicitly set cost of integer add/sub
There are no test changes here in the existing cost model
regression tests because integer add/sub have a default
legal cost of 1 already. This would break, however, if
we custom lower those ops because the default cost model
assumes that custom-lowered ops are more expensive.

This is similar to the change in rL350403. See discussion
in D56011 for more details. When we enhance that patch to
handle integer ops, we need this cost model change to avoid
unintended diffs here from the custom lowering.

llvm-svn: 350496
2019-01-06 16:21:42 +00:00
Lama Saba ea9d555b83 Resubmit rL345008 "Split MachinePipeliner code into header and cpp files"
Resubmitted in rL345290 and reverted in rL350345 due to failures in
http://green.lab.llvm.org/green/job/lldb-cmake/
Resubmitting after a workaround for the lldb-cmake failure was
committed in rL350346; more info in https://reviews.llvm.org/D56084

llvm-svn: 350493
2019-01-06 15:45:40 +00:00
Craig Topper 57fc891c1b [LegalizeVectorOps] Add FSHL/FSHR to the list of vector operations that should be handled.
The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too.

llvm-svn: 350490
2019-01-06 07:06:35 +00:00
Craig Topper 1187991bcf [X86][AsmParser] Don't allow X86::DX in CheckBaseRegAndIndexRegAndScale.
This was here because out and in instructions allow '(%dx)' even though it's not a memory reference. To handle this we build a special operand for the DX register reference before we get to the call to CheckBaseRegAndIndexRegAndScale. So we no longer need this special case.

llvm-svn: 350483
2019-01-05 23:30:28 +00:00
Craig Topper d0ba531a0c [X86] Use two pmovmskbs in combineBitcastvxi1 for (i64 (bitcast (v64i1 (truncate (v64i8)))) on KNL.
llvm-svn: 350481
2019-01-05 22:42:58 +00:00
Craig Topper 46f8b4a11e [X86] Allow combinevxi1Bitcast to use pmovmskb on avx512 targets if the input is a truncate from v16i8/v32i8.
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.

llvm-svn: 350480
2019-01-05 21:40:07 +00:00
Stanislav Mekhanoshin 35a3a3bd11 Added single use check to ShrinkDemandedConstant
Fixes the cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink
the source node to demanded bits only even if there are other uses.

Differential Revision: https://reviews.llvm.org/D56289

llvm-svn: 350475
2019-01-05 19:20:00 +00:00
Craig Topper 3f48dbf72e [X86] Allow LowerTRUNCATE to use PACKUS/PACKSS for v16i16->v16i8 truncate when -mprefer-vector-width-256 is in effect and BWI is not available.
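A sketch of the IR shape in question (assuming a target with AVX512F but no BWI and a 256-bit vector-width preference):

```
define <16 x i8> @trunc_v16i16_v16i8(<16 x i16> %x) {
  ; With this change the truncate can be lowered with pack instructions
  ; instead of being split into smaller pieces.
  %t = trunc <16 x i16> %x to <16 x i8>
  ret <16 x i8> %t
}
```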
llvm-svn: 350473
2019-01-05 18:48:11 +00:00
Nikita Popov 65038515ee [InstCombine] Relax cttz/ctlz with select on zero
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
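A sketch of the select form (the same reasoning applies to ctlz):

```
declare i32 @llvm.cttz.i32(i32, i1)

define i32 @cttz_or_default(i32 %x, i32 %y) {
  %nonzero = icmp ne i32 %x, 0
  %tz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
  ; %tz is only used when %x != 0, so the call can be relaxed to
  ; @llvm.cttz.i32(i32 %x, i1 true).
  %r = select i1 %nonzero, i32 %tz, i32 %y
  ret i32 %r
}
```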

Differential Revision: https://reviews.llvm.org/D55786

llvm-svn: 350463
2019-01-05 09:48:16 +00:00
Easwaran Raman 366a873f14 [Inliner] Optimize shouldBeDeferred
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.

llvm-svn: 350456
2019-01-05 02:26:29 +00:00
Craig Topper 45ec002e25 [X86] Require second operand of X86vshiftuniform to be an integer. NFC
We don't need to require the first operand to be an integer because we already said it was the same type as the result which we also constrained to an integer.

llvm-svn: 350455
2019-01-05 01:40:29 +00:00
Evgeniy Stepanov 0184c53cbd Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)""
This reapplies commit r348983.

llvm-svn: 350448
2019-01-05 00:44:58 +00:00
Rong Xu b5fa0a89b2 [PGO] Use SourceFileName rather module name in PGOFuncName
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.

llvm-svn: 350442
2019-01-04 22:54:03 +00:00
Nikita Popov c35b4a37ba [X86] Fix warning; NFC
llvm-svn: 350437
2019-01-04 21:41:35 +00:00
Vyacheslav Zakharin 0a6f86c54b Update the pr_datasz of .note.gnu.property section.
Patch by Xiang Zhang.

Differential Revision: https://reviews.llvm.org/D56080

llvm-svn: 350436
2019-01-04 21:25:01 +00:00
Nikita Popov 6658fce4fc [BDCE] Remove dead uses of arguments
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.
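A hypothetical sketch of a dead argument use: only the low 8 bits of the sum are demanded, so no bit of %hi is demanded through the shift and that use can be replaced by zero.

```
define i8 @low_byte(i32 %hi, i32 %lo) {
  %shifted = shl i32 %hi, 8   ; contributes only to bits 8 and above
  %sum = add i32 %shifted, %lo
  %t = trunc i32 %sum to i8   ; only bits 0-7 are demanded
  ret i8 %t
}
```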

I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.

Differential Revision: https://reviews.llvm.org/D56247

llvm-svn: 350435
2019-01-04 21:21:43 +00:00
Evandro Menezes 9f53bea536 [AArch64] Adjust the cost model for Exynos M3
Improve the modeling of ASIMD loads and stores.

llvm-svn: 350434
2019-01-04 21:02:25 +00:00
Craig Topper cfeb1cf9af [X86] Add INSERT_SUBVECTOR to ComputeNumSignBits
This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits.

My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common.

Differential Revision: https://reviews.llvm.org/D56283

llvm-svn: 350432
2019-01-04 20:50:59 +00:00
Peter Collingbourne 87f477b5e4 hwasan: Implement lazy thread initialization for the interceptor ABI.
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.

The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.

This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.

Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.

Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.

Differential Revision: https://reviews.llvm.org/D56038

llvm-svn: 350429
2019-01-04 19:27:04 +00:00
Teresa Johnson 853b962416 [ThinLTO] Handle chains of aliases
At -O0, globalopt is not run during the compile step, and we can end up with a
chain where an alias's immediate aliasee is another alias. The
summaries are constructed assuming aliases are in a canonical form
(flattened chains), and as a result only the base object, but none of the
intermediate aliases, was preserved.

Fix by adding a pass that canonicalizes aliases, which ensures each
alias is a direct alias of the base object.
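A minimal sketch of such a chain (hypothetical names):

```
@base = global i32 0
@inner = alias i32, i32* @base
@outer = alias i32, i32* @inner   ; alias of an alias; canonicalized to alias @base directly
```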

Reviewers: pcc, davidxl

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D54507

llvm-svn: 350423
2019-01-04 19:04:54 +00:00
Sanjay Patel 6153565511 [x86] lower extracted fadd/fsub to horizontal vector math; 2nd try
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.

Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.

We can extend this to integer ops in a follow-up patch.

Differential Revision: https://reviews.llvm.org/D56011

llvm-svn: 350421
2019-01-04 17:48:13 +00:00
Vedant Kumar a1778df474 [CodeExtractor] Do not extract unsafe lifetime markers
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):

```
               entry:
              +------------+
              | x = alloca |
              | y = alloca |
              +------------+
             /              \
   lhs:                      rhs:
  +-------------------+     +-------------------+
  | lifetime_start(x) |     | lifetime_start(x) |
  | use(x)            |     | lifetime_start(y) |
  | lifetime_end(x)   |     | use(x, y)         |
  | lifetime_start(y) |     | lifetime_end(y)   |
  | use(y)            |     | lifetime_end(x)   |
  | lifetime_end(y)   |     +-------------------+
  +-------------------+
```

Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.

This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.

Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.

Fixes llvm.org/PR39671 (rdar://45939472).

Differential Revision: https://reviews.llvm.org/D55967

llvm-svn: 350420
2019-01-04 17:43:22 +00:00
Sanjay Patel 722466e1f1 [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.

llvm-svn: 350419
2019-01-04 17:38:12 +00:00
John Brawn 39ac159c24 [LICM] Adjust how moving the re-hoist point works
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.

Fix this by moving the re-hoist point when it doesn't dominate the dominator of
the hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.

Differential Revision: https://reviews.llvm.org/D55266

llvm-svn: 350408
2019-01-04 17:12:09 +00:00
Nirav Dave 1468d6e1c5 Undo r350355 "[X86] Remove terrible DX Register parsing hack in parse operand. NFCI."
Add missing test case and update comments.

llvm-svn: 350406
2019-01-04 17:11:15 +00:00
Simon Pilgrim c2054144ee [CostModel][X86] Fix SSE1 FADD/FSUB costs
Noticed in D56011 - handle the case that scalar fp ops are quicker on P3 than P4

Add the other costs so that we're not relying on the default "is legal/custom" cost logic.

llvm-svn: 350403
2019-01-04 16:55:57 +00:00
Ranjeet Singh 107dd2565c Revert patches 348835 and 348571 because they're
causing code size performance regressions.

llvm-svn: 350402
2019-01-04 16:39:10 +00:00
Simon Pilgrim 9f4dea8c06 [X86] Add VPSLLI/VPSRLI ((X >>u C1) << C2) SimplifyDemandedBits combine
Repeat of the generic SimplifyDemandedBits shift combine

llvm-svn: 350399
2019-01-04 15:43:43 +00:00
Andrea Di Biagio 3f4b54850f [MCA] Improved handling of in-order issue/dispatch resources.
Added field 'MustIssueImmediately' to the instruction descriptor of instructions
that only consume in-order issue/dispatch processor resources.
This speeds up queries from the hardware Scheduler, and gives an average ~5%
speedup on a release build.

No functional change intended.

llvm-svn: 350397
2019-01-04 15:08:38 +00:00
Florian Hahn 7902405c42 [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset
GetPointerBaseWithConstantOffset include this code, where ByteOffset
and GEPOffset are both of type llvm::APInt :

  ByteOffset += GEPOffset.getSExtValue();

The problem with this line is that getSExtValue() returns an int64_t, but
the += matches an overload for uint64_t. The problem is that the resulting
APInt is no longer considered to be signed. That in turn causes assertion
failures later on if the relevant pointer type is > 64 bits in width and
the GEPOffset was negative.

Changing it to

  ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth());

resolves the issue and explicitly performs the sign-extending
or truncation. Additionally, instead of asserting later if the result
is > 64 bits, it breaks out of the loop in that case.

See also
 https://reviews.llvm.org/D24729
 https://reviews.llvm.org/D24772

This commit must be merged after D38662 in order for the test to pass.

Patch by Michael Ferguson <mpfergu@gmail.com>.

Reviewers: reames, sanjoy, hfinkel

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D38501

llvm-svn: 350395
2019-01-04 14:53:22 +00:00
Andrea Di Biagio 7bec693433 [MCA] Store extra information about processor resources in the ResourceManager.
Method ResourceManager::use() is responsible for updating the internal state of
used processor resources, as well as notifying resource groups that contain used
resources.

Before this patch, method 'use()' didn't know how to quickly obtain the set of
groups that contain a particular resource unit. It had to discover groups by
performing a potentially slow search (done by iterating over the set of processor
resource descriptors).

With this patch, the relationship between resource units and groups is stored in
the ResourceManager. That means, method 'use()' no longer has to search for
groups. This gives an average speedup of ~4-5% on a release build.

This patch also adds extra code comments in ResourceManager.h to better describe
the resource mask layout, and how resource indices are computed from resource
masks.

llvm-svn: 350387
2019-01-04 12:31:14 +00:00
Richard Trieu e1fef949ae [WebAssembly] Split the checking from the sorting logic.
Move the check for -1 and identical values outside the vector sorting code.
Compare functions need to be able to compare identical elements to be
conforming.

llvm-svn: 350379
2019-01-04 06:49:24 +00:00
Xin Tong 47beee2f3f [memcpyopt] Remove a few unnecessary isVolatile() checks. NFC
We already checked for isSimple() on the store.

llvm-svn: 350378
2019-01-04 02:13:22 +00:00
Craig Topper 6265a15f2e [X86] Add post-isel peephole to fold KAND+KORTEST into KTEST if only the zero flag is used.
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.

Differential Revision: https://reviews.llvm.org/D56246

llvm-svn: 350374
2019-01-04 00:10:58 +00:00
Sanjay Patel 26ce9c38a7 revert r350369: [x86] lower extracted fadd/fsub to horizontal vector math
There are non-codegen tests that need to be updated with this code change.

llvm-svn: 350373
2019-01-04 00:02:02 +00:00
Sanjay Patel ef4afca2ad [x86] lower extracted fadd/fsub to horizontal vector math
This would show up if we fix horizontal reductions to narrow as they go along, 
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.

We need to do this late to not interfere with other pattern matching of larger 
horizontal sequences.

We can extend this to integer ops in a follow-up patch.
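A sketch of the kind of pattern involved (adding two adjacent lanes and extracting the scalar result; on a fast-hops target this can become a single horizontal add):

```
define float @sum_of_two_lanes(<4 x float> %x) {
  %shift = shufflevector <4 x float> %x, <4 x float> undef,
                         <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %add = fadd <4 x float> %x, %shift
  %r = extractelement <4 x float> %add, i32 0
  ret float %r
}
```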

Differential Revision: https://reviews.llvm.org/D56011

llvm-svn: 350369
2019-01-03 23:16:19 +00:00
Heejin Ahn 777d01c756 [WebAssembly] Optimize Irreducible Control Flow
Summary:
Irreducible control flow is not that rare, e.g. it happens in malloc and
3 other places in the libc portions linked in to a hello world program.
This patch improves how we handle that code: it emits a br_table to
dispatch to only the minimal necessary number of blocks. This reduces
the size of malloc by 33%, and makes it comparable in size to asm2wasm's
malloc output.

Added some tests, and verified this passes the emscripten-wasm tests run
on the waterfall (binaryen2, wasmobj2, other).

Reviewers: aheejin, sunfish

Subscribers: mgrang, jgravelle-google, sbc100, dschuff, llvm-commits

Differential Revision: https://reviews.llvm.org/D55467

Patch by Alon Zakai (kripken)

llvm-svn: 350367
2019-01-03 23:10:11 +00:00
Wouter van Oortmerssen 820c6263d9 [WebAssembly] Fixed disassembler not knowing about new brlist operand
Summary:
The previously introduced new operand type for br_table didn't have
a disassembler implementation, causing an assert.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56227

llvm-svn: 350366
2019-01-03 23:01:30 +00:00
Wouter van Oortmerssen 9843295608 [WebAssembly] Made InstPrinter more robust
Summary:
Instead of asserting on certain kinds of malformed instructions, it
now still print, but instead adds an annotation indicating the
problem, and/or indicates invalid_type etc.

We're using the InstPrinter from many contexts that can't always
guarantee values are within range (e.g. the disassembler), where having
output is more valueable than asserting.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56223

llvm-svn: 350365
2019-01-03 22:59:59 +00:00
Nirav Dave 8de916d1a4 [X86] Remove terrible DX Register parsing hack in parse operand. NFCI.
Fold hack special casing of (%dx) operand parsing into the related
hack for out*/in* instruction parsing.

llvm-svn: 350355
2019-01-03 21:46:30 +00:00
Sanjay Patel 9633d76a40 [DAGCombiner][x86] scalarize binop followed by extractelement
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:

// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)

We want to have this in the DAG too because as we can see in some of the test diffs (reductions), 
the pattern may not be visible in IR.

Given that this is already an IR canonicalization, any backend that would prefer a vector op over 
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
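A sketch of the equivalent shape at the IR level (in the DAG the same pattern can now be scalarized when the target opts in):

```
define float @extract_of_fadd(<4 x float> %x, <4 x float> %y) {
  %v = fadd <4 x float> %x, %y
  %e = extractelement <4 x float> %v, i32 2
  ; --> fadd (extractelement %x, 2), (extractelement %y, 2)
  ret float %e
}
```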

Differential Revision: https://reviews.llvm.org/D55722

llvm-svn: 350354
2019-01-03 21:31:16 +00:00
Alexander Timofeev 993e2798fd [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Detailed description: SIFoldOperands::foldInstOperand iterates over the
operand uses, calling a function that changes def-use iterators on the
way. As a result the loop exits immediately when the def-use iterator is
changed. Hence, the operand is folded into the very first use instruction
only. This makes the VGPR live along the whole basic block and increases
register pressure significantly. The performance drop observed in SHOC
DeviceMemory test is caused by this bug.

Proposed fix: collect uses into a separate container for further processing
in another loop.

Testing: make check-llvm
SHOC performance test.

Reviewers: rampitec, ronlieb

Differential Revision: https://reviews.llvm.org/D56161

llvm-svn: 350350
2019-01-03 19:55:32 +00:00
Anna Thomas a470aa6701 [UnrollRuntime] Move the DomTree verification under expensive checks
Suggested by Hal as done in r349871.

llvm-svn: 350349
2019-01-03 19:43:33 +00:00
Stefan Granitz a9b7ca472d Revert "Resubmit rL345008 "Split MachinePipeliner code into header and cpp files""
This reverts commit r350290.

llvm-svn: 350345
2019-01-03 19:09:24 +00:00
Kristina Brooks e434280f3d [MCStreamer] Use report_fatal_error in EmitRawTextImpl
Use report_fatal_error in MCStreamer::EmitRawTextImpl instead of
using errs() and explain the rationale behind it not being
llvm_unreachable() to avoid confusion for any future maintainers.

Differential Revision: https://reviews.llvm.org/D56245

llvm-svn: 350342
2019-01-03 18:42:31 +00:00
Anna Thomas 0785e7307e [UnrollRuntime] Add DomTree verification under debug mode
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify dom
tree at a state where it is much smaller and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info everytime to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.

llvm-svn: 350334
2019-01-03 17:44:44 +00:00
Evandro Menezes 0f67746c92 [AArch64] Add new scheduling predicates
Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode.

llvm-svn: 350332
2019-01-03 17:28:09 +00:00
Andrea Di Biagio b284054b26 [MCA] Improve code comment and reuse an helper function in ResourceManager. NFCI
llvm-svn: 350322
2019-01-03 14:47:46 +00:00
Alex Bradbury 2ba76be882 [RISCV][MC] Accept %lo and %pcrel_lo on operands to li
This matches GNU assembler behaviour.

llvm-svn: 350321
2019-01-03 14:41:41 +00:00
Philip Pfaffe b39a97c8f6 [NewPM] Port Msan
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.

Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
  library functions, for which it inserts declarations as necessary.
- Update tests.

Caveats:
- There is one test that I dropped, because it specifically tested the
  definition of the ctor.

Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka

Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji

Differential Revision: https://reviews.llvm.org/D55647

llvm-svn: 350305
2019-01-03 13:42:44 +00:00
Simon Pilgrim c2aadfaaad [SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123)
Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics

llvm-svn: 350300
2019-01-03 12:18:23 +00:00
Diogo N. Sampaio 8786a946d8 [ARM] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.

This patch also renames FeatureSpecRestrict to FeatureSB.

Reviewed By: olista01, LukeCheeseman

Differential Revision: https://reviews.llvm.org/D55990

llvm-svn: 350299
2019-01-03 12:09:12 +00:00
Simon Pilgrim d824f99a6c [X86] Add ADD/SUB SSAT/USAT vector costs (PR40123)
Costs for real SSE2 instructions

llvm-svn: 350295
2019-01-03 11:38:42 +00:00
Piotr Sobczak 3abef8f9ea [AMDGPU] Change section name with metadata access
Summary:
The commit rL348922 introduced a means to set the Metadata
section kind for a global variable if its explicit section
name was prefixed with ".AMDGPU.metadata.".

This patch changes that prefix to ".AMDGPU.comment.",
as "metadata" in the section name might lead to
ambiguity with the metadata used by the AMD PAL runtime.

Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668

Reviewers: kzhuravl

Reviewed By: kzhuravl

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D56197

llvm-svn: 350292
2019-01-03 11:22:58 +00:00
Lama Saba 4d752a88e8 Resubmit rL345008 "Split MachinePipeliner code into header and cpp files"
The commit caused unclear failures in http://green.lab.llvm.org/green//job/lldb-cmake/.
Will revert if the error reappears.

Differential Revision: https://reviews.llvm.org/D56084

llvm-svn: 350290
2019-01-03 10:03:54 +00:00
Markus Lavin 72b9deb21f [CodeGen] Skip over dbg-instr in twoaddr pass
A DBG_VALUE between a two-address instruction and a following COPY
would prevent the rescheduleMIBelowKill optimization inside
TwoAddressInstructionPass.

Differential Revision: https://reviews.llvm.org/D55987

llvm-svn: 350289
2019-01-03 08:36:06 +00:00
Martin Storsjo 74e7d26090 [llvm-readobj] [COFF] Print the symbol index for relocations
There can be multiple local symbols with the same name (e.g. for
comdat sections), and thus the symbol name itself isn't enough
to disambiguate symbols.

Differential Revision: https://reviews.llvm.org/D56140

llvm-svn: 350288
2019-01-03 08:08:23 +00:00
Kristina Brooks bbbec9daa4 Don't go over 80 chars in MCStreamer.cpp. NFC.
Fixing up style issues around the area to prepare for
a larger differential.

llvm-svn: 350286
2019-01-03 06:06:38 +00:00
QingShan Zhang f24ec7bdd0 [Power9] Enable the Out-of-Order scheduling model for P9 hw
When we switched to the MI scheduler for P9, the hardware was modeled as out of order.
However, inside the MI scheduler algorithm we still use the in-order scheduling model
because MicroOpBufferSize isn't set. The MI scheduler takes this to mean the hardware
cannot buffer the ops, so a pending instruction can only be scheduled once all
available instructions have issued. That is not true for the P9 hardware.

This patch enables the out-of-order scheduling model. The buffer size of 44 is taken
from the P9 hardware spec, and perf testing indicates that this value does not hurt cpu2017.

With this patch, 3 benchmarks improve by over 3% and 1 degrades by over 3%. The details are as follows:

x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%

The geomean over all the C/C++ benchmarks in SPEC2017 improves by about 0.18%.

Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810

llvm-svn: 350285
2019-01-03 05:04:18 +00:00
Pete Cooper 697281df42 Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.

The trouble is that ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs,
so it optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.

This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.

rdar://problem/47005143

Reviewed By: ahatanak

Differential Revision: https://reviews.llvm.org/D56235

llvm-svn: 350284
2019-01-03 01:38:08 +00:00
Robert Widmann 7882b283cd [LLVM-C] Expand LLVMRelocMode
Summary: Add read[only|write] PIC relocation models to the C API and teach the TargetMachine API about it.

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56187

llvm-svn: 350279
2019-01-03 00:33:44 +00:00
Craig Topper df5304d8de [X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL.
The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX.

llvm-svn: 350272
2019-01-02 23:24:08 +00:00
Wouter van Oortmerssen ad72f68501 [WebAssembly] made assembler parse block_type
Summary:
This was previously ignored and an incorrect value was generated.

Also fixed the Disassembler's handling of block_type.

Reviewers: dschuff, aheejin

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D56092

llvm-svn: 350270
2019-01-02 23:23:51 +00:00
Xin Tong 33e3b4b9b3 [ThinLTO] Scan all variants of vague symbol for reachability.
Summary:
An alias can make one variant (but not all) live; we still need to scan all the others
in case the symbol is reachable from somewhere else.

Reviewers: tejohnson, grimar

Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D56117

llvm-svn: 350269
2019-01-02 23:18:20 +00:00
Pete Cooper 8d58048024 Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef.
The caller to EraseInstruction had this conditional:

    // ARC calls with null are no-ops. Delete them.
    if (IsNullOrUndef(Arg))

but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.

This adds support for both of these cases.

rdar://problem/47003805

llvm-svn: 350261
2019-01-02 21:00:02 +00:00
Nikita Popov cc6ef7f153 [BDCE] Remove instructions without demanded bits
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.

Differential Revision: https://reviews.llvm.org/D56185

llvm-svn: 350257
2019-01-02 20:02:14 +00:00
Pawel Bylica 119aa8fa5f Format AggresiveInstCombine.cpp. NFC
llvm-svn: 350255
2019-01-02 19:51:46 +00:00
Craig Topper 9d4860ec4e [X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.

This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.

I had to avoid selecting DEC if the result isn't used. This will become a SUB immediate, which will be turned into a CMP later by optimizeCompareInstr. This led to the one test change where we use a CMP instead of a DEC for an overflow intrinsic, since we only checked the flag.

This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.

Differential Revision: https://reviews.llvm.org/D55975

llvm-svn: 350245
2019-01-02 19:01:05 +00:00
Zachary Turner ba797b6dae [MS Demangler] Add a flag for dumping types without tag specifier.
Sometimes it's useful to be able to output demangled names without
tag specifiers like "struct", "class", etc.  This patch adds a
flag enabling this.

llvm-svn: 350241
2019-01-02 18:33:12 +00:00
Craig Topper 8dd7bd2cd7 [DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives the DAG combiner an opportunity to optimize the expanded sequence, taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similar sequences in the same basic block. By expanding both nodes at the same time we avoid prematurely optimizing the expansion until both the division and remainder have been replaced.

Improves the test case from PR38217. There may be additional opportunities after this.

Differential Revision: https://reviews.llvm.org/D56145

llvm-svn: 350239
2019-01-02 18:19:07 +00:00
Craig Topper 3109f3a4ab [LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary
By also promoting the input type we get a better idea of what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized; sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized, forcing a truncate into the input of the sign extend via a replace-all-uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.

By creating the extract with the correct scalar type when we legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate, which can be optimized by getNode to possibly remove the truncate, and then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.

This fixes the regression on X86 in D56156.

Differential Revision: https://reviews.llvm.org/D56176

llvm-svn: 350236
2019-01-02 17:58:30 +00:00
Craig Topper c562fae02b [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.

The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.

Differential Revision: https://reviews.llvm.org/D56156

llvm-svn: 350235
2019-01-02 17:58:27 +00:00
Wei Mi ecc89b76cb [PowerPC] Remove SeenUse check when optimizing conditional branch in
PPCPreEmitPeephole pass.

PPCPreEmitPeephole will convert a BC to a B when the conditional branch is
based on a CR set to a constant by CRSET or CRUNSET. This was added in
https://reviews.llvm.org/rL343100.

When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false, the original patch does not remove the branches but still
inserts the new unconditional branch and updates the successors, creating
inconsistent IR. Compiling the included synthetic test case shows the
problem we run into.

The patch simply removes the SeenUse condition when adding branches to the
InstrsToErase set.

Differential Revision: https://reviews.llvm.org/D56041

llvm-svn: 350223
2019-01-02 17:07:23 +00:00
Simon Pilgrim d8125726d5 [X86] Support SHLD/SHRD masked shift-counts (PR34641)
Peek through shift modulo masks while matching double shift patterns.

I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.

Differential Revision: https://reviews.llvm.org/D56199

llvm-svn: 350222
2019-01-02 17:05:37 +00:00
Hal Finkel 4f2381440d [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).

Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
though C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
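
As a worked illustration of the overflow (the numbers are hypothetical,
chosen only to show the wrap at 32 bits):

    #include "llvm/ADT/APInt.h"
    #include <cassert>
    using llvm::APInt;

    void scaleOverflowSketch() {
      APInt C2(32, 0x20000000);   // constant part of the decomposition C1*V + C2
      APInt Scale(32, 8);         // GEP scale factor
      APInt Narrow = C2 * Scale;  // 0x100000000 wraps to 0 in 32-bit arithmetic
      APInt Wide = C2.zext(64) * Scale.zext(64); // 64-bit intermediate keeps the value
      assert(Narrow == 0 && Wide.getZExtValue() == 0x100000000ULL);
    }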

After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).

As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.

Patch by me with contributions from Michael Ferguson (mferguson@cray.com).

Differential Revision: https://reviews.llvm.org/D38662

llvm-svn: 350220
2019-01-02 16:28:09 +00:00
Philip Pfaffe 6bc98ad7e8 Extend Module::getOrInsertGlobal to control the construction of the
GlobalVariable

Summary:
Extend Module::getOrInsertGlobal to accept a callback for creating a new
GlobalVariable if necessary instead of calling the GV constructor
directly with default arguments. Additionally, overload
getOrInsertGlobal to preserve the previous default behavior.
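
A minimal sketch of using the new overload, assuming it takes a
function_ref<GlobalVariable *()> callback as described above (the
global's name, linkage, and initializer here are illustrative):

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/GlobalVariable.h"
    #include "llvm/IR/Module.h"
    using namespace llvm;

    Constant *getOrCreateFlag(Module &M) {
      Type *Int32Ty = Type::getInt32Ty(M.getContext());
      return M.getOrInsertGlobal("my_flag", Int32Ty, [&] {
        // Only called if "my_flag" does not exist yet, so the caller
        // controls linkage and the initializer instead of getting defaults.
        return new GlobalVariable(M, Int32Ty, /*isConstant=*/false,
                                  GlobalValue::InternalLinkage,
                                  ConstantInt::get(Int32Ty, 0), "my_flag");
      });
    }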

Reviewers: chandlerc

Subscribers: hiraditya, llvm-commits, bollu

Differential Revision: https://reviews.llvm.org/D56130

llvm-svn: 350219
2019-01-02 15:41:47 +00:00
Andrea Di Biagio 0682afbaee [MCA] Minor refactoring of method DefaultResourceStrategy::select. NFCI
Common code used by the default resource strategy to select pipeline resources
has been moved to a helper function.

The new selection logic has been slightly rewritten to get rid of a redundant
zero check on the `ReadyMask` value. Before this patch, the select method internally
called `PowerOf2Floor` to compute the next ready pipeline resource.
However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input
value. By construction, `ReadyMask` can never be zero. This patch replaces the
call to `PowerOf2Floor` with an equivalent block of code which avoids the
redundant zero check. This gives a minor 3-3.5% speedup on a release build.
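
A sketch of the underlying trick (not the actual ResourceManager code;
the helper name is made up):

    #include <cassert>
    #include <cstdint>

    // Floor power of two for a mask that is non-zero by construction, so
    // the zero guard inside a general PowerOf2Floor is unnecessary.
    static uint64_t floorPowerOf2NonZero(uint64_t ReadyMask) {
      assert(ReadyMask != 0 && "ReadyMask can never be zero by construction");
      return UINT64_C(1) << (63 - __builtin_clzll(ReadyMask)); // GCC/Clang builtin
    }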

No functional change intended.

llvm-svn: 350218
2019-01-02 15:40:52 +00:00
Piotr Sobczak 378131bae0 [AMDGPU] Handle OR as operand of raw load/store
Summary:
Use isBaseWithConstantOffset() which handles OR as an operand
to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store.
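
The reason OR can stand in for ADD here, shown as a hypothetical
standalone example (not code from the patch):

    #include <cassert>
    #include <cstdint>

    // When the base has its low bits known to be zero and the offset fits
    // in those bits, OR and ADD produce the same address, so the OR node
    // can be folded as base + constant offset.
    uint64_t addrViaOr(uint64_t Base, uint64_t Offset) {
      assert((Base & 0xF) == 0 && Offset < 16 && "bit ranges must not overlap");
      return Base | Offset; // identical to Base + Offset under the assert
    }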

Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a

Reviewers: nhaehnle, mareko, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55999

llvm-svn: 350208
2019-01-02 09:47:41 +00:00
Craig Topper f7cc7e3201 [X86] Remove the separate SMUL8/UMUL8 X86ISD opcodes by merging with SMUL/UMUL. Remove the second result from X86ISD::UMUL.
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.

llvm-svn: 350206
2019-01-02 06:40:11 +00:00
Craig Topper d4db122483 [X86] Allow LowerSELECT and LowerBRCOND to directly lower i8 UMULO/SMULO.
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.

Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll

llvm-svn: 350205
2019-01-02 05:46:03 +00:00
Sanjay Patel 654e6aabb9 [InstCombine] canonicalize raw IR rotate patterns to funnel shift
The final piece of IR-level analysis to allow this was committed with:
rL350188

Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.

The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal to or better than the original code, even if
the target does not have an actual rotate instruction.
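
For reference, a minimal (illustrative) source-level rotate whose
shift/or expansion is the kind of raw IR pattern that can now be
canonicalized to the funnel-shift intrinsic:

    #include <cstdint>

    // UB-free rotate-left: both shift amounts stay within [0, 31], and the
    // whole expression maps onto llvm.fshl once recognized.
    uint32_t rotl32(uint32_t X, uint32_t S) {
      return (X << (S & 31)) | (X >> ((32 - S) & 31));
    }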

llvm-svn: 350199
2019-01-01 21:51:39 +00:00
Craig Topper 00b390a000 [X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code.
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.

This is inspired by the helper functions used by ARM and AArch64 for the same purpose.

The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.

llvm-svn: 350198
2019-01-01 19:34:11 +00:00
Robert Widmann db5b537f1e [LLVM-C] bool -> LLVMBool
llvm-svn: 350197
2019-01-01 19:03:37 +00:00
Robert Widmann 5d1dfa3eb6 [LLVM-C] Add Accessors for Discarding Value Names in the IR
Summary: Add accessors so the performance improvement from this setting is accessible to third parties.
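
A minimal sketch of the intended usage; the accessor names below
(LLVMContextSetDiscardValueNames / LLVMContextShouldDiscardValueNames)
are an assumption about what this patch exposes, not confirmed by the
message:

    #include <llvm-c/Core.h>
    #include <cstdio>

    int main() {
      LLVMContextRef Ctx = LLVMContextCreate();
      // Assumed accessor: skip storing names of local values in this context.
      LLVMContextSetDiscardValueNames(Ctx, 1);
      std::printf("discarding names: %d\n",
                  (int)LLVMContextShouldDiscardValueNames(Ctx));
      LLVMContextDispose(Ctx);
      return 0;
    }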

Reviewers: whitequark, deadalnix

Reviewed By: whitequark

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56179

llvm-svn: 350196
2019-01-01 18:56:51 +00:00
Sanjay Patel 738a863648 [x86] move/rename helper for horizontal op codegen; NFC
Preliminary commit as suggested in D56011.

llvm-svn: 350193
2019-01-01 16:08:36 +00:00
Nikita Popov bc9986e9ad Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, since initially all uses are dead, I'm not storing uses
whose user is also dead. This case is checked separately instead.

The previous attempt to land this led to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now, and the added test case demonstrates such an
instance.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 350188
2019-01-01 10:05:26 +00:00
Ayonam Ray e00606a1b2 Reverting the commit in revision 350186. The revision causes regressions in 4
tests.

llvm-svn: 350187
2019-01-01 07:28:55 +00:00
Ayonam Ray c471bb2e67 Omit range checks from jump tables when lowering switches with unreachable
default

During the lowering of a switch that results in the generation of a jump
table, a range check is performed before indexing into the jump table, to
test whether the switch value lies outside the jump table range, and a
conditional branch to the default block is inserted. When the default block
is unreachable, this conditional jump can be omitted. This patch implements
omitting this conditional branch for unreachable defaults.
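
An illustrative C++ pattern (hypothetical, not from the patch) where the
default is provably unreachable; whether a jump table is actually formed
still depends on the usual case-density heuristics:

    int dispatch(int X) {
      switch (X) {
      case 0: return 10;
      case 1: return 23;
      case 2: return 41;
      case 3: return 59;
      case 4: return 77;
      default: __builtin_unreachable(); // caller guarantees 0 <= X <= 4
      }
    }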

Review Reference: D52002

llvm-svn: 350186
2019-01-01 06:37:50 +00:00
Chen Zheng 4952e668f8 [InstCombine] canonicalize MUL with NEG operand
-X * Y --> -(X * Y)
X * -Y --> -(X * Y)

Differential Revision: https://reviews.llvm.org/D55961

llvm-svn: 350185
2019-01-01 01:09:20 +00:00
Craig Topper ed3ffae4a4 [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits.
Differential Revision: https://reviews.llvm.org/D56168

llvm-svn: 350179
2018-12-31 19:09:30 +00:00
Craig Topper bb0873cf46 [X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode.
Differential Revision: https://reviews.llvm.org/D56169

llvm-svn: 350178
2018-12-31 19:09:27 +00:00
Simon Pilgrim f2b9d10477 Keep tablegen commands in alphabetical order. NFCI.
Mentioned on D56167.

llvm-svn: 350176
2018-12-31 14:51:53 +00:00
Martin Storsjo 74d93f9b24 [AArch64] Accept "sve" as arch feature in assembler
Differential Revision: https://reviews.llvm.org/D56128

llvm-svn: 350174
2018-12-31 10:22:04 +00:00
Alexander Potapenko cea4f83371 [MSan] Handle llvm.is.constant intrinsic
MSan used to report false positives when the argument of the
llvm.is.constant intrinsic was uninitialized.
In fact, checking this argument is unnecessary: the intrinsic is only
evaluated at compile time, and its result doesn't depend on the value
of the argument.

llvm-svn: 350173
2018-12-31 09:42:23 +00:00
Craig Topper 802c4979ae [DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform.
Found while trying out some other changes so I don't really have a test case.

llvm-svn: 350172
2018-12-31 05:40:46 +00:00
Martin Storsjo 2018777836 [AArch64] Implement the .arch_extension directive
Differential Revision: https://reviews.llvm.org/D56131

llvm-svn: 350169
2018-12-30 21:06:32 +00:00
Kang Zhang 9d78c60bf4 [PowerPC] Fix machine verifier pass error for PATCHPOINT pseudo instruction that causes bad machine code
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will 
be reserved later, but the verifier doesn't know that).

So this patch calls setUsesTOCBasePtr before emitting the code for the pseudo, so the
verifier knows that X2 is a reserved register.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D56148

llvm-svn: 350165
2018-12-30 15:13:51 +00:00
David Bolvansky 90004149cc [NFC] Fixed extra semicolon warning
llvm-svn: 350162
2018-12-30 13:18:17 +00:00