Commit Graph

22929 Commits

Author SHA1 Message Date
Craig Topper 5d7ece8895 bar
llvm-svn: 299619
2017-04-06 04:02:31 +00:00
Craig Topper faf5a8553c foo
llvm-svn: 299618
2017-04-06 04:02:28 +00:00
Adam Nemet d5ffdd3605 [DAGCombine] Support FMF contract in fused multiple-and-sub too
This is a follow-on to r299096 which added support for fmadd.

Subtract does not have the case where with two multiply operands we commute in
order to fuse with the multiply with the fewer uses.

llvm-svn: 299572
2017-04-05 17:58:48 +00:00
Adam Nemet 99e347fc35 [DAGCombine] Remove commented-out code from r299096
llvm-svn: 299571
2017-04-05 17:58:44 +00:00
Keno Fischer 4ecee77c9a [ExecutionDepsFix] Don't recurse over the CFG
Summary:
Use an explicit work queue instead, to avoid accidentally
causing stack overflows for input with very large CFGs.

Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D31681

llvm-svn: 299569
2017-04-05 17:42:56 +00:00
Sanjay Patel b2f1621bb1 [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401)
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401

It's likely that other targets will want to enable this hook for scalar transforms, 
and there are probably other patterns that can use bitwise logic to reduce comparisons.

Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).

Differential Revision: https://reviews.llvm.org/D31483

llvm-svn: 299542
2017-04-05 14:09:39 +00:00
Jonas Paulsson 38a2da92bc [DAGCombiner] Don't make a BUILD_VECTOR with operands of illegal type.
When DAGCombiner visits a SIGN_EXTEND_INREG of a BUILD_VECTOR with
constant operands, a new BUILD_VECTOR node will be created transformed
constants.

Llvm-stress found a case where the new BUILD_VECTOR had constant operands
of an illegal type, because the (legal) element type is in fact not a legal
scalar type.

This patch changes this so that the new BUILD_VECTOR has the same operand
type as the old one.

Review: Eli Friedman, Nirav Dave
https://bugs.llvm.org//show_bug.cgi?id=32422

llvm-svn: 299540
2017-04-05 13:45:37 +00:00
Matt Arsenault 7b0d947404 Allow targets to opt-in to codegen in SCC order
Decouple this setting from EnableIRPA.

To support function calls on AMDGPU, it is necessary to
report the global register usage throughout the kernel's
call graph, so callees need to be handled first.

llvm-svn: 299487
2017-04-04 23:44:46 +00:00
Keno Fischer 282c62495f [ExecutionDepsFix] Don't revisit true dependencies
If an instruction has a true dependency, it makes sense for to use that
register for any undef read operands in the same instruction (we'll have
to wait for that register to become available anyway). This logic
was already implemented. However, the code would then still try to
revisit that instruction and break the dependency (and always fail,
since by definition a true dependency has to be live before the
instruction). Avoid revisiting such instructions as a performance
optimization. No functional change.

Differential Revision: https://reviews.llvm.org/D30173

llvm-svn: 299467
2017-04-04 20:30:47 +00:00
Daniel Sanders bee5739a7c [tablegen][globalisel] Add support for nested instruction matching.
Summary:
Lift the restrictions that prevented the tree walking introduced in the
previous change and add support for patterns like:
  (G_ADD (G_MUL (G_SEXT $src1), (G_SEXT $src2)), $src3) -> SMADDWrrr $dst, $src1, $src2, $src3
Also adds support for G_SEXT and G_ZEXT to support these cases.

One particular aspect of this that I should draw attention to is that I've
tried to be overly conservative in determining the safety of matches that
involve non-adjacent instructions and multiple basic blocks. This is intended
to be used as a cheap initial check and we may add a more expensive check in
the future. The current rules are:
* Reject if any instruction may load/store (we'd need to check for intervening
  memory operations.
* Reject if any instruction has implicit operands.
* Reject if any instruction has unmodelled side-effects.
See isObviouslySafeToFold().

Reviewers: t.p.northover, javed.absar, qcolombet, aditya_nandakumar, ab, rovka

Reviewed By: ab

Subscribers: igorb, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30539

llvm-svn: 299430
2017-04-04 13:25:23 +00:00
Matt Arsenault c82768290d DAG: Fix missing legalization for any_extend_vector_inreg operands
llvm-svn: 299389
2017-04-03 21:28:13 +00:00
Jun Bum Lim dee5565869 [CodeGenPrep] move aarch64-type-promotion to CGP
Summary:
Move the aarch64-type-promotion pass within the existing type promotion framework in CGP.
This change also support forking sexts when a new sext is required for promotion.
Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853.

Reviewers: jmolloy, mcrosier, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: llvm-commits, aemerson, rengolin, mcrosier

Differential Revision: https://reviews.llvm.org/D28680

llvm-svn: 299379
2017-04-03 19:20:07 +00:00
Craig Topper 3882613956 [DAGCombine][InstCombine] Fix inverted if condition in equivalent comments in DAGCombine and InstCombine. NFC
llvm-svn: 299378
2017-04-03 19:18:48 +00:00
Zvi Rackover d76a4d0ac6 Revert "[DAGCombine] A shuffle of a splat is always the splat itself"
This reverts commit r299047 which is incorrect because the
simplification may result in incorrect propogation of undefs to users of
the folded shuffle.

Thanks to Andrea Di Biagio for pointing this out.

llvm-svn: 299368
2017-04-03 17:41:19 +00:00
Craig Topper d33ee1b960 [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.

Differential Revision: https://reviews.llvm.org/D31565

llvm-svn: 299362
2017-04-03 16:34:59 +00:00
Simon Pilgrim 9daf9c047d [DAGCombiner] Check limits before accessing array element (PR32502)
llvm-svn: 299361
2017-04-03 15:27:49 +00:00
Sanjay Patel 665021e7ee [DAGCombiner] enable vector transforms for any/all {sign} bits set/clear
The code already allowed vector types in via "isInteger" (which might want
a more specific name), so use splat-friendly constant predicates to match
those types.

llvm-svn: 299304
2017-04-01 15:05:54 +00:00
Craig Topper 73250168e7 [DAGCombiner] Fix fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask) to explicitly ensure that only one of the inputs of each shuffle is a zero vector.
This can only happen when we have a mix of zero and undef elements and the two vectors have a different arrangement of zeros/undefs. The shuffle should eventually be constant folded to all zeros.

Fixes PR32484.

llvm-svn: 299291
2017-04-01 04:26:20 +00:00
Quentin Colombet fc8f048c13 Revert "Localizer fun"
This reverts commit r299283.

Didn't intend to commit this :(

llvm-svn: 299287
2017-04-01 01:26:21 +00:00
Quentin Colombet 35a47010b1 Revert "Instrument SDISel C++ patterns"
This reverts commit r299284.

Didn't intend to commit this :(

llvm-svn: 299286
2017-04-01 01:26:17 +00:00
Quentin Colombet 7f64318938 [RegBankSelect] Support REG_SEQUENCE for generic mapping
REG_SEQUENCE falls into the same category as COPY for operands mapping:
- They don't have MCInstrDesc with register constraints
- The input variable could use whatever register classes
- It is possible to have register class already assigned to the operands

In particular, given REG_SEQUENCE are always target specific because of
the subreg indices. Those indices must apply to the register class of
the definition of the REG_SEQUENCE and therefore, the target must set a
register class to that definition. As a result, the generic code can
always use that register class to derive a valid mapping for a
REG_SEQUENCE.

llvm-svn: 299285
2017-04-01 01:26:14 +00:00
Quentin Colombet b43da15602 Instrument SDISel C++ patterns
llvm-svn: 299284
2017-04-01 01:21:32 +00:00
Quentin Colombet 3c40b366c5 Localizer fun
WIP

llvm-svn: 299283
2017-04-01 01:21:28 +00:00
Sanjay Patel 16d458ea0d [DAGCombiner] refactor and/or-of-setcc to get rid of duplicated code; NFCI
llvm-svn: 299266
2017-03-31 21:30:50 +00:00
Sanjay Patel 34da36e74f [DAGCombiner] add fold for 'All sign bits set?'
(and (setlt X,  0), (setlt Y,  0)) --> (setlt (and X, Y),  0)

We have 7 similar folds, but this one got away. The fact that the
x86 test with a branch didn't change is probably a separate bug. We
may also be missing this and the related folds in instcombine.

llvm-svn: 299252
2017-03-31 20:28:06 +00:00
Sanjay Patel 61d3409535 [DAGCombiner] remove redundant code and add comments; NFCI
llvm-svn: 299241
2017-03-31 18:18:58 +00:00
Jan Sjodin f1a30f1800 Refactor code to create getFallThrough method in MachineBasicBlock.
Differential Revision: https://reviews.llvm.org/D27264

llvm-svn: 299227
2017-03-31 15:55:37 +00:00
Simon Pilgrim 1cdbfe44b1 [DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ASHR and INSERT_VECTOR_ELT
Followup to D31311

llvm-svn: 299221
2017-03-31 14:21:50 +00:00
Simon Pilgrim 3c81c34d8d [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.

Followup to D25691.

Differential Revision: https://reviews.llvm.org/D31311

llvm-svn: 299219
2017-03-31 13:54:09 +00:00
Simon Pilgrim 37b536e4b3 [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

llvm-svn: 299201
2017-03-31 11:24:16 +00:00
Simon Pilgrim 6bdc755519 Spelling mistakes in comments. NFCI.
llvm-svn: 299197
2017-03-31 10:59:37 +00:00
Peter Collingbourne c66018e247 Move llvm::emitLinkerFlagsForGlobalCOFF() to Mangler.
llvm-svn: 299183
2017-03-31 04:46:50 +00:00
Peter Collingbourne 2e0ffe9858 Move llvm::canBeOmittedFromSymbolTable() to Analysis.
llvm-svn: 299182
2017-03-31 04:46:31 +00:00
Eric Christopher b9c56d1235 getPristineRegs is not accurately considering shrink wrapping puts
registers not saved in certain blocks. Use explicit getCalleeSavedInfo
and isLiveIn instead.

This fixes pr32292.

Patch by Tim Shen!

llvm-svn: 299124
2017-03-30 22:34:20 +00:00
Adam Nemet edaec6de73 [DAGCombiner] Initial support for the fast-math flag contract
Now alternatively to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD
can also use the per operation FMF to allow fusion.

The idea here is not to port everything to the new scheme (e.g. fused
multiply-and-sub will be ported later) but that this work all the way from
clang.

The transformation is conditionalized on *both* the FADD and the FMUL having
the FMF contract flag.

Differential Revision: https://reviews.llvm.org/D31169

llvm-svn: 299096
2017-03-30 18:53:04 +00:00
Ahmed Bougacha 6dd6082472 [CodeGen] Pass SDAG an ORE, and replace FastISel stats with remarks.
In the long-term, we want to replace statistics with something
finer-grained that lets us gather per-function data.
Remarks are that replacement.

Create an ORE instance in SelectionDAGISel, and pass it to
SelectionDAG.

SelectionDAG was used so that we can emit remarks from all
SelectionDAG-related code, including TargetLowering and DAGCombiner.
This isn't used in the current patch but Adam tells me he's interested
for the fp-contract combines.

Use the ORE instance to emit FastISel failures as remarks (instead of
the mix of dbgs() dumps and statistics that we currently have).

Eventually, we want to have an API that tells us whether remarks are
enabled (http://llvm.org/PR32352) so that we don't emit expensive
remarks (in this case, dumping IR) when it's not needed.  For now, use
'isEnabled' as a crude replacement.

This does mean that the replacement for '-fast-isel-verbose' is now
'-pass-remarks-missed=isel'.  Additionally, clang users also need to
enable remark diagnostics, using '-Rpass-missed=isel'.

This also removes '-fast-isel-verbose2': there are no static statistics
that we want to only enable in asserts builds, so we can always use
the remarks regardless of the build type.

Differential Revision: https://reviews.llvm.org/D31405

llvm-svn: 299093
2017-03-30 17:49:58 +00:00
Sanjay Patel 6d5ba061f8 [DAGCombiner] add helper function for visitORLike; NFCI
This combines all of the equivalent clean-ups for foldAndOfSetCCs:
https://reviews.llvm.org/rL298938
https://reviews.llvm.org/rL298940
https://reviews.llvm.org/rL298944
https://reviews.llvm.org/rL298949
https://reviews.llvm.org/rL298950
https://reviews.llvm.org/rL299002
https://reviews.llvm.org/rL299013

The sins of code duplication are on full display here:
each function is missing a fold that wasn't copied over from its logical sibling. 

llvm-svn: 299091
2017-03-30 17:32:42 +00:00
Simon Pilgrim 68168d17b9 Spelling mistakes in comments. NFCI.
Based on corrections mentioned in patch for clang for PR27635

llvm-svn: 299072
2017-03-30 12:59:53 +00:00
Craig Topper eafcbe2d10 [APInt] Remove references to integerPartWidth outside of APFloat implentation.
Turns out integerPartWidth only explicitly defines the width of the tc functions in the APInt class. Functions that aren't used by APInt implementation itself. Many places in the code base already assume APInt is made up of 64-bit pieces. Explicitly assuming 64-bit here doesn't make that situation much worse. A full audit would need to be done if it ever changes.

llvm-svn: 299059
2017-03-30 05:49:03 +00:00
Zvi Rackover 7569436f81 [DAGCombine] A shuffle of a splat is always the splat itself
Summary:
Add a simplification:
shuffle (splat-shuffle), undef, M --> splat-shuffle

Fixes pr32449

Patch by Sanjay Patel

Reviewers: eli.friedman, RKSimon, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31426

llvm-svn: 299047
2017-03-30 01:42:57 +00:00
Eric Christopher 9ea300f08d If the DIUnit has flags passed on it then have DW_AT_producer be a combination of DICompileUnit::Producer and Flags.
The darwin behavior is unchanged and will continue to use DW_AT_APPLE_flags.

Patch by Zhizhou Yang

llvm-svn: 299038
2017-03-29 23:34:27 +00:00
Davide Italiano cdcdc97879 [DAGCombiner] Remove else after return. NFCI.
llvm-svn: 299022
2017-03-29 19:39:46 +00:00
Sanjay Patel ff211bb5a6 [DAGCombiner] unify type checks and add asserts; NFCI
We had a mix of type checks and usage that wasn't very clear.

llvm-svn: 299013
2017-03-29 18:08:01 +00:00
Sanjay Patel 087e922328 [DAGCombiner] reduce code duplication by rearranging checks; NFCI
llvm-svn: 299002
2017-03-29 15:37:33 +00:00
Sven van Haastregt 04bfa87f12 [MachineVerifier] Drop a spurious const
As of r298987 the argument is a value that we std::move, so it
shouldn't be const anymore.

llvm-svn: 298999
2017-03-29 15:25:06 +00:00
Sven van Haastregt 039a6d9f9f [MachineVerifier] Avoid reference to nullptr
Instantiation of the MachineVerifierPass through
PassInfo::getNormalCtor would yield a segfault since the default
constructor of the MachineVerifierPass takes a reference to nullptr.

Patch by Simone Pellegrini.

Differential Revision: https://reviews.llvm.org/D31387

llvm-svn: 298987
2017-03-29 09:08:25 +00:00
Adam Nemet 92a5cf4366 [SDAG] Remove -enable-fmf-dag
This is no longer needed as spotted by Sanjay in
https://reviews.llvm.org/D31165.

llvm-svn: 298963
2017-03-28 23:46:14 +00:00
Adam Nemet 6820f391eb [SDAG] Add AllowContract to SNodeFlags
Properly propagate the FMF from the LLVM IR to this flag.

This is toward moving fp-contraction=fast from an LLVM TargetOption to a
FastMathFlag in order to fix PR25721.

Differential Revision: https://reviews.llvm.org/D31165

llvm-svn: 298961
2017-03-28 23:46:08 +00:00
Sanjay Patel a41a5c29f0 [DAGCombiner] reduce code duplication with local variables; NFCI
llvm-svn: 298954
2017-03-28 22:45:53 +00:00
Sanjay Patel 9747d8070b [DAG] fix formatting; NFC
llvm-svn: 298950
2017-03-28 22:25:25 +00:00
Sanjay Patel d832eddde5 [DAGCombiner] remove redundant conditions and duplicated code; NFCI
llvm-svn: 298949
2017-03-28 22:22:50 +00:00
Sanjay Patel d2a26db991 [DAGCombiner] rename variables in foldAndOfSetCCs for easier reading; NFCI
llvm-svn: 298944
2017-03-28 21:40:41 +00:00
Matt Arsenault 323b021b5e Fix crashing on TargetCustom PseudoSourceValues
Default to something more reasonable if printCustom isn't implemented.

llvm-svn: 298941
2017-03-28 20:33:12 +00:00
Sanjay Patel 3230e4be11 [DAGCombiner] clean up foldAndOfSetCCs; NFCI
1. Fix bogus comment.
2. Early exit to reduce indent.
3. Change node pointer param to what it really is: an SDLoc.

llvm-svn: 298940
2017-03-28 20:28:16 +00:00
Sanjay Patel 16af53a395 [DAGCombiner] add helper function for and-of-setcc folds; NFC
This is just a cut and paste followed by clang-format. Clean up to follow.

llvm-svn: 298938
2017-03-28 19:58:46 +00:00
Sanjay Patel f01a1dad7f [x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality
Follow-up to:
https://reviews.llvm.org/rL298775

llvm-svn: 298933
2017-03-28 17:23:49 +00:00
Nirav Dave 472b5efc8b [SDAG] Deal with deleted node in PromoteIntShiftOp
Deal with case that initial node is deleted during dag-combine leading
to an assertional failure in promoteIntShiftOp.

Fixes PR32420.

Reviewers: spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31403

llvm-svn: 298931
2017-03-28 17:09:49 +00:00
Nirav Dave 5b414ebe63 [SDAG] Avoid deleted SDNodes PromoteIntBinOp
Reorder work in PromoteIntBinOp to prevent stale (deleted) nodes from
being used.

Fixes PR32340 and PR32345.

Reviewers: hfinkel, dbabokin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31148

llvm-svn: 298923
2017-03-28 15:41:12 +00:00
Nirav Dave 9b5563c52c [SDAG] Fix Stale SDNode usage in visitAND
Reorder CombineTo Calls to prevent potential use of deleted node.
Fixes PR32372.

Reviewers: jnspaulsson, RKSimon, uweigand, jonpa

Reviewed By: jonpa

Subscribers: jonpa, llvm-commits

Differential Revision: https://reviews.llvm.org/D31346

llvm-svn: 298920
2017-03-28 14:11:20 +00:00
Nirav Dave 423b24ae76 [SDAG] Minor cleanup of variable usage. NFC.
llvm-svn: 298916
2017-03-28 13:39:50 +00:00
Valery Pykhtin 910da13a07 MachineScheduler/ScheduleDAG: Add support for GetSubGraph
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30626

llvm-svn: 298896
2017-03-28 05:12:31 +00:00
Junmo Park c7479ba86a CodeGen : Check LLVM_ENABLE_DUMP definition for dumpMachineInstrRangeWithSlotIndex.
Summary:
Add missing check routine for dumpMachineInstrRangeWithSlotIndex including LLVM_DUMP_METHOD.

Reviewers: bkramer

Differential revision: https://reviews.llvm.org/D30367

llvm-svn: 298895
2017-03-28 04:14:25 +00:00
Javed Absar 3d59437093 Improve machine schedulers for in-order processors
This patch enables schedulers to specify instructions that 
cannot be issued with any other instructions.
It also fixes BeginGroup/EndGroup.

Reviewed by: Andrew Trick
Differential Revision: https://reviews.llvm.org/D30744

llvm-svn: 298885
2017-03-27 20:46:37 +00:00
Adrian Prantl e8450fdb48 Remove redundant check for nullptr.
llvm-svn: 298866
2017-03-27 17:36:31 +00:00
Adrian Prantl 035862b926 Remove unneccessary virtual destructor from DwarfExpression.
llvm-svn: 298865
2017-03-27 17:34:04 +00:00
Ahmed Bougacha 2d29998f22 [GlobalISel] Add a 'getConstantVRegVal' helper.
Use it to compare immediate operands.

llvm-svn: 298855
2017-03-27 16:35:27 +00:00
Sanjay Patel 9ebb68843e [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, 
we can replace memcmp calls with inline code that is both smaller and faster.

Differential Revision: https://reviews.llvm.org/D31290

llvm-svn: 298775
2017-03-25 16:05:33 +00:00
Simon Pilgrim dbc94db3f3 Apply clang-format as commented in D31311. NFCI.
llvm-svn: 298751
2017-03-24 23:47:41 +00:00
Reid Kleckner 6b78e16368 [codeview] Don't assert when the user violates the ODR
If we have an array of a user-defined aggregates for which there was an
ODR violation, then the array size will not necessarily match the number
of elements times the size of the element.

Fixes PR32383

llvm-svn: 298750
2017-03-24 23:28:42 +00:00
Davide Italiano 6a1209ee87 [MachineScheduler] Add missing machine pass dependency.
llvm-svn: 298736
2017-03-24 20:52:56 +00:00
Adrian Prantl b216e3e935 Refactor code to reduce indentation and improve readability. (NFC)
llvm-svn: 298665
2017-03-23 23:35:09 +00:00
Adrian Prantl f0ffc5233d Fix a bug when emitting debug info for partially constant global variables.
While fixing a malformed testcase, I discovered that the code
exercised by it was wrong, too.

llvm-svn: 298664
2017-03-23 23:35:00 +00:00
Dehao Chen b197d5b0a0 Fix trellis layout to avoid mis-identify triangle.
Summary:
For the following CFG:

A->B
B->C
A->C

If there is another edge B->D, then ABC should not be considered as triangle.

Reviewers: davidxl, iteratee

Reviewed By: iteratee

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D31310

llvm-svn: 298661
2017-03-23 23:28:09 +00:00
Dehao Chen 775341a14c Use isFunctionHotInCallGraph to set the function section prefix.
Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31225

llvm-svn: 298656
2017-03-23 23:14:11 +00:00
Jessica Paquette 5096a34726 [Outliner] Remove unused lambda capture.
Remove an unused lambda capture that made some bots unhappy.

llvm-svn: 298651
2017-03-23 22:17:20 +00:00
Jessica Paquette acffa28c63 [Outliner] Fix compile-time overhead for candidate choice
The old candidate collection method in the outliner caused some very large
regressions in compile time on large tests. For MultiSource/Benchmarks/7zip it
caused a 284.07 s or 1156% increase in compile time. On average, using the
SingleSource/MultiSource tests, it caused an average increase of 8 seconds in
compile time (something like 1000%).

This commit replaces that candidate collection method with a new one which
only visits each node in the tree once. This reduces the worst compile time
increase (still 7zip) to a 0.542 s overhead (22%) and the average compile time
increase on SingleSource and MultiSource to 0.018 s (4%).

llvm-svn: 298648
2017-03-23 21:27:38 +00:00
Adrian Prantl 8f3337950e Zero-Initialize PrevInstBB when entering a new MachineFunction.
It is not guaranteed that the memory used for MachineBasicBlocks in
the previous MachineFunction hasn't been freed, so holding on to a
pointer to the last function's isn't correct. Particularly I have
observed the sret.ll testcase failing because the first BasicBlock in
the new function happened to be allocated to the exact same memory as
the previously saved and (deleted) PrevInstBB.

llvm-svn: 298642
2017-03-23 20:23:42 +00:00
Nirav Dave e9ca32ae52 [SDAG] Fix zeroExtend assertion error
Move CombineTo preventing deleted node from being returned in
visitZERO_EXTEND.

Fixes PR32284.

Reviewers: RKSimon, bogner

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31254

llvm-svn: 298604
2017-03-23 15:01:50 +00:00
Adrian Prantl a27198853d Rename helper functions in DwarfExpression to be less misleading (NFC)
llvm-svn: 298523
2017-03-22 17:19:55 +00:00
Adrian Prantl 5503077cde Fix PR32298 by adding an early exit to getFrameIndexExprs().
Also add an assertion for the case that there are multiple FI
expressions with a DW_OP_LLVM_fragment; which should violate internal
constraints in DbgVariable.

llvm-svn: 298518
2017-03-22 16:50:16 +00:00
Aditya Nandakumar bc389badbc [GlobalISel]: Create VREGs for ConstantInt args
This patch changes the behavior of IRTranslating intrinsics where we
now create VREG + G_CONSTANT for ConstantInt values. We already do this
for FloatingPoint values. This makes it easier for the backends to
select code and it won't have to de-duplicate creation+selection of
constants.

Reviewed by: ab

llvm-svn: 298473
2017-03-22 01:16:39 +00:00
Adrian Prantl 0498baa4be Don't compose DWARF expressions with multiple subregisters.
If a register location can only be described by a complex expression
(i.e., multiple subregisters) it doesn't safely compose with another
complex expression. For example, it is not possible to apply a
DW_OP_deref operation to multiple DW_OP_pieces.

llvm-svn: 298472
2017-03-22 01:16:01 +00:00
Adrian Prantl 80e188d983 DwarfExpression: Defer emitting DWARF register operations
until the rest of the expression is known.

This is still an NFC refactoring in preparation of a subsequent bugfix.

This reapplies r298388 with a bugfix for non-physical frame registers.

llvm-svn: 298471
2017-03-22 01:15:57 +00:00
Ahmed Bougacha 15b3e8a93a [GlobalISel] Update DBG_VALUEs referencing DCE'd instructions.
Quentin points out that r298358 would cause us to emit different code
with debug info.  That's a big no-no; also erase the instructions that
only live thanks to DBG_VALUE users.

Adrian explained how this is an existing problem and an OK thing to do:
clang has allocas for all variables so shouldn't be affected at -O0, but
swift uses a bit of inlineasm to explicitly keep values live for the
purpose of debug info quality.  I'm not sure there is a better scheme.

llvm-svn: 298460
2017-03-21 23:42:54 +00:00
Ahmed Bougacha e8e1fa3a7c [GlobalISel] Don't translate br to layout successor.
MI can represent fallthrough to layout successor blocks, and our
post-isel representation uses that extensively.

We might as well use it too, to avoid translating and carrying along
unnecessary branches.

llvm-svn: 298459
2017-03-21 23:42:50 +00:00
Tim Northover 548feeecd3 GlobalISel: respect BooleanContents when extending i1.
The world isn't just x86 & ARM, some targets need to store -1 into the byte
when legalizing a bool store.

llvm-svn: 298453
2017-03-21 22:22:05 +00:00
Matthias Braun 8445cbd1ca SplitKit: Fix subreg copy related problems
Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases leading to crashs
  when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
  block.

This fixes http://llvm.org/PR32353

llvm-svn: 298448
2017-03-21 21:58:08 +00:00
Tim Northover dd4b9d6d7b GlobalISel: widen booleans by zero-extending to a byte.
A bool is represented by a single byte, which the ARM ABI requires to be either
0 or 1. So we cannot use G_ANYEXT when legalizing the type.

llvm-svn: 298439
2017-03-21 21:12:04 +00:00
Adrian Prantl 4dc0324ff8 Revert 298388 and 298389 because they broke some AMDGPU tests.
llvm-svn: 298401
2017-03-21 17:14:30 +00:00
Reid Kleckner b518054b87 Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.

Rename AttributeSetImpl to AttributeListImpl to follow suit.

It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.

Reviewers: sanjoy, javed.absar, chandlerc, pete

Reviewed By: pete

Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits

Differential Revision: https://reviews.llvm.org/D31102

llvm-svn: 298393
2017-03-21 16:57:19 +00:00
Adrian Prantl 1db6486fb1 Don't compose DWARF expressions with multiple subregisters.
If a register location can only be described by a complex expression
(i.e., multiple subregisters) it doesn't safely compose with another
complex expression. For example, it is not possible to apply a
DW_OP_deref operation to multiple DW_OP_pieces.

llvm-svn: 298389
2017-03-21 16:37:39 +00:00
Adrian Prantl dabee8e57c DwarfExpression: Defer emitting DWARF register operations
until the rest of the expression is known.

This is still an NFC refactoring in preparation of a subsequent bugfix.

llvm-svn: 298388
2017-03-21 16:37:35 +00:00
Matt Arsenault dce313c3cf DAG: Fold bitcast/extract_vector_elt of undef to undef
Fixes not eliminating store when intrinsic is lowered to undef.

llvm-svn: 298385
2017-03-21 16:20:16 +00:00
Volkan Keles 47debaeef0 [GlobalISel] Move isTriviallyDead to Utils. NFC.
Make it accessible by the targets to avoid code duplication.

llvm-svn: 298358
2017-03-21 10:47:35 +00:00
Jonas Paulsson 54c7680e1f [DAGTypeLegalizer] Handle widening truncate to vector of i1.
Previously, PromoteIntRes_TRUNCATE() did not handle the case where
the operand needs widening, which resulted in llvm_unreachable().

This patch adds the needed handling, along with a test case.

Review: Eli Friedman, Simon Pilgrim.
https://reviews.llvm.org/D31077

llvm-svn: 298357
2017-03-21 10:24:14 +00:00
Volkan Keles 75bdc7690e [GlobalISel] Translate shufflevector
Reviewers: qcolombet, aditya_nandakumar, t.p.northover, javed.absar, ab, dsanders

Reviewed By: javed.absar

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30962

llvm-svn: 298347
2017-03-21 08:44:13 +00:00
Zachary Turner 82a0c97b32 Add a function to MD5 a file's contents.
In doing so, clean up the MD5 interface a little.  Most
existing users only care about the lower 8 bytes of an MD5,
but for some users that care about the upper and lower,
there wasn't a good interface.  Furthermore, consumers
of the MD5 checksum were required to handle endianness
details on their own, so it seems reasonable to abstract
this into a nicer interface that just gives you the right
value.

Differential Revision: https://reviews.llvm.org/D31105

llvm-svn: 298322
2017-03-20 23:33:18 +00:00
Adrian Prantl 956484b7b5 Replace uses of DwarfExpression::addMachineReg* with addMachineRegExpression
and mark the methods as protected.

Besides reducing the surface area of DwarfExpression, this is in
preparation for an upcoming bugfix in the DwarfExpression
implementation, for which it will be necessary to defer emitting
register operations until the rest of the expression is known.

NFC

llvm-svn: 298309
2017-03-20 21:35:09 +00:00
Adrian Prantl 52884b7be8 Make implementation details in DwarfExpression protected. (NFC)
llvm-svn: 298308
2017-03-20 21:34:19 +00:00
Reid Kleckner 8819c73878 [WinEH] Adjust decision to emit SEH moves for leaf functions
Move the check for "MF->hasWinCFI()" up into the calculation of the
shouldEmitMoves boolean, rather than putting it in the early returning
if. This ensures that endFunction doesn't try to emit .seh_* directives
for leaf functions.

llvm-svn: 298276
2017-03-20 17:45:59 +00:00
Tim Northover 89268b183f GlobalISel: allow quad-precision values to be dumped.
Otherwise the fallback path fails with an assertion on AAPCS AArch64 targets,
when "long double" is encountered.

llvm-svn: 298273
2017-03-20 16:52:08 +00:00
Diana Picus d79253a9f7 [GlobalISel] Use the correct calling conv for calls
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.

Differential Revision: https://reviews.llvm.org/D31039

llvm-svn: 298254
2017-03-20 14:40:18 +00:00
Simon Pilgrim 8424df7dea Fix constant folding of fp2int to large integers
We make the assumption in most of our constant folding code that a fp2int will target an integer of 128-bits or less, calling the APFloat::convertToInteger with only uint64_t[2] of raw bits for the result.

Fuzz testing (PR24662) showed that we don't handle other cases at all, resulting in stack overflows and all sorts of crashes.

This patch uses the APSInt version of APFloat::convertToInteger instead to better handle such cases.

Differential Revision: https://reviews.llvm.org/D31074

llvm-svn: 298226
2017-03-19 16:50:25 +00:00
Ahmed Bougacha 931904d777 [GlobalISel] Don't select trivially dead instructions.
Folding instructions when selecting can cause them to become dead.
Don't select these dead instructions (if they don't have other side
effects, and don't define physical registers).

Preserve existing tests by adding COPYs.

In some tests, the G_CONSTANT vregs never get constrained to a class:
the only use of the vreg was folded into another instruction, so the
G_CONSTANT, now dead, never gets selected.

llvm-svn: 298224
2017-03-19 16:13:00 +00:00
Ahmed Bougacha 7f2d17331c [GlobalISel] Move method definition to the proper file. NFC.
llvm-svn: 298221
2017-03-19 16:12:48 +00:00
Oren Ben Simhon 0ef61ec32a [MIR] Support Customed Register Mask and CSRs
The MIR printer dumps a string that describe the register mask of a function.
A static predefined list of register masks matches a static list of strings.
However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails.
This patch adds support to custom register mask printing and dumping.
Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic.
As such this data needs to be dumped and parsed back to the Machine Register Info.

Differential Revision: https://reviews.llvm.org/D30971

llvm-svn: 298207
2017-03-19 08:14:18 +00:00
Matthias Braun e6ff30b696 ExecutionDepsFix: Let targets specialize the pass; NFC
Let targets specialize the pass with the register class so we can get a
parameterless default constructor and can put the pass into the pass
registry to enable testing with -run-pass=.

llvm-svn: 298184
2017-03-18 05:08:58 +00:00
Matthias Braun e9f8209e87 ExecutionDepsFix: Normalize names; NFC
Normalize ExeDepsFix, execution-fix, ExecutionDependencyFix and
ExecutionDepsFix to the last one.

llvm-svn: 298183
2017-03-18 05:05:40 +00:00
Matthias Braun 6e67052912 CodeGen.cpp: Sort alphabetically; NFC
llvm-svn: 298182
2017-03-18 05:05:32 +00:00
Nirav Dave ac6081cb67 Make library calls sensitive to regparm module flag (Fixes PR3997).
Reviewers: mkuper, rnk

Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27050

llvm-svn: 298179
2017-03-18 00:44:07 +00:00
Nirav Dave 6de2c77944 Capitalize ArgListEntry fields. NFC.
llvm-svn: 298178
2017-03-18 00:43:57 +00:00
Evgeniy Stepanov 51c962f72e Add !associated metadata.
This is an ELF-specific thing that adds SHF_LINK_ORDER to the global's section
pointing to the metadata argument's section. The effect of that is a reverse dependency
between sections for the linker GC.

!associated does not change the behavior of global-dce. The global
may also need to be added to llvm.compiler.used.

Since SHF_LINK_ORDER is per-section, !associated effectively enables
fdata-sections for the affected globals, the same as comdats do.

Differential Revision: https://reviews.llvm.org/D29104

llvm-svn: 298157
2017-03-17 22:17:24 +00:00
Eli Friedman 46ddab3810 [SelectionDAG] Remove redundant stores more aggressively.
Handle TokenFactors more aggressively in
SDValue::reachesChainWithoutSideEffects.  This isn't really a
very effective change anymore because of other changes to
chain handling, but it's a cheap check, and the expanded
comments are still useful.

It might be possible to loosen the hasOneUse() requirement with a
deeper analysis, but a naive implementation of that check would be
expensive.

Differential Revision: https://reviews.llvm.org/D29845

llvm-svn: 298156
2017-03-17 22:15:50 +00:00
Jun Bum Lim 4230101def [CodeGenPrep]Restructure promoting Ext to form ExtLoad
Summary:
Instead of just looking for a load which is mergable with Ext to form ExtLoad, trying to promote Exts as long as the cost is acceptable. This change is not a NFC as it continue promoting Exts even after finding a load during promotions; the change in arm64-codegen-prepare-extload.ll described in 2.b might show the case.
This change was motivated from D26524.  Based on this change, I will move the transformation performed in aarch64-type-promotion into CGP.

Reviewers: jmolloy, qcolombet, mcrosier, javed.absar

Reviewed By: qcolombet

Subscribers: rengolin, llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D27853

llvm-svn: 298114
2017-03-17 19:05:21 +00:00
Simon Pilgrim 5a68d401c7 [SelectionDAG] Add SelectionDAG.computeKnownBits test support for ISD::ABS
llvm-svn: 298108
2017-03-17 17:45:36 +00:00
Matthias Braun f0b68d3fbc SplitKit: Correctly implement partial subregister copies
- This fixes a bug where subregister incompatible with the vregs register
  class where used.
- Implement the case where multiple copies are necessary to cover a
  given lanemask.

Differential Revision: https://reviews.llvm.org/D30438

llvm-svn: 298025
2017-03-17 00:41:39 +00:00
Matthias Braun fa289ec7f0 VirtRegMap: Correctly deal with bundles when deleting identity copies.
This fixes two problems when VirtRegMap encounters bundles:

- When substituting a vreg subregister def with an actual register the
  internal read flag must be cleared.
- Removing an identity COPY from a bundle needs to use
  removeFromBundle() and a newly introduced function to update
  SlotIndexes.

No testcase here, because none of the in-tree targets trigger this,
however an upcoming commit of mine will need this and the testcase there
will trigger this.

Differential Revision: https://reviews.llvm.org/D30925

llvm-svn: 298024
2017-03-17 00:41:33 +00:00
Eric Christopher 53da761570 Remove LessPreciseFPMADOption from TargetOptions along with all of the
associated command line options and functions - it's currently unused
in all of llvm and clang other than being set and reset.

llvm-svn: 298023
2017-03-17 00:38:03 +00:00
Reid Kleckner 45707d4d5a Remove getArgumentList() in favor of arg_begin(), args(), etc
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.

In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D31052

llvm-svn: 298010
2017-03-16 22:59:15 +00:00
Simon Pilgrim fbfb19b1d7 Remove redundant conditions (PR31753). NFCI.
llvm-svn: 297976
2017-03-16 19:52:00 +00:00
Adrian Prantl dc855221af Attempt to fix bot failure on Windows.
Looks like this expression was accidentally using 32-bit arithmetic.

llvm-svn: 297969
2017-03-16 18:06:04 +00:00
Adrian Prantl 3621309e8d Rearrange fields. NFC.
llvm-svn: 297967
2017-03-16 17:42:47 +00:00
Adrian Prantl a63b8e8227 Rename methods in DwarfExpression to adhere to the LLVM coding guidelines.
NFC.

llvm-svn: 297966
2017-03-16 17:42:45 +00:00
Adrian Prantl 981f03e6a2 PR32288: More efficient encoding for DWARF expr subregister access.
Citing http://bugs.llvm.org/show_bug.cgi?id=32288

  The DWARF generated by LLVM includes this location:

  0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
  0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
  to assume the DWARF consumer knows which part of a register
  logically holds the value (low bytes, high bytes, how many bytes,
  etc) for a primitive value like an integer.

This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.

(This reapplies r297960 with two additional testcase updates).

rdar://problem/31069390
https://reviews.llvm.org/D31010

llvm-svn: 297965
2017-03-16 17:14:56 +00:00
Adrian Prantl c5b3351750 Revert "PR32288: More efficient encoding for DWARF expr subregister access."
This reverts commit 2bf453116889a576956892ea9683db4fcd96e30e while investigating buildbot breakage.

llvm-svn: 297962
2017-03-16 16:38:22 +00:00
Adrian Prantl 8508e87998 PR32288: More efficient encoding for DWARF expr subregister access.
Citing http://bugs.llvm.org/show_bug.cgi?id=32288

  The DWARF generated by LLVM includes this location:

  0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
  0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
  to assume the DWARF consumer knows which part of a register
  logically holds the value (low bytes, high bytes, how many bytes,
  etc) for a primitive value like an integer.

This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.

rdar://problem/31069390
https://reviews.llvm.org/D31010

llvm-svn: 297960
2017-03-16 16:34:14 +00:00
Oren Ben Simhon da59ffae91 Fixing typos.
llvm-svn: 297932
2017-03-16 08:15:52 +00:00
Jonas Paulsson 84319bfc40 [SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.
Don't scalarize VSELECT->SETCC when operands/results needs to be widened,
or when the type of the SETCC operands are different from those of the VSELECT.

(VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled.

The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is
no longer needed and has been removed.

Updated tests:

test/CodeGen/ARM/vuzp.ll
test/CodeGen/NVPTX/f16x2-instructions.ll
test/CodeGen/X86/2011-10-19-widen_vselect.ll
test/CodeGen/X86/2011-10-21-widen-cmp.ll
test/CodeGen/X86/psubus.ll
test/CodeGen/X86/vselect-pcmp.ll

Review: Eli Friedman, Simon Pilgrim
https://reviews.llvm.org/D29489

llvm-svn: 297930
2017-03-16 07:17:12 +00:00
Kyle Butt 08655997eb CodeGen: BlockPlacement: Reduce TriangleChainCount to 2
This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
test-suite. Only 5 targets in the entire llvm test-suite are affected,
and on those 5 targets the size increase is 0.027%

llvm-svn: 297925
2017-03-16 01:32:29 +00:00
Craig Topper 6c66bbca4a [StackColoring] Remove unused header file for post-order traversal. Update comment that indicated we were using it when we really use a depth-first search. NFC
llvm-svn: 297904
2017-03-15 22:40:26 +00:00
Matt Arsenault 02d915be90 CodeGenPrepare: Sink addressing modes for atomics
llvm-svn: 297903
2017-03-15 22:35:20 +00:00
Eric Christopher 17ce8a2f5e Fix up grammar in a comment.
llvm-svn: 297898
2017-03-15 21:50:46 +00:00
Zvi Rackover 48cdde0e59 [DAGCombine] Bail out if can't create a vector with at least two elements
Summary:

Fixes pr32278

Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30978

llvm-svn: 297878
2017-03-15 19:48:36 +00:00
Ahmed Bougacha 2fb8030748 [GlobalISel] Avoid translating synthetic constants to new G_CONSTANTS.
Currently, we create a G_CONSTANT for every "synthetic" integer
constant operand (for instance, for the G_GEP offset).
Instead, share the G_CONSTANTs we might have created by going through
the ValueToVReg machinery.

When we're emitting synthetic constants, we do need to get Constants from
the context.  One could argue that we shouldn't modify the context at
all (for instance, this means that we're going to use a tad more memory
if the constant wasn't used elsewhere), but constants are mostly
harmless.  We currently do this for extractvalue and all.

For constant fcmp, this does mean we'll emit an extra COPY, which is not
necessarily more optimal than an extra materialized constant.
But that preserves the current intended design of uniqued G_CONSTANTs,
and the rematerialization problem exists elsewhere and should be
resolved with a single coherent solution.

llvm-svn: 297875
2017-03-15 19:21:11 +00:00
Tim Northover 0d98b03b9f ARM: avoid clobbering register in v6 jump-table expansion.
If we got unlucky with register allocation and actual constpool placement, we
could end up producing a tTBB_JT with an index that's already been clobbered.

Technically, we might be able to fix this situation up with a MOV, but I think
the constant islands pass is complex enough without having to deal with more
weird edge-cases.

llvm-svn: 297871
2017-03-15 18:38:13 +00:00
Ahmed Bougacha 07f247b6c2 [GlobalISel] Insert translated switch icmp blocks after switch parent.
Now that we preserve the IR layout, we would end up with all the newly
synthesized switch comparison blocks at the end of the function.
Instead, use a hopefully more reasonable layout, with the comparison
blocks immediately following the switch comparison blocks.

llvm-svn: 297869
2017-03-15 18:22:37 +00:00
Ahmed Bougacha a61c214f51 [GlobalISel] Preserve IR block layout.
It makes the output function layout more predictable;  the layout has
an effect on performance, we don't want it to be at the mercy of the
translator's visitation order and such.
The predictable output is also easier to digest.

getOrCreateBB isn't appropriately named anymore, as it never needs to
create anything.  Rename it and extract the MBB creation logic out of it.

A couple tests were sensitive to the order. Update them.

llvm-svn: 297868
2017-03-15 18:22:33 +00:00
Craig Topper bcb6093610 [CodeGen] Use APInt::setLowBits/setHighBits/setBitsFrom in more places
This patch replaces ORs with getHighBits/getLowBits etc. with setLowBits/setHighBits/setBitsFrom.

In a few of the places we weren't ORing, but the KnownZero/KnownOne vectors were already initialized to zero. We exploit this in most places already there were just some that were inconsistent.

Differential Revision: https://reviews.llvm.org/D30965

llvm-svn: 297860
2017-03-15 16:53:53 +00:00
Ahmed Bougacha 2b7f1377aa [GlobalISel] Remove dead member. NFC.
llvm-svn: 297855
2017-03-15 16:29:32 +00:00
Peter Collingbourne d44a01aae6 CodeGen: Use the source filename as the argument to .file, rather than the module ID.
Using the module ID here is wrong for a couple of reasons:
1) The module ID is not persisted, so we can end up with different
   object file contents given the same input file (for example if the same
   file is accessed via different paths).
2) With ThinLTO the module ID field may contain the path to a bitcode file,
   which is incorrect, as the .file argument is supposed to contain the path to
   a source file.

Differential Revision: https://reviews.llvm.org/D30584

llvm-svn: 297853
2017-03-15 16:24:52 +00:00
Simon Pilgrim 018eedd9a5 [SelectionDAG] Support BUILD_VECTOR implicit truncation in SelectionDAG::ComputeNumSignBits (PR32273)
llvm-svn: 297852
2017-03-15 16:22:24 +00:00
Nuno Lopes ae455c562d fix gcc -Wmisleading-indentation [NFC]
llvm-svn: 297816
2017-03-15 09:33:33 +00:00
Taewook Oh 1b192336d8 NFC: Reformats comments according to the coding guildelines.
llvm-svn: 297808
2017-03-15 06:29:23 +00:00
Taewook Oh fb1833efeb [BranchFolding] Merge debug locations from common tail instead of removing
Summary: D25742 improved the precision of debug locations for PGO by removing debug locations from common tail when tail-merging. However, if identical insturctions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location of identical instructions across SameTails and assign it to the instruction in the common tail, so that the debug locations are maintained if they are same across identical instructions.

Reviewers: aprantl, probinson, MatzeB, rob.lougher

Reviewed By: aprantl

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D30226

llvm-svn: 297805
2017-03-15 05:44:59 +00:00
Peter Collingbourne 7f6e2c97b8 Ensure that prefix data is preserved with subsections-via-symbols
On MachO platforms that use subsections-via-symbols dead code stripping will
drop prefix data. Unfortunately there is no great way to convey the relationship
between a function and its prefix data to the linker. We are forced to use a bit
of a hack: we give the prefix data it’s own symbol, and mark the actual function
entry an .alt_entry.

Patch by Moritz Angermann!

Differential Revision: https://reviews.llvm.org/D30770

llvm-svn: 297804
2017-03-15 04:18:16 +00:00
Volkan Keles 4862c63594 [GlobalISel] IRTranslator: Return the scalar for <1 x Ty> constant vectors
Summary:
<1 x Ty> is not a legal vector type in LLT, we shouldn’t build G_MERGE_VALUES
instruction for them.

Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30948

llvm-svn: 297792
2017-03-14 23:45:06 +00:00
Daniel Sanders 8a4bae9993 [globalisel][tblgen] Add support for ComplexPatterns
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.

Depends on D30046, D29712

Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30089

llvm-svn: 297782
2017-03-14 21:32:08 +00:00
Simon Pilgrim cf2da96c82 [SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.

ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.

At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.

Differential Revision: https://reviews.llvm.org/D29639

llvm-svn: 297780
2017-03-14 21:26:58 +00:00
Sanjay Patel 8dd99dce6c [DAG] vector div/rem with any zero element in divisor is undef
This is the backend counterpart to:
https://reviews.llvm.org/rL297390
https://reviews.llvm.org/rL297409
and follow-up to:
https://reviews.llvm.org/rL297384

It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, 
but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those 
someday.

Differential Revision: https://reviews.llvm.org/D30826

llvm-svn: 297762
2017-03-14 18:06:28 +00:00
Benjamin Kramer babcbddae0 [CodeGen] Fix -Wreorder warning.
llvm-svn: 297729
2017-03-14 10:29:47 +00:00
Sam Parker 916b1ba617 [ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.

Differential Revision: https://reviews.llvm.org/D30708

llvm-svn: 297716
2017-03-14 09:13:22 +00:00
Oren Ben Simhon fe34c5e429 Disable Callee Saved Registers
Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller.
Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list.
The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee.
The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC.

Differential Revision: https://reviews.llvm.org/D28566

llvm-svn: 297715
2017-03-14 09:09:26 +00:00
Nirav Dave 4fc8401abf Recommitting Craig Topper's patch now that r296476 has been recommitted.
When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.

This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 297698
2017-03-14 01:42:23 +00:00
Nirav Dave 54e22f33d9 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Adrian Prantl 19aadf57c8 Revert "Debug Info: Add basic support for external types references."
This reverts commit r242302. External type refs of this form were
never used by any LLVM frontend so this is effectively dead code.
(They were introduced to support clang module debug info, but in the
end we came up with a better design that doesn't use this feature at
all.)

rdar://problem/25897929

Differential Revision: https://reviews.llvm.org/D30917

llvm-svn: 297684
2017-03-13 22:56:14 +00:00
Marcello Maggioni 598d89a3f4 [IPRA] Change algorithm for RegUsageInfoCollector.
The previous algorithm for RegUsageInfoCollector had pretty bad
performance on architectures with a lot of registers that alias
a lot one another, because we potentially iterate for every register
over all the aliasing registers. This costs even more if the function
is small and doesn't define a lot of registers.
This patch changes the algorithm to one that while iterating over
all the registers it will iterate over the aliasing registers only
if the register itself is defined.
This should be faster based on the assumption that only a subset
of the whole LLVM registers set is actually defined in the function.

Differential Revision: https://reviews.llvm.org/D30880

llvm-svn: 297673
2017-03-13 21:42:53 +00:00
Volkan Keles 38a91a0de6 GlobalISel: Translate ConstantDataVector
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, javed.absar, ab

Reviewed By: qcolombet, dsanders, ab

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30216

llvm-svn: 297670
2017-03-13 21:36:19 +00:00
Jessica Paquette c984e21394 [Outliner] Add tail call support
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.

Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.

llvm-svn: 297653
2017-03-13 18:39:33 +00:00
Simon Pilgrim fa97699d09 Fix -Wsentinel warning
llvm-svn: 297560
2017-03-11 12:56:02 +00:00
Amaury Sechet d1ec5d54cf Use setBits in SelectionDAG
Summary: As per title.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30836

llvm-svn: 297559
2017-03-11 11:24:03 +00:00
Quentin Colombet ee8a4f51c4 [IRTranslator] Simplify error handling for translating constants. NFC.
We don't need to check whether the fallback path is enabled to return
false. Just do that all the time on error cases, the caller knows (or
at least should know!) how to handle the failing case.

llvm-svn: 297535
2017-03-11 00:28:33 +00:00
Stanislav Mekhanoshin b546174b0e Fix subreg value numbers in handleMoveUp
The problem can occur in presence of subregs. If we are swapping two
instructions defining different subregs of the same register we will
get a new liveout from a block. We need to preserve value number for
block's liveout for successor block's livein to match.

Differential Revision: https://reviews.llvm.org/D30558

llvm-svn: 297534
2017-03-11 00:14:52 +00:00
Simon Pilgrim 455e2f3313 Strip trailing whitespace.
llvm-svn: 297529
2017-03-10 22:53:19 +00:00
Simon Pilgrim 83c37c4dac Fix redundant condition (PR32138)
'!A || (A && B)' is equivalent to '!A || B'

llvm-svn: 297527
2017-03-10 22:44:47 +00:00
Volkan Keles 225921ac41 [GlobalISel] LegalizerHelper: Lower (G_FSUB X, Y) to (G_FADD X, (G_FNEG Y))
Summary: No test case as none of the in-tree targets with GlobalISel support has this condition.

Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab

Reviewed By: qcolombet

Subscribers: dberris, rovka, kristof.beyls, llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D30786

llvm-svn: 297512
2017-03-10 21:25:09 +00:00
Volkan Keles 970fee4bfe GlobalISel: Translate ConstantAggregateZero vectors
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30259

llvm-svn: 297509
2017-03-10 21:23:13 +00:00
Volkan Keles 04cb08cc83 [GlobalISel] Translate insertelement and extractelement
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30761

llvm-svn: 297495
2017-03-10 19:08:28 +00:00
Simon Pilgrim 7dedbfa89d [SelectionDAG] Add support for BUILD_VECTOR to ComputeNumSignBits
llvm-svn: 297492
2017-03-10 18:36:46 +00:00
Volkan Keles 685fbda217 [GlobalISel] Make LegalizerInfo accessible in LegalizerHelper
Summary:
We don’t actually use LegalizerInfo in Legalizer pass, it’s just passed
as an argument.

In order to check if an instruction is legal or not, we need to get LegalizerInfo
by calling `MI.getParent()->getParent()->getSubtarget().getLegalizerInfo()`.
Instead, make LegalizerInfo accessible in LegalizerHelper.

Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, kristof.beyls

Reviewed By: qcolombet

Subscribers: dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D30838

llvm-svn: 297491
2017-03-10 18:34:57 +00:00
Amaury Sechet 62e0759d56 [SelectionDAG] Make SelectionDAG aware of the known bits in USUBO and SSUBO and SUBC.
Summary:
Depends on D30379

This improves the state of things for the sub class of operation.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30436

llvm-svn: 297482
2017-03-10 17:26:44 +00:00
Amaury Sechet 69fa16c810 [SelectionDAG] Make SelectionDAG aware of the known bits in UADDO and SADDO.
Summary: As per title. This is extracted from D29872 and I threw SADDO in.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30379

llvm-svn: 297479
2017-03-10 17:06:52 +00:00
Simon Pilgrim b02667c469 [APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.

Differential Revision: https://reviews.llvm.org/D30780

llvm-svn: 297458
2017-03-10 13:44:32 +00:00
Ahmed Bougacha d22b84b9d0 [GlobalISel] Use ImmutableCallSite instead of templates. NFC.
ImmutableCallSite abstracts away CallInst and InvokeInst. Use it!

llvm-svn: 297426
2017-03-10 00:25:44 +00:00
Ahmed Bougacha 4ec6d5abed [GlobalISel] Fallback when failing to translate invoke.
We unintentionally stopped falling back in r293670.

While there, change an unusual construct.

llvm-svn: 297425
2017-03-10 00:25:35 +00:00
Tim Northover aa995c98f4 GlobalISel: support trivial inlineasm calls.
They're used for nefarious purposes by ObjC.

llvm-svn: 297422
2017-03-09 23:36:26 +00:00
Eli Friedman 93f47e5ffb Refactor alias check from MISched into common helper. NFC.
Differential Revision: https://reviews.llvm.org/D30598

llvm-svn: 297421
2017-03-09 23:33:36 +00:00
Amaury Sechet e7d102cf02 [DAGCombiner] Do various combine on uaddo.
Summary: This essentially does the same transform as for ADC.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30417

llvm-svn: 297416
2017-03-09 22:47:00 +00:00
Tim Northover d1e951e5eb GlobalISel: inform FrameLowering when we emit a function call.
Amongst other things (I expect) this is necessary to ensure decent backtraces
when an "unreachable" is involved.

llvm-svn: 297413
2017-03-09 22:00:39 +00:00
Tim Northover 7a9ea8f628 GlobalISel: put debug info for static allocas in the MachineFunction.
The good reason to do this is that static allocas are pretty simple to handle
(especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline
should give some kind of performance benefit.

The bad reason is that the debug pipeline is an unholy mess of implicit
contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a
load or not involves the services of at least 3 soothsayers and the sacrifice
of at least one chicken.  And it still gets it wrong if the variable is at SP
directly.

llvm-svn: 297410
2017-03-09 21:12:06 +00:00
Amaury Sechet 10425de063 [DAGCombiner] Do various combine on usubo.
Summary: This essentially does the same transform as for SUBC.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30437

llvm-svn: 297404
2017-03-09 19:28:00 +00:00
Sanjay Patel df21979db7 [DAG] recognize div/rem by 0 as undef before trying constant folding
As discussed in the review thread for rL297026, this is actually 2 changes that 
would independently fix all of the test cases in the patch:

1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before 
   foldBinopIntoSelect() as a matter of efficiency.

I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().

I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).

Differential Revision:
https://reviews.llvm.org/D30741

llvm-svn: 297384
2017-03-09 15:02:25 +00:00
Adam Nemet 5361b82d54 [SSP] In opt remarks, stream Function directly
With this, it shows up as an attribute in YAML and non-printable characters
are properly removed by GlobalValue::getRealLinkageName.

llvm-svn: 297362
2017-03-09 06:10:27 +00:00
Matt Arsenault 9a3fd87523 DAG: Check no signed zeros instead of unsafe math attribute
llvm-svn: 297354
2017-03-09 01:36:39 +00:00
Konstantin Zhuravlyov d5561e0a0b [DebugInfo] Emit address space with DW_AT_address_class attribute for pointer and reference types
Differential Revision: https://reviews.llvm.org/D29670

llvm-svn: 297320
2017-03-08 23:55:44 +00:00
Jessica Paquette d4cb9c6da0 [Outliner] Fix memory leak in suffix tree.
This commit changes the BumpPtrAllocator for suffix tree nodes to a SpecificBumpPtrAllocator.
Before, node construction was leaking memory because of the DenseMap in SuffixTreeNodes.
Changing this to a SpecificBumpPtrAllocator allows this memory to properly be released.

llvm-svn: 297319
2017-03-08 23:55:33 +00:00
Tim Northover 7596bd7a27 GlobalISel: correctly handle trivial fcmp predicates.
It makes sense to only do them once in IRTranslator rather than making everyone
deal with them.

llvm-svn: 297304
2017-03-08 18:49:54 +00:00
Volkan Keles 5698b2ae6e [GlobalISel] Add default action for G_FNEG
Summary: rL297171 introduced G_FNEG for floating-point negation instruction and IRTranslator started to translate `FSUB -0.0, X` to `FNEG X`. This patch adds a default action for G_FNEG to avoid breaking existing targets.

Reviewers: qcolombet, ab, kristof.beyls, t.p.northover, aditya_nandakumar, dsanders

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits

Differential Revision: https://reviews.llvm.org/D30721

llvm-svn: 297301
2017-03-08 18:09:14 +00:00
Eli Friedman c2c2e21d77 [DAGCombine] Simplify ISD::AND in GetDemandedBits.
This helps in cases involving bitfields where an AND is exposed by
legalization.

Differential Revision: https://reviews.llvm.org/D30472

llvm-svn: 297249
2017-03-08 00:56:35 +00:00
Konstantin Zhuravlyov f9b41cd3d8 [DebugInfo] Make legal and emit DW_OP_swap and DW_OP_xderef
Differential Revision: https://reviews.llvm.org/D29672

llvm-svn: 297247
2017-03-08 00:28:57 +00:00
Daniel Sanders 1351db49b2 Fix additional constructor call missed by r297241.
It was added between my build+test and my commit.

llvm-svn: 297244
2017-03-07 23:32:10 +00:00
Daniel Sanders 52b4ce727a Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297241
2017-03-07 23:20:35 +00:00
Tim Northover 542d1c1463 GlobalISel: use inserts for landingpad instead of sequences.
llvm-svn: 297237
2017-03-07 23:04:06 +00:00
Tim Northover 2eb18d3c4b GlobalISel: fix legalization of G_INSERT
We were calculating incorrect extract/insert offsets by trying to be too
tricksy with min/max. It's clearer to just split the logic up into "register
starts before this segment" vs "after".

llvm-svn: 297226
2017-03-07 21:24:33 +00:00
Yaron Keren 75fadfc774 Implement FreeMachineFunction::getPassName().
llvm-svn: 297222
2017-03-07 20:59:08 +00:00
Ahmed Bougacha 55d10423a6 [GlobalISel] Don't translate intrinsics with metadata parameters.
Some intrinsics take metadata parameters.  These all need custom
handling of some form, and cannot possibly be lowered generically to
G_INTRINSIC calls with vreg operands.
Reject them, instead of hitting an assert later in getOrCreateVReg.

llvm-svn: 297209
2017-03-07 20:53:09 +00:00
Ahmed Bougacha 5c7924fca5 [GlobalISel] Avoid invalidating ValToVReg when translating no-op bitcast.
When we translate a no-op (same type) bitcast, we try to be clever and
only emit a COPY if we already assigned a vreg to the defined value.
However, when we didn't, we tried to assign to a reference into the
ValToVReg DenseMap, even though the RHS of the assignment
(getOrCreateVReg) could potentially grow that DenseMap, invalidating the
reference.

Avoid that by getting the source vreg first.
I audited the rest of the translator; this is the only tricky case.

The test is quite unwieldy, as the problem is caused by the DenseMap
growing, which happens after the 47th mapped value.

llvm-svn: 297208
2017-03-07 20:53:06 +00:00
Ahmed Bougacha 38455ea8a6 [GlobalISel] Relax vector G_SELECT assertion.
For vector operands, the `select` instruction supports both vector and
non-vector conditions.  The MIR builder had an overly restrictive
assertion, that only accepted vector conditions for vector selects
(in effect implementing ISD::VSELECT).

Make it possible to express the full range of G_SELECTs.

llvm-svn: 297207
2017-03-07 20:53:03 +00:00
Ahmed Bougacha adce3ee219 [GlobalISel] Slightly clean up DBG_VALUE FP build code.
I messed up my rebases leading to r297200, and ended up with stale (but
working) code.  Fix it.

llvm-svn: 297205
2017-03-07 20:52:57 +00:00
Ahmed Bougacha c373262d52 [GlobalISel] Ignore %noreg when applying default regbank mapping.
When computing the mapping for non-generic instructions, we skipped
%noreg operands, because we can't always reason about their banks.

Also skip them when applying the mapping.  Otherwise, we could end
up with mappings that we can't apply.

While there, duplicate an assert to distinguish between the two
error conditions.

llvm-svn: 297201
2017-03-07 20:34:23 +00:00
Ahmed Bougacha 4826bae8b4 [GlobalISel] Emit DBG_VALUE %noreg for non-int/fp constant values.
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do.  Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.

llvm-svn: 297200
2017-03-07 20:34:20 +00:00
Arnold Schwaighofer 69e74b48f2 SjLjEHPrepare: Fix the pass for swifterror arguments
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.

swifterror arguments have restrictions on their uses.

rdar://30839288

llvm-svn: 297197
2017-03-07 20:29:02 +00:00
Daniel Sanders 8ebec37d26 Revert r297177: Change LLT constructor string into an LLT-based object ...
More module problems. This time it only showed up in the stage 2 compile of
clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile.

Somehow, this change causes the build to need Attributes.gen before it's been
generated.

llvm-svn: 297188
2017-03-07 19:21:23 +00:00
Daniel Sanders 8612326a08 [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297177
2017-03-07 18:32:25 +00:00
Volkan Keles 20d3c4200d [GlobalISel] Translate floating-point negation
Reviewers: qcolombet, javed.absar, aditya_nandakumar, dsanders, t.p.northover, ab

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30671

llvm-svn: 297171
2017-03-07 18:03:28 +00:00
Tim Northover c2c545b8f7 GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

llvm-svn: 297100
2017-03-06 23:50:28 +00:00
Sanjay Patel 7f18ec50ba [DAG] refactor related div/rem folds; NFCI
This is known incomplete and not called in the right order relative to
other folds, but that's the current behavior. I'm just trying to clean
this up before making actual functional changes to make the patch smaller.

The logic here should mimic the IR equivalents that are in InstSimplify's
simplifyDivRem().

llvm-svn: 297086
2017-03-06 22:32:40 +00:00
Paul Robinson f96e21ad6d [DWARFv5] Update definitions to match published spec.
Some late additions to DWARF v5 were not in Dwarf.def; also one form
was redefined.  Add the new cases to relevant switches in different
parts of LLVM.  Replace DW_FORM_ref_sup with DW_FORM_ref_sup[4,8].

I did not add support for DW_FORM_strx3/addrx3 other that defining the
constants. We don't have any infrastructure to support these.

Differential Revision: http://reviews.llvm.org/D30664

llvm-svn: 297085
2017-03-06 22:20:03 +00:00
Jessica Paquette 596f483a5e [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.

llvm-svn: 297081
2017-03-06 21:31:18 +00:00
Krzysztof Parzyszek 5b8fae5edd [IfConversion] Only renormalize probabilities if branches are analyzable
If a block has non-analyzable branches, the listed successors don't need
to add up to one. For example, if a block has a conditional tail call,
that tail call will not have a corresponding successor in the successor
list, but will still be a possible branch.

Differential Revision: https://reviews.llvm.org/D30556

llvm-svn: 297054
2017-03-06 19:12:42 +00:00
Tim Northover 95b6d5f2b1 GlobalISel: don't emit degenerate G_INSERT instructions.
Before, we were producing G_INSERT instructions that were actually closer to a
cast or even a COPY when both input and output sizes are the same. This doesn't
really make sense and means that everything interpreting a G_INSERT also has to
handle all these kinds of casts.

So now we detect these degenerate cases and emit real casts instead.

llvm-svn: 297051
2017-03-06 19:04:17 +00:00
Tim Northover 81dafc1c88 GlobalISel: add buildUndef method to MachineIRBuilder. NFC.
llvm-svn: 297044
2017-03-06 18:36:40 +00:00
Tim Northover 75e0b91e59 GlobalISel: refactor legalization of G_INSERT.
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.

llvm-svn: 297042
2017-03-06 18:23:04 +00:00
Sanjay Patel 7f7947bf41 [DAGCombiner] simplify div/rem-by-0
Refactoring of duplicated code and more fixes to follow.

This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html

Ie, we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.

The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.

llvm-svn: 297026
2017-03-06 16:36:42 +00:00
Sanjay Patel 6b029a5380 [DAG] fix formatting; NFC
llvm-svn: 297015
2017-03-06 15:27:57 +00:00
Sanjay Patel 5273afd4bb [DAG] fix typo in comment; NFC
llvm-svn: 297011
2017-03-06 15:07:43 +00:00
Dean Michael Berris 7e8eea429f [XRay] Allow logging the first argument of a function call.
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.

For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.

Reviewers: dberris

Reviewed By: dberris

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29702

llvm-svn: 296998
2017-03-06 06:48:56 +00:00
Simon Pilgrim 584d6d9d91 [SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions
Found by fuzz testing after rL296985 landed

llvm-svn: 296989
2017-03-05 15:52:18 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Craig Topper 6ffc044b2f [DAGCombine] Use APInt::operator|(uint64_t) instead of creating a temporary APInt and calling APInt::Or. NFC
This is more efficient by itself. But this is prep for a future patch that may remove APInt::Or while making operator| support rvalue references similar to add/sub.

llvm-svn: 296981
2017-03-05 01:08:16 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Florian Hahn 6406f98342 [legalize-types] Remove stale entries from SoftenedFloats.
Summary:
When replacing a SDValue, we should remove the replaced value from
SoftenedFloats (and possibly the other maps as well?).

When we revisit a Node because it needs analyzing again, we have to
remove all result values from SoftenedFloats (and possibly other maps?).

This fixes the fp128 test failures with expensive checks for X86.

I think we probably should also remove the values from the other maps
(PromotedIntegers and so on), let me know what you think.

Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon

Reviewed By: chh

Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D29265

llvm-svn: 296964
2017-03-04 12:00:35 +00:00
Eli Friedman 49f0220086 [MISched] Remove unused arguments. NFC.
llvm-svn: 296934
2017-03-04 00:42:55 +00:00
Matthias Braun ffe40dd69e RegAllocGreedy: Follow-up to r296722
We can now end up in situations where we initiate LiveIntervalUnion
queries with different SubRanges against the same register unit, so the
assert() no longer holds in all cases. Just recalculate now when we know
the cache is out of date.

llvm-svn: 296928
2017-03-03 23:27:20 +00:00
Tim Northover 3e6a7afd81 GlobalISel: constrain G_INSERT to inserting just one value per instruction.
It's much easier to reason about single-value inserts and no-one was actually
using the variadic variants before.

llvm-svn: 296923
2017-03-03 23:05:47 +00:00
Tim Northover bf017293af GlobalISel: add merge/unmerge nodes for legalization.
These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which
assume the individual parts will be contiguous, homogeneous, and occupy the
entirity of the larger register. This makes reasoning about them much easer
since you only have to look at the first register being merged and the result
to know what the instruction is doing.

I intend to gradually replace all uses of the more complicated sequence/extract
with these (or single-element insert/extracts), and then remove the older
variants. For now we start with legalization.

llvm-svn: 296921
2017-03-03 22:46:09 +00:00
Matthias Braun a04d7ad851 RegisterCoalescer: Simplify subrange splitting code; NFC
- Use slightly better variable names / compute in a more direct way.

llvm-svn: 296905
2017-03-03 19:05:34 +00:00
Simon Pilgrim 6dfab414db Use APInt::setBits instead of OR'ing in a separate APInt::getBitsSet call
llvm-svn: 296886
2017-03-03 17:03:52 +00:00
Simon Pilgrim cf12b5e1a6 Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation
Avoids all the unnecessary extra bitrange creation/shift stages.

llvm-svn: 296879
2017-03-03 16:35:57 +00:00
Simon Pilgrim 10754abe7e Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation
Avoids all the unnecessary extra bitrange creation/shift stages.

llvm-svn: 296871
2017-03-03 14:25:46 +00:00
Simon Pilgrim b01bb3a1b2 Fix Wdocumentation warning
llvm-svn: 296866
2017-03-03 12:09:11 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Adrian Prantl ea8880b81f LiveDebugValues: Assume calls never clobber SP.
A call should never modify the stack pointer, but some backends are
not so sure about this and never list SP in the regmask. For the
purposes of LiveDebugValues we assume a call never clobbers SP. We
already have a similar workaround in DbgValueHistoryCalculator (which
we hopefully can retire soon).

This fixes the availabilty of local ASANified variables on AArch64.

rdar://problem/27757381

llvm-svn: 296847
2017-03-03 01:08:25 +00:00
Kyle Butt 1fa6030767 CodeGen: BlockPlacement: Precompute layout for chains of triangles.
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:

1) The post-dominators marked 50% are actually taken 56% (This shrinks with
   longer chains)
2) The chains are statically correlated. Branch probabilities have a very
   U-shaped distribution.
   [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
   If the branches in a chain are likely to be from the same side of the
   distribution as their predecessor, but are independent at runtime, this
   transformation is profitable. (Because the cost of being wrong is a small
   fixed cost, unlike the standard triangle layout where the cost of being
   wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
   branch was taken positively influences whether the next branch will be
   taken
We believe that 2 and 3 are common enough to justify the small margin in 1.

The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.

llvm-svn: 296845
2017-03-03 01:00:22 +00:00
Taewook Oh 96c6415697 [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below

```
extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}
```

, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:

```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)

```

As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.

Reviewers: atrick, bogner, andreadb, craig.topper, aprantl

Reviewed By: andreadb

Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29813

llvm-svn: 296825
2017-03-02 21:58:35 +00:00
Sanjay Patel 7884dcb788 [DAG] early exit to improve readability and formatting of visitMemCmpCall(); NFCI
llvm-svn: 296824
2017-03-02 21:56:43 +00:00
Kyle Butt 1393761e0c CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic.
Outlining optional branches isn't a good heuristic, and it's never been
on by default. Remove it to clean things up.

llvm-svn: 296818
2017-03-02 21:44:24 +00:00
Zachary Turner d9dc2829ea [Support] Move Stream library from MSF -> Support.
After several smaller patches to get most of the core improvements
finished up, this patch is a straight move and header fixup of
the source.

Differential Revision: https://reviews.llvm.org/D30266

llvm-svn: 296810
2017-03-02 20:52:51 +00:00
Sanjay Patel 209b0f9aad [DAG] improve documentation comments; NFC
llvm-svn: 296808
2017-03-02 20:48:08 +00:00
Simon Pilgrim 7b227fecb5 Fix some Wdocumentation warnings
llvm-svn: 296783
2017-03-02 18:59:07 +00:00
Sanjay Patel fffa179837 [DAGCombiner] avoid assertion when folding binops with opaque constants
This bug was introduced with:
https://reviews.llvm.org/rL296699

There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.

The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().

llvm-svn: 296768
2017-03-02 17:18:56 +00:00
Sanjay Patel f7aba7ba22 fix typo in comment; NFC
llvm-svn: 296760
2017-03-02 16:37:24 +00:00
Serge Pavlov e2bf69715f Do not verify MachimeDominatorTree if it is not calculated
If dominator tree is not calculated or is invalidated, set corresponding
pointer in the pass state to nullptr. Such pointer value will indicate
that operations with dominator tree are not allowed. In particular, it
allows to skip verification for such pass state. The dominator tree is
not calculated if the machine dominator pass was skipped, it occures in
the case of entities with linkage available_externally.

The change fixes some test fails observed when expensive checks
are enabled.

Differential Revision: https://reviews.llvm.org/D29280

llvm-svn: 296742
2017-03-02 12:00:10 +00:00
Matthias Braun dbcf9e2ee4 LiveRegMatrix: Fix some subreg interference checks
Surprisingly, one of the three interference checks in LiveRegMatrix was
using the main live range instead of the apropriate subregister range
resulting in unnecessarily conservative results.

llvm-svn: 296722
2017-03-02 00:35:08 +00:00
Paul Robinson a94f76b18c Remove spurious use of LLVM_FALLTHROUGH (NFC)
llvm-svn: 296713
2017-03-01 23:59:11 +00:00
Amaury Sechet 71f511fd1e [DAGCombiner] mulhi + 1 never overflow.
Summary:
This can be used to optimize large multiplications after legalization.

Depends on D29565

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29587

llvm-svn: 296711
2017-03-01 23:44:17 +00:00
Ahmed Bougacha 120ae22d70 [GlobalISel] Add a way for targets to enable GISel.
Until now, we've had to use -global-isel to enable GISel.  But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
target that do have some support, because the level of support varies.

This first step adds an override for the target to explicitly define its
level of support.  For AArch64, do that using
a new command-line option (I know..):
  -aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.

Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!

While there, remove a couple LLVM_UNLIKELYs.  Building the pipeline is
such a cold path that in practice that shouldn't matter at all.

llvm-svn: 296710
2017-03-01 23:33:08 +00:00
Sanjay Patel 92938657a0 [DAGCombiner] fold binops with constant into select-of-constants
This is part of the ongoing attempt to improve select codegen for all targets and select 
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().

I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that 
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621 
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).

The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105

Differential Revision: https://reviews.llvm.org/D30502

llvm-svn: 296699
2017-03-01 22:51:31 +00:00
Victor Leschuk d7bfa40ace [DebugInfo] [DWARFv5] Unique abbrevs for DIEs with different implicit_const values
Take DW_FORM_implicit_const attribute value into account when profiling
DIEAbbrevData.

Currently if we have two similar types with implicit_const attributes and
different values we end up with only one abbrev in .debug_abbrev section.
For example consider two structures: S1 with implicit_const attribute ATTR
and value VAL1 and S2 with implicit_const ATTR and value VAL2.
The .debug_abbrev section will contain only 1 related record:

[N] DW_TAG_structure_type       DW_CHILDREN_yes
        DW_AT_ATTR        DW_FORM_implicit_const  VAL1
        // ....

This is incorrect as struct S2 (with VAL2) will use abbrev record with VAL1.

With this patch we will have two different abbreviations here:

[N] DW_TAG_structure_type       DW_CHILDREN_yes
        DW_AT_ATTR        DW_FORM_implicit_const  VAL1
        // ....

[M] DW_TAG_structure_type       DW_CHILDREN_yes
        DW_AT_ATTR        DW_FORM_implicit_const  VAL2
        // ....

llvm-svn: 296691
2017-03-01 22:13:42 +00:00
Benjamin Kramer 0e429606b0 [DAGCombiner] Remove non-ascii character and reflow comment.
llvm-svn: 296690
2017-03-01 22:10:43 +00:00
Matthias Braun 173e11439e LIU:::Query: Query LiveRange instead of LiveInterval; NFC
- We only need the information from the base class, not the additional
  details in the LiveInterval class.
- Spread more `const`
- Some code cleanup

llvm-svn: 296684
2017-03-01 21:48:12 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Matthias Braun 702f55bb4a LIU::Query: Remove always false member+getter; NFC
llvm-svn: 296675
2017-03-01 21:02:52 +00:00
Nemanja Ivanovic b223cfabcc Improve scheduling with branch coalescing
This patch adds a MachineSSA pass that coalesces blocks that branch
on the same condition.

Committing on behalf of Lei Huang.

Differential Revision: https://reviews.llvm.org/D28249

llvm-svn: 296670
2017-03-01 20:29:34 +00:00
Nirav Dave 0a4703b5ec [DAG] Prevent Stale nodes from entering worklist
Add check that deleted nodes do not get added to worklist. This can
occur when a node's operand is simplified to an existing node.

This fixes PR32108.

Reviewers: jyknight, hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30506

llvm-svn: 296668
2017-03-01 20:19:38 +00:00
Paul Robinson d4f1c487f3 Alphabetize some cases (NFC)
llvm-svn: 296655
2017-03-01 19:01:47 +00:00
Paul Robinson 91d74813a6 [DWARF] Default lower bound should respect requested DWARF version.
DWARF may define a default lower-bound for arrays in languages defined
in a particular DWARF version.  But the logic to suppress an
unnecessary lower-bound attribute was looking at the hard-coded
default DWARF version, not the version that had been requested.

Also updated the list with all languages defined in DWARF v5.

Differential Revision: http://reviews.llvm.org/D30484

llvm-svn: 296652
2017-03-01 18:32:37 +00:00
Artur Pilipenko e1b2d31468 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 296651
2017-03-01 18:12:29 +00:00
Ahmed Bougacha 20b3e9a835 [CodeGen] Remove dead FastISel code after SDAG emitted a tailcall.
When SDAGISel (top-down) selects a tail-call, it skips the remainder
of the block.

If, before that, FastISel (bottom-up) selected some of the (no-op) next
few instructions, we can end up with dead instructions following the
terminator (selected by SDAGISel).

We need to erase them, as we know they aren't necessary (in addition to
being incorrect).

We already do this when FastISel falls back on the tail-call itself.
Also remove the FastISel-emitted code if we fallback on the
instructions between the tail-call and the return.

llvm-svn: 296552
2017-03-01 00:43:42 +00:00
Ahmed Bougacha 67d1c7c3c2 [GlobalISel] Replace all combined G_EXTRACT uses.
Iterating on the use-list we're modifying doesn't work: after the first
iteration, the use-list iterator will point to a MachineOperand
referencing the new register.  This caused us to skip the other uses to
replace.

Instead, use MRI.replaceRegWith(), which accounts for this behavior.

llvm-svn: 296551
2017-03-01 00:43:39 +00:00
Paul Robinson 3443575c03 Add missing module/license header. NFC.
llvm-svn: 296550
2017-03-01 00:14:42 +00:00
Paul Robinson cddd60445e [DWARFv5] Emit new unit header format.
Requesting DWARF v5 will now get you the new compile-unit and
type-unit headers.  llvm-dwarfdump will also recognize them.

Differential Revision: http://reviews.llvm.org/D30206

llvm-svn: 296514
2017-02-28 20:24:55 +00:00
Sanjay Patel ea61ea9f19 [DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC
llvm-svn: 296502
2017-02-28 18:41:49 +00:00
Craig Topper 419f145ebb [DAGISel] When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 296486
2017-02-28 16:52:05 +00:00
David Bozier 5159968786 [Stack Protection] Add diagnostic information for why stack protection was applied to a function
Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which functions have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.

This change adds a remark that is reported by the stack protection code when an instruction or attribute is encountered that causes SSP to be applied.

Patch by: James Henderson

Differential Revision: https://reviews.llvm.org/D29023

llvm-svn: 296483
2017-02-28 16:02:37 +00:00
Daniel Sanders 983c9b98e9 Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1.

llvm-svn: 296478
2017-02-28 15:00:27 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Daniel Sanders a5afdefec6 [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 296474
2017-02-28 14:21:31 +00:00
Sanjoy Das eef785c1a5 [ImplicitNullCheck] Add alias analysis usage
Summary:
With this change ImplicitNullCheck optimization uses alias analysis
and can use load/store memory access for implicit null check if there
are other load/store before but memory accesses do not alias.

Patch by Serguei Katkov!

Reviewers: sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30331

llvm-svn: 296440
2017-02-28 07:04:49 +00:00
Matthias Braun 81f68ec3a9 Revert "Add MIR-level outlining pass"
Revert Machine Outliner for now, as it breaks the asan bot.

This reverts commit r296418.

llvm-svn: 296426
2017-02-28 02:24:30 +00:00
Matthias Braun d36410945f Add MIR-level outlining pass
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.

This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

Patch by Jessica Paquette<jpaquette@apple.com>!

rdar://29166825

Differential Revision: https://reviews.llvm.org/D26872

llvm-svn: 296418
2017-02-28 00:33:32 +00:00
Michael Kuperstein 13bf8a2684 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296416
2017-02-28 00:11:34 +00:00
Zachary Turner 695ed56ba5 [PDB] Make streams carry their own endianness.
Before the endianness was specified on each call to read
or write of the StreamReader / StreamWriter, but in practice
it's extremely rare for streams to have data encoded in
multiple different endiannesses, so we should optimize for the
99% use case.

This makes the code cleaner and more general, but otherwise
has NFC.

llvm-svn: 296415
2017-02-28 00:04:07 +00:00
Eugene Zelenko fa912a7151 [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 296404
2017-02-27 22:45:06 +00:00
Arnold Schwaighofer b2605f31ed ISel: We need to notify FastIS of the IMPLICIT_DEF we created in createSwiftErrorEntriesInEntryBlock
Otherwise, it will insert instructions before it.

rdar://30536186

llvm-svn: 296395
2017-02-27 22:12:06 +00:00
Zachary Turner 120faca41b [PDB] Partial resubmit of r296215, which improved PDB Stream Library.
This was reverted because it was breaking some builds, and
because of incorrect error code usage.  Since the CL was
large and contained many different things, I'm resubmitting
it in pieces.

This portion is NFC, and consists of:

1) Renaming classes to follow a consistent naming convention.
2) Fixing the const-ness of the interface methods.
3) Adding detailed doxygen comments.
4) Fixing a few instances of passing `const BinaryStream& X`.  These
   are now passed as `BinaryStreamRef X`.

llvm-svn: 296394
2017-02-27 22:11:43 +00:00
Matt Arsenault 4a7cc16e89 Revert "DAG: Check if extract_vector_elt is legal or custom"
This reverts r295782. This could potentially result in some
legalization loops and I avoided the need for this.

llvm-svn: 296393
2017-02-27 21:59:07 +00:00
Simon Pilgrim 5c4efcdddf [X86][SSE] Attempt to extract vector elements through target shuffles
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Differential Revision: https://reviews.llvm.org/D30176

llvm-svn: 296381
2017-02-27 21:01:57 +00:00
Taewook Oh a49eb8578a [TailDuplicator] Maintain DebugLoc for branch instructions
Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions to keep the metadata of replaced branch instructions.

Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie

Reviewed By: dblaikie

Subscribers: dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D30026

llvm-svn: 296371
2017-02-27 19:30:01 +00:00
Artur Pilipenko f7196c8d9e [DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE targets
This pattern is essentially a i16 load from p+1 address:

  %p1.i16 = bitcast i8* %p to i16*
  %p2.i8 = getelementptr i8, i8* %p, i64 2
  %v1 = load i16, i16* %p1.i16
  %v2.i8 = load i8, i8* %p2.i8
  %v2 = zext i8 %v2.i8 to i16
  %v1.shl = shl i16 %v1, 8
  %res = or i16 %v1.shl, %v2

Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining.

llvm-svn: 296336
2017-02-27 13:04:23 +00:00
Artur Pilipenko c43b20a43b [DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helper
This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets.

llvm-svn: 296332
2017-02-27 11:42:54 +00:00
Artur Pilipenko f2c26e0bf2 [DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the load node
This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets.

llvm-svn: 296331
2017-02-27 11:40:14 +00:00
Daniel Jasper 3ca4525612 Revert "[CGP] Split some critical edges coming out of indirect branches"
This reverts commit r296149 as it leads to crashes when compiling for
PPC.

llvm-svn: 296295
2017-02-26 11:09:12 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Craig Topper 3b8aca2ecf [ExecutionDepsFix] Don't make copies of LiveReg objects when collecting operands for soft instructions
Summary:
While collecting operands we make copies of the LiveReg objects which are stored in the LiveRegs array. If the instruction uses the same register multiple times we end up with multiple copies. Later we iterate through the collected list of LiveReg objects and merge DomainValues. In the process of doing this the merge function can change the contents of the original LiveReg object in the LiveRegs array, but not the copies that have been made. So when we get to the second usage of the register we end up seeing a stale copy of the LiveReg object.

To fix this I've stopped copying and now just store a pointer to the original LiveReg object. Another option might be to avoid adding the same register to the Regs array twice, but this approach seemed simpler.

The included test case exposes this bug due to an AVX-512 masked OR instruction using the same register for the passthru operand and one of the inputs to the OR operation.

Fixes PR30284.

Reviewers: RKSimon, stoklund, MatzeB, spatel, myatsina

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30242

llvm-svn: 296260
2017-02-25 18:12:25 +00:00
Artyom Skrobov ac56719231 No need to copy the variable [NFC]
llvm-svn: 296259
2017-02-25 17:18:09 +00:00
NAKAMURA Takumi 05a75e40da Revert r296215, "[PDB] General improvements to Stream library." and followings.
r296215, "[PDB] General improvements to Stream library."
r296217, "Disable BinaryStreamTest.StreamReaderObject temporarily."
r296220, "Re-enable BinaryStreamTest.StreamReaderObject."
r296244, "[PDB] Disable some tests that are breaking bots."
r296249, "Add static_cast to silence -Wc++11-narrowing."

std::errc::no_buffer_space should be used for OS-oriented errors for socket transmission.
(Seek discussions around llvm/xray.)

I could substitute s/no_buffer_space/others/g, but I revert whole them ATM.

Could we define and use LLVM errors there?

llvm-svn: 296258
2017-02-25 17:04:23 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Junmo Park 061bec802e Minor code cleanup. NFC.
llvm-svn: 296222
2017-02-25 01:50:45 +00:00
Zachary Turner af299ea5d4 [PDB] General improvements to Stream library.
This adds various new functionality and cleanup surrounding the
use of the Stream library.  Major changes include:

* Renaming of all classes for more consistency / meaningfulness
* Addition of some new methods for reading multiple values at once.
* Full suite of unit tests for reader / writer functionality.
* Full set of doxygen comments for all classes.
* Streams now store their own endianness.
* Fixed some bugs in a few of the classes that were discovered
  by the unit tests.

llvm-svn: 296215
2017-02-25 00:44:30 +00:00
Zachary Turner d2684b7969 [PDB] Rename Stream related source files.
This is part of a larger effort to get the Stream code moved
up to Support.  I don't want to do it in one large patch, in
part because the changes are so big that it will treat everything
as file deletions and add, losing history in the process.
Aside from that though, it's just a good idea in general to
make small changes.

So this change only changes the names of the Stream related
source files, and applies necessary source fix ups.

llvm-svn: 296211
2017-02-25 00:33:34 +00:00
Dan Gohman 82607f56bd [WebAssembly] Add support for using a wasm global for the stack pointer.
This replaces the __stack_pointer variable which was allocated in linear
memory.

llvm-svn: 296201
2017-02-24 23:46:05 +00:00
Dan Gohman d934cb8806 [WebAssembly] Basic support for Wasm object file encoding.
With the "wasm32-unknown-unknown-wasm" triple, this allows writing out
simple wasm object files, and is another step in a larger series toward
migrating from ELF to general wasm object support. Note that this code
and the binary format itself is still experimental.

llvm-svn: 296190
2017-02-24 23:18:00 +00:00
Stanislav Mekhanoshin 42259cf35e Revert "Correct register pressure calculation in presence of subregs"
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.

llvm-svn: 296182
2017-02-24 21:56:16 +00:00
Tim Northover ef29e7284b GlobalISel: check for CImm rather than Imm on G_CONSTANTs.
All G_CONSTANTS created by the MachineIRBuilder have an operand of type CImm
(i.e. a ConstantInt), so that's what the selector needs to look for.

llvm-svn: 296176
2017-02-24 21:21:38 +00:00
Eli Friedman c12a5a7595 [CodeGenPrepare] Make -addr-sink-using-gep work with address spaces.
When we construct addressing modes, we use isNoopAddrSpaceCast to ignore
addrspacecast instructions. Make sure we insert the correct addrspacecast
when we reconstruct the addressing mode.

Differential Revision: https://reviews.llvm.org/D30114

llvm-svn: 296167
2017-02-24 20:51:36 +00:00
Michael Kuperstein 46b131e3f8 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296149
2017-02-24 18:41:32 +00:00
Sanjay Patel 832b1622d8 [DAGCombiner] add missing folds for scalar select of {-1,0,1}
The motivation for filling out these select-of-constants cases goes back to D24480, 
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230

The idea is that we should always canonicalize patterns like this to a select-of-constants 
in IR because that's the smallest IR and the best for value tracking. Note that we currently 
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in 
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

As this patch shows, most targets generate better machine code for simple ext/add/not ops 
rather than a select of constants. So the follow-up steps to make this less of a patchwork 
of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2  Have InstCombine canonicalize in the other direction (create more selects).

Differential Revision: https://reviews.llvm.org/D30180

llvm-svn: 296137
2017-02-24 17:17:33 +00:00
Daniel Sanders 066ebbfd46 [globalisel] Decouple src pattern operands from dst pattern operands.
Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.

The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.

The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
  %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
  %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.

Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.

Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.

Depends on D29711

Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar

Reviewed By: rovka

Subscribers: aemerson, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29712

llvm-svn: 296131
2017-02-24 15:43:30 +00:00
Justin Bogner 259a0cf3ee Add missing initialization for MachineOptimizationRemarkEmitter
This was missed in r293110.

llvm-svn: 296096
2017-02-24 07:42:35 +00:00
Craig Topper c446b1ffae [ExecutionDepsFix] Use range-based for loop. NFC
llvm-svn: 296093
2017-02-24 06:38:24 +00:00
Michael Kuperstein 581c9f4b20 Revert r269060 to pacify bots.
llvm-svn: 296064
2017-02-24 01:22:19 +00:00
Michael Kuperstein 12e79d5002 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296060
2017-02-24 00:56:21 +00:00
Ahmed Bougacha 7daaf88c70 [GlobalISel] Use the same name for all remarks.
While there, switch to the explicit ctor.

llvm-svn: 296059
2017-02-24 00:34:47 +00:00
Ahmed Bougacha 7c88a4e12b [GlobalISel] Use the DISubprogram for translation failure remarks.
Justin added support for DISubprogram locs in r295531 and r296052.
Use that instead of no-loc for constants and arguments.

llvm-svn: 296058
2017-02-24 00:34:44 +00:00
Ahmed Bougacha 8f9e99bcb6 [GlobalISel] Remove now-unnecessary variable. NFC.
Since r296047, we're able to return early on failures.
Don't track whether we succeeded.

llvm-svn: 296057
2017-02-24 00:34:41 +00:00
Justin Bogner 369ba753aa OptDiag: Summarize the instruction count in asm-printer
Add an optimization remark to asm-printer that summarizes the number
of instructions emitted per function.

llvm-svn: 296053
2017-02-24 00:19:22 +00:00
Ahmed Bougacha 4f8dd0202d [GlobalISel] Don't translate other blocks when one failed.
We were stopping the translation of the parent block when the
translation of an instruction failed, but we were still trying to
translate the other blocks of the parent function.

Don't do that.

llvm-svn: 296047
2017-02-23 23:57:36 +00:00
Ahmed Bougacha eceabddcfd [GlobalISel] Finalize translated function on scope exit. NFC.
This is the compromise between having a per-function IRTranslator
and manually managing the per-function state.

llvm-svn: 296046
2017-02-23 23:57:28 +00:00
Kyle Butt ebe6cc4dad CodeGen: MachineBlockPlacement: Rename member to more general name. NFC.
Rename ComputedTrellisEdges to ComputedEdges to allow for other methods of
pre-computing edges.

Differential Revision: https://reviews.llvm.org/D30308

llvm-svn: 296018
2017-02-23 21:22:24 +00:00
Ahmed Bougacha ae9dadecf3 [GlobalISel] Emit opt remarks on isel fallbacks.
Having more fine-grained information on the specific construct that
caused us to fallback is valuable for large-scale data collection.

We still have the fallback warning, that's also used for FastISel.
We still need to remove the fallback warning, and teach FastISel to also
emit remarks (it currently has a combination of the warning, stats, and
debug prints: the remarks could unify all three).

The abort-on-fallback path could also be better handled using remarks:
one could imagine a "-Rpass-error", analoguous to "-Werror", which would
promote missed/failed remarks to errors.  It's not clear whether that
would be useful for other remarks though, so we're not there yet.

llvm-svn: 296013
2017-02-23 21:05:42 +00:00
Ahmed Bougacha 272739751d [CodeGen] Teach opt remarks how to print MI instructions.
This will be used with GISel opt remarks.

llvm-svn: 296012
2017-02-23 21:05:33 +00:00
Ahmed Bougacha 97119d48db [CodeGen] Print MI without a newline when skipping debugloc. NFC.
This matches the behavior for skip-operands. While there, document it.
This is a follow-up to r296007.

llvm-svn: 296011
2017-02-23 21:05:29 +00:00
Stanislav Mekhanoshin ce3ddd2de4 Correct register pressure calculation in presence of subregs
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.

Differential Revision: https://reviews.llvm.org/D29835

llvm-svn: 296009
2017-02-23 20:19:44 +00:00
Ahmed Bougacha 4319224628 [CodeGen] Add a way to SkipDebugLoc in MachineInstr::print(). NFC.
llvm-svn: 296007
2017-02-23 19:17:31 +00:00
Ahmed Bougacha 1b3339447d [GlobalISel] Simplify Select type cleanup using a ScopeExit. NFC.
This lets us use more natural early-returns when selection fails.

llvm-svn: 296006
2017-02-23 19:17:24 +00:00
Sanjay Patel 4a4fbe162f [DAG] add convenience function to get -1 constant; NFCI
llvm-svn: 296004
2017-02-23 19:02:33 +00:00
Adam Nemet b516cf3f3f [LazyMachineBFI] Reimplement with getAnalysisIfAvailable
Since LoopInfo is not available in machine passes as universally as in IR
passes, using the same approach for OptimizationRemarkEmitter as we did for IR
will run LoopInfo and DominatorTree unnecessarily.  (LoopInfo is not used
lazily by ORE.)

To fix this, I am modifying the approach I took in D29836.  LazyMachineBFI now
uses its client passes including MachineBFI itself that are available or
otherwise compute them on the fly.

So for example GreedyRegAlloc, since it's already using MBFI, will reuse that
instance.  On the other hand, AsmPrinter in Justin's patch will generate DT,
LI and finally BFI on the fly.

(I am of course wondering now if the simplicity of this approach is even
preferable in IR.  I will do some experiments.)

Testing is provided by an updated version of D29837 which requires Justin's
patch to bring ORE to the AsmPrinter.

Differential Revision: https://reviews.llvm.org/D30128

llvm-svn: 295996
2017-02-23 17:30:01 +00:00
Simon Pilgrim 858d8e672d Fix signed/unsigned comparison warning on MSVC
llvm-svn: 295962
2017-02-23 12:00:34 +00:00
Eugene Zelenko db56e5a89a [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 295893
2017-02-22 22:32:51 +00:00
Bill Seurer 8e48f416ad [DAGCombiner] revert r295336
r295336 causes a bootstrapped clang to fail for many compilations on
powerpc BE.  See 
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315
for example.

Reverting as per the developer's request.

llvm-svn: 295849
2017-02-22 16:27:33 +00:00
Igor Breger f7359d893a [X86][GlobalISel] Initial implementation , select G_ADD gpr, gpr
Summary: Initial implementation for X86InstructionSelector. Handle selection COPY and G_ADD/G_SUB gpr, gpr .

Reviewers: qcolombet, rovka, zvi, ab

Reviewed By: rovka

Subscribers: mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29816

llvm-svn: 295824
2017-02-22 12:25:09 +00:00
Dan Gohman 18eafb6c68 [WebAssembly] Add skeleton MC support for the Wasm container format
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.

Differential Revision: https://reviews.llvm.org/D26722

llvm-svn: 295803
2017-02-22 01:23:18 +00:00
Matt Arsenault f0a4823b91 DAG: Check if extract_vector_elt is legal or custom
Avoids test regressions in future AMDGPU commits when
more vector types are custom lowered.

llvm-svn: 295782
2017-02-21 22:47:27 +00:00
Eugene Zelenko 49e2fc4f5f [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 295773
2017-02-21 22:07:52 +00:00
Geoff Berry 5d534b6a11 [CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64.  The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used.  One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

llvm-svn: 295746
2017-02-21 18:53:14 +00:00
Matthias Braun 9ab403942b ScheduleDAG: Cleanup; NFC
- Fix doxygen comments (do not repeat documented name, remove definition
    comment if there is already one at the declaration, add \p, ...)
- Add some const modifiers
- Use range based for

llvm-svn: 295688
2017-02-21 01:27:33 +00:00
Taewook Oh 4cf5c1087c [BranchFolding] Update debug location along with the update of branch instruction.
Summary:
Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of

```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```

, but this information is gone when then block is lowered because BranchFolder misses it. This patch is a fix for this issue.

Reviewers: qcolombet, aprantl, craig.topper, MatzeB

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29902

llvm-svn: 295684
2017-02-21 00:12:38 +00:00
Simon Pilgrim c0dc9a4913 Strip trailing whitespace.
llvm-svn: 295653
2017-02-20 11:56:43 +00:00
Simon Pilgrim 50b958c07a [SelectionDAG] Add scalarization support for ISD::*_EXTEND_VECTOR_INREG opcodes.
Thanks to Mikael Holmén for the initial test case

llvm-svn: 295652
2017-02-20 11:55:58 +00:00
Artyom Skrobov be31754094 Remove redundant call to GluedNodes.back() [NFC]
llvm-svn: 295607
2017-02-19 16:56:18 +00:00
Matthias Braun 431305927f MachineRegionInfo: Fix pass initialization
- Adapt MachineBasicBlock::getName() to have the same behavior as the IR
  BasicBlock (Value::getName()).
- Add it to lib/CodeGen/CodeGen.cpp::initializeCodeGen so that it is linked in
  the CodeGen library.
- MachineRegionInfoPass's name conflicts with RegionInfoPass's name ("region").
- MachineRegionInfo should depend on MachineDominatorTree,
  MachinePostDominatorTree and MachineDominanceFrontier instead of their
  respective IR versions.
- Since there were no tests for this, add a X86 MIR test.

Patch by Francis Visoiu Mistrih<fvisoiumistrih@apple.com>

llvm-svn: 295518
2017-02-18 00:41:16 +00:00
Eugene Zelenko be37db1882 [CodeGen] Revert changes in LowLevelType to pre-r295499 to fix broken buildbots.
llvm-svn: 295505
2017-02-17 22:23:34 +00:00
Eugene Zelenko 5db84df728 [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 295499
2017-02-17 21:43:25 +00:00
Adrian Prantl 67c2442210 Debug Info: Sort frame index expressions before emitting them.
This fixes PR31381, which caused an assertion and/or invalid debug info.

This affects debug variables that have multiple fragments in the MMI
side (i.e.: in the stack frame) table.
rdar://problem/30571676

llvm-svn: 295486
2017-02-17 19:42:32 +00:00
Tim Northover 88634996c7 GlobalISel: verify that generic loads & stores have a mem operand.
The mem operand is used by GlobalISel to convey atomic constraints so dropping
it is invalid.

llvm-svn: 295476
2017-02-17 18:50:15 +00:00
Sanjay Patel 7f2e58972c [DAGCombiner] split i1 select-of-constants from non-i1 case; NFCI
I can't find any tests of the non-i1 code path, so it may be unnecessary at this point.

llvm-svn: 295463
2017-02-17 17:13:27 +00:00
Simon Pilgrim 0429c0cf8b Fix signed/unsigned comparison warning.
llvm-svn: 295453
2017-02-17 16:01:16 +00:00
Simon Pilgrim 511d788a95 [DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle masks
During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing).

This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types.

The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).

Differential Revision: https://reviews.llvm.org/D29454

llvm-svn: 295451
2017-02-17 15:14:48 +00:00
Sanjay Patel 5573042035 [DAGCombiner] improve readability; NFCI
llvm-svn: 295447
2017-02-17 14:21:59 +00:00
Teresa Johnson 76b5b7493c Handle link of NoDebug CU with a CU that has debug emission enabled
Summary:
This is an issue both with regular and Thin LTO. When we link together
a DICompileUnit that is marked NoDebug (e.g when compiling with -g0
but applying an AutoFDO profile, which requires location tracking
in the compiler) and a DICompileUnit with debug emission enabled,
we can have failures during dwarf debug generation. Specifically,
when we have inlined from the NoDebug compile unit into the debug
compile unit, we can fail during construction of the abstract and
inlined scope DIEs. This is because the SPMap does not include NoDebug
CUs (they are skipped in the debug_compile_units_iterator).

This patch fixes the failures by skipping locations from NoDebug CUs
when extracting lexical scopes.

Reviewers: dblaikie, aprantl

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D29765

llvm-svn: 295384
2017-02-17 00:21:19 +00:00
Benjamin Kramer 3f6260cab4 [MachinePipeliner] Remove redundant destructor. NFC.
llvm-svn: 295372
2017-02-16 20:26:51 +00:00
David Blaikie b2fbb4b276 Refactor DebugHandlerBase a bit to common non-debug-having-function filtering
llvm-svn: 295354
2017-02-16 18:48:33 +00:00
Artur Pilipenko 85d758299e [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit -r295314 with PowerPC and AMDGPU tests updated.

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295336
2017-02-16 17:07:27 +00:00
Artur Pilipenko a1b384c4ce Rever -r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine"
This change causes some of AMDGPU and PowerPC tests to fail.

llvm-svn: 295316
2017-02-16 13:04:46 +00:00
Artur Pilipenko daaa0c0f7d [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295314
2017-02-16 12:53:26 +00:00
Diana Picus ca6a890d7f [ARM] GlobalISel: Lower double precision FP args
For the hard float calling convention, we just use the D registers.

For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.

For pure soft float targets, we still bail out because we don't support the
libcalls yet.

llvm-svn: 295295
2017-02-16 07:53:07 +00:00
Hans Wennborg a468601e0e [X86] Re-enable conditional tail calls and fix PR31257.
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.

Differential Revision: https://reviews.llvm.org/D29856

llvm-svn: 295262
2017-02-16 00:04:05 +00:00
Tim Northover 9136617a3f GlobalISel: legalize va_arg on AArch64.
Uses a Custom implementation because the slot sizes being a multiple of the
pointer size isn't really universal, even for the architectures that do have a
simple "void *" va_list.

llvm-svn: 295255
2017-02-15 23:22:50 +00:00
Tim Northover 4a652227dd GlobalISel: support translating va_arg
Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also
need to attach the required alignment info.

llvm-svn: 295254
2017-02-15 23:22:33 +00:00
Matt Arsenault 900b21c350 Fix typos
llvm-svn: 295246
2017-02-15 22:19:06 +00:00
Matt Arsenault 5de8dc9cf5 DAG: Do not scalarize fsub if fneg is legal
Tests will be included with future commit.

llvm-svn: 295242
2017-02-15 22:02:42 +00:00
Kyle Butt 7fbec9bdf1 Codegen: Make chains from trellis-shaped CFGs
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:

  A     B
  |\   /|
  | \ / |
  |  X  |
  | / \ |
  |/   \|
  C     D

would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a a group of 2 or more
predecessor blocks that all have the same successors.

because of this we can tail duplicate to extend existing trellises.

As an example consider the following CFG:

    B   D   F   H
   / \ / \ / \ / \
  A---C---E---G---Ret

Where A,C,E,G are all small (Currently 2 instructions).

The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.

The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret

define void @straight_test(i32 %tag) {
entry:
  br label %test1
test1: ; A
  %tagbit1 = and i32 %tag, 1
  %tagbit1eq0 = icmp eq i32 %tagbit1, 0
  br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
  call void @a()
  br label %test2
test2: ; C
  %tagbit2 = and i32 %tag, 2
  %tagbit2eq0 = icmp eq i32 %tagbit2, 0
  br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
  call void @b()
  br label %test3
test3: ; E
  %tagbit3 = and i32 %tag, 4
  %tagbit3eq0 = icmp eq i32 %tagbit3, 0
  br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
  call void @c()
  br label %test4
test4: ; G
  %tagbit4 = and i32 %tag, 8
  %tagbit4eq0 = icmp eq i32 %tagbit4, 0
  br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
  call void @d()
  br label %exit
exit:
  ret void
}

here is the layout after D27742:
straight_test:                          # @straight_test
; ... Prologue elided
; BB#0:                                 # %entry ; A (merged with test1)
; ... More prologue elided
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_2
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_3
	b .LBB0_4
.LBB0_2:                                # %optional1 ; B (copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_4
.LBB0_3:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_5
	b .LBB0_6
.LBB0_4:                                # %optional2 ; D (copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_5:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
	b .LBB0_7
.LBB0_6:                                # %optional3 ; F (copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit ; Ret
	ld 30, 96(1)                    # 8-byte Folded Reload
	addi 1, 1, 112
	ld 0, 16(1)
	mtlr 0
	blr

The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.

Here is the resulting concrete layout:

straight_test:                          # @straight_test
; BB#0:                                 # %entry ; A (merged with test1)
	mr 30, 3
	andi. 3, 30, 1
	bc 12, 1, .LBB0_4
; BB#1:                                 # %test2 ; C
	rlwinm. 3, 30, 0, 30, 30
	bne	 0, .LBB0_5
.LBB0_2:                                # %test3 ; E
	rlwinm. 3, 30, 0, 29, 29
	bne	 0, .LBB0_6
.LBB0_3:                                # %test4 ; G
	rlwinm. 3, 30, 0, 28, 28
	bne	 0, .LBB0_7
	b .LBB0_8
.LBB0_4:                                # %optional1 ; B (Copy of C)
	bl a
	nop
	rlwinm. 3, 30, 0, 30, 30
	beq	 0, .LBB0_2
.LBB0_5:                                # %optional2 ; D (Copy of E)
	bl b
	nop
	rlwinm. 3, 30, 0, 29, 29
	beq	 0, .LBB0_3
.LBB0_6:                                # %optional3 ; F (Copy of G)
	bl c
	nop
	rlwinm. 3, 30, 0, 28, 28
	beq	 0, .LBB0_8
.LBB0_7:                                # %optional4 ; H
	bl d
	nop
.LBB0_8:                                # %exit

Differential Revision: https://reviews.llvm.org/D28522

llvm-svn: 295223
2017-02-15 19:49:14 +00:00
Xinliang David Li 538d666814 include function name in dot filename
Differential Revision: http://reviews.llvm.org/D29975

llvm-svn: 295220
2017-02-15 19:21:04 +00:00
Michael Kuperstein ba80db39d7 [DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.

This fixes PR311956.

Differential Revision: https://reviews.llvm.org/D29961

llvm-svn: 295213
2017-02-15 18:37:26 +00:00
Sagar Thakur ec65792910 [LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el
Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.

Reviewed by sdardis, dberris
Differential: D27697

llvm-svn: 295164
2017-02-15 10:48:11 +00:00
Craig Topper 96ec7a23e3 [SelectionDAGBuilder] Simplify creation of shufflevector DAG nodes where inputs are larger than the mask
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in a extract and creates a properly aligned start index for the extract.

This range finding is unnecessary, we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each indice we can't use an extract.

Reviewers: zvi, RKSimon

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29926

llvm-svn: 295152
2017-02-15 05:57:16 +00:00
Reid Kleckner a622fc9bdf [BranchFolding] Tail common all identical unreachable blocks
Summary:
Blocks ending in unreachable are typically cold because they end the
program or throw an exception, so merging them with other identical
blocks is usually profitable because it reduces the size of cold code.
MachineBlockPlacement generally does not arrange to fall through to such
blocks, so commoning these blocks will not introduce additional
unconditional branches.

Reviewers: hans, iteratee, haicheng

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29153

llvm-svn: 295105
2017-02-14 21:02:24 +00:00
Tim Northover c2f8956313 GlobalISel: introduce G_PTR_MASK to simplify alloca handling.
This instruction clears the low bits of a pointer without requiring (possibly
dodgy if pointers aren't ints) conversions to and from an integer. Since (as
far as I'm aware) all masks are statically known, the instruction takes an
immediate operand rather than a register to specify the mask.

llvm-svn: 295103
2017-02-14 20:56:18 +00:00
Eric Christopher 14303d1815 Reformat slightly.
llvm-svn: 295096
2017-02-14 19:43:50 +00:00
Wolfgang Pieb 399dcfaa2a Reapply r294532, reverted in r294787.
Store instructions can have more than one memory operand as a result
of optimizations that fold different stores into one.
When we identify spill instructions to generate DBG_VALUE instructions
to record the spilling of a variable, we disregard stores with 
multiple memory operands for now. We may miss some relevant spills but
the handling is a bit more complex, so we'll do it in a different patch.

This fixes PR31935.

llvm-svn: 295093
2017-02-14 19:08:45 +00:00
Aditya Nandakumar bb0483bc8e [Tablegen] Instrumenting table gen DAGGenISelDAG
To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing include source location of the patterns
Enabled through CMake varialbe LLVM_ENABLE_DAGISEL_COV

llvm-svn: 295081
2017-02-14 18:32:41 +00:00
Adam Nemet bbb141c734 Add new pass LazyMachineBlockFrequencyInfo
And use it in MachineOptimizationRemarkEmitter.  A test will follow on top of
Justin's changes to enable MachineORE in AsmPrinter.

The approach is similar to the IR-level pass.  It's a bit simpler because BPI
is immutable at the Machine level so we don't need to make that lazy.

Because of this, a new function mapping is introduced (BPIPassTrait::getBPI).
This function extracts BPI from the pass.  In case of the lazy pass, this is
when the calculation of the BFI occurs.  For Machine-level, this is the
identity function.

Differential Revision: https://reviews.llvm.org/D29836

llvm-svn: 295072
2017-02-14 17:21:09 +00:00
Artyom Skrobov dc66a82dc7 Removing a redundant assignment
llvm-svn: 295055
2017-02-14 14:44:01 +00:00
Eugene Zelenko d96089b248 [MC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
Same changes in files affected by reduced MC headers dependencies.

llvm-svn: 295009
2017-02-14 00:33:36 +00:00
Tim Northover 48dfa1a6ed GlobalISel: represent atomic loads & stores via the MachineMemOperand.
Also make sure the AArch64 backend doesn't try to convert them into normal
loads and stores.

llvm-svn: 294993
2017-02-13 22:14:16 +00:00
Tim Northover b73e309071 MIR: parse & print the atomic parts of a MachineMemOperand.
We're going to need them very soon for GlobalISel.

llvm-svn: 294992
2017-02-13 22:14:08 +00:00
Taewook Oh 4d35f9e10e Address post-commit comments for https://reviews.llvm.org/D29596. NFCI.
llvm-svn: 294985
2017-02-13 21:12:27 +00:00
Arnold Schwaighofer 8f3df731dc swiftcc: Don't emit tail calls from callers with swifterror parameters
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.

rdar://30495920

llvm-svn: 294982
2017-02-13 19:58:28 +00:00
Taewook Oh 06a2128cfa Make MachineBasicBlock::updateTerminator to update DebugLoc as well
Summary:
Currently MachineBasicBlock::updateTerminator simply drops DebugLoc for newly created branch instructions, which may cause incorrect stepping and/or imprecise sample profile data. Below is an example:

```
  1 extern int bar(int x);
  2
  3 int foo(int *begin, int *end) {
  4   int *i;
  5   int ret = 0;
  6   for (
  7       i = begin ;
  8       i != end ;
  9       i++)
 10   {
 11       ret += bar(*i);
 12   }
 13   return ret;
 14 }
```

Below is a bitcode of 'foo' at the end of LLVM-IR level optimizations with -O3:

```
define i32 @foo(i32* readonly %begin, i32* readnone %end) !dbg !4 {
entry:
  %cmp6 = icmp eq i32* %begin, %end, !dbg !9
  br i1 %cmp6, label %for.end, label %for.body.preheader, !dbg !12

for.body.preheader:                               ; preds = %entry
  br label %for.body, !dbg !13

for.body:                                         ; preds = %for.body.preheader, %for.body
  %ret.08 = phi i32 [ %add, %for.body ], [ 0, %for.body.preheader ]
  %i.07 = phi i32* [ %incdec.ptr, %for.body ], [ %begin, %for.body.preheader ]
  %0 = load i32, i32* %i.07, align 4, !dbg !13, !tbaa !15
  %call = tail call i32 @bar(i32 %0), !dbg !19
  %add = add nsw i32 %call, %ret.08, !dbg !20
  %incdec.ptr = getelementptr inbounds i32, i32* %i.07, i64 1, !dbg !21
  %cmp = icmp eq i32* %incdec.ptr, %end, !dbg !9
  br i1 %cmp, label %for.end.loopexit, label %for.body, !dbg !12, !llvm.loop !22

for.end.loopexit:                                 ; preds = %for.body
  br label %for.end, !dbg !24

for.end:                                          ; preds = %for.end.loopexit, %entry
  %ret.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.end.loopexit ]
  ret i32 %ret.0.lcssa, !dbg !24
}
```

where

```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```

. As you can see, the terminator of 'entry' block, which is a loop control branch, has a DebugLoc of line 6, column 3. Howerver, after the execution of 'MachineBlock::updateTerminator' function, which is triggered by MachineSinking pass, the DebugLoc info is dropped as below (see there's no debug-location for JNE_1):

```
  bb.0.entry:
    successors: %bb.4(0x30000000), %bb.1.for.body.preheader(0x50000000)
    liveins: %rdi, %rsi

    %6 = COPY %rsi
    %5 = COPY %rdi
    %8 = SUB64rr %5, %6, implicit-def %eflags, debug-location !9
    JNE_1 %bb.1.for.body.preheader, implicit %eflags
```

This patch addresses this issue and make newly created branch instructions to keep debug-location info.

Reviewers: aprantl, MatzeB, craig.topper, qcolombet

Reviewed By: qcolombet

Subscribers: qcolombet, llvm-commits

Differential Revision: https://reviews.llvm.org/D29596

llvm-svn: 294976
2017-02-13 18:15:31 +00:00
Quentin Colombet fbae5fcb96 [FastISel] Add a diagnostic to warm on fallback.
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.

llvm-svn: 294970
2017-02-13 17:38:59 +00:00
Sanne Wouda 91eadad3bd [Assembler] Improve diagnostics for inline assembly.
Summary:
Keep a vector of LocInfos around; one for each call to EmitInlineAsm.
Since each call to EmitInlineAsm creates a new buffer in the inline asm
SourceMgr, we can use the buffer number to map to the right LocInfo.

Reviewers: rengolin, grosbach, rnk, echristo

Reviewed By: rnk

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D29769

llvm-svn: 294947
2017-02-13 13:58:00 +00:00
Andrew V. Tischenko 8da96914f9 Compile time decreasing in the case we're dealing with Machine Combiner.
Before this patch compile time was about 21s (see below). After this patch
we have less than 2s (see bellow).

  Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz

    DAGCombiner - trunk
        time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
        real  0m1.685s

    DAGCombiner + Speed patch
        time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
        real  0m1.655s

    MachineCombiner w/o Speed patch
        time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
        real  0m21.614s

    MachineCombiner + Speed patch
        time ./llc spill_fdiv.ll -o /dev/null -enable-unsafe-fp-math
        real  0m1.593s

The test spill_fdiv.ll  is attached to D29627
D29627 should be closed.

llvm-svn: 294936
2017-02-13 09:43:37 +00:00
Craig Topper 3668bde371 [DAGCombiner] Teach DAG combine that inserting an extract_subvector result into the same location of a an undef vector can just use the original input to the extract.
llvm-svn: 294932
2017-02-13 04:53:33 +00:00
Craig Topper aa46204ed9 [DAGCombiner] Remove the half vector width check for the combine of EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR.
This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.

llvm-svn: 294930
2017-02-12 23:49:49 +00:00
Sanjay Patel 0557a44287 [TargetLowering] fix SETCC SETLT folding with FP types
The bug was introduced with:
https://reviews.llvm.org/rL294863

...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should 
still be able to fold this setcc/compare.

llvm-svn: 294924
2017-02-12 23:07:52 +00:00
Craig Topper b633adedc7 [DAGCombiner] Make the combine of INSERT_SUBVECTOR into a CONCAT_VECTOR more generic to support larger concats.
llvm-svn: 294875
2017-02-11 22:57:09 +00:00
Sanjay Patel 63499b61c9 [TargetLowering] check for sign-bit comparisons in SimplifyDemandedBits
I don't know if anything other than x86 vectors is affected by this change, but this may allow 
us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises
from the fact that blendv* instructions only use the sign-bit when deciding which vector element
to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already
exists in x86 lowering; this demanded bits change just enables the transform to fire more often.

The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, 
but I've explained the likely steps in this series here:
https://llvm.org/bugs/show_bug.cgi?id=11210

Differential Revision: https://reviews.llvm.org/D29687

llvm-svn: 294863
2017-02-11 18:01:55 +00:00
Nico Weber ee0b0ec935 Revert r294532, it caused PR31935
llvm-svn: 294787
2017-02-10 21:57:30 +00:00
Tim Shen 918ed871df [XRay] Implement powerpc64le xray.
Summary:
powerpc64 big-endian is not supported, but I believe that most logic can
be shared, except for xray_powerpc64.cc.

Also add a function InvalidateInstructionCache to xray_util.h, which is
copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest,
and I don't know how.

Reviewers: dberris, echristo, iteratee, kbarton, hfinkel

Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D29742

llvm-svn: 294781
2017-02-10 21:03:24 +00:00
Tim Northover 0e01170c79 GlobalISel: drop lifetime intrinsics during translation.
We don't use them yet and they just cause problems.

llvm-svn: 294770
2017-02-10 19:10:38 +00:00
Simon Pilgrim bfb1747806 [DAGCombine] Allow vector constant folding of any value type before type legalization
The patch comes in 2 parts:

1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.

2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.

Fix for PR30760

Differential Revision: https://reviews.llvm.org/D29568

llvm-svn: 294749
2017-02-10 14:37:25 +00:00
Craig Topper a9f1121896 [SelectionDAG] Dump the DAG after legalizing vector ops and after the second type legalization
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.

I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary.

Having said that though, I think we should probably fix legalize vector ops to log what its doing.

Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc

Reviewed By: RKSimon

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D29554

llvm-svn: 294711
2017-02-10 05:05:57 +00:00
Eric Fiselier 87c87f4c30 [CMake] Fix pthread handling for out-of-tree builds
LLVM defines `PTHREAD_LIB` which is used by AddLLVM.cmake and various projects
to correctly link the threading library when needed. Unfortunately
`PTHREAD_LIB` is defined by LLVM's `config-ix.cmake` file which isn't installed
and therefore can't be used when configuring out-of-tree builds. This causes
such builds to fail since `pthread` isn't being correctly linked.

This patch attempts to fix that problem by renaming and exporting
`LLVM_PTHREAD_LIB` as part of`LLVMConfig.cmake`. I renamed `PTHREAD_LIB`
because It seemed likely to cause collisions with downstream users of
`LLVMConfig.cmake`.

llvm-svn: 294690
2017-02-10 01:59:20 +00:00
Geoff Berry 7e320c2485 [SelectionDAG] Fix bugs in inverted condition splitting code.
Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen.  Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.

Reviewers: bogner, MatzeB, qcolombet

Subscribers: mcrosier, uabelho, llvm-commits

Differential Revision: https://reviews.llvm.org/D29680

llvm-svn: 294605
2017-02-09 18:28:17 +00:00
David Bozier 93e773e9be Revert: "[Stack Protection] Add diagnostic information for why stack protection was applied to a function"
this reverts revision r294590 as it broke some buildbots.

llvm-svn: 294593
2017-02-09 15:40:14 +00:00
David Bozier 6a44b7c2eb [Stack Protection] Add diagnostic information for why stack protection was applied to a function
Stack Smash Protection is not completely free, so in hot code, the overhead it causes can cause performance issues. By adding diagnostic information for which function have SSP and why, a user can quickly determine what they can do to stop SSP being applied to a specific hot function.

This change adds an SSP-specific DiagnosticInfo class and uses of it to the Stack Protection code. A subsequent change to clang will cause the remarks to be emitted when enabled.

Patch by: James Henderson

Differential Revision: https://reviews.llvm.org/D29023

llvm-svn: 294590
2017-02-09 15:08:40 +00:00
Artur Pilipenko 4a64031954 [DAGCombiner] Support non-zero offset in load combine
Enable folding patterns which load the value from non-zero offset:

  i8 *a = ...
  i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24)
=>
  i32 val = *((i32*)(a+4))

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29394

llvm-svn: 294582
2017-02-09 12:06:01 +00:00
Wolfgang Pieb 458b4e7c46 Reapply r294356 ("Keep track of spilled variables in LiveDebugValues").
Was reverted with r294447 due to undefined behavior with negative offsets
in DBG_VALUE instructions.

llvm-svn: 294532
2017-02-08 23:46:59 +00:00
Tim Northover e041841811 GlobalISel: legalize G_FPOW to a libcall on AArch64.
There's no instruction to implement it.

llvm-svn: 294531
2017-02-08 23:23:39 +00:00
Tim Northover b38b4e2464 GlobalISel: translate @llvm.pow intrinsic to G_FPOW.
It'll usually be immediately legalized back to a libcall, but occasionally
something can be done with it so we'd just as well enable that flexibility from
the start.

llvm-svn: 294530
2017-02-08 23:23:32 +00:00
Tim Northover 0a9b27933a GlobalISel: expand mul-with-overflow into mul-hi on AArch64.
AArch64 has specific instructions to multiply two numbers at double the width
and produce the high part of the result. These can be used to implement LLVM's
mul.with.overflow instructions fairly simply. Helps with C++ operator new[].

llvm-svn: 294519
2017-02-08 21:22:15 +00:00
Simon Dardis 2e8cdbd795 [DebugInfo] Rename EmitDebugValue to EmitDebugThreadLocal (NFC)
As pointed out by David Blaikie in the post commit review of
r292624, EmitDebugValue should be called EmitDebugThreadLocal.

llvm-svn: 294500
2017-02-08 19:03:46 +00:00
Artur Pilipenko 045ab08252 [DAGCombiner] NFC. Mark ByteProvider accessors as const
llvm-svn: 294494
2017-02-08 17:59:34 +00:00
Tim Northover f19d467ff6 GlobalISel: translate @llvm.va_start intrinsic.
Because we need to preserve the memory access being performed we need a
separate instruction to represent this.

llvm-svn: 294492
2017-02-08 17:57:20 +00:00
Sanne Wouda 2933875cc2 [Assembler] Enable nicer diagnostics for inline assembly.
Fixed test.

Summary:
Enables source location in diagnostic messages from the backend.  This
is after parsing, during finalization.  This requires the SourceMgr, the
inline assembly string buffer, and DiagInfo to still be alive after
EmitInlineAsm returns.

This patch creates a single SourceMgr for inline assembly inside the
AsmPrinter.  MCContext gets a pointer to this SourceMgr.  Using one
SourceMgr per call to EmitInlineAsm would make it difficult for
MCContext to figure out in which SourceMgr the SMLoc is located, while a
single SourceMgr can figure it out if it has multiple buffers.

The Str argument to EmitInlineAsm is copied into a buffer and owned by
the inline asm SourceMgr.  This ensures that DiagHandlers won't print
garbage.  (Clang emits a "note: instantiated into assembly here", which
refers to this string.)

The AsmParser gets destroyed before finalization, which means that the
DiagHandlers the AsmParser installs into the SourceMgr will be stale.
Restore the saved DiagHandlers.

Since now we're using just one SourceMgr for multiple inline asm
strings, we need to tell the AsmParser which buffer it needs to parse
currently.  Hand a buffer id -- returned from SourceMgr::
AddNewSourceBuffer -- to the AsmParser.

Reviewers: rnk, grosbach, compnerd, rengolin, rovka, anemet

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29441
    

llvm-svn: 294458
2017-02-08 14:48:05 +00:00
Diana Picus 79add417b4 Revert "[Assembler] Enable nicer diagnostics for inline assembly."
This reverts commit r294433 because it seems it broke the buildbots.

llvm-svn: 294448
2017-02-08 14:02:16 +00:00
NAKAMURA Takumi a3bc043caa Revert r294356, "DebugInfo: Track spilled variables in LiveDebugValues"
It caused undefined behavior in VarLoc. As far as I investigated,

  - VarLoc::VarLoc() treats negative offset value as InvalidKind.
    Consider the case that (int64_t)MI.getOperand(1).getImm() is negative and whether it satisfies ((uint64_t)Offset < (1ULL << 32)).

  - Comparison operators in VarLoc behave undefined since VarLoc::Loc.Hash is uninitialized in case of InvalidKind.

I guess Offset (in VarLoc) could be made aware of signed, but I am not sure.
So I have reverted it for now.

llvm-svn: 294447
2017-02-08 13:49:28 +00:00
Sanne Wouda 09adc245ea [Assembler] Enable nicer diagnostics for inline assembly.
Summary:
Enables source location in diagnostic messages from the backend.  This
is after parsing, during finalization.  This requires the SourceMgr, the
inline assembly string buffer, and DiagInfo to still be alive after
EmitInlineAsm returns.

This patch creates a single SourceMgr for inline assembly inside the
AsmPrinter.  MCContext gets a pointer to this SourceMgr.  Using one
SourceMgr per call to EmitInlineAsm would make it difficult for
MCContext to figure out in which SourceMgr the SMLoc is located, while a
single SourceMgr can figure it out if it has multiple buffers.

The Str argument to EmitInlineAsm is copied into a buffer and owned by
the inline asm SourceMgr.  This ensures that DiagHandlers won't print
garbage.  (Clang emits a "note: instantiated into assembly here", which
refers to this string.)

The AsmParser gets destroyed before finalization, which means that the
DiagHandlers the AsmParser installs into the SourceMgr will be stale.
Restore the saved DiagHandlers.

Since now we're using just one SourceMgr for multiple inline asm
strings, we need to tell the AsmParser which buffer it needs to parse
currently.  Hand a buffer id -- returned from SourceMgr::
AddNewSourceBuffer -- to the AsmParser.

Reviewers: rnk, grosbach, compnerd, rengolin, rovka, anemet

Reviewed By: rnk

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29441

llvm-svn: 294433
2017-02-08 10:20:07 +00:00
Matt Arsenault 1672b1b86e TargetLowering: Remove AddrSpace parameter from GetAddrModeArguments
It doesn't make any sense to pass in to what is supposed to be parsing
the call, and this can be inferred from the pointer output.

llvm-svn: 294412
2017-02-08 07:09:03 +00:00
Amaury Sechet 4b946916ac [DAGCombiner] Push truncate through adde when the carry isn't used.
Summary: As per title.

Reviewers: mkuper, spatel, bkramer, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29528

llvm-svn: 294394
2017-02-08 00:32:36 +00:00
Wolfgang Pieb 02f329370f DebugInfo: Track spilled variables in LiveDebugValues
When variables are spilled to the stack by the register allocator, keep track of their
debug locations in LiveDebugValues and insert DBG_VALUE instructions at the appropriate
place. Ensure that the locations are propagated down the dominator tree via the existing 
mechanisms.

Reviewer: aprantl

Differential Revision: https://reviews.llvm.org/D29500

llvm-svn: 294356
2017-02-07 21:23:15 +00:00
Hans Wennborg 819e3e02a9 [X86] Disable conditional tail calls (PR31257)
They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).

Reverting until we figure out the proper solution.

llvm-svn: 294348
2017-02-07 20:37:45 +00:00
Tim Northover d0d025ae45 GlobalISel: translate @llvm.va_end intrinsic.
Turns out no-one actually cares about this one (at least) in tree so we can
just drop it entirely.

llvm-svn: 294345
2017-02-07 20:08:59 +00:00
Sanjoy Das 2f63cbcc0c [ImplicitNullCheck] Extend Implicit Null Check scope by using stores
Summary:
This change allows usage of store instruction for implicit null check.

Memory Aliasing Analisys is not used and change conservatively supposes
that any store and load may access the same memory. As a result
re-ordering of store-store, store-load and load-store is prohibited.

Patch by Serguei Katkov!

Reviewers: reames, sanjoy

Reviewed By: sanjoy

Subscribers: atrick, llvm-commits

Differential Revision: https://reviews.llvm.org/D29400

llvm-svn: 294338
2017-02-07 19:19:49 +00:00
Reid Kleckner 0887d44a61 [SDAGISel] Simplify some SDAGISel code, NFC
Hoist entry block code for arguments and swift error values out of the
basic block instruction selection loop. Lowering arguments once up front
seems much more readable than doing it conditionally inside the loop. It
also makes it clear that argument lowering can update StaticAllocaMap
because no instructions have been selected yet.

Also use range-based for loops where possible.

llvm-svn: 294329
2017-02-07 18:42:53 +00:00
Sanjay Patel 8c99ca3df0 [TargetLowering] fix formatting and comments for ShrinkDemandedConstant; NFC
llvm-svn: 294325
2017-02-07 18:04:26 +00:00
Igor Laevsky 3be81bae63 [CodeGenPrepare] Hoist all getSubtargetImpl calls to the beginning of the pass
Differential Revision: https://reviews.llvm.org/D29456

llvm-svn: 294301
2017-02-07 13:27:20 +00:00
Daniel Jasper 84b3cc394d Revert "[DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)"
This reverts commit r294186.

On an internal test, this triggers an out-of-memory error on PPC,
presumably because there is another dagcombine that does the exact
opposite triggering and endless loop consuming more and more memory.

Chandler has started at creating a reduced test case and we'll attach it
as soon as possible.

llvm-svn: 294288
2017-02-07 08:57:50 +00:00
Matthias Braun f80403d8ee RegisterCoalescer: Fix joinReservedPhysReg()
joinReservedPhysReg() can only deal with a liverange in a single basic
block when copying from a vreg into a physreg.

See also rdar://30306405

Differential Revision: https://reviews.llvm.org/D29436

llvm-svn: 294268
2017-02-07 01:59:39 +00:00
Tim Northover 868332d6bf GlobalISel: legalize narrow G_SELECTS on AArch64.
Otherwise there aren't any patterns to select them.

llvm-svn: 294261
2017-02-06 23:41:27 +00:00
Tim Northover 0e6afbdd77 GlobalISel: legalize G_INSERT instructions
We don't handle all cases yet (see arm64-fallback.ll for an example), but this
is enough to cover most common C++ code so it's a good place to start.

llvm-svn: 294247
2017-02-06 21:56:47 +00:00
Artur Pilipenko d3464bf9ad [DAGCombiner] Support bswap as a part of load combine patterns
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D29397

llvm-svn: 294201
2017-02-06 17:48:08 +00:00
Amaury Sechet e674f5c758 Add ADDC to SelectionDAG::computeKnownBits and ComputeNumSignBits.
Summary: As per title.

Reviewers: bkramer, sunfish, lattner, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29521

llvm-svn: 294188
2017-02-06 14:59:06 +00:00
Amaury Sechet 8a3b32941d [DAGCombiner] Make DAGCombiner smarter about overflow
Summary: Leverage it to transform addc into add.

Reviewers: mkuper, spatel, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29524

llvm-svn: 294187
2017-02-06 14:54:49 +00:00
Amaury Sechet 1d466f598e [DAGCombiner] (add X, (adde Y, 0, Carry)) -> (adde X, Y, Carry)
Summary: This is extracted from D29443 .

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29564

llvm-svn: 294186
2017-02-06 14:28:39 +00:00
Simon Pilgrim bfd4495512 [X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.

We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.

This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.

Differential Revision: https://reviews.llvm.org/D29399

llvm-svn: 294183
2017-02-06 13:44:45 +00:00
Kamil Rytarowski 5d2bd8dd54 Revamp llvm::once_flag to be closer to std::once_flag
Summary:
Make this interface reusable similarly to std::call_once and std::once_flag interface.

This makes porting LLDB to NetBSD easier as there was in the original approach a portable way to specify a non-static once_flag. With this change translating std::once_flag to llvm::once_flag is mechanical.

Sponsored by <The NetBSD Foundation>

Reviewers: mehdi_amini, labath, joerg

Reviewed By: mehdi_amini

Subscribers: emaste, clayborg

Differential Revision: https://reviews.llvm.org/D29566

llvm-svn: 294143
2017-02-05 21:13:06 +00:00
Geoff Berry 76ca8c2b34 [SelectionDAG] In InstrEmitter, handle EXTRACT_SUBREG of a physical register.
Summary:
Without this change, the getVR() call would hit an assert since it was
being passed a physical register.

Update the AArch64/ldst-opt.ll test with a case that triggers this
behavior by adding a run with strict-align, which causes an unaligned
STR XZR instruction to be split into byte stores, creating an
EXTRACT_SUBREG of XZR that triggers the original problem.

Reviewers: bogner, qcolombet, MatzeB, atrick

Subscribers: aemerson, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D29495

llvm-svn: 294129
2017-02-05 18:28:14 +00:00
Amaury Sechet 143902c29f [DAGCombiner] Leverage add's commutativity
Summary: This avoid the need to duplicate all pattern and actually end up exposing some opportunity to optimize existing pattern that did not exists in both directions on an existing test case.

Reviewers: mkuper, spatel, bkramer, RKSimon, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29541

llvm-svn: 294125
2017-02-05 14:22:20 +00:00
Craig Topper 42b83f8d6e [DAGCombiner] Canonicalize the order of a chain of INSERT_SUBVECTORs.
Based on similar code for INSERT_VECTOR_ELT.

llvm-svn: 294110
2017-02-04 23:26:39 +00:00
Craig Topper 04dce84ead [DAGCombiner] Use DAG.getAnyExtOrTrunc to simplify some code. NFC
llvm-svn: 294109
2017-02-04 23:26:37 +00:00
Craig Topper ceaf9c1633 [DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS.
llvm-svn: 294108
2017-02-04 23:26:34 +00:00
Amaury Sechet 6e2d8e49ec Formatting in DAGCombiner. NFC
llvm-svn: 294091
2017-02-04 13:01:53 +00:00
Matthias Braun 82e7f4d877 MachineCopyPropagation: Respect implicit operands of COPY
The code missed to check implicit operands of COPY instructions for
defs/uses.

Differential Revision: https://reviews.llvm.org/D29522

llvm-svn: 294088
2017-02-04 02:27:20 +00:00
Matthias Braun 776a1d7ecb MachineCopyPropagation: Do not consider undef operands as clobbers
This was originally introduced in r278321 to work around correctness
problems in the ExecutionDepsFix pass; Probably also to keep the
performance benefits of breaking the false dependencies which of course
also affect undef operands.

ExecutionDepsFix has been improved here recently (see for example
r278321) so we should not need this exception any longer.

Differential Revision: https://reviews.llvm.org/D29525

llvm-svn: 294087
2017-02-04 02:27:13 +00:00
Kyle Butt c7d67eef5a [CodeGen]: BlockPlacement: Skip extraneous logging.
Move a check for blocks that are not candidates for tail duplication up before
the logging. Reduces logging noise. No non-logging changes intended.

llvm-svn: 294086
2017-02-04 02:26:34 +00:00
Kyle Butt e9425c4ff8 [CodeGen]: BlockPlacement: Apply const liberally. NFC
Anything that needs to be passed to AnalyzeBranch unfortunately can't be const,
or more would be const. Added const_iterator to BlockChain to allow
BlockChain to be const when we don't expect to change it.

llvm-svn: 294085
2017-02-04 02:26:32 +00:00
Eugene Zelenko 502d0bc28e [CodeGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce TargetInstrInfo.h dependencies.

llvm-svn: 294084
2017-02-04 02:00:53 +00:00
Craig Topper aa741abfc5 [TwoAddressInstruction] Fix typo in comment. NFC
llvm-svn: 294083
2017-02-04 01:58:10 +00:00
Brendon Cahoon 9809b10586 [RegisterCoalescer] Do not call getInstructionIndex with DBG_VALUE
An assert occurs when calling SlotIndexes::getInstructionIndex with
a DBG_VALUE instruction because the function expects an instruction
with a slot index. However, there is no slot index for a DBG_VALUE
instruction.

Differential Revision: https://reviews.llvm.org/D29048

llvm-svn: 294070
2017-02-04 00:10:22 +00:00
Ahmed Bougacha 9677cc6fb7 [TLI] Robustize SDAG LibFunc proto checking by merging it into TLI.
This re-applies commit r292189, reverted in r292191.

SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
  checking: Ulrich agrees on removing them.

I don't think it's worth supporting any of these (IMO) invalid
testcases.  Instead, fix them to be more meaningful.

llvm-svn: 294028
2017-02-03 19:11:19 +00:00
Tim Northover c3e3f59d12 GlobalISel: translate dynamic alloca instructions.
llvm-svn: 294022
2017-02-03 18:22:45 +00:00
Alexey Bataev a0d9f2582b [SelectionDAG] Fix for PR30775: Assertion `NodeToMatch->getOpcode() !=
ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.

NodeToMatch can be modified during matching, but code does not handle
this situation.

Differential Revision: https://reviews.llvm.org/D29292

llvm-svn: 294003
2017-02-03 12:28:40 +00:00
David Blaikie a0e3c75187 DebugInfo: ensure type and namespace names are included in pubnames/pubtypes even when they are only present in type units
While looking to add support for placing singular types (types that will
only be emitted in one place (such as attached to a strong vtable or
explicit template instantiation definition)) not in type units (since
type units have overhead) I stumbled across that change causing an
increase in pubtypes.

Turns out we were missing some types from type units if they were only
referenced from other type units and not from the debug_info section.

This fixes that, following GCC's line of describing the offset of such
entities as the CU die (since there's no compile unit-relative offset
that would describe such an entity - they aren't in the CU). Also like
GCC, this change prefers to describe the type stub within the CU rather
than the "just use the CU offset" fallback where possible. This may give
the DWARF consumer some opportunity to find the extra info in the type
stub - though I'm not sure GDB does anything with this currently.

The size of the pubnames/pubtypes sections now match exactly with or
without type units enabled.

This nearly triples (+189%) the pubtypes section for a clang self-host
and grows pubnames by 0.07% (without compression). For a total of 8%
increase in debug info sections of the objects of a Split DWARF build
when using type units.

llvm-svn: 293971
2017-02-03 00:44:18 +00:00
Bob Haarman dd4ebc1d3b [lto] add getLinkerOpts()
Summary: Some compilers, including MSVC and Clang, allow linker options to be specified in source files. In the legacy LTO API, there is a getLinkerOpts() method that returns linker options for the bitcode module being processed. This change adds that method to the new API, so that the COFF linker can get the right linker options when using the new LTO API.

Reviewers: pcc, ruiu, mehdi_amini, tejohnson

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D29207

llvm-svn: 293950
2017-02-02 23:00:49 +00:00
Reid Kleckner c35139ec0d [CodeGen] Remove dead call-or-prologue enum from CCState
This enum has been dead since Olivier Stannard re-implemented ARM byval
handling in r202985 (2014).

llvm-svn: 293943
2017-02-02 21:58:22 +00:00
Xinliang David Li 58fcc9bdce [PGO] internal option cleanups
1. Added comments for options
2. Added missing option cl::desc field
3. Uniified function filter option for graph viewing.
   Now PGO count/raw-counts share the same
   filter option: -view-bfi-func-name=.

llvm-svn: 293938
2017-02-02 21:29:17 +00:00
Quentin Colombet 5725f56bb0 [LiveRangeEdit] Don't mess up with LiveInterval when a new vreg is created.
In r283838, we added the capability of splitting unspillable register.
When doing so we had to make sure the split live-ranges were also
unspillable and we did that by marking the related live-ranges in the
delegate method that is called when a new vreg is created.
However, by accessing the live-range there, we also triggered their lazy
computation (LiveIntervalAnalysis::getInterval) which is not what we
want in general. Indeed, later code in LiveRangeEdit is going to build
the live-ranges this lazy computation may mess up that computation
resulting in assertion failures. Namely, the createEmptyIntervalFrom
method expect that the live-range is going to be empty, not computed.

Thanks to Mikael Holmén <mikael.holmen@ericsson.com> for noticing and
reporting the problem.

llvm-svn: 293934
2017-02-02 20:44:36 +00:00
Xinliang David Li 1eb4ec6a2e [PGO] make graph view internal options available for all builds
Differential Revision: https://reviews.llvm.org/D29259

llvm-svn: 293921
2017-02-02 19:18:56 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Amaury Sechet f3e421d6e9 Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC
llvm-svn: 293903
2017-02-02 16:07:44 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Matthias Braun 9dc3b5ff89 RegisterCoalescer: Cleanup joinReservedPhysReg(); NFC
- Factor out a common subexpression
- Add some helpful comments
- Fix printing of a register in a debug message

llvm-svn: 293856
2017-02-02 02:23:27 +00:00
Paul Robinson 5362216c36 Remove an assertion that doesn't hold when mixing -g and -gmlt through
LTO.  Replace it with a related assertion, ensuring that abstract
variables appear only in abstract scopes.
Part of PR31437.

Differential Revision: http://reviews.llvm.org/D29430

llvm-svn: 293841
2017-02-01 23:51:56 +00:00
Dehao Chen 0944a8c2ec Change debug-info-for-profiling from a TargetOption to a function attribute.
Summary: LTO requires the debug-info-for-profiling to be a function attribute.

Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl

Reviewed By: mehdi_amini, dblaikie, aprantl

Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29203

llvm-svn: 293833
2017-02-01 22:45:09 +00:00
Paul Robinson a380e613f1 Remove an assertion that doesn't hold when mixing -g and -gmlt through
LTO.  Part of PR31437.

Differential Revision: http://reviews.llvm.org/D29310

llvm-svn: 293818
2017-02-01 21:54:50 +00:00
Sanjoy Das 08da2e28ee [ImplicitNullCheck] Extend canReorder scope
Summary:
This change allows a re-order of two intructions if their uses
are overlapped.

Patch by Serguei Katkov!

Reviewers: reames, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29120

llvm-svn: 293775
2017-02-01 16:04:21 +00:00
Florian Hahn 7a5ec55fb3 [legalizetypes] Push fp16 -> fp32 extension node to worklist.
Summary:
This way, the type legalization machinery will take care of registering
the result of this node properly.

This patches fixes all failing fp16 test cases  with expensive checks.
(CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll
CodeGen/X86/soft-fp.ll) 


Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel

Reviewed By: hfinkel

Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28195

llvm-svn: 293765
2017-02-01 13:01:33 +00:00
Evandro Menezes 94edf02923 [CodeGen] Move MacroFusion to the target
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.

In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.

Differential revision: https://reviews.llvm.org/D28489

llvm-svn: 293737
2017-02-01 02:54:34 +00:00
Sanjoy Das 15e50b510e [ImplicitNullCheck] NFC isSuitableMemoryOp cleanup
Summary:
isSuitableMemoryOp method is repsonsible for verification
that instruction is a candidate to use in implicit null check.
Additionally it checks that base register is not re-defined before.
In case base has been re-defined it just returns false and lookup
is continued while any suitable instruction will not succeed this check
as well. This results in redundant further operations.

So when we found that base register has been re-defined we just
stop.

Patch by Serguei Katkov!

Reviewers: reames, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29119

llvm-svn: 293736
2017-02-01 02:49:25 +00:00
Stanislav Mekhanoshin 70c245e92d Fix regalloc assignment of overlapping registers
SplitEditor::defFromParent() can create a register copy.
If register is a tuple of other registers and not all lanes are used
a copy will be done on a full tuple regardless. Later register unit
for an unused lane will be considered free and another overlapping
register tuple can be assigned to a different value even though first
register is live at that point. That is because interference only look at
liveness info, while full register copy clobbers all lanes, even unused.

This patch fixes copy to only cover used lanes.

Differential Revision: https://reviews.llvm.org/D29105

llvm-svn: 293728
2017-02-01 01:18:36 +00:00
Kyle Butt b15c06677c CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well, subject to some simple frequency calculations.

Differential Revision: https://reviews.llvm.org/D28583

llvm-svn: 293716
2017-01-31 23:48:32 +00:00
Tim Northover c6bfa481cf GlobalISel: the translation of an invoke must branch to the good block.
Otherwise bad things happen if the basic block order isn't trivial after an
invoke.

llvm-svn: 293679
2017-01-31 20:12:18 +00:00
Matthias Braun 01fa962226 InterleaveAccessPass: Avoid constructing invalid shuffle masks
Fix a bug where we would construct shufflevector instructions addressing
invalid elements.

Differential Revision: https://reviews.llvm.org/D29313

llvm-svn: 293673
2017-01-31 18:37:53 +00:00
Tim Northover 293f74355b GlobalISel: merge invoke and call translation paths.
Well, sort of. But the lower-level code that invoke used to be using completely
botched the handling of varargs functions, which hopefully won't be possible if
they're using the same code.

llvm-svn: 293670
2017-01-31 18:36:11 +00:00
Nirav Dave a7c041d147 [X86] Implement -mfentry
Summary: Insert calls to __fentry__ at function entry.

Reviewers: hfinkel, craig.topper

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D28000

llvm-svn: 293648
2017-01-31 17:00:27 +00:00
Nicolai Haehnle 8813d5d221 [DAGCombine] require UnsafeFPMath for re-association of addition
Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675

llvm-svn: 293635
2017-01-31 14:35:37 +00:00
Keno Fischer 578cf7aae7 [ExecutionDepsFix] Improve clearance calculation for loops
Summary:
In revision rL278321, ExecutionDepsFix learned how to pick a better
register for undef register reads, e.g. for instructions such as
`vcvtsi2sdq`. While this revision improved performance on a good number
of our benchmarks, it unfortunately also caused significant regressions
(up to 3x) on others. This regression turned out to be caused by loops
such as:

PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
      ^                                  |
      +----------------------------------+

In the previous version of the clearance calculation, we would visit
the blocks in order, remembering for each whether there were any
incoming backedges from blocks that we hadn't processed yet and if
so queuing up the block to be re-processed. However, for loop structures
such as the above, this is clearly insufficient, since the block B
does not have any unknown backedges, so we do not see the false
dependency from the previous interation's Def of xmm registers in B.

To fix this, we need to consider all blocks that are part of the loop
and reprocess them one the correct clearance values are known. As
an optimization, we also want to avoid reprocessing any later blocks
that are not part of the loop.

In summary, the iteration order is as follows:
Before: PH A B C D A'
Corrected (Naive): PH A B C D A' B' C' D'
Corrected (w/ optimization): PH A B C A' B' C' D

To facilitate this optimization we introduce two new counters for each
basic block. The first counts how many of it's predecssors have
completed primary processing. The second counts how many of its
predecessors have completed all processing (we will call such a block
*done*. Now, the criteria to reprocess a block is as follows:
    - All Predecessors have completed primary processing
    - For x the number of predecessors that have completed primary
      processing *at the time of primary processing of this block*,
      the number of predecessors that are done has reached x.

The intuition behind this criterion is as follows:
We need to perform primary processing on all predecessors in order to
find out any direct defs in those predecessors. When predecessors are
done, we also know that we have information about indirect defs (e.g.
in block B though that were inherited through B->C->A->B). However,
we can't wait for all predecessors to be done, since that would
cause cyclic dependencies. However, it is guaranteed that all those
predecessors that are prior to us in reverse postorder will be done
before us. Since we iterate of the basic blocks in reverse postorder,
the number x above, is precisely the count of the number of predecessors
prior to us in reverse postorder.

Reviewers: myatsina
Differential Revision: https://reviews.llvm.org/D28759

llvm-svn: 293571
2017-01-30 23:37:03 +00:00
Tim Northover 2bf8c9d381 GlobalISel: correctly translate invoke when callee is a register.
This should fix the GlobalISel verifier.

llvm-svn: 293550
2017-01-30 21:45:21 +00:00
Tim Northover c944970484 GlobalISel: account for differing exception selector sizes.
For some reason the exception selector register must be a pointer (that's
assumed by SDag); on the other hand, it gets moved into an IR-level type which
might be entirely different (i32 on AArch64). IRTranslator needs to be aware of
this.

llvm-svn: 293546
2017-01-30 20:52:42 +00:00
Tim Northover c94d70336b GlobalISel: tidy up def/use test. NFC.
llvm-svn: 293545
2017-01-30 20:52:37 +00:00
Tim Northover 79f43f195c GlobalISel: translate memset & memmove.
llvm-svn: 293541
2017-01-30 19:33:07 +00:00
Tim Northover 480609d0f3 GlobalISel: permit unused vregs without a register-class after ISel.
This can happen if earlier combining has removed all uses of some VReg, which
is fine and shouldn't flag an error.

llvm-svn: 293537
2017-01-30 19:12:50 +00:00
Simon Pilgrim ffe2535cf6 Use SelectionDAG::getBuildVector helper function where possible. NFCI.
llvm-svn: 293532
2017-01-30 18:53:45 +00:00
Justin Bogner 8f520a73b2 SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced
Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.

Fixes llvm.org/PR31710

llvm-svn: 293522
2017-01-30 18:29:46 +00:00
Simon Pilgrim 0a5ab5c4db Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI.
llvm-svn: 293520
2017-01-30 18:20:42 +00:00
Matt Arsenault 32e6bfa20f DAG: Fold fneg into compare with constant into the constant
fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred)

InstCombine already does this.

llvm-svn: 293512
2017-01-30 17:57:28 +00:00
David Blaikie a66696f210 unique_ptrify some containers in GlobalISel::RegisterBankInfo
To simplify/clarify memory ownership, make leaks (as one was found/fixed
recently) harder to write, etc.

(also, while I was there - removed a duplicate lookup in a container)

llvm-svn: 293506
2017-01-30 17:13:56 +00:00
Matt Arsenault 0c687390fe DAG: Constant fold fp16_to_fp/fp16_to_fp
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.

llvm-svn: 293499
2017-01-30 16:57:41 +00:00
Kristof Beyls 65a12c012f [GlobalISel] Add support for indirectbr
Differential Revision: https://reviews.llvm.org/D28079

llvm-svn: 293470
2017-01-30 09:13:18 +00:00
Matthias Braun a4976c6166 MachineInstr: Remove parameter from dump()
The primary use of the dump() functions in LLVM is for use in a
debugger. Unfortunately lldb does not seem to handle default arguments
so using `p SomeMI.dump()` fails and you have to type the longer `p
SomeMI.dump(nullptr)`. Remove the paramter to make the most common use
easy. (You can always construct something like `p
SomeMI.print(dbgs(),MyTII)` if you need more features).

Differential Revision: https://reviews.llvm.org/D29241

llvm-svn: 293440
2017-01-29 18:20:42 +00:00
Craig Topper 135da1faf5 [SelectionDAG] Make SDNode::getConstantOperandVal an inline method.
It's operation already exists manually in many places without using the method.

llvm-svn: 293421
2017-01-29 06:08:02 +00:00
Craig Topper 4753736abf [DAGCombiner] Use unsigned for a constant vector index instead of APInt.
The type system requires that the number of vector elements should fit in 32-bits so this should be safe.

llvm-svn: 293414
2017-01-29 04:38:21 +00:00
Craig Topper d15730902b [DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR.
The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue.

llvm-svn: 293413
2017-01-29 04:38:19 +00:00
Craig Topper 24cdbe8fa6 [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal.
llvm-svn: 293412
2017-01-29 04:38:16 +00:00
Xinliang David Li fd3f645f9d Add support to dump dot graph block layout after MBP
Differential Revision: https://reviews.llvm.org/D29141

llvm-svn: 293408
2017-01-29 01:57:02 +00:00
Matthias Braun 194ded551c Use print() instead of dump() in code
llvm-svn: 293371
2017-01-28 06:53:55 +00:00
Quentin Colombet 8cf1163c4f [RegisterBankInfo] Emit proper type for remapped registers.
When the OperandsMapper creates virtual registers, it used to just create
plain scalar register with the right size. This may confuse the
instruction selector because we lose the information of the instruction
using those registers what supposed to do. The MachineVerifier complains
about that already.

With this patch, the OperandsMapper still creates plain scalar register,
but the expectation is for the mapping function to remap the type
properly. The default mapping function has been updated to do that.

rdar://problem/30231850

llvm-svn: 293362
2017-01-28 02:23:48 +00:00
Matthias Braun 8c209aa877 Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359
2017-01-28 02:02:38 +00:00
Quentin Colombet 351099022a [RegisterCoalescing] Recommit the patch "Remove partial redundent copy".
In r292621, the recommit fixes a bug related with live interval update
after the partial redundent copy is moved.

This recommit solves an additional bug related to the lack of update of
subranges.

The original patch is to solve the performance problem described in
PR27827. Register coalescing sometimes cannot remove a copy because of
interference. But if we can find a reverse copy in one of the predecessor
block of the copy, the copy is partially redundent and we may remove the
copy partially by moving it to the predecessor block without the
reverse copy.

Differential Revision: https://reviews.llvm.org/D28585

Re-apply r292621

Revert "Revert rL292621. Caused some internal build bot failures in apple."

This reverts commit r292984.

Original patch: Wei Mi <wmi@google.com>
Subrange fix: Mostly Matthias Braun <matze@braunis.de>

llvm-svn: 293353
2017-01-28 01:05:27 +00:00
Evgeniy Stepanov d0852873e5 Fix memory leak in globalisel.
#0 0x89cdeb in operator new[](unsigned long) /code/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:84:37
    #1 0x4ec87c4 in llvm::RegisterBankInfo::ValueMapping const* llvm::RegisterBankInfo::getOperandsMapping<llvm::RegisterBankInfo::ValueMapping const* const*>(llvm::RegisterBankInfo::ValueMapping const* const*, llvm::RegisterBankInfo::ValueMapping const* const*) const /code/llvm/lib/CodeGen/GlobalISel/RegisterBankInfo.cpp:297:9
    #2 0x9327ee in llvm::AArch64RegisterBankInfo::getInstrMapping(llvm::MachineInstr const&) const /code/llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp:540:30
    #3 0x4eb8d07 in llvm::RegBankSelect::assignInstr(llvm::MachineInstr&) /code/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp:546:24
    #4 0x4eb9dd2 in llvm::RegBankSelect::runOnMachineFunction(llvm::MachineFunction&) /code/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp:624:12
    #5 0x3141875 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /code/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:13
    #6 0x396128d in llvm::FPPassManager::runOnFunction(llvm::Function&) /code/llvm/lib/IR/LegacyPassManager.cpp:1513:27
    #7 0x3961832 in llvm::FPPassManager::runOnModule(llvm::Module&) /code/llvm/lib/IR/LegacyPassManager.cpp:1534:16
    #8 0x3962540 in runOnModule /code/llvm/lib/IR/LegacyPassManager.cpp:1590:27
    #9 0x3962540 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /code/llvm/lib/IR/LegacyPassManager.cpp:1693
    #10 0x8ae368 in compileModule(char**, llvm::LLVMContext&) /code/llvm/tools/llc/llc.cpp:562:8
    #11 0x8a7a1b in main /code/llvm/tools/llc/llc.cpp:316:22

llvm-svn: 293351
2017-01-28 00:46:30 +00:00
Tim Northover 12bd22fbee GlobalISel: don't leak super-entry BB when merging with IR-level one.
We have to delete the block manually or it leaks. That triggers failures in
-fsanitize=leak bots (unsurprisingly), which should be fixed by this patch.

llvm-svn: 293347
2017-01-27 23:54:31 +00:00
Tim Northover d8b85584f2 GlobalISel: set correct regclass for LOAD_STACK_GUARD.
Since it's not actually a generic MI, its register operands need a RegClass,
which is conveniently the target's pointer RegClass.

llvm-svn: 293335
2017-01-27 21:31:24 +00:00
Tim Northover c9bc8a5580 GlobalISel: mark incoming landing-pad registers as live.
Should fix machine verifier failures.

llvm-svn: 293334
2017-01-27 21:31:17 +00:00
Matthias Braun c91e28af4b ScheduleDAGInstrs: Do not try to toggle kill flags on debug uses
Preparation for upcoming changes. No testcase as none of the public
targets bundles early enough and has a post machine scheduler enabled at
the same time. The error is also easily catched by asserts.

llvm-svn: 293324
2017-01-27 18:53:07 +00:00
Matthias Braun 26e8c350f9 ScheduleDAGInstrs: Cleanup toggleKillFlag(); NFC
llvm-svn: 293323
2017-01-27 18:53:05 +00:00
Matthias Braun bd7d91838e ScheduleDAGInstrs: Cleanup; NFC
Comment, doxygen and a bit of whitespace cleanup.

llvm-svn: 293322
2017-01-27 18:53:00 +00:00
Jun Bum Lim b99a06b7c9 [CodeGenPrep]No negative cost in the ExtLd promotion
Summary: This change prevent the signed value of cost from being negative as the value is passed as an unsigned argument.

Reviewers: mcrosier, jmolloy, qcolombet, javed.absar

Reviewed By: mcrosier, qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28871

llvm-svn: 293307
2017-01-27 17:16:37 +00:00
Jonas Paulsson bb0ed3e732 [DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert().
In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a
partial number of available vector elements), WidenVecRes_Convert() used to
resort to scalarization.

This patch adds a handling of the (common) case where an input vector can be
found of same width as the widened result vector, by converting the node to
SIGN/ZERO_EXTEND_VECTOR_INREG.

Review: Eli Friedman
llvm-svn: 293268
2017-01-27 07:46:26 +00:00
Tim Northover 09aac4ad2a GlobalISel: support debug intrinsics.
The translation scheme is mostly cribbed from FastISel, and it's not entirely
convincing semantically. But it does seem to work in the common cases and allow
variables to be printed so it can't be all wrong.

llvm-svn: 293228
2017-01-26 23:39:14 +00:00
Andrew Kaylor a0a1164ce4 Add intrinsics for constrained floating point operations
This commit introduces a set of experimental intrinsics intended to prevent
optimizations that make assumptions about the rounding mode and floating point
exception behavior.  These intrinsics will later be extended to specify
flush-to-zero behavior.  More work is also required to model instruction
dependencies in machine code and to generate these instructions from clang
(when required by pragmas and/or command line options that are not currently
supported).

Differential Revision: https://reviews.llvm.org/D27028

llvm-svn: 293226
2017-01-26 23:27:59 +00:00
Kyle Butt c4614b3e76 [IfConversion] Use reverse_iterator to simplify. NFC
This simplifies skipping debug instructions and shrinking ranges.

llvm-svn: 293202
2017-01-26 20:02:47 +00:00
Nirav Dave d32a421f75 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
2017-01-26 16:46:13 +00:00
Nirav Dave de6516c466 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184
2017-01-26 16:02:24 +00:00
Craig Topper 001aad7da7 [DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors.
llvm-svn: 293152
2017-01-26 05:38:46 +00:00
Adam Nemet a964066705 New OptimizationRemarkEmitter pass for MIR
This allows MIR passes to emit optimization remarks with the same level
of functionality that is available to IR passes.

It also hooks up the greedy register allocator to report spills.  This
allows for interesting use cases like increasing interleaving on a loop
until spilling of registers is observed.

I still need to experiment whether reporting every spill scales but this
demonstrates for now that the functionality works from llc
using -pass-remarks*=<pass>.

Differential Revision: https://reviews.llvm.org/D29004

llvm-svn: 293110
2017-01-25 23:20:33 +00:00
Tim Northover 470f070b7d SDag: fix how initial loads are formed when splitting vector ops.
Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.

llvm-svn: 293088
2017-01-25 20:58:26 +00:00
Tim Northover 9e35f1e21c GlobalISel: rework getOrCreateVReg to avoid double lookup. NFC.
Thanks to Quentin for suggesting the refactoring.

llvm-svn: 293087
2017-01-25 20:58:22 +00:00
Tim Northover 5d27063eb4 DebugInfo: remove unused parameter from function. NFC.
I think it's a hold-over from some previous iteration, but it's never
set to true in LLVM as it exists now.

llvm-svn: 293086
2017-01-25 20:58:07 +00:00
Krzysztof Parzyszek ee9aa3ffee Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC
llvm-svn: 293077
2017-01-25 19:29:04 +00:00
Chad Rosier 4f724dce42 Revert "Do not verify dominator tree if it has no roots"
This reverts commit r293033, per Danny's comment.  In short, we require
domtrees to have roots at all times.

llvm-svn: 293075
2017-01-25 17:15:48 +00:00
Artur Pilipenko bc93452420 Fix buildbot failures introduced by 293036
Fix unused variable, specify types explicitly to make VC compiler happy.

llvm-svn: 293039
2017-01-25 09:10:07 +00:00
Artur Pilipenko 41c0005aa3 [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2.
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.

From the original commit:

Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: RKSimon, filcab, chandlerc 

Differential Revision: https://reviews.llvm.org/D27861

llvm-svn: 293036
2017-01-25 08:53:31 +00:00
Serge Pavlov 43a7759f4b Do not verify dominator tree if it has no roots
If dominator tree has no roots, the pass that calculates it is
likely to be skipped. It occures, for instance, in the case of
entities with linkage available_externally. Do not run tree
verification in such case.

Differential Revision: https://reviews.llvm.org/D28767

llvm-svn: 293033
2017-01-25 07:58:10 +00:00
Matt Arsenault 732a531506 DAG: Recognize no-signed-zeros-fp-math attribute
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

llvm-svn: 293024
2017-01-25 06:08:42 +00:00
Justin Bogner 4844573eb1 GlobalISel: Fix typo in error message
llvm-svn: 293023
2017-01-25 06:02:10 +00:00
Matt Arsenault 8a27aee6ae DAGCombiner: Allow negating ConstantFP after legalize
llvm-svn: 293019
2017-01-25 04:54:34 +00:00
Ahmed Bougacha eb185e1f64 Try to prevent build breakage by touching a CMakeLists.txt.
Looks like our cmake goop for handling .inc->td dependencies doesn't
track the .td files.

This manifests as cmake complaining about missing files since r293009.

Force a rerun to avoid that.

llvm-svn: 293012
2017-01-25 02:55:24 +00:00
Justin Bogner a029531e10 GlobalISel: Use the correct types when translating landingpad instructions
There was a bug here where we were using p0 instead of s32 for the
selector type in the landingpad. Instead of hardcoding these types we
should get the types from the landingpad instruction directly.

Note that we replicate an assert from SDAG here to only support
two-valued landingpads.

llvm-svn: 292995
2017-01-25 00:16:53 +00:00
Wei Mi f1cf0278e8 Revert rL292621. Caused some internal build bot failures in apple.
llvm-svn: 292984
2017-01-24 22:15:06 +00:00
Geoff Berry 92a286ae5a [SelectionDAG] Handle inverted conditions when splitting into multiple branches.
Summary:
When conditional branches with complex conditions are split into
multiple branches in SelectionDAGBuilder::FindMergedConditions, also
handle inverted conditions.  These may sometimes appear without having
been optimized by InstCombine when CodeGenPrepare decides to sink and
duplicate cmp instructions, causing them to have only one use.  This
problem can be increased by e.g. GVNHoist hiding more cmps from
InstCombine by combining equivalent cmps from different blocks.

For example codegen X & !(Y | Z) as:
    jmp_if_X TmpBB
    jmp FBB
  TmpBB:
    jmp_if_notY Tmp2BB
    jmp FBB
  Tmp2BB:
    jmp_if_notZ TBB
    jmp FBB

Reviewers: bogner, MatzeB, qcolombet

Subscribers: llvm-commits, hiraditya, mcrosier, sebpop

Differential Revision: https://reviews.llvm.org/D28380

llvm-svn: 292944
2017-01-24 16:36:07 +00:00
Craig Topper ff272ad4f3 [SelectionDAG] Teach getNode to simplify a couple easy cases of EXTRACT_SUBVECTOR
Summary:
This teaches getNode to simplify extracting from Undef. This is similar to what is done for EXTRACT_VECTOR_ELT. It also adds support for extracting from CONCAT_VECTOR when we can reuse one of the inputs to the concat. These seem like simple non-target specific optimizations.

For X86 we currently handle undef in extractSubvector, but not all EXTRACT_SUBVECTOR creations go through there.

Ultimately, my motivation here is to simplify extractSubvector and remove custom lowering for EXTRACT_SUBVECTOR since we don't do anything but handle undef and BUILD_VECTOR optimizations, but those should be DAG combines.

Reviewers: RKSimon, delena

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29000

llvm-svn: 292876
2017-01-24 02:36:59 +00:00
Matthias Braun b901d33461 LiveIntervalAnalysis: Calculate liveness even if a superreg is reserved.
A register unit may be allocatable and non-reserved but some of the
register(tuples) built with it are reserved. We still need to calculate
liveness in this case.

Note to out of tree targets: If you start seeing machine verifier errors
with this commit, it probably means that you do not properly mark super
registers of reserved register as reserved. See for example r292836 or
r292870 for example on how to fix that.

rdar://29996737

Differential Revision: https://reviews.llvm.org/D28881

llvm-svn: 292871
2017-01-24 01:12:58 +00:00
David L. Jones d21529fa0d [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).

Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.

The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)

There are additional changes required in clang.

Reviewers: rsmith

Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28476

llvm-svn: 292848
2017-01-23 23:16:46 +00:00
Matt Arsenault 4e305c6c1e DAG: Don't fold vector extract into load if target doesn't want to
Fixes turning a 32-bit scalar load into an extending vector load
for AMDGPU when dynamically indexing a vector.

llvm-svn: 292842
2017-01-23 22:48:53 +00:00
Ahmed Bougacha b6137063eb [AArch64][GlobalISel] Legalize narrow scalar fp->int conversions.
Since we're now avoiding operations using narrow scalar integer types,
we have to legalize the integer side of the FP conversions.

This requires teaching the legalizer how to do that.

llvm-svn: 292828
2017-01-23 21:10:14 +00:00
Matt Arsenault 7030661364 DAG: Allow legalization of fcanonicalize vector types
llvm-svn: 292814
2017-01-23 18:52:26 +00:00
Matthias Braun 28eae8f4e0 LiveRegUnits: Add accumulateBackward() function
Re-Commit r292543 with a fix for the situation when the chain end is
MBB.end().

This function can be used to accumulate the set of all read and modified
register in a sequence of instructions.

Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove
the concept.

- The AArch64A57LoadBalancing code is using a backwards analysis now
  which is irrespective of kill flags. This is the main motivation for
  this change.

Differential Revision: http://reviews.llvm.org/D22082

llvm-svn: 292705
2017-01-21 02:21:04 +00:00
Tim Northover 7f3ad2e07b GlobalISel: prevent heap use-after-free when looking up VReg.
Translating the constant can create more VRegs, which can invalidate the
reference into the DenseMap. So we have to look up the value again after all
that's happened.

llvm-svn: 292675
2017-01-20 23:25:17 +00:00
Petar Jovanovic dbb39356b4 [mips] Fix debug information for __thread variable
This patch fixes debug information for __thread variable on Mips
using .dtprelword and .dtpreldword directives.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D28770

llvm-svn: 292624
2017-01-20 17:53:30 +00:00
Wei Mi d5f7eebd42 [RegisterCoalescing] Recommit the patch "Remove partial redundent copy".
The recommit fixes a bug related with live interval update after the partial
redundent copy is moved.

The original patch is to solve the performance problem described in PR27827.
Register coalescing sometimes cannot remove a copy because of interference.
But if we can find a reverse copy in one of the predecessor block of the copy,
the copy is partially redundent and we may remove the copy partially by moving
it to the predecessor block without the reverse copy.

Differential Revision: https://reviews.llvm.org/D28585

llvm-svn: 292621
2017-01-20 17:38:54 +00:00
Matthias Braun d9217c0b86 Revert "LiveRegUnits: Add accumulateBackward() function"
This seems to be breaking some bots.

This reverts commit r292543.

llvm-svn: 292574
2017-01-20 03:58:42 +00:00
Ahmed Bougacha d294823930 [AArch64][GlobalISel] Widen scalar int->fp conversions.
It's incorrect to ignore the higher bits of the integer source.
Teach the legalizer how to widen it.

llvm-svn: 292563
2017-01-20 01:37:24 +00:00
Stanislav Mekhanoshin 6ec3e3a728 [AMDGPU] Prevent spills before exec mask is restored
Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip phis and label, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.

Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.

Differential Revision: https://reviews.llvm.org/D27997

llvm-svn: 292554
2017-01-20 00:44:31 +00:00
Justin Bogner e094cc4372 GlobalISel: Add a note about how we're being a bit loose with memory operands
The logic in r292461 is conservatively correct, but we should revisit
this later. Add a TODO so we don't forget.

llvm-svn: 292553
2017-01-20 00:30:17 +00:00
Ahmed Bougacha bf480554df [MIRParser] Allow generic register specification on operand.
This completes r292321 by adding support for generic registers, e.g.:

  %2:_(s32) = G_ADD %0, %1

llvm-svn: 292550
2017-01-20 00:29:59 +00:00
Justin Bogner 680663931c GlobalISel: Only set FailedISel on dropped dbg intrinsics when using fallback
It's easier to test the non-fallback path if we just drop these
intrinsics for now, like we did before we added the fallback path.
We'll obviously need to fix this properly, but the fixme for that is
already here.

llvm-svn: 292547
2017-01-20 00:24:30 +00:00
Justin Bogner 23acaf07de GlobalISel: Pass the MachineFunction in to reportSelectionError directly
Rather than trying to find MF based on the possibly-null MI we've
passed in here, just pass it in directly. It's already available at
all callers anyway.

llvm-svn: 292544
2017-01-20 00:16:19 +00:00
Matthias Braun 3ffeb68869 LiveRegUnits: Add accumulateBackward() function
This function can be used to accumulate the set of all read and modified
register in a sequence of instructions.

Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove
the concept.

- The AArch64A57LoadBalancing code is using a backwards analysis now
  which is irrespective of kill flags. This is the main motivation for
  this change.

Differential Revision: http://reviews.llvm.org/D22082

llvm-svn: 292543
2017-01-20 00:16:17 +00:00
Matthias Braun 710a4c1f3d CodeGen: Add/Factor out LiveRegUnits class; NFCI
This is a set of register units intended to track register liveness, it
is similar in spirit to LivePhysRegs.
You can also think of this as the liveness tracking parts of the
RegisterScavenger factored out into an own class.

This was proposed in http://llvm.org/PR27609

Differential Revision: http://reviews.llvm.org/D21916

llvm-svn: 292542
2017-01-20 00:16:14 +00:00
Tim Northover 3babfef11d AArch64: fall back to DAG ISel for inline assembly.
We can't currently handle "calls" to inlineasm strings so it's better to let
the DAG handle it than generate rubbish.

llvm-svn: 292540
2017-01-19 23:59:35 +00:00
Simon Pilgrim fb32eea1b4 [SelectionDAG] Improve knownbits handling of UMIN/UMAX (PR31293)
This patch improves the knownbits logic for unsigned integer min/max opcodes.

For UMIN we know that the result will have the maximum of the inputs' known leading zero bits in the result, similarly for UMAX the maximum of the inputs' leading one bits.

This is particularly useful for simplifying clamping patterns,. e.g. as SSE doesn't have a uitofp instruction we want to use sitofp instead where possible and for that we need to confirm that the top bit is not set.

Differential Revision: https://reviews.llvm.org/D28853

llvm-svn: 292528
2017-01-19 22:41:22 +00:00
Mikael Holmen 2074e7497b [DAG] Don't increase SDNodeOrder for dbg.value/declare.
Summary:
The SDNodeOrder is saved in the IROrder field in the SDNode, and this
field may affects scheduling. Thus, letting dbg.value/declare increase
the order numbers may in turn affect scheduling.

Because of this change we also need to update the code deciding when
dbg values should be output, in ScheduleDAGSDNodes.cpp/ProcessSDDbgValues.

Dbg values now have the same order as the SDNode they are connected to,
not the following orders.

Test cases provided by Florian Hahn.

Reviewers: bogner, aprantl, sunfish, atrick

Reviewed By: atrick

Subscribers: fhahn, probinson, andreadb, llvm-commits, MatzeB

Differential Revision: https://reviews.llvm.org/D25318

llvm-svn: 292485
2017-01-19 13:55:55 +00:00
Daniel Sanders d64d5024a4 Re-commit: [globalisel] Tablegen-erate current Register Bank Information
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.

Changes since first commit attempt:
* Added missing guards
* Added more missing guards
* Found and fixed a use-after-free bug involving Twine locals

Reviewers: t.p.northover, ab, rovka, qcolombet

Reviewed By: qcolombet

Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D27338

llvm-svn: 292478
2017-01-19 11:15:55 +00:00
Justin Bogner ddb80aee7e GlobalISel: Implement widening for shifts
llvm-svn: 292476
2017-01-19 07:51:17 +00:00
Justin Bogner d09c3ce6c0 GlobalISel: Implement narrowing for G_LOAD
llvm-svn: 292461
2017-01-19 01:05:48 +00:00
Justin Bogner 1f5c505437 GlobalISel: Fix text wrapping in a comment. NFC
llvm-svn: 292460
2017-01-19 01:04:46 +00:00
Dehao Chen 1ce8d6ca59 Add -debug-info-for-profiling to emit more debug info for sample pgo profile collection
Summary:
SamplePGO binaries built with -gmlt to collect profile. The current -gmlt debug info is limited, and we need some additional info:

* start line of all subprograms
* linkage name of all subprograms
* standalone subprograms (functions that has neither inlined nor been inlined)

This patch adds these information to the -gmlt binary. The impact on speccpu2006 binary size (size increase comparing with -g0 binary, also includes data for -g binary, which does not change with this patch):

               -gmlt(orig) -gmlt(patched) -g
433.milc       4.68%       5.40%          19.73%
444.namd       8.45%       8.93%          45.99%
447.dealII     97.43%      115.21%        374.89%
450.soplex     27.75%      31.88%         126.04%
453.povray     21.81%      26.16%         92.03%
470.lbm        0.60%       0.67%          1.96%
482.sphinx3    5.77%       6.47%          26.17%
400.perlbench  17.81%      19.43%         73.08%
401.bzip2      3.73%       3.92%          12.18%
403.gcc        31.75%      34.48%         122.75%
429.mcf        0.78%       0.88%          3.89%
445.gobmk      6.08%       7.92%          42.27%
456.hmmer      10.36%      11.25%         35.23%
458.sjeng      5.08%       5.42%          14.36%
462.libquantum 1.71%       1.96%          6.36%
464.h264ref    15.61%      16.56%         43.92%
471.omnetpp    11.93%      15.84%         60.09%
473.astar      3.11%       3.69%          14.18%
483.xalancbmk  56.29%      81.63%         353.22%
geomean        15.60%      18.30%         57.81%

Debug info size change for -gmlt binary with this patch:

433.milc       13.46%
444.namd       5.35%
447.dealII     18.21%
450.soplex     14.68%
453.povray     19.65%
470.lbm        6.03%
482.sphinx3    11.21%
400.perlbench  8.91%
401.bzip2      4.41%
403.gcc        8.56%
429.mcf        8.24%
445.gobmk      29.47%
456.hmmer      8.19%
458.sjeng      6.05%
462.libquantum 11.23%
464.h264ref    5.93%
471.omnetpp    31.89%
473.astar      16.20%
483.xalancbmk  44.62%
geomean        16.83%

Reviewers: davidxl, echristo, dblaikie

Reviewed By: echristo, dblaikie

Subscribers: aprantl, probinson, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25434

llvm-svn: 292457
2017-01-19 00:44:11 +00:00
Matthias Braun 9f21a8d787 LiveIntervalAnalysis: Cleanup; NFC
- Fix doxygen comments: Do not repeat name, remove duplicated doxygen
  comment (on declaration + implementation), etc.
- Use more range based for

llvm-svn: 292455
2017-01-19 00:32:13 +00:00
Krzysztof Parzyszek de44c9d857 Treat segment [B, E) as not overlapping block with boundaries [A, B)
llvm-svn: 292446
2017-01-18 23:12:19 +00:00
Haicheng Wu 8ce2d14356 [CodeGenPrepare] Fix a typo in the comment. NFC.
encode => endcode.

Differential Revision: https://reviews.llvm.org/D28866

llvm-svn: 292438
2017-01-18 21:12:10 +00:00
Justin Bogner fde0104649 GlobalISel: Implement narrowing for G_STORE
Legalize stores of types that are too wide by breaking them up into
sequences of smaller stores.

llvm-svn: 292412
2017-01-18 17:29:54 +00:00
Teresa Johnson 2d384ac381 Don't create a comdat group for a dropped def with initializer
Non-prevailing weak/linkonce odr symbols will be dropped by ThinLTO to
available_externally when possible. If they had an initializer in the
global_ctors list, a comdat group was being created. This code
already had logic to skip available_externally defs, but now the
EliminateAvailableExternally pass will drop these symbols to
declarations earlier. Change the check to skip all declarations for
linker (which includes available_externally along with declarations).

Reviewers: mehdi_amini

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28737

llvm-svn: 292408
2017-01-18 16:58:43 +00:00
Florian Hahn 8485cecd3f [thumb,framelowering] Reset NoVRegs in Thumb1FrameLowering::emitPrologue.
Summary:
In this function, virtual registers can be introduced (for example
through calls to emitThumbRegPlusImmInReg). doScavengeFrameVirtualRegs
will replace those virtual registers with concrete registers later on
in PrologEpilogInserter, which sets NoVRegs again.

This patch fixes the Codegen/Thumb/segmented-stacks.ll test case which
failed with expensive checks.
https://llvm.org/bugs/show_bug.cgi?id=27484


Reviewers: rnk, bkramer, olista01

Reviewed By: olista01

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D28829

llvm-svn: 292372
2017-01-18 15:01:22 +00:00
Daniel Sanders af76f989b5 Re-revert: [globalisel] Tablegen-erate current Register Bank Information
More missing guards. My build didn't notice it due to a stale file left over
from a Global ISel build.

llvm-svn: 292369
2017-01-18 14:26:12 +00:00
Daniel Sanders 517b61cb69 Re-commit: [globalisel] Tablegen-erate current Register Bank Information
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.

Changes since last commit:
The new tablegen pass is now correctly guarded by LLVM_BUILD_GLOBAL_ISEL and
this should fix the buildbots however it may not be the whole fix. The previous
buildbot failures suggest there may be a memory bug lurking that I'm unable to
reproduce (including when using asan) or spot in the source. If they re-occur
on this commit then I'll need assistance from the bot owners to track it down.

Reviewers: t.p.northover, ab, rovka, qcolombet

Reviewed By: qcolombet

Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D27338

llvm-svn: 292367
2017-01-18 14:17:50 +00:00
Matt Arsenault f411071d63 DAG: Consider nnan in isKnownNeverNaN
llvm-svn: 292328
2017-01-18 02:10:08 +00:00
Wei Mi ce9d04ce58 Revert rL292292 since it causes a SEGV on sanitizer-x86_64-linux-fuzzer build bot.
llvm-svn: 292327
2017-01-18 01:53:53 +00:00
Matthias Braun de5fea2c30 MIRParser: Allow regclass specification on operand
You can now define the register class of a virtual register on the
operand itself avoiding the need to use a "registers:" block.

Example: "%0:gr64 = COPY %rax"

Differential Revision: https://reviews.llvm.org/D22398

llvm-svn: 292321
2017-01-18 00:59:19 +00:00
Wei Mi 8f4178a59e [RegisterCoalescing] Remove partial redundent copy.
The patch is to solve the performance problem described in PR27827.
Register coalescing sometimes cannot remove a copy because of interference.
But if we can find a reverse copy in one of the predecessor block of the copy,
the copy is partially redundent and we may remove the copy partially by moving
it to the predecessor block without the reverse copy.

Differential Revision: https://reviews.llvm.org/D28585

llvm-svn: 292292
2017-01-17 23:39:07 +00:00
Tim Northover d943354216 GlobalISel: correctly handle varargs
Some platforms (notably iOS) use a different calling convention for unnamed vs
named parameters in varargs functions, so we need to keep track of this
information when translating calls.

Since not many platforms are involved, the guts of the special handling is in
the ValueHandler class (with a generic implementation that should work for most
targets).

llvm-svn: 292283
2017-01-17 22:30:10 +00:00
Tim Northover b6636fd392 [GlobalISel] track predecessor mapping during switch lowering.
Correctly populating Machine PHIs relies on knowing exactly how the IR level
CFG was lowered to MachineIR. This needs to be tracked by any translation
phases that meddle (currently only SwitchInst handling).

This reapplies r291973 which was reverted because of testing failures. Fixes:

 + Don't return an ArrayRef to a local temporary.
 + Incorporate Kristof's suggested comment improvements.

llvm-svn: 292278
2017-01-17 22:13:50 +00:00
Ahmed Bougacha 9e5a085cf1 Revert "[TLI] Robustize SDAG proto checking by merging it into TLI."
This reverts commit r292189, as it causes issues on SystemZ bots.

llvm-svn: 292191
2017-01-17 03:31:00 +00:00
Ahmed Bougacha c018efd680 [TLI] Robustize SDAG proto checking by merging it into TLI.
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.

Use TLI instead, removing another heap of redundant code.

This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to two tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.

I don't think it's worth supporting either of these (IMO) broken
testcases.  Instead, fix them to be more correct.

llvm-svn: 292189
2017-01-17 03:10:06 +00:00
Daniel Sanders a83a1a69c5 Revert r292132: [globalisel] Tablegen-erate current Register Bank Information'...
Several buildbots encountered a crash in tablegen when building this commit.
Reverting while I investigate the cause.

llvm-svn: 292136
2017-01-16 15:34:43 +00:00
Daniel Sanders ab8194def0 [globalisel] Tablegen-erate current Register Bank Information
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.

Reviewers: t.p.northover, ab, rovka, qcolombet

Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D27338

llvm-svn: 292132
2017-01-16 15:20:43 +00:00
Simon Pilgrim 3e91519a1c [SelectionDAG] Add knownbits support for BITREVERSE
llvm-svn: 292130
2017-01-16 14:49:26 +00:00
Simon Pilgrim db73dbcc7c [SelectionDAG] Add support for BITREVERSE constant folding
We were relying on constant folding of the legalized instructions to do what constant folding we had previously

llvm-svn: 292114
2017-01-16 13:39:00 +00:00
Serge Pavlov ed5eb93384 Reverted: Track validity of pass results
Commits r291882 and related r291887.

llvm-svn: 292062
2017-01-15 10:23:18 +00:00
Daniel Jasper bf56ad36cb Revert "[GlobalISel] track predecessor mapping during switch lowering."
This reverts commit r291973.

The test fails in a Release build with LLVM_BUILD_GLOBAL_ISEL enabled.
AFAICT, llc segfaults. I'll add a few more details to the original
commit.

llvm-svn: 292061
2017-01-15 09:41:49 +00:00
Justin Bogner 1a314dac4b GlobalISel: Abort in ResetMachineFunctionPass if fallback isn't enabled
When GlobalISel is configured to abort rather than fallback the only
thing that resetting the machine function does is make things harder
to debug. If we ever get to this point in the abort configuration it
indicates that we've already hit a bug, so this changes the behaviour
to abort instead.

llvm-svn: 291977
2017-01-13 23:46:11 +00:00
Tim Northover 2b57987827 [GlobalISel] track predecessor mapping during switch lowering.
Correctly populating Machine PHIs relies on knowing exactly how the IR level
CFG was lowered to MachineIR. This needs to be tracked by any translation
phases that meddle (currently only SwitchInst handling).

llvm-svn: 291973
2017-01-13 23:11:37 +00:00
David Majnemer e0ebdf4bc7 [CodeGen] Simplify getRecipEstimateForFunc
It used two attribute lookups when only one was needed.

llvm-svn: 291965
2017-01-13 22:24:25 +00:00
James Y Knight 99a2ce2af2 Check for register clobbers when merging a vreg live range with a
reserved physreg in RegisterCoalescer.

Previously, we only checked for clobbers when merging into a READ of
the physreg, but not when merging from a WRITE to the physreg.

Differential Revision: https://reviews.llvm.org/D28527

llvm-svn: 291942
2017-01-13 19:08:36 +00:00
Malcolm Parsons 17d266bc96 Remove unused lambda captures. NFC
llvm-svn: 291916
2017-01-13 17:12:16 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Diana Picus 116bbab4e4 [CodeGen] Rename MachineInstrBuilder::addOperand. NFC
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.

See https://reviews.llvm.org/D28057 for the whole discussion.

Differential Revision: https://reviews.llvm.org/D28556

llvm-svn: 291891
2017-01-13 09:58:52 +00:00
Serge Pavlov d409411ef1 Track validity of pass results
Running tests with expensive checks enabled exhibits some problems with
verification of pass results.

First, the pass verification may require results of analysis that are not
available. For instance, verification of loop info requires results of dominator
tree analysis. A pass may be marked as conserving loop info but does not need to
be dependent on DominatorTreePass. When a pass manager tries to verify that loop
info is valid, it needs dominator tree, but corresponding analysis may be
already destroyed as no user of it remained.

Another case is a pass that is skipped. For instance, entities with linkage
available_externally do not need code generation and such passes are skipped for
them. In this case result verification must also be skipped.

To solve these problems this change introduces a special flag to the Pass
structure to mark passes that have valid results. If this flag is reset,
verifications dependent on the pass result are skipped.

Differential Revision: https://reviews.llvm.org/D27190

llvm-svn: 291882
2017-01-13 06:09:54 +00:00
Daniel Sanders b7391dd3b4 [globalisel] Move as much RegisterBank initialization to the constructor as possible
Summary:
The register bank is now entirely initialized in the constructor. However,
we still have the hardcoded number of register classes which will be
dealt with in the TableGen patch (D27338) since we do not have access
to this information to resolve this at this stage. The number of register
classes is known to the TRI and to TableGen but the RegisterBank
constructor is too early for the former and too late for the latter.
This will be fixed when the data is tablegen-erated.

Reviewers: t.p.northover, ab, rovka, qcolombet

Subscribers: aditya_nandakumar, kristof.beyls, vkalintiris, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D27809

llvm-svn: 291770
2017-01-12 16:11:23 +00:00
Daniel Sanders ae03595bfb [globalisel] Initialize RegisterBanks with static data.
Summary:
Refactor the RegisterBank initialization to use static data. This requires
GlobalISel implementations to rewrite calls to createRegisterBank() and
addRegBankCoverage() into a call to setRegBankData().

Out of tree targets can use diff 4 of D27807
(https://reviews.llvm.org/D27807?id=84117) to have addRegBankCoverage() dump
the register classes and other data that needs to be provided to
setRegBankData(). This is the method that was used to generate the static data
in this patch.

Tablegen-eration of this static data will follow after some refactoring.

Reviewers: t.p.northover, ab, rovka, qcolombet

Subscribers: aditya_nandakumar, kristof.beyls, vkalintiris, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D27807
Differential Revision: https://reviews.llvm.org/D27808

llvm-svn: 291768
2017-01-12 15:32:10 +00:00
Zachary Turner 629cb7d8cc [CodeView] Finish decoupling TypeDatabase from TypeDumper.
Previously the type dumper itself was passed around to a lot of different
places and manipulated in ways that were more appropriate on the type
database. For example, the entire TypeDumper was passed into the symbol
dumper, when all the symbol dumper wanted to do was lookup the name of a
TypeIndex so it could print it. That's what the TypeDatabase is for --
mapping type indices to names.

Another example is how if the user runs llvm-pdbdump with the option to
dump symbols but not types, we still have to visit all types so that we
can print minimal information about the type of a symbol, but just without
dumping full symbol records. The way we did this before is by hacking it
up so that we run everything through the type dumper with a null printer,
so that the output goes to /dev/null. But really, we don't need to dump
anything, all we want to do is build the type database. Since
TypeDatabaseVisitor now exists independently of TypeDumper, we can do
this. We just build a custom visitor callback pipeline that includes a
database visitor but not a dumper.

All the hackery around printers etc goes away. After this patch, we could
probably even delete the entire CVTypeDumper class since really all it is
at this point is a thin wrapper that hides the details of how to build a
useful visitation pipeline. It's not a priority though, so CVTypeDumper
remains for now.

After this patch we will be able to easily plug in a different style of
type dumper by only implementing the proper visitation methods to dump
one-line output and then sticking it on the pipeline.

Differential Revision: https://reviews.llvm.org/D28524

llvm-svn: 291724
2017-01-11 23:24:22 +00:00
Kyle Butt efe56fed12 Revert "CodeGen: Allow small copyable blocks to "break" the CFG."
This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded.

This needs a simple probability check because there are some cases where it is
not profitable.

llvm-svn: 291695
2017-01-11 19:55:19 +00:00
Tim Northover 778d0816ab GlobalISel: only print debug info with -debug. NFC.
Turns out DEBUG(...) has uses even inside NDEBUG checks.

llvm-svn: 291685
2017-01-11 17:33:37 +00:00
Craig Topper 7af39837a9 Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent."
Some test appears to be hanging on the build bots.

llvm-svn: 291650
2017-01-11 04:59:25 +00:00
Craig Topper 577d258569 [DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent.
llvm-svn: 291645
2017-01-11 04:02:23 +00:00
Matt Arsenault e482403e1c DAGCombiner: Add hasOneUse checks to fadd/fma combine
Even with aggressive fusion enabled, this requires duplicating
the fmul, or increases an fadd to another fma which is not an
improvement.

llvm-svn: 291642
2017-01-11 02:02:12 +00:00
Eugene Zelenko c4ad1ce068 [Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291641
2017-01-11 01:45:03 +00:00
Quentin Colombet 0b63b31b2e [RegBankSelect] Improve the output of the debug messages.
Add more information about mapping cost and chosen solution.

llvm-svn: 291629
2017-01-11 00:48:41 +00:00
Zachary Turner a9054ddd9c [CodeView/PDB] Rename a bunch of files.
We were starting to get some name clashes between llvm-pdbdump
and the common CodeView framework, so I took this opportunity
to rename a bunch of files to more accurately describe their
usage.  This also helps in llvm-pdbdump to distinguish
between different files and whether they are used for pretty
dump mode or raw dump mode.

llvm-svn: 291627
2017-01-11 00:35:43 +00:00
Zachary Turner c640b76db5 [CodeView] Add TypeDatabase class.
This creates a centralized class in which to store type records.
It stores types as an array of entries, which matches the
notion of a type stream being a topologically sorted DAG.
Logic to build up such a database was already being used in
CVTypeDumper, so CVTypeDumper is now updated to to read from
a TypeDatabase which is filled out by an earlier visitor in
the pipeline.

Differential Revision: https://reviews.llvm.org/D28486

llvm-svn: 291626
2017-01-11 00:35:08 +00:00
Kyle Butt df27aa8c89 CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well.

Differential revision: https://reviews.llvm.org/D27742

llvm-svn: 291609
2017-01-10 23:04:30 +00:00
Matt Arsenault def496c04b Remove unused CONVERT_RNDSAT intrinsics
llvm-svn: 291607
2017-01-10 22:38:02 +00:00
Matt Arsenault 0b382a7cb8 DAG: Avoid OOB when legalizing vector indexing
If a vector index is out of bounds, the result is supposed to be
undefined but is not undefined behavior. Change the legalization
for indexing the vector on the stack so that an out of bounds
index does not create an out of bounds memory access.

llvm-svn: 291604
2017-01-10 22:02:30 +00:00
Victor Leschuk cbddae74f5 DebugInfo: support for DW_FORM_implicit_const
Support for DW_FORM_implicit_const DWARFv5 feature.
When this form is used attribute value goes to .debug_abbrev section (as SLEB).
As this form would break any debug tool which doesn't support DWARFv5
it is guarded by dwarf version check. Attempt to use this form with
dwarf version <= 4 is considered a fatal error.

Differential Revision: https://reviews.llvm.org/D28456

llvm-svn: 291599
2017-01-10 21:18:26 +00:00
Simon Dardis 548a53f5ee [mips] Fix Mips MSA instrinsics
The usage of some MIPS MSA instrinsics that took immediates could crash LLVM
during lowering. This patch addresses that behaviour. Crucially this patch
also makes the use of intrinsics with out of range immediates as producing an
internal error.

The ld,st instrinsics would trigger an assertion failure for MIPS64 as their
lowering would attempt to add an i32 offset to a i64 pointer.

Reviewers: vkalintiris, slthakur

Differential Revision: https://reviews.llvm.org/D25438

llvm-svn: 291571
2017-01-10 16:40:57 +00:00
Craig Topper 588c734b0f [DAGCombiner] Merge together duplicate checks for folding fold (select C, 1, X) -> (or C, X) and folding (select C, X, 0) -> (and C, X). Also be consistent about checking that both the condition and the result type are i1. NFC
I guess previously we just assumed if the result type was i1, then the condition type must also be i1?

llvm-svn: 291548
2017-01-10 07:42:57 +00:00
Craig Topper d915d6ba57 [DAGCombiner] Remove code for optimizing select (xor Cond, 0), X, Y -> select Cond, X, Y. Just let combine on the xor itself take care of it.
llvm-svn: 291534
2017-01-10 04:12:19 +00:00
Evandro Menezes 063259d4d1 [CodeGen] Implement the SUnit::print() method
This method seems to have had a troubled life.  This patch proposes that it
replaces the recently added helper function dumpSUIdentifier.  This way, the
method can be used in other files using the SUnit class.

Differential revision: https://reviews.llvm.org/D28488

llvm-svn: 291520
2017-01-10 01:08:01 +00:00
Matthias Braun ba7d95d425 PeepholeOptimizer: Do not replace SubregToReg(bitcast like)
While we can usually replace bitcast like instructions
(MachineInstr::isBitcast()) with a COPY this is not legal if any of the
users uses SUBREG_TO_REG to assert the upper bits of the result are
zero.

Differential Revision: https://reviews.llvm.org/D28474

llvm-svn: 291483
2017-01-09 21:38:17 +00:00
Matthias Braun a37430844c MachineInstr: Print name for subreg index in SUBREG_TO_REG
SUBREG_TO_REG takes a subregister index as 3rd operand, print the name
instead of a number. We already do the same for INSERT_SUBREG and
REG_SEQUENCE.

llvm-svn: 291481
2017-01-09 21:38:10 +00:00
Sumanth Gundapaneni 0ac1ce70ba In the below scenario, we must be able to skip the a DBG_VALUE instruction and
remove the dead store.

%vreg0<def> = L2_loadri_io <fi#15>, 0; mem:LD4[%dataF](align=4)
DBG_VALUE %vreg0, %noreg, !"dataF", <!184>; IntRegs:%vreg0 
S2_storeri_io <fi#15>, 0, %vreg0; mem:ST4[%dataF]

In reality, this kind of stores are eliminated before Stack Slot Coloring pass,
possibly in instruction lowering

Differential Revision: https://reviews.llvm.org/D26616

llvm-svn: 291455
2017-01-09 17:45:02 +00:00
Bjorn Pettersson b14afd452d [SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486.
Summary:
Originally

 i64 = umax t8, Constant:i64<4>

was expanded into

 i32,i32 = umax Constant:i32<0>, Constant:i32<0>
 i32,i32 = umax t7, Constant:i32<4>

Now instead the two produced umax:es return i32 instead of i32, i32.

Thanks to Jan Vesely for help with the test case.

Patch by mikael.holmen at ericsson.com

Reviewers: bogner, jvesely, tstellarAMD, arsenm

Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D28135

llvm-svn: 291441
2017-01-09 12:03:50 +00:00
Daniel Sanders 12360efa76 [globalisel] Stop requiring -debug/-debug-only=registerbankinfo for assertions.
Summary:
I've noticed that these assertions don't trigger when the condition is false.
The problem is that the DEBUG(x) macro only executes x when the pass is
emitting debug output via the -debug and -debug-only=registerbankinfo command
line arguments.

Debug builds should always execute the assertions so use '#ifndef NDEBUG' instead.

Also removed an assertion that is only true the first time it's tested. <Target>RegisterBankInfo's constructor will re-use register banks causing them to be valid on subsequent tests. That
assertion will fail on the first test too in the near future.

Reviewers: t.p.northover, ab, rovka, qcolombet

Subscribers: dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D28358

llvm-svn: 291235
2017-01-06 14:29:34 +00:00
David Majnemer 9e04befb09 [SelectionDAG] Rework lowerRangeToAssertZExt
Utilize ConstantRange to make it easier to interpret range metadata.

llvm-svn: 291211
2017-01-06 02:43:28 +00:00
David Majnemer eaba06cffa [SelectionDAG] Correctly transform range metadata to AssertZExt
We used the logBase2 of the high instead of the ceilLogBase2 resulting
in the wrong result for certain values.  For example, it resulted in an
i1 AssertZExt when the exclusive portion of the range was 3.

llvm-svn: 291196
2017-01-06 00:11:46 +00:00
Joerg Sonnenberger 83963995c6 PR 31534: When emitting both DWARF unwind tables and debug information,
do not use .cfi_sections. This requires checking if any non-declaration
function in the module needs an unwind table.

llvm-svn: 291172
2017-01-05 20:55:28 +00:00
Matthias Braun 1172332203 CodeGen: Assert that liveness is up to date when reading block live-ins.
Add an assert that checks whether liveins are up to date before they are
used.

- Do not print liveins into .mir files anymore in situations where they
  are out of date anyway.
- The assert in the RegisterScavenger is superseded by the new one in
  livein_begin().
- Skip parts of the liveness updating logic in IfConversion.cpp when
  liveness isn't tracked anymore (just enough to avoid hitting the new
  assert()).

Differential Revision: https://reviews.llvm.org/D27562

llvm-svn: 291169
2017-01-05 20:01:19 +00:00
Kristof Beyls a983e7c4a4 [GlobalISel] Add support for address-taken basic blocks
To make this work, pointers from the MachineBasicBlock to the LLVM-IR-level
basic blocks need to be initialized, as the AsmPrinter uses this link to be
able to print out labels for the basic blocks that are address-taken.

Most of the changes in this commit are about adapting existing tests to include
the basic block name that is now printed out in the MIR format, now that the
name becomes available as the link to the LLVM-IR basic block is initialized.
The relevant test change for the functionality added in this patch are the
added "(address-taken)" strings in
test/CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D28123

llvm-svn: 291105
2017-01-05 13:27:52 +00:00
Kristof Beyls eced071e88 [GlobalISel] Add support for switch statements
This commit does this using a trivial chain of conditional branches.  In the
future, we probably want to reuse the optimized switch lowering used in
SelectionDAG.

Differential Revision: https://reviews.llvm.org/D28176

llvm-svn: 291099
2017-01-05 11:28:51 +00:00
Saleem Abdulrasool 6252bd8eac MC: support passing search paths to the IAS
This is needed to support inclusion in inline assembly via the
`.include` directive.

llvm-svn: 291085
2017-01-05 05:56:39 +00:00