Commit Graph

28070 Commits

Author SHA1 Message Date
Fangrui Song c662795b07 [AsmPrinter][ELF] Emit local alias for ExternalLinkage dso_local GlobalAlias 2020-02-12 17:08:22 -08:00
Guozhi Wei 369d086d78 [MBP] Partial tail duplication into hot predecessors
Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors.

This patch improves tail duplication in 3 aspects:

  A successor can be duplicated into part of its predecessors.
  A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only.
  If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors.

Differential Revision: https://reviews.llvm.org/D73387
2020-02-12 15:22:33 -08:00
Jay Foad 32aac25637 [KnownBits] Introduce anyext instead of passing a flag into zext
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.

I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.

NFC.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74482
2020-02-12 19:06:53 +00:00
Simon Pilgrim 9eb426c88c [TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.

This patch replaces the char return value with an NegatibleCost enum to more clearly demonstrate what is implied.

It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect whats going on.

Differential Revision: https://reviews.llvm.org/D74221
2020-02-12 11:51:42 +00:00
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Clement Courbet 15488ff24b [CodeGen] Fix the computation of the alignment of split stores.
Summary:
Right now the alignment of the lower half of a store is computed as
align/2, which fails for unaligned stores (align = 1), and is overly
pessimitic for, e.g. a 8 byte store aligned to 4 bytes.
Fixes PR44851
Fixes PR44877

Reviewers: gchatelet, spatel, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74311
2020-02-12 10:37:30 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Nicolai Hähnle 07a5b849f7 SelectionDAG: Fix bug in ClusterNeighboringLoads
Summary:
The method attempts to find loads that can be legally clustered by
looking for loads consuming the same chain glue token.

However, the old code looks at _all_ users of values produced by the
chain node -- including uses of the loaded/returned value of volatile
loads or atomics. This could lead to circular dependencies which then
failed during scheduling.

With this change, we filter out users by getResNo, i.e. by which
SDValue value they use, to ensure that we only look at users of the
chain glue token.

This appears to be a rather old bug, which is perhaps surprising.
However, the test case is actually quite fragile (i.e., it is hidden
by fairly small changes), and the test _must_ use volatile loads for
the bug to manifest.

Reviewers: arsenm, bogner, craig.topper, foad

Subscribers: MatzeB, jvesely, wdng, hiraditya, javed.absar, jfb, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74253
2020-02-12 09:12:55 +01:00
Craig Topper 0daf9b8e41 [X86][LegalizeTypes] Add SoftPromoteHalf support STRICT_FP_EXTEND and STRICT_FP_ROUND
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.

Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
2020-02-11 22:30:04 -08:00
lewis-revill a6bd1256ce [DebugInfo] Call site entries cannot be generated for FrameSetup calls
Instructions marked as FrameSetup do not cause requestLabelAfterInsn to
be called and so no such label is generated. Call instructions which
require call site entries to be generated require this label to be
present in order to calculate the return PC offset/address, but the
check for whether the call instruction is marked as FrameSetup was not
present.

Therefore in the case where a call instruction is marked as FrameSetup,
an assertion failure occurs if a call site entry is to be generated.
This is the case with RISC-V's implementation of save/restore via
library calls.

Differential Revision: https://reviews.llvm.org/D71593
2020-02-11 21:23:18 +00:00
OCHyams 35e0ab647b [DebugInfo][NFC] Fixup the UserValue methods to use FragmentInfo
Fixup the UserValue methods to use FragmentInfo instead of DIExpression because
the DIExpression is only ever used to get the to get the FragmentInfo. The
DIExpression is meaningless in the UserValue class because each definition point
added to a UserValue may have a unique DIExpression.

Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74057
2020-02-11 10:20:24 +00:00
OCHyams 3aa33fde03 [DebugInfo][NFC] Rename the class DbgValueLocation to DbgVariableValue
Rename the class DbgValueLocation to DbgVariableValue and instances from Loc to
DbgValue. These names better express the new semantics introduced in D74053.

The class previously represented a { Location } only. It now represents a
{ Location, DIExpression } pair which together describe a value.

Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74055
2020-02-11 10:20:24 +00:00
OCHyams 1e40799324 [DebugInfo] Teach LDV how to handle identical variable fragments
LiveDebugVariables uses interval maps to explicitly represent DBG_VALUE
intervals. DBG_VALUEs are filtered into an interval map based on their {
Variable, DIExpression }. The interval map will coalesce adjacent entries that
use the same { Location }.  Under this model, DBG_VALUEs which refer to the same
bits of the same variable will be filtered into different interval maps if they
have different DIExpressions which means the original intervals will not be
properly preserved.

This patch fixes the problem by using { Variable, Fragment } to filter the
DBG_VALUEs into maps, and coalesces adjacent entries iff they have the same
{ Location, DIExpression } pair.

The solution is not perfect because we see the similar issues appear when
partially overlapping fragments are encountered, but is far simpler than a
complete solution (i.e. D70121).

Fixes: pr41992, pr43957
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D74053
2020-02-11 10:20:24 +00:00
Amara Emerson 067dd9c6b1 [GlobalISel][CallLowering] Use stripPointerCasts().
A downstream test exposed a simple logic bug with the manual pointer
stripping code, fix that by just using stripPointerCasts() on the value.

I don't think there's a way to expose this issue upstream.
2020-02-10 15:43:57 -08:00
Matt Arsenault f270da6bfc RegisterCoalescer: Add LaneMask to debug printing 2020-02-10 12:34:33 -08:00
diggerlin aa86311e62 [AIX][XCOFF] Support Mergeable2ByteCString and Mergeable4ByteCString
SUMMARY:
The patch is enable to support Mergeable2ByteCString and Mergeable4ByteCString

Reviewers: daltenty
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D74164
2020-02-10 14:45:54 -05:00
Sebastian Neubauer 7cddd15e56 [SelectionDAG] Optimize build_vector of truncates and shifts
Add a simplification to fuse a manual vector extract with shifts and
truncate into a bitcast.

Unpacking and packing values into vectors is only optimized with
extractelement instructions, not when manually unpacked using shifts
and truncates.
This patch simplifies shifts and truncates into a bitcast if possible.

Simplify (build_vec (trunc $1)
                    (trunc (srl $1 width))
                    (trunc (srl $1 (2 * width))) ...)
to (bitcast $1)

Differential Revision: https://reviews.llvm.org/D73892
2020-02-10 15:04:07 +01:00
Djordje Todorovic 3a4dc577c9 [CSInfo] Fix the assertions regarding updating the CSInfo
The call site info was not updated correctly when deleting
corresponding call instructions.

Differential Revision: https://reviews.llvm.org/D73700
2020-02-10 10:55:06 +01:00
Djordje Todorovic 68908993eb [CSInfo] Use isCandidateForCallSiteEntry() when updating the CSInfo
Use the isCandidateForCallSiteEntry().
This should mostly be an NFC, but there are some parts ensuring
the moveCallSiteInfo() and copyCallSiteInfo() operate with call site
entry candidates (both Src and Dest should be the call site entry
candidates).

Differential Revision: https://reviews.llvm.org/D74122
2020-02-10 10:03:14 +01:00
Amara Emerson 21c9d9ad43 [GlobalISel][CallLowering] Tighten constantexpr check for callee.
I'm not sure there's a test case for this, but it's better to be safe.
2020-02-09 22:59:48 -08:00
Matt Arsenault 312a9d1b83 GlobalISel: Fix narrowScalar for G_{CTLZ|CTTZ}_ZERO_UNDEF
Narrow these for 64-bit VALU for AMDGPU.
2020-02-09 19:02:38 -05:00
Matt Arsenault 6135f5eda4 GlobalISel: Fix narrowing of G_CTLZ/G_CTTZ
The result type is separate from the source type.
2020-02-09 18:11:43 -05:00
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Craig Topper 2af1640f9a [LegalizeDAG][X86][AMDGPU] Use ANY_EXTEND instead of ZERO_EXTEND when promoting ISD::CTTZ/CTTZ_ZERO_UNDEF.
Summary:
For CTTZ we place a set bit just past where the non-promoted type
stopped so the extended bits won't be used for the count. For
CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in
the original type and we end up counting into the extended bits.
So we can just use ANY_EXTEND for both cases.

This matches what is done in type legalization for these operations.
We make no effort to force the upper bits to zero.

Differential Revision: https://reviews.llvm.org/D74111
2020-02-07 22:25:56 -08:00
Amara Emerson 35c63d66aa [GlobalISel][CallLowering] Look through bitcasts from constant function pointers.
Calls to ObjC's objc_msgSend function are done by bitcasting the function global
to the required function type signature. This patch looks through this bitcast
so that we can do a direct call with bl on arm64 instead of using an indirect blr.

Differential Revision: https://reviews.llvm.org/D74241
2020-02-07 15:32:54 -08:00
Vedant Kumar 0d0ef315cb [MachineInstr] Add isCandidateForCallSiteEntry predicate
Add the isCandidateForCallSiteEntry predicate to MachineInstr to
determine whether a DWARF call site entry should be created for an
instruction.

For now, it's enough to have any call instruction that doesn't belong to
a blacklisted set of opcodes. For these opcodes, a call site entry isn't
meaningful.

Differential Revision: https://reviews.llvm.org/D74159
2020-02-07 10:10:41 -08:00
Petar Avramovic 7df5fc9e03 [GlobalISel] Add buildMerge with SrcOp initializer list
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.

Differential Revision: https://reviews.llvm.org/D74223
2020-02-07 18:43:45 +01:00
Amara Emerson 28d22c2c9c [GlobalISel][IRTranslator] Add special case support for ~memory inline asm clobber.
This is a one off special case, since actually implementing full inline asm
support will be much more involved. This lets us compile a lot more code as a
common simple case.

Differential Revision: https://reviews.llvm.org/D74201
2020-02-07 08:55:23 -08:00
Jinsong Ji 01edae1271 [AsmPrinter] Print FP constant in hexadecimal form instead
Printing floating point number in decimal is inconvenient for humans.
Verbose asm output will print out floating point values in comments, it
helps.

But in lots of cases, users still need additional work to covert the
decimal back to hex or binary to check the bit patterns,
especially when there are small precision difference.

Hexadecimal form is one of the supported form in LLVM IR, and easier for
debugging.

This patch try to print all FP constant in hex form instead.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D73566
2020-02-07 16:00:55 +00:00
Matt Arsenault 3b198518ad GlobalISel: Fix narrowing of G_CTPOP
The result type is separate from the source type. Tests will be
included in a future AMDGPU patch which uses this from
RegBankSelect/applyMappingImpl.
2020-02-07 06:58:00 -08:00
Matt Arsenault 8de2dad9e0 GlobalISel: Fix lowering of G_CTLZ/G_CTTZ
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
2020-02-07 06:54:12 -08:00
Guillaume Chatelet f85d3408e6 [NFC] Introduce an API for MemOp
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73964
2020-02-07 11:32:27 +01:00
Sourabh Singh Tomar 84e5760a16 [DebugInfo]: Reorderd the emission of debug_str section.
Summary:
This patch reorders the emission of debug_str section, so that
string can come after macros.
This is necessary for macro forms like DW_MACRO_define_strp,
which emits macro as a string in debug_str section.
2020-02-07 11:15:55 +05:30
Amara Emerson ac8a12c874 [GlobalISel] Use G_ZEXTLOAD instead of an anyextending load for non-pow-2 legalization.
Fixes PR43288
2020-02-06 14:36:36 -08:00
Konstantin Schwarz 76986bdc46 [GlobalISel] Legalize more G_FP(EXT|TRUNC) libcalls.
This adds a new helper function for retrieving the
floating point type corresponding to the specified
bit-width.
2020-02-06 11:41:34 -08:00
Fangrui Song 727362e87b [MC][ELF] Rename MC related "Associated" to "LinkedToSym"
"linked-to section" is used by the ELF spec. By analogy, "linked-to
symbol" is a good name for the signature symbol.  The word "linked-to"
implies a directed edge and makes it clear its relation with "sh_link",
while one can argue that "associated" means an undirected edge.

Also, combine tests and add precise SMLoc to improve diagnostics.

Reviewed By: eugenis, grimar, jhenderson

Differential Revision: https://reviews.llvm.org/D74082
2020-02-06 11:31:04 -08:00
Jeremy Morse 6531a78ac4 Revert "[DebugInfo] Remove some users of DBG_VALUEs IsIndirect field"
This reverts commit ed29dbaafa.

I'm backing out D68945, which as the discussion for D73526 shows, doesn't
seem to handle the -O0 path through the codegen backend correctly. I'll
reland the patch when a fix is worked out, apologies for all the churn.
The two parent commits are part of this revert too.

Conflicts:
	llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
	llvm/test/DebugInfo/X86/dbg-addr-dse.ll

SelectionDAGBuilder conflict is due to a nearby change in e39e2b4a79
that's technically unrelated. dbg-addr-dse.ll conflicted because
41206b61e3 (legitimately) changes the order of two lines.

There are further modifications to dbg-value-func-arg.ll: it landed after
the patch being reverted, and I've converted indirection to be represented
by the isIndirect field rather than DW_OP_deref.
2020-02-06 14:41:40 +00:00
Jeremy Morse ece761427f Revert "[DebugInfo][DAG] Distinguish different kinds of location indirection"
This reverts commit 3137fe4d23.

I'm backing out D68945, which this patch is a follow up for. It'll be
re-landed when D68945 is fixed.

The changes to dbg-value-func-arg.ll occur because our handling of certain
kinds of location now mixes up indirection that happens at different points
in a DIExpression. While this is a regression, it's a return to the prior
behaviour while a better patch is sought.
2020-02-06 14:41:40 +00:00
Jeremy Morse ed5998d21e Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location"
This reverts commit 2d3174c4df.

The overall solution for this problem is reverting D68945, which wasn't
handling the -O0 path through the codegen backend correctly. See:
discussion in D73526.
2020-02-06 14:41:39 +00:00
Sjoerd Meijer 93b0536fd2 [RDA] getInstFromId: find instructions. NFC.
To find the instruction in the block for a given ID, first a count and then a
lookup was performed in the map, which is almost the same thing, thus doing
double the work.

Differential Revision: https://reviews.llvm.org/D73866
2020-02-06 14:13:31 +00:00
Sam Parker 0a8cae10fe [ReachingDefs] Make isSafeToMove more strict.
Test that we're not moving the instruction through instructions with
side-effects.

Differential Revision: https://reviews.llvm.org/D74058
2020-02-06 14:06:08 +00:00
Diogo Sampaio 8ba2b62810 [ARM] Fix non-determenistic behaviour
Summary:
ARM Type Promotion pass does not clear
the container that defines if one variable
was visited or not, missing optimization
opportunities by luck when two llvm:Values
from different functions  are allocated at
the same memory address.

Also fixes a comment and uses existing
method to pop and obtain last element
of the worklist.

Reviewers: samparker

Reviewed By: samparker

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73970
2020-02-06 09:21:13 +00:00
Matt Arsenault 7464e8d6ad GlobalISel: Remove check for illegal MIR
The verifier will catch this.
2020-02-05 18:37:17 -05:00
Jonas Paulsson 96ea377ea4 [PHIElimination] Compile time optimization for huge functions.
This is a compile-time optimization for PHIElimination (splitting of critical
edges), which was reported at https://bugs.llvm.org/show_bug.cgi?id=44249. As
discussed there, the way to remedy the slowdowns with huge functions is to
pre-compute the live-in registers for each MBB in an efficient way in
PHIElimination.cpp and then pass that information along to
LiveVariabless::addNewBlock().

In all the huge test programs where this slowdown has been noticable, it has
dissapeared entirely with this patch.

Review: Björn Pettersson, Quentin Colombet.

Differential Revision: https://reviews.llvm.org/D73152
2020-02-05 18:10:03 -05:00
Matt Arsenault 9087ef0765 GlobalISel: Allow CSE of G_IMPLICIT_DEF
The legalizer produces a lot of these, and they make reading legalized
MIR annoying. For some reason, this does seem to sometimes introduce
copies of implicit def, which is dumb.
2020-02-05 17:47:21 -05:00
Shu-Chun Weng ce9633633c [GlobalISel][AArch64] Fix contract cross-bank copies with SIMD instructions
contractCrossBankCopyIntoStore() finds the instruction defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
Current implementation hardcodes to operand 0 and has no way of knowing
which output should be used.

This change adds another function to directly return the register that
is the source of the register and use that for folding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=44783

Differential Revision: https://reviews.llvm.org/D74005
2020-02-05 10:38:35 -08:00
Simon Pilgrim 4592bb7195 visitINSERT_VECTOR_ELT - pull out repeated dyn_cast. NFCI.
This always gets called at least once.
2020-02-05 13:30:54 +00:00
Djordje Todorovic de90d73e03 [DebugInfo] Avoid the call site param for mem instrs with multiple defs
We currently only handle mem instructions with a single define.
Avoid the call site parameter debug info when we find the case with
multiple defs, rather than throwing an assert.

Differential Revision: https://reviews.llvm.org/D73954
2020-02-05 10:03:14 +01:00
Thomas Lively 649aba93a2 Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.

Depends on D73927.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73928
2020-02-04 20:04:59 -08:00
David Blaikie ec50e10db4 DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF
Originally committed in: 1ced28cbe7
            Reverted in: f75301d16d

(reverted due to tests failing on non-linux/x86 targets, tests have since been
generalized and specialized... since Split DWARF isn't supported on non-elf
targets anyway and we have no way to run on "whatever elf target is available"
so they fail on MacOS without an explicit target triple)

This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.

Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
2020-02-04 19:25:47 -08:00
Francis Visoiu Mistrih 7531a5039f [Remarks] Extend the RemarkStreamer to support other emitters
This extends the RemarkStreamer to allow for other emitters (e.g.
frontends, SIL, etc.) to emit remarks through a common interface.

See changes in llvm/docs/Remarks.rst for motivation and design choices.

Differential Revision: https://reviews.llvm.org/D73676
2020-02-04 17:16:02 -08:00
Reid Kleckner 2d89e0a098 [SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.

Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.

Fixes PR44697

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D73752
2020-02-04 15:13:12 -08:00
Matt Arsenault 23b76096b7 CodeGenPrepare: Reorder check for cold and shouldOptimizeForSize
shouldOptimizeForSize is showing up in a profile, spending around 10%
of the pass time in one function. This should probably not be so slow,
but the much cheaper attribute check should be done first anyway.
2020-02-04 11:23:13 -08:00
Matt Arsenault de8451fe4d GlobalISel: Fold SmallVector resizes into constructors 2020-02-04 10:28:08 -08:00
Matt Arsenault a3c814d234 Separately track input and output denormal mode
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
2020-02-04 12:59:21 -05:00
Nico Weber f75301d16d Revert "DebugInfo: Check DW_OP_convert in loclists with Split DWARF"
and follow-ups.

This reverts commit 1ced28cbe7.
This reverts commit 4f281f0474.
This reverts commit 552a8fe12b.

The test fails on non-Linux.
2020-02-04 10:06:46 -05:00
Jeremy Morse 41206b61e3 [DebugInfo] Re-instate LiveDebugVariables scope trimming
This patch reverts part of r362750 / D62650, which stopped
LiveDebugVariables from trimming leading variable location ranges down
to only covering those instructions that are in scope. I've observed some
circumstances where the number of DBG_VALUEs in a function can be
amplified in an un-necessary way, to cover more instructions that are
out of scope, leading to very slow compile times. Trimming the range
of instructions that the variables cover solves the slow compile times.

The specific problem that r362750 tries to fix is addressed by the
assignment to RStart that I've added. Any variable location that begins
at the first instruction of a block will now be considered to begin at the
start of the block. While these sound the same, the have different
SlotIndexes, and the register allocator may shoehorn additional
instructions in between the two. The test added in the past
(wrong_debug_loc_after_regalloc.ll) still works with this modification.

live-debug-variables.ll has a range trimmed to not cover the prologue of
the function, while dbg-addr-dse.ll has a DBG_VALUE sink past one
instruction with no DebugLoc, which is expected behaviour.

Differential Revision: https://reviews.llvm.org/D73691
2020-02-04 14:51:06 +00:00
Filipe Cabecinhas abada5036e [NFC] Fix some spelling mistakes to test pushing to GH. 2020-02-04 11:07:31 +00:00
Simon Pilgrim 3dd688a9ee [DAG] OptLevelChanger - fix uninitialized variable analyzer warning (PR44471)
Ensure that OptLevelChanger::SavedFastISel is initialized in the constructor.

This should be NFC - as the equivalent 'same opt level' early-out is used in the destructor as well, so SavedFastISel is only actually referenced in the general case.

Differential Revision: https://reviews.llvm.org/D73875
2020-02-04 10:54:33 +00:00
David Green 362d00e051 [ARM][VecReduce] Force expand vector_reduce_fmin
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but is probably better to just rely on the
scalar operations, which is what is done here.

Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.
2020-02-04 09:36:59 +00:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introducing additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
David Blaikie 1ced28cbe7 DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF
This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.

Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
2020-02-03 19:16:42 -08:00
David Blaikie 031f83fb82 DebugInfo: Simplify emitDebugLocEntry by never passing a null CU 2020-02-03 18:47:14 -08:00
Matt Arsenault cd7650c186 GlobalISel: Implement fewerElementsVector for G_SEXT_INREG
Start using a new strategy with a combination of merge and unmerges.

This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
2020-02-03 11:47:33 -08:00
Quentin Colombet f26ff8c9df [TargetRegisterInfo] Make the heuristic to skip region split overridable by the target
RegAllocGreedy uses a fairly compile time intensive splitting heuristic
called region splitting. This heuristic was disabled via another heuristic
when it is likely that it won't be worth the compile time. The only way
to control this other heuristic was via a command line option (huge-size-for-split).

This commit gives more control on this heuristic by making it overridable
by the target using a target hook in TargetRegisterInfo called
shouldRegionSplitForVirtReg.

The default implementation of this hook keeps the heuristic as it was
before this patch.
2020-02-03 11:30:35 -08:00
Simon Pilgrim 61621f826a [TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling
We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.
2020-02-03 16:50:04 +00:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Guillaume Chatelet fc19465965 [Alignment][NFC] Use Align for code creating MemOp
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73874
2020-02-03 14:10:30 +01:00
Guillaume Chatelet 75d9994a51 Fix broken invariant
Summary:
A Copy with a source that is zeros is the same as a Set of zeros.
This fixes the invariant that SrcAlign should always be non-null.

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73791
2020-02-03 11:01:05 +01:00
Fangrui Song 44cdae68c3 [CodeGenPrepare] Delete dead !DL check
Follow-up for D73754

DL is assigned in CodeGenPrepare::runOnFunction and is guaranteed to be
non-null.
2020-02-02 09:49:06 -08:00
Fangrui Song 5a56a25b0b [CodeGenPrepare] Make TargetPassConfig required
The code paths in the absence of TargetMachine, TargetLowering or
TargetRegisterInfo are poorly tested. As rL285987 said, requiring
TargetPassConfig allows us to delete many (untested) checks littered
everywhere.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D73754
2020-02-02 09:28:45 -08:00
Fangrui Song 5932f7b8f2 [PatchableFunction] Use an empty DebugLoc
The current FirstMI.getDebugLoc() is actually null in almost all cases.
If it isn't, the generated .loc will be considered initial. The .loc
will have the prologue_end flag and terminate the prologue prematurely.

Also use an overload of BuildMI that will not prepend
PATCHABLE_FUNCTION_ENTRY to a MachineInstr bundle.
2020-02-01 14:12:06 -08:00
Craig Topper 943b5561d6 [LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops.
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html

The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.

It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.

This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.

Differential Revision: https://reviews.llvm.org/D73749
2020-02-01 11:21:04 -08:00
Matt Arsenault bc101ffd77 GlobalISel: Support widening unmerge results with pointer source 2020-02-01 10:47:03 -05:00
David Blaikie 338beff4dc DwarfDebug.cpp: Fix some indentation 2020-01-31 16:01:57 -08:00
David Blaikie b33e5f3c3e DebugInfo: Split DWARF: Hash non-member function child DIEs
Significant missing hashing - as per the comment this was only meant to
skip member functions (unspecified, but I think it's legible as member
function declarations, not definitions) but was skipping all named
subprograms (so only hashed child DIEs for member function definitions -
because they didn't have a direct name, but only a name given indirectly
in the DW_AT_specification-referenced DIE)
2020-01-31 15:32:03 -08:00
Matt Arsenault 792d9b5719 DAG: Check if a value is divergent before requiresUniformRegister
This avoids a potentially expensive scan if we already know it doesn't
matter.
2020-01-31 15:27:18 -08:00
Jay Foad f465b1aff4 [GlobalISel] Tweak lowering of G_SMULO/G_UMULO
Summary:
Applying this cleanup:

    -      MIRBuilder.buildInstr(TargetOpcode::G_ASHR)
    -        .addDef(Shifted)
    -        .addUse(Res)
    -        .addUse(ShiftAmt);
    +      MIRBuilder.buildAShr(Shifted, Res, ShiftAmt);

caused an assertion failure here:

    llc: /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr *llvm::MachineRegisterInfo::getVRegDef(unsigned int) const: Assertion `(I.atEnd() || std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed.

    #4  0x00000000050a6d96 in llvm::MachineRegisterInfo::getVRegDef (this=0x74606a0, Reg=2147483650) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403
    #5  0x00000000066148f6 in llvm::getConstantVRegValWithLookThrough (VReg=2147483650, MRI=..., LookThroughInstrs=false, HandleFConstant=true) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:244
    #6  0x00000000066147da in llvm::getConstantVRegVal (VReg=2147483650, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:210
    #7  0x0000000006615367 in llvm::ConstantFoldBinOp (Opcode=101, Op1=2147483650, Op2=2147483656, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:341
    #8  0x000000000657eee0 in llvm::CSEMIRBuilder::buildInstr (this=0x7465010, Opc=101, DstOps=..., SrcOps=..., Flag=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp:160
    #9  0x0000000003645958 in llvm::MachineIRBuilder::buildAShr (this=0x7465010, Dst=..., Src0=..., Src1=..., Flags=...) at /home/jayfoad2/git/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1298
    #10 0x00000000065c35b1 in llvm::LegalizerHelper::lower (this=0x7fffffffb5f8, MI=..., TypeIdx=0, Ty=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2020

because at this point there are two instructions defining Res: the
original G_SMULO/G_UMULO and the new G_MUL that we built. The fix is
to modify the original mul in place, so that there is only ever one
definition of Res.

Reviewers: arsenm, aditya_nandakumar

Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72842
2020-01-31 19:21:01 +00:00
Simon Pilgrim 8fbc7fd567 [DAG] SimplifyMultipleUseDemandedBits - peek through unused ISD::INSERT_SUBVECTOR subvectors
If we don't demand any elements of the inserted subvector then just skip it.
2020-01-31 18:57:22 +00:00
Simon Pilgrim 5702dadf6f [DAG] Enable ISD::INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
2020-01-31 18:02:34 +00:00
Hiroshi Yamauchi ac8da31a0f [PGO][PGSO] Handle MBFIWrapper
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.

Add a variant for MBFIWrapper in the PGSO query interface.

Depends on D73494.
2020-01-31 09:36:55 -08:00
Jay Foad 2a1b5af299 [GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister
Summary:
As a side effect some redundant copies of constant values are removed by
CSEMIRBuilder.

Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73789
2020-01-31 17:07:16 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Quentin Colombet cfebd77742 [GISel][KnownBits] Fix a bug where we could run out of stack space
One of the exit criteria of computeKnownBits is whether we reach the max
recursive call depth. Before this patch we would check that the
depth is exactly equal to max depth to exit.

Depth may get bigger than max depth if it gets passed to a different
GISelKnownBits object.
This may happen when say a generic part uses a GISelKnownBits object
with some max depth, but then we hit TL.computeKnownBitsForTargetInstr
which creates a new GISelKnownBits object with a different and smaller
depth. In that situation, when we hit the max depth check for the first
time in the target specific GISelKnownBits object, depth may already
be bigger than the current max depth. Hence we would continue to compute
the known bits, until we ran through the full depth of the chain of
computation or ran out of stack space.

For instance, let say we have
GISelKnownBits Info(/*MaxDepth*/ = 10);
Info.getKnownBits(Foo)
// 9 recursive calls to computeKnownBitsImpl.
// Then we hit a target specific instruction.
// The target specific GISelKnownBits does this:
  GISelKnownBits TargetSpecificInfo(/*MaxDepth*/ = 6)
  TargetSpecificInfo.computeKnownBitsImpl() // <-- next max depth checks would
                                            // always return false.

This commit does not have any test case, none of the in-tree targets
use computeKnownBitsForTargetInstr.
2020-01-30 19:30:39 -08:00
Leonard Chan 2d3174c4df [SafeStack][DebugInfo] Insert DW_OP_deref in correct location
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt
symbols when debugging.

This is an attempt to reland with a few fixes for buildbot since I
haven't merged from master in a bit.

Differential Revision: https://reviews.llvm.org/D73526
2020-01-30 17:09:42 -08:00
Amara Emerson 84bd851108 [GlobalISel][IRTranslator] When translating vector geps, splat the base pointer if required.
We can have geps that have a scalar base pointer, and a vector index value, which
means that the base pointer must be splatted into a vector of pointers.

This fixes crashes on arm64 GlobalISel with optimizations enabled.
2020-01-30 16:27:27 -08:00
Leonard Chan 3b23453b6c Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location"
This reverts commit fff6a1b0f1.

This was breaking a bunch of buildbots.
2020-01-30 16:18:41 -08:00
Leonard Chan fff6a1b0f1 [SafeStack][DebugInfo] Insert DW_OP_deref in correct location
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in
corrupt symbols when debugging.

Differential Revision: https://reviews.llvm.org/D73526
2020-01-30 15:58:37 -08:00
Matt Arsenault eb7f74e300 CodeGen: Use Register 2020-01-30 15:01:56 -08:00
Sean Fertile 8b737688c2 [AIX] Minor cleanup in AsmPrinter. [NFC]
- Extends the comments related to function descriptors, noting how they
are only used on AIX.

- Changes the condition used to gate the creation of the current function
symbol in AsmPrinter::SetupMachineFunction to reflect being AIX
specific. The creation of the symbol is different because of AIXs
linkage conventions, not because AIX uses function descriptors.

Differential Revision: https://reviews.llvm.org/D73115
2020-01-30 14:15:02 -05:00
Fangrui Song 06b8e32d4f [AArch64] -fpatchable-function-entry=N,0: place patch label after BTI
Summary:
For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after
9a24488cb6, we place the NOP sled after
the initial BTI.

```
.Lfunc_begin0:
bti c
nop
nop

.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lfunc_begin0
```

This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label:

```
.Lfunc_begin0:
bti c
.Lpatch0:
nop
nop

.section __patchable_function_entries,"awo",@progbits,f,unique,0
.p2align 3
.xword .Lpatch0
```

This placement is compatible with the resolution in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 .

A local linkage function whose address is not taken does not need a BTI.
Placing the patch label after BTI has the advantage that code does not
need to differentiate whether the function has an initial BTI.

Reviewers: mrutland, nickdesaulniers, nsz, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73680
2020-01-30 11:11:52 -08:00
Matt Arsenault ea956685a1 GlobalISel: Implement s32->s64 G_FPTOSI lowering
Port directly from DAG version.

The lowering for G_FPTOUI used to fail on AMDGPU because it uses
G_FPTOSI.
2020-01-30 08:47:07 -05:00
Dominik Montada dc141af755 [GlobalISel] (fix) Use pointer type size for offset constant when lowering stores
Commit 9965b12fd1 was supposed to change the offset constant when
lowering load/stores, but only introduced this change for loads. This
patch adds the same fix for stores.
2020-01-30 08:32:35 -05:00
Simon Pilgrim 57b0d33224 [DAGCombiner] ISD::AND/OR/XOR - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-30 12:02:53 +00:00
Simon Pilgrim a967aa2706 [DAGCombiner] ISD::SDIV/UDIV/SREM/UREM - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-30 12:02:52 +00:00
Matt Arsenault c5fffa4da3 GlobalISel: Add observer argument to legalizeIntrinsic
This is passed to legalizeCustom, but not intrinsic. Also remove the
MRI argument, since you can get that from the MachineIRBuilder.

I'm not sure why MachineIRBuilder has a private observer member, and
this is passed separately.
2020-01-29 18:33:45 -05:00
Amara Emerson c12f046eb9 [GlobalISel] Add new combine to convert scalar G_MUL to G_SHL.
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.

Vector support not yet implemented.

Differential Revision: https://reviews.llvm.org/D73659
2020-01-29 13:39:00 -08:00
Amara Emerson 0da937bb5c [GlobalISel][IRTranslator] Follow convention and put constant offset of getelementptr arithmetic on RHS.
We were needlessly putting known constant values on the LHS of a G_MUL, which
is suboptimal.

Differential Revision: https://reviews.llvm.org/D73650
2020-01-29 11:37:19 -08:00
Fangrui Song 8903e61b66 [AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects
For `MC_GlobalAddress` operands referencing **certain** GlobalObjects,
we can lower them to STB_LOCAL aliases to avoid costs brought by
assembler/linker's conservative decisions about symbol interposition:

* An assembler conservatively assumes a global default visibility symbol interposable (ELF
  semantics). So relocations in object files are needed even if the code generator assumed
  the definition exact and non-interposable.
* The relocations can cause the creation of PLT entries on some targets for -shared links.
  A linker conservatively assumes a global default visibility symbol interposable (if not
  otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc).

"certain" refers to GlobalObjects in the intersection of
`hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`.
Local linkages (`internal` and `private`) cannot be interposed. `appending` is for very
few objects LLVM interpret specially.  So the set just includes `external`.

This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower
MC_GlobalAddress operands to STB_LOCAL aliases if applicable.
We may extend the scope and include GlobalAlias in the future.

LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations:

* Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage
  GlobalObjects as non-interposable.
* Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no
  outstanding relocation), if the target is defined in the same section. Put it simply, even if IR
  optimizations failed to optimize and allowed interposition for the function call in
  `void foo() {} void bar() { foo(); }`, the assembler would disallow it.

This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so.
With and without the patch, the object file output should be identical:
`.Lfoo$local` does not take a symbol table entry.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D73228
2020-01-29 10:58:43 -08:00
Simon Pilgrim f7245ef897 [DAGCombiner] ISD::SHL/SRA/SRL - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 18:49:42 +00:00
Adrian Prantl 18dbe1b279 Run clang-format on DwarfExpression (NFC) 2020-01-29 10:23:12 -08:00
Adrian Prantl 816ee8a423 DwarfExpression: Factor out getOrCreateBaseType() (NFC) 2020-01-29 10:23:12 -08:00
Simon Pilgrim 25b8e96388 [DAGCombiner] ISD::MUL - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 17:26:22 +00:00
Simon Pilgrim 4b04e11735 [DAGCombiner] Sub/SUBSAT - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 16:57:13 +00:00
Simon Pilgrim 48bd6a0986 [DAGCombiner] visitIMINMAX - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us instead of the ConstantSDNode variant where we have to do it ourselves.
2020-01-29 16:57:13 +00:00
Matt Arsenault b63629a58d GlobalISel: Fix mask computation in lowerInsert
This is supposed to be the high bit index, not the width. Use the
wrapping form of getBitsSet and avoid the bitflip.
2020-01-29 08:25:36 -08:00
Jay Foad 0d7bd34312 [MachineScheduler] Ignore artificial edges when forming store chains
Summary:
BaseMemOpClusterMutation::apply forms store chains by looking for
control (i.e. non-data) dependencies from one mem op to another.

In the test case, clusterNeighboringMemOps successfully clusters the
loads, and then adds artificial edges to the loads' successors as
described in the comment:
      // Copy successor edges from SUa to SUb. Interleaving computation
      // dependent on SUa can prevent load combining due to register reuse.
The effect of this is that *data* dependencies from one load to a store
are copied as *artificial* dependencies from a different load to the
same store.

Then when BaseMemOpClusterMutation::apply looks at the stores, it finds
that some of them have a control dependency on a previous load, which
breaks the chains and means that the stores are not all considered part
of the same chain and won't all be clustered.

The fix is to only consider non-artificial control dependencies when
forming chains.

Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71717
2020-01-29 16:23:01 +00:00
Matt Arsenault f717483acd GlobalISel: Assert on invalid bitcast in MIRBuilder
The other casts validate, so this should too.
2020-01-29 07:49:39 -08:00
Matt Arsenault c5c1bb3374 GlobalISel: Lower G_WRITE_REGISTER 2020-01-29 06:48:24 -08:00
David Stenberg 6a2413c435 [ARM64] Debug info for structure argument missing DW_AT_location
Summary:
Prevent eliminating dbg_val due to COPY.

Fixes this
https://bugs.llvm.org/show_bug.cgi?id=40709

Patch by: Kamlesh Kumar (kamleshbhalui)

Reviewers: aprantl, dblaikie, vsk, dsanders

Reviewed By: dsanders

Subscribers: dstenb, kristof.beyls, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73159
2020-01-29 10:56:23 +01:00
Sam Parker ac30ea2f87 [RDA][ARM] Move functionality into RDA
Add several new helpers to RDA:
- hasLocalDefBefore
- isRegDefinedAfter
- isSafeToDefRegAt

And move two bits of logic from ARMLowOverheadLoops into RDA:
- isSafeToMove
- isSafeToRemove

Both of these have some wrappers too to make them more convienent to
use.

Differential Revision: https://reviews.llvm.org/D73460
2020-01-29 03:27:47 -05:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Michael Spang a2fb2c0ddc [GlobalMerge] Preserve symbol visibility when merging globals
Symbols created for merged external global variables have default
visibility. This can break programs when compiling with -Oz
-fvisibility=hidden as symbols that should be hidden will be exported at
link time.

Differential Revision: https://reviews.llvm.org/D73235
2020-01-28 13:26:18 -08:00
Hiroshi Yamauchi 2c03c899d5 [MBFI] Move BranchFolding::MBFIWrapper to its own files. NFC.
Summary:
To avoid header file circular dependency issues in passing updated MBFI (in
MBFIWrapper) to the interface of profile guided size optimizations.

A prep step for (and split off of) D73381.

Reviewers: davidxl

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73494
2020-01-28 10:58:46 -08:00
Sam Parker 7ad879caa0 [NFC][RDA] typedef SmallPtrSetImpl<MachineInstr*> 2020-01-28 13:15:44 +00:00
Wang, Pengfei 3239b5034e [FPEnv] Add pragma FP_CONTRACT support under strict FP.
Summary: Support pragma FP_CONTRACT under strict FP.

Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3

Subscribers: hiraditya, jdoerfert, cfe-commits, llvm-commits, LuoYuanke

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72820
2020-01-28 20:43:43 +08:00
Guillaume Chatelet 879c825cb8 [instrinsics] Add @llvm.memcpy.inline instrinsics
Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.

Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71710
2020-01-28 09:42:01 +01:00
Fangrui Song c7c5da6df3 Reland "[StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager()""
Reland 7a8b0b1595, with a fix that checks
`!E.value().empty()` to avoid inserting a zero to SlotRemap.

Debugged by rnk@ in https://bugs.chromium.org/p/chromium/issues/detail?id=1045650#c33

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D73510
2020-01-27 15:58:49 -08:00
Jay Foad cbbbd5b5f6 [GlobalISel] Make use of KnownBits::computeForAddSub
Summary:
This is mostly NFC. computeForAddSub may give more precise results in
some cases, but that doesn't seem to affect any existing GlobalISel
tests.

Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73431
2020-01-27 22:22:56 +00:00
Simon Pilgrim e7e043724e [DAG] Enable ISD::EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.

Differential Revision: This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
2020-01-27 21:17:47 +00:00
Adrian Prantl a095d149c2 Fix an assertion failure in DwarfExpression's subregister composition
This patch fixes an assertion failure in DwarfExpression that is
triggered when a complex fragment has exactly the size of a
subregister of the register the DBG_VALUE points to *and* there is no
DWARF encoding for the super-register.

I took the opportunity to replace/document some magic values with
static constructor functions to make this code less confusing to read.

rdar://problem/58489125

Differential Revision: https://reviews.llvm.org/D72938
2020-01-27 12:44:37 -08:00
Vedant Kumar e08f205f5c Reland (again): [DWARF] Allow cross-CU references of subprogram definitions
This is a revert-of-revert (i.e. this reverts commit 802bec89, which
itself reverted fa4701e1 and 79daafc9) with a fix folded in. The problem
was that call site tags weren't emitted properly when LTO was enabled
along with split-dwarf. This required a minor fix. I've added a reduced
test case in test/DebugInfo/X86/fission-call-site.ll.

Original commit message:

This allows a call site tag in CU A to reference a callee DIE in CU B
without resorting to creating an incomplete duplicate DIE for the callee
inside of CU A.

We already allow cross-CU references of subprogram declarations, so it
doesn't seem like definitions ought to be special.

This improves entry value evaluation and tail call frame synthesis in
the LTO setting. During LTO, it's common for cross-module inlining to
produce a call in some CU A where the callee resides in a different CU,
and there is no declaration subprogram for the callee anywhere. In this
case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram
in order to fill in the call site tag. That empty 'definition' defeats
entry value evaluation etc., because the debugger can't figure out what
it means.

As a follow-up, maybe we could add a DWARF verifier check that a
DW_TAG_subprogram at least has a DW_AT_name attribute.

Update #1:

Reland with a fix to create a declaration DIE when the declaration is
missing from the CU's retainedTypes list. The declaration is left out
of the retainedTypes list in two cases:

1) Re-compiling pre-r266445 bitcode (in which declarations weren't added
   to the retainedTypes list), and
2) Doing LTO function importing (which doesn't update the retainedTypes
   list).

It's possible to handle (1) and (2) by modifying the retainedTypes list
(in AutoUpgrade, or in the LTO importing logic resp.), but I don't see
an advantage to doing it this way, as it would cause more DWARF to be
emitted compared to creating the declaration DIEs lazily.

Update #2:

Fold in a fix for call site tag emission in the split-dwarf + LTO case.

Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a
ReleaseLTO-g build of the test suite.

rdar://46577651, rdar://57855316, rdar://57840415, rdar://58888440

Differential Revision: https://reviews.llvm.org/D70350
2020-01-27 10:52:34 -08:00
Nico Weber 68051c1224 Revert "[StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager()"
This reverts commit 7a8b0b1595.
It seems to break exception handling on 32-bit Windows, see
https://crbug.com/1045650
2020-01-27 11:22:33 -05:00
Dominik Montada 9965b12fd1 Use pointer type size for offset constant when lowering load/stores 2020-01-27 06:55:32 -08:00
Matt Arsenault 2a160ba5b0 GlobalISel: Reimplement widenScalar for G_UNMERGE_VALUES results
Only use shifts if the requested type exactly matches the source type,
and create sub-unmerges otherwise.
2020-01-27 06:18:26 -08:00
Matt Arsenault 06d9230fef GlobalISel: Translate vector GEPs 2020-01-27 05:35:05 -08:00
Igor Kudrin 8f3d47c54a [DWARF] Do not pass Version to DWARFExpression. NFCI.
The Version was used only to determine the size of an operand of
DW_OP_call_ref. The size was 4 for all versions apart from 2, but
the DW_OP_call_ref operation was introduced only in DWARF3. Thus,
the code may be simplified and using of Version may be eliminated.

Differential Revision: https://reviews.llvm.org/D73264
2020-01-27 19:08:46 +07:00
David Stenberg 13d4ef9ac0 Improvements to call site register worklist
Summary:
This fixes PR44118. For cases where we have a chain like this:

  R8 = R1 (entry value)
  R0 = R8
  call @foo R0

the code that emits call site entries using entry values would not
follow that chain, instead emitting a call site entry with R8 as
location rather than R0. Such a case was discovered when originally
adding dbgcall-site-orr-moves.mir. This patch fixes that issue. This is
done by changing the ForwardedRegWorklist set to a map in which the
worklist registers always map to the parameter registers that they
describe.

Another thing this patch fixes is that worklist registers now can
describe more than one parameter register at a time. Such a case
occurred in dbgcall-site-interpretation.mir, resulting in a call site
entry not being emitted for one of the parameters.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73168
2020-01-27 12:41:42 +01:00
David Stenberg b46baa82fc Don't separate imp/expl def handling for call site params
Summary:
Since D70431 the describeLoadedValue() hook takes a parameter register,
meaning that it can now be asked to describe any register. This means
that we can drop the difference between explicit and implicit defines
that we previously had in collectCallSiteParameters().

I have not found any case for any upstream targets where a parameter
register is only implicitly defined, and does not overlap with any
explicit defines. I don't know if such a case would even make sense. So
as far as I have tested, this patch should be a non-functional change.
However, this reduces the complexity of the code a bit, and it will
simplify the implementation of an upcoming patch which solves PR44118.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D73167
2020-01-27 11:31:09 +01:00
Petar Avramovic cbf03aee6d [MIPS GlobalISel] Select population count (popcount)
G_CTPOP is generated from llvm.ctpop.<type> intrinsics, clang generates
these intrinsics from __builtin_popcount and __builtin_popcountll.
Add lower and narrow scalar for G_CTPOP.
Lower G_CTPOP for MIPS32.

Differential Revision: https://reviews.llvm.org/D73216
2020-01-27 09:59:50 +01:00
Petar Avramovic 8bc7ba5b9e [MIPS GlobalISel] Select count trailing zeros
llvm.cttz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
G_CTTZ is generated from llvm.cttz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_ctz and
__builtin_ctzll.
G_CTTZ_ZERO_UNDEF comes from llvm.cttz.<type> (<type> <src>, i1 true).
Clang generates such intrinsics as parts of expansion of builtin_ffs
and builtin_ffsll. It is also traditionally part of and many
algorithms that are now predicated on avoiding zero-value inputs.

Add narrow scalar (algorithm uses G_CTTZ_ZERO_UNDEF) for G_CTTZ.
Lower G_CTTZ and G_CTTZ_ZERO_UNDEF for MIPS32.

Differential Revision: https://reviews.llvm.org/D73215
2020-01-27 09:51:06 +01:00
Petar Avramovic 2b66d32f3f [MIPS GlobalISel] Select count leading zeros
llvm.ctlz.<type> intrinsic has additional i1 argument is_zero_undef,
it tells whether zero as the first argument produces a defined result.
MIPS clz instruction returns 32 for zero input.
G_CTLZ is generated from llvm.ctlz.<type> (<type> <src>, i1 false)
intrinsics, clang generates these intrinsics from __builtin_clz and
__builtin_clzll.
G_CTLZ_ZERO_UNDEF can also be generated from llvm.ctlz with true as
second argument. It is also traditionally part of and many algorithms
that are now predicated on avoiding zero-value inputs.

Add narrow scalar for G_CTLZ (algorithm uses G_CTLZ_ZERO_UNDEF).
Lower G_CTLZ_ZERO_UNDEF and select G_CTLZ for MIPS32.

Differential Revision: https://reviews.llvm.org/D73214
2020-01-27 09:43:38 +01:00
Fangrui Song 941f20c3bd [MachineVerifier] Simplify and delete LLVM_VERIFY_MACHINEINSTRS from a comment. NFC
The environment variable has been unused since r228079.
2020-01-27 00:31:23 -08:00
Wang, Pengfei 17b8f96d65 [FPEnv] Divide macro INSTRUCTION into INSTRUCTION and DAG_INSTRUCTION,
and macro FUNCTION likewise. NFCI.

Some functions like fmuladd don't really have a node, we should divide
the declaration form those have node to avoid introducing fake nodes.

Differential Revision: https://reviews.llvm.org/D72871
2020-01-27 10:38:05 +08:00
Simon Pilgrim 4a5f9d9faf [TargetLowering] Respect recursive depth in SimplifyDemandedBits call to ComputeNumSignBits 2020-01-26 10:01:56 +00:00
Simon Pilgrim 3daa71ee00 [SelectionDAG] ComputeNumSignBits - add DemandedElts support for MIN/MAX ops 2020-01-25 20:21:14 +00:00
Simon Pilgrim 3f8916b2e8 [SelectionDAG] ComputeNumSignBits - add support for rotate non-uniform vector amounts 2020-01-25 19:15:05 +00:00
Simon Pilgrim e3c26a9d1b [SelectionDAG] ComputeNumSignBits - add support for rotate uniform vector amounts 2020-01-25 18:55:47 +00:00
Simon Pilgrim c8de7c8f50 [TargetLowering] SimplifyDemandedBits - Remove ashr if all our demandedbits already match the sign bit
Differential Revision: https://reviews.llvm.org/D73412
2020-01-25 17:36:46 +00:00
Vedant Kumar 802bec8961 Revert "Reland: [DWARF] Allow cross-CU references of subprogram definitions"
... as well as:
Revert "[DWARF] Defer creating declaration DIEs until we prepare call site info"

This reverts commit fa4701e197.

This reverts commit 79daafc903.

There have been reports of this assert getting hit:

CalleeDIE && "Could not find DIE for call site entry origin
2020-01-24 18:07:54 -08:00
Quentin Colombet 5d87b5d202 [GISelKnownBits] Add support for PHIs
Teach the GISelKnowBits analysis how to deal with PHI operations.
PHIs are essentially COPYs happening on edges, so we can just reuse
the code for COPY.

This is NFC COPY-wise has we leave Depth untouched when calling
computeKnownBitsImpl for COPYs, like it was before this patch.
Increasing Depth is however required for PHIs as they may loop back to
themselves and we would end up in an infinite loop if we were not
increasing Depth.

Differential Revision: https://reviews.llvm.org/D73317
2020-01-24 16:43:52 -08:00
@justice_adams (Justice Adams) daee63f974 [SelectionDag] Updated FoldConstantArithmetic method signature in preparation for merge with FoldConstantVectorArithmetic
Updated FoldConstantArithmetic method signature to match that of
FoldConstantVectorArithmetic in preparation for merging the two
functions together

https://bugs.llvm.org/show_bug.cgi?id=36544

This is the first step in combining the various
FoldConstantVectorArithmetic and FoldConstantVectorArithmetic
functions into one FoldConstantArithmetic function.

Differential Revision: https://reviews.llvm.org/D72870
2020-01-24 18:00:58 -05:00
Craig Topper d3bf06bc81 [DAGCombiner] Add combine for (not (strict_fsetcc)) to create a strict_fsetcc with the opposite condition.
Unlike the existing code that I modified here, I only handle the
case where the strict_fsetcc has a single use. Not sure exactly
how to handle multiples uses.

Testing this on X86 is hard because we already have a other
combines that get rid of lowered version of the integer setcc that
this xor will eventually become. So this combine really just
saves a bunch of extra nodes being created. Not sure about other
targets.

Differential Revision: https://reviews.llvm.org/D71816
2020-01-24 14:15:36 -08:00
Stanislav Mekhanoshin be8e38cbd9 Correct NumLoads in clustering
Scheduler sends NumLoads argument into shouldClusterMemOps()
one less the actual cluster length. So for 2 instructions
it will pass just 1. Correct this number.

This is NFC for in tree targets.

Differential Revision: https://reviews.llvm.org/D73292
2020-01-24 12:45:28 -08:00
Stanislav Mekhanoshin 7a94d4f4ee Allow combining of extract_subvector to extract element
Differential Revision: https://reviews.llvm.org/D73132
2020-01-24 10:50:26 -08:00
Fangrui Song 50a3ff30e1 [PatchableFunction] Allow empty entry MachineBasicBlock
Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D73301
2020-01-24 09:42:48 -08:00
Tom Weaver f5147765ba [DebugInfo][LiveDebugValues] Teach Live Debug Values About Meta Instructions
Previously LiveDebugValues pass would consider meta instructions that 'fiddle' with liveness of registers as register definitions when transfering register defs. This would mean that, for example, a KILL instruction would cause LiveDebugValues to terminate the range of an earlier DBG_VALUE instruction resulting in the none propogation of said DBG_VALUE instructions into later blocks.

This patch adds the check and a helpful comment, fixes a test that previously tested for the broken behaviour by coincidence and adds a test specifically for this.

reviewers: vsk, dstenb, djtodoro

Differential Revision: https://reviews.llvm.org/D73210
2020-01-24 16:29:05 +00:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Simon Pilgrim 0b45c2264a [SelectionDAG] rot(x, y) --> x iff ComputeNumSignBits(x) == BitWidth(x)
Rotating an 0/-1 value by any amount will always result in the same 0/-1 value
2020-01-24 10:35:57 +00:00
Fangrui Song 22467e2595 Add function attribute "patchable-function-prefix" to support -fpatchable-function-entry=N,M where M>0
Similar to the function attribute `prefix` (prefix data),
"patchable-function-prefix" inserts data (M NOPs) before the function
entry label.

-fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry)
will look like:

```
  .type	foo,@function
.Ltmp0:               # @foo
  nop
foo:
.Lfunc_begin0:
  # optional `bti c` (AArch64 Branch Target Identification) or
  # `endbr64` (Intel Indirect Branch Tracking)
  nop

  .section  __patchable_function_entries,"awo",@progbits,get,unique,0
  .p2align  3
  .quad .Ltmp0
```

-fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable
placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html):

```
(a)         (b)

func:       func:
.Ltmp0:     bti c
  bti c     .Ltmp0:
  nop       nop
```

(a) needs no additional code. If the consensus is to go for (b), we will
need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp .

Differential Revision: https://reviews.llvm.org/D73070
2020-01-23 17:02:27 -08:00