Commit Graph

1680 Commits

Author SHA1 Message Date
Jon Roelofs 095e91c973 [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Re-landing now that the crasher this patch previously uncovered has been fixed
in: https://reviews.llvm.org/D102935

Differential revision: https://reviews.llvm.org/D102452
2021-05-24 10:10:44 -07:00
Christudasan Devadasan ab60e361c2 GlobalISel: Help reduce operation width for instruction with two results.
The function `reduceOperationWidth` helps to legalize a vector
operation either by narrowing its type or by scalarizing the
operation itself. It currently supports instructions with one result.
This patch, in addition allows the same for instructions with two
results (for instance, G_SDIVREM).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100725
2021-05-21 10:34:18 +05:30
Jon Roelofs 0af3105b64 Revert "[Remarks] Add analysis remarks for memset/memcpy/memmove lengths"
This reverts commit 4bf69fb52b.

This broke spec2k6/403.gcc under -global-isel. Details to follow once I've
reduced the problem.
2021-05-20 12:19:16 -07:00
Stephen Tozer cf725dde9c [DebugInfo] Handle DIArgList in FastISel or GlobalIsel
Currently, variadic dbg.values (i.e. those using a DIArgList as part of
their location) are not handled properly by FastISel or GlobalISel, and
will produce invalid DBG_VALUE instructions if they encounter them. This
patch fixes this issue by emitting undef DBG_VALUE instructions for
variadic dbg.values, so that no incorrect instruction is produced and
any prior variable location is terminated.

This is simply a quick-fix to prevent errors; a correct implementation
should come later for these ISel pipelines to ensure that we do not drop
debug information unnecessarily.

Differential Revision: https://reviews.llvm.org/D102500
2021-05-20 17:37:28 +01:00
Amara Emerson 57ea5d4f48 [GlobalISel] Fix div+rem -> divrem combine causing use-def violation. 2021-05-19 23:13:41 -07:00
Jon Roelofs 4bf69fb52b [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Differential revision: https://reviews.llvm.org/D102452
2021-05-19 15:09:18 -07:00
Jessica Paquette 84ae1cf8ed Recommit "[GlobalISel] Simplify G_ICMP to true/false when the result is known"
Add missing REQUIRES line to
prelegalizer-combiner-icmp-to-true-false-known-bits.
2021-05-19 09:29:19 -07:00
Nico Weber 52a7797626 Revert "[GlobalISel] Simplify G_ICMP to true/false when the result is known"
This reverts commit 892497c806.
Breaks tests, see comments on https://reviews.llvm.org/D102542
2021-05-19 09:02:27 -04:00
Jessica Paquette 892497c806 [GlobalISel] Simplify G_ICMP to true/false when the result is known
Use existing KnownBits helpers from KnownBits.h to simplify G_ICMPs.

E.g.

x == x -> true
x != x -> false
load(x) > 1 -> true (when the load is known to be greater than 1)

And so on.

Differential Revision: https://reviews.llvm.org/D102542
2021-05-18 09:26:41 -07:00
Amara Emerson 80c534a8f9 [GlobalISel][CallLowering] Fix crash when handling a v3s32 type that's being passed as v2s64. 2021-05-14 16:30:51 -07:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
cynecx 8ec9fd4839 Support unwinding from inline assembly
I've taken the following steps to add unwinding support from inline assembly:

1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:

```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
    to label %exit unwind label %uexit
```

2.) Add Bitcode writing/reading support + LLVM-IR parsing.

3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.

4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.

5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.

6.) Don't allow unwinding callbr.

Reviewed By: Amanieu

Differential Revision: https://reviews.llvm.org/D95745
2021-05-13 19:13:03 +01:00
Matt Arsenault 6f5ddf6731 GlobalISel: Don't hardcode varargs=false in resultsCompatible 2021-05-11 20:22:06 -04:00
Matt Arsenault 24e2e5df0e GlobalISel: Split ValueHandler into assignment and emission classes
Currently the ValueHandler handles both selecting the type and
location for arguments, as well as inserting instructions needed to
handle them. Split this so that the determination of the argument
handling is independent of the function state. Currently the checks
for tail call compatibility do not follow the full assignment logic,
so it misses cases where arguments require nontrivial legalization.

This should help avoid targets ending up in a buggy state where the
argument evaluation may change in different contexts.
2021-05-11 19:50:12 -04:00
Amara Emerson dc75499998 [GlobalISel][IRTranslator] Fix bit-test lowering dropping phi edges.
For contiguous ranges we drop the last bit-test case but in doing so we skip
adding the new MBB PHI edges to the list of replacement PHI edges, and as a
result we incorrectly omit them in the G_PHI in finishPendingPhis().

Was found when bootstrapping clang with -O3 and GlobalISel enabled on Apple Silicon.
2021-05-10 11:59:31 -07:00
Fraser Cormack 3212a08a8c [Constant] Allow ConstantAggregateZero a scalable element count
A ConstantAggregateZero may be created from a scalable vector type.
However, it still assumed fixed number of elements when queried for
them. This patch changes ConstantAggregateZero to correctly report its
element count.

This change fixes a couple of issues. Firstly, it fixes a crash in
Constant::getUniqueValue when called on a scalable-vector
zeroinitializer constant.

Secondly, it fixes a latent bug in GlobalISel's IRTranslator in which
translating a scalable-vector zeroinitializer would hit the assertion in
ConstantAggregateZero::getNumElements when casting to a FixedVectorType,
rather than reporting an error more gracefully. This is currently
hypothetical as the IRTranslator has deeper issues preventing the use of
scalable vector types.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102082
2021-05-10 13:51:53 +01:00
Momchil Velikov f3139b20a0 [GlobalISel] Fix wrong invocation of `getParamStackAlign` (NFC)
The function template `CallLowering::setArgFlags` is invoked both
for arguments and return values. In the latter case, it calls
`getParamStackAlign` with argument index `~0u`. Nothing wrong
happens now, as the argument is safely incremented back to 0
inside `getParamStackAlign` (the type is `unsigned`), but in
principle it's fragile and may become incorrect.

Differential Revision: https://reviews.llvm.org/D102004
2021-05-10 12:16:33 +01:00
Amara Emerson 808bc11d9e [GlobalISel] Don't form zero/sign extending loads for atomics.
For importing patterns, we only support matching G_LOAD, not G_ZEXTLOAD or
G_SEXTLOAD.

Differential Revision: https://reviews.llvm.org/D101932
2021-05-07 16:41:48 -07:00
Amara Emerson 1ccebb18ef [GlobalISel] Micro-optimize the conditional branch optimization.
Convert a check into an assert and pass an MI instead of recomputing in the
apply function.
2021-05-07 00:03:09 -07:00
Matt Arsenault fa0b93b5a0 GlobalISel: Use DAG call lowering infrastructure in a more compatible way
Unfortunately the current call lowering code is built on top of the
legacy MVT/DAG based code. However, GlobalISel was not using it the
same way. In short, the DAG passes legalized types to the assignment
function, and GlobalISel was passing the original raw type if it was
simple.

I do believe the DAG lowering is conceptually broken since it requires
picking a type up front before knowing how/where the value will be
passed. This ends up being a problem for AArch64, which wants to pass
i1/i8/i16 values as a different size if passed on the stack or in
registers.

The argument type decision is split across 3 different places which is
hard to follow. SelectionDAG builder uses
getRegisterTypeForCallingConv to pick a legal type, tablegen gives the
illusion of controlling the type, and the target may have additional
hacks in the C++ part of the call lowering. AArch64 hacks around this
by not using the standard AnalyzeFormalArguments and special casing
i1/i8/i16 by looking at the underlying type of the original IR
argument.

I believe people have generally assumed the calling convention code is
processing the original types, and I've discovered a number of dead
paths in several targets.

x86 actually relies on the opposite behavior from AArch64, and relies
on x86_32 and x86_64 sharing calling convention code where the 64-bit
cases implicitly do not work on x86_32 due to using the pre-legalized
types.

AMDGPU targets without legal i16/f16 have always used a broken ABI
that promotes to i32/f32. GlobalISel accidentally fixed this to be the
ABI we should have, but this fixes it so we're using the worse ABI
that is compatible with the DAG. Ideally we would fix the DAG to match
the old GlobalISel behavior, but I don't wish to fight that battle.

A new native GlobalISel call lowering framework should let the target
process the incoming types directly.

CCValAssigns select a "ValVT" and "LocVT" but the meanings of these
aren't entirely clear. Different targets don't use them consistently,
even within their own call lowering code. My current belief is the
intent was "ValVT" is supposed to be the legalized value type to use
in the end, and and LocVT was supposed to be the ABI passed type
(which is also legalized).

With the default CCState::Analyze functions always passing the same
type for these arguments, these only differ when the TableGen part of
the lowering decide to promote the type from one legal type to
another. AArch64's i1/i8/i16 hack ends up inverting the meanings of
these values, so I had to add an additional hack to let the target
interpret how large the argument memory is.

Since targets don't consistently interpret ValVT and LocVT, this
doesn't produce quite equivalent code to the initial DAG
lowerings. I've opted to consistently interpret LocVT as the in-memory
size for stack passed values, and ValVT as the register type to assign
from that memory. We therefore produce extending loads directly out of
the IRTranslator, whereas the DAG would emit regular loads of smaller
values. This will also produce loads/stores that are wider than the
argument value if the allocated stack slot is larger (and there will
be undef padding bytes). If we had the optimizations to reduce
load/stores based on truncated values, this wouldn't produce a
different end result.

Since ValVT/LocVT are more consistently interpreted, we now will emit
more G_BITCASTS as requested by the CCAssignFn. For example AArch64
was directly assigning types to some physical vector registers which
according to the tablegen spec should have been casted to a vector
with a different element type.

This also moves the responsibility for inserting
G_ASSERT_SEXT/G_ASSERT_ZEXT from the target ValueHandlers into the
generic code, which is closer to how SelectionDAGBuilder works.

I had to xfail an x86 test since I don't see a quick way to fix it
right now (I filed bug 50035 for this). It's broken independently of
this change, and only triggers since now we end up with more ands
which hit the improperly handled selection pattern.

I also observed that FP arguments that need promotion (e.g. f16 passed
as f32) are broken, and use regular G_TRUNC and G_ANYEXT.

TLDR; the current call lowering infrastructure is bad and nobody has
ever understood how it chooses types.
2021-05-05 17:35:02 -04:00
Vang Thao a3d273c9ff [GlobalISel] Fix buildZExtInReg creating new register.
Fix a bug where buildZExtInReg will create and use a new register instead of using the register from parameter DstOp Res.

Reviewed By: arsenm, foad

Differential Revision: https://reviews.llvm.org/D101871
2021-05-05 08:19:52 -07:00
Amara Emerson fa2340574c [GlobalISel][Legalizer] Bump up a smallvector size that was found to be too small. NFC. 2021-04-29 14:41:34 -07:00
Amara Emerson 96ec6d91e4 [AArch64][GlobalISel] Simplify out of range rotate amount.
Differential Revision: https://reviews.llvm.org/D101005
2021-04-29 14:05:58 -07:00
Amara Emerson 2fa14d4700 Try to fix bots. We shouldn't be setting the entrybuilder's DL to a null one.
This was causing a DILocation verifier error, the old code path didn't try to do
this when building constants via the finishPendingPhis() method.
2021-04-29 03:51:10 -07:00
Amara Emerson aa0b9200e8 [GlobalISel][IRTranslator] Move line zero DebugLoc creation to constant translation. NFC.
This is a compile time optimization. DILocation:get() is expensive to call, and
we were calling it to create a line zero debug loc for *every* instruction we
translated. We only really need to do this just before we build constants in the
entry block, so I moved this code there. This reduces the LLVM -O0 codegen time
of sqlite3 IR by around 0.7% instructions executed and by about ~2% in CPU time.

We can probably do better with a more involved change, since the reason we need
to create one for each new constant is because we're using the debug scope and
inlined-at loc. If we just use a single instruction's scope and drop the
inlined-at, we can just cache these and have them be free.
2021-04-28 23:54:14 -07:00
Petar Avramovic 0713c82b13 [GlobalISel]: Add a getConstantIntVRegVal utility
Returns ConstantInt from G_CONSTANT instruction given its def register.

Differential Revision: https://reviews.llvm.org/D99733
2021-04-27 10:52:07 +02:00
Nico Weber ba7a92c01e [Support] Don't include VirtualFileSystem.h in CommandLine.h
CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.

(Already remarked by jhenderson on D70769.)

No behavior change.

Differential Revision: https://reviews.llvm.org/D100957
2021-04-21 10:19:01 -04:00
Simon Pilgrim bc98076ff6 Silence MSVC signed/unsigned comparison warning. NFCI. 2021-04-20 17:20:13 +01:00
Matt Arsenault 620fdb9671 GlobalISel: Defer register creation in handleAssignments
This is currently built on top of the SelectionDAG call lowering, but
does not use it the same way. SelectionDAG passes legalized types to
the assignment functions, and the tablegenerated assignment functions
may change the value types expected for registers. This does not
change the types used, just moves the register creation to help fix
this in the future.

Defer the register creation until after all of the assignment
decisions have been made. This will also help have correct tail call
compatibility checking in a future change. Currently it does not work
as expected for any arguments split across multiple registers.
2021-04-20 11:48:12 -04:00
Matt Arsenault 14b03b4aad GlobalISel: Check for powers of 2 for inverse funnel shift lowering
This doesn't make a practical difference since it would only be broken
if a target actually had a legal non-power-of-2 inverse shift.
2021-04-20 11:30:22 -04:00
Matt Arsenault 83a25a1010 GlobalISel: Restrict narrow scalar for fptoui/fptosi results
This practically only works for the f16 case AMDGPU uses, not wider
types.

Fixes bug 49710 by failing legalization.
2021-04-20 10:54:40 -04:00
Jessica Paquette 91bbb914e0 [AArch64][GlobalISel] Regbankselect + select @llvm.aarch64.neon.uaddlv
It turns out we actually import a bunch of selection code for intrinsics. The
imported code checks that the register banks on the G_INTRINSIC instruction
are correct. If so, it goes ahead and selects it.

This adds code to AArch64RegisterBankInfo to allow us to correctly determine
register banks on intrinsics which have known register bank constraints.

For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for
porting AArch64TargetLowering::LowerCTPOP.

Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction.
This seems a little nicer than having to know about how intrinsic instructions
are structured.

Differential Revision: https://reviews.llvm.org/D100398
2021-04-19 10:47:49 -07:00
Momchil Velikov f9d932e673 [clang][AArch64] Correctly align HFA arguments when passed on the stack
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example

    struct S {
      __attribute__ ((__aligned__(16))) double v[4];
    };

Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)

Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.

This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.

The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.

For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.

On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.

Patch by Momchil Velikov and Lucas Prates.

Differential Revision: https://reviews.llvm.org/D98794
2021-04-15 22:58:14 +01:00
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
Yang Fan 0d7fd9f0d0
[GlobalISel] Fix Wint-in-bool-context warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp: In member function ‘bool llvm::CombinerHelper::matchFunnelShiftToRotate(llvm::MachineInstr&)’:
/llvm-project/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp:3882:35: warning: ?: using integer constants in boolean context, the expression will always evaluate to ‘true’ [-Wint-in-bool-context]
 3882 |       Opc == TargetOpcode::G_FSHL ? TargetOpcode::G_ROTL : TargetOpcode::G_ROTR;
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
2021-03-31 09:59:43 +08:00
Amara Emerson a35c2c7942 [GlobalISel] Implement fewerElements legalization for vector reductions.
This patch adds 3 methods, one for power-of-2 vectors which use tree
reductions using vector ops, before a final reduction op. For non-pow-2
types it generates multiple narrow reductions and combines the values with
scalar ops.

Differential Revision: https://reviews.llvm.org/D97163
2021-03-30 11:19:21 -07:00
Amara Emerson 91887cd4ec [AArch64][GlobalISel] Combine funnel shifts to rotates.
Differential Revision: https://reviews.llvm.org/D99388
2021-03-30 11:00:36 -07:00
Jessica Paquette 700431128e [GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX
Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG.

This is only done post-legalization for now. Once the legalizer knows how to
decompose these back into shifts, this requirement can probably be removed.

Differential Revision: https://reviews.llvm.org/D99230
2021-03-30 10:14:30 -07:00
Amara Emerson f5e9be6fdb [GlobalISel] Implement lowering for G_ROTR and G_ROTL.
This is a straightforward port.

Differential Revision: https://reviews.llvm.org/D99449
2021-03-30 09:44:41 -07:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Jessica Paquette 23f657c165 [AArch64][GlobalISel] Emit bzero on Darwin
Darwin platforms for both AArch64 and X86 can provide optimized `bzero()`
routines. In this case, it may be preferable to use `bzero` in place of a
memset of 0.

This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can
be generated by platforms which may want to use bzero.

To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The
conditions for this are largely a port of the bzero case in
`AArch64SelectionDAGInfo::EmitTargetCodeForMemset`.

The only difference in comparison to the SelectionDAG code is that, when
compiling for minsize, this will fire for all memsets of 0. The original code
notes that it's not beneficial to do this for small memsets; however, using
bzero here will save a mov from wzr. For minsize, I think that it's preferable
to prioritise omitting the mov.

This also fixes a bug in the libcall legalization code which would delete
instructions which could not be legalized. It also adds a check to make sure
that we actually get a libcall name.

Code size improvements (Darwin):

- CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign)
- CTMark -Oz: -0.2% geomean (-0.5% on bullet)

Differential Revision: https://reviews.llvm.org/D99358
2021-03-25 17:14:25 -07:00
Amara Emerson 0d2c4db637 [GlobalISel] Fix crash in RBS with a non-generic IMPLICIT_DEF.
This may occur when swifterror codegen in the translator generates these,
but we shouldn't try to handle them since they should have regclasses anyway.

rdar://75784009

Differential Revision: https://reviews.llvm.org/D99287
2021-03-24 23:08:51 -07:00
Matt Arsenault b24436ac96 GlobalISel: Lower funnel shifts 2021-03-23 09:11:17 -04:00
Pushpinder Singh d0e5422eb8 [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93963
2021-03-23 05:45:43 +00:00
Matt Arsenault 9fdfd8dd52 GlobalISel: Add utility function to constant fold FP ops 2021-03-22 14:38:17 -04:00
Matt Arsenault c34819afe3 GlobalISel: Handle G_BUILD_VECTOR in isKnownToBeAPowerOfTwo 2021-03-22 14:20:35 -04:00
Matt Arsenault 1098acd46d GlobalISel: Avoid unnecessary truncation to i64
We can just directly pass through the APInt to create a new constant.
2021-03-21 10:07:41 -04:00
Matt Arsenault b9a0384983 GlobalISel: Preserve source value information for outgoing byval args
Pass through the original argument IR value in order to preserve the
aliasing information in the memcpy memory operands.
2021-03-18 09:16:54 -04:00
Matt Arsenault 61f834cc09 GlobalISel: Insert memcpy for outgoing byval arguments
byval requires an implicit copy between the caller and callee such
that the callee may write into the stack area without it modifying the
value in the parent. Previously, this was passing through the raw
pointer value which would break if the callee wrote into it.

Most of the time, this copy can be optimized out (however we don't
have the optimization SelectionDAG does yet).

This will trigger more fallbacks for AMDGPU now, since we don't have
legalization for memcpy yet (although we should stop using byval
anyway).
2021-03-18 09:16:54 -04:00
Amara Emerson 28963d895b [GlobalISel] Don't DCE LIFETIME_START/LIFETIME_END markers.
These are pseudos without any users, so DCE was killing them in the combiner.

Marking them as having side effects doesn't seem quite right since they don't.

Gives a nice 0.3% geomean size win on CTMark -Os.

Differential Revision: https://reviews.llvm.org/D98811
2021-03-17 18:02:08 -07:00
Amara Emerson d7fed7b899 [AArch64][GlobalISel] Fall back if disabling neon/fp in the translator.
The previous technique relied on early-exiting the legalizer predicate
initialization, leaving an empty rule table. That causes a fallback
for most instructions, but some have legacy rules defined like G_ZEXT
which can try continue, but then crash.

We should fall back earlier, in the translator, to avoid this issue.

Differential Revision: https://reviews.llvm.org/D98730
2021-03-17 15:08:08 -07:00
Matt Arsenault 6b76d82853 GlobalISel: Fix marking byval arguments as immutable
byval arguments need to be assumed writable. Only implicitly stack
passed arguments which aren't addressable in the IR can be assumed
immutable.

Mips is still broken since for some reason its doing its own thing
with the ValueHandlers (and x86 doesn't actually handle byval
arguments now, although some of the code is there).
2021-03-12 09:01:53 -05:00
Matt Arsenault 34471c3060 GlobalISel: Partially fix handling of byval arguments
This was essentially ignoring byval and treating them as a pointer
argument which needed to be loaded from. This should copy the frame
index value to the virtual register, not insert a load from the frame
index into the pointer value.

For AMDGPU, this was producing a load from the byval pointer argument,
to a pointer used for the byval arguments. I do not understand how
AArch64 managed to work before since it appears to be similarly
broken.

We could also change the ValueHandler API to avoid the extra copy from
the frame index, since currently it returns a new register.

I believe there is still an issue with outgoing byval arguments. These
should have a copy inserted in case the callee decided to overwrite
the memory.
2021-03-12 09:01:53 -05:00
Matt Arsenault cf5ecd5644 GlobalISel: Fix off by one in finding explicit byval alignment
For attribute sets, the return index is at 0, and arguments start at
1. getParamAlignment adds the offset of 1, so we need to convert from
attribute index back to IR index.
2021-03-11 10:23:08 -05:00
Christudasan Devadasan 4c6ab48fb1 GlobalISel: Try to combine G_[SU]DIV and G_[SU]REM
It is good to have a combined `divrem` instruction when the
`div` and `rem` are computed from identical input operands.
Some targets can lower them through a single expansion that
computes both division and remainder. It effectively reduces
the number of instructions than individually expanding them.

Reviewed By: arsenm, paquette

Differential Revision: https://reviews.llvm.org/D96013
2021-03-10 18:46:07 +05:30
Amara Emerson 55e760769b [GlobalISel] Fold away G_BUILD_VECTOR with all elements extracted.
If every element is extracted from a G_BUILD_VECTOR, pass through the source
registers. This is different to the extract(build_vector) combine because this
one tolerates multiple users as long as they're exhaustive.

Differential Revision: https://reviews.llvm.org/D97890
2021-03-09 11:34:26 -08:00
Amara Emerson e60ab72137 [AArch64][GlobalISel] Add combine for extract_vector_elt(build_vector, cst)
Differential Revision: https://reviews.llvm.org/D97835
2021-03-09 11:08:02 -08:00
Jessica Paquette 5c26be214d [AArch64][GlobalISel] Lower G_BUILD_VECTOR -> G_DUP
If we have

```
%vec = G_BUILD_VECTOR %reg, %reg, ..., %reg
```

Then lower it to

```
%vec = G_DUP %reg
```

Also update the selector to handle constant splats on G_DUP.

This will not combine when the splat is all zeros or ones. Tablegen-imported
patterns rely on these being G_BUILD_VECTOR.

Minor code size improvements on CTMark at -Os.

Also adds some utility functions to make it a bit easier to recognize splats,
and an AArch64-specific splat helper.

Differential Revision: https://reviews.llvm.org/D97731
2021-03-08 13:01:10 -08:00
Petar Avramovic d44f61f81c Reland [GlobalISel] Combine zext(trunc x) to x
Recommit 4112299ee7. Depends on
4c8fb7ddd6 which was reverted.

Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-05 11:05:37 +01:00
Petar Avramovic d7834556b7 Reland [GlobalISel] Start using vectors in GISelKnownBits
This is recommit of 4c8fb7ddd6.
MIR in one unit test had mismatched types.

For vectors we consider a bit as known if it is the same for all demanded
vector elements (all elements by default). KnownBits BitWidth for vector
type is size of vector element. Add support for G_BUILD_VECTOR.
This allows combines of urem_pow2_to_mask in pre-legalizer combiner.

Differential Revision: https://reviews.llvm.org/D96122
2021-03-04 21:47:13 +01:00
Nico Weber 59beb1ef6d Revert "[GlobalISel] Combine zext(trunc x) to x"
This reverts commit 4112299ee7.
Seems to depend on 4c8fb7ddd6 which
is being reverted.
2021-03-04 10:13:40 -05:00
Nico Weber 4b1015361c Revert "[GlobalISel] Start using vectors in GISelKnownBits"
This reverts commit 4c8fb7ddd6.
Breaks check-llvm everywhere, see https://reviews.llvm.org/D96122
2021-03-04 10:13:40 -05:00
Petar Avramovic 4112299ee7 [GlobalISel] Combine zext(trunc x) to x
Combine zext(trunc x) to x when truncated bits are known to be zero.

Differential Revision: https://reviews.llvm.org/D96031
2021-03-04 15:05:23 +01:00
Petar Avramovic 4c8fb7ddd6 [GlobalISel] Start using vectors in GISelKnownBits
For vectors we consider a bit as known if it is the same for all demanded
vector elements (all elements by default). KnownBits BitWidth for vector
type is size of vector element. Add support for G_BUILD_VECTOR.
This allows combines of urem_pow2_to_mask in pre-legalizer combiner.

Differential Revision: https://reviews.llvm.org/D96122
2021-03-04 15:05:23 +01:00
Matt Arsenault 78dcff4841 GlobalISel: Add default implementation of assignValueToReg
Refactor insertion of the asserting ops. This enables using them for
AMDGPU.

This code should essentially be the same for every target. Mips, X86
and ARM all have different code there now, but this seems to be an
accident. The assignment functions are called with different types
than they would be in the DAG, so this is all likely an assortment of
hacks to get around that.
2021-03-03 09:29:53 -05:00
Matt Arsenault fd82cbcf7d GlobalISel: Merge and cleanup more AMDGPU call lowering code
This merges more AMDGPU ABI lowering code into the generic call
lowering. Start cleaning up by factoring away more of the pack/unpack
logic into the buildCopy{To|From}Parts functions. These could use more
improvement, and the SelectionDAG versions are significantly more
complex, and we'll eventually have to emulate all of those cases too.

This is mostly NFC, but does result in some minor instruction
reordering. It also removes some of the limitations with mismatched
sizes the old code had. However, similarly to the merge on the input,
this is forcing gfx6/gfx7 to use the gfx8+ ABI (which is what we
actually want, but SelectionDAG is stuck using the weird emergent
ABI).

This also changes the load/store size for stack passed EVTs for
AArch64, which makes it consistent with the DAG behavior.
2021-03-02 17:31:13 -05:00
Amara Emerson 8a316045ed [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
Nikita Popov c35761db0f [GlobalISel] Bail on G_PHI narrowing of odd types (PR48188)
The current narrowing code for G_PHI can only handle the case
where the size is a multiple of the narrow size. If this is not
the case, fall back to SDAG instead of asserting.

Original patch by shepmaster.

Differential Revision: https://reviews.llvm.org/D92446
2021-03-01 23:30:50 +01:00
Matt Arsenault 0131498402 GlobalISel: Remove dead code
Generic code should probably not introduce G_INSERT/G_EXTRACT. The
mirror unpackRegs should also be removed, but AMDGPU still has a use
remaining which needs to be fixed.
2021-03-01 17:06:43 -05:00
Matt Arsenault 6c260d3bc0 GlobalISel: Move splitToValueTypes to generic code
I copied the nearly identical function from AArch64 into AMDGPU, so
fix this duplication.

Mips and X86 have their own more exotic versions which should be
removed. However replacing those is better left for a separate patch
since it requires other changes to avoid regressions.
2021-03-01 08:58:18 -05:00
James Y Knight 6de6455752 Use getAlign() on atomicrmw/cmpxchg instructions, now that it's available.
These locations were missed as part of adding alignment to the
instructions, and were still making their own alignment assumptions.
2021-02-26 15:06:15 -05:00
Jay Foad a6be26710b [GlobalISel] Make more use of replaceSingleDefInstWithReg. NFC. 2021-02-23 17:08:34 +00:00
Cassie Jones 8f956a5e8f [GlobalISel] Implement narrowScalar for SADDE/SSUBE/UADDE/USUBE
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96673
2021-02-22 19:59:36 -05:00
Cassie Jones e1532649cb [GlobalISel] Implement narrowScalar for SADDO/SSUBO
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96672
2021-02-22 19:59:36 -05:00
Cassie Jones c63b33b792 [GlobalISel] Implement narrowScalar for UADDO/USUBO
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96671
2021-02-22 19:59:35 -05:00
Amara Emerson 212d6a95ab [GloblalISel] Support lowering <3 x i8> arguments in multiple parts.
Differential Revision: https://reviews.llvm.org/D97086
2021-02-22 13:58:44 -08:00
Amara Emerson 69ce291bcc [AArch64][GlobalISel] Support lowering <1 x i8> arguments.
We don't yet have working codegen for the resulting unmerges, and if
we did it would probably be horrible.

Differential Revision: https://reviews.llvm.org/D97035
2021-02-22 13:58:44 -08:00
Kazu Hirata 0b417ba20f [CodeGen] Use range-based for loops (NFC) 2021-02-20 21:46:02 -08:00
Matt Arsenault 62d946e133 GlobalISel: Merge some AMDGPU ABI lowering code to generic code
AMDGPU currently has a lot of pre-processing code to pre-split
argument types into 32-bit pieces before passing it to the generic
code in handleAssignments. This is a bit sloppy and also requires some
overly fancy iterator work when building the calls. It's better if all
argument marshalling code is handled directly in
handleAssignments. This handles more situations like decomposing large
element vectors into sub-element sized pieces.

This should mostly be NFC, but does change the generated code by
shifting where the initial argument packing instructions are placed. I
think this is nicer looking, since it now emits the packing code
directly after the relevant copies, rather than after the copies for
the remaining arguments.

This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit
types. This is ultimately the better option, but incompatible with the
DAG. Fixing this requires more work, especially for f16.
2021-02-18 17:26:55 -05:00
Jessica Paquette e6064a6418 [GlobalISel] Implement computeKnownBits for G_ASSERT_SEXT
Implementation is the same as G_SEXT_INREG.

Differential Revision: https://reviews.llvm.org/D96899
2021-02-17 14:00:36 -08:00
Jessica Paquette 26fb036559 [GlobalISel] Implement computeNumSignBits for G_ASSERT_SEXT
Same implementation as G_SEXT_INREG.

Add a testcase to combine-sext-inreg for a concrete example, and a testcase
to KnownBitsTest.

Differential Revision: https://reviews.llvm.org/D96897
2021-02-17 13:53:17 -08:00
Jessica Paquette 60aa646441 [GlobalISel] Add G_ASSERT_SEXT
This adds a G_ASSERT_SEXT opcode, similar to G_ASSERT_ZEXT. This instruction
signifies that an operation was already sign extended from a smaller type.

This is useful for functions with sign-extended parameters.

E.g.

```
define void @foo(i16 signext %x) {
 ...
}
```

This adds verifier, regbankselect, and instruction selection support for
G_ASSERT_SEXT equivalent to G_ASSERT_ZEXT.

Differential Revision: https://reviews.llvm.org/D96890
2021-02-17 13:10:34 -08:00
Matt Arsenault 392e0fcfd1 GlobalISel: Handle arguments partially passed on the stack
The API is a bit awkward since you need to index into an array in the
passed struct. I guess an alternative would be to pass all of the
individual fields.
2021-02-15 17:06:14 -05:00
Cassie Jones 97a1cdb156 [GlobalISel] Disable vector types in narrowScalarAddSub
The implementation for vectors is broken and doesn't seem to be used by
anything. Explicitly remove support for them, they can be added again
later when they're properly implemented.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D95699
2021-02-14 18:06:32 -05:00
Cassie Jones 36246388ba [GlobalISel] Extract a narrowScalarAddSub method. NFC
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D95426
2021-02-14 18:06:32 -05:00
Kazu Hirata 905cf88d18 [CodeGen] Use range-based for loops (NFC) 2021-02-12 23:44:33 -08:00
Amara Emerson 5d6d9b63a3 [GlobalISel] Propagate extends through G_PHIs into the incoming value blocks.
This combine tries to do inter-block hoisting of extends of G_PHIs, into the
originating blocks of the phi's incoming value. The idea is to expose further
optimization opportunities that are normally obscured by the PHI.

Some basic heuristics, and a target hook for AArch64 is added, to allow tuning.
E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine
since it may be folded into the addressing mode during selection.

There are very minor code size improvements on AArch64 -Os, but the real benefit
is that it unlocks optimizations like AArch64 conditional compares on some
benchmarks.

Differential Revision: https://reviews.llvm.org/D95703
2021-02-12 11:52:52 -08:00
Petar Avramovic f0d65f4096 AMDGPU/GlobalISel: Calculate isKnownNeverNaN for fminnum and fmaxnum
Implements same logis as in SelectionDAG.
G_FMINNUM_IEEE and G_FMAXNUM_IEEE are never SNaN by definition and
never NaN when one operand is known non-NaN and other known non-SNaN.
G_FMINNUM and G_FMAXNUM are never NaN/SNaN when one of the operands
is known non-NaN/SNaN.

Differential Revision: https://reviews.llvm.org/D91716
2021-02-12 17:14:34 +01:00
Petar Avramovic 122c649c98 AMDGPU/GlobalISel: Check values of constants in isKnownNeverNaN
Differential Revision: https://reviews.llvm.org/D91714
2021-02-12 17:14:34 +01:00
Amara Emerson de035c18cf [GlobalISel] Fix sext_inreg(load) combine to not move the originating load.
The builder was using the extend user as the insertion point, which meant that
we were incorrectly "moving" the load from its original position, and therefore
could violate memory operation ordering.
2021-02-11 19:27:09 -08:00
Matt Arsenault b72a23650f GlobalISel: Fix using wrong calling convention for callees
This was taking the calling convention from the parent function,
instead of the callee. Avoids regressions in a future patch when the
caller and callee have different type breakdowns.

For some reason AArch64's lowerFormalArguments seems to intentionally
ignore the parent isVarArg.
2021-02-09 13:48:56 -05:00
Matt Arsenault 87e280110d GlobalISel: Use correct calling convention in handleAssignments
This was using the calling convention of the calling function, not the
callee. Avoids regressions in a future patch.
2021-02-08 17:09:28 -05:00
Amara Emerson ec41ed5b1b [AArch64][GlobalISel] Support the 'returned' parameter attribute.
On AArch64 (which seems to be the only target that supports it), this
attribute allows codegen to avoid saving/restoring the value in x0
across a call.

Gives a 0.1% geomean -Os code size improvement on CTMark.

Differential Revision: https://reviews.llvm.org/D96099
2021-02-08 12:47:39 -08:00
Kazu Hirata 5438e079b1 [GlobalISel] Use ListSeparator (NFC) 2021-02-04 21:18:04 -08:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Justin Bogner 62ce4b048f [GlobalISel] Combine narrowScalar of G_ADD and G_SUB. NFC
These two cases have identical implementations other than an
unreachable part of `G_ADD` that checks if the scalar we're narrowing
is a vector. Combining them to avoid unnecessary divergence.
2021-02-03 11:06:04 -08:00
Jessica Paquette 02d4b365bf [GlobalISel] Check if branches use the same MBB in matchOptBrCondByInvertingCond
If the G_BR + G_BRCOND in this combine use the same MBB, then it will infinite
loop. Don't allow that to happen.

Differential Revision: https://reviews.llvm.org/D95895
2021-02-02 15:38:48 -08:00
Jessica Paquette 4809663334 [GlobalISel] Make sure G_ASSERT_ZEXT's src ends up with the same rc as dst
When replacing the dst reg with the src reg, we need to make sure that we
propagate the dst reg's register class through to the src.

Otherwise, we aren't meeting the requirements for G_ASSERT_ZEXT, and so the
verifier will fail.

Differential Revision: https://reviews.llvm.org/D95708
2021-02-01 09:46:35 -08:00
Tim Northover c2b322fc19 GlobalISel: check type size before getZExtValue()ing it.
Otherwise getZExtValue() asserts.
2021-02-01 12:43:33 +00:00
xgupta 94fac81fcc [Branch-Rename] Fix some links
According to the [[ https://foundation.llvm.org/docs/branch-rename/ | status of branch rename ]], the master branch of the LLVM repository is removed on 28 Jan 2021.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D95766
2021-02-01 16:43:21 +05:30
Jessica Paquette d6656c3b25 [GlobalISel] Remove hint instructions in generic InstructionSelect code.
I think every target will want to remove these in the same way. Rather than
making them all implement the same code, let's just put this in
InstructionSelect.

Differential Revision: https://reviews.llvm.org/D95652
2021-01-29 11:20:07 -08:00
Jay Foad 5cf6412a27 [GlobalISel] Fix modifying a G_OR without notifying the observer
Remove the call to setFlags in favour of creating the instruction with
the correct flags in the first place, so we don't have to explicitly
notify the observer.

Differential Revision: https://reviews.llvm.org/D95681
2021-01-29 16:32:24 +00:00
Jessica Paquette d5736a2746 [GlobalISel] Implement regbankselect for G_ASSERT_ZEXT
This adds generic regbankselect support for G_ASSERT_ZEXT.

It inherits whatever register bank the source was given, always, on all targets.

I think that at the point where we run into these, the source register bank
should be decided.

This also adds some AArch64-specific code which makes sure we can handle
G_ASSERT_ZEXT when deciding on register banks for G_STORE, G_PHI, ... etc.

Differential Revision: https://reviews.llvm.org/D95649
2021-01-28 16:56:14 -08:00
Jessica Paquette f19971d1de [GlobalISel] Implement computeKnownBits for G_ASSERT_ZEXT
It's the same as the ZEXT/TRUNC case, except SrcBitWidth is given by the
immediate operand.

Update KnownBitsTest.cpp and a MIR test for a concrete example.

Differential Revision: https://reviews.llvm.org/D95566
2021-01-28 16:34:34 -08:00
Jessica Paquette daffab1985 Recommit "[GlobalISel] Walk through hints in getDefIgnoringCopies et al"
Recommit of 4580acf675

`Opc = DefMI->getOpcode()` was in the wrong place.
2021-01-28 14:43:00 -08:00
Jessica Paquette dcb5b5f1f2 Revert "[GlobalISel] Walk through hints in getDefIgnoringCopies et al"
This reverts commit 4580acf675.

Reverting while looking into some test failures.
2021-01-28 14:37:57 -08:00
Jessica Paquette 4580acf675 [GlobalISel] Walk through hints in getDefIgnoringCopies et al
Treat hint instructions like G_ASSERT_ZEXT like COPY instructions in helpers
which walk through copies.

This ensures that instructions like G_ASSERT_ZEXT won't impact any optimizations
that rely on these helpers.

Differential Revision: https://reviews.llvm.org/D95577
2021-01-28 14:27:00 -08:00
Cassie Jones f22f4557a7 [GlobalISel] Implement widenScalar for carry-in add/sub
These are widened to a wider UADDE/USUBE, with the overflow value
unused, and with the same synthesis of a new overflow value as for the
O operations.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D95326
2021-01-28 17:06:24 -05:00
Jessica Paquette 24261729a4 [GlobalISel] Add G_ASSERT_ZEXT
This adds a generic opcode which communicates that a type has already been
zero-extended from a narrower type.

This is intended to be similar to AssertZext in SelectionDAG.

For example,

```
%x_was_extended:_(s64) = G_ASSERT_ZEXT %x, 16
```

Signifies that the top 48 bits of %x are known to be 0.

This is useful in cases like this:

```
define i1 @zeroext_param(i8 zeroext %x) {
  %cmp = icmp ult i8 %x, -20
  ret i1 %cmp
}
```

In AArch64, `%x` must use a 32-bit register, which is then truncated to a 8-bit
value.

If we know that `%x` is already zero-ed out in the relevant high bits, we can
avoid the truncate.

Currently, in GISel, this looks like this:

```
_zeroext_param:
  and w8, w0, #0xff ; We don't actually need this!
  cmp w8, #236
  cset w0, lo
  ret
```

While SDAG does not produce the truncation, since it knows that it's
unnecessary:

```
_zeroext_param:
  cmp w0, #236
  cset w0, lo
  ret
```

This patch

- Adds G_ASSERT_ZEXT
- Adds MIRBuilder support for it
- Adds MachineVerifier support for it
- Documents it

It also puts G_ASSERT_ZEXT into its own class of "hint instruction." (There
should be a G_ASSERT_SEXT in the future, maybe a G_ASSERT_ALIGN as well.)

This allows us to skip over hints in the legalizer etc. These can then later
be selected like COPY instructions or removed.

Differential Revision: https://reviews.llvm.org/D95564
2021-01-28 13:58:37 -08:00
Jessica Paquette f36007e811 [GlobalISel] Implement computeKnownBits for G_SEXT_INREG
Just use the existing `Known.sextInReg` implementation.

- Update KnownBitsTest.cpp.
- Update combine-redundant-and.mir for a more concrete example.

Differential Revision: https://reviews.llvm.org/D95484
2021-01-26 15:01:38 -08:00
Amara Emerson cbed865e1e [GlobalISel][IRTranslator] Ignore the llvm.experimental.noalias.scope.decl intrinsic.
These don't generate any code.
2021-01-26 13:04:11 -08:00
Amara Emerson 03bce0bf4e [GlobalISel][Localizer] Don't localize phi operands which are used more than once in the phi.
The current algorithm just tries to localize defs as far as they can go, and in
the case of G_PHI operands, it clones the def into the predecessor block for
each incoming edge. When multiple edges have the same register value, this can
cause unnecessary code bloat, and inhibit later optimizations.

This change checks if a given phi operand is unique in the phi, if not the
def of that register is not localized to the predecessor.

Differential Revision: https://reviews.llvm.org/D95406
2021-01-25 17:48:04 -08:00
Mitch Phillips c9466ede7e Revert "Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method""
This reverts commit 554b3211fe.

Differential Revision: https://reviews.llvm.org/D95035
2021-01-25 16:22:22 -08:00
Cassie Jones aa8f3677f7 Recommit "[AArch64][GlobalISel] Implement widenScalar for signed overflow"
Implement widening for G_SADDO and G_SSUBO.
Add legalize-add/sub tests for narrow overflowing add/sub on AArch64.

Differential Revision: https://reviews.llvm.org/D95034
2021-01-25 16:57:20 -05:00
Mitch Phillips e3a7532cc9 Revert "[AArch64][GlobalISel] Implement widenScalar for signed overflow"
This reverts commit 541d98efa2.

Reason: Dependent patch 3dedad475d broke
UBSan on Android: http://lab.llvm.org:8011/#/builders/77/builds/3082
2021-01-22 14:32:11 -08:00
Mitch Phillips 554b3211fe Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method"
This reverts commit 2bb92bf451.

Dependent patch broke UBSan on Android:
3dedad475d
2021-01-22 14:32:11 -08:00
Cassie Jones 2bb92bf451 [GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method
The widenScalar implementation for signed and unsigned overflowing
operations were very similar: both are checked by truncating the result
and then re-sign/zero-extending it and checking that it matches the
computed operation.

Using a truncate + zero-extend for the unsigned case instead of manually
producing the AND instruction like before leads to an extra copy
instruction during legalization, but this should be harmless.

Differential Revision: https://reviews.llvm.org/D95035
2021-01-22 14:08:46 -08:00
Cassie Jones 541d98efa2 [AArch64][GlobalISel] Implement widenScalar for signed overflow
Implement widening for G_SADDO and G_SSUBO. Previously it was only
implemented for G_UADDO and G_USUBO. Also add legalize-add/sub tests for
narrow overflowing add/sub on AArch64.

Differential Revision: https://reviews.llvm.org/D95034
2021-01-21 22:55:42 -08:00
Kazu Hirata c5c4dbd279 [CodeGen] Use llvm::append_range (NFC) 2021-01-21 19:59:46 -08:00
Matt Arsenault 35c535a7df AArch64/GlobalISel: Factor out parametersInCSRMatch
Make this look more like the DAG handling and move to common code.

I also noticed AArch64 seems to not be properly adding the
physreg:virtreg mapping to the function live ins.
2021-01-21 10:32:48 -05:00
Mirko Brkusanin a6a72dfdf2 [AMDGPU][GlobalISel] Avoid selecting S_PACK with constants
If constants are hidden behind G_ANYEXT we can treat them same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with option
to handle G_ANYEXT same way as G_SEXT.

Differential Revision: https://reviews.llvm.org/D92219
2021-01-20 11:54:53 +01:00
Gabriel Hjort Åkerlund 2aeaaf841b [GlobalISel] Add missing operand update when copy is required
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D91244
2021-01-20 10:32:52 +01:00
Jessica Paquette cbf5246359 Fix buildbot after cfc6073017
Windows buildbots were not happy with using find_if + instructionsWithoutDebug.

In cfc6073017, instructionsWithoutDebug is not technically necessary. So,
just iterate over the block directly.

http://lab.llvm.org:8011/#/builders/127/builds/4732/steps/7/logs/stdio
2021-01-19 10:38:04 -08:00
Jessica Paquette cfc6073017 [GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| (a[m] << kn) into a wide load
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)

This tries to recognize patterns like below (assuming a little-endian target):

```
s8* x = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((i32)a)

s8* x = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```

(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)

To recognize the pattern, this searches from the last G_OR in the expression
tree.

E.g.

```
    Reg   Reg
     \    /
      OR_1   Reg
       \    /
        OR_2
          \     Reg
           .. /
          Root
```

Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.

If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)

To simplify things, this patch

(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
    specific location

An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)

At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.

Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.

Differential Revision: https://reviews.llvm.org/D94350
2021-01-19 10:24:27 -08:00
Jay Foad 517196e569 [Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC.
Differential Revision: https://reviews.llvm.org/D94588
2021-01-14 14:02:43 +00:00
Matt Arsenault d55d592a92 GlobalISel: Do not set observer of MachineIRBuilder in LegalizerHelper
This fixes double printing of insertion debug messages in the
legalizer.

Try to cleanup usage of observers. Currently the use of observers is
pretty hard to follow and it's not clear what is responsible for
them. Observers are referenced in 3 places:

1. In the MachineFunction
2. In the MachineIRBuilder
3. In the LegalizerHelper

The observers in the MachineFunction and MachineIRBuilder are both
called only on insertions, and are redundant with each other. The
source of the double printing was the same observer was added to both
the MachineFunction, and the MachineIRBuilder. One of these references
needs to be removed. Arguably observers in general should be fully
removed from one or the other, but it may be useful to have a local
observer in the MachineIRBuilder that is not added to the function's
observers. Alternatively, the wrapper observer could manage a local
observer in one place.

The LegalizerHelper only ever calls the observer on changing/changed
instructions, and never insertions. Logically these are two different
types of observers, for changes and for insertions.

Additionally, some places used the GISelObserverWrapper when they only
needed a single observer they could use directly.

Setting the observer in the LegalizerHelper constructor is not
flexible enough if the LegalizerHelper is constructed anywhere outside
the one used by the legalizer. AMDGPU calls the LegalizerHelper in
RegBankSelect, and needs to use a local observer to apply the regbank
to newly created instructions. Currently it accomplishes this by
constructing a local MachineIRBuilder. I'm trying to move the
MachineIRBuilder to be owned/maintained by the RegBankSelect pass
itself, but the locally constructed LegalizerHelper would reset the
observer.

Mips also has a special case use of the LegalizationArtifactCombiner
in applyMappingImpl; I think we do need to run the artifact combiner
during RegBankSelect, but in a more consistent way outside of
applyMappingImpl.
2021-01-13 10:44:31 -05:00
Kazu Hirata 12fc9ca3a4 [llvm] Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-01-12 21:43:46 -08:00
Kazu Hirata e3d3dbd339 [llvm] Ensure newlines at the end of files (NFC)
This patch eliminates pesky "No newline at end of file" messages from
git diff.
2021-01-10 09:24:57 -08:00
Christudasan Devadasan ae25a397e9 AMDGPU/GlobalISel: Enable sret demotion 2021-01-08 10:56:35 +05:30
Matt Arsenault 2cbbc6e87c GlobalISel: Fail legalization on narrowing extload below memory size 2021-01-07 17:40:34 -05:00
Matt Arsenault 1f9b6ef91f GlobalISel: Add combine for G_UREM by power of 2
Really I want this in the legalizer, but this is a start.
2021-01-07 16:36:35 -05:00
Kazu Hirata cfeecdf7b6 [llvm] Use llvm::all_of (NFC) 2021-01-06 18:27:36 -08:00
Christudasan Devadasan d68458bd56 [GlobalISel] Base implementation for sret demotion.
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.

Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92953
2021-01-06 10:30:50 +05:30
Matt Arsenault a427f15d60 GlobalISel: Add isKnownToBeAPowerOfTwo helper function 2021-01-05 12:59:08 -05:00
Juneyoung Lee 5cdf6ed744 [CodeGen] recognize select form of and/ors when splitting branch conditions
Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a || b" in C/C++.
"a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a.
This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353).
In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison.
The discussion at D93065 has more context about this.

This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize
select form of and/or as well using m_LogicalAnd/Or.
Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93853
2021-01-01 04:46:10 +09:00
Amara Emerson 7df3544e80 [GlobalISel] Fix assertion failures after "GlobalISel: Return APInt from getConstantVRegVal" landed.
APInt binary ops don't promote types but instead assert, which a combine was
relying on.
2020-12-26 23:51:44 -08:00
Kazu Hirata df812115e3 [CodeGen, Transforms] Use llvm::any_of (NFC) 2020-12-24 09:08:36 -08:00
Matt Arsenault 581d13f8ae GlobalISel: Return APInt from getConstantVRegVal
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.

Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).

One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
2020-12-22 22:23:58 -05:00
Matt Arsenault e7e7d371fd GlobalISel: Fix generic handling of single outgoing call arguments
Simply call the argument handler like is done for the incoming
case. This will allow removal of hacks in the AMDGPU call lowering in
a future change.
2020-12-15 17:00:27 -05:00
Amara Emerson a69b76c500 [GlobalISel][IRTranslator] Ensure branch probabilities are added when translating invoke edges.
This uses a straightforward port of findUnwindDestinations() from SelectionDAG.

Differential Revision: https://reviews.llvm.org/D93256
2020-12-14 23:36:54 -08:00
Amara Emerson 21de99d43c [[GlobalISel][IRTranslator] Fix a crash when the use of an extractvalue is a non-dominated metadata use.
We don't expect uses to come before defs in the CFG, so allocateVRegs() asserted.

Fixes PR48211
2020-12-12 14:58:54 -08:00
Fangrui Song b5ad32ef5c Migrate deprecated DebugLoc::get to DILocation::get
This migrates all LLVM (except Kaleidoscope and
CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get.

The CodeGen/StackProtector.cpp usage may have a nullptr Scope
and can trigger an assertion failure, so I don't migrate it.

Reviewed By: #debug-info, dblaikie

Differential Revision: https://reviews.llvm.org/D93087
2020-12-11 12:45:22 -08:00
Fangrui Song d928dfc6f9 [GlobalISel] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds 2020-11-30 18:31:42 -08:00
Fangrui Song 36fe1a9dea [GlobalISel] Fix -Wunused-variable 2020-11-30 18:25:54 -08:00
Amara Emerson 87ff156414 [AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT with scalar mask.
The lowering of vector selects needs to first splat the scalar mask into a vector
first.

This was causing a crash when building oggenc in the test suite.

Differential Revision: https://reviews.llvm.org/D91655
2020-11-30 16:37:49 -08:00
Mirko Brkusanin 4cf6dd518e [AMDGPU][GlobalISel] Fix lowerShlSat
RegBankSelect would crash on G_SELECT when type is not s1.

Differential Revision: https://reviews.llvm.org/D91437
2020-11-16 17:43:31 +01:00
Jessica Paquette b184a2eccf [GlobalISel] Add matchers for specific constants and a matcher for negations
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.

Add

- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.

Also update a few places which use idioms related to the new matchers.

Differential Revision: https://reviews.llvm.org/D91397
2020-11-13 09:24:54 -08:00
Matt Arsenault c67e1a985f GlobalISel: Directly expose getDefSrcRegIgnoringCopies utility
It's useful to get both the instruction and register at the same time.
2020-11-13 11:07:04 -05:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
2020-11-11 12:15:54 +00:00