DAGCombiner does this, but divisions expanded here miss this
optimization. Since 67aa18f165,
divisions have been expanded here and missed out on this
optimization. Avoids test regressions in a future patch.
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to
analyze the address of each load/store instruction, and again to decide
whether two instructions should be clustered. Previously this had to
represent each address as a single base operand plus a constant byte
offset. This patch extends it to support any number of base operands.
The old target hook getMemOperandWithOffset is now a convenience
function for callers that are only prepared to handle a single base
operand. It calls the new more general target hook
getMemOperandsWithOffset.
The only requirements for the base operands returned by
getMemOperandsWithOffset are:
- they can be sorted by MemOpInfo::Compare, such that clusterable ops
get sorted next to each other, and
- shouldClusterMemOps knows what they mean.
One simple follow-on is to enable clustering of AMDGPU FLAT instructions
with both vaddr and saddr (base register + offset register). I've left
a FIXME in the code for this case.
Differential Revision: https://reviews.llvm.org/D71655
This using the wrong result register, and dropping the result entirely
for v2f16. This would fail to select on the scalar case. I believe it
was also mishandling packed/unpacked subtargets.
Summary:
Check that a s_cbranch_execz is not a loop exit before removing it.
As the pass is generating infinite loops.
Reviewers: cdevadas, arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits, dstuttard, foad
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72997
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a sideeffect and the skip branch is
really necessary.
This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092
In GlobalISel we may in some unfortunate circumstances generate PHIs with
operands that are on separate banks. If-conversion doesn't currently check for
that case and ends up generating a CSEL on AArch64 with incorrect register
operands.
Differential Revision: https://reviews.llvm.org/D72961
Pointers of unrecognized address spaces shoudl be treated as
global-like pointers. Even if loads and stores of them aren't handled,
dumb operations that just operate on the bits should work.
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.
This change is NFC.
There's no reason to introduce a new, unnaturally sized value
here. This has a chance to produce worse code with
legalization. Avoids regression in a future patch.
a785209bc2 switched to using a pseudos instead of manually tying
operands on the regular instruction. The VGPR indexing mode path
should have the same problems that change attempted to avoid, so these
should use the same strategy.
Use a single pseudo for the VGPR indexing mode and movreld paths, and
expand it based on the subtarget later. These have essentially the
same constraints, reading the index from m0.
Switch from using an offset to the subregister index directly, instead
of computing an offset and re-adding it back. Also add missing pseudos
for existing register class sizes.
Document some high level strategies that should be used for register
bank selection. The constant bus restriction section hasn't actually
been implemented yet.
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
Currently the lowering for i16 image coordinates asserts on gfx10. I'm
somewhat confused by this though. The feature is missing from the
gfx10 feature lists, but the a16 bit appears to be present in the
manual for MIMG instructions.
This case can be handled as a regular selection pattern, so move it
out of the weird post-isel folding code which doesn't have an exactly
equivalent place in GlobalISel.
I think it doesn't make much sense to do this optimization here
though, and it would be more useful in instcombine. There's not really
any new information that will be gained during lowering since these
inputs were known from the beginning.
This does produce slightly different code. Now a unique IMPLICIT_DEF
is emitted for each of the implicit_def operands, rather than reusing
the same one.
I'm mildly worried about potentially reordering exp/exp_done with
IntrWriteMem on the intrinsic.
Requires hacking out the illegal type on SI, so manually select that
case during lowering.
Summary:
InlineResult is used both in APIs assessing whether a call site is
inlinable (e.g. llvm::isInlineViable) as well as in the function
inlining utility (llvm::InlineFunction). It means slightly different
things (can/should inlining happen, vs did it happen), and the
implicit casting may introduce ambiguity (casting from 'false' in
InlineFunction will default a message about hight costs,
which is incorrect here).
The change renames the type to a more generic name, and disables
implicit constructors.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72744
Bitcast only really applies between scalars and vectors. Implement as
an unmerge and remerge. The test needs to tolerate failure since one
of the unmerges currently fails to legalize.
The 16 bank LDS case is complicated due to using multiple
instructions. If I attempt to write a pattern for it, the generated
selector incorrectly places the copy to m0 after the first
instruction, so that needs to be separately addressed.
Also fix not gluing the copy to m0 to the second operation in the
second half of the 16 bank lowering.
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a sideeffect and the skip branch is
really necessary.
This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.
Differential revision: https://reviews.llvm.org/D68092
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
Summary:
- `dead-mi-elimination` assumes MIR in the SSA form and cannot be
arranged after phi elimination or DeSSA. It's enhanced to handle the
dead register definition by skipping use check on it. Once a register
def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange the DIE in RA phase for AMDGPU by placing it directly after
`detect-dead-lanes`.
- Many relevant tests are refined due to different register assignment.
Reviewers: rampitec, qcolombet, sunfish
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72709
This change allows to model the height of the instruction
within a bundle for latency adjustment purposes.
Differential Revision: https://reviews.llvm.org/D72669
We do not have InstrItinerary so generic getInstLatency() was always
defaulting to return 1 cycle. We need to use TargetSchedModel instead
to compute an instruction's latency.
Differential Revision: https://reviews.llvm.org/D72655
The current users of the waterfall loop utility functions do not make
use of the restored original insert point. The insertion is either
done, or they set the insert point somewhere else. A future change
will want to insert instructions after the waterfall loop, but
figuring out the point after the loop is more difficult than ensuring
the insert point is there after the loop.
The branch target needs to be changed depending on whether there is an
unconditional branch or not.
Loops also need to be similarly fixed, but compiling a simple testcase
end to end requires another set of patches that aren't upstream yet.
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
Summary:
- SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions.
While this is necessary for the special handling of moving results out of WWM live ranges,
it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every
WQM operation. This change uses a COPY psuedo in these cases, which allows the register
allocator to coalesce the moves away.
Reviewers: tpr, dstuttard, foad, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71386
This reverts the AMDGPU DAG mutation implemented in D72487 and gives
a more general way of adjusting BUNDLE operand latency.
It also replaces FixBundleLatencyMutation with adjustSchedDependency
callback in the AMDGPU, fixing not only successor latencies but
predecessors' as well.
Differential Revision: https://reviews.llvm.org/D72535
If an SGPR vector is indexed with a VGPR, the actual indexing will be
done on the SGPR and produce an SGPR. A copy needs to be inserted
inside the waterwall loop to the VGPR result.
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.
Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.
Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.
Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.
Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.
One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
When these arguments are broken down by the EVT based callbacks, the
pointer information is lost. Hack around this by coercing the register
types to be the expected pointer element type when building the
remerge operations.
This should be legal, but will require future selection work. 16-bit
shift amounts were already removed from being legal, but this didn't
adjust the transformation rules.
This doesn't enable any new imports yet, but moves the fmed patterns
from failing on this to hitting the "complex suboperand referenced
more than once" limitation in tablegen.
`APFLoat::convertFromString` returns `Expected` result, which must be
"checked" if the LLVM_ENABLE_ABI_BREAKING_CHECKS preprocessor flag is
set.
To mark an `Expected` result as "checked" we must consume the `Error`
within.
In many cases, we are only interested in knowing if an error occured,
without the need to examine the error info. This is achieved, easily,
with the `errorToBool()` API.
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.
It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.
Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.
The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.
In any case, downstream projects which don't know `Address` can pass 0 as
the argument.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72172
We have a lot of complex pattern variants that just set the source
modifiers that are really handled, and then set the output modifiers
to 0. We're unlikely to ever match output modifiers from the use
instruction side, and we already match clamp/omod in a separate pass.
This solves selection failures with generated selection patterns,
which would fail due to inferring the SGPR reg bank for virtual
registers with a set register class instead of VCC bank. Use
instruction selection would constrain the virtual register to a
specific class, so when the def was selected later the bank no longer
was set to VCC.
Remove the SCC reg bank. SCC isn't directly addressable, so it
requires copying from SCC to an allocatable 32-bit register during
selection, so these might as well be treated as 32-bit SGPR values.
Now any scalar boolean value that will produce an outupt in SCC should
be widened during RegBankSelect to s32. Any s1 value should be a
vector boolean during selection. This makes the vcc register bank
unambiguous with a normal SGPR during selection.
Summary of how this should now work:
- G_TRUNC is always a no-op, and never should use a vcc bank result.
- SALU boolean operations should be promoted to s32 in RegBankSelect
apply mapping
- An s1 value means vcc bank at selection. The exception is for
legalization artifacts that use s1, which are never VCC. All other
contexts should infer the VCC register classes for s1 typed
registers. The LLT for the register is now needed to infer the
correct register class. Extensions with vcc sources should be
legalized to a select of constants during RegBankSelect.
- Copy from non-vcc to vcc ensures high bits of the input value are
cleared during selection.
- SALU boolean inputs should ensure the inputs are 0/1. This includes
select, conditional branches, and carry-ins.
There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT
selection ignores the usual register-bank from register class
functions, and can't handle truncates with VCC result banks. I think
this is OK, since the artifacts are specially treated anyway. This
does require some care to avoid producing cases with vcc. There will
also be no 100% reliable way to verify this rule is followed in
selection in case of register classes, and violations manifests
themselves as invalid copy instructions much later.
Standard phi handling also only considers the bank of the result
register, and doesn't insert copies to make the source banks
match. This doesn't work for vcc, so we have to manually correct phi
inputs in this case. We should add a verifier check to make sure there
are no phis with mixed vcc and non-vcc register bank inputs.
There's also some duplication with the LegalizerHelper, and some code
which should live in the helper. I don't see a good way to share
special knowledge about what types to use for intermediate operations
depending on the bank for example. Using the helper to replace
extensions with selects also seems somewhat awkward to me.
Another issue is there are some contexts calling
getRegBankFromRegClass that apparently don't have the LLT type for the
register, but I haven't yet run into a real issue from this.
This also introduces new unnecessary instructions in most cases, since
we don't yet try to optimize out the zext when the source is known to
come from a compare.
This would complain about invalid legalizer rules otherwise.
Mark some operations as unsupported for AMDGPU. This currently seems
to produce the same legalize error as when no rules are defined, but
eventually this should produce a proper user facing error.
The existing test only covered one case for r600. The use of
mul_legacy also looks suspicious to me, but leave it for now. The
patterns are also not making use of source modifiers.
This assumed a 32-bit extract size, which would produce invalid copies
with 64-bit extracts. Handle the easy case. Ideally we would have a
way to get the proper subreg index for any 32-bit offset, but there
should probably be a tablegenerated way of getting the subreg index
for any size and offset.
This produces more intelligible looking results, more comparabble to
the DAG output in the simplest cases. This is probably wrong in
complex control flow, but RegBankSelect doesn't attempt analyzing if
this is on a masked path for selecting the bank yet.
We're checking the current register bank of the registers in the
instruction, but the mapping may have inserted cross bank copies and
is expecting to replace the registers.
We mostly get away with this currently, because VGPR->SGPR copies are
illegal, and we assume this won't happen. In a future change, we'll
start relying on more cross register bank copies being inserted, and
this starts to break down.
We can revert region schedule if new schedule decreases occupancy.
However, if we already have only one wave we would accept any new
schedule even if it blows up register pressure. Such schedule may
result in quite heavy spilling which can be avoided if we reject
this new schedule.
Differential Revision: https://reviews.llvm.org/D72181
AMDGPU can't unambiguously go back from the selected instruction
register class to the register bank without knowing if this was used
in a boolean context.
Summary:
- CF users won't be non-instruction values. Skip them to save the
compilation time. It's especially true when there are multiple
functions in that module, where, says, a constant may be used in most
functions. The current CF user tracing adds significant overhead.
Reviewers: alex-t, rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72174
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
This allows us to clean up some places that were peeking through
the MERGE_VALUES node after the call. By returning the SDValues
directly, we can clean that up.
Unfortunately, there are several call sites in AMDGPU that wanted
the MERGE_VALUES and now need to create their own.
Sometimes the result bank of the phi is already assigned to something,
and should not be ignored. This is in preparation for additional
boolean phi handling changes.
Also refine the logic to fix some cases that were incorrectly deciding
to use SGPRs.
There ended up being two result registers, which would fail on
select. It was really defing a new temp register in the correct def
position, instead of the correct result register.
Summary:
The only useful information the UndefValue conveys is the address space,
which MachinePointerInfo can represent directly without referring to an
IR value.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71838
Confusingly, the intrinsic operands do not match the
instruction/custom node. The order is shuffled, and the 3rd operand is
an immediate to select operands.
I'm not 100% sure I did this right, but fdiv still doesn't select end
to end and it will be easier to tell when it does. This at least
avoids an assertion in RegBankSelect and allows hitting the fallback
on selection.
Summary:
Modify CombineInfo to only store information about a single instruction.
This is a little easier to work with and removes a lot of duplicate
initialization code.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm, nhaehnle
Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71045
Summary:
The typo has been present since memOpsHaveSameBasePtr was introduced in
r313208.
It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than
it was supposed to.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71616
This fixes an assertion failure that triggers inside
getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr
that is not a memory operation.
Different backends implement getMemOperandWithOffset differently: some
return false on non-memory MachineInstrs, others assert.
The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck
relies on getMemOperandWithOffset to return false on non-memory
MachineInstrs, instead of asserting.
This patch updates the documentation on getMemOperandWithOffset that it
should return false on any MachineInstr it cannot handle, instead of
asserting. It also adapts the in-tree backends accordingly where
necessary.
Differential Revision: https://reviews.llvm.org/D71359
Summary:
This is a resubmit of D71473.
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: aaron.ballman, courbet
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71547
Summary:
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71473
Summary:
At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".
In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.
Reviewers: foad, arsenm
Reviewed By: arsenm
Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Patch by Eugene Kuznetsov.
Differential Revision: https://reviews.llvm.org/D70367
This reverts commit 69fcfb7d35.
As shown in the test I attached to this commit, the change I reverted
causes a problem with "zext(cc1) - zext(cc2)". It commuted
the operands to the sub and used different logic to select the addc/subc
instruction:
sub zext (setcc), x => addcarry 0, x, setcc
sub sext (setcc), x => subcarry 0, x, setcc
... but that is bogus. I believe it is not possible to fold those commuted
patterns into any form of addcarry or subcarry. It may have worked as
intended before "AMDGPU: Change boolean content type to 0 or 1" because
the setcc was considered to be -1 rather than 1.
Differential Revision: https://reviews.llvm.org/D70978
Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).
In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
Committing this change will significantly reduce our merge conflicts
for each upstream merge.
Reviewers: spatel, bogner
Reviewed By: bogner
Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70917
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
Summary:
Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2
constant operands into the same SALU operation, with neither operand able to be
encoded as an inline constant.
Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70896
Summary:
Pre gfx9 we need to scavenge a 64-bit SGPR to use as the carry out for an Add.
If only one SGPR was available this crashed when trying to scavenge another
32bit SGPR to materialize the offset.
Instead, reuse a 32-bit SGPR from the carry out as the offset register.
Also prefer to use vcc for the unused carry out when it is available.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70614
Summary:
The waitcnt pass can overflow the counters when the number of outstanding events
for a type exceed the capacity of the counter. This can lead to inefficient
insertion of waitcnts, or to waitcnt instructions with max values for each type.
The last situation can cause an instruction which when disassembled appears to
be an illegal waitcnt without an operand.
In these cases we should add a wait for the 'counter maximum' - 1, and update the
waitcnt brackets accordingly.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70418
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so. Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:
1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so. This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.
With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.
2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set. This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.
I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:
- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON
Reviewers: beanz, smeenai, compnerd, phosek
Reviewed By: beanz
Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70179
Summary:
Add a function attribute to allow the target specific default loop unroll threshold
to be specified on a per-function basis. This allows a front-end to give guidance
where it has insight that is not available to the back-end, while still allowing the
target specific heuristics to also have an effect.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68873
These opcodes use indirect register addressing so they need special handling by codegen (currently missing).
Reviewers: vpykhtin, arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D70400
Added a check to make sure that the selected dpp opcode is supported by target.
Reviewers: vpykhtin, arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D70402
Hostcall is a service that allows a kernel to submit requests to the
host using shared buffers, and block until a response is
received. This will eventually replace the shared buffer currently
used for printf, and repurposes the same hidden kernel argument. This
change introduces a new ValueKind in the HSA metadata to represent the
hostcall buffer.
Differential Revision: https://reviews.llvm.org/D70038
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.
This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
Summary:
Most of IR instructions got better code size estimations after commit 47a5c36b.
So default parameters values should be updated to improve inlining and
unrolling for the target.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70391
Summary:
We should check for same instruction class before checking whether they
have the same base address, else we might iterate out of bounds of a
MachineInstr operands list. The InstClass check is also cheaper.
This was introduced in SVN r373630.
Reviewers: tstellar
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68690
The usage of target boolean checks is overly inflexible, since sext
and zext of a compare are equally cheap. The choice is arbitrary, but
using 0/1 to some degree is the choice of lower resistance since
that's what most targets use. This enables a few combines that don't
bother to support ZeroOrNegativeOneBooleanContent.
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
Previously this would default to 256, not the maximum supported size
of 1024. Using a maximum lower than the hardware maximum requires
language runtimes to enforce this limit for correctness, which no
language has correctly done. Switch the default to the conservatively
correct maximum, and force frontends to opt-in to the more optimal 256
default maximum.
I don't really understand why the changes in occupancy-levels.ll
increased the computed occupancy, which I expected to decrease. I'm
not sure if these tests should be forcing the old maximum.
This was added to inhibit a warning from gcc 7.3 according to
the comment. However, it triggers warning from PVS. In addition
I cannot reproduce it with gcc 7.4 and I also cannot reproduce
it with gcc 7.3 using compiler explorer.
Differential Revision: https://reviews.llvm.org/D69863
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD
Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69734
addOperand() method of AMDGPU disassembler returns SoftFail
on error. All instances which may lead to that place are
an impossible encdoing, not something which is possible to
encode, but semantically incorrect as described for SoftFail.
Then tablegen generates a check of the following form:
if (Decode...(..) == MCDisassembler::Fail) { return MCDisassembler::Fail; }
Since we can only return Success and SoftFail that is dead
code as detected by the static code analyzer.
Solution: return Fail as it should be.
See https://bugs.llvm.org/show_bug.cgi?id=43886
Differential Revision: https://reviews.llvm.org/D69819
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.
We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.
This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886
Differential Revision: https://reviews.llvm.org/D69815
The default FP mode should really be a property of a specific
function, and not a subtarget. Introduce the necessary fields to the
SIMachineFunctionInfo to help move towards this goal.
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
readlane and writelane instructions are not allowed to use m0 as the
data operand, so spilling them is tricky and would require an
intermediate SGPR to spill it. Constrain the virtual register class in
this caes to disallow the inline spiller from folding the m0 operand
directly into the spill instruction.
I copied this hack from AArch64 which has the same problem for $sp.
Summary:
VCCZBugHandledSet was used to make sure we don't apply the same
workaround more than once to a single cbranch instruction, but it's not
necessary because the workaround involves inserting an s_waitcnt
instruction, which is enough for subsequent iterations to detect that no
further workaround is necessary.
Also beef up the test case to check that the workaround was only applied
once. I have also manually verified that the test still passes even if I
hack the big do-while loop in runOnMachineFunction to run a minimum of
five iterations.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69621
When TableGen is inferring register classes from contexts, it uses a
sorting function based on the number of registers in the class. Since
this was being treated as an alias of VGPR_32, they had exactly the
same size. The sort used wasn't a stable sort, and even if it were, I
believe the tie breaker would effectively end up being the
alphabetical ordering of the class name. There appear to be issues
trying to use an empty set of registers, so add only one so this will
always sort to the end.
Also add a comment explaining how VReg_1 is a dirty hack for
SelectionDAG.
This does end up changing the behavior of i1 with inline asm and VGPR
constraints, but the existing behavior was was already nonsensical and
inconsistent. It should probably be disallowed anyway.
Fixes bug 43699
Summary:
An outstanding load with same destination sgpr as call could cause PC to be
updated with junk value on return.
Reviewers: arsenm, rampitec
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69474
There is a minor flaw in the implementation of function lowerPhis.
This function replaces values of regclass Vreg_1 (boolean values)
involved in PHIs into an SGPR. Currently it iterates over the MBBs
and performs an inplace lowering of PHIs and fails to lower any
incoming value that itself is another PHI of Vreg_1 regclass.
The failure occurs only when the MBB where the incoming PHI value
belongs is not visited/lowered yet.
To fix this problem, collect all Vreg_1 PHIs upfront and then
perform the lowering.
Differential Revision: https://reviews.llvm.org/D69182
That used to fail in the last testcase function because after
%0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it
was further folded into S_STORE_DWORD_IMM. Its legal effective
subreg class is SReg_32 while instruction expects more restricted
SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand()
passed the legality check and it was caught in the verifier.
Borrowed code from the verifier to check for RC legality.
Differential Revision: https://reviews.llvm.org/D69445
Both tryFoldOMod() and tryFoldClamp() remove original instruction,
so the check MI.modifiesRegister() may use a deleted MI.
Differential Revision: https://reviews.llvm.org/D69448
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
Summary:
In loadSRsrcFromVGPR, if MBB is the same as Succ, Remiander is not the immediate dominator of Succ.
Reviewer:
arsenm
Differential Revision:
https://reviews.llvm.org/D69358
An SUnit can be neither intruction not SDNode. It is all
null if represents a nop. Fixed a crash on using SU->getInstr().
Differential Revision: https://reviews.llvm.org/D69395
Potentially sgpr to sgpr copy should also be possible.
That is however trickier because we may end up with a
wrong register class at use because of xm0/xexec permutations.
Differential Revision: https://reviews.llvm.org/D69280
MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64
regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out Mips ABI and pick appropriate prefix.
Tags: #llvm, #clang, #lldb
Differential Revision: https://reviews.llvm.org/D66795
Only handle simple inter-block redefs of m0 to the same value. This
avoids interference from redefs of m0 in SILoadStoreOptimzer. I was
initially teaching that pass to ignore redefs of m0, but having them
not exist beforehand is much simpler.
This is in preparation for deleting the current special m0 handling in
SIFixSGPRCopies to allow the register coalescer to handle the
difficult cases.
llvm-svn: 375449
r375293 removed the SGPR spilling with scalar stores path, so this is
no longer necessary. This also always had the defect of adding the def
even when this path wasn't in use.
llvm-svn: 375448
If a PHI defines AGPR legalize its operands to AGPR.
At the moment we can get an AGPR PHI with VGPR operands.
I am not aware of any problems as it seems to be handled
gracefully in RA, but this is not right anyway.
It also slightly decreases VGPR pressure in some cases
because we do not have to a copy via VGPR.
Differential Revision: https://reviews.llvm.org/D69206
llvm-svn: 375446
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69216
llvm-svn: 375398
We handle it this way for some other address spaces.
Since r349196, SILoadStoreOptimizer has been trying to do this. This
is after SIFoldOperands runs, which can change the addressing
patterns. It's simpler to just split this earlier.
llvm-svn: 375366
MachineInstr.h included AliasAnalysis.h, which includes a world of IR
constructs mostly unneeded in CodeGen. Prune it. Same for
DebugInfoMetadata.h.
Noticed with -ftime-trace.
llvm-svn: 375311
If all uses of a PHI are in AGPR register class we should
avoid unneeded copies via VGPRs.
Differential Revision: https://reviews.llvm.org/D69200
llvm-svn: 375297
Summary: The implementation was never completed and never used except in tests.
Reviewers: arsenm, mareko
Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69163
llvm-svn: 375293
The default implementation of isIncomingArgumentHandler could lead
to generating incorrect code.
Make it a pure virtual method, so that targets know they have to
override it to produce correct code.
NFC
Differential Revision: https://reviews.llvm.org/D69187
llvm-svn: 375277
Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This
will allow the register coalescer to do a better job eliminating
copies to m0.
For GlobalISel, as a terrible hack, use SGPR_32 for things that should
use SCC until booleans are solved.
llvm-svn: 375267
We already have isFloatType helper, and they are out of sync.
Drop one and merge the type list.
Differential Revision: https://reviews.llvm.org/D69138
llvm-svn: 375175
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68993
llvm-svn: 375084
Do not generate non-existing sdwa instructions. It reduces the
number of generated instructions by 185.
Differential Revision: https://reviews.llvm.org/D69010
llvm-svn: 375016
Summary:
Even though writelane doesn't have the same constraints as other valu
instructions it still can't violate the >1 SGPR operand constraint
Due to later register propagation (e.g. fixing up vgpr operands via
readfirstlane) changing writelane to only have a single SGPR is tricky.
This implementation puts a new check after SIFixSGPRCopies that prevents
multiple SGPRs being used in any writelane instructions.
The algorithm used is to check for trivial copy prop of suitable constants into
one of the SGPR operands and perform that if possible. If this isn't possible
put an explicit copy of Src1 SGPR into M0 and use that instead (this is
allowable for writelane as the constraint is for SGPR read-port and not
constant-bus access).
Reviewers: rampitec, tpr, arsenm, nhaehnle
Reviewed By: rampitec, arsenm, nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D51932
Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea
llvm-svn: 375004
Summary:
Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle
different flavours of image_load and image_sample instructions.
When the instructions of the same subclass differ only in dmask, merge
them and update dmask accordingly.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64911
llvm-svn: 374984
Summary:
Two conditions could lead to infinite loops when processing PHI nodes in
SIFixSGPRCopies.
The first condition involves a REG_SEQUENCE that uses registers defined by both
a PHI and a COPY.
The second condition arises when a physical register is copied to a virtual
register which is then used in a PHI node. If the same virtual register is
copied to the same physical register, the result is an endless loop.
%0:sgpr_64 = COPY $sgpr0_sgpr1
%2 = PHI %0, %bb.0, %1, %bb.1
$sgpr0_sgpr1 = COPY %0
Reviewers: alex-t, rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68970
llvm-svn: 374944
We define mov/update dpp intrinsics as overloaded but do not
support i64, which is a practically useful type. Fix the
selection and lowering.
Differential Revision: https://reviews.llvm.org/D68673
llvm-svn: 374910
This defaults to zero fi operand, but we do not expose it
anyway. Should we expose it later it needs to be added to
the pseudo.
This enables dpp combining on gfx10.
Differential Revision: https://reviews.llvm.org/D68888
llvm-svn: 374604
Summary:
This has been superseded by "[AMDGPU]: PHI Elimination hooks added for custom COPY insertion."
This reverts the code changes from commit 53f967f2bd
but keeps the test case.
Reviewers: hliao, arsenm, tpr, dstuttard
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68769
llvm-svn: 374347
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.
llvm-svn: 374284
In a future patch, this will help cleanup m0 handling.
The register coalescer handles copies from a register that
materializes an immediate, but doesn't handle move immediates
itself. The virtual register uses will often be allocated to the same
register, so there end up being no real copy.
llvm-svn: 374257
This was ignoring the register bank of the input pointer, and
isUniformMMO seems overly aggressive.
This will now conservatively assume a VGPR in cases where the incoming
bank hasn't been determined yet (i.e. is from a loop phi).
llvm-svn: 374255
If original instruction did not have source modifiers they were
not added to the new DPP instruction as well, even if needed.
Differential Revision: https://reviews.llvm.org/D68729
llvm-svn: 374241
There were 2 problems here. First, these patterns were duplicated to
handle the inverted shift operands instead of using the commuted
PatFrags.
Second, the point of the zext folding patterns don't apply to the
non-0ing high subtargets. They should be skipped instead of inserting
the extension. The zeroing high code would be emitted when necessary
anyway. This was also emitting unnecessary zexts in cases where the
high bits were undefined.
llvm-svn: 374092
Summary:
Without offsets on the MachineMemOperands (MMOs),
MachineInstr::mayAlias() will return true for all reads and writes to the
same resource descriptor. This leads to O(N^2) complexity in the MachineScheduler
when analyzing dependencies of buffer loads and stores. It also limits
the SILoadStoreOptimizer from merging more instructions.
This patch reduces the compile time of one pathological compute shader
from 12 seconds to 1 second.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65097
llvm-svn: 374087
Inhibit generation of unused real dpp instructions on gfx10 just
like it is done on other subtargets. This does not change anything
because these are illegal anyway and not accepted, but it does
reduce the number of instruction definitions generated.
Differential Revision: https://reviews.llvm.org/D68607
llvm-svn: 374083
Start manually writing a table to get the subreg index. TableGen
should probably generate this, but I'm not sure what it looks like in
the arbitrary case where subregisters are allowed to not fully cover
the super-registers.
llvm-svn: 373947
At minimum handle the s64 insert type, which are emitted in real cases
during legalization.
We really need TableGen to emit something to emit something like the
inverse of composeSubRegIndices do determine the subreg index to use.
llvm-svn: 373938
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
llvm-svn: 373937
Doing this makes MSVC complain that `empty(someRange)` could refer to
either C++17's std::empty or LLVM's llvm::empty, which previously we
avoided via SFINAE because std::empty is defined in terms of an empty
member rather than begin and end. So, switch callers over to the new
method as it is added.
https://reviews.llvm.org/D68439
llvm-svn: 373935
Summary:
This patch fixes a potential aliasing problem in InstClassEnum,
where local values were mixed with machine opcodes.
Introducing InstSubclass will keep them separate and help extending
InstClassEnum with other instruction types (e.g. MIMG) in the future.
This patch also makes getSubRegIdxs() more concise.
Reviewers: nhaehnle, arsenm, tstellar
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68384
llvm-svn: 373699
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.
llvm-svn: 373638
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.
llvm-svn: 373637
Summary:
This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instructions with one list per unique
address.
In the optimization phase instead of scanning through the basic block for mergeable
instructions, we now iterate over the lists generated by the pre-pass.
The decision to re-optimize a block is now made per list, so if we fail to merge any
instructions with the same address, then we do not attempt to optimize them in
future passes over the block. This will help to reduce the time this pass
spends re-optimizing instructions.
In one pathological test case, this change reduces the time spent in the
SILoadStoreOptimizer from 0.2s to 0.03s.
This restructuring will also make it possible to implement further solutions in
this pass, because we can now add less expensive checks to the pre-pass and
filter instructions out early which will avoid the need to do the expensive
scanning during the optimization pass. For example, checking for adjacent
offsets is an inexpensive test we can move to the pre-pass.
Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65961
llvm-svn: 373630
Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects.
The focus of this patch is purely to allow the stack frame to
allocate/deallocate space for scalable SVE objects. More dynamic
allocation (at compile-time, i.e. determining placement of SVE objects
on the stack), or resolving frame-index references that include
scalable-sized offsets, are left for subsequent patches.
SVE objects are allocated in the stack frame as a separate region below
the callee-save area, and above the alignment gap. This is done so that
the SVE objects can be accessed directly from the FP at (runtime)
VL-based offsets to benefit from using the VL-scaled addressing modes.
The layout looks as follows:
+-------------+
| stack arg |
+-------------+
| Callee Saves|
| X29, X30 | (if available)
|-------------| <- FP (if available)
| : |
| SVE area |
| : |
+-------------+
|/////////////| alignment gap.
| : |
| Stack objs |
| : |
+-------------+ <- SP after call and frame-setup
SVE and non-SVE stack objects are distinguished using different
StackIDs. The offsets for objects with TargetStackID::SVEVector should be
interpreted as purely scalable offsets within their respective SVE region.
Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61437
llvm-svn: 373585
When SIFixSGPRCopies attempts to fix an illegal copy from vector to
scalar register it calls moveToVALU(). A copy from an agpr to sgpr
becomes a copy from agpr to agpr, which may result in the illegal
register class at a use of this copy.
Solution is to copy it always into a vgpr. This may result in a
subsequent copy into an agpr if that is what really needed, however
should not happen too often and likely will be folded later.
The opposite situation may not happen because an sgpr is always
illegal where agpr is legal, so such user instructions may not
exist.
Differential Revision: https://reviews.llvm.org/D68358
llvm-svn: 373544
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.
Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store
Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.
The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.
There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.
Reviewers: arsenm, nhaehnle, tpr
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68200
llvm-svn: 373491
Summary:
Printf lowering unconditionally visited every instruction in the module.
To make it faster in the common case where there are no printfs, look up
the printf function (if any) and iterate over its users instead.
Reviewers: rampitec, kzhuravl, alex-t, arsenm
Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68145
llvm-svn: 373433
In principle this should behave as any other constant. However
eliminateFrameIndex currently assumes a VALU use and uses a vector
shift. Work around this by selecting to VGPR for now until
eliminateFrameIndex is fixed.
llvm-svn: 373415
Account and report agprs separately on gfx908. Other targets
do not change the reporting.
Differential Revision: https://reviews.llvm.org/D68307
llvm-svn: 373411
Summary:
This is a refactoring that will make future improvements to this pass easier.
This change should not change the behavior of the pass.
Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
Reviewed By: nhaehnle, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65496
llvm-svn: 373366
There are 1024 bit register classes defined for AGPRs. Additionally
OpenCL defines vectors up to 16 x i64, and this helps those tests
legalize.
llvm-svn: 373350
This is sort of papering over the fact that we don't run a combiner
anywhere, but avoiding creating 2 instructions in the first place is
easy.
llvm-svn: 373293
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.
The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.
llvm-svn: 373292
Rename old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess call the function related to alignment by default
and can be overridden by target to inform whether the memory access is legal or
not.
Differential Revision: https://reviews.llvm.org/D67121
llvm-svn: 372935
Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in and there is no reason why they would in the future.
Committed on behalf of @hvdijk (Harald van Dijk)
Differential Revision: https://reviews.llvm.org/D66138
llvm-svn: 372882
Summary:
This patch fixes a bug that originated from passing a virtual exit block (nullptr) to `MachinePostDominatorTee::findNearestCommonDominator` and resulted in assertion failures inside its callee. It also applies a small cleanup to the class.
The patch introduces a new function in PDT that given a list of `MachineBasicBlock`s finds their NCD. The new overload of `findNearestCommonDominator` handles virtual root correctly.
Note that similar handling of virtual root nodes is not necessary in (forward) `DominatorTree`s, as right now they don't use virtual roots.
Reviewers: tstellar, tpr, nhaehnle, arsenm, NutshellySima, grosser, hliao
Reviewed By: hliao
Subscribers: hliao, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
Tags: #amdgpu, #llvm
Differential Revision: https://reviews.llvm.org/D67974
llvm-svn: 372874
The static analyzer is warning about a potential null dereference, but we should be able to use cast<LoadSDNode> directly and if not assert will fire for us.
llvm-svn: 372528
I believe all of the uniform/divergent pattern predicates are
redundant and can be removed. The uniformity bit already influences
the register class, and nothhing has broken when I've removed this and
others.
llvm-svn: 372450
My toolchain stopped working (LLVM 8.0 , libstdc++ 5.4.0) after
r372338.
The same problem was seen in clang-cuda-build buildbots:
clang-cuda-build/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:763:12:
error: chosen constructor is explicit in copy-initialization
return {Reg, 0, nullptr};
^~~~~~~~~~~~~~~~~
/usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/tuple:479:19:
note: explicit constructor declared here
constexpr tuple(_UElements&&... __elements)
^
This commit adds explicit calls to std::make_tuple to work around
the problem.
llvm-svn: 372384
The function could return zero if an extreme number or
registers were used. Minimal possible occupancy is 1.
Differential Revision: https://reviews.llvm.org/D67771
llvm-svn: 372350
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
This needs special handling due to some subtargets that have a
nonstandard register layout for f16 vectors
Also reject some illegal types on other targets.
llvm-svn: 372293
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
Unlike SelectionDAG, treat this as a normally legalizable operation.
In SelectionDAG this is supposed to only ever formed if it's legal,
but I've found that to be restricting. For AMDGPU this is contextually
legal depending on whether denormal flushing is allowed in the use
function.
Technically we currently treat the denormal mode as a subtarget
feature, so custom lowering could be avoided. However I consider this
to be a defect, and this should be contextually dependent on the
controllable rounding mode of the parent function.
llvm-svn: 371800
The result integer does not need to be the same width as the input.
AMDGPU, NVPTX, and Hexagon all have patterns working around the types
matching. GlobalISel defines these as being different type indexes.
llvm-svn: 371797
This was relying on the SGPR usable for the carry out clobber to also
be used for the input. There was no carry out on gfx9. With no carry
out clobber to worry about, so the literal can just be directly used
with a VOP2 add.
llvm-svn: 371791
Summary:
After hoisting and merging m0 initializations schedule them as early as
possible in the MBB. This helps the scheduler avoid hazards in some
cases.
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67450
llvm-svn: 371671
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67433
llvm-svn: 371608
The scalar f64 patterns don't work yet because they fail on multiple
results from the unused implicit def of scc in the result bit
operation.
llvm-svn: 371542
f64 doesn't work yet because tablegen currently doesn't handlde
REG_SEQUENCE.
This does regress some multi use VALU fneg cases since now the
immediate remains in an SGPR, and more moves are used for legalizing
the xor. This is a SIFixSGPRCopies deficiency.
llvm-svn: 371540
There's still a lot more to do, but this handles decomposing due to
alignment. I've gotten it to the point where nothing crashes or
infinite loops the legalizer.
llvm-svn: 371533
Handle it the same way as G_BUILD_VECTOR_TRUNC. Arguably only
G_BUILD_VECTOR_TRUNC should be legal for this, but G_BUILD_VECTOR will
probably be more convenient in most cases.
llvm-svn: 371440
This enables GlobalISel to handle various intrinsics. The custom node
pattern will be ignored, and the intrinsic will work. This will also
allow SelectionDAG to directly select the intrinsics, but as they are
all custom lowered to the nodes, this ends up leaving dead code in the
table.
Eventually either GlobalISel should add the equivalent of custom nodes
equivalent, or intrinsics should be directly used. These each have
different tradeoffs.
There are a few more to handle, but these are easy to handle
ones. Some others fail for other reasons.
llvm-svn: 371432
Unfortunately MnemonicAlias defines a "Predicates" field just like an
instruction or pattern, with a somewhat different interpretation.
This ends up overriding the intended Predicates set by
PredicateControl on the pseudoinstruction defintions with an empty
list. This allowed incorrectly selecting instructions that should have
been rejected due to the SubtargetPredicate from patterns on the
instruction definition.
This does remove the divergent predicate from the 64-bit shift
patterns, which were already not used for the 32-bit shift, so I'm not
sure what the point was. This also removes a second, redundant copy of
the 64-bit divergent patterns.
llvm-svn: 371427
Treat this as legal on gfx9 since it can use S_PACK_* instructions for
this.
This isn't used by anything yet. The same will probably apply to
16-bit G_BUILD_VECTOR without the trunc.
llvm-svn: 371423
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
Summary:
This fixes poor scheduling in a function containing a barrier and a few
load instructions.
Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial
edge in the dependency graph from the barrier instruction to the exit
node representing live-out latency, with a latency of about 500 cycles.
Because of this it thinks the critical path through the graph also has
a latency of about 500 cycles. And because of that it does not think
that any of the load instructions are on the critical path, so it
schedules them with no regard for their (80 cycle) latency, which gives
poor results.
Reviewers: arsenm, dstuttard, tpr, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67218
llvm-svn: 371192
The same stack is loaded for each workitem ID, and each use. Nothing
prevents you from creating multiple fixed stack objects with the same
offsets, so this was creating a load for each unique frame index,
despite them being the same offset. Re-use the same frame index so the
loads are CSEable.
llvm-svn: 371148
Approximately 30% of the time was spent in the std::vector
constructor. In one testcase this pushes the scheduler to being the
second slowest pass.
I'm not sure I understand why these vector are necessary. The default
scheduler initCandidate seems to use some pre-existing vectors for the
pressure.
llvm-svn: 371136
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
Since an add instruction must produce an unused carry out, this
requires additional SGPRs. This can be avoided by keeping the entire
offset computation in SGPRs. If one SGPR is still available, this only
costs one extra mov. If none are available, the entire computation can
be done in place and reversed.
This does assume the use is a VGPR operand. This was already assumed,
and we currently only select frame indexes to VALU instructions. This
should probably be fixed at some point to handle more possible MIR.
llvm-svn: 370929
On AArch64, s128 types have to be split into s64 GPRs when passed as arguments.
This change adds the generic support in call lowering for dealing with multiple
registers, for incoming and outgoing args.
Support for splitting for return types not yet implemented.
Differential Revision: https://reviews.llvm.org/D66180
llvm-svn: 370822
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.
Reviewers: nhaehnle, arsenm, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65813
llvm-svn: 370667
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway since the frame register can be
increment and restored.
This does present another possible issue if spilling is needed for the
unused carry out if an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).
llvm-svn: 370281
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.
This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.
llvm-svn: 370280
Stop counting explicitly disabled user_spgr's in the user_sgpr_count field of the kernel descriptor.
Differential Revision: https://reviews.llvm.org/D66900
llvm-svn: 370250
This reduces the number of SGPRs due to some concerns about running
out of SGPRs if you make all the SGPRs that aren't reserved available
for the calling convention.
Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
llvm-svn: 370215
The problem these are supposed to work around can occur before the
intrinsics are lowered into the nodes. Try to directly simplify them
so they are matched before the bit assert operations can be optimized
out.
llvm-svn: 369994
The mul24 matching could interfere with SLSR and the other addressing
mode related passes. This probably is not the optimal placement, but
is an intermediate step. This should probably be moved after all the
generic IR passes, particularly LSR. Moving this after LSR seems to
help in some cases, and hurts others.
As-is in this patch, in idiv-licm, it saves 1-2 instructions inside
some of the loop bodies, but increases the number in others. Moving
this later helps these loops. In the new lsr tests in
mul24-pass-ordering, the intrinsic prevents introducing more
instructions in the loop preheader, so moving this later ends up
hurting them. This shouldn't be any worse than before the intrinsics
were introduced in r366094, and LSR should probably be smarter. I
think it's because it doesn't know the and inside the loop will be
folded away.
llvm-svn: 369991
Summary:
Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65644
llvm-svn: 369745
Prefer `MCFixupKind` where possible and add getTargetKind() to
convert to `unsigned` when needed rather than scattering cast
operators around the place.
Differential Revision: https://reviews.llvm.org/D59890
llvm-svn: 369720
I might look at improving PR43065 which will require being
able to mark a 256 and 512 bit vector of f16 as Legal.
Differential Revision: https://reviews.llvm.org/D66515
llvm-svn: 369565
This is necessary for handling <3 x s16> on AMDGPU, assuming this
should be handled as 2 separate legalization actions. The alternative
would be for fewerElementsVector to handle 3->2.
llvm-svn: 369547
Overriders may want to modify state in it. AMDGPU wants
to, but has to make its members mutable in order to do so.
Besides, EmitBasicBlockEnd is not const, so why should
Start be?
Patch by Bevin Hansson.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D66341
llvm-svn: 369325