This rewrites big parts of the fast register allocator. The basic
strategy of doing block-local allocation hasn't changed but I tweaked
several details:
Track register state on register units instead of physical
registers. This simplifies and speeds up handling of register aliases.
Process basic blocks in reverse order: Definitions are known to end
register livetimes when walking backwards (contrary when walking
forward then uses may or may not be a kill so we need heuristics).
Check register mask operands (calls) instead of conservatively
assuming everything is clobbered. Enhance heuristics to detect
killing uses: In case of a small number of defs/uses check if they are
all in the same basic block and if so the last one is a killing use.
Enhance heuristic for copy-coalescing through hinting: We check the
first k defs of a register for COPYs rather than relying on there just
being a single definition. When testing this on the full llvm
test-suite including SPEC externals I measured:
average 5.1% reduction in code size for X86, 4.9% reduction in code on
aarch64. (ranging between 0% and 20% depending on the test) 0.5%
faster compiletime (some analysis suggests the pass is slightly slower
than before, but we more than make up for it because later passes are
faster with the reduced instruction count)
Also adds a few testcases that were broken without this patch, in
particular bug 47278.
Patch mostly by Matthias Braun
This seems to have caused incorrect register allocation in some cases,
breaking tests in the Zig standard library (PR47278).
As discussed on the bug, revert back to green for now.
> Record internal state based on register units. This is often more
> efficient as there are typically fewer register units to update
> compared to iterating over all the aliases of a register.
>
> Original patch by Matthias Braun, but I've been rebasing and fixing it
> for almost 2 years and fixed a few bugs causing intermediate failures
> to make this patch independent of the changes in
> https://reviews.llvm.org/D52010.
This reverts commit 66251f7e1d, and
follow-ups 931a68f26b
and 0671a4c508. It also adjust some
test expectations.
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.
Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
Summary:
This fixes PR44135.
The special case when we promote a bitcast from a vector to an int
needs special handling when we are on a big-endian target.
Prior to this fix, for the added vec_to_int we see the following in the
SelectionDAG printouts
Type-legalized selection DAG: %bb.1 'foo:bb.1'
SelectionDAG has 9 nodes:
t0: ch = EntryToken
t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
t17: v4i32 = bitcast t2
t23: i32 = extract_vector_elt t17, Constant:i32<3>
t8: ch,glue = CopyToReg t0, Register:i32 $r0, t23
t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1
and I think here the extract_vector_elt is wrong and extracts the value
from the wrong index.
The program program should return the 32 bits made up of the elements at
index 4 and 5 in the vec6 array, but with
t23: i32 = extract_vector_elt t17, Constant:i32<3>
as far as I can tell, we will extract values that originally didn't even
exist in the vec6 vectore.
If we would instead extract the element at index 2 we would get the wanted
values.
With this fix we insert a right shift after the bitcast in
DAGTypeLegalizer::PromoteIntRes_BITCAST which then gives us
Type-legalized selection DAG: %bb.1 'vec_to_int:bb.1'
SelectionDAG has 9 nodes:
t0: ch = EntryToken
t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
t23: v4i32 = bitcast t2
t27: i32 = extract_vector_elt t23, Constant:i32<2>
t8: ch,glue = CopyToReg t0, Register:i32 $r0, t27
t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1
So now we get
t27: i32 = extract_vector_elt t23, Constant:i32<2>
which is what we want.
Similarly, the new int_to_vec testcase exposes a bug where we cast the other
direction. Then we instead need to add a left shift before the bitcast on
big-endian targets for the bits in the input integer to end up at the exptected
place in the vector.
Reviewers: bogner, spatel, craig.topper, t.p.northover, dmgreen, efriedma, SjoerdMeijer, samparker
Reviewed By: efriedma
Subscribers: eli.friedman, bjope, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70942