When the function return type is non-void and `end` instructions are at
the very end of a function, CFGStackify's `fixEndsAtEndOfFunction`
function fixes the corresponding block/loop/try's type to match the
function's return type. This is applied to consecutive `end` markers at
the end of a function. For example, when the function return type is
`i32`,
```
block i32 ;; return type is fixed to i32
...
loop i32 ;; return type is fixed to i32
...
end_loop
end_block
end_function
```
But try-catch is a little different, because it consists of two parts:
a try part and a catch part, and both parts' return type should satisfy
the function's return type. Which means,
```
try i32 ;; return type is fixed to i32
...
block i32 ;; this should be changed to i32 too!
...
end_block
catch
...
end_try
end_function
```
As you can see in this example, it is not sufficient to only check `end`
instructions at the end of a function; in the case of `try`, we should
also check the instructions before each `catch`, in case the
corresponding `try`'s type has been fixed.
This changes `fixEndsAtEndOfFunction`'s algorithm to use a worklist
of reverse iterators, each of which is a starting point for
a new backward `end` instruction search.
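The worklist search described above can be sketched as a small Python model. This is only an illustration under simplifying assumptions: the function body is a flat list of opcode strings without the trailing `end_function`, every `try` has a `catch`, and the helper names `match_markers` and `fix_ends_at_end_of_function` are hypothetical stand-ins, not LLVM APIs.

```python
END_TO_BEGIN_OPCODE = {"end_block": "block", "end_loop": "loop", "end_try": "try"}


def match_markers(instrs):
    """Pair every end marker with its begin marker, and every try with its catch."""
    end_to_begin, try_to_catch, stack = {}, {}, []
    for i, op in enumerate(instrs):
        if op in END_TO_BEGIN_OPCODE.values():
            stack.append(i)
        elif op == "catch":
            try_to_catch[stack[-1]] = i
        elif op in END_TO_BEGIN_OPCODE:
            end_to_begin[i] = stack.pop()
    return end_to_begin, try_to_catch


def fix_ends_at_end_of_function(instrs, ret_type):
    """Return {index of begin marker: fixed type} for a flat opcode list."""
    end_to_begin, try_to_catch = match_markers(instrs)
    fixed = {}
    worklist = [len(instrs) - 1]  # start at the last instruction
    while worklist:
        i = worklist.pop()
        # Walk backward over consecutive end markers only.
        while i >= 0 and instrs[i] in END_TO_BEGIN_OPCODE:
            begin = end_to_begin[i]
            fixed[begin] = ret_type
            if instrs[i] == "end_try":
                # The try body must also produce the value, so start a new
                # backward search just before the matching catch.
                worklist.append(try_to_catch[begin] - 1)
            i -= 1
    return fixed


body = ["try", "...", "block", "...", "end_block", "catch", "...", "end_try"]
print(fix_ends_at_end_of_function(body, "i32"))
```

For the try-catch example, both the `try` at index 0 and the inner `block` at index 2 end up with type `i32`; a search that only looked at the trailing `end` markers would have stopped at the `catch` body and missed the inner `block`.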
Fixes https://bugs.llvm.org/show_bug.cgi?id=47413.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D87207
Fixes PR47375, in which an assertion was triggering because
WebAssemblyTargetLowering::isVectorLoadExtDesirable was improperly
assuming the use of simple value types.
Differential Revision: https://reviews.llvm.org/D87110
Fixes PR47040, in which an assertion was improperly triggered during
FastISel's address computation. The issue was that an `Address` set to
be relative to the FrameIndex with offset zero was incorrectly
considered to have an unset base. When the left hand side of an add
set the Address to be 0 off the FrameIndex, the right side would not
detect that the Address base had already been set and could try to set
the Address to be relative to a register instead, triggering an
assertion.
This patch fixes the issue by explicitly tracking whether an `Address`
has been set rather than interpreting an offset of zero to mean the
`Address` has not been set.
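As a rough illustration of why an explicit flag is more robust than inferring "unset" from a zero value, here is a toy Python model. The class and its field names are assumptions made for this sketch (the zero value is collapsed into a single `base` field) and do not mirror the actual FastISel `Address` class.

```python
class Address:
    """Toy model of a FastISel-style address (field names are illustrative)."""

    def __init__(self):
        self.kind = "reg"    # "reg" or "frame_index"
        self.base = 0        # register number or frame index
        self.is_set = False  # explicit flag, independent of the base value

    def looks_set_by_value(self):
        # Heuristic the fix moves away from: a zero base is read as "unset".
        return self.base != 0

    def set_frame_index(self, index):
        assert not self.is_set, "base is already set"
        self.kind, self.base, self.is_set = "frame_index", index, True

    def set_reg(self, reg):
        assert not self.is_set, "base is already set"
        self.kind, self.base, self.is_set = "reg", reg, True


addr = Address()
addr.set_frame_index(0)           # left-hand side of the add
print(addr.looks_set_by_value())  # False: the zero value hides the earlier set
print(addr.is_set)                # True: the flag records it
```

With the flag, the right-hand side of the add can see that the base is already taken and avoid trying to switch the address to a register base.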
Differential Revision: https://reviews.llvm.org/D85581
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.
This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.
Differential Revision: https://reviews.llvm.org/D84820
LLVM's SelectionDAG assumes "switch" indices are pointer sized, which causes problems for our 32-bit br_table. The new function ensures 32-bit operands don't get unnecessarily extended, and 64-bit operands get truncated.
Note that the changes to the existing test check exactly that: the addition of -NEXT in 2 places ensures no extension is inserted (which the test previously ignored) and that the wrap is present (previously omitted in wasm64 mode).
Differential Revision: https://reviews.llvm.org/D84705
When it was first created, CFGSort only made sure BBs in each
`MachineLoop` are sorted together. After we added exception support,
CFGSort now also sorts BBs in each `WebAssemblyException`, which
represents a `catch` block, together, and a
`Region` class was introduced as a thin wrapper for both
`MachineLoop` and `WebAssemblyException`.
But how we compute those loops and exceptions is different.
`MachineLoopInfo` is constructed using the standard loop computation
algorithm in LLVM; the definition of loop is "a set of BBs that are
dominated by a loop header and have a path back to the loop header". So
even if some BBs are semantically contained by a loop in the original
program, or in other words dominated by a loop header, if they don't
have a path back to the loop header, they are not considered a part of
the loop. For example, if a BB is dominated by a loop header but
contains `call abort()` or `rethrow`, it wouldn't have a path back to
the header, so it is not included in the loop.
But `WebAssemblyException` is a wasm-specific data structure, and its
algorithm is simple: a `WebAssemblyException` consists of an EH pad and
all BBs dominated by the EH pad. So this scenario is possible: (This is
also the situation in the newly added test in cfg-stackify-eh.ll)
```
Loop L: header, A, ehpad, latch
Exception E: ehpad, latch, B
```
(B contains `abort()`, so it does not have a path back to the loop
header, so it is not included in L.)
And it is sorted in this order:
```
header
A
ehpad
latch
B
```
And when CFGStackify places `end_loop` or `end_try` markers, it
previously used `WebAssembly::getBottom()`, which returns the latest BB
in the sorted order, and placed the marker there. So in this case the
marker placements will be like this:
```
loop
header
try
A
catch
ehpad
latch
end_loop <-- misplaced!
B
end_try
```
in which the nesting between the loop and the exception is not correct.
The `end_loop` marker has to be placed after `B`, and also after `end_try`.
Maybe the fundamental way to solve this problem is to come up with our
own algorithm for computing loop regions too, in which we include all BBs
dominated by a loop header in a loop. But this takes a lot more effort.
The only thing we actually need to fix is `getBottom()`. If we make it
return the right BB, which for a loop means the latest BB of the
loop itself and of all exceptions contained in it, we are good.
This renames `Region` and `RegionInfo` to `SortRegion` and
`SortRegionInfo` and extracts them into their own file. It also adds
`getBottom` to the `SortRegionInfo` class, from which it can access
`WebAssemblyExceptionInfo`, so that it can compute a correct bottom
block for loops.
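A minimal Python sketch of the corrected bottom computation follows. It assumes blocks are plain strings, each exception is a list whose first element is its EH pad, and `order` is the sorted block order; the function name `get_loop_bottom` is hypothetical and only models the idea.

```python
def get_loop_bottom(loop_blocks, exceptions, order):
    """Latest block, in sorted order, among the loop's blocks and the blocks
    of every exception whose EH pad belongs to the loop."""
    position = {bb: i for i, bb in enumerate(order)}
    candidates = list(loop_blocks)
    for exception in exceptions:
        eh_pad = exception[0]
        if eh_pad in loop_blocks:
            # Blocks dominated by the EH pad count too, even when they have
            # no path back to the loop header.
            candidates.extend(exception)
    return max(candidates, key=position.__getitem__)


order = ["header", "A", "ehpad", "latch", "B"]
loop = ["header", "A", "ehpad", "latch"]
exception = ["ehpad", "latch", "B"]
print(get_loop_bottom(loop, [exception], order))  # B
```

With `B` as the loop's bottom, `end_loop` lands after `B` and therefore after `end_try`, which restores proper nesting for the scenario above.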
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D84724
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
Rather than expanding truncating stores so that vectors are stored one
lane at a time, lower them to a sequence of instructions using
narrowing operations instead, when possible. Since the narrowing
operations have saturating semantics, but truncating stores require
truncation, mask the stored value to manually truncate it before
narrowing. Also, since narrowing is a binary operation, pass in the
original vector as the unused second argument.
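The reason masking makes a saturating narrow behave like a truncation can be checked with a small Python model. The sketch assumes 16-bit source lanes narrowed to 8 bits and models `i8x16.narrow_i16x8_u` (signed 16-bit inputs saturated to the unsigned 8-bit range); the helper names are made up for illustration.

```python
def narrow_i16x8_u(a, b):
    """Model of i8x16.narrow_i16x8_u on two lists of eight 16-bit lanes."""
    def saturate(lane):
        lane &= 0xFFFF
        if lane >= 0x8000:          # reinterpret as signed 16-bit
            lane -= 0x10000
        return max(0, min(255, lane))
    return [saturate(lane) for lane in a + b]


def truncating_store_lanes(v):
    """Lanes that would be stored: mask first, then narrow."""
    masked = [lane & 0x00FF for lane in v]
    # The second operand is unused; only the low eight result lanes are stored.
    return narrow_i16x8_u(masked, v)[:8]


v = [0x1234, 0xFFFF, 0x0080, 0x00FF, 0x0000, 0x0001, 0x8000, 0x7FFF]
print(truncating_store_lanes(v) == [lane & 0xFF for lane in v])  # True
print(narrow_i16x8_u(v, v)[:8] == [lane & 0xFF for lane in v])   # False
```

After masking, every lane is already within 0..255, so saturation never changes it and the result equals plain truncation. Without the mask, lanes such as `0x1234` would saturate to 255 instead of truncating to `0x34`.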
Differential Revision: https://reviews.llvm.org/D84377
These tests were previously duplicates of the
unfolded_gep_negative_offset tests, and this change updates them to
test what they were meant to test.
Differential Revision: https://reviews.llvm.org/D84365
Implementing new functionality tested in this file requires adding new
tests for many IR addressing patterns, which can be a large
maintenance burden. This patch makes adding tests easier by switching
to using autogenerated checks. This patch also removes the testing
mode that has simd128 disabled because it would produce very large
checks and is not particularly interesting.
Differential Revision: https://reviews.llvm.org/D84288
Accounting for the fact that Wasm function indices are 32-bit, but in wasm64 we want uniform 64-bit pointers.
Includes reloc types for 64-bit table indices.
Differential Revision: https://reviews.llvm.org/D83729
Although the SIMD spec proposal does not specifically include a
select instruction, the select instruction in MVP WebAssembly is
polymorphic over the selected types, so it is able to work on v128
values when they are enabled. This patch introduces a new variant of
the select instruction for each legal vector type. Additional ISel
patterns are adapted from the SELECT_I32 and SELECT_I64 patterns.
Depends on D83736.
Differential Revision: https://reviews.llvm.org/D83737
Updating the simd-select.ll tests manually with consistent named
regexps for the register numbers was taking more time than it was
worth, so this patch updates that test file to have autogenerated
output. This is not a significant readability regression because the
tests in that file are all very small.
Depends on D83734.
Differential Revision: https://reviews.llvm.org/D83736
We were previously expanding vselect and matching on the expansion to
generate bitselects, but in some cases the expansion would be further
combined and a bitselect would not get generated. This patch improves
codegen in those cases by legalizing vselect and lowering it to
v128.bitselect. The old pattern that matches the expansion is still
useful for lowering IR that already uses the expansion rather than a
select operation.
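To illustrate why a lane-wise select can be lowered to a bitwise select, the following Python sketch compares the two on lists of lanes. It assumes 32-bit lanes and a boolean condition per lane that is widened to an all-ones or all-zeros mask; the function names are illustrative only.

```python
def vselect(cond, a, b):
    """Lane-wise select: take the lane from a where cond is true, else from b."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def bitselect(a, b, mask, lane_bits=32):
    """Bitwise select: bits of a where mask is 1, bits of b where mask is 0."""
    full = (1 << lane_bits) - 1
    return [(x & m) | (y & ~m & full) for x, y, m in zip(a, b, mask)]


def vselect_via_bitselect(cond, a, b, lane_bits=32):
    full = (1 << lane_bits) - 1
    mask = [full if c else 0 for c in cond]  # widen each condition to a lane mask
    return bitselect(a, b, mask, lane_bits)


cond = [True, False, False, True]
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(vselect(cond, a, b) == vselect_via_bitselect(cond, a, b))  # True
```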
Differential Revision: https://reviews.llvm.org/D83734
The existing code already considered this case. Unfortunately a typo in
the condition prevents it from triggering. Also the existing code, had
it run, forgot to do the folding.
This fixes PR42876.
Differential Revision: https://reviews.llvm.org/D65802
In BUILD_VECTOR lowering, we used to generally prefer using splats
over v128.const instructions because v128.const has a very large
encoding. However, in d5b7a4e2e8 we switched to preferring consts
because they are expected to be more efficient in engines. This patch
updates the ISel patterns to match this current preference.
Differential Revision: https://reviews.llvm.org/D83581
This patch builds on 0d7286a652 by simplifying the code for detecting
splat values and adding new tests demonstrating the lowering of
splatted absolute value shift amounts, which are common in code
generated by Halide. The lowering is very bad right now, but
subsequent patches will improve it considerably. The tests will be
useful for evaluating the improvements in those patches.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D83493
`__stack_chk_fail` does not return, but `unreachable` was not generated
following `call __stack_chk_fail`. This had a possibility to generate an
invalid binary for functions with a return type, because
`__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can
be the last instruction in the function whose return type is non-void.
Generating `unreachable` after it makes sure CFGStackify's
`fixEndsAtEndOfFunction` handles it correctly.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D83277
Since WebAssembly's vector shift instructions take a scalar shift
amount rather than a vector shift amount, we have to check in ISel
that the vector shift amount is a splat. Previously, we were checking
explicitly for splat BUILD_VECTOR nodes, but this change uses the
standard utilities for detecting splat values that can handle more
complex splat patterns. Since the C++ ISel lowering is now more
general than the ISel patterns, this change also simplifies shift
lowering by using the C++ lowering for all SIMD shifts rather than
mixing C++ and normal pattern-based lowering.
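The splat requirement can be modeled in a few lines of Python. This sketch assumes 32-bit lanes and a shift amount taken modulo the lane width; the function name and the "splat"/"unrolled" labels are hypothetical and only indicate which path such a lowering might take.

```python
def lower_vector_shl(lanes, amounts, lane_bits=32):
    """Use one scalar shift amount when all amounts agree, else go lane by lane."""
    mask = (1 << lane_bits) - 1
    if len(set(amounts)) == 1:
        # Splat: a single scalar amount serves every lane.
        shift = amounts[0] % lane_bits
        return "splat", [(x << shift) & mask for x in lanes]
    # Not a splat: fall back to per-lane scalar shifts in this model.
    return "unrolled", [(x << (s % lane_bits)) & mask for x, s in zip(lanes, amounts)]


print(lower_vector_shl([1, 2, 3, 4], [3, 3, 3, 3]))
print(lower_vector_shl([1, 1, 1, 1], [0, 1, 2, 3]))
```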
This change improves ISel for shifts to the point that the
simd-shift-unroll.ll regression test no longer tests the code path it
was originally meant to test. The bug corresponding to that regression
test is no longer reproducible with its original reported reproducer,
so rather than try to fix the regression test, this change just
removes it.
Differential Revision: https://reviews.llvm.org/D83278
This covers both the existing memory functions as well as the new bulk memory proposal.
Added new test files since changes were also required in the inputs.
Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit.
Differential Revision: https://reviews.llvm.org/D82821
Summary:
Since the br_table instruction takes an i32, switches over i64s (and
larger integers) must use the i32.wrap_i64 instruction to truncate the
table index. This truncation makes numbers just over 2^32
indistinguishable from small numbers, so it was a miscompilation to
omit the range check preceding these br_tables. This change fixes the
problem by skipping the "fixing" of the br_table when the range check
is an i64 instruction.
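The miscompilation can be reproduced with a tiny Python model of the instructions involved. The sketch assumes non-negative integer inputs, and `switch_i64` is a hypothetical helper that models a switch over an i64 value with or without its range check.

```python
def i32_wrap_i64(x):
    """Keep the low 32 bits, as i32.wrap_i64 does."""
    return x & 0xFFFFFFFF


def br_table(targets, default, index):
    """br_table semantics: out-of-range i32 indices go to the default target."""
    return targets[index] if index < len(targets) else default


def switch_i64(value, targets, default, keep_range_check):
    if keep_range_check and value >= len(targets):
        return default
    return br_table(targets, default, i32_wrap_i64(value))


targets = ["case0", "case1", "case2"]
print(switch_i64(2**32 + 1, targets, "default", keep_range_check=True))   # default
print(switch_i64(2**32 + 1, targets, "default", keep_range_check=False))  # case1
```

Without the 64-bit range check, `2^32 + 1` wraps to `1` and wrongly selects the second case, which is why the check has to stay when it operates on i64 values.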
Fixes PR46447.
Reviewers: aheejin, dschuff, kripken
Reviewed By: kripken
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83017
When created in the RegStackify pass, `TEE` has two destinations, where
op0 is stackified and op1 is not. But it is possible that
op0 becomes unstackified in the `fixUnwindMismatches` function in the
CFGStackify pass when a nested try-catch-end is introduced, violating
the invariant of `TEE`'s destinations.
In this case we convert the `TEE` into two `COPY`s, which will
eventually be resolved in ExplicitLocals.
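As a rough sketch of the rewrite, assume instructions are tuples of the form `(opcode, destinations, source)` and that `stackified` is the set of registers currently stackified. These shapes and the name `legalize_tee` are assumptions for illustration, not the backend's data structures.

```python
def legalize_tee(instr, stackified):
    """Replace a TEE whose first destination is no longer stackified with two COPYs."""
    opcode, dests, src = instr
    if opcode == "TEE" and dests[0] not in stackified:
        return [("COPY", (dests[0],), src), ("COPY", (dests[1],), src)]
    return [instr]


tee = ("TEE", ("%tee", "%reg"), "%def")
print(legalize_tee(tee, stackified={"%tee", "%def"}))  # invariant holds, TEE kept
print(legalize_tee(tee, stackified={"%def"}))          # rewritten into two COPYs
```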
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D81851
Summary:
This commit fixes a bug in the FixBrTables pass in which an
unconditional branch from the switch header block to the jump table
block was not removed before the blocks were combined. The result was
an invalid CFG in the MachineFunction. This commit also switches from
using bespoke branch analysis and deletion code to using the standard
utilities for the same.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81909
Context: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md
This is just a first step, adding the new instruction variants while keeping the existing 32-bit functionality working.
Some of the basic load/store tests have new wasm64 versions that show that the basics of the target are working.
Further features need implementation, but these will be added in followups to keep things reviewable.
Differential Revision: https://reviews.llvm.org/D80769
Summary:
After their range checks were removed in 7f50c15be5, br_tables
started being duplicated into their predecessors by tail
folding. Unfortunately, when the br_tables were in loops this
transformation introduced bad irreducible control flow which was later
expanded into even more br_tables. This commit abuses the
`isNotDuplicable` property to prevent this irreducible control flow
from being introduced. This change saves a few dozen bytes of code
size and has a negligible effect on performance for most of the large
Emscripten benchmarks, but can improve performance significantly on
microbenchmarks of switches in loops.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81628
Summary:
The natural alignments for extending and splatting loads had not
previously been tested. It is good to have them tested because they
are non-obvious details in the SIMD spec proposal.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81303
Summary:
As specified in https://github.com/WebAssembly/simd/pull/232. These
instructions are implemented as LLVM intrinsics for now rather than
normal ISel patterns to make these instructions opt-in. Once the
instructions are merged to the spec proposal, the intrinsics will be
replaced with proper ISel patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81222
There are two properties we want to verify:
1. That the successors returned by analyzeBranch are in the CFG
successor list, and
2. That there are no extraneous successors in the CFG successor
list.
The previous implementation mostly accomplished this, but in a very
convoluted manner.
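Stated as set operations, the two properties look like the following Python sketch. It deliberately ignores special cases a real verifier has to handle (for example EH pads or fallthrough edges), and `check_successors` is a hypothetical name.

```python
def check_successors(analyzed_successors, cfg_successors):
    """Return (missing, extraneous) successor sets for one basic block."""
    analyzed, cfg = set(analyzed_successors), set(cfg_successors)
    missing = analyzed - cfg      # property 1 is violated if non-empty
    extraneous = cfg - analyzed   # property 2 is violated if non-empty
    return missing, extraneous


print(check_successors(["bb1", "bb2"], ["bb1", "bb2"]))  # (set(), set())
print(check_successors(["bb1", "bb2"], ["bb1", "bb3"]))
```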
Differential Revision: https://reviews.llvm.org/D79793
Summary:
Unlike normal traps, debug traps are allowed to return and can have
additional instructions in the same basic block. Without explicit
backend support for debug traps, they are lowered in ISel as normal
traps. Since normal traps are lowered in the WebAssembly backend to
the UNREACHABLE instruction, which is a terminator, using debug traps
could lead to invalid MBBs when there are additional instructions
after the trap. This patch fixes the issue by lowering debug traps to
a new version of the UNREACHABLE instruction, DEBUG_UNREACHABLE, that
is not a terminator.
An alternative approach would have been to make UNREACHABLE not a
terminator, but that breaks a large number of tests. In particular, it
would require removing the traps inserted after noreturn calls to
@llvm.wasm.throw because otherwise the terminator throw would be
followed by a non-terminator UNREACHABLE and we would be back to
having invalid MBBs. Overall the approach in this patch seems simpler.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81055
Summary:
The code previously assumed that the index of a vector extract was
constant, but this was not always true. This patch fixes the problem
by bailing out of the lowering if the index is nonconstant and also
replaces `static_cast`s in the lowering function with `cast`s because
the latter contain type-checking asserts that would make similar
issues easier to find and debug.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81025
This reverts commit 755a895915.
Although I was not able to reproduce any test failures locally,
aheejin was able to reproduce them and found a fix, applied here.
Summary:
Jump tables for most targets cannot handle out of range indices by
themselves, so LLVM emits range checks to guard the jump
tables. WebAssembly, on the other hand, implements jump tables using
the br_table instruction, which takes a default branch target as an
operand, making the range checks redundant. This patch introduces a
new MachineFunction pass in the WebAssembly backend to find and
eliminate the redundant range checks.
Reviewers: aheejin, dschuff
Subscribers: mgorny, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80863
Summary:
`getMatchingEHPad()` in LateEHPrepare is a function to find the nearest
EH pad that dominates the given instruction. This intends to be
lightweight so it does not use full WebAssemblyException scope analysis
or dominator analysis. It simply does backward BFS to its predecessors
and stops at the first EH pad each search path encounters. All search
paths should end up at the same EH pad, and if not, it returns null.
But it didn't take into account that when there are inner scopes within
the current scope, some path in BFS can hit an inner EH pad first. For
example, in the given diagram, `Inst` belongs to the outer scope and
`getMatchingEHPad()` should return 'EHPad 1', but some search path can go
into the inner scope and end up with 'EHPad 2'. The search will return
null because different paths end up with different EH pads.
```
--- EHPad 1 ---
| - EHPad 2 - |
| |         | |
| ----------- |
|   Inst      |
---------------
```
So far this was OK because we haven't tested a case in which a given
instruction is far from its EH pad. Also, this bug does not happen when
the inner EH scope is a cleanup scope, because a cleanup scope ends with
a `cleanupret` whose successor is an EH pad, so the search encounters
that EH pad first before going into the child scope. But this can happen
when the child scope is a catch scope that ends with `catchret`. So this
patch, when doing backward BFS, does not search predecessors that end
with `catchret`. Because `catchret`s are replaced with `br`s during this
pass, this records BBs that have `catchret`s in the beginning, before
doing any other transformations.
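A small Python model of the backward search shows the effect of skipping `catchret` predecessors. The CFG is given as a predecessor map over block names; all names, the `entry` parameter, and the function itself are assumptions for this sketch, and the visiting order is not significant for the result.

```python
def get_matching_eh_pad(start, preds, eh_pads, catchret_blocks, entry="entry"):
    """Walk predecessors backward; return the single EH pad reached, or None."""
    worklist, visited, found = [start], set(), None
    while worklist:
        bb = worklist.pop()
        if bb in visited:
            continue
        visited.add(bb)
        if bb in eh_pads:
            if found is not None and found != bb:
                return None  # different paths reached different EH pads
            found = bb
            continue
        if bb == entry:
            return None      # reached the function entry without an EH pad
        for pred in preds.get(bb, []):
            if pred not in catchret_blocks:  # do not enter child catch scopes
                worklist.append(pred)
    return found


# inst_bb is reachable from 'a' (outer scope) and from 'c', which ends the
# inner catch scope with a catchret.
preds = {"a": ["ehpad1"], "ehpad2": ["a"], "c": ["ehpad2"], "inst_bb": ["a", "c"]}
eh_pads = {"ehpad1", "ehpad2"}
print(get_matching_eh_pad("inst_bb", preds, eh_pads, catchret_blocks={"c"}))  # ehpad1
print(get_matching_eh_pad("inst_bb", preds, eh_pads, catchret_blocks=set()))  # None
```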
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80571
Summary:
One of the things `removeUnnecessaryInstrs()` in CFGStackify does is to
remove an unnecessary unconditional branch before an EH pad. When there
is an unconditional branch right before a catch instruction and it
branches to the end of the `end_try` marker, we don't need the branch,
because if there is no exception, the control flow transfers to
that point anyway.
```
bb0:
try
...
br bb2 <- Not necessary
bb1:
catch
...
bb2:
end
```
This also applies when we have a conditional branch followed by an
unconditional one, in which case we should only remove the unconditional
branch. For example:
```
bb0:
try
...
br_if someplace_else
br bb2 <- Not necessary
bb1:
catch
...
bb2:
end
```
But the `TargetInstrInfo::removeBranch` we used removed all existing
branches when there are multiple ones. This patch fixes it by manually
deleting only the last (= unconditional) branch.
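The difference between the two behaviors can be sketched in Python, modeling a basic block as a list of `(opcode, operand)` tuples. Both helper names are hypothetical; `remove_all_trailing_branches` only models the "removes every branch" behavior described above.

```python
def remove_all_trailing_branches(block):
    """Model of the old behavior: strip every branch at the end of the block."""
    result = list(block)
    while result and result[-1][0] in ("br", "br_if"):
        result.pop()
    return result


def remove_last_unconditional_branch(block, target):
    """Model of the fix: drop only a final `br target`, keep anything before it."""
    if block and block[-1] == ("br", target):
        return list(block[:-1])
    return list(block)


bb0 = [("call", "foo"), ("br_if", "someplace_else"), ("br", "bb2")]
print(remove_all_trailing_branches(bb0))             # the br_if is lost as well
print(remove_last_unconditional_branch(bb0, "bb2"))  # the br_if survives
```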
Also fixes some `preds` comments in the test file.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80572
Summary:
The code previously assumed the source of the bitcast in the combined
pattern was a vector type, but this is not always true. This patch
adds a check to avoid an assertion failure in that case.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80164
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80174
Summary:
This new custom DAG combine fixes a codegen issue with the
wasm_simd128.h intrinsics. Clang lowers the
return (v128_t)(__f32x4){__a, __a, __a, __a};
body of f32x4_splat to a splat shuffle of a bitcasted vector, as seen
in the new simd-shuffle-bitcast.ll test. The bitcast interfered with
the target-independent DAG combine that combines splat shuffles into
BUILD_VECTOR nodes, so this patch introduces a new custom DAG combine
to hoist the bitcast out of the shuffle, allowing the
target-independent combine to work as intended.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80021
Summary:
Move instructions that have recently been implemented in V8 from the
`unimplemented-simd128` target feature to the `simd128` target
feature. The updated instructions match the update at
https://github.com/WebAssembly/simd/pull/223.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79973
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79742
Summary:
Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D66983
Summary:
The WebAssembly backend automatically lowers atomic operations and TLS
to nonatomic operations and non-TLS data when either are present and
the atomics or bulk-memory features are not present, respectively. The
resulting object is no longer thread-safe, so the linker has to be
told not to allow it to be linked into a module with shared
memory. This was previously done by disallowing the 'atomics' feature,
which prevented any object with its atomic operations or TLS removed
from being linked with any object containing atomics or TLS, and
therefore preventing it from being linked into a module with shared
memory since shared memory requires atomics.
However, as of https://github.com/WebAssembly/threads/issues/144, the
validation rules are relaxed to allow atomic operations to validate
with unshared memories, which makes it perfectly safe to link an
object with stripped atomics and TLS with another object that still
contains TLS and atomics as long as the resulting module has an
unshared memory. To allow this kind of link, this patch disallows a
pseudo-feature 'shared-mem' rather than 'atomics' to communicate to
the linker that the object is not thread-safe. This means that the
'atomics' feature is available to accurately reflect whether or not an
object has atomics enabled.
As a drive-by tweak, this change also requires that bulk-memory be
enabled in addition to atomics in order to use shared memory. This is
because initializing shared memories requires bulk-memory operations.
Reviewers: aheejin, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79542
Summary:
This fixes a few things that are connected. It is very hard to provide
an independent test case for each of those fixes, because they are
interconnected and sometimes one masks another. The provided test case
triggers some of those bugs below but not all.
---
1. Background:
`placeBlockMarker` takes a BB, and if the BB is a destination of some
branch, it places `end_block` marker there, and computes the nearest
common dominator of all predecessors (what we call 'header') and places
a `block` marker there.
When we first place markers, we traverse BBs from top to bottom. For
example, when there are 5 BBs A, B, C, D, and E and B, D, and E are
branch destinations, if we mark the BB given to `placeBlockMarker` with `*`
and draw a rectangle representing the border of `block` and `end_block`
markers, the process is going to look like
```
                                -------
                 -----          |-----|
  ---            |---|          ||---||
  |A|            ||A||          |||A|||
  ---     -->    |---|   -->    ||---||
  *B             | B |          || B ||
   C             | C |          || C ||
   D             -----          |-----|
   E              *D            |  D  |
                   E            -------
                                  *E
```
which means when we first place markers, we go from inner to outer
scopes. So when we place a `block` marker, if the header already
contains other `block` or `try` markers, they have to belong to an inner
scope, so the existing `block`/`try` markers should go _after_ the new
marker. This was the assumption we had.
But after placing all markers we run `fixUnwindMismatches` function.
There we do some control flow transformation and create some branches,
and we call `placeBlockMarker` again to place `block`/`end_block`
markers for those newly created branches. We can't assume that we are
traversing branch destination BBs from top to bottom now because we are
basically inserting some new markers in the middle of existing markers.
Fix:
In `placeBlockMarker`, we no longer assume that the given BBs arrive
in top-to-bottom order, and when placing `block` markers, we
calculate whether existing `block` or `try` markers are inner or
outer scopes with respect to the current scope.
---
2. Background:
In `fixUnwindMismatches`, when there is a call whose correct unwind
destination mismatches the current destination after initially placing
`try` markers, we wrap that with a new nested `try`/`catch`/`end` and
jump to the correct handler within the new `catch`. The correct handler
code is split as a separate BB from its original EH pad so it can be
branched to. Here's an example:
- Before
```
mbb:
call @foo <- Unwind destination mismatch!
wrong-ehpad:
catch
...
cont:
end_try
...
correct-ehpad:
catch
[handler code]
```
- After
```
mbb:
try (new)
call @foo
nested-ehpad: (new)
catch (new)
local.set n / drop (new)
br %handler (new)
nested-end: (new)
end_try (new)
wrong-ehpad:
catch
...
cont:
end_try
...
correct-ehpad:
catch
local.set n / drop (new)
handler: (new)
end_try
[handler code]
```
Note that after this transformation, it is possible there are no calls
to actually unwind to `correct-ehpad` here. `call @foo` now
branches to `handler`, and there can be no other calls to unwind to
`correct-ehpad`. In this case `correct-ehpad` does not have any
predecessors anymore.
This can cause a bug in `placeBlockMarker`, because we may need to place
an `end_block` marker in `handler`, and `placeBlockMarker` computes the
nearest common dominator of all predecessors. If one of `handler`'s
predecessors (here `correct-ehpad`) does not have any predecessors, i.e.,
there is no way of reaching it, we cannot correctly compute the common
dominator of the predecessors of `handler`, and end up placing no
`block`/`end` markers. This bug actually sometimes masks bug 1.
Fix:
When we have an EH pad that does not have any predecessors after this
transformation, we delete all its successors, so that its successors don't
have any dangling predecessors.
---
3. Background:
Actually the `handler` BB in the example shown in bug 2 doesn't need an
`end_block` marker, despite it being a new branch destination, because
it already has an `end_try` marker which can serve the same purpose. I just
put that example there for illustration. There is a case in which we
actually need to place an `end_block` marker: when the branch dest is the
appendix BB. The appendix BB is created when a call that is
supposed to unwind to the caller ends up unwinding to a wrong EH pad. In
this case we also wrap the call with a nested `try`/`catch`/`end`,
create an 'appendix' BB at the very end of the function, and branch to
that BB, where we rethrow the exception to the caller.
Fix:
When we don't actually need to place block markers, we don't.
---
4. In case we fall through to the continuation BB after the catch block,
after extracting handler code in `fixUnwindMismatches` (refer to bug 2
for an example), we now have to add a branch to it to bypass the
handler.
- Before
```
try
...
(falls through to 'cont')
catch
handler body
end
<-- cont
```
- After
```
try
...
br %cont (new)
catch
end
handler body
<-- cont
```
The problem is, we haven't been placing a new `end_block` marker in the
`cont` BB in this case. We should, and this fixes it. But it is hard to
provide a test case that triggers this bug, because the current
compilation pipeline from .ll to .s does not generate this kind of code;
we always have a `br` after `invoke`. But code without `br` is still
valid, and we can have that kind of code if we have some pipeline
changes or optimizations later. Even mir test cases cannot trigger this
part for now, because we don't encode auxiliary EH-related data
structures (such as `WasmEHFuncInfo`) in mir now. Those functionalities
can be added later, but I don't think we should block this fix on that.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79324
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using a pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and were falling
back to a byte load + shift + or sequence.
This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999