llvm-project

Commit Graph

Author	SHA1	Message	Date
Heejin Ahn	d25c17f317	[WebAssembly] Fix fixEndsAtEndOfFunction for try-catch When the function return type is non-void and `end` instructions are at the very end of a function, CFGStackify's `fixEndsAtEndOfFunction` function fixes the corresponding block/loop/try's type to match the function's return type. This is applied to consecutive `end` markers at the end of a function. For example, when the function return type is `i32`, ``` block i32 ;; return type is fixed to i32 ... loop i32 ;; return type is fixed to i32 ... end_loop end_block end_function ``` But try-catch is a little different, because it consists of two parts: a try part and a catch part, and both parts' return type should satisfy the function's return type. Which means, ``` try i32 ;; return type is fixed to i32 ... block i32 ;; this should be changed to i32 too! ... end_block catch ... end_try end_function ``` As you can see in this example, it is not sufficient to only check `end` instructions at the end of a function; in case of `try`, we should also check the instructions before `catch`es, in case their corresponding `try`'s type has been fixed. This changes `fixEndsAtEndOfFunction`'s algorithm to use a worklist of reverse iterators, each of which is a starting point for a new backward `end` instruction search. Fixes https://bugs.llvm.org/show_bug.cgi?id=47413. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D87207	2020-09-08 09:27:40 -07:00
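The worklist-based backward search described in d25c17f317 can be illustrated with a toy model. This is only a sketch, not the CFGStackify code: the instruction representation (a list of `[opcode, type]` pairs), the name `fix_ends_at_end_of_function`, and the pre-computed `begin_of`/`catch_of` maps are assumptions made for illustration, and `end_function` is left out of the modelled body.

```python
END_MARKERS = {"end_block", "end_loop", "end_try"}

def fix_ends_at_end_of_function(instrs, ret_type):
    """Toy model: instrs is a list of [opcode, type] lists, mutated in place."""
    # Match every end marker with its opening marker and, for a try, its catch.
    begin_of, catch_of, stack = {}, {}, []
    for i, (op, _) in enumerate(instrs):
        if op in ("block", "loop", "try"):
            stack.append([i, None])
        elif op == "catch":
            stack[-1][1] = i
        elif op in END_MARKERS:
            begin, catch = stack.pop()
            begin_of[i] = begin
            if catch is not None:
                catch_of[i] = catch

    # Each worklist entry is the starting index of one backward search.
    worklist = [len(instrs) - 1]
    while worklist:
        i = worklist.pop()
        # Walk backward over consecutive end markers only.
        while i >= 0 and instrs[i][0] in END_MARKERS:
            instrs[begin_of[i]][1] = ret_type
            if instrs[i][0] == "end_try" and i in catch_of:
                # The try part ends right before the catch: search from there too.
                worklist.append(catch_of[i] - 1)
            i -= 1
    return instrs

# The try-catch example from the commit message, with "..." modelled as nop.
body = [
    ["try", "void"], ["block", "void"], ["nop", None], ["end_block", None],
    ["catch", None], ["nop", None], ["end_try", None],
]
fix_ends_at_end_of_function(body, "i32")
```

In this model both the `try` and the inner `block` end up typed `i32`, because the search restarts just before the `catch`.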
Thomas Lively	caee15a0ed	[WebAssembly] Fix incorrect assumption of simple value types Fixes PR47375, in which an assertion was triggering because WebAssemblyTargetLowering::isVectorLoadExtDesirable was improperly assuming the use of simple value types. Differential Revision: https://reviews.llvm.org/D87110	2020-09-06 15:42:21 -07:00
Jay Foad	b7e3599a22	[SelectionDAG] Handle non-power-of-2 bitwidths in expandROT Differential Revision: https://reviews.llvm.org/D86449	2020-08-26 09:20:46 +01:00
Thomas Lively	cc612c2908	[WebAssembly] Fix FastISel address calculation bug Fixes PR47040, in which an assertion was improperly triggered during FastISel's address computation. The issue was that an `Address` set to be relative to the FrameIndex with offset zero was incorrectly considered to have an unset base. When the left hand side of an add set the Address to be 0 off the FrameIndex, the right side would not detect that the Address base had already been set and could try to set the Address to be relative to a register instead, triggering an assertion. This patch fixes the issue by explicitly tracking whether an `Address` has been set rather than interpreting an offset of zero to mean the `Address` has not been set. Differential Revision: https://reviews.llvm.org/D85581	2020-08-08 15:23:11 -07:00
Thomas Lively	cb32792210	[WebAssembly] Implement prototype v128.load{32,64}_zero instructions Specified in https://github.com/WebAssembly/simd/pull/237, these instructions load the first vector lane from memory and zero the other lanes. Since these instructions are not officially part of the SIMD proposal, they are only available on an opt-in basis via LLVM intrinsics and clang builtin functions. If these instructions are merged to the proposal, this implementation will change so that the instructions will be generated from normal IR. At that point the intrinsics and builtin functions would be removed. This PR also changes the opcodes for the experimental f32x4.qfm{a,s} instructions because their opcodes conflicted with those of the v128.load{32,64}_zero instructions. The new opcodes were chosen to match those used in V8. Differential Revision: https://reviews.llvm.org/D84820	2020-08-03 13:54:00 -07:00
Wouter van Oortmerssen	ce1eb7af9d	[WebAssembly] Fixed 64-bit indices in br_table LLVM selection dag assumes "switch" indices are pointer sized, which causes problems for our 32-bit br_table. The new function ensures 32-bit operands don't get unnecessarily extended, and 64-bit operands get truncated. Note that the changes to the existing test test exactly that: the addition of -NEXT in 2 places ensures no extension is inserted (which the test previously ignored) and that the wrap is present (previously omitted in wasm64 mode). Differential Revision: https://reviews.llvm.org/D84705	2020-07-30 10:52:16 -07:00
Heejin Ahn	276f9e8cfa	[WebAssembly] Fix getBottom for loops When it was first created, CFGSort only made sure BBs in each `MachineLoop` are sorted together. After we added exception support, CFGSort now also sorts BBs in each `WebAssemblyException`, which represents a `catch` block, together, and the `Region` class was introduced to be a thin wrapper for both `MachineLoop` and `WebAssemblyException`. But how we compute those loops and exceptions is different. `MachineLoopInfo` is constructed using the standard loop computation algorithm in LLVM; the definition of a loop is "a set of BBs that are dominated by a loop header and have a path back to the loop header". So even if some BBs are semantically contained by a loop in the original program, or in other words dominated by a loop header, if they don't have a path back to the loop header, they are not considered a part of the loop. For example, if a BB is dominated by a loop header but contains `call abort()` or `rethrow`, it wouldn't have a path back to the header, so it is not included in the loop. But `WebAssemblyException` is a wasm-specific data structure, and its algorithm is simple: a `WebAssemblyException` consists of an EH pad and all BBs dominated by the EH pad. So this scenario is possible: (This is also the situation in the newly added test in cfg-stackify-eh.ll) ``` Loop L: header, A, ehpad, latch Exception E: ehpad, latch, B ``` (B contains `abort()`, so it does not have a path back to the loop header, so it is not included in L.) And it is sorted in this order: ``` header A ehpad latch B ``` And when CFGStackify places `end_loop` or `end_try` markers, it previously used `WebAssembly::getBottom()`, which returns the latest BB in the sorted order, and placed the marker there. So in this case the marker placements will be like this: ``` loop header try A catch ehpad latch end_loop <-- misplaced! B end_try ``` in which nesting between the loop and the exception is not correct. 
The `end_loop` marker has to be placed after `B`, and also after `end_try`. Maybe the fundamental way to solve this problem is to come up with our own algorithm for computing loop regions too, in which we include all BBs dominated by a loop header in a loop. But this takes a lot more effort. The only thing we actually need to fix is `getBottom()`. If we make it return the right BB, which means, in the case of a loop, the latest BB of the loop itself and all exceptions contained in it, we are good. This renames `Region` and `RegionInfo` to `SortRegion` and `SortRegionInfo` and extracts them into their own file. It also adds `getBottom` to the `SortRegionInfo` class, from which it can access `WebAssemblyExceptionInfo`, so that it can compute a correct bottom block for loops. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D84724	2020-07-29 10:36:32 -07:00
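The corrected bottom-block computation from 276f9e8cfa can be sketched on the scenario given in that message. This is a simplified model rather than the `SortRegionInfo` implementation: the function name `get_bottom`, the dictionary of exceptions keyed by EH pad, and the rule "an exception is contained in the loop if its EH pad is in the loop" are assumptions for illustration.

```python
def get_bottom(loop_blocks, exceptions, order):
    """Return the last block, in sorted order, of a loop and the
    exceptions it contains (toy model)."""
    pos = {bb: i for i, bb in enumerate(order)}
    bottom = max(loop_blocks, key=pos.__getitem__)
    for eh_pad, eh_blocks in exceptions.items():
        # Assumption: an exception belongs to the loop if its EH pad does.
        if eh_pad in loop_blocks:
            last = max(eh_blocks, key=pos.__getitem__)
            if pos[last] > pos[bottom]:
                bottom = last
    return bottom

# Scenario from the commit message.
order = ["header", "A", "ehpad", "latch", "B"]
loop_l = {"header", "A", "ehpad", "latch"}
exceptions = {"ehpad": {"ehpad", "latch", "B"}}

old_bottom = get_bottom(loop_l, {}, order)          # ignores exceptions
new_bottom = get_bottom(loop_l, exceptions, order)  # accounts for E
```

Ignoring exceptions yields `latch`, which corresponds to the misplaced `end_loop`; taking `E` into account yields `B`, so `end_loop` lands after `end_try`.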
Thomas Lively	11bb7eef41	[WebAssembly] Remove intrinsics for SIMD widening ops Instead, pattern match extends of extract_subvectors to generate widening operations. Since extract_subvector is not a legal node, this is implemented via a custom combine that recognizes extract_subvector nodes before they are legalized. The combine produces custom ISD nodes that are later pattern matched directly, just like the intrinsic was. Also removes the clang builtins for these operations since the instructions can now be generated from portable code sequences. Differential Revision: https://reviews.llvm.org/D84556	2020-07-28 18:25:55 -07:00
Thomas Lively	ffd8c23ccb	[WebAssembly] Implement truncating vector stores Rather than expanding truncating stores so that vectors are stored one lane at a time, lower them to a sequence of instructions using narrowing operations instead, when possible. Since the narrowing operations have saturating semantics, but truncating stores require truncation, mask the stored value to manually truncate it before narrowing. Also, since narrowing is a binary operation, pass in the original vector as the unused second argument. Differential Revision: https://reviews.llvm.org/D84377	2020-07-28 17:46:45 -07:00
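The reason ffd8c23ccb masks before narrowing can be checked per lane. The sketch below assumes an unsigned saturating narrow from 16-bit to 8-bit lanes (modelled on `i8x16.narrow_i16x8_u`, which reads its input as signed); the commit message does not say which narrowing variants are used, and the helper names are hypothetical.

```python
def narrow_i16_to_u8_sat(lane):
    """Saturating narrow: read the 16-bit lane as signed, clamp to 0..255."""
    signed = lane - 0x10000 if lane & 0x8000 else lane
    return max(0, min(255, signed))

def truncating_store_lane(lane):
    """Mask first so that saturation can never trigger, then narrow."""
    return narrow_i16_to_u8_sat(lane & 0x00FF)

# Without the mask, narrowing saturates instead of truncating.
saturated = narrow_i16_to_u8_sat(0x1234)   # clamps to 255
truncated = truncating_store_lane(0x1234)  # keeps the low byte, 0x34
```

After masking, every lane is already in the range 0..255, so the saturating narrow behaves exactly like truncation.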
Thomas Lively	a459459248	[WebAssembly] Fix store_unfolded_offset tests in simd-offset.ll These tests were previously duplicates of the unfolded_gep_negative_offset tests, and this change updates them to test what they were meant to test. Differential Revision: https://reviews.llvm.org/D84365	2020-07-23 16:05:20 -07:00
Thomas Lively	51cd326f99	[WebAssembly] Autogenerate checks in simd-offset.ll Implementing new functionality tested in this file requires adding new tests for many IR addressing patterns, which can be a large maintenance burden. This patch makes adding tests easier by switching to using autogenerated checks. This patch also removes the testing mode that has simd128 disabled because it would produce very large checks and is not particularly interesting. Differential Revision: https://reviews.llvm.org/D84288	2020-07-22 10:12:26 -07:00
Wouter van Oortmerssen	cc1b9b680f	[WebAssembly] 64-bit (function) pointer fixes. Accounting for the fact that Wasm function indices are 32-bit, but in wasm64 we want uniform 64-bit pointers. Includes reloc types for 64-bit table indices. Differential Revision: https://reviews.llvm.org/D83729	2020-07-16 14:10:22 -07:00
Thomas Lively	ecb2e5bcd7	[WebAssembly] Implement v128.select Although the SIMD spec proposal does not specifically include a select instruction, the select instruction in MVP WebAssembly is polymorphic over the selected types, so it is able to work on v128 values when they are enabled. This patch introduces a new variant of the select instruction for each legal vector type. Additional ISel patterns are adapted from the SELECT_I32 and SELECT_I64 patterns. Depends on D83736. Differential Revision: https://reviews.llvm.org/D83737	2020-07-16 11:37:25 -07:00
Thomas Lively	f7868f87ac	[WebAssembly] Autogenerate tests for simd-select.ll Updating the simd-select.ll tests manually with consistent named regexps for the register numbers was taking more time than it was worth, so this patch updates that test file to have autogenerated output. This is not a significant readability regression because the tests in that file are all very small. Depends on D83734. Differential Revision: https://reviews.llvm.org/D83736	2020-07-16 11:19:09 -07:00
Thomas Lively	f0f9787646	[WebAssembly] Lower vselect to v128.bitselect We were previously expanding vselect and matching on the expansion to generate bitselects, but in some cases the expansion would be further combined and a bitselect would not get generated. This patch improves codegen in those cases by legalizing vselect and lowering it to v128.bitselect. The old pattern that matches the expansion is still useful for lowering IR that already uses the expansion rather than a select operation. Differential Revision: https://reviews.llvm.org/D83734	2020-07-16 11:11:19 -07:00
Roger Ferrer Ibanez	14bc5e149d	[DAGCombiner] Rebuild (setcc x, y, ==) from (xor (xor x, y), 1) The existing code already considered this case. Unfortunately a typo in the condition prevents it from triggering. Also the existing code, had it run, forgot to do the folding. This fixes PR42876. Differential Revision: https://reviews.llvm.org/D65802	2020-07-15 07:34:22 +00:00
Thomas Lively	b59c6fcaf3	[WebAssembly] Prefer v128.const for constant splats In BUILD_VECTOR lowering, we used to generally prefer using splats over v128.const instructions because v128.const has a very large encoding. However, in `d5b7a4e2e8` we switched to preferring consts because they are expected to be more efficient in engines. This patch updates the ISel patterns to match this current preference. Differential Revision: https://reviews.llvm.org/D83581	2020-07-10 18:27:52 -07:00
Thomas Lively	043eaa9a4a	[WebAssembly][NFC] Simplify vector shift lowering and add tests This patch builds on `0d7286a652` by simplifying the code for detecting splat values and adding new tests demonstrating the lowering of splatted absolute value shift amounts, which are common in code generated by Halide. The lowering is very bad right now, but subsequent patches will improve it considerably. The tests will be useful for evaluating the improvements in those patches. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D83493	2020-07-10 00:18:59 -07:00
Heejin Ahn	7e6793aa33	[WebAssembly] Generate unreachable after __stack_chk_fail `__stack_chk_fail` does not return, but `unreachable` was not generated following `call __stack_chk_fail`. This could generate an invalid binary for functions with a return type, because `__stack_chk_fail`'s return type is void and `call __stack_chk_fail` can be the last instruction in a function whose return type is non-void. Generating `unreachable` after it makes sure CFGStackify's `fixEndsAtEndOfFunction` handles it correctly. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D83277	2020-07-08 01:02:05 -07:00
Thomas Lively	0d7286a652	[WebAssembly] Avoid scalarizing vector shifts in more cases Since WebAssembly's vector shift instructions take a scalar shift amount rather than a vector shift amount, we have to check in ISel that the vector shift amount is a splat. Previously, we were checking explicitly for splat BUILD_VECTOR nodes, but this change uses the standard utilities for detecting splat values that can handle more complex splat patterns. Since the C++ ISel lowering is now more general than the ISel patterns, this change also simplifies shift lowering by using the C++ lowering for all SIMD shifts rather than mixing C++ and normal pattern-based lowering. This change improves ISel for shifts to the point that the simd-shift-unroll.ll regression test no longer tests the code path it was originally meant to test. The bug corresponding to that regression test is no longer reproducible with its original reported reproducer, so rather than try to fix the regression test, this change just removes it. Differential Revision: https://reviews.llvm.org/D83278	2020-07-07 10:45:26 -07:00
Wouter van Oortmerssen	16d83c395a	[WebAssembly] Added 64-bit memory.grow/size/copy/fill This covers both the existing memory functions as well as the new bulk memory proposal. Added new test files since changes were also required in the inputs. Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit. Differential Revision: https://reviews.llvm.org/D82821	2020-07-06 12:49:50 -07:00
Thomas Lively	8df30d988e	[WebAssembly] Do not omit range checks for i64 switches Summary: Since the br_table instruction takes an i32, switches over i64s (and larger integers) must use the i32.wrap_i64 instruction to truncate the table index. This truncation makes numbers just over 2^32 indistinguishable from small numbers, so it was a miscompilation to omit the range check preceding these br_tables. This change fixes the problem by skipping the "fixing" of the br_table when the range check is an i64 instruction. Fixes PR46447. Reviewers: aheejin, dschuff, kripken Reviewed By: kripken Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83017	2020-07-03 17:15:39 -07:00
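The miscompilation fixed in 8df30d988e comes down to simple arithmetic on the table index. The following sketch models it with unsigned Python integers; the helper names and the `keep_range_check` flag are hypothetical and only stand in for whether the range check preceding the `br_table` was kept.

```python
def i32_wrap_i64(x):
    """Keep only the low 32 bits, as i32.wrap_i64 does."""
    return x & 0xFFFFFFFF

def br_table(index, targets, default):
    """br_table branches to the default target for out-of-range indices."""
    return targets[index] if index < len(targets) else default

def lower_switch_i64(x, targets, default, keep_range_check):
    # The range check compares the full 64-bit value (unsigned model).
    if keep_range_check and x >= len(targets):
        return default
    return br_table(i32_wrap_i64(x), targets, default)

targets = ["case0", "case1", "case2"]
x = 2**32 + 1  # just over 2^32; wraps to 1

wrong = lower_switch_i64(x, targets, "default", keep_range_check=False)
right = lower_switch_i64(x, targets, "default", keep_range_check=True)
```

With the check omitted, the wrapped index selects `case1`; with the check kept, the value correctly reaches the default target.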
Qiu Chaofan	aa4fd7d848	[NFC] Fix typo in triples from unkown to unknown	2020-07-02 16:21:54 +08:00
Wouter van Oortmerssen	b9a539c010	[WebAssembly] Adding 64-bit versions of __stack_pointer and other globals We have 6 globals, all of which except for __table_base are 64-bit under wasm64. Differential Revision: https://reviews.llvm.org/D82130	2020-06-25 15:52:44 -07:00
Eric Christopher	cf23852587	[Target] As part of using inclusive language within the llvm project, migrate away from the use of blacklist and whitelist. This change affects an internal llvm command line option.	2020-06-20 00:06:39 -07:00
Heejin Ahn	83c26eae23	[WebAssembly] Remove TEEs when dests are unstackified When created in RegStackify pass, `TEE` has two destinations, where op0 is stackified and op1 is not. But it is possible that op0 becomes unstackified in `fixUnwindMismatches` function in CFGStackify pass when a nested try-catch-end is introduced, violating the invariant of `TEE`s destinations. In this case we convert the `TEE` into two `COPY`s, which will eventually be resolved in ExplicitLocals. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D81851	2020-06-19 14:55:21 -07:00
Sam Clegg	7ee758d691	[WebAssembly] MC: Fix for data aliases with offsets (getelementptr) For some reason we hadn't seen such cases in the wild, which makes me think that clang and rustc don't generate these. In the bug report that reproduces it, this only occurs with LTO, so my guess is that some LTO pass is creating this alias + gep. See: https://github.com/emscripten-core/emscripten/issues/8731 Differential Revision: https://reviews.llvm.org/D79462	2020-06-17 16:25:50 -07:00
Thomas Lively	49754dcf22	[WebAssembly] Fix bug in FixBrTables and use branch analysis utils Summary: This commit fixes a bug in the FixBrTables pass in which an unconditional branch from the switch header block to the jump table block was not removed before the blocks were combined. The result was an invalid CFG in the MachineFunction. This commit also switches from using bespoke branch analysis and deletion code to using the standard utilities for the same. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81909	2020-06-17 12:34:45 -07:00
Wouter van Oortmerssen	d9e0bbd17b	[WebAssembly] Adding 64-bit versions of all load & store ops. Context: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md This is just a first step, adding the new instruction variants while keeping the existing 32-bit functionality working. Some of the basic load/store tests have new wasm64 versions that show that the basics of the target are working. Further features need implementation, but these will be added in followups to keep things reviewable. Differential Revision: https://reviews.llvm.org/D80769	2020-06-15 08:31:56 -07:00
Thomas Lively	c5d012341e	[WebAssembly] Make BR_TABLE non-duplicable Summary: After their range checks were removed in `7f50c15be5`, br_tables started being duplicated into their predecessors by tail folding. Unfortunately, when the br_tables were in loops this transformation introduced bad irreducible control flow which was later expanded into even more br_tables. This commit abuses the `isNotDuplicable` property to prevent this irreducible control flow from being introduced. This change saves a few dozen bytes of code size and has a negligible effect on performance for most of the large Emscripten benchmarks, but can improve performance significantly on microbenchmarks of switches in loops. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81628	2020-06-11 15:11:45 -07:00
Thomas Lively	a96414527c	[NFC][WebAssembly] Add tests for alignment on new SIMD loads Summary: The natural alignments for extending and splatting loads had not previously been tested. It is good to have them tested because they are non-obvious details in the SIMD spec proposal. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81303	2020-06-09 13:46:12 -07:00
Thomas Lively	b7d369280b	[WebAssembly] Implement prototype SIMD rounding instructions Summary: As specified in https://github.com/WebAssembly/simd/pull/232. These instructions are implemented as LLVM intrinsics for now rather than normal ISel patterns to make these instructions opt-in. Once the instructions are merged to the spec proposal, the intrinsics will be replaced with proper ISel patterns. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D81222	2020-06-09 10:14:14 -07:00
James Y Knight	748d92b4d3	Simplify MachineVerifier's block-successor verification. There are two properties we want to verify: 1. That the successors returned by analyzeBranch are in the CFG successor list, and 2. That there are no extraneous successors in the CFG successor list. The previous implementation mostly accomplished this, but in a very convoluted manner. Differential Revision: https://reviews.llvm.org/D79793	2020-06-06 22:30:51 -04:00
Thomas Lively	a07c08f74f	[WebAssembly] Lower llvm.debugtrap properly Summary: Unlike normal traps, debug traps are allowed to return and can have additional instructions in the same basic block. Without explicit backend support for debug traps, they are lowered in ISel as normal traps. Since normal traps are lowered in the WebAssembly backend to the UNREACHABLE instruction, which is a terminator, using debug traps could lead to invalid MBBs when there are additional instructions after the trap. This patch fixes the issue by lowering debug traps to a new version of the UNREACHABLE instruction, DEBUG_UNREACHABLE, that is not a terminator. An alternative approach would have been to make UNREACHABLE not a terminator, but that breaks a large number of tests. In particular, it would require removing the traps inserted after noreturn calls to @llvm.wasm.throw because otherwise the terminator throw would be followed by a non-terminator UNREACHABLE and we would be back to having invalid MBBs. Overall the approach in this patch seems simpler. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81055	2020-06-04 13:25:10 -07:00
Thomas Lively	25af2126f9	[WebAssembly] Fix ISel crash in SIGN_EXTEND_INREG lowering Summary: The code previously assumed that the index of a vector extract was constant, but this was not always true. This patch fixes the problem by bailing out of the lowering if the index is nonconstant and also replaces `static_cast`s in the lowering function with `cast`s because the latter contain type-checking asserts that would make similar issues easier to find and debug. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81025	2020-06-03 15:36:44 -07:00
Thomas Lively	7f50c15be5	Reland "[WebAssembly] Eliminate range checks on br_tables" This reverts commit `755a895915`. Although I was not able to reproduce any test failures locally, aheejin was able to reproduce them and found a fix, applied here.	2020-06-03 14:04:59 -07:00
Thomas Lively	755a895915	Revert "[WebAssembly] Eliminate range checks on br_tables" This reverts commit `f99d5f8c32`. The change was causing UBSan and other failures on some bots.	2020-06-03 01:26:53 -07:00
Thomas Lively	f99d5f8c32	[WebAssembly] Eliminate range checks on br_tables Summary: Jump tables for most targets cannot handle out of range indices by themselves, so LLVM emits range checks to guard the jump tables. WebAssembly, on the other hand, implements jump tables using the br_table instruction, which takes a default branch target as an operand, making the range checks redundant. This patch introduces a new MachineFunction pass in the WebAssembly backend to find and eliminate the redundant range checks. Reviewers: aheejin, dschuff Subscribers: mgorny, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80863	2020-06-02 13:14:27 -07:00
Heejin Ahn	4cd3f4b31b	[WebAssembly] Fix a bug in finding matching EH pad Summary: `getMatchingEHPad()` in LateEHPrepare is a function to find the nearest EH pad that dominates the given instruction. This is intended to be lightweight, so it does not use full WebAssemblyException scope analysis or dominator analysis. It simply does backward BFS to its predecessors and stops at the first EH pad each search path encounters. All searches should end up at the same EH pad, and if not, it returns null. But it didn't take into account that when there are inner scopes within the current scope, some path in BFS can hit an inner EH pad first. For example, in the given diagram, `Inst` belongs to the outer scope and `getMatchingEHPad()` should return 'EHPad 1', but some search path can go into the inner scope and end up with 'EHPad 2'. The search will return null because different paths end up with different EH pads. ``` --- EHPad 1 --- \| - EHPad 2 - \| \| \| \| \| \| ----------- \| \| Inst \| --------------- ``` So far this was OK because we haven't tested a case in which a given instruction is far from its EH pad. Also, this bug does not happen when the inner EH scope is a cleanup scope, because a cleanup scope ends with a `cleanupret` whose successor is an EH pad, so the search encounters that EH pad first before going into the child scope. But this can happen when the child scope is a catch scope that ends with `catchret`. So this patch, when doing backward BFS, does not search predecessors that end with `catchret`. Because `catchret`s are replaced with `br`s during this pass, this records BBs that have `catchret`s in the beginning, before doing any other transformations. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80571	2020-05-28 19:46:11 -07:00
Heejin Ahn	3fe6ea4641	[WebAssembly] Fix a bug in removing unnecessary branches Summary: One of the things `removeUnnecessaryInstrs()` in CFGStackify does is to remove an unnecessary unconditional branch before an EH pad. When there is an unconditional branch right before a catch instruction and it branches to the end of `end_try` marker, we don't need the branch, because if there is no exception, the control flow transfers to that point anyway. ``` bb0: try ... br bb2 <- Not necessary bb1: catch ... bb2: end ``` This applies when we have a conditional branch followed by an unconditional one, in which case we should only remove the unconditional branch. For example: ``` bb0: try ... br_if someplace_else br bb2 <- Not necessary bb1: catch ... bb2: end ``` But the `TargetInstrInfo::removeBranch` we used removed all existing branches when there were multiple ones. This patch fixes it by only deleting the last (= unconditional) branch manually. Also fixes some `preds` comments in the test file. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80572	2020-05-28 19:44:27 -07:00
Thomas Lively	8a43d41a40	[WebAssembly] Fix bug in custom shuffle combine Summary: The code previously assumed the source of the bitcast in the combined pattern was a vector type, but this is not always true. This patch adds a check to avoid an assertion failure in that case. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80164	2020-05-19 12:54:15 -07:00
Thomas Lively	3181273be7	[WebAssembly] Implement i64x2.mul and remove i8x16.mul Summary: This reflects changes in the spec proposal made since basic arithmetic was first implemented. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D80174	2020-05-19 12:50:44 -07:00
Thomas Lively	40af48101b	[WebAssembly] Optimize splats of bitcasted vectors Summary: This new custom DAG combine fixes a codegen issue with the wasm_simd128.h intrinsics. Clang lowers the return (v128_t)(__f32x4){__a, __a, __a, __a}; body of f32x4_splat to a splat shuffle of a bitcasted vector, as seen in the new simd-shuffle-bitcast.ll test. The bitcast interfered with the target-independent DAG combine that combines splat shuffles into BUILD_VECTOR nodes, so this patch introduces a new custom DAG combine to hoist the bitcast out of the shuffle, allowing the target-independent combine to work as intended. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80021	2020-05-15 12:12:20 -07:00
Thomas Lively	c702d4bf41	[WebAssembly] Update latest implemented SIMD instructions Summary: Move instructions that have recently been implemented in V8 from the `unimplemented-simd128` target feature to the `simd128` target feature. The updated instructions match the update at https://github.com/WebAssembly/simd/pull/223. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D79973	2020-05-15 10:53:02 -07:00
Wouter van Oortmerssen	2b7fe0863a	[WebAssembly] Added Debug Fixup pass This pass changes debug_value instructions referring to stackified registers into TI_OPERAND_STACK with correct stack depth.	2020-05-14 13:14:45 -07:00
Thomas Lively	3d49d1cfa7	[WebAssembly] Implement pseudo-min/max SIMD instructions Summary: As proposed in https://github.com/WebAssembly/simd/pull/122. Since these instructions are not yet merged to the SIMD spec proposal, this patch makes them entirely opt-in by surfacing them only through LLVM intrinsics and clang builtins. If these instructions are made official, these intrinsics and builtins should be replaced with simple instruction patterns. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D79742	2020-05-12 09:39:01 -07:00
Thomas Lively	8e3e56f2a3	[WebAssembly] Add wasm-specific vector shuffle builtin and intrinsic Summary: Although using `__builtin_shufflevector` and the `shufflevector` instruction works fine, they are not opaque to the optimizer. As a result, DAGCombine can potentially reduce the number of shuffles and change the shuffle masks. This is unexpected behavior for users of the WebAssembly SIMD intrinsics who have crafted their shuffles to optimize the code generated by engines. This patch solves the problem by adding a new shuffle intrinsic that is opaque to the optimizers in line with the decision of the WebAssembly SIMD contributors at https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In the future we may implement custom DAG combines to properly optimize shuffles and replace this solution. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D66983	2020-05-11 10:01:55 -07:00
Thomas Lively	a1ae9566ea	[WebAssembly] Disallow 'shared-mem' rather than 'atomics' Summary: The WebAssembly backend automatically lowers atomic operations and TLS to nonatomic operations and non-TLS data when either are present and the atomics or bulk-memory features are not present, respectively. The resulting object is no longer thread-safe, so the linker has to be told not to allow it to be linked into a module with shared memory. This was previously done by disallowing the 'atomics' feature, which prevented any object with its atomic operations or TLS removed from being linked with any object containing atomics or TLS, and therefore prevented it from being linked into a module with shared memory, since shared memory requires atomics. However, as of https://github.com/WebAssembly/threads/issues/144, the validation rules are relaxed to allow atomic operations to validate with unshared memories, which makes it perfectly safe to link an object with stripped atomics and TLS with another object that still contains TLS and atomics as long as the resulting module has an unshared memory. To allow this kind of link, this patch disallows a pseudo-feature 'shared-mem' rather than 'atomics' to communicate to the linker that the object is not thread-safe. This means that the 'atomics' feature is available to accurately reflect whether or not an object has atomics enabled. As a drive-by tweak, this change also requires that bulk-memory be enabled in addition to atomics in order to use shared memory. This is because initializing shared memories requires bulk-memory operations. Reviewers: aheejin, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79542	2020-05-08 13:52:39 -07:00
Heejin Ahn	834debfffd	[WebAssembly] Fix block marker placing after fixUnwindMismatches Summary: This fixes a few things that are connected. It is very hard to provide an independent test case for each of those fixes, because they are interconnected and sometimes one masks another. The provided test case triggers some of those bugs below but not all.

---

1. Background: `placeBlockMarker` takes a BB, and if the BB is a destination of some branch, it places an `end_block` marker there, computes the nearest common dominator of all predecessors (what we call the 'header'), and places a `block` marker there. When we first place markers, we traverse BBs from top to bottom. For example, when there are 5 BBs A, B, C, D, and E and B, D, and E are branch destinations, if we mark the BB given to `placeBlockMarker` with `*` and draw a rectangle representing the border of `block` and `end_block` markers, the process is going to look like
```
                      -------
            -----     |-----|
  ---       |---|     ||---||
  |A|       ||A||     |||A|||
  ---  -->  |---| --> ||---||
  *B        | B |     || B ||
   C        | C |     || C ||
   D        -----     |-----|
   E         *D       |  D  |
              E       -------
                        *E
```
which means when we first place markers, we go from inner to outer scopes. So when we place a `block` marker, if the header already contains other `block` or `try` markers, they have to belong to an inner scope, so the existing `block`/`try` markers should go _after_ the new marker. This was the assumption we had.

But after placing all markers we run the `fixUnwindMismatches` function. There we do some control flow transformation and create some branches, and we call `placeBlockMarker` again to place `block`/`end_block` markers for those newly created branches. We can't assume that we are traversing branch destination BBs from top to bottom now because we are basically inserting some new markers in the middle of existing markers.
Fix: In `placeBlockMarker`, we no longer assume that the BBs given are in top-to-bottom order, and when placing `block` markers, we calculate whether existing `block` or `try` markers are inner or outer scopes with respect to the current scope.

---

2. Background: In `fixUnwindMismatches`, when there is a call whose correct unwind destination mismatches the current destination after initially placing `try` markers, we wrap that call with a new nested `try`/`catch`/`end` and jump to the correct handler within the new `catch`. The correct handler code is split as a separate BB from its original EH pad so it can be branched to. Here's an example:

- Before
```
mbb:
  call @foo       <- Unwind destination mismatch!
wrong-ehpad:
  catch
  ...
cont:
  end_try
  ...
correct-ehpad:
  catch
  [handler code]
```

- After
```
mbb:
  try                (new)
  call @foo
nested-ehpad:        (new)
  catch              (new)
  local.set n / drop (new)
  br %handler        (new)
nested-end:          (new)
  end_try            (new)
wrong-ehpad:
  catch
  ...
cont:
  end_try
  ...
correct-ehpad:
  catch
  local.set n / drop (new)
handler:             (new)
  end_try
  [handler code]
```

Note that after this transformation, it is possible there are no calls to actually unwind to `correct-ehpad` here. `call @foo` now branches to `handler`, and there can be no other calls to unwind to `correct-ehpad`. In this case `correct-ehpad` does not have any predecessors anymore. This can cause a bug in `placeBlockMarker`, because we may need to place an `end_block` marker in `handler`, and `placeBlockMarker` computes the nearest common dominator of all predecessors. If one of `handler`'s predecessors (here `correct-ehpad`) has no predecessors itself, i.e., there is no way of reaching it, we cannot correctly compute the common dominator of the predecessors of `handler`, and we end up placing no `block`/`end` markers. This bug actually sometimes masks bug 1.
Fix: When we have an EH pad that does not have any predecessors after this transformation, we delete all its successors, so that its successors don't have any dangling predecessors.

---

3. Background: Actually the `handler` BB in the example shown in bug 2 doesn't need an `end_block` marker, despite it being a new branch destination, because it already has an `end_try` marker which can serve the same purpose. I just put that example there for illustration purposes. There is a case where we actually need to place an `end_block` marker: when the branch dest is the appendix BB. The appendix BB is created when a call that is supposed to unwind to the caller ends up unwinding to a wrong EH pad. In this case we also wrap the call with a nested `try`/`catch`/`end`, create an 'appendix' BB at the very end of the function, and branch to that BB, where we rethrow the exception to the caller.

Fix: When we don't actually need to place block markers, we don't.

---

4. In case we fall through to the continuation BB after the catch block, after extracting handler code in `fixUnwindMismatches` (refer to bug 2 for an example), we now have to add a branch to it to bypass the handler.

- Before
```
try
  ...
  (falls through to 'cont')
catch
  handler body
end
              <-- cont
```

- After
```
try
  ...
  br %cont    (new)
catch
end
handler body
              <-- cont
```

The problem is, we haven't been placing a new `end_block` marker in the `cont` BB in this case. We should, and this fixes it. But it is hard to provide a test case that triggers this bug, because the current compilation pipeline from .ll to .s does not generate this kind of code; we always have a `br` after `invoke`. But code without `br` is still valid, and we can have that kind of code if we have some pipeline changes or optimizations later. Even mir test cases cannot trigger this part for now, because we don't encode auxiliary EH-related data structures (such as `WasmEHFuncInfo`) in mir now.
Those functionalities can be added later, but I don't think we should block this fix on that. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79324	2020-05-05 02:06:47 -07:00
Alex Richardson	d1ff003fbb	[SelectionDAGBuilder] Stop setting alignment to one for hidden sret values We allocated a suitably aligned frame index so we know that all the values have ABI alignment. For MIPS this avoids using a pair of lwl + lwr instructions instead of a single lw. I found this when compiling CHERI pure capability code where we can't use the lwl/lwr unaligned loads/stores and were falling back to a byte load + shift + or sequence. This should save a few instructions for MIPS and possibly other backends that don't have fast unaligned loads/stores. It also improves code generation for CodeGen/X86/pr34653.ll and CodeGen/WebAssembly/offset.ll since they can now use aligned loads. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D78999	2020-05-04 14:44:39 +01:00
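
Note on item 2 of commit 834debfffd: the interaction between an unreachable EH pad and the nearest-common-dominator computation can be illustrated outside LLVM. The following is a minimal Python sketch under stated assumptions, not LLVM's implementation. The helper names (`predecessors`, `dominators`, `nearest_common_dominator`, `drop_edges_from_dead_pads`) and the small CFG are hypothetical; the block names are borrowed from the commit's example. Returning `None` stands in for "cannot compute a common dominator"; how LLVM itself behaves in that situation is not modeled here.

```python
def predecessors(cfg, block):
    """All blocks with an edge to `block`, reachable or not."""
    return [p for p, succs in cfg.items() if block in succs]


def reachable(cfg, entry):
    """Set of blocks reachable from `entry`."""
    seen, stack = set(), [entry]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(cfg.get(n, []))
    return seen


def dominators(cfg, entry):
    """Iterative dataflow: dom(n) = {n} | intersection of dom(p) over preds p."""
    nodes = reachable(cfg, entry)
    preds = {n: [p for p in nodes if n in cfg.get(p, [])] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def nearest_common_dominator(cfg, entry, blocks):
    """Nearest block dominating every block in `blocks`, or None."""
    dom = dominators(cfg, entry)
    if not blocks or any(b not in dom for b in blocks):
        return None  # an unreachable block has no dominator information
    common = set.intersection(*(dom[b] for b in blocks))
    # Common dominators form a chain; the nearest has the most dominators.
    return max(common, key=lambda n: len(dom[n]))


def drop_edges_from_dead_pads(cfg, pads):
    """Sketch of the fix: a pad with no predecessors loses its successors."""
    for pad in pads:
        if not predecessors(cfg, pad):
            cfg[pad] = []


# Hypothetical CFG after the transformation: nothing unwinds to
# `correct-ehpad` anymore, but it still has an edge to `handler`.
cfg = {
    "mbb": ["nested-ehpad", "nested-end"],
    "nested-ehpad": ["handler"],
    "nested-end": ["cont"],
    "cont": [],
    "correct-ehpad": ["handler"],
    "handler": [],
}

before = nearest_common_dominator(cfg, "mbb", predecessors(cfg, "handler"))
drop_edges_from_dead_pads(cfg, ["correct-ehpad"])
after = nearest_common_dominator(cfg, "mbb", predecessors(cfg, "handler"))
```

In this sketch `before` is `None`, because `correct-ehpad` is a predecessor of `handler` but is itself unreachable, while `after` is `"nested-ehpad"` once the dangling edge is gone.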

731 Commits