llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Parker	913604a637	[NFC][ARM][ParallelDSP] Refactor narrow sequence Most of the code used for finding a 'narrow' sequence is not used, so I've removed it and simplified the calls from the smlad matcher. llvm-svn: 362104	2019-05-30 15:26:37 +00:00
Sam Parker	a33e311a3b	[ARM][ParallelDSP] Relax alias checks When deciding the safety of generating smlad, we checked for any writes within the block that may alias with any of the loads that need to be widened. This is overly conservative because it only matters when there's a potential aliasing write to a location accessed by a pair of loads. Now we check for aliasing writes only once, during setup. If two loads are found to have an aliasing write between them, we don't add these loads to LoadPairs. This means that later during the transform, we can safely widen a pair without worrying about aliasing. However, to maintain correctness, we also need to change the way that wide loads are inserted because the order is now important. The MatchSMLAD method has also been changed, absorbing MatchReductions and AddMACCandidate to hopefully improve readability. Differential Revision: https://reviews.llvm.org/D6102 llvm-svn: 360567	2019-05-13 09:23:32 +00:00
Sam Parker	9e73020bfa	[ARM][ParallelDSP] Disable for big-endian Bail early when we don't have a preheader and also if the target is big endian because it's written with only little endian in mind! Differential Revision: https://reviews.llvm.org/D59368 llvm-svn: 356243	2019-03-15 10:19:32 +00:00
Sam Parker	4c4ff13d3c	[ARM][ParallelDSP] Enable multiple uses of loads When choosing whether a pair of loads can be combined into a single wide load, we check that the load only has a sext user and that sext also only has one user. But this can prevent the transformation in the cases when parallel macs use the same loaded data multiple times. To enable this, we need to fix up any other uses after creating the wide load: generating a trunc and a shift + trunc pair to recreate the narrow values. We also need to keep a record of which loads have already been widened. Differential Revision: https://reviews.llvm.org/D59215 llvm-svn: 356132	2019-03-14 11:14:13 +00:00
James Y Knight	14359ef1b6	[opaque pointer types] Pass value type to LoadInst creation. This cleans up all LoadInst creation in LLVM to explicitly pass the value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57172 llvm-svn: 352911	2019-02-01 20:44:24 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Sam Parker	5338f7aae4	[ARM] Prevent parallel macs for unsigned values Both zext and sext are currently allowed during the search for narrow sequences and sext operands are later added to the mac candidates. But operands of muls are also added, without checking whether they're sext or zext, which means we can generate a signed smlad when we shouldn't. Differential Revision: https://reviews.llvm.org/D54790 llvm-svn: 347542	2018-11-26 10:22:55 +00:00
Sam Parker	453ba916a0	[ARM] Small reorganisation in ARMParallelDSP A few code movement things: - AreSymmetrical is now a method of BinOpChain. - Created a lambda in CreateParallelMACPairs to reduce loop nesting. - A Reduction object now gets passed in a couple of places instead, including CreateParallelMACPairs so it doesn't need to return a value. I've also added RecordSequentialLoads, which is run before the transformation begins, and caches the interesting loads. This can then be queried later instead of cross checking many load values. Differential Revision: https://reviews.llvm.org/D54254 llvm-svn: 346479	2018-11-09 09:18:00 +00:00
Eli Friedman	b09c778715	Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP") Still causing failures on the polly-aosp buildbot; I'll follow up with a reduced testcase. llvm-svn: 344752	2018-10-18 19:34:30 +00:00
Sam Parker	2ef3c0dad6	[ARM] bottom-top mul support in ARMParallelDSP Previously reverted in rL343082. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 344693	2018-10-17 13:02:48 +00:00
George Burgess IV	6ef8002c2c	Replace most users of UnknownSize with LocationSize::unknown(); NFC Moving away from UnknownSize is part of the effort to migrate us to LocationSizes (e.g. the cleanup promised in D44748). This doesn't entirely remove all of the uses of UnknownSize; some uses require tweaks to assume that UnknownSize isn't just some kind of int. This patch is intended to just be a trivial replacement for all places where LocationSize::unknown() will Just Work. llvm-svn: 344186	2018-10-10 21:28:44 +00:00
Hans Wennborg	4b2e7daa7e	Revert r342870 "[ARM] bottom-top mul support ARMParallelDSP" This broke Chromium's Android build (https://crbug.com/889390) and the polly-aosp buildbot (http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable). > Originally committed in rL342210 but was reverted in rL342260 because > it was causing issues in vectorized code, because I had forgotten to > ensure that we're operating on scalar values. > > Original commit message: > > On failing to find sequences that can be converted into dual macs, > try to find sequential 16-bit loads that are used by muls which we > can then use smultb, smulbt, smultt with a wide load. > > Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 343082	2018-09-26 08:41:50 +00:00
Sam Parker	a7b2405b06	[ARM] bottom-top mul support ARMParallelDSP Originally committed in rL342210 but was reverted in rL342260 because it was causing issues in vectorized code, because I had forgotten to ensure that we're operating on scalar values. Original commit message: On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342870	2018-09-24 09:34:06 +00:00
Reid Kleckner	00f0ee718f	Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP" It causes assertion failures while building Skia for Android in Chromium: https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550 Reduction forthcoming. llvm-svn: 342260	2018-09-14 18:44:37 +00:00
Sam Parker	7b84fd7847	[ARM] bottom-top mul support in ARMParallelDSP On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load. Differential Revision: https://reviews.llvm.org/D51983 llvm-svn: 342210	2018-09-14 08:09:09 +00:00
Sam Parker	1187911b0b	[ARM] Follow-up to rL342033 Fixed typo which can cause segfault. llvm-svn: 342040	2018-09-12 09:58:56 +00:00
Sam Parker	a023c7a9cb	[ARM] Exchange MAC operands in ARMParallelDSP SMLAD and SMLALD instructions also come in the form of SMLADX and SMLALDX which perform an exchange on their second operand. To support this, more of the loads in the MAC candidates are compared for sequential access and a boolean value has been added to BinOpChain. AddMACCandidate has been refactored into a small pattern matching state machine to reduce the amount of duplicated code, but also to enable the matching to be more flexible. CreateParallelMACPairs now iterates through all the candidates to find parallel ones. Differential Revision: https://reviews.llvm.org/D51424 llvm-svn: 342033	2018-09-12 09:17:44 +00:00
Sam Parker	01db2983cd	[ARM] Add smlald support in ARMParallelDSP Search from i64 reducing phis, as well as i32, to allow the generation of smlald instructions. Differential Revision: https://reviews.llvm.org/D51101 llvm-svn: 341941	2018-09-11 14:01:22 +00:00
Sjoerd Meijer	3c859b3ec3	[ARM] ParallelDSP: add option to enable/disable the pass Differential Revision: https://reviews.llvm.org/D50511 llvm-svn: 339645	2018-08-14 07:43:49 +00:00
Fangrui Song	58407ca045	[ARM] Use unique_ptr to fix memory leak introduced in r337701 llvm-svn: 337714	2018-07-23 17:43:21 +00:00
Jordan Rupprecht	e5daf61229	OpChain has subclasses, so add a virtual destructor. Summary: OpChain has subclasses, so add a virtual destructor. This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701. Reviewers: javed.absar Subscribers: llvm-commits, SjoerdMeijer, samparker Differential Revision: https://reviews.llvm.org/D49681 llvm-svn: 337713	2018-07-23 17:38:05 +00:00
Sam Parker	89a3799a69	[ARM][NFC] ParallelDSP reorganisation In preparing to allow ARMParallelDSP pass to parallelise more than smlads, I've restructured some elements: - The ParallelMAC struct has been renamed to BinOpChain. - The BinOpChain struct holds two value lists: LHS and RHS, as well as inheriting from the OpChain base class. - The OpChain struct holds all the values of the represented chain and has had the memory locations functionality inserted into it. - ParallelMACList becomes OpChainList and it now holds pointers instead of objects. Differential Revision: https://reviews.llvm.org/D49020 llvm-svn: 337701	2018-07-23 15:25:59 +00:00
Sjoerd Meijer	53449daa96	[ARM] ParallelDSP: multiple reduction stmts in loop This fixes an issue that we were not properly supporting multiple reduction stmts in a loop, and not generating SMLADs for these cases. The alias analysis checks were done too early, making it too conservative. Differential revision: https://reviews.llvm.org/D49125 llvm-svn: 336795	2018-07-11 12:36:25 +00:00
Sjoerd Meijer	b3e06faa28	[ARM] ParallelDSP: added statistics, NFC. Added statistics for the number of SMLAD instructions created, and also renamed the pass name to -arm-parallel-dsp. Differential Revision: https://reviews.llvm.org/D48971 llvm-svn: 336441	2018-07-06 14:47:09 +00:00
Sjoerd Meijer	27be58b307	[ARM] ParallelDSP: only support i16 loads for now We were miscompiling i8 loads, so reject them as unsupported narrow operations for now. Differential Revision: https://reviews.llvm.org/D48944 llvm-svn: 336319	2018-07-05 08:21:40 +00:00
Fangrui Song	68169343a5	[ARM] Fix inconsistent declaration parameter name in r336195 llvm-svn: 336223	2018-07-03 19:12:27 +00:00
Sam Parker	ffc1681620	[ARM][NFC] Refactor sequential access for DSP With a view to support parallel operations that have their results stored to memory, refactor the consecutive access helper out so it could support store instructions. Differential Revision: https://reviews.llvm.org/D48872 llvm-svn: 336195	2018-07-03 12:44:16 +00:00
Simon Pilgrim	c09b5e31d7	Remove unnecessary semicolon. NFCI. Fixes -Wpedantic warning. llvm-svn: 335901	2018-06-28 18:37:16 +00:00
Sjoerd Meijer	c89ca5582a	[ARM] Parallel DSP Pass Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of this pass is to do some straightforward IR pattern matching to create ACLE DSP intrinsics, which map onto these 32-bit SIMD operations. Currently, only the SMLAD instruction gets recognised. This instruction performs two multiplications with 16-bit operands, and adds both products to an accumulator. We will follow this up with patches to recognise SMLAD in more cases, and also to generate other DSP instructions (like e.g. SADD16). Patch by: Sam Parker and Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D48128 llvm-svn: 335850	2018-06-28 12:55:29 +00:00

29 Commits