llvm-project

Commit Graph

Author	SHA1	Message	Date
Matthias Braun	0c989a893b	LivePhysReg: Use reference instead of pointer in init(); NFC llvm-svn: 289002	2016-12-08 00:15:51 +00:00
Oliver Stannard	870b5cad45	[ARM] Better error message for invalid flag-preserving Thumb1 insts When we see a non flag-setting instruction for which only the flag-setting version is available in Thumb1, we should give a better error message than "invalid instruction". Differential Revision: https://reviews.llvm.org/D27414 llvm-svn: 288805	2016-12-06 12:59:08 +00:00
Peter Collingbourne	ab85225be4	IR: Change the gep_type_iterator API to avoid always exposing the "current" type. Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458	2016-12-02 02:24:42 +00:00
Oleg Ranevskyy	e2ae41519f	[ARM] Fix for 64-bit CAS expansion on ARM32 with -O0 Summary: This patch fixes comparison of 64-bit atomic with its expected value in CMP_SWAP_64 expansion. Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments though if the first one is >= than the second. This might cause the comparison logic to detect false equality. Example of the broken C++ code: ``` std::atomic<long long> at(2); long long ll = 1; std::atomic_compare_exchange_strong(&at, &ll, 3); ``` Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3. The patch replaces SBC with CMPEQ. Reviewers: t.p.northover Subscribers: aemerson, rengolin, llvm-commits, asl Differential Revision: https://reviews.llvm.org/D27315 llvm-svn: 288433	2016-12-01 22:58:35 +00:00
Matthias Braun	d0ee66c2e9	Move most EH from MachineModuleInfo to MachineFunction Recommitting r288293 with some extra fixes for GlobalISel code. Most of the exception handling members in MachineModuleInfo is actually per function data (talks about the "current function") so it is better to keep it at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288405	2016-12-01 19:32:15 +00:00
Eric Christopher	e70b7c3dfb	Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction" This appears to have broken the global isel bot: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console This reverts commit r288293. llvm-svn: 288322	2016-12-01 07:50:12 +00:00
Matthias Braun	ed14cb0604	Move most EH from MachineModuleInfo to MachineFunction Most of the exception handling members in MachineModuleInfo is actually per function data (talks about the "current function") so it is better to keep it at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288293	2016-11-30 23:49:01 +00:00
Matthias Braun	f23ef437cc	Move FrameInstructions from MachineModuleInfo to MachineFunction This is per function data so it is better kept at the function instead of the module. This is a necessary step to have machine module passes work properly. Differential Revision: https://reviews.llvm.org/D27185 llvm-svn: 288291	2016-11-30 23:48:42 +00:00
Matthias Braun	c52fe2961c	Clarify rules for reserved regs, fix aarch64 ones. No test case necessary as the problematic condition is checked with the newly introduced assertAllSuperRegsMarked() function. Differential Revision: https://reviews.llvm.org/D26648 llvm-svn: 288277	2016-11-30 22:17:10 +00:00
Kuba Mracek	06995e866b	[xray] Add XRay support for Mach-O in CodeGen Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support. Differential Revision: https://reviews.llvm.org/D26983 llvm-svn: 287734	2016-11-23 02:07:04 +00:00
Tim Northover	b64fb453ea	CodeGen: simplify TargetMachine::getSymbol interface. NFC. No-one actually had a mangler handy when calling this function, and getSymbol itself went most of the way towards getting its own mangler (with a local TLOF variable) so forcing all callers to supply one was just extra complication. llvm-svn: 287645	2016-11-22 16:17:20 +00:00
Pablo Barrio	c41e856f53	[ARM] Relax restriction on variadic functions for tailcall optimization Summary: Variadic functions can be treated in the same way as normal functions with respect to the number and types of parameters. Reviewers: grosbach, olista01, t.p.northover, rengolin Subscribers: javed.absar, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26748 llvm-svn: 287219	2016-11-17 10:56:58 +00:00
Tim Northover	397f9d9d05	ARM: fix CodeGen for 64-bit shifts. One half of the shifts obviously needed conditional selection based on whether the shift amount is more than 32-bits, but leaving the other half as the natural shift isn't acceptable either: it's undefined behaviour to shift a 32-bit value by more than 31. llvm-svn: 287149	2016-11-16 20:54:28 +00:00
Tim Northover	ed55a05b01	GlobalISel: remove unused variable to silence warning. llvm-svn: 287027	2016-11-15 21:06:07 +00:00
Diana Picus	895c6aa6fd	[ARM] GlobalISel: Remove unused members. NFCI This silences some warnings that I didn't see with my host compiler. llvm-svn: 286981	2016-11-15 16:42:10 +00:00
Diana Picus	90f0a84943	[ARM] Make sure GlobalISel is only initialized once. NFCI Move some code inside the proper 'if' block to make sure it is only run once, when the subtarget is first created. Things can still break if we use different ARM target machines or if we have functions with different 'target-cpu' or 'target-features', we should fix that too in the future. llvm-svn: 286974	2016-11-15 15:38:15 +00:00
Javed Absar	f043dac25d	[ARM] Add machine scheduler for Cortex-R52 This patch adds the Sched Machine Model for Cortex-R52. Details of the pipeline and descriptions are in comments in file ARMScheduleR52.td included in this patch. Reviewers: rengolin, jmolloy Differential Revision: https://reviews.llvm.org/D26500 llvm-svn: 286949	2016-11-15 11:34:54 +00:00
Tim Northover	3d38c38826	ARM: try to fix GCC 4.8 compilation again after r286881. llvm-svn: 286882	2016-11-14 20:31:53 +00:00
Tim Northover	46a6f0fbf0	Recommit: ARM: sort register lists by encoding in push/pop instructions. For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. Fixed usage of std::sort so that we (hopefully) use instantiations that actually exist in GCC 4.8. llvm-svn: 286881	2016-11-14 20:28:24 +00:00
Tim Northover	1b66f39cf2	Revert "ARM: sort register lists by encoding in push/pop instructions." This reverts commit 286866. It broke a bot, something to do with exactly which templates std::sort accepts. llvm-svn: 286867	2016-11-14 19:05:28 +00:00
Tim Northover	e908ea844c	ARM: sort register lists by encoding in push/pop instructions. For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. llvm-svn: 286866	2016-11-14 19:02:17 +00:00
Diana Picus	22274934f4	[ARM] Add plumbing for GlobalISel Add GlobalISel skeleton, up to the point where we can select a ret void. llvm-svn: 286573	2016-11-11 08:27:37 +00:00
Davide Italiano	a22ddddfea	[Target] Rename X86/ARM Assembly printer to reflect reality. This shows up a lot profiling LTO testcases with -time-passes, so better have a non confusing name. llvm-svn: 286488	2016-11-10 18:39:31 +00:00
Oliver Stannard	18ca2adf2d	[ARM] Thumb2 LDR (literal) should accept PC as the destination The version of this instruction with the .w suffix already correctly accepts this, but the alias without the .w did not. Differential Revision: https://reviews.llvm.org/D26499 llvm-svn: 286446	2016-11-10 13:20:41 +00:00
James Molloy	b03e0879fc	[Thumb1] Move padding earlier when synthesizing TBBs off of the PC When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding. If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset. In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4. llvm-svn: 286107	2016-11-07 13:38:21 +00:00
Saleem Abdulrasool	804e12eeb5	ARM: lower fpowi appropriately for Windows ARM This handles the last case of the builtin function calls that we would generate code which differed from Microsoft's ABI. Rather than generating a call to `__pow{d,s}i2` we now promote the parameter to a float or double and invoke `powf` or `pow` instead. Addresses PR30825! llvm-svn: 286082	2016-11-06 19:46:54 +00:00
Weiming Zhao	962eaaea9c	[Cortex-M0] Atomic lowering Summary: ARMv6m supports dmb etc fence instructions but not ldrex/strex etc. So for some atomic load/store, LLVM should inline instructions instead of lowering to __sync_ calls. Reviewers: rengolin, efriedma, t.p.northover, jmolloy Subscribers: efriedma, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26120 llvm-svn: 285969	2016-11-03 21:49:08 +00:00
Chandler Carruth	5589aa60c7	Remove a redundant condition found by PVS-Studio. Filed http://llvm.org/PR30897 to teach Clang to warn on this kind of stuff. llvm-svn: 285945	2016-11-03 17:42:02 +00:00
James Molloy	e7d97368f2	Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently" This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 . llvm-svn: 285912	2016-11-03 14:08:01 +00:00
James Molloy	b60d8b1987	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk. For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. llvm-svn: 285893	2016-11-03 10:18:20 +00:00
Nirav Dave	0a392a8e7f	[ARM][MC] Cleanup ARM Target Assembly Parser Summary: Correctly parse end-of-statement tokens and handle preprocessor end-of-line comments in ARM assembly processor. Reviewers: rnk, majnemer Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26152 llvm-svn: 285830	2016-11-02 16:22:51 +00:00
Alex Bradbury	58eba09949	[TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h As it stands, the OperandMatchResultTy is only included in the generated header if there is custom operand parsing. However, almost all backends make use of MatchOperand_Success and friends from OperandMatchResultTy for e.g. parseRegister. This is a pain when starting an AsmParser for a new backend that doesn't yet have custom operand parsing. Move the enum to MCTargetAsmParser.h. This patch is a prerequisite for D23563 Differential Revision: https://reviews.llvm.org/D23496 llvm-svn: 285705	2016-11-01 16:32:05 +00:00
James Molloy	70a3d6df52	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables [Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment] The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 285690	2016-11-01 13:37:41 +00:00
Saleem Abdulrasool	e1aa782bd0	CodeGen: further loosen -O0 CG for WoA division Generate the slowest possible codepath for noopt CodeGen. Even trying to be clever with the negated jump can cause out-of-range jumps. Use a wide branch instead. Although the code is modelled simplistically, the later optimizations would recombine the branching into `cbz` if possible. This re-enables the previous optimization as well as hopefully gives us working code in all cases. Addresses PR30356! llvm-svn: 285649	2016-10-31 22:12:37 +00:00
Saleem Abdulrasool	075d2e3c59	ARM: ensure that the Windows DBZ check is in range The Windows ARM target expects the compiler to emit a division-by-zero check. The check would use the form of: cmp r?, #0 cbz .Ltrap b .Lbody .Lbody: ... .Ltrap: udf #249 @ __brkdiv0 This works great most of the time. However, if the body of the function is greater than 127 bytes, the branch target limitation of cbz becomes an issue. This occurs in the unoptimized code generation cases sometimes (like in compiler-rt). Since this is a matter of correctness, possibly pay a small penalty instead. We now form this slightly differently: cbnz .Lbody udf #249 @ __brkdiv0 .Lbody: ... The positive case is through the branch instead of being the next instruction. However, because of the basic block layout, the negated branch is going to be a short distance always (2 bytes away, after the inserted __brkdiv0). The new t__brkdiv0 instruction is required to explicitly mark the instruction as a terminator as the generic UDF instruction is not a terminator. Addresses PR30532! llvm-svn: 285312	2016-10-27 16:59:22 +00:00
Sam Parker	e7d9505c08	[ARM] Predicate UMAAL selection on hasDSP. UMAAL is a DSP instruction and it is not available on thumbv7m (Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong CHECK prefix in longMAC.ll test. Patch by Vadzim Dambrouski. Differential Revision: https://reviews.llvm.org/D25890 llvm-svn: 285278	2016-10-27 09:47:10 +00:00
Tim Northover	a9cc385664	ARM: don't rely on push/pop reglists being in order when folding SP adjust. It would be a very nice invariant to rely on, but unfortunately it doesn't necessarily hold (and the causes of mis-sorted reglists appear to be quite varied) so to be robust the frame lowering code can't assume that the first register in the list is also the first one that actually gets pushed. Should fix an issue where we were turning something like: push {r8, r4, r7, lr} sub sp, #24 into nonsense like: push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr} llvm-svn: 285232	2016-10-26 20:01:00 +00:00
Matthias Braun	8b38ffaa98	CodeGen/Passes: Pass MachineFunction as functor arg; NFC Passing a MachineFunction as argument is more natural and avoids an unnecessary round-trip through the logic determining the correct Subtarget because MachineFunction already has a reference anyway. llvm-svn: 285039	2016-10-24 23:23:02 +00:00
Eli Friedman	b37864b58d	Revert r284580+r284917. ("Synthesize TBB/TBH instructions") The optimization has correctness issues, so reverting for now to fix tests on thumb1 targets. llvm-svn: 284993	2016-10-24 17:20:50 +00:00
James Molloy	2bae8640d7	[ARM] Fix crash in ConstantIslands tPCRelJT may not be the first instruction in a block. Check that instead of dereferencing a broken iterator. llvm-svn: 284917	2016-10-22 09:58:37 +00:00
Benjamin Kramer	2a8bef8769	Do a sweep over move ctors and remove those that are identical to the default. All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721	2016-10-20 12:20:28 +00:00
Sjoerd Meijer	2fc4cb6f72	Reapply r284571 (with the new tests fixed). llvm-svn: 284588	2016-10-19 13:43:02 +00:00
James Molloy	fbfd173447	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 284580	2016-10-19 12:06:49 +00:00
Sjoerd Meijer	3f5111d363	Revert of r284571 because of failing tests. llvm-svn: 284572	2016-10-19 07:45:48 +00:00
Sjoerd Meijer	a318779263	Checking FP function attribute values and adding more build attribute tests. This renames the function for checking FP function attribute values and also adds more build attribute tests (which are in separate files because build attributes are set per file). Differential Revision: https://reviews.llvm.org/D25625 llvm-svn: 284571	2016-10-19 07:25:06 +00:00
Eli Friedman	c0a717ba5b	Improve ARM lowering for "icmp <2 x i64> eq". The custom lowering is pretty straightforward: basically, just AND together the two halves of a <4 x i32> compare. Differential Revision: https://reviews.llvm.org/D25713 llvm-svn: 284536	2016-10-18 21:03:40 +00:00
Javed Absar	e7c338081a	[ARM] Assign cost of scaling for Cortex-R52 This patch assigns cost of the scaling used in addressing for Cortex-R52. On Cortex-R52 a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. Differential Revision: http://reviews.llvm.org/D25670 Reviewer: jmolloy llvm-svn: 284460	2016-10-18 09:08:54 +00:00
Dean Michael Berris	156f6cafc2	[XRay] Support for tail calls for ARM no-Thumb This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ). Differential Revision: https://reviews.llvm.org/D25030 llvm-svn: 284456	2016-10-18 05:54:15 +00:00
Davide Italiano	e9cdb24f67	[ArmFastISel] Kill dead code. NFCI. llvm-svn: 284320	2016-10-16 01:09:39 +00:00
Eric Christopher	445c952bd0	Tidy the calls to getCurrentSection().first -> getCurrentSectionOnly to help readability a bit. llvm-svn: 284202	2016-10-14 05:47:37 +00:00
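
Note on b60d8b1987 ([Thumb] compares of AND bitmasks): the commit message gives the contiguity condition as 32 - clz(bm) - ctz(bm) == popcount(bm) and lists four lowering cases. The sketch below only restates that condition and case split in standalone C++. It is not the LLVM implementation: every name in it (`isContiguousMask`, `classifyMask`, `MaskLowering`) is hypothetical, and the priority order for masks that match more than one case (for example a single bit at position 0) is an assumption that follows the order in which the message lists the cases.

```cpp
#include <cstdint>

// Portable bit helpers; each returns 32 when v == 0.
inline unsigned countLeadingZeros32(std::uint32_t v) {
  unsigned n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

inline unsigned countTrailingZeros32(std::uint32_t v) {
  unsigned n = 0;
  for (std::uint32_t bit = 1u; bit != 0 && (v & bit) == 0; bit <<= 1)
    ++n;
  return n;
}

inline unsigned popCount32(std::uint32_t v) {
  unsigned n = 0;
  for (; v != 0; v &= v - 1) // clear the lowest set bit each iteration
    ++n;
  return n;
}

// True when the set bits of bm form one consecutive run, i.e.
// 32 - clz(bm) - ctz(bm) == popcount(bm). Zero is rejected up front.
inline bool isContiguousMask(std::uint32_t bm) {
  if (bm == 0)
    return false;
  return 32u - countLeadingZeros32(bm) - countTrailingZeros32(bm) ==
         popCount32(bm);
}

// Hypothetical labels for the four cases described in the commit message.
enum class MaskLowering {
  NotContiguous, // the optimization does not apply
  SingleLsls,    // case 1: mask touches the LSB
  SingleLsrs,    // case 2: mask touches the MSB
  SignBitTest,   // case 3: exactly one set bit, test via MI/PL
  LslsThenLsrs   // case 4: strip both upper and lower zero bits
};

// Assumed priority: cases are checked in the order the message lists them.
inline MaskLowering classifyMask(std::uint32_t bm) {
  if (!isContiguousMask(bm))
    return MaskLowering::NotContiguous;
  if (bm & 1u)
    return MaskLowering::SingleLsls;
  if (bm & 0x80000000u)
    return MaskLowering::SingleLsrs;
  if (popCount32(bm) == 1)
    return MaskLowering::SignBitTest;
  return MaskLowering::LslsThenLsrs;
}
```

For instance, `classifyMask(0x00000FF0)` falls into case 4 because the run of set bits touches neither end of the word, while `classifyMask(0x00000505)` is rejected because its set bits are not consecutive.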

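Note on e2ae41519f ([ARM] 64-bit CAS expansion at -O0): the commit message explains that comparing the low words with CMP and the high words with SBC can report false equality, and that the patch replaces SBC with CMPEQ. The sketch below is a simplified software model of the ARM Z and C flags written to illustrate that reasoning. It is an assumption-laden illustration, not code from the patch: the `Flags` struct and all function names are hypothetical, and SBC is modeled here as a flag-setting subtract-with-carry whose Z flag is then used as the equality result.

```cpp
#include <cstdint>

// Only the two flags relevant to the comparison are modeled.
struct Flags {
  bool Z; // result was zero
  bool C; // carry, which for subtraction means "no borrow"
};

// CMP a, b: flags of a - b.
inline Flags cmp(std::uint32_t a, std::uint32_t b) {
  return {a == b, a >= b};
}

// Flag-setting subtract with carry: a - b - (1 - C_in).
inline Flags sbcs(std::uint32_t a, std::uint32_t b, Flags in) {
  const std::uint64_t lhs = a;
  const std::uint64_t rhs = static_cast<std::uint64_t>(b) + (in.C ? 0u : 1u);
  const std::uint32_t result = static_cast<std::uint32_t>(lhs - rhs);
  return {result == 0, lhs >= rhs};
}

// CMPEQ a, b: performs the compare only if Z is already set; otherwise the
// incoming flags are left untouched.
inline Flags cmpeq(std::uint32_t a, std::uint32_t b, Flags in) {
  return in.Z ? cmp(a, b) : in;
}

inline std::uint32_t lowWord(std::uint64_t v) {
  return static_cast<std::uint32_t>(v);
}
inline std::uint32_t highWord(std::uint64_t v) {
  return static_cast<std::uint32_t>(v >> 32);
}

// Model of the sequence described as broken: CMP on low words, SBC on high.
inline bool equalViaCmpSbc(std::uint64_t value, std::uint64_t expected) {
  Flags f = cmp(lowWord(value), lowWord(expected));
  f = sbcs(highWord(value), highWord(expected), f);
  return f.Z;
}

// Model of the sequence described as the fix: CMP on low words, CMPEQ on high.
inline bool equalViaCmpCmpeq(std::uint64_t value, std::uint64_t expected) {
  Flags f = cmp(lowWord(value), lowWord(expected));
  f = cmpeq(highWord(value), highWord(expected), f);
  return f.Z;
}
```

With the values from the commit message (`at` holding 2, `ll` holding 1), the low-word CMP leaves the carry set and Z clear, the modeled SBC on the equal high words then yields zero, and `equalViaCmpSbc(2, 1)` wrongly reports equality. `equalViaCmpCmpeq(2, 1)` keeps the "not equal" result from the first compare.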

8857 Commits