llvm-project

Author	SHA1	Message	Date
Krzysztof Parzyszek	21dc8bdd9e	[Hexagon] Add PIC support llvm-svn: 256025	2015-12-18 20:19:30 +00:00
Changpeng Fang	c9963936e7	AMDGPU/SI: Test commit Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256022	2015-12-18 20:04:28 +00:00
Changpeng Fang	ef735b74c1	Revert "AMDGPU/SI: Test commit" This reverts commit a493cb636e0152ad28210934a47c6c44b1437193. llvm-svn: 256021	2015-12-18 20:04:26 +00:00
Changpeng Fang	7fdf674c2e	AMDGPU/SI: Test commit Summary: This is just my first commit. Test! Reviewers: none Subscribers: none Differential Revision: none llvm-svn: 256020	2015-12-18 19:57:41 +00:00
Jun Bum Lim	3509d64c24	[AArch64] Promote loads from stores This change promotes load instructions which directly read from stores by replacing them with mov instructions. If the store is wider than the load, the load will be replaced with a bitfield extract. For example : STRWui %W1, %X0, 1 %W0 = LDRHHui %X0, 3 becomes STRWui %W1, %X0, 1 %W0 = UBFMWri %W1, 16, 31 llvm-svn: 256004	2015-12-18 18:08:30 +00:00
Zlatko Buljan	252cca555f	[mips][microMIPS][DSP] Implement PACKRL.PH, PICK.PH, PICK.QB, SHILO, SHILOV and WRDSP instructions Differential Revision: http://reviews.llvm.org/D14429 llvm-svn: 255991	2015-12-18 08:59:37 +00:00
Eric Christopher	8c2adf6b49	Remove unused class variables. llvm-svn: 255939	2015-12-17 23:43:40 +00:00
Hans Wennborg	a6a2e512cf	[X86] Use push-pop for materializing small constants under 'minsize' Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936	2015-12-17 23:18:39 +00:00
Matthew Simpson	13dddb0799	Revert "[AArch64] Add DAG combine for extract extend pattern" This reverts commit r255895. The patch breaks internal tests. Reverting until a fix is ready. llvm-svn: 255928	2015-12-17 21:29:47 +00:00
Dan Gohman	670a60ed52	[WebAssembly] Switch WebAssemblyMCAsmInfo.h from MCAsmInfo to MCAsmInfoELF. llvm-svn: 255925	2015-12-17 20:50:45 +00:00
Tom Stellard	caaa3aa07c	AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15583 Patch by: Changpeng Fang llvm-svn: 255908	2015-12-17 17:05:09 +00:00
Nicolai Haehnle	87323da6eb	AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex Summary: The method insertNOPs expected the number of wait states to be passed as parameter, while eliminateFrameIndex passed the immediate argument for the S_NOP, leading to an off-by-one error. Rename the method to make the meaning of its parameter clearer. The number of 4 / 5 wait states (which is what the method has always _tried_ to do according to the comment) is correct according to the hardware docs. I stumbled upon this while trying to track down the cause of https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed, this patch unfortunately does not fix that bug... Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15542 llvm-svn: 255906	2015-12-17 16:46:42 +00:00
Rafael Espindola	f44db24e1f	Avoid explicit relocation sorting most of the time. These days relocations are created and stored in a deterministic way. The order they are created is also suitable for the .o file, so we don't need an explicit sort. The last remaining exception is MIPS. llvm-svn: 255902	2015-12-17 16:22:06 +00:00
Rafael Espindola	9e1cae510f	Revert "[AArch64] Enable PostRAScheduler for AArch64 generic build" This reverts commit r255896. It broke the tests. llvm-svn: 255899	2015-12-17 15:12:26 +00:00
Rafael Espindola	d0e16522c7	Always sort by offset first. NFC. Every target changing sortRelocs was first calling the parent implementation. Just run that first. llvm-svn: 255898	2015-12-17 15:08:24 +00:00
Diego Novillo	8561841875	Fix unused variable warning in release builds. NFC. llvm-svn: 255897	2015-12-17 14:58:34 +00:00
MinSeong Kim	d05e9fd194	[AArch64] Enable PostRAScheduler for AArch64 generic build This patch enables PostRAScheduler specifically for AArch64 generic build, which is beneficial from the performance perspective. Speedups up to 2 to 7% for some benchmarks on A57 and A53 are observed. Also benchmarks from LLVM test-suite did not regress. Differential Revision: http://reviews.llvm.org/D15557 llvm-svn: 255896	2015-12-17 14:51:22 +00:00
Matthew Simpson	4355e404d5	[AArch64] Add DAG combine for extract extend pattern This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) -> (extract_vector_elt v, i). The combine enables us to better match some SMOV patterns. Differential Revision: http://reviews.llvm.org/D15515 llvm-svn: 255895	2015-12-17 14:30:55 +00:00
Rafael Espindola	850ba46dd6	Simplify. NFC. llvm-svn: 255894	2015-12-17 14:19:52 +00:00
Alexey Bataev	7b72b658cc	[X86] Add option for enabling LEA optimization pass, by Andrey Turetsky Add option to enable/disable LEA optimization pass. By default the pass is disabled. Differential Revision: http://reviews.llvm.org/D15573 llvm-svn: 255881	2015-12-17 07:34:39 +00:00
Dan Gohman	5bf22fc84a	[WebAssembly] Convert WebAssemblyTargetObjectFile to TargetLoweringObjectFileELF llvm-svn: 255877	2015-12-17 04:55:44 +00:00
Matthias Braun	454192917b	AArch64: Simplify emitEpilogue() and related code; NFC This is in preparation to an upcoming patch. llvm-svn: 255872	2015-12-17 03:18:47 +00:00
Dan Gohman	05ac43fec3	[WebAssembly] Experimental ELF writer support This creates the initial infrastructure for writing ELF output files. It doesn't yet have any implementation for encoding instructions. Differential Revision: http://reviews.llvm.org/D15555 llvm-svn: 255869	2015-12-17 01:39:00 +00:00
JF Bastien	eefff9ccc5	WebAssembly: update expected torture test failures We now have 240 expected failures. llvm-svn: 255858	2015-12-17 00:12:06 +00:00
Dan Gohman	4172953813	[WebAssembly] Fix legalization of shift operators on large integer types. llvm-svn: 255847	2015-12-16 23:25:51 +00:00
Derek Schuff	8bb5f2927a	[WebAssembly] Implement eliminateCallFramePseudo Summary: Implement eliminateCallFramePseudo to handle ADJCALLSTACKUP/DOWN pseudo-instructions. Add a test calling a vararg function which causes non-0 adjustments. This revealed an issue with RegisterCoalescer wherein it eliminates a COPY from SP32 to a vreg but fails to update the live ranges of EXPR_STACK, causing a machineinstr verifier failure (so this test is commented out). Also add a dynamic alloca test, which causes a callseq_end dag node with a 0 (instead of undef) second argument to be generated. We currently fail to select that, so adjust the ADJCALLSTACKUP tablegen code to handle it. Differential Revision: http://reviews.llvm.org/D15587 llvm-svn: 255844	2015-12-16 23:21:30 +00:00
Ahmed Bougacha	66834ec6e1	[AArch64] Simplify some TRI/TII getters. NFC. We don't need static_casts when we use the right Subtarget. llvm-svn: 255836	2015-12-16 22:54:06 +00:00
Ahmed Bougacha	cecb6b0865	[CodeGen] Make MachineInstrBuilder::copyImplicitOps const. NFC. This matches the other MIB methods, none of which modify the builder. Without this, we can't chain copyImplicitOps. Also reformat the few users, in PPCEarlyReturn. llvm-svn: 255828	2015-12-16 22:15:30 +00:00
Manman Ren	cbe4f9417d	CXX_FAST_TLS calling convention: performance improvement for AArch64. The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allocator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. The target independent portion was committed as r255353. rdar://problem/23557469 Differential Revision: http://reviews.llvm.org/D15341 llvm-svn: 255821	2015-12-16 21:04:19 +00:00
Derek Schuff	993d35b4aa	Remove now-unused include llvm-svn: 255817	2015-12-16 20:43:10 +00:00
Derek Schuff	83717cc297	Iterate over phys regs instead llvm-svn: 255816	2015-12-16 20:43:08 +00:00
Derek Schuff	45cd5a79b2	[WebAssembly] Print an extra local decl when the user stack pointer is used Differential Revision: http://reviews.llvm.org/D15546 llvm-svn: 255815	2015-12-16 20:43:06 +00:00
Krzysztof Parzyszek	4f9164d9b3	[Hexagon] Misc fixes to r255807 llvm-svn: 255811	2015-12-16 20:07:04 +00:00
Krzysztof Parzyszek	56bbf54b43	[Hexagon] Update the Hexagon packetizer llvm-svn: 255807	2015-12-16 19:36:12 +00:00
Reid Kleckner	187d33ee74	Revert "[ARM] Add ARMv8.2-A FP16 scalar instructions" This reverts commit r255762. llvm-svn: 255806	2015-12-16 19:21:03 +00:00
Dan Gohman	b3aa1ecab0	[WebAssembly] Fix the CFG Stackifier to handle unoptimized branches If a branch both branches to and falls through to the same block, treat it as an explicit branch. llvm-svn: 255803	2015-12-16 19:06:41 +00:00
Matt Arsenault	e05ff15186	AMDGPU: Override getCFInstrCost The default cost was 0 with the assumption that it is predictable. llvm-svn: 255796	2015-12-16 18:37:19 +00:00
Dan Gohman	e2831b4e27	[WebAssembly] Use the new offset syntax for memory operands in inline asm. llvm-svn: 255788	2015-12-16 18:14:49 +00:00
Ulrich Weigand	88a7a2eac7	[SystemZ] Sort relocs to avoid code corruption by linker optimization The SystemZ linkers provide an optimization to transform a general- or local-dynamic TLS sequence into an initial-exec sequence if possible. To do that, the compiler generates a function call to __tls_get_offset, which is a brasl instruction annotated with two relocations: - a R_390_PLT32DBL to install __tls_get_offset as branch target - a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that the TLS optimization should be performed if possible If the optimization is performed, the brasl is replaced by an ld load instruction. However, both relocs are processed independently by the linker. Therefore it is crucial that the R_390_PLT32DBL is processed first (installing the branch target for the brasl) and the R_390_TLS_GDCALL is processed second (replacing the whole brasl with an ld). If the relocs are swapped, the linker will first replace the brasl with an ld, and then install the __tls_get_offset branch target offset. Since ld has a different layout than brasl, this may even result in a completely different (or invalid) instruction; in any case, the resulting code is corrupted. Unfortunately, the way the MC common code sorts relocations causes these two to always end up the wrong way around, resulting in wrong code generation by the linker and crashes. This patch overrides the sortRelocs routine to detect this particular pair of relocs and enforce the required order. llvm-svn: 255787	2015-12-16 18:12:40 +00:00
Ulrich Weigand	47f3649374	[SystemZ] Fix assertion failure in adjustSubwordCmp When comparing a zero-extended value against a constant small enough to be in range of the inner type, it doesn't matter whether a signed or unsigned compare operation (for the outer type) is being used. This is why the code in adjustSubwordCmp had this assertion: assert(C.ICmpType == SystemZICMP::Any && "Signedness shouldn't matter here."); assuming the caller had already detected that fact. However, it turns out that there are cases, in particular with always-true or always-false conditions that have not been eliminated when compiling at -O0, where this is not true. Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any here, we can simply set it safely to SystemZICMP::Any, however. llvm-svn: 255786	2015-12-16 18:04:06 +00:00
Tobias Edler von Koch	b51460cf86	[Hexagon] Make memcpy lowering thread-safe This removes an unpleasant hack involving a global variable for special lowering of certain memcpy calls. These are now lowered as intended in EmitTargetCodeForMemcpy in the same way that other targets do it. llvm-svn: 255785	2015-12-16 17:29:37 +00:00
Dan Gohman	30a42bf585	[WebAssembly] Support more kinds of inline asm operands llvm-svn: 255782	2015-12-16 17:15:17 +00:00
Oliver Stannard	2de8c16913	[ARM] Add ARMv8.2-A FP16 vector instructions ARMv8.2-A adds 16-bit floating point versions of all existing SIMD floating-point instructions. This is an optional extension, so all of these instructions require the FeatureFullFP16 subtarget feature. Note that VFP without SIMD is not a valid combination for any version of ARMv8-A, but I have ensured that these instructions all depend on both FeatureNEON and FeatureFullFP16 for consistency. Differential Revision: http://reviews.llvm.org/D15039 llvm-svn: 255764	2015-12-16 12:37:39 +00:00
Oliver Stannard	48568cbe18	[ARM] Add ARMv8.2-A FP16 scalar instructions ARMv8.2-A adds 16-bit floating point versions of all existing VFP floating-point instructions. This is an optional extension, so all of these instructions require the FeatureFullFP16 subtarget feature. The assembly for these instructions uses S registers (AArch32 does not have H registers), but the instructions have ".f16" type specifiers rather than ".f32" or ".f64". The top 16 bits of each source register are ignored, and the top 16 bits of the destination register are set to zero. These instructions are mostly the same as the 32- and 64-bit versions, but they use coprocessor 9 rather than 10 and 11. Two new instructions, VMOVX and VINS, have been added to allow packing and extracting two 16-bit floats stored in the top and bottom halves of an S register. New fixup kinds have been added for the PC-relative load and store instructions, but no ELF relocations have been added as they have a range of 512 bytes. Differential Revision: http://reviews.llvm.org/D15038 llvm-svn: 255762	2015-12-16 11:35:44 +00:00
Michael Kuperstein	e75e6e2a23	[X86] Improve shift combining This folds (ashr (shl a, [56,48,32,24,16]), SarConst) into (shl, (sext (a), [56,48,32,24,16] - SarConst)) or into (lshr, (sext (a), SarConst - [56,48,32,24,16])) depending on sign of (SarConst - [56,48,32,24,16]) sexts in X86 are MOVs. The MOVs have the same code size as above SHIFTs (only SHIFT by 1 has lower code size). However the MOVs have 2 advantages to SHIFTs on x86: 1. MOVs can write to a register that differs from source. 2. MOVs accept memory operands. This fixes PR24373. Patch by: evgeny.v.stupachenko@intel.com Differential Revision: http://reviews.llvm.org/D13161 llvm-svn: 255761	2015-12-16 11:22:37 +00:00
Reid Kleckner	7850c9f5ca	[WinEH] Make llvm.x86.seh.recoverfp work on x64 It adjusts from RSP-after-prologue to RBP, which is what SEH filters need to do before they can use llvm.localrecover. Fixes SEH filter captures, which were broken in r250088. Issue reported by Alex Crichton. llvm-svn: 255707	2015-12-15 23:40:58 +00:00
Hans Wennborg	7036e503d7	Fix "Not having LAHF/SAHF" assert. It wants to assert that the subtarget is 64-bit, not the register. llvm-svn: 255703	2015-12-15 23:21:46 +00:00
Tom Stellard	7750f4ed9e	AMDGPU/SI: Set the code object work group segment size when targeting HSA Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15493 llvm-svn: 255702	2015-12-15 23:15:25 +00:00
Sanjay Patel	271efcdf20	[x86] inline calls to fmaxf / llvm.maxnum.f32 using maxss (PR24475) This patch improves on the suggested codegen from PR24475: https://llvm.org/bugs/show_bug.cgi?id=24475 but only for the fmaxf() case to start, so we can sort out any bugs before extending to fmin, f64, and vectors. The fmax / maxnum definitions provide us flexibility for signed zeros, so the only thing we have to worry about in this replacement sequence is NaN handling. Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes a problem: SelectionDAGBuilder::visitSelect() transforms compare/select instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps that should be checking for NaN inputs or global unsafe-math before transforming? As it stands, that bypasses a big set of optimizations that the x86 backend already has in PerformSELECTCombine(). Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so we have completely unnecessary operations happening on undef elements of the vector. Differential Revision: http://reviews.llvm.org/D15294 llvm-svn: 255700	2015-12-15 23:11:43 +00:00
James Y Knight	99fcb721b2	[Sparc] Tweak r255668: Use llvm_unreachable. llvm-svn: 255698	2015-12-15 23:07:16 +00:00
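
The "[AArch64] Promote loads from stores" entry (3509d64c24) above replaces a narrow load that reads back a wider store with a bitfield extract. The sketch below is only a small model of that example, not LLVM code. It assumes little-endian memory and that the `ui` immediates are scaled by the access size (4 bytes for `STRWui`, 2 bytes for `LDRHHui`). The helper names `load_via_memory` and `ubfm_extract` are hypothetical and exist only for illustration.

```python
import struct


def load_via_memory(w1: int) -> int:
    """Model 'STRWui %W1, %X0, 1' followed by '%W0 = LDRHHui %X0, 3'.

    Assumes little-endian memory and scaled immediates: the 32-bit store
    lands at byte offset 1 * 4 = 4, the 16-bit load reads byte offset 3 * 2 = 6.
    """
    mem = bytearray(16)                      # memory addressed relative to %X0
    struct.pack_into("<I", mem, 1 * 4, w1)   # 32-bit store
    return struct.unpack_from("<H", mem, 3 * 2)[0]  # zero-extending 16-bit load


def ubfm_extract(value: int, immr: int, imms: int) -> int:
    """Model 'UBFMWri value, immr, imms' for the case imms >= immr:
    extract bits immr..imms (inclusive) and zero-extend them."""
    width = imms - immr + 1
    return (value >> immr) & ((1 << width) - 1)


# The load and the bitfield extract should agree for any stored 32-bit value.
for w1 in (0x00000000, 0x12345678, 0xFFFF0000, 0x8000FFFF, 0xDEADBEEF):
    assert load_via_memory(w1) == ubfm_extract(w1, 16, 31)
```

Under these assumptions the halfword at byte offset 6 is the upper half of the stored word, which is why the commit's example uses the bit range 16 to 31.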

35449 Commits