llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	daeafb4c2a	Add back r201608, r201622, r201624 and r201625 r201608 made llvm corretly handle private globals with MachO. r201622 fixed a bug in it and r201624 and r201625 were changes for using private linkage, assuming that llvm would do the right thing. They all got reverted because r201608 introduced a crash in LTO. This patch includes a fix for that. The issue was that TargetLoweringObjectFile now has to be initialized before we can mangle names of private globals. This is trivially true during the normal codegen pipeline (the asm printer does it), but LTO has to do it manually. llvm-svn: 201700	2014-02-19 17:23:20 +00:00
Daniel Jasper	7e198ad862	Revert r201622 and r201608. This causes the LLVMgold plugin to segfault. More information on the replies to r201608. llvm-svn: 201669	2014-02-19 12:26:01 +00:00
Rafael Espindola	09dcc6a536	Fix PR18743. The IR @foo = private constant i32 42 is valid, but before this patch we would produce an invalid MachO from it. It was invalid because it would use an L label in a section where the liker needs the labels in order to atomize it. One way of fixing it would be to just reject this IR in the backend, but that would not be very front end friendly. What this patch does is use an 'l' prefix in sections that we know the linker requires symbols for atomizing them. This allows frontends to just use private and not worry about which sections they go to or how the linker handles them. One small issue with this strategy is that now a symbol name depends on the section, which is not available before codegen. This is not a problem in practice. The reason is that it only happens with private linkage, which will be ignored by the non codegen users (llvm-nm and llvm-ar). llvm-svn: 201608	2014-02-18 22:24:57 +00:00
Nico Rieck	35a237d4ed	Actually call FileCheck in tests llvm-svn: 201491	2014-02-16 13:27:39 +00:00
Rafael Espindola	e6f37b908a	"foo" is not a ppc instruction, don't try to parse it. llvm-svn: 201336	2014-02-13 15:33:35 +00:00
Daniel Sanders	753e17629d	Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Changes since review (and last commit attempt): - Fixed test failures that were missed due to configuration of local build. (fixes crash.ll and a couple others). - Fixed tests that happened to pass because the local build was on X86 (should fix 2007-12-17-InvokeAsm.ll) - mature-mc-support.ll's should no longer require all targets to be compiled. (should fix ARM and PPC buildbots) - Object output (-filetype=obj and similar) now forces the integrated assembler to be enabled regardless of default setting or -no-integrated-as. (should fix SystemZ buildbots) Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201333	2014-02-13 14:44:26 +00:00
Daniel Sanders	abe212a3b8	Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call It introduced multiple test failures in the buildbots. llvm-svn: 201241	2014-02-12 15:39:20 +00:00
Daniel Sanders	a7d504cf58	Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201237	2014-02-12 14:44:54 +00:00
Rafael Espindola	61acf5d9b0	Fix a bug with .weak_def_can_be_hidden: Mutable variables cannot use it. Thanks to John McCall for noticing it. llvm-svn: 200977	2014-02-07 16:21:30 +00:00
Rafael Espindola	803fb10874	Convert test to FileCheck. llvm-svn: 200955	2014-02-06 23:35:22 +00:00
David Blaikie	5e390e4df7	DebugInfo: Remove some unneeded conditionals now that DIBuilder no longer emits zero-length arrays as {i32 0} A bunch of test cases needed to be cleaned up for this, many my fault - when implementid imported modules I updated test cases by simply duplicating the prior metadata field - which wasn't always the empty metadata entry. llvm-svn: 200731	2014-02-04 01:23:52 +00:00
Hal Finkel	4e703bcecd	Handle spilling the PPC GPRC_NOR0 register class GPRC_NOR0 is not a subclass of GPRC (because it also contains the ZERO pseudo register). As a result, we also need to check for it in the spilling code. llvm-svn: 200288	2014-01-28 05:32:58 +00:00
Hal Finkel	5eb2466243	Add a TBAA CodeGen failure test case I disabled the use of TBAA in CodeGen in r200093. This adds a test case that demonstrates the problems with inttoptr and TBAA in CodeGen (and, specifically, the problem that causes LLVM to miscompile itself in Release mode). This test will currently fail if -use-tbaa-in-sched-mi is enabled. llvm-svn: 200097	2014-01-25 20:16:36 +00:00
Hal Finkel	3e4a34c8c3	Fix pointer info on PPC byval stores For PPC64 SVR (and Darwin), the stores that take byval aggregate parameters from registers into the stack frame had MachinePointerInfo objects with incorrect offsets. These offsets are relative to the object itself, not to the stack frame base. This fixes self hosting on PPC64 when compiling with -enable-aa-sched-mi. llvm-svn: 199763	2014-01-21 20:15:58 +00:00
Roman Divacky	32143e2bda	Implement initial-exec TLS for PPC32. llvm-svn: 197824	2013-12-20 18:08:54 +00:00
Rafael Espindola	382ee385fd	One ppc32-darwin, a i64 inside a structure can have 32 bit alignment. Thanks for Iain Sandoe for testing this with the original gcc. Clang was already getting this right. llvm-svn: 197572	2013-12-18 14:35:37 +00:00
Rafael Espindola	c56960871f	Add a reduced testcase from the recent bootstrap crash. llvm-svn: 197426	2013-12-16 21:24:00 +00:00
Iain Sandoe	e0b4cb62f5	[Powerpc darwin] AsmParser Base implementation. This is a base implementation of the powerpc-apple-darwin asm parser dialect. * Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions. * Enables parsing of {r,f,v}XX as register identifiers. * Enables parsing of lo16() hi16() and ha16() as modifiers. The changes to the test case are from David Fang (fangism). llvm-svn: 197324	2013-12-14 13:34:02 +00:00
Tim Northover	444eba2d4d	PowerPC: add Linux triple to TLS tests The tests were failing on OS X. llvm-svn: 197146	2013-12-12 11:51:23 +00:00
Hal Finkel	ceb1f12d9a	Improve instruction scheduling for the PPC POWER7 Aside from a few minor latency corrections, the major change here is a new hazard recognizer which focuses on better dispatch-group formation on the POWER7. As with the PPC970's hazard recognizer, the most important thing it does is avoid load-after-store hazards within the same dispatch group. It uses the POWER7's special dispatch-group-terminating nop instruction (instead of inserting multiple regular nop instructions). This new hazard recognizer makes use of the scheduling dependency graph itself, built using AA information, to robustly detect the possibility of load-after-store hazards. significant test-suite performance changes (the error bars are 99.5% confidence intervals based on 5 test-suite runs both with and without the change -- speedups are negative): speedups: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.55171% +/- 0.333168% MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -17.5576% +/- 14.598% MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl -29.5708% +/- 7.09058% MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt -34.9471% +/- 11.4391% SingleSource/Benchmarks/BenchmarkGame/puzzle -25.1347% +/- 11.0104% SingleSource/Benchmarks/Misc/flops-8 -17.7297% +/- 9.79061% SingleSource/Benchmarks/Shootout-C++/ary3 -35.5018% +/- 23.9458% SingleSource/Regression/C/uint64_to_float -56.3165% +/- 25.4234% SingleSource/UnitTests/Vectorizer/gcc-loops -18.5309% +/- 6.8496% regressions: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 18.351% +/- 12.156% SingleSource/Benchmarks/Shootout-C++/methcall 27.3086% +/- 14.4733% llvm-svn: 197099	2013-12-12 00:19:11 +00:00
Hal Finkel	94a6f380bb	Fix the PPC subsumes-predicate check For one predicate to subsume another, they must both check the same condition register. Failure to check this prerequisite was causing miscompiles. Fixes PR18003. llvm-svn: 197089	2013-12-11 23:12:25 +00:00
Roman Divacky	1bab70580d	Merge all tls tests to two files. One for normal codegen (initial and local exec) and one for PIC codegen (local and general dynamic). llvm-svn: 197081	2013-12-11 22:25:39 +00:00
Roman Divacky	e439f92a6c	Remove test thats testing the same thing as tls.ll. llvm-svn: 197074	2013-12-11 21:37:04 +00:00
David Fang	1b01849f2d	on darwin<10, fallback to .weak_definition (PPC,X86) .weak_def_can_be_hidden was not yet supported by the system assembler llvm-svn: 196970	2013-12-10 21:37:41 +00:00
Alp Toker	f907b891da	Correct word hyphenations This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471	2013-12-05 05:44:44 +00:00
Hal Finkel	ca93e47258	Convert a PPC test from grep to FileCheck Convert this test to FileCheck, and improve it to check for the instructions it is trying to exclude instead of checking for register use (especially because grepping for r1 can be thrown off, for example, by a use of r12). llvm-svn: 195979	2013-11-30 20:04:33 +00:00
Hal Finkel	2651f97333	Desensitize a couple of PPC regression tests Use CHECK-DAG to make these regression tests more resilient against changes in instruction scheduling. llvm-svn: 195978	2013-11-30 19:52:28 +00:00
Hal Finkel	2b655bb228	Update the cpu specified on some PPC regression tests Some of these tests did not specify a cpu but were also sensitive to instruction scheduling and/or register assignment choices. A few others similarly-sensitive tests specified a cpu (often the POWER7), and while the P7 currently uses the default model for PPC64, this will soon change. For those tests which should not really be cpu-dependent anyway, the cpu is set to the generic 'ppc64'. llvm-svn: 195977	2013-11-30 19:39:27 +00:00
Manman Ren	409558f81e	Debug Info: update testing cases to specify the debug info version number. We are going to drop debug info without a version number or with a different version number, to make sure we don't crash when we see bitcode files with different debug info metadata format. llvm-svn: 195504	2013-11-22 21:49:45 +00:00
Hal Finkel	884bde3031	PPC popcnt[dw] do not have record forms The instruction definitions incorrectly specified that popcntd and popcntw have record forms; they do not. This mistake was causing invalid code generation. llvm-svn: 195272	2013-11-20 20:54:55 +00:00
Hal Finkel	22498fa6e3	PPC: Optimize rldicl generation for masked shifts Masking operations (where only some number of the low bits are being kept) are selected to rldicl(x, 0, mb). If x is a logical right shift (which would become rldicl(y, 64-n, n)), we might be able to fold the two instructions together: rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb The right shift is really a left rotate followed by a mask, and if the explicit mask is a more-restrictive sub-mask of the mask implied by the shift, only one rldicl is needed. llvm-svn: 195185	2013-11-20 01:10:15 +00:00
Bob Wilson	9f3e6b25ee	Avoid illegal integer promotion in fastisel Stop folding constant adds into GEP when the type size doesn't match. Otherwise, the adds' operands are effectively being promoted, changing the conditions of an overflow. Results are different when: sext(a) + sext(b) != sext(a + b) Problem originally found on x86-64, but also fixed issues with ARM and PPC, which used similar code. <rdar://problem/15292280> Patch by Duncan Exon Smith! llvm-svn: 194840	2013-11-15 19:09:27 +00:00
Rafael Espindola	4929301af4	Error if we see an alias to a declaration. In ELF and COFF an alias is just another offset in a section. There is no way to represent an alias to something in another file. In MachO, the spec has the N_INDR type which should allow for exactly that, but is not currently implemented. Given that it is specified but not implemented, we error in codegen to avoid miscompiling but don't reject aliases to declarations in the verifier to leave the option open of implementing it. In the past we have used alias to declarations as a way of implementing weakref, which is why it exists in some old tests which this patch updates. llvm-svn: 194705	2013-11-14 13:58:06 +00:00
Hal Finkel	c6a243987d	Add PPC option for full register names in asm On non-Darwin PPC systems, we currently strip off the register name prefix prior to instruction printing. So instead of something like this: mr r3, r4 we print this: mr 3, 4 The first form is the default on Darwin, and is understood by binutils, but not yet understood by our integrated assembler. Once our integrated-as understands full register names as well, this temporary option will be replaced by tying this functionality to the verbose-asm option. The numeric-only form is compatible with legacy assemblers and tools, and is also gcc's default on most PPC systems. On the other hand, it is harder to read, and there are some analysis tools that expect full register names. llvm-svn: 194384	2013-11-11 14:58:40 +00:00
Rafael Espindola	d1cac0af6b	Convert another llc -filetype=obj test. llvm-svn: 193548	2013-10-28 22:17:19 +00:00
Rafael Espindola	3a8c0734f9	Convert another llc -filetype=obj test. llvm-svn: 193547	2013-10-28 22:11:47 +00:00
Rafael Espindola	060e6444ea	Convert another llc -filetype=obj test. llvm-svn: 193546	2013-10-28 22:05:05 +00:00
Andrew Trick	1bda72144d	Update PPC loop tests after SCEV non-unit-stride checkin r193015. llvm-svn: 193021	2013-10-19 00:14:04 +00:00
Bill Schmidt	3684fdd59f	[PATCH] Fix PR17168 (DAG scheduler inserts DBG_VALUE before PHI with fast-isel) PR17168 describes a test case that fails when compiling for debug with fast-isel. Investigation showed that the test was failing because a DBG_VALUE machine instruction was placed prior to a PHI. For this problem to occur requires the following: * Compile for debug * Compile with fast-isel * In a block B, fast-isel must partially succeed before punting to DAG-isel * B must start with a PHI * The first unhandled node in the DAG must not generate a machine instruction * A debug value with an order less than that of that first node exists When all of these circumstances apply, the existing test that an instruction was not inserted won't fire. Currently it tests whether the block is empty, or whether the last instruction generated is a phi. When fast-isel has partially succeeded, the last instruction generated will not be a phi. Instead, we need to check whether the current insert position is immediately following a phi. This patch adds that check, and adds the test case from the PR as a regression test. llvm-svn: 192976	2013-10-18 14:20:11 +00:00
Richard Sandiford	95f7ba988b	Replace sra with srl if a single sign bit is required E.g. (and (sra (i32 x) 31) 2) -> (and (srl (i32 x) 30) 2). llvm-svn: 192884	2013-10-17 11:16:57 +00:00
Manman Ren	adf4cc171e	TBAA: update tbaa format from scalar format to struct-path aware format. llvm-svn: 191690	2013-09-30 18:17:55 +00:00
Manman Ren	1047fe452f	TBAA: remove !tbaa from testing cases when they are not needed. llvm-svn: 191689	2013-09-30 18:17:35 +00:00
Robert Wilhelm	f0cfb83bb4	Fix spelling intruction -> instruction. llvm-svn: 191610	2013-09-28 11:46:15 +00:00
Bill Schmidt	cea1596205	[PowerPC] Fix PR17354: Generate nop after local calls for PIC code. When generating code for shared libraries, even local calls may be intercepted, so we need a nop after the call for the linker to fix up the TOC. Test case adapted from the one provided in PR17354. llvm-svn: 191440	2013-09-26 17:09:28 +00:00
Bill Schmidt	bb381d7063	[PowerPC] Fix problems with large code model (PR17169). Large code model on PPC64 requires creating and referencing TOC entries when using the addis/ld form of addressing. This was not being done in all cases. The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this. Two test cases are also modified to reflect this requirement. Fast-isel was not creating correct code for loading floating-point constants using large code model. This also requires the addis/ld form of addressing. Previously we were using the addis/lfd shortcut which is only applicable to medium code model. One test case is modified to reflect this requirement. llvm-svn: 190882	2013-09-17 20:03:25 +00:00
Hal Finkel	40c34781b5	PPC: Don't restrict lvsl generation to after type legalization This is a re-commit of r190764, with an extra check to make sure that we're not performing the transformation on illegal types (a small test case has been added for this as well). Original commit message: The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190771	2013-09-15 22:09:58 +00:00
Hal Finkel	31025a6325	Revert r190764: PPC: Don't restrict lvsl generation to after type legalization This is causing test-suite failures. Original commit message: The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190765	2013-09-15 15:41:11 +00:00
Hal Finkel	2945d4e916	PPC: Don't restrict lvsl generation to after type legalization The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190764	2013-09-15 15:20:54 +00:00
Hal Finkel	31658834e6	Prevent assert in CombinerGlobalAA with null values DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we can't use AA in this case (if we try, then the casting code in AA will assert). llvm-svn: 190763	2013-09-15 02:19:49 +00:00
Hal Finkel	a5ebe426a5	Remove unnecessary TBAA metadata from r190636's test case llvm-svn: 190637	2013-09-12 23:23:12 +00:00
Hal Finkel	262a224712	Fix PPC ABI for ByVal structs with vector members When a structure is passed by value, and that structure contains a vector member, according to the PPC ABI, the structure will receive enhanced alignment (so that the vector within the structure will always be aligned). This should resolve PR16641. llvm-svn: 190636	2013-09-12 23:20:06 +00:00
Hal Finkel	1e2e3ea584	Make the PPC fast-math sqrt expansion safe at 0 In fast-math mode sqrt(x) is calculated using the fast expansion of the reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal sqrt expansions use the associated estimate instructions along with some Newton iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN, which is not correct. Now we explicitly return a result of zero if the input is zero. llvm-svn: 190624	2013-09-12 19:04:12 +00:00
Hal Finkel	7fe6a5390f	PPC: Enable aggressive anti-dependency breaking Aggressive anti-dependency breaking is enabled by default for all PPC cores. This provides a general speedup on the P7 and other platforms (among other factors, the instruction group formation for the non-embedded PPC cores is done during post-RA scheduling). In order to do this safely, the incompatibility between uses of the MFOCRF instruction and anti-dependency breaking are resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed FIXME, the problem was that MFOCRF's output is sensitive to the identify of the source register, and always paired with a shift to undo this effect. Because anti-dependency breaking is unaware of this hidden dependency of the shift amount on the source register of the MFOCRF instruction, changing that register must be inhibited. Two test cases were adjusted: The SjLj test was made more insensitive to register choices and scheduling; the saveCR test disabled anti-dependency breaking because part of what it is testing is proper register reuse. llvm-svn: 190587	2013-09-12 05:24:49 +00:00
Manman Ren	deeafd8a58	Debug Info Testing: updated to use NULL instead of "i32 0" in a few fields. Field 2 of DIType (Context), field 9 of DIDerivedType (TypeDerivedFrom), field 12 of DICompositeType (ContainingType), fields 2, 7, 12 of DISubprogram (Context, Type, ContainingType). llvm-svn: 190205	2013-09-06 21:03:58 +00:00
Bill Schmidt	8470b0f96c	[PowerPC] Call support for fast-isel. This patch adds fast-isel support for calls (but not intrinsic calls or varargs calls). It also removes a badly-formed assert. There are some new tests just for calls, and also for folding loads into arguments on calls to avoid extra extends. llvm-svn: 189701	2013-08-30 22:18:55 +00:00
Bill Schmidt	8d86fe7d6f	[PowerPC] Add handling for conversions to fast-isel. Yet another chunk of fast-isel code. This one handles various conversions involving floating-point. (It also includes some miscellaneous handling throughout the back end for LWA_32 and LWAX_32 that should have been part of the load-store patch.) llvm-svn: 189677	2013-08-30 15:18:11 +00:00
Bill Schmidt	057b04f662	[PowerPC] Handle selection of compare instructions in fast-isel. Mostly trivial patch adding support for compares. The meat of the work was added with the branch support. llvm-svn: 189639	2013-08-30 03:16:48 +00:00
Bill Schmidt	579de9417c	[PowerPC] Miscellaneous fast-isel test cases. Here are a few more tests that now pass after the recent fast-isel commits. llvm-svn: 189637	2013-08-30 02:43:08 +00:00
Bill Schmidt	ccecf26157	[PowerPC] Add loads, stores, and related things to fast-isel. This is the next big chunk of fast-isel code. The primary purpose is to implement selection of loads and stores, but there is a lot of drag-along to support this. The common code to analyze addresses for both loads and stores is substantial. It's also necessary to add the materialization code for global values. Related to load-store processing is the code to fold loads into integer extends, since otherwise we generate lots of redundant instructions. We also need to add some overrides to some FastEmit routines to ensure we don't assign GPR 0 to a virtual register when this would change the meaning of an instruction. I added handling selection of a few binary arithmetic instructions, to enable committing some test cases I wrote a while back. Finally, ap couple of miscellaneous changes: * I cleaned up some poor style from a previous patch in PPCISelLowering.cpp, pointed out by David Blaikie. * I enlarged the Addr.Offset field to avoid sign problems with 32-bit offsets. llvm-svn: 189636	2013-08-30 02:29:45 +00:00
Manman Ren	0ed70aeb85	Debug Info: add an identifier field to DICompositeType. DICompositeType will have an identifier field at position 14. For now, the field is set to null in DIBuilder. For DICompositeTypes where the template argument field (the 13th field) was optional, modify DIBuilder to make sure the template argument field is set. Now DICompositeType has 15 fields. Update DIBuilder to use NULL instead of "i32 0" for null value of a MDNode. Update verifier to check that DICompositeType has 15 fields and the last field is null or a MDString. Update testing cases to include an extra field for DICompositeType. The identifier field will be used by type uniquing so a front end can genearte a DICompositeType with a unique identifer. llvm-svn: 189282	2013-08-26 22:39:55 +00:00
Bill Schmidt	d89f678cfd	[PowerPC] More fast-isel chunks (returns and integer extends) Incremental improvement to fast-isel for PPC64. This allows us to select on ret, sext, and zext. Filling in sext/zext improves some of the existing logic in handling compare-immediates that needed extends. A simplified return convention for fast-isel is also added to the PPC64 calling conventions. All call/return processing for DAG selection is handled with custom code, so there isn't an existing CC to rely on here. The include of PPCGenCallingConv.inc causes compiler warnings due to the 32-bit calling conventions that are not used, so the dummy function "usePPC32CCs()" is added here to silence those. Test cases for the return and extend logic are added. llvm-svn: 189266	2013-08-26 19:42:51 +00:00
Bill Schmidt	0300813d6a	[PowerPC] Add fast-isel branch and compare selection. First chunk of actual fast-isel selection code. This handles direct and indirect branches, as well as feeding compares for direct branches. PPCFastISel::PPCEmitIntExt() is just roughed in and will be expanded in a future patch. This also corrects a problem with selection for constant pool entries in JIT mode or with small code model. llvm-svn: 189202	2013-08-25 22:33:42 +00:00
Bill Wendling	187d3ddc50	Update to remove the no-frame-pointer-elim-non-leaf flag if it was set to 'false'. llvm-svn: 189068	2013-08-22 21:28:54 +00:00
Manman Ren	a2e9a98b06	TBAA: remove !tbaa from testing cases when they are not needed. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 188944	2013-08-21 22:20:53 +00:00
Hal Finkel	1cf48ab811	Don't form PPC CTR-based loops around a copysignl call copysign/copysignf never become function calls (because the SDAG expansion code does not lower to the corresponding function call, but rather directly implements the associated logic), but copysignl almost always is lowered into a call to the requested libm functon (and, thus, might clobber CTR). llvm-svn: 188727	2013-08-19 23:35:24 +00:00
Hal Finkel	e4eb78188c	Add ExpandFloatOp_FCOPYSIGN to handle ppcf128-related expansions We had previously been asserting when faced with a FCOPYSIGN f64, ppcf128 node because there was no way to expand the FCOPYSIGN node. Because ppcf128 is the sum of two doubles, and the first double must have the larger magnitude, we can take the sign from the first double. As a result, in addition to fixing the crash, this is also an optimization. llvm-svn: 188655	2013-08-19 06:55:37 +00:00
Hal Finkel	dbc78e1f73	Add the PPC fcpsgn instruction Modern PPC cores support a floating-point copysign instruction, and we can use this to lower the FCOPYSIGN node (which is created from calls to the libm copysign function). A couple of extra patterns are necessary because the operand types of FCOPYSIGN need not agree. llvm-svn: 188653	2013-08-19 05:01:02 +00:00
Daniel Dunbar	9efbedfd35	[tests] Cleanup initialization of test suffixes. - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. llvm-svn: 188513	2013-08-16 00:37:11 +00:00
Hal Finkel	b3ca00d2a3	Actually fix PPC64 64-bit GPR inline asm constraint matching This is a follow-up to r187693, correcting that code to request the correct register class. The previous version, with the wrong register class, was not really correcting the constraints, but rather was removing them. Coincidentally, this fixed the failing test case in r187693, but obviously created other problems. llvm-svn: 188407	2013-08-14 20:05:04 +00:00
Tim Northover	501977eb7a	Fix FileCheck --check-prefix lines. Various tests had sprung up over the years which had --check-prefix=ABC on the RUN line, but "CHECK-ABC:" later on. This happened to work before, but was strictly incorrect. FileCheck is getting stricter soon though. Patch by Ron Ofir. llvm-svn: 188173	2013-08-12 12:43:26 +00:00
David Fang	b88cdf62f5	initial draft of PPCMachObjectWriter.cpp this records relocation entries in the mach-o object file for PIC code generation. tested on powerpc-darwin8, validated against darwin otool -rvV llvm-svn: 188004	2013-08-08 20:14:40 +00:00
Hal Finkel	2b7b2f373b	PPC: Map frin to round() not nearbyint() and rint() Making use of the recently-added ISD::FROUND, which allows for custom lowering of round(), the PPC backend will now map frin to round(). Previously, we had been using frin to lower nearbyint() (and rint() via some custom lowering to handle the extra fenv flags requirements), but only in fast-math mode because frin does not tie-to-even. Several users had complained about this behavior, and this new mapping of frin to round is certainly more appropriate (and does not require fast-math mode). In effect, this reverts r178362 (and part of r178337, replacing the nearbyint mapping with the round mapping). llvm-svn: 187960	2013-08-08 04:31:34 +00:00
Hal Finkel	11b9e452f6	Add PPC64 mulli pattern The PPC backend had been missing a pattern to generate mulli for 64-bit multiples. We had been generating it only for 32-bit multiplies. Unfortunately, generating li + mulld unnecessarily increases register pressure. llvm-svn: 187807	2013-08-06 17:03:03 +00:00
Hal Finkel	b176acb6b7	Fix PPC64 64-bit GPR inline asm constraint matching Internally, the PowerPC backend names the 32-bit GPRs R[0-9]+, and names the 64-bit parent GPRs X[0-9]+. When matching inline assembly constraints with explicit register names, on PPC64 when an i64 MVT has been requested, we need to follow gcc's convention of using r[0-9]+ to refer to the 64-bit (parent) registers. At some point, we'll probably want to arrange things so that the generic code in TargetLowering uses the AsmName fields declared in *RegisterInfo.td in order to match these inline asm register constraints. If we do that, this change can be reverted. llvm-svn: 187693	2013-08-03 12:25:10 +00:00
Roman Divacky	c3825df87e	PPC32 va_list is an actual structure so va_copy needs to copy the whole structure not just a pointer. This implements that and thus fixes va_copy on PPC32. Fixes #15286. Both bug and patch by Florian Zeitz! llvm-svn: 187158	2013-07-25 21:36:47 +00:00
Manman Ren	5873770238	Debug Info: improve the verifier to check field types. Make sure the context field of DIType is MDNode. Fix testing cases to make them pass the verifier. llvm-svn: 187150	2013-07-25 19:33:30 +00:00
Manman Ren	fdfc1ebfbc	Debug Info: improve the Finder. Improve the Finder to handle context of a DIVariable used by DbgValueInst. Fix testing cases to make them pass the verifier. llvm-svn: 187052	2013-07-24 17:10:09 +00:00
Stephen Lin	98cbca2e4d	Disambiguate function names in some CodeGen tests. (Some tests were using function names that also were names of instructions and/or doing other unusual things that were making the test not amenable to otherwise scriptable pattern matching.) No functionality change. llvm-svn: 186621	2013-07-18 22:29:15 +00:00
Hal Finkel	1860763c76	PPC: Support dynamic allocas with large alignment Support for dynamic stack alignments in the PPC backend has been unfinished, in part because it depends on dynamic stack realignment (which I only just recently implemented fully). Now we can also support dynamic allocas with higher than the default target stack alignment (16 bytes). In order to round-up the requested size to the maximum requested alignment, we need an additional register to hold the rounded-up size. We're already using one scavenged register to hold the previous stack-pointer value (which needs to be stored with the signal-safe stdux update), and so when we have dynamic allocas and a large alignment, we allocate two emergency spill slots for the scavenger. llvm-svn: 186562	2013-07-18 04:28:21 +00:00
Hal Finkel	f05d6c7843	PPC: Add base-pointer support to builtin setjmp/longjmp First, this changes the base-pointer implementation to remove an unnecessary complication (and one that is incompatible with how builtin SjLj is implemented): instead of using r31 as the base pointer when it is not needed as a frame pointer, now the base pointer will always be r30 when needed. Second, we introduce another pseudo register, BP, which is used just like the FP pseudo register to refer to the base register before we know for certain what register it will be. Third, we now save BP into the jmp_buf, and restore r30 from that slot in longjmp. If the function that called setjmp did not use a base pointer, then r30 will be overwritten by the setjmp-calling-function's restore code. FP restoration (which is restored into r31) works the same way. llvm-svn: 186545	2013-07-17 23:50:51 +00:00
Hal Finkel	40f76d5830	PPC: Add CTR-register clobber to builtin setjmp Because the builtin longjmp implementation uses a CTR-based indirect jump, when the control flow arrives at the builtin setjmp call, the CTR register has necessarily been clobbered. Correspondingly, this adds CTR to the list of implicit definitions of the builtin setjmp pseudo instruction. We don't need to add CTR to the implicit definitions of builtin longjmp because, even though it does clobber the CTR register, the control flow cannot return to inside the loop unless there is also a builtin setjmp call. llvm-svn: 186488	2013-07-17 05:35:44 +00:00
Hal Finkel	a7c54e8cf4	PPC: Implement base pointer and stack realignment This builds on some frame-lowering code that has existed since 2005 (r24224) but was disabled in 2008 (r48188) because it needed base pointer support to function correctly. This implementation follows the strategy suggested by Dale Johannesen in r48188 where the following comment was added: This does not currently work, because the delta between old and new stack pointers is added to offsets that reference incoming parameters after the prolog is generated, and the code that does that doesn't handle a variable delta. You don't want to do that anyway; a better approach is to reserve another register that retains to the incoming stack pointer, and reference parameters relative to that. And now we do exactly that. If we don't need a frame pointer, then we use r31 as a base pointer. If we do need a frame pointer, then we use r30 as a base pointer. The base pointer retains the value of the stack pointer before it was decremented in the prologue. We then use the base pointer to resolve all negative frame indicies. The basic scheme follows that for base pointers in the X86 backend. We use a base pointer when we need to dynamically realign the incoming stack pointer. This currently applies only to static objects (dynamic allocas with large alignments, and base-pointer support in SjLj lowering will come in future commits). llvm-svn: 186478	2013-07-17 00:45:52 +00:00
Ulrich Weigand	1d4dbda5b9	[APFloat] PR16573: Avoid losing mantissa bits in ppc_fp128 to double truncation When truncating to a format with fewer mantissa bits, APFloat::convert will perform a right shift of the mantissa by the difference of the precision of the two formats. Usually, this will result in just the mantissa bits needed for the target format. One special situation is if the input number is denormal. In this case, the right shift may discard significant bits. This is usually not a problem, since truncating a denormal usually results in zero (underflow) after normalization anyway, since the result format's exponent range is usually smaller than the target format's. However, there is one case where the latter property does not hold: when truncating from ppc_fp128 to double. In particular, truncating a ppc_fp128 whose first double of the pair is denormal should result in just that first double, not zero. The current code however performs an excessive right shift, resulting in lost result bits. This is then caught in the APFloat::normalize call performed by APFloat::convert and causes an assertion failure. This patch checks for the scenario of truncating a denormal, and attempts to (possibly partially) replace the initial mantissa right shift by decrementing the exponent, if doing so will still result in a valid target format exponent. Index: test/CodeGen/PowerPC/pr16573.ll =================================================================== --- test/CodeGen/PowerPC/pr16573.ll (revision 0) +++ test/CodeGen/PowerPC/pr16573.ll (revision 0) @@ -0,0 +1,11 @@ +; RUN: llc < %s \| FileCheck %s + +target triple = "powerpc64-unknown-linux-gnu" + +define double @test() { + %1 = fptrunc ppc_fp128 0xM818F2887B9295809800000000032D000 to double + ret double %1 +} + +; CHECK: .quad -9111018957755033591 + Index: lib/Support/APFloat.cpp =================================================================== --- lib/Support/APFloat.cpp (revision 185817) +++ lib/Support/APFloat.cpp (working copy) @@ -1956,6 +1956,23 @@ X86SpecialNan = true; } + // If this is a truncation of a denormal number, and the target semantics + // has larger exponent range than the source semantics (this can happen + // when truncating from PowerPC double-double to double format), the + // right shift could lose result mantissa bits. Adjust exponent instead + // of performing excessive shift. + if (shift < 0 && isFiniteNonZero()) { + int exponentChange = significandMSB() + 1 - fromSemantics.precision; + if (exponent + exponentChange < toSemantics.minExponent) + exponentChange = toSemantics.minExponent - exponent; + if (exponentChange < shift) + exponentChange = shift; + if (exponentChange < 0) { + shift -= exponentChange; + exponent += exponentChange; + } + } + // If this is a truncation, perform the shift before we narrow the storage. if (shift < 0 && (isFiniteNonZero() \|\| category==fcNaN)) lostFraction = shiftRight(significandParts(), oldPartCount, -shift); llvm-svn: 186409	2013-07-16 13:03:25 +00:00
Hal Finkel	8e8618ae5c	Fix register subclass handling in PPCInstrInfo::insertSelect PPCInstrInfo::insertSelect and PPCInstrInfo::canInsertSelect were computing the common subclass of the true and false inputs, and then selecting either the 32-bit or the 64-bit isel variant based on the result of calling PPC::GPRCRegClass.hasSubClassEq(RC) and PPC::G8RCRegClass.hasSubClassEq(RC) (where RC is the common subclass). Unfortunately, this is not quite right: if we have something like this: %vreg8<def> = SELECT_CC_I8 %vreg4<kill>, %vreg7<kill>, %vreg6<kill>, 76; G8RC_and_G8RC_NOX0:%vreg8 CRRC:%vreg4 G8RC_NOX0:%vreg7,%vreg6 then the common subclass of G8RC_and_G8RC_NOX0 and G8RC_NOX0 is G8RC_NOX0, and G8RC_NOX0 is not a subclass of G8RC (because it also contains the ZERO8 pseudo-register). As a result, we also need to check the common subclass against GPRC_NOR0 and G8RC_NOX0 explicitly. This had not been a problem for clients of insertSelect that called canInsertSelect first (because it had a compensating mistake), but insertSelect is also used by the PPC pseudo-instruction expander, and this error was causing a problem in that context. This problem was found by csmith. llvm-svn: 186343	2013-07-15 20:22:58 +00:00
Hal Finkel	2f5e8e3d95	Remove invalid assert in DAGTypeLegalizer::RemapValue There is a comment at the top of DAGTypeLegalizer::PerformExpensiveChecks which, in part, says: // Note that these invariants may not hold momentarily when processing a node: // the node being processed may be put in a map before being marked Processed. Unfortunately, this assert would be valid only if the above-mentioned invariant held unconditionally. This was causing llc to assert when, in fact, everything was fine. Thanks to Richard Sandiford for investigating this issue! Fixes PR16562. llvm-svn: 186338	2013-07-15 18:57:05 +00:00
Stephen Lin	d24ab20e9b	Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script: find test/CodeGen -name ".ll" \| \ while read NAME; do echo "$NAME" if ! grep -q "^; RUN: llc.debug" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]@$[A-Za-z0-9_]$(.$/\1/p" < $NAME \| \ while read FUNC; do sed -i '' "s/;$.$$[A-Za-z0-9_-]$:$ $$FUNC: \$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;$.$-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;$.$-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;$.$-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;$.*$-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME fi done llvm-svn: 186280	2013-07-14 06:24:09 +00:00
Stephen Lin	552c915e84	Convert Windows to Unix line endings, no functionality change. llvm-svn: 186264	2013-07-13 22:08:55 +00:00
Stephen Lin	f799e3f944	Convert CodeGen//.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion. This was done with the following sed invocation to catch label lines demarking function boundaries: sed -i '' "s/^;$ $$[A-Z0-9_]$:$ $test$[A-Za-z0-9_-]$:$ $$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen//*.ll which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct. llvm-svn: 186258	2013-07-13 20:38:47 +00:00
Stephen Lin	764d8d3d6f	Start using CHECK-LABEL in some tests. llvm-svn: 186163	2013-07-12 14:54:12 +00:00
Hal Finkel	4715081787	PPC: Add some missing V_SET0 patterns We had patterns to match v4i32 immAllZerosV -> V_SET0, but not patterns for v8i16 (which occurs in the test case) or v16i8. The same was true for V_SETALLONES (so I added the associated patterns for those as well). Another bug found by llvm-stress. llvm-svn: 186108	2013-07-11 17:43:32 +00:00
Hal Finkel	ff3ea8060c	PPCDAGToDAGISel::isRunOfOnes should return false on zero This fixes a bug (found by csmith) at -O0 where we attempt to create a RLWIMI with an out-of-range operand. Most uses of the isRunOfOnes function are guarded by a condition that the value is not zero. This was not true in two places, and in both places a zero input would result in an out-of-rage MB value (= 32). To fix this, isRunOfOnes returns false on a zero input (and I've remove one now-redundant guard). llvm-svn: 186101	2013-07-11 16:31:51 +00:00
Hal Finkel	743b194084	RegScavenger should not exclude undef uses When computing currently-live registers, the register scavenger excludes undef uses. As a result, undef uses are ignored when computing the restore points of registers spilled into the emergency slots. While the register scavenger normally excludes from consideration, when scavenging, registers used by the current instruction, we need to not exclude undef uses. Otherwise, we might end up requiring more emergency spill slots than we have (in the case where the undef use is the currently-spilled register). Another bug found by llvm-stress. llvm-svn: 186067	2013-07-11 05:55:57 +00:00
Hal Finkel	e4dd5c29f0	WidenVecRes_BUILD_VECTOR must use the first operand's type Because integer BUILD_VECTOR operands may have a larger type than the result's vector element type, and all operands must have the same type, when widening a BUILD_VECTOR node by adding UNDEFs, we cannot use the vector element type, but rather must use the type of the existing operands. Another bug found by llvm-stress. llvm-svn: 185960	2013-07-09 18:55:10 +00:00
Bill Schmidt	4122169308	[PowerPC] Better fix for PR16556. A more complete example of the bug in PR16556 was recently provided, showing that the previous fix was not sufficient. The previous fix is reverted herein. The real problem is that ReplaceNodeResults() uses LowerFP_TO_INT as custom lowering for FP_TO_SINT during type legalization, without checking whether the input type is handled by that routine. LowerFP_TO_INT requires the input to be f32 or f64, so we fail when the input is ppcf128. I'm leaving the test case from the initial fix (r185821) in place, and adding the new test as another crash-only check. llvm-svn: 185959	2013-07-09 18:50:20 +00:00
Stephen Lin	73de7bf5de	AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. llvm-svn: 185956	2013-07-09 18:16:56 +00:00
Hal Finkel	ff666bd962	Don't crash in SE dealing with ashr x, -1 ScalarEvolution::getSignedRange uses ComputeNumSignBits from ValueTracking on ashr instructions. ComputeNumSignBits can return zero, but this case was not handled correctly by the code in getSignedRange which was calling: APInt::getSignedMinValue(BitWidth).ashr(NS - 1) with NS = 0, resulting in an assertion failure in APInt::ashr. Now, we just return the conservative result (as with NS == 1). Another bug found by llvm-stress. llvm-svn: 185955	2013-07-09 18:16:16 +00:00
Hal Finkel	6c29bd9088	DAGCombine tryFoldToZero cannot create illegal types after type legalization When folding sub x, x (and other similar constructs), where x is a vector, the result is a vector of zeros. After type legalization, make sure that the input zero elements have a legal type. This type may be larger than the result's vector element type. This was another bug found by llvm-stress. llvm-svn: 185949	2013-07-09 17:02:45 +00:00
Ulrich Weigand	52cf8e4488	[PowerPC] Revert r185476 and fix up TLS variant kinds In the commit message to r185476 I wrote: >The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD >correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD. >This causes some confusion with the asm parser, since VK_PPC_TLSGD >is output as @tlsgd, which is then read back in as VK_TLSGD. > >To avoid this confusion, this patch removes the PowerPC-specific >modifiers and uses the generic modifiers throughout. (The only >drawback is that the generic modifiers are printed in upper case >while the usual convention on PowerPC is to use lower-case modifiers. >But this is just a cosmetic issue.) This was unfortunately incorrect, there is is fact another, serious drawback to using the default VK_TLSLD/VK_TLSGD variant kinds: using these causes ELFObjectWriter::RelocNeedsGOT to return true, which in turn causes the ELFObjectWriter to emit an undefined reference to _GLOBAL_OFFSET_TABLE_. This is a problem on powerpc64, because it uses the TOC instead of the GOT, and the linker does not provide _GLOBAL_OFFSET_TABLE_, so the symbol remains undefined. This means shared libraries using TLS built with the integrated assembler are currently broken. While the whole RelocNeedsGOT / _GLOBAL_OFFSET_TABLE_ situation probably ought to be properly fixed at some point, for now I'm simply reverting the r185476 commit. Now this in turn exposes the breakage of handling @tlsgd/@tlsld in the asm parser that this check-in was originally intended to fix. To avoid this regression, I'm also adding a different fix for this problem: while common code now parses @tlsgd as VK_TLSGD, a special hack in the asm parser translates this code to the platform-specific VK_PPC_TLSGD that the back-end now expects. While this is not really pretty, it's self-contained and shouldn't hurt anything else for now. One the underlying problem is fixed, this hack can be reverted again. llvm-svn: 185945	2013-07-09 16:41:09 +00:00
Hal Finkel	dbbf09b28e	PPC: Allocate RS spill slot for unaligned i64 load/store This fixes another bug found by llvm-stress! If we happen to be doing an i64 load or store into a stack slot that has less than a 4-byte alignment, then the frame-index elimination may need to use an indexed load or store instruction (because the offset may not be a multiple of 4, a requirement of the STD/LD instructions). The extra register needed to hold the offset comes from the register scavenger, and it is possible that the scavenger will need to use an emergency spill slot. As a result, we need to make sure that a spill slot is allocated when doing an i64 load/store into a less-than-4-byte-aligned stack slot. Because test cases for things like this tend to be fairly fragile, I've concatenated a few small bugpoint-reduced test cases together to form the regression test. llvm-svn: 185907	2013-07-09 06:34:51 +00:00
Ulrich Weigand	266db7fe04	[PowerPC] Always use "assembler dialect" 1 A setting in MCAsmInfo defines the "assembler dialect" to use. This is used by common code to choose between alternatives in a multi-alternative GNU inline asm statement like the following: __asm__ ("{sfe\|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2)); The meaning of these dialects is platform specific, and GCC defines those for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for new-style (PowerPC) mnemonics, like in the example above. To be compatible with inline asm used with GCC, LLVM ought to do the same. Specifically, this means we should always use assembler dialect 1 since old-style mnemonics really aren't supported on any current platform. However, the current LLVM back-end uses: AssemblerDialect = 1; // New-Style mnemonics. in PPCMCAsmInfoDarwin, and AssemblerDialect = 0; // Old-Style mnemonics. in PPCLinuxMCAsmInfo. The Linux setting really isn't correct, we should be using new-style mnemonics everywhere. This is changed by this commit. Unfortunately, the setting of this variable is overloaded in the back-end to decide whether or not we are on a Darwin target. This is done in PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo AssemblerDialect setting), and also in PPCMCExpr. Setting AssemblerDialect to 1 for both Darwin and Linux no longer allows us to make this distinction. Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter to distinguish Darwin targets, and ignores the SyntaxVariant parameter. As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs to be passed in by the caller when creating a target MCExpr. (To do so this patch implicitly also reverts commit 184441.) llvm-svn: 185858	2013-07-08 20:20:51 +00:00
Hal Finkel	21ada79757	PPC: Mark vector CC action for SETO and SETONE as Expand Another bug found by llvm-stress! This fixes hitting llvm_unreachable("Invalid integer vector compare condition"); at the end of getVCmpInst in PPCISelDAGToDAG. llvm-svn: 185855	2013-07-08 20:00:03 +00:00
Hal Finkel	e39302258e	PPC: Mark vector FREM as Expand by default Another bug found by llvm-stress! This fixes crashing with: LLVM ERROR: Cannot select: v4f32 = frem ... llvm-svn: 185840	2013-07-08 17:30:25 +00:00
Bill Schmidt	2db29ef467	[PowerPC] Fix PR16556 (handle undef ppcf128 in LowerFP_TO_INT). PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be either an f32 or f64, but this is not checked. A long double (ppcf128) operand will normally be custom-lowered to a conversion to f64 in this context. However, this isn't the case for an UNDEF node. This patch recognizes a ppcf128 as a legal source operand for FP_TO_INT only if it's an undef, in which case it creates an undef of the target type. At some point we might want to do a wholesale custom lowering of ISD::UNDEF when the type is ppcf128, but it's not really clear that's a great idea, and probably more work than it's worth for a situation that only arises in the case of a programming error. At this point I think simple is best. The test case comes from PR16556, and is a crash-test only. llvm-svn: 185821	2013-07-08 14:22:45 +00:00
Hal Finkel	8cb9a0e1d3	Fix PromoteIntRes_BUILD_VECTOR crash with i1 vectors This fixes a bug (found by llvm-stress) in DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result type would always be larger than the original operands. This is not always true, however, with boolean vectors. For example, promoting a node of type v8i1 (where the operands will be of type i32, the type to which i1 is promoted) will yield a node with a result vector element type of i16 (and operands of type i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands to the result type. llvm-svn: 185794	2013-07-08 06:16:58 +00:00
Ulrich Weigand	49f487e6cd	[PowerPC] Use mtocrf when available Just as with mfocrf, it is also preferable to use mtocrf instead of mtcrf when only a single CR register is to be written. Current code however always emits mtcrf. This probably does not matter when using an external assembler, since the GNU assembler will in fact automatically replace mtcrf with mtocrf when possible. It does create inefficient code with the integrated assembler, however. To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and uses those instead of MTCRF/MTCRF8 everything. Just as done in the MFOCRF patch committed as 185556, these patterns will be converted back to MTCRF if MTOCRF is not available on the machine. As a side effect, this allows to modify the MTCRF pattern to accept the full range of mask operands for the benefit of the asm parser. llvm-svn: 185561	2013-07-03 17:59:07 +00:00
Ulrich Weigand	4050995650	[PowerPC] Remove VK_PPC_TLSGD and VK_PPC_TLSLD The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD. This causes some confusion with the asm parser, since VK_PPC_TLSGD is output as @tlsgd, which is then read back in as VK_TLSGD. To avoid this confusion, this patch removes the PowerPC-specific modifiers and uses the generic modifiers throughout. (The only drawback is that the generic modifiers are printed in upper case while the usual convention on PowerPC is to use lower-case modifiers. But this is just a cosmetic issue.) llvm-svn: 185476	2013-07-02 21:29:06 +00:00
Hal Finkel	52727c6b82	Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling There are a couple of (small) related changes here: 1. The printed name of the VRSAVE register has been changed from VRsave to vrsave in order to match the name accepted by GNU binutils. 2. Support for parsing vrsave has been added to the asm parser (it seems that there was no test case specifically covering this code, so I've added one). 3. The list of Altivec registers, which was common to all calling conventions, has been separated out. This allows us to define the base CSR lists, and then lists for each ABI with Altivec included. This allows SjLj, for example, to work correctly on non-Altivec targets without using unnatural definitions of the NoRegs CSR list. 4. VRSAVE is now always reserved on non-Darwin targets and all Altivec registers are reserved when Altivec is disabled. With these changes, it is now possible to compile a function containing __builtin_unwind_init() on Linux/PPC64 with debugging information. This did not work previously because GNU binutils assumes that all .cfi_offset offsets will be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned offset). This is not true for the vrsave register, however, because this register is used only on Darwin, GCC does not bother printing a .cfi_offset entry for it (even though there is a slot in the stack frame for it as specified by the ABI). This change allows us to do the same: we will also not print .cfi_offset directives for vrsave. llvm-svn: 185409	2013-07-02 03:39:34 +00:00
Bill Schmidt	48fc20a034	Index: test/CodeGen/PowerPC/reloc-align.ll =================================================================== --- test/CodeGen/PowerPC/reloc-align.ll (revision 0) +++ test/CodeGen/PowerPC/reloc-align.ll (revision 0) @@ -0,0 +1,34 @@ +; RUN: llc -mcpu=pwr7 -O1 < %s \| FileCheck %s + +; This test verifies that the peephole optimization of address accesses +; does not produce a load or store with a relocation that can't be +; satisfied for a given instruction encoding. Reduced from a test supplied +; by Hal Finkel. + +target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64" +target triple = "powerpc64-unknown-linux-gnu" + +%struct.S1 = type { [8 x i8] } + +@main.l_1554 = internal global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 -1, i8 -6, i8 57, i8 62, i8 -48, i8 0, i8 58, i8 80 }, align 1 + +; Function Attrs: nounwind readonly +define signext i32 @main() #0 { +entry: + %call = tail call fastcc signext i32 @func_90(%struct.S1* byval bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @main.l_1554 to %struct.S1)) +; CHECK-NOT: ld {{[0-9]+}}, main.l_1554@toc@l + ret i32 %call +} + +; Function Attrs: nounwind readonly +define internal fastcc signext i32 @func_90(%struct.S1 byval nocapture %p_91) #0 { +entry: + %0 = bitcast %struct.S1* %p_91 to i64* + %bf.load = load i64* %0, align 1 + %bf.shl = shl i64 %bf.load, 26 + %bf.ashr = ashr i64 %bf.shl, 54 + %bf.cast = trunc i64 %bf.ashr to i32 + ret i32 %bf.cast +} + +attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } Index: lib/Target/PowerPC/PPCAsmPrinter.cpp =================================================================== --- lib/Target/PowerPC/PPCAsmPrinter.cpp (revision 185327) +++ lib/Target/PowerPC/PPCAsmPrinter.cpp (working copy) @@ -679,7 +679,26 @@ void PPCAsmPrinter::EmitInstruction(const MachineI OutStreamer.EmitRawText(StringRef("\tmsync")); return; } + break; + case PPC::LD: + case PPC::STD: + case PPC::LWA: { + // Verify alignment is legal, so we don't create relocations + // that can't be supported. + // FIXME: This test is currently disabled for Darwin. The test + // suite shows a handful of test cases that fail this check for + // Darwin. Those need to be investigated before this sanity test + // can be enabled for those subtargets. + if (!Subtarget.isDarwin()) { + unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1; + const MachineOperand &MO = MI->getOperand(OpNum); + if (MO.isGlobal() && MO.getGlobal()->getAlignment() < 4) + llvm_unreachable("Global must be word-aligned for LD, STD, LWA!"); + } + // Now process the instruction normally. + break; } + } LowerPPCMachineInstrToMCInst(MI, TmpInst, this); OutStreamer.EmitInstruction(TmpInst); Index: lib/Target/PowerPC/PPCISelDAGToDAG.cpp =================================================================== --- lib/Target/PowerPC/PPCISelDAGToDAG.cpp (revision 185327) +++ lib/Target/PowerPC/PPCISelDAGToDAG.cpp (working copy) @@ -1530,6 +1530,14 @@ void PPCDAGToDAGISel::PostprocessISelDAG() { if (GlobalAddressSDNode GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) { SDLoc dl(GA); const GlobalValue GV = GA->getGlobal(); + // We can't perform this optimization for data whose alignment + // is insufficient for the instruction encoding. + if (GV->getAlignment() < 4 && + (StorageOpcode == PPC::LD \|\| StorageOpcode == PPC::STD \|\| + StorageOpcode == PPC::LWA)) { + DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n"); + continue; + } ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, 0, Flags); } else if (ConstantPoolSDNode CP = dyn_cast<ConstantPoolSDNode>(ImmOpnd)) { llvm-svn: 185380	2013-07-01 20:52:27 +00:00
Cameron Zwarich	867bfcd546	Fix PR16508. When phis get lowered, destination copies are inserted using an iterator that is determined once for all phis in the block, which BuildMI interprets as a request to insert an instruction directly before the iterator. In the case of a cyclic phi, source copies may also be inserted directly before this iterator, which can cause source copies to be inserted before destination copies. The fix is to keep an iterator to the last phi and then advance it while lowering each phi in order to insert destination copies directly after the phis. llvm-svn: 185363	2013-07-01 19:42:46 +00:00
Hal Finkel	25e4a0d418	Don't form PPC CTR loops for over-sized exit counts Although you can't generate this from C on PPC64, if you have a loop using a 64-bit counter on PPC32 then you can't form a CTR-based loop for it. This had been cauing the PPCCTRLoops pass to assert. Thanks to Joerg Sonnenberger for providing a test case! llvm-svn: 185361	2013-07-01 19:34:59 +00:00
Hal Finkel	ac1a24b508	PPC: Ignore spill/restore requests for VRSAVE (except on Darwin) This fixes PR16418, which reports that a function calling __builtin_unwind_init() asserts. The cause is that this generates a spill/restore for VRSAVE, and we support that only on Darwin (because VRSAVE is only really used on Darwin). The test case checks only that we don't crash. We can add correctness checks once someone verifies what behavior the function is supposed to have. llvm-svn: 185235	2013-06-28 22:29:56 +00:00
Hal Finkel	147c287d91	Fix CodeGen/PowerPC/stack-protector.ll on OpenBSD On OpenBSD, the stack-smash protection transform uses "__guard_local" and "__stack_smash_handler" instead of "__stack_chk_guard" and "__stack_chk_fail". However, CodeGen/PowerPC/stack-protector.ll doesn't specify a target OS, so on OpenBSD it fails. Add -mtriple=ppc32-unknown-linux to make the test host-OS agnostic. While there, convert to FileCheck. Patch by Matthew Dempsky. llvm-svn: 185206	2013-06-28 20:18:14 +00:00
Hal Finkel	4ca70100de	Fix a PPC rlwimi instruction-selection bug Under certain (evidently rare) circumstances, this code used to convert OR(a, AND(x, y)) into OR(a, x). This was incorrect. While there, I've added a comment to the code immediately above. llvm-svn: 185201	2013-06-28 20:00:07 +00:00
Bill Schmidt	4a28e82743	[PowerPC] Disable fast-isel for existing -O0 tests for PowerPC. This is a preliminary patch for fast instruction selection on PowerPC. Code generation can differ between DAG isel and fast isel. Existing tests that specify -O0 were written to expect DAG isel. Make this explicit by adding -fast-isel=false to the tests. In some cases specifying -fast-isel=false produces different code even when there isn't a fast instruction selector specified. This is because TM.Options.EnableFastISel = 1 at -O0 whether or not a FastISel object exists. Thus disabling fast isel can actually produce less conservative code. Because of this, some of the expected code generation in the -O0 tests needs to be adjusted. In particular, handling of function arguments is less conservative with -fast-isel=false (see isOnlyUsedInEntryBlock() in SelectionDAGBuilder.cpp). This results in fewer stack accesses and, in some cases, reduced stack size as uselessly loaded values are no longer stored back to spill locations in the stack. No functional change with this patch; test case adjustments only. llvm-svn: 183939	2013-06-13 20:23:34 +00:00
Hal Finkel	fa5f6f7440	Disallow i64 div/rem in PPC32 counter loops On PPC32, [su]div,rem on i64 types are transformed into runtime library function calls. As a result, they are not allowed in counter-based loops (the counter-loops verification pass caught this error; this change fixes PR16169). llvm-svn: 183581	2013-06-07 22:16:19 +00:00
Rafael Espindola	4f60a38f18	Change how we iterate over relocations on ELF. For COFF and MachO, sections semantically have relocations that apply to them. That is not the case on ELF. In relocatable objects (.o), a section with relocations in ELF has offsets to another section where the relocations should be applied. In dynamic objects and executables, relocations don't have an offset, they have a virtual address. The section sh_info may or may not point to another section, but that is not actually used for resolving the relocations. This patch exposes that in the ObjectFile API. It has the following advantages: * Most (all?) clients can handle this more efficiently. They will normally walk all relocations, so doing an effort to iterate in a particular order doesn't save time. * llvm-readobj now prints relocations in the same way the native readelf does. * probably most important, relocations that don't point to any section are now visible. This is the case of relocations in the rela.dyn section. See the updated relocation-executable.test for example. llvm-svn: 182908	2013-05-30 03:05:14 +00:00
Hal Finkel	7d8a691b5d	Prefer to duplicate PPC Altivec loads when expanding unaligned loads When expanding unaligned Altivec loads, we use the decremented offset trick to prevent page faults. Unfortunately, if we have a sequence of consecutive unaligned loads, this leads to suboptimal code generation because the 'extra' load from the first unaligned load can be combined with the base load from the second (but only if the decremented offset trick is not used for the first). Search up and down the chain, through loads and token factors, looking for consecutive loads, and if one is found, don't use the offset reduction trick. These duplicate loads are later combined to yield the desired sequence (in the future, we might want a more-powerful chain search, but that will require some changes to allow the combiner routines to access the AA object). This should complete the initial implementation of the optimized unaligned Altivec load expansion. There is some refactoring that should be done, but that will happen when the unaligned store expansion is added. llvm-svn: 182719	2013-05-26 18:08:30 +00:00
Hal Finkel	bc2ee4c4e6	PPC: Combine duplicate (offset) lvsl Altivec intrinsics The lvsl permutation control instruction is a function only of the alignment of the pointer operand (relative to the 16-byte natural alignment of Altivec vectors). As a result, multiple lvsl intrinsics where the operands differ by a multiple of 16 can be combined. llvm-svn: 182708	2013-05-25 04:05:05 +00:00
Hal Finkel	cf2e908014	PPC: Initial support for permutation-based unaligned Altivec loads Altivec only directly supports aligned loads, but the loads have a strange property: If given an unaligned address, they truncate the address to the next lower aligned address, and load from there. This property, along with an extra load and some special-purpose permutation-control instructions that generate the appropriate permutations from the original unaligned address, allow efficient lowering of aligned loads. This code uses the trick explained in the Apple Velocity Engine optimization overview document to prevent the needed extra load from possibly causing a page fault if the original address happens to be aligned. As noted in the FIXMEs, there are several additional optimizations that can be performed to reduce the cost of these loads even more. These will be implemented in future commits. llvm-svn: 182691	2013-05-24 23:00:14 +00:00
Hal Finkel	2f474f0e8a	Check InlineAsm clobbers in PPCCTRLoops We don't need to reject all inline asm as using the counter register (most does not). Only those that explicitly clobber the counter register need to prevent the transformation. llvm-svn: 182191	2013-05-18 09:20:39 +00:00
Hal Finkel	778c73c56c	Fix cpu on test CodeGen/PowerPC/ctrloop-fp64.ll We need ppc instead of generic to override native features on ppc machines. llvm-svn: 182049	2013-05-16 20:28:05 +00:00
Hal Finkel	5f587c59a5	Create an new preheader in PPCCTRLoops to avoid counter register clobbers Some IR-level instructions (such as FP <-> i64 conversions) are not chained w.r.t. the mtctr intrinsic and yet may become function calls that clobber the counter register. At the selection-DAG level, these might be reordered with the mtctr intrinsic causing miscompiles. To avoid this situation, if an existing preheader has instructions that might use the counter register, create a new preheader for the mtctr intrinsic. This extra block will be remerged with the old preheader at the MI level, but will prevent unwanted reordering at the selection-DAG level. llvm-svn: 182045	2013-05-16 19:58:38 +00:00
Ulrich Weigand	9d980cbdb9	[PowerPC] Use true offset value in "memrix" machine operands This is the second part of the change to always return "true" offset values from getPreIndexedAddressParts, tackling the case of "memrix" type operands. This is about instructions like LD/STD that only have a 14-bit field to encode immediate offsets, which are implicitly extended by two zero bits by the machine, so that in effect we can access 16-bit offsets as long as they are a multiple of 4. The PowerPC back end currently handles such instructions by carrying the 14-bit value (as it will get encoded into the actual machine instructions) in the machine operand fields for such instructions. This means that those values are in fact not the true offset, but rather the offset divided by 4 (and then truncated to an unsigned 14-bit value). Like in the case fixed in r182012, this makes common code operations on such offset values not work as expected. Furthermore, there doesn't really appear to be any strong reason why we should encode machine operands this way. This patch therefore changes the encoding of "memrix" type machine operands to simply contain the "true" offset value as a signed immediate value, while enforcing the rules that it must fit in a 16-bit signed value and must also be a multiple of 4. This change must be made simultaneously in all places that access machine operands of this type. However, just about all those changes make the code simpler; in many cases we can now just share the same code for memri and memrix operands. llvm-svn: 182032	2013-05-16 17:58:02 +00:00
Hal Finkel	47db66d43f	PPC32 cannot form counter loops around i64 FP conversions On PPC32, i64 FP conversions are implemented using runtime calls (which clobber the counter register). These must be excluded. llvm-svn: 182023	2013-05-16 16:52:41 +00:00
Bill Schmidt	22f9191979	Use new CHECK-DAG support to stabilize CodeGen/PowerPC/recipest.ll While testing some experimental code to add vector-scalar registers to PowerPC, I noticed that a couple of independent instructions were flipped by the scheduler. The new CHECK-DAG support is perfect for avoiding this problem. llvm-svn: 182020	2013-05-16 16:15:18 +00:00
Ulrich Weigand	7aa76b6a07	[PowerPC] Report true displacement value from getPreIndexedAddressParts DAGCombiner::CombineToPreIndexedLoadStore calls a target routine to decompose a memory address into a base/offset pair. It expects the offset (if constant) to be the true displacement value in order to perform optional additional optimizations; in particular, to convert other uses of the original pointer into uses of the new base pointer after pre-increment. The PowerPC implementation of getPreIndexedAddressParts, however, simply calls SelectAddressRegImm, which returns a TargetConstant. This value is appropriate for encoding into the instruction, but it is not always usable as true displacement value: - Its type is always MVT::i32, even on 64-bit, where addresses ought to be i64 ... this causes the optimization to simply always fail on 64-bit due to this line in DAGCombiner: // FIXME: In some cases, we can be smarter about this. if (Op1.getValueType() != Offset.getValueType()) { - Its value is truncated to an unsigned 16-bit value if negative. This causes the above opimization to generate wrong code. This patch fixes both problems by simply returning the true displacement value (in its original type). This doesn't affect any other user of the displacement. llvm-svn: 182012	2013-05-16 14:53:05 +00:00
Rafael Espindola	c533b559d0	Extend test to check the .cfi instructions. I am about to refactor the calls to addFrameMove and some of the ppc ones were not being tested. llvm-svn: 182009	2013-05-16 14:30:09 +00:00
Rafael Espindola	a5c7ceedb5	Extend test for better coverage. Without this change nothing was covering this addFrameMove: // For 64-bit SVR4 when we have spilled CRs, the spill location // is SP+8, not a frame-relative slot. if (Subtarget.isSVR4ABI() && Subtarget.isPPC64() && (PPC::CR2 <= Reg && Reg <= PPC::CR4)) { MachineLocation CSDst(PPC::X1, 8); MachineLocation CSSrc(PPC::CR2); MMI.addFrameMove(Label, CSDst, CSSrc); continue; } llvm-svn: 181976	2013-05-16 03:48:50 +00:00
Hal Finkel	25c1992bc7	Implement PPC counter loops as a late IR-level pass The old PPCCTRLoops pass, like the Hexagon pass version from which it was derived, could only handle some simple loops in canonical form. We cannot directly adapt the new Hexagon hardware loops pass, however, because the Hexagon pass contains a fundamental assumption that non-constant-trip-count loops will contain a guard, and this is not always true (the result being that incorrect negative counts can be generated). With this commit, we replace the pass with a late IR-level pass which makes use of SE to calculate the backedge-taken counts and safely generate the loop-count expressions (including any necessary max() parts). This IR level pass inserts custom intrinsics that are lowered into the desired decrement-and-branch instructions. The most fragile part of this new implementation is that interfering uses of the counter register must be detected on the IR level (and, on PPC, this also includes any indirect branches in addition to function calls). Also, to make all of this work, we need a variant of the mtctr instruction that is marked as having side effects. Without this, machine-code level CSE, DCE, etc. illegally transform the resulting code. Hopefully, this can be improved in the future. This new pass is smaller than the original (and much smaller than the new Hexagon hardware loops pass), and can handle many additional cases correctly. In addition, the preheader-creation code has been copied from LoopSimplify, and after we decide on where it belongs, this code will be refactored so that it can be explicitly shared (making this implementation even smaller). The new test-case files ctrloop-{le,lt,ne}.ll have been adapted from tests for the new Hexagon pass. There are a few classes of loops that this pass does not transform (noted by FIXMEs in the files), but these deficiencies can be addressed within the SE infrastructure (thus helping many other passes as well). llvm-svn: 181927	2013-05-15 21:37:41 +00:00
Bill Schmidt	ef3d1a24ed	PPC32: Fix stack collision between FP and CR save areas. The changes to CR spill handling missed a case for 32-bit PowerPC. The code in PPCFrameLowering::processFunctionBeforeFrameFinalized() checks whether CR spill has occurred using a flag in the function info. This flag is only set by storeRegToStackSlot and loadRegFromStackSlot. spillCalleeSavedRegisters does not call storeRegToStackSlot, but instead produces MI directly. Thus we don't see the CR is spilled when assigning frame offsets, and the CR spill ends up colliding with some other location (generally the FP slot). This patch sets the flag in spillCalleeSavedRegisters for PPC32 so that the CR spill is properly detected and gets its own slot in the stack frame. llvm-svn: 181800	2013-05-14 16:08:32 +00:00
Bill Schmidt	22d40dcfe9	PPC64: Constant initializers with dynamic relocations go in .data.rel.ro. This fixes warning messages observed in the oggenc application test in projects/test-suite. Special handling is needed for the 64-bit PowerPC SVR4 ABI when a constant is initialized with a pointer to a function in a shared library. Because a function address is implemented as the address of a function descriptor, the use of copy relocations can lead to problems with initialization. GNU ld therefore replaces copy relocations with dynamic relocations to be resolved by the dynamic linker. This means the constant cannot reside in the read-only data section, but instead belongs in .data.rel.ro, which is designed for constants containing dynamic relocations. The implementation creates a class PPC64LinuxTargetObjectFile inheriting from TargetLoweringObjectFileELF, which behaves like its parent except to place constants of this sort into .data.rel.ro. The test case is reduced from the oggenc application. llvm-svn: 181723	2013-05-13 19:34:37 +00:00
Bill Schmidt	38b6cb51bc	Fix handling of anonymous aggregate parameters for powerpc*-apple-darwin8. This fixes bug 15821 similarly to the powerpc64-linux fix for bug 14779. Patch by David Fang. llvm-svn: 181449	2013-05-08 17:22:33 +00:00
Hal Finkel	08e53ee551	PPCInstrInfo::optimizeCompareInstr should not optimize FP compares The floating-point record forms on PPC don't set the condition register bits based on a comparison with zero (like the integer record forms do), but rather based on the exception status bits. llvm-svn: 181423	2013-05-08 12:16:14 +00:00
Hal Finkel	7153251ab5	LocalStackSlotAllocation improvements First, taking advantage of the fact that the virtual base registers are allocated in order of the local frame offsets, remove the quadratic register-searching behavior. Because of the ordering, we only need to check the last virtual base register created. Second, store the frame index in the FrameRef structure, and get the frame index and the local offset from this structure at the top of the loop iteration. This allows us to de-nest the loops in insertFrameReferenceRegisters (and I think makes the code cleaner). I also moved the needsFrameBaseReg check into the first loop over instructions so that we don't bother pushing FrameRefs for instructions that don't want a virtual base register anyway. Lastly, and this is the only functionality change, avoid the creation of single-use virtual base registers. These are currently not useful because, in general, they end up replacing what would be one r+r instruction with an add and a r+i instruction. Committing this removes the XFAIL in CodeGen/PowerPC/2007-09-07-LoadStoreIdxForms.ll Jim has okayed this off-list. llvm-svn: 180799	2013-04-30 20:04:37 +00:00
Manman Ren	1a5ff287fd	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180796	2013-04-30 17:52:57 +00:00
Rafael Espindola	1357ab74e5	Make all darwin ppc stubs local. This fixes pr15763. Patch by David Fang. llvm-svn: 180657	2013-04-27 00:43:16 +00:00
Hal Finkel	e632239d7b	Fix PPC optimizeCompareInstr swapped-sub argument handling When matching a compare with a subtract where the arguments of the compare are swapped w.r.t. the arguments of the subtract, we need to negate the predicates (or CR bit indices) of the users. This, however, is not the same as inverting the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM backend seems to do this correctly, but when I adapted the code for the PPC backend, I introduced an error in this logic. Comparison optimization is now enabled again by default. llvm-svn: 179899	2013-04-19 22:08:38 +00:00
Hal Finkel	b12da6be75	Disable PPC comparison optimization by default This seems to cause a stage-2 LLVM compile failure (by crashing TableGen); do I'm disabling this for now. llvm-svn: 179807	2013-04-18 22:54:25 +00:00
Hal Finkel	82656cb200	Implement optimizeCompareInstr for PPC Many PPC instructions have a so-called 'record form' which stores to a specific condition register the result of comparing the result of the instruction with zero (always as a signed comparison). For integer operations on PPC64, this is always a 64-bit comparison. This implementation is derived from the implementation in the ARM backend; there are some differences because PPC condition registers are allocatable virtual registers (although the record forms always use a specific one), and we look for a matching subtraction instruction after the compare (but before the first use) in addition to before it. llvm-svn: 179802	2013-04-18 22:15:08 +00:00
Hal Finkel	6736988ae2	Fix PPC64 CR spill location for callee-saved registers This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition registers, the spill location is specified relative to the stack pointer (SP + 8). However, this is not relative to the SP after the new stack frame is established, but instead relative to the caller's stack pointer (it is stored into the linkage area of the parent's stack frame). So, like with the link register, we don't directly spill the CRs with other callee-saved registers, but just mark them to be spilled during prologue generation. In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32). llvm-svn: 179500	2013-04-15 02:07:05 +00:00
Hal Finkel	d85a04b3df	Spill and restore PPC CR registers using the FP when we have one For functions that need to spill CRs, and have dynamic stack allocations, the value of the SP during the restore is not what it was during the save, and so we need to use the FP in these cases (as for all of the other spills and restores, but the CR restore has a special code path because its reserved slot, like the link register, is specified directly relative to the adjusted SP). llvm-svn: 179457	2013-04-13 08:09:20 +00:00
Nico Rieck	ba848e3bca	Replace coff-/elf-dump with llvm-readobj llvm-svn: 179361	2013-04-12 04:06:46 +00:00
Benjamin Kramer	3960c1cd56	FileCheckize a bunch of tests. llvm-svn: 179276	2013-04-11 12:32:23 +00:00
Hal Finkel	95081bff72	Manually remove successors in if conversion when CopyAndPredicateBlock is used In the simple and triangle if-conversion cases, when CopyAndPredicateBlock is used because the to-be-predicated block has other predecessors, we need to explicitly remove the old copied block from the successors list. Normally if conversion relies on TII->AnalyzeBranch combined with BB->CorrectExtraCFGEdges to cleanup the successors list, but if the predicated block contained an un-analyzable branch (such as a now-predicated return), then this will fail. These extra successors were causing a problem on PPC because it was causing later passes (such as PPCEarlyReturm) to leave dead return-only basic blocks in the code. llvm-svn: 179227	2013-04-10 22:05:25 +00:00
Hal Finkel	5711eca19c	Allow PPC B and BLR to be if-converted into some predicated forms This enables us to form predicated branches (which are the same conditional branches we had before) and also a larger set of predicated returns (including instructions like bdnzlr which is a conditional return and loop-counter decrement all in one). At the moment, if conversion does not capture all possible opportunities. A simple example is provided in early-ret2.ll, where if conversion forms one predicated return, and then the PPCEarlyReturn pass picks up the other one. So, at least for now, we'll keep both mechanisms. llvm-svn: 179134	2013-04-09 22:58:37 +00:00
Hal Finkel	b5899d5774	Use virtual base registers on PPC On PowerPC, non-vector loads and stores have r+i forms; however, in functions with large stack frames these were not being used to access slots far from the stack pointer because such slots were out of range for the signed 16-bit immediate offset field. This increases register pressure because we need a separate register for each offset (when the r+r form is used). By enabling virtual base registers, we can deal with large stack frames without unduly increasing register pressure. llvm-svn: 179105	2013-04-09 17:27:09 +00:00
Hal Finkel	059825b0f8	Convert test PowerPC/2007-09-07-LoadStoreIdxForms to FileCheck llvm-svn: 179104	2013-04-09 17:26:55 +00:00
Hal Finkel	b5aa7e54d9	Generate PPC early conditional returns PowerPC has a conditional branch to the link register (return) instruction: BCLR. This should be used any time when we'd otherwise have a conditional branch to a return. This adds a small pass, PPCEarlyReturn, which runs just prior to the branch selection pass (and, importantly, after block placement) to generate these conditional returns when possible. It will also eliminate unconditional branches to returns (these happen rarely; most of the time these have already been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice optimization for small functions that do not maintain a stack frame. llvm-svn: 179026	2013-04-08 16:24:03 +00:00
Hal Finkel	81f8799fe3	Cleanup and improve PPC fsel generation First, we should not cheat: fsel-based lowering of select_cc is a finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes this clear, as does a note in our own README). This also adds fsel-based lowering of EQ and NE condition codes. As it turned out, fsel generation was covered by a grand total of zero regression test cases. I've added some test cases to cover the existing behavior (which is now finite-math only), as well as the new EQ cases. llvm-svn: 179000	2013-04-07 22:11:09 +00:00
Hal Finkel	d61d4f80e6	Implement PPCInstrInfo::FoldImmediate There are certain PPC instructions into which we can fold a zero immediate operand. We can detect such cases by looking at the register class required by the using operand (so long as it is not otherwise constrained). llvm-svn: 178961	2013-04-06 19:30:30 +00:00
Hal Finkel	ed6a28597b	Enable early if conversion on PPC On cores for which we know the misprediction penalty, and we have the isel instruction, we can profitably perform early if conversion. This enables us to replace some small branch sequences with selects and avoid the potential stalls from mispredicting the branches. Enabling this feature required implementing canInsertSelect and insertSelect in PPCInstrInfo; isel code in PPCISelLowering was refactored to use these functions as well. llvm-svn: 178926	2013-04-05 23:29:01 +00:00
Hal Finkel	f96c18e3bc	PPC: Improve code generation for mixed-precision reciprocal sqrt The DAGCombine logic that recognized a/sqrt(b) and transformed it into a multiplication by the reciprocal sqrt did not handle cases where the sqrt and the division were separated by an fpext or fptrunc. llvm-svn: 178801	2013-04-04 22:44:12 +00:00
Bill Schmidt	92e26646bc	Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC. For this we need to use a libcall. Previously LLVM didn't implement libcall support for frem, so I've added it in the usual straightforward manner. A test case from the bug report is included. llvm-svn: 178639	2013-04-03 13:05:44 +00:00
Hal Finkel	2e10331057	Use PPC reciprocal estimates with Newton iteration in fast-math mode When unsafe FP math operations are enabled, we can use the fre[s] and frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together with some Newton iteration, in order to quickly generate floating-point division and sqrt results. All of these instructions are separately optional, and so each has its own feature flag (except for the Altivec instructions, which are covered under the existing Altivec flag). Doing this is not only faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these computations to be pipelined with other computations in order to hide their overall latency. I've also added a couple of missing fnmsub patterns which turned out to be missing (but are necessary for good code generation of the Newton iterations). Altivec needs a similar fix, but that will probably be more complicated because fneg is expanded for Altivec's v4f32. llvm-svn: 178617	2013-04-03 04:01:11 +00:00
Bill Schmidt	3581cd4b4c	Fix PR15630: Replace faulty stdcx. with stwcx. When doing a partword atomic operation, a lwarx was being paired with a stdcx. instead of a stwcx. when compiling for a 64-bit target. The target has nothing to do with it in this case; we always need a stwcx. Thanks to Kai Nacke for reporting the problem. llvm-svn: 178559	2013-04-02 18:37:08 +00:00
Hal Finkel	3f88d08974	Fix a bad assert in PPCTargetLowering llvm-svn: 178489	2013-04-01 18:42:58 +00:00
Hal Finkel	c2eddb0d02	Add triple to test/CodeGen/PowerPC/stfiwx-2 llvm-svn: 178486	2013-04-01 18:18:44 +00:00
Hal Finkel	f6d45f2379	Add more PPC floating-point conversion instructions The P7 and A2 have additional floating-point conversion instructions which allow a direct two-instruction sequence (plus load/store) to convert from all combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores, only some combinations were directly available). llvm-svn: 178480	2013-04-01 17:52:07 +00:00
Hal Finkel	c006295f3c	Fix PowerPC/cttz.ll to specify a cpu (and use FileCheck) llvm-svn: 178472	2013-04-01 16:31:56 +00:00
Hal Finkel	290376dd78	Add the PPC popcntw instruction The popcntw instruction is available whenever the popcntd instruction is available, and performs a separate popcnt on the lower and upper 32-bits. Ignoring the high-order count, this can be used for the 32-bit input case (saving on the explicit zero extension otherwise required to use popcntd). llvm-svn: 178470	2013-04-01 15:58:15 +00:00
Hal Finkel	beb296bea1	Add the PPC lfiwax instruction This instruction is available on modern PPC64 CPUs, and is now used to improve the SINT_TO_FP lowering (by eliminating the need for the separate sign extension instruction and decreasing the amount of needed stack space). llvm-svn: 178446	2013-03-31 10:12:51 +00:00
Hal Finkel	e53429a13e	Cleanup PPC(64) i32 -> float/double conversion The existing SINT_TO_FP code for i32 -> float/double conversion was disabled because it relied on broken EXTSW_32/STD_32 instruction definitions. The original intent had been to enable these 64-bit instructions to be used on CPUs that support them even in 32-bit mode. Unfortunately, this form of lying to the infrastructure was buggy (as explained in the FIXME comment) and had therefore been disabled. This re-enables this functionality, using regular DAG nodes, but only when compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead) are removed. llvm-svn: 178438	2013-03-31 01:58:02 +00:00
Hal Finkel	f8ac57e289	Implement FRINT lowering on PPC using frin Like nearbyint, rint can be implemented on PPC using the frin instruction. The complication comes from the fact that rint needs to set the FE_INEXACT flag when the result does not equal the input value (and frin does not do that). As a result, we use a custom inserter which, after the rounding, compares the rounded value with the original, and if they differ, explicitly sets the XX bit in the FPSCR register (which corresponds to FE_INEXACT). Once LLVM has better modeling of the floating-point environment we should be able to (often) eliminate this extra complexity. llvm-svn: 178362	2013-03-29 19:41:55 +00:00
Hal Finkel	c20a08d25b	Add PPC FP rounding instructions fri[mnpz] These instructions are available on the P5x (and later) and on the A2. They implement the standard floating-point rounding operations (floor, trunc, etc.). One caveat: frin (round to nearest) does not implement "ties to even", and so is only enabled in fast-math mode. llvm-svn: 178337	2013-03-29 08:57:48 +00:00
Hal Finkel	fd53ddd71c	Specify CPUs on the PPC bswap-load-store test Otherwise, the CHECK-NOT's might trigger depending on the host's CPU. llvm-svn: 178287	2013-03-28 20:35:18 +00:00
Hal Finkel	22e41c411e	Only enable 64-bit bswap DAG combines for PPC64 Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were added for bswap with load/store. This is because these combines are really only valid in 64-bit mode, regardless of the CPU (and this was not being checked). llvm-svn: 178286	2013-03-28 20:23:46 +00:00
Hal Finkel	31d2956510	Add the PPC64 ldbrx/stdbrx instructions These are 64-bit load/store with byte-swap, and available on the P7 and the A2. Like the similar instructions for 16- and 32-bit words, these are matched in the target DAG-combine phase against load/store-bswap pairs. llvm-svn: 178276	2013-03-28 19:25:55 +00:00
Hal Finkel	a4d074863a	Add the PPC64 popcntd instruction PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and tell TTI about it so that popcount-loop recognition will know about it. llvm-svn: 178233	2013-03-28 13:29:47 +00:00
Hal Finkel	035b4825ce	Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions There were a few places where kill flags were not being set correctly, and where 32-bit instruction variants were being used with 64-bit registers. After r178180, this code was being triggered causing llc to assert. llvm-svn: 178220	2013-03-28 03:38:16 +00:00
Hal Finkel	687143557d	Print PPC ZERO as 0 (not r0) even on Darwin It seems that the Darwin PPC assembler requires r0 to be written as 0 when it means 0 (at least in lwarx/stwcx.). Fixes PR15605. llvm-svn: 178142	2013-03-27 13:20:52 +00:00
Hal Finkel	0f77861d9f	Allocate r0 on PPC The R0 register can now be allocated because instructions that cannot use R0 as a GPR have been appropriately marked. llvm-svn: 178123	2013-03-27 06:52:27 +00:00
Bill Schmidt	a1b72d0f6a	Remove the link register from the GPR classes on PowerPC. Some implementation detail in the forgotten past required the link register to be placed in the GPRC and G8RC register classes. This is just wrong on the face of it, and causes several extra intersection register classes to be generated. I found this was having evil effects on instruction scheduling, by causing the wrong register class to be consulted for register pressure decisions. No code generation changes are expected, other than some minor changes in instruction order. Seven tests in the test bucket required minor tweaks to adjust to the new normal. llvm-svn: 178114	2013-03-27 02:40:14 +00:00
Hal Finkel	a7b0630ba8	Don't spill PPC VRSAVE on non-Darwin (even in SjLj) As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've added some asserts to make sure that we're not). As it turns out, we're not currently handling the Darwin case correctly (I've added a FIXME in the test case). I've tried adding various implied register definitions/uses to force the spill without success, so I'll need to address this later. llvm-svn: 178096	2013-03-27 00:02:20 +00:00
Hal Finkel	0dfbb05aff	Use multiple virtual registers in PPC CR spilling Now that the register scavenger can support multiple spill slots, and PEI can use virtual-register-based scavenging for multiple simultaneous registers, we can use a virtual register for the transfer register in the CR spilling code. This should eliminate the last place (outside of the prologue/epilogue) where we depend on the unconditional availability of the r0 register. We will soon be able to allocate it (in a somewhat restricted sense) as a GPR. llvm-svn: 178060	2013-03-26 18:57:22 +00:00
Hal Finkel	891671afe5	Fix a register-class comparison bug in PPCCTRLoops Thanks to Jakob for isolating the underlying problem from the test case in r177423. The original commit had introduced asymmetric copy operations, but these turned out to be a work-around to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops). llvm-svn: 177679	2013-03-21 23:23:34 +00:00
Hal Finkel	756810fe36	Implement builtin_{setjmp/longjmp} on PPC This implements SJLJ lowering on PPC, making the Clang functions __builtin_{setjmp/longjmp} functional on PPC platforms. The implementation strategy is similar to that on X86, with the exception that a branch-and-link variant is used to get the right jump address. Credit goes to Bill Schmidt for suggesting the use of the unconditional bcl form (instead of the regular bl instruction) to limit return-address-cache pollution. Benchmarking the speed at -O3 of: static jmp_buf env_sigill; void foo() { __builtin_longjmp(env_sigill,1); } main() { ... for (int i = 0; i < c; ++i) { if (__builtin_setjmp(env_sigill)) { goto done; } else { foo(); } done:; } ... } vs. the same code using the libc setjmp/longjmp functions on a P7 shows that this builtin implementation is ~4x faster with Altivec enabled and ~7.25x faster with Altivec disabled. This comparison is somewhat unfair because the libc version must also save/restore the VSX registers which we don't yet support. llvm-svn: 177666	2013-03-21 21:37:52 +00:00
David Blaikie	cc8d090163	Remove unused field in DISubprogram llvm-svn: 177661	2013-03-21 20:28:52 +00:00
Hal Finkel	a1431df540	Add support for spilling VRSAVE on PPC Although there is only one Altivec VRSAVE register, it is a member of a register class, and we need the ability to spill it. Because this register is normally callee-preserved and handled by special code this has never before been necessary. However, this capability will be required by a forthcoming commit adding SjLj support. llvm-svn: 177654	2013-03-21 19:03:21 +00:00
Hal Finkel	aa03c03a2d	Correct PPC FRAMEADDR lowering using a pseudo-register The old code used to lower FRAMEADDR tried to replicate the logic in the real frame-lowering code that determines whether or not the frame pointer (r31) will be used. When it seemed as through the frame pointer would not be used, the stack pointer (r1) was used instead. Unfortunately, because the stack size is not yet known, this does not work. Instead, this change introduces new always-reserved pseudo-registers (FP and FP8) that are replaced during prologue insertion with the real frame-pointer register (either r1 or r31). It is important that this intrinsic always return a valid frame address because it is used by Clang to store the frame address as part of code generation for __builtin_setjmp. llvm-svn: 177653	2013-03-21 19:03:19 +00:00
David Blaikie	43a729d165	Remove unused field in DICompileUnit llvm-svn: 177590	2013-03-20 22:34:33 +00:00
Hal Finkel	559754a482	Add a comment to the CodeGen/PowerPC/asym-regclass-copy.ll test llvm-svn: 177434	2013-03-19 20:22:32 +00:00
Ulrich Weigand	d850167a19	Rewrite pre-increment store patterns to use standard memory operands. Currently, pre-increment store patterns are written to use two separate operands to represent address base and displacement: stwu $rS, $ptroff($ptrreg) This causes problems when implementing the assembler parser, so this commit changes the patterns to use standard (complex) memory operands like in all other memory access instruction patterns: stwu $rS, $dst To still match those instructions against the appropriate pre_store SelectionDAG nodes, the patch uses the new feature that allows a Pat to match multiple DAG operands against a single (complex) instruction operand. Approved by Hal Finkel. llvm-svn: 177429	2013-03-19 19:52:04 +00:00
Hal Finkel	638a9fa43e	Prepare to make r0 an allocatable register on PPC Currently the PPC r0 register is unconditionally reserved. There are two reasons for this: 1. r0 is treated specially (as the constant 0) by certain instructions, and so cannot be used with those instructions as a regular register. 2. r0 is used as a temporary register in the CR-register spilling process (where, under some circumstances, we require two GPRs). This change addresses the first reason by introducing a restricted register class (without r0) for use by those instructions that treat r0 specially. These register classes have a new pseudo-register, ZERO, which represents the r0-as-0 use. This has the side benefit of making the existing target code simpler (and easier to understand), and will make it clear to the register allocator that uses of r0 as 0 don't conflict will real uses of the r0 register. Once the CR spilling code is improved, we'll be able to allocate r0. Adding these extra register classes, for some reason unclear to me, causes requests to the target to copy 32-bit registers to 64-bit registers. The resulting code seems correct (and causes no test-suite failures), and the new test case covers this new kind of asymmetric copy. As r0 is still reserved, no functionality change intended. llvm-svn: 177423	2013-03-19 18:51:05 +00:00
Hal Finkel	6681486375	Cleanup PPC64 unaligned i64 load/store Remove an accidentally-added instruction definition and add a comment in the test case. This is in response to a post-commit review by Bill Schmidt. No functionality change intended. llvm-svn: 177404	2013-03-19 15:23:39 +00:00
Hal Finkel	d9e10d51fa	Don't reserve R31 on PPC64 unless the frame pointer is needed llvm-svn: 177379	2013-03-19 08:09:38 +00:00
Hal Finkel	fc9aad6436	Fix a sign-extension bug in PPCCTRLoops Don't sign extend the immediate value from the OR instruction in an LIS/OR pair. llvm-svn: 177361	2013-03-18 23:58:28 +00:00
Hal Finkel	b09680b0f7	Fix PPC unaligned 64-bit loads and stores PPC64 supports unaligned loads and stores of 64-bit values, but in order to use the r+i forms, the offset must be a multiple of 4. Unfortunately, this cannot always be determined by examining the immediate itself because it might be available only via a TOC entry. In order to get around this issue, we additionally predicate the selection of the r+i form on the alignment of the load or store (forcing it to be at least 4 in order to select the r+i form). llvm-svn: 177338	2013-03-18 23:00:58 +00:00
Bill Schmidt	45bdbe7ec0	Change test cases to handle unaligned references. Hal Finkel recently added code to allow unaligned memory references for PowerPC. Two tests were temporarily modified with -disable-ppc-unaligned to keep them from failing. This patch adjusts the expected code generation for the unaligned references. llvm-svn: 177328	2013-03-18 22:12:04 +00:00
David Blaikie	3053029310	Remove unnecessary leading comment characters in lit-only file llvm-svn: 177327	2013-03-18 22:08:16 +00:00
David Blaikie	99067af791	Include '.test' suffix in target specific lit configs that need it Apparently my final cleanup to use a relevant suffix for these tests before committing r176831 caused them to stop running since lit wasn't configured to run tests with that suffix in those directories (why don't we just have a global suffix list?). So, add the suffix to the relevant directories & fix the test that has bitrotted over the last week due to my debug info schema changes. llvm-svn: 177315	2013-03-18 20:31:44 +00:00
Hal Finkel	21f2a43ab4	Fix large count and negative constant count handling in PPCCTRLoops This commit fixes an assert that would occur on loops with large constant counts (like looping for ((uint32_t) -1) iterations on PPC64). The existing code did not handle counts that it computed to be negative (asserting instead), but these can be created with valid inputs. This bug was discovered by bugpoint while I was attempting to isolate a completely different problem. Also, in writing test cases for the negative-count problem, I discovered that the ori/lsi handling was broken (there was a typo which caused the logic that was supposed to detect these pairs and extract the iteration count to always fail). This has now also been corrected (and is covered by one of the new test cases). llvm-svn: 177295	2013-03-18 17:40:44 +00:00
Hal Finkel	12337e4e7d	Cleanup initial-value constants in PPCCTRLoops Because the initial-value constants had not been added to the list of instructions considered for DCE the resulting code had redundant constant-materialization instructions. llvm-svn: 177294	2013-03-18 17:40:27 +00:00
Hal Finkel	fcc51d4ff1	Improve PPC VR (Altivec) register spilling This change cleans up two issues with Altivec register spilling: 1. The spilling code was inefficient (using two instructions, and add and a load, when just one would do) 2. The code assumed that r0 would always be available (true for now, but this will change) The new code handles VR spilling just like GPR spills but forced into r+r mode. As a result, when any VR spills are present, we must now always allocate the register-scavenger spill slot. llvm-svn: 177231	2013-03-17 04:43:44 +00:00
Hal Finkel	57080382e6	Remove FIXMEs in PPC test cases related to unaligned loads/stores As pointed out by Bill in response to r177160, these two FIXMEs can also be removed. llvm-svn: 177229	2013-03-16 23:02:31 +00:00
Hal Finkel	8d7fbc9dad	Enable unaligned memory access on PPC for scalar types Unaligned access is supported on PPC for non-vector types, and is generally more efficient than manually expanding the loads and stores. A few of the existing test cases were using expanded unaligned loads and stores to test other features (like load/store with update), and for these test cases, unaligned access remains disabled. llvm-svn: 177160	2013-03-15 15:27:13 +00:00
Hal Finkel	b0fac42987	Protect PPC Altivec patterns with a predicate In preparation for the addition of other SIMD ISA extensions (such as QPX) we need to make sure that all Altivec patterns are properly predicated on having Altivec support. No functionality change intended (one test case needed to be updated b/c it assumed that Altivec intrinsics would be supported without enabling Altivec support). llvm-svn: 177152	2013-03-15 13:21:21 +00:00
Hal Finkel	bb420f10e9	Allocate the RS spill slot for any PPC function with spills and a large stack frame For spills into a large stack frame, the FI-elimination code uses the register scavenger to obtain a free GPR for use with an r+r-addressed load or store. When there are no available GPRs, the scavenger gets one by using its spill slot. Previously, we were not always allocating that spill slot and the RS would assert when the spill slot was needed. I don't currently have a small test that triggered the assert, but I've created a small regression test that verifies that the spill slot is now added when the stack frame is sufficiently large. llvm-svn: 177140	2013-03-15 05:06:04 +00:00
Hal Finkel	e987a311ba	Not all PPC functions with a frame pointer need a RS spill slot We used to add a spill slot for the register scavenger whenever the function has a frame pointer. This is unnecessarily conservative: We may need the spill slot for dynamic stack allocations, and functions with dynamic stack allocations always have a FP, but we might also have a FP for other reasons (such as the user explicitly disabling frame-pointer elimination), and we don't necessarily need a spill slot for those functions. The structsinregs test needed adjustment because it disables FP elimination. llvm-svn: 177106	2013-03-14 19:34:32 +00:00
David Blaikie	1ca2f36289	Refactor filename/directory in DICompileUnit into a DIFile This is the next step towards making the metadata for DIScopes have a common prefix rather than having to delegate based on their tag type. llvm-svn: 176913	2013-03-13 00:01:35 +00:00
David Blaikie	452c3ff649	Remove unused "isMain" field from DICompileUnit llvm-svn: 176910	2013-03-12 22:43:04 +00:00
David Blaikie	a4f770d51c	Update debug info test cases with empty SplitDebugFilename field. This could be 'null' or the empty string, DIDescriptor::getStringField coalesces the two cases anyway so it's just a matter of legible/efficient representation. The change in behavior of the DICompileUnit::get* functions could be subsumed by the full verification check - but ideally that should just be an assertion if we could front-load the actual debug info metadata failure paths. llvm-svn: 176907	2013-03-12 22:25:36 +00:00
Jan Wen Voung	6dc3076080	Revert the test moves from 176733. Use "REQUIRES: asserts" instead. llvm-svn: 176873	2013-03-12 16:27:52 +00:00
Hal Finkel	01271c6022	Don't reserve R2 on Darwin/PPC Now that only the register-scavenger version of the CR spilling code remains, we no longer need the Darwin R2 hack. Darwin can use R0 as a spare register in any case where the System V ABI uses it (R0 is special architecturally, and so is reserved under all common ABIs). A few test cases needed to be updated to reflect the register-allocation changes. llvm-svn: 176868	2013-03-12 15:18:14 +00:00
David Blaikie	789beb5300	Remove duplicate test contents. llvm-svn: 176831	2013-03-11 22:10:14 +00:00
Benjamin Kramer	01b75cc0f2	Test case hygiene. llvm-svn: 176772	2013-03-09 18:25:40 +00:00
Jan Wen Voung	7857a64909	Disable statistics on Release builds and move tests that depend on -stats. Summary: Statistics are still available in Release+Asserts (any +Asserts builds), and stats can also be turned on with LLVM_ENABLE_STATS. Move some of the FastISel stats that were moved under DEBUG() back out of DEBUG(), since stats are disabled across the board now. Many tests depend on grepping "-stats" output. Move those into a orig_dir/Stats/. so that they can be marked as unsupported when building without statistics. Differential Revision: http://llvm-reviews.chandlerc.com/D486 llvm-svn: 176733	2013-03-08 22:56:31 +00:00
Bill Schmidt	8ea7af8e44	Fix PR15332 (patch by Florian Zeitz). There's no need to generate a stack frame for PPC32 SVR4 when there are no local variables assigned to the stack, i.e., when no red zone is needed. (PPC64 supports a red zone, but PPC32 does not.) llvm-svn: 176124	2013-02-26 21:28:57 +00:00
Bill Schmidt	441907dc09	Fix PR15359. The PowerPC TLS relocation types were not previously added to the necessary list in MCELFStreamer::fixSymbolsInTLSFixups(). Now they are! llvm-svn: 176094	2013-02-26 16:41:03 +00:00
Bill Schmidt	b454829981	Fix missing relocation for TLS addressing peephole optimization. Report and fix due to Kai Nacke. Testcase update by me. llvm-svn: 176029	2013-02-25 16:44:35 +00:00
Bill Schmidt	27917785ae	Large code model support for PowerPC. Large code model is identical to medium code model except that the addis/addi sequence for "local" accesses is never used. All accesses use the addis/ld sequence. The coding changes are straightforward; most of the patch is taken up with creating variants of the medium model tests for large model. llvm-svn: 175767	2013-02-21 17:12:27 +00:00
Bill Schmidt	f5b474c6c6	PPCDAGToDAGISel::PostprocessISelDAG() This patch implements the PPCDAGToDAGISel::PostprocessISelDAG virtual method to perform post-selection peephole optimizations on the DAG representation. One optimization is implemented here: folds to clean up complex addressing expressions for thread-local storage and medium code model. It will also be useful for large code model sequences when those are added later. I originally thought about doing this on the MI representation prior to register assignment, but it's difficult to do effective global dead code elimination at that point. DCE is trivial on the DAG representation. A typical example of a candidate code sequence in assembly: addis 3, 2, globalvar@toc@ha addi 3, 3, globalvar@toc@l lwz 5, 0(3) When the final instruction is a load or store with an immediate offset of zero, the offset from the add-immediate can replace the zero, provided the relocation information is carried along: addis 3, 2, globalvar@toc@ha lwz 5, globalvar@toc@l(3) Since the addi can in general have multiple uses, we need to only delete the instruction when the last use is removed. llvm-svn: 175697	2013-02-21 00:38:25 +00:00
Bill Schmidt	4437ad7d94	Stabilize vec_constants.ll llvm-svn: 175683	2013-02-20 22:43:03 +00:00
Bill Schmidt	c6cbecc2c7	Additional fixes for bug 15155. This handles the cases where the 6-bit splat element is odd, converting to a three-instruction sequence to add or subtract two splats. With this fix, the XFAIL in test/CodeGen/PowerPC/vec_constants.ll is removed. llvm-svn: 175663	2013-02-20 20:41:42 +00:00
Bill Schmidt	6631e94838	Fix bug 14779 for passing anonymous aggregates [patch by Kai Nacke]. The PPC backend doesn't handle these correctly. This patch uses logic similar to that in the X86 and ARM backends to track these arguments properly. llvm-svn: 175635	2013-02-20 17:31:41 +00:00
Bill Schmidt	51e7951e24	Fix PR15155: lost vadd/vsplat optimization. During lowering of a BUILD_VECTOR, we look for opportunities to use a vector splat. When the splatted value fits in 5 signed bits, a single splat does the job. When it doesn't fit in 5 bits but does fit in 6, and is an even value, we can splat on half the value and add the result to itself. This last optimization hasn't been working recently because of improved constant folding. To circumvent this, create a pseudo VADD_SPLAT that can be expanded during instruction selection. llvm-svn: 175632	2013-02-20 15:50:31 +00:00
Hal Finkel	2581905f81	DAGCombiner: Constant folding around pre-increment loads/stores Previously, even when a pre-increment load or store was generated, we often needed to keep a copy of the original base register for use with other offsets. If all of these offsets are constants (including the offset which was combined into the addressing mode), then this is clearly unnecessary. This change adjusts these other offsets to use the new incremented address. llvm-svn: 174746	2013-02-08 21:35:47 +00:00
Benjamin Kramer	c35d526489	Disable a couple more vector splat optimizations on PPC. I didn't see those because the test case used "not grep". FileCheck the test and XFAIL it, preserving the old optimization, so this can be fixed eventually. llvm-svn: 174330	2013-02-04 15:52:32 +00:00
Benjamin Kramer	548ffa274a	SelectionDAG: Teach FoldConstantArithmetic how to deal with vectors. This required disabling a PowerPC optimization that did the following: input: x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16> lowered to: tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8> x = ADD tmp, tmp The add now gets folded immediately and we're back at the BUILD_VECTOR we started from. I don't see a way to fix this currently so I left it disabled for now. Fix some trivially foldable X86 tests too. llvm-svn: 174325	2013-02-04 15:19:18 +00:00
David Blaikie	33111dfea0	Remove the (apparently) unnecessary debug info metadata indirection. The main lists of debug info metadata attached to the compile_unit had an extra layer of metadata nodes they went through for no apparent reason. This patch removes that (& still passes just as much of the GDB 7.5 test suite). If anyone can show evidence as to why these extra metadata nodes are there I'm open to reverting this patch & documenting why they're there. llvm-svn: 174266	2013-02-02 05:56:24 +00:00
Bill Schmidt	52742c25ae	LLVM enablement for some older PowerPC CPUs llvm-svn: 174230	2013-02-01 22:59:51 +00:00
Hal Finkel	e1df90958d	PPC QPX requires a 32-byte aligned stack On systems which support the QPX vector instructions, the stack must be 32-byte aligned. llvm-svn: 173993	2013-01-30 23:43:27 +00:00
Hal Finkel	efb305e54c	Add definitions for the PPC a2q core marked as having QPX available This is the first commit of a large series which will add support for the QPX vector instruction set to the PowerPC backend. This instruction set is used on the IBM Blue Gene/Q supercomputers. llvm-svn: 173973	2013-01-30 21:17:42 +00:00
Bill Schmidt	2e4ae4e154	This patch addresses bug 15031. The common code in the post-RA scheduler to break anti-dependencies on the critical path contained a flaw. In the reported case, an anti-dependency between the overlapping registers %X4 and %R4 exists: %X29<def> = OR8 %X4, %X4 %R4<def>, %X3<def,dead,tied3> = LBZU 1, %X3<kill,tied1> The unpatched code breaks the dependency by replacing %R4 and its uses with %R3, the first register on the available list. However, %R3 and %X3 overlap, so this creates two overlapping definitions on the same instruction. The fix is straightforward, preventing selection of a register that overlaps any other defined register on the same instruction. The test case is reduced from the bug report, and verifies that we no longer produce "lbzu 3, 1(3)" when breaking this anti-dependency. llvm-svn: 173706	2013-01-28 18:36:58 +00:00
Bill Schmidt	94b8cdbf55	Restore reverted test case, this time with REQUIRES: asserts llvm-svn: 172747	2013-01-17 19:46:51 +00:00
Bill Schmidt	b400204fd8	Remove bad test case llvm-svn: 172746	2013-01-17 19:39:36 +00:00
Bill Schmidt	dee1ef8f53	This patch fixes PR13626 by providing i128 support in the return calling convention. 128-bit integers are now properly returned in GPR3 and GPR4 on PowerPC. llvm-svn: 172745	2013-01-17 19:34:57 +00:00
Bill Schmidt	6b2940b01e	This patch fixes the PPC calling convention to handle returns of _Complex float and _Complex long double, by simply increasing the number of floating point registers available for return values. The test case verifies that the correct registers are loaded. llvm-svn: 172733	2013-01-17 17:45:19 +00:00
Bill Schmidt	d006c6938b	This patch addresses an incorrect transformation in the DAG combiner. The included test case is derived from one of the GCC compatibility tests. The problem arises after the selection DAG has been converted to type-legalized form. The combiner first sees a 64-bit load that can be converted into a pre-increment form. The original load feeds into a SRL that isolates the upper 32 bits of the loaded doubleword. This looks like an opportunity for DAGCombiner::ReduceLoadWidth() to replace the 64-bit load with a 32-bit load. However, this transformation is not valid, as the replacement load is not a pre-increment load. The pre-increment load produces an extra result, which feeds a subsequent add instruction. The replacement load only has one result value, and this value is propagated to all uses of the pre- increment load, including the add. Because the add is looking for the second result value as its operand, it ends up attempting to add a constant to a token chain, resulting in a crash. So the patch simply disables this transformation for any load with more than two result values. llvm-svn: 172480	2013-01-14 22:04:38 +00:00
Benjamin Kramer	5ea0349ef5	When lowering an inreg sext first shift left, then right arithmetically. Shifting right two times will only yield zero. Should fix SingleSource/UnitTests/SignlessTypes/factor. llvm-svn: 172322	2013-01-12 19:06:44 +00:00
Nadav Rotem	dbe5c72d03	PPC: Implement efficient lowering of sign_extend_inreg. llvm-svn: 172269	2013-01-11 22:57:48 +00:00
Tim Northover	3a51aab390	Simplify writing floating types to assembly. This removes previous special cases for each floating-point type in favour of a shared codepath. llvm-svn: 172189	2013-01-11 10:36:13 +00:00
Andrew Trick	9f0b95f260	MIsched: add an ILP window property to machine model. This was an experimental option, but needs to be defined per-target. e.g. PPC A2 needs to aggressively hide latency. I converted some in-order scheduling tests to A2. Hal is working on more test cases. llvm-svn: 171946	2013-01-09 03:36:49 +00:00
Tim Northover	90fb75d859	Specify complete triple for fp128 tests. This avoids FileCheck failing over different comment characters in assembly (notably powerpc64 on Linux vs Darwin) and should fix David's build-bot. llvm-svn: 171886	2013-01-08 19:36:33 +00:00
Tim Northover	7bb9992cce	Allow the asm printer to print fp128 values properly. llvm-svn: 171866	2013-01-08 16:56:23 +00:00
Bill Schmidt	9b1e3e25dc	This patch addresses bug 14678 by fixing two problems in medium code model code generation. Variables addressed through a GlobalAlias were not being handled, and variables with available_externally linkage were treated incorrectly. The patch contains two new tests to verify the correct code generation for these cases. llvm-svn: 171778	2013-01-07 19:29:18 +00:00
Hal Finkel	6dbdd4307b	Support ppcf128 in SelectionDAG::getConstantFP Fixes pr14751. Patch by Kai; Thanks! llvm-svn: 171261	2012-12-30 19:03:32 +00:00
Hal Finkel	2ebe6d08cd	Loosen scheduling restrictions on the PPC dcbt intrinsic As with the prefetch intrinsic to which it maps, simply have dcbt marked as reading from and writing to its arguments instead of having unmodeled side effects. While this might cause unwanted code motion (because aliasing checks don't really capture cache-line sharing), it is more important that prefetches in unrolled loops don't block the scheduler from rearranging the unrolled loop body. llvm-svn: 171073	2012-12-25 18:51:18 +00:00
Hal Finkel	1b5ff08d43	Expand PPC64 atomic load and store Use of store or load with the atomic specifier on 64-bit types would cause instruction-selection failures. As with the 32-bit case, these can use the default expansion in terms of cmp-and-swap. llvm-svn: 171072	2012-12-25 17:22:53 +00:00
Rafael Espindola	642c7cd56e	Simplify the testcase a bit. I checked that it would still crash llc before the corresponding fix. llvm-svn: 170709	2012-12-20 17:47:27 +00:00
Benjamin Kramer	c5071466d4	PowerPC: Expand VSELECT nodes. There's probably a better expansion for those nodes than the default for altivec, but this is better than crashing. VSELECTs occur in loop vectorizer output. llvm-svn: 170551	2012-12-19 15:49:14 +00:00
Hal Finkel	943f76d1b3	Check multiple register classes for inline asm tied registers A register can be associated with several distinct register classes. For example, on PPC, the floating point registers are each associated with both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code would fail when provided with a floating point register and an f64 operand because it would happen to find the register in the F4RC class first and return that. From the F4RC class, SDAG would extract f32 as the register type and then assert because of the invalid implied conversion between the f64 value and the f32 register. Instead, search all register classes. If a register class containing the the requested register has the requested type, then return that register class. Otherwise, as before, return the first register class found that contains the requested register. llvm-svn: 170436	2012-12-18 17:50:58 +00:00
Bill Schmidt	a4f898448c	This patch removes some nondeterminism from direct object file output for TLS dynamic models on 64-bit PowerPC ELF. The default sort routine for relocations only sorts on the r_offset field; but with TLS, there can be two relocations with the same r_offset. For PowerPC, this patch sorts secondarily on descending r_type, which matches the behavior expected by the linker. llvm-svn: 170237	2012-12-14 20:28:38 +00:00
Bill Schmidt	9f0b4ec0f5	This patch improves the 64-bit PowerPC InitialExec TLS support by providing for a wider range of GOT entries that can hold thread-relative offsets. This matches the behavior of GCC, which was not documented in the PPC64 TLS ABI. The ABI will be updated with the new code sequence. Former sequence: ld 9,x@got@tprel(2) add 9,9,x@tls New sequence: addis 9,2,x@got@tprel@ha ld 9,x@got@tprel@l(9) add 9,9,x@tls Note that a linker optimization exists to transform the new sequence into the shorter sequence when appropriate, by replacing the addis with a nop and modifying the base register and relocation type of the ld. llvm-svn: 170209	2012-12-14 17:02:38 +00:00
Bill Schmidt	25ffd19502	The ordering of two relocations on the same instruction is apparently not predictable when compiled on at least one non-PowerPC host. Source of nondeterminism not apparent. Restrict the test to build on PowerPC hosts for now while looking into the issue further. llvm-svn: 170016	2012-12-12 20:29:20 +00:00
Bill Schmidt	24b8dd6eb7	This patch implements local-dynamic TLS model support for the 64-bit PowerPC target. This is the last of the four models, so we now have full TLS support. This is mostly a straightforward extension of the general dynamic model. I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the register copy following ADDI_TLSLD_L; otherwise everything above the ADDIS_DTPREL_HA appeared dead and was removed. As before, there are new test cases to test the assembly generation, and the relocations output during integrated assembly. The expected code gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll. There are a couple of things I think can be done more efficiently in the overall TLS code, so there will likely be a clean-up patch forthcoming; but for now I want to be sure the functionality is in place. Bill llvm-svn: 170003	2012-12-12 19:29:35 +00:00
Bill Schmidt	c56f1d34bc	This patch implements the general dynamic TLS model for 64-bit PowerPC. Given a thread-local symbol x with global-dynamic access, the generated code to obtain x's address is: Instruction Relocation Symbol addis ra,r2,x@got@tlsgd@ha R_PPC64_GOT_TLSGD16_HA x addi r3,ra,x@got@tlsgd@l R_PPC64_GOT_TLSGD16_L x bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x R_PPC64_REL24 __tls_get_addr nop <use address in r3> The implementation borrows from the medium code model work for introducing special forms of ADDIS and ADDI into the DAG representation. This is made slightly more complicated by having to introduce a call to the external function __tls_get_addr. Using the full call machinery is overkill and, more importantly, makes it difficult to add a special relocation. So I've introduced another opcode GET_TLS_ADDR to represent the function call, and surrounded it with register copies to set up the parameter and return value. Most of the code is pretty straightforward. I ran into one peculiarity when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like BL8_NOP_ELF except that it takes another parameter to represent the symbol ("x" above) that requires a relocation on the call. Something in the TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated identically during the emit phase, so this second operand was never visited to generate relocations. This is the reason for the slightly messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding(). Two new tests are included to demonstrate correct external assembly and correct generation of relocations using the integrated assembler. Comments welcome! Thanks, Bill llvm-svn: 169910	2012-12-11 20:30:11 +00:00
Hal Finkel	66859ae0f6	Use GetUnderlyingObjects in misched misched used GetUnderlyingObject in order to break false load/store dependencies, and the -enable-aa-sched-mi feature similarly relied on GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis. Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so (especially due to LSR) all of these mechanisms failed for induction-variable-dependent loads and stores inside loops. This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects (which will recurse through phi and select instructions) in misched. Andy reviewed, tested and simplified this patch; Thanks! llvm-svn: 169744	2012-12-10 18:49:16 +00:00
Bill Schmidt	ca4a0c9dbd	This patch introduces initial-exec model support for thread-local storage on 64-bit PowerPC ELF. The patch includes code to handle external assembly and MC output with the integrated assembler. It intentionally does not support the "old" JIT. For the initial-exec TLS model, the ABI requires the following to calculate the address of external thread-local variable x: Code sequence Relocation Symbol ld 9,x@got@tprel(2) R_PPC64_GOT_TPREL16_DS x add 9,9,x@tls R_PPC64_TLS x The register 9 is arbitrary here. The linker will replace x@got@tprel with the offset relative to the thread pointer to the generated GOT entry for symbol x. It will replace x@tls with the thread-pointer register (13). The two test cases verify correct assembly output and relocation output as just described. PowerPC-specific selection node variants are added for the two instructions above: LD_GOT_TPREL and ADD_TLS. These are inserted when an initial-exec global variable is encountered by PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to machine instructions LDgotTPREL and ADD8TLS. LDgotTPREL is a pseudo that uses the same LDrs support added for medium code model's LDtocL, with a different relocation type. The rest of the processing is straightforward. llvm-svn: 169281	2012-12-04 16:18:08 +00:00
Chad Rosier	31e7d2deb3	test/CodeGen/PowerPC/vec_mul.ll: Add a triple. Thanks, Hal. llvm-svn: 169026	2012-11-30 19:15:10 +00:00
Chad Rosier	a820e7feff	test/CodeGen/PowerPC/vec_mul.ll: Fix register operands. llvm-svn: 169020	2012-11-30 18:29:01 +00:00
NAKAMURA Takumi	faaf131091	test/CodeGen/PowerPC: Add explicit -march=ppc32. FIXME: Please add another RUN line if you would like to check also on ppc64. llvm-svn: 168999	2012-11-30 13:28:31 +00:00
Adhemerval Zanella	812410f2d1	This patch fixes the Altivec addend construction for the fused multiply-add instruction (vmaddfp) to conform with IEEE to ensure the sign of a zero result when resulting product is -0.0. The -0.0 vector addend to vmaddfp is generated by a creating a vector with full bits sets and then shifting each elements by 31-bits to the left, resulting in a vector of 0x80000000 (or -0.0 as float). The 'buildvec_canonicalize.ll' was adjusted to reflect this change and the 'vec_mul.ll' was complemented with the float vector multiplication test. llvm-svn: 168998	2012-11-30 13:05:44 +00:00
Bill Schmidt	e0a68a562b	This patch makes medium code model the default for 64-bit PowerPC ELF. When the CodeGenInfo is to be created for the PPC64 target machine, a default code-model selection is converted to CodeModel::Medium provided we are not targeting the Darwin OS. Defaults for Darwin are unaffected. llvm-svn: 168747	2012-11-27 23:36:26 +00:00
Bill Schmidt	34627e3434	This patch implements medium code model support for 64-bit PowerPC. The default for 64-bit PowerPC is small code model, in which TOC entries must be addressable using a 16-bit offset from the TOC pointer. Additionally, only TOC entries are addressed via the TOC pointer. With medium code model, TOC entries and data sections can all be addressed via the TOC pointer using a 32-bit offset. Cooperation with the linker allows 16-bit offsets to be used when these are sufficient, reducing the number of extra instructions that need to be executed. Medium code model also does not generate explicit TOC entries in ".section toc" for variables that are wholly internal to the compilation unit. Consider a load of an external 4-byte integer. With small code model, the compiler generates: ld 3, .LC1@toc(2) lwz 4, 0(3) .section .toc,"aw",@progbits .LC1: .tc ei[TC],ei With medium model, it instead generates: addis 3, 2, .LC1@toc@ha ld 3, .LC1@toc@l(3) lwz 4, 0(3) .section .toc,"aw",@progbits .LC1: .tc ei[TC],ei Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the 32-bit offset of ei's TOC entry from the TOC base pointer. Similarly, .LC1@toc@l is a relocation requesting the lower 16 bits. Note that if the linker determines that ei's TOC entry is within a 16-bit offset of the TOC base pointer, it will replace the "addis" with a "nop", and replace the "ld" with the identical "ld" instruction from the small code model example. Consider next a load of a function-scope static integer. For small code model, the compiler generates: ld 3, .LC1@toc(2) lwz 4, 0(3) .section .toc,"aw",@progbits .LC1: .tc test_fn_static.si[TC],test_fn_static.si .type test_fn_static.si,@object .local test_fn_static.si .comm test_fn_static.si,4,4 For medium code model, the compiler generates: addis 3, 2, test_fn_static.si@toc@ha addi 3, 3, test_fn_static.si@toc@l lwz 4, 0(3) .type test_fn_static.si,@object .local test_fn_static.si .comm test_fn_static.si,4,4 Again, the linker may replace the "addis" with a "nop", calculating only a 16-bit offset when this is sufficient. Note that it would be more efficient for the compiler to generate: addis 3, 2, test_fn_static.si@toc@ha lwz 4, test_fn_static.si@toc@l(3) The current patch does not perform this optimization yet. This will be addressed as a peephole optimization in a later patch. For the moment, the default code model for 64-bit PowerPC will remain the small code model. We plan to eventually change the default to medium code model, which matches current upstream GCC behavior. Note that the different code models are ABI-compatible, so code compiled with different models will be linked and execute correctly. I've tested the regression suite and the application/benchmark test suite in two ways: Once with the patch as submitted here, and once with additional logic to force medium code model as the default. The tests all compile cleanly, with one exception. The mandel-2 application test fails due to an unrelated ABI compatibility with passing complex numbers. It just so happens that small code model was incredibly lucky, in that temporary values in floating-point registers held the expected values needed by the external library routine that was called incorrectly. My current thought is to correct the ABI problems with _Complex before making medium code model the default, to avoid introducing this "regression." Here are a few comments on how the patch works, since the selection code can be difficult to follow: The existing logic for small code model defines three pseudo-instructions: LDtoc for most uses, LDtocJTI for jump table addresses, and LDtocCPT for constant pool addresses. These are expanded by SelectCodeCommon(). The pseudo-instruction approach doesn't work for medium code model, because we need to generate two instructions when we match the same pattern. Instead, new logic in PPCDAGToDAGISel::Select() intercepts the TOC_ENTRY node for medium code model, and generates an ADDIStocHA followed by either a LDtocL or an ADDItocL. These new node types correspond naturally to the sequences described above. The addis/ld sequence is generated for the following cases: * Jump table addresses * Function addresses * External global variables * Tentative definitions of global variables (common linkage) The addis/addi sequence is generated for the following cases: * Constant pool entries * File-scope static global variables * Function-scope static variables Expanding to the two-instruction sequences at select time exposes the instructions to subsequent optimization, particularly scheduling. The rest of the processing occurs at assembly time, in PPCAsmPrinter::EmitInstruction. Each of the instructions is converted to a "real" PowerPC instruction. When a TOC entry needs to be created, this is done here in the same manner as for the existing LDtoc, LDtocJTI, and LDtocCPT pseudo-instructions (I factored out a new routine to handle this). I had originally thought that if a TOC entry was needed for LDtocL or ADDItocL, it would already have been generated for the previous ADDIStocHA. However, at higher optimization levels, the ADDIStocHA may appear in a different block, which may be assembled textually following the block containing the LDtocL or ADDItocL. So it is necessary to include the possibility of creating a new TOC entry for those two instructions. Note that for LDtocL, we generate a new form of LD called LDrs. This allows specifying the @toc@l relocation for the offset field of the LD instruction (i.e., the offset is replaced by a SymbolLo relocation). When the peephole optimization described above is added, we will need to do similar things for all immediate-form load and store operations. The seven "mcm-n.ll" test cases are kept separate because otherwise the intermingling of various TOC entries and so forth makes the tests fragile and hard to understand. The above assumes use of an external assembler. For use of the integrated assembler, new relocations are added and used by PPCELFObjectWriter. Testing is done with "mcm-obj.ll", which tests for proper generation of the various relocations for the same sequences tested with the external assembler. llvm-svn: 168708	2012-11-27 17:35:46 +00:00
Eli Bendersky	eaf1a28594	Rewrite test to not use a FileCheck variable and redefine it on the same line. In preparation for the FileCheck functionality change which will allow using a variable later on the same line. No functionality change. llvm-svn: 168588	2012-11-26 14:09:46 +00:00
Benjamin Kramer	dd76a93af5	PPC: MCize most of the darwin PIC emission. The last remaining bit is "bcl 20, 31, AnonSymbol", which I couldn't find the instruction definition for. Only whitespace changes in assembly output. llvm-svn: 168541	2012-11-24 13:18:25 +00:00
Andrew Trick	e615ad0063	Use a full triple for a PPC test case for asm syntax. llvm-svn: 168283	2012-11-18 06:21:03 +00:00
Andrew Trick	bb1b351860	Silence the buildbots for this test while I figure out the triple llvm-svn: 168249	2012-11-17 03:39:26 +00:00
Andrew Trick	28c000b234	Broaden isSchedulingBoundary to check aliases of SP. On PPC the stack pointer is X1, but ADJCALLSTACK writes R1. Fixes PR14315: Register regmask dependency problem with misched. llvm-svn: 168248	2012-11-17 03:35:11 +00:00
Adhemerval Zanella	bdface5699	PowerPC: Lowering floor intrinsic for Altivec This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and llvm.nearbyint to Altivec instruction when using 4 single-precision float vectors. llvm-svn: 168086	2012-11-15 20:56:03 +00:00
Bill Schmidt	451499f02d	This patch is in preparation for adding medium code model support to the PPC64 target. The five tests modified herein test code generation that is sensitive to the code model selected. So I've added -code-model=small to the RUN commands for each. Since small code model is the default, this has no effect for now; but this prepares us for eventually changing the default to medium code model for PPC64. Test changes verified with small and medium code model as default on powerpc64-unknown-linux-gnu. All tests continue to pass. llvm-svn: 167999	2012-11-14 23:23:27 +00:00
Ulrich Weigand	3946877f88	Do not consider a machine instruction that uses and defines the same physical register as candidate for common subexpression elimination in MachineCSE. This fixes a bug on PowerPC in MultiSource/Applications/oggenc/oggenc caused by MachineCSE invalidly merging two separate DYNALLOC insns. llvm-svn: 167855	2012-11-13 18:40:58 +00:00
Jakob Stoklund Olesen	13d5562963	Fix assertions in updateRegMaskSlots(). The RegMaskSlots contains 'r' slots while NewIdx and OldIdx are 'B' slots. This broke the checks in the assertions. This fixes PR14302. llvm-svn: 167625	2012-11-09 19:18:49 +00:00
Ulrich Weigand	339d0597d3	On PowerPC64, integer return values (as well as arguments) are supposed to be extended to a full register. This is modeled in the IR by marking the return value (or argument) with a signext or zeroext attribute. However, while these attributes are respected for function arguments, they are currently ignored for function return values by the PowerPC back-end. This patch updates PPCCallingConv.td to ask for the promotion to i64, and fixes LowerReturn and LowerCallResult to implement it. The new test case verifies that both arguments and return values are properly extended when passing them; and also that the optimizers understand incoming argument and return values are in fact guaranteed by the ABI to be extended. The patch caused a spurious breakage in CodeGen/PowerPC/coalesce-ext.ll, since the test case used a "ret" instruction to create a use of an i32 value at the end of the function (to set up data flow as required for what the test is intended to test). Since there's now an implicit promotion to i64, that data flow no longer works as expected. To fix this, this patch now adds an extra "add" to ensure we have an appropriate use of the i32 value. llvm-svn: 167396	2012-11-05 19:39:45 +00:00
Hal Finkel	4f24c621d9	Add support for the PowerPC-specific inline asm Z constraint and y modifier. The Z constraint specifies an r+r memory address, and the y modifier expands to the "r, r" in the asm string. For this initial implementation, the base register is forced to r0 (which has the special meaning of 0 for r+r addressing on PowerPC) and the full address is taken in the second register. In the future, this should be improved. llvm-svn: 167388	2012-11-05 18:18:42 +00:00
Adhemerval Zanella	c4182d1890	[PATCH] PowerPC: Expand load extend vector operations This patch expands the SEXTLOAD, ZEXTLOAD, and EXTLOAD operations for vector types when altivec is enabled. llvm-svn: 167386	2012-11-05 17:15:56 +00:00
Bill Schmidt	9953cf294b	This patch addresses an ABI compatibility issue with empty aggregate parameters. Examples of these are: struct { } a; union { } b[256]; int a[0]; An empty aggregate has an address, although dereferencing that address is pointless. When passed as a parameter, an empty aggregate does not consume a protocol register, nor does it consume a doubleword in the parameter save area. Passing an empty aggregate by reference passes an address just as for any other aggregate. Returning an empty aggregate uses GPR3 as a hidden address of the return value location, just as for any other aggregate. The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate parameters passed by value. The handling of return values and by-reference parameters was already correct. Built on powerpc64-unknown-linux-gnu and tested with no new regressions. A test case is included to test proper handling of empty aggregate parameters on both sides of the function call protocol. llvm-svn: 167090	2012-10-31 01:15:05 +00:00
Adhemerval Zanella	5c043aeb1b	PowerPC: Expand FSRQT for vector types This patch expands FSQRT for floating point vector types when altivec is used. llvm-svn: 167034	2012-10-30 18:29:42 +00:00
Adhemerval Zanella	56775e0f13	PowerPC: More support for Altivec compare operations This patch adds more support for vector type comparisons using altivec. It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector types for comparison operators ==, !=, >, >=, <, and <=. llvm-svn: 167015	2012-10-30 13:50:19 +00:00
Bill Schmidt	bd4ac26973	This patch solves a problem with passing varargs parameters under the PPC64 ELF ABI. A varargs parameter consisting of a single-precision floating-point value, or of a single-element aggregate containing a single-precision floating-point value, must be passed in the low-order (rightmost) four bytes of the doubleword stack slot reserved for that parameter. If there are GPR protocol registers remaining, the parameter must also be mirrored in the low-order four bytes of the reserved GPR. Prior to this patch, such parameters were being passed in the high-order four bytes of the stack slot and the mirrored GPR. The patch adds a new test case to verify the correct code generation. llvm-svn: 166968	2012-10-29 21:18:16 +00:00
Ulrich Weigand	3abb34389d	In various places throughout the code generator, there were special checks to avoid performing compile-time arithmetic on PPCDoubleDouble. Now that APFloat supports arithmetic on PPCDoubleDouble, those checks are no longer needed, and we can treat the type like any other. llvm-svn: 166958	2012-10-29 18:35:49 +00:00
Ulrich Weigand	0de4a1e4ae	Allow i32/i64 for 'f' constraint on PowerPC. This fixes PR12757. llvm-svn: 166943	2012-10-29 17:49:34 +00:00
Bill Schmidt	bbc661e572	This patch adds alignment information for long double to the 64-bit PowerPC ELF subtarget. The existing logic is used as a fallback to avoid any changes to the Darwin ABI. PPC64 ELF now has two possible data layout strings: one for FreeBSD, which requires 8-byte alignment, and a default string that requires 16-byte alignment. I've added a test for PPC64 Linux to verify the 16-byte alignment. If somebody wants to add a separate test for FreeBSD, that would be great. Note that there is a companion patch to update the alignment information in Clang, which I am committing now as well. llvm-svn: 166928	2012-10-29 14:59:36 +00:00
Bill Schmidt	6ed3b99f43	This patch addresses a PPC64 ELF issue with passing parameters consisting of structs having size 3, 5, 6, or 7. Such a struct must be passed and received as right-justified within its register or memory slot. The problem is only present for structs that are passed in registers. Previously, as part of a patch handling all structs of size less than 8, I added logic to rotate the incoming register so that the struct was left- justified prior to storing the whole register. This was incorrect because the address of the parameter had already been adjusted earlier to point to the right-adjusted value in the storage slot. Essentially I had accidentally accounted for the right-adjustment twice. In this patch, I removed the incorrect logic and reorganized the code to make the flow clearer. The removal of the rotates changes the expected code generation, so test case structsinregs.ll has been modified to reflect this. I also added a new test case, jaggedstructs.ll, to demonstrate that structs of these sizes can now be properly received and passed. I've built and tested the code on powerpc64-unknown-linux-gnu with no new regressions. I also ran the GCC compatibility test suite and verified that earlier problems with these structs are now resolved, with no new regressions. llvm-svn: 166680	2012-10-25 13:38:09 +00:00
Ulrich Weigand	d34b5bd610	This patch fixes failures in the SingleSource/Regression/C/uint64_to_float test case on PowerPC caused by rounding errors when converting from a 64-bit integer to a single-precision floating point. The reason for this are double-rounding effects, since on PowerPC we have to convert to an intermediate double-precision value first, which gets rounded to the final single-precision result. The patch fixes the problem by preparing the 64-bit integer so that the first conversion step to double-precision will always be exact, and the final rounding step will result in the correctly-rounded single-precision result. The generated code sequence is equivalent to what GCC would generate. When -enable-unsafe-fp-math is in effect, that extra effort is omitted and we accept possible rounding errors (just like GCC does as well). llvm-svn: 166178	2012-10-18 13:16:11 +00:00
Bill Schmidt	48081cad0d	This patch addresses PR13949. For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8 bytes are to be passed in the low-order bits ("right-adjusted") of the doubleword register or memory slot assigned to them. A previous patch addressed this for aggregates passed in registers. However, small aggregates passed in the overflow portion of the parameter save area are still being passed left-adjusted. The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on the callee side. The main fix on the callee side simply extends existing logic for 1- and 2-byte objects to 1- through 7-byte objects, and correcting a constant left over from 32-bit code. There is also a fix to a bogus calculation of the offset to the following argument in the parameter save area. On the caller side, again a constant left over from 32-bit code is fixed. Additionally, some code for 1, 2, and 4-byte objects is duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only. The LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to handle both ABIs, and I propose to separate this into two functions in a future patch, at which time the duplication can be removed. The patch adds a new test (structsinmem.ll) to demonstrate correct passing of structures of all seven sizes. Eight dummy parameters are used to force these structures to be in the overflow portion of the parameter save area. As a side effect, this corrects the case when aggregates passed in registers are saved into the first eight doublewords of the parameter save area: Previously they were stored left-justified, and now are properly stored right-justified. This requires changing the expected output of existing test case structsinregs.ll. llvm-svn: 166022	2012-10-16 13:30:53 +00:00
NAKAMURA Takumi	4d88f05fe0	llvm/test/CodeGen/PowerPC/2012-10-12-bitcast.ll: Try to fix failure on non-ppc hosts, to add -mattr=+altivec. llvm-svn: 165803	2012-10-12 16:01:08 +00:00
Ulrich Weigand	9aa51d1a2c	Fix big-endian codegen bug in DAGTypeLegalizer::ExpandRes_BITCAST On PowerPC, a bitcast of <16 x i8> to i128 may run through a code path in ExpandRes_BITCAST that attempts to do an intermediate bitcast to a <4 x i32> vector, and then construct the Hi and Lo parts of the resulting i128 by pairing up two of those i32 vector elements each. The code already recognizes that on a big-endian system, the first two vector elements form the Hi part, and the final two vector elements form the Lo part (vice-versa from the little-endian situation). However, we also need to take endianness into account when forming each of those separate pairs: on a big-endian system, vector element 0 is the high part of the pair making up the Hi part of the result, and vector element 1 is the low part of the pair. The code currently always uses vector element 0 as the low part and vector element 1 as the high part, as is appropriate for little-endian platforms only. This patch fixes this by swapping the vector elements as they are paired up as appropriate. llvm-svn: 165802	2012-10-12 15:42:58 +00:00
Bill Schmidt	22162470ba	This patch addresses PR13947. For function calls on the 64-bit PowerPC SVR4 target, each parameter is mapped to as many doublewords in the parameter save area as necessary to hold the parameter. The first 13 non-varargs floating-point values are passed in registers; any additional floating-point parameters are passed in the parameter save area. A single-precision floating-point parameter (32 bits) must be mapped to the second (rightmost, low-order) word of its assigned doubleword slot. Currently LLVM violates this ABI requirement by mapping such a parameter to the first (leftmost, high-order) word of its assigned doubleword slot. This is internally self-consistent but will not interoperate correctly with libraries compiled with an ABI-compliant compiler. This patch corrects the problem by adjusting the parameter addressing on both sides of the calling convention. llvm-svn: 165714	2012-10-11 15:38:20 +00:00
Bill Schmidt	e09d4fd769	Add -mattr=+altivec and remove XFAIL. llvm-svn: 165666	2012-10-10 22:25:11 +00:00
Bill Schmidt	6d110a5139	XFAIL for all targets pending investigation llvm-svn: 165664	2012-10-10 21:52:10 +00:00
Bill Schmidt	b9bc47409d	When generating spill and reload code for vector registers on PowerPC, the compiler makes use of GPR0. However, there are two flavors of GPR0 defined by the target: the 32-bit GPR0 (R0) and the 64-bit GPR0 (X0). The spill/reload code makes use of R0 regardless of whether we are generating 32- or 64-bit code. This patch corrects the problem in the obvious manner, using X0 and ADDI8 for 64-bit and R0 and ADDI for 32-bit. llvm-svn: 165658	2012-10-10 21:25:01 +00:00
Bill Schmidt	38d9458720	The PowerPC VRSAVE register has been somewhat of an odd beast since the Altivec extensions were introduced. Its use is optional, and allows the compiler to communicate to the operating system which vector registers should be saved and restored during a context switch. In practice, this information is ignored by the various operating systems using the SVR4 ABI; the kernel saves and restores the entire register state. Setting the VRSAVE register is no longer performed by the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux systems. It seems best to avoid this logic within LLVM as well. This patch avoids generating code to update and restore VRSAVE for the PowerPC SVR4 ABIs (32- and 64-bit). The code remains in place for the Darwin ABI. llvm-svn: 165656	2012-10-10 20:54:15 +00:00
Adhemerval Zanella	fe3f793cec	PR12716: PPC crashes on vector compare Vector compare using altivec 'vcmpxxx' instructions have as third argument a vector register instead of CR one, different from integer and float-point compares. This leads to a failure in code generation, where 'SelectSETCC' expects a DAG with a CR register and gets vector register instead. This patch changes the behavior by just returning a DAG with the vector compare instruction based on the type. The patch also adds a testcase for all vector types llvm defines. It also included a fix on signed 5-bits predicates printing, where signed values were not handled correctly as signed (char are unsigned by default for PowerPC). This generates 'vspltisw' (vector splat) instruction with SIM out of range. llvm-svn: 165419	2012-10-08 18:59:53 +00:00
Adhemerval Zanella	5c6e08435e	Add floating-point to and from integer conversion This patch add altivec support for v4i32 to v4f32 and for v4f32 to v4i32 vector rounding conversion. llvm-svn: 165409	2012-10-08 17:27:24 +00:00
Rafael Espindola	144e271570	Convert to unix line endings. llvm-svn: 165308	2012-10-05 13:32:38 +00:00
Roman Divacky	ca10389bfe	Specify MachinePointerInfo as refering to the argument value and offset of the store when handling byval arguments. Thus preventing reordering of the store with load with post-RA scheduler. llvm-svn: 164553	2012-09-24 20:47:19 +00:00
Roman Divacky	264f504077	Specify cpu to get the correct instruction ordering. Remove XFAIL. llvm-svn: 164306	2012-09-20 14:59:42 +00:00
Jordan Rose	b64c123453	Really XFAIL test/CodeGen/PowerPC/structsinregs.ll. XFAIL needs a trailing colon. Hopefully this will get the buildbots happy again while Bill works on getting it passing. llvm-svn: 164237	2012-09-19 17:03:11 +00:00
Bill Schmidt	479a4588b9	XFAIL test/CodeGen/PowerPC/structsinregs.ll llvm-svn: 164233	2012-09-19 16:18:23 +00:00
Bill Schmidt	019cc6fe03	Small structs for PPC64 SVR4 must be passed right-justified in registers. lib/Target/PowerPC/PPCISelLowering.{h,cpp} Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4. Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4. Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4. Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4. Rename LowerCall_SVR4 to LowerCall_32SVR4. Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4. test/CodeGen/PowerPC/structsinregs.ll New test. llvm-svn: 164228	2012-09-19 15:42:13 +00:00
Roman Divacky	947148aa45	Add test for r164155 and remove two tests superseded by ppc64-calls.ll. llvm-svn: 164162	2012-09-18 19:51:44 +00:00
Roman Divacky	0be33598ce	Avoid symbol name clash when filling TOC. Patch by Adhemerval Zanella. llvm-svn: 164141	2012-09-18 17:10:37 +00:00
Roman Divacky	d4f6f421a9	On PPC64 emit the environment pointer. Patch by Adhemerval Zanella. llvm-svn: 164139	2012-09-18 16:55:29 +00:00
Roman Divacky	762930637c	Optimize local func calls to not emit nop for TOC restoration. Patch by Adhemerval Zanella. llvm-svn: 164138	2012-09-18 16:47:58 +00:00
Roman Divacky	c9e23d93ae	This patch corrects logic in PPCFrameLowering for save and restore of nonvolatile condition register fields across calls under the SVR4 ABIs. * With the 64-bit ABI, the save location is at a fixed offset of 8 from the stack pointer. The frame pointer cannot be used to access this portion of the stack frame since the distance from the frame pointer may change with alloca calls. * With the 32-bit ABI, the save location is just below the general register save area, and is accessed via the frame pointer like the rest of the save areas. This is an optional slot, so it must only be created if any of CR2, CR3, and CR4 were modified. * For both ABIs, save/restore logic is generated only if one of the nonvolatile CR fields were modified. I also took this opportunity to clean up an extra FIXME in PPCFrameLowering.h. Save area offsets for 32-bit GPRs are meaningless for the 64-bit ABI, so I removed them for correctness and efficiency. Fixes PR13708 and partially also PR13623. It lets us enable exception handling on PPC64. Patch by William J. Schmidt! llvm-svn: 163713	2012-09-12 14:47:47 +00:00
Jakob Stoklund Olesen	866908c42c	Allow overlaps between virtreg and physreg live ranges. The RegisterCoalescer understands overlapping live ranges where one register is defined as a copy of the other. With this change, register allocators using LiveRegMatrix can do the same, at least for copies between physical and virtual registers. When a physreg is defined by a copy from a virtreg, allow those live ranges to overlap: %CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11 %vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill> We can assign %vreg11 to %ECX, overlapping the live range of %CL. llvm-svn: 163336	2012-09-06 18:15:23 +00:00
Jakob Stoklund Olesen	c7579cdded	Move tie checks into MachineVerifier::visitMachineOperand. llvm-svn: 163152	2012-09-04 18:38:28 +00:00
Hal Finkel	1859d26528	Reserve space for the mandatory traceback fields on PPC64. We need to reserve space for the mandatory traceback fields, though leaving them as zero is appropriate for now. Although the ABI calls for these fields to be filled in fully, no compiler on Linux currently does this, and GDB does not read these fields. GDB uses the first word of zeroes during exception handling to find the end of the function and the size field, allowing it to compute the beginning of the function. DWARF information is used for everything else. We need the extra 8 bytes of pad so the size field is found in the right place. As a comparison, GCC fills in a few of the fields -- language, number of saved registers -- but ignores the rest. IBM's proprietary OSes do make use of the full traceback table facility. Patch by Bill Schmidt. llvm-svn: 162854	2012-08-29 20:22:24 +00:00
Roman Divacky	8c4b6a307e	Emit word of zeroes after the last instruction as a start of the mandatory traceback table on PowerPC64. This helps gdb handle exceptions. The other mandatory fields are ignored by gdb and harder to implement so just add there a FIXME. Patch by Bill Schmidt. PR13641. llvm-svn: 162778	2012-08-28 19:06:55 +00:00
Hal Finkel	742b535e40	Add PPC Freescale e500mc and e5500 subtargets. Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to the PowerPC backend. Patch by Tobias von Koch. llvm-svn: 162764	2012-08-28 16:12:39 +00:00
Hal Finkel	686f2ee226	Allow remat of LI on PPC. Allow load-immediates to be rematerialised in the register coalescer for PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail, because it relies on a register move getting emitted. The immediate load is equivalent, so change this test case. Patch by Tobias von Koch. llvm-svn: 162727	2012-08-28 02:10:33 +00:00
Hal Finkel	5ab378037f	Eliminate redundant CR moves on PPC32. The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and unset if it doesn't. The solution up to now was to insert a MachineNode to set/unset the CR bit, which produces a CR vreg. This vreg was then copied into CR bit 6. When the register allocator saw a bunch of these in the same function, it allocated the set/unset CR bit in some random CR register (1 extra instruction) and then emitted CR moves before every vararg function call, rather than just setting and unsetting CR bit 6 directly before every vararg function call. This patch instead inserts a PPCcrset/PPCcrunset instruction which are then matched by a dedicated instruction pattern. Patch by Tobias von Koch. llvm-svn: 162725	2012-08-28 02:10:27 +00:00
Hal Finkel	e39526a789	Optimize zext on PPC64. The zeroextend IR instruction is lowered to an 'and' node with an immediate mask operand, which in turn gets legalised to a sequence of ori's & ands. This can be done more efficiently using the rldicl instruction. Patch by Tobias von Koch. llvm-svn: 162724	2012-08-28 02:10:15 +00:00
Roman Divacky	ace4707ea6	Lower constant pools and jump tables via TOC on PPC64/SVR4. In collaboration with Adhemerval Zanella. llvm-svn: 162562	2012-08-24 16:26:02 +00:00
Nadav Rotem	70409991bc	During the CodeGenPrepare we often lower intrinsics (such as objsize) and allow some optimizations to turn conditional branches into unconditional. This commit adds a simple control-flow optimization which merges two consecutive basic blocks which are connected by a single edge. This allows the codegen to operate on larger basic blocks. rdar://11973998 llvm-svn: 161852	2012-08-14 05:19:07 +00:00
Hal Finkel	33e529d56b	MFTB on PPC64 should really be encoded using MFSPR. The MFTB instruction itself is being phased out, and its functionality is provided by MFSPR. According to the ISA docs, using MFSPR works on all known chips except for the 601 (which did not have a timebase register anyway) and the POWER3. Thanks to Adhemerval Zanella for pointing this out! llvm-svn: 161346	2012-08-06 21:21:44 +00:00
Hal Finkel	70381a7b18	Add readcyclecounter lowering on PPC64. On PPC64, this can be done with a simple TableGen pattern. To enable this, I've added the (otherwise missing) readcyclecounter SDNode definition to TargetSelectionDAG.td. llvm-svn: 161302	2012-08-04 14:10:46 +00:00
Bob Wilson	874886cd66	Refactor and check "onlyReadsMemory" before optimizing builtins. This patch is mostly just refactoring a bunch of copy-and-pasted code, but it also adds a check that the call instructions are readnone or readonly. That check was already present for sin, cos, sqrt, log2, and exp2 calls, but it was missing for the rest of the builtins being handled in this code. llvm-svn: 161282	2012-08-03 23:29:17 +00:00
Chandler Carruth	ff123d5c63	Fix the remaining TCL-style quotes found in the testsuite. This is another mechanical change accomplished though the power of terrible Perl scripts. I have manually switched some "s to 's to make escaping simpler. While I started this to fix tests that aren't run in all configurations, the massive number of tests is due to a really frustrating fragility of our testing infrastructure: things like 'grep -v', 'not grep', and 'expected failures' can mask broken tests all too easily. Essentially, I'm deeply disturbed that I can change the testsuite so radically without causing any change in results for most platforms. =/ llvm-svn: 159547	2012-07-02 19:09:46 +00:00
Chandler Carruth	5da53436d5	Convert the uses of '\|&' to use '2>&1 \|' instead, which works on old versions of Bash. In addition, I can back out the change to the lit built-in shell test runner to support this. This should fix the majority of fallout on Darwin, but I suspect there will be a few straggling issues. llvm-svn: 159544	2012-07-02 18:37:59 +00:00
Chandler Carruth	a5a29f970e	Convert all tests using TCL-style quoting to use shell-style quoting. This was done through the aid of a terrible Perl creation. I will not paste any of the horrors here. Suffice to say, it require multiple staged rounds of replacements, state carried between, and a few nested-construct-parsing hacks that I'm not proud of. It happens, by luck, to be able to deal with all the TCL-quoting patterns in evidence in the LLVM test suite. If anyone is maintaining large out-of-tree test trees, feel free to poke me and I'll send you the steps I used to convert things, as well as answer any painful questions etc. IRC works best for this type of thing I find. Once converted, switch the LLVM lit config to use ShTests the same as Clang. In addition to being able to delete large amounts of Python code from 'lit', this will also simplify the entire test suite and some of lit's architecture. Finally, the test suite runs 33% faster on Linux now. ;] For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s llvm-svn: 159525	2012-07-02 12:47:22 +00:00
Hal Finkel	460e94d842	Add support for the PPC isel instruction. The isel (integer select) instruction is supported on the 440 and A2 embedded cores and on the POWER7. llvm-svn: 159045	2012-06-22 23:10:08 +00:00
Lang Hames	c98ebda325	Rename fp-op fusion option (yet again) for compatibility with GCC option. llvm-svn: 159042	2012-06-22 22:31:00 +00:00
Lang Hames	b8650f106a	Rename -allow-excess-fp-precision flag to -fuse-fp-ops, and switch from a boolean flag to an enum: { Fast, Standard, Strict } (default = Standard). This option controls the creation by optimizations of fused FP ops that store intermediate results in higher precision than IEEE allows (E.g. FMAs). The behavior of this option is intended to match the behaviour specified by a soon-to-be-introduced frontend flag: '-ffuse-fp-ops'. Fast mode - allows formation of fused FP ops whenever they're profitable. Standard mode - allow fusion only for 'blessed' FP ops. At present the only blessed op is the fmuladd intrinsic. In the future more blessed ops may be added. Strict mode - allow fusion only if/when it can be proven that the excess precision won't effect the result. Note: This option only controls formation of fused ops by the optimizers. Fused operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic) will always be honored, regardless of the value of this option. Internally TargetOptions::AllowExcessFPPrecision has been replaced by TargetOptions::AllowFPOpFusion. llvm-svn: 158956	2012-06-22 01:09:09 +00:00
Hal Finkel	a86b0f20dd	Treat TargetGlobalAddress as a constant for the purpose of matching pre-inc stores on PPC. Thanks to Tobias von Koch for pointing out this problem. llvm-svn: 158932	2012-06-21 20:10:48 +00:00
Hal Finkel	ca542beffe	Add support for generating reg+reg (indexed) pre-inc loads on PPC. llvm-svn: 158823	2012-06-20 15:43:03 +00:00
Lang Hames	39fb1d08dc	Add DAG-combines for aggressive FMA formation. This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or FSUB + FMUL. The combines are performed when: (a) Either AllowExcessFPPrecision option (-enable-excess-fp-precision for llc) OR UnsafeFPMath option (-enable-unsafe-fp-math) are set, and (b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of the FADD/FSUB, and (c) The FMUL only has one user (the FADD/FSUB). If your target has fast FMA instructions you can make use of these combines by overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for types supported by your FMA instruction, and adding patterns to match ISD::FMA to your FMA instructions. llvm-svn: 158757	2012-06-19 22:51:23 +00:00
Jakob Stoklund Olesen	77a0cfb19a	Add a triple. The test was failing on Linux because of asm syntax differences. llvm-svn: 158748	2012-06-19 21:46:25 +00:00
Jakob Stoklund Olesen	0f855e4263	Implement PPCInstrInfo::isCoalescableExtInstr(). The PPC::EXTSW instruction preserves the low 32 bits of its input, just like some of the x86 instructions. Use it to reduce register pressure when the low 32 bits have multiple uses. This requires a small change to PeepholeOptimizer since EXTSW takes a 64-bit input register. This is related to PR5997. llvm-svn: 158743	2012-06-19 21:14:34 +00:00
Hal Finkel	1cc27e44a4	Add support for generating reg+reg preinc stores on PPC. PPC will now generate STWUX and friends. llvm-svn: 158698	2012-06-19 02:34:32 +00:00
Hal Finkel	6261c2dc28	Cleanup trip-count finding for PPC CTR loops (and some bug fixes). This cleans up the method used to find trip counts in order to form CTR loops on PPC. This refactoring allows the pass to find loops which have a constant trip count but also happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different classes of loops that are currently ignored. In addition, we now search through all potential induction operations instead of just the first. Also, we check the predicate code on the conditional branch and abort the transformation if the code is not EQ or NE, and we then make sure that the branch to be transformed matches the condition register defined by the comparison (multiple possible comparisons will be considered). llvm-svn: 158607	2012-06-16 20:34:07 +00:00
Hal Finkel	4e9f1a859f	Enable ILP scheduling for all nodes by default on PPC. Over the entire test-suite, this has an insignificantly negative average performance impact, but reduces some of the worst slowdowns from the anti-dep. change (r158294). Largest speedups: SingleSource/Benchmarks/Stanford/Quicksort - 28% SingleSource/Benchmarks/Stanford/Towers - 24% SingleSource/Benchmarks/Shootout-C++/matrix - 23% MultiSource/Benchmarks/SciMark2-C/scimark2 - 19% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15% (matrix and automotive-bitcount were both in the top-5 slowdown list from the anti-dep. change) Largest slowdowns: MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26% MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21% SingleSource/Benchmarks/CoyoteBench/lpbench - 20% MultiSource/Applications/d/make_dparser - 16% llvm-svn: 158296	2012-06-10 19:32:29 +00:00
Hal Finkel	2edfbddcf0	Improve ext/trunc patterns on PPC64. The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that would leave self-moves in the final assembly. Replacing those patterns with ones based on the SUBREG builtins yields better-looking code. Thanks to Jakob and Owen for their suggestions in this matter. llvm-svn: 158283	2012-06-09 22:10:19 +00:00
Hal Finkel	eb50c2d4a4	Enable tail merging on PPC. Tail merging had been disabled on PPC because it would disturb bundling decisions made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions are made during post-RA scheduling, and tail merging is generally beneficial (the average test-suite speedup is insignificantly positive). Largest test-suite speedups: MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23% SingleSource/Benchmarks/Shootout-C++/ary - 21% SingleSource/Benchmarks/Stanford/Queens - 17% Largest slowdowns: MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22% MultiSource/Applications/JM/ldecod/ldecod - 14% MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9% This is improved by using full (instead of just critical) anti-dependency breaking, but doing so still causes miscompiles and so cannot yet be enabled by default. llvm-svn: 158259	2012-06-09 03:14:50 +00:00
Jakob Stoklund Olesen	33a1b416ac	Don't run RAFast in the optimizing regalloc pipeline. The fast register allocator is not supposed to work in the optimizing pipeline. It doesn't make sense to compute live intervals, run full copy coalescing, and then run RAFast. Fast register allocation in the optimizing pipeline is better done by RABasic. llvm-svn: 158242	2012-06-08 23:15:12 +00:00
Hal Finkel	c6b5debb40	Enable PPC CTR loop formation by default. Thanks to Jakob's help, this now causes no new test suite failures! Over the entire test suite, this gives an average 1% speedup. The largest speedups are: SingleSource/Benchmarks/Misc/pi - 108% SingleSource/Benchmarks/CoyoteBench/lpbench - 54% MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50% SingleSource/Benchmarks/Shootout/ary3 - 32% SingleSource/Benchmarks/Shootout-C++/matrix - 30% The largest slowdowns are: MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30% MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22% MultiSource/Applications/d/make_dparser - -14% SingleSource/Benchmarks/Shootout-C++/ary - -13% In light of these slowdowns, additional profiling work is obviously needed! llvm-svn: 158223	2012-06-08 19:19:53 +00:00
Hal Finkel	821e00121c	Disable the PPC CTR-Loops pass by default. The pass itself works well, but the something in the Machine* infrastructure does not understand terminators which define registers. Without the ability to use the block-placement pass, etc. this causes performance regressions (and so is turned off by default). Turning off the analysis turns off the problems with the Machine* infrastructure. llvm-svn: 158206	2012-06-08 15:38:25 +00:00
Hal Finkel	8b01503ee5	Fix a bug in the new PPC CTR-Loops pass. The code which tests for an induction operation cannot assume that any ADDI instruction will have a register operand because the operand could also be a frame index; for example: %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16 llvm-svn: 158205	2012-06-08 15:38:23 +00:00
Hal Finkel	96c2d4d945	Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code. This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are no longer otherwise used. Also, invalid preheader DebugLoc is not used. llvm-svn: 158204	2012-06-08 15:38:21 +00:00
Roman Divacky	e3f15c98d1	Implement local-exec TLS on PowerPC. llvm-svn: 157935	2012-06-04 17:36:38 +00:00
Hal Finkel	595817eebe	Enable generating PPC pre-increment (r+imm) instructions by default. It seems that this no longer causes test suite failures on PPC64 (after r157159), and often gives a performance benefit, so it can be enabled by default. llvm-svn: 157911	2012-06-04 02:21:00 +00:00
Hal Finkel	601f555eee	Add a missing PPC 64-bit stwu pattern. This seems to fix the remaining compile-time failures on PPC64 when compiling with -enable-ppc-preinc. llvm-svn: 157159	2012-05-20 17:11:24 +00:00
Jakob Stoklund Olesen	589c6eb95c	Remove -join-physregs from the test suite. This option has been disabled for a while, and it is going away so I can clean up the coalescer code. The tests that required physreg joining to be enabled were almost all of the form "tiny function with interference between arguments and return value". Such functions are usually inlined in the real world. The problem exposed by phys_subreg_coalesce-3.ll is real, but fairly rare. llvm-svn: 157027	2012-05-17 23:44:19 +00:00
Hal Finkel	e0cf6397fd	Remove dead SD nodes after the combining pass. Fixes PR12201. llvm-svn: 154786	2012-04-16 03:33:22 +00:00
Hal Finkel	322e41a914	Enable prefetch generation on PPC64. llvm-svn: 153851	2012-04-01 20:08:17 +00:00
Hal Finkel	9f9f8929ee	Add instruction itinerary for the PPC64 A2 core. This adds a full itinerary for IBM's PPC64 A2 embedded core. These cores form the basis for the CPUs in the new IBM BG/Q supercomputer. llvm-svn: 153842	2012-04-01 19:22:40 +00:00
Eli Bendersky	f33086052d	Continue cleanup of LIT, getting rid of the remaining artifacts from dejagnu * Removed test/lib/llvm.exp - it is no longer needed * Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files left in the test suite so this code is no longer required. test/lit.cfg is now much shorter and clearer * Removed a lot of duplicate code in lit.local.cfg files that need access to the root configuration, by adding a "root" attribute to the TestingConfig object. This attribute is dynamically computed to provide the same information as was previously provided by the custom getRoot functions. * Documented the config.root attribute in docs/CommandGuide/lit.pod llvm-svn: 153408	2012-03-25 09:02:19 +00:00
Hal Finkel	e44eb28807	Fix small-integer VAARG on SVR4 ABI PPC64. The PPC64 SVR4 ABI requires integer stack arguments, and thus the var. args., that are smaller than 64 bits be zero extended to 64 bits. llvm-svn: 153373	2012-03-24 03:53:55 +00:00
Roman Divacky	ded7f01062	Test the section specification. llvm-svn: 151552	2012-02-27 20:42:19 +00:00
Roman Divacky	8fe40cd659	Reapply r151278 with fixes. MCize function entry label emission on PowerPC64 properly. llvm-svn: 151547	2012-02-27 20:20:47 +00:00
Hal Finkel	6fd2b434bd	Revert r151278, breaks static linking. Reverting this because it breaks static linking on ppc64. Specifically, it may be linkonce_odr functions that are the problem. With this patch, if you link statically, calls to some functions end up calling their descriptor addresses instead of calling to their entry points. This causes the execution to fail with SIGILL (b/c the descriptor address just has some pointers, not code). llvm-svn: 151433	2012-02-25 03:40:11 +00:00
Hal Finkel	a3e6ed2161	X11/X2 loads around indirect calls on ppc64 should not be deleted. llvm-svn: 151374	2012-02-24 17:54:01 +00:00
Hal Finkel	b9a3d61894	Don't crash when a glue node contains an internal CopyToReg This is necessary to support the existing ppc lowering code for indirect calls. Fixes PR12071. llvm-svn: 151373	2012-02-24 17:53:59 +00:00
Roman Divacky	a2d3608f78	MCize function entry label emission on PowerPC64 properly. llvm-svn: 151278	2012-02-23 20:28:39 +00:00
Hal Finkel	ad4d9f5848	Allow the use of an alternate symbol for calculating a function's size. The standard function epilog includes a .size directive, but ppc64 uses an alternate local symbol to tag the actual start of each function. Until recently, binutils accepted the .size directive as: .size test1, .Ltmp0-test1 however, using this directive with recent binutils will result in the error: .size expression for XXX does not evaluate to a constant so we must use the label which actually tags the start of the function. llvm-svn: 151200	2012-02-22 21:11:47 +00:00
Jakob Stoklund Olesen	d0dc3920de	Remove a bad PowerPC test. This test case was way too strict, matching the entire assembly output. Every non-trivial change to the ppc backend or -O0 pipeline required the test to be updated. It should be replaced with a test of the specific vaarg feature. llvm-svn: 151105	2012-02-21 23:49:18 +00:00
Eli Bendersky	924f9a671d	Replace all instances of dg.exp file with lit.local.cfg, since all tests are run with LIT now and now Dejagnu. dg.exp is no longer needed. Patch reviewed by Daniel Dunbar. It will be followed by additional cleanup patches. llvm-svn: 150664	2012-02-16 06:28:33 +00:00
Hal Finkel	8606e3c7e3	AggressiveAntiDepBreaker needs to skip debug values because a debug value does not have a corresponding SUnit llvm-svn: 148260	2012-01-16 22:53:41 +00:00
Hal Finkel	692d1fb355	Cleanup stack/frame register define/kill states. This fixes two bugs: 1. The ST*UX instructions that store and update the stack pointer did not set define/kill on R1. This became a problem when I activated post-RA scheduling (and had incorrectly adjusted the Frames-large test). 2. eliminateFrameIndex did not kill its scavenged temporary register, and this could cause the scavenger to exhaust all available registers (and its emergency spill slot) when there were a lot of CR values to spill. The 2010-02-12-saveCR test has been adjusted to check for this. llvm-svn: 147359	2011-12-30 00:34:00 +00:00
Hal Finkel	750366f014	Add a test case to make sure that the nop really does follow the bl on ppc64 elf llvm-svn: 146666	2011-12-15 17:59:23 +00:00
Chandler Carruth	6b0e34c445	Manually upgrade the test suite to specify the flag to cttz and ctlz. I followed three heuristics for deciding whether to set 'true' or 'false': - Everything target independent got 'true' as that is the expected common output of the GCC builtins. - If the target arch only has one way of implementing this operation, set the flag in the way that exercises the most of codegen. For most architectures this is also the likely path from a GCC builtin, with 'true' being set. It will (eventually) require lowering away that difference, and then lowering to the architecture's operation. - Otherwise, set the flag differently dependending on which target operation should be tested. Let me know if anyone has any issue with this pattern or would like specific tests of another form. This should allow the x86 codegen to just iteratively improve as I teach the backend how to differentiate between the two forms, and everything else should remain exactly the same. llvm-svn: 146370	2011-12-12 11:59:10 +00:00
Hal Finkel	67a7f18faf	Make CR spill and restore use a reserved register. These operations cannot use the register scavenger because the scavenger can only scavenge one register and frame-index elimination may have already grabbed it. llvm-svn: 146318	2011-12-10 04:50:53 +00:00
Eli Friedman	053a724483	Fix a couple of logic bugs in TargetLowering::SimplifyDemandedBits. PR11514. llvm-svn: 146219	2011-12-09 01:16:26 +00:00
Hal Finkel	0fc34bc2d3	delaying restore-cr changed assigned registers in some tests llvm-svn: 145963	2011-12-06 20:55:46 +00:00
Hal Finkel	0702bc1b28	add a test case that uses RESTORE_CR llvm-svn: 145962	2011-12-06 20:55:41 +00:00
Hal Finkel	97a6028b3a	Add test case - this input used to crash because of duplicate generation of SPILL_CRs llvm-svn: 145820	2011-12-05 17:55:22 +00:00
Hal Finkel	8f6834dfa5	enable PPC register scavenging by default (update tests and remove some FIXMEs) llvm-svn: 145819	2011-12-05 17:55:17 +00:00
Hal Finkel	e18c72689c	remove wasted space for extra bit copies of CR2 subregs llvm-svn: 145817	2011-12-05 17:55:06 +00:00
Hal Finkel	d87f7af1f3	specify cpu for test to fix failure on some darwin systems with a g4+ cpu llvm-svn: 145699	2011-12-02 19:38:17 +00:00
Hal Finkel	9286705955	adjust the instruction ordering in some PPC tests: changes due to postRA haz. rec. llvm-svn: 145678	2011-12-02 04:58:12 +00:00
Chris Lattner	6a144a2227	Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic. llvm-svn: 145171	2011-11-27 06:54:59 +00:00
Hal Finkel	6f0ae783fe	add basic PPC register-pressure feedback; adjust the vaarg test to match the new register-allocation pattern llvm-svn: 145065	2011-11-22 16:21:04 +00:00
NAKAMURA Takumi	6e315dd8ba	test/CodeGen/PowerPC/2008-10-17-AsmMatchingOperands.ll: [PR11218] Mark "REQUIRES: asserts" for now. llvm-svn: 143247	2011-10-28 23:11:03 +00:00
Dan Gohman	c32af340fc	Change the default scheduler from Latency to ILP, since Latency is going away. llvm-svn: 142810	2011-10-24 17:45:02 +00:00
Hal Finkel	9dda3e0d13	use FileCheck and not grep in new tests llvm-svn: 142189	2011-10-17 16:01:41 +00:00
Hal Finkel	7ccb391d21	Test case for CanLowerReturn fix (r141981) llvm-svn: 142172	2011-10-17 04:03:59 +00:00
Hal Finkel	ad677b64db	Add PPC 440 scheduler and some associated tests (new files) llvm-svn: 142171	2011-10-17 04:03:55 +00:00
Eli Friedman	6fb0c1e474	Convert more tests over to the new atomic instructions. I did not convert Atomics-32.ll and Atomics-64.ll by hand; the diff is autoupgrade output. The wmb test is gone because there isn't any way to express wmb with the new atomic instructions; if someone really needs a non-asm way to write a wmb on Alpha, a platform-specific intrisic could be added. llvm-svn: 140566	2011-09-26 21:30:17 +00:00
Devang Patel	edc216515e	Remove ancient debug info constructs from test cases, they are not relevant to test case's main objective. llvm-svn: 139675	2011-09-14 00:29:50 +00:00
Duncan Sands	a098436b32	Split the init.trampoline intrinsic, which currently combines GCC's init.trampoline and adjust.trampoline intrinsics, into two intrinsics like in GCC. While having one combined intrinsic is tempting, it is not natural because typically the trampoline initialization needs to be done in one function, and the result of adjust trampoline is needed in a different (nested) function. To get around this llvm-gcc hacks the nested function lowering code to insert an additional parent variable holding the adjust.trampoline result that can be accessed from the child function. Dragonegg doesn't have the luxury of tweaking GCC code, so it stored the result of adjust.trampoline in the memory GCC set aside for the trampoline itself (this is always available in the child function), and set up some new memory (using an alloca) to hold the trampoline. Unfortunately this breaks Go which allocates trampoline memory on the heap and wants to use it even after the parent has exited (!). Rather than doing even more hacks to get Go working, it seemed best to just use two intrinsics like in GCC. Patch mostly by Sanjoy Das. llvm-svn: 139140	2011-09-06 13:37:06 +00:00
Bill Wendling	e6174a2c85	Update more tests to the new EH scheme. llvm-svn: 138894	2011-08-31 21:04:11 +00:00
Roman Divacky	71038e7021	Set CR1EQ only when lowering vararg floating arguments (not any vararg arguments as before), unset CR1EQ otherwise. llvm-svn: 138802	2011-08-30 17:04:16 +00:00
Evan Cheng	76792992d6	Add MCObjectFileInfo and sink the MCSections initialization code from TargetLoweringObjectFileImpl down to MCObjectFileInfo. TargetAsmInfo is done to one last method. It's almost gone! llvm-svn: 135569	2011-07-20 05:58:47 +00:00
Eli Friedman	4d5532a085	FileCheck-ize a couple tests. llvm-svn: 135427	2011-07-18 21:23:42 +00:00
NAKAMURA Takumi	5b304432bb	test/CodeGen/PowerPC/vector.ll: Tweak redirection >%t >%t to >%t >>%t. See also r134814 (test/CodeGen/X86/vector.ll). llvm-svn: 134900	2011-07-11 16:21:52 +00:00
Roman Divacky	4394e68c24	Implement ISD::VAARG lowering on PPC32. llvm-svn: 134005	2011-06-28 15:30:42 +00:00
Roman Divacky	254f82112d	Don't apply on PPC64 the 32bit ADDIC optimizations as there's no overflow with 32bit values. llvm-svn: 133439	2011-06-20 15:28:39 +00:00
Chris Lattner	80ed9dc9e5	rip out a ton of intrinsic modernization logic from AutoUpgrade.cpp, which is for pre-2.9 bitcode files. We keep x86 unaligned loads, movnt, crc32, and the target indep prefetch change. As usual, updating the testsuite is a PITA. llvm-svn: 133337	2011-06-18 06:05:24 +00:00
Roman Divacky	d041962c20	Fix a few places where 32bit instructions/registerset were used on PPC64. llvm-svn: 133260	2011-06-17 15:21:10 +00:00
Chris Lattner	33de427cd6	remove parser support for the obsolete "multiple return values" syntax, which was replaced with return of a "first class aggregate". llvm-svn: 133245	2011-06-17 06:49:41 +00:00
Chris Lattner	def1949c00	Remove support for using "foo" as symbols instead of %"foo". This is ancient syntax and has been long obsolete. As usual, updating the tests is the nasty part of this. llvm-svn: 133242	2011-06-17 06:36:20 +00:00
Chris Lattner	b90ed2233c	manually upgrade a bunch of tests to modern syntax, and remove some that are either unreduced or only test old syntax. llvm-svn: 133228	2011-06-17 03:14:27 +00:00
Roman Divacky	a4a59aebd9	Fix wrong usages of CTR/MCTR where CTR8/MCTR8 was meant. - Check for MTCTR8 in addition to MTCTR when looking up a hazard. - When lowering an indirect call use CTR8 when targeting 64bit. - Introduce BCTR8 that uses CTR8 and use it on 64bit when expanding ISD::BRIND. The last change fixes PR8487. With those changes, we are able to compile a running "ls" and "sh" on FreeBSD/PowerPC64. llvm-svn: 132552	2011-06-03 15:47:49 +00:00
Jakob Stoklund Olesen	e7528c45ea	FileCheckize and break dependence on coalescing order. llvm-svn: 130856	2011-05-04 19:02:01 +00:00
Jakob Stoklund Olesen	067ba3c23c	Explicitly request -join-physregs for some tests that depend on it. llvm-svn: 130855	2011-05-04 19:01:59 +00:00
Rafael Espindola	5164e6e8b2	Add 130690 back. llvm-svn: 130693	2011-05-02 15:58:16 +00:00
Rafael Espindola	a392475865	Revert while I debug the tests that use march but not mtriple. llvm-svn: 130691	2011-05-02 15:42:31 +00:00
Rafael Espindola	c2aad4f2a3	Move ppc OS X to cfi too. I am building it on an old ppc mini, but it will take some time. llvm-svn: 130690	2011-05-02 15:00:52 +00:00
Rafael Espindola	fc8223670a	Add r130623 back now that ELF has been fixed to work with -fno-dwarf2-cfi-asm. llvm-svn: 130658	2011-05-01 15:44:13 +00:00
Rafael Espindola	b7c2286055	Revert the previous patch while I figure out how to make llvm-gcc less agressive about disabling cfi on linux :-( llvm-svn: 130626	2011-04-30 23:03:44 +00:00
Rafael Espindola	5265bc483e	Enable CFI on OS X. Currently the output should be almost identical to the one produced by CodeGen to make the transition easier. The only two differences I know of are: * Some files get an extra advance loc of size 0. This will be fixed when relaxations are enabled. * The optimization of declaring an EH symbol as an external variable is not implemented. This is a subset of adding the nounwind attribute, so we if really this at -O0 we should probably do it at the IL level. llvm-svn: 130623	2011-04-30 22:29:54 +00:00
Jakob Stoklund Olesen	1ec41e2bd9	These tests no longer require linear scan because reserved register coalescing is now universal. llvm-svn: 128936	2011-04-05 21:40:41 +00:00
Jakob Stoklund Olesen	8296e30627	Disable the PowerPC/Atomics-64 test. The code inserted by PPCTargetLowering::EmitInstrWithCustomInserter for ppc64 is wrong, and I don't know how to fix it. It seems to be using the correct register classes for pointers, but it inserts all 32-bit instructions. llvm-svn: 128835	2011-04-04 17:57:26 +00:00
Jakob Stoklund Olesen	218661346a	Fix PowerPC tests to be register allocator independent. llvm-svn: 128827	2011-04-04 17:07:03 +00:00
Benjamin Kramer	1885d21700	Fix mistyped CHECK lines. llvm-svn: 127366	2011-03-09 22:07:31 +00:00
Joerg Sonnenberger	62f759791a	Be nice to Xcore and the XMOS assembler and avoid quoting section names that contain only letters, digits and the characters "_" and ".". llvm-svn: 127028	2011-03-04 20:03:14 +00:00
Joerg Sonnenberger	852ab890b5	Bug#9033: For the ELF assembler output, always quote the section name. llvm-svn: 126963	2011-03-03 22:31:08 +00:00
Anton Korobeynikov	3eb4fedecf	Restore the behavior of frame lowering before my refactoring. It turns out that ppc backend has really weird interdependencies over different hooks and all stuff is fragile wrt small changes. This should fix PR8749 llvm-svn: 122155	2010-12-18 19:53:14 +00:00
Devang Patel	c24048a718	If dbg_declare() or dbg_value() is not lowered by isel then emit DEBUG message instead of creating DBG_VALUE for undefined value in reg0. llvm-svn: 121059	2010-12-06 22:39:26 +00:00
Chris Lattner	85854eb4d9	remove a pointless testcase. llvm-svn: 119119	2010-11-15 05:07:03 +00:00
Chris Lattner	e75bb34963	remove some extraneous quotes to make the new instprinter match. llvm-svn: 119104	2010-11-15 02:43:46 +00:00
Chris Lattner	295097cc32	add some nounwind's. llvm-svn: 119086	2010-11-14 22:22:14 +00:00
John Thompson	beffa5bef1	Inline asm mult-alt constraint tests. llvm-svn: 118107	2010-11-02 23:01:44 +00:00
Jakob Stoklund Olesen	6c4353ecee	PowerPC varargs functions store live-in registers on the stack. Make sure we use virtual registers for those stores since RegAllocFast requires that each live physreg only be used once. This fixes PR8357. llvm-svn: 116222	2010-10-11 20:43:09 +00:00
Chris Lattner	f8f7537a77	force a triple, varargs isn't supported with the SVR4 ABI the buildbot tells me. llvm-svn: 116170	2010-10-10 18:59:01 +00:00
Chris Lattner	d10babfd65	fix the expansion of va_arg instruction on PPC to know the arg alignment for PPC32/64, avoiding some masking operations. llvm-gcc expands vaarg inline instead of using the instruction so it has never hit this. llvm-svn: 116168	2010-10-10 18:34:00 +00:00
Chris Lattner	9f06f911d1	the latest assembler that runs on powerpc 10.4 machines doesn't support aligned comm. Detect when compiling for 10.4 and don't emit an alignment for comm. THis will hopefully fix PR8198. llvm-svn: 114817	2010-09-27 06:44:54 +00:00
Eli Friedman	7595ce05a2	PR7781: Fix incorrect shifting in PPCTargetLowering::LowerBUILD_VECTOR. llvm-svn: 109998	2010-08-02 00:18:19 +00:00
Bill Wendling	bf8370ff36	Consider this function: void foo() { __builtin_unreachable(); } It will output the following on Darwin X86: _func1: Leh_func_begin0: pushq %rbp Ltmp0: movq %rsp, %rbp Ltmp1: Leh_func_end0: This prolog adds a new Call Frame Information (CFI) row to the FDE with an address that is not within the address range of the code it describes -- part is equal to the end of the function -- and therefore results in an invalid EH frame. If we emit a nop in this situation, then the CFI row is now within the address range. llvm-svn: 108568	2010-07-16 22:51:10 +00:00
Bill Wendling	4bda1c8e68	Revert. This isn't the correct way to go. llvm-svn: 108478	2010-07-15 23:42:21 +00:00
Bill Wendling	973dc3b1d8	Handle code gen for the unreachable instruction if it's the only instruction in the function. We'll just turn it into a "trap" instruction instead. The problem with not handling this is that it might generate a prologue without the equivalent epilogue to go with it: $ cat t.ll define void @foo() { entry: unreachable } $ llc -o - t.ll -relocation-model=pic -disable-fp-elim -unwind-tables .section __TEXT,__text,regular,pure_instructions .globl _foo .align 4, 0x90 _foo: ## @foo Leh_func_begin0: ## BB#0: ## %entry pushq %rbp Ltmp0: movq %rsp, %rbp Ltmp1: Leh_func_end0: ... The unwind tables then have bad data in them causing all sorts of problems. Fixes <rdar://problem/8096481>. llvm-svn: 108473	2010-07-15 23:32:40 +00:00
Eric Christopher	2ad0c779c3	Fix up -fstack-protector on linux to use the segment registers. Split out testcases per architecture and os now. Patch from Nelson Elhage. llvm-svn: 107640	2010-07-06 05:18:56 +00:00
Bill Wendling	03bcd6ecc8	Implement the "linker_private_weak" linkage type. This will be used for Objective-C metadata types which should be marked as "weak", but which the linker will remove upon final linkage. However, this linkage isn't specific to Objective-C. For example, the "objc_msgSend_fixup_alloc" symbol is defined like this: .globl l_objc_msgSend_fixup_alloc .weak_definition l_objc_msgSend_fixup_alloc .section __DATA, __objc_msgrefs, coalesced .align 3 l_objc_msgSend_fixup_alloc: .quad _objc_msgSend_fixup .quad L_OBJC_METH_VAR_NAME_1 This is different from the "linker_private" linkage type, because it can't have the metadata defined with ".weak_definition". Currently only supported on Darwin platforms. llvm-svn: 107433	2010-07-01 21:55:59 +00:00
Dan Gohman	463f26b4be	Eliminate the other half of the BRCOND optimization, and update as many tests as possible. llvm-svn: 106749	2010-06-24 15:24:03 +00:00
Dan Gohman	df6b33e778	Eliminate the first have of the optimization which eliminates BRCOND when the condition is constant. This optimization shouldn't be necessary, because codegen shouldn't be able to find dead control paths that the IR-level optimizer can't find. And it's undesirable, because it encourages bugpoint to leave "br i1 false" branches in its output. And it wasn't updating the CFG. I updated all the tests I could, but some tests are too reduced and I wasn't able to meaningfully preserve them. llvm-svn: 106748	2010-06-24 15:04:11 +00:00
Jakob Stoklund Olesen	ec2e964fd6	Remove the local register allocator. Please use the fast allocator instead. llvm-svn: 106051	2010-06-15 21:58:33 +00:00
Evan Cheng	cc2efe11db	Fix some latency computation bugs: if the use is not a machine opcode do not just return zero. llvm-svn: 105061	2010-05-28 23:26:21 +00:00
Jakob Stoklund Olesen	7d22a81b61	Only use clairvoyance when defining a register, and then only if it has one use. This makes allocation independent on the ordering of use-def chains. llvm-svn: 103935	2010-05-17 04:50:57 +00:00
Jakob Stoklund Olesen	0ba2e2a568	Take allocation hints from copy instructions to/from physregs. This causes way more identity copies to be generated, ripe for coalescing. llvm-svn: 103686	2010-05-13 00:19:43 +00:00
Jakob Stoklund Olesen	e6e39dc310	Enable a bunch more -regalloc=fast tests llvm-svn: 103531	2010-05-12 00:11:24 +00:00
Dale Johannesen	81bfca7bde	Implement builtin_return_address(x) and builtin_frame_address(x) on PPC for x!=0. 7624113. llvm-svn: 102972	2010-05-03 22:59:34 +00:00
Duncan Sands	211427bda9	Remove the -enable-sjlj-eh option, which doesn't do anything. Remove the -enable-eh option which is only used by the JIT, and replace it with -jit-enable-eh. llvm-svn: 102865	2010-05-02 15:36:26 +00:00
Chris Lattner	6a5e706e3c	on darwin empty functions need to codegen into something of non-zero length, otherwise labels get incorrectly merged. We handled this by emitting a ".byte 0", but this isn't correct on thumb/arm targets where the text segment needs to be a multiple of 2/4 bytes. Handle this by emitting a noop. This is more gross than it should be because arm/ppc are not fully mc'ized yet. This fixes rdar://7908505 llvm-svn: 102400	2010-04-26 23:37:21 +00:00
Chris Lattner	5100367ff3	Bill's change in r95336 broke empty aggregates embedded in other types. fix this by only bumping zero-byte globals up to a single byte if the entire global is zero size, fixing PR6340. This also fixes empty arrays etc to be handled correctly, and only does this on subsection-via-symbols targets (aka darwin) which is the only place where this matters. llvm-svn: 101879	2010-04-20 06:20:21 +00:00
Dan Gohman	4fee6f3bdd	Start function numbering at 0. llvm-svn: 101638	2010-04-17 16:29:15 +00:00
Chris Lattner	3ae2dd2ba5	add newlines at the end of files. llvm-svn: 100705	2010-04-07 22:53:17 +00:00
Dale Johannesen	f118f9788b	Split big test into multiple directories to cater to those who don't build all targets. llvm-svn: 100688	2010-04-07 20:43:35 +00:00
Evan Cheng	604bc162da	After trivial coalescing, the MI being visited may have become a copy. Avoid adding it to CSE hash table since copies aren't being considered for CSE and they may be deleted. rdar://7819990 llvm-svn: 100170	2010-04-02 02:21:24 +00:00
Chris Lattner	6bba2f3c69	add some nounwinds llvm-svn: 99752	2010-03-28 07:58:37 +00:00
Chris Lattner	108667f3ec	this takes an insane amount of time to run, disable it for now (PR6727) llvm-svn: 99751	2010-03-28 07:58:09 +00:00
Duncan Sands	ca595495e4	Turn calls to copysignl into an FCOPYSIGN node. Handle FCOPYSIGN nodes with ppc_f128 type by having the type legalizer turn these back into a call to copysignl. llvm-svn: 98514	2010-03-14 21:08:40 +00:00
Chris Lattner	9efbbcbe45	fix AsmPrinter::GetBlockAddressSymbol to always return a unique label instead of trying to form one based on the BB name (which causes collisions if the name is empty). This fixes PR6608 llvm-svn: 98495	2010-03-14 17:53:23 +00:00
Chris Lattner	6e52e9db31	get MMI out of the label uniquing business, just go to MCContext to get unique assembler temporary labels. llvm-svn: 98489	2010-03-14 08:36:50 +00:00
Evan Cheng	80ad113731	Enable machine cse pass. llvm-svn: 98132	2010-03-10 03:07:41 +00:00
Dale Johannesen	90eab67320	The address of an indirect call must be in R12 on Darwin. Make it so. (This patch is in LowerCall_Darwin, which seems to be used by SVR4 code as well; since that doesn't belong here, I haven't worried about this case.) llvm-svn: 98077	2010-03-09 20:15:42 +00:00
Chris Lattner	90e7924cf0	add some random nounwinds. llvm-svn: 97411	2010-02-28 20:36:49 +00:00
Jakob Stoklund Olesen	ddbf7a858e	Use the right floating point load/store instructions in PPCInstrInfo::foldMemoryOperandImpl(). The PowerPC floating point registers can represent both f32 and f64 via the two register classes F4RC and F8RC. F8RC is considered a subclass of F4RC to allow cross-class coalescing. This coalescing only affects whether registers are spilled as f32 or f64. Spill slots must be accessed with load/store instructions corresponding to the class of the spilled register. PPCInstrInfo::foldMemoryOperandImpl was looking at the instruction opcode which is wrong. X86 has similar floating point register classes, but doesn't try to fold memory operands, so there is no problem there. llvm-svn: 97262	2010-02-26 21:09:24 +00:00
Chris Lattner	df8a8a8c6f	Change the scheduler from adding nodes in allnodes order to adding them in a determinstic order (bottom up from the root) based on the structure of the graph itself. This updates tests for some random changes, interesting bits: CodeGen/Blackfin/promote-logic.ll no longer crashes. I have no idea why, but that's good right? CodeGen/X86/2009-07-16-LoadFoldingBug.ll also fails, but now compiles to have one fewer constant pool entry, making the expected load that was being folded disappear. Since it is an unreduced mass of gnast, I just removed it. This fixes PR6370 llvm-svn: 97023	2010-02-24 06:11:37 +00:00
Dan Gohman	4506fcb3c2	When emitting an instruction which depends on both a post-incremented induction variable value and a loop-variant value, don't force the insert position to be at the post-increment position, because it may not be dominated by the loop-variant value. This fixes a use-before-def problem noticed on PPC. llvm-svn: 96774	2010-02-22 03:59:54 +00:00
Chris Lattner	745219ea64	add some no-unwinds, other minor cleanups. llvm-svn: 96756	2010-02-21 20:33:20 +00:00
Chris Lattner	c43c88ebce	add a triple so that this doesn't fail due to linux/ppc register printing syntax. llvm-svn: 96748	2010-02-21 19:27:38 +00:00
Chris Lattner	53485469b4	filecheckize and add nouwinds. llvm-svn: 96745	2010-02-21 18:53:28 +00:00
Dale Johannesen	cee887425e	Make g5 target explicit; scheduling affects register choice. llvm-svn: 96413	2010-02-16 23:25:23 +00:00
Dale Johannesen	0062f7bf59	Adjust register numbers in tests to compensate for the new lack of R2. llvm-svn: 96407	2010-02-16 22:31:31 +00:00
Dale Johannesen	26062150fa	When save/restoring CR at prolog/epilog, in a large stack frame, the prolog/epilog code was using the same register for the copy of CR and the address of the save slot. Oops. This is fixed here for Darwin, sort of, by reserving R2 for this case. A better way would be to do the store before the decrement of SP, which is safe on Darwin due to the red zone. SVR4 probably has the same problem, but I don't know how to fix it; there is no red zone and R2 is already used for something else. I'm going to leave it to someone interested in that target. Better still would be to rewrite the CR-saving code completely; spilling each CR subregister individually is horrible code. llvm-svn: 96015	2010-02-12 21:35:34 +00:00
Rafael Espindola	4536b9a904	Fix alignment on ppc linux. This fixes the build of crtend.o llvm-svn: 95477	2010-02-06 03:32:21 +00:00
Bill Wendling	994da1a479	Make test more fucused eliminating extraneous bits. llvm-svn: 95384	2010-02-05 11:21:05 +00:00
Bill Wendling	6510dc8dc3	An empty global constant (one of size 0) may have a section immediately following it. However, the EmitGlobalConstant method wasn't emitting a body for the constant. The assembler doesn't like that. Before, we were generating this: .zerofill __DATA, __common, __cmd, 1, 3 This fix puts us back to that semantic. llvm-svn: 95336	2010-02-05 00:17:02 +00:00
Dale Johannesen	a466692552	Reapply 95050 with a tweak to check the register class. llvm-svn: 95183	2010-02-03 01:40:33 +00:00

... 7 8 9 10 11 ...

1112 Commits