llvm-project

Author	SHA1	Message	Date
Craig Topper	00d34ed64f	[AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. Same for ANDN, OR, and XOR. Thanks to Igor Breger for pointing out my mistake. llvm-svn: 277292	2016-07-31 17:15:07 +00:00
Craig Topper	e4f868ea16	[AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable. llvm-svn: 277120	2016-07-29 06:06:04 +00:00
Matthias Braun	941a705b7b	MachineFunction: Return reference for getFrameInfo(); NFC getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017	2016-07-28 18:40:00 +00:00
Craig Topper	ce415ff9c5	[AVX512] Add load folding support for the unmasked forms of the FMA instructions. llvm-svn: 276615	2016-07-25 07:20:35 +00:00
Craig Topper	2dca3b287b	[X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions. This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in their name. This new format should be consistent with the general naming of instructions. llvm-svn: 276559	2016-07-24 08:26:38 +00:00
Craig Topper	05629d05c7	[X86] Replace CodeGenOnly VPSRAVW/D/Q_Int instructions with patterns since the operand types exactly match the normal VPSRAVW/D/Q instructions. llvm-svn: 276555	2016-07-24 07:32:45 +00:00
Craig Topper	8152b9cd96	[X86] Fix typo in comment. llvm-svn: 276528	2016-07-23 16:44:08 +00:00
Craig Topper	b6519db90d	[AVX512] Implement commuting support for EVEX encoded FMA3 instructions. llvm-svn: 276521	2016-07-23 07:16:56 +00:00
Craig Topper	6172b0b3e9	[X86] Make one of the FMA3 commuting methods static. Remove a call to isFMA3 just to get the IsIntrisic flag, instead get it during the first call and pass it along. NFC llvm-svn: 276520	2016-07-23 07:16:53 +00:00
Craig Topper	ca8f5f309c	[X86] Fix switch statement indentation per coding standards. llvm-svn: 276519	2016-07-23 07:16:50 +00:00
Craig Topper	f4151bea72	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. llvm-svn: 276393	2016-07-22 05:00:52 +00:00
Craig Topper	0b90756b0a	[AVX512] Add load folding for some AVX512VL logic and arithmetic instructions. llvm-svn: 276391	2016-07-22 05:00:39 +00:00
Craig Topper	ab13b33ded	[AVX512] Update X86InstrInfo::foldMemoryOperandCustom to handle the EVEX encoded instructions too. llvm-svn: 276390	2016-07-22 05:00:35 +00:00
Matthias Braun	ca8210a952	X86InstrInfo: No need for liveness analysis in classifyLEAReg() classifyLEAReg() deals with switching operands from 32bit to 64bit in order to use a LEA64_32 instruction (for three address code goodness). It currently performs a liveness analysis to determine the kill/undef flag for the newly added operand. This should not be necessary: - If the previous operand had a kill flag, then the 32bit part of the register gets killed, this will kill the super register as well. - If the previous operand had an undef flag then we didn't care what value we read, just use the same flag on the new operand. (No matter what an operand with an undef flag won't affect liveness) This makes the code independent of the presence of kill flags because it avoids a call to MachineBasicBlock::computeRegisterLiveness(). Differential Revision: http://reviews.llvm.org/D22283 llvm-svn: 276222	2016-07-21 00:33:38 +00:00
Craig Topper	a3c55f5915	[AVX512] Add EVEX versions of scalar ADD/SUB/MUL/DIV to load folding tables. llvm-svn: 275775	2016-07-18 06:49:32 +00:00
Craig Topper	16a0744955	[AVX512] Add KADD/KAND/KOR/KXOR to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275771	2016-07-18 06:14:59 +00:00
Craig Topper	463f949a3a	[X86] Add VPMULLW/D/Q instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275770	2016-07-18 06:14:57 +00:00
Craig Topper	1af6cc00dc	[X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275769	2016-07-18 06:14:54 +00:00
Craig Topper	ba9b93d7f2	[X86] Add floating point packed logical ops to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275768	2016-07-18 06:14:50 +00:00
Craig Topper	3a99de4067	[X86] Add AVX512 instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275767	2016-07-18 06:14:47 +00:00
Craig Topper	fe5a6dc581	[X86] Add more AVX512 instructions to X86InstrInfo::isHighLatencyDef. Also add all packed fp division instructions. llvm-svn: 275766	2016-07-18 06:14:45 +00:00
Craig Topper	f7a06c29bc	[X86] Add AVX512 load opcodes and a couple AVX load opcodes to X86InstrInfo::areLoadsFromSameBasePtr. llvm-svn: 275765	2016-07-18 06:14:43 +00:00
Craig Topper	650a15e2b3	[X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related. llvm-svn: 275764	2016-07-18 06:14:39 +00:00
Craig Topper	5c913e84df	[AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported. Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this is a good first step. llvm-svn: 275763	2016-07-18 06:14:34 +00:00
Craig Topper	53f3d1b4d0	[X86] Fix 80-column violations. NFC llvm-svn: 275762	2016-07-18 06:14:26 +00:00
Justin Lebar	0af80cd6f0	[CodeGen] Take a MachineMemOperand::Flags in MachineFunction::getMachineMemOperand. Summary: Previously we took an unsigned. Hooray for type-safety. Reviewers: chandlerc Subscribers: dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D22282 llvm-svn: 275591	2016-07-15 18:26:59 +00:00
Jacques Pienaar	71c30a14b7	Rename AnalyzeBranch* to analyzeBranch*. Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect. Reviewers: tstellarAMD, mcrosier Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai Differential Revision: https://reviews.llvm.org/D22409 llvm-svn: 275564	2016-07-15 14:41:04 +00:00
Dean Michael Berris	52735fc435	XRay: Add entry and exit sleds Summary: In this patch we implement the following parts of XRay: - Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches. - Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts). - X86-specific nop sleds as described in the white paper. - A machine function pass that adds the different instrumentation marker instructions at a very late stage. - A way of identifying which return opcode is considered "normal" for each architecture. There are some caveats here: 1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not available yet. 2) The generated section for X86 is different from what is described in the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library. Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D19904 llvm-svn: 275367	2016-07-14 04:06:33 +00:00
Duncan P. N. Exon Smith	7b4c18e8f3	X86: Avoid implicit iterator conversions, NFC Avoid implicit conversions from MachineInstrBundleIterator to MachineInstr*, mainly by preferring MachineInstr& over MachineInstr* and using range-based for loops. llvm-svn: 275149	2016-07-12 03:18:50 +00:00
Craig Topper	516e14cd8e	[AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors. llvm-svn: 275045	2016-07-11 05:36:48 +00:00
Craig Topper	8674849d6e	[X86] Add the AVX512 SET0 pseudos to foldMemoryOperandImpl since they are marked for CanFoldAsLoad. I don't really know how to test this. llvm-svn: 275044	2016-07-11 05:36:41 +00:00
Duncan P. N. Exon Smith	d26fdc83c9	CodeGen: Use MachineInstr& in LiveVariables API, NFC Change all the methods in LiveVariables that expect non-null MachineInstr* to take MachineInstr& and update the call sites. This clarifies the API, and designs away a class of iterator to pointer implicit conversions. llvm-svn: 274319	2016-07-01 01:51:32 +00:00
Duncan P. N. Exon Smith	9cfc75c214	CodeGen: Use MachineInstr& in TargetInstrInfo, NFC This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189	2016-06-30 00:01:54 +00:00
Rafael Espindola	a99ccfce1a	Drop support for creating $stubs. They are created by ld64 since OS X 10.5. llvm-svn: 274130	2016-06-29 14:59:50 +00:00
Dehao Chen	8cd84aaa6f	Relax the clearance calculating for breaking partial register dependency. Summary: LLVM assumes that large clearance will hide the partial register spill penalty. But in our experiment, 16 clearance is too small. As the inserted XOR is normally fairly cheap, we should have a higher clearance threshold to aggressively insert XORs that are necessary to break partial register dependency. Reviewers: wmi, davidxl, stoklund, zansari, myatsina, RKSimon, DavidKreitzer, mkuper, joerg, spatel Subscribers: davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D21560 llvm-svn: 274068	2016-06-28 21:19:34 +00:00
Rafael Espindola	f9e348bd59	Convert a few more comparisons to isPositionIndependent(). NFC. llvm-svn: 273945	2016-06-27 21:33:08 +00:00
Igor Breger	e59165ca63	[AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering. Differential Revision: http://reviews.llvm.org/D20897 llvm-svn: 273138	2016-06-20 07:05:43 +00:00
Benjamin Kramer	4ca41fd09e	Run clang-tidy's performance-unnecessary-copy-initialization over LLVM. No functionality change intended. llvm-svn: 272516	2016-06-12 17:30:47 +00:00
Benjamin Kramer	bdc4956bac	Pass DebugLoc and SDLoc by const ref. This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512	2016-06-12 15:39:02 +00:00
Craig Topper	7a2993093e	[X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions. llvm-svn: 272249	2016-06-09 07:06:38 +00:00
Rafael Espindola	712f957cae	Simplify handling of hidden stub. Since r207518 they are printed exactly like non-hidden stubs on x86 and since r207517 on ARM. This means we can use a single set for all stubs in those platforms. llvm-svn: 269776	2016-05-17 16:01:32 +00:00
David L Kreitzer	e7c583e06f	Fix for PR27750. Correctly handle the case where the fallthrough block and target block are the same in getFallThroughMBB. Differential Revision: http://reviews.llvm.org/D20288 llvm-svn: 269760	2016-05-17 12:47:46 +00:00
Quentin Colombet	220f7da488	[X86] Properly check that EAX is dead when copying EFLAGS. This fixes a bug introduced in r267623, where we got smarter and avoided saving EAX before using it. However, we failed to check if any of the subregisters of EAX were alive and thus, missed cases where we have to save EAX before using it. The problem may happen on every X86/i386/... platform. This fixes llvm.org/PR27624 llvm-svn: 269115	2016-05-10 20:49:46 +00:00
Jonas Paulsson	8e5b0c65cc	[foldMemoryOperand()] Pass LiveIntervals to enable liveness check. SystemZ (and probably other targets as well) can fold a memory operand by changing the opcode into a new instruction that as a side-effect also clobbers the CC-reg. In order to do this, liveness of that reg must first be checked. When LIS is passed, getRegUnit() can be called on it and the right LiveRange is computed on demand. Reviewed by Matthias Braun. http://reviews.llvm.org/D19861 llvm-svn: 269026	2016-05-10 08:09:37 +00:00
Craig Topper	3e0c038a84	[X86][AVX512] Strengthen the assertions from r269001. We need VLX to use the 128/256-bit move opcodes for extended registers. llvm-svn: 269019	2016-05-10 05:28:04 +00:00
Quentin Colombet	ee5f36bd54	[X86][AVX512] Use the proper load/store for AVX512 registers. When loading or storing AVX512 registers we were not using the AVX512 variant of the load and store for VR128 and VR256 like registers. Thus, we ended up with the wrong encoding and actually were dropping the high bits of the instruction. The result was that we load or store the wrong register. The effect is visible only when we emit the object file directly and disassemble it. Then, the output of the disassembler does not match the assembly input. This is related to llvm.org/PR27481. llvm-svn: 269001	2016-05-10 01:09:14 +00:00
Craig Topper	e5ce84a33c	[AVX512] Add VLX 128/256-bit SET0 operations that encode to 128/256-bit EVEX encoded VPXORD so all 32 registers can be used. llvm-svn: 268884	2016-05-08 21:33:53 +00:00
Matthias Braun	d1aabb2813	livePhysRegs: Pass MBB by reference in addLive{Ins|Outs}(); NFC The block must not be nullptr for the addLiveIns()/addLiveOuts() function. llvm-svn: 268340	2016-05-03 00:24:32 +00:00
Matthias Braun	24f26e6d91	LivePhysRegs: Automatically determine presence of pristine regs. Remove the AddPristinesAndCSRs parameters from addLiveIns()/addLiveOuts(). We need to respect pristine registers after prologue epilogue insertion. Seeing that we got this wrong in at least two commits already, we should rather pay the small price to query MachineFrameInfo for it. There are three cases that did not set AddPristineAndCSRs to true even after register allocation: - ExecutionDepsFix: live-out registers are used as a hint that the register is used soon. This is not true for pristine registers so use the new addLiveOutsNoPristines() to maintain this behaviour. - SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like a bug, should do the right thing automatically now. - StackMapLivenessAnalysis: Not adding pristine registers looks like a bug to me. Added a FIXME comment but maintain the current behaviour as a change may need to get coordinated with GC runtimes. llvm-svn: 268336	2016-05-03 00:08:46 +00:00
David L Kreitzer	0fe4632bd7	Enable the X86 call frame optimization for the 64-bit targets that allow it. Fixes PR27241. Differential Revision: http://reviews.llvm.org/D19688 llvm-svn: 268227	2016-05-02 13:45:25 +00:00
Igor Breger	131008fbcb	Change AVX512 broadcastsd/ss patterns' interaction with spilling. The new implementation takes a scalar register and generates a vector without COPY_TO_REGCLASS (which turns it into a VR128 register). The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth. Differential Revision: http://reviews.llvm.org/D19579 llvm-svn: 268190	2016-05-01 08:40:00 +00:00
Craig Topper	e012ede137	[X86] Reduce memory usage of MemOp2RegOp and RegOp2MemOp folding maps. llvm-svn: 268164	2016-04-30 17:59:49 +00:00
Craig Topper	477649a4c0	[X86] Remove unused operand from a function and all its callers. NFC llvm-svn: 267854	2016-04-28 05:58:46 +00:00
Quentin Colombet	2b3a4e787e	[X86] Teach the expansion of copy instructions how to do proper liveness. When the simple analysis provided by MachineBasicBlock::computeRegisterLiveness fails, fall back on the LivePhysReg utility. llvm-svn: 267623	2016-04-26 23:14:32 +00:00
Andrew Kaylor	2bee5ef462	Optimization bisect support in X86-specific passes Differential Revision: http://reviews.llvm.org/D19439 llvm-svn: 267608	2016-04-26 21:44:24 +00:00
Mehdi Amini	b550cb1750	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595	2016-04-18 09:17:29 +00:00
Aaron Ballman	ef0fe1eed8	Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929	2016-03-30 21:30:00 +00:00
Hans Wennborg	4ae5119eeb	X86: Use push-pop for materializing 8-bit immediates for minsize (take 2) This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375	2016-03-25 01:10:56 +00:00
Simon Pilgrim	a6ba27fbde	[X86][XOP] Fixed instruction postfixes to more closely match operands Suggested by Sanjay in D18189 as the multiple folding options in XOP instructions can be tricky llvm-svn: 264305	2016-03-24 16:31:30 +00:00
Cong Hou	94710840fb	Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case: jne .BB1; jp .BB1; jmp .BB2; .BB1: ...; .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne .BB1; jnp .BB2; .BB1: ...; .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there are no defined negation conditions for them. In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 264199	2016-03-23 21:45:37 +00:00
Chad Rosier	c27a18f39f	[TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC. http://reviews.llvm.org/D17967 llvm-svn: 263021	2016-03-09 16:00:35 +00:00
Craig Topper	cf65c62737	[X86] Use MCPhysReg and uint16_t for static arrays of registers and opcodes respectively; this should reduce size a tiny bit. NFC llvm-svn: 262458	2016-03-02 04:42:31 +00:00
Duncan P. N. Exon Smith	6307eb5518	CodeGen: TII: Take MachineInstr& in predicate API, NFC Change TargetInstrInfo API to take `MachineInstr&` instead of `MachineInstr*` in the functions related to predicated instructions (I'll try to come back later and get some of the rest). All of these functions require non-null parameters already, so references are more clear. As a bonus, this happens to factor away a host of implicit iterator => pointer conversions. No functionality change intended. llvm-svn: 261605	2016-02-23 02:46:52 +00:00
Ahmed Bougacha	f3cccab1e0	[X86] Remove the now-unused X86ISD::PSIGN. NFC. llvm-svn: 261025	2016-02-16 22:14:12 +00:00
Igor Breger	4dc7d390db	AVX512: Change store size of kmask. Store size of v8i1, v4i1, v2i1 and i1 are changed to 16 bits. If KMOVB is not supported (it requires AVX512DQ), only KMOVW can be used, so the store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878	2016-02-15 08:25:28 +00:00
Simon Pilgrim	a207436b01	[X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents. This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines. Differential Revision: http://reviews.llvm.org/D16956 llvm-svn: 260168	2016-02-08 23:03:46 +00:00
Sanjoy Das	881de4d12a	[X86] Fix a bug in getMemOpBaseRegImmOfs Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of `MemOp` is a frame index memory operand. The fix is to have `getMemOpBaseRegImmOfs` bail out in such cases. We can possibly be more clever here, if needed. llvm-svn: 259456	2016-02-02 02:32:43 +00:00
Benjamin Kramer	d477e9e378	Revert "Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed." and "Add a missing test case for r258847." This reverts commit r258847, r258848. Causes miscompilations and backend errors. llvm-svn: 258927	2016-01-27 12:44:12 +00:00
Cong Hou	551a57f797	Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case: jne .BB1; jp .BB1; jmp .BB2; .BB1: ...; .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne .BB1; jnp .BB2; .BB1: ...; .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there are no defined negation conditions for them. In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 258847	2016-01-26 20:08:01 +00:00
Simon Pilgrim	d1d118097d	[X86][AVX] Add commutation support for VPERM2X128 instructions Its main use is to allow memory folding of the 1st operand Differential Revision: http://reviews.llvm.org/D16521 llvm-svn: 258726	2016-01-25 21:51:34 +00:00
Craig Topper	e00bffbc13	[X86] Make MOV32ri64 a post-RA pseudo instead of a CodeGenOnly instruction. It was only needed for rematerialization. llvm-svn: 256818	2016-01-05 07:44:14 +00:00
David Majnemer	869be0a4a6	Revert "[X86] Use push-pop for materializing small constants under 'minsize'" The red zone consists of 128 bytes beyond the stack pointer so that the allocation of objects in leaf functions doesn't require decrementing rsp. In r255656, we introduced an optimization that would cheaply materialize certain constants via push/pop. Push decrements the stack pointer and stores its result at what is now the top of the stack. However, this means that using push/pop would encroach on the red zone. PR26023 gives an example where this corrupts an object in the red zone. llvm-svn: 256808	2016-01-05 02:32:06 +00:00
Matthias Braun	7e762e4f9c	MachineInstrBundle: Fix reversed isSuperRegisterEq() call Unfortunately this fix had the effect of exposing the -verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for which I disabled it for now. Two testcases also have additional pushq/popq where the corrected code cannot prove that %rax is dead any longer. Looking at the examples, this could potentially be fixed by improving computeRegisterLiveness() to check the live-in lists of the successors blocks when reaching the end of a block. This fixes http://llvm.org/PR25951. llvm-svn: 256799	2016-01-05 00:45:35 +00:00
David Majnemer	ca1c9f074f	[X86] Make hasFP constant time We need a frame pointer if there is a push/pop sequence after the prologue in order to unwind the stack. Scanning the instructions to figure out if this happened made hasFP not constant-time which is a violation of expectations. Let's compute this up-front and reuse that computation when we need it. llvm-svn: 256730	2016-01-04 04:49:41 +00:00
Sanjay Patel	4104f78640	use range-based for-loops; NFCI llvm-svn: 256573	2015-12-29 19:14:23 +00:00
Sanjay Patel	cc4c71b4fb	tidy up; NFC llvm-svn: 256506	2015-12-28 18:18:22 +00:00
David Majnemer	334676355a	[X86, Win64] Use a frame pointer if pushf is emitted A frame pointer must be used if stack pointer is modified after the prologue. LLVM will emit pushf/popf if we need to save/restore the FLAGS register, requiring us to have a frame pointer for the function. There is a small twist: this sequence might exist in user code via inline-assembly. For now, conservatively assume that such functions require a frame pointer. For real world justification, please see clang's implementation of __readeflags. This fixes PR25945. llvm-svn: 256456	2015-12-27 06:07:26 +00:00
Craig Topper	91dab7baee	[X86] Replace MVT::SimpleValueType in the AsmParser library and getX86SubSuperRegister with just an unsigned representing size. This is a step towards fixing a layering violation so the X86 AsmParser won't depend on CodeGen types. llvm-svn: 256425	2015-12-25 22:09:45 +00:00
Elena Demikhovsky	9e225a2f52	AVX-512: Kreg set 0/1 optimization The patterns that set a mask register to 0/1 KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn are replaced with KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization. KNL does not recognize dependency-breaking idioms for mask registers, so kxnor %k1, %k1, %k2 has a RAW dependence on %k1. Using %k0 as the undef input register is a performance heuristic based on the assumption that %k0 is used less frequently than the other mask registers, since it is not usable as a write mask. Differential Revision: http://reviews.llvm.org/D15739 llvm-svn: 256365	2015-12-24 08:12:22 +00:00
Craig Topper	ca66fc5473	[X86] Use range-based for loop. NFC llvm-svn: 256127	2015-12-20 18:41:57 +00:00
Hans Wennborg	a6a2e512cf	[X86] Use push-pop for materializing small constants under 'minsize' Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936	2015-12-17 23:18:39 +00:00
Hans Wennborg	7036e503d7	Fix "Not having LAHF/SAHF" assert. It wants to assert that the subtarget is 64-bit, not the register. llvm-svn: 255703	2015-12-15 23:21:46 +00:00
Hans Wennborg	08d5905bac	[X86] Smaller code for materializing 32-bit 1 and -1 constants "movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes. This commit makes LLVM use the latter when optimizing for size. Differential Revision: http://reviews.llvm.org/D14971 llvm-svn: 255656	2015-12-15 17:10:28 +00:00
Matthias Braun	60d69e2865	CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness() computeRegisterLiveness() was broken in that it reported dead for a register even if a subregister was alive. I assume this was because the results of analyzePhysRegs() are hard to understand with respect to subregisters. This commit: Changes the results of analyzePhysRegs (=struct PhysRegInfo) to be clearly understandable, also renames the fields to avoid silent breakage of third-party code (and improve the grammar). Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling it and removing workarounds for the bug. This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033 Differential Revision: http://reviews.llvm.org/D15320 llvm-svn: 255362	2015-12-11 19:42:09 +00:00
Simon Pilgrim	4ba5969224	[X86][ADX] Added memory folding patterns and stack folding tests llvm-svn: 254844	2015-12-05 07:27:50 +00:00
Hans Wennborg	5000ce8a63	X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported These instructions are not supported by all CPUs in 64-bit mode. Emitting them causes Chromium to crash on start-up for users with such chips. (GCC puts these instructions behind -msahf on 64-bit for the same reason.) This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering from before r244503 when the instructions are not available. Differential Revision: http://reviews.llvm.org/D15240 llvm-svn: 254793	2015-12-04 23:00:33 +00:00
JF Bastien	580b6572b5	X86InstrInfo::copyPhysReg: workaround reg liveness Summary: computeRegisterLiveness and analyzePhysReg are currently getting confused about liveness in some cases, breaking copyPhysReg's calculation of whether AX is dead in some cases. Work around this issue temporarily by assuming that AX is always live. See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7 And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201. This workaround makes the code correct but slightly inefficient, but it seems to confuse the machine instr verifier which now thinks EAX was undefined in some cases where it's being conservatively saved / restored. Reviewers: majnemer, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15198 llvm-svn: 254680	2015-12-04 01:18:17 +00:00
Craig Topper	271f9ded44	[X86] Use range-based for loops. NFC llvm-svn: 254387	2015-12-01 06:13:15 +00:00
Craig Topper	ba894c3c0d	[X86] Use array_lengthof instead of calculating manually. Also change index types to size_t to match. llvm-svn: 254386	2015-12-01 06:13:13 +00:00
Craig Topper	27e2912fa8	Revert r254279 "[X86] Use ArrayRef. NFC". It seems to have upset an MSVC build bot. llvm-svn: 254280	2015-11-30 02:28:19 +00:00
Craig Topper	b84f39865f	[X86] Use ArrayRef. NFC llvm-svn: 254279	2015-11-30 02:08:05 +00:00
Vyacheslav Klochkov	ed865dfcc5	X86-FMA3: Improved/enabled the memory folding optimization for scalar loads generated for _mm_load_s{s,d}() intrinsics and used in scalar FMAs generated for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}(). Reviewer: David Kreitzer Differential Revision: http://reviews.llvm.org/D14762 llvm-svn: 254140	2015-11-26 07:45:30 +00:00
Sanjay Patel	a0d354541d	[x86] remove duplicate movq instruction defs (PR25554) We had duplicated definitions for the same hardware '[v]movq' instructions. For example with SSE: def MOVZQI2PQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src), "mov{d|q}\t{$src, $dst|$dst, $src}", // X86-64 only [(set VR128:$dst, (v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))))], IIC_SSE_MOVDQ>; def MOV64toPQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src), "mov{d|q}\t{$src, $dst|$dst, $src}", [(set VR128:$dst, (v2i64 (scalar_to_vector GR64:$src)))], IIC_SSE_MOVDQ>, Sched<[WriteMove]>; As shown in the test case and PR25554: https://llvm.org/bugs/show_bug.cgi?id=25554 This causes us to miss reusing an operand because later passes don't know these 'movq' are the same instruction. This patch deletes one pair of these defs. Sadly, this won't fix the original test case in the bug report. Something else is still broken. Differential Revision: http://reviews.llvm.org/D14941 llvm-svn: 253988	2015-11-24 15:44:35 +00:00
Simon Pilgrim	ae0140d6ec	[X86] Use existing MachineInstrBuilder::addDisp to create offset pointer. NFC. Minor code duplication tidy-up to D13988 llvm-svn: 253606	2015-11-19 21:50:57 +00:00
Elena Demikhovsky	7c2c9fd243	AVX-512: Fixed COPY_TO_REGCLASS for mask registers Copying one mask register to another under BW should be done with the kmovq instruction; otherwise we can lose some bits. Copying 8 bits under DQ may be done with kmovb. Differential Revision: http://reviews.llvm.org/D14812 llvm-svn: 253563	2015-11-19 13:13:00 +00:00
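The mask-register copy rule from this commit can be sketched as an opcode choice. This is a simplified model with several assumptions. The two booleans stand in for the AVX-512 BW and DQ subtarget features, and `maskBits` is the width of the mask being copied. The fallback to kmovw, the 16-bit mask move of base AVX-512F, is an assumption for the case the commit does not spell out. `KMovOpc` and `selectMaskCopy` are hypothetical names.

```cpp
// Hypothetical opcode enum; names follow the instruction mnemonics only.
enum class KMovOpc { KMOVB, KMOVW, KMOVQ };

// Sketch of the rule in the commit message:
//  - with BW, copy with kmovq so no bits of a wide mask are lost;
//  - with DQ (but no BW), an 8-bit mask may be copied with kmovb;
//  - otherwise (assumption) use base AVX-512F's 16-bit kmovw.
constexpr KMovOpc selectMaskCopy(bool hasBWI, bool hasDQI, unsigned maskBits) {
  return hasBWI                      ? KMovOpc::KMOVQ
         : (hasDQI && maskBits <= 8) ? KMovOpc::KMOVB
                                     : KMovOpc::KMOVW;
}
```

Checking BW first reflects the commit's point: once 32- and 64-bit masks exist, the widest move is the only one guaranteed not to drop bits.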
Vyacheslav Klochkov	cbc56baae6	X86-FMA3: Implemented commute transformations for FMA_Int instructions. This made it possible to apply the memory folding optimization to the 2nd operand of FMA_Int instructions. Reviewer: Quentin Colombet Differential Revision: http://reviews.llvm.org/D14550 llvm-svn: 252973	2015-11-13 00:07:35 +00:00
Vyacheslav Klochkov	1ff9cbdfc0	My first/test commit. Removed a trailing whitespace. llvm-svn: 252940	2015-11-12 20:11:57 +00:00
Andrew Kaylor	4731bea3e5	Improved the operands commute transformation for X86-FMA3 instructions. All 3 operands of FMA3 instructions are commutable now. Patch by Slava Klochkov Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab). Differential Revision: http://reviews.llvm.org/D13269 llvm-svn: 252335	2015-11-06 19:47:25 +00:00
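A small model shows why all three FMA3 operands can be commuted. In the 132/213/231 naming at the ISA level, the digits say which operands are multiplied and which one is added:

* 132 computes op1*op3 + op2.
* 213 computes op2*op1 + op3.
* 231 computes op2*op3 + op1.

A form is therefore fixed by which operand is the addend. Swapping any two operands only requires retargeting the form to wherever the addend moved. The sketch below uses plain integers and the fmadd flavour; the same argument applies to the msub/nmadd/nmsub variants, since negation does not change which operand is the addend. `commuteForm` and the other helpers are hypothetical names. The sketch ignores the extra constraints of the scalar intrinsic (_Int) forms, whose first operand also supplies the upper vector elements.

```cpp
// Hypothetical model: a form (132, 213 or 231) is identified by which of
// the three operands (1-based) is the addend.
constexpr int addendOf(int form) {
  return form == 132 ? 2 : form == 213 ? 3 : 1;  // 231 adds op1
}

constexpr int formWithAddend(int addend) {
  return addend == 1 ? 231 : addend == 2 ? 132 : 213;
}

// Evaluate the fmadd flavour of a form on operands op1..op3.
constexpr long evalForm(int form, long o1, long o2, long o3) {
  return form == 132 ? o1 * o3 + o2
       : form == 213 ? o2 * o1 + o3
                     : o2 * o3 + o1;
}

// Form to use after swapping operands i and j (1-based, i != j): the addend
// moves with its value; swapping the two multiplicands keeps the form.
constexpr int commuteForm(int form, int i, int j) {
  const int a = addendOf(form);
  return formWithAddend(a == i ? j : a == j ? i : a);
}

// Exhaustively check every form and operand pair on distinct sample values.
constexpr bool allCommutesPreserveValue() {
  const int forms[3] = {132, 213, 231};
  const int pairs[3][2] = {{1, 2}, {1, 3}, {2, 3}};
  for (int f : forms) {
    for (const auto &p : pairs) {
      long ops[4] = {0, 2, 3, 5};  // ops[1..3] hold op1..op3
      const long before = evalForm(f, ops[1], ops[2], ops[3]);
      const long tmp = ops[p[0]];
      ops[p[0]] = ops[p[1]];
      ops[p[1]] = tmp;
      if (evalForm(commuteForm(f, p[0], p[1]), ops[1], ops[2], ops[3]) != before)
        return false;
    }
  }
  return true;
}
```

For example, commuting op1 and op3 of a 213 form moves the addend from position 3 to position 1, giving a 231 form.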
Simon Pilgrim	f669d381f9	Warning fix. llvm-svn: 252078	2015-11-04 21:27:22 +00:00
Simon Pilgrim	7e6606f4f1	[X86][SSE] Add general memory folding for (V)INSERTPS instruction This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction. The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening. This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done. It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted. Unlike the previous implementation, we now set the insertion source index to zero. Although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll Differential Revision: http://reviews.llvm.org/D13988 llvm-svn: 252074	2015-11-04 20:48:09 +00:00
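The address adjustment in the INSERTPS commit can be sketched in isolation. The INSERTPS imm8 has three fields:

* bits [7:6] select the source element;
* bits [5:4] select the destination element;
* bits [3:0] form a zero mask.

When the source is folded from memory, only one float is loaded. The sketch therefore advances the vector address by 4 bytes per source index and clears bits [7:6]. `FoldedInsertPS` and `foldInsertPSLoad` are hypothetical helpers for illustration, not the foldMemoryOperandCustom code itself.

```cpp
#include <cstdint>

// Hypothetical result of folding a vector load into (V)INSERTPSrm.
struct FoldedInsertPS {
  std::int64_t dispOffset;  // extra displacement added to the vector address
  std::uint8_t imm;         // rewritten immediate with a zero source index
};

constexpr FoldedInsertPS foldInsertPSLoad(std::uint8_t imm) {
  // Bits [7:6] pick which 4-byte float of the source vector is inserted;
  // point the address at that element and zero the field.
  return FoldedInsertPS{4 * ((imm >> 6) & 3),
                        static_cast<std::uint8_t>(imm & 0x3F)};
}
```

For example, an immediate of 0xC5 (source element 3) yields a 12-byte displacement and a rewritten immediate of 0x05, leaving the destination and zero-mask fields untouched.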


1009 Commits