Commit Graph

1133 Commits

Author SHA1 Message Date
Serge Pavlov 49acf9c8eb Use methods to access data stored with frame instructions
Instructions CALLSEQ_START..CALLSEQ_END and their target dependent
counterparts keep data like frame size, stack adjustment etc. These
data are accessed by getOperand using hard coded indices. It is
error prone way. This change implements the access by special methods,
which improve readability and allow changing data representation without
massive changes of index values.

Differential Revision: https://reviews.llvm.org/D31953

llvm-svn: 300196
2017-04-13 14:10:52 +00:00
Easwaran Raman 02a0e91831 Fix the bootstrap failure caused by r299986.
llvm-svn: 300069
2017-04-12 15:26:15 +00:00
Easwaran Raman ddb9ae192a [x86] Relax the check in areLoadsFromSameBasePtr
Check if the scale operand is identical (doesn't have to be 1) and
do not check the chaain operand.

Differential revision: https://reviews.llvm.org/D31833

llvm-svn: 299986
2017-04-11 21:05:02 +00:00
Craig Topper 058f2f6d72 [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.

This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.

I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.

Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.

This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.

Differential Revision: https://reviews.llvm.org/D30968

llvm-svn: 298928
2017-03-28 16:35:29 +00:00
Matthias Braun e9f8209e87 ExecutionDepsFix: Normalize names; NFC
Normalize ExeDepsFix, execution-fix, ExecutionDependencyFix and
ExecutionDepsFix to the last one.

llvm-svn: 298183
2017-03-18 05:05:40 +00:00
Matthias Braun e959544517 TargetInstrInfo: Provide default implementation of isTailCall().
In fact this default implementation should be the only implementation,
keep it virtual for now to accomodate targets that don't model flags
correctly.

Differential Revision: https://reviews.llvm.org/D30747

llvm-svn: 297980
2017-03-16 20:02:30 +00:00
Jessica Paquette c984e21394 [Outliner] Add tail call support
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.

Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.

llvm-svn: 297653
2017-03-13 18:39:33 +00:00
Craig Topper 58647b16e5 [AVX-512] Fix a bad use of a high GR8 register after copying from a mask register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask.
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.

I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.

llvm-svn: 297574
2017-03-12 03:37:37 +00:00
Jessica Paquette 596f483a5e [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.

llvm-svn: 297081
2017-03-06 21:31:18 +00:00
Matthias Braun 81f68ec3a9 Revert "Add MIR-level outlining pass"
Revert Machine Outliner for now, as it breaks the asan bot.

This reverts commit r296418.

llvm-svn: 296426
2017-02-28 02:24:30 +00:00
Matthias Braun d36410945f Add MIR-level outlining pass
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.

This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

Patch by Jessica Paquette<jpaquette@apple.com>!

rdar://29166825

Differential Revision: https://reviews.llvm.org/D26872

llvm-svn: 296418
2017-02-28 00:33:32 +00:00
Ayman Musa 4b2c968c43 [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel.
(Quick fix to buildbot failure after rL295940 commit).

llvm-svn: 295970
2017-02-23 13:15:44 +00:00
Ayman Musa 524dbdaa2b [X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding.
(Quick fix to buildbot fail). 

llvm-svn: 295946
2017-02-23 08:13:36 +00:00
Ayman Musa 6e670cf44f [X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions.
AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors.

Differential Revision: https://reviews.llvm.org/D29988

llvm-svn: 295940
2017-02-23 07:24:21 +00:00
Craig Topper 218d1a020e [AVX-512] Add broadcast VPTERNLOG instructions to special case commuting switch.
The instructions are marked commutable, but without special handling we don't get the immediate correct.

While here also remove the masked memory forms that aren't commutable.

llvm-svn: 295602
2017-02-19 08:03:26 +00:00
Craig Topper 811756b4dc [X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously.
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.

llvm-svn: 295579
2017-02-18 22:53:43 +00:00
Craig Topper de10312bea Recommit "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."
Clang has now been fixed to not use these intrinsics.

llvm-svn: 295571
2017-02-18 21:50:58 +00:00
Craig Topper ba2a726cc6 Revert "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."
This reverts r295564. I missed that clang was still using the intrinsics despite our half implemented autoupgrade support.

llvm-svn: 295565
2017-02-18 20:14:20 +00:00
Craig Topper 884db3f85d [X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR.
It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses.

llvm-svn: 295564
2017-02-18 19:51:25 +00:00
Hans Wennborg 35905d6a67 Re-apply r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)"
The original commit was reverted in r283329 due to a miscompile in
Chromium. That turned out to be the same issue as PR31257, which was
fixed in r295262.

llvm-svn: 295357
2017-02-16 19:04:42 +00:00
Hans Wennborg a468601e0e [X86] Re-enable conditional tail calls and fix PR31257.
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.

Differential Revision: https://reviews.llvm.org/D29856

llvm-svn: 295262
2017-02-16 00:04:05 +00:00
Craig Topper ec5df5f4aa [AVX-512] Add PACKSS/PACKUS instructions to load folding tables.
llvm-svn: 295154
2017-02-15 06:51:39 +00:00
Craig Topper d2d50cba2a [AVX-512] Add PAVGB/PAVGW to load folding tables.
llvm-svn: 295035
2017-02-14 06:54:57 +00:00
Craig Topper cfe8ce3a58 [AVX-512] Add various EVEX move instructions to load folding tables using the VEX equivalents as a guide.
llvm-svn: 294908
2017-02-12 18:47:46 +00:00
Craig Topper 6eca3170a8 [AVX-512] Add VPEXTRD/Q to load folding tables.
llvm-svn: 294905
2017-02-12 18:47:37 +00:00
Craig Topper 255343483d [AVX-512] Add VPMINS/MINU/MAXS/MAXU instructions to load folding tables.
llvm-svn: 294858
2017-02-11 17:35:28 +00:00
Craig Topper b2fa216dd5 [X86] Improve alphabetizing of load folding tables. NFC
llvm-svn: 294857
2017-02-11 17:35:25 +00:00
Simon Pilgrim 6411a0ebed [X86][3DNow!] Enable PFSUB<->PFSUBR commutation
llvm-svn: 294847
2017-02-11 13:51:14 +00:00
Craig Topper 1f6153bab4 [AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables.
llvm-svn: 294830
2017-02-11 07:01:40 +00:00
Craig Topper 3afa777f10 [AVX-512] Add VPSADBW instructions to load folding tables.
llvm-svn: 294827
2017-02-11 06:24:03 +00:00
Craig Topper 464b8cb244 [X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available.
Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available.

This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp.

Overall I think this produces better results in the modified test cases.

llvm-svn: 294824
2017-02-11 05:32:57 +00:00
Peter Collingbourne d7dd65ad7c X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols.
This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.

Note that there were already instructions where the second operand was not a
register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero
for those instructions. This seems like a latent bug, but I was unable to
trigger it.

Differential Revision: https://reviews.llvm.org/D28621

llvm-svn: 294634
2017-02-09 21:58:24 +00:00
Hans Wennborg 819e3e02a9 [X86] Disable conditional tail calls (PR31257)
They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).

Reverting until we figure out the proper solution.

llvm-svn: 294348
2017-02-07 20:37:45 +00:00
Craig Topper 9191c3324a [AVX-512] Add masked and unmasked shift by immediate instructions to load folding tables.
llvm-svn: 294287
2017-02-07 07:31:00 +00:00
Craig Topper 62304d80e3 [AVX-512] Add masked shift instructions to load folding tables.
This adds the masked versions of everything, but the shift by immediate instructions.

llvm-svn: 294286
2017-02-07 07:30:57 +00:00
Craig Topper 45d9ddc687 [AVX-512] Add some of the shift instructions to the load folding tables.
This includes unmasked forms of variable shift and shifting by the lower element of a register.

Still need to do shift by immediate which was not foldable prior to avx512 and all the masked forms.

llvm-svn: 294285
2017-02-07 07:30:54 +00:00
Craig Topper 5d9ecd23e8 [AVX-512] Add VPSLLDQ/VPSRLDQ to load folding tables.
llvm-svn: 294170
2017-02-06 05:12:14 +00:00
Craig Topper f0eb60a6f3 [AVX-512] Add VPABSB/D/Q/W to load folding tables.
llvm-svn: 294169
2017-02-06 03:18:01 +00:00
Craig Topper 864b1a5376 [AVX-512] Add VSHUFPS/PD to load folding tables.
llvm-svn: 294168
2017-02-06 03:17:58 +00:00
Craig Topper 75218fb6b1 [AVX-512] Add VPMULLD/Q/W instructions to load folding tables.
llvm-svn: 294164
2017-02-06 01:19:26 +00:00
Craig Topper 452a7770e6 [AVX-512] Add all masked and unmasked versions of VPMULDQ and VPMULUDQ to load folding tables.
llvm-svn: 294163
2017-02-05 23:31:48 +00:00
Craig Topper 8eb1f315ac [AVX-512] Add scalar masked max/min intrinsic instructions to the load folding tables.
llvm-svn: 294153
2017-02-05 22:25:46 +00:00
Craig Topper cb4bc8be5b [AVX-512] Add scalar masked add/sub/mul/div intrinsic instructions to the load folding tables.
llvm-svn: 294152
2017-02-05 22:25:42 +00:00
Craig Topper 59af67206d [AVX-512] Add masked scalar FMA intrinsics to isNonFoldablePartialRegisterLoad to improve load folding of scalar loads.
llvm-svn: 294151
2017-02-05 22:25:40 +00:00
Evandro Menezes 94edf02923 [CodeGen] Move MacroFusion to the target
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.

In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.

Differential revision: https://reviews.llvm.org/D28489

llvm-svn: 293737
2017-02-01 02:54:34 +00:00
Craig Topper 2cfa2071bd [AVX-512] Don't both looking into the AVX512DQ execution domain fixing tables if AVX512DQ isn't supported since we can't do any conversion anyway.
llvm-svn: 293608
2017-01-31 06:49:55 +00:00
Craig Topper 797e32dd98 [X86] Add AVX and SSE2 version of MOVSDmr to execution domain fixing table. AVX-512 already did this for the EVEX version.
llvm-svn: 293607
2017-01-31 06:49:53 +00:00
Craig Topper 779e4c5bb4 [AVX-512] Fix copy and paste bug in execution domain fixing tables so that we can convert 256-bit movnt instructions.
llvm-svn: 293606
2017-01-31 06:49:50 +00:00
Craig Topper f6df4a6978 [AVX-512] Remove duplicate CodeGenOnly patterns for scalar register broadcast. We can use COPY_TO_REGCLASS like AVX does.
This causes stack spill slots be oversized sometimes, but the same should already be happening with AVX.

llvm-svn: 293464
2017-01-30 06:59:06 +00:00
Craig Topper 0265a39472 [AVX-512] Remove KSET0B/KSET1B in favor of the patterns that select KSET0W/KSET1W for v8i1.
llvm-svn: 293458
2017-01-30 05:37:47 +00:00