the compact unwind claiming that one register was saved before another, which
isn't all that great in general. Process them in the natural order. Reverse the
list only when necessary for the algorithm.
llvm-svn: 146612
to finalize MI bundles (i.e. add BUNDLE instruction and computing register def
and use lists of the BUNDLE instruction) and a pass to unpack bundles.
- Teach more of MachineBasic and MachineInstr methods to be bundle aware.
- Switch Thumb2 IT block to MI bundles and delete the hazard recognizer hack to
prevent IT blocks from being broken apart.
llvm-svn: 146542
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.
Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.
Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.
llvm-svn: 146466
subdirectories to traverse into.
- Originally I wanted to avoid this and just autoscan, but this has one key
flaw in that new subdirectories can not automatically trigger a rerun of the
llvm-build tool. This is particularly a pain when switching back and forth
between trees where one has added a subdirectory, as the dependencies will
tend to be wrong. This will also eliminates FIXME implicitly.
llvm-svn: 146436
does. The _GLOBAL_OFFSET_TABLE_ is still magical in that we get a R_386_GOTPC,
but it doesn't change the immediate in the same way as when the expression
has no right hand side symbol.
llvm-svn: 146311
if (HasAVX)
X86SSELevel = NoMMXSSE;
This is so patterns that are predicated on hasSSE3, etc. would not be selected when avx is available. Instead, the AVX variant is selected.
However, this breaks instructions which do not have AVX variants.
The right way to fix this is for the SSE but not-AVX patterns to predicate on something like hasSSE3() && !hasAVX().
Then we can take out the hack in X86Subtarget.cpp. Patterns which do not have AVX variants do not need to change.
However, we need to audit all the patterns before we make the change. This patch is workaround that fixes one specific case,
the prefetch instructions. rdar://10538297
llvm-svn: 146163
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
This was actually a bit of a mess. TLI.setPrefLoopAlignment was clearly
documented as taking log2(bytes) units, but the x86 target would still
set a preferred loop alignment of '16'.
CodePlacementOpt passed this number on to the basic block, and
AsmPrinter interpreted it as bytes.
Now both MachineFunction and MachineBasicBlock use logarithmic
alignments.
Obviously, MachineConstantPool still measures alignments in bytes, so we
can emulate the thrill of using as.
llvm-svn: 145889
Whether a fixup needs relaxation for the associated instruction is a
target-specific function, as the FIXME indicated. Create a hook for that
and use it.
llvm-svn: 145881
libgcc sets the stack limit field in TCB to 256 bytes above the actual
allocated stack limit. This means if the function's stack frame needs
less than 256 bytes, we can just compare the stack pointer with the
stack limit. This should result in lesser calls to __morestack.
llvm-svn: 145766
Currently LLVM pads the call to __morestack with a add and sub of 8
bytes to esp. This isn't correct since __morestack expects the call
to be followed directly by a ret.
This commit also adjusts the relevant test-case.
llvm-svn: 145765
the same value) to this variable. This code could be refactored, but it doesn't
matter since the old JIT is going away. Add tsan annotations to ignore the
race.
llvm-svn: 145745
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
llvm-svn: 145714
Like V_SET0, these instructions are expanded by ExpandPostRA to xorps /
vxorps so they can participate in execution domain swizzling.
This also makes the AVX variants redundant.
llvm-svn: 145440
as MC is the only assembler we support.
This splits MS/Windows and GNU/Windows ASM infos into two seperate classes.
While there is currently only one difference, full MS C++ ABI support will
require many more.
llvm-svn: 145409
Before:
movabsq $4294967296, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
testq %rax, %rdi ## encoding: [0x48,0x85,0xf8]
jne LBB0_2 ## encoding: [0x75,A]
After:
btq $32, %rdi ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
jb LBB0_2 ## encoding: [0x72,A]
btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.
llvm-svn: 145103
This was a bug in keeping track of the available domains when merging
domain values.
The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.
Also add an assertion to catch future attempts at emitting AVX2
instructions.
llvm-svn: 145096
and code model. This eliminates the need to pass OptLevel flag all over the
place and makes it possible for any codegen pass to use this information.
llvm-svn: 144788
Two new TargetInstrInfo hooks lets the target tell ExecutionDepsFix
about instructions with partial register updates causing false unwanted
dependencies.
The ExecutionDepsFix pass will break the false dependencies if the
updated register was written in the previoius N instructions.
The small loop added to sse-domains.ll runs twice as fast with
dependency-breaking instructions inserted.
llvm-svn: 144602
handle defining the "magic" target related components (like native,
nativecodegen, and engine).
- We still require these components to be in the project (currently in
lib/Target) so that we have a place to document them and hopefully make it
more obvious that they are "magic".
llvm-svn: 144253
When this field is true it means that the load is from constant (runt-time or compile-time) and so can be hoisted from loops or moved around other memory accesses
llvm-svn: 144100
The xorps instruction is smaller than pxor, so prefer that encoding.
The ExecutionDepsFix pass will switch the encoding to pxor and xorpd
when appropriate.
llvm-svn: 143996
On x86: (shl V, 1) -> add V,V
Hardware support for vector-shift is sparse and in many cases we scalarize the
result. Additionally, on sandybridge padd is faster than shl.
llvm-svn: 143311
fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.
llvm-svn: 143206
it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
Delete #if 0 code accidentally left in.
llvm-svn: 143188
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.
Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.
Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.
Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.
This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.
llvm-svn: 143177
not depend on In32BitMode. Use the sysexitq mnemonic for the version with the
REX.W prefix and only allow it only In64BitMode. rdar://9738584
llvm-svn: 143112
MORESTACK_RET_RESTORE_R10; which are lowered to a RET and a RET
followed by a MOV respectively. Having a fake instruction prevents
the verifier from seeing a MachineBasicBlock end with a
non-terminator (MOV). It also prevents the rather eccentric case of a
MachineBasicBlock ending with RET but having successors nevertheless.
Patch by Sanjoy Das.
llvm-svn: 143062
the X86 asmparser to produce ranges in the one case that was annoying me, for example:
test.s:10:15: error: invalid operand for instruction
movl 0(%rax), 0(%edx)
^~~~~~~
It should be straight-forward to enhance filecheck, tblgen, and/or the .ll parser to use
ranges where appropriate if someone is interested.
llvm-svn: 142106
TableGen infers unmodeled side effects on instructions without a
pattern. Fix some instruction definitions where that was overlooked.
Also raise an error if a rematerializable instruction has unmodeled side
effects. That doen't make any sense.
llvm-svn: 141929
TableGen will mark any pattern-less instruction as having unmodeled side
effects. This is extra bad for V_SET0 which gets rematerialized a lot.
This was part of the cause for PR11125, but the real bug was fixed
in r141923.
llvm-svn: 141924
http://lab.llvm.org:8011/builders/llvm-x86_64-linux/builds/101
--- Reverse-merging r141854 into '.':
U test/MC/Disassembler/X86/x86-32.txt
U test/MC/Disassembler/X86/simple-tests.txt
D test/CodeGen/X86/bmi.ll
U lib/Target/X86/X86InstrInfo.td
U lib/Target/X86/X86ISelLowering.cpp
U lib/Target/X86/X86.td
U lib/Target/X86/X86Subtarget.h
llvm-svn: 141857
promoting allocas to preferred alignments that exceed the natural
alignment. This avoids some potentially expensive dynamic stack realignments.
The natural stack alignment is set in target data strings via the "S<size>"
option. Size is in bits and must be a multiple of 8. The natural stack alignment
defaults to "unspecified" (represented by a zero value), and the "unspecified"
value does not prevent any alignment promotions. Target maintainers that care
about avoiding promotions should explicitly add the "S<size>" option to their
target data strings.
llvm-svn: 141599
A GR8_NOREX virtual register is created when extrating a sub_8bit_hi
sub-register:
%vreg2<def> = COPY %vreg1:sub_8bit_hi; GR8_NOREX:%vreg2 %GR64_ABCD:%vreg1
TEST8ri_NOREX %vreg2, 1, %EFLAGS<imp-def>; GR8_NOREX:%vreg2
If such a live range is ever split, its register class must not be
inflated to GR8. The sub-register copy can only target GR8_NOREX.
I dont have a test case for this theoretical bug.
llvm-svn: 141500
In 64-bit mode, sub_8bit_hi sub-registers can only be used by NOREX
instructions. The COPY created from the EXTRACT_SUBREG DAG node cannot
target all GR8 registers, only those in GR8_NOREX.
TO enforce this, we ensure that all instructions using the
EXTRACT_SUBREG are GR8_NOREX constrained.
This fixes PR11088.
llvm-svn: 141499
This instruction is explicitly encoded without an REX prefix, so both
operands but be *_NOREX.
Also add an assertion to copyPhysReg() that fires when the MOV8rr_NOREX
constraints are not satisfied.
This fixes a miscompilation in 20040709-2 in the gcc test suite.
llvm-svn: 141410
There are fewer registers with sub_8bit sub-registers in 32-bit mode
than in 64-bit mode. In 32-bit mode, sub_8bit behaves the same as
sub_8bit_hi.
llvm-svn: 141206
This uses less memory and it reduces the complexity of sub-class
operations:
- hasSubClassEq() and friends become O(1) instead of O(N).
- getCommonSubClass() becomes O(N) instead of O(N^2).
In the future, TableGen will infer register classes. This makes it
cheap to add them.
llvm-svn: 140898
This also enables domain swizzling for AVX code which required a few
trivial test changes.
The pass will be moved to lib/CodeGen shortly.
llvm-svn: 140659
I am going to unify the SSEDomainFix and NEONMoveFix passes into a
single target independent pass. They are essentially doing the same
thing.
llvm-svn: 140652
We already support GR64 <-> VR128 copies. All of these copies break
partial register dependencies by zeroing the high part of the target
register.
llvm-svn: 140348
- x87: no min or max.
- SSE1: min/max for single precision scalars and vectors.
- SSE2: min/max for single and double precision scalars and vectors.
- AVX: as SSE2, but also supports the wider ymm vectors. (this is covered by the isTypeLegal check)
llvm-svn: 140296
dag-combine optimization to implement the ext-load efficiently (using shuffles).
For example the type <4 x i8> is stored in memory as i32, but it needs to
find its way into a <4 x i32> register. Previously we scalarized the memory
access, now we use shuffles.
llvm-svn: 139995
maxps and maxpd). This broke the sse41-blend.ll testcase by causing
maxpd to be produced rather than a cmp+blend pair, which is the reason
I tweaked it. Gives a small speedup on doduc with dragonegg when the
GCC vectorizer is used.
llvm-svn: 139986
are declared with load patterns. This fix the crash in PR10941. No testcases,
since a fold is triggered and then converted back to the register form
afterwards.
llvm-svn: 139953
This PR basically reports a problem where a crash in generated code
happened due to %rbp being clobbered:
pushq %rbp
movq %rsp, %rbp
....
vmovmskps %ymm12, %ebp
....
movq %rbp, %rsp
popq %rbp
ret
Since Eric's r123367 commit, the default stack alignment for x86 32-bit
has changed to be 16-bytes. Since then, the MaxStackAlignmentHeuristicPass
hasn't been really used, but with AVX it becomes useful again, since per
ABI compliance we don't always align the stack to 256-bit, but only when
there are 256-bit incoming arguments.
ReserveFP was only used by this pass, but there's no RA target hook that
uses getReserveFP() to check for the presence of FP (since nothing was
triggering the pass to run, the uses of getReserveFP() were removed
through time without being noticed). Change this pass to use
setForceFramePointer, which is properly called by MachineFunction
hasFP method.
The testcase is very big and dependent on RA, not sure if it's worth
adding to test/CodeGen/X86.
llvm-svn: 139939
take into consideration the presence of AVX. This change, together with
the SSEDomainFix enabled for AVX, makes AVX codegen to always (hopefully)
emit the same code as SSE for 128-bit vector ops. I don't
have a testcase for this, but AVX now beats SSE in performance for
128-bit ops in the majority of programas in the llvm testsuite
llvm-svn: 139817
alignment check for 256-bit classes more strict. There're no testcases
but we catch more folding cases for AVX while running single and multi
sources in the llvm testsuite.
Since some 128-bit AVX instructions have different number of operands
than their SSE counterparts, they are placed in different tables.
256-bit AVX instructions should also be added in the table soon. And
there a few more 128-bit versions to handled, which should come in
the following commits.
llvm-svn: 139687
more strict about the alignment checking. This was found by inspection
and I don't have any testcases so far, although the llvm testsuite runs
without any problem.
llvm-svn: 139625
However with this fix it does now.
Basically the operand order for the x86 target specific node
is not the same as the instruction, but since the intrinsic need that
specific order at the instruction definition, just change the order
during legalization. Also, there were some wrong invertions of condition
codes, such as GE => LE, GT => LT, fix that too. Fix PR10907.
llvm-svn: 139528
assert("not implemented for target shuffle node");
to:
assert(0 && "not implemented for target shuffle node");
This causes a test failure in CodeGen/X86/palignr.ll which has
been marked as XFAIL for the time being.
Test failure filed at PR10901.
llvm-svn: 139454
single field (Flags), which is a bitwise OR of items from the TB_*
enum. This makes it easier to add new information in the future.
* Gives every static array an equivalent layout: { RegOp, MemOp, Flags }
* Adds a helper function, AddTableEntry, to avoid duplication of the
insertion code.
* Renames TB_NOT_REVERSABLE to TB_NO_REVERSE.
* Adds TB_NO_FORWARD, which is analogous to TB_NO_REVERSE, except that
it prevents addition of the Reg->Mem entry. (This is going to be used
by Native Client, in the next CL).
Patch by David Meyer
llvm-svn: 139311
in Nadav's r139285 and r139287 commits.
1) Rename vsel.ll to a more descriptive name
2) Change the order of BLEND operands to "Op1, Op2, Cond", this is
necessary because PBLENDVB is already used in different places with
this order, and it was being emitted in the wrong way for vselect
3) Add AVX patterns and tests for the same SSE41 instructions
llvm-svn: 139305
with a vector condition); such selects become VSELECT codegen nodes.
This patch also removes VSETCC codegen nodes, unifying them with SETCC
nodes (codegen was actually often using SETCC for vector SETCC already).
This ensures that various DAG combiner optimizations kick in for vector
comparisons. Passes dragonegg bootstrap with no testsuite regressions
(nightly testsuite as well as "make check-all"). Patch mostly by
Nadav Rotem.
llvm-svn: 139159
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
llvm-svn: 139140
instructions are more aligned than the CPU requires, and adds some additional
directives, to follow in future patches. Patch by David Meyer!
llvm-svn: 139125
The explanation about a 0 argument being materialized as xor is no
longer valid. Rematerialization will check if EFLAGS is live before
clobbering it.
The code produced by X86TargetLowering::EmitLoweredSelect does not
clobber EFLAGS.
This causes one less testb instruction to be generated in the cmov.ll
test case.
llvm-svn: 139057
- Duplicate some store patterns to their AVX forms!
- Catched a bug while restricting the patterns subtarget, fix it
and update a testcase to check it properly
llvm-svn: 138851
code is inserted to first check if the current stacklet has enough
space. If so, space is allocated by simply decrementing the stack
pointer. Otherwise a runtime routine (__morestack_allocate_stack_space
in libgcc) is called which allocates the required memory from the
heap.
Patch by Sanjoy Das.
llvm-svn: 138818
from DYNAMIC_STACKALLOC.
Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which
will match X86SegAlloca (based on word size) are also added. They
will be custom emitted to inject the actual stack handling code.
Patch by Sanjoy Das.
llvm-svn: 138814
X86. Modify the pass added in the previous patch to call this new
code.
This new prologues generated will call a libgcc routine (__morestack)
to allocate more stack space from the heap when required
Patch by Sanjoy Das.
llvm-svn: 138812
explicit about which subtarget they refer to, and add AVX versions of
the ones we currently don't. Make the mask check more strict, to be
clear it won't be used to match to 256-bit versions!
llvm-svn: 138514
SSE transition penalty. The pass is enabled through the "x86-use-vzeroupper"
llc command line option. This is only the first step (very naive and
conservative one) to sketch out the idea, but proper DFA is coming next
to allow smarter decisions. Comments and ideas now and in further commits
will be very appreciated.
llvm-svn: 138317
instead of 2. They were already defined this way in their regular
version, but not for the intrinsics versions (*_Int), and that would work
for assembly emission but not for object code, since a MachineOperand
would be missing. This commit fix PR10697.
Also removed the {VSQRT,VRSQRT,VRCP}r_Int forms and match the intrinsic
via INSERT_SUBREG+EXTRACT_SUBREG patterns. The same couldn't be done for
memory versions because sse_load_f32/sse_load_f64 operand need special
handling and don't work like regular "addr" operands.
There are right now 114 "*_Int" and 98 "Int_*" forms! I'm slowly
removing them as I step through, but hope we can get rid of these
someday, they are really annoying :)
llvm-svn: 138012
match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.
llvm-svn: 137810
vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes a
better room for the upcoming vbroadcast!
llvm-svn: 137807
there is no support for native 256-bit shuffles, be more smart in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:
For this shuffle:
shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
<i32 1, i32 0, i32 7, i32 6>
This was expanded to:
vextractf128 $1, %ymm1, %xmm2
vpextrq $0, %xmm2, %rax
vmovd %rax, %xmm1
vpextrq $1, %xmm2, %rax
vmovd %rax, %xmm2
vpunpcklqdq %xmm1, %xmm2, %xmm1
vpextrq $0, %xmm0, %rax
vmovd %rax, %xmm2
vpextrq $1, %xmm0, %rax
vmovd %rax, %xmm0
vpunpcklqdq %xmm2, %xmm0, %xmm0
vinsertf128 $1, %xmm1, %ymm0, %ymm0
ret
Now we get:
vshufpd $1, %xmm0, %xmm0, %xmm0
vextractf128 $1, %ymm1, %xmm1
vshufpd $1, %xmm1, %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
llvm-svn: 137733
Allow a target assembly parser to do context sensitive constraint checking
on a potential instruction match. This will be used, for example, to handle
Thumb2 IT block parsing.
llvm-svn: 137675
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
llvm-svn: 137519
(for example, after integer operation), do not pack the registers into a YMM
before saving. Its better to save as two XMM registers.
Before:
vinsertf128 $1, %xmm3, %ymm0, %ymm3
vinsertf128 $0, %xmm1, %ymm3, %ymm1
vmovaps %ymm1, 416(%rsp)
After:
vmovaps %xmm3, 416+16(%rsp)
vmovaps %xmm1, 416(%rsp)
llvm-svn: 137308
data in-register prior to saving to memory. When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.
llvm-svn: 137238
def : Pat<(X86Movss VR128:$src1,
(bc_v4i32 (v2i64 (load addr:$src2)))),
(MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase is added and illustrates the bug and also modified the one that was already present. Patch by Tanya Lattner.
llvm-svn: 137227
X86FloatingPoint keeps track of pending ST registers for an upcoming
inline asm instruction with fixed stack register constraints. It does
this by remembering which FP register holds the value that should appear
at a fixed stack position for the inline asm.
When that FP register is killed before the inline asm, make sure to
duplicate it to a scratch register, so the ST register still has a live
FP reference.
This could happen when the same FP register was copied to two ST
registers, or when a spill instruction is inserted between the ST copy
and the inline asm.
This fixes PR10602.
llvm-svn: 137050
avoid returning early for v8i32 types, which would only be valid for
vector with all zeros. Also split the handling of zeros and ones into separate
checking logic since they are handled differently. This fixes PR10547
llvm-svn: 136642
working on x86 (at least for trivial testcases); other architectures will
need more work so that they actually emit the appropriate instructions for
orderings stricter than 'monotonic'. (As far as I can tell, the ARM, PPC,
Mips, and Alpha backends need such changes.)
llvm-svn: 136457
specified in the same file that the library itself is created. This is
more idiomatic for CMake builds, and also allows us to correctly specify
dependencies that are missed due to bugs in the GenLibDeps perl script,
or change from compiler to compiler. On Linux, this returns CMake to
a place where it can relably rebuild several targets of LLVM.
I have tried not to change the dependencies from the ones in the current
auto-generated file. The only places I've really diverged are in places
where I was seeing link failures, and added a dependency. The goal of
this patch is not to start changing the dependencies, merely to move
them into the correct location, and an explicit form that we can control
and change when necessary.
This also removes a serialization point in the build because we don't
have to scan all the libraries before we begin building various tools.
We no longer have a step of the build that regenerates a file inside the
source tree. A few other associated cleanups fall out of this.
This isn't really finished yet though. After talking to dgregor he urged
switching to a single CMake macro to construct libraries with both
sources and dependencies in the arguments. Migrating from the two macros
to that style will be a follow-up patch.
Also, llvm-config is still generated with GenLibDeps.pl, which means it
still has slightly buggy dependencies. The internal CMake
'llvm-config-like' macro uses the correct explicitly specified
dependencies however. A future patch will switch llvm-config generation
(when using CMake) to be based on these deps as well.
This may well break Windows. I'm getting a machine set up now to dig
into any failures there. If anyone can chime in with problems they see
or ideas of how to solve them for Windows, much appreciated.
llvm-svn: 136433
LLVM*AsmPrinter.
GenLibDeps.pl fails to detect vtable references. As this is the only
referenced symbol from LLVM*Desc to LLVM*AsmPrinter on optimized
builds, the algorithm that creates the list of libraries to be linked
into tools doesn't know about the dependency and sometimes places the
libraries on the wrong order, yielding error messages like this:
../../lib/libLLVMARMDesc.a(ARMMCTargetDesc.cpp.o): In function
`llvm::ARMInstPrinter::ARMInstPrinter(llvm::MCAsmInfo const&)':
ARMMCTargetDesc.cpp:(.text._ZN4llvm14ARMInstPrinterC1ERKNS_9MCAsmInfoE
[llvm::ARMInstPrinter::ARMInstPrinter(llvm::MCAsmInfo
const&)]+0x2a): undefined reference to `vtable for
llvm::ARMInstPrinter'
llvm-svn: 136328
This can happen in cases where TableGen generated asm matcher cannot check
whether a register operand is in the right register class. e.g. mem operands.
rdar://8204588
llvm-svn: 136292
llvm-mc gives an "invalid operand" error for instructions that take an unsigned
immediate which have the high bit set such as:
pblendw $0xc5, %xmm2, %xmm1
llvm-mc treats all x86 immediates as signed values and range checks them.
A small number of x86 instructions use the imm8 field as a set of bits.
This change only changes those instructions and where the high bit is not
ignored. The others remain unchanged.
llvm-svn: 136287
usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
used by the low part while in the later both lanes have independent
masks. Handle this properly and and add support for vpermilpd.
llvm-svn: 136200
On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.
This code is generated for __builtin_clz and friends.
llvm-svn: 136167
The first problem to fix is to stop creating synthetic *Table_gen
targets next to all of the LLVM libraries. These had no real effect as
CMake specifies that add_custom_command(OUTPUT ...) directives (what the
'tablegen(...)' stuff expands to) are implicitly added as dependencies
to all the rules in that CMakeLists.txt.
These synthetic rules started to cause problems as we started more and
more heavily using tablegen files from *subdirectories* of the one where
they were generated. Within those directories, the set of tablegen
outputs was still available and so these synthetic rules added them as
dependencies of those subdirectories. However, they were no longer
properly associated with the custom command to generate them. Most of
the time this "just worked" because something would get to the parent
directory first, and run tablegen there. Once run, the files existed and
the build proceeded happily. However, as more and more subdirectories
have started using this, the probability of this failing to happen has
increased. Recently with the MC refactorings, it became quite common for
me when touching a large enough number of targets.
To add insult to injury, several of the backends *tried* to fix this by
adding explicit dependencies back to the parent directory's tablegen
rules, but those dependencies didn't work as expected -- they weren't
forming a linear chain, they were adding another thread in the race.
This patch removes these synthetic rules completely, and adds a much
simpler function to declare explicitly that a collection of tablegen'ed
files are referenced by other libraries. From that, we can add explicit
dependencies from the smaller libraries (such as every architectures
Desc library) on this and correctly form a linear sequence. All of the
backends are updated to use it, sometimes replacing the existing attempt
at adding a dependency, sometimes adding a previously missing dependency
edge.
Please let me know if this causes any problems, but it fixes a rather
persistent and problematic source of build flakiness on our end.
llvm-svn: 136023
shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoid emiting on more instruction.
llvm-svn: 136002
unwind encoding for that function. This simply crawls through the prolog looking
for machine instrs marked as "frame setup". It can calculate from these what the
compact unwind should look like.
This is currently disabled because of needed linker support. But initial tests
look good.
llvm-svn: 135922
the way to go. Doing this here will prevent several node matches later,
and would have to force looking all the way through several
VINSERTF128/VEXTRACTF128 chains to optimize simple things.
llvm-svn: 135730
and was actually very wrong, fix it and make it simpler. Also remove the
ConcatVectors function, which is unused now.
- Fix a introduction of useless nodes in r126664 and r126264. The
VUNPCKL* should never be introduced cause we don't want duplicate
nodes for 128 AVX and non-AVX modes, the actual instruction
difference only exists during isel, but not for target specific DAG
nodes. We only introduce V* target nodes when there is no 128-bit
version already there.
- Fix a fragile test and make it more useful.
llvm-svn: 135729
instruction introduced in AVX, which can operate on 128 and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32 or 64 elements inside a lane, and restricts the second lane to
have the same permutation of the first one. With the improved splat support
introduced early today, adding codegen for this instruction enable more
efficient 256-bit code:
Instead of:
vextractf128 $0, %ymm0, %xmm0
punpcklbw %xmm0, %xmm0
punpckhbw %xmm0, %xmm0
vinsertf128 $0, %xmm0, %ymm0, %ymm1
vinsertf128 $1, %xmm0, %ymm1, %ymm0
vextractf128 $1, %ymm0, %xmm1
shufps $1, %xmm1, %xmm1
movss %xmm1, 28(%rsp)
movss %xmm1, 24(%rsp)
movss %xmm1, 20(%rsp)
movss %xmm1, 16(%rsp)
vextractf128 $0, %ymm0, %xmm0
shufps $1, %xmm0, %xmm0
movss %xmm0, 12(%rsp)
movss %xmm0, 8(%rsp)
movss %xmm0, 4(%rsp)
movss %xmm0, (%rsp)
vmovaps (%rsp), %ymm0
We get:
vextractf128 $0, %ymm0, %xmm0
punpcklbw %xmm0, %xmm0
punpckhbw %xmm0, %xmm0
vinsertf128 $0, %xmm0, %ymm0, %ymm1
vinsertf128 $1, %xmm0, %ymm1, %ymm0
vpermilps $85, %ymm0, %ymm0
llvm-svn: 135662
refactor the code and add a bunch of comments. The final shuffle
emitted by handling 256-bit types is suitable for the VPERM shuffle
instruction which is going to be introduced in a next commit (with
a testcase which cover this commit)
llvm-svn: 135661
- Introduce JITDefault code model. This tells targets to set different default
code model for JIT. This eliminates the ugly hack in TargetMachine where
code model is changed after construction.
llvm-svn: 135580
(including compilation, assembly). Move relocation model Reloc::Model from
TargetMachine to MCCodeGenInfo so it's accessible even without TargetMachine.
llvm-svn: 135468
to MCRegisterInfo. Also initialize the mapping at construction time.
This patch eliminate TargetRegisterInfo from TargetAsmInfo. It's another step
towards fixing the layering violation.
llvm-svn: 135424
1) Make non-legal 256-bit loads to be promoted to v4i64. This lets us
canonize the loads and handle things the same way we use to handle
for 128-bit registers. Despite of what one of the removed comments
explained, the load promotion would not mess with VPERM, it's only a
matter of doing the appropriate bitcasts when this instructions comes
to be introduced. Also make LOAD v8i32 legal.
2) Doing 1) exposed two bugs:
- v4i64 was being promoted to itself for several opcodes (introduced
in r124447 by David Greene) causing endless recursion and the stack to
explode.
- there was no support for allOnes BUILD_VECTORs and ANDNP would fail to
match because it was generating early target constant pools during
lowering.
3) The testcases are already checked-in, doing 1) exposed the
bugs in the current testcases.
4) Tidy up code to be more clear and explicit about AVX.
llvm-svn: 135313
was really intended, and it may have been required prior to some of the
recent refactors. Including it however causes LLVMX86Desc to need
symbols from LLVMX86CodeGen, forming a dependency cycle. This was masked
in almost all builds: Clang, and GCC w/ optimizations didn't actually
emit the symbols!
llvm-svn: 135242
backend. Moved some MCAsmInfo files down into the MCTargetDesc
sublibraries, removed some (i suspect long) dead files from other parts
of the CMake build, etc. Also copied the include directory hack from the
Makefile.
Finally, updated the lib deps. I spot checked this, and think its
correct, but review appreciated there.
llvm-svn: 135234
when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.
Fixes PR10352 and rdar://9777108
llvm-svn: 135180
During type legalization we often use the SIGN_EXTEND_INREG SDNode.
When this SDNode is legalized during the LegalizeVector phase, it is
scalarized because non-simple types are automatically marked to be expanded.
In this patch we add support for lowering SIGN_EXTEND_INREG manually.
This fixes CodeGen/X86/vec_sext.ll when running with the '-promote-elements'
flag.
llvm-svn: 135144
Update the debug output interface for MCParsedAsmOperand to have a print()
method which takes an output stream argument, an << operator which invokes
the print method using the given stream, and a dump() method which prints
the operand to the dbgs() stream. This makes the interface more consistent
with the rest of LLVM, and more convenient to use at the debugger command
line.
llvm-svn: 135043
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
MCSubtargetInfo layer can be shared with other modules.
- These fixes .code 16 / .code 32 support since mode switch is updated in
MCSubtargetInfo so MC code emitter can do the right thing.
llvm-svn: 134884
CPU, and feature string. Parsing some asm directives can change
subtarget state (e.g. .code 16) and it must be reflected in other
modules (e.g. MCCodeEmitter). That is, the MCSubtargetInfo instance
must be shared.
llvm-svn: 134795
- Each target asm parser now creates its own MCSubtatgetInfo (if needed).
- Changed AssemblerPredicate to take subtarget features which tablegen uses
to generate asm matcher subtarget feature queries. e.g.
"ModeThumb,FeatureThumb2" is translated to
"(Bits & ModeThumb) != 0 && (Bits & FeatureThumb2) != 0".
llvm-svn: 134678
Add a MI->emitError() method that the backend can use to report errors
related to inline assembly. Call it from X86FloatingPoint.cpp when the
constraints are wrong.
This enables proper clang diagnostics from the backend:
$ clang -c pr30848.c
pr30848.c:5:12: error: Inline asm output regs must be last on the x87 stack
__asm__ ("" : "=u" (d)); /* { dg-error "output regs" } */
^
1 error generated.
llvm-svn: 134307
itineraries.
- Refactor TargetSubtarget to be based on MCSubtargetInfo.
- Change tablegen generated subtarget info to initialize MCSubtargetInfo
and hide more details from targets.
llvm-svn: 134257
be the first encoded as the first feature. It then uses the CPU name to look up
features / scheduling itineray even though clients know full well the CPU name
being used to query these properties.
The fix is to just have the clients explictly pass the CPU name!
llvm-svn: 134127
Some x86-32 calls pop values off the stack, and we need to readjust the
stack pointer after the call. This happens when ADJCALLSTACKUP is
eliminated.
It could happen that spill code was inserted between the CALL and
ADJCALLSTACKUP instructions, and we would compute wrong stack pointer
offsets for those frame index references.
Fix this by inserting the stack pointer adjustment immediately after the
call instead of where the ADJCALLSTACKUP instruction was erased.
I don't have a test case since we don't currently insert code in that
position. We will soon, though. I am testing a regalloc patch that
didn't work on Linux because of this.
llvm-svn: 134113
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
Drop the FpMov instructions, use plain COPY instead.
Drop the FpSET/GET instruction for accessing fixed stack positions.
Instead use normal COPY to/from ST registers around inline assembly, and
provide a single new FpPOP_RETVAL instruction that can access the return
value(s) from a call. This is still necessary since you cannot tell from
the CALL instruction alone if it returns anything on the FP stack. Teach
fast isel to use this.
This provides a much more robust way of handling fixed stack registers -
we can tolerate arbitrary FP stack instructions inserted around calls
and inline assembly. Live range splitting could sometimes break x87 code
by inserting spill code in unfortunate places.
As a bonus we handle floating point inline assembly correctly now.
llvm-svn: 134018
Move the target-specific RecordRelocation logic out of the generic MC
MachObjectWriter and into the target-specific object writers. This allows
nuking quite a bit of target knowledge from the supposedly target-independent
bits in lib/MC.
llvm-svn: 133844
target machine from those that are only needed by codegen. The goal is to
sink the essential target description into MC layer so we can start building
MC based tools without needing to link in the entire codegen.
First step is to refactor TargetRegisterInfo. This patch added a base class
MCRegisterInfo which TargetRegisterInfo is derived from. Changed TableGen to
separate register description from the rest of the stuff.
llvm-svn: 133782
This simplifies many of the target description files since it is common
for register classes to be related or contain sequences of numbered
registers.
I have verified that this doesn't change the files generated by TableGen
for ARM and X86. It alters the allocation order of MBlaze GPR and Mips
FGR32 registers, but I believe the change is benign.
llvm-svn: 133105
optimizations when emitting calls to the function; instead those calls may
use faster relocations which require the function to be immediately resolved
upon loading the dynamic object featuring the call. This is useful when it
is known that the function will be called frequently and pervasively and
therefore there is no merit in delaying binding of the function.
Currently only implemented for x86-64, where it turns into a call through
the global offset table.
Patch by Dan Gohman, who assures me that he's going to add LangRef documentation
for this once it's committed.
llvm-svn: 133080
we try to branch to them.
Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would causes it to
return the wrong answer for
-----------
...
jne foo
jmp bar
foo:
----------
llvm-svn: 132882
memcpy/memset symbol doesn't get marked up correctly in PIC modes otherwise.
Should fix llvm-x86_64-linux-checks buildbot. Followup to r132864.
llvm-svn: 132869
The register allocators automatically filter out reserved registers and
place the callee saved registers last in the allocation order, so custom
methods are no longer necessary just for that.
Some targets still use custom allocation orders:
ARM/Thumb: The high registers are removed from GPR in thumb mode. The
NEON allocation orders prefer to use non-VFP2 registers first.
X86: The GR8 classes omit AH-DH in x86-64 mode to avoid REX trouble.
SystemZ: Some of the allocation orders are omitting R12 aliases without
explanation. I don't understand this target well enough to fix that. It
looks like all the boilerplate could be removed by reserving the right
registers.
llvm-svn: 132781
floating-point comparison, generate a mask of 0s or 1s, and generally
DTRT with NaNs. Only profitable when the user wants a materialized 0
or 1 at runtime. rdar://problem/5993888
llvm-svn: 132404
Add TargetRegisterInfo::hasSubClassEq and use it to check for compatible
register classes instead of trying to list all register classes in
X86's getLoadStoreRegOpcode.
llvm-svn: 132398
same dwarf number. This will be used for creating a dwarf number to register
mapping.
The only case that needs this so far is the XMM/YMM registers that unfortunately
do have the same numbers.
llvm-svn: 132314
was saying that the matching superregister class of GR32_NOREX in GR64_NOREX_NOSP
is GR64_NOREX, which drops the NOSP constraint. This fixes PR10032.
llvm-svn: 132225
The practical effects here are that x86-64 fast-isel can now handle trunc from i8 to i1, and ARM fast-isel can handle many more constructs involving integers narrower than 32 bits (including loads, stores, and many integer casts).
rdar://9437928 .
llvm-svn: 132099
non-zero.
- Teach X86 cmov optimization to eliminate the cmov from ctlz, cttz extension
when the source of X86ISD::BSR / X86ISD::BSF is proven to be non-zero.
rdar://9490949
llvm-svn: 131948
after folding ADD32ri to ADD32mi, so don't do that.
This only happens when the greedy register allocator gets itself in trouble and
spills %vreg9 here:
16L %vreg9<def> = MOVPC32r 0, %ESP<imp-use>; GR32:%vreg9
48L %vreg9<def> = ADD32ri %vreg9, <es:_GLOBAL_OFFSET_TABLE_>[TF=1], %EFLAGS<imp-def,dead>; GR32:%vreg9
That should never happen, the live range should be split instead.
llvm-svn: 130625
Currently the output should be almost identical to the one produced by CodeGen
to make the transition easier.
The only two differences I know of are:
* Some files get an extra advance loc of size 0. This will be fixed when
relaxations are enabled.
* The optimization of declaring an EH symbol as an external variable is not
implemented. This is a subset of adding the nounwind attribute, so we if really
this at -O0 we should probably do it at the IL level.
llvm-svn: 130623
give it a bit more responsibility. Also implement it for MachO.
If hacked to use cfi, 32 bit MachO will produce
.cfi_personality 155, L___gxx_personality_v0$non_lazy_ptr
and 64 bit will produce
.cfi_presonality ___gxx_personality_v0
The general idea is that .cfi_personality gets passed the final symbol. It is
up to codegen to produce it if using indirect representation (like 32 bit
MachO), but it is up to MC to decide which relocations to create.
llvm-svn: 130341
The hook will be used by the register allocator when recomputing register
classes after removing constraints.
Thumb1 code doesn't allow anything larger than tGPR, and x86 needs to ensure
that the spill size doesn't change.
llvm-svn: 130228
This tends to happen a lot with bitfield code generated by clang. A simple example for x86_64 is
uint64_t foo(uint64_t x) { return (x&1) << 42; }
which used to compile into bloated code:
shlq $42, %rdi ## encoding: [0x48,0xc1,0xe7,0x2a]
movabsq $4398046511104, %rax ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x00,0x04,0x00,0x00]
andq %rdi, %rax ## encoding: [0x48,0x21,0xf8]
ret ## encoding: [0xc3]
with this patch we can fold the immediate into the and:
andq $1, %rdi ## encoding: [0x48,0x83,0xe7,0x01]
movq %rdi, %rax ## encoding: [0x48,0x89,0xf8]
shlq $42, %rax ## encoding: [0x48,0xc1,0xe0,0x2a]
ret ## encoding: [0xc3]
It's possible to save another byte by using 'andl' instead of 'andq' but I currently see no way of doing
that without making this code even more complicated. See the TODOs in the code.
llvm-svn: 129990
On the x86-64 and thumb2 targets, some registers are more expensive to encode
than others in the same register class.
Add a CostPerUse field to the TableGen register description, and make it
available from TRI->getCostPerUse. This represents the cost of a REX prefix or a
32-bit instruction encoding required by choosing a high register.
Teach the greedy register allocator to prefer cheap registers for busy live
ranges (as indicated by spill weight).
llvm-svn: 129864
llvm is built with unsigned chars where an immediate such as 0xff would be zero
extended to 64-bits, turning "cmp $0xff,%eax" into
"cmp $0xffffffffffffffff,%eax".
llvm-svn: 129845
when they are a truncate from something else. This eliminates fully half of all the
fastisel rejections on a test c++ file I'm working with, which should make a substantial
improvement for -O0 compile of c++ code.
This fixed rdar://9297003 - fast isel bails out on all functions taking bools
llvm-svn: 129752
Before we would bail out on i1 arguments all together, now we just bail on
non-constant ones. Also, we used to emit extraneous code. e.g. test12 was:
movb $0, %al
movzbl %al, %edi
callq _test12
and test13 was:
movb $0, %al
xorl %edi, %edi
movb %al, 7(%rsp)
callq _test13f
Now we get:
movl $0, %edi
callq _test12
and:
movl $0, %edi
callq _test13f
llvm-svn: 129751
the generated FastISel. X86 doesn't need to generate code to match ADD16ri8
since ADD16ri will do just fine. This is a small codesize win in the generated
instruction selector.
llvm-svn: 129692
simplifying them and exposing more information to tblgen. It would be nice
if other target authors adopted this as well, particularly arm since it has fastisel.
llvm-svn: 129676
kind of predicate: one that is specific to imm nodes. The predicate function
specified here just checks an int64_t directly instead of messing around with
SDNode's. The virtue of this is that it means that fastisel and other things
can reason about these predicates.
llvm-svn: 129675
structure and fix some fixmes. We now have a TreePredicateFn class
that handles all of the decoding of these things. This is an internal
cleanup that has no impact on the code generated by tblgen.
llvm-svn: 129670
2. implement rdar://9289501 - fast isel should fold trivial multiplies to shifts
3. teach tblgen to handle shift immediates that are different sizes than the
shifted operands, eliminating some code from the X86 fast isel backend.
4. Have FastISel::SelectBinaryOp use (the poorly named) FastEmit_ri_ function
instead of FastEmit_ri to simplify code.
llvm-svn: 129666
when we have a global variable base an an index. Instead, just give up on
folding the global variable.
Before we'd geenrate:
_test: ## @test
## BB#0:
movq _rtx_length@GOTPCREL(%rip), %rax
leaq (%rax), %rax
addq %rdi, %rax
movzbl (%rax), %eax
ret
now we generate:
_test: ## @test
## BB#0:
movq _rtx_length@GOTPCREL(%rip), %rax
movzbl (%rax,%rdi), %eax
ret
The difference is even more significant when there is a scale
involved.
This fixes rdar://9289558 - total fail with addr mode formation at -O0/x86-64
llvm-svn: 129664
less trivial things) into a dummy lea. Before we generated:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
leaq (%rax), %rax
ret
now we produce:
_test: ## @test
movq _G@GOTPCREL(%rip), %rax
ret
This is part of rdar://9289558
llvm-svn: 129662
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
llvm-svn: 129571
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.
First part of <rdar://problem/8460511>.
llvm-svn: 129401
with the newer, cleaner model. It uses the IAPrinter class to hold the
information that is needed to match an instruction with its alias. This also
takes into account the available features of the platform.
There is one bit of ugliness. The way the logic determines if a pattern is
unique is O(N**2), which is gross. But in reality, the number of items it's
checking against isn't large. So while it's N**2, it shouldn't be a massive time
sink.
llvm-svn: 129110
Define most shift masks incrementally to reduce the redundant
hard-coding. Introduce new shift for the VEX flags to replace the
magic constant 32 in various places.
llvm-svn: 128822
the alias of an InstAlias instead of the thing being aliased. Because we need to
know the features that are valid for an InstAlias.
This is part of a work-in-progress.
llvm-svn: 127986
to have single return block (at least getting there) for optimizations. This
is general goodness but it would prevent some tailcall optimizations.
One specific case is code like this:
int f1(void);
int f2(void);
int f3(void);
int f4(void);
int f5(void);
int f6(void);
int foo(int x) {
switch(x) {
case 1: return f1();
case 2: return f2();
case 3: return f3();
case 4: return f4();
case 5: return f5();
case 6: return f6();
}
}
=>
LBB0_2: ## %sw.bb
callq _f1
popq %rbp
ret
LBB0_3: ## %sw.bb1
callq _f2
popq %rbp
ret
LBB0_4: ## %sw.bb3
callq _f3
popq %rbp
ret
This patch teaches codegenprep to duplicate returns when the return value
is a phi and where the phi operands are produced by tail calls followed by
an unconditional branch:
sw.bb7: ; preds = %entry
%call8 = tail call i32 @f5() nounwind
br label %return
sw.bb9: ; preds = %entry
%call10 = tail call i32 @f6() nounwind
br label %return
return:
%retval.0 = phi i32 [ %call10, %sw.bb9 ], [ %call8, %sw.bb7 ], ... [ 0, %entry ]
ret i32 %retval.0
This allows codegen to generate better code like this:
LBB0_2: ## %sw.bb
jmp _f1 ## TAILCALL
LBB0_3: ## %sw.bb1
jmp _f2 ## TAILCALL
LBB0_4: ## %sw.bb3
jmp _f3 ## TAILCALL
rdar://9147433
llvm-svn: 127953
not have native support for this operation (such as X86).
The legalized code uses two vector INT_TO_FP operations and is faster
than scalarizing.
llvm-svn: 127951
comparisons on x86. Essentially, the way this works is that SUB+SBB sets
the relevant flags the same way a double-width CMP would.
This is a substantial improvement over the generic lowering in LLVM. The output
is also shorter than the gcc-generated output; I haven't done any detailed
benchmarking, though.
llvm-svn: 127852
rather than an int. Thankfully, this only causes LLVM to miss optimizations, not
generate incorrect code.
This just fixes the zext at the return. We still insert an i32 ZextAssert when
reading a function's arguments, but it is followed by a truncate and another i8
ZextAssert so it is not optimized.
llvm-svn: 127766
instruction set. This code adds support for the VEX prefix
and for the YMM registers accessible on AVX-enabled
architectures. Instruction table support that enables AVX
instructions for the disassembler is in an upcoming patch.
llvm-svn: 127644
flexible.
If it returns a register class that's different from the input, then that's the
register class used for cross-register class copies.
If it returns a register class that's the same as the input, then no cross-
register class copies are needed (normal copies would do).
If it returns null, then it's not at all possible to copy registers of the
specified register class.
llvm-svn: 127368
testcases accordingly. Some are currently xfailed and will be filed
as bugs to be fixed or understood.
Performance results:
roughly neutral on SPEC
some micro benchmarks in the llvm suite are up between 100 and 150%, only
a pair of regressions that are due to be investigated
john-the-ripper saw:
10% improvement in traditional DES
8% improvement in BSDI DES
59% improvement in FreeBSD MD5
67% improvement in OpenBSD Blowfish
14% improvement in LM DES
Small compile time impact.
llvm-svn: 127208
regs. This is the only change in this checkin that may affects the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.
Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It is currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.
Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.
llvm-svn: 127067
and 256-bit forms. Because the number of elements in a vector
does not determine the vector type (4 elements could be v4f32 or
v4f64), pass the full type of the vector to decode routines.
llvm-svn: 126664