Before this patch in pic 32 bit code we would add the global base register
and not load from that address. This is a really old bug, but before the
introduction of the tls attributes we would never select initial exec for
pic code.
llvm-svn: 159409
The TargetInstrInfo::getNumMicroOps API does not change, but soon it
will be used by MachineScheduler. Now each subtarget can specify the
number of micro-ops per itinerary class. For ARM, this is currently
always dynamic (-1), because it is used for load/store multiple which
depends on the number of register operands.
Zero is now a valid number of micro-ops. This can be used for
nop pseudo-instructions or instructions that the hardware can squash
during dispatch.
llvm-svn: 159406
Corrected type for index of llvm.x86.avx2.gather.d.pd.256
from 256-bit to 128-bit.
Corrected types for src|dst|mask of llvm.x86.avx2.gather.q.ps.256
from 256-bit to 128-bit.
Support the following intrinsics:
llvm.x86.avx2.gather.d.q, llvm.x86.avx2.gather.q.q
llvm.x86.avx2.gather.d.q.256, llvm.x86.avx2.gather.q.q.256
llvm.x86.avx2.gather.d.d, llvm.x86.avx2.gather.q.d
llvm.x86.avx2.gather.d.d.256, llvm.x86.avx2.gather.q.d.256
llvm-svn: 159402
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.
The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.
llvm-svn: 159312
It takes advantage of r159299 which introduces relocation support for N64.
elf-dump needed to be upgraded to support N64 relocations as well.
This passes make check.
Jack
llvm-svn: 159302
It takes advantage of r159299 which introduces relocation support for N64.
elf-dump needed to be upgraded to support N64 relocations as well.
This passes make check.
Jack
llvm-svn: 159301
which many Mips 64 ABIs use than for O64 which many
if not all other target ABIs use.
Most architectures have the following 64 bit relocation record format:
typedef struct
{
Elf64_Addr r_offset; /* Address of reference */
Elf64_Xword r_info; /* Symbol index and type of relocation */
} Elf64_Rel;
typedef struct
{
Elf64_Addr r_offset;
Elf64_Xword r_info;
Elf64_Sxword r_addend;
} Elf64_Rela;
Whereas N64 has the following format:
typedef struct
{
Elf64_Addr r_offset;/* Address of reference */
Elf64_Word r_sym; /* Symbol index */
Elf64_Byte r_ssym; /* Special symbol */
Elf64_Byte r_type3; /* Relocation type */
Elf64_Byte r_type2; /* Relocation type */
Elf64_Byte r_type; /* Relocation type */
} Elf64_Rel;
typedef struct
{
Elf64_Addr r_offset;/* Address of reference */
Elf64_Word r_sym; /* Symbol index */
Elf64_Byte r_ssym; /* Special symbol */
Elf64_Byte r_type3; /* Relocation type */
Elf64_Byte r_type2; /* Relocation type */
Elf64_Byte r_type; /* Relocation type */
Elf64_Sxword r_addend;
} Elf64_Rela;
The structure is the same size, but the r_info data element
is now 5 separate elements. Besides the content aspects,
endian byte reordering will be different for the area with
each element being endianized separately.
I treat this as generic and continue to pass r_type as
an integer masking and unmasking the byte sized N64
values for N64 mode. I've implemented this and it causes no
affect on other current targets.
This passes make check.
Jack
llvm-svn: 159299
up to r158925 were handled as processor specific. Making them
generic and putting tests for these modifiers in the CodeGen/Generic
directory caused a number of targets to fail.
This commit addresses that problem by having the targets call
the generic routine for generic modifiers that they don't currently
have explicit code for.
For now only generic print operands 'c' and 'n' are supported.vi
Affected files:
test/CodeGen/Generic/asm-large-immediate.ll
lib/Target/PowerPC/PPCAsmPrinter.cpp
lib/Target/NVPTX/NVPTXAsmPrinter.cpp
lib/Target/ARM/ARMAsmPrinter.cpp
lib/Target/XCore/XCoreAsmPrinter.cpp
lib/Target/X86/X86AsmPrinter.cpp
lib/Target/Hexagon/HexagonAsmPrinter.cpp
lib/Target/CellSPU/SPUAsmPrinter.cpp
lib/Target/Sparc/SparcAsmPrinter.cpp
lib/Target/MBlaze/MBlazeAsmPrinter.cpp
lib/Target/Mips/MipsAsmPrinter.cpp
MSP430 isn't represented because it did not even run with
the long existing 'c' modifier and it was not apparent what
needs to be done to get it inline asm ready.
Contributer: Jack Carter
llvm-svn: 159203
More condition codes are included when deciding whether to remove cmp after
a sub instruction. Specifically, we extend from GE|LT|GT|LE to
GE|LT|GT|LE|HS|LS|HI|LO|EQ|NE. If we have "sub a, b; cmp b, a; movhs", we
should be able to replace with "sub a, b; movls".
rdar: 11725965
llvm-svn: 159166
The function live-out registers must be live at all function returns,
and %RCX is only used by eh.return. When a function also has a normal
return, only %RAX holds a return value.
This fixes PR13188.
llvm-svn: 159116
This allows the user/front-end to specify a model that is better
than what LLVM would choose by default. For example, a variable
might be declared as
@x = thread_local(initialexec) global i32 42
if it will not be used in a shared library that is dlopen'ed.
If the specified model isn't supported by the target, or if LLVM can
make a better choice, a different model may be used.
llvm-svn: 159077
There are patterns to handle immediates when they fit in the immediate field.
e.g. %sub = add i32 %x, -123
=> sub r0, r0, #123
Add patterns to catch immediates that do not fit but should be materialized
with a single movw instruction rather than movw + movt pair.
e.g. %sub = add i32 %x, -65535
=> movw r1, #65535
sub r0, r0, r1
rdar://11726136
llvm-svn: 159057
As an example of how the custom DiagnosticType can be used to provide
better operand-mismatch diagnostics, add a custom diagnostic for
the imm0_15 operand class used for several system instructions.
Update the tests to expect the improved diagnostic.
rdar://8987109
llvm-svn: 159051
Original commit message:
Allow up to 64 functional units per processor itinerary.
This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
llvm-svn: 159027
This makes it explicit when ScoreboardHazardRecognizer will be used.
"GenericItineraries" would only make sense if it contained real
itinerary values and still required ScoreboardHazardRecognizer.
llvm-svn: 158963
The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame
pointer exists, but the frame pointer was forced by the presence of
llvm.eh.unwind.init which isn't guaranteed.
If llvm.eh.unwind.init is actually required in functions calling
eh.return (is it?), we should diagnose that instead of emitting bad
machine code.
This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot.
llvm-svn: 158961
This is a minor drive-by fix with no robust way to unit test.
As an example see neon-div.ll:
SU(16): %Q8<def> = VMOVLsv4i32 %D17, pred:14, pred:%noreg, %Q8<imp-use,kill>
val SU(1): Latency=2 Reg=%Q8
...should be latency=1
llvm-svn: 158960
Minor drive by fix to cleanup latency computation. Calling
getOperandLatency with a deliberately incorrect operand index does not
give you the latency you want.
llvm-svn: 158959
boolean flag to an enum: { Fast, Standard, Strict } (default = Standard).
This option controls the creation by optimizations of fused FP ops that store
intermediate results in higher precision than IEEE allows (E.g. FMAs). The
behavior of this option is intended to match the behaviour specified by a
soon-to-be-introduced frontend flag: '-ffuse-fp-ops'.
Fast mode - allows formation of fused FP ops whenever they're profitable.
Standard mode - allow fusion only for 'blessed' FP ops. At present the only
blessed op is the fmuladd intrinsic. In the future more blessed ops may be
added.
Strict mode - allow fusion only if/when it can be proven that the excess
precision won't effect the result.
Note: This option only controls formation of fused ops by the optimizers. Fused
operations that are explicitly requested (e.g. FMA via the llvm.fma.* intrinsic)
will always be honored, regardless of the value of this option.
Internally TargetOptions::AllowExcessFPPrecision has been replaced by
TargetOptions::AllowFPOpFusion.
llvm-svn: 158956
to be generic across architectures. It has the
following description in the gnu sources:
Substitute immediate value without immediate syntax
Several Architectures such as x86 have local implementations
of operand modifier 'c' which go beyond the above description
slightly. To make use of the generic modifiers without overriding
local implementation one can make a call to the base class method
for AsmPrinter::PrintAsmOperand() in the locally derived method's
"default" case in the switch statement. That way if it is already
defined locally the generic version will never get called.
This change is needed when test/CodeGen/generic/asm-large-immediate.ll
failed on a native Mips board. The test was assuming a generic
implementation was in place.
Affected files:
lib/Target/Mips/MipsAsmPrinter.cpp:
Changed the default case to call the base method.
lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp
Added 'c' to the switch cases.
test/CodeGen/Mips/asm-large-immediate.ll
Mips compiled version of the generic one
Contributer: Jack Carter
llvm-svn: 158925
that are generated by TableGen and are already available in
MipsGenRegisterInfo.inc. Suggested by Jakob Stoklund Olesen.
Also, fix bug in function DecodeAFGR64RegisterClass.
Patch by Vladimir Medic.
llvm-svn: 158846
There is a pretty staggering amount of this in LLVM's header files, this
is not all of the instances I'm afraid. These include all of the
functions that (in my build) are used by a non-static inline (or
external) function. Specifically, these issues were caught by the new
'-Winternal-linkage-in-inline' warning.
I'll try to just clean up the remainder of the clearly redundant "static
inline" cases on functions (not methods!) defined within headers if
I can do so in a reliable way.
There were even several cases of a missing 'inline' altogether, or my
personal favorite "static bool inline". Go figure. ;]
llvm-svn: 158800
This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or
FSUB + FMUL. The combines are performed when:
(a) Either
AllowExcessFPPrecision option (-enable-excess-fp-precision for llc)
OR
UnsafeFPMath option (-enable-unsafe-fp-math)
are set, and
(b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of
the FADD/FSUB, and
(c) The FMUL only has one user (the FADD/FSUB).
If your target has fast FMA instructions you can make use of these combines by
overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for
types supported by your FMA instruction, and adding patterns to match ISD::FMA
to your FMA instructions.
llvm-svn: 158757
The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.
This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.
This is related to PR5997.
llvm-svn: 158743
For processors with the G5-like instruction-grouping scheme, this helps avoid
early group termination due to a write-after-write dependency within the group.
It should also help on pipelined embedded cores.
On POWER7, over the test suite, this gives an average 0.5% speedup. The largest
speedups are:
SingleSource/Benchmarks/Stanford/Quicksort - 33%
MultiSource/Applications/d/make_dparser - 21%
MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12%
MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12%
Largest slowdowns:
SingleSource/Benchmarks/Stanford/Bubblesort - 23%
MultiSource/Benchmarks/Prolangs-C++/city/city - 21%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16%
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13%
llvm-svn: 158719
TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM,
on X86 it is not easy to find out if .init_array should be used or not, so
the decision is made via TargetOptions and defaults to off.
Add a command line option to llc that enables it.
llvm-svn: 158692
This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
llvm-svn: 158679
The NOP, WFE, WFI, SEV and YIELD instructions are all hints w/
a different immediate value in bits [7,0]. Define a generic HINT
instruction and refactor NOP, WFI, WFI, SEV and YIELD to be
assembly aliases of that.
rdar://11600518
llvm-svn: 158674
when a compile time constant is known. This occurs when implicitly zero
extending function arguments from 16 bits to 32 bits. The 8 bit case doesn't
need to be handled, as the 8 bit constants are encoded directly, thereby
not needing a separate load instruction to form the constant into a register.
<rdar://problem/11481151>
llvm-svn: 158659
This patch causes problems when both dynamic stack realignment and
dynamic allocas combine in the same function. With this patch, we no
longer build the epilog correctly, and silently restore registers from
the wrong position in the stack.
Thanks to Matt for tracking this down, and getting at least an initial
test case to Chad. I'm going to try to check a variation of that test
case in so we can easily track the fixes required.
llvm-svn: 158654
This cleans up the method used to find trip counts in order to form CTR loops on PPC.
This refactoring allows the pass to find loops which have a constant trip count but also
happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different
classes of loops that are currently ignored.
In addition, we now search through all potential induction operations instead of just the first.
Also, we check the predicate code on the conditional branch and abort the transformation if the
code is not EQ or NE, and we then make sure that the branch to be transformed matches the
condition register defined by the comparison (multiple possible comparisons will be considered).
llvm-svn: 158607
This patch will optimize abs(x-y)
FROM
sub, movs, rsbmi
TO
subs, rsbmi
For abs, we will use cmp instead of movs. This is necessary because we already
have an existing peephole pass which optimizes away cmp following sub.
rdar: 11633193
llvm-svn: 158551
to load an immediate that does not fit into 16-bit. Also, take into
consideration the global base register slot on the stack when computing the
stack size.
llvm-svn: 158430
delay slot filler pass of MIPS, per suggestion of Jakob Stoklund Olesen.
This change, along with the fix in r158154, enables machine verification
to be run after delay slot filling.
llvm-svn: 158426
POWER4 is a 64-bit CPU (better matched to the 970).
The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750).
Patch by Andreas Tobler.
llvm-svn: 158363
Original commit message:
Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().
Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).
llvm-svn: 158349
Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).
llvm-svn: 158337
The PPC target feature gpul (IsGigaProcessor) was only used for one thing:
To enable the generation of the MFOCRF instruction. Furthermore, this
instruction is available on other PPC cores outside of the G5 line. This
feature now corresponds to the HasMFOCRF flag.
No functionality change.
llvm-svn: 158323
We turned off the CMN instruction because it had semantics which we weren't
getting correct. If we are comparing with an immediate, then it's okay to use
the CMN instruction.
<rdar://problem/7569620>
llvm-svn: 158302
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).
Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)
Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%
llvm-svn: 158296
Using 'all' instead of 'critical' would be better because it would make it easier to
satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
possible with the crs.
This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups:
SingleSource/Benchmarks/Shootout-C++/moments - 40%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
SingleSource/Benchmarks/McGill/misr - 23%
MultiSource/Applications/JM/ldecod/ldecod - 22%
Largest slowdowns:
SingleSource/Benchmarks/Shootout-C++/matrix - -29%
SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
SingleSource/Benchmarks/Shootout-C++/ary - -17%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%
llvm-svn: 158294
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.
Thanks to Jakob and Owen for their suggestions in this matter.
llvm-svn: 158283
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).
Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%
Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%
This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.
llvm-svn: 158259
Thanks to Jakob's help, this now causes no new test suite failures!
Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%
The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%
In light of these slowdowns, additional profiling work is obviously needed!
llvm-svn: 158223
Marking these classes as non-alocatable allows CTR loop generation to
work correctly with the block placement passes, etc. These register
classes are currently used only by some unused TCRETURN patterns.
In future cleanup, these will be removed.
Thanks again to Jakob for suggesting this fix to the CTR loop problem!
llvm-svn: 158221
Bulk move of TargetInstrInfo implementation into
TargetInstrInfoImpl. This is dirty because the code isn't part of
TargetInstrInfoImpl class, nor should it be, because the methods are
not target hooks. However, it's the current mechanism for keeping
libTarget useful outside the backend. You'll get a not-so-nice link
error if you invoke a TargetInstrInfo method that depends on CodeGen.
The TargetInstrInfoImpl class should probably be removed since it
doesn't really solve this problem.
To really fix this, we probably need separate interfaces for the
CodeGen/nonCodeGen sides of TargetInstrInfo.
llvm-svn: 158212
The pass itself works well, but the something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.
llvm-svn: 158206
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
%vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16
llvm-svn: 158205
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.
llvm-svn: 158204
This patch will generate the following for integer ABS:
movl %edi, %eax
negl %eax
cmovll %edi, %eax
INSTEAD OF
movl %edi, %ecx
sarl $31, %ecx
leal (%rdi,%rcx), %eax
xorl %ecx, %eax
There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov.
This is implemented in PerformXorCombine.
rdar://10695237
llvm-svn: 158175
This patch will optimize the following
movq %rdi, %rax
subq %rsi, %rax
cmovsq %rsi, %rdi
movq %rdi, %rax
to
cmpq %rsi, %rdi
cmovsq %rsi, %rdi
movq %rdi, %rax
Perform this optimization if the actual result of SUB is not used.
rdar: 11540023
llvm-svn: 158126
The commit is intended to fix rdar://11540023.
It is implemented as part of peephole optimization. We can actually implement
this in the SelectionDAG lowering phase.
llvm-svn: 158122
LLVM is now -Wunused-private-field clean except for
- lib/MC/MCDisassembler/Disassembler.h. Not sure why it keeps all those unaccessible fields.
- gtest.
llvm-svn: 158096
There are some that I didn't remove this round because they looked like
obvious stubs. There are dead variables in gtest too, they should be
fixed upstream.
llvm-svn: 158090
This allows a subtarget to explicitly specify the issue width and
other properties without providing pipeline stage details for every
instruction.
llvm-svn: 157979
when a compile time constant is known. This occurs when implicitly zero
extending function arguments from 16 bits to 32 bits.
<rdar://problem/11481151>
llvm-svn: 157966
It seems that this no longer causes test suite failures on PPC64 (after r157159),
and often gives a performance benefit, so it can be enabled by default.
llvm-svn: 157911
This is the first of a series of patches which make changes to the backend to
emit unaligned load/store instructions (lwl,lwr,swl,swr) during instruction
selection.
llvm-svn: 157862
No functional change intended.
Sorry for the churn. The iterator classes are supposed to help avoid
giant commits like this one in the future. The TableGen-produced
register lists are getting quite large, and it may be necessary to
change the table representation.
This makes it possible to do so without changing all clients (again).
llvm-svn: 157854
This patch will optimize the following:
sub r1, r3
cmp r3, r1 or cmp r1, r3
bge L1
TO
sub r1, r3
bge L1 or ble L1
If the branch instruction can use flag from "sub", then we can eliminate
the "cmp" instruction.
llvm-svn: 157831
This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic access on
an execution path.
llvm-svn: 157818
We handle struct byval by inserting a pseudo op, which will be expanded to a
loop at ExpandISelPseudos.
A separate patch for clang will be submitted to enable struct byval.
rdar://9877866
llvm-svn: 157793
This patch will optimize the following
movq %rdi, %rax
subq %rsi, %rax
cmovsq %rsi, %rdi
movq %rdi, %rax
to
cmpq %rsi, %rdi
cmovsq %rsi, %rdi
movq %rdi, %rax
Perform this optimization if the actual result of SUB is not used.
rdar: 11540023
llvm-svn: 157755
Reg-units are named after their root registers, and most units have a
single root, so they simply print as 'AL', 'XMM0', etc. The rare dual
root reg-units print as FPSCR~FPSCR_NZCV, FP0~ST7, ...
The printing piggybacks on the existing register name tables, so no
extra const data space is required.
llvm-svn: 157754
I disabled FMA3 autodetection, since the result may differ from expected for some benchmarks.
I added tests for GodeGen and intrinsics.
I did not change llvm.fma.f32/64 - it may be done later.
llvm-svn: 157737
integer registers. This is already supported by the fastcc convention, but it doesn't
hurt to support it in the standard conventions as well.
In cases where we can cheat at the calling convention, this allows us to avoid returning
things through memory in more cases.
llvm-svn: 157698
This required light surgery on the assembler and disassembler
because the instructions use an uncommon encoding. They are
the only two instructions in x86 that use register operands
and two immediates.
llvm-svn: 157634
to pass around a struct instead of a large set of individual values. This
cleans up the interface and allows more information to be added to the struct
for future targets without requiring changes to each and every target.
NV_CONTRIB
llvm-svn: 157479
instruction encodings can be excluded during mips16 processing.
This revision fixes the issue raised by Jim Grosbach.
bool hasStandardEncoding() const { return !inMips16Mode(); }
When micromips is added it will be
bool StandardEncoding() const { return !inMips16Mode()&& !inMicroMipsMode(); }
No additional testing is needed other than to assure that there is no regression
from this patch.
Patch by Reed Kotler.
llvm-svn: 157234
32-bit offset jump tables just use real branch instructions and so aren't
marked as data regions. We were still emitting the .end_data_region
marker though, which assert()ed.
rdar://11499158
llvm-svn: 157221
The current code will generate a prologue which starts with something like:
mflr 0
stw 31, -4(1)
stw 0, 4(1)
stwu 1, -16(1)
But under the PPC32 SVR4 ABI, access to negative offsets from R1 is not allowed.
This was pointed out by Peter Bergner.
llvm-svn: 157133
Use a dedicated MachO load command to annotate data-in-code regions.
This is the same format the linker produces for final executable images,
allowing consistency of representation and use of introspection tools
for both object and executable files.
Data-in-code regions are annotated via ".data_region"/".end_data_region"
directive pairs, with an optional region type.
data_region_directive := ".data_region" { region_type }
region_type := "jt8" | "jt16" | "jt32" | "jta32"
end_data_region_directive := ".end_data_region"
The previous handling of ARM-style "$d.*" labels was broken and has
been removed. Specifically, it didn't handle ARM vs. Thumb mode when
marking the end of the section.
rdar://11459456
llvm-svn: 157062
the 0b10 mask encoding bits. Make MSR APSR writes without a _<bits> qualifier
an alias for MSR APSR_nzcvq even though ARM as deprecated it use. Also add
support for suffixes (_nzcvq, _g, _nzcvqg) for APSR versions. Some FIXMEs in
the code for better error checking when versions shouldn't be used.
rdar://11457025
llvm-svn: 157019
llc to recognize MIPS16 as a MIPS ASE extension. -mips16 will mean the
mips16 ASE for mips32 by default.
As part of fixing of adding this we discovered some small changes that
need to be made to MipsInstrInfo::storeRegToStackSLot and
MipsInstrInfo::loadRegFromStackSlot. We were using some "==" equality tests
where in fact we should have been using Mips::<regclas>.hasSubClassEQ instead,
per suggestion of Jakob Stoklund Olesen.
Patch by Reed Kotler.
llvm-svn: 156958
The purpose of this option is to silence error messages issued by machine
verifier passes and enable them to run to the end. If this option is not
provided, -verify-machineinstrs complains when it discovers there is a
non-terminator instruction (an instruction that is in a delay slot) after the
first terminator in a basic block.
llvm-svn: 156790
the ones that get or set the frame index for the $gp save slot.
Remove the piece of code in MipsFunctionInfo::getGlobalBaseReg() which returns
GP. This function should always return a virtual register.
llvm-svn: 156695
- Stop creating stack frame objects needed for saving $gp.
- Insert a node that copies the global pointer register to register $gp
before the call node. This will ensure $gp is valid at the entry of the
called function.
llvm-svn: 156692
- Stop emitting instructions needed to initialize the global pointer register.
- Stop emitting .cprestore directive.
- Do not take into account the $gp save slot when computing stack size.
llvm-svn: 156691
- Remove code which lowers pseudo SETGP01.
- Fix LowerSETGP01. The first two of the three instructions that are emitted to
initialize the global pointer register now use register $2.
- Stop emitting .cpload directive.
llvm-svn: 156689
pointer register.
This is the first of the series of patches which clean up the way global pointer
register is used. The patches will make the following improvements:
- Make $gp an allocatable temporary register rather than reserving it.
- Use a virtual register as the global pointer register and let the register
allocator decide which register to assign to it or whether spill/reloads are
needed.
- Make sure $gp is valid at the entry of a called function, which is necessary
for functions using lazy binding.
- Remove the need for emitting .cprestore and .cpload directives.
llvm-svn: 156671
This patch will optimize the following cases:
sub r1, r3 | sub r1, imm
cmp r3, r1 or cmp r1, r3 | cmp r1, imm
bge L1
TO
subs r1, r3
bge L1 or ble L1
If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.
rdar: 10734411
llvm-svn: 156599
This patch will optimize the following cases:
sub r1, r3 | sub r1, imm
cmp r3, r1 or cmp r1, r3 | cmp r1, imm
bge L1
TO
subs r1, r3
bge L1 or ble L1
If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.
rdar: 10734411
llvm-svn: 156550
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).
So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.
Patch by Yiannis Tsiouris!
llvm-svn: 156328
This function is a generalization of getMatchingSuperRegClass() to the
symmetric case where both sides are using a sub-register index. It will
find a super-register class and sub-register indexes that make this
diagram commute:
PreA
SuperRC ----------> RCA
| |
| |
PreB | | SubA
| |
| |
V V
RCB ----------> SubRC
SubB
This can be used to coalesce copies like:
%vreg1:sub16 = COPY %vreg2:sub16; GR64:%vreg1, GR32: %vreg2
llvm-svn: 156317
This patch will optimize -(x != 0) on X86
FROM
cmpl $0x01,%edi
sbbl %eax,%eax
notl %eax
TO
negl %edi
sbbl %eax %eax
In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;
rdar: 10961709
llvm-svn: 156312
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.
Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.
I'm not entirely happy with the name of this flag, suggestions welcome ;)
llvm-svn: 156233
In file included from ../lib/Target/NVPTX/VectorElementize.cpp:53:
../lib/Target/NVPTX/NVPTX.h:44:3: warning: default label in switch which covers all enumeration values [-Wcovered-switch-default]
default: assert(0 && "Unknown condition code");
^
1 warning generated.
The prevailing pattern in LLVM is to not use a default label, and instead to
use llvm_unreachable to denote that the switch in fact covers all return paths
from the function.
llvm-svn: 156209
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.
NV_CONTRIB
llvm-svn: 156196
This moves the logic for selecting a TLS model to a single place,
instead of the previous three (ARM, Mips, and X86 which already
uses this function).
llvm-svn: 156162
This iterator class provides a more abstract interface to the (Idx,
Mask) lists of super-registers for a register class. The layout of the
tables shouldn't be exposed to clients.
llvm-svn: 156144
for the assembler and disassembler. Which were not being set/read correctly
for offsets greater than 22 bits in some cases.
Changes to lib/Target/ARM/ARMAsmBackend.cpp from Gideon Myles!
llvm-svn: 156118
The ensures that virtual registers always belong to an allocatable class.
If your target attempts to create a vreg for an operand that has no
allocatable register subclass, you will crash quickly.
This ensures that targets define register classes as intended.
llvm-svn: 156046
Expressions for movw/movt don't always have an :upper16: or :lower16:
on them and that's ok. When they don't, it's just a plain [0-65536]
immediate result, effectively the same as a :lower16: variant kind.
rdar://10550147
llvm-svn: 155941
in order to avoid assertion failures in the register scavenger. The assertion
failures were “Bad machine code: Using an undefined physical register” and
“Bad machine code: MBB exits via unconditional fall-through but its successor
differs from its CFG successor!”.
llvm-svn: 155930
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0
FROM
movl %edi, %ecx
subl %esi, %ecx
cmpl %edi, %esi
movl $0, %eax
cmovll %ecx, %eax
TO
xorl %eax, %eax
subl %esi, %edi
cmovll %eax, %edi
movl %edi, %eax
rdar: 10734411
llvm-svn: 155919