when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly. There is already an IR level transform that does
that, and it does it more carefully.
llvm-svn: 154296
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.
llvm-svn: 154292
shuffle node because it could introduce new shuffle nodes that were not
supported efficiently by the target.
2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
second shuffle reverses the transformation of the first shuffle.
llvm-svn: 154266
reciprocal if converting to the reciprocal is exact. Do it even if inexact
if -ffast-math. This substantially speeds up ac.f90 from the polyhedron
benchmarks.
llvm-svn: 154265
LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.
When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:
(icmp (add iv, stride), stride) --> (cmp iv, 0)
This forced us to use lea for the IC update, preventing the simpler
incl+cmp.
<rdar://problem/7643606>
<rdar://problem/11184260>
llvm-svn: 154119
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
When folding X == X we need to check getBooleanContents() to determine if the
result is a vector of ones or a vector of negative ones.
I tried creating a test case, but the problem seems to only be exposed on a
much older version of clang (around r144500).
rdar://10923049
llvm-svn: 153966
Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.
llvm-svn: 153864
This is the CodeGen equivalent of r153747. I tested that there is not noticeable
performance difference with any combination of -O0/-O2 /-g when compiling
gcc as a single compilation unit.
llvm-svn: 153817
here but it has no other uses, then we have a problem. E.g.,
int foo (const int *x) {
char a[*x];
return 0;
}
If we assign 'a' a vreg and fast isel later on has to use the selection
DAG isel, it will want to copy the value to the vreg. However, there are
no uses, which goes counter to what selection DAG isel expects.
<rdar://problem/11134152>
llvm-svn: 153705
execution-time regression for nsieve-bits on the ARMv7 -O0 -g nightly tester.
This may also improve compile-time on architectures that would otherwise
generate a libcall for urem (e.g., ARM) or fall back to the DAG selector.
rdar://10810716
llvm-svn: 153230
Type legalization can zero-extend the elements of the build_vector node, so,
for example, we may have an <8 x i8> with i32 elements of value 255. That
should return 'true' for the vector being all ones.
llvm-svn: 153203
a variable. The previous code would break the debug info changing
code invariant. This will regress debug info for arguments where
we elide the alloca created.
Fixes rdar://11066468
llvm-svn: 153074
It caused MSP430DAGToDAGISel::SelectIndexedBinOp() to be miscompiled.
When two ReplaceUses()'s are expanded as inline, vtable in base class is stored to latter (ISelUpdater)ISU.
llvm-svn: 152877
(i16 load $addr+c*sizeof(i16)) and replace uses of (i32 vextract) with the
i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).
rdar://11035895
llvm-svn: 152675
Renamed methods caseBegin, caseEnd and caseDefault with case_begin, case_end, and case_default.
Added some notes relative to case iterators.
llvm-svn: 152532
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120130/136146.html
Implemented CaseIterator and it solves almost all described issues: we don't need to mix operand/case/successor indexing anymore. Base iterator class is implemented as a template since it may be initialized either from "const SwitchInst*" or from "SwitchInst*".
ConstCaseIt is just a read-only iterator.
CaseIt is read-write iterator; it allows to change case successor and case value.
Usage of iterator allows totally remove resolveXXXX methods. All indexing convertions done automatically inside the iterator's getters.
Main way of iterator usage looks like this:
SwitchInst *SI = ... // intialize it somehow
for (SwitchInst::CaseIt i = SI->caseBegin(), e = SI->caseEnd(); i != e; ++i) {
BasicBlock *BB = i.getCaseSuccessor();
ConstantInt *V = i.getCaseValue();
// Do something.
}
If you want to convert case number to TerminatorInst successor index, just use getSuccessorIndex iterator's method.
If you want initialize iterator from TerminatorInst successor index, use CaseIt::fromSuccessorIndex(...) method.
There are also related changes in llvm-clients: klee and clang.
llvm-svn: 152297
ScheduleDAG is responsible for the DAG: SUnits and SDeps. It provides target hooks for latency computation.
ScheduleDAGInstrs extends ScheduleDAG and defines the current scheduling region in terms of MachineInstr iterators. It has access to the target's scheduling itinerary data. ScheduleDAGInstrs provides the logic for building the ScheduleDAG for the sequence of MachineInstrs in the current region. Target's can implement highly custom schedulers by extending this class.
ScheduleDAGPostRATDList provides the driver and diagnostics for current postRA scheduling. It maintains a current Sequence of scheduled machine instructions and logic for splicing them into the block. During scheduling, it uses the ScheduleHazardRecognizer provided by the target.
Specific changes:
- Removed driver code from ScheduleDAG. clearDAG is the only interface needed.
- Added enterRegion/exitRegion hooks to ScheduleDAGInstrs to delimit the scope of each scheduling region and associated DAG. They should be used to setup and cleanup any region-specific state in addition to the DAG itself. This is necessary because we reuse the same ScheduleDAG object for the entire function. The target may extend these hooks to do things at regions boundaries, like bundle terminators. The hooks are called even if we decide not to schedule the region. So all instructions in a block are "covered" by these calls.
- Added ScheduleDAGInstrs::begin()/end() public API.
- Moved Sequence into the driver layer, which is specific to the scheduling algorithm.
llvm-svn: 152208
To avoid problems with zero shifts when getting the bits that move between words
we use a trick: first shift the by amount-1, then do another shift by one. When
amount is 0 (and size 32) we first shift by 31, then by one, instead of by 32.
Also fix a latent bug that emitted the low and high words in the wrong order
when shifting right.
Fixes PR12113.
llvm-svn: 151637
When the GEP index is a vector of pointers, the code that calculated the size
of the element started from the vector type, and not the contained pointer type.
As a result, instead of looking at the data element pointed by the vector, this
code used the size of the vector. This works for 32bit members (on 32bit
systems), but not for other types. Added code to peel the vector type and
added a test.
llvm-svn: 151626
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.
rdar://8979299
llvm-svn: 151623
variable declaration as an argument because we want that address
anyhow for our debug information.
This seems to fix rdar://9965111, at least we have more debug
information than before and from reading the assembly it appears
to be the correct location.
llvm-svn: 151335
that are greater than the vector element type. For example BUILD_VECTOR
of type <1 x i1> with a constant i8 operand.
This patch fixes the assertion.
llvm-svn: 150477
The scheduler will sometimes check the implicit-def list on instructions
to properly handle pre-colored DAG edges.
Also check any register mask operands for physreg clobbers.
llvm-svn: 150428
v8i8 -> v8i32 on AVX machines. The codegen often scalarizes ANY_EXTEND nodes.
The DAGCombiner has two optimizations that can mitigate the problem. First,
if all of the operands of a BUILD_VECTOR node are extracted from an ZEXT/ANYEXT
nodes, then it is possible to create a new simplified BUILD_VECTOR which uses
UNDEFS/ZERO values to eliminate the scalar ZEXT/ANYEXT nodes.
Second, another dag combine optimization lowers BUILD_VECTOR into a shuffle
vector instruction.
In the case of zext v8i8->v8i32 on AVX, a value in an XMM register is to be
shuffled into a wide YMM register.
This patch modifes the second optimization and allows the creation of
shuffle vectors even when the newly generated vector and the original vector
from which we extract the values are of different types.
llvm-svn: 150340
Make them accessible through MCInstrInfo. They are only used for debugging purposes so this doesn't
have an impact on performance. X86MCTargetDesc.o goes from 630K to 461K on x86_64.
llvm-svn: 150245
but with a critical fix to the SelectionDAG code that optimizes copies
from strings into immediate stores: the previous code was stopping reading
string data at the first nul. Address this by adding a new argument to
llvm::getConstantStringInfo, preserving the behavior before the patch.
llvm-svn: 149800
SelectionDAG has 4 different ways of passing physreg defs to users.
Collect all of the uses at the same time, and pass all of them to
MI->setPhysRegsDeadExcept() to mark the remaining defs dead.
The setPhysRegsDeadExcept() function will soon add the required
implicit-defs to instructions with register mask operands.
llvm-svn: 149708
In this patch we optimize this pattern and convert the sequence into extract op of a narrow type.
This allows the BUILD_VECTOR dag optimizations to construct efficient shuffle operations in many cases.
llvm-svn: 149692
This new scheduler plugs into the existing selection DAG scheduling framework. It is a top-down critical path scheduler that tracks register pressure and uses a DFA for pipeline modeling.
Patch by Sergei Larin!
llvm-svn: 149547
The purpose of refactoring is to hide operand roles from SwitchInst user (programmer). If you want to play with operands directly, probably you will need lower level methods than SwitchInst ones (TerminatorInst or may be User). After this patch we can reorganize SwitchInst operands and successors as we want.
What was done:
1. Changed semantics of index inside the getCaseValue method:
getCaseValue(0) means "get first case", not a condition. Use getCondition() if you want to resolve the condition. I propose don't mix SwitchInst case indexing with low level indexing (TI successors indexing, User's operands indexing), since it may be dangerous.
2. By the same reason findCaseValue(ConstantInt*) returns actual number of case value. 0 means first case, not default. If there is no case with given value, ErrorIndex will returned.
3. Added getCaseSuccessor method. I propose to avoid usage of TerminatorInst::getSuccessor if you want to resolve case successor BB. Use getCaseSuccessor instead, since internal SwitchInst organization of operands/successors is hidden and may be changed in any moment.
4. Added resolveSuccessorIndex and resolveCaseIndex. The main purpose of these methods is to see how case successors are really mapped in TerminatorInst.
4.1 "resolveSuccessorIndex" was created if you need to level down from SwitchInst to TerminatorInst. It returns TerminatorInst's successor index for given case successor.
4.2 "resolveCaseIndex" converts low level successors index to case index that curresponds to the given successor.
Note: There are also related compatability fix patches for dragonegg, klee, llvm-gcc-4.0, llvm-gcc-4.2, safecode, clang.
llvm-svn: 149481
This SelectionDAG node will be attached to call nodes by LowerCall(),
and eventually becomes a MO_RegisterMask MachineOperand on the
MachineInstr representing the call instruction.
LowerCall() will attach a register mask that depends on the calling
convention.
llvm-svn: 148436
We know that the blend instructions only use the MSB, so if the mask is
sign-extended then we can convert it into a SHL instruction. This is a
common pattern because the type-legalizer sign-extends the i1 type which
is used by the LLVM-IR for the condition.
Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL.
llvm-svn: 148225
overly conservative. It was concerned about cases where it would prohibit
folding simple [r, c] addressing modes. e.g.
ldr r0, [r2]
ldr r1, [r2, #4]
=>
ldr r0, [r2], #4
ldr r1, [r2]
Change the logic to look for such cases which allows it to form indexed memory
ops more aggressively.
rdar://10674430
llvm-svn: 148086
When we load the v12i32 type, the GenWidenVectorLoads method generates two loads: v8i32 and v4i32
and attempts to use CONCAT_VECTORS to join them. In this fix I concat undef values to widen
the smaller value. The test "widen_load-2.ll" also exposes this bug on AVX.
llvm-svn: 147964
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:
unsigned x = my_accelerator_table[input >> 11];
Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):
*(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));
The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.
In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.
llvm-svn: 147936
of several newly un-defaulted switches. This also helps optimizers
(including LLVM's) recognize that every case is covered, and we should
assume as much.
llvm-svn: 147861
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.
llvm-svn: 147604
Before we'd get:
$ clang t.c
fatal error: error in backend: Invalid operand for inline asm constraint 'i'!
Now we get:
$ clang t.c
t.c:16:5: error: invalid operand for inline asm constraint 'i'!
"movq (%4), %%mm0\n"
^
Which at least gets us the inline asm that is the problem.
llvm-svn: 147502
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147445
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.
The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.
I added a special test that checks vector shuffle on win32.
llvm-svn: 147399
undefined result. This adds new ISD nodes for the new semantics,
selecting them when the LLVM intrinsic indicates that the undef behavior
is desired. The new nodes expand trivially to the old nodes, so targets
don't actually need to do anything to support these new nodes besides
indicating that they should be expanded. I've done this for all the
operand types that I could figure out for all the targets. Owners of
various targets, please review and let me know if any of these are
incorrect.
Note that the expand behavior is *conservatively correct*, and exactly
matches LLVM's current behavior with these operations. Ideally this
patch will not change behavior in any way. For example the regtest suite
finds the exact same instruction sequences coming out of the code
generator. That's why there are no new tests here -- all of this is
being exercised by the existing test suite.
Thanks to Duncan Sands for reviewing the various bits of this patch and
helping me get the wrinkles ironed out with expanding for each target.
Also thanks to Chris for clarifying through all the discussions that
this is indeed the approach he was looking for. That said, there are
likely still rough spots. Further review much appreciated.
llvm-svn: 146466
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.
llvm-svn: 146150
generator to it. For non-bundle instructions, these behave exactly the same
as the MC layer API.
For properties like mayLoad / mayStore, look into the bundle and if any of the
bundled instructions has the property it would return true.
For properties like isPredicable, only return true if *all* of the bundled
instructions have the property.
For properties like canFoldAsLoad, isCompare, conservatively return false for
bundles.
llvm-svn: 146026
1. Added opcode BUNDLE
2. Taught MachineInstr class to deal with bundled MIs
3. Changed MachineBasicBlock iterator to skip over bundled MIs; added an iterator to walk all the MIs
4. Taught MachineBasicBlock methods about bundled MIs
llvm-svn: 145975
change, now you need a TargetOptions object to create a TargetMachine. Clang
patch to follow.
One small functionality change in PTX. PTX had commented out the machine
verifier parts in their copy of printAndVerify. That now calls the version in
LLVMTargetMachine. Users of PTX who need verification disabled should rely on
not passing the command-line flag to enable it.
llvm-svn: 145714
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621
llvm-svn: 145300
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.
rdar://10301431
llvm-svn: 145273
dropping weights on the floor for invokes. This was impeding my writing
further test cases for invoke when interacting with probabilities and
block placement.
No test case as there doesn't appear to be a way to test this stuff. =/
Suggestions for a test case of course welcome. I hope to be able to add
test cases that indirectly cover this eventually by adding probabilities
to the exceptional edge and reordering blocks as a result.
llvm-svn: 145060
ADDs. MaxOffs is used as a threshold to limit the size of the offset. Tradeoffs
being: (1) If we can't materialize the large constant then we'll cause fast-isel
to bail. (2) Too large of an offset can't be directly encoded in the ADD
resulting in a MOV+ADD. Generally not a bad thing because otherwise we would
have had ADD+ADD, but on Thumb this turns into a MOVS+MOVT+ADD. Working on a fix
for that. (3) Conversely, too low of a threshold we'll miss opportunities to
coalesce ADDs.
rdar://10412592
llvm-svn: 144886