may be RAUW'd by the recursive call to LegalizeOps; instead, retrieve
the other operands when calling UpdateNodeOperands. Fixes PR12889.
llvm-svn: 157162
SelectionDAGBuilder::Clusterify : main functinality was replaced with CRSBuilder::optimize, so big part of Clusterify's code was reduced.
llvm-svn: 157046
When a combine twiddles an extract_vector, care should be take to preserve
the type of the index operand. No luck extracting a reasonable testcase,
unfortunately.
rdar://11391009
llvm-svn: 156419
The getPointerRegClass() hook can return register classes that depend on
the calling convention of the current function (ptr_rc_tailcall).
So far, we have been able to infer the calling convention from the
subtarget alone, but as we add support for multiple calling conventions
per target, that no longer works.
Patch by Yiannis Tsiouris!
llvm-svn: 156328
This will be used to determine whether it's profitable to turn a select into a
branch when the branch is likely to be predicted.
Currently enabled for everything but Atom on X86 and Cortex-A9 devices on ARM.
I'm not entirely happy with the name of this flag, suggestions welcome ;)
llvm-svn: 156233
We want the representative register class to contain the largest
super-registers available. This makes the function less sensitive to the
register class numbering.
llvm-svn: 156220
The masks returned by SuperRegClassIterator are computed automatically
by TableGen. This is better than depending on the manually specified
SuperRegClasses.
llvm-svn: 156147
The ensures that virtual registers always belong to an allocatable class.
If your target attempts to create a vreg for an operand that has no
allocatable register subclass, you will crash quickly.
This ensures that targets define register classes as intended.
llvm-svn: 156046
This time, also fix the caller of AddGlue to properly handle
incomplete chains. AddGlue had failure modes, but shamefully hid them
from its caller. It's luck ran out.
Fixes rdar://11314175: BuildSchedUnits assert.
llvm-svn: 155749
DAGCombine strangeness may result in multiple loads from the same
offset. They both may try to glue themselves to another load. We could
insist that the redundant loads glue themselves to each other, but the
beter fix is to bail out from bad gluing at the time we detect it.
Fixes rdar://11314175: BuildSchedUnits assert.
llvm-svn: 155668
The X86 target is editing the selection DAG while isel is selecting
nodes following a topological ordering. When the DAG hacking triggers
CSE, nodes can be deleted and bad things happen.
llvm-svn: 155257
Now that multiple DAGUpdateListeners can be active at the same time,
ISelPosition can become a local variable in DoInstructionSelection.
We simply register an ISelUpdater with CurDAG while ISelPosition exists.
llvm-svn: 155249
Instead of passing listener pointers to RAUW, let SelectionDAG itself
keep a linked list of interested listeners.
This makes it possible to have multiple listeners active at once, like
RAUWUpdateListener was already doing. It also makes it possible to
register listeners up the call stack without controlling all RAUW calls
below.
DAGUpdateListener uses an RAII pattern to add itself to the SelectionDAG
list of active listeners.
llvm-svn: 155248
transformation:
(X op C1) ^ C2 --> (X op C1) & ~C2 iff (C1&C2) == C2
should be done.
This change has been tested:
Using a debug+asserts build:
on the specific test case that brought this bug to light
make check-all
lnt nt
using this clang to build a release version of clang
Using the release+asserts clang-with-clang build:
on the specific test case that brought this bug to light
make check-all
lnt nt
Checking in because Evan wants it checked in. Test case forthcoming after
scrubbing.
llvm-svn: 154955
Fix a dagcombine optimization which assumes that the vsetcc result type is always
of the same size as the compared values. This is ture for SSE/AVX/NEON but not
for all targets.
llvm-svn: 154490
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.
PR12419
rdar://9770785
rdar://11195178
llvm-svn: 154370
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly. There is already an IR level transform that does
that, and it does it more carefully.
llvm-svn: 154296
in TargetLowering. There was already a FIXME about this location being
odd. The interface is simplified as a consequence. This will also make
it easier to change TLS models when compiling with PIE.
llvm-svn: 154292
shuffle node because it could introduce new shuffle nodes that were not
supported efficiently by the target.
2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
second shuffle reverses the transformation of the first shuffle.
llvm-svn: 154266
reciprocal if converting to the reciprocal is exact. Do it even if inexact
if -ffast-math. This substantially speeds up ac.f90 from the polyhedron
benchmarks.
llvm-svn: 154265
LSR always tries to make the ICmp in the loop latch use the incremented
induction variable. This allows the induction variable to be kept in a
single register.
When the induction variable limit is equal to the stride,
SimplifySetCC() would break LSR's hard work by transforming:
(icmp (add iv, stride), stride) --> (cmp iv, 0)
This forced us to use lea for the IC update, preventing the simpler
incl+cmp.
<rdar://problem/7643606>
<rdar://problem/11184260>
llvm-svn: 154119
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
When folding X == X we need to check getBooleanContents() to determine if the
result is a vector of ones or a vector of negative ones.
I tried creating a test case, but the problem seems to only be exposed on a
much older version of clang (around r144500).
rdar://10923049
llvm-svn: 153966
Do not try to optimize swizzles of shuffles if the source shuffle has more than
a single user, except when the source shuffle is also a swizzle.
llvm-svn: 153864
This is the CodeGen equivalent of r153747. I tested that there is not noticeable
performance difference with any combination of -O0/-O2 /-g when compiling
gcc as a single compilation unit.
llvm-svn: 153817
here but it has no other uses, then we have a problem. E.g.,
int foo (const int *x) {
char a[*x];
return 0;
}
If we assign 'a' a vreg and fast isel later on has to use the selection
DAG isel, it will want to copy the value to the vreg. However, there are
no uses, which goes counter to what selection DAG isel expects.
<rdar://problem/11134152>
llvm-svn: 153705
execution-time regression for nsieve-bits on the ARMv7 -O0 -g nightly tester.
This may also improve compile-time on architectures that would otherwise
generate a libcall for urem (e.g., ARM) or fall back to the DAG selector.
rdar://10810716
llvm-svn: 153230
Type legalization can zero-extend the elements of the build_vector node, so,
for example, we may have an <8 x i8> with i32 elements of value 255. That
should return 'true' for the vector being all ones.
llvm-svn: 153203
a variable. The previous code would break the debug info changing
code invariant. This will regress debug info for arguments where
we elide the alloca created.
Fixes rdar://11066468
llvm-svn: 153074
It caused MSP430DAGToDAGISel::SelectIndexedBinOp() to be miscompiled.
When two ReplaceUses()'s are expanded as inline, vtable in base class is stored to latter (ISelUpdater)ISU.
llvm-svn: 152877
(i16 load $addr+c*sizeof(i16)) and replace uses of (i32 vextract) with the
i16 load. It should issue an extload instead: (i32 extload $addr+c*sizeof(i16)).
rdar://11035895
llvm-svn: 152675
Renamed methods caseBegin, caseEnd and caseDefault with case_begin, case_end, and case_default.
Added some notes relative to case iterators.
llvm-svn: 152532
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120130/136146.html
Implemented CaseIterator and it solves almost all described issues: we don't need to mix operand/case/successor indexing anymore. Base iterator class is implemented as a template since it may be initialized either from "const SwitchInst*" or from "SwitchInst*".
ConstCaseIt is just a read-only iterator.
CaseIt is read-write iterator; it allows to change case successor and case value.
Usage of iterator allows totally remove resolveXXXX methods. All indexing convertions done automatically inside the iterator's getters.
Main way of iterator usage looks like this:
SwitchInst *SI = ... // intialize it somehow
for (SwitchInst::CaseIt i = SI->caseBegin(), e = SI->caseEnd(); i != e; ++i) {
BasicBlock *BB = i.getCaseSuccessor();
ConstantInt *V = i.getCaseValue();
// Do something.
}
If you want to convert case number to TerminatorInst successor index, just use getSuccessorIndex iterator's method.
If you want initialize iterator from TerminatorInst successor index, use CaseIt::fromSuccessorIndex(...) method.
There are also related changes in llvm-clients: klee and clang.
llvm-svn: 152297
ScheduleDAG is responsible for the DAG: SUnits and SDeps. It provides target hooks for latency computation.
ScheduleDAGInstrs extends ScheduleDAG and defines the current scheduling region in terms of MachineInstr iterators. It has access to the target's scheduling itinerary data. ScheduleDAGInstrs provides the logic for building the ScheduleDAG for the sequence of MachineInstrs in the current region. Target's can implement highly custom schedulers by extending this class.
ScheduleDAGPostRATDList provides the driver and diagnostics for current postRA scheduling. It maintains a current Sequence of scheduled machine instructions and logic for splicing them into the block. During scheduling, it uses the ScheduleHazardRecognizer provided by the target.
Specific changes:
- Removed driver code from ScheduleDAG. clearDAG is the only interface needed.
- Added enterRegion/exitRegion hooks to ScheduleDAGInstrs to delimit the scope of each scheduling region and associated DAG. They should be used to setup and cleanup any region-specific state in addition to the DAG itself. This is necessary because we reuse the same ScheduleDAG object for the entire function. The target may extend these hooks to do things at regions boundaries, like bundle terminators. The hooks are called even if we decide not to schedule the region. So all instructions in a block are "covered" by these calls.
- Added ScheduleDAGInstrs::begin()/end() public API.
- Moved Sequence into the driver layer, which is specific to the scheduling algorithm.
llvm-svn: 152208
To avoid problems with zero shifts when getting the bits that move between words
we use a trick: first shift the by amount-1, then do another shift by one. When
amount is 0 (and size 32) we first shift by 31, then by one, instead of by 32.
Also fix a latent bug that emitted the low and high words in the wrong order
when shifting right.
Fixes PR12113.
llvm-svn: 151637
When the GEP index is a vector of pointers, the code that calculated the size
of the element started from the vector type, and not the contained pointer type.
As a result, instead of looking at the data element pointed by the vector, this
code used the size of the vector. This works for 32bit members (on 32bit
systems), but not for other types. Added code to peel the vector type and
added a test.
llvm-svn: 151626
the processor keeps a return addresses stack (RAS) which stores the address
and the instruction execution state of the instruction after a function-call
type branch instruction.
Calling a "noreturn" function with normal call instructions (e.g. bl) can
corrupt RAS and causes 100% return misprediction so LLVM should use a
unconditional branch instead. i.e.
mov lr, pc
b _foo
The "mov lr, pc" is issued in order to get proper backtrace.
rdar://8979299
llvm-svn: 151623
variable declaration as an argument because we want that address
anyhow for our debug information.
This seems to fix rdar://9965111, at least we have more debug
information than before and from reading the assembly it appears
to be the correct location.
llvm-svn: 151335
that are greater than the vector element type. For example BUILD_VECTOR
of type <1 x i1> with a constant i8 operand.
This patch fixes the assertion.
llvm-svn: 150477