HVC::calculatePointerDifference inserts temporary instructions for
simplification and calculation of known bits. These instructions were
inserted at the end of a basic block (after the terminator), which
caused BB->getTerminator() to return nullptr. This, in turn, caused
a crash when a PHI instruction was examined in computeKnownBits.
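The failure mode can be illustrated with a toy model. The sketch below makes no claim about HVC's actual code: `Block`, `Inst`, `append`, and `insertBeforeTerminator` are hypothetical names. Its `getTerminator` mirrors LLVM's `BasicBlock::getTerminator()`, which returns nullptr when the last instruction is not a terminator, so appending a temporary after the branch makes the terminator "disappear", while inserting before it does not.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR: just enough to show why insertion position matters.
struct Inst {
  std::string Name;
  bool IsTerminator;
};

struct Block {
  std::vector<Inst> Insts;

  // Like BasicBlock::getTerminator(): only the last instruction counts,
  // and a block whose last instruction is not a terminator yields nullptr.
  const Inst *getTerminator() const {
    if (Insts.empty() || !Insts.back().IsTerminator)
      return nullptr;
    return &Insts.back();
  }

  // Problematic placement: the temporary lands after the terminator.
  void append(Inst I) { Insts.push_back(std::move(I)); }

  // Safe placement: the temporary lands right before the terminator (if any).
  void insertBeforeTerminator(Inst I) {
    auto Pos = Insts.end();
    if (getTerminator())
      --Pos;
    Insts.insert(Pos, std::move(I));
  }
};
```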
The carry bit from an intermediate addition was not properly propagated.
For example, mulhs(7fffffff, 7fffffff) was evaluated as 3ffeffff, while
the correct result is 3fffffff.
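To see where such a carry comes from, the sketch below computes the high word of a signed 32x32-bit product from 16-bit halves. This is an illustrative decomposition, not the actual HVX lowering, and `mulhs32` is a hypothetical name. The `carry` term collects the overflow out of bits 16..31 of the partial-product sum; leaving it out produces a wrong high word for inputs such as 7fffffff.

```cpp
#include <cstdint>

// Hypothetical: high 32 bits of the signed 64-bit product a*b, built from
// 16-bit halves so that no intermediate needs more than 32 bits.
int32_t mulhs32(int32_t a, int32_t b) {
  int32_t ah = a >> 16;                              // signed high halves
  int32_t bh = b >> 16;
  uint32_t al = static_cast<uint32_t>(a) & 0xffffu;  // unsigned low halves
  uint32_t bl = static_cast<uint32_t>(b) & 0xffffu;

  uint32_t lo = al * bl;                             // < 2^32, cannot overflow
  int32_t t1 = ah * static_cast<int32_t>(bl);        // cross terms fit in int32
  int32_t t2 = static_cast<int32_t>(al) * bh;

  // Everything that lands in bits 16..31 of the 64-bit product.
  uint32_t mid = (lo >> 16) + (static_cast<uint32_t>(t1) & 0xffffu) +
                 (static_cast<uint32_t>(t2) & 0xffffu);
  int32_t carry = static_cast<int32_t>(mid >> 16);   // 0..2, must be propagated

  return ah * bh + (t1 >> 16) + (t2 >> 16) + carry;
}
```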
This should fix
```
Pass modifies its input and doesn't report it: Hexagon Vector Combine
Pass modifies its input and doesn't report it UNREACHABLE executed at
[...hecks-debian/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1436!
```
HexagonSubtarget::isTypeForHVX would stop breaking the type up when it
reached 64 bits in width. HVX vector predicates can be shorter than that:
for example, <32 x i1> has a bitwidth of 32, and it is still a valid
HVX type.
HVX v62+ has bidirectional shifts, which do not mask the shift amount to
the bit width. Instead, the shift amount is sign-extended from bit log2(BW)
(making it a signed value in [-BW, BW-1]), and a negative value causes a
shift in the other direction. For a shift amount of -BW, this reversed
shift will shift all bits out, inserting 0s or sign bits depending on the
type and direction.
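A scalar model of one 16-bit lane may make these semantics concrete. It assumes the amount is sign-extended from bit log2(16) = 4 (a 5-bit signed field, -16..15) and that a negative left shift becomes an arithmetic right shift. `bidir_asl16` is a hypothetical name, not an intrinsic. Note that an amount of 16 is not masked to 0 but reads as -16, which shifts every bit out.

```cpp
#include <cstdint>

// Hypothetical model of one 16-bit lane of a v62+ bidirectional left shift,
// assuming the amount is sign-extended from bit log2(16) = 4.
int16_t bidir_asl16(int16_t x, int32_t amt) {
  const int BW = 16;
  int32_t s = amt & 0x1f;      // keep the low log2(BW)+1 = 5 bits
  if (s & 0x10)
    s -= 0x20;                 // sign-extend: s is now in [-16, 15]
  if (s >= 0)                  // ordinary left shift by 0..15
    return static_cast<int16_t>(
        static_cast<uint16_t>(static_cast<uint16_t>(x) << s));
  int32_t r = -s;              // reversed: arithmetic right shift by 1..16
  if (r >= BW)
    return x < 0 ? -1 : 0;     // every bit shifted out, only sign bits remain
  return static_cast<int16_t>(x >> r);
}
```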
HVX v60 only has splats that take a 32-bit word as input, while v62+
has splats that take an 8- or 16-bit value. This makes writing output
patterns that need to use a splat annoying, because the entire output
pattern needs to be replicated for each version of HVX.
To avoid this, the patterns will always use the pseudos, and then the
pseudos will be handled using a post-ISel hook.
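On v60, one way to emulate an 8- or 16-bit splat is to replicate the value across a 32-bit word and feed that word to the word splat. The helpers below sketch only that replication step; whether the post-ISel hook expands the pseudos exactly this way is an assumption, and the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers: replicate a narrow value into every lane of a
// 32-bit word, which a word-only splat can then broadcast.
uint32_t replicate_byte(uint8_t b) {
  return static_cast<uint32_t>(b) * 0x01010101u;  // b in all four bytes
}

uint32_t replicate_half(uint16_t h) {
  return static_cast<uint32_t>(h) * 0x00010001u;  // h in both halfwords
}
```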
V6_vzb and V6_vshuffeb can use any two resources in a packet, while
V6_vunpackub and V6_vpackeb both need a shift resource.
Also, add patterns for shifting vectors of i8.
There are intrinsics for most scalar instructions and almost all HVX
instructions. What's somewhat painful is that there are two intrinsics
for each HVX instruction: one for 64- and one for 128-byte mode.
Instead of having callers check the current codegen settings every time,
this function simply returns the right intrinsic for the current mode.
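A minimal sketch of the idea, using hypothetical enums (`HvxOp`, `Intrinsic`) in place of LLVM's generated intrinsic IDs. The enumerators are named after real intrinsic pairs such as `llvm.hexagon.V6.vaddw` and `llvm.hexagon.V6.vaddw.128B`; the variant is chosen from a single mode flag, so callers never inspect the subtarget themselves.

```cpp
// Hypothetical stand-ins for LLVM's generated Intrinsic::ID values.
enum class Intrinsic { V6_vaddw, V6_vaddw_128B, V6_vsubw, V6_vsubw_128B };
enum class HvxOp { AddW, SubW };

// Return the intrinsic variant matching the current HVX vector length.
Intrinsic getHvxIntrinsic(HvxOp Op, bool Use128B) {
  switch (Op) {
  case HvxOp::AddW:
    return Use128B ? Intrinsic::V6_vaddw_128B : Intrinsic::V6_vaddw;
  case HvxOp::SubW:
    return Use128B ? Intrinsic::V6_vsubw_128B : Intrinsic::V6_vsubw;
  }
  return Intrinsic::V6_vaddw;  // not reached for valid HvxOp values
}
```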
1. `length(value/type)`: return the number of elements in the vector
input,
2. `getHvxTy(elem_type)`: return the HVX vector type with the element
type provided.
These will help write things more succinctly.
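The relationship these helpers encode can be sketched with a hypothetical `VecTy` record instead of `llvm::Type`: an HVX vector holds HwLen bytes (64 or 128), so the element count for a given element type is HwLen * 8 / element bits. The signatures here are illustrative only; in particular, HwLen is passed explicitly rather than taken from the subtarget.

```cpp
// Hypothetical simplified vector type: element count and element width.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// Number of elements in a vector type.
unsigned length(VecTy T) { return T.NumElts; }

// HVX vector type for the given element width, for a vector of HwLenBytes.
VecTy getHvxTy(unsigned EltBits, unsigned HwLenBytes) {
  return VecTy{HwLenBytes * 8 / EltBits, EltBits};
}
```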
EVT can be created for any Type, and so this function can now be used to
check if a given Type, as-is, is an HVX type (as opposed to a type that may
be subject to legalization to an HVX type).
Resizing operations (e.g. sign extension) in DAG can go from any width
to any other width, e.g. i8 -> i32. If the input and the result differ
by a factor larger than 2, the operation cannot be legal in HVX, since
the only two legal vector sizes in HVX are a single vector and a pair
of vectors.
To simplify the legalization, such operations are expanded into steps
that only double/halve the type size, so that each such step can be fully
legalized on its own. The complication is that DAG will automatically
fold these steps back into one, e.g. sext(sext) -> sext. To prevent that,
new HexagonISD nodes are introduced: TL_EXTEND and TL_TRUNCATE. Once
legalized, these nodes are replaced with the original opcodes.
The type legalization is now common to aext/sext/zext/trunc and Hexagon-
specific ssat/usat nodes.
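The step decomposition can be sketched as a function that lists the intermediate element widths, where each step doubles or halves the previous one. This is a hypothetical helper (`resizeSteps`), and it assumes both widths are powers of two, as with the integer element types discussed here.

```cpp
#include <vector>

// Hypothetical: element widths visited when resizing From -> To in steps
// that only double or halve. Assumes From and To are powers of two.
std::vector<unsigned> resizeSteps(unsigned From, unsigned To) {
  std::vector<unsigned> Steps{From};
  unsigned W = From;
  while (W < To) {  // extension: i8 -> i16 -> i32 ...
    W *= 2;
    Steps.push_back(W);
  }
  while (W > To) {  // truncation: i32 -> i16 -> i8 ...
    W /= 2;
    Steps.push_back(W);
  }
  return Steps;
}
```

For example, an i8 -> i32 sign extension becomes two steps, 8 -> 16 and 16 -> 32, each of which changes the vector size by at most a factor of 2.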
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduces the amount of boilerplate code a bit. This
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
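For example (C++17), `std::size` from `<iterator>` yields the element count of a C-style array as a compile-time constant, which is what `llvm::array_lengthof` was used for. The array below is only an illustration.

```cpp
#include <cstddef>
#include <iterator>

// Any C-style array works; this one is only an example.
static const char *const Names[] = {"v60", "v62", "v65", "v66"};

// Previously: llvm::array_lengthof(Names). std::size is constexpr.
constexpr std::size_t NumNames = std::size(Names);
static_assert(NumNames == 4, "std::size counts array elements");
```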
Differential Revision: https://reviews.llvm.org/D133429
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.
For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.
Although BSF isn't very fast, most CPUs from the last 20 years handle it reasonably well, though there are some annoying passthrough EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we tend to assume it will most likely be executed as a TZCNT instruction on any semi-modern CPU.
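The promotion trick can be sketched in plain C++. `bsf32` and `cttz8` are hypothetical names; `bsf32` models BSF, whose result is undefined for a zero input, so it requires a nonzero argument. Setting bit 8 (the bit just above an i8's msb) before the 32-bit scan guarantees a nonzero input and makes a zero i8 naturally produce 8, so no separate zero check (the CMOV) is needed.

```cpp
#include <cstdint>

// Hypothetical model of BSF: index of the lowest set bit.
// Precondition: x != 0 (real BSF leaves the result undefined for 0).
unsigned bsf32(uint32_t x) {
  unsigned i = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++i;
  }
  return i;
}

// i8 cttz via a promoted 32-bit BSF: bit 8 is always set, so the input is
// never zero and cttz8(0) == 8 without any special-casing.
unsigned cttz8(uint8_t x) {
  return bsf32(static_cast<uint32_t>(x) | 0x100u);
}
```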
Differential Revision: https://reviews.llvm.org/D132520
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet. The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and removing the last (interface) use of OperandValueKind.
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.
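A rough sketch of such a container, with hypothetical enum values loosely modeled on the names in the text (this is not the actual TTI declaration): bundling the kind and the properties means a call site cannot forward one while silently dropping the other, and dropping the properties becomes an explicit, visible operation.

```cpp
// Hypothetical stand-ins for OperandValueKind / OperandValueProperties.
enum class OperandValueKind {
  AnyValue,
  UniformValue,
  UniformConstantValue,
  NonUniformConstantValue
};
enum class OperandValueProperties { None, PowerOf2, NegatedPowerOf2 };

// Kind and properties travel together through every cost query.
struct OperandValueInfo {
  OperandValueKind Kind = OperandValueKind::AnyValue;
  OperandValueProperties Properties = OperandValueProperties::None;

  bool isConstant() const {
    return Kind == OperandValueKind::UniformConstantValue ||
           Kind == OperandValueKind::NonUniformConstantValue;
  }
  bool isPowerOf2() const {
    return Properties == OperandValueProperties::PowerOf2;
  }
  // Explicitly drop the properties where a caller must not rely on them.
  OperandValueInfo getNoProps() const {
    return {Kind, OperandValueProperties::None};
  }
};
```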
This is the change which motivated the whole sequence which preceded it. In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact. This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.
I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance. For instance, every parameter which changes type in this change also changes name. This was intentional, to make sure that every possibly affected call site must show up in the diff. This let me audit each one closely.
Defaults to TCK_RecipThroughput, as most explicit calls were either assuming TCK_RecipThroughput (vectorizers) or just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet), but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.
Differential Revision: https://reviews.llvm.org/D132287
This avoids the deprecation warning:
```
warning: definition of implicit copy assignment operator for 'AddrInfo'
is deprecated because it has a user-declared copy constructor
[-Wdeprecated-copy]
```
This fixes https://github.com/llvm/llvm-project/issues/57229
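The usual way to silence this warning is to declare the copy assignment operator alongside the user-declared copy constructor. The members below are hypothetical; only the pair of defaulted special members matters.

```cpp
#include <cstdint>

struct AddrInfo {
  AddrInfo() = default;
  AddrInfo(const AddrInfo &) = default;             // user-declared copy ctor
  AddrInfo &operator=(const AddrInfo &) = default;  // declared explicitly, so
                                                    // -Wdeprecated-copy is quiet
  // Hypothetical members, for illustration only.
  unsigned Reg = 0;
  int64_t Offset = 0;
};
```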
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency: it's not implemented by any backend, and we should fold its functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
TargetLowering had two remaining InstructionCost-related members,
`getTypeLegalizationCost()` and `getScalingFactorCost()`, but all other costs
are processed in TTI. For example, it is inconvenient to use other TTI members
in these two functions when they are overridden in a target.
Minor refactoring: `getTypeLegalizationCost()` no longer needs a DataLayout
parameter, since it was always passed from TTI.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D117723
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes mark
arbitrary blocks "address-taken", then given a BlockAddress, we can't
actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
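A hypothetical model of the split (`IRBlock`, `MBBModel`, and `findBlockFor` are illustrative names, not LLVM's API): one field records the IR block whose BlockAddress refers to this machine block, and a separate flag records target-specific uses. A lookup by BlockAddress consults only the first, so blocks that targets mark as address-taken can no longer be confused with it.

```cpp
#include <vector>

struct IRBlock {
  int Id;  // stand-in for an IR-level basic block
};

struct MBBModel {
  const IRBlock *AddressTakenIRBlock = nullptr;  // set only for BlockAddress
  bool MachineBlockAddressTaken = false;         // set by target-specific passes

  bool hasAddressTaken() const {
    return AddressTakenIRBlock != nullptr || MachineBlockAddressTaken;
  }
};

// Find the machine block a BlockAddress of BB maps to; machine-level
// address-taken flags are deliberately ignored.
const MBBModel *findBlockFor(const std::vector<MBBModel> &Blocks,
                             const IRBlock *BB) {
  if (!BB)
    return nullptr;
  for (const MBBModel &MBB : Blocks)
    if (MBB.AddressTakenIRBlock == BB)
      return &MBB;
  return nullptr;
}
```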
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697