Commit Graph

11872 Commits

Author SHA1 Message Date
Craig Topper 38ffa2bb96 [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
For remainder:
If (1 << (BitWidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (BitWidth / 2) urem. If (BitWidth / 2) is a legal integer
type, this urem will be expanded by DAGCombiner using a multiply by a magic
constant. We do have to take into account that adding the high and low halves
together can produce a carry, making the sum a (BitWidth / 2) + 1 bit number,
so we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's Delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc does
for urem as well as srem, udiv, and sdiv that I plan to add in
future patches.
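A minimal, self-contained sketch of both tricks (illustrative only, not the LegalizeTypes code) for a 64-bit urem/udiv by 3, where (1 << 32) % 3 == 1 and 0xAAAAAAAAAAAAAAAB is the multiplicative inverse of 3 modulo 2^64:

```cpp
#include <cstdint>

// Remainder: X = Hi * 2^32 + Lo, and X % 3 == (Hi + Lo) % 3 because
// 2^32 % 3 == 1. The 32-bit add may carry; that carry has weight 2^32,
// which is again congruent to 1, so it is added back in.
uint32_t urem3(uint64_t X) {
  uint32_t Lo = uint32_t(X), Hi = uint32_t(X >> 32);
  uint32_t Sum = Lo + Hi;
  uint32_t Carry = Sum < Lo; // 1 if the addition wrapped
  return (Sum + Carry) % 3;  // 32-bit urem, expandable via a magic multiply
}

// Division: subtract the remainder, then multiply by the multiplicative
// inverse of the divisor modulo 2^64.
uint64_t udiv3(uint64_t X) {
  return (X - urem3(X)) * 0xAAAAAAAAAAAAAAABull;
}
```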

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D130862
2022-09-12 10:34:52 -07:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove the new-pass-manager version of ExpandLargeDivRem because there is
no way yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
Joe Loser 5e96cea1db [llvm] Use std::size instead of llvm::array_lengthof
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.

Change call sites to use `std::size` instead.
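A hypothetical before/after illustrating the replacement pattern (not a specific call site from the patch):

```cpp
#include <cstddef>
#include <iterator>

static const int Primes[] = {2, 3, 5, 7, 11};

// Before: const size_t NumPrimes = llvm::array_lengthof(Primes);
// After (C++17): std::size works directly on C-style arrays.
constexpr std::size_t NumPrimes = std::size(Primes);
```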

Differential Revision: https://reviews.llvm.org/D133429
2022-09-08 09:01:53 -06:00
David Spickett e428baf001 [LLVM][ARM] Remove options for armv2, 2A, 3 and 3M
Fixes #57486

These pre-v4 architectures are not specifically supported
by codegen, as demonstrated in the linked issue.

GCC has not supported 3M since GCC 9, and presumably dropped
2 and 2A even earlier than that, so we are aligned in that sense.

(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2abd6e34fcf3bd9f9ffafcaa47cdc3ed443f9add)

This removes the options and associated testing.

The Pre_v4 build attribute remains mainly because its absence
would be more confusing. It will not be used other than to
complete the list of build attributes as shown in the ABI.

https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#3352the-target-related-attributes

Reviewed By: nickdesaulniers, peter.smith, rengolin

Differential Revision: https://reviews.llvm.org/D133109
2022-09-08 09:49:48 +00:00
Marco Elver 0ba8886af5 [FastISel] Propagate PCSections metadata to MachineInstr
Propagate PC sections metadata to MachineInstr when FastISel is doing
instruction selection.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D130884
2022-09-07 11:36:01 +02:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem pass to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion).
The X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.
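A hedged sketch of what a backend override of the hook might look like (the hook is named maxLegalDivRemBitWidth in TargetTransformInfo per the rename commit above; the exact signature and class are assumptions for illustration):

```cpp
// Assumed signature; the real declaration lives in the target's
// TargetTransformInfo implementation.
unsigned AArch64TTIImpl::maxLegalDivRemBitWidth() const {
  // div/rem wider than 128 bits are expanded by the ExpandLargeDivRem pass.
  return 128;
}
```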

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
John Brawn e26cadcc32 [ARM] Constant pools need 4-byte alignment if we only have tADR
When the only ADR instruction we have is the 16-bit Thumb one, all
constant pool entries need to be 4-byte aligned, as tADR has an offset
that's a multiple of 4.

It looks like previously there happened to be no situations in which
we encountered a constant pool entry with alignment less than 4, so
failing to do this didn't cause any problems, but the expansion of
cttz to a table added by D128911 does use a constant pool with
alignment 1, so we now need to handle it correctly.

Differential Revision: https://reviews.llvm.org/D133199
2022-09-06 11:36:12 +01:00
Vitaly Buka 6c52736e02 Revert "[llvm] Use range-based for loops (NFC)"
A range-based loop should not be used here, as
fixupImmediateBr push_backs into the container, invalidating the
iteration (see the sketch below).

http://lab.llvm.org/buildbot/#/builders/168
http://lab.llvm.org/buildbot/#/builders/74
http://lab.llvm.org/buildbot/#/builders/5
http://lab.llvm.org/buildbot/#/builders/239
http://lab.llvm.org/buildbot/#/builders/237
http://lab.llvm.org/buildbot/#/builders/236

This reverts commit fedc59734a.
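A generic sketch of the hazard (not the ARM code itself): a range-based for over a vector-like container is unsafe when the loop body may push_back, because growth can invalidate the iterators the loop holds.

```cpp
#include <cstddef>
#include <vector>

void drain(std::vector<int> &Worklist) {
  // Unsafe: push_back may reallocate and invalidate the hidden iterators.
  // for (int V : Worklist)
  //   if (V > 0) Worklist.push_back(V - 1);

  // Safe: an index-based loop re-reads size() every iteration.
  for (std::size_t I = 0; I != Worklist.size(); ++I)
    if (Worklist[I] > 0)
      Worklist.push_back(Worklist[I] - 1);
}
```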
2022-09-04 15:28:53 -07:00
Kazu Hirata fedc59734a [llvm] Use range-based for loops (NFC) 2022-09-03 11:17:40 -07:00
Sam Clegg 92920c4fe3 [MC][WebAssembly] Allow accurate errors in doBeforeLabelEmit
Although we currently have only one error produced in this function, I am
working on changes right now that add some more.  This change makes the
error location more accurate.

Differential Revision: https://reviews.llvm.org/D133016
2022-09-01 01:26:33 -07:00
Kazu Hirata 2833760c57 [Target] Qualify auto in range-based for loops (NFC) 2022-08-28 17:35:09 -07:00
Kazu Hirata c63f823875 [llvm] Use range-based for loops (NFC) 2022-08-28 17:35:04 -07:00
Alex Richardson df00dac828 [ARM] Use getSymbolPreferLocal() in GetARMGVSymbol
This allows relaxing some relocations to symbol+offset instead of emitting
a relocation against a symbol.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131433
2022-08-26 09:34:06 +00:00
Simon Pilgrim f9de13232f [X86] Promote i8/i16 CTTZ (BSF) instructions and remove speculation branch
This patch adds a Type operand to the TLI isCheapToSpeculateCttz/isCheapToSpeculateCtlz callbacks, allowing targets to decide whether branches should occur on a type-by-type/legality basis.

For X86, this patch proposes to allow CTTZ speculation for i8/i16 types that will lower to promoted i32 BSF instructions by masking the operand above the msb (we already do something similar for i8/i16 TZCNT). This required a minor tweak to CTTZ lowering - if the src operand is known never zero (i.e. due to the promotion masking) we can remove the CMOV zero src handling.

Although BSF isn't very fast, most CPUs from the last 20 years don't do that bad a job with it, though there are some annoying pass-through EFLAGS dependencies. Additionally, now that we emit 'REP BSF' in most cases, we are tending towards assuming this will most likely be executed as a TZCNT instruction on any semi-modern CPU.
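An illustration of the promotion idea described above (a sketch of the technique, not the actual DAG lowering): setting the bit just above the i16 MSB makes the widened value non-zero, so a 32-bit BSF/TZCNT is safe and a zero input correctly yields 16.

```cpp
#include <cstdint>

unsigned cttz_i16_via_i32(uint16_t X) {
  // Bit 16 guarantees the 32-bit input is non-zero, so cttz is well defined,
  // and for X == 0 the answer is 16, the correct i16 cttz result.
  uint32_t Widened = uint32_t(X) | 0x10000u;
  return __builtin_ctz(Widened); // lowers to BSF/TZCNT on x86
}
```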

Differential Revision: https://reviews.llvm.org/D132520
2022-08-24 17:28:18 +01:00
Philip Reames df20ff9ae2 [TTI] Kill last couple uses of OperandValueKind in targets [nfc]
Use the accessor methods on the containing class instead so that we can change the representation.
2022-08-23 08:54:41 -07:00
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Philip Reames 104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.

This is the change which motivated the whole sequence which preceded it.  In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact.  This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance.  For instance, every parameter which changes type in this change also changes name.  This was intentional to make sure that every call site possibly affected must show up in the diff.  This let me audit each one closely.
2022-08-22 15:16:39 -07:00
Alan Zhao 8c8cfaaf0a Revert "[ARM] Use getSymbolPreferLocal() in GetARMGVSymbol"
This reverts commit 6db15a82cc.

Reverted because this breaks official Chrome builds targeting Android on
arm: https://crbug.com/1354305

Repro: https://drive.google.com/file/d/1pgQI2adwx3DJJqIYvMY4i249ouHU0rmu/view?usp=sharing
2022-08-22 16:16:37 -04:00
David Penry ced705c440 [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reapplication of D128941/(reversion:D132037) with small fix.

Differential Revision: https://reviews.llvm.org/D132170
2022-08-22 12:10:13 -07:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata ec5eab7e87 Use range-based for loops (NFC) 2022-08-20 21:18:32 -07:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TargetLowering had the last two InstructionCost-related members,
`getTypeLegalizationCost()` and `getScalingFactorCost()`, but all other costs
are processed in TTI.

E.g. it is not convenient to use other TTI members in these two functions
when they are overridden in a target.

Minor refactoring: `getTypeLegalizationCost()` no longer needs a DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
David Penry 1c9f0408bc Revert "[ModuloSchedule] Add interface call to accept/reject SMS schedules"
This reverts commit 8c4aea438c.

Needed because buildbot failures (warnings) gave a clue that there was
a functional bug in the ARM rejection logic.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132037
2022-08-17 09:32:43 -07:00
David Penry 8c4aea438c [ModuloSchedule] Add interface call to accept/reject SMS schedules
This interface allows a target to reject a proposed
SMS schedule.  For Hexagon/PowerPC, all schedules
are accepted, leaving behavior unchanged.  For ARM,
schedules which exceed register pressure limits are
rejected.

Also, two RegisterPressureTracker methods now need to be public so
that register pressure can be computed by more callers.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128941
2022-08-17 08:13:26 -07:00
Victor Campos 784da8a722 [ARM] Simplify the creation of escaped build attribute values
There is an existing mechanism to escape strings; therefore, the
functions created to escape Tag_also_compatible_with values are not
really needed. We can simply use the pre-existing utilities.

Reviewed By: pratlucas

Differential Revision: https://reviews.llvm.org/D131680
2022-08-16 11:49:33 +01:00
Kazu Hirata 6d9cd9199a Use llvm::all_of (NFC) 2022-08-14 16:25:36 -07:00
Kazu Hirata 109df7f9a4 [llvm] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-13 12:55:42 -07:00
Pengxuan Zheng 9bb6622423 [ARM] Do not use LOAD_STACK_GUARD with ROPI/RWPI
ROPI/RWPI are not supported with LOAD_STACK_GUARD currently.

Reviewed By: nickdesaulniers, rengolin

Differential Revision: https://reviews.llvm.org/D131427
2022-08-09 14:59:08 -07:00
Alex Richardson 6db15a82cc [ARM] Use getSymbolPreferLocal() in GetARMGVSymbol
This allows relaxing some relocations to STT_SECTION symbol+offset
instead of emitting a relocation against a symbol.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131433
2022-08-09 09:53:47 +00:00
Alex Richardson 9a2b14afa0 [ARM] Emit local aliases (.Lfoo$local) for functions
ARMAsmPrinter::emitFunctionEntryLabel() was not calling the base class
function so the $local alias was not being emitted. This should not have
any functional effect right now since ARM does not generate different code
for the $local symbols, but it could be improved in the future.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131392
2022-08-09 09:53:47 +00:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
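A generic before/after illustration of the substitution (not a specific call site from the patch):

```cpp
int classify(int Kind) {
  int Score = 0;
  switch (Kind) {
  case 0:
    Score += 1;
    [[fallthrough]]; // was: LLVM_FALLTHROUGH;
  case 1:
    Score += 1;
    break;
  default:
    break;
  }
  return Score;
}
```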
2022-08-08 11:24:15 -07:00
Simon Tatham 72017e9b16 [llvm-objdump,ARM] Fix big-endian AArch32 disassembly.
The ABI for big-endian AArch32, as specified by AAELF32, is above-
averagely complicated. Relocatable object files are expected to store
instruction encodings in byte order matching the ELF file's endianness
(so, big-endian for a BE ELF file). But executable images can
//either// do that //or// store instructions little-endian regardless
of data and ELF endianness (to support BE32 and BE8 platforms
respectively). They signal the latter by setting the EF_ARM_BE8 flag
in the ELF header.

(In the case of the Thumb instruction set, this all means that each
16-bit halfword of a Thumb instruction is stored in one or other
endianness. The two halfwords of a 32-bit Thumb instruction must
appear in the same order no matter what, because the first halfword is
the one that must avoid overlapping the encoding of any 16-bit Thumb
instruction.)

llvm-objdump was unconditionally expecting Arm instructions to be
stored little-endian. So it would correctly disassemble a BE8 image,
but if you gave it a BE32 image or a BE object file, it would retrieve
every instruction in byte-swapped form and disassemble it to
nonsense. (Even an object file output by LLVM itself, because
ARMMCCodeEmitter outputs instructions big-endian in big-endian mode,
which is correct for writing an object file.)

This patch allows llvm-objdump to correctly disassemble all three of
those classes of Arm ELF file. It does it by introducing a new
SubtargetFeature for big-endian instructions, setting it from the ELF
image type and flags during llvm-objdump setup, and teaching both
ARMDisassembler and llvm-objdump itself to pay attention to it when
retrieving instruction data from a section being disassembled.
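A simplified sketch of the rule described above (an assumption-laden illustration, not the actual ARMDisassembler/llvm-objdump code): instruction words are fetched big-endian only for a big-endian ELF that does not set EF_ARM_BE8.

```cpp
#include <cstdint>

uint32_t readArmInsnWord(const uint8_t *Bytes, bool ElfIsBigEndian,
                         bool HasBE8Flag) {
  // BE32 objects/images keep instructions in ELF (big-endian) byte order;
  // BE8 images and little-endian files store instructions little-endian.
  bool InsnBigEndian = ElfIsBigEndian && !HasBE8Flag;
  if (InsnBigEndian)
    return (uint32_t(Bytes[0]) << 24) | (uint32_t(Bytes[1]) << 16) |
           (uint32_t(Bytes[2]) << 8) | Bytes[3];
  return (uint32_t(Bytes[3]) << 24) | (uint32_t(Bytes[2]) << 16) |
         (uint32_t(Bytes[1]) << 8) | Bytes[0];
}
```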

Differential Revision: https://reviews.llvm.org/D130902
2022-08-08 10:49:51 +01:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Dawid Jurczak 1bd31a6898 [NFC] Add SmallVector constructor to allow creation of SmallVector<T> from ArrayRef of items convertible to type T
Extracted from https://reviews.llvm.org/D129781 and addresses the comment:
https://reviews.llvm.org/D129781#3655571
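A hedged illustration of the intent (the element types here are arbitrary examples): with the new constructor, a SmallVector<T> can be built directly from an ArrayRef whose elements convert to T.

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>

llvm::SmallVector<uint64_t> widen(llvm::ArrayRef<uint32_t> Values) {
  // uint32_t is convertible to uint64_t, so no explicit copy loop is needed.
  return llvm::SmallVector<uint64_t>(Values);
}
```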

Differential Revision: https://reviews.llvm.org/D130268
2022-08-05 13:35:41 +02:00
Mingming Liu bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
David Sherwood 4ef9cb6c17 [AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.

Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.

Added an extra test here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D128342
2022-08-02 09:52:33 +01:00
jacquesguan e38af7ba95 [LV] Refactor getExtendedAddReductionCost to support other extended reduction more than Add.
Currently the API getExtendedAddReductionCost is used to determine the cost of an extended Add reduction with an optional Mul. For Arm, it covers the relevant cases, but other targets, for example RISCV, support other kinds of extended reduction, such as FAdd.

This patch makes the following changes:
1. Split getExtendedAddReductionCost into 2 new APIs: getExtendedReductionCost, which handles the extended reduction with an additional Opcode input, and getMulAccReductionCost, which handles the MLA cases previously covered by getExtendedAddReductionCost.
2. Refactor getReductionPatternCost and add some constraints to make sure getMulAccReductionCost only handles the reduction of Add + Mul.

Differential Revision: https://reviews.llvm.org/D130868
2022-08-02 16:02:38 +08:00
Lucas Prates ba9caf9170 [Arm] Fix parsing and emission of Tag_also_compatible_with eabi attribute
According to the ABI for the Arm Architecture, the value for the
Tag_also_compatible_with eabi attribute is represented by an NTBS entry.
This string value, in turn, is composed of a pair of tag+value encoded
in one of two formats:
- ULEB128: tag, ULEB128: value, 0.
- ULEB128: tag, NTBS: data.
(See section 3.3.7.3 ("Secondary compatibility tag") of the Addenda to, and Errata in, the ABI for the Arm Architecture: 60a8eb8c55/addenda32/addenda32.rst#3373secondary-compatibility-tag.)

Currently the Arm assembly parser and streamer ignore the encoding of
the attribute's NTBS value, which can result in incorrect attributes
being emitted in both assembly and object file outputs.

This patch fixes these issues by properly handling the value's encoding.
An update to llvm-readobj to properly handle the attribute's value will be
covered by a separate patch.

Patch by Victor Campos and Lucas Prates.

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D129500
2022-08-01 13:28:01 +01:00
Nikita Popov a21c245307 [ARMParallelDSP] Remove unnecessary ModRef intersection (NFC)
Intersecting with ModRef is a no-op, as these are the only two
possible values.
2022-08-01 08:34:58 +02:00
David Green 39f8384964 [ARM] Correct features on pacbti instructions.
Given a patch like D129506, using instructions not valid for the current
feature set becomes an error. This updates the Arm hint-space
instructions for pac/bti to require thumbv7m as opposed to 8.1-m.main, to
make them valid when compiling for thumbv7m with -mbranch-protection.

Differential Revision: https://reviews.llvm.org/D129692
2022-07-27 09:15:14 +01:00
Nikita Popov b1b1086973 [ARM] Add target feature to force 32-bit atomics
This adds a +atomic-32 target feature, which instructs LLVM to assume
that lock-free 32-bit atomics are available for this target, even
if they usually wouldn't be.

If only atomic loads/stores are used, then this won't emit libcalls.
If atomic CAS is used, then the user is responsible for providing
any necessary __sync implementations (e.g. by masking interrupts
for single-core privileged use cases).

See https://reviews.llvm.org/D120026#3674333 for context on this
change. The tl;dr is that the thumbv6m target in Rust has
historically made only atomic load/store available, which is
incompatible with the change from D120026, which switched these to
use libatomic.

Differential Revision: https://reviews.llvm.org/D130480
2022-07-27 10:00:31 +02:00
Simon Tatham 55f1fbf005 [MC,llvm-objdump,ARM] Target-dependent disassembly resync policy.
Currently, when llvm-objdump is disassembling a code section and
encounters a point where no instruction can be decoded, it uses the
same policy on all targets: consume one byte of the section, emit it
as "<unknown>", and try disassembling from the next byte position.

On an architecture where instructions are always 4 bytes long and
4-byte aligned, this makes no sense at all. If a 4-byte word cannot be
decoded as an instruction, then the next place that a valid
instruction could //possibly// be found is 4 bytes further on.
Disassembling from a misaligned address can't possibly produce
anything that the code generator intended, or that the CPU would even
attempt to execute.

This patch introduces a new MCDisassembler virtual method called
`suggestBytesToSkip`, which allows each target to choose its own
resynchronization policy. For Arm (as opposed to Thumb) and AArch64,
I've filled in the new method to return a fixed width of 4.
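A hedged sketch of a fixed-width target's override (the signature is inferred from the description above and may not match the source exactly):

```cpp
// Skipping anything other than a whole word can never resynchronize onto a
// valid instruction start on a 4-byte-aligned, fixed-width architecture.
uint64_t AArch64Disassembler::suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
                                                 uint64_t Address) const {
  return 4;
}
```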

Thumb is a more interesting case, because the criterion for
identifying 2-byte and 4-byte instruction encodings is very simple,
and doesn't require the particular instruction to be recognized. So
`suggestBytesToSkip` is also passed an ArrayRef of the bytes in
question, so that it can take that into account. The new test case
shows Thumb disassembly skipping over two unrecognized instructions,
and identifying one as 2-byte and one as 4-byte.

For targets other than Arm and AArch64, this is NFC: the base class
implementation of `suggestBytesToSkip` still returns 1, so that the
existing behavior is unchanged. Other targets can fill in their own
implementations as they see fit; I haven't attempted to choose a new
behavior for each one myself.

I've updated all the call sites of `MCDisassembler::getInstruction` in
llvm-objdump, and also one in sancov, which was the only other place I
spotted the same idiom of `if (Size == 0) Size = 1` after a call to
`getInstruction`.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D130357
2022-07-26 09:35:30 +01:00
David Sherwood f15b6b2907 [AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
2022-07-21 17:20:06 +01:00
David Green 4704da1374 [ARM] Fix Thumb2 compare being emitted in ExpandCMP_SWAP
Given a patch like D129506, using instructions not valid for the current
target feature set becomes an error. This fixes an issue in
ARMExpandPseudo::ExpandCMP_SWAP where Thumb2 compares were used in
Thumb1Only code, such as thumbv8m.baseline targets.

Differential Revision: https://reviews.llvm.org/D129695
2022-07-20 12:04:22 +01:00
David Green 6cb9529001 [ARM] Remove VBICimm if no cleared bits are demanded
If none of the bits of a VBICimm are demanded, we can remove the node
entirely using the input operand instead.

Differential Revision: https://reviews.llvm.org/D129966
2022-07-19 11:53:47 +01:00
Simon Pilgrim 0f6b0461b0 [DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits
The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits.

This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115
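A small sanity check of the underlying identity (illustrative only): for a logical shift right, xoring with the shifted all-bits mask is the same as shifting the complement; the relaxation lets the equality hold on only the demanded bits.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const unsigned ShiftC = 5;
  const uint32_t XorC = 0xFFFFFFFFu >> ShiftC; // shifted all-bits mask
  for (uint32_t X : {0u, 1u, 0xDEADBEEFu, 0xFFFFFFFFu})
    assert(((X >> ShiftC) ^ XorC) == (~X >> ShiftC));
  return 0;
}
```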

Alive2: https://alive2.llvm.org/ce/z/fl7T7K

Differential Revision: https://reviews.llvm.org/D129933
2022-07-19 10:59:07 +01:00
Matt Arsenault 8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for whether a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Simon Pilgrim 259c36e7c1 [DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure it's being called from a shift. NFC. 2022-07-18 13:11:24 +01:00