Summary:
These are output by clang -S, so can now be roundtripped thru clang.
(partially) fixes: https://bugs.llvm.org/show_bug.cgi?id=34544
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63901
llvm-svn: 364658
MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64 bit value separated into two 32 bit registers.
The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert it into the appropriate opcode in ARMISD. An SHL is converted into an lsll, an SRL is converted into an lsrl for the immediate form and a negation and lsll for the register form, and SRA is converted into an asrl.
test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions.
Differential Revision: https://reviews.llvm.org/D63430
llvm-svn: 364654
We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be.
Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all).
llvm-svn: 364644
This simply adds integer and floating point VMUL patterns for MVE, same as we
have add and sub.
Differential Revision: https://reviews.llvm.org/D63866
llvm-svn: 364643
This adds handling and tests for a number of floating point math routines,
which have no MVE instructions.
Differential Revision: https://reviews.llvm.org/D63725
llvm-svn: 364641
Delete unnecessary getters of AddressRange.
Simplify AddressRange::size(): Start <= End check should be checked in an upper layer.
Delete isContiguousWith() that doesn't make sense.
Simplify AddressRanges::insert. Delete commented code. Fix it when more than 1 ranges are to be deleted.
Delete trailing newline.
llvm-svn: 364637
MVE has instructions to widen as it loads, and narrow as it stores. This adds
the required patterns and legalisation to make them work including specifying
that they are legal, patterns to select them and test changes.
Patch by David Sherwood.
Differential Revision: https://reviews.llvm.org/D63839
llvm-svn: 364636
This fills in the gaps for basic MVE loads and stores, allowing unaligned
access and adding far too many tests. These will become important as
narrowing/expanding and pre/post inc are added. Big endian might still not be
handled very well, because we have not yet added bitcasts (and I'm not sure how
we want it to work yet). I've included the alignment code anyway which maps
with our current patterns. We plan to return to that later.
Code written by Simon Tatham, with additional tests from Me and Mikhail Maltsev.
Differential Revision: https://reviews.llvm.org/D63838
llvm-svn: 364633
We don't have vector operations for these, so they need to be expanded for both
integer and float.
Differential Revision: https://reviews.llvm.org/D63595
llvm-svn: 364631
The same as integer arithmetic, we can add simple floating point MVE addition and
subtraction patterns.
Initial code by David Sherwood
Differential Revision: https://reviews.llvm.org/D63257
llvm-svn: 364629
Introduce llvm.test.set.loop.iterations which sets the loop counter
and also produces an i1 after testing that the count is not zero.
Differential Revision: https://reviews.llvm.org/D63809
llvm-svn: 364628
This adds the first few patterns for MVE code generation, adding simple integer
add and sub patterns.
Initial code by David Sherwood
Differential Revision: https://reviews.llvm.org/D63255
llvm-svn: 364627
This patch adds necessary shuffle vector and buildvector support for ARM MVE.
It essentially adds support for VDUP, VREVs and some VMOVs, which are often
required by other code (like upcoming patches).
This mostly uses the same code from Neon that already generated
NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and
moved to ARMInstrInfo as they are common to both architectures. Most of the
selection code seems to be applicable to both, but NEON does have some more
instructions making some parts specific.
Most code originally by David Sherwood.
Differential Revision: https://reviews.llvm.org/D63567
llvm-svn: 364626
Summary: This patch changes fs::setPermissions to optionally set permissions while respecting the umask. It also adds the function fs::getUmask() which returns the current umask.
Reviewers: jhenderson, rupprecht, aprantl, lhames
Reviewed By: jhenderson, rupprecht
Subscribers: sanaanajjar231288, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63583
llvm-svn: 364621
The new switch lowering code that tries to generate jump tables and range checks
were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to
town on trying to generate optimized lowerings, e.g. multiple jump tables, range
checks etc. This exposed bugs in the way PHI nodes are handled because the CFG
looks even stranger after all of this is done.
llvm-svn: 364613
This patch intends to fix ASAN stack-use-after-scope error.
This is at least a short-term fix to unbreak LLVM's mainline.
Differential Revision: https://reviews.llvm.org/D63905
llvm-svn: 364611
This shaves an instruction (and a GOT entry in PIC code) off prologues of
functions with stack variables.
Differential Revision: https://reviews.llvm.org/D63472
llvm-svn: 364608
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).
Original patch D50222 by @hermord (Dmytro Shynkevych)
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
llvm-svn: 364600
FeatureFusion bits was first introduced in
https://reviews.llvm.org/rL253724. for add/load integer fusion for P8.
The only use of `hasFusion` was https://reviews.llvm.org/rL255319.
However, this was removed later in https://reviews.llvm.org/rL280440.
So, there is NO any reference to fusion in code now.
Leaving it there is misleading and confusing, so remove it for now.
We can alwasy add back if we ever support fusion in the future.
llvm-svn: 364581
The `willreturn` function attribute guarantees that a function call will
come back to the call site if the call is also known not to throw.
Therefore, this attribute can be used in
`isGuaranteedToTransferExecutionToSuccessor`.
Patch by Hideto Ueno (@uenoku)
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63372
llvm-svn: 364580
The previous output was next to useless if *any* exit was not computable. If we have more than one exit, show the exit count for each so that it's easier to see what's going from with SCEV analysis when debugging.
llvm-svn: 364579
Summary:
- Match the syntax output by InstPrinter.
- Fix it always emitting 0 for align. Had to work around fact that
opcode is not available for GetDefaultP2Align while parsing.
- Updated tests that were erroneously happy with a p2align=0
Fixes https://bugs.llvm.org/show_bug.cgi?id=40752
Reviewers: aheejin, sbc100
Subscribers: jgravelle-google, sunfish, jfb, llvm-commits, dschuff
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63633
llvm-svn: 364570
We already had the infrastructure for this, but were waiting for the fix for a number of regressions which were handled by the recent shuffle(extract_subvector(),extract_subvector()) -> extract_subvector(shuffle()) shuffle combines
llvm-svn: 364569
Summary:
The new test case led to incorrect code.
Change-Id: Ief48b227e97aa662dd3535c9bafb27d4a184efca
Reviewers: arsenm, david-salinas
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63871
llvm-svn: 364566
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)
This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
Original patch author: @hermord (Dmytro Shynkevych)!
Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00
Reviewed By: RKSimon, xbolva00
Subscribers: xbolva00, javed.absar, llvm-commits, hermord
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63391
llvm-svn: 364563
This patch introduces a new function attribute, willreturn, to indicate
that a call of this function will either exhibit undefined behavior or
comes back and continues execution at a point in the existing call stack
that includes the current invocation.
This attribute guarantees that the function does not have any endless
loops, endless recursion, or terminating functions like abort or exit.
Patch by Hideto Ueno (@uenoku)
Reviewers: jdoerfert
Subscribers: mehdi_amini, hiraditya, steven_wu, dexonsmith, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62801
llvm-svn: 364555
Emit replacements for clobbered parameters location if the parameter
has unmodified value throughout the funciton. This is basic scenario
where we can use the debug entry values.
([12/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D58042
llvm-svn: 364553
There is existing bitcode that we need to support where the structured nature
of pointer types is used to derive the result type of some operation. For
example a GEP's operation and result will be based on its input Type.
When pointers become opaque, the BitcodeReader will still have access to this
information because it's explicitly told how to construct the more complex
types used, but this information will not be attached to any Value that gets
looked up. This changes BitcodeReader so that in all places which use type
information in this manner, it's derived from a side-table rather than from the
Value in question.
llvm-svn: 364550
This was reported in https://bugs.llvm.org/show_bug.cgi?id=41751
llvm-mc aborted when disassembling tabortdc.
This patch try to clean up TM related DAGs.
* Fixes the problem by remove explicit output of cr0, and put it as implicit def.
* Update int_ppc_tbegin pattern to accommodate the implicit def of cr0.
* Update the TCHECK operand and int_ppc_tcheck accordingly.
* Add some builtin test and disassembly tests.
* Remove unused CRRC0/crrc0
Differential Revision: https://reviews.llvm.org/D61935
llvm-svn: 364544
We saw a 70% ThinLTO link time increase in Chromium for Android, see
crbug.com/978817. Sounds like more of PR42210.
> Recommit of D32530 with a few small changes:
> - Stopped recursively walking through aggregates in
> the verifier, so that we don't impose too much
> overhead on large modules under LTO (see PR42210).
> - Changed tests to match; the errors are slightly
> different since they only report the array or
> struct that actually contains a scalable vector,
> rather than all aggregates which contain one in
> a nested member.
> - Corrected an older comment
>
> Reviewers: thakis, rengolin, sdesmalen
>
> Reviewed By: sdesmalen
>
> Differential Revision: https://reviews.llvm.org/D63321
llvm-svn: 364543
Add the IR and the AsmPrinter parts for handling of the DW_OP_entry_values
DWARF operation.
([11/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60866
llvm-svn: 364542
Handle call instruction replacements and deletions in order to preserve
valid state of the call site info of the MachineFunction.
NOTE: If the call site info is enabled for a new target, the assertion from
the MachineFunction::DeleteMachineInstr() should help to locate places
where the updateCallSiteInfo() should be called in order to preserve valid
state of the call site info.
([10/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D61062
llvm-svn: 364536
The code to generate register move instructions in and out of VPR and
FPSCR_NZCV had assertions checking that the other register involved
was a GPR _pair_, instead of a single GPR as it should have been.
Reviewers: miyuki, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63865
llvm-svn: 364534
The BF and WLS/WLSTP instructions have various branch-offset fields
occupying different positions and lengths in the instruction encoding,
and all of them were decoded at disassembly time by the function
DecodeBFLabelOffset() which returned SoftFail if the offset was zero.
In fact, it's perfectly fine and not even a SoftFail for most of those
offset fields to be zero. The only one that can't be zero is the 4-bit
field labelled `boff` in the architecture spec, occupying bits {26-23}
of the BF instruction family. If that one is zero, the encoding
overlaps other instructions (WLS, DLS, LETP, VCTP), so it ought to be
a full Fail.
Fixed by adding an extra template parameter to DecodeBFLabelOffset
which controls whether a zero offset is accepted or rejected. Adjusted
existing tests (only in error messages for bad disassemblies); added
extra tests to demonstrate zero offsets being accepted in all the
right places, and a few demonstrating rejection of zero `boff`.
Reviewers: DavidSpickett, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63864
llvm-svn: 364533
Different versions of the Arm architecture disallow the use of generic
coprocessor instructions like MCR and CDP on different sets of
coprocessors. This commit centralises the check of the coprocessor
number so that it's consistent between assembly and disassembly, and
also updates it for the new restrictions in Arm v8.1-M.
New tests added that check all the coprocessor numbers; old tests
updated, where they used a number that's now become illegal in the
context in question.
Reviewers: DavidSpickett, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63863
llvm-svn: 364532
In the `CSEL Rd,Rm,Rn` instruction family (also including CSINC, CSINV
and CSNEG), the architecture lists it as CONSTRAINED UNPREDICTABLE
(i.e. SoftFail) to use SP in the Rd or Rm slot, but outright illegal
to use it in the Rn slot, not least because some encodings of that
form are used by MVE instructions such as UQRSHLL.
MC was treating all three slots the same, as SoftFail. So the only
reason UQRSHLL was disassembled correctly at all was because the MVE
decode table is separate from the Thumb2 one and takes priority; if
you turned off MVE, then encodings such as `[0x5f,0xea,0x0d,0x83]`
would disassemble as spurious CSELs.
Fixed by inventing another version of the `GPRwithZR` register class,
which disallows SP completely instead of just SoftFailing it.
Reviewers: DavidSpickett, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63862
llvm-svn: 364531
FunctionComparator attempts to produce a stable comparison of two Function
instances by looking at all available properties. Since ByVal attributes now
contain a Type pointer, they are not trivially ordered and FunctionComparator
should use its own Type comparison logic to sort them.
llvm-svn: 364523
This allows setting different values for e_shentsize, e_shoff, e_shnum
and e_shstrndx fields and is useful for producing broken inputs for various
test cases.
Differential revision: https://reviews.llvm.org/D63771
llvm-svn: 364517
While lowering calls, collect info about registers that forward arguments
into following function frame. We store such info into the MachineFunction
of the call. This is used very late when dumping DWARF info about
call site parameters.
([9/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60715
llvm-svn: 364516
Once MIR code leaves SSA form and the liveness of a vreg is considered,
DBG_VALUE insts are able to refer to non-live vregs, because their
debug-uses do not contribute to liveness. This non-liveness becomes
problematic for optimizations like register coalescing, as they can't
``see'' the debug uses in the liveness analyses.
As a result registers get coalesced regardless of debug uses, and that can
lead to invalid variable locations containing unexpected values. In the
added test case, the first vreg operand of ADD32rr is merged with various
copies of the vreg (great for performance), but a DBG_VALUE of the
unmodified operand is blindly updated to the modified operand. This changes
what value the variable will appear to have in a debugger.
Fix this by changing any DBG_VALUE whose operand will be resurrected by
register coalescing to be a $noreg DBG_VALUE, i.e. give the variable no
location. This is an overapproximation as some coalesced locations are
safe (others are not) -- an extra domination analysis would be required to
work out which, and it would be better if we just don't generate non-live
DBG_VALUEs.
This fixes PR40010.
Differential Revision: https://reviews.llvm.org/D56151
llvm-svn: 364515
Remove the last use of packRegs from IRTranslator and delete
pack/unpackRegs. This introduces a fallback to DAGISel for intrinsics
with aggregate arguments, since we don't have a testcase for them so
it's hard to tell how we'd want to handle them.
Discussed in https://reviews.llvm.org/D63551
llvm-svn: 364514
Now that lowerCall and lowerFormalArgs have been refactored, we can
simplify splitToValueTypes.
Differential Revision: https://reviews.llvm.org/D63552
llvm-svn: 364513
Change the interface of CallLowering::lowerCall to accept several
virtual registers for each argument, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63551
llvm-svn: 364512
Change the interface of CallLowering::lowerCall to accept several
virtual registers for the call result, instead of just one. This is a
follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.
With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
NFCI for AMDGPU, Mips and X86.
Differential Revision: https://reviews.llvm.org/D63550
llvm-svn: 364511
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.
With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into a s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.
AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.
Mips doesn't support aggregates yet, so it's also NFC.
x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.
Differential Revision: https://reviews.llvm.org/D63549
llvm-svn: 364510
Allow CallLowering::ArgInfo to contain more than one virtual register.
This is useful when passes split aggregates into several virtual
registers, but need to also provide information about the original type
to the call lowering. Used in follow-up patches.
Differential Revision: https://reviews.llvm.org/D63548
llvm-svn: 364509
Summary:
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the elf file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code.
It tries to print an entry label at the start of every function, but
that didn't work for the first function in the module because
DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart
which is too late.
Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee
Reviewers: arsenm, tpr, kzhuravl
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63712
llvm-svn: 364508
Without the fix gcc 7.4.0 complains with
../lib/Target/X86/X86ISelLowering.cpp: In function 'bool getFauxShuffleMask(llvm::SDValue, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::SDValue>&, llvm::SelectionDAG&)':
../lib/Target/X86/X86ISelLowering.cpp:6690:36: error: enumeral and non-enumeral type in conditional expression [-Werror=extra]
int Idx = (ZeroMask[j] ? SM_SentinelZero : (i + j + Ofs));
~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
llvm-svn: 364507
Add an attribute into the MachineFunction that tracks call site info.
([8/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D61061
llvm-svn: 364506
A unique DISubprogram may be attached to a function declaration used for
call site debug info.
([6/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60713
llvm-svn: 364500
The current implementation of ThumbRegisterInfo::saveScavengerRegister
is bad for two reasons: one, it's buggy, and two, it blocks using R12
for other optimizations. So this patch gets rid of it, and adds the
necessary support for using an ordinary emergency spill slot on Thumb1.
(Specifically, I think saveScavengerRegister was broken by r305625, and
nobody noticed for two years because the codepath is almost never used.
The new code will also probably not be used much, but it now has better
tests, and if we fail to emit a necessary emergency spill slot we get a
reasonable error message instead of a miscompile.)
A rough outline of the changes in the patch:
1. Gets rid of ThumbRegisterInfo::saveScavengerRegister.
2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an
emergency spill slot for Thumb1.
3. Implements useFPForScavengingIndex, so the emergency spill slot isn't
placed at a negative offset from FP on Thumb1.
4. Modifies the heuristics for allocating an emergency spill slot to
support Thumb1. This includes fixing ExtraCSSpill so we don't try to
use "lr" as a substitute for allocating an emergency spill slot.
5. Allocates a base pointer in more cases, so the emergency spill slot
is always accessible.
6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the
right offset in the new cases where we're forcing a base pointer.
7. Ensures we never generate a load or store with an offset outside of
its frame object. This makes the heuristics more straightforward.
8. Changes Thumb1 prologue and epilogue emission so it never uses
register scavenging.
Some of the changes to the emergency spill slot heuristics in
determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow
the compiler to avoid allocating an emergency spill slot in cases
where it isn't necessary. The rest of the changes should only affect
Thumb1.
Differential Revision: https://reviews.llvm.org/D63677
llvm-svn: 364490
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).
Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
Reviewed By: RKSimon, dtemirbulatov
Subscribers: rnk, rcorcs, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60897
llvm-svn: 364478
The LivePhysRegs calculated in order to find a scratch register in the
epilogue code wrongly uses 'LiveIns'. Instead, it should use the
'Liveout' sets. For the liveness, also considering the operands of
the terminator (return) instruction which is the insertion point for
the scratch-exec-copy instruction.
Patch by Christudasan Devadasan
llvm-svn: 364470
This patch rewrites the loop iteration to only visit every other element starting with element 0. And we work on the "even" element and "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could create dangling nodes previously since we didn't check if we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes to avoid false dependencies. This matches v8i16.
I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load.
Differential Revision: https://reviews.llvm.org/D63702
llvm-svn: 364469
Summary:
This diff enables address sanitizer on Emscripten.
On Emscripten, real memory starts at the value passed to --global-base.
All memory before this is used as shadow memory, and thus the shadow mapping
function is simply dividing by 8.
Reviewers: tlively, aheejin, sbc100
Reviewed By: sbc100
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D63742
llvm-svn: 364468
The bitstream reader handles errors poorly. This has two effects:
* Bugs in file handling (especially modules) manifest as an "unexpected end of
file" crash
* Users of clang as a library end up aborting because the code unconditionally
calls `report_fatal_error`
The bitstream reader should be more resilient and return Expected / Error as
soon as an error is encountered, not way late like it does now. This patch
starts doing so and adopting the error handling where I think it makes sense.
There's plenty more to do: this patch propagates errors to be minimally useful,
and follow-ups will propagate them further and improve diagnostics.
https://bugs.llvm.org/show_bug.cgi?id=42311
<rdar://problem/33159405>
Differential Revision: https://reviews.llvm.org/D63518
llvm-svn: 364464
This was trying to optimize concat_vectors with zero of setcc or
kand instructions. But I think it produced the same code we
produce for a concat_vectors with 0 even it it doesn't come from
one of those operations.
llvm-svn: 364463
Create a per-byte shuffle mask based on the computeKnownBits from each operand - if for each byte we have a known zero (or both) then it can be safely blended.
Fixes PR41545
llvm-svn: 364458
Summary:
This fixes a hardware bug that makes a branch offset of 0x3f unsafe.
This replaces the 32 bit branch with offset 0x3f to a 64 bit
instruction that includes the same 32 bit branch and the encoding
for a s_nop 0 to follow. The relaxer than modifies the offsets
accordingly.
Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63494
llvm-svn: 364451
This implements a small enhancement to https://reviews.llvm.org/D55506
Specifically, while we were able to match strict FP nodes for
floating-point extend operations with a register as source, this
did not work for operations with memory as source.
That is because from regular operations, this is represented as
a combined "extload" node (which is a variant of a load SD node);
but there is no equivalent using a strict FP operation.
However, it turns out that even in the absence of an extload
node, we can still just match the operations explicitly, e.g.
(strict_fpextend (f32 (load node:$ptr))
This patch implements that method to match the LDEB/LXEB/LXDB
SystemZ instructions even when the extend uses a strict-FP node.
llvm-svn: 364450
Summary:
Since the WebAssembly SIMD shift instructions take i32 operands, we
truncate the i64 operand to <2 x i64> shifts during ISel. When the i64
operand is sign extended from i32, this CL makes it so the sign
extension is dropped instead of a wrap instruction added.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63615
llvm-svn: 364446
Summary:
Implements direct and indirect tail calls enabled by the 'tail-call'
feature in both DAG ISel and FastISel. Updates existing call tests and
adds new tests including a binary encoding test.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62877
llvm-svn: 364445
We have (almost) no target opcodes that have scalar/vector equivalents - for now assume we can't scalarize them (we can add exceptions if we need to).
llvm-svn: 364429
The full GSYM patch started with: https://reviews.llvm.org/D53379
In that patch we wanted to split up getting GSYM into the LLVM code base so we are not committing too much code at once.
This is a first in a series of patches where I only add the foundation classes along with complete unit tests. They provide the foundation for encoding and decoding a GSYM file.
File entries are defined in llvm::gsym::FileEntry. This class splits the file up into a directory and filename represented by uniqued string table offsets. This allows all files that are referred to in a GSYM file to be encoded as 1 based indexes into a global file table in the GSYM file.
Function information in stored in llvm::gsym::FunctionInfo. This object represents a contiguous address range that has a name and range with an optional line table and inline call stack information.
Line table entries are defined in llvm::gsym::LineEntry. They store only address, file and line information to keep the line tables simple and allows the information to be efficiently encoded in a subsequent patch.
Inline information is defined in llvm::gsym::InlineInfo. These structs store the name of the inline function, along with one or more address ranges, and the file and line that called this function. They also contain any child inline information.
There are also utility classes for address ranges in llvm::gsym::AddressRange, and string table support in llvm::gsym::StringTable which are simple classes.
The unit tests test all the APIs on these simple classes so they will be ready for the next patches where we will create GSYM files and parse GSYM files.
Differential Revision: https://reviews.llvm.org/D63104
llvm-svn: 364427
Summary:
Doing better separation of Cost and Threshold.
Cost counts the abstract complexity of live instructions, while Threshold is an upper bound of complexity that inlining is comfortable to pay.
There are two parts:
- huge 15K last-call-to-static bonus is no longer subtracted from Cost
but rather is now added to Threshold.
That makes much more sense, as the cost of inlining (Cost) is not changed by the fact
that internal function is called once. It only changes the likelyhood of this inlining
being profitable (Threshold).
- bonus for calls proved-to-be-inlinable into callee is no longer subtracted from Cost
but added to Threshold instead.
While calculations are somewhat different, overall InlineResult should stay the same since Cost >= Threshold compares the same.
Reviewers: eraman, greened, chandlerc, yrouban, apilipenko
Reviewed By: apilipenko
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60740
llvm-svn: 364422
Summary:
The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic.
Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality
we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones".
And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself.
Last pattern D does not seem to exhibit any of these truncation issues.
Although it has the opposite problem - if we extract low bits (no shift) from i64,
and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62806
llvm-svn: 364419
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how do deal with the last pattern c.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62793
llvm-svn: 364418
Summary:
Finally tying up loose ends here.
The problem is quite simple:
If we have pattern `(x >> start) & (1 << nbits) - 1`,
and then truncate the result, that truncation will be propagated upwards,
into the `and`. And that isn't currently handled.
I'm only fixing pattern `a` here,
the same fix will be needed for patterns `b`/`c` too.
I *think* this isn't missing any extra legality checks,
since we only look past truncations. Similary, i don't think
we can get any other truncation there other than i64->i32.
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62786
llvm-svn: 364417
This allows later passes (in particular InstCombine) to optimize more
cases.
One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0.
llvm-svn: 364412
Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there's some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all.....
llvm-svn: 364407
We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8)))
llvm-svn: 364405
Summary:
The getFixupKindContainerSizeBytes function returns the size of the
instruction containing a given fixup. Currently fixup_arm_pcrel_9 is
not handled in this function, this causes an assertion failure in
the debug build and incorrect codegen in the release build.
This patch fixes the problem.
Reviewers: ostannard, simon_tatham
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63778
llvm-svn: 364404
This patch adds the PseudoCALLReg instruction which allows using an
explicit register operand as the destination for the return address.
GCC can successfully parse this form of the call instruction, which
would be used for calls to functions which do not use ra as the return
address register, such as the __riscv_save libcalls. This patch forms
the first part of an implementation of -msave-restore for RISC-V.
Differential Revision: https://reviews.llvm.org/D62685
llvm-svn: 364403
truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts.
This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type.
llvm-svn: 364401
This patch generalizes the UnrollLoop utility to support loops that exit
from the header instead of the latch. Usually, LoopRotate would take care
of must of those cases, but in some cases (e.g. -Oz), LoopRotate does
not kick in.
Codesize impact looks relatively neutral on ARM64 with -Oz + LTO.
Program master patch diff
External/S.../CFP2006/447.dealII/447.dealII 629060.00 627676.00 -0.2%
External/SPEC/CINT2000/176.gcc/176.gcc 1245916.00 1244932.00 -0.1%
MultiSourc...Prolangs-C/simulator/simulator 86100.00 86156.00 0.1%
MultiSourc...arks/Rodinia/backprop/backprop 66212.00 66252.00 0.1%
MultiSourc...chmarks/Prolangs-C++/life/life 67276.00 67312.00 0.1%
MultiSourc...s/Prolangs-C/compiler/compiler 69824.00 69788.00 -0.1%
MultiSourc...Prolangs-C/assembler/assembler 86672.00 86696.00 0.0%
Reviewers: efriedma, vsk, paquette
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D61962
llvm-svn: 364398
PPCMIPeephole::emitRLDICWhenLoweringJumpTables should return a bool
value to indicate optimization is conducted or not.
Differential Revision: https://reviews.llvm.org/D63801
llvm-svn: 364383
This was just an omission in the back end. We have had the instructions for both
single and double precision for a few HW generations, but never got around to
legalizing these.
Differential revision: https://reviews.llvm.org/D63634
llvm-svn: 364373
The weak alias should have the characteristics set to
`IMAGE_EXTERN_WEAK_SEARCH_ALIAS` to indicate that the weak external here
is a symbol alias and that the symbol is aliased to a locally defined
symbol. We were previously setting the characteristics to
`IMAGE_EXTERN_WEAK_SEARCH_LIBRARY` which indicates that the symbol
should be looked for in the libraries.
llvm-svn: 364370
Summary:
The list of relocations with addend in lld was missing `R_WASM_MEMORY_ADDR_REL_SLEB`,
causing `wasm-ld` to generate corrupted output. This fixes that problem and while
we're at it pulls the list of such relocations into the Wasm.h header, to avoid
duplicating it in multiple places.
Reviewers: sbc100
Differential Revision: https://reviews.llvm.org/D63696
llvm-svn: 364367
Summary:
`catch_all` is from the first version of EH proposal and now has been
removed. There were no tests covering this, and thus no tests to remove
or fix.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63737
llvm-svn: 364360
This verifier check is failing for us while doing ThinLTO on Chrome for
x86, see https://crbug.com/978218, and this helps to debug the problem.
llvm-svn: 364357
When we calculate MII, we use two loops, one with iterator R++ to
check whether we can reserve the resource, then --R to move back
the iterator to do reservation.
This is risky, as R++, --R may not point to the same element at all.
The can cause wrong MII.
Differential Revision: https://reviews.llvm.org/D63536
llvm-svn: 364353