As mentioned on D44647, this patch increases the default memory latency to +5cy , which more closely matches what most custom cases are doing for reg-mem instructions.
I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent.
As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests...
Differential Revision: https://reviews.llvm.org/D44654
llvm-svn: 329388
VSX D-form load/store instructions of POWER9 require the offset be a multiple of 16 and a helper`isOffsetMultipleOf` is used to check this.
So far, the helper handles FrameIndex + offset case, but not handling FrameIndex without offset case. Due to this, we are missing opportunities to exploit D-form instructions when accessing an object or array allocated on stack.
For example, x-form store (stxvx) is used for int a[4] = {0}; instead of d-form store (stxv). For larger arrays, D-form instruction is not used when accessing the first 16-byte. Using D-form instructions reduces register pressure as well as instructions.
Differential Revision: https://reviews.llvm.org/D45079
llvm-svn: 329377
Summary:
Replace ArrayRefs by actual std::array objects so that there are
no dangling references.
Reviewers: rsmith, gkistanova
Subscribers: sdardis, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D45338
llvm-svn: 329359
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.
The Sandy Bridge version was missing a load port use.
llvm-svn: 329347
Currently it is 6. If the "feature" was not used, report dummy
hidden argument. Otherwise it does not match the kernarg size
reported in the kernel header.
Differential Revision: https://reviews.llvm.org/D45129
llvm-svn: 329341
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.
llvm-svn: 329339
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: t.p.northover, RKSimon, MatzeB, bkramer
Reviewed By: bkramer
Subscribers: javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44855
llvm-svn: 329329
This patch adds a way for users to create their own custom sections to
be added to wasm files. At the LLVM IR layer, they are defined through
the "wasm.custom_sections" named metadata. The expected use case for
this is bindings generators such as wasm-bindgen.
Patch by Dan Gohman
Differential Revision: https://reviews.llvm.org/D45297
llvm-svn: 329315
This patch adds the ability to describe properties of the hardware retire
control unit.
Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).
A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.
A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize. A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".
Models can optionally specify an instance of RetireControlUnit. There can only
be up-to one RetireControlUnit definition per scheduling model.
Information related to the RCU (RetireControlUnit) is stored in (two new fields
of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).
This patch fixes PR36661.
Differential Revision: https://reviews.llvm.org/D45259
llvm-svn: 329304
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
Should fix PR9970.
Patch by myeisha (pmb).
llvm-svn: 329287
For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp
This patch causes problems for a number of models that had been incorrectly flagged as complete.
Differential Revision: https://reviews.llvm.org/D43235
llvm-svn: 329280
It's failing on the bots and I'm not sure why.
This reverts:
[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC
llvm-svn: 329256
A bug was found where an offset of -1 would generate an encoding
of max int64 which is invalid in the binary format.
Differential Revision: https://reviews.llvm.org/D45280
llvm-svn: 329238
The implementation of shadow call stack on aarch64 is quite different to
the implementation on x86_64. Instead of reserving a segment register for
the shadow call stack, we reserve the platform register, x18. Any function
that spills lr to sp also spills it to the shadow call stack, a pointer to
which is stored in x18.
Differential Revision: https://reviews.llvm.org/D45239
llvm-svn: 329236
The MachineOutliner has a bunch of target hooks that will call llvm_unreachable
if the target doesn't implement them. Therefore, if you enable the outliner on
such a target, it'll just crash. It'd be much better if it'd just *not* run
the outliner at all in this case.
This commit adds a hook to TargetInstrInfo that returns false by default.
Targets that implement the hook make it return true. The outliner checks the
return value of this hook to decide whether or not to continue.
llvm-svn: 329220
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches.
Reviewers: t.p.northover, jmolloy, RKSimon, rengolin
Reviewed By: rengolin
Subscribers: dexonsmith, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D44853
llvm-svn: 329216
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.
Also remove some stale-looking comments from the Nios2 target cmakefile.
No intended behavior change.
llvm-svn: 329181
They were added in r285274, in what looks like a merge mishap.
AVRGenMCCodeEmitter.inc is the only non-dupe tablegen invocation added in that
revision.
Also sort the tablegen lines to make this easier to spot in the future.
llvm-svn: 329178
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.
This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.
Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.
v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI
Change-Id: I099f309e0a394082a5901ea196c3967afb867f04
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44939
llvm-svn: 329166
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.
There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".
Fixes a bug encountered in Nier: Automata.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D40547
llvm-svn: 329164
Summary:
Patch https://reviews.llvm.org/D44467 implements conversion of invalid
vmov instructions into valid ones. It turned out that some valid
instructions also get converted, for example
vmov.i64 d2, #0xff00ff00ff00ff00 ->
vmov.i16 d2, #0xff00
Such behavior is incorrect because according to the ARM ARM section
F2.7.7 Modified immediate constants in T32 and A32 Advanced SIMD
instructions, "On assembly, the data type must be matched in the table
if possible."
This patch fixes the isNEONmovReplicate check so that the above
instruction is not modified any more.
Reviewers: rengolin, olista01
Reviewed By: rengolin
Subscribers: javed.absar, kristof.beyls, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D44678
llvm-svn: 329158
These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both.
llvm-svn: 329154
Summary:
The ShadowCallStack pass instruments functions marked with the
shadowcallstack attribute. The instrumented prolog saves the return
address to [gs:offset] where offset is stored and updated in [gs:0].
The instrumented epilog loads/updates the return address from [gs:0]
and checks that it matches the return address on the stack before
returning.
Reviewers: pcc, vitalybuka
Reviewed By: pcc
Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44802
llvm-svn: 329139
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.
Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.
This also adds a new test for the new redzone behaviour.
llvm-svn: 329134
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.
Author: FarhanaAleen
Reviewed By: arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D45219
llvm-svn: 329131
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.
This removes the requirement to pass -mno-red-zone when outlining for AArch64.
https://reviews.llvm.org/D45189
llvm-svn: 329120
Summary:
This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs
property, so now the MachineFunctionPass can enforce this check.
These passes are disabled in NVPTX & WebAssembly.
Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier
Reviewed By: dschuff, thegameg
Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45183
llvm-svn: 329095
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
- The total number of physical registers.
- Which target registers are accessible through the register file.
- The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows to specify an empty set of register classes. An empty set of
register classes means: this register file models all the registers specified by
the Target. For each register class, users can specify an optional register
cost. By default, register costs default to 1. A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
(1). In code, just swap is not enough, ConditionalCode CC
should also be swapped, otherwise incorrect code will
be generated.
(2). The ConditionalCode swap should be subject to
getHasJmpExt(). If getHasJmpExt() is False, certain
conditional codes will not be supported and swap
may generate incorrect code.
The original goal for this patch is to optimize jmp operations
which does not have JmpExt turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 329043
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329042
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Differential revision: https://reviews.llvm.org/D41330
Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW.
llvm-svn: 328952
This also moves to define it in the same way as ADCX which seems to use
constraints a bit better.
This is pulled out of the review for reducing the use of popf for
restoring EFLAGS, but is independent. There are still more problems with
our definitions for these instructions that Craig is going to look at
but this is at least less broken and he can start from this to improve
them more fully.
Thanks to Craig for the review here.
llvm-svn: 328945
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.
llvm-svn: 328944
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44938
llvm-svn: 328939
Summary:
Avoids having to list all intrinsics manually.
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44937
llvm-svn: 328938
This Promote flag was alwasys set to true except in the default case. But in the default case we don't need to set PVT and can just return false.
llvm-svn: 328926
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
Summary:
This exposes WebAssembly passes for use on the command line (as
arguments to -print-before and the like).
Reviewers: dschuff, sunfish
Subscribers: MatzeB, jfb, sbc100, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D45103
llvm-svn: 328901
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-incrememnt) can be
packetized together after the offset on the second was updated to the
incremement value. Make sure that the new offset is valid for the
instruction.
llvm-svn: 328897
Summary:
Make NVPTX require structured CFG. Added a temporary flag to
"roll back" the behavior for easy deployment.
Combined with D45008, this fixes several internal Nvidia GPU test
failures that we suspect to be ptxas miscompiles (PR27738).
Reviewers: jlebar
Subscribers: jholewinski, sanjoy, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D45070
llvm-svn: 328885
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.
This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.
Reviewers: arsenm, #amdgpu
Reviewed By: arsenm, #amdgpu
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D44364
llvm-svn: 328856
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 328854
Sometimes the operand comes after the memory operand so we need 5 ReadDefaults first.
I suspect we also need to do something for the mask operand for masked avx512 instructions? I'm not sure if the mask should be ReadAfterLd or not since it can mask faults. If it shouldn't be ReadAfterLd then we're probably wrong for zero masking instructions already.
Differential Revision: https://reviews.llvm.org/D44726
llvm-svn: 328834
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.
I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP with inline asm for example
or variable sized objects will probably require redoing this.
llvm-svn: 328831
The memory form of these instructions only read an input from memory. They don't have any register operands.
Differential Revision: https://reviews.llvm.org/D44836
llvm-svn: 328828
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd
Differential Revision: https://reviews.llvm.org/D44838
llvm-svn: 328823
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: sdardis, RKSimon, dsanders, atanasyan
Reviewed By: atanasyan
Subscribers: atanasyan, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D44869
llvm-svn: 328815
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.
The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.
Differential Revision: https://reviews.llvm.org/D45017
llvm-svn: 328806
We are re-adding all the bitcasts, constant masks and target shuffles to the work list for no apparent gain.
Found while investigating adding SimplifyDemandedVectorElts to target shuffles.
Differential Revision: https://reviews.llvm.org/D44942
llvm-svn: 328771
I believe the role of ehDataReg has been replaced by MipsABIInfo::GetEhDataReg, thus removing the dead code.
Patch By: Wei-Ren Chen.
Reviewers: ehostunreach, sdardis
Differential Revision: https://reviews.llvm.org/D44867
llvm-svn: 328767
The IntelPrinter and the ATTPrinter produce the same strings for the same input. We already use the ATTPrinter explicitly in several other places.
llvm-svn: 328762
Most of these were optional matches at the end of the strings, but since the strings themselves are prefix matches by default you don't need to check for something optional at the end.
I've left the 'b' on memory instructions where it means 'broadcast' because I'm not sure those really have the same load latency and we may need to split them explicitly in the future.
llvm-svn: 328730
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" versions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.
We need to use pseudo instructions for these instructions until
after register allocation. The problem is that these instructions
allocate a M0/CS0 or M1/CS1 pair. But, we can't generate code for
the CSx set-up until after register allocation when the Mx
register has been fixed for the instruction.
There is a related clang patch.
Patch by Brendon Cahoon.
llvm-svn: 328724
This commit simplifies the call outlining logic by removing references to the
Function associated with the callee. To do this, it requires that valid
callee save info is available to the outliner.
llvm-svn: 328719
Similar to r328694. The number of micro opcodes should be 2 for those
instructions.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328698
This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981.
It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a
release-with-asserts build. I will resubmit the change when I have fixed
that.
Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a
llvm-svn: 328695
The Jaguar backend natively supports 128-bit data types. Operations on YMM
registers are split into two COPs (complex operations). Each COP consumes a slot
in the dispatch group, and in the reorder buffer.
The scheduling model for Jaguar should mark those instructions as `let
NumMicroOps = 2`.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328694
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.
Differential Revision: https://reviews.llvm.org/D44941
llvm-svn: 328691
If an ADRP appears with, say, a CPI operand, we shouldn't outline it.
This moves the check for unsafe operands so that it occurs before the special-case
for ADRPs. Also add a test for outlining ADRPs.
llvm-svn: 328674
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).
This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.
Reviewers: kzhuravl, nhaehnle, timcorringham
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D44468
Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 328673
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer.
Differential Revision: https://reviews.llvm.org/D44924
llvm-svn: 328664
Before this was not done if the function had no calls in it. This
is still a possible issue with any callable function, regardless
of calls present.
llvm-svn: 328659
The combine on a select of a load only triggers for
addrspace 0, and discards the MachinePointerInfo. The
conservative default needs to be used for this.
llvm-svn: 328652
In a function, s5 is used as the frame base SGPR. If a function
is calling another function, during the call sequence
it is copied to a preserved SGPR and restored.
Before it was possible for the scheduler to move stack operations
before the restore of s5, since there's nothing to associate
a frame index access with the restore.
Add an implicit use of s5 to the adjcallstack pseudo which ends
the call sequence to preven this from happening. I'm not 100%
satisfied with this solution, but I'm not sure what else would be
better.
llvm-svn: 328650
This implements a set of TTI functions that the loop vectorizer uses.
The only purpose of this is to enable testing. Auto-vectorization is
disabled by default, enabled by -hexagon-autohvx.
llvm-svn: 328639
Summary:
This is a canonical way to teach objdump to print the target
symbols for branches when disassembling AArch64 code.
Reviewers: evandro, t.p.northover, espindola
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D44851
llvm-svn: 328638
Summary:
I recently added a new Fixup kind to our fork of LLVM but forgot to add
it to the table in MipsAsmBackend.cpp. With this static_assert the error
would have been caught instead of zero-initializing the array entries for
the new fixups.
Reviewers: sdardis, atanasyan
Reviewed By: atanasyan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44895
llvm-svn: 328616
Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis.
Differential Revision: https://reviews.llvm.org/D44647
llvm-svn: 328582
Summary:
Re-lands r328386 and r328443, reverting r328482.
Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small
parameters in i8 and i16 do not end up in the SysV register parameters
(EDI, ESI, etc).
I added tests for how we receive small parameters, since that is the
important part. It's always safe to store more bytes than will be read,
but the assumptions you make when loading them are what really matter.
I also tested this by self-hosting clang and it passed tests on win64.
Reviewers: mstorsjo, hans
Subscribers: hiraditya, mstorsjo, llvm-commits
Differential Revision: https://reviews.llvm.org/D44900
llvm-svn: 328570
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.
These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).
Differential Revision: https://reviews.llvm.org/D44879
llvm-svn: 328566
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: dblaikie, RKSimon, robertlytton
Reviewed By: robertlytton
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44875
llvm-svn: 328564
Legalize and emit code for quad-precision floating point operation xscvdpqp
and add option to guard the quad precision operation support.
Differential Revision: https://reviews.llvm.org/D44746
llvm-svn: 328558
A new function getOpcodeForSpill should now be the only place to get
the opcode for a given spilled register.
Differential Revision: https://reviews.llvm.org/D43086
llvm-svn: 328556
Summary:
llvm-objdump now disassembles unrecognised opcodes as data, using
the .long directive. We treat unrecognised opcodes as being 32 bit
values, so move along 4 bytes rather than the single byte which
previously resulted in a cascade of bogus disassembly following an
unrecognised opcode.
While no solution can always disassemble code that contains
embedded data correctly this provides a significant improvement.
The disassembler will now cope with an arbitrary length section
as it no longer truncates it to a multiple of 4 bytes, and will
use the .byte directive for trailing bytes.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D44685
llvm-svn: 328553
The patch contains severals changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.
The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.
The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).
The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.
The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound.
Patch by Brendon Cahoon.
llvm-svn: 328542
This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up
patch at D44876 fixes this, but let's revert back to green for now until that's
ready to land.
(Also reverts r328443.)
> Both GCC and MSVC only look at the low byte of a boolean when it is
> passed.
llvm-svn: 328482
Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44644
llvm-svn: 328451
The Intel models need an extra 1uop for memory folded instructions, plus a lot of instructions take a non-default memory latency which should allow us to use the multiclass a lot more to tidy things up.
Differential Revision: https://reviews.llvm.org/D44840
llvm-svn: 328446
Summary:
LLVM defaults to the newer .init_array/.fini_array scheme for static
constructors rather than the less desirable .ctors/.dtors (the UseCtors
flag defaults to false). This wasn't being respected in the RISC-V
backend because it fails to call TargetLoweringObjectFileELF::InitializeELF with the the appropriate
flag for UseInitArray.
This patch fixes this by implementing RISCVELFTargetObjectFile and overriding its Initialize method to call
InitializeELF(TM.Options.UseInitArray).
Reviewers: asb, apazos
Reviewed By: asb
Subscribers: mgorny, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, llvm-commits
Differential Revision: https://reviews.llvm.org/D44750
llvm-svn: 328433
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches.
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44857
llvm-svn: 328430
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Reviewers: tstellar, RKSimon, arsenm
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44856
llvm-svn: 328429
These nodes only use the lower 32 bits of their inputs so we can use SimplifyDemandedBits to simplify them.
Differential Revision: https://reviews.llvm.org/D44375
llvm-svn: 328405
Should be NFC since nothing used the enum value. The instruction descriptions are generated from tablegen which had the correct value.
llvm-svn: 328398
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)
llvm-svn: 328395
This includes llvm-c/TargetMachine.h which is logically part of
libTarget (since libTarget implements llvm-c/TargetMachine.h's
functions).
llvm-svn: 328394
This is used from llvm tblgen and the X86Disassembler - the only common
library (apart from TableGen, which probably doesn't make sense to have
as a dependency from a release tool (rather than a use-while-building-llvm
tool) of LLVM)
llvm-svn: 328393
The branch relaxation pass collects sizes of all instructions at the
beginning, before any changes have been made. It then performs one pass
over all branches to see which ones need to be extended. It does not
account for the case when a previously valid branch becomes out-of-range
due to relaxing other branches.
This approach fixes this problem by assuming from the beginning that
all extendable branches have been extended. This may cause unneeded
relaxation in some cases, but avoids iteration and recomputing instruction
sizes.
llvm-svn: 328360
The HexagonExpandCondsets pass is incorrectly removing the dead
flag on a definition that is really dead, and adding a kill flag
to a use that is tied to a definition. This causes an assert later
during the machine scheduler when querying the live interval
information.
Patch by Brendon Cahoon.
llvm-svn: 328357
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.
Differential Revision: https://reviews.llvm.org/D43736
llvm-svn: 328349
HexagonGenMux would collapse pairs of predicated transfers if it assumed
that the predicated .new forms cannot be created. Turns out that generating
mux is preferable in almost all cases.
Introduce an option -hexagon-gen-mux-threshold that controls the minimum
distance between the instruction defining the predicate and the later of
the two transfers. If the distance is closer than the threshold, mux will
not be generated. Set the threshold to 0 by default.
llvm-svn: 328346
This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode.
In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode,
adjustBBOffsetsAfter() is not calculating postOffset correctly by
properly accounting for the padding that is required for the constant pool
that immediately follows the jump table branch instruction.
Reviewers: t.p.northover, eli.friedman
Reviewed By: t.p.northover
Subscribers: chrib, tstellar, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D44709
llvm-svn: 328341
When converting an instruction to the wider version, copy any
subregisters if the original operand has a subregister.
Patch by Brendon Cahoon.
llvm-svn: 328333
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.
Differential Revision: https://reviews.llvm.org/D40196
llvm-svn: 328326
Loads and stores can only shift the offset register by the size of the value
being loaded, but currently the DAGCombiner will reduce the width of the load
if it's followed by a trunc making it impossible to later combine the shift.
Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and
make it prevent the width reduction if this is what would happen, though do
allow it if reducing the load width will let us eliminate a later sign or zero
extend.
Differential Revision: https://reviews.llvm.org/D44794
llvm-svn: 328321
This was due to a misunderstanding over what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect num macro ops.
llvm-svn: 328320
When targeting execute-only and fp-armv8, float constants in a compare
resulted in instruction selection failures. This is now fixed by using
vmov.f32 where possible, otherwise the floating point constant is
lowered into a integer constant that is moved into a floating point
register.
This patch also restores using fpcmp with immediate 0 under fp-armv8.
Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443
llvm-svn: 328313
The VMOVMSKBrr was in a separate InstRW with a lower latency, but I assume they should be the same and the higher latency matches Agners table so I'm going with that.
llvm-svn: 328291
This makes the Y position consistent with other instructions.
This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.
llvm-svn: 328254
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated). This avoids executing the
the copy on paths where their results aren't needed. This also exposes
additional opportunites for dead copy elimination and shrink wrapping.
These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..
For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.
```
%bb.0:
%wzr = SUBSWri %w1, 1
%w19 = COPY %w0
Bcc 11, %bb.2
%bb.1:
Live Ins: %w19
BL @fun
%w0 = ADDWrr %w0, %w19
RET %w0
%bb.2:
%w0 = COPY %wzr
RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.
With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted in spec2000/2006/2017 on AArch64.
Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz
Reviewed By: sebpop
Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41463
llvm-svn: 328237
As in SystemZ backend, correctly propagate node ids when inserting new
unselected nodes into the DAG during instruction Seleciton for X86
target.
Fixes PR36865.
Reviewers: jyknight, craig.topper
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44797
llvm-svn: 328233
This fixes a few issues with the R52 instregexs to enable the full overlap checking
Differential Revision: https://reviews.llvm.org/D44767
llvm-svn: 328216
Models were completely overriding all CLMUL instructions when the WriteCLMUL default classes could be used for exactly the same coverage.
llvm-svn: 328194
PCLMULQDQrm was using the rr itinerary.
Difference in itineraries between PCLMULQDQ/VPCLMULQDQ variants was causing an unnecessary duplication of scheduler class entries.
llvm-svn: 328193
Models were completely overriding all AES instructions when the WriteAES default classes could be used for exactly the same coverage.
Removes 6 unnecessary scheduler classes from every model.
Note: Still looking for a way for tblgen to warn when this is happening - often the override is more complete than the default.
llvm-svn: 328192
I've also merged some VEX/non-VEX instregex strings with a (V?) prefix or (Y?) ymm variant - there are still a lot more of these to do.
This reduces the size of the optimized llc binary on my computer by 400K. Presumably because we went from 5000+ scheduler classes per CPU to ~2000.
llvm-svn: 328179
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.
Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.
Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.
llvm-svn: 328165
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.
Differential Revision: https://reviews.llvm.org/D44719
llvm-svn: 328158
The "ShouldEmitMatchRegisterName" bit wasn't taking effect because the
WebAssembly target didn't point to the custom WebAssemblyAsmParser
record.
llvm-svn: 328155
Add barrier edges to check for any physical register. The previous code
worked for the function return registers: r0/d0, v0/w0.
Patch by Brendon Cahoon.
llvm-svn: 328120
I've also merged some VEX/non-VEX instregex strings with a (V?) prefix or (Y?) ymm variant - there are still a lot more of these to do.
llvm-svn: 328111
This patch also includes extensive tests targeted at select and br+fcmp IR
inputs. A sequence of br+fcmp required support for FPR32 registers to be added
to RISCVInstrInfo::storeRegToStackSlot and
RISCVInstrInfo::loadRegFromStackSlot.
llvm-svn: 328104
Also restrict to port 0 and 1 for SkylakeClient. It looks like the scheduler models don't account for client not having a full vector ALU on port 5 like server.
Fixes PR36808.
llvm-svn: 328061
The default thread model for wasm is single, and in this mode thread-local
global variables can be lowered identically to non-thread-local variables.
Differential Revision: https://reviews.llvm.org/D44703
llvm-svn: 328049
Mingw uses the same stack protector functions as GCC provides
on other platforms as well.
Patch by Valentin Churavy!
Differential Revision: https://reviews.llvm.org/D27296
llvm-svn: 328039
It uses the MC framework and the tablegen matcher to do the
heavy lifting. Can handle both explicit and implicit locals
(-disable-wasm-explicit-locals). Comes with a small regression
test.
This is a first basic implementation that can parse most llvm .s
output and round-trips most instructions succesfully, but in order
to keep the commit small, does not address all issues.
There are a fair number of mismatches between what MC / assembly
matcher think a "CPU" should look like and what WASM provides,
some already have workarounds in this commit (e.g. the way it
deals with register operands) and some that require further work.
Some of that further work may involve changing what the
Disassembler outputs (and what s2wasm parses), so are probably
best left to followups.
Some known things missing:
- Many directives are ignored and not emitted.
- Vararg calls are parsed but extra args not emitted.
- Loop signatures are likely incorrect.
- $drop= is not emitted.
- Disassembler does not output SIMD types correctly, so assembler
can't test them.
Patch by Wouter van Oortmerssen
Differential Revision: https://reviews.llvm.org/D44329
llvm-svn: 328028
I'm not entirely sure these hacks are still needed. If you remove the hacks completely, the name of the library call that gets generated doesn't match the grep the test previously had. So the test wasn't really checking anything.
If the hack is still needed it belongs in PPC specific code. I believe the FP_TO_SINT code here is the only place in the tree where a FP_ROUND_INREG node is created today. And I don't think its even being used correctly because the legalization returned a BUILD_PAIR with the same value twice. That doesn't seem right to me. By moving the code entirely to PPC we can avoid creating the FP_ROUND_INREG at all.
I replaced the grep in the existing test with full checks generated by hacking update_llc_test_check.py to support ppc32 just long enough to generate it.
Differential Revision: https://reviews.llvm.org/D44061
llvm-svn: 328017
Registers E[A-D]X, E[SD]I, E[BS]P, and EIP have 16-bit subregisters
that cover the low halves of these registers. This change adds artificial
subregisters for the high halves in order to differentiate (in terms of
register units) between the 32- and the low 16-bit registers.
This patch contains parts that aim to preserve the calculated register
pressure. This is in order to preserve the current codegen (minimize the
impact of this patch). The approach of having artificial subregisters
could be used to fix PR23423, but the pressure calculation would need
to be changed.
Differential Revision: https://reviews.llvm.org/D43353
llvm-svn: 328016
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Less intrinsics to deal with ->
less boilerplate.
Added a bit of tablegen magic to match/replace an intrinsics with a pointer
argument in particular address space with the space-specific instruction
variant.
Updated tests to use non-default address spaces.
Differential Revision: https://reviews.llvm.org/D43268
llvm-svn: 328006
TopReadyCycle and BotReadyCycle were off by one cycle when an SU is either
the first instruction or the last instruction in a packet.
Patch by Ikhlas Ajbar.
llvm-svn: 328000
This patch changes the isLatencyBound heuristic to look at the
path length based upon the number of packets needed to schedule
a basic block. For small basic blocks, the heuristic uses a small
threshold for isLatencyBound. For large basic blocks, the
heuristic uses a large threshold.
The goal is to increase the priority of an instruction in a small
basic block that has a large height or depth relative to the code
size. For large functions, the height and depth are ignored
because it increases the live range of a register and causes more
spills. That is, for large functions, it is more important to
schedule instructions when available, and attempt to keep the defs
and uses closer together.
Patch by Brendon Cahoon.
llvm-svn: 327987
In these cases, both parameters and return values are passed
as a pointer to a stack allocation.
MSVC doesn't use the f80 data type at all, while it is used
for long doubles on mingw.
Normally, this part of the calling convention is handled
within clang, but for intrinsics that are lowered to libcalls,
it may need to be handled within llvm as well.
Differential Revision: https://reviews.llvm.org/D44592
llvm-svn: 327957
They were incorrectly marked as RMW operations. Some of the CMP instrucions worked, but the ones that use a similar encoding as RMW form of ADD ended up marked as RMW.
TEST used the same tablegen class as some of the CMPs.
llvm-svn: 327947
E.g.
bar (int x)
{
char p[x];
push outgoing variables for foo.
call foo
}
We need to generate stack adjustment instructions for outgoing arguments by
eliminateCallFramePseudoInstr when the function contains variable size
objects to avoid outgoing variables corrupt the variable size object.
Default hasReservedCallFrame will return !hasFP().
We don't want to generate extra sp adjustment instructions when hasFP()
return true, So We override hasReservedCallFrame as !hasVarSizedObjects().
Differential Revision: https://reviews.llvm.org/D43752
llvm-svn: 327938
When outlining calls, the outliner needs to update CFI to ensure that, say,
exception handling works. This commit adds that functionality and adds a test
just for call outlining.
Call outlining stuff in machine-outliner.mir should be moved into
machine-outliner-calls.mir in a later commit.
llvm-svn: 327917
We don't need to create an ISD::TRUNCATE node to return, we started with one and can return it. Also remove the call to getExtendInVec, the result is just going to be a getNode of that value passed in.
llvm-svn: 327914
This extends the use of this attribute on ARM and AArch64 from
SVN r325900 (where it was only checked for fixed stack
allocations on ARM/AArch64, but for all stack allocations on X86).
This also adds a testcase for the existing use of disabling the
fixed stack probe with the attribute on ARM and AArch64.
Differential Revision: https://reviews.llvm.org/D44291
llvm-svn: 327897
PR35590 was already filed for this information being wrong. It's probably better to default to WriteSystem behavior instead of using something completely wrong.
llvm-svn: 327882
JRCXZ was already present, but not the others.
We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output.
llvm-svn: 327881
The regex was looking for JECXZ_32 or JECXZ_64, but their is just one instruction called JECXZ. They used to exist as separate instructions, but were merged over 3 years ago.
llvm-svn: 327880
PowerPC targets do not use address spaces. As a result, we can get selection
failures with address space casts. This patch makes those casts noops.
Patch by Valentin Churavy.
Differential revision: https://reviews.llvm.org/D43781
llvm-svn: 327877
With the SRAs removed from the SSE2 code in D44267, then there doesn't appear to be any advantage to the sse41 code. The punpcklbw instruction and pmovsx seem to have the same latency and throughput on most CPUs. And the SSE41 code requires moving the upper 64-bits into the lower 64-bit before the sign extend can be done. The unpckhbw in sse2 code can do better than that.
llvm-svn: 327869
Sometimes we used the same itinerary for MEM and REG forms, but that seems inconsistent with our usual usage.
We also used the MUL8 itinerary for MULX32/64 which was also weird.
The test changes are because we were using IIC_IMUL32_RR and IIC_IMUL64_RR instead of IIC_IMUL32_REG/IIC_IMUL64_REG for the 32 and 64 bit multiplies that produce double width result.
llvm-svn: 327866
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.
Differential Revision: https://reviews.llvm.org/D40196
llvm-svn: 327856
Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults.
This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases.
I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific.
Differential Revision: https://reviews.llvm.org/D44612
llvm-svn: 327855
1. Given that we already have a classification bucket with 'nop' in the name,
that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'.
2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably
llvm-mca) how to model the resource requirements better even though a nop has no
dependencies.
Differential Revision: https://reviews.llvm.org/D44608
llvm-svn: 327853
Summary:
The docs already claim that this happens, but so far it hasn't. As a
consequence, existing TableGen files get this wrong a lot, but luckily
the fixes are all reasonably straightforward.
To make this work with all the existing forms of self-references (since
the true type of a record is only built up over time), the lookup of
self-references in !cast is delayed until the final resolving step.
Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D44475
llvm-svn: 327849
Summary:
These are cases of self-references that exist today in practice. Let's
add tests for them to avoid regressions.
The self-references in PPCInstrInfo.td can be expressed in a simpler
way. Allowing this type of self-reference while at the same time
consistently doing late-resolve even for self-references is problematic
because there are references to fields that aren't in any class. Since
there's no need for this type of self-reference anyway, let's just
remove it.
Change-Id: I914e0b3e1ae7adae33855fac409b536879bc3f62
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: nemanjai, wdng, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D44474
llvm-svn: 327848
Normally DCE kills these, but at -O0 these get left behind
leaving suspicious looking illegal copies.
Replace with IMPLICIT_DEF to avoid iterator issues.
llvm-svn: 327842
This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which
uses v4f16 and v8f16 vector operands and return values. All the moving parts
are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16
intrinsic. In a follow-up patch the rest of the intrinsics and tests will be
added.
Differential Revision: https://reviews.llvm.org/D44538
llvm-svn: 327839
If DoneMBB becomes empty it must have CC added to its live-in list, since it
will fall-through into EndMBB. This happens when the CLC loop does the
complete range.
Review: Ulrich Weigand
llvm-svn: 327834
Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm.
llvm-svn: 327820
This patch adds i128 division support by instruction LLVM to lower
128-bit divisions to the __udivmodti4 and __divmodti4 rtlib functions.
This also adds test for 64-bit division and 128-bit division.
Patch by Peter Nimmervoll.
llvm-svn: 327814