This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D84345
Add an option -munsafe-fp-atomics for AMDGPU target.
When enabled, clang adds function attribute "amdgpu-unsafe-fp-atomics"
to any functions for amdgpu target. This allows amdgpu backend to use
unsafe fp atomic instructions in these functions.
Differential Revision: https://reviews.llvm.org/D91546
We want to allow using MMA on P10 CPU only. This patch prevents the use of MMA
with the -mmma option on P9 CPUs and earlier.
Differential Revision: https://reviews.llvm.org/D91200
We have option -mabi=ieeelongdouble to set current long double to
IEEEquad semantics. Like what GCC does, we need to define
__LONG_DOUBLE_IEEE128__ macro in this case, and __LONG_DOUBLE_IBM128__
if using PPCDoubleDouble.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90208
Support a vector register constraint in inline asm of clang.
Add a regression test also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D91251
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
This patch mainly made the following changes:
1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
On AIX, to support vector types, which should always be 16 bytes aligned,
we set alloca to return 16 bytes aligned memory space.
Differential Revision: https://reviews.llvm.org/D89910
Many non-language extensions are defined but also unused. This patch
removes them with their tests as they do not require compiler support.
The cl_khr_select_fprounding_mode extension is also removed because it
has been deprecated since OpenCL 1.1 and Clang doesn't have any specific
support for it.
The cl_khr_context_abort extension is only referred to in "The OpenCL
Specification", version 1.2 and 2.0, in Table 4.3, but no specification
is provided in "The OpenCL Extension Specification" for these versions.
Because it is both unused in Clang and lacks specification, this
extension is removed.
The following extensions are platform extensions that bring new OpenCL
APIs but do not impact the kernel language nor require compiler support.
They are therefore removed.
- cl_khr_gl_sharing, introduced in OpenCL 1.0
- cl_khr_icd, introduced in OpenCL 1.2
- cl_khr_gl_event, introduced in OpenCL 1.1
Note: this extension adds a new API to create cl_event but it also
specifies that these can only be used by clEnqueueAcquireGLObjects.
Hence, they cannot be used on the device side and the extension does
not impact the kernel language.
- cl_khr_d3d10_sharing, introduced in OpenCL 1.1
- cl_khr_d3d11_sharing, introduced in OpenCL 1.2
- cl_khr_dx9_media_sharing, introduced in OpenCL 1.2
- cl_khr_image2d_from_buffer, introduced in OpenCL 1.2
- cl_khr_initialize_memory, introduced in OpenCL 1.2
- cl_khr_gl_depth_images, introduced in OpenCL 1.2
Note: this extension is related to cl_khr_depth_images but only the
latter adds new features to the kernel language.
- cl_khr_spir, introduced in OpenCL 1.2
- cl_khr_egl_event, introduced in OpenCL 1.2
Note: this extension adds a new API to create cl_event but it also
specifies that these can only be used by clEnqueueAcquire* API
functions. Hence, they cannot be used on the device side and the
extension does not impact the kernel language.
- cl_khr_egl_image, introduced in OpenCL 1.2
- cl_khr_terminate_context, introduced in OpenCL 1.2
The minimum required OpenCL version used in OpenCLExtensions.def for
these extensions is not always correct. Removing these address that
issue.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D89372
- The goal of this patch is improve option compatible with RISCV-V GCC,
-mcpu support on GCC side will sent patch in next few days.
- -mtune only affect the pipeline model and non-arch/extension related
target feature, e.g. instruction fusion; in td file it called
TuneFeatures, which is introduced by X86 back-end[1].
- -mtune accept all valid option for -mcpu and extra alias processor
option, e.g. `generic`, `rocket` and `sifive-7-series`, the purpose is
option compatible with RISCV-V GCC.
- Processor alias for -mtune will resolve according the current target arch,
rv32 or rv64, e.g. `rocket` will resolve to `rocket-rv32` or `rocket-rv64`.
- Interaction between -mcpu and -mtune:
* -mtune has higher priority than -mcpu for pipeline model and
TuneFeatures.
[1] https://reviews.llvm.org/D85165
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D89025
GCC 11 will define this macro.
In LLVM, the feature flag only applies to 64-bit mode and we always define the
macro in 32-bit mode. This is different from GCC -m32 in which -mno-sahf can
suppress the macro. The discrepancy can unlikely cause trouble.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D89198
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
Set the default alignment control variables for z/OS target and add test case for alignment rules on z/OS.
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D88845
Currently CUDA/HIP toolchain uses "unknown" as bound arch
for offload action for fat binary. This causes -mcpu or -march
with "unknown" added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.
This causes issue for https://reviews.llvm.org/D88377 since
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.
The bound arch of offload action for fat binary is not really
used, therefore set it to CudaArch::UNUSED.
Differential Revision: https://reviews.llvm.org/D88524
This adds support for -mcpu=cortex-r82. Some more information about this
core can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-r/cortex-r82
One note about the system register: that is a bit of a refactoring because of
small differences between v8.4-A AArch64 and v8-R AArch64.
This is based on patches from Mark Murray and Mikhail Maltsev.
Differential Revision: https://reviews.llvm.org/D88660
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.
It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.
This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulator in their unprimed form and
allow the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.
Differential Revision: https://reviews.llvm.org/D84968
Set the default wchar_t type on z/OS, and unsigned as the default.
Reviewed By: hubert.reinterpretcast, fanbo-meng
Differential Revision: https://reviews.llvm.org/D87624
Aligned allocation is not supported on z/OS. This patch sets -faligned-alloc-unavailable as default in z/OS toolchain.
Reviewed By: abhina.sreeskantharajan, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D87611
d4ce862f introduced HasStrictFP to disable generating constrained FP
operations for platforms lacking support. Since work for enabling
constrained FP on PowerPC is almost done, we'd like to enable it.
Reviewed By: kpn, steven.zhang
Differential Revision: https://reviews.llvm.org/D87223
As reported in Bug 42535, `clang` doesn't inline atomic ops on 32-bit
Sparc, unlike `gcc` on Solaris. In a 1-stage build with `gcc`, only two
testcases are affected (currently `XFAIL`ed), while in a 2-stage build more
than 100 tests `FAIL` due to this issue.
The reason for this `gcc`/`clang` difference is that `gcc` on 32-bit
Solaris/SPARC defaults to `-mpcu=v9` where atomic ops are supported, unlike
with `clang`'s default of `-mcpu=v8`. This patch changes `clang` to use
`-mcpu=v9` on 32-bit Solaris/SPARC, too.
Doing so uncovered two bugs:
`clang -m32 -mcpu=v9` chokes with any Solaris system headers included:
/usr/include/sys/isa_defs.h:461:2: error: "Both _ILP32 and _LP64 are defined"
#error "Both _ILP32 and _LP64 are defined"
While `clang` currently defines `__sparcv9` in a 32-bit `-mcpu=v9`
compilation, neither `gcc` nor Studio `cc` do. In fact, the Studio 12.6
`cc(1)` man page clearly states:
These predefinitions are valid in all modes:
[...]
__sparcv8 (SPARC)
__sparcv9 (SPARC -m64)
At the same time, the patch defines `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248]`
for a 32-bit Sparc compilation with any V9 cpu. I've also changed
`MaxAtomicInlineWidth` for V9, matching what `gcc` does and the Oracle
Developer Studio 12.6: C User's Guide documents (Ch. 3, Support for Atomic
Types, 3.1 Size and Alignment of Atomic C Types).
The two testcases that had been `XFAIL`ed for Bug 42535 are un-`XFAIL`ed
again.
Tested on `sparcv9-sun-solaris2.11` and `amd64-pc-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D86621
The __ARM_FEATURE_SVE_BITS feature macro is specified in the Arm C
Language Extensions (ACLE) for SVE [1] (version 00bet5). From the spec,
where __ARM_FEATURE_SVE_BITS==N:
When N is nonzero, indicates that the implementation is generating
code for an N-bit SVE target and that the arm_sve_vector_bits(N)
attribute is available.
This was defined in D83550 as __ARM_FEATURE_SVE_BITS_EXPERIMENTAL and
enabled under the -msve-vector-bits flag to simplify initial tests.
This patch drops _EXPERIMENTAL now there is support for the feature.
[1] https://developer.arm.com/documentation/100987/latest
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D86720
This patch defaults to -mtune=generic unless -march is present. If -march is present we'll use the empty string unless its overridden by mtune. The back should use the target cpu if the tune-cpu isn't present.
It also adds AST serialization support to fix some tests that emit AST and parse it back. These tests diff the IR against the output from not going through AST. So if we don't serialize the tune CPU we fail the diff.
Differential Revision: https://reviews.llvm.org/D86488
This patch adds the z/OS target and defines macros as a stepping stone
towards enabling a native build on z/OS.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D85324
Support -march=sapphirerapids for x86.
Compare with Icelake Server, it includes 14 more new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86503
This patch adds frontend and backend options to enable and disable
the PowerPC MMA operations added in ISA 3.1. Instructions using these
options will be added in subsequent patches.
Differential Revision: https://reviews.llvm.org/D81442
gcc errors on this, but I'm nervous that since -mtune has been
ignored by clang for so long that there may be code bases out
there that pass 32-bit cpus to clang.
This adds parsing and codegen support for tune in target attribute.
I've implemented this so that arch in the target attribute implicitly disables tune from the command line. I'm not sure what gcc does here. But since -march implies -mtune. I assume 'arch' in the target attribute implies tune in the target attribute.
Differential Revision: https://reviews.llvm.org/D86187
Properly set "simd128" in the feature map when "unimplemented-simd128"
is requested.
initFeatureMap is used to create the feature vector used by
handleTargetFeatures. There are later calls to initFeatureMap in
CodeGen that were using these flags to recreate the map. But the
original feature vector should be passed to those calls. So that
should be enough to rebuild the map.
The only issue seemed to be that simd128 was not enabled in the
map by the first call to initFeatureMap. Using the SIMDLevel set
by handleTargetFeatures in the later calls allowed simd128 to be
set in the later versions of the map.
To fix this I've added an override of setFeatureEnabled that
will update the map the first time with the correct simd dependency.
Differential Revision: https://reviews.llvm.org/D85806
COFF targets have a max object alignment of 8192, so trying to create
one with a larger size results in an unreachable in WinCOFFObjectWriter.
For the reproducer I have uses thread local storage, however other
alignments are likely affected as well.
This patch sets the MaxVectorAlign for COFF to 8192. Additionally,
though there is no longer a way to reproduce that I could find, it
correctly sets the MaxTLSAlign for COFF to that value as well, so that
if anyone comes up with a situation where this is true, it will cause an
error.
Differential Revision: https://reviews.llvm.org/D85543
If the CPU string is empty, the target feature map may end up having
an empty string inserted to it. The symptom of the problem is a warning
message:
'+' is not a recognized feature for this target (ignoring feature)
Also, the target-features attribute in the module will have an empty
string in it.
Adds frontend and backend options to enable and disable the
PowerPC paired vector memory operations added in ISA 3.1.
Instructions using these options will be added in subsequent patches.
Differential Revision: https://reviews.llvm.org/D83722
This patch introduces 2 new address spaces in OpenCL: global_device and global_host
which are a subset of a global address space, so the address space scheme will be
looking like:
```
generic->global->host
->device
->private
->local
constant
```
Justification: USM allocations may be associated with both host and device memory. We
want to give users a way to tell the compiler the allocation type of a USM pointer for
optimization purposes. (Link to the Unified Shared Memory extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc)
Before this patch USM pointer could be only in opencl_global
address space, hence a device backend can't tell if a particular pointer
points to host or device memory. On FPGAs at least we can generate more
efficient hardware code if the user tells us where the pointer can point -
being able to distinguish between these types of pointers at compile time
allows us to instantiate simpler load-store units to perform memory
transactions.
Patch by Dmitry Sidorov.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D82174
Implement AIX default `power` alignment rule by adding `PreferredAlignment` and
`PreferredNVAlignment` in ASTRecordLayout class.
The patchh aims at returning correct value for `__alignof(x)` and `alignof(x)`
under `power` alignment rules.
Differential Revision: https://reviews.llvm.org/D79719
Implement __builtin_eh_return_data_regno for SystemZ.
Match behavior of GCC.
Author: slavek-kucera
Differential Revision: https://reviews.llvm.org/D84341
Use 'o' for the mangling specification instead of 'e'. This fixes an
error in the backend caused by a mismatch between the data layouts
generated by the backend and the frontend.
rdar://problem/64168540
Summary:
This patch implements parsing support for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE, version 00bet5,
section 3.7.3) for SVE [1].
The purpose of this attribute is to define fixed-length (VLST) versions
of existing sizeless types (VLAT). For example:
#if __ARM_FEATURE_SVE_BITS==512
typedef svint32_t fixed_svint32_t __attribute__((arm_sve_vector_bits(512)));
#endif
Creates a type 'fixed_svint32_t' that is a fixed-length version of
'svint32_t' that is normal-sized (rather than sizeless) and contains
exactly 512 bits. Unlike 'svint32_t', this type can be used in places
such as structs and arrays where sizeless types can't.
Implemented in this patch is the following:
* Defined and tested attribute taking single argument.
* Checks the argument is an integer constant expression.
* Attribute can only be attached to a single SVE vector or predicate
type, excluding tuple types such as svint32x4_t.
* Added the `-msve-vector-bits=<bits>` flag. When specified the
`__ARM_FEATURE_SVE_BITS__EXPERIMENTAL` macro is defined.
* Added a language option to store the vector size specified by the
`-msve-vector-bits=<bits>` flag. This is used to validate `N ==
__ARM_FEATURE_SVE_BITS`, where N is the number of bits passed to the
attribute and `__ARM_FEATURE_SVE_BITS` is the feature macro defined under
the same flag.
The `__ARM_FEATURE_SVE_BITS` macro will be made non-experimental in the final
patch of the series.
[1] https://developer.arm.com/documentation/100987/latest
This is patch 1/4 of a patch series.
Reviewers: sdesmalen, rsandifo-arm, efriedma, ctetreau, cameron.mcinally, rengolin, aaron.ballman
Reviewed By: sdesmalen, aaron.ballman
Differential Revision: https://reviews.llvm.org/D83550
Summary:
1. gcc uses `-march` and `-mtune` flag to chose arch and
pipeline model, but clang does not have `-mtune` flag,
we uses `-mcpu` to chose both infos.
2. Add SiFive e31 and u54 cpu which have default march
and pipeline model.
3. Specific `-mcpu` with rocket-rv[32|64] would select
pipeline model only, and use the driver's arch choosing
logic to get default arch.
Reviewers: lenary, asb, evandro, HsiangKai
Reviewed By: lenary, asb, evandro
Tags: #llvm, #clang
Differential Revision: https://reviews.llvm.org/D71124
This patch adds override to several overriding virtual functions that were missing the keyword within the clang/ directory. These were found by the new -Wsuggest-override.
Remove _COMPAT. Drop the ARCHNAME. Remove the non-COMPAT versions
that are no longer needed.
We now only use these macros in places where we need compatibility
with libgcc/compiler-rt. So we don't need to call out _COMPAT
specifically.
We currently have strict floating point/constrained floating point enabled
for all targets. Constrained SDAG nodes get converted to the regular ones
before reaching the target layer. In theory this should be fine.
However, the changes are exposed to users through multiple clang options
already in use in the field, and the changes are _completely_ _untested_
on almost all of our targets. Bugs have already been found, like
"https://bugs.llvm.org/show_bug.cgi?id=45274".
This patch disables constrained floating point options in clang everywhere
except X86 and SystemZ. A warning will be printed when this happens.
Use the new -fexperimental-strict-floating-point flag to force allowing
strict floating point on hosts that aren't already marked as supporting
it (X86 and SystemZ).
Differential Revision: https://reviews.llvm.org/D80952
setFeatureEnabled is a virtual function. setFeatureEnabledImpl
was its implementation. This split was to avoid virtual calls
when we need to call setFeatureEnabled in initFeatureMap.
With C++11 we can use 'final' on setFeatureEnabled to enable
the compiler to perform de-virtualization for the initFeatureMap
calls.
Previously we had to specify the forward and backwards feature dependencies separately which was error prone. And as dependencies have gotten more complex it was hard to be sure the transitive dependencies were handled correctly. The way it was written was also not super readable.
This patch replaces everything with a table that lists what features a feature is dependent on directly. Then we can recursively walk through the table to find the transitive dependencies. This is largely based on how we handle subtarget features in the MC layer from the tablegen descriptions.
Differential Revision: https://reviews.llvm.org/D83273
Instead of detecting the string in 2 places. Just swap the string
to 'sse4.1' or 'sse4.2' at the top of the function.
Prep work for a patch to switch the rest of this function to a
table based system. And I don't want to include 'sse4a' in the
table.
We currently have strict floating point/constrained floating point enabled
for all targets. Constrained SDAG nodes get converted to the regular ones
before reaching the target layer. In theory this should be fine.
However, the changes are exposed to users through multiple clang options
already in use in the field, and the changes are _completely_ _untested_
on almost all of our targets. Bugs have already been found, like
"https://bugs.llvm.org/show_bug.cgi?id=45274".
This patch disables constrained floating point options in clang everywhere
except X86 and SystemZ. A warning will be printed when this happens.
Differential Revision: https://reviews.llvm.org/D80952
Summary:
Change stack alignment from 64 bits to 128 bits to follow ABI correctly.
And add a regression test for datalayout.
Reviewers: simoll, k-ishizaka
Reviewed By: simoll
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #llvm, #ve, #clang
Differential Revision: https://reviews.llvm.org/D83173
Summary:
D80831 changed part of the prefix usage for AIX.
But there are other places getting prefix from DataLayout.
This patch intends to make prefix usage consistent on AIX.
Reviewed by: hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D81270
This replaces the switch statement implementation in the clang's
X86.cpp with a lookup table in X86TargetParser.cpp.
I've used constexpr and copy of the FeatureBitset from
SubtargetFeature.h to store the features in a lookup table.
After the lookup the bitset is translated into strings for use
by the rest of the frontend code.
I had to modify the implementation of the FeatureBitset to avoid
bugs in gcc 5.5 constexpr handling. It seems to not like the
same array entry to be used on the left side and right hand side
of an assignment or &= or |=. I've also used uint32_t instead of
uint64_t and sized based on the X86::CPU_FEATURE_MAX.
I've initialized the features for different CPUs outside of the
table so that we can express inheritance in an adhoc way. This
was one of the big limitations of the switch and we had resorted
to labels and gotos.
Differential Revision: https://reviews.llvm.org/D82731
Summary:
The following feature macros have been added:
__ARM_FEATURE_SVE_BF16
__ARM_FEATURE_SVE_MATMUL_INT8
__ARM_FEATURE_SVE_MATMUL_FP32
__ARM_FEATURE_SVE_MATMUL_FP64
The driver has been updated to enable them accordingly to the value of
the target feature passed at command line.
The SVE ACLE tests using the macros have been modified to work with
the target feature instead of passing the macro at command line.
Reviewers: sdesmalen, efriedma, c-rhodes, kmclaughlin, SjoerdMeijer, rengolin
Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82623
Previously we inferred it if sse4.2 ended up being enabled after
all feature processing. But writing -march=nehalem -mno-sse4.2
should have popcnt enabled.
CPUs with avx always have xsave, but some CPUs without avx also
have xsave. So we shouldn't disable xsave just because avx is
disabled. This would prevent xsave from being enabled with
-march=native on CPUs with xsave and not avx.
But we also don't want -mavx -mno-avx to leave xsave eanabled.
So only enable xsave if avx is enabled after processing all features.
I thought about just not turning xsave on with avx at all, but
there might be someone out there depending on it.
When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for seeming inability to clobber r4.
The problem was that MSP430 code generator of LLVM used an obsolete name FP for that register. Things were worse because when `llc` read an unknown register name, it silently ignored it.
That is, I cannot use `fp` register name from the C code because Clang does not accept it (exactly like GCC). But the accepted name `r4` is not recognised by `llc` (it can be used in listings passed to `llvm-mc` and even `fp` is replace to `r4` by `llvm-mc`). So I can specify any of `fp` or `r4` for the string literal of `asm(...)` but nothing in the clobber list.
This patch replaces `MSP430::FP` with `MSP430::R4` in the backend code (even [MSP430 EABI](http://www.ti.com/lit/an/slaa534/slaa534.pdf) doesn't mention FP as a register name). The R0 - R3 registers, on the other hand, are left as is in the backend code (after all, they have some special meaning on the ISA level). It is just ensured clang is renaming them as expected by the downstream tools. There is probably not much sense in **marking them clobbered** but rename them //just in case// for use at potentially different contexts.
Differential Revision: https://reviews.llvm.org/D82184
These features implicitly enabled XSAVE in the frontend, but not
the backend. Disabling XSAVE in the frontend disabled XSAVEOPT, but
not the other 2. Nothing happened in the backend.
The PREFETCHW instruction was originally part of the 3DNow. But
it was given its own CPUID bit on later CPUs just before 3DNow
was deprecated.
We were setting the -mprfchw flag if -m3dnow was passed or the CPU
supported 3dnow unless -mno-prfchw was passed. But -march=native
on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw.
So -march=k8 will behave differently than -march=native on a K8
for example.
So remove this implicit setting from the frontend and instead
enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled.
Also enable PRFCHW flag on amdfam10/barcelona which seems to be
where this CPUID bit was introduced. That CPU also supported
3dnow.
The PREFETCHW instruction was originally part of the 3DNow. But
it was given its own CPUID bit on later CPUs just before 3DNow
was deprecated.
We were setting the -mprfchw flag if -m3dnow was passed or the CPU
supported 3dnow unless -mno-prfchw was passed. But -march=native
on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw.
So -march=k8 will behave differently than -march=native on a K8
for example.
So remove this implicit setting from the frontend and instead
enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled.
Also enable PRFCHW flag on amdfam10/barcelona which seems to be
where this CPUID bit was introduced. That CPU also supported
3dnow.
This patch enables the following macros when their corresponding
target attributes are set:
__ARM_FEATURE_SVE (+sve)
__ARM_FEATURE_SVE2 (+sve2)
__ARM_FEATURE_SVE2_AES (+sve2-aes)
__ARM_FEATURE_SVE2_BITPERM (+sve2-bitperm)
__ARM_FEATURE_SVE2_SHA3 (+sve2-sha3)
__ARM_FEATURE_SVE2_SM4 (+sve2-sm4)
This implies that the base SVE and SVE2 ACLE (00bet2) are now feature
complete, meaning that all intrinsics are implemented in LLVM and Clang.
Disclaimer:
To implement the ACLE we have had to fix up many parts of LLVM to make it
support scalable vectors. We have also used many target-specific intrinsics
to reduce reliance on parts of LLVM where we know scalable vectors may
not yet be handled properly (e.g. some transformation might drop the
'scalable' flag on a vector type). While we've done a best effort with
the limited testing that is available to us, we're still working to improve the
stability of the implementation. Additionally, Clang may print warnings
that code may have miscompiled. We find this often to be a false alarm
where the wrong interfaces have been used in LLVM and where resulting
code is not actually incorrect. However, this warrants a bug report
and investigation. If you find any bugs or issues, please raise them on
bugs.llvm.org and let us know!
Reviewers: rengolin, efriedma, david-arm, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D81725
This patch removes the PROC macro in favor of CPUKind enum and a
table that contains information about CPUs.
The current information in the table is the CPU name, CPUKind enum
value, key feature for target multiversioning, and Is64Bit capable.
For the strings that are aliases, I've duplicated the information
in the table. This means there are more rows in the table than
CPUKind enums.
This replaces multiple StringSwitch's with loops through the table.
They are linear searches due to the table being more logically
ordered than alphabetical. The StringSwitch's would have also been
linear. I've used StringLiteral on the strings in the table so we
can quickly check the length while searching.
I contemplated having a CPUKind for each string so there was a 1:1
mapping, but didn't want to spread more names to the places that
use the enum.
My ultimate goal here is to store the features for each CPU as a
bitset within the table. Hoping to use constexpr to make this
composable so we can group features and inherit them. After the
table lookup we can turn the bitset into a list of strings for the
frontend. The current switch we have for selecting features for
CPUs has become difficult to maintain while trying to express
inheritance relationships.
Differential Revision: https://reviews.llvm.org/D82414
This was orignally done so we could separate the compatibility
values and the llvm internal only features into a separate entries
in the feature array. This was needed when we explicitly had to
convert the feature into the proper 32-bit chunk at every reference
and we didn't want things moving around.
Now everything is in an array and we have helper funtions or macros
to convert encoding to index. So we renumbering is no longer an
issue.
When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for seeming inability to clobber r4.
The problem was that MSP430 code generator of LLVM used an obsolete name FP for that register. Things were worse because when `llc` read an unknown register name, it silently ignored it.
Differential Revision: https://reviews.llvm.org/D82184