https://bugs.llvm.org/show_bug.cgi?id=48918
The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW.
Theory we've got is that the lbr-checking code could be confused on AMD.
Differential Revision: https://reviews.llvm.org/D97504
We implement getHostCPUName() for AIX via systemcfg interfaces since access to the processor version register is a privileged operation. We return a value based on the current processor implementation mode.
This fixes the cpu detection used by clang for `-mcpu=native`.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D95966
All of these families were claiming to be a73 based, which was causing
-mcpu/mtune=native to never use the newer features available to these
cores.
Goes through each and bumps the individual cores to their respective Big
counterparts. Since this code path doesn't support big.little detection,
there was already a precedent set with the Qualcomm line to choose the
big cores only.
Adds a comment on each line for the product's name that the part number
refers to. Confirmed on-device and through Linux header naming
convections.
Additionally newer SoCs mix CPU implementer parts from multiple
implementers. Both 0x41 (ARM) and 0x51 (Qualcomm) in the Snapdragon case
This was causing a desync in information where the scan at the start to
find the implementer would mismatch the part scan later on.
Now scan for both implementer and part at the start so these stay in
sync.
Differential Revision: https://reviews.llvm.org/D94954
This patch mainly made the following changes:
1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.
Differential Revision: https://reviews.llvm.org/D89105
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
Support -march=sapphirerapids for x86.
Compare with Icelake Server, it includes 14 more new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86503
On z/OS, the information is stored in the Common System Data Area
(CSD). It is the number of CPs allocated to the current LPAR.
Reviewers: aganea, hubert.reinterpertcast, MaskRay
Reviewed By: hubert.reinterpertcast
Differential Revision: https://reviews.llvm.org/D85531
Commit https://reviews.llvm.org/rGcd53ded557c3 attempts to fix the
computation in computeHostNumPhysicalCores() to respect Affinity.
However, the GLIBC wrapper of the affinity system call fails with
a default size of cpu_set_t on systems that have more than 1024 CPUs.
This just fixes the computation on such large machines.
computeHostNumPhysicalCores() is designed to respect CPU affinity.
D84764 used sysconf(_SC_NPROCESSORS_ONLN) which does not respect
affinity.
SupportTests Threading.PhysicalConcurrency may fail if taskset -c is specified.
ThinLTO is run using a single thread on Linux on Power. The
compute_thread_count() routine calls getHostNumPhysicalCores which
returns -1 by default, and so `MaxThreadCount is set to 1.
unsigned llvm::ThreadPoolStrategy::compute_thread_count() const {
int MaxThreadCount = UseHyperThreads
? computeHostNumHardwareThreads()
: sys::getHostNumPhysicalCores();
if (MaxThreadCount <= 0)
MaxThreadCount = 1;
…
}
Fix: provide custom implementation of getHostNumPhysicalCores for
Linux on Power and Linux on Z.
Reviewed By: Kai, uweigand
Differential Revision: https://reviews.llvm.org/D84764
For model 6 CPUs, we have a fallback detection method based on
available features. That mechanism should be enough to detect
these early family 6 CPUs as they only differ in the features
used by the detection anyway.
Rather than converting type/subtype into strings, just directly
select the string as part of family/model decoding. This avoids
the need for creating fake Type/SubTypes for CPUs not supported
by compiler-rtl. I've left the Type/SubType in place where it matches
compiler-rt so that the code can be diffed, but the Type/SubType
is no longer used by Host.cpp.
compiler-rt was already updated to select strings that aren't used
so the code will look similar.
This patch upstreams support for the Arm-v8 Cortex-A78 and Cortex-X1
processors for AArch64 and ARM.
In detail:
- Adding cortex-a78 and cortex-x1 as cpu options for aarch64 and arm targets in clang
- Adding Cortex-A78 and Cortex-X1 CPU names and ProcessorModels in llvm
details of the CPU can be found here:
https://www.arm.com/products/cortex-xhttps://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev
Reviewers: t.p.northover, dmgreen
Reviewed By: dmgreen
Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D83206
This time without the change to make operator| use operator&=.
That seems to be the source of the gcc 5.3 miscompile.
Original commit message:
These represent the same thing but 64BIT only showed up from
getHostCPUFeatures providing a list of featuers to clang. While
EM64T showed up from getting the features for a named CPU.
EM64T didn't have a string specifically so it would not be passed
up to clang when getting features for a named CPU. While 64bit
needed a name since that's how it is index.
Merge them by filtering 64bit out before sending features to clang
for named CPUs.
It gets miscompiled with GCC 5.3, causing Clang to crash with
"error: unknown target CPU 'x86-64'"
See the llvm-commits thread for reproduction steps.
This reverts commit 51b0da731a.
This seems to be leftover copied from an older implementation
of getHostCPUName where we needed this to check the name of
CPU vendor. We don't check the CPU vendor at all in
getHostCPUFeatures so this union and the variable are unneeded.
These represent the same thing but 64BIT only showed up from
getHostCPUFeatures providing a list of featuers to clang. While
EM64T showed up from getting the features for a named CPU.
EM64T didn't have a string specifically so it would not be passed
up to clang when getting features for a named CPU. While 64bit
needed a name since that's how it is index.
Merge them by filtering 64bit out before sending features to clang
for named CPUs.
These represent the same thing but 64BIT only showed up from
getHostCPUFeatures providing a list of featuers to clang. While
EM64T showed up from getting the features for a named CPU.
EM64T didn't have a string specifically so it would not be passed
up to clang when getting features for a named CPU. While 64bit
needed a name since that's how it is index.
Merge them by filtering 64bit out before sending features to clang
for named CPUs.
This patch upstreams support for the Arm-v8 Cortex-A77
processor for AArch64 and ARM.
In detail:
- Adding cortex-a77 as a cpu option for aarch64 and arm targets in clang
- Cortex-A77 CPU name and ProcessorModel in llvm
details of the CPU can be found here:
https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a77
and a similar submission to GCC can be found here:
e0664b7a63
The following people contributed to this patch:
- Luke Geeson
- Mikhail Maltsev
Reviewers: t.p.northover, dmgreen, ostannard, SjoerdMeijer
Reviewed By: dmgreen
Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D82887
Brand index was a feature some Pentium III and Pentium 4 CPUs.
It provided an index into a software lookup table to provide a
brand name for the CPU. This is separate from the family/model.
It's unclear to me why this index being non-zero was used to
block checking family/model. I think the effect of this is that
-march=native was not working correctly on the CPUs that have a
non-zero brand index. They are all about 20 years old so this
probably hasn't affected many users.
We have three 32 bit variables containing feature bits. But our
enum is a flat 96 bit space. So we need to pick which of the
variables to use based on the bit value. We used to do this
manually by mentioning the correct variable and subtracting an
offset from the enum. But this is error prone.
The exact same #if is already inside isCpuIdSupported and causes
it to return true. The definition of isCpuIdSupported isn't
conditional so we should be able just rely on its body doing
the right thing.
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
This adds the family/model returned by CPUID for some Intel
Comet Lake CPUs. Instruction set and tuning wise these are
the same as "skylake".
These are not in the Intel SDM yet, but these should be correct.
This patch upstreams support for the ARM Armv8.1m cpu Cortex-M55.
In detail adding support for:
- mcpu option in clang
- Arm Target Features in clang
- llvm Arm TargetParser definitions
details of the CPU can be found here:
https://developer.arm.com/ip-products/processors/cortex-m/cortex-m55
Reviewers: chill
Reviewed By: chill
Subscribers: dmgreen, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74966
This patch upstreams support for the AArch64 Armv8-A cpu Cortex-A34.
In detail adding support for:
- mcpu option in clang
- AArch64 Target Features in clang
- llvm AArch64 TargetParser definitions
details of the cpu can be found here:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a34
Reviewers: SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: SjoerdMeijer, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74483
Change-Id: Ida101fc544ca183a0a0e61a1277c8957855fde0b
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
This is a continuation of D70262
The previous patch as listed above added the future CPU in clang. This patch
adds the future CPU in the PowerPC backend. At this point the patch simply
assumes that a future CPU will have the same characteristics as pwr9. Those
characteristics may change with later patches.
Differential Revision: https://reviews.llvm.org/D70333
Darwin lazily saves the AVX512 context on first use [1]: instead of checking
that it already does to figure out if the OS supports AVX512, trust that
the kernel will do the right thing and always assume the context save
support is available.
[1] https://github.com/apple/darwin-xnu/blob/xnu-4903.221.2/osfmk/i386/fpu.c#L174
Reviewers: ab, RKSimon, craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D70453
Summary: Currently there is no implementation of `sys::getHostCPUName()` for Darwin ARM targets. This patch makes it so that LLVM running on ARM makes reasonable guesses about the CPU features of the host CPU.
Reviewers: t.p.northover, lhames, efriedma
Reviewed By: efriedma
Subscribers: rjmccall, efriedma, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69597
The recently announced IBM z15 processor implements the architecture
already supported as "arch13" in LLVM. This patch adds support for
"z15" as an alternate architecture name for arch13.
The patch also uses z15 in a number of places where we used arch13
as long as the official name was not yet announced.
llvm-svn: 372435
-Deprecate -mmpx and -mno-mpx command line options
-Remove CPUID detection of mpx for -march=native
-Remove MPX from all CPUs
-Remove MPX preprocessor define
I've left the "mpx" string in the backend so we don't fail on old IR, but its not connected to anything.
gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX
Differential Revision: https://reviews.llvm.org/D66669
llvm-svn: 370393
Support -march=tigerlake for x86.
Compare with Icelake Client, It include 4 more new features ,they are
avx512vp2intersect, movdiri, movdir64b, shstk.
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D65840
llvm-svn: 368543
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Assembler/disassembler support for new instructions.
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of arch13 as host processor.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as supported -march name.
llvm-svn: 365932
Returns "cortex-a73" for 3rd and 4th gen Kryo; not precisely correct,
but close enough.
Differential Revision: https://reviews.llvm.org/D63099
llvm-svn: 363013
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Author: LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel
Reviewed By: craig.topper
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60550
llvm-svn: 360017
CMPXCHG8B was introduced on i586/pentium generation.
If its not enabled, limit the atomic width to 32 bits so the AtomicExpandPass will expand to lib calls. Unclear if we should be using a different limit for other configs. The default is 1024 and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG.
Differential Revision: https://reviews.llvm.org/D59576
llvm-svn: 356631
This patch enables the following
1) AMD family 17h "znver2" tune flag (-march, -mcpu).
2) ISAs that are enabled for "znver2" architecture.
3) For the time being, it uses the znver1 scheduler model.
4) Tests are updated.
5) Scheduler descriptions are yet to be put in place.
Reviewers: craig.topper
Differential Revision: https://reviews.llvm.org/D58343
llvm-svn: 354897
We implicitly mark this feature as enabled when the target is 64-bits, but our detection code for -march=native didn't support it so you can't detect it on 32-bit targets.
llvm-svn: 353963
JMP32 instructions has been added to eBPF ISA. They are 32-bit variants of
existing BPF conditional jump instructions, but the comparison happens on
low 32-bit sub-register only, therefore some unnecessary extensions could
be saved.
JMP32 instructions will only be available for -mcpu=v3. Host probe hook has
been updated accordingly.
JMP32 instructions will only be enabled in code-gen when -mattr=+alu32
enabled, meaning compiling the program using sub-register mode.
For JMP32 encoding, it is a new instruction class, and is using the
reserved eBPF class number 0x6.
This patch has been tested by compiling and running kernel bpf selftests
with JMP32 enabled.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 353384
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This is skylake-avx512 with the addition of avx512vnni ISA.
Patch by Jianping Chen
Differential Revision: https://reviews.llvm.org/D54785
llvm-svn: 347681
This small patch updates the CPU detection for Cavium processors when
-mcpu=native is passed on compile-line.
Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D51939
llvm-svn: 343897
After r341022, we more strictly check the 64bit feature in X86Subtargets constructor when a 64-bit triple is used. If we don't infer this feature for autodetected CPUs we might incorrectly report an error if the CPU name wasn't autodetected to a CPU that supports 64-bit.
llvm-svn: 342914
Re-add the feature flag for invpcid, which was removed in r294561.
Add an intrinsic, which always uses a 32 bit integer as first argument,
while the instruction actually uses a 64 bit register in 64 bit mode
for the INVPCID_TYPE argument.
Reviewers: craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D47141
llvm-svn: 333255
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.
Comes with a clang patch (D46881).
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46882
llvm-svn: 332705
See r331124 for how I made a list of files missing the include.
I then ran this Python script:
for f in open('filelist.txt'):
f = f.strip()
fl = open(f).readlines()
found = False
for i in xrange(len(fl)):
p = '#include "llvm/'
if not fl[i].startswith(p):
continue
if fl[i][len(p):] > 'Config':
fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
found = True
break
if not found:
print 'not found', f
else:
open(f, 'w').write(''.join(fl))
and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.
No intended behavior change.
llvm-svn: 331184
LLVM_ON_WIN32 is set exactly with MSVC and MinGW (but not Cygwin) in
HandleLLVMOptions.cmake, which is where _WIN32 defined too. Just use the
default macro instead of a reinvented one.
See thread "Replacing LLVM_ON_WIN32 with just _WIN32" on llvm-dev and cfe-dev.
No intended behavior change.
This moves over all uses of the macro, but doesn't remove the definition
of it in (llvm-)config.h yet.
llvm-svn: 331127