By default `llvm::seq` would happily iterate over enums, which may be unsafe if the enum values are not continuous. This patch disable enum iteration with `llvm::seq` and `llvm::seq_inclusive` and adds two new functions: `enum_seq` and `enum_seq_inclusive`.
To make sure enum iteration is safe, we require users to declare their enum types as iterable by specializing `enum_iteration_traits<SomeEnum>`. Because it's not always possible to add these traits next to enum definition (e.g., for enums defined in external libraries), we provide an escape hatch to allow iteration on per-callsite basis by passing `force_iteration_on_noniterable_enum`.
The main benefit of this approach is that these global declarations via traits can appear just next to enum definitions, making easy to spot when enums are miss-labeled, e.g., after introducing new enum values, whereas `force_iteration_on_noniterable_enum` should stand out and be easy to grep for.
This emerged from a discussion with gchatelet@ about reusing llvm's `Sequence.h` in lieu of https://github.com/GPUOpen-Drivers/llpc/blob/dev/lgc/interface/lgc/EnumIterator.h.
Reviewed By: dblaikie, gchatelet, aaron.ballman
Differential Revision: https://reviews.llvm.org/D107378
This patch allows iterating typed enum via the ADT/Sequence utility.
It also changes the original design to better separate concerns:
- `StrongInt` only deals with safe `intmax_t` operations,
- `SafeIntIterator` presents the iterator and reverse iterator
interface but only deals with safe `StrongInt` internally.
- `iota_range` only deals with `SafeIntIterator` internally.
This design ensures that operations are always valid. In particular,
"Out of bounds" assertions fire when:
- the `value_type` is not representable as an `intmax_t`
- iterator operations make internal computation underflow/overflow
- the internal representation cannot be converted back to `value_type`
Differential Revision: https://reviews.llvm.org/D106279
[llvm-exegesis] Disable the LBR check on AMD
https://bugs.llvm.org/show_bug.cgi?id=48918
The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW.
Theory we've got is that the lbr-checking code could be confused on AMD.
Differential Revision: https://reviews.llvm.org/D97504
New change:
- Surround usages of x86 helper in llvm-exegesis/X86/Target.cpp with ifdef
- Fix bug which caused the caller of getVendorSignature to not have a copy of EAX that it expected.
https://bugs.llvm.org/show_bug.cgi?id=48918
The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW.
Theory we've got is that the lbr-checking code could be confused on AMD.
Differential Revision: https://reviews.llvm.org/D97504
Include x86 intrinsics only when compiling for x86_64
or i386. _MSC_VER no longer implies x86.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D96498
The addition of TILELOADD instructions with a new encoding format
triggered a hard abort instead of proper error reporting due to the use
of `llvm_unreachable` for actually reachable code.
Properly report an error when the encoding format is unknown.
Differential Revision: https://reviews.llvm.org/D90289
This is mostly for the benefit of the LBR latency mode.
Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency.
Differential Revision: https://reviews.llvm.org/D85254
New change: Updated lit.local.cfg to use pass the right argument to llvm-exegesis to actually request the LBR mode.
Differential Revision: https://reviews.llvm.org/D88670
This reverts commit 4fcd1a8e65 as
`llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s` failed on hosts
without LBR supported if the build has LIBPFM enabled. On that host,
`perf_event_open` fails with `EOPNOTSUPP` on LBR config. That change's
basic assumption
> If this is run on a non-supported hardware, it will produce all zeroes for latency.
could not stand as `perf_event_open` system call will fail if the
underlying hardware really don't have LBR supported.
This is mostly for the benefit of the LBR latency mode.
Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency.
Differential Revision: https://reviews.llvm.org/D85254
Starting with Skylake, the LBR contains the precise number of cycles between the two
consecutive branches.
Making use of this will hopefully make the measurements more precise than the
existing methods of using RDTSC.
Differential Revision: https://reviews.llvm.org/D77422
New change: check for existence of field `cycles` in perf_branch_entry before enabling this mode.
This should prevent compilation errors when building for older kernel whose headers don't support it.
From @erichkeane:
```
This patch doesn't seem to build for me:
/iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp: In function ‘llvm::Error llvm::exegesis::parseDataBuffer(const char*, size_t, const void*, const void*, llvm::SmallVector<long int, 4>*)’:
/iusers/ekeane1/workspaces/llvm-project/llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:99:37: error: ‘struct perf_branch_entry’ has no member named ‘cycles’
CycleArray->push_back(Entry.cycles);
I'm on RHEL7, so I have kernel 3.10, so it doesn't have 'cycles'.
According ot this: https://elixir.bootlin.com/linux/v4.3/source/include/uapi/linux/perf_event.h#L963 kernel 4.3 is the first time that 'cycles' appeared in this structure.
```
Starting with Skylake, the LBR contains the precise number of cycles between the two
consecutive branches.
Making use of this will hopefully make the measurements more precise than the
existing methods of using RDTSC.
Differential Revision: https://reviews.llvm.org/D77422
These are documented as using modrm byte of 0xe8, 0xf0, and 0xf8
respectively. But hardware ignore bits 2:0. So 0xe9-0xef is treated
the same as 0xe8. Similar for the other two.
Fixing this required adding 8 new formats to the X86 instructions
to convey this information. Could have gotten away with 3, but
adding all 8 made for a more logical conversion from format to
modrm encoding.
I renumbered the format encodings to keep the register modrm
formats grouped together.
isPrefix was added to support the patches to align branches.
it relies on a switch over instruction names.
This moves those opcodes to a new format so the information is
tablegen and we can just check for a specific value in some bits
in TSFlags instead.
I've left the other function in place for now so that the
existing patches in phabricator will still work. I'll work with
the owner to get them migrated.
function_ref is non-owning, so if we get it as a parameter in constructor,
our reference goes out-of-scope as soon as constructor returns.
Instead, let's just take it as a parameter to the actual `generate()` call
Summary:
Currently, we only have nice exploration for LEA instruction,
while for the rest, we rely on `randomizeUnsetVariables()`
to sometimes generate something interesting.
While that works, it isn't very reliable in coverage :)
Here, i'm making an assumption that while we may want to explore
multi-instruction configs, we are most interested in the
characteristics of the main instruction we were asked about.
Which we can do, by taking the existing `randomizeMCOperand()`,
and turning it on it's head - instead of relying on it to randomly fill
one of the interesting values, let's pregenerate all the possible interesting
values for the variable, and then generate as much `InstructionTemplate`
combinations of these possible values for variables as needed/possible.
Of course, that requires invasive changes to no longer pass just the
naked `Instruction`, but sometimes partially filled `InstructionTemplate`.
As it can be seen from the test, this allows us to explore
`X86::OperandType::OPERAND_COND_CODE` for instructions
that take such an operand.
I'm hoping this will greatly simplify exploration.
Reviewers: courbet, gchatelet
Reviewed By: gchatelet
Subscribers: orodley, mgorny, sdardis, tschuett, jrtc27, atanasyan, mstojanovic, andreadb, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74156
Summary:
It turns out that CUR_DIRECTION is just an internal placeholder, not an actual
valid encoded value.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73343
Summary:
What we're redoing already exists in the X86 backend, it's called
`X86II::getOperandBias`.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73340
Summary:
... instead of crashing.
On typical exmaple is when there are no available registers.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73196
Summary:
Right now when picking a back-to-back instruction at random, we might select
instructions that we do not know how to handle.
Add a ExegesisTarget hook to possibly filter instructions.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73161
Summary:
Add unit test to show the issue: We must select an *aliasing* output
register, not the exact register.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73095
The addition of `inverse_throughput` mode highlighted the disjointedness
of snippet generators and benchmark runners because it used the
`UopsSnippetGenerator` with the `LatencyBenchmarkRunner`.
To keep the code consistent tie the snippet generators to
parallelization/serialization rather than their benchmark runners.
Renaming `LatencySnippetGenerator` -> `SerialSnippetGenerator`.
Renaming `UopsSnippetGenerator` -> `ParallelSnippetGenerator`.
Differential Revision: https://reviews.llvm.org/D72928
Summary: This is a following up to D70874. It adds the initialization of FPCW in llvm-exegesis.
Reviewers: craig.topper, RKSimon, courbet, gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70891
Summary: This patch is used to initialize the new added register MXCSR.
Reviewers: craig.topper, RKSimon
Subscribers: tschuett, courbet, llvm-commits, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70874
This was breaking some bots:
/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/include/llvm/Support/Error.h:483:5: required from ‘llvm::Expected<T>::Expected(OtherT&&, typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type*) [with OtherT = std::vector<llvm::exegesis::CodeTemplate>&; T = std::vector<llvm::exegesis::CodeTemplate>; typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type = void]’
/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/tools/llvm-exegesis/lib/X86/Target.cpp:238:20: required from here
/usr/include/c++/6/bits/stl_construct.h:75:7: error: use of deleted function ‘llvm::exegesis::CodeTemplate::CodeTemplate(const llvm::exegesis::CodeTemplate&)’
{ ::new(static_cast<void*>(__p)) _T1(std::forward<_Args>(__args)...); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llvm-svn: 374149
Summary:
This will help for PR32326.
This shows the well-known issue with `RBP` and `R13` as base registers.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits, RKSimon, andreadb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68646
llvm-svn: 374146
Summary:
This adds a `-max-configs-per-opcode` option to limit the number of
configs per opcode.
Reviewers: gchatelet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68642
llvm-svn: 374054