2c196bbc6b asserted that
`SmallVector::push_back` doesn't invalidate the parameter when it needs
to grow. Do the same for `resize`, `append`, `assign`, `insert`, and
`emplace_back`.
Differential Revision: https://reviews.llvm.org/D91744
- The new option, -arcmt-action, is a simple enum based option.
- The driver is modified to translate the existing -ccc-acmt-* options accordingly
Depends on D83298
Reviewed By: Bigcheese
Original patch by Daniel Grumberg.
Differential Revision: https://reviews.llvm.org/D83315
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).
Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D90039
This enables automatically parsing and generating CC1 arguments for options where two flags control the same field, e.g. -fexperimental-new-pass-manager and -fno-experimental new pass manager.
Reviewed By: Bigcheese, dexonsmith
Original patch by Daniel Grumberg.
Differential Revision: https://reviews.llvm.org/D83071
Describe in the BackEnd Developer's Guide. Instrument a few backends.
Remove an old unused timing facility. Add a null backend for timing
the parser.
Differential Revision: https://reviews.llvm.org/D91388
Merge existing marhsalling info kinds and add some primitives to
express flag options that contribute to a bitfield.
Depends on D82574
Original patch by Daniel Grumberg.
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D82860
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
This reverts commit 09248a5d25.
Some builds are broken. I suspect a `static constexpr` in a class missing a
definition out of class (required pre-c++17).
Merge existing marhsalling info kinds and add some primitives to
express flag options that contribute to a bitfield.
Depends on D82574
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D82860
This ports a number of OpenCL and fast-math flags for floating point
over to the new marshalling infrastructure.
As part of this, `Opt{In,Out}FFlag` were enhanced to allow other flags to
imply them, via `DefaultAnyOf<>`. For example:
```
defm signed_zeros : OptOutFFlag<"signed-zeros", ...,
"LangOpts->NoSignedZero",
DefaultAnyOf<[cl_no_signed_zeros, menable_unsafe_fp_math]>>;
```
defines `-fsigned-zeros` (`false`) and `-fno-signed-zeros` (`true`)
linked to the keypath `LangOpts->NoSignedZero`, defaulting to `false`,
but set to `true` implicitly if one of `-cl-no-signed-zeros` or
`-menable-unsafe-fp-math` is on.
Note that the initial patch was written Daniel Grumberg.
Differential Revision: https://reviews.llvm.org/D82756
Some of these were found by running clang-format over the generated
code, although that complains about far more issues than I have fixed
here.
Differential Revision: https://reviews.llvm.org/D90937
This patch add some parsing and clause validity tests for the set directive.
It makes use of the possibility introduces in patch D90770 to check the restriction
were one of the default_async, device_num and device_type clauses is required but also
not more than once on the set directive.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D90771
Validity check introduce in D90241 are a bit too restrict and this patch propose to losen
them a bit. The duplicate clauses is now check only between the three allowed lists and between the
requiredClauses and allowedClauses lists. This allows to enable some check where a clause can be
required but also appear only once on the directive. We found these kind of restriction useful
on the set directive in OpenACC for example.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D90770
This patch adds some helper in the DirectiveLanguage wrapper to initialize it from
the RecordKeeper and validate the records. This simplify arguments in lots of function
since only the DirectiveLanguge is passed.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D90358
Patch fixes case when sched class has write and read variants belonging
to different processor models.
Differential revision: https://reviews.llvm.org/D89777
D9844 fixed a problem where the ss suffix in the AsmString "cmp${cc}ss"
was recognised as the X86 SS register, by only recognising a token as a
register name if it is "isolated", i.e. surrounded by separator
characters.
In the AMDGPU backend there are many operands like $clamp which expand
to an optional string " clamp" including the preceding space, so we want
to have AsmStrings including sequences like "vcc$clamp" where vcc is a
register name.
This patch relaxes the rules for an isolated token, to say that it's OK
if the token is immediately followed by a '$'.
Differential Revision: https://reviews.llvm.org/D90315
Check for duplicate clauses associated with directive. Clauses can appear only once
in the 4 lists associated with each directive (allowedClauses, allowedOnceClauses,
allowedExclusiveClauses, requiredClauses). Duplicates were already present (removed with this
patch) or were introduce in new patches by mistake (D89861).
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D90241
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.
Differential Revision: https://reviews.llvm.org/D90253
Implementation of instructions table.get, table.set, table.grow,
table.size, table.fill, table.copy.
Missing instructions are table.init and elem.drop as they deal with
element sections which are not yet implemented.
Added more tests to tables.s
Differential Revision: https://reviews.llvm.org/D89797
In CodeGenDAGPatterns.cpp we were relying upon TypeSize comparison
operators for ordering types, when we can actually just use the known
minimum size since the scalable property is already being taken into
account. Also, in TypeInfer::EnforceSameSize I fixed some implicit
TypeSize->uint64_t casts by changing the code to test the equality
of TypeSize objects instead.
In other places I have replaced calls to getSizeInBits() with
getFixedSizeInBits() because we are only ever expecting integer values.
Differential Revision: https://reviews.llvm.org/D88947
This makes diffing with the manual tables easier. And if we ever
directly use the autogenerated tables instead of the manual tables
we'll need them to be in sorted order for the binary search.
This patch adds support for assemble disassemble intrinsics
for MMA.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D88739
Commit 6e56046f65 may trigger SEGV in llvm-tablegen if the latter
is built with -DLLVM_OPTIMIZED_TABLEGEN=OFF. The reason of SEGV was
accessing stale memory after expansion of std::vector.
Summary: Since willreturn will soon be added as default attribute, we can end up with both noreturn and willreturn on the same intrinsic. This was exposed by llvm.wasm.throw which has IntrNoReturn.
Reviewers: jdoerfert, arsenm
Differential Revision: https://reviews.llvm.org/D88644
This change implements generation of a function which may be used by a backend to check if a given instruction is supported for a specific subtarget.
Reviewers: sdesmalen
Differential Revision: https://reviews.llvm.org/D88214
When nesting INSERT_SUBREG and EXTRACT_SUBREG, GlobalISelEmitter would
fail to find the register class of the nested node. This patch fixes
that for registers with subregs.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D88487
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
This patch adds support for the lxvp, lxvpx, plxvp, stxvp, stxvpx and pstxvp
instructions in the PowerPC backend. These instructions allow loading and
storing VSX register pairs. This patch also adds the VSRp register class
definition needed for these instructions.
Differential Revision: https://reviews.llvm.org/D84359
When generating matching tables for GlobalISel, TableGen would output
"::zero_reg" whenever encountering the zero_reg, which in turn would
result in compilation error. This patch fixes that by instead outputting
NoRegister (== 0), which is the same result that TableGen produces when
generating matching tables for ISelDAG.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86215
Building LLVM with -DEXPENSIVE_CHECKS fails with the following error
message with libstdc++ in debug mode:
Error: comparison doesn't meet irreflexive requirements,
assert(!(a < a)).
The patch fixes the comparison function SizeOrder by returning false
when comparing two equal items.
The "name" of a non-leaf complex pattern (MY_PAT $op1, $op2) is
"MY_PAT:op1:op2" and the ones with same "name" represent same operand.
Add 'same operand check' for this case.
Differential Revision: https://reviews.llvm.org/D87351
Predicates with 'let PredicateCodeUsesOperands = 1' want to examine
matched operands. When we encounter predicate code that uses operands,
analyze its named operand arguments and create a map between argument
index and name. Later, when leaf node with name is encountered, emit
GIM_RecordNamedOperand that will store that operand at its argument
index in operand list. This operand list will be an argument to c++
code of the predicate.
Differential Revision: https://reviews.llvm.org/D87285
Tablegen does not have link time dependencies on MC. Having llvm-tblgen
depend on it causes it to be rebuilt in the gn build every time somebody
touches any cpp file in llvm/lib/MC* or llvm/lib/DebugInfo/Codeview*.
Touching tablegen invalidates most of the rest of the build, and
re-running it takes a while. This is is annoying for me when swapping
between branches that touch CodeView logic.
This dep was added to LLVMBuild.txt back in 2018, and presumably it was
carried over into the gn build.
Differential Revision: https://reviews.llvm.org/D87553
This patch adds NoUndef to Intrinsics.td.
The attribute is attached to llvm.assume's operand, because llvm.assume(undef)
is UB.
It is attached to pointer operands of several memory accessing intrinsics
as well.
This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check
unnecessary, so it is removed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86576
When optimizing the table, PointerToAnyOperandMatchers would be
incorrectly reported as identical even though they have different
SizeInBits values. This bug was due to failing to overload the
isIdentical() method, which this patch addresses.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86199
Intrinsic properties can now be set to default and applied to all
intrinsics. If the attributes are not needed, the user can opt-out by
setting the DisableDefaultAttributes flag to true.
Differential Revision: https://reviews.llvm.org/D70365
This is to initially handleg immAllOnesV, which should match
G_BUILD_VECTOR or G_BUILD_VECTOR_TRUNC. In the future, it could be
used for other patterns cases that map to multiple G_* instructions,
such as G_ADD and G_PTR_ADD.
This commit introduced a non-trivial compile time regression that needs
to be addressed: https://reviews.llvm.org/D70365#2227627
Given that it is unclear how long that will take, I'll revert it for
now.
This reverts commit eedf18fc1f.
Intrinsic properties can now be set to default and applied to all
intrinsics. If the attributes are not needed, the user can opt-out by
setting the DisableDefaultAttributes flag to true.
Differential Revision: https://reviews.llvm.org/D70365
This patch fixes a bug which skipped
adding predicate matcher for a pattern in many cases.
For example, if predicate is Load and
its memoryVT is non-null then the loop
continues and never reaches to the end which
adds the predicate matcher. This patch moves the
matcher addition to the top of the loop
so that it gets added regardless of contextual checks
later in the loop.
Other way to fix this issue is to remove all "continue" statements
in checks and let the loop continue till end.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D83034
Use the TableGen directive back-end to generate code for the clauses unparsing.
Reviewed By: sscalpone, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D85851
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
The assembly parser "canonicalizes" the mnemonics it processes at an
early level by making them lowercase. The goal of this is presumably to
allow assembly to be case-insensitive. However, if one declares an
instruction with a mnemonic using uppercase letters, then it will
never get matched, since the generated lookup tables for the
AsmMatcherEmitter didn't lower() their inputs. This made it difficult to
have instructions that get printed using a mnemonic that includes
uppercase letters, since they could not be parsed.
To fix this problem, this patch adds a few calls to lower() to make the
lookup tables used in AsmMatcherEmitter be case-insensitive. This allows
instruction mnemonics with uppercase letters to be parsed.
Differential Revision: https://reviews.llvm.org/D85858
These should really match either G_BUILD_VECTOR or
G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing
mechanism for matching alternative opcodes. There is GIM_SwitchOpcode,
but it seems to assume it's oly only used for matcher optimization.
I could also omit any opcode check and rely on the matcher directly
checking the opcode, but the table optimizer currently assumes there
has to be an opcode check.
Also doesn't try to handle undef elements like the DAG version.
This patch adds the translation of the proc_bind clause in a
parallel operation.
The values that can be specified for the proc_bind clause are
specified in the OMP.td tablegen file in the llvm/Frontend/OpenMP
directory. From this single source of truth enumeration for
proc_bind is generated in llvm and mlir (used in specification of
the parallel Operation in the OpenMP dialect). A function to return
the enum value from the string representation is also generated.
A new header file (DirectiveEmitter.h) containing definitions of
classes directive, clause, clauseval etc is created so that it can
be used in mlir as well.
Reviewers: clementval, jdoerfert, DavidTruby
Differential Revision: https://reviews.llvm.org/D84347
This patch takes advantage of the directive information and tablegen generation
to replace the clauses class parse tree and in the dump parse tree sections.
Reviewed By: sscalpone
Differential Revision: https://reviews.llvm.org/D85549
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.
Some work started in 20aafa3156 to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is the pretty much the last
operation which matters which isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.
Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure what it was originally
testing matters anymore.
This patch takes advantage of the directive information and tablegen generation
to replace the clauses class parse tree and in the dump parse tree sections.
Reviewed By: sscalpone
Differential Revision: https://reviews.llvm.org/D85549
This patch remove duplicated code between the check-omp-structure and the check-acc-structure
and unify it into a check-directive-structure templated class.
Reviewed By: kiranchandramohan, sscalpone, ichoyjx
Differential Revision: https://reviews.llvm.org/D85104
Add wrapper classes to to access record's fields. This makes it easier to
pass record information to the diverse functions for code generation.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D84612
This patch refactors the llvm tools namely, llvm-stress and sancov,
as well as the llvm TableGen utility, to use the new InitLLVM
interface which encapsulates PrettyStackTrace.
This is from https://reviews.llvm.org/D70702, but only for LLVM.
Reviewed-by: Kai
Differential Revision: https://reviews.llvm.org/D83484
The DAG behavior allows matchching input patterns with a single result
to the first result of an output instruction that defines multiple
results. The remaining defs are implicitly dead.
This starts to fix using manual selection for AMDGPU add/sub (although
it's still needed, mostly because it's also still needed for
G_PTR_ADD).
This is very similar to 243970d03cace2, but handling a slightly
different form of predicated operations. When starting with a pattern of
the form select(p, BinOp(x, y), x), Instcombine will often transform
this to BinOp(x, select(p, y, 0)), where 0 is the identity value of the
binop (0 for adds/subs, 1 for muls, -1 for ands etc). This adds the
patterns that transforms those back into predicated binary operations.
There is also a very minor adjustment to tablegen null_frag in here, to
allow it to also be recognized as a PatLeaf node, so that it can be used
in MVE_TwoOpPattern to easily exclude the cases where we do not need the
alternate transform.
Differential Revision: https://reviews.llvm.org/D84091
In clang <3.9 the `unique_ptr` constructor that is supposed to allow
for Derived to Base conversion does not work. Remove this if we drop
support for such configurations.
This is the same fix as in fda901a987, and it updates the comments
to better reflect the actual issue. The same thing reproduces with
libc++ with older clangs.
This is a workaround for a bug in older versions of Clang when. The
constructor that is supposed to allow for Derived to Base conversion
does not work. Remove this if we drop support for such configurations.
This gives a nice error if you accidentally try to use an empty list for
the RegTypes of a RegisterClass.
Differential Revision: https://reviews.llvm.org/D78285
The size of VTList that is pushed into this container is usually 1, but
often 6 or 7. Change the vector to SmallVector to eliminate frequent
mallocs. This happens hundreds of thousands of times in each tablegen
execution during the LLVM/clang build.
https://reviews.llvm.org/D83849
Currently custom code predicates can only really be used for
contextless checks tied to a single instruction (e.g. check the def
for hasOneUse). If you do want to inspect the input instructions in
the source pattern, you cannot without re-verifying the opcode and
type checks implied by the patterns, since this check was emitted
before any operand constraints. Really, these are pattern level
predicates that implicitly depend on the instruction and operand
checks.
Introduce a filtering function so the custom predicate is emitted
last. I'm not sure this is the most elegant solution. It seems like
this is really a different thing from the InstructionMatcher/IPM_
predicate kinds. I initially tried keeping this in a separate
predicate list, but that also seemed awkward.
This only half fixes the problem I'm trying to solve. The AMDGPU
pattern I'm attempting to port also uses the PredicateCodeUsesOperands
feature to allow checks on the source operands when the input pattern
is commuted. Really the emitter should reject the pattern since it
doesn't handle this case, but at this point it would be more
productive to just implement this.
This was emitting the raw value for the reg class ID with a comment
for the actual class name. Switch to emitting the qualified enum name
instead, which obviates the need for the comment and also helps keep
the lit tests on the emitter output more stable.
Summary:
This patch is enabling the generation of clauses enum sets for semantics check in Flang through
tablegen. Enum sets and directive - sets map is generated by the new tablegen infrsatructure for OpenMP
and other directive languages.
The semantic checks for OpenMP are modified to use this newly generated map.
Reviewers: DavidTruby, sscalpone, kiranchandramohan, ichoyjx, jdoerfert
Reviewed By: DavidTruby, ichoyjx
Subscribers: mgorny, yaxunl, hiraditya, guansong, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83326
Summary:
Diff D83176 moved the last piece of code from OMPConstants.cpp and now this file was only
useful to include the tablegen generated file. This patch replace OMPConstants.cpp with OMP.cpp
generated by tablegen.
Reviewers: sstefan1, jdoerfert, jdenny
Reviewed By: sstefan1
Subscribers: mgorny, yaxunl, hiraditya, guansong, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83583
Summary:
Change the test in isAllowedClauseForDirective from if with multiple conditions
to a main switch on directive and then switches on clause for each directive. Version
check is still done with a condition in the return statment.
Reviewers: jdoerfert, jdenny
Reviewed By: jdenny
Subscribers: yaxunl, guansong, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83363
Summary:
Generate the isAllowedClauseForDirective function from tablegen. This patch introduce
the VersionedClause in the tablegen file so that clause can be encapsulated in this class to
specify a range of validity on a directive.
VersionedClause has default minVersion, maxVersion so it can be used without them or
minVersion.
Reviewers: jdoerfert, jdenny
Reviewed By: jdenny
Subscribers: yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82982
Summary:
Follow up to D81736. Move getOpenMPDirectiveKind, getOpenMPClauseKind, getOpenMPDirectiveName and
getOpenMPClauseName to the new tablegen code generation. The code is generated in a new file named OMP.cpp.inc
Reviewers: jdoerfert, jdenny, thakis
Reviewed By: jdoerfert, jdenny
Subscribers: mgorny, yaxunl, hiraditya, guansong, sstefan1, llvm-commits, thakis
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82405
In RISC-V vector extension, users could group multiple vector registers
as one pseudo register. In mixed width operations, users could use
partial vector registers to reduce the register pressure. The parameter
to control register grouping and partial use is called LMUL. LMUL is a
part of the type. So, we have a bunch of vector types. In order to
support all these types, we need new MVT types in LLVM. In this patch, I
added several MVT types that are used in RISC-V vector implementation.
This is a standalone patch for MVT types without RISC-V related implementation.
Differential revision: https://reviews.llvm.org/D81724
Summary:
Separate introduction of IntrNoFree property as suggested in D70365
Reviewers: arsenm, nhaehnle
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82587
This change includes the following:
- Add additional information in the relevant table-gen files to encode
the necessary information to automatically parse the argument into a
CompilerInvocation instance and to generate the appropriate command
line argument from a CompilerInvocation instance.
- Extend OptParserEmitter to emit the necessary macro tables as well as
constant tables to support parsing and generating command line
arguments for options that provide the necessary information.
- Port some options to use this new system for parsing and generating
command line arguments.
Differential Revision: https://reviews.llvm.org/D79796
Summary:
As discussed previously when landing patch for OpenMP in Flang, the idea is
to share common part of the OpenMP declaration between the different Frontend.
While doing this it was thought that moving to tablegen instead of Macros will also
give a cleaner and more powerful way of generating these declaration.
This first part of a future series of patches is setting up the base .td file for
DirectiveLanguage as well as the OpenMP version of it. The base file is meant to
be used by other directive language such as OpenACC.
In this first patch, the Directive and Clause enums are generated with tablegen
instead of the macros on OMPConstants.h. The next pacth will extend this
to other enum and move the Flang frontend to use it.
Reviewers: jdoerfert, DavidTruby, fghanim, ABataev, jdenny, hfinkel, jhuber6, kiranchandramohan, kiranktp
Reviewed By: jdoerfert, jdenny
Subscribers: arphaman, martong, cfe-commits, mgorny, yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm, #openmp, #clang
Differential Revision: https://reviews.llvm.org/D81736
Summary:
As discussed previously when landing patch for OpenMP in Flang, the idea is
to share common part of the OpenMP declaration between the different Frontend.
While doing this it was thought that moving to tablegen instead of Macros will also
give a cleaner and more powerful way of generating these declaration.
This first part of a future series of patches is setting up the base .td file for
DirectiveLanguage as well as the OpenMP version of it. The base file is meant to
be used by other directive language such as OpenACC.
In this first patch, the Directive and Clause enums are generated with tablegen
instead of the macros on OMPConstants.h. The next pacth will extend this
to other enum and move the Flang frontend to use it.
Reviewers: jdoerfert, DavidTruby, fghanim, ABataev, jdenny, hfinkel, jhuber6, kiranchandramohan, kiranktp
Reviewed By: jdoerfert, jdenny
Subscribers: cfe-commits, mgorny, yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm, #openmp, #clang
Differential Revision: https://reviews.llvm.org/D81736
These are documented as using modrm byte of 0xe8, 0xf0, and 0xf8
respectively. But hardware ignore bits 2:0. So 0xe9-0xef is treated
the same as 0xe8. Similar for the other two.
Fixing this required adding 8 new formats to the X86 instructions
to convey this information. Could have gotten away with 3, but
adding all 8 made for a more logical conversion from format to
modrm encoding.
I renumbered the format encodings to keep the register modrm
formats grouped together.
Summary:
Adds two features to the generated rule disable option:
- '*' - Disable all rules
- '!<foo>' - Re-enable rule(s)
- '!foo' - Enable rule named 'foo'
- '!5' - Enable rule five
- '!4-9' - Enable rule four to nine
- '!foo-bar' - Enable rules from 'foo' to (and including) 'bar'
(the '!' is available to the generated disable option but is not part of the underlying and determines whether to call setRuleDisabled() or setRuleEnabled())
This is intended to support unit testing of combine rules so
that you can do:
GeneratedCfg.setRuleDisabled("*")
GeneratedCfg.setRuleEnabled("foo")
to ensure only a specific rule is in effect. The rule is still
required to be included in a combiner though
Also added --...-only-enable-rule=X,Y which is effectively an
alias for --...-disable-rule=*,!X,!Y and as such interacts
properly with disable-rule.
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81889
Summary:
Adds the ability to add members to a generated combiner via
a State base class. In the current AArch64PreLegalizerCombiner
this is used to make Helper available without having to
provide it to every call.
As part of this, split the command line processing into a
separate object so that it still only runs once even though
the generated combiner is constructed more frequently.
Depends on D81862
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81863
Summary:
This allows combiners to delegate to other helpers or depend
on additional information. It's not great as an overall
solution though as callers must provide the argument on every call, even for
static data like an additional helper. Another patch will follow to
support additional members of the generated combiner.
Reviewers: aditya_nandakumar, bogner, aemerson, paquette, volkan, arsenm
Reviewed By: aditya_nandakumar
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81862
Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::HandleByVal` without marking it `override`.
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81365
Summary: This is a follow up on D81196, fixing one more call site.
Reviewers: courbet
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81268
When `variable_ops` is specified in `InOperandList` of instruction,
it behaves as expected, i.e., does not count as operand.
So for `(ins variable_ops)` instruction description will have 0
operands. However when used in OutOperandList it is counted as
operand. So `(outs variable_ops)` results in instruction with
one def.
This patch makes behavior of `variable_ops` in `out` list to match
that of `in` list.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D81095
Summary:
TableGen interprets braces ('{}') in the asm string of instruction aliases as
variants but when defining aliases with literal braces they have to be escaped
to prevent them being removed.
Braces are escaped with '\\', for example:
def FooBraces : InstAlias<"foo \\{$imm\\}", (foo IntOperand:$imm)>;
Although when TableGen is emitting the assembly writer (-gen-asm-writer)
the AsmString that gets emitted is:
AsmString = "foo \{$\x01\}";
In c/c++ braces don't need to be escaped which causes compilation
warnings:
warning: use of non-standard escape character '\{' [-Wpedantic]
This patch fixes the issue by unescaping the flattened alias asm string
in the asm writer, by replacing '\{\}' with '{}'.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D79991
- This allow us to specify the (minimal) alignment on an intrinsic's
arguments and, more importantly, the return value.
Differential Revision: https://reviews.llvm.org/D80422
- Argument attribute needs specifiying through `ArgIndex<n>`
(corresponding to `FirstArgIndex`) to distinguish explicitly from the
index number from the overloaded type list.
- In addition, `RetIndex` (corresponding to `ReturnIndex`) and
`FuncIndex` (corresponding to `FunctionIndex`) are introduced for us
to associate attributes on the return value and potentially function
itself.
Differential Revision: https://reviews.llvm.org/D80422
This is quite expensive and it's already available.
Just ReadLegalValueTypes is taking 4 seconds for me in a debug build
for AMDGPU's -gen-instr-info, and this was introducing a second call.
We need to use it to handle <16 x double> indirect indexes
in the AMDGPU BE.
The only visible change from adding it is in ARM cost model.
To me it looks reasonable. With doubling a vector size it
quadruples the cost up to the size 8 and then it did only
double it. Now it also quadruples, which seems a logical
progression to me.
Actual AMDGPU code is to follow, this is a common part, plus
load/store legalization in the AMDGPU BE not to break what
works now.
Differential Revision: https://reviews.llvm.org/D79952
Summary:
In TableGen's instruction selection table generator, references to
register classes were handled by generating a matcher table entry in the
form of "EmitStringInteger, MVT::i32, 'RegisterClassID'". This ID is in
fact the enum integer value corresponding to the register class.
However, both the table generator and the table consumer
(SelectionDAGISel) assume that this ID is less than or equal to 127,
i.e. at most 7 bits. Values greater than this threshold cause completely
wrong behaviours in the instruction selection process.
This patch adds a check to determine if the enum integer value is
greater than the limit of 127. In finding so, the generator emits an
"EmitInteger" instead, which properly supports values with arbitrary
sizes.
Commit f8d044bbcf fixed the very same bug
for register subindices. The present patch now extends this cover to
register classes.
Reviewers: rampitec
Reviewed By: rampitec
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79705
Set the right target name in clang/examples/Attribute.
Add a missing dependency in the TableGen GlobalISel sublibrary.
Skip building the Bye pass plugin example on windows; plugins
that should have undefined symbols that are found in the host
process aren't supported on windows - this matches what was done
for a unit test in bc8e442188.
This means AttrBuilder will always create a sorted set of attributes and
we can skip the sorting step. Sorting attributes is surprisingly
expensive, and I recently made it worse by making it use array_pod_sort.
This was hitting the default instruction constraint code which uses
the register classes in the instruction def, which REG_SEQUENCE does
not have.
Fixes not constraining the register class for AMDGPU fneg/fabs
patterns, which would fail when the use was another generic,
unconstrained instruction.
Another oddity I noticed is that the temporary registers are created
with an unnecessary, but incorrect 16-bit LLT but this shouldn't
matter.
I'm also still unclear why root and sub-instructions have to be
handled differently.
Summary:
Some compressed instructions match against negative values; store
immediates as a signed value such that these patterns will now match
the intended instructions.
Reviewers: asb, lenary, PaoloS
Reviewed By: asb
Subscribers: rbar, johnrusso, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, evandro, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76767
Follow-up of D72172 and D72180
This patch passes `uint64_t Address` to print methods of PC-relative
operands so that subsequent target specific patches can change
`*InstPrinter::print{Operand,PCRelImm,...}` to customize the output.
Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by
llvm-objdump.
```
// Current llvm-objdump -d output
aarch64: 20000: bl #0
ppc: 20000: bl .+4
x86: 20000: callq 0
// Ideal output
aarch64: 20000: bl 0x20000
ppc: 20000: bl 0x20004
x86: 20000: callq 0x20005
// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
aarch64: 20000: bl 20000
ppc: 20000: bl 0x20004
x86: 20000: callq 20005
```
In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`):
```
case 12:
// CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J...
- printPCRelImm(MI, 0, O);
+ printPCRelImm(MI, Address, 0, O);
return;
```
Some targets have 2 `printOperand` overloads, one without `Address` and
one with `Address`. They should annotate derived `Operand` properly with
`let OperandType = "OPERAND_PCREL"`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76574
This reverts commit e9f22fd429.
When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
This patch generates TableGen descriptions for the specified register
banks which contain a list of register sizes corresponding to the
available HwModes. The appropriate size is used during codegen according
to the current HwMode. As this HwMode was not available on generation,
it is set upon construction of the RegisterBankInfo class. Targets
simply need to provide the HwMode argument to the
<target>GenRegisterBankInfo constructor.
The RISC-V RegisterBankInfo constructor has been updated accordingly
(plus an unused argument removed).
Differential Revision: https://reviews.llvm.org/D76007
This patch rewrites the RegisterBankEmitter class to derive
RegisterClassHierarchy from CodeGenTarget::getRegBank() rather than
constructing our own copy. All are now accessed through a const
reference.
Differential Revision: https://reviews.llvm.org/D76006
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).
AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.
To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.
This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html
Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.
At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it are unnecessary.
Differential Revision: https://reviews.llvm.org/D74338
Lots of headers pass around MemoryBuffer objects, but very few open
them. Let those that do include FileSystem.h.
Saves ~250 includes of Chrono.h & FileSystem.h:
$ diff -u thedeps-before.txt thedeps-after.txt | grep '^[-+] ' | sort | uniq -c | sort -nr
254 - ../llvm/include/llvm/Support/FileSystem.h
253 - ../llvm/include/llvm/Support/Chrono.h
237 - ../llvm/include/llvm/Support/NativeFormatting.h
237 - ../llvm/include/llvm/Support/FormatProviders.h
192 - ../llvm/include/llvm/ADT/StringSwitch.h
190 - ../llvm/include/llvm/Support/FormatVariadicDetails.h
...
This requires duplicating the file_t typedef, which is unfortunate. I
sunk the choice of mapping mode down into the cpp file using variable
template specializations instead of class members in headers.
Summary:
The type used to represent functional units in MC is
'unsigned', which is 32 bits wide. This is currently
not a problem in any upstream target as no one seems
to have hit the limit on this yet, but in our
downstream one, we need to define more than 32
functional units.
Increasing the size does not seem to cause a huge
size increase in the binary (an llc debug build went
from 1366497672 to 1366523984, a difference of 26k),
so perhaps it would be acceptable to have this patch
applied upstream as well.
Subscribers: hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71210
isPrefix was added to support the patches to align branches.
it relies on a switch over instruction names.
This moves those opcodes to a new format so the information is
tablegen and we can just check for a specific value in some bits
in TSFlags instead.
I've left the other function in place for now so that the
existing patches in phabricator will still work. I'll work with
the owner to get them migrated.
This was checking for default operands in the current DAG instruction,
rather than the correct result operand list. I'm not entirly sure how
this managed to work before, but was failing for me when multiple
default operands were overridden.
Summary:
Previously TableGen would crash trying to print the undefined value as
an integer.
Change-Id: I3900071ceaa07c26acafb33bc49966d7d7a02828
Reviewers: nhaehnle
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74210
Summary:
In the DAG pattern backend, `SimplifyTree` simplifies a pattern by
removing bitconverts between two identical types. But that function is
also run on the fragments list in instances of `PatFrags`, in which
the types haven't been specified yet. So the input and output of the
bitconvert always evaluate to the empty set of types, which makes them
compare equal. So the test always passes, and bitconverts are
unconditionally removed from the PatFrag RHS.
Fixed by spotting the empty type set and using it to inhibit the
optimization.
Reviewers: nhaehnle, hfinkel
Reviewed By: nhaehnle
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74627
Tablegen's DAGISelMatcher emits integers in a VBR format,
so if an integer is below 128 it can fit into a single
byte, otherwise high bit is set, next byte is used etc.
MatcherTable is essentially an unsigned char table. When
SelectionDAGISel parses the table it does a reverse translation.
In a situation when numeric value of an integer to emit is
unknown it can be emitted not as OPC_EmitInteger but as
OPC_EmitStringInteger using a symbolic name of the value.
In this situation the value should not exceed 127.
One of the situations when OPC_EmitStringInteger is used is
if we need to emit a subreg into a matcher table. However,
number of subregs can exceed 127. Currently last defined subreg
for AMDGPU is 192. That results in a silent bug in the ISel
with matcher reading from an invalid offset.
Fixed this bug to emit actual VBR encoded value for a subregs
which value exceeds 127.
Differential Revision: https://reviews.llvm.org/D74368
Summary:
this patch makes tablegen generate llvm attributes in a more generic and simpler (at least to me).
changes: make tablegen generate
...
ATTRIBUTE_ENUM(Alignment,align)
ATTRIBUTE_ENUM(AllocSize,allocsize)
...
which can be used to generate most of what was previously used and more.
Tablegen was also generating attributes from 2 identical files leading to identical output. so I removed one of them and made user use the other.
Reviewers: jdoerfert, thakis, aaron.ballman
Reviewed By: aaron.ballman
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72455
Summary:
this patch makes tablegen generate llvm attributes in a more generic and simpler (at least to me).
changes: make tablegen generate
...
ATTRIBUTE_ENUM(Alignment,align)
ATTRIBUTE_ENUM(AllocSize,allocsize)
...
which can be used to generate most of what was previously used and more.
Tablegen was also generating attributes from 2 identical files leading to identical output. so I removed one of them and made user use the other.
Reviewers: jdoerfert, thakis, aaron.ballman
Reviewed By: aaron.ballman
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72455
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
This changes the generated (Instr|Asm|Reg|Regclass)Name tables from this
form:
extern const char HexagonInstrNameData[] = {
/* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0,
/* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0,
/* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0,
/* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0,
[...]
};
...to this:
extern const char HexagonInstrNameData[] = {
/* 0 */ "G_FLOG10\0"
/* 9 */ "ENDLOOP0\0"
/* 18 */ "V6_vdd0\0"
/* 26 */ "PS_vdd0\0"
[...]
};
This should make debugging and exploration a lot easier for mortals,
while providing a significant compile-time reduction for common compilers.
To avoid issues with low implementation limits, this is disabled by
default for visual studio.
To force output one way or the other, pass
`--long-string-literals=<bool>` to `tablegen`
Reviewers: mstorsjo, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D73044
A variation of this patch was originally committed in ce23515f5a and
then reverted in e464b31c due to build failures.
This previously only handled EXTRACT_SUBREGs from leafs, such as
operands directly in the original output. Handle extracting from a
result instruction.
This would hit the "Biggest class wasn't first" assert in
getMatchingSubClassWithSubRegs in a future patch for EXTRACT_SUBREG
handling.
Mips defines 4 identical register classes (MSA128B, MSA128H, MSA128BW,
MSA128D). These have the same set of registers, and only differ by the
isel type. I believe this is an ill formed way of defining registers,
that probably is just to work around the inconvenience of mixing
different types in a single register class in DAG patterns.
Since these all have the same size, they would all sort to the
beginning, but you would not necessarily get the same super register
at the front as the assert enforces. Breaking the ambiguity by also
sorting by name doesn't work, since each of these register classes all
want to be first. Force sorting of the original register class if the
size is the same.
This changes the generated (Instr|Asm|Reg|Regclass)Name tables from this
form:
extern const char HexagonInstrNameData[] = {
/* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0,
/* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0,
/* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0,
/* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0,
[...]
};
...to this:
extern const char HexagonInstrNameData[] = {
/* 0 */ "G_FLOG10\0"
/* 9 */ "ENDLOOP0\0"
/* 18 */ "V6_vdd0\0"
/* 26 */ "PS_vdd0\0"
[...]
};
This should make debugging and exploration a lot easier for mortals,
while providing a significant compile-time reduction for common compilers.
To avoid issues with low implementation limits, this is disabled by
default for visual studio or when cross-compiling.
To force output one way or the other, pass
`--long-string-literals=<bool>` to `tablegen`
Reviewers: mstorsjo, rnk
Subscribers: llvm-commit
Differential Revision: https://reviews.llvm.org/D73044
Summary:
In the DFAPacketizer we copy the Transitions array
into a map in order to later access the transitions
based on a "Current State/Action" pair as a key.
This map lives in the Automaton object used by the DFAPacketizer.
It is never changed during the life of the object after
having been created during the creation of the Automaton
itself.
This map creation can make the creation of a DFAPacketizer
quite expensive if the target contains a considerable
amount of transition states.
Considering that TableGen already generates a
sorted list of transitions by State/Action pairs
we could just use that directly in our Automaton
and search entries with std::lower_bound instead of copying
it in a map and paying the execution time and memory cost.
Reviewers: jmolloy, ThomasRaoux
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72682
These return temporary Optional<> values which are immediately
destroyed. I'm not sure why no sanitizers seem to have caught this,
but I encountered crashes on these in a future patch.
The maps for dealing with the relationships between different register
classes and subregister indexes rely on unique pointers for every
class/index. By constructing a second copy of CodeGenRegBank, two
different pointer values existed for a given subregister depending on
where you were querying.
Use the existing CodeGenRegBank owned by the CodeGenTarget instead of
constructing a second copy. This avoids incorrectly failing map
lookups in a future change.
In x86Disassembler{OneByte,TwoByte,...}Codes,
"/* EmptyTable */" is very common. Omitting it saves lots of space.
Also, there is no need to display a table entry in multiple lines.
It is also common that the whole OpcodeDecision is { MODRM_ONEENTRY, 0}.
Make use of zero-initialization.
Add a predicate to MCInstDesc that allows tools to determine whether an
instruction authenticates a pointer. This can be used by diagnostic
tools to hint at pointer authentication failures.
Differential Revision: https://reviews.llvm.org/D70329
rdar://55089604
For arguments that are not expected to be materialized with
G_CONSTANT, this was emitting predicates which could never match. It
was first adding a meaningless LLT check, which would always fail due
to the operand not being a register.
Infer the cases where a literal should check for an immediate operand,
instead of a register This avoids needing to invent a special way of
representing timm literal values.
Also handle immediate arguments in GIM_CheckLiteralInt. The comments
stated it handled isImm() and isCImm(), but that wasn't really true.
This unblocks work on the selection of all of the complicated AMDGPU
intrinsics in future commits.
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.
Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.
Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.
Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.
Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.
One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
Summary:
Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions.
Also add a regression test for llvm.org/PR44272, since this finishes fixing it.
Reviewers: thakis, rnk
Reviewed By: thakis
Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72417
Summary:
GIMatchTree's job is to build a decision tree by zipping all the
GIMatchDag's together.
Each DAG is added to the tree builder as a leaf and partitioners are used
to subdivide each node until there are no more partitioners to apply. At
this point, the code generator is responsible for testing any untested
predicates and following any unvisited traversals (there shouldn't be any
of the latter as the getVRegDef partitioner handles them all).
Note that the leaves don't always fit into partitions cleanly and the
partitions may overlap as a result. This is resolved by cloning the leaf
into every partition it belongs to. One example of this is a rule that can
match one of N opcodes. The leaf for this rule would end up in N partitions
when processed by the opcode partitioner. A similar example is the
getVRegDef partitioner where having rules (add $a, $b), and (add ($a, $b), $c)
will result in the former being in the partition for successfully
following the vreg-def and failing to do so as it doesn't care which
happens.
Depends on D69151
Fixed the issues with the windows bots which were caused by stdout/stderr
interleaving.
Reviewers: bogner, volkan
Reviewed By: volkan
Subscribers: lkail, mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69152
Copy the logic from the existing handling in the DAG matcher emittter.
This will enable some AMDGPU pattern cleanups without breaking
GlobalISel tests, and eventually handle importing more patterns.
The test is a bit annoying since the sections seem to randomly sort
themselves if anything else is added in the future.
All the windows bots are failing match-tree.td and there's no obvious cause that
I can see. It's not just the %p formatting problem. My best guess is that
there's an ordering issue too but I'll need further information to figure that
out. Revert while I'm investigating.
This reverts commit 64f1bb5cd2 and 77d4b5f5fe
Summary:
GIMatchTree's job is to build a decision tree by zipping all the
GIMatchDag's together.
Each DAG is added to the tree builder as a leaf and partitioners are used
to subdivide each node until there are no more partitioners to apply. At
this point, the code generator is responsible for testing any untested
predicates and following any unvisited traversals (there shouldn't be any
of the latter as the getVRegDef partitioner handles them all).
Note that the leaves don't always fit into partitions cleanly and the
partitions may overlap as a result. This is resolved by cloning the leaf
into every partition it belongs to. One example of this is a rule that can
match one of N opcodes. The leaf for this rule would end up in N partitions
when processed by the opcode partitioner. A similar example is the
getVRegDef partitioner where having rules (add $a, $b), and (add ($a, $b), $c)
will result in the former being in the partition for successfully
following the vreg-def and failing to do so as it doesn't care which
happens.
Depends on D69151
Reviewers: bogner, volkan
Reviewed By: volkan
Subscribers: lkail, mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69152
This assumed a single pattern if there was a predicate. Relax this a
bit, and allow multiple patterns as long as they have the same class.
This was only broken for the DAG path. GlobalISel seems to have
handled this correctly already.
Summary:
This is used by the extending_loads combine to tell the apply step which
use is the preferred one to fold and the other uses should be re-written
to consume.
Depends on D69117
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69147
This reverts commit e62e760f29.
The issue @uweigand raised should have been fixed by iterating over the
vector that owns the operand list data instead of the FoldingSet.
The MSVC issue raised by @thakis should have been fixed by relaxing the
regexes a little. I don't have a Windows machine available to test that so
I tested it by using `perl -p -e 's/0x([0-9a-f]+)/\U\1\E/g' to convert the
output of %p to the windows style.
I've guessed at the issue @phosek raised as there wasn't enough information
to investigate it. What I think is happening on that bot is the -debug
option isn't available because the second stage build is a release build.
I'm not sure why other release-mode bots didn't report it though.
and follow-on patches.
This is breaking a few build bots and local builds with follow-up already
on the patch thread.
This reverts commits 390c8baa54 and
520e3d66e7.
Summary:
When we build the walk across these DAG's we need to be able to reach every node
from the roots. Flip and traversal edges (so that use->def becomes def->uses)
that make nodes unreachable. Note that early on we'll just error out on these
flipped edges as def->uses edges are more complicated to match due to their
one->many nature.
Depends on D69077
Reviewers: volkan, bogner
Subscribers: llvm-commits
Summary:
The MatchDag structure is a representation of the checks that need to be
performed and the dependencies that limit when they can happen.
There are two kinds of node in the MatchDag:
* Instrs - Represent a MachineInstr
* Predicates - Represent a check that needs to be performed (i.e. opcode, is register, same machine operand, etc.)
and two kinds of edges:
* (Traversal) Edges - Represent a register that can be traversed to find one instr from another
* Predicate Dependency Edges - Indicate that a predicate requires a piece of information to be tested.
For example, the matcher:
(match (MOV $t, $s),
(MOV $d, $t))
with MOV declared as an instruction of the form:
%dst = MOV %src1
becomes the following MatchDag with the following instruction nodes:
__anon0_0 // $t=getOperand(0), $s=getOperand(1)
__anon0_1 // $d=getOperand(0), $t=getOperand(1)
traversal edges:
__anon0_1[src1] --[t]--> __anon0_0[dst]
predicate nodes:
<<$mi.getOpcode() == MOV>>:$__anonpred0_2
<<$mi.getOpcode() == MOV>>:$__anonpred0_3
and predicate dependencies:
__anon0_0 ==> __anonpred0_2[mi]
__anon0_0 ==> __anonpred0_3[mi]
The result of this parse is currently unused but can be tested
using -gicombiner-stop-after-parse as done in parse-match-pattern.td. The
dump for testing includes a graphviz format dump to allow the rule to be
viewed visually.
Later on, these MatchDag's will be used to generate code and to build an
efficient decision tree.
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: arsenm, mgorny, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69077
Summary:
This copy ensures that debug location information is kept for
compressed instructions. There are places where both compressInstruction and
uncompressInstruction are called that were not doing this copy, discarding some
debug info.
This change merely moves the copy into the generated file, so you cannot forget
to copy the location over when compressing or uncompressing.
Reviewers: asb, luismarques
Reviewed By: luismarques
Subscribers: sameer.abuasal, aprantl, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67493
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
AMDGPU was the last in tree target to use this tablegen mode. I plan to
split up the global intrinsic enum similar to the way that clang
diagnostics are split up today. I don't plan to build on this mode.
Reviewers: arsenm, echristo, efriedma
Reviewed By: echristo
Differential Revision: https://reviews.llvm.org/D71318
Before this change, the *InstPrinter.cpp files of each target where some
of the slowest objects to compile in all of LLVM. See this snippet produced by
ClangBuildAnalyzer:
https://reviews.llvm.org/P8171$96
Search for "InstPrinter", and see that it shows up in a few places.
Tablegen was emitting a large switch containing a sequence of operand checks,
each of which created many conditions and many BBs. Register allocation and
jump threading both did not scale well with such a large repetitive sequence of
basic blocks.
So, this change essentially turns those control flow structures into
data. The previous structure looked like:
switch (Opc) {
case TGT::ADD:
// check alias 1
if (MI->getOperandCount() == N && // check num opnds
MI->getOperand(0).isReg() && // check opnd 0
...
MI->getOperand(1).isImm() && // check opnd 1
AsmString = "foo";
break;
}
// check alias 2
if (...)
...
return false;
The new structure looks like:
OpToPatterns: Sorted table of opcodes mapping to pattern indices.
\->
Patterns: List of patterns. Previous table points to subrange of
patterns to match.
\->
Conds: The if conditions above encoded as a kind and 32-bit value.
See MCInstPrinter.cpp for the details of how the new data structures are
interpreted.
Here are some before and after metrics.
Time to compile AArch64InstPrinter.cpp:
0m29.062s vs. 0m2.203s
size of the obj:
3.9M vs. 676K
size of clang.exe:
97M vs. 96M
I have not benchmarked disassembly performance, but typically
disassemblers are bottlenecked on IO and string processing, not alias
matching, so I'm not sure it's interesting enough to be worth doing.
Reviewers: RKSimon, andreadb, xbolva00, craig.topper
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D70650
This reverts commit 3f76260dc0.
Breaks at least these tests on Windows:
Clang :: Driver/clang-offload-bundler.c
Clang :: Driver/clang-offload-wrapper.c
For lldb and dsymutil, the command guide is essentially a copy of its
help output generated by libOption. Making sure the two stay in sync is
tedious and error prone. Given that we already generate the help from a
tablegen file, we might as well generate the RST as well.
This adds a tablegen backend for generating Sphinx/RST command guides
from the tablegen file.
Differential revision: https://reviews.llvm.org/D70610
* Implements scalable size queries for MVTs, split out from D53137.
* Contains a fix for FindMemType to avoid using scalable vector type
to contain non-scalable types.
* Explicit casts for several places where implicit integer sign
changes or promotion from 32 to 64 bits caused problems.
* CodeGenDAGPatterns will treat scalable and non-scalable vector types
as different.
Reviewers: greened, cameron.mcinally, sdesmalen, rovka
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D66871
AMDGPU has some atomic instructions that do not return the previous
result, and can only be selected if there are no uses. The source
pattern will only match if the use is empty, so it should be safe to
discard the result.
Summary:
To drive the automaton we used a uint64_t as an action type. This
contained the transition's resource requirements as a conjunction:
(a OR b) AND (b OR c)
We encoded this conjunction as a sequence of four 16-bit bitmasks.
This limited the number of addressable functional units to 16, which
is quite low and has bitten many people in the past.
Instead, the DFAEmitter now generates a lookup table from InstrItinerary
class (index of the ItinData inside the ProcItineraries) to an internal
action index which is essentially a dense embedding of the conjunctive
form. Because we never materialize the conjunctive form, we no longer
have the 16 FU restriction.
In this patch we limit to 64 functional units due to using a uint64_t
bitmask in the DFAEmitter. Now that we've decoupled these representations
we can increase this in future.
Reviewers: ThomasRaoux, kparzysz, majnemer
Reviewed By: ThomasRaoux
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69110
Summary:
Found by PVS Studio
Not familiar with this code; no testcase.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69741
If there is a dag node with a variable number of operands that has at
least N operands (for some non-negative N), and multiple patterns with
that node with different number of operands, we would drop the number of
operands check in patterns with N operands, presumably because it's
guaranteed in such case that none of the per-operand checks will access
the operand list out-of-bounds.
Except semantically the check is about having exactly N operands, not at
least N operands, and a backend might rely on it to disambiguate
different patterns.
In this patch we change the condition on emitting the number of operands
check from "the instruction is not guaranteed to have at least as many
operands as are checked by the pattern being matched" to "the
instruction is not guaranteed to have a specific number of operands".
We're relying (still) on the rest of the CodeGenPatterns mechanics to
validate that the pattern itself doesn't try to access more operands
than there is in the instruction in cases when the instruction does have
fixed number of operands, and on the machine verifier to validate at
runtime that particular MIs like that satisfy the constraint as well.
Reviewers: dsanders, qcolombet
Reviewed By: qcolombet
Subscribers: arsenm, rovka, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69653
D68992 / rL375086 refactored the packetizer and removed a bunch of logic. Unfortunately it creates an Automaton object whenever a DFAPacketizer is required. These objects have no longevity, and in particular on a debug build the population of the Automaton's transition map from the underlying table is very slow (because it is called ~10 times per MachineFunction, in the testcase I'm looking at).
This patch changes Automaton to wrap its underlying constant data in std::shared_ptr, which allows trivial copy construction. The DFAPacketizer creation function now creates a static archetypical Automaton and copies that whenever a new DFAPacketizer is required.
This takes a testcase down from ~20s to ~0.5s in debug mode.
llvm-svn: 375240
Summary:
This is a NFC change that removes the NFA->DFA construction and emission logic from DFAPacketizerEmitter and instead uses the generic DFAEmitter logic. This allows DFAPacketizer to use the Automaton class from Support and remove a bunch of logic there too.
After this patch, DFAPacketizer is mostly logic for grepping Itineraries and collecting functional units, with no state machine logic. This will allow us to modernize by removing the 16-functional-unit limit and supporting non-itinerary functional units. This is all for followup patches.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68992
llvm-svn: 375086
Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.
The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.
Depends on D68426
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68438
llvm-svn: 375067
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.
The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.
Depends on D68424
Reviewers: bogner, volkan
Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68426
llvm-svn: 375057
Assume that, ModelA has scheduling resource for InstA and ModelB has scheduling resource for InstB. This is what the llvm::MCSchedClassDesc looks like:
llvm::MCSchedClassDesc ModelASchedClasses[] = {
...
InstA, 0, ...
InstB, -1,...
};
llvm::MCSchedClassDesc ModelBSchedClasses[] = {
...
InstA, -1,...
InstB, 0,...
};
The -1 means invalid num of macro ops, while it is valid if it is >=0. This is what we look like now:
llvm::MCSchedClassDesc ModelASchedClasses[] = {
...
InstA, 0, ...
InstB, 0,...
};
llvm::MCSchedClassDesc ModelBSchedClasses[] = {
...
InstA, 0,...
InstB, 0,...
};
And compiler hit the assertion here because the SCDesc is valid now for both InstA and InstB.
Differential Revision: https://reviews.llvm.org/D67950
llvm-svn: 374524
When an instruction has an encoding definition for only a subset of
the available HwModes, ensure we just avoid generating an encoding
rather than crash.
llvm-svn: 374150
Summary:
While working with DagInit's, it's often the case that you expect the
operator to be a reference to a def. This patch adds a wrapper for this
common case to reduce the amount of boilerplate callers need to duplicate
repeatedly.
getOperatorAsDef() returns the record if the DagInit has an operator that is
a DefInit. Otherwise, it prints a fatal error.
There's only a few pre-existing examples in LLVM at the moment and I've
left a few instances of the code this simplifies as they had more specific
error messages than the generic one this produces. I'm going to be using
this a fair bit in my subsequent patches.
Reviewers: bogner, volkan, nhaehnle
Reviewed By: nhaehnle
Subscribers: nhaehnle, hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68424
llvm-svn: 374101
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.
Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.
llvm-svn: 373937
Summary:
This patch introduces -gen-automata, a backend for generating deterministic finite-state automata.
DFAs are already generated by the -gen-dfa-packetizer backend. This backend is more generic and will
hopefully be used to implement the DFA generation (and determinization) for the packetizer in the
future.
This backend allows not only generation of a DFA from an NFA (nondeterministic finite-state
automaton), it also emits sidetables that allow a path through the DFA under a sequence of inputs to
be analyzed, and the equivalent set of all possible NFA transitions extracted.
This allows a user to not just answer "can my problem be solved?" but also "what is the
solution?". Clearly this analysis is more expensive than just playing a DFA forwards so is
opt-in. The DFAPacketizer has this behaviour already but this is a more compact and generic
representation.
Examples are bundled in unittests/TableGen/Automata.td. Some are trivial, but the BinPacking example
is a stripped-down version of the original target problem I set out to solve, where we pack values
(actually immediates) into bins (an immediate pool in a VLIW bundle) subject to a set of esoteric
constraints.
Reviewers: t.p.northover
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67968
llvm-svn: 373718
Summary:
This will handle expansion of C++ fragments in the declarative combiner
including custom predicates, and escapes into C++ to aid the migration
effort.
Fixed the -DLLVM_LINK_LLVM_DYLIB=ON using DISABLE_LLVM_LINK_LLVM_DYLIB when
creating the library. Apparently it automatically links to libLLVM.dylib
and we don't want that from tablegen.
Reviewers: bogner, volkan
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68288
> llvm-svn: 373551
llvm-svn: 373651
Summary:
This will handle expansion of C++ fragments in the declarative combiner
including custom predicates, and escapes into C++ to aid the migration
effort.
Reviewers: bogner, volkan
Subscribers: mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68288
llvm-svn: 373551
Summary:
This is the first of a series of patches extracted from a much bigger WIP
patch. It merely establishes the tblgen pass and the way empty combiner
helpers are declared and integrated into a combiner info.
The tablegen pass takes a -combiners option to select the combiner helper
that will be generated. This can be given multiple values to generate
multiple combiner helpers at once. Doing so helps to minimize parsing
overhead.
The reason for creating a GlobalISel subdirectory in utils/TableGen is that
there will be quite a lot of non-pass files (~15) by the time the patch
series is done.
Reviewers: volkan
Subscribers: mgorny, hiraditya, simoncook, Petar.Avramovic, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68286
llvm-svn: 373527
Summary:
This allows intrinsics such as the following to be defined:
- declare <n x 4 x i32> @llvm.something.nxv4f32(<n x 4 x i32>, <n x 4 x i1>, <n x 4 x float>)
...where <n x 4 x i32> is derived from <n x 4 x float>, but
the element needs bitcasting to int.
Reviewers: c-rhodes, sdesmalen, rovka
Reviewed By: c-rhodes
Subscribers: tschuett, hiraditya, jdoerfert, llvm-commits, cfe-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68021
llvm-svn: 373437
Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD
were mapped to the same VEX instruction. But we should keep
the commutableness when change the opcode.
llvm-svn: 373303
https://reviews.llvm.org/D66773
The OpTypes::OperandType was creating an enum for all records that
inherit from Operand, but in reality there are operands for instructions
that inherit from other types too. In particular, RegisterOperand and
RegisterClass. This commit adds those types to the list of operand types
that are tracked by the OperandType enum.
Patch by: nlguillemot
llvm-svn: 372641
We're now using a lot more TargetConstant nodes in SelectionDAG.
But we were still telling isel to convert some of them
to TargetConstants even though they already are. This is because
isel emits a conversion anytime the output pattern has a an 'imm'.
I guess for patterns in instructions we take the 'timm' from the
'set' pattern, but for Pat patterns with explcicit output we
previously had to say 'imm' since 'timm' wasn't allowed in outputs.
llvm-svn: 372525
Summary:
Both match the type of another intrinsic parameter of a vector type, but where each element is subdivided to form a vector with more elements of a smaller type.
Subdivide2Argument allows intrinsics such as the following to be defined:
- declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 8 x i16>)
Subdivide4Argument allows intrinsics such as:
- declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 16 x i8>)
Tests are included in follow up patches which add intrinsics using these types.
Reviewers: sdesmalen, SjoerdMeijer, greened, rovka
Reviewed By: sdesmalen
Subscribers: rovka, tschuett, jdoerfert, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67549
llvm-svn: 372380
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)
This was missing one switch to getTargetConstant in an untested case.
llvm-svn: 372338
Much like ValueTypeByHwMode/RegInfoByHwMode, this patch allows targets
to modify an instruction's encoding based on HwMode. When the
EncodingInfos field is non-empty the Inst and Size fields of the Instruction
are ignored and taken from EncodingInfos instead.
As part of this promote getHwMode() from TargetSubtargetInfo to MCSubtargetInfo.
This is NFC for all existing targets - new code is generated only if targets
use EncodingByHwMode.
llvm-svn: 372320
This broke the Chromium build, causing it to fail with e.g.
fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>
See llvm-commits thread of r372285 for details.
This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.
> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372314
Encode them directly as an imm argument to G_INTRINSIC*.
Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.
This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.
SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.
Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.
The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.
This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.
llvm-svn: 372285
Summary:
Also fixup rL371928 for cases that occur on our out-of-tree backend
There were still quite a few intermediate APInts and this caused the
compile time of MCCodeEmitter for our target to jump from 16s up to
~5m40s. This patch, brings it back down to ~17s by eliminating pretty
much all of them using two new APInt functions (extractBitsAsZExtValue(),
insertBits() but with a uint64_t). The exact conditions for eliminating
them is that the field extracted/inserted must be <=64-bit which is
almost always true.
Note: The two new APInt API's assume that APInt::WordSize is at least
64-bit because that means they touch at most 2 APInt words. They
statically assert that's true. It seems very unlikely that someone
is patching it to be smaller so this should be fine.
Reviewers: jmolloy
Reviewed By: jmolloy
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67686
llvm-svn: 372243
The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't.
llvm-svn: 372146
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
Some VLIW instruction sets are Very Long Indeed. Using uint64_t constricts the Inst encoding to 64 bits (naturally).
This change switches CodeEmitter to a mode that uses APInts when Inst's bitwidth is > 64 bits (NFC for existing targets).
When Inst.BitWidth > 64 the prototype changes to:
void TargetMCCodeEmitter::getBinaryCodeForInstr(const MCInst &MI,
SmallVectorImpl<MCFixup> &Fixups,
APInt &Inst,
APInt &Scratch,
const MCSubtargetInfo &STI);
The Inst parameter returns the encoded instruction, the Scratch parameter is used internally for manipulating operands and is exposed so that the underlying storage can be reused between calls to getBinaryCodeForInstr. The goal is to elide any APInt constructions that we can.
Similarly the operand encoding prototype changes to:
getMachineOpValue(const MCInst &MI, const MCOperand &MO, APInt &op, SmallVectorImpl<MCFixup> &Fixups, const MCSubtargetInfo &STI);
That is, the operand is passed by reference as APInt rather than returned as uint64_t.
To reiterate, this APInt mode is enabled only when Inst.BitWidth > 64, so this change is NFC for existing targets.
llvm-svn: 371928
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
llvm-svn: 371722
The scalar f64 patterns don't work yet because they fail on multiple
results from the unused implicit def of scc in the result bit
operation.
llvm-svn: 371542
Reapply with fix to reduce resources required by the compiler - use
unsigned[2] instead of std::pair. This causes clang and gcc to compile
the generated file multiple times faster, and hopefully will reduce
the resource requirements on Visual Studio also. This fix is a little
ugly but it's clearly the same issue the previous author of
DFAPacketizer faced (the previous tables use unsigned[2] rather uglily
too).
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
llvm-svn: 371399
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
........
Reverted as this is causing "compiler out of heap space" errors on MSVC 2017/19 NDEBUG builds
llvm-svn: 371393
This patch allows the DFAPacketizer to be queried after a packet is formed to work out which
resources were allocated to the packetized instructions.
This is particularly important for targets that do their own bundle packing - it's not
sufficient to know simply that instructions can share a packet; which slots are used is
also required for encoding.
This extends the emitter to emit a side-table containing resource usage diffs for each
state transition. The packetizer maintains a set of all possible resource states in its
current state. After packetization is complete, all remaining resource states are
possible packetization strategies.
The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default
(most uses of the packetizer like MachinePipeliner don't care and don't need the extra
maintained state).
Differential Revision: https://reviews.llvm.org/D66936
llvm-svn: 371198
This was only using the correct register constraints if this was the
final result instruction. If the extract was a sub instruction of the
result, it would attempt to use GIR_ConstrainSelectedInstOperands on a
COPY, which won't work. Move the handling to
createAndImportSubInstructionRenderer so it works correctly.
I don't fully understand why runOnPattern and
createAndImportSubInstructionRenderer both need to handle these
special cases, and constrain them with slightly different methods. If
I remove the runOnPattern handling, it does break the constraint when
the final result instruction is EXTRACT_SUBREG.
llvm-svn: 371150
This partially adds support for patterns with REG_SEQUENCE. The source
patterns are now accepted, but the pattern is still rejected due to
missing support for the instruction renderer.
llvm-svn: 370920
The Hexagon itineraries are cunningly crafted such that functional units between
itineraries do not clash. Because all itineraries are bundled into the same DFA,
a functional unit index clash would cause an incorrect DFA to be generated.
A workaround for this is to ensure all itineraries declare the universe of all
possible functional units, but this isn't ideal for three reasons:
1) We only have a limited number of FUs we can encode in the packetizer, and
using the universe causes us to hit the limit without care.
2) Silent codegen faults are bad, and careful triage of the FU list shouldn't
be required.
3) Smooshing all itineraries into the same automaton allows combinations of
instruction classes that cannot exist, which bloats the table.
A simple solution is to allow "namespacing" packetizers.
Differential Revision: https://reviews.llvm.org/D66940
llvm-svn: 370508
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.
This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.
llvm-svn: 370280
Reuse the logic for INSERT_SUBREG to also import SUBREG_TO_REG patterns.
- Split `inferSuperRegisterClass` into two functions, one which tries to use
an existing TreePatternNode (`inferSuperRegisterClassForNode`), and one that
doesn't. SUBREG_TO_REG doesn't have a node to leverage, which is the cause
for the split.
- Rename GlobalISelEmitterInsertSubreg.td to GlobalISelEmitterSubreg.td and
update it.
- Update impacted tests in the AArch64 and X86 backends.
This is kind of a hit/miss for code size improvements/regressions. E.g. in
add-ext.ll, we now get some identity copies. This isn't really anything the
importer can handle, since it's caused by a later pass introducing the copy for
the sake of correctness.
Differential Revision: https://reviews.llvm.org/D66769
llvm-svn: 370254
I thought `llvm::sort` was stable for some reason but it's not.
Use `llvm::stable_sort` in `CodeGenTarget::getSuperRegForSubReg`.
Original patch: https://reviews.llvm.org/D66498
llvm-svn: 370084
When EXPENSIVE_CHECKS are enabled, GlobalISelEmitterSubreg.td doesn't get
stable output.
Reverting while I debug it.
See: https://reviews.llvm.org/D66498
llvm-svn: 370080
Summary:
This patch adds support for scalable vectors in intrinsics, enabling
intrinsics such as the following to be defined:
declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 4 x i32>)
Support for this is implemented by defining a new type descriptor for
scalable vectors and adding mangling support for scalable vector types
in the name mangling scheme used by 'any' types in intrinsic signatures.
Tests have been added for IRBuilder to test scalable vectors work as
expected when using intrinsics through this interface. This required
implementing an intrinsic that is explicitly defined with scalable
vectors, e.g. LLVMType<nxv4i32>, an SVE floating-point convert
intrinsic was used for this. The behaviour of the overloaded type
LLVMScalarOrSameVectorWidth with scalable vectors is tested using the
existing masked load intrinsic. Also added an .ll test to test the
Verifier catches a bad intrinsic argument when passing a fixed-width
predicate (mask) to the masked.load intrinsic where a scalable is
expected.
Patch by Paul Walker
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D65930
llvm-svn: 370053
This teaches the importer to handle INSERT_SUBREG instructions.
We were missing patterns involving INSERT_SUBREG in AArch64. It appears in
AArch64InstrInfo.td 107 times, and 14 times in AArch64InstrFormats.td.
To meaningfully import it, the GlobalISelEmitter needs to know how to infer a
super register class for a given register class.
This patch introduces the following:
- `getSuperRegForSubReg`, a function which finds the largest register class
which supports a value type and subregister index
- `inferSuperRegisterClass`, a function which finds the appropriate super
register class for an INSERT_SUBREG'
- `inferRegClassFromPattern`, a function which allows for some trivial
lookthrough into instructions
- `getRegClassFromLeaf`, a helper function which returns the register class for
a leaf `TreePatternNode`
- Support for subregister index operands in `importExplicitUseRenderer`
It also
- Updates tests in each backend which are impacted by the change
- Adds GlobalISelEmitterSubreg.td to test that we import and skip the expected
patterns
As a result of this patch, INSERT_SUBREG patterns in X86 may use the
LOW32_ADDR_ACCESS_RBP register class instead of GR32. This is correct, since the
register class contains the same registers as GR32 (except with the addition of
RBP). So, this also teaches X86 to handle that register class. This is in line
with X86ISelLowering, which treats this as a GR class.
Differential Revision: https://reviews.llvm.org/D66498
llvm-svn: 369973
This requires std::intializer_list to be a literal type, which it is
starting with C++14. The downside is that std::bitset is still not
constexpr-friendly so this change contains a re-implementation of most
of it.
Shrinks clang by ~60k.
llvm-svn: 369847
Overloaded intrinsics can use iPTRAny in used/input operands.
The GlobalISelEmitter doesn't know that these are pointers, so it treats them
as scalars. As a result, these intrinsics can't be imported.
This teaches the GlobalISelEmitter to recognize these as pointers rather than
scalars.
Differential Revision: https://reviews.llvm.org/D65756
llvm-svn: 369455
AMDGPU has some buffer intrinsics which theoretically could use
this. Some of the generated tables include the 3 and 4 element vector
versions of these rounded to 64-bits, which is ambiguous. Add these to
help the table disambiguate these.
Assertion change is for the path odd sized vectors now take for R600.
v3i16 is widened to v4i16, which then needs to be promoted to v4i32.
llvm-svn: 369038
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Summary:
The problem:
When an operand had bits explicitly set to "1" (as in the InitValue.td test case attached), the decoder was ignoring those bits, and the DecoderMethod was receiving an input where the bits were still zero.
The solution:
We added an "InitValue" variable that stores the initial value of the operand based on what bits were explicitly initialized to 1 in TableGen code. The generated decoder code then uses that initial value to initialize the "tmp" variable, then calls fieldFromInstruction to read the values for the remaining bits that were left unknown in TableGen.
This is mainly useful when there are variations of an instruction that differ based on what bits are set in the operands, since this change makes it possible to access those bits in a DecoderMethod. The DecoderMethod can use those bits to know how to handle the input.
Patch by Nicolas Guillemot
Reviewers: craig.topper, dsanders, fhahn
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D63741
llvm-svn: 368458
The upper 4 bits of the immediate byte are used to encode a
register. We need to limit the explicit immediate to fit in the
remaining 4 bits.
Fixes PR42899.
llvm-svn: 368123
This was causing a bug where non-truncating stores would be selected instead of truncating ones.
Differential Revision: https://reviews.llvm.org/D64845
llvm-svn: 367737
AMDGPU uses some custom code predicates for testing alignments.
I'm still having trouble comprehending the behavior of predicate bits
in the PatFrag hierarchy. Any attempt to abstract these properties
unexpectdly fails to apply them.
llvm-svn: 367373
Empty condition strings are considerde always true. This removes a lot
of clutter from the generated matcher tables.
This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to
6.1M.
llvm-svn: 367326
This change reverts most of the previous register name generation.
The real problem is that RegisterTuple does not generate asm names.
Added optional operand to RegisterTuple. This way we can simplify
register name access and dramatically reduce the size of static
tables for the backend.
Differential Revision: https://reviews.llvm.org/D64967
llvm-svn: 366598
Summary:
Deduce the "willreturn" attribute for functions.
For now, intrinsics are not willreturn. More annotation will be done in another patch.
Reviewers: jdoerfert
Subscribers: jvesely, nhaehnle, nicholas, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63046
llvm-svn: 366335
If an intrinsic is defined without outputs, but having side effects,
it still can be removed completely from the program. This patch makes
TableGen not set Attribute::ReadNone for intrinsics which
are declared with IntrHasSideEffects.
Differential Revision: https://reviews.llvm.org/D64414
llvm-svn: 366312
Rather than an array of std::initializer_list, generate a table of
offsets and a flat array of the operands for getOperandType. This is a
bit more efficient on platforms that don't manage to get the array of
inintializer_lists initialized at link time (I'm looking at you
macOS). It's also quite quite a bit faster to compile.
llvm-svn: 366278
The InstrInfoEmitter outputs an enum called "OperandType" which gives
numerical IDs to each operand type. This patch makes use of this enum
to define a function called "getOperandType", which allows looking up
the type of an operand given its opcode and operand index.
Patch by Nicolas Guillemot. Thanks!
Differential Revision: https://reviews.llvm.org/D63320
llvm-svn: 366274
Summary:
We agreed to rename `except_ref` to `exnref` for consistency with other
reference types in
https://github.com/WebAssembly/exception-handling/issues/79. This also
renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order
to use the file for other reference types in future.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64703
llvm-svn: 366145
This was failing to import the AMDGPU truncstore patterns. The
truncating stores from 32-bit to 8/16 were then somehow being
incorrectly selected to a 4-byte store.
A separate check is emitted for the LLT size in comparison to the
specific memory VT, which looks strange to me but makes sense based on
the hierarchy of PatFrags used for the default truncstore PatFrags.
llvm-svn: 366129
Currently AMDGPU uses a CodePatPred to check address spaces from the
MachineMemOperand. Introduce a new first class property so that the
existing patterns can be easily modified to uses the new generated
predicate, which will also be handled for GlobalISel.
I would prefer these to match against the pointer type of the
instruction, but that would be difficult to get working with
SelectionDAG compatbility. This is much easier for now and will avoid
a painful tablegen rewrite for all the loads and stores.
I'm also not sure if there's a better way to encode multiple address
spaces in the table, rather than putting the number to expect.
llvm-svn: 366128
Some out of tree backend require larger vector type. Since maintaining the changes out of tree is difficult due to the many manual changes needed when adding a new type we are adding it even if no backend currently use it.
Differential Revision: https://reviews.llvm.org/D64141
Patch by Thomas Raoux!
llvm-svn: 365274
When a Tablegen instruction description uses `OperandWithDefaultOps`,
isel patterns for that instruction don't have to fill in the default
value for the operand in question. But the flip side is that they
actually //can't// override the defaults even if they want to.
This will be very inconvenient for the Arm backend, when we start
wanting to write isel patterns that generate the many MVE predicated
vector instructions, in the form with predication actually enabled. So
this small Tablegen fix makes it possible to write an isel pattern
either with or without values for a defaulted operand, and have the
default values filled in only if they are not overridden.
If all the defaulted operands come at the end of the instruction's
operand list, there's a natural way to match them up to the arguments
supplied in the pattern: consume pattern arguments until you run out,
then fill in any missing instruction operands with their default
values. But if defaulted and non-defaulted operands are interleaved,
it's less clear what to do. This does happen in existing targets (the
first example I came across was KILLGT, in the AMDGPU/R600 backend),
and of course they expect the previous behaviour (that the default for
those operands is used and a pattern argument is not consumed), so for
backwards compatibility I've stuck with that.
Reviewers: nhaehnle, hfinkel, dmgreen
Subscribers: mehdi_amini, javed.absar, tpr, kristof.beyls, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63814
llvm-svn: 365114
r363233 rewrote a bunch of the Intrin Emitter code, however the new
function to update the arg codes did not properly consider a pointer to
an any. This patch adds that logic.
Differential Revision: https://reviews.llvm.org/D63507
llvm-svn: 364364
It seems macOS lets you have ArrayRef<const X> even though this is apparently
forbidden by the language standard (Thanks MSVC++ for the clear error message).
Removed the problematic const's to fix this.
(It also seems I'm not receiving buildbot emails anymore and I'm trying to find
out why. In the mean time I'll be polling lab.llvm.org to hopefully see if/when
failures occur)
llvm-svn: 363753
Summary:
Add an AdditionalEncoding class which can be used to define additional encodings
for a given instruction. This causes the disassembler to add an additional
encoding to its matching tables that map to the specified instruction.
Usage:
def ADD1 : Instruction {
bits<8> Reg;
bits<32> Inst;
let Size = 4;
let Inst{0-7} = Reg;
let Inst{8-14} = 0;
let Inst{15} = 1; // Continuation bit
let Inst{16-31} = 0;
...
}
def : AdditionalEncoding<ADD1> {
bits<8> Reg;
bits<16> Inst; // You can also have bits<32> and it will still be a 16-bit encoding
let Size = 2;
let Inst{0-3} = 0;
let Inst{4-7} = Reg;
let Inst{8-15} = 0;
...
}
with those definitions, llvm-mc will successfully disassemble both of these:
0x01 0x00
0x10 0x80 0x00 0x00
to:
ADD1 r1
Depends on D52366
Reviewers: bogner, charukcs
Reviewed By: bogner
Subscribers: nlguillemot, nhaehnle, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D52369
llvm-svn: 363744
Merging the two bits shrinks the context table from 16384 bytes to 8192 bytes.
Remove the ATTRIBUTE_BITS macro and just create an enum directly. Then fix the ATTR_max define to be 8192 to reflect the table size so we stop hardcoding it separately.
llvm-svn: 363330
This patch uses the mechanism from D62995 to strengthen the
definitions of the reduction intrinsics by letting the scalar
result/accumulator type be overloaded from the vector element type.
For example:
; The LLVM LangRef specifies that the scalar result must equal the
; vector element type, but this is not checked/enforced by LLVM.
declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a)
This patch changes that into:
declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a)
Which has the type-constraint more explicit and causes LLVM to check
the result type with the vector element type.
Reviewers: RKSimon, arsenm, rnk, greened, aemerson
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D62996
llvm-svn: 363240
Extend the mechanism to overload intrinsic arguments by using either
backward or forward references to the overloadable arguments.
In for example:
def int_something : Intrinsic<[LLVMPointerToElt<0>],
[llvm_anyvector_ty], []>;
LLVMPointerToElt<0> is a forward reference to the overloadable operand
of type 'llvm_anyvector_ty' and would allow intrinsics such as:
declare i32* @llvm.something.v4i32(<4 x i32>);
declare i64* @llvm.something.v2i64(<2 x i64>);
where the result pointer type is deduced from the element type of the
first argument.
If the returned pointer is not a pointer to the element type, LLVM will
give an error:
Intrinsic has incorrect return type!
i64* (<4 x i32>)* @llvm.something.v4i32
Reviewers: RKSimon, arsenm, rnk, greened
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D62995
llvm-svn: 363233
Summary:
Use a set in getReqFeatures() in RISCVCompressInstEmitter instead of a map
because the index we save is not needed.
This also fixes bug 41666.
Reviewers: llvm-commits, apazos, asb, nickdesaulniers
Reviewed By: asb
Subscribers: Jim, nickdesaulniers, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61412
llvm-svn: 362968
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
A std::array is implemented as a template with an array inside a struct.
Older versions of clang, like 3.6, require an extra set of curly braces
around std::array initializations to avoid warnings.
The C++ language was changed regarding this by CWG 1270. So more modern
tool chains does not complain even if leaving out one level of braces.
llvm-svn: 362360
Fix the misleadingly indentation introduced in rL362064. This will get rid of
the compiler warning, and it was actually a bug. This change will be used and
tested in D62669.
llvm-svn: 362211
If an assembly instruction has to mention an input operand name twice,
for example the MVE VMOV instruction that accesses two lanes of the
same vector by writing 'vmov r1, r2, q0[3], q0[1]', then the obvious
way to write its AsmString is to include the same operand (here $Qd)
twice. But this causes the AsmMatcher generator to omit that
instruction completely from the match table, on the basis that the
generator isn't clever enough to deal with the duplication.
But you need to have _some_ way of dealing with an instruction like
this - and in this case, where the mnemonic is shared with many other
instructions that the AsmMatcher does handle, it would be very painful
to take it out of the AsmMatcher system completely.
A nicer way is to add a custom AsmMatchConverter routine, and let that
deal with the problem if the autogenerated converter can't. But that
doesn't work, because TableGen leaves the instruction out of its table
_even_ if you provide a custom converter.
Solution: this change, which makes TableGen relax the restriction on
duplicated operands in the case where there's a custom converter.
Patch by: Simon Tatham
Differential Revision: https://reviews.llvm.org/D60695
llvm-svn: 362066
This is a new special identifier which you can use as a default in
OperandWithDefaultOps. The idea is that you use it for an input
operand of an instruction that's tied to an output operand, and its
semantics are that (in the default case) the input operand's value is
not used at all.
The detailed effect is that when instruction selection emits the
instruction in the form of a pre-regalloc MachineInstr, it creates an
IMPLICIT_DEF node to use as that input.
If you're creating an MCInst with explicit register names, then the
right handling would be to set the input operand to the same register
as the output one (honouring the tie) and to add the 'undef' flag
indicating that that register is deemed to acquire a new don't-care
definition just before we read it. But I haven't done that in this
commit, because there was no need to - no Tablegen backend seems to
autogenerate default fields in an MCInst.
Patch by: Simon Tatham
Differential Revision: https://reviews.llvm.org/D60696
llvm-svn: 362064
Summary:
This reverts commit r360106.
The revisioin causes llvm-tblgen to hang while generating info for
RISCV.td. The root cause might be in the RISCV.td definition but I don't
know enough about this to investigate further.
Command that starts hangning after r360106:
`llvm-build/bin/llvm-tblgen -I llvm/include -I llvm/tools/clang/include -I llvm/lib/Target/RISCV -gen-instr-info llvm/lib/Target/RISCV/RISCV.td`
Reviewers: sammccall, yan_luo, craig.topper, gribozavr
Reviewed By: gribozavr
Subscribers: PkmX, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61632
llvm-svn: 360136
If you have more than one schedule model in your TableGen target
definitions, then the diagnostic "No schedule information for
instruction 'foo'" is rather unhelpful, because it doesn't tell you
_which_ schedule model is missing the necessary information (or, as it
might be, missing the UnsupportedFeatures definition that would stop
it thinking it needed it).
Extended the message to include the name of the schedule model that
it's complaining about.
Reviewers: nhaehnle, hfinkel, javedabsar, efriedma, javed.absar
Reviewed By: javed.absar
Subscribers: javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60559
llvm-svn: 358389
The composite existed to simplify some other tablegen code and not really in an
important way. Remove the combined field and just calculate the vector size
using two ifs.
llvm-svn: 357972
The instruction's document this as W0 for the VEX encoding. But there's a
footnote mentioning that VEX.W is ignored in 64-bit mode. And the main VEX
encoding description says the VEX.W bit is ignored for instructions that are
equivalent to a legacy SSE instruction that uses REX.W to select a GPR which
would apply here.
By making this match EVEX we can remove a special case of allowing EVEX2VEX to
turn an EVEX.WIG instruction into VEX.W0.
llvm-svn: 357971
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon
Reviewed By: RKSimon
Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60228
llvm-svn: 357802
Summary:
This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes.
Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.
Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri
Reviewed By: andreadb
Subscribers: hiraditya, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60138
llvm-svn: 357801
Summary:
Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models.
This avoids needing an isel pattern for each condition code. And it removes
translation switches for converting between CMOV instructions and condition
codes.
Now the printer, encoder and disassembler take care of converting the immediate.
We use InstAliases to handle the assembly matching. But we print using the
asm string in the instruction definition. The instruction itself is marked
IsCodeGenOnly=1 to hide it from the assembly parser.
This does complicate the scheduler models a little since we can't assign the
A and BE instructions to a separate class now.
I plan to make similar changes for SETcc and Jcc.
Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60041
llvm-svn: 357800
We were using the number of Matchables rather than the number of rows in the converter table.
This only matters for a few of the targets where the number of matchables is more than 255, but the number of converters is less than 255. Many of the targets have more than 256 converters. So already required a uint16_t.
llvm-svn: 357527
Summary:
It does not currently make sense to use WebAssembly features in some functions
but not others, so this CL adds an IR pass that takes the union of all used
feature sets and applies it to each function in the module. This allows us to
prevent atomics from being lowered away if some function has opted in to using
them. When atomics is not enabled anywhere, we detect whether there exists any
atomic operations or thread local storage that would be stripped and disallow
linking with objects that contain atomics if and only if atomics or tls are
stripped. When atomics is enabled, mark it as used but do not require it of
other objects in the link. These changes allow libraries that do not use atomics
to be built once and linked into both single-threaded and multithreaded
binaries.
Reviewers: aheejin, sbc100, dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59625
llvm-svn: 357226
These were added in r355423.
We only use the autogenerated table to assist with the maintenance of the
manual table. These entries are alreayd in the manual table.
llvm-svn: 356357
AMDGPU would like to use these MVTs.
Differential Revision: https://reviews.llvm.org/D58901
Change-Id: I6125fea810d7cc62a4b4de3d9904255a1233ae4e
llvm-svn: 356351
Previously we had a regular form of the instruction used when the immediate was 0-7. And _alt form that allowed the full 8 bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible.
The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces. A "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b".
The _alt form on the other hand parsed and printed like any other instruction with no specialness.
With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax.
I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printer. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized. Nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd which have 32 immediate values and 3 encoding flavors, 3 register sizes, etc. This didn't seem to scale well for clang binary size. So custom printing seemed a better trade off.
I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them.
Differential Revision: https://reviews.llvm.org/D59398
llvm-svn: 356343
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.
Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.
This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.
llvm-svn: 355981
AMDGPU target run out of Subtarget feature flags hitting the limit of 64.
AssemblerPredicates uses at most uint64_t for their representation.
At the same time CodeGen has exhausted this a long time ago and switched
to a FeatureBitset with the current limit of 192 bits.
This patch completes transition to the bitset for feature bits extending
it to asm matcher and MC code emitter.
Differential Revision: https://reviews.llvm.org/D59002
llvm-svn: 355839
Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher.
Original commit message:
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355784
This caused the first matcher in the isel table for many targets to Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures.
llvm-svn: 355433
These arrays are both keyed by CPU name and go into the same tablegenerated file. Merge them so we only need to store keys once.
This also removes a weird space saving quirk where we used the ProcDesc.size() to create to build an ArrayRef for ProcSched.
Differential Revision: https://reviews.llvm.org/D58939
llvm-svn: 355431
The description for CPUs was just the CPU name wrapped with "Select the " and " processor". We can just do that directly in the help printer instead of making a separate version in the binary for each CPU.
Also remove the Value field that isn't needed and was always 0.
Differential Revision: https://reviews.llvm.org/D58938
llvm-svn: 355429
Apparently older versions of clang like 3.6 require an extra set of curly braces around std::array initializations. I'm told the C++ language was changed regarding this by CWG 1270.
llvm-svn: 355327
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.
By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
This removes something like 40,000 bytes from the X86 isel table.
Differential Revision: https://reviews.llvm.org/D58595
llvm-svn: 355224
Subtarget features are stored in a std::bitset that has been subclassed. There is a special constructor to allow the tablegen files to provide a list of bits to initialize the std::bitset to. This constructor isn't constexpr and std::bitset doesn't support many constexpr operations either. This results in a static global constructor being used to initialize the feature bitsets in these files at startup.
To fix this I've introduced a new FeatureBitArray class that holds three 64-bit values representing the initial bit values and taught tablegen to emit hex constants for them based on the feature enum values. This makes the tablegen files less readable than they were before. I can add the list of features back as a comment if we think that's important.
I've added a method to convert from this class into the std::bitset subclass we had before. I considered making the new FeatureBitArray class just implement the std::bitset interface we need instead, but thought I'd see how others felts about that first.
I've simplified the interfaces to SetImpliedBits and ClearImpliedBits a little minimize the number of times we need to convert to the bitset.
This removes about 27K from my local release+asserts build of llc.
Differential Revision: https://reviews.llvm.org/D58520
llvm-svn: 355167
The previous sort comparator was not deterministic, i.e. in some
situations it would be possible for lhs < rhs && rhs < lhs. This was
discovered by an STL assertion in a Windows debug build of llvm-tblgen.
Differential Revision: https://reviews.llvm.org/D58687
llvm-svn: 354910
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.
The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
which is the default behavior of llvm-objdump.
Differential Revision: https://reviews.llvm.org/D57680
llvm-svn: 354870
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.
In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.
(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)
So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.
I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.
Reviewers: SjoerdMeijer, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57823
llvm-svn: 354768
OPC_CheckCondCode is always used as operand 2 of a setcc. And its always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2.
This saves ~3000 bytes in the X86 table.
llvm-svn: 354763
Summary:
This adds support for defining patterns for global isel using pointer
types, for example:
def : Pat<(load GPR32:$src),
(p1 (LOAD GPR32:$src))>;
DAGISelEmitter will ignore the pointer information and treat these
types as integers with the same bit-width as the pointer type.
Reviewers: dsanders, rtereshin, arsenm
Reviewed By: arsenm
Subscribers: Petar.Avramovic, wdng, rovka, kristof.beyls, jfb, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57065
llvm-svn: 354510
This class is used for two difference tablegen generated tables. For one of the tables the Value FeatureBitset only has one bit set. For the other usage the Implies field was unused.
This patch changes the Value field to just be an unsigned. For the usage that put a real vector in bitset, we now use the previously unused Implies field and leave the Value field unused instead.
This is good for a 16K reduction in the size of llc on my local build with all targets enabled.
llvm-svn: 354243
Summary:
While working on the GISel Combiner, I noticed I was producing location-less
error messages fairly often and set about fixing this. In the process, I
noticed quite a few places elsewhere in TableGen that also neglected to include
a relevant location.
This patch adds locations to errors that relate to a specific record (or a
field within it) and also have easy access to the relevant location. This is
particularly useful when multiclasses are involved as many of these errors
refer to the full name of a record and it's difficult to guess which substring
is grep-able.
Unfortunately, tablegen currently only supports Record granularity so it's not
currently possible to point at a specific Init so these sometimes point at the
record that caused the error rather than the precise origin of the error.
Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, nhaehnle
Reviewed By: nhaehnle
Subscribers: jdoerfert, nhaehnle, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58077
llvm-svn: 353862
This patch adds a -time-regions option to tablegen that can enable timers
(currently only one) that assess the performance of tablegen itself. This
can be useful for identifying scaling problems with tablegen backends.
This particular timer has allowed me to ignore time that is not attributed
the GISel combiner pass. It's useful by itself but it is particularly
useful in combination with https://reviews.llvm.org/D52954 which causes
this period of time to be annotated within Xcode Instruments which in turn
allows profile samples and recorded allocations attributed to reading
instructions to be filtered out.
llvm-svn: 353763
If we run into a pattern that looks like this:
add
(complex $x, $y)
(complex $x, $z)
We should skip the pattern instead of asserting/doing something unpredictable.
This makes us return an Error in that case, and adds a testcase for skipped
patterns.
Differential Revision: https://reviews.llvm.org/D57980
llvm-svn: 353586
Summary:
There are a few instructions that all map to the same opcode, so
when disassembling, we have to pick one. That was just the first one
before (the except_ref variant in the case of "call"), now it is the
one marked as IsCanonical in tablegen, or failing that, the shortest
name (which is typically the "canonical" one).
Also introduced a canonical "end" instruction for this purpose.
Reviewers: dschuff, tlively
Subscribers: sbc100, jgravelle-google, aheejin, llvm-commits, sunfish
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57713
llvm-svn: 353131
LLVM_ENABLE_DAGISEL_COV can be used to instrument DAGISel tablegen
selection code to show which patterns along with Complex patterns were
used when selecting instructions. Unfortunately this is turned off by
default and was broken but never tested.
This required a simple fix (missing new line) to get it to build again.
llvm-svn: 353091
This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth.
The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth.
I've updated the _overflow intrinsics to demonstrate this - allowing it to return a i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type.
The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1> so no change in behaviour
Differential Revision: https://reviews.llvm.org/D57090
llvm-svn: 351957
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
Right now we include ${TGT}GenCallingConv.inc once per each instruction
selection method implemented by ${TGT}:
- ${TGT}ISelLowering.cpp
- ${TGT}CallLowering.cpp
- ${TGT}FastISel.cpp
Instead, add a mechanism to tablegen for marking a particular convention
as "External", which causes tablegen to emit into the ::llvm namespace,
instead of as a static helper. This allows us to provide a header to
forward declare it, so we can simply call the function from all the
places it is referenced. Typically the calling convention analyzer is
called indirectly, so it doesn't benefit from inlining.
This saves a bit of final binary size, but mostly just saves object file
size:
before after diff artifact
12852K 12492K -360K X86ISelLowering.cpp.obj
4640K 4280K -360K X86FastISel.cpp.obj
1704K 2092K +388K X86CallingConv.cpp.obj
52448K 52336K -112K llc.exe
I didn't collect before numbers for X86CallLowering.cpp.obj, which is
for GlobalISel, but we should save 360K there as well.
This patch applies the strategy to the X86 backend, but there is no
reason it couldn't be applied to the other backends that implement
multiple ISel strategies, like AArch64.
Reviewers: craig.topper, hfinkel, efriedma
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D56883
llvm-svn: 351616
Summary:
The previously introduced new operand type for br_table didn't have
a disassembler implementation, causing an assert.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56227
llvm-svn: 350366
When you define an instruction alias as a subclass of InstAlias, you
specify all the MC operands for the instruction it expands to, except
for operands that are tied to a previous one, which you leave out in
the expectation that the Tablegen output code will fill them in
automatically.
But the code in Tablegen's AsmWriter backend that skips over a tied
operand was doing it using 'if' instead of 'while', because it wasn't
expecting to find two tied operands in sequence.
So if an instruction updates a pair of registers in place, so that its
MC representation has two input operands tied to the output ones (for
example, Arm's UMLAL instruction), then any alias which wants to
expand to a special case of that instruction is likely to fail to
match, because the indices of subsequent operands will be off by one
in the generated printAliasInstr function.
This patch re-indents some existing code, so it's clearest when
viewed as a diff with whitespace changes ignored.
Reviewers: fhahn, rengolin, sdesmalen, atanasyan, asb, jholewinski, t.p.northover, kparzysz, craig.topper, stoklund
Reviewed By: rengolin
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D53816
llvm-svn: 349141
One of the GCC based bots is objecting to a vector of const EncodingAndInst's:
In file included from /usr/include/c++/8/vector:64,
from /export/users/atombot/llvm/clang-atom-d525-fedora-rel/llvm/utils/TableGen/CodeGenInstruction.h:22,
from /export/users/atombot/llvm/clang-atom-d525-fedora-rel/llvm/utils/TableGen/FixedLenDecoderEmitter.cpp:15:
/usr/include/c++/8/bits/stl_vector.h: In instantiation of 'class std::vector<const {anonymous}::EncodingAndInst, std::allocator<const {anonymous}::EncodingAndInst> >':
/export/users/atombot/llvm/clang-atom-d525-fedora-rel/llvm/utils/TableGen/FixedLenDecoderEmitter.cpp:375:32: required from here
/usr/include/c++/8/bits/stl_vector.h:351:21: error: static assertion failed: std::vector must have a non-const, non-volatile value_type
static_assert(is_same<typename remove_cv<_Tp>::type, _Tp>::value,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/stl_vector.h:354:21: error: static assertion failed: std::vector must have the same value_type as its allocator
static_assert(is_same<typename _Alloc::value_type, _Tp>::value,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llvm-svn: 349046
Summary:
Separate the concept of an encoding from an instruction. This will enable
the definition of additional encodings for the same instruction which can
be used to support variable length instruction sets in the disassembler
(and potentially assembler but I'm not working towards that right now)
without causing an explosion in the number of Instruction records that
CodeGen then has to pick between.
Reviewers: bogner, charukcs
Reviewed By: bogner
Subscribers: kparzysz, llvm-commits
Differential Revision: https://reviews.llvm.org/D52366
llvm-svn: 349041
Summary:
This fixes support in DAGISelMatcher backend for DAG nodes with multiple
result values. Previously the order of results in selected DAG nodes always
matched the order of results in ISel patterns. After the change the order of
results matches the order of operands in OutOperandList instead.
For example, given this definition from the attached test case:
def INSTR : Instruction {
let OutOperandList = (outs GPR:$r1, GPR:$r0);
let InOperandList = (ins GPR:$t0, GPR:$t1);
let Pattern = [(set i32:$r0, i32:$r1, (udivrem i32:$t0, i32:$t1))];
}
the DAGISelMatcher backend currently produces a matcher that creates INSTR
nodes with the first result `$r0` and the second result `$r1`, contrary to the
order in the OutOperandList. The order of operands in OutOperandList does not
matter at all, which is unexpected (and unfortunate) because the order of
results of a DAG node does matters, perhaps a lot.
With this change, if the order in OutOperandList does not match the order in
Pattern, DAGISelMatcherGen emits CompleteMatch opcodes with the order of
results taken from OutOperandList. Backend writers can use it to express
result reorderings in TableGen.
If the order in OutOperandList matches the order in Pattern, the result of
DAGISelMatcherGen is unaffected.
Patch by Eugene Sharygin
Reviewers: andreadb, bjope, hfinkel, RKSimon, craig.topper
Reviewed By: craig.topper
Subscribers: nhaehnle, craig.topper, llvm-commits
Differential Revision: https://reviews.llvm.org/D55055
llvm-svn: 348326
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.
This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstruction already tracks use/def per operand in each instance
of the instruction, so can already represent this.
This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.
Differential revision: https://reviews.llvm.org/D54853
llvm-svn: 348114
Simple predicates, such as those defined by `CheckRegOperandSimple` or
`CheckImmOperandSimple`, were not being negated when used with `CheckNot`.
This change fixes this issue by defining the previously declared methods to
handle simple predicates.
Differential revision: https://reviews.llvm.org/D55089
llvm-svn: 348034
Summary:
This simplifies writing predicates for pattern fragments that are
automatically re-associated or commuted.
For example, a followup patch adds patterns for fragments of the form
(add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are
automatically commuted to (add $z, (shl $x, $y)), which makes it basically
impossible to refer to $x, $y, and $z generically in the PredicateCode.
With this change, the PredicateCode can refer to $x, $y, and $z simply
as `Operands[i]`.
Test confirmed that there are no changes to any of the generated files
when building all (non-experimental) targets.
Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c
Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D51994
llvm-svn: 347992
When tablegen detects that there exist two subregister compositions that
result in the same value for some register, it will emit a warning. This
kind of an overlap in compositions should only happen when it is caused
by a user-defined composition. It can happen, however, that the user-
defined composition is not identically equal to another one, but it does
produce the same value for one or more registers. In such cases suppress
the warning.
This patch is to silence the warning when building the System Z backend
after D50725.
Differential Revision: https://reviews.llvm.org/D50977
llvm-svn: 347894
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.
A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues. Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`. Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).
At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.
With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls, are not correctly classified either as "load
queue full" or "store queue full".
About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at
instruction issue stage. Instead, those are released at instruction executed
stage. This is the main reason why for the modified tests, the load/store
queues gets full before PdEx is full.
Differential Revision: https://reviews.llvm.org/D54957
llvm-svn: 347857