After discussion in `D93482` we found that the some of the clauses were not
following the common OmpClause convention.
The benefits of using OmpClause:
- Functionalities from structure checker are mostly aligned to work with
`llvm::omp::Clause`.
- The unparsing as well can take advantage.
- Homogeneity with OpenACC and rest of the clauses in OpenMP.
- Could even generate the parser with TableGen, when there is homogeneity.
- It becomes confusing when to use `flangClass` and `flangClassValue` inside
TableGen, if incase we generate parser using TableGen we could have only a
single `let expression`.
This patch makes `OmpProcBindClause` clause part of `OmpClause`.
The unparse function is dropped as the unparsing is done by `WALK_NESTED_ENUM`
for `OmpProcBindClause`.
Reviewed By: clementval, kiranktp
Differential Revision: https://reviews.llvm.org/D93642
After discussion in `D93482` we found that the some of the clauses were not
following the common OmpClause convention.
The benefits of using OmpClause:
- Functionalities from structure checker are mostly aligned to work with
`llvm::omp::Clause`.
- The unparsing as well can take advantage.
- Homogeneity with OpenACC and rest of the clauses in OpenMP.
- Could even generate the parser with TableGen, when there is homogeneity.
- It becomes confusing when to use `flangClass` and `flangClassValue` inside
TableGen, if incase we generate parser using TableGen we could have only a
single `let expression`.
This patch makes `OmpDefaultClause` clause part of `OmpClause`.
The unparse function is dropped as the unparsing is done by `WALK_NESTED_ENUM`
for `OmpDefaultClause`.
Reviewed By: clementval, kiranktp
Differential Revision: https://reviews.llvm.org/D93641
After discussion in `D93482` we found that the some of the clauses were not
following the common OmpClause convention.
The benefits of using OmpClause:
- Functionalities from structure checker are mostly aligned to work with
`llvm::omp::Clause`.
- The unparsing as well can take advantage.
- Homogeneity with OpenACC and rest of the clauses in OpenMP.
- Could even generate the parser with TableGen, when there is homogeneity.
- It becomes confusing when to use `flangClass` and `flangClassValue` inside
TableGen, if incase we generate parser using TableGen we could have only a
single `let expression`.
This patch makes `allocate` clause part of `OmpClause`.The unparse function for
`OmpAllocateClause` is adapted since the keyword and parenthesis are issued by
the corresponding unparse function for `parser::OmpClause::Allocate`.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D93640
Define vector compare intrinsics and lower them to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93368
Define vleff intrinsics and lower to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93516
This defines vmadd, vmacc, vnmsub, and vnmsac intrinsics and
lower to V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Differential Revision: https://reviews.llvm.org/D93632
This is a follow-up patch of D87045.
The patch implements "loop-nest mode" for `LPMUpdater` and `FunctionToLoopPassAdaptor` in which only top-level loops are operated.
`createFunctionToLoopPassAdaptor` decides whether the returned adaptor is in loop-nest mode or not based on the given pass. If the pass is a loop-nest pass or the pass is a `LoopPassManager` which contains only loop-nest passes, the loop-nest version of adaptor is returned; otherwise, the normal (loop) version of adaptor is returned.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D87531
Define the `vand`, `vor` and `vxor` IR intrinsics for the respective V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Differential Revision: https://reviews.llvm.org/D93574
CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the
intention clearer and matches the direction of reverted r240130 (to drop the
unneeded parameters).
No behavior change.
AMDGPUTargetMachine::adjustPassManager() adds some alias analyses to the
legacy PM. We need a way to do the same for the new PM in order to port
AMDGPUTargetMachine::adjustPassManager() to the new PM.
Currently the new PM adds alias analyses by creating an AAManager via
PassBuilder and overriding the AAManager a PassManager uses via
FunctionAnalysisManager::registerPass().
We will continue to respect a custom AA pipeline that specifies an exact
AA pipeline to use, but for "default" we will now add alias analyses
that backends specify. Most uses of PassManager use the "default"
AAManager created by PassBuilder::buildDefaultAAPipeline(). Backends can
override the newly added TargetMachine::registerAliasAnalyses() to add custom
alias analyses.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D93261
Use the TableGen feature to have enum values for clauses.
Next step will be to extend the MLIR part used currently by OpenMP
to use the same enum on the dialect side.
This patch also add function that convert the enum to StringRef to be
used on the dump-parse-tree from flang.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93576
byval arguments should mostly get the same treatment as noalias
arguments in alias analysis. This was not the case for the
isIdentifiedFunctionLocal() function.
Marking byval arguments as identified function local means that
they cannot alias with other arguments, which I believe is correct.
Differential Revision: https://reviews.llvm.org/D93602
Before this patch, you needed to use `AutoNormalizeEnumJoined` whenever you wanted to **de**normalize joined enum.
Besides the naming confusion, this means the fact the option is joined is specified in two places: in the normalization multiclass and in the `Joined<["-"], ...>` multiclass.
This patch makes this work automatically, taking into account the `OptionClass` of options.
Also, the enum denormalizer now just looks up the spelling of the present enum case in a table and forwards it to the string denormalizer.
I also added more tests that exercise this.
Reviewed By: dexonsmith
Original patch by Daniel Grumberg.
Differential Revision: https://reviews.llvm.org/D84189
This replaces the existing `MarshallingInfoFlag<...>, IsNegative` with simpler `MarshallingInfoNegativeFlag`.
Reviewed By: dexonsmith
Original patch by Daniel Grumberg.
Differential Revision: https://reviews.llvm.org/D84675
This patch base on D93366, and define vector fixed-point intrinsics.
1. vaaddu/vaadd/vasubu/vasub
2. vsmul
3. vssrl/vssra
4. vnclipu/vnclip
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Differential Revision: https://reviews.llvm.org/D93508
This patch does two things:
1: fix the typo that intrinsic mfvscr should be with no readmem property
2: since VSCR is not modeled yet, add has side effect for SAT bit clobber
intrinsics/instructions.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D90807
Currently there is an issue where the legacy pass manager uses a different OptBisect counter than the new pass manager.
This fix makes the npm OptBisectInstrumentation use the global OptBisect.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D92897
Define vector vfwmul intrinsics and lower them to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93584
Define vector vfwadd/vfwsub intrinsics and lower them to V
instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93583
Define vector vfsgnj/vfsgnjn/vfsgnjx intrinsics and lower them to V
instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93581
Define vector vfmul/vfdiv/vfrdiv intrinsics and lower them to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93580
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.
We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D86978
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch deduplicates
extracted functions that only have inputs and non of the special cases.
We test that correctly deduplicate in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
These properties aren't additive. They are closer to ReadOnly and
WriteOnly. The default is ReadWrite. ReadMem cancels the write property and
WriteMem cancels the read property. Combining them leaves neither.
This patch checks that when we process WriteMem, the Mod flag is
still set. And for ReadMem we check that the Ref flag set still set.
I've updated 2 target intrinsics that were combining these properties.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D93571
inputs.
Extracted regions can have both inputs and outputs. In addition, the
CodeExtractor removes inputs that are only used in llvm.assumes, and
sunken allocas (values are used entirely in the extracted region as
denoted by lifetime intrinsics). We also cannot combine sections that
have different constants in the same structural location, and these
constants will have to elevated to argument. This patch limits the
extracted regions to those that only require inputs, and do not have any
other special cases.
We test that we do not outline the wrong constants in:
test/Transforms/IROutliner/outliner-different-constants.ll
test/Transforms/IROutliner/outliner-different-globals.ll
test/Transforms/IROutliner/outliner-constant-vs-registers.ll
We test that correctly outline in:
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Reviewers: paquette, plofti
Differential Revision: https://reviews.llvm.org/D86977
Define vlxe/vsxe intrinsics and lower to vlxei<EEW>/vsxei<EEW>
instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Differential Revision: https://reviews.llvm.org/D93471
Introduce CHECK modifiers that change the behavior of the CHECK
directive. Also add a LITERAL modifier for cases where matching could
end requiring escaping strings interpreted as regex where only
literal/fixed string matching is desired (making the CHECK's more
difficult to write/fragile and difficult to interpret).
This patch adds two IR intrinsics for vsetvli instruction. One to set the vector length to a user specified value and one to set it to vlmax. The vlmax uses the X0 source register encoding.
Clang builtins will follow in a separate patch
Differential Revision: https://reviews.llvm.org/D92973
This time with tests.
Original message:
Similar to D93365, but for floating point. No need for special ISD opcodes
though. We can directly isel these from intrinsics. I had to use anyfloat_ty
instead of anyvector_ty in the intrinsics to make LLVMVectorElementType not
crash when imported into the -gen-dag-isel tablegen backend.
Differential Revision: https://reviews.llvm.org/D93426
Similar to D93365, but for floating point. No need for special ISD opcodes
though. We can directly isel these from intrinsics. I had to use anyfloat_ty
instead of anyvector_ty in the intrinsics to make LLVMVectorElementType not
crash when imported into the -gen-dag-isel tablegen backend.
Differential Revision: https://reviews.llvm.org/D93426
This adds intrinsics for vmv.x.s and vmv.s.x.
I've used stricter type constraints on these intrinsics than what we've been doing on the arithmetic intrinsics so far. This will allow us to not need to pass the scalar type to the Intrinsic::getDeclaration call when creating these intrinsics.
A custom ISD is used for vmv.x.s in order to implement the change in computeNumSignBitsForTargetNode which can remove sign extends on the result.
I also modified the MC layer description of these instructions to show the tied source/dest operand. This is different than what we do for masked instructions where we drop the tied source operand when converting to MC. But it is a more accurate description of the instruction. We can't do this for masked instructions since we use the same MC instruction for masked and unmasked. Tools like llvm-mca operate in the MC layer and rely on ins/outs and Uses/Defs for analysis so I don't know if we'll be able to maintain the current behavior for masked instructions. So I went with the accurate description here since it was easy.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D93365
The main change is to add a 'IsDecl' field to DIModule so
that when IsDecl is set to true, the debug info entry generated
for the module would be marked as a declaration. That way, the debugger
would look up the definition of the module in the gloabl scope.
Please see the comments in llvm/test/DebugInfo/X86/dimodule.ll
for what the debug info entries would look like.
Differential Revision: https://reviews.llvm.org/D93462
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Craig Topper <craig.topper@sifive.com>
Differential Revision: https://reviews.llvm.org/D93514
Similar to D69312, and documented in D69839, the IRBuilder needs to add
the strictfp attribute to invoke instructions when constrained floating
point is enabled.
This is try 2, with the test corrected.
Differential Revision: https://reviews.llvm.org/D93134
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
The SROA pass tries to be lazy for removing dead instructions that are collected during iterative run of the pass in the DeadInsts list. However it does not remove instructions from the dead list while running eraseFromParent() on those instructions.
This causes (rare) null pointer dereferences. For example, in the speculatePHINodeLoads() instruction, in the following code snippet:
```
while (!PN.use_empty()) {
LoadInst *LI = cast<LoadInst>(PN.user_back());
LI->replaceAllUsesWith(NewPN);
LI->eraseFromParent();
}
```
If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called. However, the bug does not show up in most cases, because immediately in the same function, a new LoadInst is created in the following line:
```
LoadInst *Load = PredBuilder.CreateAlignedLoad(
LoadTy, InVal, Alignment,
(PN.getName() + ".sroa.speculate.load." + Pred->getName()));
```
This new LoadInst object takes the same memory address of the just deleted LI using eraseFromParent(), therefore the bug does not materialize. In very rare cases, the addresses differ and therefore, a dangling pointer is created, causing a crash.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D92431
Currently, `ELFFile<ELFT>::getEntry` does not check an index of
an entry. Because of that the code might read past the end of the symbol
table silently. I've added a test to `llvm-readobj\ELF\relocations.test`
to demonstrate the possible issue. Also, I've added a unit test for
this method.
After this change, `getEntry` stops reporting the section index and
reuses the `getSectionContentsAsArray` method, which already has
all the validation needed. Our related warnings now provide
more and better context sometimes.
Differential revision: https://reviews.llvm.org/D93209
This introduces asm support for the Branch Record Buffer extension, through
the new 'brbe' subtarget feature. It consists of a new set of system registers
that enable the handling of branch records.
Patch written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D92389
This is split off from D91718 and adds a new target hook
supportsScalableVectors that can be queried to check if scalable vectors
are supported by the backend. For AArch64 this returns true if SVE is
enabled.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D93060
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ
The intrinsics have overloaded source and result type and support
vector operands:
i32 @llvm.fptoui.sat.i32.f32(float %f)
i100 @llvm.fptoui.sat.i100.f64(double %f)
<4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
// etc
On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:
i19 @llvm.fptsi.sat.i19.f32(float %f)
// builds
i19 fp_to_sint_sat f, 19
// type legalizes (through integer result promotion)
i32 fp_to_sint_sat f, 19
I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.
There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:
f = fmaxnum f, FP(MIN)
f = fminnum f, FP(MAX)
i = fptoxi f
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
If the bounds cannot be exactly represented, we expand to something
like this instead:
i = fptoxi f
i = select f ult FP(MIN), MIN, i
i = select f ogt FP(MAX), MAX, i
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN
It should be noted that this expansion assumes a non-trapping fptoxi.
Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.
Original patch by @nikic and @ebevhan (based on D54696).
Differential Revision: https://reviews.llvm.org/D54749
The last use of the function was removed on Sep 18, 2016 in commit
5f8cc0c346.
The function was later moved to llvm/lib/Analysis/IVDescriptors.cpp on
Sep 12, 2018 in commit 7e98d69847.
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) user annotated function attribute will used in setting function
section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
Differential Revision: https://reviews.llvm.org/D92493
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-Authored-by: Monk Chiang <monk.chiang@sifive.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93366
This adds a custom InstVisitor to return false on instructions that
should not be allowed to be outlined. These match the illegal
instructions in the IRInstructionMapper with exception of the addition
of the llvm.assume intrinsic.
Tests all the tests marked: illegal-*-.ll with a test for each kind of
instruction that has been marked as illegal.
Reviewers: jroelofs, paquette
Differential Revisions: https://reviews.llvm.org/D86976
Define vlse/vsse intrinsics and lower to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93445
Remove the OpenMP clause information from the OMPKinds.def file and use the
information from the new OMP.td file. There is now a single source of truth for the
directives and clauses.
To avoid generate lots of specific small code from tablegen, the macros previously
used in OMPKinds.def are generated almost as identical. This can be polished and
possibly removed in a further patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D92955
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.
Differential Revision: https://reviews.llvm.org/D91974
Extracting the similar regions is the first step in the IROutliner.
Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed. Each
IRSimilarityCandidate is used to define an OutlinableRegion. Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.
Each region is then extracted with the CodeExtractor into its own
function.
We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll
Recommit of bf899e8913 fixing memory
leaks.
Reviewers: paquette, jroelofs, yroux
Differential Revision: https://reviews.llvm.org/D86975
This patch add some checks for the restriction on the routine directive
and fix several issue at the same time.
Validity tests have been added in a separate file than acc-clause-validity.f90 since this one
became quite large. I plan to split the larger file once on-going review are done.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D92672
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.
Below is an example of splitting the block bb1 at its first instruction.
/// Original IR
bb0:
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
br bb1
bb1:
br bb1.split
bb1.split:
%0 = mul i32 1, 2
br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
br bb1.split
bb1.split
br bb1
bb1:
%0 = mul i32 1, 2
br bb2
bb2:
Differential Revision: https://reviews.llvm.org/D92200
Update the allowed clauses for the SERIAL construct for the new OpenACC 3.1
specification.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D92123
It mimics the GNU readelf where it prints a [VARIANT_PCS] for symbols
with st_other with STO_AARCH64_VARIANT_PCS.
Reviewed By: grimar, MaskRay
Differential Revision: https://reviews.llvm.org/D93044
This extends the command-line support for the 'armv8.7-a' architecture
name to the ARM target.
Based on a patch written by Momchil Velikov.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D93231
This introduces command-line support for the 'armv8.7-a' architecture name
(and an alias without the '-', as usual), and for the 'ls64' extension name.
Based on patches written by Simon Tatham.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D91776
This patch extends IRBuilder to allow adding/preserving arbitrary
metadata on created instructions.
Instead of using references to specific metadata nodes (like DebugLoc),
IRbuilder now keeps a vector of (metadata kind, MDNode *) pairs, which
are added to each created instruction.
The patch itself is a NFC and only moves the existing debug location
handling over to the new system. In a follow-up patch it will be used to
preserve !annotation metadata besides !dbg.
The current approach requires iterating over MetadataToCopy to avoid
adding duplicates, but given that the number of metadata kinds to
copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2
(!dbg and !annotation)) that should not matter.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D93400
Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that cases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).
There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:
// 1) Missing const
namespace missing_const {
struct A {
#ifndef FIXED
bool operator==(A const&);
#else
bool operator==(A const&) const;
#endif
};
bool a = A{} == A{}; // error
}
// 2) Type mismatch on CRTP
namespace crtp_mismatch {
template <typename Derived>
struct Base {
#ifndef FIXED
bool operator==(Derived const&) const;
#else
// in one case changed to taking Base const&
friend bool operator==(Derived const&, Derived const&);
#endif
};
struct D : Base<D> { };
bool b = D{} == D{}; // error
}
// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
template <bool Const>
struct iterator {
using const_iterator = iterator<true>;
iterator();
template <bool B, std::enable_if_t<(Const && !B), int> = 0>
iterator(iterator<B> const&);
#ifndef FIXED
bool operator==(const_iterator const&) const;
#else
friend bool operator==(iterator const&, iterator const&);
#endif
};
bool c = iterator<false>{} == iterator<false>{} // error
|| iterator<false>{} == iterator<true>{}
|| iterator<true>{} == iterator<false>{}
|| iterator<true>{} == iterator<true>{};
}
// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
enum Color { Red };
struct C {
C();
C(Color);
operator Color() const;
bool operator==(Color) const;
friend bool operator==(C, C);
};
bool c = C{} == C{}; // error
bool d = C{} == Red;
}
Differential revision: https://reviews.llvm.org/D78938
Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns.
This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts.
As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non generic LoadSDNode nodes.
Later patches will continue to replace X86ISD::SUBV_BROADCAST usage.
Differential Revision: https://reviews.llvm.org/D92645
in ExtBinary format.
Currently ExtBinary format doesn't support multiple sections with the same type
in the profile. We add the support in this patch. Previously we use the section
type to identify a section uniquely. Now we introduces a LayoutIndex in the
SecHdrTableEntry and use the LayoutIndex to locate the target section. The
allocations of NameTable and FuncOffsetTable are adjusted accordingly.
Currently it works as a NFC because it won't change anything for current layout.
The test for multiple sections support will be included in another patch where a
new type of profile containing multiple sections with the same type is
introduced.
Differential Revision: https://reviews.llvm.org/D93254
Add mir-check-debug pass to check MIR-level debug info.
For IR-level, currently, LLVM have debugify + check-debugify to generate
and check debug IR. Much like the IR-level pass debugify, mir-debugify
inserts sequentially increasing line locations to each MachineInstr in a
Module, But there is no equivalent MIR-level check-debugify pass, So now
we support it at "mir-check-debug".
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D91595
We can use LLVMScalarOrSameVectorWidth<0, llvm_i1_ty> to infer the mask type from the anyvector_ty. This will save us from needing to pass it to getDeclaration when creating these intrinsics from clang.
No tests updates are needed because our declarations are exploiting a behavior in the IR parser where the declaration of an intrinsic doesn't need to mention all the types as long as there isn't a name conflict in the file.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D93409
Define vector widening mul intrinsics and lower them to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93381
Define vector mul/div/rem intrinsics and lower them to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93380
Define vle/vse intrinsics and lower to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93359
Add mir-check-debug pass to check MIR-level debug info.
For IR-level, currently, LLVM have debugify + check-debugify to generate
and check debug IR. Much like the IR-level pass debugify, mir-debugify
inserts sequentially increasing line locations to each MachineInstr in a
Module, But there is no equivalent MIR-level check-debugify pass, So now
we support it at "mir-check-debug".
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D91595
The `assumes` directive is an OpenMP 5.1 feature that allows the user to
provide assumptions to the optimizer. Assumptions can refer to
directives (`absent` and `contains` clauses), expressions (`holds`
clause), or generic properties (`no_openmp_routines`, `ext_ABCD`, ...).
The `assumes` spelling is used for assumptions in the global scope while
`assume` is used for executable contexts with an associated structured
block.
This patch only implements the global spellings. While clauses with
arguments are "accepted" by the parser, they will simply be ignored for
now. The implementation lowers the assumptions directly to the
`AssumptionAttr`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D91980
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.
Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).
**Adjustment to profile format **
A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.
Differential Revision: https://reviews.llvm.org/D92347
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
This is being recommitted to try and address the MSVC complaint.
This patch implements a DDG printer pass that generates a graph in
the DOT description language, providing a more visually appealing
representation of the DDG. Similar to the CFG DOT printer, this
functionality is provided under an option called -dot-ddg and can
be generated in a less verbose mode under -dot-ddg-only option.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D90159
Per http://llvm.org/OpenProjects.html#llvm_loopnest, the goal of this
patch (and other following patches) is to create facilities that allow
implementing loop nest passes that run on top-level loop nests for the
New Pass Manager.
This patch extends the functionality of LoopPassManager to handle
loop-nest passes by specializing the definition of LoopPassManager that
accepts both kinds of passes in addPass.
Only loop passes are executed if L is not a top-level one, and both
kinds of passes are executed if L is top-level. Currently, loop nest
passes should have the following run method:
PreservedAnalyses run(LoopNest &, LoopAnalysisManager &,
LoopStandardAnalysisResults &, LPMUpdater &);
Reviewed By: Whitney, ychen
Differential Revision: https://reviews.llvm.org/D87045
Set the return variable to "" in find_first_existing_vc_file to
say that there is a repository, but no file to depend on. This works
transparently for all other callers that handle undefinedness and
equality to an empty string the same way.
Use the knowledge to avoid depending on __FakeVCSRevision.h if there
is no git repository at all (for example when building a release) as
there is no point in regenerating an empty VCSRevision.h.
Differential Revision: https://reviews.llvm.org/D92718
SUMMARY:
In order for the runtime on AIX to find the compact unwind section(EHInfo table),
we would need to set the following on the traceback table:
The 6th byte's longtbtable field to true to signal there is an Extended TB Table Flag.
The Extended TB Table Flag to be 0x08 to signal there is an exception handling info presents.
Emit the offset between ehinfo TC entry and TOC base after all other optional portions of traceback table.
The patch is authored by Jason Liu.
Reviewers: David Tenty, Digger Lin
Differential Revision: https://reviews.llvm.org/D92766
Adds cost model support for the new llvm.experimental.vector.{extract,insert}
intrinsics, using the existing getExtractSubvectorOverhead and
getInsertSubvectorOverhead functions for shuffles.
Previously this case would throw an assertion.
Differential Revision: https://reviews.llvm.org/D93043
This patch replaces FixedVectorType by VectorType in getIntrinsicInstrCost
in BasicTTIImpl.h. It re-arranges the scalable type test earlier return
and add tests for scalable types.
Depends on D91532
Differential Revision: https://reviews.llvm.org/D92094
This change makes use of the llvm.vector.extract intrinsic to avoid
going through memory when performing bitcasts between vector-length
agnostic types and vector-length specific types.
Depends on D91362
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D92761
When a field is optional we can use the `=<none>` syntax in macros.
This patch makes `Value`/`Size` fields of `Symbol` optional
and adds test cases for them.
Differential revision: https://reviews.llvm.org/D93010