There was no pattern to fold into these instructions. This patch adds
the pattern obtained from the following ACLE intrinsics so that they
generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and
sqadd/sqsub instructions:
- vqdmlalh_s16, vqdmlslh_s16
- vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16,
vqdmlslh_laneq_s16 (when the lane index is 0)
It also modifies the result of the existing pattern for the latter, when
the lane index is not 0, to use the v1i32_indexed instructions instead
of the v4i16_indexed ones.
Fixes#49997.
Differential Revision: https://reviews.llvm.org/D131700
As discussed in D85414 <https://reviews.llvm.org/D85414>, two tests
currently `FAIL` on Sparc since that backend uses the Sun assembler syntax
for the `.section` directive, controlled by
`SunStyleELFSectionSwitchSyntax`.
Instead of adapting the affected tests, this patch changes that default.
The internal assembler still accepts both forms as input, only the output
syntax is affected.
Current support for the Sun syntax is cursory at best: the built-in
assembler cannot even assemble some of the directives emitted by GCC, and
the set supported by the Solaris assembler is even larger: SPARC Assembly
Language Reference Manual, 3.4 Pseudo-Op Attributes
<https://docs.oracle.com/cd/E37838_01/html/E61063/gmabi.html#scrolltoc>.
A few Sparc test cases need to be adjusted. At the same time, the patch
fixes the failures from D85414 <https://reviews.llvm.org/D85414>.
Tested on `sparcv9-sun-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D85415
Currently, clang ignores the 0 initialisation in finite math
For example:
```
double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
f_prod *= arr[i];
}
```
Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop.
Reviewed By: fhahn, spatel
Differential Revision: https://reviews.llvm.org/D131672
When testing LLVM 15.0.0 rc1 on Solaris, I found that 50+ flang tests
`FAIL`ed with
error:
/vol/llvm/src/llvm-project/local/flang/lib/Optimizer/CodeGen/Target.cpp:310:
not yet implemented: target not implemented
This patch fixes that for Solaris/x86, where the fix is trivial (just
handling it like the other x86 OSes).
Tested on `amd64-pc-solaris2.11`; only a single failure remains now.
Differential Revision: https://reviews.llvm.org/D131054
Simdloop collapse clause is supported in the same way
as colllapse clause for worksharing loops.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D131674
Signed-off-by: Dominik Adamski <dominik.adamski@amd.com>
Handle cases where a forked pointer has an add or sub instruction
before reaching a select.
Reviewed By: fhahn
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130278
This reverts commit b8ecf32f81.
This commit introduces undefined behavior according to UBSan:
UndefinedBehaviorSanitizer: nullptr-with-nonzero-offset third_party/llvm/llvm-project/mlir/include/mlir/ExecutionEngine/CRunnerUtils.h:377:5
When resolving symbols during IR execution, lldb makes a last effort attempt
to resolve external symbols from object files by approximate name matching.
It currently uses `CPlusPlusNameParser` to parse the demangled function name
and arguments for the unresolved symbol and its candidates. However, this
hand-rolled C++ parser doesn’t support ABI tags which, depending on the demangler,
get demangled into `[abi:tag]`. This lack of parsing support causes lldb to never
consider a candidate mangled function name that has ABI tags.
The issue reproduces by calling an ABI-tagged template function from the
expression evaluator. This is particularly problematic with the recent
addition of ABI tags to numerous libcxx APIs.
The issue stems from the fact that `clang::CodeGen` emits function
function calls using the mangled name inferred from the `FunctionDecl`
LLDB constructs from DWARF. Debug info often lacks information for
us to construct a perfect FunctionDecl resulting in subtle mangled
name inaccuracies.
This patch side-steps the problem of inaccurate `FunctionDecl`s by
attaching an `asm()` label to each `FunctionDecl` LLDB creates from DWARF.
`clang::CodeGen` consults this label to get the mangled name as one of
the first courses of action when emitting a function call.
LLDB already does this for C++ member functions as of
[675767a591](https://reviews.llvm.org/D40283)
**Testing**
* Added API tests
Differential Revision: https://reviews.llvm.org/D131974
This is mostly a mechanical change to adapt standard type hierarchy
support proposed in LSP 3.17 on top of clangd's existing extension support.
This does mainly two things:
- Incorporate symbolids for all the parents inside resolution parameters, so
that they can be retrieved from index later on. This is a new code path, as
extension always resolved them eagerly.
- Propogate parent information when resolving children, so that at least one
branch of parents is always preserved. This is to address a shortcoming in the
extension.
This doesn't drop support for the extension, but it's deprecated from now on and
will be deleted in upcoming releases. Currently we use the same struct
internally but don't serialize extra fields.
Fixes https://github.com/clangd/clangd/issues/826.
Differential Revision: https://reviews.llvm.org/D131385
The https://reviews.llvm.org/D53014 added CMAKE_BUILD_TYPE to
the list of BOOTSTRAP_DEFAULT_PASSTHROUGH variables.
The downside is that both stage-1 and stage-2 configurations
are the same. So it is not possible to build different stage
configurations.
This patch allow explicit bootstrap options to override any
passthrough ones.
For instance, the following settings would build:
stage-1 (Release) and stage-2(Debug)
-DCMAKE_BUILD_TYPE=Release -DBOOTSTRAP_CMAKE_BUILD_TYPE=Debug
Reviewed By: @beanz
Differential Revision: https://reviews.llvm.org/D131755
Patch sets ARM cpu, before compiling JIT code. This enables FastISel for armv6 and higher CPUs and allows using hardware FPU
~~~
OS Laboratory. Huawei RRI. Saint-Petersburg
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D131783
When looking at template arguments in LLDB, we usually care about what
the user passed in his code, not whether some of those arguments where
passed as a variadic parameter pack.
This patch extends all the C++ APIs to look at template parameters to
take an additional 'expand_pack' boolean that automatically unwraps the
potential argument packs. The equivalent SBAPI calls have been changed
to pass true for this parameter.
A byproduct of the patch is to also fix the support for template type
that have only a parameter pack as argument (like the OnlyPack type in
the test). Those were not recognized as template instanciations before.
The added test verifies that the SBAPI is able to iterate over the
arguments of a variadic template.
The original patch was written by Fred Riss almost 4 years ago.
Differential revision: https://reviews.llvm.org/D51387
Add scheduling resource for vector segment load/store instructions in D128886.
I miss to add scheduling resource for pseudo segment instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130222
Seems this complicated lldb sufficiently for some cases that it hasn't
been worth supporting/fixing there - and it so far hasn't provided any
new use cases/value for debug info consumers, so let's remove it until
someone has a use case for it.
(side note: the original implementation of this still had a bug (I
should've caught it in review) that we still didn't produce
auto-returning function declarations in types where the function wasn't
instantiatied (that requires a fix to remove the `if
getContainedAutoType` condition in
`CGDebugInfo::CollectCXXMemberFunctions` - without that, auto returning
functions were still being handled the same as member function templates
and special member functions - never added to the member list, only
attached to the type via the declaration chain from the definition)
Further discussion about this in D123319
This reverts commit 5ff992bca208a0e37ca6338fc735aec6aa848b72: [DEBUG-INFO] Change how we handle auto return types for lambda operator() to be consistent with gcc
This reverts commit c83602fdf51b2692e3bacb06bf861f20f74e987f: [DWARF5][clang]: Added support for DebugInfo generation for auto return type for C++ member functions.
Differential Revision: https://reviews.llvm.org/D131933
(sub C, (xori X, 1)) can be folded to (add X, C-1) if X is 0 or 1.
This would avoid the xori and in some cases remove an instruction
neede to materialize the constant.
We've started using C++17 constructs in compiler-rt now (e.g.
string_view in ORC), but when using the bootstrapping build, we won't
inherit the C++ standard from LLVM, and compilation may fail if we
default to an older standard. Explicitly build compiler-rt with C++17 in
a standalone build, which matches what other subprojects (e.g. Clang and
LLD) do.
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
Currently bpf supports calling kernel functions (x86_64, arm64, etc.)
in bpf programs. Tejun discovered a problem where the x86_64 func
return value (a unsigned char type) is stored in 8-bit subregister %al
and the other 56-bits in %rax might be garbage. But based on current
bpf ABI, the bpf program assumes the whole %rax holds the correct value
as the callee is supposed to do necessary sign/zero extension.
This mismatch between bpf and x86_64 caused the incorrect results.
To resolve this problem, this patch forced caller to do needed
sign/zero extension for 8/16-bit return values as well. Note that
32-bit return values already had sign/zero extension even without
this patch.
For example, for the test case attached to this patch:
$ cat t.c
_Bool bar_bool(void);
unsigned char bar_char(void);
short bar_short(void);
int bar_int(void);
int foo_bool(void) {
if (bar_bool() != 1) return 0; else return 1;
}
int foo_char(void) {
if (bar_char() != 10) return 0; else return 1;
}
int foo_short(void) {
if (bar_short() != 10) return 0; else return 1;
}
int foo_int(void) {
if (bar_int() != 10) return 0; else return 1;
}
Without this patch, generated call insns in IR looks like:
%call = call zeroext i1 @bar_bool()
%call = call zeroext i8 @bar_char()
%call = call signext i16 @bar_short()
%call = call i32 @bar_int()
So it is assumed that zero extension has been done for return values of
bar_bool()and bar_char(). Sign extension has been done for the return
value of bar_short(). The return value of bar_int() does not have any
assumption so caller needs to do necessary shifting to get correct
32bit values.
With this patch, generated call insns in IR looks like:
%call = call i1 @bar_bool()
%call = call i8 @bar_char()
%call = call i16 @bar_short()
%call = call i32 @bar_int()
There are no assumptions for return values of the above four function calls,
so necessary shifting is necessary for all of them.
The following is the objdump file difference for function foo_char().
Without this patch:
0000000000000010 <foo_char>:
2: 85 10 00 00 ff ff ff ff call -1
3: bf 01 00 00 00 00 00 00 r1 = r0
4: b7 00 00 00 01 00 00 00 r0 = 1
5: 15 01 01 00 0a 00 00 00 if r1 == 10 goto +1 <LBB1_2>
6: b7 00 00 00 00 00 00 00 r0 = 0
0000000000000038 <LBB1_2>:
7: 95 00 00 00 00 00 00 00 exit
With this patch:
0000000000000018 <foo_char>:
3: 85 10 00 00 ff ff ff ff call -1
4: bf 01 00 00 00 00 00 00 r1 = r0
5: 57 01 00 00 ff 00 00 00 r1 &= 255
6: b7 00 00 00 01 00 00 00 r0 = 1
7: 15 01 01 00 0a 00 00 00 if r1 == 10 goto +1 <LBB1_2>
8: b7 00 00 00 00 00 00 00 r0 = 0
0000000000000048 <LBB1_2>:
9: 95 00 00 00 00 00 00 00 exit
The zero extension of the return 'char' value is done here.
Differential Revision: https://reviews.llvm.org/D131598
This time using N1 instead of N0 since N1 points to the original
setcc. This now affects scheduling as I expected.
Original commit message:
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
We change seteq<->setne but it doesn't change the semantics
of the setcc. We should keep original debug location. This is
consistent with visitXor in the generic DAGCombiner.
The object contained in unique_ptr will be automatically deleted at the end of
the current scope. In createMachineFunction,
auto TM = createTargetMachine();
creates a TM contained in unique_ptr, a reference of the TM is stored in a
MachineFunction object, but at the end of the function, the TM is deleted, so
later access to the TM(and contained STI, TRI ...) through MachineFunction
object is invalid.
So we should not use unique_ptr<BogusTargetMachine> in functions
createMachineFunction and createTargetMachine.
Differential Revision: https://reviews.llvm.org/D131790
While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is
still preferrable. (addi X, -1) can be compressed, sub with x0 on
the LHS is never compressible.
ConvertToStructuredArray() relies on its caller to deallocate the heap-allocated object pointer it returns. One of its call-sites, in GetRenumberedThreadIds(), fails to deallocate causing a memory/resource leak. Fix the memory leak by converting the return type to shared_ptr, and clean up the rest of the file to use the typedef-ed shared_ptr types for StructuredData for safety and consistency.
Differential Revision: https://reviews.llvm.org/D131900