Commit Graph

163 Commits

Author SHA1 Message Date
hyeongyu kim 1b1c8d83d3 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2022-01-16 18:54:17 +09:00
Freddy Ye 0bab742805 [X86] Add missing CET intrinsics support
These two intrinsics are documented o SDM and intrinsic guide.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D116325
2022-01-04 11:40:40 +08:00
Philip Reames e6ad9ef4e7 [instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.

I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.

We chose the canonical type as i64 arbitrarily.  We might consider changing this to something else in the future if we have cause.

Differential Revision: https://reviews.llvm.org/D115387
2021-12-13 16:56:22 -08:00
Phoebe Wang 925ec98d00 Revert "[X86][clang] Emit diagnostic for float and double when we have features -x87 and -sse on 64-bits"
This reverts commit 4a2c827b17.

Need to fix the problem when using `-mno-sse` together with "x86intrin.h"
2021-12-10 10:31:09 +08:00
Phoebe Wang d7c07f60b3 [X86][MS-InlineAsm] Make the constraint *m to be simple place holder
D113096 solved the "undefined reference to xxx" issue by adding
constraint *m for the global var. But it has strong side effect due to
the symbol in the assembly being replaced with constraint variable.
This leads to some lowering fails. https://godbolt.org/z/h3nWoerPe

This patch fix the problem by use the constraint *m as place holder
rather than real constraint. It has negligible effect for the existing
code generation.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D115225
2021-12-10 09:29:38 +08:00
Phoebe Wang 4a2c827b17 [X86][clang] Emit diagnostic for float and double when we have features -x87 and -sse on 64-bits
A follow up of D114162.

Reviewed By: asavonic

Differential Revision: https://reviews.llvm.org/D114782
2021-12-08 09:50:26 +08:00
Zahira Ammarguellat fd759d42c9 Revert "The _Float16 type is supported on x86 systems with SSE2 enabled."
This reverts commit 6623c02d70.
The change seems to be breaking build of compiler-rt on Debian.
2021-11-23 08:00:57 -05:00
Zahira Ammarguellat 6623c02d70 The _Float16 type is supported on x86 systems with SSE2 enabled.
Operations are emulated by software emulation and “float” instructions.
This patch is allowing the support of _Float16 type without the use of
-max512fp16 flag. The final goal being, perform _Float16 emulation for
all arithmetic expressions.
2021-11-19 08:59:50 -05:00
Freddy Ye eb9dc0c78f [X86] add 3 missing intrinsics: _mm_(mask/maskz)_cvtpbh_ps
Reviewed By: craig.topper, pengfei

Differential Revision: https://reviews.llvm.org/D114059
2021-11-18 08:48:19 +08:00
Freddy Ye 73c9cf8204 [X86][FP16] add alias for f*mul_*ch intrinsics
*_mul_*ch is to align with *_mul_*s, *_mul_*d and *_mul_*h.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D112777
2021-11-17 13:26:11 +08:00
hyeongyu kim fd9b099906 Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit aacfbb953e.

Revert "Fix lit test failures in CodeGenCoroutines"

This reverts commit 63fff0f5bf.
2021-11-09 02:15:55 +09:00
hyeongyukim aacfbb953e [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)

This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453

Resolve lit failures in clang after 8ca4b3e's land

Fix lit test failures in clang-ppc* and clang-x64-windows-msvc

Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc

Fix internal_clone(aarch64) inline assembly
2021-11-06 19:19:22 +09:00
Juneyoung Lee 89ad2822af Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit 7584ef766a.
2021-11-06 15:39:19 +09:00
Juneyoung Lee 7584ef766a [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2021-11-06 15:36:42 +09:00
Shengchen Kan be08e452f3 [X86][MS-InlineAsm] Add constraint *m for memory access w/ global var
Constraint `*m` should be used when the address of a variable is passed
as a value. And the constraint is missing for MS inline assembly when sth
is written to the address of the variable.

The missing would cause FE delete the definition of the static varible,
and then result in "undefined reference to xxx" issue.

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D113096
2021-11-05 09:11:41 +08:00
Arthur Eubanks cb5a10199b [test] Remove tests pinned to the legacy PM
Now that the legacy PM is deprecated for the optimization pipeline, we
can start deleting legacy PM tests.

For tests that test both PMs, merge the RUN lines.
Delete tests specific to the legacy PM.
2021-10-18 16:40:46 -07:00
Juneyoung Lee f193bcc701 Revert D105169 due to the two-stage failure in ASAN
This reverts the following commits:
37ca7a795b
9aa6c72b92
705387c507
8ca4b3ef19
80dba72a66
2021-10-18 23:52:46 +09:00
Juneyoung Lee 8ca4b3ef19 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453
2021-10-16 12:01:41 +09:00
Wang, Pengfei c0f9c7c015 [X86] Check if struct is blank before getting the inner types
This fixes pr52011.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D111037
2021-10-08 17:09:34 +08:00
Wang, Pengfei 7d6889964a [X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110336
2021-09-27 09:27:04 +08:00
Wang, Pengfei 227673398c [X86] Always check the size of SourceTy before getting the next type
D109607 results in a regression in llvm-test-suite.
The reason is we didn't check the size of SourceTy, so that we will
return wrong SSE type when SourceTy is overlapped.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D110037
2021-09-20 23:34:19 +08:00
Wang, Pengfei 5b47256fa5 [X86] Add test to show the effect caused by D109607. NFC 2021-09-20 23:34:18 +08:00
Wang, Pengfei e9e1d4751b [X86] Refactor GetSSETypeAtOffset to fix pr51813
D105263 adds support for _Float16 type. It introduced a bug (pr51813) that generates a <4 x half> type instead the default double when passing blank structure by SSE registers.

Although I doubt it may expose a bug somewhere other than D105263, it's good to avoid return half type when no half type in arguments.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D109607
2021-09-17 10:51:59 +08:00
Xiang1 Zhang 1f1c71aeac [X86][InlineAsm] Use mem size information (*word ptr) for "global variable + registers" memory expression in inline asm.
Differential Revision: https://reviews.llvm.org/D109739
2021-09-15 16:11:14 +08:00
Xiang1 Zhang c81d6ab875 [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109488
2021-09-13 18:03:27 +08:00
Xiang1 Zhang bdce8d40c6 Revert "[X86] Adjust Keylocker handle mem size"
This reverts commit 3731de6b7f.
2021-09-13 18:00:46 +08:00
Xiang1 Zhang 3731de6b7f [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 17:59:33 +08:00
Wang, Pengfei 2aaa6466fe [X86] Support *_set1_pch(Float16 _Complex h)
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D109487
2021-09-11 17:47:31 +08:00
Simon Pilgrim ea685e1028 [X86][AVX] Update _mm256_loadu2_m128* intrinsics to use _mm256_set_m128* (PR51796)
As reported on PR51796, the _mm256_loadu2_m128i in particular was inserting bitcasts and shuffles with different types making it trickier for some combines, and prevented the value tracker from identifying the shuffle sequences as a single insert_subvector style concat_vectors pattern.

This patch instead concatenate the 128-bit unaligned loads with _mm256_set_m128*, which was written to avoid the unnecessary bitcasts and only emits a single shuffle.

Differential Revision: https://reviews.llvm.org/D109497
2021-09-09 19:15:48 +01:00
Tianqing Wang 12fa608af4 [X86] Add CRC32 feature.
d8faf03807 implemented general-regs-only for X86 by disabling all features
with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses
only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this
instruction and allows it to be used with general-regs-only.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D105462
2021-09-06 17:24:30 +08:00
Nico Weber e5438f3868 clang/win: Add __readfsdword to intrin.h
When using __readfsdword(), clang used to warn that one has
to include <intrin.h> -- no matter if that was already included
or not.

Now it only warns if it's not yet included.

To verify that this was the only intrin with this problem, I ran:

    $ for f in $(grep intrin.h clang/include/clang/Basic/BuiltinsX86* |
                 egrep -o '\([^,]+,' | egrep -o '[^(,]*'); do
        if ! grep -q $f clang/lib/Headers/intrin.h; then echo $f; fi;
      done

This printed 9 more functions, but those are all in emmintrin.h,
xsaveintrin.h (which are included by intrin.h based on /arch: flags).
So this is indeed the only built-in that was missing in intrin.h.

Fixes PR51188.

Differential Revision: https://reviews.llvm.org/D109085
2021-09-02 12:22:07 -04:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Wang, Pengfei ab40dbfe03 [X86] AVX512FP16 instructions enabling 6/6
Enable FP16 complex FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105269
2021-08-30 13:08:45 +08:00
Xiang1 Zhang 80f7ce8993 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 09:55:35 +08:00
Xiang1 Zhang 4c29dc18cf Revert "[X86] Support __SSC_MARK(const int id)"
This reverts commit 78fbde5779.
2021-08-30 09:50:26 +08:00
Xiang1 Zhang 78fbde5779 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 09:21:22 +08:00
Xiang1 Zhang fd88fac6ca Revert "[X86] Support __SSC_MARK(const int id)"
This reverts commit 83e82ff767.
2021-08-30 09:18:27 +08:00
Xiang1 Zhang 83e82ff767 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 08:51:20 +08:00
Wang, Pengfei c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Wang, Pengfei b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Wang, Pengfei 5aeca3b0a5 [CFE][X86] Enable complex _Float16 support
Support complex _Float16 on X86 in C/C++ following the latest X86 psABI. (https://gitlab.com/x86-psABIs)

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105331
2021-08-18 11:16:14 +08:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Wang, Pengfei f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Craig Topper 4190d99dfc [X86] Add parentheses around casts in some of the X86 intrinsic headers.
This covers the SSE and AVX/AVX2 headers. AVX512 has a lot more macros
due to rounding mode.

Fixes part of PR51324.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D107843
2021-08-13 09:36:16 -07:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Xiang1 Zhang 6d234a6908 [X86] Zero some outputs of Kelocker intrinsics in error case
Reviewed By: WangPengfei

Differential Revision: https://reviews.llvm.org/D104766
2021-06-29 13:35:40 +08:00
Craig Topper 7a112356e4 [X86] Correct the conversion of VALIGND/Q intrinsics to shufflevector.
We need to mask the immediate to the width of a single vector
rather than 2 vectors. If we use the width of 2 vectors then
any shift larger than the length of 1 vector is going to overflow
the shuffle indices.

Fixes PR50895.
2021-06-26 19:06:00 -07:00
Bing1 Yu 56d5c46b49 [X86] Support __tile_stream_loadd intrinsic for new AMX interface
Adding support for __tile_stream_loadd intrinsic.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D103784
2021-06-11 17:28:43 +08:00
Michael Benfield cf49cae278 [Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable
These are intended to mimic warnings available in gcc.

Differential Revision: https://reviews.llvm.org/D100581
2021-06-01 15:38:48 -07:00
Juneyoung Lee a723ca32af fix broken clang tests after 7161bb87c9 2021-05-31 19:25:14 +09:00
Arthur Eubanks 3a0b6dc3e8 Revert "[Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable"
This reverts commit 14dfb3831c.

More false positives, see D100581.
2021-05-17 12:16:10 -07:00
Michael Benfield 14dfb3831c [Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable
These are intended to mimic warnings available in gcc.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D100581
2021-05-17 11:02:26 -07:00
Arthur Eubanks 6d8d133862 Revert "[Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable"
This reverts commit 9b0501abc7.

False positives reported in D100581.
2021-04-28 12:47:18 -07:00
Michael Benfield 9b0501abc7 [Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable
These are intended to mimic warnings available in gcc.

-Wunused-but-set-variable is triggered in the case of a variable which
appears on the LHS of an assignment but not otherwise used.

For instance:

  void f() {
    int x;
    x = 0;
  }

-Wunused-but-set-parameter works similarly, but for function parameters
instead of variables.

In C++, they are triggered only for scalar types; otherwise, they are
triggered for all types. This is gcc's behavior.

-Wunused-but-set-parameter is controlled by -Wextra, while
-Wunused-but-set-variable is controlled by -Wunused. This is slightly
different from gcc's behavior, but seems most consistent with clang's
behavior for -Wunused-parameter and -Wunused-variable.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D100581
2021-04-26 15:09:03 -07:00
Liu, Chen3 72e4bf12ee [X86] Support some missing intrinsics
Support for _mm512_i32logather_pd, _mm512_mask_i32logather_pd,
_mm512_i32logather_epi64, _mm512_mask_i32logather_epi64, _mm512_i32loscatter_pd,
_mm512_mask_i32loscatter_pd, _mm512_i32loscatter_epi64,
_mm512_mask_i32loscatter_epi64.

Differential Revision: https://reviews.llvm.org/D100368
2021-04-21 10:50:37 +08:00
Bing1 Yu 320b72e9cd [X86][AMX] Rename amx-bf16 intrinsic according to correct naming convention
__tile_tdpbf16ps should be renamed with __tile_dpbf16ps

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D98685
2021-03-17 11:22:52 +08:00
Thomas Preud'homme f60b35340f Stop traping on sNaN in __builtin_isinf
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.

Reviewed By: mibintc

Differential Revision: https://reviews.llvm.org/D97125
2021-03-15 15:38:08 +00:00
Thomas Preud'homme b7aeece47c Revert "Stop traping on sNaN in __builtin_isinf"
This reverts commit 1b6eb56aa0 because the
invert logic for isfinite is incorrect.
2021-03-04 12:07:35 +00:00
Thomas Preud'homme 1b6eb56aa0 Stop traping on sNaN in __builtin_isinf
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.

Reviewed By: mibintc

Differential Revision: https://reviews.llvm.org/D97125
2021-03-02 15:54:56 +00:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
Liu, Chen3 4bc7c8631a [X86] Support amx-bf16 intrinsic.
Adding support for intrinsics of AMX-BF16.
This patch alse fix a bug that AMX-INT8 instructions will be selected with wrong
predicate.

Differential Revision: https://reviews.llvm.org/D97358
2021-02-25 09:06:48 +08:00
Liu, Chen3 f8b9035aae [X86] Support amx-int8 intrinsic.
Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD.

Differential Revision: https://reviews.llvm.org/D97259
2021-02-23 17:08:05 +08:00
Wang, Pengfei 61da20575d [X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
This is a follow up of D92940.

We have successfully converted fadd/fmul _mm_reduce_* intrinsics to
llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax
too, i.e. llvm.reduction + nnan flag.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93179
2021-02-15 08:52:06 +08:00
James Y Knight 8043d5a964 NFC: update clang tests to check ordering and alignment for atomicrmw/cmpxchg.
The ability to specify alignment was recently added, and it's an
important property which we should ensure is set as expected by
Clang. (Especially before making further changes to Clang's code in
this area.) But, because it's on the end of the lines, the existing
tests all ignore it.

Therefore, update all the tests to also verify the expected alignment
for atomicrmw and cmpxchg. While I was in there, I also updated uses
of 'load atomic' and 'store atomic', and added the memory ordering,
where that was missing.
2021-02-11 17:35:09 -05:00
Wang, Pengfei dd2460ed5d [X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.
Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96231
2021-02-09 21:14:06 +08:00
Thomas Preud'homme 00a62547da Stop traping on sNaN in __builtin_isnan
__builtin_isnan currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.

Reviewed By: kpn

Differential Revision: https://reviews.llvm.org/D95948
2021-02-05 18:28:48 +00:00
Kevin P. Neal 81b69879c9 [FPEnv][X86] Platform builtins edition: clang should get from the AST the metadata for constrained FP builtins
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the X86 specific builtins.

Differential Revision: https://reviews.llvm.org/D94614
2021-02-03 11:49:17 -05:00
Fangrui Song 189f311130 CGDebugInfo CreatedLimitedType: Drop file/line for RecordType with invalid location
For Clang synthesized `__va_list_tag` (`CreateX86_64ABIBuiltinVaListDecl`),
its DW_AT_decl_file/DW_AT_decl_line are arbitrarily set from `CurLoc`.

In a stage 2 `-DCMAKE_BUILD_TYPE=Debug` clang build, I observe that
in driver.cpp, DW_AT_decl_file/DW_AT_decl_line may be set to an `#include` line
(the transitively included file uses va_arg (`__builtin_va_arg`)).
This seems arbitrary. Drop that.

Reviewed By: #debug-info, dblaikie

Differential Revision: https://reviews.llvm.org/D94735
2021-01-26 11:53:25 -08:00
Luo, Yuanke 7e1d2224b4 [X86][AMX] Fix the typo.
The dpbsud should be dpbssd.

Differential Revision: https://reviews.llvm.org/D94943
2021-01-19 16:57:34 +08:00
Fangrui Song fd739804e0 [test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences
For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.

To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.

To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
2020-12-31 00:27:11 -08:00
Luo, Yuanke 08665b1805 Support tilezero intrinsic and c interface for AMX.
Differential Revision: https://reviews.llvm.org/D92837
2020-12-31 13:24:57 +08:00
Fangrui Song 6b3351792c [test] Add {{.*}} to make tests immune to dso_local/dso_preemptable/(none) differences
For a definition (of most linkage types), dso_local is set for ELF -fno-pic/-fpie
and COFF, but not for Mach-O.  This nuance causes unneeded binary format differences.

This patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `
if there is an explicit linkage.

* Clang will set dso_local for Mach-O, which is currently implied by TargetMachine.cpp. This will make COFF/Mach-O and executable ELF similar.
* Eventually I hope we can make dso_local the textual LLVM IR default (write explicit "dso_preemptable" when applicable) and -fpic ELF will be similar to everything else. This patch helps move toward that goal.
2020-12-30 20:52:01 -08:00
Juneyoung Lee 420d046d6b clang-format, address warnings 2020-12-30 23:05:07 +09:00
Juneyoung Lee 9b29610228 Use unary CreateShuffleVector if possible
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.

Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)

The order is swapped, but in terms of correctness it is still fine.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D93923
2020-12-30 22:36:08 +09:00
Luo, Yuanke 981a0bd858 [X86] Add x86_amx type for intel AMX.
The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when
it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it
is used by load/store instruction. So amx intrinsics only operate on type x86_amx.
It can help to separate amx intrinsics from llvm IR instructions (+-*/).
Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981.

Differential Revision: https://reviews.llvm.org/D91927
2020-12-30 13:52:13 +08:00
Simon Pilgrim 4855a1004d [X86] Convert fadd/fmul _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well.

Differential Revision: https://reviews.llvm.org/D92940
2020-12-13 15:37:35 +00:00
Luo, Yuanke f80b29878b [X86] AMX programming model.
This patch implements amx programming model that discussed in llvm-dev
 (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
 Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet.
 This patch implemeted 7 components.

1. The c interface to end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
   type into two <128 x i32>.
4. The Lowering from AMX intrinsics to AMX pseudo instruction.
5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx
   intruction.
6. The register allocation for tile register.
7. Morph AMX pseudo instruction to AMX real instruction.

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Differential Revision: https://reviews.llvm.org/D87981
2020-12-10 17:01:54 +08:00
Liu, Chen3 776f92e067 [X86] Add support for vex, vex2, vex3, and evex for MASM
For MASM syntax, the prefixes are not enclosed in braces.
The assembly code should like:
  "evex vcvtps2pd xmm0, xmm1"

Differential Revision: https://reviews.llvm.org/D90441
2020-11-20 16:20:19 +08:00
Liu, Chen3 756f597841 [X86] Support Intel avxvnni
This patch mainly made the following changes:

1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.

Differential Revision: https://reviews.llvm.org/D89105
2020-10-31 12:39:51 +08:00
Simon Pilgrim 973317cc5e [CodeGen][X86] Remove unused check-prefix in constrained fma tests 2020-10-30 16:23:08 +00:00
Simon Pilgrim 365f46efeb [CodeGen][X86] Remove unused check-prefix in movdir tests 2020-10-30 16:23:08 +00:00
Simon Pilgrim c44846f537 [CodeGen][X86] Cleanup + fix unused check-prefixes in bmi tests 2020-10-30 16:13:54 +00:00
Simon Pilgrim fe3d765ac7 [CodeGen][X86] Tidyup CHECKs on bitscan tests 2020-10-30 16:13:52 +00:00
Simon Pilgrim 5cdd470504 [CodeGen][X86] Remove unused check-prefix in bitscan tests 2020-10-30 16:13:50 +00:00
Simon Pilgrim 0ff9d8c8ba [CodeGen][X86] Remove unused check-prefix in bswap tests 2020-10-30 16:13:49 +00:00
Simon Pilgrim d7389f05ee [CodeGen][X86] Cleanup + remove unused check-prefixes in avx union tests 2020-10-30 16:13:47 +00:00
Simon Pilgrim bbe055dd73 [CodeGen][X86] Remove unused check-prefix in amx inline asm tests 2020-10-30 16:13:45 +00:00
Kiran Chandramohan c551ba0e90 Run test only if X86 target is available
This fixes failures in AArch64 buildbots by running the
clang/test/CodeGen/X86/att-inline-asm-prefix.c only when the X86
target is available.
2020-10-26 21:28:59 +00:00
Liu, Chen3 180548c5c7 [X86] VEX/EVEX prefix doesn't work for inline assembly.
For now, we lost the encoding information if we using inline assembly.
The encoding for the inline assembly will keep default even if we add
the vex/evex prefix.

Differential Revision: https://reviews.llvm.org/D90009
2020-10-26 08:37:45 +08:00
Tianqing Wang be39a6fe6f [X86] Add User Interrupts(UINTR) instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D89301
2020-10-22 17:33:07 +08:00
Douglas Yung 774ab60125 Add option to use older clang ABI behavior when passing certain union types as function arguments
Recently commit D78699 (commit 26cfb6e562), fixed clang's behavior with respect
to passing a union type through a register to correctly follow the ABI. However,
this is an ABI breaking change with earlier versions of the clang compiler, so we
should add an -fclang-abi-compat option to address this. Additionally, the PS4 ABI
requires the older behavior, so that is added as well.

This change adds a Ver11 value to the ClangABI enum that when it is set (or the
target is the PS4 triple), we skip the ABI fix introduced in D78699.

Differential Revision: https://reviews.llvm.org/D89747
2020-10-19 18:17:34 -07:00
Matt Arsenault 0a7cd99a70 Reapply "OpaquePtr: Add type to sret attribute"
This reverts commit eb9f7c28e5.

Previously this was incorrectly handling linking of the contained
type, so this merges the fixes from D88973.
2020-10-16 11:05:02 -04:00
Simon Pilgrim d7fa9030d4 [CodeGen][X86] Emit fshl/fshr ir intrinsics for shiftleft128/shiftright128 ms intrinsics
Now that funnel shift handling is pretty good, we can use the intrinsics directly and avoid a lot of zext/trunc issues.

https://godbolt.org/z/YqhnnM

Differential Revision: https://reviews.llvm.org/D89405
2020-10-15 10:22:41 +01:00
Simon Pilgrim b967b9a711 [CodeGen] Move x86 specific ms intrinsic tests into x86 target subfolder. NFCI. 2020-10-14 17:37:26 +01:00
Liu, Chen3 bd05afcb3f [X86][NFC] Fix RUN line bug in the testcase
Testcase added in D78699 doesn't work because the wrong RUN line in the
testcase.

Differential Revision: https://reviews.llvm.org/D89361
2020-10-14 12:40:34 +08:00
Simon Pilgrim 6c23cbc560 [X86] Convert integer _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.

The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.

Differential Revision: https://reviews.llvm.org/D87604
2020-10-13 09:28:39 +01:00
Liu, Chen3 26cfb6e562 [X86] Passing union type through register
For example:

  union M256 {
    double d;
    __m256 m;
  };
  extern void foo1(union M256 A);
  union M256 m1;
  void test() {
    foo1(m1);
  }

clang will pass m1 through stack which does not follow the ABI.

Differential Revision: https://reviews.llvm.org/D78699
2020-10-09 11:24:29 +08:00