Commit Graph

965 Commits

Author SHA1 Message Date
Craig Topper 923d959b17 [X86] Custom emit __builtin_rdtscp so we can emit an explicit store for the out parameter
This is the clang side of D51803. The llvm intrinsic now returns two results. So we need to emit an explicit store in IR for the out parameter. This is similar to addcarry/subborrow/rdrand/rdseed.
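
For illustration, a minimal usage sketch (the wrapper name is ours, not the patch's):

    /* One call yields both results: the TSC is returned and the
       IA32_TSC_AUX value is written through the pointer -- the store
       this change now makes explicit in IR. */
    unsigned long long read_tsc(unsigned int *aux) {
        return __builtin_ia32_rdtscp(aux);
    }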

Differential Revision: https://reviews.llvm.org/D51805

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341699 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-07 19:14:24 +00:00
Craig Topper 26a2dfb9c8 [X86] Modify addcarry/subborrow builtins to emit a two-result intrinsic and a store instruction.
This is the clang side of D51769. The llvm intrinsics now return two results instead of using an out parameter.
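
As an illustrative sketch (function and variable names are ours), the carry result of the rewritten builtin can chain two 32-bit adds into a 64-bit add:

    #include <immintrin.h>  /* _addcarry_u32 wraps the builtin changed here */

    unsigned long long add64(unsigned int a_lo, unsigned int a_hi,
                             unsigned int b_lo, unsigned int b_hi) {
        unsigned int lo, hi;
        unsigned char c = _addcarry_u32(0, a_lo, b_lo, &lo); /* out param */
        (void)_addcarry_u32(c, a_hi, b_hi, &hi);             /* carry in  */
        return ((unsigned long long)hi << 32) | lo;
    }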

Differential Revision: https://reviews.llvm.org/D51771

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341678 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-07 16:58:57 +00:00
Craig Topper 0b68110dff [X86] Add ktest intrinsics to match gcc and icc.
These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc.

Includes these intrinsics:
_ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8
_ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8
_ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8
_ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8
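
A usage sketch (our example; the mask16 forms need -mavx512dq):

    #include <immintrin.h>

    /* KTEST sets ZF/CF from two mask registers without a GPR round
       trip; the *z form returns ZF, i.e. (a & b) == 0. */
    int mask_is_zero(__mmask16 m) {
        return _ktestz_mask16_u8(m, m);
    }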

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341265 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 22:29:56 +00:00
Craig Topper 5b7dd9bf04 [X86] Add k-mask conversion and load/store intrinsics to match gcc and icc.
This adds:
_cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64
_cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64
_load_mask8, _load_mask16, _load_mask32, _load_mask64
_store_mask8, _store_mask16, _store_mask32, _store_mask64

These are currently missing from the Intel Intrinsics Guide webpage.
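
A round-trip sketch (our example; the 16-bit forms need -mavx512f):

    #include <immintrin.h>

    void spill_mask(__mmask16 *slot, unsigned int bits) {
        __mmask16 m = _cvtu32_mask16(bits);       /* GPR -> mask (KMOVW) */
        _store_mask16(slot, m);                   /* mask -> memory      */
        (void)_cvtmask16_u32(_load_mask16(slot)); /* and back again      */
    }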

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341251 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 20:41:06 +00:00
Craig Topper ead0755dee [X86] Add kshift intrinsics to match gcc and icc.
This adds the following intrinsics:
_kshiftli_mask8
_kshiftli_mask16
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask8
_kshiftri_mask16
_kshiftri_mask32
_kshiftri_mask64
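
A sketch with our own names; the shift count is an immediate, so it must be a compile-time constant:

    #include <immintrin.h>

    /* Shifting left then right by 4 clears the top four mask bits
       (KSHIFTLW $4 followed by KSHIFTRW $4). */
    __mmask16 drop_top4(__mmask16 m) {
        return _kshiftri_mask16(_kshiftli_mask16(m, 4), 4);
    }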

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341234 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 18:22:52 +00:00
Craig Topper 1d39f2e5ad [X86] Add kadd intrinsics to match gcc and icc.
This adds the following intrinsics:
_kadd_mask64
_kadd_mask32
_kadd_mask16
_kadd_mask8

These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340879 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-28 22:32:14 +00:00
Craig Topper 4d1a672881 [X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks.
This matches gcc and icc despite not being documented in the Intel Intrinsics Guide.
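
A typical use, sketched with our own names:

    #include <immintrin.h>

    /* KORTEST ors the two masks and sets ZF; the *z form returns ZF,
       so this asks whether any lane in either mask is set. */
    int any_lane_set(__mmask16 a, __mmask16 b) {
        return !_kortestz_mask16_u8(a, b);
    }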

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340798 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-28 06:28:25 +00:00
Craig Topper 119327d467 [X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers.
This also adds a second intrinsic name for the 16-bit mask versions.

These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340719 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-27 06:20:22 +00:00
Nico Weber e73e540b40 Eliminate instances of `EmitScalarExpr(E->getArg(n))` in EmitX86BuiltinExpr().
EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that
work again.

This changes behavior: if, e.g., ++a was passed as an arg, we previously
incremented a twice. This change fixes that bug.

https://reviews.llvm.org/D50979


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340348 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-21 22:19:55 +00:00
Sanjay Patel 7cb7a1d0c1 [CodeGen] add rotate builtins that map to LLVM funnel shift
This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing)
with 1 change:
Remove the changes to make microsoft builtins also use the LLVM intrinsics.
 
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).
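
Concretely (a sketch; the wrapper names are ours, the builtin names are from this patch):

    /* Both rotates lower to the funnel shift intrinsics with the same
       value passed as both halves. */
    unsigned int rol32(unsigned int x, unsigned int n) {
        return __builtin_rotateleft32(x, n);   /* -> llvm.fshl.i32(x, x, n) */
    }
    unsigned long long ror64(unsigned long long x, unsigned long long n) {
        return __builtin_rotateright64(x, n);  /* -> llvm.fshr.i64(x, x, n) */
    }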

We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops) that we want to replicate, we can change the names.

The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242

With improved codegen in:
https://reviews.llvm.org/rL337966
https://reviews.llvm.org/rL339359

And basic IR optimization added in:
https://reviews.llvm.org/rL338218
https://reviews.llvm.org/rL340022

...so these are expected to produce asm output that's equal to or better than the
multi-instruction alternatives using primitive C/IR ops.

In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.

Differential Revision: https://reviews.llvm.org/D50924


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340141 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-19 16:50:30 +00:00
Sanjay Patel 6e672c87df revert r340137: [CodeGen] add rotate builtins
At least a couple of bots (gcc host compiler on PPC only?) are showing the compiler dying while trying to compile.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340138 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-19 15:31:42 +00:00
Sanjay Patel 276876b8b3 [CodeGen] add/fix rotate builtins that map to LLVM funnel shift (retry)
This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing)
with 2 changes:
1. Move the code into a helper to reduce code duplication (and hopefully work around the crash).
2. The original commit had a formatting bug in the docs (missing an underscore).

Original commit message:

This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang
(when both halves of a funnel shift are the same value, it's a rotate).

We're free to name these as we want because we're not copying gcc, but if there's some other
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate,
we can change the names.

The funnel shift intrinsics were added here:
https://reviews.llvm.org/D49242

With improved codegen in:
https://reviews.llvm.org/rL337966
https://reviews.llvm.org/rL339359

And basic IR optimization added in:
https://reviews.llvm.org/rL338218
https://reviews.llvm.org/rL340022

...so these are expected to produce asm output that's equal to or better than the
multi-instruction alternatives using primitive C/IR ops.

In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.

Differential Revision: https://reviews.llvm.org/D50924


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340137 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-19 14:44:47 +00:00
Sanjay Patel db071c17c6 revert r340135: [CodeGen] add rotate builtins
At least a couple of bots (PPC only?) are showing the compiler dying while trying to compile:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/11065/steps/build%20stage%201/logs/stdio
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/18267/steps/build%20stage%201/logs/stdio


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340136 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-19 13:48:06 +00:00
Sanjay Patel 3d796ad14a [CodeGen] add rotate builtins
This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang 
(when both halves of a funnel shift are the same value, it's a rotate).

We're free to name these as we want because we're not copying gcc, but if there's some other 
existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate, 
we can change the names.

The funnel shift intrinsics were added here:
D49242

With improved codegen in:
rL337966
rL339359

And basic IR optimization added in:
rL338218
rL340022

...so these are expected to produce asm output that's equal to or better than the
multi-instruction alternatives using primitive C/IR ops.

In the motivating loop example from PR37387:
https://bugs.llvm.org/show_bug.cgi?id=37387#c7
...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source.

Differential Revision: https://reviews.llvm.org/D50924


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340135 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-19 13:12:40 +00:00
Nico Weber 9e45fbd56c Make __shiftleft128 / __shiftright128 real compiler built-ins.
r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h.
Microsoft's STL plans on using these functions, and it uses intrin0.h,
which contains only built-in declarations so the standard library headers
don't pull in the huge intrin.h header. That requires that these functions
be real built-ins.
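
Usage sketch (ours; MSVC environment or -fms-extensions, x64):

    /* Returns the high 64 bits of the 128-bit value hi:lo shifted
       left by 4, i.e. (hi << 4) | (lo >> 60). */
    unsigned long long top_after_shift(unsigned long long lo,
                                       unsigned long long hi) {
        return __shiftleft128(lo, hi, 4);
    }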

https://reviews.llvm.org/D50907


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340048 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-17 17:19:06 +00:00
Craig Topper ccacccb1dd [X86] Remove masking from the 512-bit paddus/psubus builtins. Use a select builtin instead.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339845 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-16 07:28:06 +00:00
Tomasz Krupa b8ef97085a [X86] Lowering addus/subus intrinsics to native IR
Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations.
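
A scalar model of the saturating-add semantics being lowered (our sketch; the real lowering is vector IR):

    unsigned char addus_u8(unsigned char a, unsigned char b) {
        unsigned char s = (unsigned char)(a + b);
        return s < a ? 0xFF : s;  /* wraparound detected -> saturate */
    }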

Reviewers: craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D46892



git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339651 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-14 08:01:38 +00:00
Stephen Kelly d7b659b592 Port getLocStart -> getBeginLoc
Reviewers: teemperor!

Subscribers: jholewinski, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D50350

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339385 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-09 21:08:08 +00:00
Craig Topper 495347b6b6 [Builtins] Implement __builtin_clrsb to be compatible with gcc
gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones we need to invert negative numbers to count zeros.
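
A scalar model of that expansion (our sketch):

    int clrsb32(int x) {
        unsigned int u = (unsigned int)(x < 0 ? ~x : x); /* invert negatives */
        return u ? __builtin_clz(u) - 1 : 31;            /* clz(0) is undefined */
    }
    /* __builtin_clrsb(0) == 31, __builtin_clrsb(-1) == 31,
       __builtin_clrsb(1) == 30 */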

This patch will cause the builtin to be expanded inline while gcc uses a call to a function like clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports clrsbdi2.

Differential Revision: https://reviews.llvm.org/D50168

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339282 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-08 19:55:52 +00:00
Scott Linder 5225b293ca [OpenCL] Restore r338899 (reverted in r338904), fixing stack-use-after-return
Always emit alloca in entry block for enqueue_kernel builtin.

Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339150 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-07 15:52:49 +00:00
Vlad Tsyrklevich 0d05391382 Revert "[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin"
This reverts commit r338899, it was causing ASan test failures on sanitizer-x86_64-linux-fast.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338904 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-03 17:47:58 +00:00
Scott Linder 77acb45e80 [OpenCL] Always emit alloca in entry block for enqueue_kernel builtin
Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC
later because it is not in the entry block.

Differential Revision: https://reviews.llvm.org/D50104



git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338899 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-03 15:50:52 +00:00
Heejin Ahn 2cc8473cc7 [WebAssembly] Support for atomic.wait / atomic.wake builtins
Summary:
Add support for atomic.wait / atomic.wake builtins based on the Wasm
thread proposal.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits

Differential Revision: https://reviews.llvm.org/D49396

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338771 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-02 21:44:40 +00:00
Matt Arsenault 87aefa4312 Try to make builtin address space declarations not useless
The way address space declarations for builtins currently work
is nearly useless. The code assumes the address space used for
builtins is a confusingly named "target address space" from user
code using __attribute__((address_space(N))) that matches
the builtin declaration. There's no way to use this to declare
a builtin that returns a language-specific address space.
The terminology used is highly confusing since it has nothing
to do with the address space selected by the target to use
for a language address space.

This feature is essentially unused as-is. AMDGPU and NVPTX
are the only in-tree targets attempting to use this. The AMDGPU
builtins certainly do not behave as intended (i.e. all of the
builtins returning pointers can never compile because the numbered
address space never matches the expected named address space).

Some of the NVPTX builtins are missing tests, and the others
seem to rely on an implicit addrspacecast.

Change the used address space for builtins based on a target
hook to allow using a language address space for a builtin.
This allows the same builtin declaration to be used for multiple
languages with similarly purposed address spaces (e.g. the same
AMDGPU builtin can be used in OpenCL and CUDA even though the
constant address spaces are arbitrarily different).

This breaks the possibility of using arbitrary numbered
address spaces alongside the named address spaces for builtins.
If this is an issue we probably need to introduce another builtin
declaration character to distinguish language address spaces from
so-called "target address spaces".

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338707 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-02 12:14:28 +00:00
Fangrui Song abdbb605f2 Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338291 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 19:24:48 +00:00
Ivan A. Kosarev e701fd8949 [NEON] Fix support for vrndi_f32(), vrndiq_f32() and vrndns_f32() intrinsics
This patch adds support for vrndi_f32() and vrndiq_f32()
intrinsics in AArch32 mode and for vrndns_f32() intrinsic in
AArch64 mode.
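
Usage sketch (ours):

    #include <arm_neon.h>

    /* Round to integral using the current rounding mode. */
    float32x2_t round_current(float32x2_t v) { return vrndi_f32(v); }
    /* Scalar round to nearest, ties to even. */
    float32_t round_nearest(float32_t x) { return vrndns_f32(x); }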

Differential Revision: https://reviews.llvm.org/D48829


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337690 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-23 13:26:37 +00:00
Erich Keane 9eb2866868 Implement cpu_dispatch/cpu_specific Multiversioning
As documented here: https://software.intel.com/en-us/node/682969 and
https://software.intel.com/en-us/node/523346, cpu_dispatch is an ICC
feature that provides function multiversioning.

This feature is implemented with two attributes: First, cpu_specific,
which specifies the individual function versions. Second, cpu_dispatch,
which specifies the location of the resolver function and the list of
resolvable functions.

This is valuable since it provides a mechanism where the resolver's TU
can be specified in one location, and the individual implementations
each in their own translation units.
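
A source-level sketch of the feature (our example; the CPU names are illustrative):

    __attribute__((cpu_specific(ivybridge)))
    void work(void) { /* ivybridge version */ }

    __attribute__((cpu_specific(atom)))
    void work(void) { /* atom version */ }

    __attribute__((cpu_dispatch(ivybridge, atom)))
    void work(void); /* resolver; an empty definition is also accepted */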

The goal of this patch is to be source-compatible with ICC, so this
implementation diverges from the ICC implementation in a few ways:
1- Linux x86/64 only: This implementation uses ifuncs in order to
properly dispatch functions. This is a valuable performance benefit
over the ICC implementation. A future patch will be provided to enable
this feature on Windows, but it will obviously more closely fit ICC's
implementation.
2- CPU Identification functions: ICC uses a set of custom functions to identify
the feature list of the host processor. This patch uses the cpu_supports
functionality in order to better align with 'target' multiversioning.
3- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function
marked cpu_dispatch be an empty definition. This patch supports that as well,
however declarations are also permitted, since the linker will solve the
issue of multiple emissions.

Differential Revision: https://reviews.llvm.org/D47474


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337552 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-20 14:13:28 +00:00
Fangrui Song 1a8f8375fb Change \t to spaces
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337530 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-20 08:19:20 +00:00
Nemanja Ivanovic 55ada98637 NFC: Remove extraneous semicolons as pointed out in the differential review
The commit for
https://reviews.llvm.org/D49424
missed the comment about the extraneous semicolons. Remove them.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337451 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-19 12:49:27 +00:00
Nemanja Ivanovic 9647e445b0 [PowerPC] Handle __builtin_xxpermdi the same way as GCC does
The codegen for this builtin was initially implemented to match GCC.
However, due to interest from users GCC changed behaviour to account for the
big endian bias of the instruction and correct it. This patch brings the
handling in line with GCC.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38192

Differential Revision: https://reviews.llvm.org/D49424


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337449 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-19 12:44:15 +00:00
Mandeep Singh Grang 4fbd5c5474 [COFF] Add more missing MSVC ARM64 intrinsics
Summary:
Added the following intrinsics:
_BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64
_InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64,
_InterlockedExchangeAdd64, _InterlockedExchangeSub64,
_InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64.
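
Usage sketch (ours; MSVC environment):

    #include <intrin.h>

    /* Index of the lowest set bit, or -1 when the input is zero. */
    int lowest_bit(unsigned __int64 v) {
        unsigned long idx;
        return _BitScanForward64(&idx, v) ? (int)idx : -1;
    }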

Reviewers: compnerd, mstorsjo, rnk, javed.absar

Reviewed By: mstorsjo

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D49445

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337327 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-17 22:03:24 +00:00
Craig Topper ba066d5d85 [X86] Remove custom handling for __builtin_ia32_divss_round_mask and __builtin_ia32_divsd_round_mask.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336628 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-10 00:50:03 +00:00
Craig Topper 8ccbbcf855 [X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics.
This will convert the i8 mask argument to <8 x i1>, extract an i1, and then emit a select instruction. This replaces the '(__U & 1)' and ternary operator used in some of the intrinsics. The old sequence was lowered to a scalar 'and' and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.
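
Sketch of the resulting header-level pattern (our example; argument order per the description above):

    #include <immintrin.h>

    /* Lane 0 is chosen from 'result' or 'src' by bit 0 of the mask;
       the remaining lanes come through from 'result'. */
    __m128 masked_elem0(__mmask8 m, __m128 result, __m128 src) {
        return (__m128)__builtin_ia32_selectss_128(m, (__v4sf)result,
                                                   (__v4sf)src);
    }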

This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.

I made some adjustments to the legacy move_ss/sd intrinsics, which we reused here, to do a simpler extract and insert instead of two extracts and two inserts or a shuffle.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336622 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-10 00:37:25 +00:00
Craig Topper dc74ef665f [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 intrinsics with it
This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.

Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.

To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.

To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target-specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.

To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
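
The header idiom looks like this (a sketch modeled on the avx512 headers; the function name is ours):

    #include <immintrin.h>

    static __inline__ __m512i
    __attribute__((__always_inline__, __nodebug__,
                   __target__("avx512f"), __min_vector_width__(512)))
    my_add_epi32(__m512i __A, __m512i __B) {
        return (__m512i)((__v16si)__A + (__v16si)__B);
    }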

To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.

There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.

Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.

Differential Revision: https://reviews.llvm.org/D48617

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336583 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-09 19:00:16 +00:00
Craig Topper 1d7e46ba48 [X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types.
This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336507 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-08 01:10:47 +00:00
Craig Topper 4cc1cb57a3 [X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all ones mask case.
This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases.

Factor the code into a helper function to make this easy.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336472 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-06 22:46:52 +00:00
Craig Topper 5727c8bbbf [X86] Implement __builtin_ia32_vfmaddss and __builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic.
This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336417 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-06 07:14:47 +00:00
Craig Topper 62c4c0fac3 [X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission.
Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles.

While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336388 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 20:38:31 +00:00
Gabor Buella aa316d40ec [X86] Fix some vector cmp builtins - TRUE/FALSE predicates
This patch removes an optimization used with the TRUE/FALSE
predicates, as was suggested in https://reviews.llvm.org/D45616
for r335339.
The optimization was buggy, since r335339 used it also
for *_mask builtins, without actually applying the mask -- the
mask argument was just ignored.

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D48715



git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336355 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 14:26:56 +00:00
Craig Topper 0af2982d58 [X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335945 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-29 05:43:33 +00:00
Craig Topper 8631eadc24 [X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335745 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-27 15:57:57 +00:00
Ivan A. Kosarev f040667a18 [NEON] Support vldNq intrinsics in AArch32 (Clang part)
This patch reworks the support for dup NEON intrinsics as
described in D48439.

Differential Revision: https://reviews.llvm.org/D48440


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335734 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-27 13:58:43 +00:00
Craig Topper 265fa9857b [X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction.
Additional IR is emitted to convert between scalar and vXi1 type to match the expected software interface for the builtin that clang exposes.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335564 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-26 00:44:02 +00:00
Gabor Buella 84dae05553 [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR
Summary:
Lowering some vector comparison builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.

Affected AVX512 builtins:

__builtin_ia32_cmpps128_mask
__builtin_ia32_cmpps256_mask
__builtin_ia32_cmpps512_mask
__builtin_ia32_cmppd128_mask
__builtin_ia32_cmppd256_mask
__builtin_ia32_cmppd512_mask
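
User-level sketch (ours; the 256-bit form needs -mavx512vl):

    #include <immintrin.h>

    /* Now lowers to an IR fcmp (here: olt, the ordered quiet form)
       plus a conversion to the mask type; the signaling distinction
       in the predicate is dropped, as noted above. */
    __mmask8 less_than(__m256d a, __m256d b) {
        return _mm256_cmp_pd_mask(a, b, _CMP_LT_OQ);
    }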

Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma

Reviewed By: craig.topper, spatel, efriedma

Differential Revision: https://reviews.llvm.org/D45616


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335339 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 11:59:16 +00:00
Craig Topper b3f7cdc3d9 [X86] Update handling in CGBuiltin to be tolerant of out of range immediates.
D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled.

This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335308 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 23:39:47 +00:00
Tomasz Krupa 111922d4a2 Fix a bug introduced by rL334850
Summary: All *_sqrt_round_s[s|d] intrinsics should execute a square root on
the zeroth element of B (Ops[1]) and insert it into A (Ops[0]), not the other way around.

Reviewers: itaraban, craig.topper

Reviewed By: craig.topper

Subscribers: craig.topper, cfe-commits

Differential Revision: https://reviews.llvm.org/D48288


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334964 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-18 17:57:05 +00:00
Tomasz Krupa a577f7b462 [X86] Lowering sqrt intrinsics to native IR
Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, cfe-commits

Differential Revision: https://reviews.llvm.org/D41168


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334850 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 18:05:59 +00:00
Luke Geeson d3b82e815b [AArch64] Reverted rC334696 with Clang VCVTA test fix
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334820 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 10:10:45 +00:00
Craig Topper 744535e476 [X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files.
The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This created a misleading error message when we checked the range of the immediate passed to the builtin, since the allowed range also got multiplied by 8.

This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic.
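
After the change the header simply forwards the byte count (sketch; the wrapper name is ours):

    #include <immintrin.h>

    /* Shift left by 3 *bytes*, matching PSLLDQ $3; the immediate is
       no longer multiplied by 8. */
    __m128i shl_3_bytes(__m128i v) {
        return _mm_slli_si128(v, 3);
    }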

Fixes the remaining issue from PR37795.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334773 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 22:02:35 +00:00
Tomasz Krupa 26e94130c2 [X86] Lowering Mask Scalar intrinsics to native IR (Clang part)
Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls
to native IR.

Reviewers: craig.topper, RKSimon, spatel, sroland

Reviewed By: craig.topper

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47979



git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334741 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 17:36:23 +00:00