clang

Author	SHA1	Message	Date
Craig Topper	923d959b17	[X86] Custom emit __builtin_rdtscp so we can emit an explicit store for the out parameter This is the clang side of D51803. The llvm intrinsic now returns two results. So we need to emit an explicit store in IR for the out parameter. This is similar to addcarry/subborrow/rdrand/rdseed. Differential Revision: https://reviews.llvm.org/D51805 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341699 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-07 19:14:24 +00:00
Craig Topper	26a2dfb9c8	[X86] Modify addcarry/subborrow builtins to emit a 2-result intrinsic and a store instruction. This is the clang side of D51769. The llvm intrinsics now return two results instead of using an out parameter. Differential Revision: https://reviews.llvm.org/D51771 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341678 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-07 16:58:57 +00:00
Craig Topper	0b68110dff	[X86] Add ktest intrinsics to match gcc and icc. These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc. Includes these intrinsics: _ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8 _ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8 _ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8 _ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341265 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-31 22:29:56 +00:00
Craig Topper	5b7dd9bf04	[X86] Add k-mask conversion and load/store intrinsics to match gcc and icc. This adds: _cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64 _cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64 _load_mask8, _load_mask16, _load_mask32, _load_mask64 _store_mask8, _store_mask16, _store_mask32, _store_mask64 These are currently missing from the Intel Intrinsics Guide webpage. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341251 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-31 20:41:06 +00:00
Craig Topper	ead0755dee	[X86] Add kshift intrinsics to match gcc and icc. This adds the following intrinsics: _kshiftli_mask8 _kshiftli_mask16 _kshiftli_mask32 _kshiftli_mask64 _kshiftri_mask8 _kshiftri_mask16 _kshiftri_mask32 _kshiftri_mask64 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@341234 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-31 18:22:52 +00:00
Craig Topper	1d39f2e5ad	[X86] Add kadd intrinsics to match gcc and icc. This adds the following intrinsics: _kadd_mask64 _kadd_mask32 _kadd_mask16 _kadd_mask8 These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340879 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-28 22:32:14 +00:00
Craig Topper	4d1a672881	[X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks. This matches gcc and icc despite not being documented in the Intel Intrinsics Guide. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340798 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-28 06:28:25 +00:00
Craig Topper	119327d467	[X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers. This also adds a second intrinsic name for the 16-bit mask versions. These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340719 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-27 06:20:22 +00:00
Nico Weber	e73e540b40	Eliminate instances of `EmitScalarExpr(E->getArg(n))` in EmitX86BuiltinExpr(). EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that work again. This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice previously. This change fixes that bug. https://reviews.llvm.org/D50979 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340348 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-21 22:19:55 +00:00
Sanjay Patel	7cb7a1d0c1	[CodeGen] add rotate builtins that map to LLVM funnel shift This is a partial retry of rL340137 (reverted at rL340138 because of gcc host compiler crashing) with 1 change: Remove the changes to make microsoft builtins also use the LLVM intrinsics. This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang (when both halves of a funnel shift are the same value, it's a rotate). We're free to name these as we want because we're not copying gcc, but if there's some other existing art (eg, the microsoft ops) that we want to replicate, we can change the names. The funnel shift intrinsics were added here: https://reviews.llvm.org/D49242 With improved codegen in: https://reviews.llvm.org/rL337966 https://reviews.llvm.org/rL339359 And basic IR optimization added in: https://reviews.llvm.org/rL338218 https://reviews.llvm.org/rL340022 ...so these are expected to produce asm output that's equal or better to the multi-instruction alternatives using primitive C/IR ops. In the motivating loop example from PR37387: https://bugs.llvm.org/show_bug.cgi?id=37387#c7 ...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source. Differential Revision: https://reviews.llvm.org/D50924 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340141 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-19 16:50:30 +00:00
Sanjay Patel	6e672c87df	revert r340137: [CodeGen] add rotate builtins At least a couple of bots (gcc host compiler on PPC only?) are showing the compiler dying while trying to compile. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340138 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-19 15:31:42 +00:00
Sanjay Patel	276876b8b3	[CodeGen] add/fix rotate builtins that map to LLVM funnel shift (retry) This is a retry of rL340135 (reverted at rL340136 because of gcc host compiler crashing) with 2 changes: 1. Move the code into a helper to reduce code duplication (and hopefully work-around the crash). 2. The original commit had a formatting bug in the docs (missing an underscore). Original commit message: This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang (when both halves of a funnel shift are the same value, it's a rotate). We're free to name these as we want because we're not copying gcc, but if there's some other existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate, we can change the names. The funnel shift intrinsics were added here: https://reviews.llvm.org/D49242 With improved codegen in: https://reviews.llvm.org/rL337966 https://reviews.llvm.org/rL339359 And basic IR optimization added in: https://reviews.llvm.org/rL338218 https://reviews.llvm.org/rL340022 ...so these are expected to produce asm output that's equal or better to the multi-instruction alternatives using primitive C/IR ops. In the motivating loop example from PR37387: https://bugs.llvm.org/show_bug.cgi?id=37387#c7 ...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source. Differential Revision: https://reviews.llvm.org/D50924 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340137 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-19 14:44:47 +00:00
Sanjay Patel	db071c17c6	revert r340135: [CodeGen] add rotate builtins At least a couple of bots (PPC only?) are showing the compiler dying while trying to compile: http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/11065/steps/build%20stage%201/logs/stdio http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/18267/steps/build%20stage%201/logs/stdio git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340136 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-19 13:48:06 +00:00
Sanjay Patel	3d796ad14a	[CodeGen] add rotate builtins This exposes the LLVM funnel shift intrinsics as more familiar bit rotation functions in clang (when both halves of a funnel shift are the same value, it's a rotate). We're free to name these as we want because we're not copying gcc, but if there's some other existing art (eg, the microsoft ops that are modified in this patch) that we want to replicate, we can change the names. The funnel shift intrinsics were added here: D49242 With improved codegen in: rL337966 rL339359 And basic IR optimization added in: rL338218 rL340022 ...so these are expected to produce asm output that's equal or better to the multi-instruction alternatives using primitive C/IR ops. In the motivating loop example from PR37387: https://bugs.llvm.org/show_bug.cgi?id=37387#c7 ...we get the expected 'rolq' x86 instructions if we substitute the rotate builtin into the source. Differential Revision: https://reviews.llvm.org/D50924 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340135 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-19 13:12:40 +00:00
Nico Weber	9e45fbd56c	Make __shiftleft128 / __shiftright128 real compiler built-ins. r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h. Microsoft's STL plans on using these functions, and they're using intrin0.h which just has declarations of built-ins to not pull in the huge intrin.h header in the standard library headers. That requires that these functions are real built-ins. https://reviews.llvm.org/D50907 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@340048 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-17 17:19:06 +00:00
Craig Topper	ccacccb1dd	[X86] Remove masking from the 512-bit paddus/psubus builtins. Use a select builtin instead. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339845 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-16 07:28:06 +00:00
Tomasz Krupa	b8ef97085a	[X86] Lowering addus/subus intrinsics to native IR Summary: This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. Reviewers: craig.topper, spatel, RKSimon Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D46892 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339651 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-14 08:01:38 +00:00
Stephen Kelly	d7b659b592	Port getLocStart -> getBeginLoc Reviewers: teemperor! Subscribers: jholewinski, whisperity, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D50350 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339385 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-09 21:08:08 +00:00
Craig Topper	495347b6b6	[Builtins] Implement __builtin_clrsb to be compatible with gcc gcc defines an intrinsic called __builtin_clrsb which counts the number of extra sign bits on a number. This is equivalent to counting the number of leading zeros on a positive number or the number of leading ones on a negative number and subtracting one from the result. Since we can't count leading ones we need to invert negative numbers to count zeros. This patch will cause the builtin to be expanded inline while gcc uses a call to a function like clrsbdi2 that is implemented in libgcc. But this is similar to what we already do for popcnt. And I don't think compiler-rt supports clrsbdi2. Differential Revision: https://reviews.llvm.org/D50168 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339282 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-08 19:55:52 +00:00
Scott Linder	5225b293ca	[OpenCL] Restore r338899 (reverted in r338904), fixing stack-use-after-return Always emit alloca in entry block for enqueue_kernel builtin. Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC later because it is not in the entry block. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@339150 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-07 15:52:49 +00:00
Vlad Tsyrklevich	0d05391382	Revert "[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin" This reverts commit r338899, it was causing ASan test failures on sanitizer-x86_64-linux-fast. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338904 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-03 17:47:58 +00:00
Scott Linder	77acb45e80	[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC later because it is not in the entry block. Differential Revision: https://reviews.llvm.org/D50104 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338899 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-03 15:50:52 +00:00
Heejin Ahn	2cc8473cc7	[WebAssembly] Support for atomic.wait / atomic.wake builtins Summary: Add support for atomic.wait / atomic.wake builtins based on the Wasm thread proposal. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits Differential Revision: https://reviews.llvm.org/D49396 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338771 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-02 21:44:40 +00:00
Matt Arsenault	87aefa4312	Try to make builtin address space declarations not useless The way address space declarations for builtins currently work is nearly useless. The code assumes the address spaces used for builtins is a confusingly named "target address space" from user code using __attribute__((address_space(N))) that matches the builtin declaration. There's no way to use this to declare a builtin that returns a language specific address space. The terminology used is highly confusing since it has nothing to do with the address space selected by the target to use for a language address space. This feature is essentially unused as-is. AMDGPU and NVPTX are the only in-tree targets attempting to use this. The AMDGPU builtins certainly do not behave as intended (i.e. all of the builtins returning pointers can never compile because the numbered address space never matches the expected named address space). The NVPTX builtins are missing tests for some, and the others seem to rely on an implicit addrspacecast. Change the used address space for builtins based on a target hook to allow using a language address space for a builtin. This allows the same builtin declaration to be used for multiple languages with similarly purposed address spaces (e.g. the same AMDGPU builtin can be used in OpenCL and CUDA even though the constant address spaces are arbitrarily different). This breaks the possibility of using arbitrary numbered address spaces alongside the named address spaces for builtins. If this is an issue we probably need to introduce another builtin declaration character to distinguish language address spaces from so-called "target address spaces". git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338707 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-02 12:14:28 +00:00
Fangrui Song	abdbb605f2	Remove trailing space sed -Ei 's/[[:space:]]+$//' include/*/.{def,h,td} lib/*/.{cpp,h} git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@338291 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-30 19:24:48 +00:00
Ivan A. Kosarev	e701fd8949	[NEON] Fix support for vrndi_f32(), vrndiq_f32() and vrndns_f32() intrinsics This patch adds support for vrndi_f32() and vrndiq_f32() intrinsics in AArch32 mode and for vrndns_f32() intrinsic in AArch64 mode. Differential Revision: https://reviews.llvm.org/D48829 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337690 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-23 13:26:37 +00:00
Erich Keane	9eb2866868	Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementations each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 3- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337552 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-20 14:13:28 +00:00
Fangrui Song	1a8f8375fb	Change \t to spaces git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337530 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-20 08:19:20 +00:00
Nemanja Ivanovic	55ada98637	NFC: Remove extraneous semicolons as pointed out in the differential review The commit for https://reviews.llvm.org/D49424 missed the comment about the extraneous semicolons. Remove them. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337451 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-19 12:49:27 +00:00
Nemanja Ivanovic	9647e445b0	[PowerPC] Handle __builtin_xxpermdi the same way as GCC does The codegen for this builtin was initially implemented to match GCC. However, due to interest from users GCC changed behaviour to account for the big endian bias of the instruction and correct it. This patch brings the handling in line with GCC. Fixes https://bugs.llvm.org/show_bug.cgi?id=38192 Differential Revision: https://reviews.llvm.org/D49424 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337449 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-19 12:44:15 +00:00
Mandeep Singh Grang	4fbd5c5474	[COFF] Add more missing MSVC ARM64 intrinsics Summary: Added the following intrinsics: _BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64 _InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64, _InterlockedExchangeAdd64, _InterlockedExchangeSub64, _InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64. Reviewers: compnerd, mstorsjo, rnk, javed.absar Reviewed By: mstorsjo Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49445 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@337327 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-17 22:03:24 +00:00
Craig Topper	ba066d5d85	[X86] Remove custom handling for __builtin_ia32_divss_round_mask and __builtin_ia32_divsd_round_mask. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336628 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-10 00:50:03 +00:00
Craig Topper	8ccbbcf855	[X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that are suitable for use in scalar mask intrinsics. This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)' and ternary operator used in some of the intrinsics. The old sequence was lowered to a scalar 'and' and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics. This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling. I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336622 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-10 00:37:25 +00:00
Craig Topper	dc74ef665f	[Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 intrinsics with it. This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter. Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible. To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future. To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins (can be target specific or target independent) or vector extension code. Or as a macro wrapper around a target specific builtin. I believe I've removed all cases where the macro was around a target independent builtin. To support the always_inline function case this patch adds __attribute__((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute. To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins. There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch. Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue. Differential Revision: https://reviews.llvm.org/D48617 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336583 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-09 19:00:16 +00:00
Craig Topper	1d7e46ba48	[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336507 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-08 01:10:47 +00:00
Craig Topper	4cc1cb57a3	[X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all ones mask case. This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases. Factor the code into a helper function to make this easy. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336472 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-06 22:46:52 +00:00
Craig Topper	5727c8bbbf	[X86] Implement __builtin_ia32_vfmaddss and __builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic. This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336417 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-06 07:14:47 +00:00
Craig Topper	62c4c0fac3	[X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission. Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles. While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336388 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-05 20:38:31 +00:00
Gabor Buella	aa316d40ec	[X86] Fix some vector cmp builtins - TRUE/FALSE predicates This patch removes an optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339. The optimization was buggy, since r335339 used it also for *_mask builtins, without actually applying the mask -- the mask argument was just ignored. Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D48715 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336355 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-05 14:26:56 +00:00
Craig Topper	0af2982d58	[X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335945 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-29 05:43:33 +00:00
Craig Topper	8631eadc24	[X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335745 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-27 15:57:57 +00:00
Ivan A. Kosarev	f040667a18	[NEON] Support vldNq intrinsics in AArch32 (Clang part) This patch reworks the support for dup NEON intrinsics as described in D48439. Differential Revision: https://reviews.llvm.org/D48440 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335734 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-27 13:58:43 +00:00
Craig Topper	265fa9857b	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. Additional IR is emitted to convert between scalar and vXi1 type to match the expected software interface for the builtin that clang exposes. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335564 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-26 00:44:02 +00:00
Gabor Buella	84dae05553	[X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR Summary: Lowering some vector comparison builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins. Affected AVX512 builtins: __builtin_ia32_cmpps128_mask __builtin_ia32_cmpps256_mask __builtin_ia32_cmpps512_mask __builtin_ia32_cmppd128_mask __builtin_ia32_cmppd256_mask __builtin_ia32_cmppd512_mask Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: craig.topper, spatel, efriedma Differential Revision: https://reviews.llvm.org/D45616 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335339 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-22 11:59:16 +00:00
Craig Topper	b3f7cdc3d9	[X86] Update handling in CGBuiltin to be tolerant of out of range immediates. D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled. This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@335308 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 23:39:47 +00:00
Tomasz Krupa	111922d4a2	Fix a bug introduced by rL334850 Summary: All *_sqrt_round_s[s|d] intrinsics should execute a square root on the zeroth element from B (Ops[1]) and insert it into A (Ops[0]), not the other way around. Reviewers: itaraban, craig.topper Reviewed By: craig.topper Subscribers: craig.topper, cfe-commits Differential Revision: https://reviews.llvm.org/D48288 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334964 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-18 17:57:05 +00:00
Tomasz Krupa	a577f7b462	[X86] Lowering sqrt intrinsics to native IR Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, cfe-commits Differential Revision: https://reviews.llvm.org/D41168 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334850 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 18:05:59 +00:00
Luke Geeson	d3b82e815b	[AArch64] Reverted rC334696 with Clang VCVTA test fix git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334820 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 10:10:45 +00:00
Craig Topper	744535e476	[X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files. The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8. This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic. Fixes the remaining issue from PR37795. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334773 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-14 22:02:35 +00:00
Tomasz Krupa	26e94130c2	[X86] Lowering Mask Scalar intrinsics to native IR (Clang part) Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR. Reviewers: craig.topper, RKSimon, spatel, sroland Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47979 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334741 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-14 17:36:23 +00:00

965 Commits