llvm-project

Commit Graph

Author	SHA1	Message	Date
Stefan Pintilie	4fc2f4979c	[PowerPC] Fix __builtin_ppc_load2r to return short instead of int. This patch fixes the return value of the builtin __builtin_ppc_load2r to correctly return short instead of int. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D110771	2021-10-04 06:17:02 -05:00
Quinn Pham	67a3d1e275	[PowerPC] swdiv builtins for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch implements the software divide builtin as wrappers for a floating point divide. XL provided these builtins because it didn't produce software estimates by default at `-Ofast`. When compiled with `-Ofast` these builtins will produce the software estimate for divide. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D106959	2021-09-29 11:31:07 -05:00
Quinn Pham	70391b3468	[PowerPC] FP compare and test XL compat builtins. This patch is in a series of patches to provide builtins for compatability with the XL compiler. This patch adds builtins for compare exponent and test data class operations on floating point values. Reviewed By: #powerpc, lei Differential Revision: https://reviews.llvm.org/D109437	2021-09-28 11:01:51 -05:00
Ahsan Saghir	593b074a09	[PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins This patch adds the following built-ins: __builtin_vsx_build_pair __builtin_mma_build_acc Reviewed By: #powerpc, nemanjai, lei Differential Revision: https://reviews.llvm.org/D107647	2021-09-27 19:51:28 -05:00
Wang, Pengfei	7d6889964a	[X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110336	2021-09-27 09:27:04 +08:00
Thomas Lively	2f519825ba	[WebAssembly] Add prototype relaxed SIMD fma/fms instructions Add experimental clang builtins, LLVM intrinsics, and backend definitions for the new {f32x4,f64x2}.{fma,fms} instructions in the relaxed SIMD proposal: https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md. Do not allow these instructions to be selected without explicit user opt-in. Differential Revision: https://reviews.llvm.org/D110295	2021-09-23 11:01:36 -07:00
Xiang1 Zhang	c81d6ab875	[X86] Adjust Keylocker handle mem size Reviewed By: Topper Craig Differential Revision: https://reviews.llvm.org/D109488	2021-09-13 18:03:27 +08:00
Xiang1 Zhang	bdce8d40c6	Revert "[X86] Adjust Keylocker handle mem size" This reverts commit `3731de6b7f`.	2021-09-13 18:00:46 +08:00
Xiang1 Zhang	3731de6b7f	[X86] Adjust Keylocker handle mem size Reviewed By: Topper Craig Differential Revision: https://reviews.llvm.org/D109354	2021-09-13 17:59:33 +08:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
Andrei Elovikov	1724a16437	[NFC][clang] Move IR-independent parts of target MV support to X86TargetParser.cpp ...that is located under llvm/lib/Support/. Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D108423	2021-08-30 09:48:48 -07:00
Wang, Pengfei	c728bd5bba	[X86] AVX512FP16 instructions enabling 5/6 Enable FP16 FMA instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105268	2021-08-24 09:07:19 +08:00
Andrei Elovikov	f5c2889488	[NFC][clang] Use X86 Features declaration from X86TargetParser ...instead of redeclaring them in clang's own X86Target.def. They were already required to be in sync (IIUC), so no reason to maintain two identical lists. Reviewed By: erichkeane, craig.topper Differential Revision: https://reviews.llvm.org/D108151	2021-08-23 12:30:28 -07:00
Simon Pilgrim	7f48bd3bed	CGBuiltin.cpp - pass SVETypeFlags by const reference. NFC. Don't pass the struct by value.	2021-08-22 12:13:17 +01:00
Wang, Pengfei	b088536ce9	[X86] AVX512FP16 instructions enabling 4/6 Enable FP16 unary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105267	2021-08-22 08:59:35 +08:00
Thomas Lively	88962cea46	[WebAssembly] Restore builtins and intrinsics for pmin/pmax Partially reverts `85157c0079`, which had removed these builtins and intrinsics in favor of normal codegen patterns. It turns out that it is possible for the patterns to be split over multiple basic blocks, however, which means that DAG ISel is not able to select them to the pmin/pmax instructions. To make sure the SIMD intrinsics generate the correct instructions in these cases, reintroduce the clang builtins and corresponding LLVM intrinsics, but also keep the normal pattern matching as well. Differential Revision: https://reviews.llvm.org/D108387	2021-08-20 09:21:31 -07:00
Martin Storsjö	cc3affd8b0	[clang] [MSVC] Implement __mulh and __umulh builtins for aarch64 The code is based on the same __mulh and __umulh intrinsics for x86. This should fix PR51128. Differential Revision: https://reviews.llvm.org/D106721	2021-08-19 11:29:55 +03:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
Wang, Pengfei	2379949aad	[X86] AVX512FP16 instructions enabling 3/6 Enable FP16 conversion instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105265	2021-08-18 09:03:41 +08:00
Wang, Pengfei	f1de9d6dae	[X86] AVX512FP16 instructions enabling 2/6 Enable FP16 binary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105264	2021-08-15 08:56:33 +08:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Anshil Gandhi	39dac1f7f6	[clang] Add clang builtins support for gfx90a Implement target builtins for gfx90a including fadd64, fadd32, add2h, max and min on various global, flat and ds address spaces for which intrinsics are implemented. Differential Revision: https://reviews.llvm.org/D106909	2021-08-05 02:08:06 -06:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Kai Luo	e4902e69e9	[PowerPC] Fix return type of XL compat CAS `__compare_and_swap*` should return `i32` rather than `i1`. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D107077	2021-07-29 14:49:26 +00:00
Thomas Lively	33786576fd	[WebAssembly] Codegen for extmul SIMD instructions Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions with normal codegen patterns. Differential Revision: https://reviews.llvm.org/D106724	2021-07-27 08:41:30 -07:00
Nemanja Ivanovic	1c50a5da36	[PowerPC] Implement partial vector ld/st builtins for XL compatibility XL provides functions __vec_ldrmb/__vec_strmb for loading/storing a sequence of 1 to 16 bytes in big endian order, right justified in the vector register (regardless of target endianness). This is equivalent to vec_xl_len_r/vec_xst_len_r which are only available on Power9. This patch simply uses the Power9 functions when compiled for Power9, but provides a more general implementation for Power8. Differential revision: https://reviews.llvm.org/D106757	2021-07-26 13:19:52 -05:00
Thomas Lively	85157c0079	[WebAssembly] Codegen for pmin and pmax Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax} with standard codegen patterns. Since wasm_simd128.h uses an integer vector as the standard single vector type, the IR for the pmin and pmax intrinsic functions contains bitcasts that would not be there otherwise. Add extra codegen patterns that can still select the pmin and pmax instructions in the presence of these bitcasts. Differential Revision: https://reviews.llvm.org/D106612	2021-07-23 14:49:21 -07:00
Kai Luo	e4ed93cb25	[PowerPC] Implement XL compatible behavior of __compare_and_swap According to https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp XL's `__compare_and_swap` has a weird behavior that > In either case, the contents of the memory location specified by addr are copied into the memory location specified by old_val_addr. (unlike c11 `atomic_compare_exchange` specified in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) This patch let clang's implementation follow this behavior. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106344	2021-07-23 01:16:02 +00:00
Thomas Lively	8af333cf1a	[WebAssembly] Replace @llvm.wasm.popcnt with @llvm.ctpop.v16i8 Use the standard target-independent intrinsic to take advantage of standard optimizations. Differential Revision: https://reviews.llvm.org/D106506	2021-07-21 16:45:54 -07:00
Thomas Lively	db7efcab7d	[WebAssembly] Remove clang builtins for extract_lane and replace_lane These builtins were added to capture the fact that the underlying Wasm instructions return i32s and implicitly sign or zero extend the extracted lanes in the case of the i8x16 and i16x8 variants. But we do sufficient optimizations during code gen that these low-level details do not need to be exposed to users. This commit replaces the use of the builtins in wasm_simd128.h with normal target-independent vector code. As a result, we can switch the relevant intrinsics to use functions rather than macros and can use more user-friendly return types rather than trying to precisely expose the underlying Wasm types. Note, however, that the generated LLVM IR is no different after this change. Differential Revision: https://reviews.llvm.org/D106500	2021-07-21 16:11:00 -07:00
Thomas Lively	1a57ee1276	[WebAssembly] Codegen for v128.load{32,64}_zero Replace the experimental clang builtins and LLVM intrinsics for these instructions with normal instruction selection patterns. The wasm_simd128.h intrinsics header was already using portable code for the corresponding intrinsics, so now it produces the correct instructions. Differential Revision: https://reviews.llvm.org/D106400	2021-07-21 09:02:12 -07:00
Quinn Pham	e002d251dd	[PowerPC] Floating Point Builtins for XL Compat. This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds builtins related to floating point operations Reviewed By: #powerpc, nemanjai, amyk, NeHuang Differential Revision: https://reviews.llvm.org/D103986	2021-07-21 08:33:39 -05:00
Albion Fung	2fd1520247	[PowerPC] Implemented mtmsr, mfspr, mtspr Builtins Implemented builtins for mtmsr, mfspr, mtspr on PowerPC; the patch is intended for XL Compatibility. Differential revision: https://reviews.llvm.org/D106130	2021-07-20 17:51:00 -05:00
Albion Fung	3434ac9e39	[PowerPC] Store, load, move from and to registers related builtins This patch implements store, load, move from and to registers related builtins, as well as the builtin for stfiw. The patch aims to provide feature parady with xlC on AIX. Differential revision: https://reviews.llvm.org/D105946	2021-07-20 15:46:14 -05:00
Victor Huang	1a762f93f8	[PowerPC] Add PowerPC cmpb builtin and emit target indepedent code for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch add the builtin and emit target independent code for __cmpb. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D105194	2021-07-20 13:06:22 -05:00
Quinn Pham	fd855c24c7	[PowerPC] Restore FastMathFlags of Builder for Vector FDiv Builtins This patch fixes `__builtin_ppc_recipdivf`, `__builtin_ppc_recipdivd`, `__builtin_ppc_rsqrtf`, and `__builtin_ppc_rsqrtd`. FastMathFlags are set to fast immediately before emitting these builtins. Now the flags are restored to their previous values after the builtins are emitted. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D105984	2021-07-20 09:41:00 -05:00
Stefan Pintilie	02cd937945	[PowerPC][Builtins] Added a number of builtins for compatibility with XL. Added a number of different builtins that exist in the XL compiler. Most of these builtins already exist in clang under a different name. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D104386	2021-07-20 08:57:55 -05:00
Quinn Pham	0268e123be	[PowerPC] swdiv_nochk Builtins for XL Compat This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds software divide builtins with no checking. These builtins are each emitted as a fast fdiv. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D106150	2021-07-19 16:51:10 -05:00
Nikita Popov	2c68ecccc9	[OpaquePtr] Remove uses of CreateGEP() without element type Remove uses of to-be-deprecated API. In cases where the correct element type was not immediately obvious to me, fall back to explicit getPointerElementType().	2021-07-17 22:56:27 +02:00
Nikita Popov	6d3e7c783b	[OpaquePtr] Remove uses of CreateConstGEP1_32() without element type Remove uses of to-be-deprecated API. I've fallen back to calling getPointerElementType() in some cases where the correct type wasn't immediately obvious to me.	2021-07-17 18:32:36 +02:00
Nemanja Ivanovic	35a18a981f	[PowerPC] Implement intrinsics for mtfsf[i] This provides intrinsics for emitting instructions that set the FPSCR (`mtfsf/mtfsfi`). The patch also conservatively marks the rounding mode as an implicit def for both since they both may set the rounding mode depending on the operands. Reviewed By: #powerpc, qiucf Differential Revision: https://reviews.llvm.org/D105957	2021-07-16 16:26:11 -05:00
Victor Huang	4eb107ccba	[PowerPC] Add PowerPC population count, reversed load and store related builtins and instrinsics for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds the builtins and instrisics for population count, reversed load and store related operations. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D106021	2021-07-15 17:23:56 -05:00
Artem Belevich	d774b4aa5e	[NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction. That should allow clang to compile mma.h from CUDA-11.3. Differential Revision: https://reviews.llvm.org/D105384	2021-07-15 12:02:09 -07:00
Quinn Pham	de3956605a	[PowerPC] Fix popcntb XL Compat Builtin for 32bit This patch implements the `__popcntb` XL compatibility builtin for 32bit in the frontend and backend. This patch also updates tests for `__popcntb` and other XL Compat sync related builtins. Reviewed By: #powerpc, nemanjai, amyk Differential Revision: https://reviews.llvm.org/D105360	2021-07-15 13:19:47 -05:00
Victor Huang	d40e8091bd	[PowerPC] Add PowerPC rotate related builtins and emit target independent code for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds the builtins and emit target independent code for rotate related operations. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D104744	2021-07-15 10:23:54 -05:00
Thomas Lively	4a4229f70f	[WebAssembly] Codegen for v128.storeX_lane instructions Replace the experimental clang builtins and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50435. Differential Revision: https://reviews.llvm.org/D106019	2021-07-14 16:15:25 -07:00
Thomas Lively	970e090010	[WebAssembly] Codegen for v128.loadX_lane instructions Replace the experimental clang builtin and LLVM intrinsics for these instructions with normal codegen patterns. Resolves PR50433. Differential Revision: https://reviews.llvm.org/D105950	2021-07-14 11:31:53 -07:00
Albion Fung	f1aca5ac96	[PowerPC] Fix L[D\|W]ARX Implementation LDARX and LWARX sometimes gets optimized out by the compiler when it is critical to the correctness of the code. This inline asm generation ensures that it preserved. Differential Revision: https://reviews.llvm.org/D105754	2021-07-13 11:02:07 -05:00
Thomas Lively	cbabfc63b1	[WebAssembly] Custom combines for f32x4.demote_zero_f64x2 Replace the clang builtin function and LLVM intrinsic for f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern. Differential Revision: https://reviews.llvm.org/D105755	2021-07-12 10:32:18 -07:00
Thomas Lively	e5220104d0	[WebAssembly] Custom combines for f64x2.promote_low_f32x4 Replace the clang builtin function and LLVM intrinsic previously used to select the f64x2.promote_low_f32x4 instruction with custom combines from standard SelectionDAG nodes. Implement the new combines to share code with the similar combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232. Differential Revision: https://reviews.llvm.org/D105675	2021-07-09 18:59:29 -07:00
Hsiangkai Wang	593bf9b4de	[Clang][RISCV] Implement vlseg and vlsegff. Differential Revision: https://reviews.llvm.org/D103527	2021-07-07 13:44:40 +08:00
Xiang1 Zhang	a39bb960fc	[X86] Refine code of generating BB labels in Keylocker Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D105336	2021-07-05 09:29:51 +08:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Yaxun (Sam) Liu	434bd5bf54	[AMDGPU] Add builtin functions image_bvh_intersect_ray Reviewed by: Stanislav Mekhanoshin, Matt Arsenault Differential Revision: https://reviews.llvm.org/D104946	2021-06-30 13:10:47 -04:00
Melanie Blower	e773216f46	[clang][patch] Add builtin __arithmetic_fence and option fprotect-parens This patch adds a new clang builtin, __arithmetic_fence. The purpose of the builtin is to provide the user fine control, at the expression level, over floating point optimization when -ffast-math (-ffp-model=fast) is enabled. The builtin prevents the optimizer from rearranging floating point expression evaluation. The new option fprotect-parens has the same effect on parenthesized expressions, forcing the optimizer to respect the parentheses. Reviewed By: aaron.ballman, kpn Differential Revision: https://reviews.llvm.org/D100118	2021-06-30 09:58:06 -04:00
Steffen Larsen	3644726a78	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0. PTX ISA description of - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com> Reviewed By: tra Differential Revision: https://reviews.llvm.org/D104847	2021-06-29 15:44:07 -07:00
Akira Hatanaka	8d21d54725	[CodeGen] Stop creating fake FunctionDecls when generating IR for functions implicitly generated by the compiler These fake functions would cause clang to crash if the changes proposed in https://reviews.llvm.org/D98799 were made.	2021-06-29 14:22:33 -07:00
Xiang1 Zhang	6d234a6908	[X86] Zero some outputs of Kelocker intrinsics in error case Reviewed By: WangPengfei Differential Revision: https://reviews.llvm.org/D104766	2021-06-29 13:35:40 +08:00
Melanie Blower	c27e5a2a8e	Revert "[clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens" This reverts commit `4f1238e44d`. Buildbot fails on predecessor patch	2021-06-28 12:42:59 -04:00
Melanie Blower	4f1238e44d	[clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens This patch adds a new clang builtin, __arithmetic_fence. The purpose of the builtin is to provide the user fine control, at the expression level, over floating point optimization when -ffast-math (-ffp-model=fast) is enabled. The builtin prevents the optimizer from rearranging floating point expression evaluation. The new option fprotect-parens has the same effect on parenthesized expressions, forcing the optimizer to respect the parentheses. Reviewed By: aaron.ballman, kpn Differential Revision: https://reviews.llvm.org/D100118	2021-06-28 12:26:53 -04:00
Jinsong Ji	eb237ffca8	[PowerPC] Add XL Compat fetch builtins Prototype ``` unsigned int __fetch_and_add (volatile unsigned int* addr, unsigned int val); unsigned long __fetch_and_addlp (volatile unsigned long* addr, unsigned long val); ``` Ref: https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-fetch Reviewed By: #powerpc, w2yehia, lkail Differential Revision: https://reviews.llvm.org/D104991	2021-06-28 02:52:32 +00:00
Craig Topper	7a112356e4	[X86] Correct the conversion of VALIGND/Q intrinsics to shufflevector. We need to mask the immediate to the width of a single vector rather than 2 vectors. If we use the width of 2 vectors then any shift larger than the length of 1 vector is going to overflow the shuffle indices. Fixes PR50895.	2021-06-26 19:06:00 -07:00
Jinsong Ji	f3ef4f5bff	[PowerPC] Add XL compat __compare_and_swap builtins Prototype int __compare_and_swap (volatile int* addr, int* old_val_addr, int new_val); int __compare_and_swaplp (volatile long* addr, long* old_val_addr, long new_val); Refer to https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp Reviewed By: w2yehia Differential Revision: https://reviews.llvm.org/D104837	2021-06-25 01:08:48 +00:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seem to have been missing in that patch was to make sure that the rest of LLVM also handle that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions, typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Simon Pilgrim	61cdaf66fe	[ADT] Remove APInt/APSInt toString() std::string variants <string> is currently the highest impact header in a clang+llvm build: https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps. This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code. Differential Revision: https://reviews.llvm.org/D103888	2021-06-11 13:19:15 +01:00
Bradley Smith	60c9b5f35c	[AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics Use llvm.experimental.vector.insert instead of storing into an alloca when generating code for these intrinsics. This defers the codegen of the generated vector to instruction selection, allowing existing shufflevector style optimizations to apply. Additionally, introduce a new target transform that can recognise fixed predicate patterns in the svbool variants of these intrinsics. Differential Revision: https://reviews.llvm.org/D103082	2021-06-07 12:21:38 +01:00
Irina Dobrescu	50511df32e	[AArch64] Lower bitreverse in ISel Adding lowering support for bitreverse. Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102397	2021-05-17 13:35:27 +01:00
Roman Lebedev	a624cec56d	[Clang][Codegen] Do not annotate thunk's this/return types with align/deref/nonnull attrs As it was discovered in post-commit feedback for `0aa0458f14`, we handle thunks incorrectly, and end up annotating their this/return with attributes that are valid for their callees, not for thunks themselves. While it would be good to fix this properly, and keep annotating them on thunks, i've tried doing that in https://reviews.llvm.org/D100388 with little success, and the patch is stuck for a month now. So for now, as a stopgap measure, subj.	2021-05-13 20:33:08 +03:00
Ahsan Saghir	25bbff632d	[PowerPC] Provide MMA builtins for compatibility Vector pair intrinsics and builtins were renamed in https://reviews.llvm.org/D91974 to replace the _mma_ prefix by _vsx_. However, some projects used the _mma_ version, so this patch adds these intrinsics to provide compatibility. Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159 Reviewed By: nemanjai, amyk Differential Revision: https://reviews.llvm.org/D100482	2021-05-07 09:10:16 -05:00
Nemanja Ivanovic	c3da07d216	[PowerPC] Provide fastmath sqrt and div functions in altivec.h This adds the long overdue implementations of these functions that have been part of the ABI document and are now part of the "Power Vector Intrinsic Programming Reference" (PVIPR). The approach is to add new builtins and to emit code with the fast flag regardless of whether fastmath was specified on the command line. Differential revision: https://reviews.llvm.org/D101209	2021-04-30 19:17:48 -05:00
Ryan Santhirarajan	0395f9e70b	[ARM] Neon Polynomial vadd Intrinsic fix The Neon vadd intrinsics were added to the ARMSIMD intrinsic map, however due to being defined under an AArch64 guard in arm_neon.td, were not previously useable on ARM. This change rectifies that. It is important to note that poly128 is not valid on ARM, thus it was extracted out of the original arm_neon.td definition and separated for the sake of AArch64. Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D100772	2021-04-28 11:59:40 -07:00
Levy Hsu	8cf54c7ff5	[RISCV] [1/2] Add IR intrinsic for Zbe extension RV32/64: bcompress bdecompress RV64 ONLY: bcompressw bdecompressw Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101143	2021-04-25 19:14:34 -07:00
Thomas Lively	502f54049d	[WebAssembly] Finalize wasm_simd128.h intrinsics Adds new intrinsics for instructions that are in the final SIMD spec but did not previously have intrinsics. Also updates the names of existing intrinsics to reflect the final names of the underlying instructions in the spec. Keeps the old names as deprecated functions to ease the transition to the new names. Differential Revision: https://reviews.llvm.org/D101112	2021-04-23 13:37:27 -07:00
Levy Hsu	b49337bbb9	[RISCV] [1/2] Add IR intrinsic for Zbp extension RV32/64: grev grevi gorc gorci shfl shfli unshfl unshfli RV64 ONLY: grevw greviw gorcw gorciw shflw shfli (For non-existing shfliw) unshfli (For non-existing unshfliw) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100830	2021-04-22 16:34:51 -07:00
Thomas Lively	5c729750a6	[WebAssembly] Remove saturating fp-to-int target intrinsics Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead. This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u}, which are now represented in IR as a saturating truncation to a v2i32 followed by a concatenation with a zero vector. Differential Revision: https://reviews.llvm.org/D100596	2021-04-16 12:11:20 -07:00
Thomas Lively	6a18cc23ef	[WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u} Removes the builtins and intrinsics used to opt in to using these instructions and replaces them with normal ISel patterns now that they are no longer prototypes. Differential Revision: https://reviews.llvm.org/D100402	2021-04-14 13:43:09 -07:00
Thomas Lively	af7925b4dd	[WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u} Add a custom DAG combine and ISD opcode for detecting patterns like (uint_to_fp (extract_subvector ...)) before the extract_subvector is expanded to ensure that they will ultimately lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions are no longer prototypes and can now be produced via standard IR, this commit also removes the target intrinsics and builtins that had been used to prototype the instructions. Differential Revision: https://reviews.llvm.org/D100425	2021-04-14 10:42:45 -07:00
Thomas Lively	af7ab81ce3	[WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops Now that these instructions are no longer prototypes, we do not need to be careful about keeping them opt-in and can use the standard LLVM infrastructure for them. This commit removes the bespoke intrinsics we were using to represent these operations in favor of the corresponding target-independent intrinsics. The clang builtins are preserved because there is no standard way to easily represent these operations in C/C++. For consistency with the scalar codegen in the Wasm backend, the intrinsic used to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though @llvm.roundeven better captures the semantics of the underlying Wasm instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is left to a potential future patch. Differential Revision: https://reviews.llvm.org/D100411	2021-04-14 09:19:27 -07:00
Yaxun (Sam) Liu	25942d7c49	[AMDGPU] Allow relaxed/consume memory order for atomic inc/dec Reviewed by: Jon Chesterfield Differential Revision: https://reviews.llvm.org/D100144	2021-04-09 09:23:41 -04:00
Simon Pilgrim	2901dc7575	Don't directly dereference getAs<> casts to avoid potential null dereferences. NFCI. Replace with castAs<> which asserts the cast is valid. Fixes a number of static analyzer warnings.	2021-04-06 12:24:19 +01:00
Craig Topper	b4f2e80600	[RISCV] Refactor conversion of B extensions to IR intrinsics a little to reduce clang binary size. These all pass 1 type to getIntrinsic. So rather than assigning IntrinsicTypes for each builtin which invokes the SmallVector constructor, just select the intrinsic ID with a switch and share a single assignment of IntrinsicTypes.	2021-04-02 23:49:44 -07:00
Levy Hsu	f78d932cf2	[RISCV] Add IR intrinsics for Zbc extension Head files are included in a separate patch in case the name needs to be changed. RV32 / 64: clmul clmulh clmulr Differential Revision: https://reviews.llvm.org/D99711	2021-04-02 12:09:13 -07:00
Levy Hsu	944adbf285	Recommit "[RISCV] Add IR intrinsic for Zbb extension" Forgot to amend the Author. Original commit message: Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b Differential Revision: https://reviews.llvm.org/D99320	2021-04-02 11:50:19 -07:00
Craig Topper	1f0b309f24	Revert "[RISCV] Add IR intrinsic for Zbb extension" This reverts commit `1808194590`. I forgot to change the author.	2021-04-02 11:47:02 -07:00
Craig Topper	1808194590	[RISCV] Add IR intrinsic for Zbb extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b	2021-04-02 11:23:57 -07:00
Levy Hsu	b001d574d7	[RISCV] Add IR intrinsic for Zbr extension Implementation for RISC-V Zbr extension intrinsic. Header files are included in separate patch in case the name needs to be changed RV32 / 64: crc32b crc32h crc32w crc32cb crc32ch crc32cw RV64 Only: crc32d crc32cd Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99009	2021-04-02 10:58:45 -07:00
Thomas Lively	45783d0e8a	[WebAssembly] Implement i64x2 comparisons Removes the prototype builtin and intrinsic for i64x2.eq and implements that instruction as well as the other i64x2 comparison instructions in the final SIMD spec. Unsigned comparisons were not included in the final spec, so they still need to be scalarized via a custom lowering. Differential Revision: https://reviews.llvm.org/D99623	2021-03-31 10:46:17 -07:00
Yaxun (Sam) Liu	cc9477166a	[CUDA][HIP] add __builtin_get_device_side_mangled_name Add builtin function __builtin_get_device_side_mangled_name to get device side manged name for functions and global variables, which can be used to get symbol address of kernels or variables by mangled name in dynamically loaded bundled code objects at run time. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D99301	2021-03-25 15:25:29 -04:00
Nemanja Ivanovic	4020932706	[PowerPC] Make altivec.h work with AIX which has no __int128 There are a number of functions in altivec.h that use vector __int128 which isn't supported on AIX. Those functions need to be guarded for targets that don't support the type. Furthermore, the functions that produce quadword instructions without using the type need a builtin. This patch adds the macro guards to altivec.h using the __SIZEOF_INT128__ which is only defined on targets that support the __int128 type.	2021-03-24 00:35:51 -05:00
Thomas Lively	f5764a8654	[WebAssembly] Finalize SIMD names and opcodes Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all SIMD instructions to match the finalized SIMD spec. Deliberately does not change the public interface in wasm_simd128.h yet; that will require more care. Depends on D98466. Differential Revision: https://reviews.llvm.org/D98676	2021-03-18 11:21:25 -07:00
Thomas Lively	2f2ae08da9	[WebAssembly] Remove experimental SIMD instructions Removes the instruction definitions, intrinsics, and builtins for qfma/qfms, signselect, and prefetch instructions, which were not included in the final WebAssembly SIMD spec. Depends on D98457. Differential Revision: https://reviews.llvm.org/D98466	2021-03-18 11:21:24 -07:00
Bradley Smith	cf0da91ba5	[AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE Previously NEON used a target specific intrinsic for frintn, given that the FROUNDEVEN ISD node now exists, move over to that instead and add codegen support for that node for both NEON and fixed length SVE. Differential Revision: https://reviews.llvm.org/D98487	2021-03-17 11:41:22 +00:00
Amy Huang	f5352dd9da	Emit inline implementation of __builtin__wmemchr on MSVCRT platforms. The MSVC runtime library doesn't have a definition for wmemchr, so provide an inline implementation. Differential Revision: https://reviews.llvm.org/D98472	2021-03-15 15:30:55 -07:00
Stelios Ioannou	ab86edbc88	[AArch64] Implement __rndr, __rndrrs intrinsics This patch implements the __rndr and __rndrrs intrinsics to provide access to the random number instructions introduced in Armv8.5-A. They are only defined for the AArch64 execution state and are available when __ARM_FEATURE_RNG is defined. These intrinsics store the random number in their pointer argument and return a status code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter intrinsic reseeds the random number generator. The instructions write the NZCV flags indicating the success of the operation that we can then read with a CSET. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics [2] https://bugs.llvm.org/show_bug.cgi?id=47838 Differential Revision: https://reviews.llvm.org/D98264 Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc	2021-03-15 17:51:48 +00:00
Thomas Preud'homme	f60b35340f	Stop traping on sNaN in __builtin_isinf __builtin_isinf currently generates a floating-point compare operation which triggers a trap when faced with a signaling NaN in StrictFP mode. This commit uses integer operations instead to not generate any trap in such a case. Reviewed By: mibintc Differential Revision: https://reviews.llvm.org/D97125	2021-03-15 15:38:08 +00:00
Nikita Popov	42eb658f65	[OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC) This removes some (but not all) uses of type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are a still a number of tricky uses left, as well as many more variation APIs for CreateGEP.	2021-03-12 21:01:16 +01:00
Nikita Popov	46354bac76	[OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC) Explicitly pass loaded type when creating loads, in preparation for the deprecation of these APIs. There are still a couple of uses left.	2021-03-11 14:40:57 +01:00
Nikita Popov	68e01339cc	[CGBuilder] Remove type-less CreateAlignedLoad() APIs (NFC) These are incompatible with opaque pointers. This is in preparation of dropping this API on the IRBuilder side as well. Instead explicitly pass the loaded type.	2021-03-11 10:41:23 +01:00
Zakk Chen	d6a0560bf2	[Clang][RISCV] Add custom TableGen backend for riscv-vector intrinsics. Demonstrate how to generate vadd/vfadd intrinsic functions 1. add -gen-riscv-vector-builtins for clang builtins. 2. add -gen-riscv-vector-builtin-codegen for clang codegen. 3. add -gen-riscv-vector-header for riscv_vector.h. It also generates ifdef directives with extension checking, base on D94403. 4. add -gen-riscv-vector-generic-header for riscv_vector_generic.h. Generate overloading version Header for generic api. https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#c11-generic-interface 5. update tblgen doc for riscv related options. riscv_vector.td also defines some unused type transformers for vadd, because I think it could demonstrate how tranfer type work and we need them for the whole intrinsic functions implementation in the future. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Zakk Chen <zakk.chen@sifive.com> Reviewed By: jrtc27, craig.topper, HsiangKai, Jim, Paul-C-Anagnostopoulos Differential Revision: https://reviews.llvm.org/D95016	2021-03-10 18:43:43 -08:00
Jingu Kang	25951c5ab8	[AArch64] Add missing intrinsics for scalar FP rounding Differential Revision: https://reviews.llvm.org/D98269	2021-03-10 13:22:29 +00:00
Jingu Kang	9b302513f6	[AArch64] Add missing intrinsics for vrnd	2021-03-05 11:26:12 +00:00
Thomas Preud'homme	b7aeece47c	Revert "Stop traping on sNaN in __builtin_isinf" This reverts commit `1b6eb56aa0` because the invert logic for isfinite is incorrect.	2021-03-04 12:07:35 +00:00
Soumi Manna	eec7f8f7b1	[WebAssembly] Add missing default cases in switch statements unsigned variable 'IntNo' has been declared but not been defined inside function EmitWebAssemblyBuiltinExpr(). static code analysis tool complains about uninitialized variable "IntNo" since this enters to default branch without setting any intrinsics and calls Function *Callee = CGM.getIntrinsic(IntNo). This patch fixes the problem by adding default cases in switch statements.	2021-03-03 13:15:23 -08:00
Thomas Preud'homme	1b6eb56aa0	Stop traping on sNaN in __builtin_isinf __builtin_isinf currently generates a floating-point compare operation which triggers a trap when faced with a signaling NaN in StrictFP mode. This commit uses integer operations instead to not generate any trap in such a case. Reviewed By: mibintc Differential Revision: https://reviews.llvm.org/D97125	2021-03-02 15:54:56 +00:00
Hsiangkai Wang	1a35a1b074	[RISCV] Add vadd with mask and without mask builtin. Demonstrate how to add RISC-V V builtins and lower them to IR intrinsics for V extension. Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com> Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com> Differential Revision: https://reviews.llvm.org/D93446	2021-02-24 07:57:31 +08:00
Ryan Santhiraraja	2c25efcbd3	[AArch64] Adding SHA3 Intrinsics support This patch adds the following SHA3 Intrinsics: vsha512hq_u64, vsha512h2q_u64, vsha512su0q_u64, vsha512su1q_u64 veor3q_u8 veor3q_u16 veor3q_u32 veor3q_u64 veor3q_s8 veor3q_s16 veor3q_s32 veor3q_s64 vrax1q_u64 vxarq_u64 vbcaxq_u8 vbcaxq_u16 vbcaxq_u32 vbcaxq_u64 vbcaxq_s8 vbcaxq_s16 vbcaxq_s32 vbcaxq_s64 Note need to include +sha3 and +crypto when building from the front-end Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96381	2021-02-22 12:09:20 +00:00
Christopher Tetreault	55448ab540	[AArch64] Adding Neon Polynomial vadd Intrinsics This patch adds the following intrinsics: vadd_p8 vadd_p16 vadd_p64 vaddq_p8 vaddq_p16 vaddq_p64 vaddq_p128 Reviewed By: t.p.northover, DavidSpickett, ctetreau Differential Revision: https://reviews.llvm.org/D96825	2021-02-19 14:48:12 -08:00
Pengxuan Zheng	0ec32f1326	Revert "[AArch64] Adding Neon Polynomial vadd Intrinsics" Revert the patch due to buildbot failures. This reverts commit `d9645059c5`.	2021-02-18 12:38:16 -08:00
Pengxuan Zheng	d9645059c5	[AArch64] Adding Neon Polynomial vadd Intrinsics This patch adds the following intrinsics: vadd_p8 vadd_p16 vadd_p64 vaddq_p8 vaddq_p16 vaddq_p64 vaddq_p128 Reviewed By: t.p.northover, DavidSpickett Differential Revision: https://reviews.llvm.org/D96825	2021-02-18 11:33:24 -08:00
Jonas Paulsson	e57bd1ff4f	[CFE, SystemZ] New target hook testFPKind() for checks of FP values. The recent commit `00a6254` "Stop traping on sNaN in builtin_isnan" changed the lowering in constrained FP mode of builtin_isnan from an FP comparison to integer operations to avoid trapping. SystemZ has a special instruction "Test Data Class" which is the preferred way to do this check. This patch adds a new target hook "testFPKind()" that lets SystemZ emit the s390_tdc intrinsic instead. testFPKind() takes the BuiltinID as an argument and is expected to soon handle more opcodes than just 'builtin_isnan'. Review: Thomas Preud'homme, Ulrich Weigand Differential Revision: https://reviews.llvm.org/D96568	2021-02-18 12:36:46 -06:00
Wang, Pengfei	61da20575d	[X86] Convert fmin/fmax _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506) This is a follow up of D92940. We have successfully converted fadd/fmul _mm_reduce_* intrinsics to llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax too, i.e. llvm.reduction + nnan flag. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93179	2021-02-15 08:52:06 +08:00
Pengxuan Zheng	61cca0f2e5	[AArch64] Adding Neon Sm3 & Sm4 Intrinsics This adds SM3 and SM4 Intrinsics support for AArch64, specifically: vsm3ss1q_u32 vsm3tt1aq_u32 vsm3tt1bq_u32 vsm3tt2aq_u32 vsm3tt2bq_u32 vsm3partw1q_u32 vsm3partw2q_u32 vsm4eq_u32 vsm4ekeyq_u32 Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D95655	2021-02-11 14:20:20 -08:00
Wang, Pengfei	dd2460ed5d	[X86] Always assign reassoc flag for intrinsics reduce_add/mul_ps/pd. Intrinsics reduce_add/mul_ps/pd have assumption that the elements in the vector are reassociable. So we need to always assign the reassoc flag when we call _mm_reduce_* intrinsics. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D96231	2021-02-09 21:14:06 +08:00
Thomas Preud'homme	00a62547da	Stop traping on sNaN in __builtin_isnan __builtin_isnan currently generates a floating-point compare operation which triggers a trap when faced with a signaling NaN in StrictFP mode. This commit uses integer operations instead to not generate any trap in such a case. Reviewed By: kpn Differential Revision: https://reviews.llvm.org/D95948	2021-02-05 18:28:48 +00:00
Kevin P. Neal	81b69879c9	[FPEnv][X86] Platform builtins edition: clang should get from the AST the metadata for constrained FP builtins Currently clang is not correctly retrieving from the AST the metadata for constrained FP builtins. This patch fixes that for the X86 specific builtins. Differential Revision: https://reviews.llvm.org/D94614	2021-02-03 11:49:17 -05:00
Thomas Lively	4b68b64dcc	[WebAssembly] Prototype i8x16 to i32x4 widening instructions As proposed in https://github.com/WebAssembly/simd/pull/395 and matching the opcodes used in V8: https://chromium-review.googlesource.com/c/v8/v8/+/2617385/4/src/wasm/wasm-opcodes.h Differential Revision: https://reviews.llvm.org/D95557	2021-01-28 10:59:32 -08:00
Thomas Lively	11802eced5	[WebAssembly] Prototype new f64x2 conversions As proposed in https://github.com/WebAssembly/simd/pull/383. Differential Revision: https://reviews.llvm.org/D95012	2021-01-20 11:28:06 -08:00
Qiu Chaofan	168be42083	[Clang] Mutate long-double math builtins into f128 under IEEE-quad Under -mabi=ieeelongdouble on PowerPC, IEEE-quad floating point semantic is used for long double. This patch mutates call to related builtins into f128 version on PowerPC. And in theory, this should be applied to other targets when their backend supports IEEE 128-bit style libcalls. GCC already has these mutations except nansl, which is not available on PowerPC along with other variants (nans, nansf). Reviewed By: RKSimon, nemanjai Differential Revision: https://reviews.llvm.org/D92080	2021-01-15 16:56:20 +08:00
Lucas Prates	2b1e25befe	[AArch64] Adding ACLE intrinsics for the LS64 extension This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`, and `__arm_st64bv0`. These are selected into the LS64 instructions LD64B, ST64B, ST64BV and ST64BV0, respectively. Based on patches written by Simon Tatham. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D93232	2021-01-14 09:43:58 +00:00
Heejin Ahn	7be271537e	[WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin `wasm_rethrow_in_catch` intrinsic and builtin are used in order to rethrow an exception when the exception is caught but there is no matching clause within the current `catch`. For example, ``` try { foo(); } catch (int n) { ... } ``` If the caught exception does not correspond to C++ `int` type, it should be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch` because at the time I thought there would be another intrinsic for C++'s `throw` keyword, which rethrows an exception. It turned out that `throw` keyword doesn't require wasm's `rethrow` instruction, so we rename `rethrow_in_catch` to just `rethrow` here. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94038	2021-01-08 06:55:04 -08:00
Wang, Pengfei	c102b9697b	[X86] Correct the comments about comparison intrinsics. NFCI.	2021-01-08 15:36:15 +08:00
Thomas Lively	497026c902	[WebAssembly] Prototype prefetch instructions As proposed in https://github.com/WebAssembly/simd/pull/352 and using the opcodes used in the V8 prototype: https://chromium-review.googlesource.com/c/v8/v8/+/2543167. These instructions are only usable via intrinsics and clang builtins to make them opt-in while they are being benchmarked. Differential Revision: https://reviews.llvm.org/D93883	2021-01-05 11:32:03 -08:00
Thorsten Schütt	2fd11e0b1e	Revert "[NFC, Refactor] Modernize StorageClass from Specifiers.h to a scoped enum (II)" This reverts commit `efc82c4ad2`.	2021-01-04 23:17:45 +01:00
Thorsten Schütt	efc82c4ad2	[NFC, Refactor] Modernize StorageClass from Specifiers.h to a scoped enum (II) Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D93765	2021-01-04 22:58:26 +01:00
Brandon Bergren	6cee9d0cf8	[PowerPC] Support powerpcle target in Clang [3/5] Add powerpcle support to clang. For FreeBSD, assume a freestanding environment for now, as we only need it in the first place to build loader, which runs in the OpenFirmware environment instead of the FreeBSD environment. For Linux, recognize glibc and musl environments to match current usage in Void Linux PPC. Adjust driver to match current binutils behavior regarding machine naming. Adjust and expand tests. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D93919	2021-01-02 12:17:58 -06:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Thomas Lively	5e09e9979b	[WebAssembly] Prototype extending pairwise add instructions As proposed in https://github.com/WebAssembly/simd/pull/380. This commit makes the new instructions available only via clang builtins and LLVM intrinsics to make their use opt-in while they are still being evaluated for inclusion in the SIMD proposal. Depends on D93771. Differential Revision: https://reviews.llvm.org/D93775	2020-12-28 14:11:14 -08:00
Tom Stellard	3203143f13	CodeGen: Improve generated IR for __builtin_mul_overflow(uint, uint, int) Add a special case for handling __builtin_mul_overflow with unsigned inputs and a signed output to avoid emitting the __muloti4 library call on x86_64. __muloti4 is not implemented in libgcc, so avoiding this call fixes compilation of some programs that call __builtin_mul_overflow with these arguments. For example, this fixes the build of cpio with clang, which includes code from gnulib that calls __builtin_mul_overflow with these argument types. Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D84405	2020-12-17 14:30:31 -08:00
Baptiste Saleil	c2892978e9	[PowerPC] Rename the vector pair intrinsics and builtins to replace the _mma_ prefix by _vsx_ On PPC, the vector pair instructions are independent from MMA. This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names. We also move the vector pair type/intrinsic/builtin tests to their own files. Differential Revision: https://reviews.llvm.org/D91974	2020-12-17 13:19:27 -05:00
Simon Pilgrim	4855a1004d	[X86] Convert fadd/fmul _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506) Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well. Differential Revision: https://reviews.llvm.org/D92940	2020-12-13 15:37:35 +00:00
Melanie Blower	320af6b138	Create SPIRABIInfo to enable SPIR_FUNC calling convention. Background: Call to library arithmetic functions for div is emitted by the compiler and it set wrong “C” calling convention for calls to these functions, whereas library functions are declared with `spir_function` calling convention. InstCombine optimization replaces such calls with “unreachable” instruction. It looks like clang lacks SPIRABIInfo class which should specify default calling conventions for “system” function calls. SPIR supports only SPIR_FUNC and SPIR_KERNEL calling convention. Reviewers: Erich Keane, Anastasia Differential Revision: https://reviews.llvm.org/D92721	2020-12-12 05:48:20 -08:00
Florian Hahn	9c4cddb53a	[Clang] Add vcmla and rotated variants for Arm ACLE. This patch adds vcmla and the rotated variants as defined in "Arm Neon Intrinsics Reference for ACLE Q3 2020" [1] The _lane_ are still missing, but they can be added separately. This patch only adds the builtin mapping for AArch64. [1] https://developer.arm.com/documentation/ihi0073/latest Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D92930	2020-12-10 16:54:08 +00:00
Kevin P. Neal	abfbc5579b	[FPEnv] clang should get from the AST the metadata for constrained FP builtins Currently clang is not correctly retrieving from the AST the metadata for constrained FP builtins. This patch fixes that for the non-target specific builtins. Differential Revision: https://reviews.llvm.org/D92122	2020-11-30 11:59:37 -05:00
Reid Kleckner	1e843a987d	[MS] Add more 128bit cmpxchg intrinsics for AArch64 The MSVC STL for requires this on ARM64. Requested in https://llvm.org/pr47099 Depends on D92061 Differential Revision: https://reviews.llvm.org/D92062	2020-11-25 12:07:28 -08:00
Reid Kleckner	3bd0672726	[MS] Fix double evaluation of MSVC builtin arguments This code got quite twisted because we consider some MSVC builtins to be target agnostic, and some to be target specific. Target specific intrinsics have a pattern of doing up-front argument evaluation, while general intrinsics do not evaluate their arguments up front. As we tried to share codepaths between the target-specific and target-agnostic handling, we ended up doing double evaluation. Instead, have each target handle MSVC intrinsics consistently before up front argument evaluation. This requires passing less data around and is more consistent with target independent intrinsic handling. See D50979 for past examples of this bug. I noticed this while looking into adding some more intrinsics. Differential Revision: https://reviews.llvm.org/D92061	2020-11-25 11:55:01 -08:00
Florian Hahn	ca2e7e5999	[IRGen] Add !annotation metadata for auto-init stores. This patch updates Clang's IRGen to add !annotation nodes with an "auto-init" annotation to all stores for auto-initialization. As discussed in 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) this allows using optimization remarks to track down where auto-init code was inserted (and not removed by optimizations). There are a few cases in the tests where !annotation gets dropped by optimizations. Those optimizations will be updated in subsequent patches. This patch is based on a patch by Francis Visoiu Mistrih. Reviewed By: thegameg, paquette Differential Revision: https://reviews.llvm.org/D91417	2020-11-16 10:37:02 +00:00
Mehdi Amini	42e88bd6b1	Replace sequences of v.push_back(v[i]); v.erase(&v[i]); with std::rotate (NFC) The code has a few sequence that looked like: Ops.push_back(Ops[0]); Ops.erase(Ops.begin()); And are equivalent to: std::rotate(Ops.begin(), Ops.begin() + 1, Ops.end()); The latter has the advantage of never reallocating the vector, which would be a bug in the original code as push_back would read from the memory it deallocated.	2020-11-14 00:55:33 +00:00
Heejin Ahn	902ea588ea	[WebAssembly] Rename atomic.notify and *.atomic.wait - atomic.notify -> memory.atomic.notify - i32.atomic.wait -> memory.atomic.wait32 - i64.atomic.wait -> memory.atomic.wait64 See https://github.com/WebAssembly/threads/pull/149. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D91447	2020-11-13 12:04:48 -08:00
Baptiste Saleil	3f78605a8c	[PowerPC] Add paired vector load and store builtins and intrinsics This patch adds the Clang builtins and LLVM intrinsics to load and store vector pairs. Differential Revision: https://reviews.llvm.org/D90799	2020-11-13 12:35:10 -06:00
Qiu Chaofan	7faf62a80b	[Clang] Add more fp128 math library function builtins Since glibc has supported math library functions conforming IEEE 128-bit floating point types on some platform (like ppc64le), we can fix clang's math builtins missing this type. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D90593	2020-11-04 17:58:42 +08:00
Baptiste Saleil	daa127d77e	[PowerPC] Add MMA builtin decoding and definitions Add MMA builtin decoding. These builtins use the new PowerPC-specific types __vector_pair and __vector_quad. So to avoid pervasive changes, we use custom type descriptors and custom decoding for these builtins. We also use custom code generation to expand builtin calls with pointers to simpler intrinsic calls with non-pointer types. Differential Revision: https://reviews.llvm.org/D81748	2020-11-03 15:08:46 -06:00
Thomas Lively	a787e09779	[WebAssembly] Prototype i64x2.bitmask As proposed in https://github.com/WebAssembly/simd/pull/368. Differential Revision: https://reviews.llvm.org/D90514	2020-10-30 17:23:30 -07:00
Thomas Lively	0a512a555a	[WebAssembly] Prototype i64x2.eq As proposed in https://github.com/WebAssembly/simd/pull/381. Since it is still in the prototyping phase, it is only accessible via a target builtin function and a target intrinsic. Depends on D90504. Differential Revision: https://reviews.llvm.org/D90508	2020-10-30 16:38:15 -07:00
Thomas Lively	1cb0b56607	[WebAssembly] Prototype i64x2.widen_{low,high}_i32x4_{s,u} As proposed in https://github.com/WebAssembly/simd/pull/290. As usual, these instructions are available only via builtin functions and intrinsics while they are in the prototyping stage. Differential Revision: https://reviews.llvm.org/D90504	2020-10-30 15:44:04 -07:00
Thomas Lively	be6f50798e	[WebAssembly] Implement SIMD signselect instructions As proposed in https://github.com/WebAssembly/simd/pull/124, using the opcodes adopted by V8 in https://chromium-review.googlesource.com/c/v8/v8/+/2486235/2/src/wasm/wasm-opcodes.h. Uses new builtin functions and a new target intrinsic exclusively to ensure that the new instructions are only emitted when a user explicitly opts in to using them since they are still in the prototyping and evaluation phase. Differential Revision: https://reviews.llvm.org/D90357	2020-10-29 11:06:20 -07:00
Jon Chesterfield	dee7704829	[AMDGPU] Add __builtin_amdgcn_grid_size [AMDGPU] Add __builtin_amdgcn_grid_size Similar to D76772, loads the data from the dispatch pointer. Marked invariant. Patch also updates the openmp devicertl to use this builtin. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D90251	2020-10-29 16:25:13 +00:00
Thomas Lively	5b464f2aa5	[WebAssembly] Fix incorrectly named target builtin Rename __builtin_wasm_q15mulr_saturate_s_i8x16 to __builtin_wasm_q15mulr_saturate_s_i16x8, fixing the implied lane interpretation of the result.	2020-10-28 10:22:43 -07:00
Heejin Ahn	98941279b9	[WebAssembly] Clang-format builtins generation (NFC) Differential Revision: https://reviews.llvm.org/D90294	2020-10-28 10:01:21 -07:00
Thomas Lively	31e944556f	[WebAssembly] Prototype extending multiplication SIMD instructions As proposed in https://github.com/WebAssembly/simd/pull/376. This commit implements new builtin functions and intrinsics for these instructions, but does not yet add them to wasm_simd128.h because they have not yet been merged to the proposal. These are the first instructions with opcodes greater than 0xff, so this commit updates the MC layer and disassembler to handle that correctly. Differential Revision: https://reviews.llvm.org/D90253	2020-10-28 09:38:59 -07:00
Tyker	d3205bbca3	[Annotation] Allows annotation to carry some additional constant arguments. This allows using annotation in a much more contexts than it currently has. especially when annotation with template or constexpr. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D88645	2020-10-26 10:50:05 +01:00
Caroline Concatto	e8d9ee9c7c	[SVE][CodeGen]Use getFixedSize() function for TypeSize comparison in clang This patch makes sure that the instance of TypeSize comparison operator is done with a fixed type size. Differential Revision: https://reviews.llvm.org/D89312	2020-10-16 10:56:39 +01:00
Fangrui Song	5a338599fb	[CGBuiltin] Respect asm labels and redefine_extname for builtins with specialized emitting rL131311 added `asm()` support for builtin functions, but `asm()` for builtins with specialized emitting (e.g. memcpy, various math functions) still do not work. This patch makes these functions work for `asm()` and `#pragma redefine_extname`. glibc uses `asm()` to redirect internal libc function calls to hidden aliases. Limitation: such a function is a builtin in clang, but will not be recognized as a libcall in optimization passes because Clang does not annotate the renamed function as a libcall. In GCC -O1 or above, `abs` can be optimized out but we can't. Additionally, we cannot redirect `__builtin_sin` to `real_sin` in the following example: double sin(double x) asm("real_sin"); double f(double d) { return __builtin_sin(d); } --- According to @rsmith, the following three statements cannot be simultaneously true: (1) The frontend function foo has known, builtin semantics X. (2) The symbol foo has known, builtin semantics X. (3) It's not correct to lower a call to the frontend function foo to the symbol foo. People do want (1) (if it is profitable to expand a memcpy, do it). This also means that people do not want to add -fno-builtin-memcpy. People do want (3): that is why they use asm("__GI_memcpy") in the first place. So unfortunately we make a compromise by not refuting (2) (see the limitation above). For most libcalls, there is a small loss because compilers don't synthesize them. For the few glibc cares about, it uses `asm("memcpy = __GI_memcpy");` to make the assembly level redirection. (Changing function names (e.g. `__memcpy`) is a hit to ergonomics which is not acceptable). Reviewed By: rsmith Differential Revision: https://reviews.llvm.org/D88712	2020-10-15 15:14:38 -07:00
Thomas Lively	1992e30c2d	[WebAssembly] Prototype i8x16.popcnt As proposed at https://github.com/WebAssembly/simd/pull/379. Use a target builtin and intrinsic rather than normal codegen patterns to make the instruction opt-in until it is merged to the proposal and stabilized in engines. Differential Revision: https://reviews.llvm.org/D89446	2020-10-15 21:18:22 +00:00
Thomas Lively	3f738d1f5e	Reland "[WebAssembly] v128.load{8,16,32,64}_lane instructions" This reverts commit `7c8385a352` with a typing fix to an instruction selection pattern.	2020-10-15 19:32:34 +00:00
Thomas Lively	7c8385a352	Revert "[WebAssembly] v128.load{8,16,32,64}_lane instructions" This reverts commit `7c6bfd90ab`.	2020-10-15 15:49:36 +00:00
Thomas Lively	7c6bfd90ab	[WebAssembly] v128.load{8,16,32,64}_lane instructions Prototype the newly proposed load_lane instructions, as specified in https://github.com/WebAssembly/simd/pull/350. Since these instructions are not available to origin trial users on Chrome stable, make them opt-in by only selecting them from intrinsics rather than normal ISel patterns. Since we only need rough prototypes to measure performance right now, this commit does not implement all the load and store patterns that would be necessary to make full use of the offset immediate. However, the full suite of offset tests is included to make it easy to track improvements in the future. Since these are the first instructions to have a memarg immediate as well as an additional immediate, the disassembler needed some additional hacks to be able to parse them correctly. Making that code more principled is left as future work. Differential Revision: https://reviews.llvm.org/D89366	2020-10-15 15:33:10 +00:00
Simon Pilgrim	d7fa9030d4	[CodeGen][X86] Emit fshl/fshr ir intrinsics for shiftleft128/shiftright128 ms intrinsics Now that funnel shift handling is pretty good, we can use the intrinsics directly and avoid a lot of zext/trunc issues. https://godbolt.org/z/YqhnnM Differential Revision: https://reviews.llvm.org/D89405	2020-10-15 10:22:41 +01:00
Simon Pilgrim	6c23cbc560	[X86] Convert integer _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506) Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences. The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions. Differential Revision: https://reviews.llvm.org/D87604	2020-10-13 09:28:39 +01:00
Thomas Lively	d8f58bf53a	[WebAssembly] Prototype i16x8.q15mulr_sat_s This saturating, rounding, Q-format multiplication instruction is proposed in https://github.com/WebAssembly/simd/pull/365. Differential Revision: https://reviews.llvm.org/D88968	2020-10-09 21:17:53 +00:00
Craig Topper	a02b449bb1	[X86] Sync AESENC/DEC Key Locker builtins with gcc. For the wide builtins, pass a single input and output pointer to the builtins. Emit the GEPs and input loads from CGBuiltin.	2020-10-04 12:09:41 -07:00
Craig Topper	230c57b0bd	[X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned. We were taking multiple pointer arguments in the builtin. gcc accepts a single void. The cast from void to _m128i* caused the IR generation to assume the pointer was aligned. Instead make the builtin take a single void, emit i8 GEPs to adjust then cast to <2 x i64>* and perform a store with align of 1.	2020-10-04 12:09:35 -07:00
Richard Smith	8fb2a235b0	Don't reject calls to MinGW's unusual _setjmp declaration. We now recognize this function as a builtin despite it having an unexpected number of parameters; make sure we don't enforce that it has only 1 argument for its 2 parameters.	2020-10-02 15:12:15 -07:00
Xiang1 Zhang	413577a879	[X86] Support Intel Key Locker Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access to the raw key value by converting AES keys into “handles”. These handles can be used to perform the same encryption and decryption operations as the original AES keys, but they only work on the current system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot), then any previous handles can no longer be used. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D88398	2020-09-30 18:08:45 +08:00
Craig Topper	288c5776c9	[X86] Use inlineasm flag output for the _bittest* intrinsics. Instead of expliciting emitting a setc in the inline asm instructions, we can use flag output. This allows the backend to use the flag directly if it is needed by a branch. Previously we needed a test instruction to convert the register back to a flag. If the flag can't be used directly, the backend will emit a setcc. Differential Revision: https://reviews.llvm.org/D87888	2020-09-28 13:33:22 -07:00
David Sherwood	bafdd11326	[SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy After some recent upstream discussion we decided that it was best to avoid having the / operator for both ElementCount and TypeSize, since this could give the impression that these classes can be used in the same way as basic integer integer types. However, division for scalable types is a bit odd because we are only dividing the minimum quantity by a value, as opposed to something like: (MinSize * Vscale) / SomeValue This is why when performing division it's important the caller first establishes whether the operation makes sense, perhaps by calling isKnownMultipleOf() prior to division. The caller must now explictly call divideCoefficientBy() on the class to perform the operation. Differential Revision: https://reviews.llvm.org/D87700	2020-09-28 08:03:00 +01:00
Amy Kwan	6b136b19cb	[Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. This patch implements custom codegen for the vec_replace_elt and vec_replace_unaligned builtins. These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd intrinsics depending on the arguments. The main motivation for doing custom codegen for these intrinsics is because there are float and double versions of the builtin. Normally, the converting the float to an integer would be done via fptoui in the IR. This is incorrect as fptoui truncates the value and we must ensure the value is not truncated. Therefore, we provide custom codegen to utilize bitcast instead as bitcasts do not truncate. Differential Revision: https://reviews.llvm.org/D83500	2020-09-23 22:55:25 -05:00
Craig Topper	d9717d8ee7	[X86] Add a memory clobber to the bittest intrinsic inline asm. Get default clobbers from the target I believe the inline asm emitted here should have a memory clobber since it writes to memory. It was also missing the dirflag clobber that we use by default along with flags and fpsr. To avoid missing defaults in the future, get the default list from the target Differential Revision: https://reviews.llvm.org/D88121	2020-09-23 14:54:39 -07:00
Stanislav Mekhanoshin	59691dc874	[AMDGPU] Make ds fp atomics overloadable Differential Revision: https://reviews.llvm.org/D87947	2020-09-23 11:39:50 -07:00
Simon Pilgrim	9eab73fa17	[X86] Update SSE/AVX integer MINMAX intrinsics to emit llvm.smax.* etc. (PR46851) We're now getting close to having the necessary analysis/combines etc. for the new generic llvm smax/smin/umax/umin intrinsics. This patch updates the SSE/AVX integer MINMAX intrinsics to emit the generic equivalents instead of the icmp+select code pattern. Differential Revision: https://reviews.llvm.org/D87603	2020-09-15 11:19:08 +01:00
Qiu Chaofan	88ff4d2ca1	[PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering In standard C library, both rint and nearbyint returns rounding result in current rounding mode. But nearbyint never raises inexact exception. On PowerPC, x(v\|s)r(d\|s)pic may modify FPSCR XX, raising inexact exception. So we can't select constrained fnearbyint into xvrdpic. One exception here is xsrqpi, which will not raise inexact exception, so fnearbyint f128 is okay here. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87220	2020-09-09 22:40:58 +08:00
Simon Pilgrim	2853ae3c1b	[X86] Update SSE/AVX ABS intrinsics to emit llvm.abs.* (PR46851) We're now getting close to having the necessary analysis/combines etc. for the new generic llvm.abs.* intrinsics. This patch updates the SSE/AVX ABS vector intrinsics to emit the generic equivalents instead of the icmp+sub+select code pattern. Differential Revision: https://reviews.llvm.org/D87101	2020-09-07 13:54:12 +01:00
Simon Pilgrim	a8a91533dd	[X86] Replace EmitX86AddSubSatExpr with EmitX86BinaryIntrinsic generic helper. NFCI. Feed the Intrinsic::ID value directly instead of via the IsSigned/IsAddition bool flags.	2020-09-07 13:33:48 +01:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Mikhail Maltsev	ae1396c7d4	[ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics This patch adjusts the following ARM/AArch64 LLVM IR intrinsics: - neon_bfmmla - neon_bfmlalb - neon_bfmlalt so that they take and return bf16 and float types. Previously these intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from implementation lacking bf16 IR type). The neon_vbfdot[q] intrinsics are adjusted similarly. This change required some additional selection patterns for vbfdot itself and also for vector shuffles (in a previous patch) because of SelectionDAG transformations kicking in and mangling the original code. This patch makes the generated IR cleaner (less useless bitcasts are produced), but it does not affect the final assembly. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86146	2020-08-27 18:43:16 +01:00
Christopher Tetreault	19e883fc59	[SVE] Remove calls to VectorType::getNumElements from clang Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D82582	2020-08-26 11:12:26 -07:00
Wang, Pengfei	9512525947	[X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics. When we use mask compare intrinsics under strict FP option, the masked elements shouldn't raise any exception. So, we cann't replace the intrinsic with a full compare + "and" operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D85385	2020-08-11 10:28:41 +08:00
Yonghong Song	00602ee7ef	BPF: simplify IR generation for __builtin_btf_type_id() This patch simplified IR generation for __builtin_btf_type_id(). For __builtin_btf_type_id(obj, flag), previously IR builtin looks like if (obj is a lvalue) llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type else llvm.bpf.btf.type.id(obj, 0, flag) !type The purpose of the 2nd argument is to differentiate __builtin_btf_type_id(obj, flag) where obj is a lvalue vs. __builtin_btf_type_id(obj.ptr, flag) Note that obj or obj.ptr is never used by the backend and the `obj` argument is only used to derive the type. This code sequence is subject to potential llvm CSE when - obj is the same .e.g., nullptr - flag is the same - metadata type is different, e.g., typedef of struct "s" and strust "s". In the above, we don't want CSE since their metadata is different. This patch change IR builtin to llvm.bpf.btf.type.id(seq_num, flag) !type and seq_num is always increasing. This will prevent potential llvm CSE. Also report an error if the type name is empty for remote relocation since remote relocation needs non-empty type name to do relocation against vmlinux. Differential Revision: https://reviews.llvm.org/D85174	2020-08-04 16:29:42 -07:00
Yonghong Song	6d67506964	[clang][BPF] support type exist/size and enum exist/value relocations This patch added the following additional compile-once run-everywhere (CO-RE) relocations: - existence/size of typedef, struct/union or enum type - enum value and enum value existence These additional relocations will make CO-RE bpf programs more adaptive for potential kernel internal data structure changes. For existence/size relocations, the following two code patterns are supported: 1. uint32_t __builtin_preserve_type_info((<type> )0, flag); 2. <type> var; uint32_t __builtin_preserve_field_info(var, flag); flag = 0 for existence relocation and flag = 1 for size relocation. For enum value existence and enum value relocations, the following code pattern is supported: uint64_t __builtin_preserve_enum_value((<enum_type> )<enum_value>, flag); flag = 0 means existence relocation and flag = 1 for enum value. relocation. In the above <enum_type> can be an enum type or a typedef to enum type. The <enum_value> needs to be an enumerator value from the same enum type. The return type is uint64_t to permit potential 64bit enumerator values. Differential Revision: https://reviews.llvm.org/D83242	2020-08-04 08:39:53 -07:00
Thomas Lively	cb32792210	[WebAssembly] Implement prototype v128.load{32,64}_zero instructions Specified in https://github.com/WebAssembly/simd/pull/237, these instructions load the first vector lane from memory and zero the other lanes. Since these instructions are not officially part of the SIMD proposal, they are only available on an opt-in basis via LLVM intrinsics and clang builtin functions. If these instructions are merged to the proposal, this implementation will change so that the instructions will be generated from normal IR. At that point the intrinsics and builtin functions would be removed. This PR also changes the opcodes for the experimental f32x4.qfm{a,s} instructions because their opcodes conflicted with those of the v128.load{32,64}_zero instructions. The new opcodes were chosen to match those used in V8. Differential Revision: https://reviews.llvm.org/D84820	2020-08-03 13:54:00 -07:00
Eli Friedman	8dfb5d767e	[clang codegen][AArch64] Use llvm.aarch64.neon.fcvtzs/u where it's necessary fptosi/fptoui have similar, but not identical, semantics. In particular, the behavior on overflow is different. Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit. (The corresponding patch for 32-bit is more involved because the equivalent intrinsics don't exist, as far as I can tell.) Differential Revision: https://reviews.llvm.org/D84703	2020-07-30 15:41:54 -07:00
Thomas Lively	11bb7eef41	[WebAssembly] Remove intrinsics for SIMD widening ops Instead, pattern match extends of extract_subvectors to generate widening operations. Since extract_subvector is not a legal node, this is implemented via a custom combine that recognizes extract_subvector nodes before they are legalized. The combine produces custom ISD nodes that are later pattern matched directly, just like the intrinsic was. Also removes the clang builtins for these operations since the instructions can now be generated from portable code sequences. Differential Revision: https://reviews.llvm.org/D84556	2020-07-28 18:25:55 -07:00
Richard Smith	6c18f7db73	For PR46800, implement the GCC __builtin_complex builtin. glibc's implementation of the CMPLX macro uses it (with -fgnuc-version set to 4.7 or later).	2020-07-22 13:43:10 -07:00
David Blaikie	36036aa70e	Reapply "Rename/refactor isIntegerConstantExpression to getIntegerConstantExpression" Reapply `49e5f603d4` which had been reverted in `c94332919b`. Originally reverted because I hadn't updated it in quite a while when I got around to committing it, so there were a bunch of missing changes to new code since I'd written the patch. Reviewers: aaron.ballman Differential Revision: https://reviews.llvm.org/D76646	2020-07-21 20:57:12 -07:00
Tim Northover	9697a9e2d3	Fix typo in identifier in assert.	2020-07-15 09:57:53 +01:00
Tim Northover	5165b2b5fd	AArch64+ARM: make LLVM consider system registers volatile. Some of the system registers readable on AArch64 and ARM platforms return different values with each read (for example a timer counter), these shouldn't be hoisted outside loops or otherwise interfered with, but the normal @llvm.read_register intrinsic is only considered to read memory. This introduces a separate @llvm.read_volatile_register intrinsic and maps all system-registers on ARM platforms to use it for the __builtin_arm_rsr calls. Registers declared with asm("r9") or similar are unaffected.	2020-07-15 09:47:36 +01:00
David Blaikie	c94332919b	Revert "Rename/refactor isIntegerConstantExpression to getIntegerConstantExpression" Broke buildbots since I hadn't updated this patch in a while. Sorry for the noise. This reverts commit `49e5f603d4`.	2020-07-12 20:29:19 -07:00
David Blaikie	49e5f603d4	Rename/refactor isIntegerConstantExpression to getIntegerConstantExpression There is a version that just tests (also called isIntegerConstantExpression) & whereas this version is specifically used when the value is of interest (a few call sites were actually refactored to calling the test-only version) so let's make the API look more like it. Reviewers: aaron.ballman Differential Revision: https://reviews.llvm.org/D76646	2020-07-12 19:43:24 -07:00
Craig Topper	b4dbb37f32	[X86] Rename X86_CPU_TYPE_COMPAT_ALIAS/X86_CPU_TYPE_COMPAT/X86_CPU_SUBTYPE_COMPAT macros. NFC Remove _COMPAT. Drop the ARCHNAME. Remove the non-COMPAT versions that are no longer needed. We now only use these macros in places where we need compatibility with libgcc/compiler-rt. So we don't need to call out _COMPAT specifically.	2020-07-12 17:00:24 -07:00
Jennifer Yu	6cf0dac1ca	orrectly generate invert xor value for Binary Atomics of int size > 64 When using __sync_nand_and_fetch with __int128, a problem is found that the wrong value for the 'invert' value gets emitted to the xor in case where the int size is greater than 64 bits. This is because uses of llvm::ConstantInt::get which zero extends the greater than 64 bits, so instead -1 that we require, it end up getting 18446744073709551615 This patch replaces the call to llvm::ConstantInt::get with the call to llvm::Constant::getAllOnesValue which works for all integer types. Reviewers: jfp, erichkeane, rjmccall, hfinkel Differential Revision: https://reviews.llvm.org/D82832	2020-07-07 10:20:14 -07:00
Wouter van Oortmerssen	16d83c395a	[WebAssembly] Added 64-bit memory.grow/size/copy/fill This covers both the existing memory functions as well as the new bulk memory proposal. Added new test files since changes where also required in the inputs. Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit. Differential Revision: https://reviews.llvm.org/D82821	2020-07-06 12:49:50 -07:00
Craig Topper	3537939cda	[X86] Move frontend CPU feature initialization to a look up table based implementation. NFCI This replaces the switch statement implementation in the clang's X86.cpp with a lookup table in X86TargetParser.cpp. I've used constexpr and copy of the FeatureBitset from SubtargetFeature.h to store the features in a lookup table. After the lookup the bitset is translated into strings for use by the rest of the frontend code. I had to modify the implementation of the FeatureBitset to avoid bugs in gcc 5.5 constexpr handling. It seems to not like the same array entry to be used on the left side and right hand side of an assignment or &= or \|=. I've also used uint32_t instead of uint64_t and sized based on the X86::CPU_FEATURE_MAX. I've initialized the features for different CPUs outside of the table so that we can express inheritance in an adhoc way. This was one of the big limitations of the switch and we had resorted to labels and gotos. Differential Revision: https://reviews.llvm.org/D82731	2020-06-30 12:04:58 -07:00
Francesco Petrogalli	67e4330fac	[sve][acle] Implement some of the C intrinsics for brain float. Summary: The following intrinsics have been extended to support brain float types: svbfloat16_t svclasta[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data) bfloat16_t svclasta[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data) bfloat16_t svlasta[_bf16](svbool_t pg, svbfloat16_t op) svbfloat16_t svclastb[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data) bfloat16_t svclastb[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data) bfloat16_t svlastb[_bf16](svbool_t pg, svbfloat16_t op) svbfloat16_t svdup[_n]_bf16(bfloat16_t op) svbfloat16_t svdup[_n]_bf16_m(svbfloat16_t inactive, svbool_t pg, bfloat16_t op) svbfloat16_t svdup[_n]_bf16_x(svbool_t pg, bfloat16_t op) svbfloat16_t svdup[_n]_bf16_z(svbool_t pg, bfloat16_t op) svbfloat16_t svdupq[_n]_bf16(bfloat16_t x0, bfloat16_t x1, bfloat16_t x2, bfloat16_t x3, bfloat16_t x4, bfloat16_t x5, bfloat16_t x6, bfloat16_t x7) svbfloat16_t svdupq_lane[_bf16](svbfloat16_t data, uint64_t index) svbfloat16_t svinsr[_n_bf16](svbfloat16_t op1, bfloat16_t op2) Reviewers: sdesmalen, kmclaughlin, c-rhodes, ctetreau, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D82345	2020-06-29 16:09:08 +00:00
Matt Arsenault	9e03bdebc1	AMDGPU: Add llvm.amdgcn.sqrt intrinsic I spread the GlobalISel test into the regular one, which I've been avoiding so far.	2020-06-26 15:07:07 -04:00
Francesco Petrogalli	7200fa38a9	[sve][acle] Add some C intrinsics for brain float types. Summary: The following intrinsics has been added: svuint16_t svcnt[_bf16]_m(svuint16_t inactive, svbool_t pg, svbfloat16_t op) svuint16_t svcnt[_bf16]_x(svbool_t pg, svbfloat16_t op) svuint16_t svcnt[_bf16]_z(svbool_t pg, svbfloat16_t op) svbfloat16_t svtbl[_bf16](svbfloat16_t data, svuint16_t indices) svbfloat16_t svtbl2[_bf16](svbfloat16x2_t data, svuint16_t indices) svbfloat16_t svtbx[_bf16](svbfloat16_t fallback, svbfloat16_t data, svuint16_t indices) Reviewers: c-rhodes, kmclaughlin, efriedma, sdesmalen, ctetreau Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D82429	2020-06-25 16:31:01 +00:00
Andrew Wock	15edd7aaa7	[FPEnv] PowerPC-specific builtin constrained FP enablement This change enables PowerPC compiler builtins to generate constrained floating point operations when clang is indicated to do so. A couple of possibly unexpected backend divergences between constrained floating point and regular behavior are highlighted under the test tag FIXME-CHECK. This may be something for those on the PPC backend to look at. Patch by: Drew Wock <drew.wock@sas.com> Differential Revision: https://reviews.llvm.org/D82020	2020-06-25 11:42:58 -04:00
Cullen Rhodes	05e10ee0ae	[AArch64][SVE2] Add bfloat16 support to whilerw/whilewr intrinsics Reviewed By: fpetrogalli Differential Revision: https://reviews.llvm.org/D82399	2020-06-24 10:06:31 +00:00
Cullen Rhodes	fd2c4b8999	[AArch64][SVE] Add bfloat16 support to svlen intrinsic Reviewed By: fpetrogalli Differential Revision: https://reviews.llvm.org/D82186	2020-06-24 10:05:51 +00:00
Mikhail Maltsev	3f353a2e5a	[BFloat] Add convert/copy instrinsic support This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a Specifically it adds intrinsic support in clang and llvm for Arm and AArch64. The bfloat type, and its properties are specified in the Arm Architecture Reference Manual: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile The following people contributed to this patch: - Alexandros Lamprineas - Luke Cheeseman - Mikhail Maltsev - Momchil Velikov - Luke Geeson Differential Revision: https://reviews.llvm.org/D80928	2020-06-23 14:27:05 +00:00

... 2 3 4 5 6 ...

1630 Commits