Commit Graph

965 Commits

Author SHA1 Message Date
Luke Geeson 9c72f2596e [AArch64] reverting rC334693 due to build failures
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334696 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 08:59:33 +00:00
Luke Geeson f19651e6bf [AArch64] Added support for the vcvta_u16_f16 instrinsic for FP16 Armv8.2-A
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334693 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 08:28:56 +00:00
Mandeep Singh Grang b23a0ab08a [COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl
Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility.

Reviewers: mstorsjo, compnerd, javed.absar

Reviewed By: mstorsjo

Subscribers: kristof.beyls, chrib, cfe-commits

Differential Revision: https://reviews.llvm.org/D48132

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334639 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 18:49:35 +00:00
Craig Topper 93dd997106 [X86] Fix operand order in the shuffle created for blend builtins.
This was broken when the builtin was added in r334249.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334422 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-11 17:06:01 +00:00
Craig Topper ceda64df0e [X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins.
Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.

Reviewers: RKSimon, delena, spatel, GBuella

Reviewed By: RKSimon

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D47693

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334366 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-10 17:27:05 +00:00
Ivan A. Kosarev b911437b32 [NEON] Support VST1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47446


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334362 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-10 09:28:10 +00:00
Craig Topper ceac5045d9 [X86] Add back some masked vector truncate builtins. Custom IRgen a a few others.
I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl.

By using special buitlins we can emit a select without using the 128/256-bit select builtins.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334331 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 21:50:08 +00:00
Craig Topper 3cef3a6363 [X86] Fold masking into subvector extract builtins.
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.

The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334330 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 21:50:07 +00:00
Craig Topper 8c873daccc [X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334311 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 18:00:25 +00:00
Craig Topper a90c85acaf [X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334266 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 07:18:33 +00:00
Craig Topper ff71c0eccc [X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334265 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 06:13:16 +00:00
Craig Topper c38646b332 [X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking.
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334261 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 03:24:47 +00:00
Craig Topper 3abc771329 [X86] Add builtins for vpermilps/pd instructions to enable target feature checking.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334256 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 00:59:27 +00:00
Craig Topper 1efc613997 [X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334249 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-08 00:00:21 +00:00
Craig Topper 9399467959 [X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334244 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 23:03:08 +00:00
Reid Kleckner c1c07cca8c [MS] Re-add support for the ARM interlocked bittest intrinscs
Adds support for these intrinsics, which are ARM and ARM64 only:
  _interlockedbittestandreset_acq
  _interlockedbittestandreset_rel
  _interlockedbittestandreset_nf
  _interlockedbittestandset_acq
  _interlockedbittestandset_rel
  _interlockedbittestandset_nf

Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334239 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 21:39:04 +00:00
Craig Topper 2dbbac4ddd [X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking.
We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.

I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334237 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 21:27:41 +00:00
Craig Topper 3b4ec91916 [X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics.
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.

This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334208 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 17:28:03 +00:00
Craig Topper 57ebd133fb [X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins.
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.

I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.

The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.

I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.

Reviewers: tkrupa, RKSimon, spatel, GBuella

Reviewed By: tkrupa

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D47724

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334159 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 02:46:02 +00:00
Reid Kleckner 67a59f7040 [MS][ARM64]: Promote _setjmp to_setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib
Factor out the common setjmp call emission code.

Based on a patch by Chris January

Differential Revision: https://reviews.llvm.org/D47784

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334112 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 18:39:47 +00:00
Reid Kleckner 56d0ebe7c2 Fix std::tuple errors
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334060 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 01:44:10 +00:00
Reid Kleckner 09fc95cbbc Implement bittest intrinsics generically for non-x86 platforms
I tested these locally on an x86 machine by disabling the inline asm
codepath and confirming that it does the same bitflips as we do with the
inline asm.

Addresses code review feedback.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334059 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 01:35:08 +00:00
Craig Topper 26b20caa52 [X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics.
Previously we were just using extended vector operations in the header file.

This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load.

By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count.

User code still has the option to use extended vector operations themselves if they need non-constant indexing.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334057 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-06 00:24:55 +00:00
Craig Topper b5a0e0ac7d [X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only use it with an index of 0.
This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But its more correct to use the real operand.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334054 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 22:40:03 +00:00
Reid Kleckner ce9d274375 Reimplement the bittest intrinsic family as builtins with inline asm
We need to implement _interlockedbittestandset as a builtin for
windows.h, so we might as well do the whole family. It reduces code
duplication anyway.

Fixes PR33188, a long standing bug in our bittest implementation
encountered by Chakra.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333978 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 01:33:40 +00:00
Reid Kleckner faf42a8ee2 Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms."
Adding __attribute__((aligned(32))) to __m256 breaks the implementation
of _mm256_loadu_ps on Windows. On Windows, alignment attributes have
higher precedence than packing attributes.

We also might want to carefully consider the consequences of changing
our vector typedefs, since many users copy them and invent their own
new, non-Intel specific vector type names.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333958 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-04 21:39:20 +00:00
Craig Topper e0184a3e05 [X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333853 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-03 19:42:59 +00:00
Craig Topper b76cfbb69a [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333851 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-03 19:02:57 +00:00
Craig Topper b442eb45fd Revert r333848 "[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC"
Looks like I missed some changes to make this work.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333850 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-03 18:41:22 +00:00
Craig Topper b788d58757 [X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333848 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-03 18:08:37 +00:00
Craig Topper 3c6cf4a209 [X86] When emitting masked loads/stores don't check for all ones mask.
This seems like a premature optimization. It's unlikely a user would pass something the frontend can tell is all ones to the masked load/store intrinsics.

We do this optimization for emitting select for masking because we have builtin calls in header files that pass an all ones mask in. Though at this point we may not longer have any builtins that emit some IR and a select. We may only have the select builtins so maybe we can remove that optimization too.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333847 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-03 18:08:36 +00:00
Ivan A. Kosarev ea8877c890 [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333829 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-02 17:42:59 +00:00
John McCall 63074aa297 Cap "voluntary" vector alignment at 16 for all Darwin platforms.
This fixes two major problems:
- We were not capping vector alignment as desired on 32-bit ARM.
- We were using different alignments based on the AVX settings on
  Intel, so we did not have a consistent ABI.

This is an ABI break, but we think we can get away with it because
vectors tend to be used mostly in inline code (which is why not having
a consistent ABI has not proven disastrous on Intel).

Intel's AVX types are specified as having 32-byte / 64-byte alignment,
so align them explicitly instead of relying on the base ABI rule.
Note that this sort of attribute is stripped from template arguments
in template substitution, so there's a possibility that code templated
over vectors will produce inadequately-aligned objects.  The right
long-term solution for this is for alignment attributes to be
interpreted as true qualifiers and thus preserved in the canonical type.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333791 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-01 21:34:26 +00:00
Dan Gohman 8c75735104 [WebAssembly] Update to the new names for the memory builtin functions.
The WebAssembly committee has decided on the names `memory.size` and
`memory.grow` for the memory intrinsics, so update the clang builtin
functions to follow those names, keeping both sets of old names in place
for compatibility.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333712 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-01 00:05:51 +00:00
Gabor Buella cf48649b35 [X86] Lowering FMA intrinsics to native IR (Clang part)
This patch replaces all packed (and scalar without rounding
mode) fused intrinsics with fmadd/fmaddsub variations.
Then fmadd/fmaddsub are lowered to native IR.

Patch by tkrupa

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47444


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333555 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-30 15:27:49 +00:00
Simon Tatham 206c89ddd5 Support __iso_volatile_load8 etc on aarch64-win32.
These intrinsics are used by MSVC's header files on AArch64 Windows as
well as AArch32, so we should support them for both targets. I've
factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate
functions that EmitAArch64BuiltinExpr can call as well.

Reviewers: javed.absar, mstorsjo

Reviewed By: mstorsjo

Subscribers: kristof.beyls, cfe-commits

Differential Revision: https://reviews.llvm.org/D47476


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333513 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-30 07:54:05 +00:00
Craig Topper fbd2776604 [X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333061 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-23 04:51:54 +00:00
Sanjay Patel 067ba096ad [CodeGen] use nsw negation for builtin abs
The clang builtins have the same semantics as the stdlib functions.
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."

That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.

Differential Revision: https://reviews.llvm.org/D47202


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333038 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 23:02:13 +00:00
Craig Topper 9f33b6a567 [X86] Remove mask argument from some builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333027 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 20:48:24 +00:00
Sanjay Patel 98de0f5c7b [CodeGen] produce the LLVM canonical form of abs
We chose the 'slt' form as canonical in IR with:
rL332819
...so we should generate that form directly for efficiency.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@332989 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 15:36:50 +00:00
Craig Topper 4f6bfc6f7d [X86] Remove masking from pternlog llvm intrinsics and use a select instruction instead.
Because the intrinsics in the headers are implemented as macros, we can't just use a select builtin and pternlog builtin. This would require one of the macro arguments to be used twice. Depending on what was passed to the macro we could expand an expression twice leading to weird behavior. We could maybe declare our local variable in the macro, but that would need to worry about name collisions.

To avoid that just generate IR directly in CGBuiltin.cpp.

Differential Revision: https://reviews.llvm.org/D47125

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@332891 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-21 20:58:23 +00:00
Daniil Fukalov a3f8c02f04 [AMDGPU] fixes for lds f32 builtins
1. added restrictions to memory scope, order and volatile parameters
2. added custom processing for these builtins - currently is not used code,
   needed to switch off GCCBuiltin link to the builtins (ongoing change to llvm
   tree)
3. builtins renamed as requested

Differential Revision: https://reviews.llvm.org/D43281

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@332848 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-21 16:18:07 +00:00
Craig Topper 12891fdf0f [X86] Change the implementation of scalar masked load/store intrinsics to not use a 512-bit intermediate vector.
This is unnecessary for AVX512VL supporting CPUs like SKX. We can just emit a 128-bit masked load/store here no matter what. The backend will widen it to 512-bits on KNL CPUs.

Fixes the frontend portion of PR37386. Need to fix the backend to optimize the new sequences well.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@331958 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-10 05:43:43 +00:00
Craig Topper 1d6049b739 [Builtins] Improve the IR emitted for MSVC compatible rotr/rotl builtins to match what the middle and backends understand
Previously we emitted something like

rotl(x, n) {
  n &= bitwidth-1;
  return n != 0 ? ((x << n) | (x >> (bitwidth - n)) : x;
}

We use a select to avoid the undefined behavior on the (bitwidth - n) shift.

The middle and backend don't really recognize this as a rotate and end up emitting a cmov or control flow because of the select.

A better pattern is (x << (n & mask)) | (x << (-n & mask)) where mask is bitwidth - 1.

Fixes the main complaint in PR37387. There's still some work to be done if the user writes that sequence directly on a short or char where type promotion rules can prevent it from being recognized. The builtin is emitting direct IR with unpromoted types so that isn't a problem for it.

Differential Revision: https://reviews.llvm.org/D46656

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@331943 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-10 00:05:13 +00:00
Yaxun Liu 113fcfe647 [OpenCL] Fix typos in emitted enqueue kernel function names
Two typos: 
vaarg => vararg
get_kernel_preferred_work_group_multiple => get_kernel_preferred_work_group_size_multiple

Differential Revision: https://reviews.llvm.org/D46601


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@331895 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 17:07:06 +00:00
Adrian Prantl 647be32c60 Remove \brief commands from doxygen comments.
This is similar to the LLVM change https://reviews.llvm.org/D46290.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46320

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@331834 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 01:00:01 +00:00
Oliver Stannard acc1201fba [ARM,AArch64] Add intrinsics for dot product instructions
The ACLE spec which describes these intrinsics hasn't been published yet, but
this is based on the final draft which will be published soon, and these have
already been implemented by GCC.

Differential revision: https://reviews.llvm.org/D46109



git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@331039 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-27 14:03:32 +00:00
Sven van Haastregt 2bf83a47cc [OpenCL] Add separate read_only and write_only pipe IR types
SPIR-V encodes the read_only and write_only access qualifiers of pipes,
so separate LLVM IR types are required to target SPIR-V.  Other backends
may also find this useful.

These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which
replace `opencl.pipe_t`.

This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...)
which took a read_only pipe with separate versions for read_only and
write_only pipes, namely:

 * __get_pipe_num_packets_ro(...)
 * __get_pipe_num_packets_wo(...)
 * __get_pipe_max_packets_ro(...)
 * __get_pipe_max_packets_wo(...)

These separate versions exist to avoid needing a bitcast to one of the
two qualified pipe types.

Patch by Stuart Brady.

Differential Revision: https://reviews.llvm.org/D46015


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@331026 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-27 10:37:04 +00:00
Chandler Carruth b86e22b65d [x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics
The LLVM commit introduces a crash in LLVM's instruction selection.

I filed http://llvm.org/PR37260 with the test case.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@330997 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-26 21:46:01 +00:00
Alexander Ivchenko 2ce56cc5c9 Lowering x86 adds/addus/subs/subus intrinsics (clang)
This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D44786


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@330323 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-19 12:15:11 +00:00
Artem Belevich c4d3d32435 [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.
The new instructions were added added for sm_70+ GPUs in CUDA-9.1.

Differential Revision: https://reviews.llvm.org/D45068

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@330296 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-18 21:51:48 +00:00
Keith Wyss 07bff7a364 [XRay] Add clang builtin for xray typed events.
Summary:
A clang builtin for xray typed events. Differs from
__xray_customevent(...) by the presence of a type tag that is vended by
compiler-rt in typical usage. This allows xray handlers to expand logged
events with their type description and plugins to process traced events
based on type.

This change depends on D45633 for the intrinsic definition.

Reviewers: dberris, pelikan, rnk, eizan

Subscribers: cfe-commits, llvm-commits

Differential Revision: https://reviews.llvm.org/D45716

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@330220 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-17 21:32:43 +00:00
Aaron Ballman 560afaf82b Add modifiers for unsigned char and signed char field printing for __builtin_dump_struct.
Patch by Paul Semel.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@330188 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-17 14:00:06 +00:00
Aaron Ballman 939b185f02 Add checks for format specifiers used by __builtin_dump_struct and added a new specifier for null-terminated constant strings.
Patch by Paul Semel.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@330185 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-17 11:57:47 +00:00
Ivan A. Kosarev 97cf2229cd [NEON] Support vrndns_f32 intrinsic
Differential Revision: https://reviews.llvm.org/D45515


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@330012 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-13 12:46:02 +00:00
Dean Michael Berris a50e57f11b [XRay][clang] Add flag to choose instrumentation bundles
Summary:
This change addresses http://llvm.org/PR36926 by allowing users to pick
which instrumentation bundles to use, when instrumenting with XRay. In
particular, the flag `-fxray-instrumentation-bundle=` has four valid
values:

- `all`: the default, emits all instrumentation kinds
- `none`: equivalent to -fnoxray-instrument
- `function`: emits the entry/exit instrumentation
- `custom`: emits the custom event instrumentation

These can be combined either as comma-separated values, or as
repeated flag values.

Reviewers: echristo, kpw, eizan, pelikan

Reviewed By: pelikan

Subscribers: mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D44970

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@329985 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-13 02:31:58 +00:00
Aaron Ballman ff7e5afb9c Introduce a new builtin, __builtin_dump_struct, that is useful for dumping structure contents at runtime in circumstances where debuggers may not be easily available (such as in kernel work).
Patch by Paul Semel.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@329762 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-10 21:58:13 +00:00
Craig Topper 55aa0b574a [X86] Emit native IR for pmuldq/pmuludq builtins.
I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludg or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts.

Differential Revision: https://reviews.llvm.org/D45421

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@329605 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-09 19:17:54 +00:00
Alexander Kornienko b8b9458165 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
Where whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@329399 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-06 15:14:32 +00:00
Krzysztof Parzyszek 0a1726abd9 [Hexagon] Remove default values from lambda parameters
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@329394 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-06 13:51:48 +00:00
Gor Nishanov 925856310e [coroutines] Add __builtin_coro_noop => llvm.coro.noop
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined
coroutine noop_coroutine that does nothing. To implement this feature, we implemented
an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that
does nothing when resumed or destroyed.

This patch adds a builtin __builtin_coro_noop() that maps to llvm.coro.noop intrinsic.

Related llvm change: https://reviews.llvm.org/D45114

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@328993 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 17:35:37 +00:00
Krzysztof Parzyszek c6a82d971b [Hexagon] Aid bit-reverse load intrinsics lowering with bitcode
The conversion of operatios to bitcode helps to eliminate an additional
store in certain cases. We used to lower these load intrinsics in DAG to
DAG conversion by which time, the "Dead Store Elimination" pass is
already run. There is an associated LLVM patch.
    
Patch by Sumanth Gundapaneni.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@328776 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-29 13:54:31 +00:00
Krzysztof Parzyszek 1ec33d54df [Hexagon] Add support for "new" circular buffer intrinsics
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" vesrions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.

There is a related llvm patch.

Patch by Brendon Cahoon.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@328725 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-28 19:40:57 +00:00
Abderrazek Zaafrani f466627bea [ARM] Add ARMv8.2-A FP16 vector intrinsic
Putting back the code in commit r327189 that was reverted in r322737. The code is being committed in three stages and this one is the last stage: 1) r327455 fp16 feature flags, 2) r327836 pass half type or i16 based on FullFP16, and 3) the code here which the front-end fp16 vector intrinsic for ARM.

Differential revision https://reviews.llvm.org/D43650

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@328277 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 00:08:40 +00:00
Artem Belevich 350f0bf9a4 [NVPTX] Make tensor shape part of WMMA intrinsic's name.
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.

Differential Revision: https://reviews.llvm.org/D44719

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@328158 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-21 21:55:02 +00:00
Eric Fiselier f353d86deb [Builtins] Overload __builtin_operator_new/delete to allow forwarding to usual allocation/deallocation functions.
Summary:
Libc++'s default allocator uses `__builtin_operator_new` and `__builtin_operator_delete` in order to allow the calls to new/delete to be ellided. However, libc++ now needs to support over-aligned types in the default allocator. In order to support this without disabling the existing optimization Clang needs to support calling the aligned new overloads from the builtins.

See llvm.org/PR22634 for more information about the libc++ bug.

This patch changes `__builtin_operator_new`/`__builtin_operator_delete` to call any usual `operator new`/`operator delete` function. It does this by performing overload resolution with the arguments passed to the builtin to determine which allocation function to call. If the selected function is not a usual allocation function a diagnostic is issued.

One open issue is if the `align_val_t` overloads should be considered "usual" when `LangOpts::AlignedAllocation` is disabled.


In order to allow libc++ to detect this new behavior the value for `__has_builtin(__builtin_operator_new)` has been updated to `201802`.

Reviewers: rsmith, majnemer, aaron.ballman, erik.pilkington, bogner, ahatanak

Reviewed By: rsmith

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D43047

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@328134 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-21 19:19:48 +00:00
Abderrazek Zaafrani 9901645365 [AArch64] Add vmulxh_lane fp16 vector intrinsic
https://reviews.llvm.org/D44591

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@328038 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-20 20:37:31 +00:00
Artem Belevich b1444f3997 [NVPTX] Make tensor load/store intrinsics overloaded.
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Less intrinsics to deal with ->
less boilerplate.

Added a bit of tablegen magic to match/replace an intrinsics with a pointer
argument in particular address space with the space-specific instruction
variant.

Updated tests to use non-default address spaces.

Differential Revision: https://reviews.llvm.org/D43268

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@328006 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-20 17:18:59 +00:00
Sjoerd Meijer 2d4ee510b2 [ARM] Pass half or i16 types for NEON intrinsics
For generating NEON intrinsics, this determines the NEON data type, and whether
it should be a half type or an i16 type. I.e., we always pass a half type for
AArch64, this hasn't changed, but now also for ARM but only when FullFP16 is
enabled, and i16 otherwise.

This is intended to be non-functional change, but together with the backend
work in D44538 which adds support for f16 vectors, this enables adding the
AArch32 FP16 (vector) intrinsics.

Differential Revision: https://reviews.llvm.org/D44561


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@327836 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-19 13:22:49 +00:00
Sjoerd Meijer 7110eaf5a2 This reverts "r327189 - [ARM] Add ARMv8.2-A FP16 vector intrinsic"
This is causing problems in testing, and PR36683 was raised.
Reverting it until we have sorted out how to pass f16 vectors.



git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@327437 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-13 19:38:56 +00:00
Abderrazek Zaafrani 72858dcc19 [ARM] Add ARMv8.2-A FP16 vector intrinsic
Add the fp16 neon vector intrinsic for ARM as described in the ARM ACLE document.

Reviews in https://reviews.llvm.org/D43650

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@327189 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-09 23:39:34 +00:00
Craig Topper 9a5369fb11 [X86] Reverse the operand order of the implementation of the kunpack builtins.
The second operand needs to be in the lower bits of the concatenation. This matches llvm 5.0, gcc, and icc behavior.

Fixes PR36360.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@324954 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-12 22:38:52 +00:00
Abderrazek Zaafrani 44ca936342 [AArch64] Fixes for ARMv8.2-A FP16 scalar intrinsic - clang portion
https://reviews.llvm.org/D42993

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@324940 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-12 21:26:06 +00:00
Craig Topper 6ce1cf35b9 [X86] Change the signature of the AVX512 packed fp compare intrinsics to return vXi1 mask. Make bitcasts to scalar explicit in IR
Summary: This is the clang equivalent of r324827

Reviewers: zvi, delena, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43143

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@324828 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-10 23:34:27 +00:00
Craig Topper 3c9141475e [X86] Replace kortest intrinsics with native IR.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@324647 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-08 20:16:17 +00:00
Peter Collingbourne 902a9663be IRGen: Emit an inline implementation of __builtin_wmemcmp on MSVCRT platforms.
The MSVC runtime library does not provide a definition of wmemcmp,
so we need an inline implementation.

Differential Revision: https://reviews.llvm.org/D42441

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@323362 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-24 18:59:58 +00:00
Dan Gohman a11eb9ed78 [WebAssembly] Add mem.* builtin functions.
This corresponds to r323222 in LLVM. The new names are not yet
finalized, so use them at your own risk.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@323224 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-23 17:04:04 +00:00
Abderrazek Zaafrani 5d89c44ccd [AArch64] Add ARMv8.2-A FP16 scalar intrinsics
https://reviews.llvm.org/D41792

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@323006 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-19 23:11:18 +00:00
Craig Topper 4647e409df [X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of integer shift/and/or
Summary:
kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them as by operation on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this patter and turn it into a concat_vector operation.

I think it makes more sense to keep the IR implementation closer to vector operations on vXi1. Given that we expect these builtins to be used around other builtins that operate on k-registers which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1 leaving only the vector operations.

Reviewers: RKSimon, spatel, zvi, jina.nahias

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D42016

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@322461 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-14 19:23:50 +00:00
Craig Topper 78c4666039 [X86] Replace cvt*2mask intrinsics with native IR using 'icmp slt X, zeroinitializer.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@322038 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-08 22:37:56 +00:00
Benjamin Kramer b99f3a83f3 Add support for a limited subset of TS 18661-3 math builtins.
These just overloads for _Float128. They're supported by GCC 7 and used
by glibc. APFloat support is already there so just add the overloads.

__builtin_copysignf128
__builtin_fabsf128
__builtin_huge_valf128
__builtin_inff128
__builtin_nanf128
__builtin_nansf128

This is the same support that GCC has, according to the documentation,
but limited to _Float128.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@321948 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-06 21:49:54 +00:00
Vedant Kumar 696af9766e [CGBuiltin] Handle unsigned mul overflow properly (PR35750)
r320902 fixed the IRGen for some types of checked multiplications. It
did not handle unsigned overflow correctly in the case where the signed
operand is negative (PR35750).

Eli pointed out that on overflow, the result must be equal to the unique
value that is equivalent to the mathematically-correct result modulo two
raised to the k power, where k is the number of bits in the result type.

This patch fixes the specialized IRGen from r320902 accordingly.

Testing: Apart from check-clang, I modified the test harness from
r320902 to validate the results of all multiplications -- not just the
ones which don't overflow:

  https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081

llvm.org/PR35750, rdar://34963321

Differential Revision: https://reviews.llvm.org/D41717

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@321771 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-03 23:11:32 +00:00
Coby Tayree 8b794e9ef2 [x86][icelake][bitalg]
added bitalg feature recognition
added intrinsics support for bitalg instructions
_mm512_popcnt_epi16
_mm512_mask_popcnt_epi16
_mm512_maskz_popcnt_epi16
_mm512_popcnt_epi8
_mm512_mask_popcnt_epi8
_mm512_maskz_popcnt_epi8
_mm512_mask_bitshuffle_epi64_mask
_mm512_bitshuffle_epi64_mask
_mm256_popcnt_epi16
_mm256_mask_popcnt_epi16
_mm256_maskz_popcnt_epi16
_mm128_popcnt_epi16
_mm128_mask_popcnt_epi16
_mm128_maskz_popcnt_epi16
_mm256_popcnt_epi8
_mm256_mask_popcnt_epi8
_mm256_maskz_popcnt_epi8
_mm128_popcnt_epi8
_mm128_mask_popcnt_epi8
_mm128_maskz_popcnt_epi8
_mm256_mask_bitshuffle_epi32_mask
_mm256_bitshuffle_epi32_mask
_mm128_mask_bitshuffle_epi16_mask
_mm128_bitshuffle_epi16_mask
matching a similar work on the backend (D40222)
Differential Revision: https://reviews.llvm.org/D41564


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@321483 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-27 10:01:00 +00:00
Craig Topper 56e4bd682d [X86] Allow _mm_prefetch (both the header implementation and the builtin) to accept bit 2 which is supposed to indicate the prefetched addresses will be written to
Add the appropriate _MM_HINT_ET0/ET1 defines to match gcc.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@321325 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21 23:50:22 +00:00
Abderrazek Zaafrani af137029ce [AArch64] Enable fp16 data type for the Builtin for AArch64 only.
Differential Revision: https:://reviews.llvm.org/D41360

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@321301 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21 20:10:03 +00:00
Abderrazek Zaafrani 749de2d465 [AARch64] Add ARMv8.2-A FP16 vector intrinsics
Putting back the code that was reverted few weeks ago.

Differential Revision: https://reviews.llvm.org/D34161

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@321294 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21 19:20:01 +00:00
Vedant Kumar 817fbca018 [ubsan] Diagnose noreturn functions which return
Diagnose 'unreachable' UB when a noreturn function returns.

  1. Insert a check at the end of functions marked noreturn.

  2. A decl may be marked noreturn in the caller TU, but not marked in
     the TU where it's defined. To diagnose this scenario, strip away the
     noreturn attribute on the callee and insert check after calls to it.

Testing: check-clang, check-ubsan, check-ubsan-minimal, D40700

rdar://33660464

Differential Revision: https://reviews.llvm.org/D40698

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@321231 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-21 00:10:25 +00:00
Adrian Prantl 550c574546 Silence a bunch of implicit fallthrough warnings
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@321115 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-19 22:06:11 +00:00
Craig Topper d073eeed8a [X86] Implement kand/kandn/kor/kxor/kxnor/knot intrinsics using native IR.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@320919 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-16 08:26:22 +00:00
Craig Topper c6aa3927dc [X86] Add builtins and tests for 128 and 256 bit vpopcntdq.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@320915 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-16 06:02:31 +00:00
Vedant Kumar 87961dfbc9 [CodeGen] Specialize mixed-sign mul-with-overflow (fix PR34920)
This patch introduces a specialized way to lower overflow-checked
multiplications with mixed-sign operands. This fixes link failures and
ICEs on code like this:

  void mul(int64_t a, uint64_t b) {
    int64_t res;
    __builtin_mul_overflow(a, b, &res);
  }

The generic checked-binop irgen would use a 65-bit multiplication
intrinsic here, which requires runtime support for _muloti4 (128-bit
multiplication), and therefore fails to link on i386. To get an ICE
on x86_64, change the example to use __int128_t / __uint128_t.

Adding runtime and backend support for 65-bit or 129-bit checked
multiplication on all of our supported targets is infeasible.

This patch solves the problem by using simpler, specialized irgen for
the mixed-sign case.

llvm.org/PR34920, rdar://34963321

Testing: Apart from check-clang, I compared the output from this fairly
comprehensive test driver using unpatched & patched clangs:
https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081

Differential Revision: https://reviews.llvm.org/D41149

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@320902 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-16 01:28:25 +00:00
Reid Kleckner 512570d77f [CodeGen][X86] Implement _InterlockedCompareExchange128 intrinsic
Summary:
InterlockedCompareExchange128 is a bit more complicated than the other
InterlockedCompareExchange functions, so it requires a bit more work. It
doesn't directly refer to 128bit ints, instead it takes pointers to
64bit ints for Destination and ComparandResult, and exchange is taken as
two 64bit ints (high & low). The previous value is written to
ComparandResult, and success is returned. This implementation does the
following in order to produce a cmpxchg instruction:

  1. Cast everything to 128bit ints or int pointers, and glues together
     the Exchange values
  2. Reads from CompareandResult to get the comparand
  3. Calls cmpxchg volatile (on X86 this will produce a lock cmpxchg16b
     instruction)
    1. Result 0 (previous value) is written back to ComparandResult
    2. Result 1 (success bool) is zext'ed to a uchar and returned

Resolves bug https://llvm.org/PR35251

Patch by Colden Cullen!

Reviewers: rnk, agutowski

Reviewed By: rnk

Subscribers: majnemer, cfe-commits

Differential Revision: https://reviews.llvm.org/D41032

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@320730 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-14 19:00:21 +00:00
Krzysztof Parzyszek 95387d6350 [Hexagon] Intrinsic support for V62 and V65
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@320609 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-13 19:56:03 +00:00
Sanjay Patel bf691ab9cd [CodeGen] fix mapping from fmod calls to frem instruction
Similar to D40044 and discussed in D40594.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@319619 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-02 17:52:00 +00:00
Sanjay Patel 1b87c65388 [CodeGen] remove stale comment; NFC
The libm functions with LLVM intrinsic twins were moved above this blob with:
https://reviews.llvm.org/rL319593



git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@319618 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-02 16:29:34 +00:00
Sanjay Patel 26ba7faa47 [CodeGen] convert math libcalls/builtins to equivalent LLVM intrinsics
There are 20 LLVM math intrinsics that correspond to mathlib calls according to the LangRef:
http://llvm.org/docs/LangRef.html#standard-c-library-intrinsics

We were only converting 3 mathlib calls (sqrt, fma, pow) and 12 builtin calls (ceil, copysign, 
fabs, floor, fma, fmax, fmin, nearbyint, pow, rint, round, trunc) to their intrinsic-equivalents.

This patch pulls the transforms together and handles all 20 cases. The switch is guarded by a 
check for const-ness to make sure we're not doing the transform if errno could possibly be set by
the libcall or builtin.

Differential Revision: https://reviews.llvm.org/D40044


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@319593 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-01 23:15:52 +00:00
Dean Michael Berris f4f187631c [XRay][clang] Introduce -fxray-always-emit-customevents
Summary:
The -fxray-always-emit-customevents flag instructs clang to always emit
the LLVM IR for calls to the `__xray_customevent(...)` built-in
function. The default behaviour currently respects whether the function
has an `[[clang::xray_never_instrument]]` attribute, and thus not lower
the appropriate IR code for the custom event built-in.

This change allows users calling through to the
`__xray_customevent(...)` built-in to always see those calls lowered to
the corresponding LLVM IR to lay down instrumentation points for these
custom event calls.

Using this flag enables us to emit even just the user-provided custom
events even while never instrumenting the start/end of the function
where they appear. This is useful in cases where "phase markers" using
__xray_customevent(...) can have very few instructions, must never be
instrumented when entered/exited.

Reviewers: rnk, dblaikie, kpw

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D40601

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@319388 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-30 00:04:54 +00:00
Erich Keane 5c803947ed [X86] Update CPUSupports code to reuse LLVM .def file [NFC]
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@318815 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-22 00:54:01 +00:00
Erich Keane 35416f3b78 Simplify CpuIs code to use include from LLVM
LLVM exposes a file in the backend (X86TargetParser.def) that
contains information about the correct list of CpuIs values.

This patch removes 2 of the copied and pasted versions of this
list from clang and instead includes the data from the .def file.

Differential Revision: https://reviews.llvm.org/D40054


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@318234 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-15 00:11:24 +00:00
Sanjay Patel 98c05f2351 [CodeGen] fix const-ness of cbrt and fma
cbrt() is always constant because it can't overflow or underflow. Therefore, it can't set errno.

fma() is not always constant because it can overflow or underflow. Therefore, it can set errno.
But we know that it never sets errno on GNU / MSVC, so make it constant in those environments.

Differential Revision: https://reviews.llvm.org/D39641


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@318093 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-13 22:11:49 +00:00