Cannon Lake does not support CLWB, therefore it
does not include all features listed under SKX.
Patch by Gabor Buella
Differential Revision: https://reviews.llvm.org/D43459
llvm-svn: 325655
I based that commit on what was in Intel's public documentation here https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
Which specifically said CLWB wasn't until Icelake.
But I've since cross checked with SDE and it thinks these features exist on CNL and ICL. So now I don't know what to believe.
I've added test coverage of the current behavior as part of the revert so at least now have proof of what we're doing.
llvm-svn: 321547
We have cannonlake and icelake inheriting from skylake server in a switch using fallthroughs. But they aren't perfect supersets of skylake server.
llvm-svn: 321504
added vbmi2 feature recognition
added intrinsics support for vbmi2 instructions
_mm[128,256,512]_mask[z]_compress_epi[16,32]
_mm[128,256,512]_mask_compressstoreu_epi[16,32]
_mm[128,256,512]_mask[z]_expand_epi[16,32]
_mm[128,256,512]_mask[z]_expandloadu_epi[16,32]
_mm[128,256,512]_mask[z]_sh[l,r]di_epi[16,32,64]
_mm[128,256,512]_mask_sh[l,r]dv_epi[16,32,64]
matching a similar work on the backend (D40206)
Differential Revision: https://reviews.llvm.org/D41557
llvm-svn: 321487
added vpclmulqdq feature recognition
added intrinsics support for vpclmulqdq instructions
_mm256_clmulepi64_epi128
_mm512_clmulepi64_epi128
matching a similar work on the backend (D40101)
Differential Revision: https://reviews.llvm.org/D41573
llvm-svn: 321480
added vaes feature recognition
added intrinsics support for vaes instructions, matching a similar work on the backend (D40078)
_mm256_aesenc_epi128
_mm512_aesenc_epi128
_mm256_aesenclast_epi128
_mm512_aesenclast_epi128
_mm256_aesdec_epi128
_mm512_aesdec_epi128
_mm256_aesdeclast_epi128
_mm512_aesdeclast_epi128
llvm-svn: 321474
I think the only reason they are different is because we don't set tune_i686 for -march=i686 to match GCC. But GCC 4.9.0 seems to have changed this behavior and they do set it now. So I think they can aliases now.
Differential Revision: https://reviews.llvm.org/D39349
llvm-svn: 316712
As indicated by Table 1-1 in Intel Architecture Instruction Set Extensions and Future Features Programming Reference from October 2017.
llvm-svn: 316593
Summary:
Also:
- Add support for some older Myriad CPUs that were missing.
- Fix some incorrect compiler defines for exisitng CPUs.
Reviewers: jyknight
Subscribers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D37551
llvm-svn: 314706
This patch extends the -fzvector language feature to enable the new
"vector float" data type when compiling at -march=z14. This matches
the updated extension definition implemented by other compilers for
the platform, which is indicated to applications by pre-defining
__VEC__ to 10302 (instead of 10301).
llvm-svn: 308198
This patch series adds support for the IBM z14 processor. This part includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
Support for the -fzvector extension to vector float and the new
high-level vector intrinsics is provided by separate patches.
llvm-svn: 308197
1. Adds the command line flag for clzero.
2. Includes the clzero flag under znver1.
3. Defines the macro for clzero.
4. Adds a new file which has the intrinsic definition for clzero instruction.
Patch by Ganesh Gopalasubramanian with some additional tests from me.
Differential revision: https://reviews.llvm.org/D29386
llvm-svn: 294559
GCC 7 will predefine two new macros on s390x:
- __ARCH__ indicates the ISA architecture level
- __VX__ indicates that the vector facility is available
This adds those macros to clang as well to ensure continued
compatibility with GCC.
llvm-svn: 294197
Summary:
This patch enables the following
1. AMD family 17h architecture using "znver1" tune flag (-march, -mcpu).
2. ISAs that are enabled for "znver1" architecture.
3. Checks ADX isa from cpuid to identify "znver1" flag when -march=native is used.
4. ISAs FMA4, XOP are disabled as they are dropped from amdfam17.
5. For the time being, it uses the btver2 scheduler model.
6. Test file is updated to check this flag.
This is linked to llvm review item https://reviews.llvm.org/D28017
Patch by Ganesh Gopalasubramanian. Additional test cases added by Craig Topper.
Reviewers: RKSimon, craig.topper
Subscribers: cfe-commits, RKSimon, ashutosh.nema, llvm-commits
Differential Revision: https://reviews.llvm.org/D28018
llvm-svn: 291544
For compatibility with other compilers on the platform, allow specifying
levels of the z/Architecture instead of model names with -march. In
particular, the following aliases are now supported:
-march=arch8 equals -march=z10
-march=arch9 equals -march=z196
-march=arch10 equals -march=zEC12
-march=arch11 equals -march=z13
This parallels the equivalent (and prerequisite) LLVM change in r285577.
llvm-svn: 285578
This patch corresponds to review:
https://reviews.llvm.org/D24397
It adds the __POWER9_VECTOR__ macro and the -mpower9-vector option along with
a number of altivec.h functions (refer to the code review for a list).
llvm-svn: 282481
btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE.
Differential Revision: http://reviews.llvm.org/D17682
llvm-svn: 262772
Define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_[1248] macros on SystemZ.
This fixes a miscompile of GCC C++11 standard library headers
due to use of those macros in an ABI-changing manner.
See e.g. /usr/include/c++/4.8.5/ext/concurrence.h:
// Compile time constant that indicates prefered locking policy in
// the current configuration.
static const _Lock_policy __default_lock_policy =
#ifdef __GTHREADS
#if (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2) \
&& defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4))
_S_atomic;
#else
_S_mutex;
#endif
#else
_S_single;
#endif
A different choice of __default_lock_policy causes different
sizes of several of the C++11 data structures, which are then
incompatible when inlined in clang-compiled code with what the
(GCC-compiled) external library expects.
This in turn leads to various crashes when using std::thread
in code compiled with clang, as see e.g. via the ThreadPool
unit tests. See PR 26473 for an example.
llvm-svn: 259931
is not defined for 32bit mode, but __sparcv9 is. Pass down the correct
-target-cpu flags to the backend, so that instruction restrictions are
applied correctly. Pass down the correct -A flag when not using IAS.
The latter is limited to NetBSD targets in this commit.
llvm-svn: 252545
We support all __sync_val_compare_and_swap_* builtins (only 64-bit on 64-bit
targets) on all cores, and should define the corresponding
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_* macros, just as GCC does. As it turns out,
this is really important because they're needed to prevent a bad ODR violation
with libstdc++'s std::shared_ptr (this is well explained in PR12730).
We were doing this only for P8, but this is necessary on all PPC systems.
llvm-svn: 249009
The z13 vector facility has an associated language extension,
closely modeled on AltiVec/VSX. The main differences are:
- vector long, vector float and vector pixel are not supported
- vector long long and vector double are supported (like VSX)
- comparison operators return a vector rather than a scalar integer
- shift operators behave like the OpenCL shift operators
- vector bool is only supported as argument to certain operators;
some operators allow mixing a bool with a non-bool vector
This patch adds clang support for the extension. It is closely modelled
on the AltiVec support. Similarly to the -faltivec option, there's a
new -fzvector option to enable the extensions (as well as an -mzvector
alias for compatibility with GCC). There's also a separate LangOpt.
The extension as implemented here is intended to be compatible with
the -mzvector extension recently implemented by GCC.
Based on a patch by Richard Sandiford.
Differential Revision: http://reviews.llvm.org/D11001
llvm-svn: 243642
Follow-up to commit for revision 236848.
Just a test case for the macro definition under the right CPU/Arch.
One combination was actually missed in the initial fix:
- powerpc64-unknown-unknown -mcpu=pwr8 (rather than -mcpu=power8).
llvm-svn: 237386
The zEC12 provides the transactional-execution facility. This is exposed
to users via a set of builtin routines on other compilers. This patch
adds clang support to enable those builtins. In partciular, the patch:
- enables the transactional-execution feature by default on zEC12
- allows to override presence of that feature via the -mhtm/-mno-htm options
- adds a predefined macro __HTM__ if the feature is enabled
- adds support for the transactional-execution GCC builtins
- adds Sema checking to verify the __builtin_tabort abort code
- adds the s390intrin.h header file (for GCC compatibility)
- adds s390 sections to the htmintrin.h and htmxlintrin.h header files
Since this is first use of target-specific intrinsics on the platform,
the patch creates the include/clang/Basic/BuiltinsSystemZ.def file and
hooks it up in TargetBuiltins.h and lib/Basic/Targets.cpp.
An associated LLVM patch adds the required LLVM IR intrinsics.
For reference, the transactional-execution instructions are documented
in the z/Architecture Principles of Operation for the zEC12:
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/DZ9ZR009.pdf
The associated builtins are documented in the GCC manual:
http://gcc.gnu.org/onlinedocs/gcc/S_002f390-System-z-Built-in-Functions.html
The htmxlintrin.h intrinsics provided for compatibility with the IBM XL
compiler are documented in the "z/OS XL C/C++ Programming Guide".
llvm-svn: 233804
This patch simplifies how default target features are set for AMD bdver2
and bdver1. In particular, method 'getDefaultFeatures' now implements a
fallthrough from case 'CK_BDVER2' to case 'CK_BDVER1'.
That is because 'bdver2' has the same features available in bdver1 plus
BMI, FMA, F16C and TBM.
This patch also adds missing checks for predefined macros in test
predefined-arch-macros.c. In the case of BTVER2, the test now also checks
for F16C, BMI and PCLMUL. In the case of BDVER3 and BDVER4, the test now
also checks for the presence of FSGSBASE.
Differential Revision: http://reviews.llvm.org/D6134
llvm-svn: 221449
The current VSX feature for PowerPC specifies availability of the VSX
instructions added with the 2.06 architecture version. With 2.07, the
architecture adds new instructions to both the Category:Vector and
Category:VSX instruction sets. Additionally, unaligned vector storage
operations have improved performance.
This patch adds a feature to provide access to the new instructions
and performance capabilities of Power8. For compatibility with GCC,
the feature is controlled via a new -mpower8-vector switch, and the
feature causes the __POWER8_VECTOR__ builtin define to be generated by
the preprocessor.
There is a companion patch for llvm being committed at the same time.
llvm-svn: 219502
a) add SKX support to Clang driver;
b) add tests for SKX target and AVX512BW, AVX512DQ, AVX512VL features into clang driver tests
Patch by Zinovy Nis <zinovy.y.nis@intel.com>
llvm-svn: 214306
clang front end. This change will allow the __PRFCHW__ macro to be set on these
processors and hence include prfchwintrin.h in x86intrin.h header. Support for
the intrinsic itself seems to have already been added in r178041.
Differential Revision: http://llvm-reviews.chandlerc.com/D1934
llvm-svn: 192829
x86 TBM instruction set. Also adding a __TBM__ macro if the TBM feature is
enabled. Otherwise there should be no functionality change to existing features.
Phabricator code review is located here: http://llvm-reviews.chandlerc.com/D1693
llvm-svn: 191326
- New options '-mrtm'/'-mno-rtm' are added to enable/disable RTM feature
- Builtin macro '__RTM__' is defined if RTM feature is enabled
- RTM intrinsic header is added and introduces 3 new intrinsics, namely
'_xbegin', '_xend', and '_xabort'.
- 3 new builtins are added to keep compatible with gcc, namely
'__builtin_ia32_xbegin', '__builtin_ia32_xend', and '__builtin_ia32_xabort'.
- Test cases for pre-defined macro and new intrinsic codegen are added.
llvm-svn: 167665
predefines based on the output of GCC as well as the CPU predefines.
Invert tests for __AVX__, Clang's AVX feature is hard coded off still.
Switch Atom from 'SSE3' to 'SSSE3'. This matches GCC's behavior, Intel's
documentation, and ICC's documentation (such as I could dig up).
Switch Athlon and Geode to enable 3dnowa rather than just 3dnow and
nothing (resp.).
llvm-svn: 140692
automate the process of updating and generating these tests.
If anyone is really interested, I can check my scripts for generating
this test in, but its a horrible pile of shell... Not sure its really
worth it.
llvm-svn: 140691
is *very* much a WIP that I'll be refining over the next several
commits, but I need to get this checkpoint in place for sanity.
This also adds a much more comprehensive test for architecture macros,
which is roughly generated by inspecting the behavior of a trunk build
of GCC. It still requires some massaging, but eventually I'll even check
in the script that generates these so that others can use it to append
more tests for more architectures, etc.
Next up is a bunch of simplification of the Targets.cpp code, followed
by a lot more test cases once we can reject invalid architectures.
llvm-svn: 140673