llvm-project

Commit Graph

Author	SHA1	Message	Date
Ron Lieberman	ca856fff1c	Revert "enable code-object-version=5" very sorry wrong repo. This reverts commit `d882ba7aea`.	2022-11-29 15:21:09 -06:00
Ron Lieberman	d882ba7aea	enable code-object-version=5	2022-11-29 15:11:57 -06:00
Pierre van Houtryve	c05f1639f7	[clang][cuda/hip] Allow `__noinline__` lambdas D124866 seem to have had an unintended side effect: __noinline__ on lambdas was no longer accepted. This fixes the regression and adds a test case for it. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D137251	2022-11-04 07:33:31 +00:00
Artem Belevich	0e8a414ab3	[CUDA, NVPTX] Added basic __bf16 support for NVPTX. Recent Clang changes expose _bf16 types for SSE2-enabled host compilations and that makes those types visible furing GPU-side compilation, where it currently fails with Sema complaining that __bf16 is not supported. Considering that __bf16 is a storage-only type, enabling it for NVPTX if it's enabled on the host should pose no issues, correctness-wise. Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better support for __bf16 on NVPTX going forward. Differential Revision: https://reviews.llvm.org/D136311	2022-10-25 11:08:06 -07:00
Artem Belevich	a10eb07d1a	Do not append terminating NUL to the binary string with embedded fatbin. Extra NUL does not impact functionality of the generated code, but it confuses various NVIDIA tools used to examine embedded GPU binaries. Differential Revision: https://reviews.llvm.org/D135832	2022-10-17 15:39:39 -07:00
Fangrui Song	ccd89be645	[Driver] Remove legacy cc1 alias -mlink-cuda-bitcode r340193 added -mlink-builtin-bitcode as the canonical spelling.	2022-10-15 14:02:13 -07:00
Zahira Ammarguellat	84a9ec2ff1	Remove redundant option -menable-unsafe-fp-math. There are currently two options that are used to tell the compiler to perform unsafe floating-point optimizations: '-ffast-math' and '-funsafe-math-optimizations'. '-ffast-math' is enabled by default. It automatically enables the driver option '-menable-unsafe-fp-math'. Below is a table illustrating the special operations enabled automatically by '-ffast-math', '-funsafe-math-optimizations' and '-menable-unsafe-fp-math' respectively. Special Operations -ffast-math -funsafe-math-optimizations -menable-unsafe-fp-math MathErrno 0 1 1 FiniteMathOnly 1 0 0 AllowFPReassoc 1 1 1 NoSignedZero 1 1 1 AllowRecip 1 1 1 ApproxFunc 1 1 1 RoundingMath 0 0 0 UnsafeFPMath 1 0 1 FPContract fast on on '-ffast-math' enables '-fno-math-errno', '-ffinite-math-only', '-funsafe-math-optimzations' and sets 'FpContract' to 'fast'. The driver option '-menable-unsafe-fp-math' enables the same special options than '-funsafe-math-optimizations'. This is redundant. We propose to remove the driver option '-menable-unsafe-fp-math' and use instead, the setting of the special operations to set the function attribute 'unsafe-fp-math'. This attribute will be enabled only if those special operations are enabled and if 'FPContract' is either 'fast' or set to the default value. Differential Revision: https://reviews.llvm.org/D135097	2022-10-14 10:55:29 -04:00
Yaxun (Sam) Liu	eb26baf4ad	Fix test bool-range.cu Promoting kernel arg pointer to global addr space is only available with registered amdgcn target. Fix test so that it does not require registered amdgcn target.	2022-10-07 11:17:28 -04:00
Yaxun (Sam) Liu	107ee26130	[AMDGPU] Disable bool range metadata to workaround backend issue Currently there is a middle-end or backend issue https://github.com/llvm/llvm-project/issues/58176 which causes values loaded from bool pointer incorrect when bool range metadata is emitted. Temporarily disable bool range metadata until the backend issue is fixed. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D135269 Fixes: SWDEV-344137	2022-10-07 10:46:04 -04:00
Yaxun (Sam) Liu	5e25284dbc	[AMDGPU] Emit module flag for all code object versions Reviewed by: Changpeng Fang, Matt Arsenault, Brian Sumner Differential Revision: https://reviews.llvm.org/D134355	2022-09-22 16:51:33 -04:00
Fangrui Song	74742147ee	[test] Change cc1 -fvisibility to -fvisibility=	2022-09-02 12:36:44 -07:00
Johannes Doerfert	48d6f52401	[CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive A copy-paste error caused UB in the definition of the unsigned long long versions of the shfl intrinsics. Reported and diagnosed by @trws. Differential Revision: https://reviews.llvm.org/D129536	2022-07-21 12:36:54 -05:00
Joseph Huber	b370be37cc	[CUDA] Allow the new driver to compile CUDA in non-RDC mode The new driver primarily allows us to support RDC-mode compilations with proper linking. This is not needed for non-RDC mode compilation, but we still would like the new driver to be able to handle this mode so we can transition away from the old driver in the future. This patch adds the necessary code to support creating a fatbinary for CUDA code generation as well as removing old assumptions and errors about RDC-mode with the new driver. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D129655	2022-07-13 21:49:15 -04:00
Joseph Huber	e88d53d25f	[HIP] Generate offloading entries for HIP with the new driver. This patch adds the small change required to output offloading entried for HIP instead of CUDA. These should be placed in different sections so because they need to be distinct to the offloading toolchain, otherwise we'd have HIP trying to register CUDA kernels or vice-versa. This patch will precede support for HIP in the linker wrapper. Reviewed By: yaxunl, tra Differential Revision: https://reviews.llvm.org/D128850	2022-07-11 15:49:21 -04:00
Nikita Popov	935570b2ad	[ConstExpr] Don't create div/rem expressions This removes creation of udiv/sdiv/urem/srem constant expressions, in preparation for their removal. I've added a ConstantExpr::isDesirableBinOp() predicate to determine whether an expression should be created for a certain operator. With this patch, div/rem expressions can still be created through explicit IR/bitcode, forbidding them entirely will be the next step. Differential Revision: https://reviews.llvm.org/D128820	2022-07-05 15:54:53 +02:00
Yaxun (Sam) Liu	8ad4c6e4b1	[HIP] add -fhip-kernel-arg-name Add option -fhip-kernel-arg-name to emit kernel argument name metadata, which is needed for certain HIP applications. Reviewed by: Artem Belevich, Fangrui Song, Brian Sumner Differential Revision: https://reviews.llvm.org/D128022	2022-06-24 11:15:36 -04:00
Nikita Popov	6f258c0fd3	[Clang] Don't test register allocation This test was broken by `719658d078`. How did an assembly test get into clang/test?	2022-06-23 12:46:00 +02:00
Yaxun (Sam) Liu	af9ee3357c	[HIP] fix long double size For amdgpu target long double type is the same as double type. The width and align of long double type was incorrectly overridden when copying aux target properties, which caused assertion in codegen when emitting global variables with long double type. This patch fix that by saving and restoring width and align of long double type. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D127771 Fixes: SWDEV-335515	2022-06-14 21:57:56 -04:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enabled opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Joseph Huber	1bae02b773	[Cuda] Use fallback method to mangle externalized decls if no CUID given CUDA requires that static variables be visible to the host when offloading. However, The standard semantics of a stiatc variable dictate that it should not be visible outside of the current file. In order to access it from the host we need to perform "externalization" on the static variable on the device. This requires generating a semi-unique name that can be affixed to the variable as to not cause linker errors. This is currently done using the CUID functionality, an MD5 hash value set up by the clang driver. This allows us to achieve is mostly unique ID that is unique even between multiple compilations of the same file. However, this is not always availible. Instead, this patch uses the unique ID from the file to generate a unique symbol name. This will create a unique name that is consistent between the host and device side compilations without requiring the CUID to be entered by the driver. The one downside to this is that we are no longer stable under multiple compilations of the same file. However, this is a very niche use-case and is not supported by Nvidia's CUDA compiler so it likely to be good enough. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D125904	2022-05-26 09:18:22 -04:00
Joseph Huber	0035f7154c	[CUDA] Create offloading entries when using the new driver The changes made in D123460 generalized the code generation for OpenMP's offloading entries. We can use the same scheme to register globals for CUDA code. This patch adds the code generation to create these offloading entries when compiling using the new offloading driver mode. The offloading entries are simple structs that contain the information necessary to register the global. The struct used is as follows: ``` Type struct __tgt_offload_entry { void addr; // Pointer to the offload entry info. // (function or global) char name; // Name of the function or global. size_t size; // Size of the entry info (0 if it a function). int32_t flags; int32_t reserved; }; ``` Currently CUDA handles RDC code generation by deferring the registration of globals in the current TU to a callback function containing the modules ID. Later all the module IDs will be used to register all of the globals at once. Rather than mimic this, offloading entries allow us to mimic the way OpenMP registers globals. That is, we create a simple global struct for each device global to be registered. These are placed at a special section `cuda_offloading_entires`. Because this section is a valid C-identifier, the linker will profide a `__start` and `__stop` pointer that we can use to iterate and register all globals at runtime. the registration requires a flag variable to indicate which registration function to use. I have assigned the flags somewhat arbitrarily, but these use the following values. Kernel: 0 Variable: 0 Managed: 1 Surface: 2 Texture: 3 Depends on D120272 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D123471	2022-05-11 07:30:21 -04:00
Yaxun (Sam) Liu	afc9d674fe	[CUDA][HIP] support __noinline__ as keyword CUDA/HIP programs use __noinline__ like a keyword e.g. __noinline__ void foo() {} since __noinline__ is defined as a macro __attribute__((noinline)) in CUDA/HIP runtime header files. However, gcc and clang supports __attribute__((__noinline__)) the same as __attribute__((noinline)). Some C++ libraries use __attribute__((__noinline__)) in their header files. When CUDA/HIP programs include such header files, clang will emit error about invalid attributes. This patch fixes this issue by supporting __noinline__ as a keyword, so that CUDA/HIP runtime could remove the macro definition. Reviewed by: Aaron Ballman, Artem Belevich Differential Revision: https://reviews.llvm.org/D124866	2022-05-10 14:32:27 -04:00
Yaxun (Sam) Liu	11d3e31c60	[CUDA][HIP] Fix mangling number for local struct MSVC and Itanium mangling use different mangling numbers for function-scope structs, which causes inconsistent mangled kernel names in device and host compilations. This patch uses Itanium mangling number for structs in for mangling device side names in CUDA/HIP host compilation on Windows to fix this issue. A state is added to ASTContext to indicate whether the current name mangling is for device side names in host compilation. Device and host mangling number are encoded/decoded as upper and lower half of 32 bit unsigned integer to fit into the original mangling number field for AST. Diagnostic will be emitted if a manglining number exceeds limit. Reviewed by: Artem Belevich, Reid Kleckner Differential Revision: https://reviews.llvm.org/D122734 Fixes: SWDEV-328515	2022-04-28 19:54:43 -04:00
Yaxun (Sam) Liu	57a210e5b7	[CUDA][HIP] Fix linkage of __clang_gpu_used_external Different TU's may have this globl var. appending linkage can only be used with lld recognized special variables. Change it to internal linkage. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D124466	2022-04-26 20:43:39 -04:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00
Yaxun (Sam) Liu	04fb81674e	[CUDA][HIP] Externalize kernels with internal linkage This patch is a continuation of https://reviews.llvm.org/D123353. Not only kernels in anonymous namespace, but also template kernels with template arguments in anonymous namespace need to be externalized. To be more generic, this patch checks the linkage of a kernel assuming the kernel does not have __global__ attribute. If the linkage is internal then clang will externalize it. This patch also fixes the postfix for externalized symbol since nvptx does not allow '.' in symbol name. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D124189 Fixes: https://github.com/llvm/llvm-project/issues/54560	2022-04-22 17:05:36 -04:00
Yaxun (Sam) Liu	cac4e2fe25	[CUDA][HIP] Fix gpu.used.external Rename gpu.used.external as __clang_gpu_used_external as ptxas does not allow . in global variable name. Fixes: https://github.com/llvm/llvm-project/issues/54934 Reviewed by: Joseph Huber, Artem Belevich Differential Revision: https://reviews.llvm.org/D123946	2022-04-18 23:10:31 -04:00
Yaxun (Sam) Liu	0424b5115c	[CUDA][HIP] Fix host used external kernel in archive For -fgpu-rdc, a host function may call an external kernel which is defined in an archive of bitcode. Since this external kernel is only referenced in host function, the device bitcode does not contain reference to this external kernel, then the linker will not try to resolve this external kernel in the archive. To fix this issue, host-used external kernels and device variables are tracked. A global array containing pointers to these external kernels and variables is emitted which serves as an artificial references to the external kernels and variables used by host. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D123441	2022-04-13 10:47:16 -04:00
Fangrui Song	63fbc77121	[Driver][test] Remove unused/obsoleted REQUIRES: clang-driver It (introduced by `556d713c70`) appears to be related to the removed dragonegg project. In addition, the feature was a bit misnamed and may lur users to unnecessarily use it.	2022-04-12 13:29:46 -07:00
Yaxun (Sam) Liu	4ea1d43509	[CUDA][HIP] Externalize kernels in anonymous name space kernels in anonymous name space needs to have unique name to avoid duplicate symbols. Fixes: https://github.com/llvm/llvm-project/issues/54560 Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D123353	2022-04-10 21:56:28 -04:00
Jonas Hahnfeld	e4903d8be3	[CUDA/HIP] Remove argument from module ctor/dtor signatures In theory, constructors can take arguments when called via .init_array where at least glibc passes in (argc, argv, envp). This isn't used in the generated code and if it was, the first argument should be an integer, not a pointer. For destructors registered via atexit, the function should never take an argument. Differential Revision: https://reviews.llvm.org/D123370	2022-04-09 12:34:41 +02:00
Nikita Popov	b16a3b4f3b	[Clang] Add -no-opaque-pointers to more tests (NFC) This adds the flag to more tests that were not caught by the mass-migration in `532dc62b90`.	2022-04-07 12:53:29 +02:00
Nikita Popov	532dc62b90	[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC) This adds -no-opaque-pointers to clang tests whose output will change when opaque pointers are enabled by default. This is intended to be part of the migration approach described in https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9. The patch has been produced by replacing %clang_cc1 with %clang_cc1 -no-opaque-pointers for tests that fail with opaque pointers enabled. Worth noting that this doesn't cover all tests, there's a remaining ~40 tests not using %clang_cc1 that will need a followup change. Differential Revision: https://reviews.llvm.org/D123115	2022-04-07 12:09:47 +02:00
Nikita Popov	8a72391f60	[IR] Require intrinsic struct return type to be anonymous This is an alternative to D122376. Rather than working around the problem, this patch requires that struct return types in intrinsics are anonymous/literal and adds auto-upgrade code to convert existing uses of intrinsics with named struct types. This ensures that the mapping between intrinsic name and intrinsic function type is actually bijective, as it is supposed to be. This also fixes https://github.com/llvm/llvm-project/issues/37891. Differential Revision: https://reviews.llvm.org/D122471	2022-03-30 09:51:24 +02:00
Yaxun (Sam) Liu	d41445113b	[CUDA][HIP] Fix hostness check with -fopenmp CUDA/HIP determines whether a function can be called based on the device/host attributes of callee and caller. Clang assumes the caller is CurContext. This is correct in most cases, however, it is not correct in OpenMP parallel region when CUDA/HIP program is compiled with -fopenmp. This causes incorrect overloading resolution and missed diagnostics. To get the correct caller, clang needs to chase the parent chain of DeclContext starting from CurContext until a function decl or a lambda decl is reached. Sema API is adapted to achieve that and used to determine the caller in hostness check. Reviewed by: Artem Belevich, Richard Smith Differential Revision: https://reviews.llvm.org/D121765	2022-03-24 15:19:47 -04:00
Daniil Kovalev	828b63c309	[NVPTX] Enhance vectorization of ld.param & st.param Since function parameters and return values are passed via param space, we can force special alignment for values hold in it which will add vectorization options. This change may be done if the function has private or internal linkage. Special alignment is forced during 2 phases. 1) Instruction selection lowering. Here we use special alignment for function prototypes (changing both own return value and parameters alignment), call lowering (changing both callee's return value and parameters alignment). 2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that belong to param space (or are casted to it). We only handle cases when all uses of such parameters are loads from it. For such loads, we can change the alignment according to special type alignment and the load offset. Then, load-store-vectorizer IR pass will perform vectorization where alignment allows it. Special alignment calculated as maximum from default ABI type alignment and alignment 16. Alignment 16 is chosen because it's the maximum size of vectorized ld.param & st.param. Before specifying such special alignment, we should check if it is a multiple of the alignment that the type already has. For example, if a value has an enforced alignment of 64, default ABI alignment of 4 and special alignment of 16, we should preserve 64. This patch will be followed by a refactoring patch that removes duplicating code in handling byval and non-byval arguments. Differential Revision: https://reviews.llvm.org/D120129	2022-03-24 12:36:52 +03:00
Daniil Kovalev	a034878564	Revert "[NVPTX] Enhance vectorization of ld.param & st.param" This reverts commit `f854434f0f`. Placed URL to wrong differential revision in commit message.	2022-03-24 12:32:06 +03:00
Daniil Kovalev	f854434f0f	[NVPTX] Enhance vectorization of ld.param & st.param Since function parameters and return values are passed via param space, we can force special alignment for values hold in it which will add vectorization options. This change may be done if the function has private or internal linkage. Special alignment is forced during 2 phases. 1) Instruction selection lowering. Here we use special alignment for function prototypes (changing both own return value and parameters alignment), call lowering (changing both callee's return value and parameters alignment). 2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that belong to param space (or are casted to it). We only handle cases when all uses of such parameters are loads from it. For such loads, we can change the alignment according to special type alignment and the load offset. Then, load-store-vectorizer IR pass will perform vectorization where alignment allows it. Special alignment calculated as maximum from default ABI type alignment and alignment 16. Alignment 16 is chosen because it's the maximum size of vectorized ld.param & st.param. Before specifying such special alignment, we should check if it is a multiple of the alignment that the type already has. For example, if a value has an enforced alignment of 64, default ABI alignment of 4 and special alignment of 16, we should preserve 64. This patch will be followed by a refactoring patch that removes duplicating code in handling byval and non-byval arguments. Differential Revision: https://reviews.llvm.org/D121549	2022-03-24 12:25:36 +03:00
Changpeng Fang	dd5895cc39	AMDGPU: Use the implicit kernargs for code object version 5 Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}. Reviewers: arsenm, sameerds, yaxunl Differential Revision: https://reviews.llvm.org/D120265	2022-03-17 14:12:36 -07:00
Stanislav Mekhanoshin	9eabea3968	[AMDGPU] Set noclobber metadata on loads instead of cast to constant A load via pointer cast to constant will return true from pointsToConstantMemory which is not necessarily so. Fixes: SWDEV-326463 Differential Revision: https://reviews.llvm.org/D121172	2022-03-07 23:13:02 -08:00
Yaxun (Sam) Liu	9d899d8f01	[HIP] Support `-fgpu-default-stream` Introduce -fgpu-default-stream={legacy\|per-thread} option to support per-thread default stream for HIP runtime. When -fgpu-default-stream=per-thread, HIP kernels are launched through hipLaunchKernel_spt instead of hipLaunchKernel. Also HIP_API_PER_THREAD_DEFAULT_STREAM=1 is defined by the preprocessor to enable other per-thread stream API's. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D120298	2022-02-23 22:28:29 -05:00
Stanislav Mekhanoshin	b0aa1946df	[AMDGPU] Promote recursive loads from kernel argument to constant Not clobbered pointer load chains are promoted to global now. That is possible to promote these loads itself into constant address space. Loaded pointers still need to point to global because we need to be able to store into that pointer and because an actual load from it may occur after a clobber. Differential Revision: https://reviews.llvm.org/D119886	2022-02-17 11:07:03 -08:00
Sameer Sahasrabuddhe	d8f99bb6e0	[AMDGPU] replace hostcall module flag with function attribute The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216	2022-02-11 22:51:56 +05:30
Yaxun (Sam) Liu	1d97cb1f6e	[HIP] Emit amdgpu_code_object_version module flag code object version determines ABI, therefore should not be mixed. This patch emits amdgpu_code_object_version module flag in LLVM IR based on code object version (default 4). The amdgpu_code_object_version value is code object version times 100. LLVM IR with different amdgpu_code_object_version module flag cannot be linked. The -cc1 option -mcode-object-version=none is for ROCm device library use only, which supports multiple ABI. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D119026	2022-02-08 21:58:40 -05:00
Yaxun (Sam) Liu	8428c75da1	[CUDA][HIP] Do not treat host var address as constant in device compilation Currently clang treats host var address as constant in device compilation, which causes const vars initialized with host var address promoted to device variables incorrectly and results in undefined symbols. This patch fixes that. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D118153 Fixes: SWDEV-309881 Change-Id: I0a69357063c6f8539ef259c96c250d04615f4473	2022-01-28 16:04:52 -05:00
Florian Hahn	67aa314bce	[IRGen] Do not overwrite existing attributes in CGCall. When adding new attributes, existing attributes are dropped. While this appears to be a longstanding issue, this was highlighted by D105169 which dropped a lot of attributes due to adding the new noundef attribute. Ahmed Bougacha (@ab) tracked down the issue and provided the fix in CGCall.cpp. I bundled it up and updated the tests.	2022-01-20 13:45:19 +00:00
Yaxun (Sam) Liu	85c2bd2a0e	Prevent adding module flag amdgpu_hostcall multiple times HIP program with printf call fails to compile with -fsanitize=address option, because of appending module flag - amdgpu_hostcall twice, one for printf and one for sanitize option. This patch fixes that issue. Patch by: Praveen Velliengiri Reviewed by: Yaxun Liu, Roman Lebedev Differential Revision: https://reviews.llvm.org/D116216	2022-01-19 12:52:33 -05:00
hyeongyu kim	1b1c8d83d3	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169	2022-01-16 18:54:17 +09:00
Matt Arsenault	33315ef321	clang/AMDGPU: Don't set implicit arg attribute to default size Since `2959e082e1`, we conservatively assume all inputs are enabled by default. This isn't the best interface for controlling these anyway, since it's not granular and only allows trimming the last fields.	2022-01-14 18:43:30 -05:00
Yaxun (Sam) Liu	3b172f60c6	[HIP] Fix -fgpu-rdc for Windows This patch fixes issues for -fgpu-rdc for Windows MSVC toolchain: Fix COFF specific section flags and remove section types in llvm-mc input file for Windows. Escape fatbin path in llvm-mc input file. Add -triple option to llvm-mc. Put __hip_gpubin_handle in comdat when it has linkonce_odr linkage. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D115039	2021-12-06 16:42:23 -05:00

1 2 3 4 5 ...

315 Commits