llvm-project

Author	SHA1	Message	Date
Joseph Huber	0035f7154c	[CUDA] Create offloading entries when using the new driver The changes made in D123460 generalized the code generation for OpenMP's offloading entries. We can use the same scheme to register globals for CUDA code. This patch adds the code generation to create these offloading entries when compiling using the new offloading driver mode. The offloading entries are simple structs that contain the information necessary to register the global. The struct used is as follows: ``` Type struct __tgt_offload_entry { void addr; // Pointer to the offload entry info. // (function or global) char name; // Name of the function or global. size_t size; // Size of the entry info (0 if it a function). int32_t flags; int32_t reserved; }; ``` Currently CUDA handles RDC code generation by deferring the registration of globals in the current TU to a callback function containing the modules ID. Later all the module IDs will be used to register all of the globals at once. Rather than mimic this, offloading entries allow us to mimic the way OpenMP registers globals. That is, we create a simple global struct for each device global to be registered. These are placed at a special section `cuda_offloading_entires`. Because this section is a valid C-identifier, the linker will profide a `__start` and `__stop` pointer that we can use to iterate and register all globals at runtime. the registration requires a flag variable to indicate which registration function to use. I have assigned the flags somewhat arbitrarily, but these use the following values. Kernel: 0 Variable: 0 Managed: 1 Surface: 2 Texture: 3 Depends on D120272 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D123471	2022-05-11 07:30:21 -04:00
Yaxun (Sam) Liu	afc9d674fe	[CUDA][HIP] support __noinline__ as keyword CUDA/HIP programs use __noinline__ like a keyword e.g. __noinline__ void foo() {} since __noinline__ is defined as a macro __attribute__((noinline)) in CUDA/HIP runtime header files. However, gcc and clang supports __attribute__((__noinline__)) the same as __attribute__((noinline)). Some C++ libraries use __attribute__((__noinline__)) in their header files. When CUDA/HIP programs include such header files, clang will emit error about invalid attributes. This patch fixes this issue by supporting __noinline__ as a keyword, so that CUDA/HIP runtime could remove the macro definition. Reviewed by: Aaron Ballman, Artem Belevich Differential Revision: https://reviews.llvm.org/D124866	2022-05-10 14:32:27 -04:00
Yaxun (Sam) Liu	11d3e31c60	[CUDA][HIP] Fix mangling number for local struct MSVC and Itanium mangling use different mangling numbers for function-scope structs, which causes inconsistent mangled kernel names in device and host compilations. This patch uses Itanium mangling number for structs in for mangling device side names in CUDA/HIP host compilation on Windows to fix this issue. A state is added to ASTContext to indicate whether the current name mangling is for device side names in host compilation. Device and host mangling number are encoded/decoded as upper and lower half of 32 bit unsigned integer to fit into the original mangling number field for AST. Diagnostic will be emitted if a manglining number exceeds limit. Reviewed by: Artem Belevich, Reid Kleckner Differential Revision: https://reviews.llvm.org/D122734 Fixes: SWDEV-328515	2022-04-28 19:54:43 -04:00
Yaxun (Sam) Liu	57a210e5b7	[CUDA][HIP] Fix linkage of __clang_gpu_used_external Different TU's may have this globl var. appending linkage can only be used with lld recognized special variables. Change it to internal linkage. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D124466	2022-04-26 20:43:39 -04:00
David Green	9727c77d58	[NFC] Rename Instrinsic to Intrinsic	2022-04-25 18:13:23 +01:00
Yaxun (Sam) Liu	04fb81674e	[CUDA][HIP] Externalize kernels with internal linkage This patch is a continuation of https://reviews.llvm.org/D123353. Not only kernels in anonymous namespace, but also template kernels with template arguments in anonymous namespace need to be externalized. To be more generic, this patch checks the linkage of a kernel assuming the kernel does not have __global__ attribute. If the linkage is internal then clang will externalize it. This patch also fixes the postfix for externalized symbol since nvptx does not allow '.' in symbol name. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D124189 Fixes: https://github.com/llvm/llvm-project/issues/54560	2022-04-22 17:05:36 -04:00
Yaxun (Sam) Liu	cac4e2fe25	[CUDA][HIP] Fix gpu.used.external Rename gpu.used.external as __clang_gpu_used_external as ptxas does not allow . in global variable name. Fixes: https://github.com/llvm/llvm-project/issues/54934 Reviewed by: Joseph Huber, Artem Belevich Differential Revision: https://reviews.llvm.org/D123946	2022-04-18 23:10:31 -04:00
Yaxun (Sam) Liu	0424b5115c	[CUDA][HIP] Fix host used external kernel in archive For -fgpu-rdc, a host function may call an external kernel which is defined in an archive of bitcode. Since this external kernel is only referenced in host function, the device bitcode does not contain reference to this external kernel, then the linker will not try to resolve this external kernel in the archive. To fix this issue, host-used external kernels and device variables are tracked. A global array containing pointers to these external kernels and variables is emitted which serves as an artificial references to the external kernels and variables used by host. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D123441	2022-04-13 10:47:16 -04:00
Fangrui Song	63fbc77121	[Driver][test] Remove unused/obsoleted REQUIRES: clang-driver It (introduced by 556d713c70bfaf58ac18d089883f9c34c581633a) appears to be related to the removed dragonegg project. In addition, the feature was a bit misnamed and may lur users to unnecessarily use it.	2022-04-12 13:29:46 -07:00
Yaxun (Sam) Liu	4ea1d43509	[CUDA][HIP] Externalize kernels in anonymous name space kernels in anonymous name space needs to have unique name to avoid duplicate symbols. Fixes: https://github.com/llvm/llvm-project/issues/54560 Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D123353	2022-04-10 21:56:28 -04:00
Jonas Hahnfeld	e4903d8be3	[CUDA/HIP] Remove argument from module ctor/dtor signatures In theory, constructors can take arguments when called via .init_array where at least glibc passes in (argc, argv, envp). This isn't used in the generated code and if it was, the first argument should be an integer, not a pointer. For destructors registered via atexit, the function should never take an argument. Differential Revision: https://reviews.llvm.org/D123370	2022-04-09 12:34:41 +02:00
Nikita Popov	b16a3b4f3b	[Clang] Add -no-opaque-pointers to more tests (NFC) This adds the flag to more tests that were not caught by the mass-migration in 532dc62b907554b3f07f17205674aa71e76fc863.	2022-04-07 12:53:29 +02:00
Nikita Popov	532dc62b90	[OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC) This adds -no-opaque-pointers to clang tests whose output will change when opaque pointers are enabled by default. This is intended to be part of the migration approach described in https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9. The patch has been produced by replacing %clang_cc1 with %clang_cc1 -no-opaque-pointers for tests that fail with opaque pointers enabled. Worth noting that this doesn't cover all tests, there's a remaining ~40 tests not using %clang_cc1 that will need a followup change. Differential Revision: https://reviews.llvm.org/D123115	2022-04-07 12:09:47 +02:00
Nikita Popov	8a72391f60	[IR] Require intrinsic struct return type to be anonymous This is an alternative to D122376. Rather than working around the problem, this patch requires that struct return types in intrinsics are anonymous/literal and adds auto-upgrade code to convert existing uses of intrinsics with named struct types. This ensures that the mapping between intrinsic name and intrinsic function type is actually bijective, as it is supposed to be. This also fixes https://github.com/llvm/llvm-project/issues/37891. Differential Revision: https://reviews.llvm.org/D122471	2022-03-30 09:51:24 +02:00
Yaxun (Sam) Liu	d41445113b	[CUDA][HIP] Fix hostness check with -fopenmp CUDA/HIP determines whether a function can be called based on the device/host attributes of callee and caller. Clang assumes the caller is CurContext. This is correct in most cases, however, it is not correct in OpenMP parallel region when CUDA/HIP program is compiled with -fopenmp. This causes incorrect overloading resolution and missed diagnostics. To get the correct caller, clang needs to chase the parent chain of DeclContext starting from CurContext until a function decl or a lambda decl is reached. Sema API is adapted to achieve that and used to determine the caller in hostness check. Reviewed by: Artem Belevich, Richard Smith Differential Revision: https://reviews.llvm.org/D121765	2022-03-24 15:19:47 -04:00
Daniil Kovalev	828b63c309	[NVPTX] Enhance vectorization of ld.param & st.param Since function parameters and return values are passed via param space, we can force special alignment for values hold in it which will add vectorization options. This change may be done if the function has private or internal linkage. Special alignment is forced during 2 phases. 1) Instruction selection lowering. Here we use special alignment for function prototypes (changing both own return value and parameters alignment), call lowering (changing both callee's return value and parameters alignment). 2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that belong to param space (or are casted to it). We only handle cases when all uses of such parameters are loads from it. For such loads, we can change the alignment according to special type alignment and the load offset. Then, load-store-vectorizer IR pass will perform vectorization where alignment allows it. Special alignment calculated as maximum from default ABI type alignment and alignment 16. Alignment 16 is chosen because it's the maximum size of vectorized ld.param & st.param. Before specifying such special alignment, we should check if it is a multiple of the alignment that the type already has. For example, if a value has an enforced alignment of 64, default ABI alignment of 4 and special alignment of 16, we should preserve 64. This patch will be followed by a refactoring patch that removes duplicating code in handling byval and non-byval arguments. Differential Revision: https://reviews.llvm.org/D120129	2022-03-24 12:36:52 +03:00
Daniil Kovalev	a034878564	Revert "[NVPTX] Enhance vectorization of ld.param & st.param" This reverts commit f854434f0f2a01027bdaad8e6fdac5a782fce291. Placed URL to wrong differential revision in commit message.	2022-03-24 12:32:06 +03:00
Daniil Kovalev	f854434f0f	[NVPTX] Enhance vectorization of ld.param & st.param Since function parameters and return values are passed via param space, we can force special alignment for values hold in it which will add vectorization options. This change may be done if the function has private or internal linkage. Special alignment is forced during 2 phases. 1) Instruction selection lowering. Here we use special alignment for function prototypes (changing both own return value and parameters alignment), call lowering (changing both callee's return value and parameters alignment). 2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that belong to param space (or are casted to it). We only handle cases when all uses of such parameters are loads from it. For such loads, we can change the alignment according to special type alignment and the load offset. Then, load-store-vectorizer IR pass will perform vectorization where alignment allows it. Special alignment calculated as maximum from default ABI type alignment and alignment 16. Alignment 16 is chosen because it's the maximum size of vectorized ld.param & st.param. Before specifying such special alignment, we should check if it is a multiple of the alignment that the type already has. For example, if a value has an enforced alignment of 64, default ABI alignment of 4 and special alignment of 16, we should preserve 64. This patch will be followed by a refactoring patch that removes duplicating code in handling byval and non-byval arguments. Differential Revision: https://reviews.llvm.org/D121549	2022-03-24 12:25:36 +03:00
Changpeng Fang	dd5895cc39	AMDGPU: Use the implicit kernargs for code object version 5 Summary: Specifically, for trap handling, for targets that do not support getDoorbellID, we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1]. To get aperture bases when targets do not have aperture registers, we load private_base or shared_base directly from the implicit kernarg. In clang, we use implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}. Reviewers: arsenm, sameerds, yaxunl Differential Revision: https://reviews.llvm.org/D120265	2022-03-17 14:12:36 -07:00
Stanislav Mekhanoshin	9eabea3968	[AMDGPU] Set noclobber metadata on loads instead of cast to constant A load via pointer cast to constant will return true from pointsToConstantMemory which is not necessarily so. Fixes: SWDEV-326463 Differential Revision: https://reviews.llvm.org/D121172	2022-03-07 23:13:02 -08:00
Yaxun (Sam) Liu	9d899d8f01	[HIP] Support `-fgpu-default-stream` Introduce -fgpu-default-stream={legacy\|per-thread} option to support per-thread default stream for HIP runtime. When -fgpu-default-stream=per-thread, HIP kernels are launched through hipLaunchKernel_spt instead of hipLaunchKernel. Also HIP_API_PER_THREAD_DEFAULT_STREAM=1 is defined by the preprocessor to enable other per-thread stream API's. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D120298	2022-02-23 22:28:29 -05:00
Stanislav Mekhanoshin	b0aa1946df	[AMDGPU] Promote recursive loads from kernel argument to constant Not clobbered pointer load chains are promoted to global now. That is possible to promote these loads itself into constant address space. Loaded pointers still need to point to global because we need to be able to store into that pointer and because an actual load from it may occur after a clobber. Differential Revision: https://reviews.llvm.org/D119886	2022-02-17 11:07:03 -08:00
Sameer Sahasrabuddhe	d8f99bb6e0	[AMDGPU] replace hostcall module flag with function attribute The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216	2022-02-11 22:51:56 +05:30
Yaxun (Sam) Liu	1d97cb1f6e	[HIP] Emit amdgpu_code_object_version module flag code object version determines ABI, therefore should not be mixed. This patch emits amdgpu_code_object_version module flag in LLVM IR based on code object version (default 4). The amdgpu_code_object_version value is code object version times 100. LLVM IR with different amdgpu_code_object_version module flag cannot be linked. The -cc1 option -mcode-object-version=none is for ROCm device library use only, which supports multiple ABI. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D119026	2022-02-08 21:58:40 -05:00
Yaxun (Sam) Liu	8428c75da1	[CUDA][HIP] Do not treat host var address as constant in device compilation Currently clang treats host var address as constant in device compilation, which causes const vars initialized with host var address promoted to device variables incorrectly and results in undefined symbols. This patch fixes that. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D118153 Fixes: SWDEV-309881 Change-Id: I0a69357063c6f8539ef259c96c250d04615f4473	2022-01-28 16:04:52 -05:00
Florian Hahn	67aa314bce	[IRGen] Do not overwrite existing attributes in CGCall. When adding new attributes, existing attributes are dropped. While this appears to be a longstanding issue, this was highlighted by D105169 which dropped a lot of attributes due to adding the new noundef attribute. Ahmed Bougacha (@ab) tracked down the issue and provided the fix in CGCall.cpp. I bundled it up and updated the tests.	2022-01-20 13:45:19 +00:00
Yaxun (Sam) Liu	85c2bd2a0e	Prevent adding module flag amdgpu_hostcall multiple times HIP program with printf call fails to compile with -fsanitize=address option, because of appending module flag - amdgpu_hostcall twice, one for printf and one for sanitize option. This patch fixes that issue. Patch by: Praveen Velliengiri Reviewed by: Yaxun Liu, Roman Lebedev Differential Revision: https://reviews.llvm.org/D116216	2022-01-19 12:52:33 -05:00
hyeongyu kim	1b1c8d83d3	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169	2022-01-16 18:54:17 +09:00
Matt Arsenault	33315ef321	clang/AMDGPU: Don't set implicit arg attribute to default size Since 2959e082e1427647e107af0b82770682eaa58fe1, we conservatively assume all inputs are enabled by default. This isn't the best interface for controlling these anyway, since it's not granular and only allows trimming the last fields.	2022-01-14 18:43:30 -05:00
Yaxun (Sam) Liu	3b172f60c6	[HIP] Fix -fgpu-rdc for Windows This patch fixes issues for -fgpu-rdc for Windows MSVC toolchain: Fix COFF specific section flags and remove section types in llvm-mc input file for Windows. Escape fatbin path in llvm-mc input file. Add -triple option to llvm-mc. Put __hip_gpubin_handle in comdat when it has linkonce_odr linkage. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D115039	2021-12-06 16:42:23 -05:00
Anshil Gandhi	df0560ca00	[HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang Introduce `__hip_atomic_load`, `__hip_atomic_store` and `__hip_atomic_compare_exchange_weak` builtins in HIP. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D114553	2021-11-29 12:07:13 -07:00
Yaxun (Sam) Liu	38211bbab1	[HIP] Fix device stub name for Windows This is a follow up of https://reviews.llvm.org/D68578 where device stub name is changed for Itanium mangling but not Microsoft mangling. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D113491	2021-11-23 12:03:49 -05:00
Yaxun (Sam) Liu	e13246a2ec	[HIP] Add HIP scope atomic operations Add an AtomicScopeModel for HIP and support for OpenCL builtins that are missing in HIP. Patch by: Michael Liao Revised by: Anshil Ghandi Reviewed by: Yaxun Liu Differential Revision: https://reviews.llvm.org/D113925	2021-11-23 10:13:37 -05:00
Yaxun (Sam) Liu	4b3881e9f3	Emit hidden hostcall argument for sanitized kernels this patch - https://reviews.llvm.org/D110337 changes the way how hostcall hidden argument is emitted for printf, but the sanitized kernels also use hostcall buffer to report a error for invalid memory access, which is not handled by the above patch and it leads to vdi runtime error: Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT: Agent attempted to access an inaccessible address. code: 0x2b Patch by: Praveen Velliengiri Reviewed by: Yaxun Liu, Matt Arsenault Differential Revision: https://reviews.llvm.org/D112820	2021-11-10 17:05:57 -05:00
Yaxun (Sam) Liu	80072fde61	[CUDA][HIP] Allow comdat for kernels Two identical instantiations of a template function can be emitted by two TU's with linkonce_odr linkage without causing duplicate symbols in linker. MSVC also requires these symbols be in comdat sections. Linux does not require the symbols in comdat sections to be merged by linker but by default clang puts them in comdat sections. If a template kernel is instantiated identically in two TU's. MSVC requires that them to be in comdat sections, otherwise MSVC linker will diagnose them as duplicate symbols. However, currently clang does not put instantiated template kernels in comdat sections, which causes link error for MSVC. This patch allows putting instantiated template kernels into comdat sections. Reviewed by: Artem Belevich, Reid Kleckner Differential Revision: https://reviews.llvm.org/D112492	2021-11-10 16:42:23 -05:00
hsmahesha	3b9a85d10a	[CFE][Codegen] Make sure to maintain the contiguity of all the static allocas at the start of the entry block, which in turn would aid better code transformation/optimization. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D110257	2021-11-10 08:45:21 +05:30
hyeongyu kim	fd9b099906	Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default" This reverts commit aacfbb953eb705af2ecfeb95a6262818fa85dd92. Revert "Fix lit test failures in CodeGenCoroutines" This reverts commit 63fff0f5bffe20fa2c84a45a41161afa0043cb34.	2021-11-09 02:15:55 +09:00
hyeongyukim	aacfbb953e	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2) This patch updates test files after D105169. Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows: (1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached. (2) The remaining tests are updated manually. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D108453 Resolve lit failures in clang after 8ca4b3e's land Fix lit test failures in clang-ppc* and clang-x64-windows-msvc Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc Fix internal_clone(aarch64) inline assembly	2021-11-06 19:19:22 +09:00
Juneyoung Lee	89ad2822af	Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default" This reverts commit 7584ef766a7219b6ee5a400637206d26e0fa98ac.	2021-11-06 15:39:19 +09:00
Juneyoung Lee	7584ef766a	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions. I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default. Test updates are made as a separate patch: D108453 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D105169	2021-11-06 15:36:42 +09:00
Anshil Gandhi	0567f03331	[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols By default clang emits complete contructors as alias of base constructors if they are the same. The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols. @yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true` and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had to be extended to support aliases to functions. inline-calls.ll was corrected appropriately. Reviewed By: yaxunl, #amdgpu Differential Revision: https://reviews.llvm.org/D109707	2021-10-18 16:53:15 -06:00
Juneyoung Lee	f193bcc701	Revert D105169 due to the two-stage failure in ASAN This reverts the following commits: 37ca7a795b277c20c02a218bf44052278c03344b 9aa6c72b92b6c89cc6d23b693257df9af7de2d15 705387c5074bcca36d626882462ebbc2bcc3bed4 8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515 80dba72a669b5416e97a42fd2c2a7bc5a6d3f44a	2021-10-18 23:52:46 +09:00
Juneyoung Lee	8ca4b3ef19	[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2) This patch updates test files after D105169. Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows: (1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached. (2) The remaining tests are updated manually. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D108453	2021-10-16 12:01:41 +09:00
Anshil Gandhi	1830ec94ac	Revert "[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols" This reverts commit 03375a3fb33b11e1249d9c88070b7f33cb97802a.	2021-10-15 16:16:18 -06:00
Anshil Gandhi	f92db6d3ff	[HIP] Relax conditions for address space cast in builtin args Allow (implicit) address space casting between LLVM-equivalent target address spaces. Reviewed By: yaxunl, tra Differential Revision: https://reviews.llvm.org/D111734	2021-10-15 15:35:52 -06:00
Anshil Gandhi	53fc5100e0	Revert "[HIP] Relax conditions for address space cast in builtin args" This reverts commit 3b48e1170dc623a95ff13a1e34c839cc094bf321.	2021-10-15 14:42:28 -06:00
Anshil Gandhi	3b48e1170d	[HIP] Relax conditions for address space cast in builtin args Allow (implicit) address space casting between LLVM-equivalent target address spaces. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D111734	2021-10-15 14:06:47 -06:00
Anshil Gandhi	03375a3fb3	[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols By default clang emits complete contructors as alias of base constructors if they are the same. The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols. @yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true` and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had to be extended to support aliases to functions. inline-calls.ll was corrected appropriately. Reviewed By: yaxunl, #amdgpu Differential Revision: https://reviews.llvm.org/D109707	2021-10-15 11:39:15 -06:00
hsmahesha	393581d8a5	[CFE][Codegen] Update auto-generated check lines for few GPU lit tests which is essentially required as a pre-commit for https://reviews.llvm.org/D110257. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D110676	2021-10-07 09:05:39 +05:30
Yaxun (Sam) Liu	c4afb5f81b	[HIP] Fix linking of asanrt.bc HIP currently uses -mlink-builtin-bitcode to link all bitcode libraries, which changes the linkage of functions to be internal once they are linked in. This works for common bitcode libraries since these functions are not intended to be exposed for external callers. However, the functions in the sanitizer bitcode library is intended to be called by instructions generated by the sanitizer pass. If their linkage is changed to internal, their parameters may be altered by optimizations before the sanitizer pass, which renders them unusable by the sanitizer pass. To fix this issue, HIP toolchain links the sanitizer bitcode library with -mlink-bitcode-file, which does not change the linkage. A struct BitCodeLibraryInfo is introduced in ToolChain as a generic approach to pass the bitcode library information between ToolChain and Tool. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D110304	2021-09-27 13:25:46 -04:00

... 2 3 4 5 6 ...

445 Commits