Since https://github.com/ARM-software/acle/pull/276, the ACLE
defines attributes that better describe the use of a given piece of SME state.
Previously the attributes merely described whether state could be
'shared' or 'preserved', whereas the new attributes carry more semantics
and also describe how the data flows through the program.
For ZT0 we already had to add new LLVM IR attributes:
* aarch64_new_zt0
* aarch64_in_zt0
* aarch64_out_zt0
* aarch64_inout_zt0
* aarch64_preserves_zt0
We have now done the same for ZA, adding:
* aarch64_new_za (previously `aarch64_pstate_za_new`)
* aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`)
* aarch64_inout_za (more specific variation of
`aarch64_pstate_za_shared`)
* aarch64_preserves_za (previously `aarch64_pstate_za_shared,
aarch64_pstate_za_preserved`)
This explicitly removes 'pstate' from the name because, with SME2 and
the new ACLE attributes, there is a difference between "sharing ZA"
(sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA"
(sharing either the ZA or ZT0 register, both part of PSTATE.ZA, with the
caller).
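For illustration, a sketch of how the new IR attributes line up with C source, assuming the ACLE keyword attributes from the same proposal (the function names and the exact keyword placement here are assumptions, not part of this change):
```
/* Assumed ACLE keyword spellings; each maps to the listed IR attribute. */
__arm_new("za") void setup(void) { /* ... */ } /* aarch64_new_za       */
void consume(void) __arm_in("za");             /* aarch64_in_za        */
void produce(void) __arm_out("za");            /* aarch64_out_za       */
void update(void) __arm_inout("za");           /* aarch64_inout_za     */
void observe(void) __arm_preserves("za");      /* aarch64_preserves_za */
```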
Make __builtin_cpu_{init|supports|is} target independent and provide an
opt-in query for targets that want to support it. Each target is still
responsible for its specific lowering/code-gen. Also provide code-gen
for PowerPC.
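As a hedged usage sketch (the feature and CPU names are illustrative PowerPC examples, not an exhaustive list):
```
#include <stdio.h>

/* Runtime dispatch using the now target-independent builtins. */
void pick_impl(void) {
  __builtin_cpu_init();                 /* initialize the CPU feature data */
  if (__builtin_cpu_supports("vsx"))    /* query a named ISA feature       */
    puts("VSX path");
  else if (__builtin_cpu_is("power9"))  /* or check for a specific CPU     */
    puts("POWER9 baseline path");
  else
    puts("generic path");
}
```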
I originally proposed this in https://reviews.llvm.org/D152914 and this
addresses the comments I received there.
---------
Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn>
Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>
This patch addresses the issue regarding a call of the bcopy function in a
conditional expression.
It is analogous to the already accepted patch that deals with the same
problem for the bzero function [0].
Here is the testcase which illustrates the issue:
```
void bcopy(const void *, void *, unsigned long);
void foo(void);
void test_bcopy() {
  char dst[20];
  char src[20];
  int _sz = 20, len = 20;
  return (_sz
              ? ((_sz >= len)
                     ? bcopy(src, dst, len)
                     : foo())
              : bcopy(src, dst, len));
}
```
When processing it with clang, the following issue occurs:
Instruction does not dominate all uses!
  %arraydecay2 = getelementptr inbounds [20 x i8], ptr %dst, i64 0, i64 0, !dbg !38
  %cond = phi ptr [ %arraydecay2, %cond.end ], [ %arraydecay5, %cond.false3 ], !dbg !33
fatal error: error in backend: Broken module found, compilation aborted!
This happens because an incorrect phi node is created: the bcopy call is
lowered to a call of the llvm.memmove intrinsic, and memmove returns
void *. Since llvm.memmove is called in two places within the same return
statement, clang creates a phi node for the return value in the final
basic block, and that phi node is incorrect. However, bcopy should return
void in the first place, so this phi node is unnecessary; that is what
this patch addresses. An appropriate test is also added, and no existing
tests fail when applying this patch.
Also, this crash only happens when LLVM is configured with the
-DLLVM_ENABLE_ASSERTIONS=On option.
[0] https://reviews.llvm.org/D39746
Rename the intrinsics for fcvtu to fcvtzu and fcvts to fcvtzs.
Use llvm_anyvector_ty for both multi-vector returns and operands, so that
the return and operand types can be specified in the intrinsic call, e.g.
@llvm.aarch64.sve.scvtf.x4.nxv4f32.nxv4i32
Support new amdgcn_global_load_tr instructions for load with transpose.
* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
Without the fix, gcc warned with
../../clang/lib/CodeGen/CGBuiltin.cpp:1022:19: warning: unused variable 'DRE' [-Wunused-variable]
1022 | if (const auto *DRE = dyn_cast<DeclRefExpr>(Base)) {
| ^~~
Fix the warning by removing the unused variable and changing the
"dyn_cast" to "isa".
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:
```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```
This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
  p->count = count + 42;
}
```
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```
In this example, an update to 'p->count' maintains the relationship
requirement:
```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
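As a hedged illustration (my sketch, not part of this patch's tests), the attribute lets '__builtin_dynamic_object_size' derive the array's size from 'count':
```
#include <stddef.h>

struct bar;
struct foo {
  size_t count;
  struct bar *array[] __attribute__((counted_by(count)));
};

/* Returns p->count * sizeof(struct bar *) while the relationship holds. */
size_t foo_array_bytes(struct foo *p) {
  return __builtin_dynamic_object_size(p->array, 1);
}
```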
With lldb build fix.
Original message:
EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.
This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.
The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.
Fixes #78160.
EnumConstantDecl is allocated by the ASTContext allocator so the
destructor is never called.
This patch takes a similar approach to IntegerLiteral by using
APIntStorage to allocate large APSInts using the ASTContext allocator as
well.
The downside is that an additional heap allocation and copy of the data
needs to be made when calling getInitValue if the APSInt is large.
Fixes #78160.
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:
```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```
This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
  p->count = count + 42;
}
```
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```
In this example, an update to 'p->count' maintains the relationship
requirement:
```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
This reverts commit fefdef808c230c79dca2eb504490ad0f17a765a5.
Breaks check-clang, see
https://github.com/llvm/llvm-project/pull/76348#issuecomment-1886029515
Also revert follow-on "[Clang] Update 'counted_by' documentation"
This reverts commit 4a3fb9ce27dda17e97341f28005a28836c909cfc.
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:
```
struct bar;
struct foo {
  int count;
  struct inner {
    struct {
      int count; /* The 'count' referenced by 'counted_by' */
    };
    struct {
      /* ... */
      struct bar *array[] __attribute__((counted_by(count)));
    };
  } baz;
};
```
This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':
```
struct bar;
struct foo {
  size_t count;
  /* ... */
  struct bar *array[] __attribute__((counted_by(count)));
};
```
This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.
In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not being detected:
```
struct foo *p;
void foo_alloc(size_t count) {
  p = malloc(MAX(sizeof(struct foo),
                 offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
  p->count = count + 42;
}
```
The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:
```
void use_foo(int index, int val) {
  p->count += 42;
  p->array[index] = val; /* The sanitizer can't properly check this access */
}
```
In this example, an update to 'p->count' maintains the relationship
requirement:
```
void use_foo(int index, int val) {
  if (p->count == 0)
    return;
  --p->count;
  p->array[index] = val;
}
```
This patch changes the following intrinsics:
```
svst1uwq[_{d}]      replaced by svst1wq[_{d}]
svst1uwq_vnum[_{d}] replaced by svst1wq_vnum[_{d}]
svst1udq[_{d}]      replaced by svst1dq[_{d}]
svst1udq_vnum[_{d}] replaced by svst1dq_vnum[_{d}]
```
The 'u' is dropped from the quadword stores because they simply truncate
the quadwords to 32 bits.
```
svextq_lane[_{d}] replaced by svextq[_{d}]
```
EXTQ follows the previously defined EXT intrinsics.
```
svdot[_{d}_{2}_{3}] replaced by svdot[_{d}_{2}]
```
Introduced with the latest SME2 ACLE change [1].
[1] https://github.com/ARM-software/acle/pull/257
There are many issues that popped up with the counted_by feature. The
patch #73730 has grown too large and approval is blocking Linux testing.
Includes reverts of:
commit 769bc11f684d ("[Clang] Implement the 'counted_by' attribute
(#68750)")
commit bc09ec696209 ("[CodeGen] Revamp counted_by calculations
(#70606)")
commit 1a09cfb2f35d ("[Clang] counted_by attr can apply only to C99
flexible array members (#72347)")
commit a76adfb992c6 ("[NFC][Clang] Refactor code to calculate flexible
array member size (#72790)")
commit d8447c78ab16 ("[Clang] Correct handling of negative and
out-of-bounds indices (#71877)")
Partial commit b31cd07de5b7 ("[Clang] Regenerate test checks (NFC)")
Closes #73168. Closes #75173.
The specialisation will not be valid when ConstantInt gains native
support for vector types.
This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.
Co-authored-by: Nikita Popov <github@npopov.com>
This patch implements the builtins in Clang
and the LLVM IR intrinsics for the following:
// Variants are also available for:
// _s8, _s16, _u16, _s32, _u32, _s64, _u64,
// _f16, _f32, _f64
uint8x16_t svaddqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svandqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t sveorqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svorqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for:
// _s8, _u16, _s16, _u32, _s32, _u64, _s64
uint8x16_t svmaxqv[_u8](svbool_t pg, svuint8_t zn);
uint8x16_t svminqv[_u8](svbool_t pg, svuint8_t zn);

// Variants are also available for _f32, _f64
float16x8_t svmaxnmqv[_f16](svbool_t pg, svfloat16_t zn);
float16x8_t svminnmqv[_f16](svbool_t pg, svfloat16_t zn);
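A hedged usage sketch of one of these builtins (the wrapper name is mine; assumes an SVE2.1-capable compile and that <arm_sve.h> provides the fixed-width result type, which is what the SVEEmitter change described below enables without arm_neon.h):
```
#include <arm_sve.h>

/* Horizontal add within each 128-bit segment of zn, producing a fixed
   128-bit result. */
uint8x16_t segment_sums(svbool_t pg, svuint8_t zn) {
  return svaddqv_u8(pg, zn);
}
```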
According to PR #257 [1], the reduction instructions use scalable vectors
as input and fixed vectors as output; therefore we changed SVEEmitter to
emit fixed vector types in case the NEON header (arm_neon.h) is not
present.
[1] https://github.com/ARM-software/acle/pull/257
Co-author: Dinar Temirbulatov <dinar.temirbulatov@arm.com>
Add builtin: 'svreinterpret_b' to cast from svcount_t to svbool_t.
Add builtin: 'svreinterpret_c' to cast from svbool_t to svcount_t.
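A minimal usage sketch (the wrapper names are mine; assumes a compile with the relevant SVE2.1/SME2 features so that svcount_t is available):
```
#include <arm_sve.h>

/* Cast between the predicate-as-counter and predicate-as-mask types. */
svbool_t  as_mask(svcount_t pn)  { return svreinterpret_b(pn); }
svcount_t as_count(svbool_t pg)  { return svreinterpret_c(pg); }
```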
Patch by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
Update all callers to pass through the Address.
For the older builtins such as `__sync_*` and MSVC `_Interlocked*`,
natural alignment of the atomic access is _assumed_. This change
preserves that behavior. It will pass through greater-than-required
alignments, however.
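A small illustrative sketch (the variable and its over-alignment are made up): natural alignment is still assumed, but a stricter declared alignment can now be passed through to the atomic operation.
```
/* 'counter' is naturally 8-byte aligned; _Alignas raises that to 16.
   The __sync_* builtin still assumes natural alignment, and the larger
   alignment is now passed through rather than dropped. */
_Alignas(16) static long counter;

long bump(void) {
  return __sync_fetch_and_add(&counter, 1);
}
```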
Clang currently implements a set of vector rotate builtins
(__builtin_s390_verll*) in terms of platform-specific LLVM
intrinsics. To simplify the IR (and allow for common code
optimizations if applicable), this patch removes those LLVM
intrinsics and implements the builtins in terms of the
platform-independent funnel shift intrinsics instead.
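For intuition, a rotate is a funnel shift with both inputs equal (rotl(x, n) == fshl(x, x, n)); a scalar C model of one byte-element rotate (my sketch, not code from the patch):
```
/* Left-rotate one 8-bit element; the vector builtins apply this per element. */
unsigned char rotl8(unsigned char x, unsigned int n) {
  n &= 7;                                  /* rotate amount is modulo 8 */
  return (unsigned char)((x << n) | (x >> ((8 - n) & 7)));
}
```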
Also, fix the prototype of the __builtin_s390_verll*
builtins for full compatibility with GCC.
Information about the code object version can be configured by the user
for the AMD GPU target, and it needs to be placed in the LLVM IR
generated by Flang. Information about the code object version in the MLIR
generated by the parser can be reused by other tools. There is no need to
specify extra flags if we want to invoke MLIR tools (like fir-opt)
separately.
Changes in comparison to a8ac93:
* added information about required targets for test
flang/test/Driver/driver-help.f90
Information about the code object version can be configured by the user
for the AMD GPU target, and it needs to be placed in the LLVM IR
generated by Flang. Information about the code object version in the MLIR
generated by the parser can be reused by other tools. There is no need to
specify extra flags if we want to invoke MLIR tools (like fir-opt)
separately.
GCC returns 0 for a negative index on an array in a structure. It also
returns 0 for an array index that goes beyond the extent of the array.
In addition, for a pointer to a struct field, __bdos returns that field's
size, not its size plus the rest of the struct, unless it's the first
field in the struct.
struct s {
  int count;
  char dummy;
  int array[] __attribute__((counted_by(count)));
};

struct s *p = malloc(...);
p->count = 10;
A __bdos on the elements of p returns:
__bdos(p, 0) == 30
__bdos(p->array, 0) == 10
__bdos(&p->array[0], 0) == 10
__bdos(&p->array[-1], 0) == 0
__bdos(&p->array[42], 0) == 0
Also perform some refactoring, putting the "counted_by" calculations in
their own function.
The svldr_vnum and svstr_vnum builtins always modify the base register
and tile slice and provide immediate offsets of zero, even when the
offset provided to the builtin is an immediate. This patch optimises the
output of the builtins when the offset is an immediate, to pass it
directly to the instruction and to not need the base register and tile
slice updates.
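A hedged sketch of the kind of call this affects (the parameter types, header name, and ZA state attribute are assumptions for illustration):
```
#include <arm_sme.h>

/* With an immediate vnum, the offset can now be encoded directly in the
   LDR (ZA) instruction instead of materialising an updated base register
   and tile slice. Signature details are assumed here. */
void fill_tile(uint32_t slice, const void *buf) __arm_inout("za") {
  svldr_vnum_za(slice, buf, 4); /* immediate vnum of 4 */
}
```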
When emitting LLVM IR for gather loads/scatter stores, the predicate
parameter is cast to a type that depends on the loaded, resp. stored,
type. That's correct for operations where we have a predicate per lane;
however, it is not correct for quadword loads and stores (`LD1Q`, `ST1Q`),
where the predicate is per 128-bit chunk, independent of the ACLE
intrinsic type.
This can be handled universally by casting to the corresponding parameter
type of the intrinsic. The intrinsic itself should be defined in a way
that enforces the relations between parameter types.
Add clang builtins for the new tied wmma intrinsics. These variations tie
the destination accumulator matrix to the input accumulator matrix.
See https://github.com/llvm/llvm-project/pull/69903 for context.
This reverts commit e8fe4de64ffb84924c41e54116a04570046eed74.
memcpy/memmove instrumentation for -fsanitize=alignment has been tested
on a huge code base. There were some cleanups but the number does not
justify a workaround.
Break down the counted_by calculations so that they correctly handle
anonymous structs, which are specified internally as IndirectFieldDecls.
This improves the calculation of __bdos on a different field member in the
struct, and also improves support for __bdos on an index into the FAM. If
the index is further out than the length of the FAM, we return __bdos's
"can't determine the size" value (zero or negative one, depending on type).
Also simplify the code to use helper methods to get the field referenced
by counted_by and the flexible array member itself, which also had some
issues with FAMs in sub-structs.
Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses the
already available global variable "oclc_ABI_version" instead of
"llvm.amdgcn.abi.version".
It also adds some minor fields to the ImplicitArg structure.
amdgcn_update_dpp intrinsic (#71139)""
This reverts commit d1fb9307951319eea3e869d78470341d603c8363 and fixes
the lit test clang/test/CodeGenHIP/dpp-const-fold.hip
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Operands of `__builtin_amdgcn_update_dpp` need to evaluate to constants
to match the intrinsic requirements.
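A hedged illustration (the operand meanings are assumptions from the builtin's general documentation, not from this patch; requires an AMDGPU target): the trailing control operands must fold to integer constants, so simple constant expressions are acceptable.
```
/* The four trailing control operands must fold to integer constants;
   the operand roles noted below are assumed for illustration. */
int shift_row(int old, int value) {
  return __builtin_amdgcn_update_dpp(old, value,
                                     /*dpp_ctrl=*/0x100 + 1, /* folds to 0x101 */
                                     /*row_mask=*/0xf,
                                     /*bank_mask=*/0xf,
                                     /*bound_ctrl=*/false);
}
```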
Fixes: SWDEV-426822, SWDEV-431138
---------
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
Now that `align_{up,down}` use `llvm.ptrmask` (as of #71238), the
assume doesn't preserve any information that is not still easily
re-computable.
Closes #71295.
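For context, a minimal sketch of the user-facing builtin involved (my example, not from the patch):
```
/* __builtin_align_down/_up now lower through llvm.ptrmask, so the extra
   llvm.assume about the result's alignment is no longer needed. */
char *page_base(char *p) {
  return __builtin_align_down(p, 4096); /* alignment must be a power of two */
}
```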
This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaring. This is useful for `preferred_type` annotations on bit-fields.
This patch adds reinterpret builtins as proposed here:
https://github.com/ARM-software/acle/pull/275.
The builtins take the form:
sv<dst>x<N>_t svreinterpret_<dst>_<src>_x<N>(sv<src>x<N>_t op)
where
- <src> and <dst> designate the source and the destination type,
respectively, with all pairs chosen from {s8, u8, s16, u16, s32, u32,
s64, u64, bf16, f16, f32, f64}
- <N> designates the number of tuple elements: 2, 3 or 4
A short (overloaded) form is also provided, where the destination type is
explicitly designated and the source type is deduced from the parameter
type. These take the form:
sv<dst>x<N>_t svreinterpret_<dst>(sv<src>x<N>_t op)
For example:
svuint16x2_t svreinterpret_u16_s32_x2(svint32x2_t op);
svuint16x2_t svreinterpret_u16(svint32x2_t op);
This patch removes duplicated code in EmitAArch64SVEBuiltinExpr and
EmitAArch64SMEBuiltinExpr by creating a new function called
GetAArch64SVEProcessedOperands which handles splitting up multi-vector
arguments using vector extracts.
These changes are non-functional.