All variable declarations in the global scope that are not resources,
static, or empty are implicitly added to the implicit constant buffer
`$Globals`. They are created in the `hlsl_constant` address space and
collected in an implicit `HLSLBufferDecl` node that is added to the AST
at the end of the translation unit. Codegen is the same as for explicit
constant buffers.
Fixes #123801
Translates `cbuffer` declaration blocks to a `target("dx.CBuffer")` type, creates global variables in the `hlsl_constant` address space for all `cbuffer` constants, and adds metadata describing which global constant belongs to which constant buffer. For explicit constant buffer layout information, an explicit layout type `target("dx.Layout")` is used. This might change in the future.
The constant globals are temporary and will be removed in an upcoming pass that will translate `load` instructions in the `hlsl_constant` address space into constant buffer load intrinsic calls off a CBV handle (#124630, #112992).
See [Constant buffer design
doc](https://github.com/llvm/wg-hlsl/pull/94) for more details.
Fixes #113514, #106596
Commit 77d3f8a avoids distinct tags for any pointers where the ultimate
pointee type is `void`, to solve breakage in real-world code that uses
(indirections to) `void*` for polymorphism over different pointer types.
While this matches the TBAA implementation in GCC, this patch implements
a refinement that distinguishes void pointers by pointer depth, as
described in the "strict aliasing" documentation included in the
aforementioned commit:
> `void*` is permitted to alias any pointer type, `void**` is permitted
> to alias any pointer to pointer type, and so on.
For example, `void**` is no longer considered to alias `int*` in this
refinement, but it remains possible to use `void**` for polymorphism
over pointers to pointers.
Currently, the Clang frontend incorrectly emits the callee instead of
the thunk for the callee in the vtable. This happens because the
thunk index is not incremented when a thunk's callee cannot be emitted.
This patch fixes the bug.
This patch allows using fpfeatures pragmas with `__builtin_convertvector` (a usage sketch follows the list):
- added TrailingObjects with FPOptionsOverride, and methods for handling
it, to ConvertVectorExpr
- added support for codegen, node dumping, and serialization of the
fpfeatures contained in ConvertVectorExpr
This will allow vectorizing these calls (after a few more patches). This
should not change the codegen for targets that enable the use of AA
during the codegen (in `TargetSubtargetInfo::useAA()`). This includes
targets such as AArch64. This notably does not include x86, but that can
be worked around by passing `-mllvm -combiner-global-alias-analysis=true`
to clang.
Follow-up to #114086.
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
Attempting to pass a `ptr addrspace(7)` to functions that take `ptr`
arguments produces undesirable `addrspacecast(addrspacecast(p8 x to p7)
to p0) => addrspacecast(p8 x to p0)` folds. This results in illegal GEP
operations on buffer resources, which can't be GEP'd. (However, note
that, while unimplemented, an addrspacecast from ptr addrspace(7) to ptr
is legal - it's just an effective address computation.)
To resolve this problem, and thus prevent illegal
`getelementptr T, ptr addrspace(8) %x, ...` operations from being
produced, this commit extends amdgcn.make.buffer.rsrc to also be variadic
in its result type, auto-upgrading old manglings.
The logic for handling a make.buffer.rsrc in instruction selection
remains untouched and expects the output type to be a ptr addrspace(8),
as does the Clang lowering for its builtin (the pointer-to-pointer
version might want a different name in clang). LowerBufferFatPointers
has been updated to lower
amdgcn.make.buffer.rsrc.p7.p* to amdgcn.make.buffer.rsrc.p8.p*.
This'll also make exposing buffer fat pointers in Clang easier, since
you don't have to cast between a `__amdgcn_rsrc_t` and a pointer.
This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.
Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.
Depends on #121005
The layout and the size of an ObjC interface can change after its
corresponding implementation is parsed when synthesized ivars or ivars
declared in categories are added to the interface's list of ivars. This
can cause clang to miscompile if the optimization that emits fixed
offsets for ivars (see 923ddf65f4e21ec67018cf56e823895de18d83bc) uses an
ObjC class layout that is outdated and no longer reflects the current
state of the class.
For example, when compiling `constant-non-fragile-ivar-offset.m`, clang
emits 20 instead of 24 as the offset for `IntermediateClass2Property` as
the class layout for `SuperClass2`, which is created when the
implementation of IntermediateClass3 is parsed, is outdated when the
implementation of `IntermediateClass2` is parsed.
This commit invalidates the stale layout information of the class and
its subclasses if new ivars are added to the interface.
With this change, we can also stop using ObjC implementation decls as
the key to retrieve ObjC class layout information, as the layout
retrieved using the ObjC interface as the key will always be up to date.
rdar://139531391
This PR implements HLSL's initialization list behavior as specified in
the draft language specification under
[*Decl.Init.Agg*](https://microsoft.github.io/hlsl-specs/specs/hlsl.html#Decl.Init.Agg).
This behavior is a bit unusual for C/C++ because intermediate braces in
initializer lists are ignored, and a whole array of additional
conversions occur that are unintuitive compared to initialization in C.
The implementation in this PR generates a valid C/C++ initialization
list AST for the HLSL initializer so that there are no changes required
to Clang's CodeGen to support this. This design will also allow us to
use Clang's rewrite to convert HLSL initializers to valid C/C++
initializers that are equivalent. It does have the downside that it will
often generate redundant accesses during codegen. The IR optimizer is
extremely good at eliminating those, so this will have no impact on the
final executable's performance.
There is some opportunity for optimizing the initializer list generation
that we could consider in subsequent commits. One notable opportunity
would be to identify aggregate objects that occur in the same place in
both initializers and do not require conversion; those aggregates could
be initialized as aggregates rather than fully scalarized.
Closes #56067
---------
Co-authored-by: Finn Plummer <50529406+inbelic@users.noreply.github.com>
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
Co-authored-by: Justin Bogner <mail@justinbogner.com>
Implement HLSL aggregate splat casting, which handles splatting for
arrays and structs, and for vectors when splatting from a vec1.
Closes #100609 and Closes #100619
Depends on #118842
This patch adds a new __builtin_assume_dereferenceable to encode
dereferenceability of a pointer using llvm.assume with an operand
bundle.
For now the builtin only accepts constant sizes; I am planning to drop
this restriction in a follow-up change.
This can be used to better optimize cases where a pointer is known to be
dereferenceable, e.g. by unconditionally loading from `p2` when
vectorizing the loop below.
```c
int *get_ptr();

void foo(int *src, int x) {
  int *p2 = get_ptr();
  p2 = __builtin_assume_aligned(p2, 4);
  __builtin_assume_dereferenceable(p2, 4000);
  for (unsigned I = 0; I != 1000; ++I) {
    int x = src[I];
    if (x == 0)
      x = p2[I];
    src[I] = x;
  }
}
```
PR: https://github.com/llvm/llvm-project/pull/121789
`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards against trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.
In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
This is straightforward, as we already had all the necessary
instructions; they simply were not wired up.
This also allows implementing the vec_round intrinsic via the standard
llvm.roundeven intrinsic instead of a platform-specific one.
After #124287 updated several functions to return iterators rather than
Instruction *, it was no longer straightforward to pass their result to
DIBuilder. This commit updates DIBuilder methods to accept an
InsertPosition instead, so that they can be called with an iterator
(preferred), or (with a deprecation warning) with an Instruction * or a
BasicBlock *. This commit also updates the existing calls to the
DIBuilder methods to pass in iterators.
As a special exception, DIBuilder::insertDeclare() keeps a separate
overload accepting a BasicBlock *InsertAtEnd. This is because despite
the name, this method does not insert at the end of the block, therefore
this cannot be handled implicitly by using InsertPosition.
- Set the shader flag `DisableOptimizations` based on `optnone`
attribute of shader entry functions.
- Add DXIL Metadata Analysis pass as a prerequisite for the Shader Flags pass
to obtain entry function information collected therein.
- Named module metadata `dx.disable_optimizations` was intended to
indicate that optimizations are disabled (`-O0`) via a command-line flag.
However, its purpose is now served by the `optnone` attribute on shader
entry functions, as implemented in a recent change, so it is no longer
needed. Delete generation of the named metadata and the corresponding
test file `disable_opt.ll`.
- Add tests to verify correctness of setting shader flag.
Closes #112263
These files are relatively old and don't conform to our formatting rules.
It's hard to change them without massive clang-format changes.
---------
Signed-off-by: Peter Rong <PeterRong@meta.com>
We should be able to use `spirv64` as a device variant match and it
should be considered a GPU.
Also add the triple to an RTTI check.
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Replace some more nvvm.annotations with function attributes,
auto-upgrading the annotations as needed. These new attributes will be
more idiomatic and compile-time efficient than the annotations (a sketch
of one upgrade follows the list).
- !"maxclusterrank" / !"cluster_max_blocks" -> "nvvm.maxclusterrank"
- !"minctasm" -> "nvvm.minctasm"
- !"maxnreg" -> "nvvm.maxnreg"
A proposed fix for issue #95611, "[OpenMP][SIMD] ordered has no effect
in a loop SIMD region as of LLVM 18.1.0".
Changes:
- Implement new lowering behavior: Conservatively serialize "omp simd"
loops that have `omp simd ordered` directive to prevent incorrect
vectorization (which results in incorrect execution behavior of the
miscompiled program).
Implementation outline:
- We start with the optimistic default initial value of
`LoopStack.setParallel(/*Enable=*/true);` in
`CodeGenFunction::EmitOMPSimdInit(const OMPLoopDirective &D)`.
- We only disable the loop parallel memory access assumption with `if
(HasOrderedDirective) LoopStack.setParallel(/*Enable=*/false);` using the
`HasOrderedDirective` (which tests for the presence of an
`OMPOrderedDirective`).
- This results in no longer incorrectly vectorizing the loop when the
`omp simd ordered` directive is present.
Motivation: We'd like to prevent incorrect vectorization of the loops
marked with the `#pragma omp ordered simd` directive which has
previously resulted in miscompiled code.
At the same time, we'd like the usage outside of the `#pragma omp
ordered simd` context to remain unaffected: Note that in the test
"clang/test/OpenMP/ordered_codegen.cpp" we only "lose" the
`!llvm.access.group` metadata in `foo_simd` alone.
This is conservative: some of the loops might turn out to be
vectorizable, but we prefer to avoid miscompilation of the loops that
are currently illegal to vectorize.
A concrete example follows:
```cpp
// "test.c"
#include <float.h>
#include <math.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int compare_float(float x1, float x2, float scalar) {
  const float diff = fabsf(x1 - x2);
  x1 = fabsf(x1);
  x2 = fabsf(x2);
  const float l = (x2 > x1) ? x2 : x1;
  if (diff <= l * scalar * FLT_EPSILON)
    return 1;
  else
    return 0;
}

#define ARRAY_SIZE 256

__attribute__((noinline)) void initialization_loop(
    float X[ARRAY_SIZE][ARRAY_SIZE], float Y[ARRAY_SIZE][ARRAY_SIZE]) {
  const float max = 1000.0;
  srand(time(NULL));
  for (int r = 0; r < ARRAY_SIZE; r++) {
    for (int c = 0; c < ARRAY_SIZE; c++) {
      X[r][c] = ((float)rand() / (float)(RAND_MAX)) * max;
      Y[r][c] = X[r][c];
    }
  }
}

__attribute__((noinline)) void omp_simd_loop(float X[ARRAY_SIZE][ARRAY_SIZE]) {
  for (int r = 1; r < ARRAY_SIZE; ++r) {
    for (int c = 1; c < ARRAY_SIZE; ++c) {
#pragma omp simd
      for (int k = 2; k < ARRAY_SIZE; ++k) {
#pragma omp ordered simd
        X[r][k] = X[r][k - 2] + sinf((float)(r / c));
      }
    }
  }
}

__attribute__((noinline)) int comparison_loop(float X[ARRAY_SIZE][ARRAY_SIZE],
                                              float Y[ARRAY_SIZE][ARRAY_SIZE]) {
  int totalErrors_simd = 0;
  const float scalar = 1.0;
  for (int r = 1; r < ARRAY_SIZE; ++r) {
    for (int c = 1; c < ARRAY_SIZE; ++c) {
      for (int k = 2; k < ARRAY_SIZE; ++k) {
        Y[r][k] = Y[r][k - 2] + sinf((float)(r / c));
      }
    }
    // check row for simd update
    for (int k = 0; k < ARRAY_SIZE; ++k) {
      if (!compare_float(X[r][k], Y[r][k], scalar)) {
        ++totalErrors_simd;
      }
    }
  }
  return totalErrors_simd;
}

int main(void) {
  float X[ARRAY_SIZE][ARRAY_SIZE];
  float Y[ARRAY_SIZE][ARRAY_SIZE];
  initialization_loop(X, Y);
  omp_simd_loop(X);
  const int totalErrors_simd = comparison_loop(X, Y);
  if (totalErrors_simd) {
    fprintf(stdout, "totalErrors_simd: %d \n", totalErrors_simd);
    fprintf(stdout, "%s : %d - FAIL: error in ordered simd computation.\n",
            __FILE__, __LINE__);
  } else {
    fprintf(stdout, "Success!\n");
  }
  return totalErrors_simd;
}
```
Before:
```
$ clang -fopenmp-simd -O3 -ffast-math -lm test.c -o test && ./test
totalErrors_simd: 15408
test.c : 76 - FAIL: error in ordered simd computation.
```
clang 19.1.0: https://godbolt.org/z/6EvhxqEhe
After:
```
$ clang -fopenmp-simd -O3 -ffast-math test.c -o test && ./test
Success!
```
Co-authored-by: Matt P. Dziubinski <matt-p.dziubinski@hpe.com>
As Intel is working to add support for SPIR-V OpenMP device offloading
in upstream clang/liboffload, we need to modify the OpenMP frontend to
allow SPIR-V as well as generate valid IR for SPIR-V. For example, we
need the frontend to generate code to define and interact with device
globals used in the DeviceRTL.
This is the beginning of what I expect will be (many) other changes, but
let's get started with something simple.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
In Continuous instrumentation profiling mode, profile or coverage data
collected via compiler instrumentation is continuously synced to the
profile file. This feature has existed for a while, and is documented
here:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program
This PR creates a user-facing option to enable the feature.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
This introduces options `-floop-interchange` and `-fno-loop-interchange`
to enable/disable the loop-interchange pass. This is part of the work
that tries to get that pass enabled by default (#124911), where it was
remarked that a user-facing option to control this would be convenient
to have. The option name is the same as GCC's.
This is the second part of
https://github.com/llvm/llvm-project/pull/124752. Here we make sure to
set the `DICompositeType` `EnumKind` if the enum was declared with
`__attribute__((enum_extensibility(...)))`. In DWARF this will be
rendered as `DW_AT_APPLE_enum_kind` and will be used by LLDB when
creating `clang::EnumDecl`s from debug-info.
Depends on https://github.com/llvm/llvm-project/pull/126044
Implement HLSLElementwiseCast, excluding support for splat cases.
Casting types that contain bitfields is not supported.
Partly closes #100609 and partly closes #100619
Summary:
This patch unifies the existing offloading entries into a single section
called `llvm_offload_entries`. This lets us use a more unified
offloading infrastructure so that all targets share the same handling.
The effect is that runtime code now needs to check whether an entry's
kind is what it expects, but in exchange multiple potential providers
can be combined into a single compile job. This doesn't fully work yet
because of other runtime issues, but some day it will. Mostly this helps the
future of liboffload where we want to handle different languages than
OpenMP.
Kernel Control Flow Integrity (kCFI) is a feature that hardens indirect
calls by comparing a 32-bit hash of the function pointer's type against
a hash of the target function's type. If the hashes do not match, the
kernel may panic (or log the hash check failure, depending on the
kernel's configuration). These hashes are computed at compile time by
applying the xxHash64 algorithm to each mangled canonical function (or
function pointer) type, then truncating the result to 32 bits. This hash
is written into each indirect-callable function header by encoding it as
the 32-bit immediate operand to a `MOVri` instruction, e.g.:
```
__cfi_foo:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
movl $199571451, %eax # hash of foo's type = 0xBE537FB
foo:
...
```
This PR extends x86-based kCFI with a 3-bit arity indicator encoded in
the `MOVri` instruction's register (reg) field as follows:
| Arity Indicator | Description | Encoding in reg field |
| --------------- | --------------- | --------------- |
| 0 | 0 parameters | EAX |
| 1 | 1 parameter in RDI | ECX |
| 2 | 2 parameters in RDI and RSI | EDX |
| 3 | 3 parameters in RDI, RSI, and RDX | EBX |
| 4 | 4 parameters in RDI, RSI, RDX, and RCX | ESP |
| 5 | 5 parameters in RDI, RSI, RDX, RCX, and R8 | EBP |
| 6 | 6 parameters in RDI, RSI, RDX, RCX, R8, and R9 | ESI |
| 7 | At least one parameter may be passed on the stack | EDI |
For example, if `foo` takes 3 register arguments and no stack arguments
then the `MOVri` instruction in its kCFI header would instead be written
as:
```
movl $199571451, %ebx # hash of foo's type = 0xBE537FB
```
This PR will benefit other CFI approaches that build on kCFI, such as
FineIBT. For example, this proposed enhancement to FineIBT must be able
to infer (at kernel init time) which registers are live at an indirect
call target: https://lkml.org/lkml/2024/9/27/982. If the arity bits are
available in the kCFI function header, then this information is trivial
to infer.
Note that there is another existing PR proposal that includes the 3-bit
arity within the existing 32-bit immediate field, which introduces
different security properties:
https://github.com/llvm/llvm-project/pull/117121.