llvm-project

Author	SHA1	Message	Date
Sam Tebbs	795e35a653	Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744 ) This relands the reverted #120721 with a fix for cases where neither reduction operand is the reduction phi. Only 63114239cc8d26225a0ef9920baacfc7cc00fc58 and 63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted PR. --------- Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>	2025-01-13 11:20:35 +00:00
Durgadoss R	7e2eb0f83e	[NVPTX] Add float to tf32 conversion intrinsics (#121507 ) This patch adds the missing variants of float to tf32 conversion intrinsics, with their corresponding lit tests. PTX Spec link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-01-13 16:17:42 +05:30
Akshat Oke	4f96fb5fb3	Reapply "Spiller: Detach legacy pass and supply analyses instead (#119181 )" (#122665 ) Makes Inline Spiller amenable to the new PM. This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted because of two unused private members reported on sanitizer bots.	2025-01-13 14:14:13 +05:30
Akshat Oke	f431f93a77	[CodeGen][NewPM] Use proper NPM AtomicExpandPass in AMDGPU (#122086 ) `PassRegistry.def` already has this entry, but the dummy definition was being pulled instead. I couldn't reproduce the build failures that FIXME referenced, maybe the Dummy pass getting in the way was part of the cause.	2025-01-13 10:38:24 +05:30
Sameer Sahasrabuddhe	77e6f434ec	[SPIRV] convergence anchor intrinsic does not have a parent token (#122230 )	2025-01-13 09:54:57 +05:30
Justin Bogner	0e51b54b7a	[DirectX] Implement the resource.store.rawbuffer intrinsic (#121282 ) This introduces `@llvm.dx.resource.store.rawbuffer` and generalizes the buffer store docs under DirectX/DXILResources. Fixes #106188	2025-01-12 18:52:20 -07:00
Daniel Paoliello	5ee0a71df9	[aarch64][win] Add support for import call optimization (equivalent to MSVC /d2ImportCallOptimization) (#121516 ) This change implements import call optimization for AArch64 Windows (equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag). Import call optimization adds additional data to the binary which can be used by the Windows kernel loader to rewrite indirect calls to imported functions as direct calls. It uses the same [Dynamic Value Relocation Table mechanism that was leveraged on x64 to implement `/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618). The change to the obj file is to add a new `.impcall` section with the following layout: ```cpp // Per section that contains calls to imported functions: // uint32_t SectionSize: Size in bytes for information in this section. // uint32_t Section Number // Per call to imported function in section: // uint32_t Kind: the kind of imported function. // uint32_t BranchOffset: the offset of the branch instruction in its // parent section. // uint32_t TargetSymbolId: the symbol id of the called function. ``` NOTE: If the import call optimization feature is enabled, then the `.impcall` section must be emitted, even if there are no calls to imported functions. The implementation is split across a few parts of LLVM: * During AArch64 instruction selection, the `GlobalValue` for each call to a global is recorded into the Extra Information for that node. * During lowering to machine instructions, the called global value for each call is noted in its containing `MachineFunction`. * During AArch64 asm printing, if the import call optimization feature is enabled: - A (new) `.impcall` directive is emitted for each call to an imported function. - The `.impcall` section is emitted with its magic header (but is not filled in). 
* During COFF object writing, the `.impcall` section is filled in based on each `.impcall` directive that was encountered. The `.impcall` section can only be filled in when we are writing the COFF object as it requires the actual section numbers, which are only assigned at that point (i.e., they don't exist during asm printing). I had tried to avoid using the Extra Information during instruction selection and instead implement this either purely during asm printing or in a `MachineFunctionPass` (as suggested [on the forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3)) but this was not possible due to how loading and calling an imported function works on AArch64. Specifically, they are emitted as `ADRP` + `LDR` (to load the symbol) then a `BR` (to do the call), so at the point when we have machine instructions, we would have to work backwards through the instructions to discover what is being called. An initial prototype did work by inspecting instructions; however, it didn't correctly handle the case where the same function was called twice in a row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the previously loaded address. Worse than that, sometimes for the double-call case LLVM decided to spill the loaded address to the stack and then reload it before making the second call. So, instead of trying to implement logic to discover where the value in a register came from, I instead recorded the symbol being called at the last place where it was easy to do: instruction selection.	2025-01-11 21:30:17 -08:00
Austin Kerbow	657fb4433e	[AMDGPU] Add target hook to isGlobalMemoryObject (#112781 ) We want special handling for IGLP instructions in the scheduler but they should still be treated like they have side effects by other passes. Add a target hook to the ScheduleDAGInstrs DAG builder so that we have more control over this.	2025-01-11 09:57:57 -08:00
Amr Hesham	1d58699f5c	[SDPatternMatch] Add Matcher m_Undef (#122521 ) Add Matcher `m_Undef` Fixes: #122439	2025-01-11 13:23:37 +01:00
Ramkumar Ramachandra	f38c40bff3	VT: teach isImpliedCondMatchingOperands about samesign (#122474 ) Move isImplied{True,False}ByMatchingCmp from CmpInst to ICmpInst, so that it can operate on CmpPredicate instead of CmpInst::Predicate, and teach it about samesign. There are two callers of this function, and we choose to migrate the one in ValueTracking, namely isImpliedCondMatchingOperands to CmpPredicate, hence teaching it about samesign, with visible test impact.	2025-01-11 09:08:57 +00:00
Fangrui Song	0de18e72c6	-ftime-report: reorganize timers The code generation time is unclear in the -ftime-report output: * The two clang timers "Code Generation Time" and "LLVM IR Generation Time" are in the default group "Miscellaneous Ungrouped Timers". * There is also a "Clang front-end time" group, which actually includes code generation time. ``` ===-------------------------------------------------------------------------=== Miscellaneous Ungrouped Timers ===-------------------------------------------------------------------------=== ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.0611 ( 1.7%) 0.0099 ( 4.4%) 0.0710 ( 1.9%) 0.0713 ( 1.9%) LLVM IR Generation Time 3.5140 ( 98.3%) 0.2165 ( 95.6%) 3.7306 ( 98.1%) 3.7342 ( 98.1%) Code Generation Time 3.5751 (100.0%) 0.2265 (100.0%) 3.8016 (100.0%) 3.8055 (100.0%) Total ... ===-------------------------------------------------------------------------=== Clang front-end time report ===-------------------------------------------------------------------------=== Total Execution Time: 3.9108 seconds (3.9146 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 3.6802 (100.0%) 0.2306 (100.0%) 3.9108 (100.0%) 3.9146 (100.0%) Clang front-end timer 3.6802 (100.0%) 0.2306 (100.0%) 3.9108 (100.0%) 3.9146 (100.0%) Total ``` This patch * renames "Clang front-end time report" (FrontendAction time) to "Clang time report", * renames "Clang front-end" to "Front end", * moves "LLVM IR Generation" into the group, * replaces "Code Generation time" with "Optimizer" (middle end) and "Machine code generation" (back end). ``` % clang -c sqlite3.i -w -ftime-report -mllvm -sort-timers=0 ... 
===-------------------------------------------------------------------------=== Clang time report ===-------------------------------------------------------------------------=== Total Execution Time: 1.5922 seconds (1.5972 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.5107 ( 35.9%) 0.0105 ( 6.2%) 0.5211 ( 32.7%) 0.5222 ( 32.7%) Front end 0.2464 ( 17.3%) 0.0340 ( 20.0%) 0.2804 ( 17.6%) 0.2814 ( 17.6%) LLVM IR generation 0.6240 ( 43.9%) 0.1235 ( 72.7%) 0.7475 ( 47.0%) 0.7503 ( 47.0%) Machine code generation 0.0413 ( 2.9%) 0.0018 ( 1.0%) 0.0431 ( 2.7%) 0.0433 ( 2.7%) Optimizer 1.4224 (100.0%) 0.1698 (100.0%) 1.5922 (100.0%) 1.5972 (100.0%) Total ``` Pull Request: https://github.com/llvm/llvm-project/pull/122225	2025-01-10 19:25:18 -08:00
GeorgeHuyubo	9b528ed380	Debuginfod cache use index cache settings and include real file name (#120814 ) This PR includes two changes: 1. Change debuginfod cache file name to include origin file name, the new file name would be something like: llvmcache-13267c5f5d2e3df472c133c8efa45fb3331ef1ea-liblzma.so.5.2.2.debuginfo.dwp So it will provide more information in image list instead of a plain llvmcache-123 2. Switch debuginfod cache to use lldb index cache settings. Currently we don't have proper settings for setting the cache path or the cache expiration time for debuginfod cache. We want to use the lldb index cache settings, as they make sense to be in the same place and have the same TTL. --------- Co-authored-by: George Hu <georgehuyubo@gmail.com>	2025-01-10 18:13:46 -08:00
Mircea Trofin	6329355860	[ctxprof] Move test serialization to yaml (#122545 ) We have a textual representation of contextual profiles for test scenarios, mainly. This patch moves that to YAML instead of JSON. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader. A subsequent patch will address deserialization. (thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)	2025-01-10 18:04:25 -08:00
Fangrui Song	af4d76d909	[Support] Reduce global variable overhead after #121663 * Construct frequently-accessed TimerLock/DefaultTimerGroup early to reduce overhead. * Rename `aquireDefaultGroup` to `acquireTimerGlobals` and restore ManagedStatic::claim. https://reviews.llvm.org/D76099 * Drop mtg::. We use internal linkage, so mtg:: is unneeded and might mislead users. In addition, llvm/ code almost never introduces a named namespace not in llvm::. Drop mtg::. * Replace some unique_ptr with optional to reduce overhead. * Switch to `functionName()`. * Simplify `llvm::initTimerOptions` and `TimerGroup::constructForStatistics()` Pull Request: https://github.com/llvm/llvm-project/pull/122429	2025-01-10 17:59:28 -08:00
Sergei Barannikov	a475ae05fb	Revert "[ADT] Fix specialization of ValueIsPresent for PointerUnion" (#122557 ) Reverts llvm/llvm-project#121847 Causes compile time regressions and allegedly miscompilation.	2025-01-11 03:36:34 +03:00
vporpo	9248428db7	[SandboxVec][DAG][NFC] Refactor setNextNode() and setPrevNode() (#122363 ) This patch updates DAG's `setNextNode()` and `setPrevNode()` to update both nodes of the link.	2025-01-10 13:32:33 -08:00
Farzon Lotfi	b900379e26	[HLSL] Reapply Move length support out of the DirectX Backend (#121611 ) (#122337 ) ## Changes - Delete DirectX length intrinsic - Delete HLSL length lang builtin - Implement length algorithm entirely in the header. ## History - In the past if an HLSL intrinsic lowered to either a spirv op code or a DXIL opcode we represented it with intrinsics ## Why we are moving away? - To make HLSL apis more portable the team decided that it makes sense for some intrinsics to be defined only in the header. - Since there tends to be more SPIRV opcodes than DXIL opcodes the plan is to support SPIRV opcodes either with target specific builtins or via pattern matching.	2025-01-10 14:16:27 -05:00
Durgadoss R	372044ee09	[NVPTX] Add TMA Bulk Copy intrinsics (#122344 ) PR #96083 added intrinsics for async copy of 'tensor' data using TMA. Following a similar design, this PR adds intrinsics for async copy of bulk data (non-tensor variants) through TMA. * These intrinsics optionally support multicast and cache_hints, as indicated by the boolean arguments at the end of the intrinsics. * The backend looks through these flag arguments and lowers to the appropriate PTX instructions. * Lit tests are added for all combinations of these intrinsics in cp-async-bulk.ll. * The generated PTX is verified with a 12.3 ptxas executable. * Added docs for these intrinsics in NVPTXUsage.rst file. PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-01-10 22:31:53 +05:30
Philip Reames	24bb180e8a	[RISCV] Attempt to widen SEW before generic shuffle lowering (#122311 ) This takes inspiration from AArch64 which does the same thing to assist with zip/trn/etc. Doing this recursion unconditionally when the mask allows is slightly questionable, but seems to work out okay in practice. As a bit of context, it's helpful to realize that we have existing logic in both DAGCombine and InstCombine which mutates the element width in an analogous manner. However, that code has two restrictions which prevent it from handling the motivating cases here. First, it only triggers if there is a bitcast involving a different element type. Second, the matcher used considers a partially undef wide element to be a non-match. I considered trying to relax those assumptions, but the information loss for undef in mid-level opt seemed more likely to open a can of worms than I wanted.	2025-01-10 07:12:24 -08:00
Sergei Barannikov	7b05367943	[ADT] Fix specialization of ValueIsPresent for PointerUnion (#121847 ) Two instances of `PointerUnion` with different active members and null value compare unequal. Currently, this results in counterintuitive behavior when using functions from `Casting.h`, e.g.: ```C++ PointerUnion<int , float > U; // U = (int )nullptr; dyn_cast<int >(U); // Aborts dyn_cast<float >(U); // Aborts U = (float )nullptr; dyn_cast<int >(U); // OK dyn_cast<float >(U); // OK ``` `dyn_cast` should abort in all cases because the argument is null. Currently, it aborts only if the first member is active. This happens because the partial template specialization of `ValueIsPresent` for nullable types compares the union with a union constructed from nullptr, and the two unions compare equal only if their active members are the same. This patch changed the specialization of `ValueIsPresent` for nullable types to make `isPresent()` return false for all possible null values of a PointerUnion, and fixes two places where the old behavior was exploited. Pull Request: https://github.com/llvm/llvm-project/pull/121847	2025-01-10 16:43:19 +03:00
Nikita Popov	c39500f88c	Revert "[GVN] MemorySSA for GVN: add optional `AllowMemorySSA`" This reverts commit eb63cd62a4a1907dbd58f12660efd8244e7d81e9. This changes the preservation behavior for MSSA when the new flag is not enabled.	2025-01-10 12:57:00 +01:00
Mirko Brkušanin	3def49cb64	[AMDGPU] Remove s_wakeup_barrier instruction (#122277 )	2025-01-10 11:30:22 +01:00
Momchil Velikov	eb63cd62a4	[GVN] MemorySSA for GVN: add optional `AllowMemorySSA` Preparatory work to migrate from MemoryDependenceAnalysis towards MemorySSA in GVN. Co-authored-by: Antonio Frighetto <me@antoniofrighetto.com>	2025-01-10 10:43:12 +01:00
Jay Foad	fd922c4b4f	[CodeGen] Add const to getAddrModeArguments argument. NFC. (#122335 )	2025-01-10 09:19:25 +00:00
Lang Hames	e8cc4d24bc	[ORC][MachO] Fix deferred action handling during MachOPlatform bootstrap. DeferredAAs should only capture bootstrap actions, but after 30b73ed7bd it was capturing all actions, including those from other plugins. This is problematic as other plugins may introduce actions that need to run before the platform actions (e.g. on arm64e we need pointer signing to run before we access any global pointers in the graph). Note that this effectively undoes 30b73ed7bd, which was a buggy attempt to synchronize writes to the DeferredAAs vector. This patch fixes that issue the obvious way by locking the bootstrap mutex while accessing the DeferredAAs vector. No testcase yet: So far I've only seen this fail during bootstrap of arm64e JIT'd programs.	2025-01-10 18:08:43 +11:00
Akshat Oke	089555095b	Revert "Spiller: Detach legacy pass and supply analyses instead (#119… (#122426 ) …181)" This reverts commit a531800344dc54e9c197a13b22e013f919f3f5e1.	2025-01-10 12:23:07 +05:30
Tyler Lanphear	4c0a0f7241	[SandboxVectorizer][NFCI] Fix use of possibly-uninitialized scalar. (#122201 ) The `EraseCallbackID` field is not always initialized in the ctor for SeedCollector; if not, it will be used uninitialized by its dtor. This could potentially lead to the erasure of a random callback, leading to a bug. Fixed by making `CallbackID` an opaque type, which is always default-initialized to an invalid ID.	2025-01-09 22:43:30 -08:00
Akshat Oke	a531800344	Spiller: Detach legacy pass and supply analyses instead (#119181 ) Makes Inline Spiller amenable to the new PM.	2025-01-10 11:46:56 +05:30
Vitaly Buka	4c8fdc2954	[nfc][BoundsChecking] Rename BoundsCheckingOptions into Options (#122359 )	2025-01-09 20:38:13 -08:00
Vitaly Buka	9c2de994a1	[nfc][BoundsChecking] Refactor BoundsCheckingOptions (#122346 ) Remove ReportingMode and ReportingOpts.	2025-01-09 20:19:01 -08:00
Lang Hames	2d10b7b750	Reapply "[ORC][llvm-jitlink] Add SimpleLazyReexportsSpeculator..." with fixes. This reapplies 6d72bf47606, which was reverted in 57447d3ddf to investigate build failures, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/10114. The original patch contained an invalid unused friend declaration of std::make_shared. This has been removed.	2025-01-10 12:16:21 +11:00
Lang Hames	57447d3ddf	Revert "[ORC][llvm-jitlink] Add SimpleLazyReexportsSpeculator, use in llvm-jitlink." This reverts commit 6d72bf47606c2a288b911d682fd96129c9c1466d while I fix bot failures.	2025-01-10 12:08:05 +11:00
Lang Hames	6d72bf4760	[ORC][llvm-jitlink] Add SimpleLazyReexportsSpeculator, use in llvm-jitlink. Also adds a new IdleTask type and updates DynamicThreadPoolTaskDispatcher to schedule IdleTasks whenever the total number of threads running is less than the maximum number of MaterializationThreads. A SimpleLazyReexportsSpeculator instance maintains a list of speculation suggestions ((JITDylib, Function) pairs) and registered lazy reexports. When speculation opportunities are available (having been added via addSpeculationSuggestions or when lazy reexports were created) it schedules an IdleTask that triggers the next speculative lookup as soon as resources are available. Speculation suggestions are processed first, followed by lookups for lazy reexport bodies. A callback can be registered at object construction time to record lazy reexport executions as they happen, and these executions can be fed back into the speculator as suggestions on subsequent executions. The llvm-jitlink tool is updated to support speculation when lazy linking is used via three new arguments: -speculate=[none\|simple] : When the 'simple' value is specified a SimpleLazyReexportsSpeculator instance is used for speculation. -speculate-order <path> : Specifies a path to a CSV containing (jit dylib name, function name) pairs to use as speculative suggestions in the current run. -record-lazy-execs <path> : Specifies a path in which to record lazy function executions as a CSV of (jit dylib name, function name) pairs, suitable for use with -speculate-order. The same path can be passed to -speculate-order and -record-lazy-execs, in which case the file will be overwritten at the end of the execution. No testcase yet: Speculative linking is difficult to test (since by definition execution behavior should be unaffected by speculation) and this is a new prototype of the concept. Tests will be added in the future once the interface and behavior settle down. 
An earlier implementation of the speculation concept can be found in llvm/include/llvm/ExecutionEngine/Orc/Speculation.h. Both systems have the same goal (hiding compilation latency) but different mechanisms. This patch relies entirely on information available in the controller, where the old system could receive additional information from the JIT'd runtime via callbacks. I aim to combine the two in the future, but want to gain more practical experience with speculation first.	2025-01-10 11:48:08 +11:00
Mingming Liu	a6aa9365f7	[NFC][AsmPrinter] Pass MJTI by const reference instead of const pointer (#122365 ) The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a null pointer before calling `emitJumpTableEntry` or `emitJumpTableSizesSection`. This patch updates callee function's signature to accept const reference, this way it's explicit `MJTI` won't be nullptr inside the callee. [1] `9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)`	2025-01-09 15:57:32 -08:00
Thor Preimesberger	c1c50c7a3e	[SDPatternMatch] Add matchers m_ExtractSubvector and m_InsertSubvector (#120212 ) Fixes #118846	2025-01-09 15:20:48 -08:00
vporpo	6312beef78	[SandboxVec][BottomUpVec] Use SeedCollector and slice seeds (#120826 ) With this patch we switch from the temporary dummy seeds to actual seeds provided by the seed collector. The seeds get sliced and each slice is used as the starting point for vectorization.	2025-01-09 11:53:48 -08:00
Nico Weber	9ec92873ec	Revert "[HLSL] Move length support out of the DirectX Backend (#121611 )" This reverts commit a6b7181733c83523a39d4f4e788c6b7a227d477d. Breaks Clang :: CodeGenHLSL/builtins/length.hlsl, see https://github.com/llvm/llvm-project/pull/121611#issuecomment-2581004278	2025-01-09 14:19:03 -05:00
Farzon Lotfi	a6b7181733	[HLSL] Move length support out of the DirectX Backend (#121611 ) ## Changes - Delete DirectX length intrinsic - Delete HLSL length lang builtin - Implement length algorithm entirely in the header. ## History - In the past if an HLSL intrinsic lowered to either a spirv op code or a DXIL opcode we represented it with intrinsics ## Why we are moving away? - To make HLSL apis more portable the team decided that it makes sense for some intrinsics to be defined only in the header. - Since there tends to be more SPIRV opcodes than DXIL opcodes the plan is to support SPIRV opcodes either with target specific builtins or via pattern matching.	2025-01-09 13:11:52 -05:00
Ramkumar Ramachandra	17912f336b	LAA: refactor dependence class to prep for scaled strides (NFC) (#122113 ) Rearrange the DepDistanceAndSizeInfo struct in preparation to scale strides. getDependenceDistanceStrideAndSize now returns the data of CommonStride, MaxStride, and clarifies when to retry with runtime checks, in place of (unscaled) strides.	2025-01-09 16:05:17 +00:00
macurtis-amd	52c338daec	[llvm][NFC] Rework Timer.cpp globals to ensure valid lifetimes (#121663 ) This is intended to help with flang `-ftime-report` support: - #107270. With this change, I was able to cherry-pick #107270, uncomment `llvm::TimePassesIsEnabled = true;` and compile with `-ftime-report`. I also noticed that `clang/lib/Driver/OffloadBundler.cpp` was statically constructing a `TimerGroup` and changed it to lazily construct via ManagedStatic.	2025-01-09 06:32:48 -06:00
Benjamin Maxwell	f88ef1bd1b	[LV] Teach LoopVectorizationLegality about struct vector calls (#119221 ) This is a split-off from #109833 and only adds code relating to checking if a struct-returning call can be vectorized. This initial patch only allows the case where all users of the struct return are `extractvalue` operations that can be widened. ``` %call = tail call { float, float } @foo(float %in_val) %extract_a = extractvalue { float, float } %call, 0 %extract_b = extractvalue { float, float } %call, 1 ``` Note: The tests require the VFABI changes from #119000 to pass.	2025-01-09 09:27:29 +00:00
Akshat Oke	f07b10b7c4	[Support] Recycler: Match dealloc size and enforce min size (#121889 ) Address sanitizer found mismatching deallocation size in Recycler.	2025-01-09 14:22:27 +05:30
Nikita Popov	71f7b972c3	[Local] Make combineAAMetadata() more principled (#122091 ) This moves combineAAMetadata() into Local and implements it via a new AAOnly flag, which will intersect only AA metadata and keep other known metadata. The existing KnownIDs list is dropped, because it is redundant with the switch in combineMetadata(), which already drops unknown metadata. I tried a few variants of this, and ultimately went with the AAOnly flag because this way we make an explicit choice for each metadata kind supported by combineMetadata(), and ignoring the flag gives you conservatively correct behavior. I checked that the memcpy tests still pass if we adjust the logic for MD_memprof/MD_callsite to drop the metadata instead of arbitrarily picking one. Fixes https://github.com/llvm/llvm-project/issues/121495.	2025-01-09 09:34:46 +01:00
NAKAMURA Takumi	61b294aa15	Introduce CounterExpressionBuilder::subst(C, Map) (#112698 ) This returns a counter for each term in the expression replaced by ReplaceMap. At the moment, this doesn't update the Map, so Map is marked as `const`.	2025-01-09 16:27:35 +09:00
Lang Hames	42b23257c5	[ORC] Fail materialization in tasks that are destroyed before running. If a MaterializationTask is destroyed before running then we need to call failMaterialization on the MaterializationResponsibility member.	2025-01-09 17:49:59 +11:00
Yingwei Zheng	d80bdf7261	[IRBuilder] Add a helper function to intersect FMFs from two instructions (#122059 ) Address review comment in https://github.com/llvm/llvm-project/pull/121899#discussion_r1905765776	2025-01-09 14:36:42 +08:00
Justin Bogner	cba9bd5cb0	[DirectX] Implement the resource.load.rawbuffer intrinsic (#121012 ) This introduces `@llvm.dx.resource.load.rawbuffer` and generalizes the buffer load docs under DirectX/DXILResources. This resolves the "load" parts of #106188	2025-01-08 16:56:05 -08:00
Lang Hames	8312876205	[ORC] Fix Task cleanup during DynamicThreadPoolTaskDispatcher::shutdown. Threads created by DynamicThreadPoolTaskDispatcher::dispatch had been holding a unique_ptr to the most recent Task, meaning that the Task would be destroyed when the thread object was destroyed, but this would happen after the thread signaled the Dispatcher that it was finished. This could cause DynamicThreadPoolTaskDispatcher::shutdown to return (and consequently ExecutionSession to be destroyed) before all Tasks were destroyed, with Task destructors accessing ExecutionSession and related objects after they were freed. The fix is to reset the Task pointer immediately after it is run to trigger cleanup, then (if there are no other tasks to run) signal the Dispatcher that the thread is finished. This patch also updates DynamicThreadPoolTaskDispatcher::dispatch to reject any new Tasks dispatched after DynamicThreadPoolTaskDispatcher::shutdown is called.	2025-01-09 00:46:05 +00:00
Alexandros Lamprineas	8e65940161	[FMV][AArch64] Simplify version selection according to ACLE. (#121921 ) Currently, the more features a version has, the higher its priority is. We are changing ACLE https://github.com/ARM-software/acle/pull/370 as follows: "Among any two versions, the higher priority version is determined by identifying the highest priority feature that is specified in exactly one of the versions, and selecting that version."	2025-01-08 18:59:07 +00:00
Alex MacLean	e54054684e	[OptTable] Fix typo VALUE => VALUES (NFCI) (#121523 ) While VALUES is not actually used by LLVM_MAKE_OPT_ID_WITH_ID_PREFIX threading the correct value through is clearer and avoids the potential for strange bugs if this ever changes.	2025-01-08 08:26:26 -08:00
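
The version-selection rule quoted in the [FMV][AArch64] commit above (8e65940161) can be illustrated with a small sketch. This is not the LLVM implementation or the ACLE text. It assumes a hypothetical encoding in which each feature is one bit of a mask and a higher bit index means a higher-priority feature; the names `FeatureMask` and `hasHigherPriority` are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding (an assumption, not LLVM's): bit i set means the
// version specifies feature i, and a higher bit index is a higher priority.
using FeatureMask = std::uint64_t;

// Returns true if version A outranks version B under the quoted rule:
// find the highest-priority feature specified in exactly one of the two
// versions, and select the version that specifies it.
bool hasHigherPriority(FeatureMask A, FeatureMask B) {
  FeatureMask OnlyInOne = A ^ B; // features present in exactly one version
  assert(OnlyInOne != 0 && "the two versions must differ in some feature");

  // Clear the lowest set bit until a single bit remains; that bit is the
  // highest-priority feature on which the versions disagree.
  while (OnlyInOne & (OnlyInOne - 1))
    OnlyInOne &= OnlyInOne - 1;

  return (A & OnlyInOne) != 0;
}
```

Under this encoding, a version with the single feature `0x4` outranks a version with the two lower-priority features `0x3`, whereas the previous "more features wins" behavior described in the commit would have preferred the latter.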


57730 Commits