57730 Commits

Author SHA1 Message Date
Sam Tebbs
795e35a653
Reland "[LoopVectorizer] Add support for partial reductions" with non-phi operand fix. (#121744)
This relands the reverted #120721 with a fix for cases where neither
reduction operand are the reduction phi. Only
63114239cc8d26225a0ef9920baacfc7cc00fc58 and
63114239cc8d26225a0ef9920baacfc7cc00fc58 are new on top of the reverted
PR.

---------

Co-authored-by: Nicholas Guy <nicholas.guy@arm.com>
2025-01-13 11:20:35 +00:00
Durgadoss R
7e2eb0f83e
[NVPTX] Add float to tf32 conversion intrinsics (#121507)
This patch adds the missing variants of float to tf32 conversion
intrinsics, with their corresponding lit tests.

PTX Spec link:

https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-01-13 16:17:42 +05:30
Akshat Oke
4f96fb5fb3
Reapply "Spiller: Detach legacy pass and supply analyses instead (#119181)" (#122665)
Makes Inline Spiller amenable to the new PM.

This reapplies commit a531800344dc54e9c197a13b22e013f919f3f5e1 reverted
because of two unused private members reported on sanitizer bots.
2025-01-13 14:14:13 +05:30
Akshat Oke
f431f93a77
[CodeGen][NewPM] Use proper NPM AtomicExpandPass in AMDGPU (#122086)
`PassRegistry.def` already has this entry, but the dummy definition was
being pulled instead.

I couldn't reproduce the build failures that FIXME referenced, maybe the
Dummy pass getting in the way was part of the cause.
2025-01-13 10:38:24 +05:30
Sameer Sahasrabuddhe
77e6f434ec
[SPIRV] convergence anchor intrinsic does not have a parent token (#122230) 2025-01-13 09:54:57 +05:30
Justin Bogner
0e51b54b7a
[DirectX] Implement the resource.store.rawbuffer intrinsic (#121282)
This introduces `@llvm.dx.resource.store.rawbuffer` and generalizes the
buffer store docs under DirectX/DXILResources.

Fixes #106188
2025-01-12 18:52:20 -07:00
Daniel Paoliello
5ee0a71df9
[aarch64][win] Add support for import call optimization (equivalent to MSVC /d2ImportCallOptimization) (#121516)
This change implements import call optimization for AArch64 Windows
(equivalent to the undocumented MSVC `/d2ImportCallOptimization` flag).

Import call optimization adds additional data to the binary which can be
used by the Windows kernel loader to rewrite indirect calls to imported
functions as direct calls. It uses the same [Dynamic Value Relocation
Table mechanism that was leveraged on x64 to implement
`/d2GuardRetpoline`](https://techcommunity.microsoft.com/blog/windowsosplatform/mitigating-spectre-variant-2-with-retpoline-on-windows/295618).

The change to the obj file is to add a new `.impcall` section with the
following layout:
```cpp
  // Per section that contains calls to imported functions:
  //  uint32_t SectionSize: Size in bytes for information in this section.
  //  uint32_t Section Number
  //  Per call to imported function in section:
  //    uint32_t Kind: the kind of imported function.
  //    uint32_t BranchOffset: the offset of the branch instruction in its
  //                            parent section.
  //    uint32_t TargetSymbolId: the symbol id of the called function.
```

NOTE: If the import call optimization feature is enabled, then the
`.impcall` section must be emitted, even if there are no calls to
imported functions.

The implementation is split across a few parts of LLVM:
* During AArch64 instruction selection, the `GlobalValue` for each call
to a global is recorded into the Extra Information for that node.
* During lowering to machine instructions, the called global value for
each call is noted in its containing `MachineFunction`.
* During AArch64 asm printing, if the import call optimization feature
is enabled:
- A (new) `.impcall` directive is emitted for each call to an imported
function.
- The `.impcall` section is emitted with its magic header (but is not
filled in).
* During COFF object writing, the `.impcall` section is filled in based
on each `.impcall` directive that were encountered.

The `.impcall` section can only be filled in when we are writing the
COFF object as it requires the actual section numbers, which are only
assigned at that point (i.e., they don't exist during asm printing).

I had tried to avoid using the Extra Information during instruction
selection and instead implement this either purely during asm printing
or in a `MachineFunctionPass` (as suggested in [on the
forums](https://discourse.llvm.org/t/design-gathering-locations-of-instructions-to-emit-into-a-section/83729/3))
but this was not possible due to how loading and calling an imported
function works on AArch64. Specifically, they are emitted as `ADRP` +
`LDR` (to load the symbol) then a `BR` (to do the call), so at the point
when we have machine instructions, we would have to work backwards
through the instructions to discover what is being called. An initial
prototype did work by inspecting instructions; however, it didn't
correctly handle the case where the same function was called twice in a
row, which caused LLVM to elide the `ADRP` + `LDR` and reuse the
previously loaded address. Worse than that, sometimes for the
double-call case LLVM decided to spill the loaded address to the stack
and then reload it before making the second call. So, instead of trying
to implement logic to discover where the value in a register came from,
I instead recorded the symbol being called at the last place where it
was easy to do: instruction selection.
2025-01-11 21:30:17 -08:00
Austin Kerbow
657fb4433e
[AMDGPU] Add target hook to isGlobalMemoryObject (#112781)
We want special handing for IGLP instructions in the scheduler but they
should still be treated like they have side effects by other passes. Add
a target hook to the ScheduleDAGInstrs DAG builder so that we have more
control over this.
2025-01-11 09:57:57 -08:00
Amr Hesham
1d58699f5c
[SDPatternMatch] Add Matcher m_Undef (#122521)
Add Matcher `m_Undef`

Fixes: #122439
2025-01-11 13:23:37 +01:00
Ramkumar Ramachandra
f38c40bff3
VT: teach isImpliedCondMatchingOperands about samesign (#122474)
Move isImplied{True,False}ByMatchingCmp from CmpInst to ICmpInst, so
that it can operate on CmpPredicate instead of CmpInst::Predicate, and
teach it about samesign. There are two callers of this function, and we
choose to migrate the one in ValueTracking, namely
isImpliedCondMatchingOperands to CmpPredicate, hence teaching it about
samesign, with visible test impact.
2025-01-11 09:08:57 +00:00
Fangrui Song
0de18e72c6
-ftime-report: reorganize timers
The code generation time is unclear in the -ftime-report output:

* The two clang timers "Code Generation Time" and "LLVM IR Generation
  Time" are in the default group "Miscellaneous Ungrouped Timers".
* There is also a "Clang front-end time" group, which actually includes
  code generation time.

```
===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0611 (  1.7%)   0.0099 (  4.4%)   0.0710 (  1.9%)   0.0713 (  1.9%)  LLVM IR Generation Time
   3.5140 ( 98.3%)   0.2165 ( 95.6%)   3.7306 ( 98.1%)   3.7342 ( 98.1%)  Code Generation Time
   3.5751 (100.0%)   0.2265 (100.0%)   3.8016 (100.0%)   3.8055 (100.0%)  Total
...
===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 3.9108 seconds (3.9146 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   3.6802 (100.0%)   0.2306 (100.0%)   3.9108 (100.0%)   3.9146 (100.0%)  Clang front-end timer
   3.6802 (100.0%)   0.2306 (100.0%)   3.9108 (100.0%)   3.9146 (100.0%)  Total
```

This patch

* renames "Clang front-end time report" (FrontendAction time) to "Clang
  time report",
* renames "Clang front-end" to "Front end",
* moves "LLVM IR Generation" into the group,
* replaces "Code Generation time" with "Optimizer" (middle end) and
  "Machine code generation" (back end).

```
% clang -c sqlite3.i -w -ftime-report -mllvm -sort-timers=0
...
===-------------------------------------------------------------------------===
                               Clang time report
===-------------------------------------------------------------------------===
  Total Execution Time: 1.5922 seconds (1.5972 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.5107 ( 35.9%)   0.0105 (  6.2%)   0.5211 ( 32.7%)   0.5222 ( 32.7%)  Front end
   0.2464 ( 17.3%)   0.0340 ( 20.0%)   0.2804 ( 17.6%)   0.2814 ( 17.6%)  LLVM IR generation
   0.6240 ( 43.9%)   0.1235 ( 72.7%)   0.7475 ( 47.0%)   0.7503 ( 47.0%)  Machine code generation
   0.0413 (  2.9%)   0.0018 (  1.0%)   0.0431 (  2.7%)   0.0433 (  2.7%)  Optimizer
   1.4224 (100.0%)   0.1698 (100.0%)   1.5922 (100.0%)   1.5972 (100.0%)  Total
```

Pull Request: https://github.com/llvm/llvm-project/pull/122225
2025-01-10 19:25:18 -08:00
GeorgeHuyubo
9b528ed380
Debuginfod cache use index cache settings and include real file name (#120814)
This PR include two changes:
1. Change debuginfod cache file name to include origin file name, the
new file name would be something like:

llvmcache-13267c5f5d2e3df472c133c8efa45fb3331ef1ea-liblzma.so.5.2.2.debuginfo.dwp
So it will provide more information in image list instead of a plain
llvmcache-123
2. Switch debuginfod cache to use lldb index cache settings. Currently
we don't have proper settings for setting the cache path or the cache
expiration time for debuginfod cache. We want to use the lldb index
cache settings, as they make sense to be in the same place and have the
same TTL.

---------

Co-authored-by: George Hu <georgehuyubo@gmail.com>
2025-01-10 18:13:46 -08:00
Mircea Trofin
6329355860
[ctxprof] Move test serialization to yaml (#122545)
We have a textual representation of contextual profiles for test scenarios, mainly. This patch moves that to YAML instead of JSON. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader.

A subsequent patch will address deserialization.

(thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)
2025-01-10 18:04:25 -08:00
Fangrui Song
af4d76d909
[Support] Reduce globaal variable overhead after #121663
* Construct frequently-accessed TimerLock/DefaultTimerGroup early to
  reduce overhead.
* Rename `aquireDefaultGroup` to `acquireTimerGlobals` and restore
  ManagedStatic::claim. https://reviews.llvm.org/D76099

* Drop mtg::. We use internal linkage, so mtg:: is unneeded and might
  mislead users. In addition, llvm/ code almost never introduces a named
  namespace not in llvm::. Drop mtg::.
* Replace some unique_ptr with optional to reduce overhead.
* Switch to `functionName()`.
* Simplify `llvm::initTimerOptions` and `TimerGroup::constructForStatistics()`

Pull Request: https://github.com/llvm/llvm-project/pull/122429
2025-01-10 17:59:28 -08:00
Sergei Barannikov
a475ae05fb
Revert "[ADT] Fix specialization of ValueIsPresent for PointerUnion" (#122557)
Reverts llvm/llvm-project#121847

Causes compile time regressions and allegedly miscompilation.
2025-01-11 03:36:34 +03:00
vporpo
9248428db7
[SandboxVec][DAG][NFC] Refactor setNextNode() and setPrevNode() (#122363)
This patch updates DAG's `setNextNode()` and `setPrevNode()` to update
both nodes of the link.
2025-01-10 13:32:33 -08:00
Farzon Lotfi
b900379e26
[HLSL] Reapply Move length support out of the DirectX Backend (#121611) (#122337)
## Changes
- Delete DirectX length intrinsic
- Delete HLSL length lang builtin
- Implement length algorithm entirely in the header.

## History
- In the past if an HLSL intrinsic lowered to either a spirv op code or
a DXIL opcode we represented it with intrinsics

## Why we are moving away?
- To make HLSL apis more portable the team decided that it makes sense
for some intrinsics to be defined only in the header.
- Since there tends to be more SPIRV opcodes than DXIL opcodes the plan
is to support SPIRV opcodes either with target specific builtins or via
pattern matching.
2025-01-10 14:16:27 -05:00
Durgadoss R
372044ee09
[NVPTX] Add TMA Bulk Copy intrinsics (#122344)
PR #96083 added intrinsics for async copy of 'tensor' data
using TMA. Following a similar design, this PR adds intrinsics
for async copy of bulk data (non-tensor variants) through TMA.

* These intrinsics optionally support multicast and cache_hints,
   as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers to the
   appropriate PTX instructions.
* Lit tests are added for all combinations of these intrinsics in
   cp-async-bulk.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.

PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-01-10 22:31:53 +05:30
Philip Reames
24bb180e8a
[RISCV] Attempt to widen SEW before generic shuffle lowering (#122311)
This takes inspiration from AArch64 which does the same thing to assist
with zip/trn/etc.. Doing this recursion unconditionally when the mask
allows is slightly questionable, but seems to work out okay in practice.

As a bit of context, it's helpful to realize that we have existing logic
in both DAGCombine and InstCombine which mutates the element width of in
an analogous manner. However, that code has two restriction which
prevent it from handling the motivating cases here. First, it only
triggers if there is a bitcast involving a different element type.
Second, the matcher used considers a partially undef wide element to be
a non-match. I considered trying to relax those assumptions, but the
information loss for undef in mid-level opt seemed more likely to open a
can of worms than I wanted.
2025-01-10 07:12:24 -08:00
Sergei Barannikov
7b05367943
[ADT] Fix specialization of ValueIsPresent for PointerUnion (#121847)
Two instances of `PointerUnion` with different active members and null
value compare unequal. Currently, this results in counterintuitive
behavior when using functions from `Casting.h`, e.g.:

```C++
  PointerUnion<int *, float *> U;
  // U = (int *)nullptr;
  dyn_cast<int *>(U); // Aborts
  dyn_cast<float *>(U); // Aborts
  U = (float *)nullptr;
  dyn_cast<int *>(U); // OK
  dyn_cast<float *>(U); // OK
```

`dyn_cast` should abort in all cases because the argument is null.
Currently, it aborts only if the first member is active. This happens
because the partial template specialization of `ValueIsPresent` for
nullable types compares the union with a union constructed from nullptr,
and the two unions compare equal only if their active members are the
same.

This patch changed the specialization of `ValueIsPresent` for nullable
types to make `isPresent()` return false for all possible null values of
a PointerUnion, and fixes two places where the old behavior was
exploited.

Pull Request: https://github.com/llvm/llvm-project/pull/121847
2025-01-10 16:43:19 +03:00
Nikita Popov
c39500f88c Revert "[GVN] MemorySSA for GVN: add optional AllowMemorySSA"
This reverts commit eb63cd62a4a1907dbd58f12660efd8244e7d81e9.

This changes the preservation behavior for MSSA when the new flag
is not enabled.
2025-01-10 12:57:00 +01:00
Mirko Brkušanin
3def49cb64
[AMDGPU] Remove s_wakeup_barrier instruction (#122277) 2025-01-10 11:30:22 +01:00
Momchil Velikov
eb63cd62a4 [GVN] MemorySSA for GVN: add optional AllowMemorySSA
Preparatory work to migrate from MemoryDependenceAnalysis
towards MemorySSA in GVN.

Co-authored-by: Antonio Frighetto <me@antoniofrighetto.com>
2025-01-10 10:43:12 +01:00
Jay Foad
fd922c4b4f
[CodeGen] Add const to getAddrModeArguments argument. NFC. (#122335) 2025-01-10 09:19:25 +00:00
Lang Hames
e8cc4d24bc [ORC][MachO] Fix deferred action handling during MachOPlatform bootstrap.
DeferredAAs should only capture bootstrap actions, but after 30b73ed7bd it was
capturing all actions, including those from other plugins. This is problematic
as other plugins may introduce actions that need to run before the platform
actions (e.g. on arm64e we need pointer signing to run before we access any
global pointers in the graph).

Note that this effectively undoes 30b73ed7bd, which was a buggy attempt to
synchronize writes to the DeferredAAs vector. This patch fixes that issue the
obvious way by locking the bootstrap mutex while accessing the DeferredAAs
vector.

No testcase yet: So far I've only seen this fail during bootstrap of arm64e
JIT'd programs.
2025-01-10 18:08:43 +11:00
Akshat Oke
089555095b
Revert "Spiller: Detach legacy pass and supply analyses instead (#119… (#122426)
…181)"

This reverts commit a531800344dc54e9c197a13b22e013f919f3f5e1.
2025-01-10 12:23:07 +05:30
Tyler Lanphear
4c0a0f7241
[SandboxVectorizer][NFCI] Fix use of possibly-uninitialized scalar. (#122201)
The `EraseCallbackID` field is not always initialized in the ctor for
SeedCollector; if not, it will be used uninitialized by its dtor. This
could potentially lead to the erasure of a random callback, leading to a
bug.

Fixed by making `CallbackID` an opaque type, which is always
default-initialized to an invalid ID.
2025-01-09 22:43:30 -08:00
Akshat Oke
a531800344
Spiller: Detach legacy pass and supply analyses instead (#119181)
Makes Inline Spiller amenable to the new PM.
2025-01-10 11:46:56 +05:30
Vitaly Buka
4c8fdc2954
[nfc][BoundsChecking] Rename BoundsCheckingOptions into Options (#122359) 2025-01-09 20:38:13 -08:00
Vitaly Buka
9c2de994a1
[nfc][BoundsChecking] Refactor BoundsCheckingOptions (#122346)
Remove ReportingMode and ReportingOpts.
2025-01-09 20:19:01 -08:00
Lang Hames
2d10b7b750 Reapply "[ORC][llvm-jitlink] Add SimpleLazyReexportsSpeculator..." with fixes.
This reapplies 6d72bf47606, which was reverted in 57447d3ddf to investigate
build failures, e.g. https://lab.llvm.org/buildbot/#/builders/3/builds/10114.

The original patch contained an invalid unused friend declaration of
std::make_shared. This has been removed.
2025-01-10 12:16:21 +11:00
Lang Hames
57447d3ddf Revert "[ORC][llvm-jitlink] Add SimpleLazyReexportsSpeculator, use in llvm-jitlink."
This reverts commit 6d72bf47606c2a288b911d682fd96129c9c1466d while I fix bot failures.
2025-01-10 12:08:05 +11:00
Lang Hames
6d72bf4760 [ORC][llvm-jitlink] Add SimpleLazyReexportsSpeculator, use in llvm-jitlink.
Also adds a new IdleTask type and updates DynamicThreadPoolTaskDispatcher to
schedule IdleTasks whenever the total number of threads running is less than
the maximum number of MaterializationThreads.

A SimpleLazyReexportsSpeculator instance maintains a list of speculation
suggestions ((JITDylib, Function) pairs) and registered lazy reexports. When
speculation opportunities are available (having been added via
addSpeculationSuggestions or when lazy reexports were created) it schedules
an IdleTask that triggers the next speculative lookup as soon as resources
are available. Speculation suggestions are processed first, followed by
lookups for lazy reexport bodies. A callback can be registered at object
construction time to record lazy reexport executions as they happen, and these
executions can be fed back into the speculator as suggestions on subsequent
executions.

The llvm-jitlink tool is updated to support speculation when lazy linking is
used via three new arguments:

 -speculate=[none|simple] : When the 'simple' value is specified a
                            SimpleLazyReexportsSpeculator instances is used
                            for speculation.

 -speculate-order <path> : Specifies a path to a CSV containing
                           (jit dylib name, function name) triples to use
                           as speculative suggestions in the current run.

 -record-lazy-execs <path> : Specifies a path in which to record lazy function
                             executions as a CSV of (jit dylib name, function
                             name) pairs, suitable for use with
                             -speculate-order.

The same path can be passed to -speculate-order and -record-lazy-execs, in
which case the file will be overwritten at the end of the execution.

No testcase yet: Speculative linking is difficult to test (since by definition
execution behavior should be unaffected by speculation) and this is an new
prototype of the concept*. Tests will be added in the future once the interface
and behavior settle down.

* An earlier implementation of the speculation concept can be found in
  llvm/include/llvm/ExecutionEngine/Orc/Speculation.h. Both systems have the
  same goal (hiding compilation latency) but different mechanisms. This patch
  relies entirely on information available in the controller, where the old
  system could receive additional information from the JIT'd runtime via
  callbacks. I aim to combine the two in the future, but want to gain more
  practical experience with speculation first.
2025-01-10 11:48:08 +11:00
Mingming Liu
a6aa9365f7
[NFC][AsmPrinter] Pass MJTI by const reference instead of const pointer (#122365)
The caller `AsmPrinter::emitJumpTableInfo` checks [1] `MJTI` is not a
null pointer before calling `emitJumpTableEntry` or
`emitJumpTableSizesSection`.

This patch updates callee function's signature to accept const
reference, this way it's explicit `MJTI` won't be nullptr inside the
callee.

[1]
9d5299eb61/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (L2857)
2025-01-09 15:57:32 -08:00
Thor Preimesberger
c1c50c7a3e
[SDPatternMatch] Add matchers m_ExtractSubvector and m_InsertSubvector (#120212)
Fixes #118846
2025-01-09 15:20:48 -08:00
vporpo
6312beef78
[SandboxVec][BottomUpVec] Use SeedCollector and slice seeds (#120826)
With this patch we switch from the temporary dummy seeds to actual seeds
provided by the seed collector.
The seeds get sliced and each slice is used as the starting point for
vectorization.
2025-01-09 11:53:48 -08:00
Nico Weber
9ec92873ec Revert "[HLSL] Move length support out of the DirectX Backend (#121611)"
This reverts commit a6b7181733c83523a39d4f4e788c6b7a227d477d.
Breaks Clang :: CodeGenHLSL/builtins/length.hlsl, see
https://github.com/llvm/llvm-project/pull/121611#issuecomment-2581004278
2025-01-09 14:19:03 -05:00
Farzon Lotfi
a6b7181733
[HLSL] Move length support out of the DirectX Backend (#121611)
## Changes
- Delete DirectX length intrinsic
- Delete HLSL length lang builtin
- Implement length algorithm entirely in the header.

## History
- In the past if an HLSL intrinsic lowered to either a spirv op code or
a DXIL opcode we represented it with intrinsics

## Why we are moving away?
- To make HLSL apis more portable the team decided that it makes sense
for some intrinsics to be defined only in the header.
- Since there tends to be more SPIRV opcodes than DXIL opcodes the plan
is to support SPIRV opcodes either with target specific builtins or via
pattern matching.
2025-01-09 13:11:52 -05:00
Ramkumar Ramachandra
17912f336b
LAA: refactor dependence class to prep for scaled strides (NFC) (#122113)
Rearrange the DepDistanceAndSizeInfo struct in preparation to scale
strides. getDependenceDistanceStrideAndSize now returns the data of
CommonStride, MaxStride, and clarifies when to retry with runtime
checks, in place of (unscaled) strides.
2025-01-09 16:05:17 +00:00
macurtis-amd
52c338daec
[llvm][NFC] Rework Timer.cpp globals to ensure valid lifetimes (#121663)
This is intended to help with flang `-ftime-report` support:
- #107270.

With this change, I was able to cherry-pick #107270, uncomment
`llvm::TimePassesIsEnabled = true;` and compile with `-ftime-report`.

I also noticed that `clang/lib/Driver/OffloadBundler.cpp` was statically
constructing a `TimerGroup` and changed it to lazily construct via
ManagedStatic.
2025-01-09 06:32:48 -06:00
Benjamin Maxwell
f88ef1bd1b
[LV] Teach LoopVectorizationLegality about struct vector calls (#119221)
This is a split-off from #109833 and only adds code relating to checking
if a struct-returning call can be vectorized.

This initial patch only allows the case where all users of the struct
return are `extractvalue` operations that can be widened.

```
%call = tail call { float, float } @foo(float %in_val)
%extract_a = extractvalue { float, float } %call, 0
%extract_b = extractvalue { float, float } %call, 1
```

Note: The tests require the VFABI changes from #119000 to pass.
2025-01-09 09:27:29 +00:00
Akshat Oke
f07b10b7c4
[Support] Recycler: Match dealloc size and enforce min size (#121889)
Address sanitizer found mismatching deallocation size in Recycler.
2025-01-09 14:22:27 +05:30
Nikita Popov
71f7b972c3
[Local] Make combineAAMetadata() more principled (#122091)
This moves combineAAMetadata() into Local and implements it via a new
AAOnly flag, which will intersect only AA metadata and keep other known
metadata.

The existing KnownIDs list is dropped, because it is redundant with the
switch in combineMetadata(), which already drops unknown metadata.

I tried a few variants of this, and ultimately went with the AAOnly flag
because this way we make an explicit choice for each metadata kind
supported by combineMetadata(), and ignoring the flag gives you
conservatively correct behavior.

I checked that the memcpy tests still pass if we adjust the logic for
MD_memprof/MD_callsite to drop the metadata instead of arbitrarily
picking one.

Fixes https://github.com/llvm/llvm-project/issues/121495.
2025-01-09 09:34:46 +01:00
NAKAMURA Takumi
61b294aa15
Introduce CounterExpressionBuilder::subst(C, Map) (#112698)
This return a counter for each term in the expression replaced by
ReplaceMap.

At the moment, this doesn't update the Map, so Map is marked as `const`.
2025-01-09 16:27:35 +09:00
Lang Hames
42b23257c5 [ORC] Fail materialization in tasks that are destroyed before running.
If a MaterialiaztionTask is destroyed before running then we need to call
failMaterialization on the MaterializationResponsibility member.
2025-01-09 17:49:59 +11:00
Yingwei Zheng
d80bdf7261
[IRBuilder] Add a helper function to intersect FMFs from two instructions (#122059)
Address review comment in
https://github.com/llvm/llvm-project/pull/121899#discussion_r1905765776
2025-01-09 14:36:42 +08:00
Justin Bogner
cba9bd5cb0
[DirectX] Implement the resource.load.rawbuffer intrinsic (#121012)
This introduces `@llvm.dx.resource.load.rawbuffer` and generalizes the
buffer load docs under DirectX/DXILResources.

This resolves the "load" parts of #106188
2025-01-08 16:56:05 -08:00
Lang Hames
8312876205 [ORC] Fix Task cleanup during DynamicThreadPoolTaskDispatcher::shutdown.
Threads created by DynamicThreadPoolTaskDispatcher::dispatch had been holding a
unique_ptr to the most recent Task, meaning that the Task would be destroyed
when the thread object was destroyed, but this would happen *after* the thread
signaled the Dispatcher that it was finished. This could cause
DynamicThreadPoolTaskDispatcher::shutdown to return (and consequently
ExecutionSession to be destroyed) before all Tasks were destroyed, with Task
destructors accessing ExecutionSession and related objects after they were
freed.

The fix is to reset the Task pointer immediately after it is run to trigger
cleanup, *then* (if there are no other tasks to run) signal the Dispatcher that
the thread is finished.

This patch also updates DynamicThreadPoolTaskDispatcher::dispatch to reject any
new Tasks dispatched after DynamicThreadPoolTaskDispatcher::shutdown is called.
2025-01-09 00:46:05 +00:00
Alexandros Lamprineas
8e65940161
[FMV][AArch64] Simplify version selection according to ACLE. (#121921)
Currently, the more features a version has, the higher its priority is.
We are changing ACLE https://github.com/ARM-software/acle/pull/370 as
follows:

"Among any two versions, the higher priority version is determined by
 identifying the highest priority feature that is specified in exactly
 one of the versions, and selecting that version."
2025-01-08 18:59:07 +00:00
Alex MacLean
e54054684e
[OptTable] Fix typo VALUE => VALUES (NFCI) (#121523)
While VALUES is not actually used by LLVM_MAKE_OPT_ID_WITH_ID_PREFIX
threading the correct value through is clearer and avoids the potential
for strange bugs if this ever changes.
2025-01-08 08:26:26 -08:00