58205 Commits

Author SHA1 Message Date
JaydeepChauhan14
0d17547879
[X86][NFC] Added POWI function testcases (#134276)
- Moved existing llvm/test/CodeGen/X86/powi.ll file to
  llvm/test/CodeGen/X86/powi-const.ll.
- Added new testcases for powi into llvm/test/CodeGen/X86/powi.ll.
2025-04-04 13:42:20 +02:00
Paul Walker
b0b97e3b05
[LLVM][AArch64] Refactor lowering of fixed length integer setcc operations. (#132434)
The original code is essentially performing isel during legalisation
with the AArch64 specific nodes offering no additional value compared to
ISD::SETCC.
2025-04-04 12:13:45 +01:00
Alex MacLean
ba0a52a04b
[InferAS] Support getAssumedAddrSpace for Arguments for NVPTX (#133991) 2025-04-03 16:47:36 -07:00
Sumit Agarwal
996cf5dc67
[HLSL] Implement dot2add intrinsic (#131237)
Resolves #99221 
Key points: For SPIRV backend, it decompose into a `dot` followed a
`add`.

- [x] Implement dot2add clang builtin,
- [x] Link dot2add clang builtin with hlsl_intrinsics.h
- [x] Add sema checks for dot2add to CheckHLSLBuiltinFunctionCall in
SemaHLSL.cpp
- [x] Add codegen for dot2add to EmitHLSLBuiltinExpr in CGBuiltin.cpp
- [x] Add codegen tests to clang/test/CodeGenHLSL/builtins/dot2add.hlsl
- [x] Add sema tests to clang/test/SemaHLSL/BuiltIns/dot2add-errors.hlsl
- [x] Create the int_dx_dot2add intrinsic in IntrinsicsDirectX.td
- [x] Create the DXILOpMapping of int_dx_dot2add to 162 in DXIL.td
- [x] Create the dot2add.ll and dot2add_errors.ll tests in
llvm/test/CodeGen/DirectX/
2025-04-03 16:23:09 -06:00
Sterling-Augustine
7514225052
Use a more proper idiom for "the output file doesn't matter". NFC. (#134280)
As in the description. Follow up to PR #134179.
2025-04-03 10:24:10 -07:00
zhijian lin
1a540c3b8b
[PowerPC] Deprecate uses of ISD::ADDC/ISD::ADDE/ISD::SUBC/ISD::SUBE (#133155)
ISD::ADDC, ISD::ADDE, ISD::SUBC and ISD::SUBE are being deprecated,
using ISD::UADDO_CARRY,ISD::USUBO_CARRY instead. Lowering the UADDO,
UADDO_CARRY, USUBO, USUBO_CARRY in the patch.
2025-04-03 13:22:49 -04:00
Simon Pilgrim
2190808f5d
[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded (REAPPLIED) (#134263)
With AVX512VL targets, use 128/256-bit VPERMV/VPERMV3 nodes when we only need the lower elements.

Reapplied version of #133923 with fix for typo in the VPERMV3 mask adjustment
2025-04-03 17:39:38 +01:00
Brox Chen
bf388f8a43
[AMDGPU][True16][CodeGen] legalize operands when move16bit SALU to VALU (#133985)
This is a follow up PR from
https://github.com/llvm/llvm-project/pull/132089.

When a V2S copy and its useMI are lowered to VALU,  this patch check:
If the generated new VALU is a true16 inst. Add subreg access on all
operands if necessary.

an example MIR looks like:
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:sreg_32 = COPY %1:vgpr_32
%3:sreg_32 = S_FLOOR_F16 %2:sreg_32, ...
```
currently lowered to
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1:vgpr_32, 0, 0, 0 ...
```
after this patch
```
%1:vgpr_32 = V_CVT_F32_U32_e64 %0:vgpr_32, 0, 0 ...
%2:vgpr_16 = V_FLOOR_F16_t16_e64 0, %1.lo16:vgpr_32, 0, 0, 0 ...
```
2025-04-03 12:26:41 -04:00
Simon Pilgrim
12f75bba41
Revert "[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded" (#134256)
Found a typo in the VPERMV3 mask adjustment - I'm going to revert and re-apply the patch with a fix

Reverts llvm/llvm-project#133923
2025-04-03 16:28:24 +01:00
Krzysztof Drewniak
f23bb530cf
[AMDGPULowerBufferFatPointers] Use InstSimplifyFolder during rewrites (#134137)
This PR updates AMDGPULowerBufferFatPointers to use the
InstSimplifyFolder
when creating IR during buffer fat pointer lowering.

This shouldn't cause any large functional changes and might improve the
quality of the generated code.
2025-04-03 10:12:18 -05:00
Aaron Puchert
9e0ca5720b
[X86] When expanding LCMPXCHG16B_SAVE_RBX, substitute RBX in base (#134109)
The pseudo-instruction LCMPXCHG16B_SAVE_RBX is used when RBX serves as
frame base pointer. At a very late stage it is then translated into a
regular LCMPXCHG16B, preceded by copying the actual argument into RBX,
and followed by restoring the register to the base pointer.

However, in case the `cmpxchg` operates on a local variable, RBX might
also be used as a base for the memory operand in frame finalization, and
we've overwritten RBX with the input operand for `cmpxchg16b`. So we
have to rewrite the memory operand base to use the saved value of RBX.

Fixes #119959.
2025-04-03 15:56:53 +02:00
Steven Perron
a77d807781
[SPIRV] Add spirv.VulkanBuffer types to the backend (#133475)
Adds code to expand the `llvm.spv.resource.handlefrombinding` and
`llvm.spv.resource.getpointer` when the resource type is
`spirv.VulkanBuffer`.

It gets expanded as a storage buffer or uniform buffer denpending on the
storage class used.

This is implementing part of

https://github.com/llvm/wg-hlsl/blob/main/proposals/0018-spirv-resource-representation.md.
2025-04-03 09:44:07 -04:00
Paul Walker
41a6bb4c05
[LLVM][CodeGen][SVE] Prefer NEON instructions when zeroing Z registers. (#133929)
Several implementations have zero-latency instructions to zero
registers. To-date no implementation has a dedicated SVE instruction but
we can use the NEON equivalent because it is defined to zero bits
128..VL regardless of the immediate used.

NOTE: The relevant instruction is not available in streaming mode, where
the original SVE DUP instruction remains in use.
2025-04-03 13:15:05 +01:00
Paul Walker
ee4e8197fa
[LLVM][AArch64][SVE] Mark DUP immediate instructions with isAsCheapAsAMove. (#133945)
Doing this means we'll regenerate an immediate rather than copy the
result of an existing one, reducing instruction dependency chains.
2025-04-03 11:42:07 +01:00
David Green
6c27817294
[SelectionDAG] Use SimplifyDemandedBits from SimplifyDemandedVectorElts Bitcast. (#133717)
This adds a call to SimplifyDemandedBits from bitcasts with scalar input
types in SimplifyDemandedVectorElts, which can help simplify the input
scalar.
2025-04-03 11:14:08 +01:00
Simon Pilgrim
bf516098fb
[X86] SimplifyDemandedVectorEltsForTargetNode - reduce the size of VPERMV/VPERMV3 nodes if the upper elements are not demanded (#133923)
With AVX512VL targets, use 128/256-bit VPERMV/VPERMV3 nodes when we only need the lower elements.
2025-04-03 11:01:08 +01:00
Hua Tian
7e65944292
[llvm][CodeGen] avoid repeated interval calculation in window scheduler (#132352)
Some new registers are reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.
2025-04-03 14:25:55 +08:00
tangaac
ff0c2fbd8e
[LoongArch] Pre-commit tests for vector absolute difference (#132898) 2025-04-03 09:19:59 +08:00
Sterling-Augustine
f68a5185d0
Allow this test to pass when the source is on a read-only filesystem (#134179)
llc attempts to create an empty file in the current directory, but it
can't do that on a read-only file system. Send that empty-output to
stdout, which prevents this failure.
2025-04-02 16:49:57 -07:00
Sami Tolvanen
acc6bcdc50
Support alternative sections for patchable function entries (#131230)
With -fpatchable-function-entry (or the patchable_function_entry
function attribute), we emit records of patchable entry locations to the
__patchable_function_entries section. Add an additional parameter to the
command line option that allows one to specify a different default
section name for the records, and an identical parameter to the function
attribute that allows one to override the section used.

The main use case for this change is the Linux kernel using prefix NOPs
for ftrace, and thus depending on__patchable_function_entries to locate
traceable functions. Functions that are not traceable currently disable
entry NOPs using the function attribute, but this creates a
compatibility issue with -fsanitize=kcfi, which expects all indirectly
callable functions to have a type hash prefix at the same offset from
the function entry.

Adding a section parameter would allow the kernel to distinguish between
traceable and non-traceable functions by adding entry records to
separate sections while maintaining a stable function prefix layout for
all functions. LKML discussion:

https://lore.kernel.org/lkml/Y1QEzk%2FA41PKLEPe@hirez.programming.kicks-ass.net/
2025-04-02 21:53:55 +00:00
Brox Chen
066787b9bd
[AMDGPU][True16][CodeGen] fold clamp update for true16 (#128919)
Check through COPY for possible clamp folding for v_mad_mixhi_f16 isel
2025-04-02 17:10:53 -04:00
Brox Chen
fb0e7b5f16
[AMDGPU][True16][CodeGen] Implement sgpr folding in true16 (#128929)
We haven't implemented 16 bit SGPRs. Currently allow 32-bit SGPRs to be
folded into True16 bit instructions taking 16 bit values. Also use
sgpr_32 when Imm is copied to spgr_lo16 so it could be further folded.
This improves generated code quality.
2025-04-02 16:08:26 -04:00
Florian Hahn
3bdf9a0880
[EquivalenceClasses] Use SmallVector for deterministic iteration order. (#134075)
Currently iterators over EquivalenceClasses will iterate over std::set,
which guarantees the order specified by the comperator. Unfortunately in
many cases, EquivalenceClasses are used with pointers, so iterating over
std::set of pointers will not be deterministic across runs.

There are multiple places that explicitly try to sort the equivalence
classes before using them to try to get a deterministic order
(LowerTypeTests, SplitModule), but there are others that do not at the
moment and this can result at least in non-determinstic value naming in
Float2Int.

This patch updates EquivalenceClasses to keep track of all members via a
extra SmallVector and removes code from LowerTypeTests and SplitModule
to sort the classes before processing.

Overall it looks like compile-time slightly decreases in most cases, but
close to noise:

https://llvm-compile-time-tracker.com/compare.php?from=7d441d9892295a6eb8aaf481e1715f039f6f224f&to=b0c2ac67a88d3ef86987e2f82115ea0170675a17&stat=instructions

PR: https://github.com/llvm/llvm-project/pull/134075
2025-04-02 20:27:43 +01:00
Juan Manuel Martinez Caamaño
beae0e9f1a
[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)
This patch introduces the `vmem-to-lds-load-insts` target feature, which
can be used to enable builtins `__builtin_amdgcn_global_load_lds` and
`__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this
feature.

This feature is only available on gfx9/10.

A limitation of using a common target feature for both builtins is that
we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available
on gfx6,7,8.
2025-04-02 20:00:09 +02:00
Juan Manuel Martinez Caamaño
0375ef07c3
[Clang][AMDGPU] Add __builtin_amdgcn_cvt_off_f32_i4 (#133741)
This built-in maps to `V_CVT_OFF_F32_I4` which treats its input as
a 4-bit signed integer and returns `0.0625f * src`.

SWDEV-518861
2025-04-02 19:51:40 +02:00
Luke Lau
711b15d179
[RISCV] Mark subvector extracts from index 0 as cheap (#134101)
Previously we only marked fixed length vector extracts as cheap, so this
extends it to any extract at index 0 which should just be a subreg
extract.

This allows extracts of i1 vectors to be considered for DAG combines,
but also scalable vectors too.

This causes some slight improvements with large legalized fixed-length
vectors, but the underlying motiviation for this is to actually prevent
an unprofitable DAG combine on a scalable vector in an upcoming patch.
2025-04-02 17:57:13 +01:00
Ryan Buchner
fa2a6d68c6
[CodeGenPrepare][RISCV] Combine (X ^ Y) and (X == Y) where appropriate (#130922)
Fixes #130510.

In RISCV, modify the folding of (X ^ Y == 0) -> (X == Y) to account for
cases where the (X ^ Y) will be re-used.

If a constant is being used for the XOR before a branch, ensure that it
is small enough to fit within a 12-bit immediate field. Otherwise, the
equality check is more efficient than the check against 0, see the
following:
```
# %bb.0:
        lui     a1, 5
        addiw   a1, a1, 1365
        xor     a0, a0, a1
        beqz    a0, .LBB0_2
# %bb.1: 
        ret
.LBB0_2: 
```

```
# %bb.0:
        lui     a1, 5
        addiw   a1, a1, 1365
        beq    a0, a1, .LBB0_2
# %bb.1: 
        xor     a0, a0, a1
        ret
.LBB0_2: 
```

Similarly, if the XOR is between 1 and a size one integer, we should
still fold away the XOR since that comparison can be optimized as a
comparison against 0.
```
# %bb.0:
        slt a0, a0, a1
        xor  a0, a0, 1
        beqz    a0, .LBB0_2
# %bb.1: 
        ret
.LBB0_2: 
```

```
# %bb.0:
        slt a0, a0, a1
        bnez    a0, .LBB0_2
# %bb.1: 
        xor  a0, a0, 1
        ret
.LBB0_2: 
```

One question about my code is that I used a hard-coded value for the
width of a RISCV ALU immediate. Do you know of a way that I can gather
this from the `context`, I was unable to devise one.
2025-04-02 09:56:09 -07:00
Luke Lau
c132bd6885 [RISCV] Add test for vmv.s.x of an immediate into a zeroinitializer vector. NFC
The immediate version of ffaaaceaa1cfaa7103196cc7f307ffcb61d73558
2025-04-02 16:48:57 +01:00
Luke Lau
ffaaaceaa1 [RISCV] Add test for vmv.s.x into a zeroinitializer vector. NFC
This is generated by the loop vectorizer for out-of-loop add
reductions with some starting value
2025-04-02 16:09:44 +01:00
Simon Pilgrim
3843dfeaf7 [X86] Add demanded elts test coverage for vXi16 VPERMW nodes
Requested for #133923
2025-04-02 15:32:01 +01:00
Akshat Oke
a13a51b91f
[AMDGPU][NPM] Port AMDGPUSetWavePriority to NPM (#130064) 2025-04-02 16:28:05 +05:30
Simon Pilgrim
2426ac647f [X86] Add demanded elts for v8f32 VPERMV node
Based off #133923 - test to ensure the VPERMV node as only the lower 128-bit source elements are demanded.
2025-04-02 11:18:47 +01:00
Nikita Popov
9356091a98
[GlobalMerge][PPC] Don't merge globals in llvm.metadata section (#131801)
The llvm.metadata section is not emitted and has special semantics. We
should not merge globals in it, similarly to how we already skip merging
of `llvm.xyz` globals.

Fixes https://github.com/llvm/llvm-project/issues/131394.
2025-04-02 10:40:53 +02:00
Sudharsan Veeravalli
536fe74aaa
[RISCV] Modify register type of extd* Xqcibm instructions (#134027)
The v0.8 spec specifies that rs1 cannot be x31 (t6) since these
instructions operate on a pair of registers (rs1 and rs1 + 1) with no wrap
around.

The latest spec can be found here:
https://github.com/quic/riscv-unified-db/releases/tag/Xqci-0.8.0
2025-04-02 12:14:50 +05:30
ZhaoQi
46968310cb
[LoongArch] Move fix-tle-le-sym-type test to test/MC. NFC (#133839) 2025-04-02 09:11:20 +08:00
Alex MacLean
3c7a0e6c82
[NVPTX] Cleanup and refactor atomic lowering (#133781)
Cleanup lowering of atomic instructions and intrninsics. The TableGen
changes are primarily a refactor, though sub variants are now lowered
via operation legalization, potentially allowing for more DAG
optimization.
2025-04-01 13:08:57 -07:00
Virginia Cangelosi
79487757b7
[Clang][LLVM] Implement multi-multi vectors MOP4{A/S} (#129230)
Implement all multi-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files
2025-04-01 19:20:27 +01:00
Sam Clegg
a30caa6a73
[WebAssembly] Add missing tests from #133289 (#133938) 2025-04-01 10:47:35 -07:00
Brox Chen
dd1d41f833
[AMDGPU][True16][CodeGen] fix moveToVALU with proper subreg access in true16 (#132089)
There are V2S copies between vpgr16 and spgr32 in true16 mode. This is
caused by vgpr16 and sgpr32 both selectable by 16bit src in ISel.

When a V2S copy and its useMI are lowered to VALU,  this patch check
1. If the generated new VALU is used by a true16 inst. Add subreg access
if necessary.
2. Legalize the V2S copy by replacing it to subreg_to_reg

an example MIR looks like:
```
%2:sgpr_32 = COPY %1:vgpr_16
%3:sgpr_32 = S_OR_B32 %2:sgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:sgpr_32, ...
```
currently lowered to
```
%2:vgpr_32 = COPY %1:vgpr_16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3:vgpr_32, ...
```
after this patch
```
%2:vgpr_32 = SUBREG_TO_REG 0, %1:vgpr_16, lo16
%3:vgpr_32 = V_OR_B32 %2:vgpr_32, ...
%4:vgpr_16 = V_ADD_F16_t16 %3.lo16:vgpr_32, ...
```
2025-04-01 12:40:18 -04:00
Petr Hosek
4b19db6db9
Revert "AsmPrinter: Remove ELF's special lowerRelativeReference for unnamed_addr function" (#133935)
Reverts llvm/llvm-project#132684
2025-04-01 09:39:07 -07:00
Jonathan Thackray
558ce50ebc
[Clang][LLVM] Implement multi-single vectors MOP4{A/S} (#129226)
Implement all multi-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in clang and
llvm following the ACLE in https://github.com/ARM-software/acle/pull/381/files
2025-04-01 17:04:59 +01:00
David Green
4cb41d136c
[AArch64] Prefer zip over ushll for anyext. (#133433)
Many CPUs have a higher throughput of ZIP instructions vs USHLL. This
adds some tablegen patterns for preferring zip in anyext patterns.
2025-04-01 16:24:54 +01:00
Simon Pilgrim
664745cf38 [X86] avx512-vselect.ll - regenerate VPTERNLOG comments 2025-04-01 15:50:07 +01:00
Virginia Cangelosi
e92ff64bad
[Clang][LLVM] Implement single-multi vectors MOP4{A/S} (#128854)
Implement all single-multi {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files.

This PR depends on https://github.com/llvm/llvm-project/pull/127797

This patch updates the semantics of template arguments in intrinsic
names for clarity and ease of use. Previously, template argument numbers
indicated which character in the prototype string determined the final
type suffix, which was confusing—especially for intrinsics using
multiple prototype modifiers per operand (e.g., intrinsics operating on
arrays of vectors). The number had to reference the correct character in
the prototype (e.g., the ‘u’ in “2.u”), making the system cumbersome and
error-prone.
With this patch, template argument numbers now refer to the operand
number that determines the final type suffix, providing a more intuitive
and consistent approach.
2025-04-01 15:05:30 +01:00
Jeremy Morse
1ebc308bba
[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855)
During the transition from debug intrinsics to debug records, we used
several different command line options to customise handling: the
printing of debug records to bitcode and textual could be independent of
how the debug-info was represented inside a module, whether the
autoupgrader ran could be customised. This was all valuable during
development, but now that totally removing debug intrinsics is coming
up, this patch removes those options in favour of a single flag
(experimental-debuginfo-iterators), which enables autoupgrade, in-memory
debug records, and debug record printing to bitcode and textual IR.

We need to do this ahead of removing the
experimental-debuginfo-iterators flag, to reduce the amount of
test-juggling that happens at that time.

There are quite a number of weird test behaviours related to this --
some of which I simply delete in this commit. Things like
print-non-instruction-debug-info.ll , the test suite now checks for
debug records in all tests, and we don't want to check we can print as
intrinsics. Or the update_test_checks tests -- these are duplicated with
write-experimental-debuginfo=false to ensure file writing for intrinsics
is correct, but that's something we're imminently going to delete.

A short survey of curious test changes:
* free-intrinsics.ll: we don't need to test that debug-info is a zero
cost intrinsic, because we won't be using intrinsics in the future.
* undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory
mode while we sorted something out; it works now either way.
* salvage-cast-debug-info.ll: was testing intrinsics-in-memory get
salvaged, isn't necessary now
* localize-constexpr-debuginfo.ll: was producing "dead metadata"
intrinsics for optimised-out variable values, dbg-records takes the
(correct) representation of poison/undef as an operand. Looks like we
didn't update this in the past to avoid spurious test differences.
* Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing
that debug-info affected codegen, and we deferred updating the tests
until now. This is just one of those silent gnochange issues that get
fixed by RemoveDIs.

Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc,
that checks we can autoupgrade debug intrinsics that are in bitcode into
the new debug records.
2025-04-01 14:27:11 +01:00
Simon Pilgrim
2c0b888359
[X86] combineX86ShuffleChain - prefer combining to X86ISD::SHUF128 if PERMQ operands are splittable (#133900)
If the 512-bit unary shuffle is a concatenation of 128/256-bit subvectors then we're better off using a X86ISD::SHUF128 node so we can fold the concatenation into the shuffle as well.
2025-04-01 13:47:52 +01:00
Virginia Cangelosi
6892d54286
[Clang][LLVM] Implement single-single vectors MOP4{A/S} (#127797)
Implement all single-single {BF/F/S/U/SU/US}MOP4{A/S} instructions in
clang and llvm following the acle in
https://github.com/ARM-software/acle/pull/381/files
2025-04-01 13:35:09 +01:00
Akshat Oke
4a68702455
[CodeGen][NPM] Port XRayInstrumentation to NPM (#129865) 2025-04-01 15:38:49 +05:30
Afanasyev Ivan
337bad3921
[EarlyIfConverter] Fix reg killed twice after early-if-predicator and ifcvt (#133554)
Bug relates to `early-if-predicator` and `early-ifcvt` passes. If
virtual register has "killed" flag in both basic blocks to be merged
into head, both instructions in head basic block will have "killed" flag
for this register. It makes MIR incorrect.

Example:

```
  bb.0: ; if
    ...
    %0:intregs = COPY $r0
    J2_jumpf %2, %bb.2, implicit-def dead $pc
    J2_jump %bb.1, implicit-def dead $pc

  bb.1: ; if.then
    ...
    S4_storeiri_io killed %0, 0, 1
    J2_jump %bb.3, implicit-def dead $pc

  bb.2: ; if.else
    ...
    S4_storeiri_io killed %0, 0, 1
    J2_jump %bb.3, implicit-def dead $pc
```

After early-if-predicator will become:

```
  bb.0:
    %0:intregs = COPY $r0
    S4_storeirif_io %1, killed %0, 0, 1
    S4_storeirit_io %1, killed %0, 0, 1
```

Having `killed` flag set twice in bb.0 for `%0` is an incorrect MIR.
2025-04-01 12:06:30 +02:00
Shoreshen
7f14b2a9eb
Revert "[AMDGPU][CodeGenPrepare] Narrow 64 bit math to 32 bit if profitable" (#133880)
Reverts llvm/llvm-project#130577
2025-04-01 17:37:02 +08:00