548974 Commits

Author SHA1 Message Date
Panagiotis Karouzakis
c2e7fad446
[DemandedBits] Support non-constant shift amounts (#148880)
This patch adds support for the shift operators to handle non-constant
shift operands.

ashr proof --> https://alive2.llvm.org/ce/z/EN-siK
lshr proof --> https://alive2.llvm.org/ce/z/eeGzyB
shl proof --> https://alive2.llvm.org/ce/z/dpvbkq
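
A minimal sketch of the idea, not the code from this patch: when the shift amount is only known to lie in a range, one conservative way to propagate demanded bits is to take the union over every possible amount. The helper names, the fixed 32-bit width, and the explicit range parameters are assumptions for illustration; the in-tree analysis works on `APInt`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bits of x that can reach the demanded result bits of
// `shl x, s` when s may be any amount in [minShift, maxShift], maxShift < 32.
uint32_t demandedForShl(uint32_t demandedOut, unsigned minShift,
                        unsigned maxShift) {
  assert(minShift <= maxShift && maxShift < 32);
  uint32_t demandedIn = 0;
  for (unsigned s = minShift; s <= maxShift; ++s)
    demandedIn |= demandedOut >> s; // result bit i comes from input bit i - s
  return demandedIn;
}

// Same idea for `lshr x, s`: result bit i comes from input bit i + s.
uint32_t demandedForLshr(uint32_t demandedOut, unsigned minShift,
                         unsigned maxShift) {
  assert(minShift <= maxShift && maxShift < 32);
  uint32_t demandedIn = 0;
  for (unsigned s = minShift; s <= maxShift; ++s)
    demandedIn |= demandedOut << s;
  return demandedIn;
}
```

With a constant shift the range collapses to one value and the result matches the usual single-shift mask; a wider range only ever adds demanded bits.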
2025-08-19 01:11:16 +08:00
Yang Bai
4eb1a07d7d
[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering (#151175)
This patch introduces a new unrolling-based approach for lowering
multi-dimensional `vector.from_elements` operations.

**Implementation Details:**
1. **New Transform Pattern**: Added `UnrollFromElements`, which unrolls an
N-D (N >= 2) from_elements op into (N-1)-D from_elements ops along the
outermost dimension.
2. **Utility Functions**: Added `unrollVectorOp` to reuse the unrolling
algorithm of vector.gather for vector.from_elements.
3. **Integration**: Added the unrolling pattern to the
convert-vector-to-llvm pass as a temporary transformation.
4. Use direct LLVM dialect operations instead of intermediate
vector.insert operations for efficiency in `VectorFromElementsLowering`.

**Example:**
```mlir
// unroll
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32>
%vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32>
%vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32>
%result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32>

// convert-vector-to-llvm
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>>
%poison_1d_0 = llvm.mlir.poison : vector<2xf32>
%c0_0 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32>
%c1_0 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32>
%vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>>
%poison_1d_1 = llvm.mlir.poison : vector<2xf32>
%c0_1 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32>
%c1_1 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32>
%vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>>
%result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32>
```

---------

Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: James Newling <james.newling@gmail.com>
Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>
2025-08-18 10:09:12 -07:00
Tobias Stadler
8135b7c1ab
[LV] Emit all remarks for unvectorizable instructions (#153833)
If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first.
This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.
2025-08-18 18:04:53 +01:00
Ramkumar Ramachandra
97f554249c
[VPlan] Preserve nusw in createInBoundsPtrAdd (#151549)
Rename createInBoundsPtrAdd to createNoWrapPtrAdd, and preserve nusw as
well as inbounds at the callsite.
2025-08-18 17:48:42 +01:00
Andreas Jonson
1b60236200
[SimplifyCFG] Avoid redundant calls in gather. (NFC) (#154133)
Split out from https://github.com/llvm/llvm-project/pull/154007 as it
showed compile time improvements

NFC as there need to be at least two icmps that are part of the chain.
2025-08-18 18:45:52 +02:00
Nishant Patel
4a9d038acd
[MLIR][XeGPU] Distribute load_nd/store_nd/prefetch_nd with offsets from Wg to Sg (#153432)
This PR adds a pattern to distribute the load/store/prefetch nd ops with
offsets from workgroup to subgroup IR. This PR is part of the transition
to move offsets from create_nd to load/store/prefetch nd ops.

Create_nd PR : #152351
2025-08-18 09:45:29 -07:00
LLVM GN Syncbot
d6e0922a5e
[gn build] Port 3ecfc0330d93
2025-08-18 16:02:02 +00:00
Damyan Pepper
cc49f3b3e1
[NFC][HLSL] Remove confusing enum aliases / duplicates (#153909)
Remove:

* DescriptorType enum - this almost exactly shadowed the ResourceClass
enum
* ClauseType aliased ResourceClass

Although these were introduced to make the HLSL root signature handling
code a bit cleaner, they were ultimately causing confusion as they
appeared to be unique enums that needed to be converted between each
other.

Closes #153890
2025-08-18 08:58:33 -07:00
Yitzhak Mandelbaum
3ecfc0330d
[clang][dataflow] Add support for serialization and deserialization. (#152487)
Adds support for compact serialization of Formulas, and a corresponding
parse function. Extends Environment and AnalysisContext with necessary
functions for serializing and deserializing all formula-related parts of
the environment.
2025-08-18 11:55:12 -04:00
Jeremy Kun
c67d27dad0
[mlir][Presburger] NFC: return var index from IntegerRelation::addLocalFloorDiv (#153463)
addLocalFloorDiv currently returns void and requires the caller to know
that the newly added local variable is in a particular index. This
commit returns the index of the newly added variable so that callers
need not tie themselves to this implementation detail.

I found one relevant callsite demonstrating this and updated it. I am
using this API out of tree and wanted to make our out-of-tree code a bit
more resilient to upstream changes.
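
A toy illustration of the API shape, assuming nothing about the real class: `ToyRelation` below is a hypothetical stand-in, not `mlir::presburger::IntegerRelation`, and its storage layout is invented for the example. It only shows why returning the index frees callers from knowing where the new local lands.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for illustration only; not the MLIR Presburger class.
class ToyRelation {
public:
  explicit ToyRelation(unsigned numDims) : numDims(numDims) {}

  // Adds a local variable q = floor(dividend / divisor) and returns its
  // column index, so callers need not assume where it was placed.
  unsigned addLocalFloorDiv(std::vector<int64_t> dividend, int64_t divisor) {
    assert(divisor > 0 && "divisor must be positive");
    locals.push_back({std::move(dividend), divisor});
    return numDims + static_cast<unsigned>(locals.size()) - 1;
  }

  unsigned getNumVars() const {
    return numDims + static_cast<unsigned>(locals.size());
  }

private:
  struct FloorDiv {
    std::vector<int64_t> dividend;
    int64_t divisor;
  };
  unsigned numDims;
  std::vector<FloorDiv> locals;
};
```

A caller writes `unsigned q = rel.addLocalFloorDiv(...)` and uses `q` directly instead of recomputing the position from the variable counts.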
2025-08-18 08:47:47 -07:00
Antonio Frighetto
33761df961
Revert "[SimpleLoopUnswitch] Record loops from unswitching non-trivial conditions"
This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to
multiple performance regressions observed across downstream Numba
benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772).

While avoiding non-trivial unswitches on newly-cloned loops helps
mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509,
it may also make the IR less friendly to vectorization / loop
canonicalization (in the test reported, previously no select with
loop-carried dependence existed in the new specialized loops),
leading the abovementioned approach to be reconsidered.
2025-08-18 17:40:08 +02:00
Aiden Grossman
17f5f5ba55
[X86] Avoid Register implicit int conversion
PushedRegisters in this patch needs to be of type int64_t because it is
grabbing registers from immediate operands of pseudo instructions.
However, we then compare to an actual register type later, which relies
on the implicit conversion within Register to int, which can result in
build failures in some configurations.
2025-08-18 15:37:25 +00:00
黃國庭
0773854575
[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have sufficient leading zero/sign bits (#152273)
avgceil version: https://alive2.llvm.org/ce/z/2CKrRh

Fixes #147773 
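
A small self-contained check of the arithmetic behind a fold of this shape, under the assumption that the narrow form is the average of the truncated operands; it is not the DAG combine itself. It models the unsigned case only, with 16-bit values whose top 8 bits are zero, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Overflow-free unsigned averages (standard bit identities).
uint16_t avgFloorU16(uint16_t x, uint16_t y) {
  return static_cast<uint16_t>((x & y) + ((x ^ y) >> 1));
}
uint16_t avgCeilU16(uint16_t x, uint16_t y) {
  return static_cast<uint16_t>((x | y) - ((x ^ y) >> 1));
}
uint8_t avgFloorU8(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>((x & y) + ((x ^ y) >> 1));
}
uint8_t avgCeilU8(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>((x | y) - ((x ^ y) >> 1));
}

// Exhaustive check: if both 16-bit inputs have 8 leading zero bits, then
// trunc(avg16(x, y)) equals avg8(trunc(x), trunc(y)).
bool truncOfAvgMatchesNarrowAvg() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      auto wx = static_cast<uint16_t>(x), wy = static_cast<uint16_t>(y);
      auto nx = static_cast<uint8_t>(x), ny = static_cast<uint8_t>(y);
      if (static_cast<uint8_t>(avgFloorU16(wx, wy)) != avgFloorU8(nx, ny))
        return false;
      if (static_cast<uint8_t>(avgCeilU16(wx, wy)) != avgCeilU8(nx, ny))
        return false;
    }
  }
  return true;
}
```

The leading zero bits guarantee the wide average already fits in the narrow type, which is why the truncation can move onto the operands.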

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-18 16:36:26 +01:00
Alex MacLean
d12f58ff11
[NVVM] Add various intrinsic attrs, cleanup and consolidate td (#153436)
- llvm.nvvm.reflect - Use a PureIntrinsic (adding speculatable); this
will be replaced by a constant prior to lowering, so speculation is
fine.
- llvm.nvvm.tex.* - Add [IntrNoCallback, IntrNoFree, IntrWillReturn]
- llvm.nvvm.suld.* - Add [IntrNoCallback, IntrNoFree] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.sust.* - Add [IntrNoCallback, IntrNoFree, IntrWriteMem] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.[suq|txq|istypep].* - Use DefaultAttrsIntrinsic
- llvm.nvvm.read.ptx.sreg.* - Add [IntrNoFree, IntrWillReturn] to
non-constant reads as well.
2025-08-18 08:33:23 -07:00
Andres-Salamanca
916218ccbd
[CIR] Upstream GotoOp (#153701)
This PR upstreams `GotoOp`. It moves some tests from the `goto` test
file to the `label` test file, and adds verify logic to `FuncOp`. The
gotosSolver, required for lowering, will be implemented in a future PR.
2025-08-18 10:25:40 -05:00
Craig Topper
60aa0d4bfc
[RISCV] Add P-ext MC support for pli.dh, pli.db, and plui.dh. (#153972)
Refactor the pli.b/h/w and plui.h/w tablegen classes.
2025-08-18 08:23:14 -07:00
Jacques Pienaar
4bf33958da
[mlir] Update builders to use new form. (#154132)
Mechanically applied using clang-tidy.
2025-08-18 15:19:34 +00:00
Jay Foad
f15c6ff6cb
[AMDGPU] Make use of SIInstrInfo::isWaitcnt. NFC. (#154087)
2025-08-18 16:18:46 +01:00
Timm Baeder
6ce13ae1c2
[clang][bytecode] Always track item types in InterpStack (#151088)
This has been a long-standing problem, but we didn't use to call the
destructors of items on the stack unless we explicitly `pop()` or
`discard()` them.

When interpretation was interrupted midway-through (because something
failed), we left `Pointer`s on the stack. Since all `Block`s track what
`Pointer`s point to them (via a doubly-linked list in the `Pointer`),
that meant we potentially leave deallocated pointers in that list. We
used to work around this by removing the `Pointer` from the list before
deallocating the block.

However, we now want to track pointers to global blocks as well, which
poses a problem since the blocks are never deallocated and thus those
pointers are always left dangling.

I've tried a few different approaches to fixing this but in the end I
just gave up on the idea of never knowing what items are in the stack.
We already have an `ItemTypes` vector that we use for debugging
assertions. This patch simply enables this vector unconditionally and
uses it in the abort case to properly `discard()` all elements from the
stack. That's a little sad IMO but I don't know of another way of
solving this problem.
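
A rough sketch of the mechanism described above, with everything simplified: `ToyStack`, `ToyPointer`, and `ItemKind` are hypothetical names, the fixed slot size is an assumption, and the real `InterpStack` is laid out differently. The point is only that an always-present type record lets the abort path run the right destructors.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

enum class ItemKind { Int, Ptr };

// Toy "pointer" that registers itself in a counter; its destructor must run.
struct ToyPointer {
  explicit ToyPointer(int &counter) : live(&counter) { ++*live; }
  ToyPointer(const ToyPointer &) = delete;
  ToyPointer &operator=(const ToyPointer &) = delete;
  ~ToyPointer() { --*live; }
  int *live;
};

class ToyStack {
public:
  ~ToyStack() { clear(); }

  void pushInt(int value) {
    new (slot(itemTypes.size())) int(value);
    itemTypes.push_back(ItemKind::Int);
  }
  void pushPtr(int &counter) {
    new (slot(itemTypes.size())) ToyPointer(counter);
    itemTypes.push_back(ItemKind::Ptr);
  }

  // Abort path: discard every remaining item, using the recorded kinds to
  // decide which destructor (if any) has to run.
  void clear() {
    while (!itemTypes.empty()) {
      void *top = slot(itemTypes.size() - 1);
      if (itemTypes.back() == ItemKind::Ptr)
        static_cast<ToyPointer *>(top)->~ToyPointer();
      itemTypes.pop_back();
    }
  }

  std::size_t size() const { return itemTypes.size(); }

private:
  static constexpr std::size_t SlotSize = 16, MaxItems = 64;
  static_assert(sizeof(ToyPointer) <= SlotSize, "slot too small");

  void *slot(std::size_t index) {
    assert(index < MaxItems && "toy stack overflow");
    return storage + index * SlotSize;
  }

  alignas(std::max_align_t) unsigned char storage[SlotSize * MaxItems];
  std::vector<ItemKind> itemTypes; // always tracked, not only in debug builds
};
```

Without `itemTypes`, `clear()` would have no way to tell which slots hold a `ToyPointer`, which is the situation the commit message describes.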

As expected, this is a slight hit to compile times:
https://llvm-compile-time-tracker.com/compare.php?from=574d0a92060bf4808776b7a0239ffe91a092b15d&to=0317105f559093cfb909bfb01857a6b837991940&stat=instructions:u
2025-08-18 17:15:31 +02:00
AZero13
08a140add8
[AArch64] Fix build-bot assertion error in AArch64 (#154124)
Fixes build bot assertion.

I forgot to include logic that will be added in a future PR that handles
-1 correctly. For now, let's just return nullptr like we used to.
2025-08-18 15:12:07 +00:00
William Tran-Viet
1c51886920
[libc++] Implement P3168R2: Give optional range support (#149441)
Resolves #105430

- Implement all required pieces of P3168R2
- Leverage existing `wrap_iter` and `bounded_iter` classes to implement
the `optional` regular and hardened iterator type, respectively
- Update documentation to match
2025-08-18 18:04:45 +03:00
Tiger Ding
4ab14685a0
[AMDGPU] Narrow only on store to pow of 2 mem location (#150093)
Lowering in GlobalISel for AMDGPU previously always narrowed to i32 on
truncating stores regardless of mem size or scalar size, causing issues
with types like i65, which is first extended to i128 and then stored as
i64 + i8 to i128 locations. Narrowing only on stores to pow of 2 mem
locations ensures narrowing to mem size happens only near the end of
legalization.

This LLVM defect was identified via the AMD Fuzzing project.
2025-08-19 00:04:27 +09:00
Brox Chen
7c53c6162b
[AMDGPU][True16][CodeGen] use vgpr16 for zext patterns (#153894)
Update true16 mode with zext patterns using vgpr16 for 16-bit data types.
This stops isel from inserting an invalid "vgpr32 = copy vgpr16".
2025-08-18 11:01:57 -04:00
David Green
03912a1de5
[GlobalISel] Translate scalar sequential vecreduce.fadd/fmul as fadd/fmul. (#153966)
A llvm.vector.reduce.fadd(float, <1 x float>) will be translated to
G_VECREDUCE_SEQ_FADD with two scalar operands, which is illegal
according to the verifier. This makes sure we generate a fadd/fmul
instead.
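
A scalar model of the ordered reduction, just to show why the one-element case degenerates to a single fadd; `reduceFAddSequential` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <vector>

// Models the ordered form of llvm.vector.reduce.fadd(start, v):
// ((start + v[0]) + v[1]) + ...
float reduceFAddSequential(float start, const std::vector<float> &v) {
  float acc = start;
  for (float element : v)
    acc += element;
  return acc;
}
```

For a `<1 x float>` input the loop runs once, so the whole reduction is `start + v[0]`, which is the plain fadd the translator now emits.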
2025-08-18 14:59:44 +00:00
LLVM GN Syncbot
f4b5c24022
[gn build] Port e6e874ce8f05
2025-08-18 14:52:19 +00:00
LLVM GN Syncbot
ad064bc5c3
[gn build] Port a0f325bd41c9
2025-08-18 14:52:18 +00:00
erichkeane
ec227050e3
[OpenACC] Fix verify lines from 8fc80519cdb97c
Like a big dummy, I completely skipped running this test locally and
forgot it would need check lines.  *sigh*, Looks like SOMEONE has a case
of the Mondays!

Anyway, this patch fixes it by adding the proper verify lines.
2025-08-18 07:49:38 -07:00
Craig Topper
98e8f01d18
[RISCV] Rename MIPS_PREFETCH->MIPS_PREF. NFC (#154062)
This matches the instruction's assembler mnemonic.
2025-08-18 07:38:10 -07:00
erichkeane
8fc80519cd
[OpenACC] Fix crash on error recovery of variable in OpenACC mode
As reported, OpenACC's variable declaration handling was assuming some
semblance of legality in the example, so it didn't properly handle an
error case.  This patch fixes its assumptions so that we don't crash.

Fixes #154008
2025-08-18 07:37:45 -07:00
Timm Baeder
8f0da9b8bd
[clang][bytecode] Disable EndLifetime op for array elements (#154119)
This breaks a ton of libc++ tests otherwise, since calling
std::destroy_at will currently end the lifetime of the entire array not
just the given element.

See https://github.com/llvm/llvm-project/issues/147528
2025-08-18 16:32:50 +02:00
David Green
8b52e5ac22
[AArch64] Update and cleanup irtranslator-reductions.ll. NFC
2025-08-18 15:30:23 +01:00
erichkeane
0dbcdf33b8
[OpenACC] Fix racing commit test failures for firstprivate lowering
The original patch to implement basic lowering for firstprivate didn't
have the Sema work to change the name of the variable being generated
from openacc.private.init to openacc.firstprivate.init. I forgot about
that when I merged the Sema changes this morning, so the tests now
failed.  This patch fixes those up.

Additionally, as suggested on #153622 post-commit, it seems like a good idea to
use a size of APInt that matches the size-type, so this changes us to use that
instead.
2025-08-18 07:26:50 -07:00
Aaron Ballman
f5dc3021cd
[C] Fix failing assertion with designated inits (#154120)
Incompatible pointer to integer conversion diagnostic checks would
trigger an assertion when the designated initializer is for an array of
unknown bounds.

Fixes #154046
2025-08-18 14:22:31 +00:00
Connector Switch
b368e7f6a5
[flang] optimize acosd precision (#154118)
Part of https://github.com/llvm/llvm-project/issues/150452.
2025-08-18 14:15:52 +00:00
Krzysztof Parzyszek
ae75884130
[Frontend][OpenMP] Add 6.1 as a valid OpenMP version (#153628)
Co-authored-by: Michael Klemm <michael.klemm@amd.com>
2025-08-18 09:13:27 -05:00
Aiden Grossman
2497864e09
[Github] Remove call to llvm-project-tests from libclang tests
This allows for removing llvm-project-tests.yml. This significantly
reduces the complexity of this workflow (including the complexity of
llvm-project-tests.yml) at the cost of a little bit of duplication with
the other workflows that were also using llvm-project-tests.yml.

Reviewers: tstellar, DeinAlptraum

Reviewed By: DeinAlptraum

Pull Request: https://github.com/llvm/llvm-project/pull/153876
2025-08-18 07:07:26 -07:00
Aiden Grossman
f8cd582534
[Github] Remove call to llvm-project-tests.yml from mlir-spirv-tests.yml
This will eventually allow for removing llvm-project-tests.yml. This
should significantly reduce the complexity of this workflow (including
the complexity of llvm-project-tests.yml) at the cost of a little bit of
duplication.

Reviewers: IgWod-IMG, kuhar

Reviewed By: kuhar

Pull Request: https://github.com/llvm/llvm-project/pull/153871
2025-08-18 07:05:39 -07:00
Kazu Hirata
c48ec7fb60
[clang] Proofread SourceBasedCodeCoverage.rst (#154050)
2025-08-18 07:02:15 -07:00
Kazu Hirata
07eb7b7692
[llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068)
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>.  Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:

  template <typename PointeeType, unsigned N>
  class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N> {};

We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
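
For readers unfamiliar with the trick, a standalone sketch of the same partial-specialization pattern; `ToySmallSet` and `ToySmallPtrSet` are empty hypothetical stand-ins, not the classes from llvm/ADT.

```cpp
#include <cassert>
#include <type_traits>

// Toy stand-ins; the real classes in llvm/ADT have full set APIs.
template <typename PtrTy, unsigned N> class ToySmallPtrSet {};

// Primary template: used for non-pointer element types.
template <typename T, unsigned N> class ToySmallSet {};

// Partial specialization: for pointer elements the "set" simply inherits
// from the pointer set, which is the redirection described above.
template <typename PointeeType, unsigned N>
class ToySmallSet<PointeeType *, N>
    : public ToySmallPtrSet<PointeeType *, N> {};

constexpr bool pointerSetIsRedirected() {
  return std::is_base_of<ToySmallPtrSet<int *, 4>,
                         ToySmallSet<int *, 4>>::value;
}

constexpr bool valueSetIsRedirected() {
  return std::is_base_of<ToySmallPtrSet<int, 4>, ToySmallSet<int, 4>>::value;
}
```

Spelling the pointer-set type directly at the use site says the same thing without making the reader know about the specialization.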
2025-08-18 07:01:29 -07:00
AZero13
0e52092ff7
[AArch64] Adjust comparison constant if adjusting it means less instructions (#151024)
Prefer constants that require fewer instructions to materialize, in both
GlobalISel and SelectionDAG.
2025-08-18 14:56:45 +01:00
Simon Pilgrim
858d1dfa2c
[DAG] visitTRUNCATE - early out from computeKnownBits/ComputeNumSignBits failures. NFC. (#154111)
Avoid unnecessary (costly) computeKnownBits/ComputeNumSignBits calls - use MaskedValueIsZero instead of computeKnownBits directly to simplify code.
2025-08-18 14:55:09 +01:00
Benjamin Maxwell
81c06d198e
Reland "[AArch64][SME] Port all SME routines to RuntimeLibcalls" (#153417)
This updates everywhere we emit/check an SME routine to use
RuntimeLibcalls to get the function name and calling convention.
2025-08-18 14:53:40 +01:00
halbi2
2a02147ff5
[clang] [Sema] Simplify Expr::isUnusedResultAWarning for CXXConstructExpr (#153116)
Two tests have new warnings because `warn_unused_result` is now
respected for constructor temporaries. These tests were newly added in
#112521 last year. This is good because the new behavior is better than
the old.

@Sirraide and @Mick235711 what do you think about it?
2025-08-18 06:49:04 -07:00
Aiden Grossman
5b2c3aac90
[MCA][X86] Pretend To Have a Stack Engine (#153348)
This patch removes RSP dependencies from push and pop instructions to
pretend that we have a stack engine. This does not model details like
sync uops that are relevant implementation details due to complexity.
This is just enabled on all X86 CPUs given LLVM does not have a
scheduling model for any X86 CPU that does not have a stack engine.

This fixes #152008.
2025-08-18 13:44:43 +00:00
Shilei Tian
e37eff5dcd
[AMDGPU] Add an option to completely disable kernel argument preload (#153975)
The existing `amdgpu-kernarg-preload-count` can't be used as a switch to
turn it off if it is set to 0. This PR adds an extra option to turn it
off.

Fixes SWDEV-550147.
2025-08-18 09:44:20 -04:00
Jonathan Thackray
f38c83c582
[AArch64][llvm] Disassemble instructions in SYS alias encoding space more correctly (#153905)
For instructions in the `SYS` alias encoding space which take no
register operands, and where the unused 5 register bits are not all set
(31, 0b11111), then disassemble to a `SYS` alias and not the
instruction, since it is not considered valid.

This is because it is specified in the Arm ARM in text similar to this
(e.g. page C5-1037 of DDI0487L.b for `TLBI ALLE1`, or page C5-1585 for
`GCSPOPX`):
```
  Rt should be encoded as 0b11111. If the Rt field is not set to 0b11111,
  it is CONSTRAINED UNPREDICTABLE whether:
    * The instruction is UNDEFINED.
    * The instruction behaves as if the Rt field is set to 0b11111.
```

Since we want to follow "should" directives, and not encourage undefined
behaviour, only assemble or disassemble instructions considered valid.
Add an extra test case for this; all existing test cases continue to
pass.
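
A sketch of the check this implies, not the disassembler's actual code: Rt occupies bits [4:0] of the encoding, so the dedicated alias is only printed when those bits are all ones. The predicate name is hypothetical, and the example word in the usage note is derived from the SYS field layout with op1=4, CRn=8, CRm=7, op2=4, which is assumed here to be `TLBI ALLE1`.

```cpp
#include <cassert>
#include <cstdint>

// Rt occupies bits [4:0] of an A64 system-instruction encoding.
constexpr uint32_t rtField(uint32_t insn) { return insn & 0x1Fu; }

// Hypothetical predicate: a SYS-space instruction that takes no register
// operand may be printed under its own mnemonic only if Rt is 0b11111;
// otherwise the generic SYS form is used.
constexpr bool mayPrintNoOperandAlias(uint32_t insn) {
  return rtField(insn) == 0x1Fu;
}
```

Under these assumptions `0xD50C879F` passes the check, while the same word with Rt cleared (`0xD50C8780`) falls back to the generic `SYS` form.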
2025-08-18 14:41:41 +01:00
Timm Baeder
31d2db2a68
[clang][bytecode][NFC] Use UnsignedOrNone for Block::DeclID (#154104)
2025-08-18 15:40:44 +02:00
Erich Keane
340fa3e1bb
[OpenACC] Implement firstprivate lowering except init. (#153847)
This patch implements the basic lowering infrastructure, but does not
quite implement the copy initialization, which requires #153622.

It does however pass verification for the 'copy' section, which just
contains a yield.
2025-08-18 06:33:40 -07:00
Aiden Grossman
1650e4a73c
[X86] Remove TuningPOPCNTFalseDeps from AlderLake (#154004)
According to the data from uops.info, this false dependency issue was
fixed in CannonLake. This is confirmed not to be an issue based on
benchmarking data in #153983. Setting this can potentially lead to extra
xor instructions which could consume extra frontend/renaming resources.

None of the other CPUs that have had this fixed have the tuning flag.

Fixes #153983.
2025-08-18 06:31:16 -07:00
Matthias Springer
f84aaa6eaa
[mlir][Transforms] Dialect conversion: Add flag to dump materialization kind (#119532)
Add a debugging flag to the dialect conversion to dump the
materialization kind. This flag is useful to find out whether a missing
materialization rule is for source or target materializations.

Also add missing test coverage for the `buildMaterializations` flag.
2025-08-18 13:25:18 +00:00