549008 Commits

Author SHA1 Message Date
Mehdi Amini
dfaebe7f48
[MLIR] Fix Liveness analysis handling of unreachable code (#153973)
This patch forces all values to be initialized by the
LivenessAnalysis, even in dead blocks. The dataflow framework skips
visiting values when it already knows that a block is dynamically
unreachable, so this requires specific handling.
Downstream code could consider that the absence of liveness is the same
as "dead".
However, as the code is mutated, new values can be introduced, and a
transformation like "RemoveDeadValue" must conservatively consider that
the absence of liveness information means that we are not sure whether a
value is dead (it could be a newly introduced value).
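
The distinction between "dead" and "no information" can be sketched as a three-state lookup. This is only an illustration in Python, not the MLIR API: `liveness_info`, `classify`, and `removable` are hypothetical names, and the dict stands in for the analysis results.

```python
from enum import Enum


class Liveness(Enum):
    LIVE = "live"
    DEAD = "dead"
    UNKNOWN = "unknown"  # no analysis state was recorded for this value


def classify(value, liveness_info):
    """Classify a value, keeping "never analysed" separate from "dead".

    `liveness_info` is a hypothetical dict mapping a value name to a bool
    ("is live"); a missing key means the analysis never saw the value.
    """
    if value not in liveness_info:
        return Liveness.UNKNOWN
    return Liveness.LIVE if liveness_info[value] else Liveness.DEAD


def removable(values, liveness_info):
    """Return only the values that are positively known to be dead."""
    return [v for v in values if classify(v, liveness_info) is Liveness.DEAD]
```

With this shape, a value introduced after the analysis ran is classified as `UNKNOWN` and is therefore never removed.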

Fixes #153906
2025-08-18 20:50:36 +00:00
Mehdi Amini
191e7eba93
[MLIR] Stop visiting unreachable blocks in the walkAndApplyPatterns driver (#154038)
This is similar to the fix to the greedy driver in #153957, except that
instead of removing unreachable code, we just ignore it.

Operations like:

```
%add = arith.addi %add, %add : i64
```

are legal in unreachable code.
Unfortunately many patterns would be unsafe to apply on such IR and can
lead to crashes or infinite loops.
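
A minimal sketch of the idea in Python, assuming a CFG given as plain dicts (this is not the MLIR driver; `reachable_blocks`, `walk_and_apply`, and the data layout are hypothetical): compute the blocks reachable from the entry, then apply the pattern only to operations in those blocks.

```python
from collections import deque


def reachable_blocks(entry, successors):
    """Breadth-first search over the CFG, starting at the entry block."""
    seen = {entry}
    worklist = deque([entry])
    while worklist:
        block = worklist.popleft()
        for succ in successors.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                worklist.append(succ)
    return seen


def walk_and_apply(entry, successors, ops_in_block, pattern):
    """Apply `pattern` to every op, ignoring ops in unreachable blocks."""
    reachable = reachable_blocks(entry, successors)
    visited = []
    for block, ops in ops_in_block.items():
        if block not in reachable:
            continue  # left untouched: neither rewritten nor erased
        for op in ops:
            pattern(op)
            visited.append(op)
    return visited
```

The unreachable block stays in the IR; the walk simply never hands its operations to a pattern.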
2025-08-18 20:46:59 +00:00
Oliver Hunt
624b724ca6
[clang][PAC] ptrauth_qualifier and ptrauth_intrinsic should only be available on Darwin (#153912)
For backwards compatibility reasons the `ptrauth_qualifier` and
`ptrauth_intrinsic` features need to be testable with `__has_feature()`
on Apple platforms, but for other platforms this backwards compatibility
issue does not exist.

This PR resolves these issues by making the `ptrauth_qualifier` and
`ptrauth_intrinsic` tests conditional upon a darwin target. This also
allows us to revert the ptrauth_qualifier check from an extension to a
feature test again, as is required on these platforms.

At the same time we introduce a new predefined macro `__PTRAUTH__` that
answers the same question as `__has_feature(ptrauth_qualifier)` and
`__has_feature(ptrauth_intrinsic)` as those tests are synonymous and
only exist separately for compatibility reasons.

The requirement to test for the `__PTRAUTH__` macro also resolves the
hazard presented by mixing the `ptrauth_qualifier` flag (that impacts
ABI and security policies) with `-pedantic-errors`, which makes
`__has_extension` return false for all extensions.

---------

Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
2025-08-18 20:29:26 +00:00
Charitha Saumya
9617ce4862
[vector][distribution] Bug fix in moveRegionToNewWarpOpAndAppendReturns (#153656) 2025-08-18 13:26:08 -07:00
Stanislav Mekhanoshin
13716843eb
[AMDGPU] Make s_setprio_inc_wg a scheduling boundary (#154188) 2025-08-18 13:20:38 -07:00
Thurston Dang
4220538e25
[msan] Handle multiply-add-accumulate; apply to AVX Vector Neural Network Instructions (VNNI) (#153927)
This extends the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to three-operand intrinsics (multiply-add-accumulate), and applies it to the AVX Vector Neural Network Instructions.

Updates the tests from https://github.com/llvm/llvm-project/pull/153135
2025-08-18 13:18:27 -07:00
Jordan Rupprecht
462929183c
[bazel] Port #153497: reland clang modules scanner change (#154192) 2025-08-18 20:15:12 +00:00
Stanislav Mekhanoshin
3395676a18
[AMDGPU] Fold copies of constant physical registers into their uses (#154183)
With current codegen this only affects src_flat_scratch_base_lo/hi.

Co-authored-by: Jay Foad <Jay.Foad@amd.com>

2025-08-18 13:07:36 -07:00
Stanislav Mekhanoshin
c328c5d911
[AMDGPU] Combine to bf16 reciprocal square root. (#154185)
Co-authored-by: Ivan Kosarev <Ivan.Kosarev@amd.com>

2025-08-18 13:07:20 -07:00
Sergei Barannikov
b20bbd48e8
[TableGen][DecoderEmitter] Store HW mode ID instead of name (NFC) (#154052)
This simplifies code a bit.
2025-08-18 22:53:09 +03:00
Baranov Victor
378d240125
[clang-tidy] Remove addition of emacs tag in checks headers (#153942)
After https://github.com/llvm/llvm-project/pull/118553, emacs tag is no
longer needed in LLVM files:
https://llvm.org/docs/CodingStandards.html#file-headers.
This patch removes it from `add_new_check.py`, lowering the complexity we
need to maintain.
2025-08-18 22:49:54 +03:00
Florian Hahn
7e9989390d
[VPlan] Materialize Build(Struct)Vectors for VPReplicateRecipes. (NFCI) (#151487)
Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to
serve their users requiring a vector, instead of doing so when unrolling
by VF.

Now we only need to implicitly build vectors in VPTransformState::get
for VPInstructions. Once they are also unrolled by VF we can remove the
code-path altogether.

PR: https://github.com/llvm/llvm-project/pull/151487
2025-08-18 20:49:42 +01:00
Konrad Kleine
f5a648f919
[doc] Add documentation for clang-change-namespace (#148277)
This adds rst documentation for the `clang-change-namespace` program.

Fixes #35519
2025-08-18 21:46:34 +02:00
Naveen Seth Hanig
9403c2d64d
Reland [clang][modules-driver] Add scanner to detect C++20 module presence (#153497)
This patch is part of a series to support driver managed module builds
for C++ named modules and Clang modules.
This introduces a scanner that detects C++ named module usage early in
the driver with only negligible overhead.

For now, it is enabled only with the `-fmodules-driver` flag and serves
solely diagnostic purposes. In the future, the scanner will be enabled
for any (modules-driver compatible) compilation with two or more inputs,
and will help the driver determine whether to implicitly enable the
modules driver.

Since the scanner adds very little overhead, we are also exploring
enabling it for compilations with only a single input. This approach
could allow us to detect `import std` usage in a single-file
compilation, which would then activate the modules driver. For
performance measurements on this, see
https://github.com/naveen-seth/llvm-dev-cxx-modules-check-benchmark.

RFC for driver managed module builds:

https://discourse.llvm.org/t/rfc-modules-support-simple-c-20-modules-use-from-the-clang-driver-without-a-build-system

This patch relands the reland (2d31fc8) for commit ded1426. The earlier
reland failed due to a missing link dependency on `clangLex`. This
reland fixes the issue by adding the link dependency after discussing it
in the following RFC:

https://discourse.llvm.org/t/rfc-driver-link-the-driver-against-clangdependencyscanning-clangast-clangfrontend-clangserialization-and-clanglex
2025-08-18 21:21:08 +02:00
Daniel Thornburgh
986d7aa675
Bump ProtocolServerMCPTest timeout to 200ms (#154182)
This should reduce flakes observed in the Fuchsia AArch64 Linux LLDB CI
builders.
2025-08-18 12:19:19 -07:00
Stanislav Mekhanoshin
3d6177c14b
[AMDGPU] Avoid setting op_sel_hi bits if there is matrix_b_scale. NFCI. (#154176)
This is NFCI now as there is no matrix_b_scale without matrix_b_reuse,
but technically this condition shall be here.
2025-08-18 12:13:31 -07:00
Stanislav Mekhanoshin
e7c2c80fa1
[AMDGPU] Combine prng(undef) -> undef (#154160) 2025-08-18 12:13:16 -07:00
Usama Hameed
1bb7217050
[Sanitizers][Darwin][Test] The top few frames are inaccurate in UBSan. (#153899)
XFailing until further investigation

rdar://158303080
2025-08-18 12:08:45 -07:00
Utkarsh Saxena
d30fd562e8
[LifetimeSafety] Enhance benchmark script for new sub analyses (#149577)
Enhanced the lifetime safety analysis benchmark script with more
detailed performance metrics and a new nested loop test case. This is a
worst case for loan expiry analysis.

### What changed?

- Added a new test case `nested_loops` that generates code with N levels
of nested loops to test how analysis performance scales with loop
nesting depth
- Improved the trace file analysis to extract durations for sub-phases
of the lifetime analysis (FactGenerator, LoanPropagation, ExpiredLoans)
- Enhanced the markdown report generation to include:
    - Relative timing results as percentages of total Clang time
    - More detailed complexity analysis for each analysis phase
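
The commit message does not show how the script derives the exponent k in O(n<sup>k</sup>). One common approach, sketched here under that caveat, is a least-squares fit of log(time) against log(n), where the slope estimates k. The function name is hypothetical.

```python
import math


def fit_power_law_exponent(sizes, times):
    """Estimate k in time ~ c * n**k as the slope of a log-log line fit."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares: slope = cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Small inputs are dominated by fixed compiler start-up cost, so a fit over per-phase durations at larger N is likely to be more meaningful than one over total wall time.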

Report
# Lifetime Analysis Performance Report
> Generated on: 2025-08-18 13:29:57 

---

## Test Case: Pointer Cycle in Loop

**Timing Results:**

| N (Input Size) | Total Time | Analysis Time (%) | Fact Generator (%) | Loan Propagation (%) | Expired Loans (%) |
|:---------------|-----------:|------------------:|-------------------:|---------------------:|------------------:|
| 10 | 10.75 ms | 24.61% | 0.00% | 24.38% | 0.00% |
| 25 | 64.98 ms | 86.08% | 0.00% | 86.02% | 0.00% |
| 50 | 709.37 ms | 98.53% | 0.00% | 98.51% | 0.00% |
| 75 | 3.13 s | 99.63% | 0.00% | 99.63% | 0.00% |
| 100 | 9.44 s | 99.85% | 0.00% | 99.84% | 0.00% |
| 150 | 45.31 s | 99.96% | 0.00% | 99.96% | 0.00% |

**Complexity Analysis:**

| Analysis Phase    | Complexity O(n<sup>k</sup>) |
|:------------------|:--------------------------|
| Total Analysis    | O(n<sup>3.87</sup> &pm; 0.01) |
| FactGenerator     | (Negligible)              |
| LoanPropagation   | O(n<sup>3.87</sup> &pm; 0.01) |
| ExpiredLoans      | (Negligible)              |

---

## Test Case: CFG Merges

**Timing Results:**

| N (Input Size) | Total Time | Analysis Time (%) | Fact Generator (%) | Loan Propagation (%) | Expired Loans (%) |
|:---------------|-----------:|------------------:|-------------------:|---------------------:|------------------:|
| 10 | 8.54 ms | 0.00% | 0.00% | 0.00% | 0.00% |
| 50 | 40.85 ms | 65.09% | 0.00% | 64.61% | 0.00% |
| 100 | 207.70 ms | 93.58% | 0.00% | 93.46% | 0.00% |
| 200 | 1.54 s | 98.82% | 0.00% | 98.78% | 0.00% |
| 400 | 12.04 s | 99.72% | 0.00% | 99.71% | 0.01% |
| 800 | 96.73 s | 99.94% | 0.00% | 99.94% | 0.00% |

**Complexity Analysis:**

| Analysis Phase    | Complexity O(n<sup>k</sup>) |
|:------------------|:--------------------------|
| Total Analysis    | O(n<sup>3.01</sup> &pm; 0.00) |
| FactGenerator     | (Negligible)              |
| LoanPropagation   | O(n<sup>3.01</sup> &pm; 0.00) |
| ExpiredLoans      | (Negligible)              |

---

## Test Case: Deeply Nested Loops

**Timing Results:**

| N (Input Size) | Total Time | Analysis Time (%) | Fact Generator (%) | Loan Propagation (%) | Expired Loans (%) |
|:---------------|-----------:|------------------:|-------------------:|---------------------:|------------------:|
| 10 | 8.25 ms | 0.00% | 0.00% | 0.00% | 0.00% |
| 50 | 27.25 ms | 51.87% | 0.00% | 45.71% | 5.93% |
| 100 | 113.42 ms | 82.48% | 0.00% | 72.74% | 9.62% |
| 200 | 730.05 ms | 95.24% | 0.00% | 83.95% | 11.25% |
| 400 | 5.40 s | 98.74% | 0.01% | 87.05% | 11.68% |
| 800 | 41.86 s | 99.62% | 0.00% | 87.77% | 11.84% |

**Complexity Analysis:**

| Analysis Phase    | Complexity O(n<sup>k</sup>) |
|:------------------|:--------------------------|
| Total Analysis    | O(n<sup>2.97</sup> &pm; 0.00) |
| FactGenerator     | (Negligible)              |
| LoanPropagation   | O(n<sup>2.96</sup> &pm; 0.00) |
| ExpiredLoans      | O(n<sup>2.97</sup> &pm; 0.00) |

---
2025-08-18 19:07:41 +00:00
Jonas Devlieghere
4b94c08a57
[lldb] Relax the error message in TestProcessCrashInfo.py (#153653)
The error message has been updated in macOS 26. Relax the error message
to check the more generic "BUG IN CLIENT OF LIBMALLOC" rather than the
error message that comes after.
2025-08-18 14:01:41 -05:00
Trevor Gross
549d7c4f35
[SPARC] Change half to use soft promotion rather than PromoteFloat (#152727)
`half` currently uses the default legalization of promoting to `f32`;
however, this approach performs math in a way that results in
incorrect rounding. Switch to the soft promote implementation, which
does not have this problem.
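
As a generic illustration of how keeping intermediates in a wider type can change a half-precision result (this is an assumption about the class of problem, not a reproduction of the SPARC behaviour), the sketch below uses Python's `struct` formats `e` (binary16) and `f` (binary32) to compare rounding after every operation with rounding only once at the end. The helper names are hypothetical.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_round_each_step(a, b, c):
    """Every intermediate result is rounded back to half precision."""
    return to_f16(to_f16(a + b) + c)


def sum_wide_intermediate(a, b, c):
    """Intermediates stay in f32; only the final value is rounded to half."""
    return to_f16(to_f32(to_f32(a + b) + c))
```

For the inputs 2048, 1, 1 the two strategies disagree: rounding at each step yields 2048.0 (2049 is not representable in binary16 and rounds to even), while the wide intermediate yields 2050.0.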

The SPARC ABI does not specify a `_Float16` type, so there is no concern
with keeping interface compatibility.

Fixes the SPARC part of
https://github.com/llvm/llvm-project/issues/97975
Fixes the SPARC part of
https://github.com/llvm/llvm-project/issues/97981
2025-08-18 20:56:24 +02:00
Matthias Braun
43df97a909
llvm-profgen: Avoid "using namespace" in headers (#147631)
Avoid global `using namespace` directives in headers as they are bad
style.
2025-08-18 18:55:23 +00:00
Krzysztof Parzyszek
8429f7faaa
[flang][OpenMP] Parsing support for DYN_GROUPPRIVATE (#153615)
This does not perform semantic checks or lowering.
2025-08-18 13:35:02 -05:00
Steven Perron
0fb1057e40
[SPIRV] Filter disallowed extensions for env (#150051)
Not all SPIR-V extensions are allowed in every environment. When we use
the `-spirv-ext=all` option, the backend currently believes that all
extensions can be used.

This commit filters out the extensions on the command line to remove
those that are not known to be allowed for the current environment.
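
The parse-then-filter split can be sketched as follows. Everything here is a placeholder: the extension names, the environment names, the `ALLOWED` table, and the simplified comma-separated option syntax are assumptions for illustration, not the backend's real data or parser.

```python
# Hypothetical table of which extensions each environment permits.
ALLOWED = {
    "env_a": {"EXT_1", "EXT_2"},
    "env_b": {"EXT_2", "EXT_3"},
}
ALL_EXTENSIONS = {"EXT_1", "EXT_2", "EXT_3"}


def parse_requested(option_value):
    """Parse the option without knowing the target environment."""
    if option_value == "all":
        return set(ALL_EXTENSIONS)
    return {name for name in option_value.split(",") if name}


def usable_extensions(option_value, env):
    """Filter the parsed set later, once the environment is known."""
    return parse_requested(option_value) & ALLOWED[env]
```

Keeping the parser environment-agnostic and filtering afterwards matches the constraint described in the message that the target triple is not available at parse time.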

Alternatives considered: I considered modifying the
SPIRVExtensionsParser::parse to use a different list of extensions for
"all" depending on the target triple. However that does not work because
the target triple is not available, and cannot be made available in a
reasonable way.

Fixes #147717

---------

Co-authored-by: Victor Lomuller <victor@codeplay.com>
2025-08-18 18:33:58 +00:00
Thurston Dang
ade755d62b
[msan] Add Instrumentation for Avx512 Instructions: pmaddw, pmaddubs (#153919)
This applies the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to the Avx512
equivalent of the pmaddw and pmaddubs intrinsics:
  <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
  <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
2025-08-18 11:31:15 -07:00
Kyle Wang
064f02dac0
[VectorCombine] Preserve scoped alias metadata (#153714)
Right now, if a load op is scalarized, the `!alias.scope` and `!noalias`
metadata are dropped. This PR keeps them if they exist.
2025-08-18 18:16:32 +00:00
Jordan Rupprecht
8d256733a0
[bazel] Port #151175: VectorFromElementsLowering (#154169) 2025-08-18 13:07:05 -05:00
Brox Chen
d49aab10bd
Revert "[AMDGPU][True16][CodeGen] use vgpr16 for zext patterns (#1538… (#154163)
This reverts commit 7c53c6162bd43d952546a3ef7d019babd5244c29.

This patch hit an issue in a HIP test. Reverting for now; it will be reopened later.
2025-08-18 14:01:19 -04:00
Shaoce SUN
7e8ff2afa9
[RISCV][GISel] Optimize +0.0 to use fcvt.d.w for s64 on rv32 (#153978)
Resolve the TODO: on RV32, when constructing the double-precision
constant `+0.0` for `s64`, `BuildPairF64Pseudo` can be optimized to use
the `fcvt.d.w` instruction to generate the result directly.
2025-08-18 17:52:24 +00:00
Justin Fargnoli
58de8f2c25
[Inliner] Add option (default off) to inline all calls regardless of the cost (#152365)
Add a default off option to the inline cost calculation to always inline
all viable calls regardless of the cost/benefit and cost/threshold
calculations.

For performance reasons, some users require that all calls be inlined.
Rather than forcing them to adjust the inlining threshold to an
arbitrarily high value, offer an option to inline all calls.
2025-08-18 17:48:49 +00:00
LauraElanorJones
350f4a3e3b
Decent to Descent (#154040)
[lldb] Rename RecursiveDecentFormatter to RecursiveDescentFormatter (NFC)
2025-08-18 12:47:14 -05:00
Krzysztof Drewniak
7f27482a32
[AMDGPU][LowerBufferFatPointers] Fix lack of rewrite when loading/storing null (#154128)
Fixes #154056.

The fat buffer lowering pass was erroneously detecting that it did not
need to run on functions that only load/store to the null constant (or
other such constants). We thought this would be covered by specializing
constants out to instructions, but that doesn't account for trivial
constants like null. Therefore, we check the operands of instructions
for buffer fat pointers in order to find such constants and ensure the
pass runs.

---------

Co-authored-by: Nikita Popov <github@npopov.com>
2025-08-18 12:32:54 -05:00
Shafik Yaghmour
99829573cc
[Clang][Webassembly] Remove unreachable code in ParseTypeQualifierListOpt (#153729)
Static analysis flagged this goto as unreachable and indeed it is, so
removing it.
2025-08-18 10:27:37 -07:00
Aiden Grossman
6960bf556c
[Github] Drop llvm-project-tests
All users of this have been cleaned up so we can now drop it fully.

Reviewers: cmtice, tstellar

Reviewed By: cmtice

Pull Request: https://github.com/llvm/llvm-project/pull/153877
2025-08-18 10:20:31 -07:00
Panagiotis Karouzakis
c2e7fad446
[DemandedBits] Support non-constant shift amounts (#148880)
This patch adds support for the shift operators to handle non-constant
shift operands.

ashr proof --> https://alive2.llvm.org/ce/z/EN-siK
lshr proof --> https://alive2.llvm.org/ce/z/eeGzyB
shl proof --> https://alive2.llvm.org/ce/z/dpvbkq
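
The message does not describe the algorithm itself. One conservative way to handle a non-constant shift, sketched here as an assumption, is to take the union of the demanded input bits over every shift amount in a known range (for example, a range derived from known-bits information). The function names and the `min_shift`/`max_shift` parameters are hypothetical; `ashr` is omitted because it additionally needs the sign bit.

```python
def demanded_input_bits_shl(demanded_out, min_shift, max_shift, width):
    """Input bits that may feed a demanded output bit of `in << s`."""
    mask = (1 << width) - 1
    demanded = 0
    for s in range(min_shift, max_shift + 1):
        # Output bit i of a left shift comes from input bit i - s.
        demanded |= (demanded_out & mask) >> s
    return demanded


def demanded_input_bits_lshr(demanded_out, min_shift, max_shift, width):
    """Input bits that may feed a demanded output bit of `in >> s` (logical)."""
    mask = (1 << width) - 1
    demanded = 0
    for s in range(min_shift, max_shift + 1):
        # Output bit i of a logical right shift comes from input bit i + s.
        demanded |= (demanded_out << s) & mask
    return demanded
```

With a constant shift the range collapses to a single value and the result equals the usual constant-shift rule.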
2025-08-19 01:11:16 +08:00
Yang Bai
4eb1a07d7d
[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering (#151175)
This patch introduces a new unrolling-based approach for lowering
multi-dimensional `vector.from_elements` operations.

**Implementation Details:**
1. **New Transform Pattern**: Added `UnrollFromElements`, which unrolls an
N-D (N >= 2) from_elements op into (N-1)-D from_elements ops along the
outermost dimension.
2. **Utility Functions**: Added `unrollVectorOp` to reuse the unrolling
algorithm of vector.gather for vector.from_elements.
3. **Integration**: Added the unrolling pattern to the
convert-vector-to-llvm pass as a temporary transformation.
4. Use direct LLVM dialect operations instead of intermediate
vector.insert operations for efficiency in `VectorFromElementsLowering`.

**Example:**
```mlir
// unroll
%v = vector.from_elements  %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%vec_1d_0 = vector.from_elements %e0, %e1 : vector<2xf32>
%vec_2d_0 = vector.insert %vec_1d_0, %poison_2d [0] : vector<2xf32> into vector<2x2xf32>
%vec_1d_1 = vector.from_elements %e2, %e3 : vector<2xf32>
%result = vector.insert %vec_1d_1, %vec_2d_0 [1] : vector<2xf32> into vector<2x2xf32>

// convert-vector-to-llvm
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>
=>
%poison_2d = ub.poison : vector<2x2xf32>
%poison_2d_cast = builtin.unrealized_conversion_cast %poison_2d : vector<2x2xf32> to !llvm.array<2 x vector<2xf32>>
%poison_1d_0 = llvm.mlir.poison : vector<2xf32>
%c0_0 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_0_0 = llvm.insertelement %e0, %poison_1d_0[%c0_0 : i64] : vector<2xf32>
%c1_0 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_0_1 = llvm.insertelement %e1, %vec_1d_0_0[%c1_0 : i64] : vector<2xf32>
%vec_2d_0 = llvm.insertvalue %vec_1d_0_1, %poison_2d_cast[0] : !llvm.array<2 x vector<2xf32>>
%poison_1d_1 = llvm.mlir.poison : vector<2xf32>
%c0_1 = llvm.mlir.constant(0 : i64) : i64
%vec_1d_1_0 = llvm.insertelement %e2, %poison_1d_1[%c0_1 : i64] : vector<2xf32>
%c1_1 = llvm.mlir.constant(1 : i64) : i64
%vec_1d_1_1 = llvm.insertelement %e3, %vec_1d_1_0[%c1_1 : i64] : vector<2xf32>
%vec_2d_1 = llvm.insertvalue %vec_1d_1_1, %vec_2d_0[1] : !llvm.array<2 x vector<2xf32>>
%result = builtin.unrealized_conversion_cast %vec_2d_1 : !llvm.array<2 x vector<2xf32>> to vector<2x2xf32>
```
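
The index arithmetic behind the unrolling step can be sketched in Python. This is an illustration of the row-major slicing only, not the MLIR pattern; `unroll_outermost` is a hypothetical helper.

```python
def unroll_outermost(elements, shape):
    """Split a flat, row-major element list along the outermost dimension.

    Returns one (outer_index, inner_shape, inner_elements) triple per slice,
    mirroring one (N-1)-D from_elements op plus the position it is inserted at.
    """
    if len(shape) < 2:
        raise ValueError("expected a vector of rank >= 2")
    outer, inner_shape = shape[0], tuple(shape[1:])
    inner_count = 1
    for dim in inner_shape:
        inner_count *= dim
    if len(elements) != outer * inner_count:
        raise ValueError("element count does not match the shape")
    return [
        (i, inner_shape, list(elements[i * inner_count:(i + 1) * inner_count]))
        for i in range(outer)
    ]
```

For the `vector<2x2xf32>` example above, the two slices are `%e0, %e1` at index 0 and `%e2, %e3` at index 1.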

---------

Co-authored-by: Nicolas Vasilache <Nico.Vasilache@amd.com>
Co-authored-by: Yang Bai <yangb@nvidia.com>
Co-authored-by: James Newling <james.newling@gmail.com>
Co-authored-by: Diego Caballero <dieg0ca6aller0@gmail.com>
2025-08-18 10:09:12 -07:00
Tobias Stadler
8135b7c1ab
[LV] Emit all remarks for unvectorizable instructions (#153833)
If ExtraAnalysis is requested, emit all remarks caused by unvectorizable instructions - instead of only the first.
This is in line with how other places handle DoExtraAnalysis and it can be quite helpful to get info about all instructions in a loop that prevent vectorization.
2025-08-18 18:04:53 +01:00
Ramkumar Ramachandra
97f554249c
[VPlan] Preserve nusw in createInBoundsPtrAdd (#151549)
Rename createInBoundsPtrAdd to createNoWrapPtrAdd, and preserve nusw as
well as inbounds at the callsite.
2025-08-18 17:48:42 +01:00
Andreas Jonson
1b60236200
[SimplifyCFG] Avoid redundant calls in gather. (NFC) (#154133)
Split out from https://github.com/llvm/llvm-project/pull/154007 as it
showed compile time improvements

NFC as there need to be at least two icmps that are part of the chain.
2025-08-18 18:45:52 +02:00
Nishant Patel
4a9d038acd
[MLIR][XeGPU] Distribute load_nd/store_nd/prefetch_nd with offsets from Wg to Sg (#153432)
This PR adds a pattern to distribute the load/store/prefetch nd ops with
offsets from workgroup to subgroup IR. This PR is part of the transition
to move offsets from create_nd to load/store/prefetch nd ops.

Create_nd PR : #152351
2025-08-18 09:45:29 -07:00
LLVM GN Syncbot
d6e0922a5e [gn build] Port 3ecfc0330d93 2025-08-18 16:02:02 +00:00
Damyan Pepper
cc49f3b3e1
[NFC][HLSL] Remove confusing enum aliases / duplicates (#153909)
Remove:

* DescriptorType enum - this almost exactly shadowed the ResourceClass
enum
* ClauseType aliased ResourceClass

Although these were introduced to make the HLSL root signature handling
code a bit cleaner, they were ultimately causing confusion as they
appeared to be unique enums that needed to be converted between each
other.

Closes #153890
2025-08-18 08:58:33 -07:00
Yitzhak Mandelbaum
3ecfc0330d
[clang][dataflow] Add support for serialization and deserialization. (#152487)
Adds support for compact serialization of Formulas, and a corresponding
parse function. Extends Environment and AnalysisContext with necessary
functions for serializing and deserializing all formula-related parts of
the environment.
2025-08-18 11:55:12 -04:00
Jeremy Kun
c67d27dad0
[mlir][Presburger] NFC: return var index from IntegerRelation::addLocalFloorDiv (#153463)
addLocalFloorDiv currently returns void and requires the caller to know
that the newly added local variable is in a particular index. This
commit returns the index of the newly added variable so that callers
need not tie themselves to this implementation detail.
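
The API change can be illustrated with a toy class (not the Presburger library; `ToyRelation` and its fields are hypothetical). The method returns the index it chose, so callers do not hard-code where the new variable lands.

```python
class ToyRelation:
    """Toy stand-in for a relation that can grow local variables."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.local_divs = {}  # variable index -> (dividend coefficients, divisor)

    def add_local_floor_div(self, dividend, divisor):
        """Add a local variable q = floor(dividend / divisor); return its index."""
        if divisor <= 0:
            raise ValueError("divisor must be positive")
        index = self.num_vars  # placement is an implementation detail
        self.num_vars += 1
        self.local_divs[index] = (tuple(dividend), divisor)
        return index
```

A caller can then write `q = rel.add_local_floor_div(...)` and use `q` directly, rather than assuming the new variable is always the last one.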

I found one relevant callsite demonstrating this and updated it. I am
using this API out of tree and wanted to make our out-of-tree code a bit
more resilient to upstream changes.
2025-08-18 08:47:47 -07:00
Antonio Frighetto
33761df961
Revert "[SimpleLoopUnswitch] Record loops from unswitching non-trivial conditions"
This reverts commit e9de32fd159d30cfd6fcc861b57b7e99ec2742ab due to
multiple performance regressions observed across downstream Numba
benchmarks (https://github.com/llvm/llvm-project/issues/138509#issuecomment-3193855772).

While avoiding non-trivial unswitches on newly-cloned loops helps
mitigate the pathological case reported in https://github.com/llvm/llvm-project/issues/138509,
it may as well make the IR less friendly to vectorization / loop-
canonicalization (in the test reported, previously no select with
loop-carried dependence existed in the new specialized loops),
leading the abovementioned approach to be reconsidered.
2025-08-18 17:40:08 +02:00
Aiden Grossman
17f5f5ba55 [X86] Avoid Register implicit int conversion
PushedRegisters in this patch needs to be of type int64_t because it is
grabbing registers from immediate operands of pseudo instructions.
However, we then compare to an actual register type later, which relies
on the implicit conversion within Register to int, which can result in
build failures in some configurations.
2025-08-18 15:37:25 +00:00
黃國庭
0773854575
[DAG] Fold trunc(avg(x,y)) for avgceil/floor u/s nodes if they have sufficient leading zero/sign bits (#152273)
avgceil version: https://alive2.llvm.org/ce/z/2CKrRh
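
A brute-force check of the unsigned case, written as a Python sketch with hypothetical helper names: when both operands already fit in the narrow type (that is, they have enough leading zero bits), truncating the wide average gives the same result as averaging the truncated operands. The signed variants from the title are not modelled here.

```python
def avgflooru(x, y, bits):
    """Unsigned floor average, computed without intermediate overflow."""
    return ((x + y) >> 1) & ((1 << bits) - 1)


def avgceilu(x, y, bits):
    """Unsigned ceiling average, computed without intermediate overflow."""
    return ((x + y + 1) >> 1) & ((1 << bits) - 1)


def trunc(x, bits):
    """Keep only the low `bits` bits."""
    return x & ((1 << bits) - 1)


def fold_matches(x, y, wide_bits, narrow_bits, avg):
    """Compare trunc(avg(x, y)) with avg(trunc(x), trunc(y))."""
    wide = trunc(avg(x, y, wide_bits), narrow_bits)
    narrow = avg(trunc(x, narrow_bits), trunc(y, narrow_bits), narrow_bits)
    return wide == narrow


def fold_is_sound(wide_bits, narrow_bits):
    """Check every operand pair whose values fit in the narrow type."""
    limit = 1 << narrow_bits
    return all(
        fold_matches(x, y, wide_bits, narrow_bits, avg)
        for x in range(limit)
        for y in range(limit)
        for avg in (avgflooru, avgceilu)
    )
```

Without the leading-zero condition the fold is not valid: for 8-bit operands 16 and 0 truncated to 4 bits, the wide average truncates to 8 while the narrow average is 0.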

Fixes #147773 

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-08-18 16:36:26 +01:00
Alex MacLean
d12f58ff11
[NVVM] Add various intrinsic attrs, cleanup and consolidate td (#153436)
- llvm.nvvm.reflect - Use PureIntrinsic (adding speculatable); this
will be replaced by a constant prior to lowering, so speculation is
fine.
- llvm.nvvm.tex.* - Add [IntrNoCallback, IntrNoFree, IntrWillReturn]
- llvm.nvvm.suld.* - Add [IntrNoCallback, IntrNoFree] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.sust.* - Add [IntrNoCallback, IntrNoFree, IntrWriteMem] and
[IntrWillReturn] when not using "clamp" mode
- llvm.nvvm.[suq|txq|istypep].* - Use DefaultAttrsIntrinsic
- llvm.nvvm.read.ptx.sreg.* - Add [IntrNoFree, IntrWillReturn] to
non-constant reads as well.
2025-08-18 08:33:23 -07:00
Andres-Salamanca
916218ccbd
[CIR] Upstream GotoOp (#153701)
This PR upstreams `GotoOp`. It moves some tests from the `goto` test
file to the `label` test file, and adds verify logic to `FuncOp`. The
gotosSolver, required for lowering, will be implemented in a future PR.
2025-08-18 10:25:40 -05:00
Craig Topper
60aa0d4bfc
[RISCV] Add P-ext MC support for pli.dh, pli.db, and plui.dh. (#153972)
Refactor the pli.b/h/w and plui.h/w tablegen classes.
2025-08-18 08:23:14 -07:00