541742 Commits

Author SHA1 Message Date
Aiden Grossman
b7be8786af
Reapply "[CI] Migrate to runtimes build" (#143612)
This reverts commit 6f62979a5a5bcf70d65f23e0991a274e6df5955b.

This reapplies commit 80ea5f46df3e365a0a2112889bb91732167b6214.

That commit was reverted because it was causing compiler-rt test
failures due to tysan not having its dependencies set up properly within
CMake. That situation has since been rectified in
3cef099ceddccefca8e11268624397cde9e04af6.

Reviewers: lnihlen, rnk, gburgessiv, cmtice

Reviewed By: rnk, cmtice

Pull Request: https://github.com/llvm/llvm-project/pull/144033
2025-06-20 15:07:00 -07:00
Anshil Gandhi
94865edfa8
[Reland][InstCombine] Iterative replacement in PtrReplacer (#144626)
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively rather than recursively, to go easy on the call stack. PHIs with incoming values that have not yet been visited are pushed back onto the worklist for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal, instead of a simple traversal over the collected users. This reordering ensures that the uses of an instruction are replaced before replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does not have a replacement.

This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.

This is a reland of https://github.com/llvm/llvm-project/pull/137215.
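
The reverse-postorder idea from point 2 can be illustrated with a minimal sketch. This is not the InstCombine code: `reverse_postorder` and the `users` map are hypothetical names, and the graph is assumed to be acyclic (the real PtrReplacer also has to cope with PHI cycles).

```python
def reverse_postorder(root, users):
    """Iterative DFS over def -> user edges; returns nodes in reverse postorder.

    `users` maps a value name to the list of values that use it.
    An explicit stack replaces recursion, so deep chains cannot
    overflow the call stack.
    """
    visited = {root}
    postorder = []
    # Each stack entry pairs a node with an iterator over its users,
    # so the walk resumes where it left off after visiting a child.
    stack = [(root, iter(users.get(root, ())))]
    while stack:
        node, remaining = stack[-1]
        for user in remaining:
            if user not in visited:
                visited.add(user)
                stack.append((user, iter(users.get(user, ()))))
                break
        else:
            # All users handled: the node is finished.
            postorder.append(node)
            stack.pop()
    return postorder[::-1]


# Hypothetical use graph: alloca feeds a gep and a phi, the gep feeds the phi.
users = {"alloca": ["gep", "phi"], "gep": ["phi"], "phi": ["load"]}
order = reverse_postorder("alloca", users)
```

In this ordering every value appears before all of its users, so by the time a user is rewritten its operands have already been processed.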
2025-06-20 18:03:54 -04:00
Finn Plummer
e6ee2c7c7b
[HLSL][RootSignature] Implement validation of resource ranges for RootDescriptors (#140962)
As was established
[previously](https://github.com/llvm/llvm-project/pull/140957), we
created a structure to model a resource range and to detect an overlap
in a given set of these.

However, a resource range only overlaps with another resource range if
they have:
- equivalent ResourceClass (SRV, UAV, CBuffer, Sampler)
- equivalent resource name-space
- overlapping shader visibility

For instance, the following don't overlap even though they have the same
register range:
- `CBV(b0)` and `SRV(t0)` (different resource class)
- `CBV(b0, space = 0)` and `CBV(b0, space = 1)` (different space)
- `CBV(b0, visibility = Pixel)` and `CBV(b0, visibility = Domain)`
(non-overlapping visibility)

The first two clauses are naturally modelled by grouping all the
`RangeInfo`s that have equivalent `ResourceClass` and `Space` values
together and checking whether there is any overlap on a `ResourceRange`
for all these `RangeInfo`s. However, `Visibility` is not quite as easily
mapped (`Visibility = All` would overlap with any other visibility). So
we instead need to track a `ResourceRange` for each of the `Visibility`
types in a group. Then, when inserting a range into a group, we can
determine whether it would overlap with any range of an overlapping
visibility.

The collection of `RangeInfo` for `RootDescriptor`s, sorting of the
`RangeInfo`s into the groups and finally the insertion of each point
into their respective `ResourceRange`s are implemented. Furthermore, we
integrate this into `SemaHLSL` to provide a diagnostic for each entry
function that uses the invalid root signature.

- Implements collection of `RangeInfo` for `RootDescriptors`
- Implements resource range validation in `SemaHLSL`
- Add diagnostic testing of error production in
`RootSignature-resource-ranges-err.hlsl`
- Add testing to ensure no errors are raised in valid root signatures
`RootSignature-resource-ranges.hlsl`

Part 2 of https://github.com/llvm/llvm-project/issues/129942

A final PR will be produced to integrate the analysis of
`DescriptorTable`, `StaticSampler` and `RootConstants` by defining how
to construct the `RangeInfo` from their elements respectively.
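
The three overlap conditions can be sketched as follows. This is an illustrative model only: the `RangeInfo` fields, `visibilities_overlap` and `find_overlaps` are hypothetical, and the pairwise comparison stands in for the per-visibility `ResourceRange` tracking that the patch actually implements.

```python
from dataclasses import dataclass

ALL = "All"


@dataclass(frozen=True)
class RangeInfo:
    resource_class: str  # e.g. "SRV", "UAV", "CBuffer", "Sampler"
    space: int           # register space
    visibility: str      # "All" or a single stage such as "Pixel"
    lower: int           # first register, inclusive
    upper: int           # last register, inclusive


def visibilities_overlap(a, b):
    # "All" overlaps with every visibility; otherwise the stages must match.
    return a == ALL or b == ALL or a == b


def find_overlaps(infos):
    """Return pairs of ranges that collide under all three conditions."""
    overlaps = []
    groups = {}  # (resource_class, space) -> ranges seen so far
    for info in infos:
        group = groups.setdefault((info.resource_class, info.space), [])
        for other in group:
            if (visibilities_overlap(info.visibility, other.visibility)
                    and info.lower <= other.upper
                    and other.lower <= info.upper):
                overlaps.append((other, info))
        group.append(info)
    return overlaps
```

With this model, the three example pairs from the message produce no overlap, while `CBV(b0, visibility = All)` and `CBV(b0, visibility = Pixel)` do.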
2025-06-20 14:54:58 -07:00
Uzair Nawaz
a911543437
[libc] Implemented wcrtomb internal function and public libc function (#144596)
Implemented the internal wcrtomb function using the CharacterConverter
class. The public libc function calls this internal function to perform
the conversion.
2025-06-20 14:43:00 -07:00
Amir Ayupov
f0d32575a1
[BOLT][NFCI] Use FileSymbols for local symbol disambiguation (#89088)
Remove SymbolToFileName mapping from every local symbol to its
containing FILE symbol name, and reuse FileSymbols to disambiguate
local symbols instead.

Also removes the check for `ld-temp.o` file symbol which was added to
prevent LTO build mode from affecting the disambiguated name. This may
cause incompatibility when using the profile collected on a binary built
in a different mode than the input binary.

Addresses #90661.

Speeds up discover file objects by 5-10% for large binaries:
- binary with ~1.2M symbols: 12.6422s -> 12.0297s
- binary with ~4.5M symbols: 48.8851s -> 43.7315s
2025-06-20 14:29:32 -07:00
Philip Reames
5886f0a183
[RISCV] Allow larger offset when matching build_vector as vid sequence (#144756)
I happened to notice that when legalizing get.active.lane.mask with
large vectors we were materializing via constant pool instead of just
shifting by a constant.

We should probably be doing a full cost comparison for the different
lowering strategies as opposed to our current adhoc heuristics, but the
few cases this regresses seem pretty minor. (Given the reduction in vset
toggles, they might not be regressions at all.)
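
As a rough illustration of what "matching a build_vector as a vid sequence" means, the sketch below checks whether a list of constants has the form `offset + step * i`. It is a simplification under stated assumptions: `match_vid_sequence` is a hypothetical helper, only integer steps are considered, and the real matcher in the RISC-V backend is more general.

```python
def match_vid_sequence(elements):
    """Return (step, offset) if elements[i] == offset + step * i, else None."""
    if len(elements) < 2:
        return None
    offset = elements[0]
    step = elements[1] - elements[0]
    if all(e == offset + step * i for i, e in enumerate(elements)):
        return step, offset
    return None
```

A vector such as `[64, 65, 66, 67]` matches with step 1 and offset 64, so it could be produced from an index vector plus a constant instead of a constant-pool load.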

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
2025-06-20 14:20:17 -07:00
Stanislav Mekhanoshin
0c2191b3a7
[AMDGPU] Omit image waits in function prologue on gfx1250 (#145097)
2025-06-20 14:11:29 -07:00
sribee8
4c97a91dc0
[libc] Added closing quote (#145101)
Error message was missing a closing quote, added it.

Co-authored-by: Sriya Pratipati <sriyap@google.com>
2025-06-20 21:00:56 +00:00
Nishant Patel
9c1ce31f54
[mlir][vector] Add unroll patterns for vector.load and vector.store (#143420)
This PR adds unroll patterns for vector.load and vector.store. It is a follow-up to #137558.
2025-06-20 13:50:25 -07:00
David Green
b6445ac0c5
[GlobalISel] Create a common register_vector_matchinfo (#144306)
Several combiners use MatchInfo that is just a SmallVector<Register>.
This creates a common register_vector_matchinfo that they can all use.
2025-06-20 21:37:02 +01:00
Med Ismail Bennani
58f48011b3
[lldb] Add support for x86_64h to scripted process (#145099)
This patch adds support for the Haswell sub-architecture (x86_64h) to
scripted processes.

rdar://147208252

Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
2025-06-20 13:28:21 -07:00
Michael Spencer
6110dead89
[clang][scan-deps] Add option to disable caching stat failures (#144000)
While the source code isn't supposed to change during a build, in some
environments it does. This adds an option that disables caching of stat
failures, meaning that source files can be added to the build during
scanning.

This adds a `-no-cache-negative-stats` option to clang-scan-deps to
enable this behavior. There are no tests for clang-scan-deps itself, as
there is no reliable way to exercise this through the tool. A unit test
has been added that modifies the filesystem between scans to test it.
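
The effect of not caching stat failures can be shown with a small model. This is only a sketch: `StatCache` and its `cache_negative_stats` parameter are hypothetical names, not the dependency scanning filesystem classes used by clang.

```python
import os


class StatCache:
    """Caches stat results; optionally skips caching failed lookups."""

    def __init__(self, stat_fn=os.stat, cache_negative_stats=True):
        self._stat_fn = stat_fn
        self._cache_negative_stats = cache_negative_stats
        self._cache = {}

    def stat(self, path):
        if path in self._cache:
            return self._cache[path]
        try:
            result = self._stat_fn(path)
        except OSError:
            result = None  # the file does not exist (or cannot be read)
        # Successful results are always cached; failures only if allowed.
        if result is not None or self._cache_negative_stats:
            self._cache[path] = result
        return result
```

With `cache_negative_stats=False`, a file that appears after a failed lookup is found by the next `stat` call; with the default, the stale failure keeps being returned.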
2025-06-20 13:28:05 -07:00
Peter Collingbourne
491b82a5ec
ELF: Add branch-to-branch optimization.
When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.
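
Conceptually, the optimization retargets a branch whose destination does nothing but branch elsewhere. The sketch below models that on symbol names only; `resolve_branch_target` and the `tail_jumps` map are hypothetical, and lld itself works on relocations and instruction encodings rather than a name table.

```python
def resolve_branch_target(target, tail_jumps):
    """Follow functions that consist solely of a tail call.

    `tail_jumps` maps such a function to the function it jumps to.
    The `seen` set guards against cycles.
    """
    seen = set()
    while target in tail_jumps and target not in seen:
        seen.add(target)
        target = tail_jumps[target]
    return target


# A hypothetical CFI jump table entry that forwards to the real function.
tail_jumps = {"foo.jump_table_entry": "foo"}
direct = resolve_branch_target("foo.jump_table_entry", tail_jumps)
```

Here `direct` is `"foo"`, so a call to the jump table entry could branch straight to `foo`.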

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
	0.0243538 +/- 0.00233202
	1.87831% +/- 0.179859%
	(Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Pull Request: https://github.com/llvm/llvm-project/pull/138366
2025-06-20 13:16:24 -07:00
Rodolfo Wottrich
3b9795b3d3
[AArch64] Add CodeGen support for scalar FEAT_CPA (#105669)
CPA stands for Checked Pointer Arithmetic and is part of the 2023 MTE
architecture extensions for A-profile.
The new CPA instructions perform regular pointer arithmetic (such as
base register + offset) but check for overflow in the most significant
bits of the result, enhancing security by detecting address tampering.

In this patch we intend to capture the semantics of pointer arithmetic
when it is not folded into loads/stores, then generate the appropriate
scalar CPA instructions. In order to preserve pointer arithmetic
semantics through the backend, we use the PTRADD SelectionDAG node type.

Use backend option `-aarch64-use-featcpa-codegen=true` to enable CPA
CodeGen (for a target with CPA enabled).

The story of this PR is that initially it introduced the PTRADD
SelectionDAG node and the respective visitPTRADD() function, adapted
from the CHERI/Morello LLVM tree. The original authors are
@davidchisnall, @jrtc27, @arichardson.
After a while, @ritter-x2a took the part of the code that was
target-independent and merged it separately in #140017. This PR thus
remains as the AArch64-part only.

More details about the CPA extension can be found at:

-
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023
- https://developer.arm.com/documentation/ddi0602/2023-09/ (e.g. the ADDPT
instruction)

This PR follows #79569.
It does not address vector FEAT_CPA instructions.
2025-06-20 21:14:52 +01:00
Florian Hahn
f8ffb4e7cd
[VPlan] Simplify ExtractLastElement(Broadcast(A)) -> A.
Remove trivial ExtractLastElement VPInstructions.
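
The rewrite rule in the title can be sketched on a toy expression tree. This is illustrative only: expressions are plain tuples of the form `(opcode, operand, ...)` and `simplify` is a hypothetical helper, not the VPlan simplification code.

```python
def simplify(expr):
    """Fold ExtractLastElement(Broadcast(A)) to A, bottom-up."""
    if not isinstance(expr, tuple):
        return expr  # a leaf value
    op, *args = expr
    args = [simplify(a) for a in args]
    if (op == "ExtractLastElement" and args
            and isinstance(args[0], tuple) and args[0][0] == "Broadcast"):
        # Every lane of a broadcast holds the same scalar, so the
        # last element is that scalar.
        return args[0][1]
    return (op, *args)
```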
2025-06-20 21:08:14 +01:00
sribee8
d078ce7c98
[libc] mbrtowc implementation (#144760)
Implemented the internal and public mbrtowc, as well as tests for the
public function.

---------

Co-authored-by: Sriya Pratipati <sriyap@google.com>
2025-06-20 20:00:59 +00:00
Stanislav Mekhanoshin
3a66e20652
[AMDGPU] Add gfx1250 runlines to vop3 dpp tests. NFC. (#145089)
dpp8 disasm test does not work yet.
2025-06-20 12:57:36 -07:00
nerix
d8924d4da7
[LLDB] Explicitly use python for version fixup (#144217)
On Windows, the post build command would open the script in the default
editor, since it doesn't know about shebangs. This effectively adds
`python3` in front of the command.

Amends https://github.com/llvm/llvm-project/pull/142871 /
https://github.com/llvm/llvm-project/pull/141116
2025-06-20 14:54:06 -05:00
Amir Ayupov
4959e8a1da
[BOLT][NFCI] Use heuristic for matching split global functions (#90429)
This change speeds up fragment matching for large BOLTed binaries where
all fragments of global parent functions are put under `bolt-pseudo.o`
file symbol:
- before: iterating over symbols under `bolt-pseudo.o` only to fail
  to find a parent,
- after: bail out immediately and use a global parent by name.

Test Plan: NFC, updated register-fragments-bolt-symbols.s
2025-06-20 12:46:56 -07:00
Amir Ayupov
6d8c6ef90c
[BOLT][NFC] Simplify doTrace in BAT mode (#143233)
`BoltAddressTranslation::getFallthroughsInTrace` iterates over address
translation map entries and therefore has direct access to both original
and translated offsets. Return the translated offsets in fall-throughs
list to avoid duplicate address translation inside `doTrace`.

Test Plan: NFC
2025-06-20 12:45:21 -07:00
Maksim Levental
227f759644
[mlir][python] expose operation.block (#145088)
Expose `operation->getBlock()` in Python.
2025-06-20 15:34:43 -04:00
Stanislav Mekhanoshin
affcc5e728
[AMDGPU] Add s_wait_xcnt gfx1250 instruction (#145086)
2025-06-20 12:28:18 -07:00
Farzon Lotfi
2a4207e732
[DirectX] Don't limit visitGetElementPtrInst to global ptrs (#144959)
fixes #144608
- there is a getPointerOperandIndex function so we don't need to iterate
the operands trying to find the pointer. This resulted in a small
cleanup to visitStoreInst and visitLoadInst.

- The meat of this change was in visitGetElementPtrInst to account for
allocas and not bail when we don't find a global.
2025-06-20 15:23:20 -04:00
Stanislav Mekhanoshin
958dc86026
[AMDGPU] Don't insert wait instructions that are not supported by gfx1250 (#145084)
No tests yet, but it will allow further tests not to be
polluted with these waits.
2025-06-20 12:21:45 -07:00
joaosaffran
b5d5708128
[HLSL] Add descriptor table metadata parsing (#142492)
Implements descriptor table parsing from root signature metadata. This
is required to support root signatures in HLSL.
Closes: #[126640](https://github.com/llvm/llvm-project/issues/126640)

---------

Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
2025-06-20 12:12:02 -07:00
Stanislav Mekhanoshin
8d2eea96b3
[AMDGPU] gfx1250 SOPP MC tests. NFC. (#145082)
2025-06-20 12:06:55 -07:00
Philip Reames
c103bbc836
[LV] Consider whether vscale is a known power of two for iteration check (#144963)
Going mostly by the comment here, which says "vscale is not
necessarily a power-of-2". Both in-tree targets have vscale as a power
of two, and we have an existing TTI hook for that.
2025-06-20 11:37:27 -07:00
Fabian Mora
f159774352
[mlir][core|ptr] Add PtrLikeTypeInterface and casting ops to the ptr dialect (#137469)
This patch adds the `PtrLikeTypeInterface` type interface to identify
pointer-like types. This interface is defined as:

```
A ptr-like type represents an object storing a memory address. This object
is constituted by:
- A memory address called the base pointer. This pointer is treated as a
  bag of bits without any assumed structure. The bit-width of the base
  pointer must be a compile-time constant. However, the bit-width may remain
  opaque or unavailable during transformations that do not depend on the
  base pointer. Finally, it is considered indivisible in the sense that as
  a `PtrLikeTypeInterface` value, it has no metadata.
- Optional metadata about the pointer. For example, the size of the  memory
  region associated with the pointer.

Furthermore, all ptr-like types have two properties:
- The memory space associated with the address held by the pointer.
- An optional element type. If the element type is not specified, the
  pointer is considered opaque.
```

This patch adds this interface to `!ptr.ptr` and the `memref` type.

Furthermore, this patch adds necessary ops and type to handle casting
between `!ptr.ptr` and ptr-like types.

First, it defines the `!ptr.ptr_metadata` type, an opaque type that
represents the metadata of a ptr-like type. The rationale behind adding
this type is that, at a high level, the metadata of a type like `memref`
cannot be specified, as its structure is tied to its lowering.

The `ptr.get_metadata` operation was added to extract the opaque pointer
metadata. The concrete structure of the metadata is only known when the
op is lowered.

Finally, this patch adds the `ptr.from_ptr` and `ptr.to_ptr` operations,
which allow casting back and forth between `!ptr.ptr` and ptr-like types.

```mlir
func.func @func(%mr: memref<f32, #ptr.generic_space>) -> memref<f32, #ptr.generic_space> {
  %ptr = ptr.to_ptr %mr : memref<f32, #ptr.generic_space> -> !ptr.ptr<#ptr.generic_space>
  %mda = ptr.get_metadata %mr : memref<f32, #ptr.generic_space>
  %res = ptr.from_ptr %ptr metadata %mda : !ptr.ptr<#ptr.generic_space> -> memref<f32, #ptr.generic_space>
  return %res : memref<f32, #ptr.generic_space>
}
```

It's future work to replace and remove the `bare-ptr-convention` through
the use of these ops.

---------

Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2025-06-20 14:23:39 -04:00
Krzysztof Parzyszek
925dbc7988
[flang][OpenMP] Fix namespace nesting after PR144960
Newly introduced Atomic.cpp fails to compile on its own, but somehow
compiles fine in the build. Maybe it's because of PCH, but it needs to be
fixed nevertheless.
2025-06-20 13:22:58 -05:00
Deric C.
3f42c6bddd
[DirectX] Scalarize extractelement and insertelement with dynamic indices (#141676)
Fixes #141136

- Implement `visitExtractElementInst` and `visitInsertElementInst` in
`DXILDataScalarizerVisitor` to scalarize `extractelement` and
`insertelement` instructions whose index operand is not a `ConstantInt`
by converting the vector to an array and then loading from the array
- Rename the `replaceVectorWithArray` helper function to
`equivalentArrayTypeFromVector`, relocate the function toward the top of
the file, and remove the unused `Ctx` parameter
2025-06-20 11:20:30 -07:00
Luke Lau
521adc9fa2
[VPlan] Use createScalarZExtOrTrunc when expanding expandVPWidenIntOrFpInduction
Split off from #144666
2025-06-20 19:18:49 +01:00
Diego Caballero
ff6367b470
[mlir][Vector] Add simple folders for vector.from_elements/vector.to_elements (#144444)
This PR adds simple folders to remove no-op sequences of
`vector.from_elements` and `vector.to_elements`.
2025-06-20 11:16:46 -07:00
Yijia Gu
bae48ac3c0
[mlir][bazel] add missing deps for XeGPUTransforms
2025-06-20 11:14:14 -07:00
Florian Hahn
7f74a377d0
[LV] Regenerate uniform_across_vf* check lines.
Re-generate check lines to reduce diff in upcoming changes.

Also filters out the code after scalar.ph:, which is dead.
2025-06-20 19:10:26 +01:00
Sam Elliott
ab8b8c1e13
[TargetParser][cmake] Be Smarter about TableGen Deps (#144848)
This tries to be a bit smarter for the OLD behaviour of CMP0116, to glob
more relevant directories looking for possible dependencies.

The changes are:
- Remove some duplication of lines in the `tablegen` function.
- Put CURRENT_SOURCE_DIR into `tblgen_includes` (at the front)
- Glob all directories in `tblgen_includes`
- Give up on `local_tds` which was wrong when using tablegen to compile
a file in a different directory (as TargetParser does)
- Use `EXTRA_INCLUDES` in TargetParser `tablegen` calls.

This is still an under-approximation of what might be included, at least
comparing the RISCVTargetParserDef.inc.d (after building
`target_parser_gen`), and the list of deps in the ninja file when
explicitly setting CMP0116 to OLD.

Fixes #144639
2025-06-20 11:05:25 -07:00
Craig Topper
04e2e581ac
[RISCV] Treat bf16->f32 as separate ExtKind in combineOp_VLToVWOp_VL. (#144653)
This allows us to better track the narrow type we need and to fix
miscompiles if f16->f32 and bf16->f32 extends are mixed.

Fixes #144651.
2025-06-20 10:44:51 -07:00
Charitha Saumya
adc6228ea0
[mlir][xegpu] Refine layout assignment in XeGPU SIMT distribution. (#142687)
Changes:
* Decouple layout propagation from subgroup distribution and move it to
an independent pass.
* Refine layout assignment to handle control-flow ops correctly (scf.for, scf.while).
* Refine test cases.
2025-06-20 10:43:19 -07:00
Michal Rostecki
0d21c956a5
[BPF] Handle nested wrapper structs in BPF map definition traversal (#144097)
In Aya/Rust, BPF map definitions are wrapped in two nested types:

* A struct representing the map type (e.g., `HashMap`, `RingBuf`) that
provides methods for interacting with the map type (e.g. `HashMap::get`,
`RingBuf::reserve`).
* An `UnsafeCell`, which informs the Rust compiler that the type is
thread-safe and can be safely mutated even as a global variable. The
kernel guarantees map operation safety.

This leads to a type hierarchy like:

```rust
    pub struct HashMap<K, V, const M: usize, const F: usize = 0>(
        core::cell::UnsafeCell<HashMapDef<K, V, M, F>>,
    );
    const BPF_MAP_TYPE_HASH: usize = 1;
    pub struct HashMapDef<K, V, const M: usize, const F: usize = 0> {
        r#type: *const [i32; BPF_MAP_TYPE_HASH],
        key: *const K,
        value: *const V,
        max_entries: *const [i32; M],
        map_flags: *const [i32; F],
    }
```

Then used in the BPF program code as a global variable:

```rust
    #[link_section = ".maps"]
    static HASH_MAP: HashMap<u32, u32, 1337> = HashMap::new();
```

Which is an equivalent of the following BPF map definition in C:

```c
    #define BPF_MAP_TYPE_HASH 1
    struct {
        int (*type)[BPF_MAP_TYPE_HASH];
        typeof(int) *key;
        typeof(int) *value;
        int (*max_entries)[1337];
    } map_1 __attribute__((section(".maps")));
```

Accessing the actual map definition requires traversing:

```
  HASH_MAP -> __0 -> value
```

Previously, the BPF backend only visited the pointee types of the
outermost struct, and didn’t descend into inner wrappers. This caused
issues when the key/value types were custom structs:

```rust
    // Define custom structs for key and values.
    pub struct MyKey(u32);
    pub struct MyValue(u32);

    #[link_section = ".maps"]
    #[export_name = "HASH_MAP"]
    pub static HASH_MAP: HashMap<MyKey, MyValue, 10> = HashMap::new();
```

These types weren’t fully visited and appeared in BTF as forward
declarations:

```
    #30: <FWD> 'MyKey' kind:struct
    #31: <FWD> 'MyValue' kind:struct
```

The fix is to enhance `visitMapDefType` to recursively visit inner
composite members. If a member is a composite type (likely a wrapper),
it is now also visited using `visitMapDefType`, ensuring that the
pointee types of the innermost struct members, like `MyKey` and
`MyValue`, are fully resolved in BTF.

With this fix, the correct BTF entries are emitted:

```
    #6: <STRUCT> 'MyKey' sz:4 n:1
            #00 '__0' off:0 --> [7]
    #7: <INT> 'u32' bits:32 off:0
    #8: <PTR> --> [9]
    #9: <STRUCT> 'MyValue' sz:4 n:1
            #00 '__0' off:0 --> [7]
```

Fixes: #143361
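
The recursive descent described above can be modelled with a small sketch. It is not the BPF backend code: `Type`, `visit_map_def_type`, `visit_fully` and the `recurse_into_wrappers` switch are hypothetical and exist only to contrast the old and new behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Type:
    name: str
    kind: str                                    # "struct", "ptr" or "int"
    members: list = field(default_factory=list)  # member types of a struct
    pointee: "Type | None" = None                # target of a pointer


def visit_fully(ty, resolved):
    """Record every struct reachable from `ty` as fully described."""
    if ty.kind == "struct" and ty.name not in resolved:
        resolved.add(ty.name)
        for member in ty.members:
            visit_fully(member, resolved)
    elif ty.kind == "ptr" and ty.pointee is not None:
        visit_fully(ty.pointee, resolved)


def visit_map_def_type(ty, resolved, recurse_into_wrappers=True):
    """Visit the pointee types of a map definition's members."""
    if ty.kind != "struct":
        return
    for member in ty.members:
        if member.kind == "ptr" and member.pointee is not None:
            visit_fully(member.pointee, resolved)
        elif member.kind == "struct" and recurse_into_wrappers:
            # A composite member is likely a wrapper: descend into it.
            visit_map_def_type(member, resolved, recurse_into_wrappers)


u32 = Type("u32", "int")
my_key = Type("MyKey", "struct", [u32])
my_value = Type("MyValue", "struct", [u32])
map_def = Type("HashMapDef", "struct",
               [Type("", "ptr", pointee=my_key), Type("", "ptr", pointee=my_value)])
unsafe_cell = Type("UnsafeCell", "struct", [map_def])
hash_map = Type("HashMap", "struct", [unsafe_cell])

resolved = set()
visit_map_def_type(hash_map, resolved)
```

With recursion enabled, `resolved` contains `MyKey` and `MyValue`; with `recurse_into_wrappers=False` it stays empty, which corresponds to the forward declarations shown earlier.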
2025-06-20 10:17:36 -07:00
Thurston Dang
33a92af1b2
[msan] Add off-by-default flag to fix false negatives from partially undefined constant fixed-length vectors (#143837)
This patch adds an off-by-default flag which, when enabled via `-mllvm -msan-poison-undef-vectors=true`, fixes a false negative in MSan (partially-undefined constant fixed-length vectors). It is currently off by default since, by fixing the false negative, code/tests that previously passed MSan may start failing. The default will be changed in a future patch.

Prior to this patch, MSan computes that partially-undefined constant fixed-length vectors are fully initialized, which leads to false negatives; moreover, benign vector rewriting could theoretically flip MSan's shadow computation from initialized to uninitialized or vice-versa (*). `-msan-poison-undef-vectors=true` calculates the shadow precisely: for each element of the vector, the corresponding shadow is fully uninitialized if the element is undefined/poisoned, otherwise it is fully initialized.

Updates the test from https://github.com/llvm/llvm-project/pull/143823

(*) For example:
  ```
  %x = insertelement <2 x i64> <i64 0, i64 poison>, i64 42, i64 0
  %y = insertelement <2 x i64> <i64 poison, i64 poison>, i64 42, i64 0
  ```
%x and %y are equivalent but, prior to this patch, MSan incorrectly computes the shadow of %x as <0, 0> rather than <0, -1>.
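
The per-element shadow rule can be sketched as below. This is a model of the description only: `vector_shadow`, `POISON` and the `precise` switch are hypothetical, and an all-ones shadow value stands for the `-1` used in the text.

```python
POISON = object()  # stands in for an undef/poison vector element


def vector_shadow(elements, bits=64, precise=True):
    """Shadow of a constant vector: 0 = initialized, all ones = uninitialized."""
    all_ones = (1 << bits) - 1
    if not precise:
        # Behaviour described as "prior to this patch": treated as fully initialized.
        return [0] * len(elements)
    return [all_ones if e is POISON else 0 for e in elements]
```

For the constant `<i64 0, i64 poison>`, the precise mode yields a zero shadow for the first element and an all-ones shadow for the second, whereas the old mode reports both as initialized.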
2025-06-20 10:11:12 -07:00
Simon Pilgrim
f8ee5774b6
[X86] combineConcatVectorOps - only concat AVX1 v4i64 shift-by-32 to a shuffle if the concat is free (#145043)
2025-06-20 18:09:07 +01:00
Maryam Moghadas
65cb3bcf32
[Clang][PowerPC] Add __dmr1024 type and DMF integer calculation builtins (#142480)
Define the __dmr1024 type used to manipulate the new DMR registers
introduced by the Dense Math Facility (DMF) on PowerPC, and add six
Clang builtins that correspond to the integer outer-product accumulate
to ACC PowerPC instructions:
* __builtin_mma_dmxvi8gerx4
* __builtin_mma_pmdmxvi8gerx4
* __builtin_mma_dmxvi8gerx4pp
* __builtin_mma_pmdmxvi8gerx4pp
* __builtin_mma_dmxvi8gerx4spp
* __builtin_mma_pmdmxvi8gerx4spp
2025-06-20 13:03:14 -04:00
Uzair Nawaz
8d6e29d0d3
[libc] Reworked CharacterConverter isComplete into isFull and isEmpty (#144799)
isComplete previously meant different things for different conversion
directions.
Refactored bytes_processed to bytes_stored, which now consistently
increments on every push and decrements on every pop, making both
directions more consistent with each other.
2025-06-20 16:59:30 +00:00
Aiden Grossman
7157f33c6c
[libc++] Disable a std::unexpected test in modules build (#144466)
This patch disables unexpected_disabled_cpp17.verify.cpp under clang
modules builds because it changes diagnostics criteria post #143423,
causing the test to fail.

This patch follows a similar style to 853059a15011fd8b57dd0.
This was found when working on trying to land #144033.
2025-06-20 12:58:59 -04:00
Jay Foad
6ddb3a69c1
[AMDGPU] Add another test showing unwanted VALU codegen (#145062)
2025-06-20 17:54:44 +01:00
Hristo Hristov
945ce1aa3d
[libc++] Update the value of __cpp_lib_constrained_equality after P3379R0 (#144553)
https://wg21.link/P3379R0 updated the value of __cpp_lib_constrained_equality,
but we forgot to update it when we implemented the paper.
2025-06-20 12:36:46 -04:00
Shilei Tian
edbaf19c46
[AMDGPU] Fix a potential integer overflow in GCNRegPressure when true16 is enabled (#144968)
Fixes SWDEV-537014.
2025-06-20 12:29:32 -04:00
Muzammil
379a609dad
[mlir][arith][transforms] Adds f4E2M1FN support to truncf and extf (#144157)
See work detail: https://github.com/iree-org/iree/issues/20920

Add support for f4E2M1FN in `arith.truncf` and `arith.extf` ops through software emulation.

---------

Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
2025-06-20 11:27:35 -05:00
Jameson Nash
940ff110d7
[InstCombine] fix hwasan mistake in "remove dead loads" (#145057)
Detected by CI after #143958.
2025-06-20 12:22:59 -04:00
Michael Buch
877511920d
Revert "[lldb][DWARF] Remove object_pointer from ParsedDWARFAttributes" (#145065)
Reverts llvm/llvm-project#144880

Caused `TestObjCIvarsInBlocks.py` to fail on macOS CI.
2025-06-20 17:20:58 +01:00
Justin King
bfef8732be
msan: Support free_sized and free_aligned_sized from C23 (#144529)
Adds support to MSan for `free_sized` and `free_aligned_sized` from C23.

Other sanitizers will be handled with their own separate PRs.

For https://github.com/llvm/llvm-project/issues/144435

Signed-off-by: Justin King <jcking@google.com>
2025-06-20 09:16:40 -07:00