57581 Commits

Author SHA1 Message Date
Philip Reames
0517772b4a Delete unused PoisonChecking utility pass
This was introduced ~5yrs ago (by me), and has never really gotten
any adoption.  By now, it's significantly out of sync with new/changed
poison propagation rules.  The idea is still reasonable, but the
imagined use case is largely covered by alive2 these days anyway.
2024-12-19 14:23:38 -08:00
Florian Hahn
5f096fd221
Revert "[LoopVectorizer] Add support for partial reductions (#92418)"
This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701.

It looks like this is triggering an assertion when building llvm-test-suite
on ARM64 macOS.

Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32"
    target triple = "arm64-apple-macosx15.0.0"

    define void @test(i64 %idx.neg, i8 %0) #0 {
    entry:
      br label %while.body

    while.body:                                       ; preds = %while.body, %entry
      %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ]
      %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ]
      %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ]
      %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1
      %conv = sext i8 %0 to i64
      %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1
      %1 = load i8, ptr null, align 1
      %conv97 = sext i8 %1 to i64
      %mul = mul i64 %conv97, %conv
      %add99 = add i64 %mul, %sum.1129
      %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0
      %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1
      %2 = and i1 %cmp94, %cmp95
      br i1 %2, label %while.body, label %while.end.loopexit

    while.end.loopexit:                               ; preds = %while.body
      %add99.lcssa = phi i64 [ %add99, %while.body ]
      ret void
    }

    attributes #0 = { "target-cpu"="apple-m1" }

> opt -p loop-vectorize
Assertion failed: ((VF.isScalar() || V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.
2024-12-19 21:46:51 +00:00
Kazu Hirata
10d054e954
[memprof] Introduce IndexedCallstackIdConveter (NFC) (#120540)
This patch introduces IndexedCallstackIdConveter as a convenience
wrapper around FrameIdConverter and CallStackIdConverter just for
tests.

With the new wrapper, we get to replace idioms like:

  FrameIdConverter<decltype(MemProfData.Frames)> FrameIdConv(
      MemProfData.Frames);
  CallStackIdConverter<decltype(MemProfData.CallStacks)> CSIdConv(
      MemProfData.CallStacks, FrameIdConv);

with:

  IndexedCallstackIdConveter CSIdConv(MemProfData);

Unfortunately, this exact pattern occurs in tests only; the
combinations of the frame ID converter and call stack ID converter are
diverse in production code.
2024-12-19 12:20:25 -08:00
Finn Plummer
45c01e8a33
[NFC][TargetTransformInfo][VectorUtils] Consolidate isVectorIntrinsic... api (#117635)
- update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for
all uses, to allow specification of target specific intrinsics
- add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api
- update TTI api to provide `isTargetIntrinsicWith...` functions and
  consistently name them
- move `isTriviallyScalarizable` to VectorUtils
  
- update all uses of the api and provide the TTI parameter

Resolves #117030
2024-12-19 11:54:26 -08:00
Justin Bogner
aa07f92210
[DirectX][SPIRV] Consistent names for HLSL resource intrinsics (#120466)
Rename HLSL resource-related intrinsics to be consistent with the naming
conventions discussed in [wg-hlsl:0014].

This is an entirely mechanical change, consisting of the following
commands and automated formatting.

```sh
git grep -l handle.fromBinding | xargs perl -pi -e \
  's/(dx|spv)(.)handle.fromBinding/$1$2resource$2handlefrombinding/g'
git grep -l typedBufferLoad_checkbit | xargs perl -pi -e \
  's/(dx|spv)(.)typedBufferLoad_checkbit/$1$2resource$2loadchecked$2typedbuffer/g'
git grep -l typedBufferLoad | xargs perl -pi -e \
  's/(dx|spv)(.)typedBufferLoad/$1$2resource$2load$2typedbuffer/g'
git grep -l typedBufferStore | xargs perl -pi -e \
  's/(dx|spv)(.)typedBufferStore/$1$2resource$2store$2typedbuffer/g'
git grep -l bufferUpdateCounter | xargs perl -pi -e \
  's/(dx|spv)(.)bufferUpdateCounter/$1$2resource$2updatecounter/g'
git grep -l cast_handle | xargs perl -pi -e \
  's/(dx|spv)(.)cast.handle/$1$2resource$2casthandle/g'
```

[wg-hlsl:0014]: https://github.com/llvm/wg-hlsl/blob/main/proposals/0014-consistent-naming-for-dx-intrinsics.md
2024-12-19 12:17:21 -07:00
MagentaTreehouse
254ba78495
[GenericDomTree][NFC] Remove unnecessary const_casts (#97638) 2024-12-19 09:46:03 -08:00
Craig Topper
f139bde8d8
[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536)
This allows us to write more range based for loops because we no
longer need the iterator. It also matches IR's Use class.
2024-12-19 09:07:42 -08:00
Craig Topper
e6b2495545
[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531)
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.

We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.

Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
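
The distinction between iterating uses and iterating users can be illustrated with a toy model. This is a minimal sketch, not the SelectionDAG implementation: `ToyNode` and `ToyUse` are hypothetical stand-ins for SDNode and SDUse, and the use list is assumed to be a plain vector for brevity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for SDNode/SDUse; these are not the LLVM classes.
struct ToyNode;

struct ToyUse {
  ToyNode *User;      // node that consumes the value
  unsigned OperandNo; // operand slot of User that holds the value
  ToyNode *getUser() const { return User; }
  unsigned getOperandNo() const { return OperandNo; }
};

struct ToyNode {
  std::vector<ToyNode *> Operands;
  std::vector<ToyUse> UseList;

  // Iterating uses() yields ToyUse&, so the operand number stays reachable.
  std::vector<ToyUse> &uses() { return UseList; }

  // Iterating users() yields only the consuming nodes.
  std::vector<ToyNode *> users() const {
    std::vector<ToyNode *> Result;
    for (const ToyUse &U : UseList)
      Result.push_back(U.User);
    return Result;
  }

  // Records Def as the next operand of this node and registers the use.
  void addOperand(ToyNode &Def) {
    assert(&Def != this && "self-use is not modeled in this sketch");
    Def.UseList.push_back({this, static_cast<unsigned>(Operands.size())});
    Operands.push_back(&Def);
  }
};

// Range-based loop over uses: no explicit iterator is needed to reach
// the operand number.
unsigned sumOperandNumbers(ToyNode &N) {
  unsigned Sum = 0;
  for (ToyUse &U : N.uses())
    Sum += U.getOperandNo();
  return Sum;
}
```

In this model a loop that only needs the consuming node iterates `users()`, while a loop that needs the operand slot iterates `uses()` and asks each use for `getOperandNo()`.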
2024-12-19 08:35:32 -08:00
Nicholas Guy
060d62b48a
[LoopVectorizer] Add support for partial reductions (#92418)
Following on from https://github.com/llvm/llvm-project/pull/94499, this
patch adds support to the Loop Vectorizer to emit the partial reduction
intrinsics where they may be beneficial for the target.

---------

Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>
2024-12-19 11:42:40 +00:00
Mikhail Goncharov
cffe22a937 Revert "[NFC] Move DroppedVariableStats code to Analysis (#120502)"
that introduces a circular dependency of analysis -> codegen -> target

This reverts commit e389492d6a00e1c49a034e13343098541ebd03c6.
2024-12-19 10:56:02 +01:00
Shubham Sandeep Rastogi
16d952898f Revert "Add a pass to collect dropped var stats for MIR (#120501)"
This reverts commit 223c7648468cd4f649a578d3f9cbc27a63523192.

Reverted due to buildbot failure:

flang-aarch64-libcxx

Linking CXX shared library lib/libLLVMAnalysis.so.20.0git
FAILED: lib/libLLVMAnalysis.so.20.0git
2024-12-19 00:48:40 -08:00
Shubham Sandeep Rastogi
223c764846
Add a pass to collect dropped var stats for MIR (#120501)
Reland "Add a pass to collect dropped var stats for MIR" (#117044)

I am trying to reland https://github.com/llvm/llvm-project/pull/115566

I also moved the DroppedVariableStats code to the Analysis lib

This is part of a stack of patches with
https://github.com/llvm/llvm-project/pull/120502 being the first one in
the stack
2024-12-19 00:41:48 -08:00
Shubham Sandeep Rastogi
e389492d6a
[NFC] Move DroppedVariableStats code to Analysis (#120502)
This is done because the CodeGen library and Passes library both link
against Analysis, to avoid adding a dependency between CodeGen and
Passes if we want to extend the DroppedVariableStats code for MIR stats
as well, as seen in https://github.com/llvm/llvm-project/pull/120501
2024-12-18 23:42:24 -08:00
Craig Topper
bd261ecc5a
[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509)
Most of these are just places that want the first user and aren't
iterating over the whole list.

While there I changed some use_size() == 1 to hasOneUse() which
is more efficient.

This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
2024-12-18 22:13:04 -08:00
Craig Topper
104ad9258a
[SelectionDAG] Rename SDNode::uses() to users(). (#120499)
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.

I've long been annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
2024-12-18 20:09:33 -08:00
Kazu Hirata
6fb967ec9e
[memprof] Move Frame::hash and hashCallStack to IndexedMemProfData (NFC) (#120365)
Now that IndexedMemProfData::{addFrame,addCallStack} are the only
callers of Frame::hash and hashCallStack, respectively, this patch
moves those functions into IndexedMemProfData and makes them private.
With this patch, we can obtain FrameId and CallStackId only through
addFrame and addCallStack, respectively.
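
The encapsulation described above can be sketched as follows. This is an illustration under assumptions, not the memprof code: `ToyIndexedData`, `ToyFrame`, and the hash formula are hypothetical, and the real hashing scheme is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

using ToyFrameId = std::uint64_t;

// Hypothetical frame record; the real Frame has more fields.
struct ToyFrame {
  std::string Function;
  std::uint32_t Line;
};

class ToyIndexedData {
  std::unordered_map<ToyFrameId, ToyFrame> Frames;

  // Private: callers cannot compute an ID without also registering the frame.
  static ToyFrameId hashFrame(const ToyFrame &F) {
    // Illustrative hash only.
    return std::hash<std::string>{}(F.Function) * 1000003u + F.Line;
  }

public:
  // The only way to obtain a ToyFrameId from outside the class.
  ToyFrameId addFrame(const ToyFrame &F) {
    ToyFrameId Id = hashFrame(F);
    Frames.emplace(Id, F); // no-op if the ID is already present
    assert(Frames.count(Id) == 1);
    return Id;
  }

  std::size_t size() const { return Frames.size(); }
  bool contains(ToyFrameId Id) const { return Frames.count(Id) != 0; }
};
```

Because the hash function is private, every ID that exists outside the class was returned by `addFrame`, so it always refers to a registered frame.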
2024-12-18 10:56:45 -08:00
Justin Bogner
0e2466f624
[DirectX] Create symbols for resource handles (#119775)
We need to create symbols with "the original shape of resource and
element type" to put in the resource metadata in order to generate valid
DXIL.

Note that DXC generally doesn't emit an actual symbol outside of library
shaders (it emits an undef of a pointer to the type), but since we have
to deal with opaque pointers we would need a way to smuggle the type
through to match that. Instead, we simply emit symbols for now.

Fixed #116849
2024-12-18 10:47:12 -07:00
Shubham Sandeep Rastogi
5717a99d8d
Reland 2de78815604e9027efd93cac27c517bf732587d2 (#119650) (#120454)
[NFC] Move DroppedVariableStats to its own file and redesign it to be
extensible. (#115563)

Move DroppedVariableStats code to its own file and change the class to
have an extensible design so that we can use it to add dropped
statistics to MIR passes and the instruction selector.
2024-12-18 08:58:14 -08:00
Justin Bogner
3eca15cbb9
[DirectX] Split resource info into type and binding info. NFC (#119773)
This splits the DXILResourceAnalysis pass into TypeAnalysis and
BindingAnalysis passes. The type analysis pass is made immutable and
populated lazily so that it can be used earlier in the pipeline without
needing to carefully maintain the invariants of the binding analysis.

Fixes #118400
2024-12-18 09:02:28 -07:00
Akash Banerjee
aadf606d90 Fix #110001 build error. 2024-12-18 15:55:56 +00:00
Florian Hahn
76714be5fd
Revert "Add support for single reductions in ComplexDeinterleavingPass (#112875)"
This reverts commit b3eede5e1fa7ab742b86e9be22db7bccd2505b8a.

This has been breaking most AArch64 stage2 builds for 4+ hours,
reverting to get the bots back to green.

https://lab.llvm.org/buildbot/#/builders/41/builds/4172
https://lab.llvm.org/buildbot/#/builders/4/builds/4281
https://lab.llvm.org/buildbot/#/builders/199/builds/263
https://lab.llvm.org/buildbot/#/builders/198/builds/334
https://lab.llvm.org/buildbot/#/builders/143/builds/4276
https://lab.llvm.org/buildbot/#/builders/17/builds/4725
2024-12-18 15:06:52 +00:00
Akash Banerjee
6f0e9c4a56
[OpenMP][Clang] Migrate OpenMP UserDefinedMapper from Clang to OMPIRBuilder (#110001)
This patch migrates the OpenMP UserDefinedMapper codegen from Clang to
the OpenMPIRBuilder. I will be adding further patches in the near future
so that OpenMP dialect in MLIR can make use of these.
2024-12-18 15:02:14 +00:00
NAKAMURA Takumi
5a5838fba3 Introduce CounterMappingRegion::isBranch(). NFC. 2024-12-18 20:00:02 +09:00
Nicholas Guy
b3eede5e1f
Add support for single reductions in ComplexDeinterleavingPass (#112875)
The Complex Deinterleaving pass assumes that all values emitted will
result in complex numbers, this patch aims to remove that assumption and
adds support for emitting just the real or imaginary components, not
both.
2024-12-18 10:34:26 +00:00
Benjamin Maxwell
1ee740a796
[VFABI] Add support for vector functions that return struct types (#119000)
This patch updates the `VFABIDemangler` to support vector functions that
return struct types. For example, a vector variant of `sincos` that
returns a vector of sine values and a vector of cosine values within a
struct.

This patch also adds some helpers for vectorizing types (including
struct types). Some of these are used in the `VFABIDemangler`, and
others will be used in subsequent patches, so this patch simply adds
tests for them.
2024-12-18 09:46:45 +00:00
David Sherwood
13107cb094
[LoopVectorize] Enable more early exit vectorisation tests (#117008)
PR #112138 introduced initial support for dispatching to
multiple exit blocks via split middle blocks. This patch
fixes a few issues so that we can enable more tests to use
the new enable-early-exit-vectorization flag. Fixes are:

1. The code to bail out for any loop live-out values happens
too late. This is because collectUsersInExitBlocks ignores
induction variables, which get dealt with in fixupIVUsers.
I've moved the check much earlier in processLoop by looking
for outside users of loop-defined values.
2. We shouldn't yet be interleaving when vectorising loops
with uncountable early exits, since we've not added support
for this yet.
3. Similarly, we also shouldn't be creating vector epilogues.
4. Similarly, we shouldn't enable tail-folding.
5. The existing implementation doesn't yet support loops
that require scalar epilogues, although I plan to add that
as part of PR #88385.
6. The new split middle blocks weren't being added to the
parent loop.
2024-12-18 09:25:45 +00:00
Vitaly Buka
55e87a79b9
[BoundsChecking] Add parameters to pass (#119894)
This check is a part of UBSAN, but does not support
verbose output like other UBSAN checks.

This is a step to fix that.
2024-12-17 22:07:14 -08:00
Jinsong Ji
c189b2a1ec
[DiagnosticInfo] Fix the default DiagnosticSeverity (#120342)
After
https://github.com/llvm/llvm-project/commit/ea632e1b34e1

the API call to LLVMContext->emitError(I, Errorstr) defaults to warning
instead of error.

This causes problems, as the API is documented as being "prefixed with error:".
2024-12-17 23:00:11 -05:00
Krzysztof Drewniak
b24caf3d2b
[llvm][TableGen] Add a !initialized predicate to allow testing for ? (#117964)
There are cases (like in an upcoming patch to MLIR's `Property` class)
where the ? value is a useful null value. However, existing predicates
make it difficult to test whether the value in a record one is operating on
is ? or not.

This commit adds the !initialized predicate, which is 1 on concrete,
non-? values and 0 on ?.

---------

Co-authored-by: Akshat Oke <Akshat.Oke@amd.com>
2024-12-17 20:34:35 -06:00
Teresa Johnson
d7d0e740cc
[MemProf] Refactor single alloc type handling and use in more cases (#120290)
Emit message when we have aliased contexts that are conservatively
hinted not cold. This is not a change in behavior, just in message when
the -memprof-report-hinted-sizes flag is enabled.
2024-12-17 12:50:49 -08:00
alx32
ad32576cff
[DWARFVerifier] Allow overlapping ranges for ICF-merged functions (#117952)
This patch modifies the DWARF verifier to handle a valid case where two
or more functions have identical address ranges due to being merged by
ICF (Identical Code Folding). Previously, the verifier would incorrectly
report these as errors, but functions merged via ICF (such as when using
LLD's --keep-icf-stabs option) can legitimately share the same address
range.
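
The rule can be sketched as a check over address ranges. This is a simplified model, not the verifier's code: `ToyRange` and `countOverlapErrors` are hypothetical names, and the sketch assumes that any pair of exactly identical ranges is accepted, without modeling how the real verifier decides that two functions were merged by ICF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [Low, High).
struct ToyRange {
  std::uint64_t Low;
  std::uint64_t High;
};

bool intersects(const ToyRange &A, const ToyRange &B) {
  return A.Low < B.High && B.Low < A.High;
}

bool identical(const ToyRange &A, const ToyRange &B) {
  return A.Low == B.Low && A.High == B.High;
}

// Counts pairs of ranges that overlap without being exactly identical.
// Identical ranges are treated as the legitimate ICF case and skipped.
unsigned countOverlapErrors(const std::vector<ToyRange> &Ranges) {
  unsigned Errors = 0;
  for (std::size_t I = 0; I < Ranges.size(); ++I) {
    assert(Ranges[I].Low <= Ranges[I].High && "malformed range");
    for (std::size_t J = I + 1; J < Ranges.size(); ++J)
      if (intersects(Ranges[I], Ranges[J]) &&
          !identical(Ranges[I], Ranges[J]))
        ++Errors;
  }
  return Errors;
}
```

Shifting one offset so that two ranges no longer coincide exactly turns an accepted pair back into a reported overlap.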

A new test case has been added to verify this behavior using YAML-based
DWARF data that simulates two DW_TAG_subprogram entries with identical
address ranges. The test ensures that the verifier correctly identifies
this as a valid case and doesn't emit any errors, while still
maintaining the existing verification for truly invalid overlapping
ranges in other scenarios. Before this change, the newly added test case
would have failed, with `llvm-dwarfdump` marking the overlapping address
ranges in the DWARF as an error.

We also modify the existing tests `llvm-dwarfutil/ELF/X86/verify.test` and 
`llvm/test/tools/llvm-dwarfdump/X86/verify_parent_zero_length.yaml`
which rely on the existence of the error that we're trying to
suppress. We slightly change one offset so that the ranges don't
perfectly overlap and an error is still generated.
2024-12-17 11:00:56 -08:00
Nick Sarnie
1c16807d0d
[LLVM] Add Intel vendor in Triple (#120250)
We plan to make use of this in SPIR-V-based OpenMP offloading, for which
there is already an initial patch in review.

Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
2024-12-17 12:30:21 -06:00
alx32
558de0e1f9
[llvm-gsymutil] Add option to load callsites from DWARF (#119913)
This change adds support for loading gSYM callsite information from
DWARF. Previously the only support was for loading callsites info from
YAML.

For testing, we add a pass where `macho-gsym-merged-callsites-dsym`
loads callsite info from DWARF rather than YAML.
2024-12-17 08:51:30 -08:00
Florian Hahn
a487b792e2
[TySan] Add initial Type Sanitizer (LLVM) (#76259)
This patch introduces the LLVM components of a type sanitizer: a
sanitizer for type-based aliasing violations.

It is based on Hal Finkel's https://reviews.llvm.org/D32198.

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
these given TBAA metadata added by Clang. Roughly, a pointer of given
type cannot be used to access an object of a different type (with, of
course, certain exceptions). Unfortunately, there's a lot of code in the
wild that violates these rules (e.g. for type punning), and such code
often must be built with -fno-strict-aliasing. Performance is often
sacrificed as a result. Part of the problem is the difficulty of finding
TBAA violations. Hopefully, this sanitizer will help.

For each TBAA type-access descriptor, encoded in LLVM's IR using
metadata, the corresponding instrumentation pass generates descriptor
tables. Thus, for each type (and access descriptor), we have a unique
pointer representation. Excepting anonymous-namespace types, these
tables are comdat, so the pointer values should be unique across the
program. The descriptors refer to other descriptors to form a type
aliasing tree (just like LLVM's TBAA metadata does). The instrumentation
handles the "fast path" (where the types match exactly and no
partial-overlaps are detected), and defers to the runtime to handle all
of the more-complicated cases. The runtime, of course, is also
responsible for reporting errors when those are detected.

The runtime uses essentially the same shadow memory region as tsan, and
we use 8 bytes of shadow memory, the size of the pointer to the type
descriptor, for every byte of accessed data in the program. The value 0
is used to represent an unknown type. The value -1 is used to represent
an interior byte (a byte that is part of a type, but not the first
byte). The instrumentation first checks for an exact match between the
type of the current access and the type for that address recorded in the
shadow memory. If it matches, it then checks the shadow for the
remainder of the bytes in the type to make sure that they're all -1. If
not, we call the runtime. If the exact match fails, we next check if the
value is 0 (i.e. unknown). If it is, then we check the shadow for the
remainder of the bytes in the type (to make sure they're all 0). If
they're not, we call the runtime. We then set the shadow for the access
address and set the shadow for the remaining bytes in the type to -1
(i.e. marking them as interior bytes). If the type indicated by the
shadow memory for the access address is neither an exact match nor 0, we
call the runtime.
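
The fast-path check described in this paragraph can be sketched as below. This is a model under stated assumptions, not the instrumentation itself: shadow memory is a plain vector with one entry per data byte, type descriptors are arbitrary non-zero integers, and `ToyResult`, `checkAccess`, and the other names are hypothetical. The sketch also assumes the shadow is only written when the whole access range was unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using ToyDescriptor = std::intptr_t;
constexpr ToyDescriptor Unknown = 0;   // no type recorded for this byte
constexpr ToyDescriptor Interior = -1; // non-first byte of a typed object

enum class ToyResult { Match, Initialized, CallRuntime };

// True if every shadow entry in [Begin, End) equals Expected.
static bool allEqual(const std::vector<ToyDescriptor> &Shadow,
                     std::size_t Begin, std::size_t End,
                     ToyDescriptor Expected) {
  for (std::size_t I = Begin; I < End; ++I)
    if (Shadow[I] != Expected)
      return false;
  return true;
}

ToyResult checkAccess(std::vector<ToyDescriptor> &Shadow, std::size_t Addr,
                      std::size_t Size, ToyDescriptor Type) {
  assert(Size > 0 && Addr + Size <= Shadow.size());
  assert(Type != Unknown && Type != Interior);

  ToyDescriptor First = Shadow[Addr];
  if (First == Type) // exact match: remaining bytes must all be interior
    return allEqual(Shadow, Addr + 1, Addr + Size, Interior)
               ? ToyResult::Match
               : ToyResult::CallRuntime;

  if (First == Unknown) { // unknown: remaining bytes must also be unknown
    if (!allEqual(Shadow, Addr + 1, Addr + Size, Unknown))
      return ToyResult::CallRuntime;
    Shadow[Addr] = Type; // record the type at the first byte
    for (std::size_t I = Addr + 1; I < Addr + Size; ++I)
      Shadow[I] = Interior; // mark the rest as interior bytes
    return ToyResult::Initialized;
  }

  return ToyResult::CallRuntime; // neither exact match nor unknown
}
```

In this model `CallRuntime` stands for every case deferred to the full TBAA check; only the exact-match and all-unknown paths are resolved inline.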

The instrumentation pass inserts calls to the memset intrinsic to set
the memory updated by memset, memcpy, and memmove, as well as
allocas/byval (and for lifetime.start/end) to reset the shadow memory to
reflect that the type is now unknown. The runtime intercepts memset,
memcpy, etc. to perform the same function for the library calls.

The runtime essentially repeats these checks, but uses the full TBAA
algorithm, just as the compiler does, to determine when two types are
permitted to alias. In a situation where access overlap has occurred and
aliasing is not permitted, an error is generated.

Clang's TBAA representation currently has a problem representing unions,
as demonstrated by the one XFAIL'd test in the runtime patch. We'll
update the TBAA representation to fix this, and at the same time, update
the sanitizer.

When the sanitizer is active, we disable actually using the TBAA
metadata for AA. This way we're less likely to use TBAA to remove memory
accesses that we'd like to verify.

As a note, this implementation does not use the compressed shadow-memory
scheme discussed previously
(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). That
scheme would not handle the struct-path (i.e. structure offset)
information that our TBAA represents. I expect we'll want to further
work on compressing the shadow-memory representation, but I think it
makes sense to do that as follow-up work.

It goes together with the corresponding clang changes
(https://github.com/llvm/llvm-project/pull/76260) and compiler-rt
changes (https://github.com/llvm/llvm-project/pull/76261)

PR: https://github.com/llvm/llvm-project/pull/76259
2024-12-17 13:57:34 +00:00
Florian Hahn
8ea9576d94
[SCEV] Add initial matchers for SCEV expressions. (NFC) (#119390)
This patch adds initial matchers for unary and binary SCEV expressions 
and specializes it for SExt, ZExt and binary add expressions.

Also adds matchers for SCEVConstant and SCEVUnknown.

This patch only converts a few instances to use the new matchers to make
sure everything builds as expected for now.

The goal of the matchers is to hopefully make it slightly easier to
write code matching SCEV patterns.

Depends on https://github.com/llvm/llvm-project/pull/119389

PR: https://github.com/llvm/llvm-project/pull/119390
2024-12-17 12:12:56 +00:00
SpencerAbson
908e30658d
[AArch64] Implement intrinsics for FP8 SME FMLAL/FMLALL (multi) (#119546)
This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long (multiple vectors).

``` c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn, svmfloat8x2_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");

void svmla_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn, svmfloat8x4_t zm,
                                fpm_t fpm) __arm_streaming __arm_inout("za");                              
```

In accordance with https://github.com/ARM-software/acle/pull/323
2024-12-17 11:47:20 +00:00
Lang Hames
24c2744a18 [ORC] Fix LazyReexports resource key management.
Multiple reentry points may be associated with a single key.
2024-12-17 22:38:15 +11:00
Benjamin Maxwell
a7dafea384
[SDAG] Allow folding stack slots into sincos/frexp in more cases (#118117)
This adds a new helper `canFoldStoreIntoLibCallOutputPointers()` to
check that it is safe to fold a store into a node that will expand to a
library call that takes output pointers. This requires checking for two
(independent) properties:

1. The store is not within a CALLSEQ_START..CALLSEQ_END pair
* If it is, the expansion would lead to nested call sequences (which is
invalid)
2. The node does not appear as a predecessor to the store
* If it does, attempting to merge the store into the call would result
in a cycle in the DAG

These two properties are checked as part of the same traversal in
`canFoldStoreIntoLibCallOutputPointers()`
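
The two properties can be sketched on a toy DAG. This is a simplified illustration, not the helper added by the patch: `ToyDagNode`, `ToyOpcode`, and `canFoldStore` are hypothetical, and the sketch assumes that reaching a CALLSEQ_START before its CALLSEQ_END while walking predecessors means the store sits inside an open call sequence. Completed call sequences are deferred and then searched only for the cycle check.

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class ToyOpcode { Other, CallSeqStart, CallSeqEnd };

// Hypothetical DAG node; Operands are its predecessors.
struct ToyDagNode {
  ToyOpcode Opcode = ToyOpcode::Other;
  std::vector<const ToyDagNode *> Operands;
};

bool canFoldStore(const ToyDagNode &Store, const ToyDagNode &LibCallNode) {
  assert(&Store != &LibCallNode);
  std::vector<const ToyDagNode *> Worklist, Deferred;
  std::set<const ToyDagNode *> Visited;

  // Skip the direct Store -> LibCallNode edge: that is the use being folded.
  for (const ToyDagNode *Op : Store.Operands)
    if (Op != &LibCallNode)
      Worklist.push_back(Op);

  while (!Worklist.empty()) {
    const ToyDagNode *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue;
    if (N == &LibCallNode)
      return false; // property 2: folding would create a cycle
    if (N->Opcode == ToyOpcode::CallSeqStart)
      return false; // property 1: store is inside a call sequence
    if (N->Opcode == ToyOpcode::CallSeqEnd) {
      Deferred.push_back(N); // completed sequence: look through it later
      continue;
    }
    for (const ToyDagNode *Op : N->Operands)
      Worklist.push_back(Op);
  }

  // Behind completed call sequences only the cycle check still applies.
  while (!Deferred.empty()) {
    const ToyDagNode *N = Deferred.back();
    Deferred.pop_back();
    for (const ToyDagNode *Op : N->Operands) {
      if (Op == &LibCallNode)
        return false;
      if (Visited.insert(Op).second)
        Deferred.push_back(Op);
    }
  }
  return true;
}
```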
2024-12-17 10:54:17 +00:00
Matt Arsenault
10b12e6e07
LiveVariables: Use Register (#120204) 2024-12-17 17:45:24 +07:00
SpencerAbson
9c89b40f18
[AArch64] Implement intrinsics for FMLAL/FMLALL (single) (#119568)
Multi-vector 8-bit floating-point multiply-add long (single)
```c
// Only if __ARM_FEATURE_SME_F8F16 != 0
void svmla[_single]_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");
// Only if __ARM_FEATURE_SME_F8F32 != 0
void svmla[_single]_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");

void svmla[_single]_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                         svmfloat8_t zm, fpm_t fpm)
                                         __arm_streaming __arm_inout("za");
```
In accordance with https://github.com/ARM-software/acle/pull/323.

Co-authored-by: Momchil Velikov <momchil.velikov@arm.com>
2024-12-17 09:31:54 +00:00
Lang Hames
300deebf41 [ORC] Make LazyReexportsManager implement ResourceManager.
This ensures that the reexports mappings are cleared when the resource tracker
associated with each mapping is removed.
2024-12-17 18:45:16 +11:00
Lang Hames
b3d2548d5b
[ORC] Introduce LinkGraphLayer interface and LinkGraphLinkingLayer. (#120182)
Introduces a new layer interface, LinkGraphLayer, that can be used to
add LinkGraphs to an ExecutionSession.

This patch moves most of ObjectLinkingLayer's functionality into a new
LinkGraphLinkingLayer which should (in the future) be able to be used
without linking libObject. ObjectLinkingLayer now inherits from
LinkGraphLinkingLayer and just handles conversion of object files to
LinkGraphs, which are then handed down to LinkGraphLinkingLayer to be
linked.
2024-12-17 17:18:58 +11:00
Ashley Coleman
41a6e9cfd6
[HLSL] Implement WaveActiveAllTrue Intrinsic (#117245)
Resolves https://github.com/llvm/llvm-project/issues/99161

- [x]  Implement `WaveActiveAllTrue` clang builtin,
- [x]  Link `WaveActiveAllTrue` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveAllTrue` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveAllTrue` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveAllTrue.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveAllTrue-errors.hlsl`
- [x] Create the `int_dx_WaveActiveAllTrue` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveAllTrue` to `114`
in `DXIL.td`
- [x] Create the `WaveActiveAllTrue.ll` and
`WaveActiveAllTrue_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveAllTrue` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveAllTrue`
lowering and map it to `int_spv_WaveActiveAllTrue` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveAllTrue.ll`
2024-12-16 16:13:35 -08:00
Justin Bogner
482237e884
[DirectX] Get resource information via TargetExtType (#119772)
Instead of storing an auxiliary structure with the information from the
DXIL resource target extension types duplicated, access the information
that we can via the type itself.

This also means we need to handle some of the target extension types we
haven't fully defined yet, like Texture and CBuffer. For now we make an
educated guess to what those should look like based on llvm/wg-hlsl#76,
and we can update them fairly easily when we've defined them more
thoroughly.

First part of #118400
2024-12-16 16:04:25 -07:00
Artem Pianykh
8402a0fab0
[NFC][Utils] Extract CloneFunctionBodyInto from CloneFunctionInto (#118624)
Summary:
This and previously extracted `CloneFunction*Into` functions will be used in later diffs.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-16 22:30:56 +00:00
SpencerAbson
38099d0608
[AArch64] Implement intrinsics for SME FP8 FMLAL/FMLALL (Indexed) (#118549)
This patch implements the following intrinsics:

Multi-vector 8-bit floating-point multiply-add long.
``` c
  // Only if __ARM_FEATURE_SME_F8F16 != 0
  void svmla_lane_za16[_mf8]_vg2x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)  __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)  __arm_streaming __arm_inout("za");

  void svmla_lane_za16[_mf8]_vg2x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm) __arm_streaming __arm_inout("za");

// Only if __ARM_FEATURE_SME_F8F32 != 0
  void svmla_lane_za32[_mf8]_vg4x1_fpm(uint32_t slice, svmfloat8_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x2_fpm(uint32_t slice, svmfloat8x2_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");

  void svmla_lane_za32[_mf8]_vg4x4_fpm(uint32_t slice, svmfloat8x4_t zn,
                                       svmfloat8_t zm, uint64_t imm_idx,
                                       fpm_t fpm)__arm_streaming __arm_inout("za");
```
In accordance with: https://github.com/ARM-software/acle/pull/323
2024-12-16 21:45:38 +00:00
Jay Foad
2fe2969659
[CodeGen] Simplify LLT bitfields. NFC. (#120074)
- Put the element size field in the same place for all non-pointer
  types.
- Put the element size and address space fields in the same place for
  all pointer types.
- Put the number of elements and scalable fields in the same place for
  all vector types.

This simplifies initialization and accessor methods isScalable,
getElementCount, getScalarSizeInBits and getAddressSpace.
2024-12-16 21:23:33 +00:00
vporpo
3769fcb3e7
[SandboxVec][Interval] Implement Interval::notifyMoveInstr() (#119471)
This patch implements the notifier for Instruction intervals. It updates
the interval's top/bottom.
2024-12-16 12:59:24 -08:00
Artem Pianykh
a9237b1a10
[NFC][Utils] Extract CloneFunctionMetadataInto from CloneFunctionInto (#118623)
Summary:
The new API expects the caller to populate the VMap. We need it this way
for a subsequent change around coroutine cloning.

Test Plan:
ninja check-llvm-unit check-llvm
2024-12-16 20:50:05 +00:00
Ramkumar Ramachandra
290f38cd1a
IR: fix getSwappedCmpPredicate() return type (#120097)
The change 51a895a (IR: introduce struct with CmpInst::Predicate and
samesign) missed a change to ICmpInst::getSwappedCmpPredicate(), which
intends to return a CmpPredicate, but returns a Predicate instead. Fix
this.
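
Why the return type matters can be shown with a toy model. This is a sketch under assumptions, not the LLVM classes: `ToyCmpPredicate` and `ToyPredicate` are hypothetical, and the sketch assumes the richer type converts implicitly to the bare predicate, which silently drops the samesign flag.

```cpp
#include <cassert>

enum class ToyPredicate { SLT, SGT, EQ };

// Hypothetical predicate-plus-flag wrapper.
class ToyCmpPredicate {
  ToyPredicate Pred;
  bool SameSign;

public:
  ToyCmpPredicate(ToyPredicate P, bool S = false) : Pred(P), SameSign(S) {}
  operator ToyPredicate() const { return Pred; } // implicit; drops the flag
  bool hasSameSign() const { return SameSign; }
};

// Predicate that results from swapping the two compare operands.
ToyPredicate swapOperands(ToyPredicate P) {
  switch (P) {
  case ToyPredicate::SLT:
    return ToyPredicate::SGT;
  case ToyPredicate::SGT:
    return ToyPredicate::SLT;
  case ToyPredicate::EQ:
    return ToyPredicate::EQ;
  }
  assert(false && "unhandled predicate");
  return P;
}

// Narrow return type: compiles, but the flag is lost on the way out.
ToyPredicate getSwappedLossy(ToyCmpPredicate P) { return swapOperands(P); }

// Return type matches the intent: the flag travels with the predicate.
ToyCmpPredicate getSwapped(ToyCmpPredicate P) {
  return ToyCmpPredicate(swapOperands(P), P.hasSameSign());
}
```

Both functions compile and return the same swapped predicate; only the second keeps the flag, which is why a mismatch like this can go unnoticed.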
2024-12-16 16:29:21 +00:00