57952 Commits

Author SHA1 Message Date
Craig Topper
dd3edc8365
[CodeGen] Add Register::stackSlotIndex(). Replace uses of Register::stackSlot2Index. NFC (#125028) 2025-01-29 23:02:07 -08:00
Lang Hames
b1bd73700a [ORC] Add missing files from d6524c8dfa3. 2025-01-30 13:48:08 +11:00
Lang Hames
d6524c8dfa Reapply "[ORC] Enable JIT support for the compact-unwind frame..." with fixes.
This reapplies 4f0325873fa (and follow up patches 26fc07d5d88, a001cc0e6cdc,
c9bc242e387, and fd174f0ff3e), which were reverted in 212cdc9a377 to
investigate bot failures (e.g.
https://lab.llvm.org/buildbot/#/builders/108/builds/8502)

The fix to address the bot failures was landed in d0052ebbe2e. This patch also
restricts construction of the UnwindInfoManager object to Apple platforms (as
it won't be used on other platforms).
2025-01-30 13:42:10 +11:00
Alex MacLean
de7438e472
[NVPTX] Auto-Upgrade some nvvm.annotations to attributes (#119261)
Add a new AutoUpgrade function to convert some legacy nvvm.annotations
metadata to function level attributes. These attributes are quicker to
look-up so improve compile time and are more idiomatic than using
metadata which should not include required information that changes the
meaning of the program.

Currently supported annotations are:

- !"kernel" -> ptx_kernel calling convention
- !"align" -> alignstack parameter attributes (return not yet supported)
2025-01-29 16:27:27 -08:00
vporpo
e094c0fa67
[SandboxVec][Legality] Don't vectorize when instructions repeat (#124479)
This patch adds a legality check that checks for repeated instrs in a
bundle and won't vectorize if such pattern is found.
2025-01-29 15:54:15 -08:00
Krzysztof Parzyszek
15ab7be2e0
[flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (#121817)
Parse METADIRECTIVE as a standalone executable directive at the moment.
This will allow testing the parser code.

There is no lowering, not even clause conversion yet. There is also no
verification of the allowed values for trait sets, trait properties.
2025-01-29 15:07:20 -06:00
Axel Sorenson
d3161defd6
[PassBuilder] VectorizerEnd Extension Points (#123494)
Added an extension point after vectorizer passes in the PassBuilder.
Additionally, added extension points before and after vectorizer passes
in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me
through my first LLVM contribution (and my first open source
contribution in general!) :)
- Implemented `registerVectorizerEndEPCallback`
- Implemented `invokeVectorizerEndEPCallbacks`
- Added `VectorizerEndEPCallbacks` SmallVector
- Added a command line option `passes-ep-vectorizer-end` to
`NewPMDriver.cpp`
- `buildModuleOptimizationPipeline` now calls
`invokeVectorizerEndEPCallbacks`
- `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks`
- `buildLTODefaultPipeline` now calls BOTH
`invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks`
- Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`,
`new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll`
- Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in
`new-pm-lto-defaults.ll` for consistency.

This code is intended for developers that wish to implement and run
custom passes after the vectorizer passes in the PassBuilder pipeline.
For example, in #91796, a pass was created that changed the induction
variables of vectorized code. This is right after the vectorization
passes.
2025-01-29 11:24:03 -08:00
Teresa Johnson
ae6d5dd58b
[MemProf] Prune unneeded non-cold contexts (#124823)
We can take advantage of the fact that we subsequently only clone cold
allocation contexts, since not cold behavior is the default, and
significantly reduce the amount of metadata (and later ThinLTO summary
and MemProfContextDisambiguation graph nodes) by pruning unnecessary not
cold contexts when building metadata from the trie.

Specifically, we only need to keep notcold contexts that overlap the
longest with cold allocations, to know how deeply to clone those
contexts to expose the cold allocation behavior.

For a large target this reduced ThinLTO bitcode object sizes by about
35%. It reduced the ThinLTO indexing time by about half and the peak
ThinLTO indexing memory by about 20%.
2025-01-29 10:38:31 -08:00
Joel E. Denny
18f8106f31
[KernelInfo] Implement new LLVM IR pass for GPU code analysis (#102944)
This patch implements an LLVM IR pass, named kernel-info, that reports
various statistics for codes compiled for GPUs. The ultimate goal of
these statistics to help identify bad code patterns and ways to mitigate
them. The pass operates at the LLVM IR level so that it can, in theory,
support any LLVM-based compiler for programming languages supporting
GPUs. It has been tested so far with LLVM IR generated by Clang for
OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs.

By default, the pass runs at the end of LTO, and options like
``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang`
command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include
summary statistics (e.g., total size of static allocas) and individual
occurrences (e.g., source location of each alloca). Examples of its
output appear in tests in `llvm/test/Analysis/KernelInfo`.
2025-01-29 12:40:19 -05:00
Nikita Popov
29441e4f5f
[IR] Convert from nocapture to captures(none) (#123181)
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.

Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
   reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
   make it easier to use old IR files and somewhat reduce the test churn in
   this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
   attribute. The representation in the LLVM IR dialect should be updated
   separately.
2025-01-29 16:56:47 +01:00
Mikhail Gudim
3c3c850a45
[ReachingDefAnalysis] Extend the analysis to stack objects. (#118097)
We track definitions of stack objects, the implementation is identical
to tracking of registers.

Also, added printing of all found reaching definitions for testing
purposes.

---------

Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
2025-01-29 10:55:16 -05:00
Yingwei Zheng
f226cabbb1
[ValueTracking] Handle nonnull attributes at callsite (#124908)
Alive2: https://alive2.llvm.org/ce/z/yJfskv
Closes https://github.com/llvm/llvm-project/issues/124540.
2025-01-29 23:14:36 +08:00
Acim Maravic
3a29dfe37c
[LLVM][AMDGPU] Add Intrinsic and Builtin for ds_bpermute_fi_b32 (#124616) 2025-01-29 14:04:10 +01:00
Mingming Liu
3feb724496
[AsmPrinter][ELF] Support profile-guided section prefix for jump tables' (read-only) data sections (#122215)
https://github.com/llvm/llvm-project/pull/122183 adds a codegen pass to
infer machine jump table entry's hotness from the MBB hotness. This is a
follow-up PR to produce `.hot` and or `.unlikely` section prefix for
jump table's (read-only) data sections in the relocatable `.o` files.

When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`,
`.rodata.unlikely`} in input sections. It can map `.rodata.hot` and
`.rodata` in the input sections to `.rodata.hot` in the executable, and
map `.rodata.unlikely` into `.rodata` with a pending extension to
`--keep-text-section-prefix` like
059e7cbb66,
or with a linker script.

1. To partition hot and jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group.

2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement
`TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on
MJTE's hotness.
    - The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does.

3. The new code path is off by default.
    - Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it
readable through `TargetMachine`.
    - To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in
`CodeGen/CommandFlags.h/cpp`.
    -  A subsequent patch
([draft](8f36a13743)) will add a clang option to enable the new code path.

---------

Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>
2025-01-28 22:49:28 -08:00
Ellis Hoag
9ec4f474c0
[GlobPattern] Fix doxygen docs (#124361)
Fix the docs for the `GlobPattern` class.
https://llvm.org/docs/doxygen/classllvm_1_1GlobPattern.html

I've tried to fix this a few times, but I finally got around to building
`doxygen-llvm` to verify that my fixed worked.

For some reason, `\p` does not work well with `/`. Instead I'm using
<code>`</code> to make a codespan, which is more correct anyway.
https://www.doxygen.nl/manual/markdown.html#md_codespan
2025-01-28 21:03:56 -08:00
vporpo
79cbad188a
[SandboxVec] Clear Context's state within runOnFunction() (#124842)
`sandboxir::Context` is defined at a pass-level scope with the
`SandboxVectorizerPass` class because the function pass manager `FPM`
object depends on it, and that is in pass-level scope to avoid
recreating the pass pipeline every single time `runOnFunction()` is
called.

This means that the Context's state lives on across function passes. The
problem is twofold:
(i) the LLVM IR to Sandbox IR map can grow very large including objects
from different functions, which is of no use to the vectorizer, as it's
a function-level pass.
(ii) this can result in stale data in the LLVM IR to Sandbox IR object
map, as other passes may delete LLVM IR objects.

To fix both issues this patch introduces a `Context::clear()` function
that clears the `LLVMValueToValueMap`.
2025-01-28 18:28:08 -08:00
Prabhuk
e7f02241ad
[nfc][llvm] Clean up isUEFI checks (#124845)
The check for `isOSWindows() || isUEFI()` is used in several places
across the codebase. Introducing `isOSWindowsOrUEFI()` in Triple.h
to simplify these checks.
2025-01-28 15:18:10 -08:00
Jeremy Morse
48df9480da
[NFC] Suppress spurious deprecation warning with MSVC (#124764)
gcc and clang won't complain about calls to deprecated functions, if
you're calling from a function that is deprecated too. However, MSVC
does care, and expands into maaany deprecation warnings for
getFirstNonPHI.

Suppress this by converting the inlineable copy of getFirstNonPHI into a
non-inline copy.
2025-01-28 15:55:45 +00:00
Jeremy Morse
79499f010d [NFC][DebugInfo] Deprecate iterator-taking moveBefore and getFirstNonPHI (#124290)
The RemoveDIs project [0] makes debug intrinsics obsolete and to support
this instruction iterators carry an extra bit of debug information. To
maintain debug information accuracy insertion needs to be performed with a
BasicBlock::iterator rather than with Instruction pointers, otherwise the
extra bit of debug information is lost.

To that end, we're deprecating getFirstNonPHI and moveBefore for
instruction pointers. They're replaced by getFirstNonPHIIt and an
iterator-taking moveBefore: switching to the replacement is
straightforwards, and 99% of call-sites need only to unwrap the iterator
with &* or call getIterator() on an Instruction pointer.

The exception is when inserting instructions at the start of a block: if
you call getFirstNonPHI() (or begin() or getFirstInsertionPt()) and then
insert something at that position, you must pass the BasicBlock::iterator
returned into the insertion method. Unwrapping with &* and then calling
getIterator strips the debug-info bit we wish to preserve. Please do
contact us about any use case that's confusing or unclear [1].

[0] https://llvm.org/docs/RemoveDIsDebugInfo.html
[1] https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578
2025-01-28 14:02:24 +00:00
Joseph Huber
13dcc95dcd
[Offload] Rework offloading entry type to be more generic (#124018)
Summary:
The previous offloading entry type did not fit the current use-cases
very well. This widens it and adds a version to prevent further
annoyances. It also includes the kind to better sort who's using it.

The first 64-bytes are reserved as zero so the OpenMP runtime can detect
the old format for binary compatibilitry.
2025-01-28 07:26:13 -06:00
Bushev Dmitry
0165e3346f
[llvm][Object] Add missing const qualifier for value_type in content_iterator (#124106)
value_type was defined as non-const for content_iterator, although it's
methods returned a const pointers/references. This prevented it from
using in some algorithms from STLExtras.h
2025-01-28 13:22:48 +03:00
SivanShani-Arm
de4bbbfdcc
[Build Attributes] Standardize names according to convention. (#124556)
The de-facto convention for build attributes file and class names seems
to be 'Attrs' in the class name and 'Attributes' in the file name.

Accordingly, change file ARMBuildAttrs.cpp -> ARMBuildAttributes.cpp And
class AArch64BuildAttrs --> AArch64BuildAttributes
2025-01-28 09:53:45 +00:00
Chandler Carruth
f4de28a63c
[StrTable] Switch intrinsics to StringTable and work around MSVC (#123548)
Historically, the main example of *very* large string tables used the
`EmitCharArray` to work around MSVC limitations with string literals,
but that was switched (without removing the API) in order to consolidate
on a nicer emission primitive.

While this large string table in `IntrinsicsImpl.inc` seems to compile
correctly on MSVC without the work around in `EmitCharArray` (and that
this PR adds back to the nicer emission path), other users have
repeatedly hit this MSVC limitation as you can see in the discussion on
PR https://github.com/llvm/llvm-project/pull/120534. This PR teaches the
string offset table emission to look at
the size of the table and switch to the char array emission strategy
when the table becomes too large.

This work around does have the downside of making compile times worse
for large string tables, but that appears unavoidable until we can
identify known good MSVC versions and switch to requiring them for all
LLVM users. It also reduces searchability of the generated string table
-- I looked at emitting a comment with each string but it is tricky
because the escaping rules for an inline comment are different from
those of of a string literal, and there's no real way to turn the string
literal into a comment.

While improving the output in this way, also clean up the output to not
emit an extraneous empty string at the end of the string table, and
update the `StringTable` class to not look for that. It isn't actually
used by anything and is wasteful.

This PR also switches the `IntrinsicsImpl.inc` string tables over to the
new `StringTable` runtime abstraction. I didn't want to do this until
landing the MSVC workaround in case it caused even this example to start
hitting the MSVC bug, but I wanted to switch here so that I could
simplify the API for emitting the string table with the workaround
present. With the two different emission strategies, its important to
use a very exact syntax and that seems better encapsulated in the API.

Last but not least, the `SDNodeInfoEmitter` is updated, including its
tests to match the new output.

This PR should unblock landing
https://github.com/llvm/llvm-project/pull/120534 and letting us switch
all of
Clang's builtins to use string tables. That PR has all the details
motivating the overall effort.

Follow-up patches will try to consolidate the remaining users onto the
single interface, but those at least were easy to separate into
follow-ups and keep this PR somewhat smaller.
2025-01-28 00:17:04 -08:00
Adam Yang
aab25f20f6
[HLSL][SPIRV][DXIL] Implement WaveActiveMax intrinsic (#123428)
```    - add clang builtin to Builtins.td
      - link builtin in hlsl_intrinsics
      - add codegen for spirv intrinsic and two directx intrinsics to retain
        signedness information of the operands in CGBuiltin.cpp
      - add semantic analysis in SemaHLSL.cpp
      - add lowering of spirv intrinsic to spirv backend in
        SPIRVInstructionSelector.cpp
      - add lowering of directx intrinsics to WaveActiveOp dxil op in
    DXIL.td

      - add test cases to illustrate passespendent pr merges.
```
Resolves #99170
2025-01-27 23:26:56 -08:00
Thurston Dang
fa9ac62d02
[ubsan] Parse and use <cutoffs[0,1,2]=70000;cutoffs[5,6,8]=90000> in LowerAllowCheckPass (#124211)
This adds and utilizes a cutoffs parameter for LowerAllowCheckPass, via the Options parameter (introduced in https://github.com/llvm/llvm-project/pull/122994).

Future work will connect -fsanitize-skip-hot-cutoff (introduced patch in https://github.com/llvm/llvm-project/pull/121619) in the clang frontend to the cutoffs parameter used here.
2025-01-27 20:08:53 -08:00
Craig Topper
d839e765f0 [TargetLowering] Inline the only caller of one of the forceExpandWideMUL functions. NFC
This caller does not need the libcall portion so it can directly
call forceExpandMultiply.
2025-01-27 17:10:37 -08:00
Momchil Velikov
f75860f895
[AArch64] Implement NEON FP8 intrinsics for fused multiply-add (#123615)
This patch adds the following intrinsics:

* Fused multiply-add non-indexed

float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t,
fpm_t)
float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t,
fpm_t)
        
float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t,
mfloat8x16_t, fpm_t)
float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t,
mfloat8x16_t, fpm_t)
float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t,
mfloat8x16_t, fpm_t)
float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t,
mfloat8x16_t, fpm_t)

* Floating-point multiply-add long to half-precision (vector, by
element)

float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
    
* Floating-point multiply-add long-long to single-precision (vector, by
element)

float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
2025-01-28 00:38:44 +00:00
Mingming Liu
e98b2028c7
[NFCI]Refactor AsmPrinter around jump table emission (#124645)
Add method `AsmPrinter::emitJumpTableImpl`. It takes an array-ref of jump table indices.

This splits refactor of PR https://github.com/llvm/llvm-project/pull/122215
2025-01-27 15:29:38 -08:00
Rahul Joshi
aca08a8515
[TableGen] Add assert to validate Objects list for HwModeSelect (#123794)
- Bail out of TableGen if any asserts fail before running the backend. 
- Add asserts to validate that the `Objects` and `Modes` lists for
various `HwModeSelect` subclasses are of same length.
 - Eliminate equivalent check in CodeGenHWModes.cpp
2025-01-27 13:44:44 -08:00
Florian Hahn
ad9da92cf6
[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462)
Add an extra knob to RuntimeUnrollMultiExit to let backends control
whether to allow multi-exit unrolling on a per-loop basis.

This gives backends more fine-grained control on deciding if multi-exit
unrolling is profitable for a given loop and uarch. Similar to
4226e0a0c75.

PR: https://github.com/llvm/llvm-project/pull/124462
2025-01-27 21:20:04 +00:00
Momchil Velikov
804b81d39f
[AArch64] Add FP8 Neon intrinsics for dot-product (#123613)
This patch adds the following intrinsics:

float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t
vm, fpm_t fpm)
float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, fpm_t fpm)
    
float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
    
float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t
vm, fpm_t fpm)
float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, fpm_t fpm)

float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn,
mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
2025-01-27 21:14:16 +00:00
Momchil Velikov
99bd2e3f12
[AArch64] Add Neon FP8 conversion intrinsics (#123612)
The patch adds the following intrinsics:

    bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
    float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
    float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
    
mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm)
mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn,
float32x4_t vm, fpm_t fpm)
    
mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm)
mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t
fpm)

Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>
2025-01-27 17:32:47 +00:00
Jeremy Morse
34b139594a
[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283)
To finalise the "RemoveDIs" work removing debug intrinsics, we're
updating call sites that insert instructions to use iterators instead.
This set of changes are those where it's not immediately obvious that
just calling getIterator to fetch an iterator is correct, and one or two
places where more than one line needs to change.

Overall the same rule holds though: iterators generated for the start of
a block such as getFirstNonPHIIt need to be passed into insert/move
methods without being unwrapped/rewrapped, everything else can use
getIterator.
2025-01-27 16:44:14 +00:00
Jeremy Morse
81d18ad864
[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators. A
number of these (such as getFirstNonPHIOrDbg) are sufficiently
infrequently used that we can just replace the pointer-returning version
with an iterator-returning version, hopefully without much/any
disruption.

Thus this patch has getFirstNonPHIOrDbg and
getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all
call-sites. There are no concerns about the iterators returned being
converted to Instruction*'s and losing the debug-info bit: because the
methods skip debug intrinsics, the iterator head bit is always false
anyway.
2025-01-27 16:27:54 +00:00
ssijaric-nv
16e9601e19
[Flang] Adjust the trampoline size for AArch64 and PPC (#118678)
Set  the trampoline size to match that in compiler-rt/lib/builtins/trampoline_setup.c
and AArch64 and PPC lowering.
2025-01-27 08:02:18 -08:00
Jeremy Morse
e14962a39c
[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
2025-01-27 15:25:17 +00:00
David Sherwood
b7286dbef9
Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752" (#123616)
The last attempt failed a sanitiser build because we were
creating a reference to a null Predicates pointer in
isDereferenceableAndAlignedInLoop. This was exposed by
the unit test IsDerefReadOnlyLoop in
unittests/Analysis/LoadsTest.cpp. I fixed this by falling
back on getConstantMaxBackedgeTakenCount if Predicates is
null - see line 316 in llvm/lib/Analysis/Loads.cpp. There
are no other changes.
2025-01-27 11:59:38 +00:00
Durgadoss R
3b5e9eed2f
[NVPTX] Add float to tf32 conversion intrinsics (#124316)
This patch adds the set of f32 -> tf32 cvt intrinsics introduced
in sm100 with ptx8.6. This builds on top of the recent PR #121507.

Tests are verified with a 12.8 ptxas executable.

PTX ISA link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2025-01-27 15:52:43 +05:30
Jay Foad
0c784851c5
[MathExtras] Favor using the hexadecimal FP constants (#123180)
This just fixes a TODO now that we are using C++17.
2025-01-26 15:32:46 +00:00
Craig Topper
37fdde6025 [CodeGen] Remove implict conversions from Register to unsigned from MachineOperand. NFC 2025-01-25 23:12:14 -08:00
Craig Topper
4bcd8184a0
[TargetLowering] Pull similar code out of the forceExpandWideMUL into a helper. NFC (#124371)
These functions have similar code. One of them calculates the 2x width
full product from 2 sources. The other calculates the product from 2
sources that have low and high halves.

This patch introduces a new function that takes HiLHS and HiRHS as
optional values. If they are not null, they will be used in the
calculation of the Hi half. The Signed flag can only be set when
HiLHS/HiRHS are null.
2025-01-25 10:53:01 -08:00
vporpo
5cb2db3b51
[SandboxVec][Scheduler] Forbid crossing BBs (#124369)
This patch updates the scheduler to forbid scheduling across BBs. It
should eventually be able to handle this, but we disable it for now.
2025-01-25 08:19:27 -08:00
Craig Topper
ac1ba1f9dd
[CodeGen] Introduce a VirtRegOrUnit class to hold virtual reg or physical reg unit. NFC (#123768)
LiveIntervals and MachineVerifier were previously using Register to
store this, but reg units are different than physical registers. One
important difference is that 0 is a valid reg unit number, but it is not
a valid phyiscal register.

This patch introduces a new VirtRegOrUnit class that is distinct from
Register. It can be be converted to/from a virtual Register or a
MCRegUnit. I've made all conversions explicit and used assertions to
check the validity.

I also fixed a place in MachineVerifier that was ignoring reg unit 0.
2025-01-24 18:30:28 -08:00
Teresa Johnson
c725a95e08
[MemProf] Convert Hot contexts to NotCold early (#124219)
While we convert hot contexts to notcold contexts during the cloning
step, their existence was greatly limiting the context trimming
performed when we add the MemProf profile to the IR. To address this,
any hot contexts are converted to notcold contexts immediately after
first checking for unambiguous allocation types, and before checking it
again and before adding metadata while performing context trimming.

Note that hot hints are now disabled by default, however, this avoids
adding unnecessary overhead if they are re-enabled.
2025-01-24 15:58:13 -08:00
vporpo
6409799bdc
[SandboxVec][Legality] Pack from different BBs (#124363)
When the inputs of the pack come from different BBs we need to make sure
we emit the pack instructions at the correct place.
2025-01-24 15:39:37 -08:00
vporpo
ac75d32280
[SandboxVec][VecUtils] Filter out instructions not in BB in VecUtils:getLowest() (#124360)
This patch changes the functionality of `VecUtils::getLowest(Vals, BB)`
such that it filters out any instructions in `Vals` that are not in BB.
This is useful when Vals contains instructions from different BBs,
because in that case we are only interested in one BB.
2025-01-24 14:52:57 -08:00
vporpo
cff7ad56ba
[SandboxVec][Utils] Implement Utils::verifyFunction() (#124356)
This patch implements a wrapper function for the LLVM IR verifier for
functions, and calls it (flag-guarded) within the bottom-up-vectorizer
for finding IR bugs as soon as they happen.
2025-01-24 14:35:20 -08:00
vporpo
4b209c5d87
[SandboxIR][Region] Add cost modeling to the region (#124354)
This patch implements cost modeling for Region. All instructions that
are added or removed get their cost counted in the Scoreboard. This is
used for checking if the region before or after a transformation is more
profitable.
2025-01-24 14:28:55 -08:00
vporpo
b41987beae
[SandboxVec][DAG] Fix MemDGNode chain maintenance when move destination is non-mem (#124227)
This patch fixes a bug in the maintenance of the MemDGNode chain of the
DAG. Whenever we move a memory instruction, the DAG gets notified about
the move and maintains the chain of memory nodes. The bug was that if
the destination of the move was not a memory instruction, then the
memory node's next node would end up pointing to itself.
2025-01-24 13:59:32 -08:00
Stephen Long
ab976a1712
PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568) 2025-01-24 14:02:06 -05:00