llvm-project

Author	SHA1	Message	Date
Craig Topper	dd3edc8365	[CodeGen] Add Register::stackSlotIndex(). Replace uses of Register::stackSlot2Index. NFC (#125028 )	2025-01-29 23:02:07 -08:00
Lang Hames	b1bd73700a	[ORC] Add missing files from d6524c8dfa3.	2025-01-30 13:48:08 +11:00
Lang Hames	d6524c8dfa	Reapply "[ORC] Enable JIT support for the compact-unwind frame..." with fixes. This reapplies 4f0325873fa (and follow up patches 26fc07d5d88, a001cc0e6cdc, c9bc242e387, and fd174f0ff3e), which were reverted in 212cdc9a377 to investigate bot failures (e.g. https://lab.llvm.org/buildbot/#/builders/108/builds/8502) The fix to address the bot failures was landed in d0052ebbe2e. This patch also restricts construction of the UnwindInfoManager object to Apple platforms (as it won't be used on other platforms).	2025-01-30 13:42:10 +11:00
Alex MacLean	de7438e472	[NVPTX] Auto-Upgrade some nvvm.annotations to attributes (#119261 ) Add a new AutoUpgrade function to convert some legacy nvvm.annotations metadata to function level attributes. These attributes are quicker to look-up so improve compile time and are more idiomatic than using metadata which should not include required information that changes the meaning of the program. Currently supported annotations are: - !"kernel" -> ptx_kernel calling convention - !"align" -> alignstack parameter attributes (return not yet supported)	2025-01-29 16:27:27 -08:00
vporpo	e094c0fa67	[SandboxVec][Legality] Don't vectorize when instructions repeat (#124479 ) This patch adds a legality check that checks for repeated instrs in a bundle and won't vectorize if such pattern is found.	2025-01-29 15:54:15 -08:00
Krzysztof Parzyszek	15ab7be2e0	[flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (#121817 ) Parse METADIRECTIVE as a standalone executable directive at the moment. This will allow testing the parser code. There is no lowering, not even clause conversion yet. There is also no verification of the allowed values for trait sets, trait properties.	2025-01-29 15:07:20 -06:00
Axel Sorenson	d3161defd6	[PassBuilder] VectorizerEnd Extension Points (#123494 ) Added an extension point after vectorizer passes in the PassBuilder. Additionally, added extension points before and after vectorizer passes in `buildLTODefaultPipeline`. Credit goes to @mshockwave for guiding me through my first LLVM contribution (and my first open source contribution in general!) :) - Implemented `registerVectorizerEndEPCallback` - Implemented `invokeVectorizerEndEPCallbacks` - Added `VectorizerEndEPCallbacks` SmallVector - Added a command line option `passes-ep-vectorizer-end` to `NewPMDriver.cpp` - `buildModuleOptimizationPipeline` now calls `invokeVectorizerEndEPCallbacks` - `buildO0DefaultPipeline` now calls `invokeVectorizerEndEPCallbacks` - `buildLTODefaultPipeline` now calls BOTH `invokeVectorizerStartEPCallbacks` and `invokeVectorizerEndEPCallbacks` - Added LIT tests to `new-pm-defaults.ll`, `new-pm-lto-defaults.ll`, `new-pm-O0-ep-callbacks.ll`, and `pass-pipeline-parsing.ll` - Renamed `CHECK-EP-Peephole` to `CHECK-EP-PEEPHOLE` in `new-pm-lto-defaults.ll` for consistency. This code is intended for developers that wish to implement and run custom passes after the vectorizer passes in the PassBuilder pipeline. For example, in #91796, a pass was created that changed the induction variables of vectorized code. This is right after the vectorization passes.	2025-01-29 11:24:03 -08:00
Teresa Johnson	ae6d5dd58b	[MemProf] Prune unneeded non-cold contexts (#124823 ) We can take advantage of the fact that we subsequently only clone cold allocation contexts, since not cold behavior is the default, and significantly reduce the amount of metadata (and later ThinLTO summary and MemProfContextDisambiguation graph nodes) by pruning unnecessary not cold contexts when building metadata from the trie. Specifically, we only need to keep notcold contexts that overlap the longest with cold allocations, to know how deeply to clone those contexts to expose the cold allocation behavior. For a large target this reduced ThinLTO bitcode object sizes by about 35%. It reduced the ThinLTO indexing time by about half and the peak ThinLTO indexing memory by about 20%.	2025-01-29 10:38:31 -08:00
Joel E. Denny	18f8106f31	[KernelInfo] Implement new LLVM IR pass for GPU code analysis (#102944 ) This patch implements an LLVM IR pass, named kernel-info, that reports various statistics for codes compiled for GPUs. The ultimate goal of these statistics to help identify bad code patterns and ways to mitigate them. The pass operates at the LLVM IR level so that it can, in theory, support any LLVM-based compiler for programming languages supporting GPUs. It has been tested so far with LLVM IR generated by Clang for OpenMP offload codes targeting NVIDIA GPUs and AMD GPUs. By default, the pass runs at the end of LTO, and options like ``-Rpass=kernel-info`` enable its remarks. Example `opt` and `clang` command lines appear in `llvm/docs/KernelInfo.rst`. Remarks include summary statistics (e.g., total size of static allocas) and individual occurrences (e.g., source location of each alloca). Examples of its output appear in tests in `llvm/test/Analysis/KernelInfo`.	2025-01-29 12:40:19 -05:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Mikhail Gudim	3c3c850a45	[ReachingDefAnalysis] Extend the analysis to stack objects. (#118097 ) We track definitions of stack objects, the implementation is identical to tracking of registers. Also, added printing of all found reaching definitions for testing purposes. --------- Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>	2025-01-29 10:55:16 -05:00
Yingwei Zheng	f226cabbb1	[ValueTracking] Handle nonnull attributes at callsite (#124908 ) Alive2: https://alive2.llvm.org/ce/z/yJfskv Closes https://github.com/llvm/llvm-project/issues/124540.	2025-01-29 23:14:36 +08:00
Acim Maravic	3a29dfe37c	[LLVM][AMDGPU] Add Intrinsic and Builtin for ds_bpermute_fi_b32 (#124616 )	2025-01-29 14:04:10 +01:00
Mingming Liu	3feb724496	[AsmPrinter][ELF] Support profile-guided section prefix for jump tables' (read-only) data sections (#122215 ) https://github.com/llvm/llvm-project/pull/122183 adds a codegen pass to infer machine jump table entry's hotness from the MBB hotness. This is a follow-up PR to produce `.hot` and or `.unlikely` section prefix for jump table's (read-only) data sections in the relocatable `.o` files. When this patch is enabled, linker will see {`.rodata`, `.rodata.hot`, `.rodata.unlikely`} in input sections. It can map `.rodata.hot` and `.rodata` in the input sections to `.rodata.hot` in the executable, and map `.rodata.unlikely` into `.rodata` with a pending extension to `--keep-text-section-prefix` like `059e7cbb66`, or with a linker script. 1. To partition hot and jump tables, the AsmPrinter pass slices a function's jump table indices into two groups, one for hot and the other for cold jump tables. It then emits hot jump tables into a `.hot`-prefixed data section and cold ones into a `.unlikely`-prefixed data section, retaining the relative order of `LJT<N>` labels within each group. 2. [ELF only] To have data sections with _dynamic_ names (e.g., `.rodata.hot[.func]`), we implement `TargetLoweringObjectFile::getSectionForJumpTable` method that accepts a `MachineJumpTableEntry` parameter, and update `selectELFSectionForGlobal` to generate `.hot` or `.unlikely` based on MJTE's hotness. - The dynamic JT section name doesn't depend on `-ffunction-section=true` or `-funique-section-names=true`, even though it leverages the similar underlying mechanism to have a MCSection with on-demand name as `-ffunction-section` does. 3. The new code path is off by default. - Typically, `TargetOptions` conveys clang or LLVM tools' options to code generation passes. To follow the pattern, add option `EnableStaticDataPartitioning` bit in `TargetOptions` and make it readable through `TargetMachine`. - To enable the new code path in tools like `llc`, `partition-static-data-sections` option is introduced in `CodeGen/CommandFlags.h/cpp`. - A subsequent patch ([draft](`8f36a13743`)) will add a clang option to enable the new code path. --------- Co-authored-by: Ellis Hoag <ellis.sparky.hoag@gmail.com>	2025-01-28 22:49:28 -08:00
Ellis Hoag	9ec4f474c0	[GlobPattern] Fix doxygen docs (#124361 ) Fix the docs for the `GlobPattern` class. https://llvm.org/docs/doxygen/classllvm_1_1GlobPattern.html I've tried to fix this a few times, but I finally got around to building `doxygen-llvm` to verify that my fixed worked. For some reason, `\p` does not work well with `/`. Instead I'm using <code>`</code> to make a codespan, which is more correct anyway. https://www.doxygen.nl/manual/markdown.html#md_codespan	2025-01-28 21:03:56 -08:00
vporpo	79cbad188a	[SandboxVec] Clear Context's state within runOnFunction() (#124842 ) `sandboxir::Context` is defined at a pass-level scope with the `SandboxVectorizerPass` class because the function pass manager `FPM` object depends on it, and that is in pass-level scope to avoid recreating the pass pipeline every single time `runOnFunction()` is called. This means that the Context's state lives on across function passes. The problem is twofold: (i) the LLVM IR to Sandbox IR map can grow very large including objects from different functions, which is of no use to the vectorizer, as it's a function-level pass. (ii) this can result in stale data in the LLVM IR to Sandbox IR object map, as other passes may delete LLVM IR objects. To fix both issues this patch introduces a `Context::clear()` function that clears the `LLVMValueToValueMap`.	2025-01-28 18:28:08 -08:00
Prabhuk	e7f02241ad	[nfc][llvm] Clean up isUEFI checks (#124845 ) The check for `isOSWindows() \|\| isUEFI()` is used in several places across the codebase. Introducing `isOSWindowsOrUEFI()` in Triple.h to simplify these checks.	2025-01-28 15:18:10 -08:00
Jeremy Morse	48df9480da	[NFC] Suppress spurious deprecation warning with MSVC (#124764 ) gcc and clang won't complain about calls to deprecated functions, if you're calling from a function that is deprecated too. However, MSVC does care, and expands into maaany deprecation warnings for getFirstNonPHI. Suppress this by converting the inlineable copy of getFirstNonPHI into a non-inline copy.	2025-01-28 15:55:45 +00:00
Jeremy Morse	79499f010d	[NFC][DebugInfo] Deprecate iterator-taking moveBefore and getFirstNonPHI (#124290 ) The RemoveDIs project [0] makes debug intrinsics obsolete and to support this instruction iterators carry an extra bit of debug information. To maintain debug information accuracy insertion needs to be performed with a BasicBlock::iterator rather than with Instruction pointers, otherwise the extra bit of debug information is lost. To that end, we're deprecating getFirstNonPHI and moveBefore for instruction pointers. They're replaced by getFirstNonPHIIt and an iterator-taking moveBefore: switching to the replacement is straightforwards, and 99% of call-sites need only to unwrap the iterator with &* or call getIterator() on an Instruction pointer. The exception is when inserting instructions at the start of a block: if you call getFirstNonPHI() (or begin() or getFirstInsertionPt()) and then insert something at that position, you must pass the BasicBlock::iterator returned into the insertion method. Unwrapping with &* and then calling getIterator strips the debug-info bit we wish to preserve. Please do contact us about any use case that's confusing or unclear [1]. [0] https://llvm.org/docs/RemoveDIsDebugInfo.html [1] https://discourse.llvm.org/t/psa-ir-output-changing-from-debug-intrinsics-to-debug-records/79578	2025-01-28 14:02:24 +00:00
Joseph Huber	13dcc95dcd	[Offload] Rework offloading entry type to be more generic (#124018 ) Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64-bytes are reserved as zero so the OpenMP runtime can detect the old format for binary compatibilitry.	2025-01-28 07:26:13 -06:00
Bushev Dmitry	0165e3346f	[llvm][Object] Add missing const qualifier for value_type in content_iterator (#124106 ) value_type was defined as non-const for content_iterator, although it's methods returned a const pointers/references. This prevented it from using in some algorithms from STLExtras.h	2025-01-28 13:22:48 +03:00
SivanShani-Arm	de4bbbfdcc	[Build Attributes] Standardize names according to convention. (#124556 ) The de-facto convention for build attributes file and class names seems to be 'Attrs' in the class name and 'Attributes' in the file name. Accordingly, change file ARMBuildAttrs.cpp -> ARMBuildAttributes.cpp And class AArch64BuildAttrs --> AArch64BuildAttributes	2025-01-28 09:53:45 +00:00
Chandler Carruth	f4de28a63c	[StrTable] Switch intrinsics to StringTable and work around MSVC (#123548 ) Historically, the main example of very large string tables used the `EmitCharArray` to work around MSVC limitations with string literals, but that was switched (without removing the API) in order to consolidate on a nicer emission primitive. While this large string table in `IntrinsicsImpl.inc` seems to compile correctly on MSVC without the work around in `EmitCharArray` (and that this PR adds back to the nicer emission path), other users have repeatedly hit this MSVC limitation as you can see in the discussion on PR https://github.com/llvm/llvm-project/pull/120534. This PR teaches the string offset table emission to look at the size of the table and switch to the char array emission strategy when the table becomes too large. This work around does have the downside of making compile times worse for large string tables, but that appears unavoidable until we can identify known good MSVC versions and switch to requiring them for all LLVM users. It also reduces searchability of the generated string table -- I looked at emitting a comment with each string but it is tricky because the escaping rules for an inline comment are different from those of of a string literal, and there's no real way to turn the string literal into a comment. While improving the output in this way, also clean up the output to not emit an extraneous empty string at the end of the string table, and update the `StringTable` class to not look for that. It isn't actually used by anything and is wasteful. This PR also switches the `IntrinsicsImpl.inc` string tables over to the new `StringTable` runtime abstraction. I didn't want to do this until landing the MSVC workaround in case it caused even this example to start hitting the MSVC bug, but I wanted to switch here so that I could simplify the API for emitting the string table with the workaround present. With the two different emission strategies, its important to use a very exact syntax and that seems better encapsulated in the API. Last but not least, the `SDNodeInfoEmitter` is updated, including its tests to match the new output. This PR should unblock landing https://github.com/llvm/llvm-project/pull/120534 and letting us switch all of Clang's builtins to use string tables. That PR has all the details motivating the overall effort. Follow-up patches will try to consolidate the remaining users onto the single interface, but those at least were easy to separate into follow-ups and keep this PR somewhat smaller.	2025-01-28 00:17:04 -08:00
Adam Yang	aab25f20f6	[HLSL][SPIRV][DXIL] Implement `WaveActiveMax` intrinsic (#123428 ) ``` - add clang builtin to Builtins.td - link builtin in hlsl_intrinsics - add codegen for spirv intrinsic and two directx intrinsics to retain signedness information of the operands in CGBuiltin.cpp - add semantic analysis in SemaHLSL.cpp - add lowering of spirv intrinsic to spirv backend in SPIRVInstructionSelector.cpp - add lowering of directx intrinsics to WaveActiveOp dxil op in DXIL.td - add test cases to illustrate passespendent pr merges. ``` Resolves #99170	2025-01-27 23:26:56 -08:00
Thurston Dang	fa9ac62d02	[ubsan] Parse and use <cutoffs[0,1,2]=70000;cutoffs[5,6,8]=90000> in LowerAllowCheckPass (#124211 ) This adds and utilizes a cutoffs parameter for LowerAllowCheckPass, via the Options parameter (introduced in https://github.com/llvm/llvm-project/pull/122994). Future work will connect -fsanitize-skip-hot-cutoff (introduced patch in https://github.com/llvm/llvm-project/pull/121619) in the clang frontend to the cutoffs parameter used here.	2025-01-27 20:08:53 -08:00
Craig Topper	d839e765f0	[TargetLowering] Inline the only caller of one of the forceExpandWideMUL functions. NFC This caller does not need the libcall portion so it can directly call forceExpandMultiply.	2025-01-27 17:10:37 -08:00
Momchil Velikov	f75860f895	[AArch64] Implement NEON FP8 intrinsics for fused multiply-add (#123615 ) This patch adds the following intrinsics: * Fused multiply-add non-indexed float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t) * Floating-point multiply-add long to half-precision (vector, by element) float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) * Floating-point multiply-add long-long to single-precision (vector, by element) float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)	2025-01-28 00:38:44 +00:00
Mingming Liu	e98b2028c7	[NFCI]Refactor AsmPrinter around jump table emission (#124645 ) Add method `AsmPrinter::emitJumpTableImpl`. It takes an array-ref of jump table indices. This splits refactor of PR https://github.com/llvm/llvm-project/pull/122215	2025-01-27 15:29:38 -08:00
Rahul Joshi	aca08a8515	[TableGen] Add assert to validate `Objects` list for `HwModeSelect` (#123794 ) - Bail out of TableGen if any asserts fail before running the backend. - Add asserts to validate that the `Objects` and `Modes` lists for various `HwModeSelect` subclasses are of same length. - Eliminate equivalent check in CodeGenHWModes.cpp	2025-01-27 13:44:44 -08:00
Florian Hahn	ad9da92cf6	[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462 ) Add an extra knob to RuntimeUnrollMultiExit to let backends control whether to allow multi-exit unrolling on a per-loop basis. This gives backends more fine-grained control on deciding if multi-exit unrolling is profitable for a given loop and uarch. Similar to 4226e0a0c75. PR: https://github.com/llvm/llvm-project/pull/124462	2025-01-27 21:20:04 +00:00
Momchil Velikov	804b81d39f	[AArch64] Add FP8 Neon intrinsics for dot-product (#123613 ) This patch adds the following intrinsics: float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm) float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm) float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm) float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)	2025-01-27 21:14:16 +00:00
Momchil Velikov	99bd2e3f12	[AArch64] Add Neon FP8 conversion intrinsics (#123612 ) The patch adds the following intrinsics: bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm) float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm) mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm) mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn, float32x4_t vm, fpm_t fpm) mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm) mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t fpm) Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>	2025-01-27 17:32:47 +00:00
Jeremy Morse	34b139594a	[NFC][DebugInfo] Switch more call-sites to using iterator-insertion (#124283 ) To finalise the "RemoveDIs" work removing debug intrinsics, we're updating call sites that insert instructions to use iterators instead. This set of changes are those where it's not immediately obvious that just calling getIterator to fetch an iterator is correct, and one or two places where more than one line needs to change. Overall the same rule holds though: iterators generated for the start of a block such as getFirstNonPHIIt need to be passed into insert/move methods without being unwrapped/rewrapped, everything else can use getIterator.	2025-01-27 16:44:14 +00:00
Jeremy Morse	81d18ad864	[NFC][DebugInfo] Make some block-start-position methods return iterators (#124287 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction's as positions with iterators. A number of these (such as getFirstNonPHIOrDbg) are sufficiently infrequently used that we can just replace the pointer-returning version with an iterator-returning version, hopefully without much/any disruption. Thus this patch has getFirstNonPHIOrDbg and getFirstNonPHIOrDbgOrLifetime return an iterator, and updates all call-sites. There are no concerns about the iterators returned being converted to Instruction's and losing the debug-info bit: because the methods skip debug intrinsics, the iterator head bit is always false anyway.	2025-01-27 16:27:54 +00:00
ssijaric-nv	16e9601e19	[Flang] Adjust the trampoline size for AArch64 and PPC (#118678 ) Set the trampoline size to match that in compiler-rt/lib/builtins/trampoline_setup.c and AArch64 and PPC lowering.	2025-01-27 08:02:18 -08:00
Jeremy Morse	e14962a39c	[NFC][DebugInfo] Use iterators for instruction insertion in more places (#124291 ) As part of the "RemoveDIs" work to eliminate debug intrinsics, we're replacing methods that use Instruction*'s as positions with iterators. This patch changes some more complex call-sites, those crossing file boundaries and where I've had to perform some minor rewrites.	2025-01-27 15:25:17 +00:00
David Sherwood	b7286dbef9	Reland "[LoopVectorize] Add support for reverse loops in isDereferenceableAndAlignedInLoop #96752 " (#123616 ) The last attempt failed a sanitiser build because we were creating a reference to a null Predicates pointer in isDereferenceableAndAlignedInLoop. This was exposed by the unit test IsDerefReadOnlyLoop in unittests/Analysis/LoadsTest.cpp. I fixed this by falling back on getConstantMaxBackedgeTakenCount if Predicates is null - see line 316 in llvm/lib/Analysis/Loads.cpp. There are no other changes.	2025-01-27 11:59:38 +00:00
Durgadoss R	3b5e9eed2f	[NVPTX] Add float to tf32 conversion intrinsics (#124316 ) This patch adds the set of f32 -> tf32 cvt intrinsics introduced in sm100 with ptx8.6. This builds on top of the recent PR #121507. Tests are verified with a 12.8 ptxas executable. PTX ISA link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2025-01-27 15:52:43 +05:30
Jay Foad	0c784851c5	[MathExtras] Favor using the hexadecimal FP constants (#123180 ) This just fixes a TODO now that we are using C++17.	2025-01-26 15:32:46 +00:00
Craig Topper	37fdde6025	[CodeGen] Remove implict conversions from Register to unsigned from MachineOperand. NFC	2025-01-25 23:12:14 -08:00
Craig Topper	4bcd8184a0	[TargetLowering] Pull similar code out of the forceExpandWideMUL into a helper. NFC (#124371 ) These functions have similar code. One of them calculates the 2x width full product from 2 sources. The other calculates the product from 2 sources that have low and high halves. This patch introduces a new function that takes HiLHS and HiRHS as optional values. If they are not null, they will be used in the calculation of the Hi half. The Signed flag can only be set when HiLHS/HiRHS are null.	2025-01-25 10:53:01 -08:00
vporpo	5cb2db3b51	[SandboxVec][Scheduler] Forbid crossing BBs (#124369 ) This patch updates the scheduler to forbid scheduling across BBs. It should eventually be able to handle this, but we disable it for now.	2025-01-25 08:19:27 -08:00
Craig Topper	ac1ba1f9dd	[CodeGen] Introduce a VirtRegOrUnit class to hold virtual reg or physical reg unit. NFC (#123768 ) LiveIntervals and MachineVerifier were previously using Register to store this, but reg units are different than physical registers. One important difference is that 0 is a valid reg unit number, but it is not a valid phyiscal register. This patch introduces a new VirtRegOrUnit class that is distinct from Register. It can be be converted to/from a virtual Register or a MCRegUnit. I've made all conversions explicit and used assertions to check the validity. I also fixed a place in MachineVerifier that was ignoring reg unit 0.	2025-01-24 18:30:28 -08:00
Teresa Johnson	c725a95e08	[MemProf] Convert Hot contexts to NotCold early (#124219 ) While we convert hot contexts to notcold contexts during the cloning step, their existence was greatly limiting the context trimming performed when we add the MemProf profile to the IR. To address this, any hot contexts are converted to notcold contexts immediately after first checking for unambiguous allocation types, and before checking it again and before adding metadata while performing context trimming. Note that hot hints are now disabled by default, however, this avoids adding unnecessary overhead if they are re-enabled.	2025-01-24 15:58:13 -08:00
vporpo	6409799bdc	[SandboxVec][Legality] Pack from different BBs (#124363 ) When the inputs of the pack come from different BBs we need to make sure we emit the pack instructions at the correct place.	2025-01-24 15:39:37 -08:00
vporpo	ac75d32280	[SandboxVec][VecUtils] Filter out instructions not in BB in VecUtils:getLowest() (#124360 ) This patch changes the functionality of `VecUtils::getLowest(Vals, BB)` such that it filters out any instructions in `Vals` that are not in BB. This is useful when Vals contains instructions from different BBs, because in that case we are only interested in one BB.	2025-01-24 14:52:57 -08:00
vporpo	cff7ad56ba	[SandboxVec][Utils] Implement Utils::verifyFunction() (#124356 ) This patch implements a wrapper function for the LLVM IR verifier for functions, and calls it (flag-guarded) within the bottom-up-vectorizer for finding IR bugs as soon as they happen.	2025-01-24 14:35:20 -08:00
vporpo	4b209c5d87	[SandboxIR][Region] Add cost modeling to the region (#124354 ) This patch implements cost modeling for Region. All instructions that are added or removed get their cost counted in the Scoreboard. This is used for checking if the region before or after a transformation is more profitable.	2025-01-24 14:28:55 -08:00
vporpo	b41987beae	[SandboxVec][DAG] Fix MemDGNode chain maintenance when move destination is non-mem (#124227 ) This patch fixes a bug in the maintenance of the MemDGNode chain of the DAG. Whenever we move a memory instruction, the DAG gets notified about the move and maintains the chain of memory nodes. The bug was that if the destination of the move was not a memory instruction, then the memory node's next node would end up pointing to itself.	2025-01-24 13:59:32 -08:00
Stephen Long	ab976a1712	PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568 )	2025-01-24 14:02:06 -05:00

1 2 3 4 5 ...

57952 Commits