llvm-project

Author	SHA1	Message	Date
Matt Arsenault	fbf74b2553	AMDGPU: Select vector reg class for divergent build_vector (#168169 ) The main improvement is to the mfma tests. There are some mild regressions scattered around, and a few major ones. The worst regressions are in some of the bitcast tests; these are cases where the SGPR argument list runs out and uses VGPRs, and the copies-from-VGPR are misidentified as divergent. Most of the shufflevector tests are also regressions. These end up with cleaner MIR, but then get poor regalloc decisions.	2025-11-14 21:53:39 -08:00
Matt Arsenault	2bf92787df	AMDGPU: Start using RegClassByHwMode for wavesize operands (#159884) This eliminates the pseudo registerclasses used to hack the wave register class, which are now replaced with RegClassByHwMode, so most of the diff is from register class ID renumbering.	2025-11-11 15:07:59 -08:00
Valery Pykhtin	59f6f33bc3	Reapply "[utils][UpdateLLCTestChecks] Add MIR support to update_llc_test_checks.py." (#164965) (#166575) This change enables update_llc_test_checks.py to automatically generate MIR checks for RUN lines that use `-stop-before` or `-stop-after` flags, allowing tests to verify intermediate compilation stages (e.g., after instruction selection but before peephole optimizations) alongside the final assembly output. If the `-debug-only` flag is present in the RUN line, it is considered the main point of interest for testing and the stop flags above are ignored (that is, no MIR checks are generated). This resulted from a scenario where I needed to test two instruction matching patterns where the later pattern in the peepholer reverts the earlier pattern in the instruction selector, and to distinguish it from the case where the earlier pattern didn't work at all. Initially created by Claude Sonnet 4.5, it was improved later to handle conflicts in MIR <-> ASM prefixes and formatting.	2025-11-06 11:35:46 +01:00
Valery Pykhtin	ba1dbdd44a	Revert "[utils][UpdateLLCTestChecks] Add MIR support to update_llc_test_checks.py." (#166549 ) Reverts llvm/llvm-project#164965	2025-11-05 14:30:27 +01:00
Valery Pykhtin	c782ed3440	[utils][UpdateLLCTestChecks] Add MIR support to update_llc_test_checks.py. (#164965) This change enables update_llc_test_checks.py to automatically generate MIR checks for RUN lines that use `-stop-before` or `-stop-after` flags, allowing tests to verify intermediate compilation stages (e.g., after instruction selection but before peephole optimizations) alongside the final assembly output. If the `-debug-only` flag is present in the RUN line, it is considered the main point of interest for testing and the stop flags above are ignored (that is, no MIR checks are generated). This resulted from a scenario where I needed to test two instruction matching patterns where the later pattern in the peepholer reverts the earlier pattern in the instruction selector, and to distinguish it from the case where the earlier pattern didn't work at all. Initially created by Claude Sonnet 4.5, it was improved later to handle conflicts in MIR <-> ASM prefixes and formatting.	2025-11-05 13:31:10 +01:00
Kunqiu Chen	82cf54fbf6	[UTC] CHECK-EMPTY instead of skipping blank lines (#165718 ) Previously, any blank lines in IR were ignored by UTC, leading to more fragile `CHECK`s being generated. This change lets UTC, 1) emit `CHECK-EMPTY` to check blank lines, and 2) generate more `CHECK-NEXT`s, landing the discussion https://github.com/llvm/llvm-project/pull/165419#issuecomment-3457572422. Moreover, this change also aligns the behavior of IR check-gen to ASM check-gen, which has been emitting `CHECK-EMPTY` since `a8a89c77ea`.	2025-11-01 17:01:30 +08:00
Kunqiu Chen	566c7311d4	[UTC] Indent switch cases (#165212 ) LLVM prints switch cases indented by 2 additional spaces, as follows: ```LLVM switch i32 %x, label %default [ i32 0, label %phi i32 1, label %phi ] ``` Since this only changes the output IR of update_test_checks.py and does not change the logic of the File Check Pattern, there seems to be no need to update the existing test cases.	2025-10-28 22:00:54 +08:00
Tomer Shafir	9abae17b25	[UpdateTestChecks][llc] Support `arm64-apple-darwin` (#165092) Adds `arm64-apple-darwin` support to `asm.py` matching and removes the now-invalidated `target-triple-mismatch` test (I don't have another triple that is supported by llc but not by the autogenerator, which is what would make this test useful).	2025-10-27 21:22:23 +02:00
Matt Arsenault	853760bca6	AMDGPU: Use ELF mangling in data layout (#163011 ) Closes #95219	2025-10-13 03:01:45 +00:00
Matt Arsenault	4af3e8f1d4	AMDGPU: Remove LDS_DIRECT_CLASS register class (#161762 ) This is a singleton register class which is a bad idea, and not actually used.	2025-10-04 03:56:43 +00:00
Matt Arsenault	1b30e49b9b	AMDGPU: Remove m0 classes (#161758 ) These are singleton register classes, which are not a good idea and also are unused.	2025-10-04 02:33:45 +00:00
Alex Bradbury	9d48df7a92	[UpdateTestChecks] Don't fail silently when conflicting CHECK lines means no checks are generated for some functions (#159321 ) There is a warning that triggers if you (for instance) run `update_llc_test_checks.py` on an input where _all_ functions have conflicting check lines and so no checks are generated. However, there are no warnings emitted at all for the case where some functions have non-conflicting check lines but others don't. This is a source of frustration because running update_llc_test_checks can result in all check lines being removed for certain functions when such a conflict exists with no warning, meaning we have to be extra vigilant inspecting the diff. I've also personally wasted time tracking down the source of the dropped lines assuming that update_test_checks would emit a warning in such cases. This change adds logic to emit warnings on a function-by-function basis for any RUN that has conflicting prefixes meaning no output is generated. This subsumes the previous warning for when _all_ functions conflict.	2025-09-23 16:17:35 +00:00
Stanislav Mekhanoshin	fd59fd563f	[AMDGPU] Add aperture classes to VS_64 (#158823 ) Should not do anything.	2025-09-16 11:15:50 -07:00
Stanislav Mekhanoshin	72aa946762	[AMDGPU] Drop high 32 bits of aperture registers (#158725 ) Fixes: SWDEV-551181	2025-09-16 02:11:39 -07:00
Antonio Frighetto	1cacc7339b	[UTC] Record TBAA semantics when autogenerating check lines UpdateTestChecks have been updated to take into account TBAA semantics as well, when emitting checks. This is achieved by parsing TBAA metadata for each tool invocation – whose tool is identified by their prefixes –, and maintaining a global dict of prefixes, TBAA nodes.	2025-09-10 19:40:30 +02:00
Antonio Frighetto	cb9cb4eb2e	[UTC] Introduce test for PR147670 (NFC)	2025-09-10 19:40:30 +02:00
Stanislav Mekhanoshin	6aebbb0a85	[AMDGPU] Define 1024 VGPRs on gfx1250 (#156765 ) This is a baseline support, it is not useable yet.	2025-09-03 16:25:18 -07:00
Matt Arsenault	1ff6bfe7a5	AMDGPU: Add VS_64_Align2 class (#156132 ) We need an aligned version of the VS class to properly represent operand constraints. This fixes regressions with #155559	2025-09-02 23:24:07 +09:00
Michael Berg	efa99eccfc	[LoopDist] Add metadata for checking post process state of distribute… (#153902 ) …d loops Add a count of the number of partitions LoopDist made when distributing a loop in meta data, then check for loops which are already distributed to prevent reprocessing. We see this happen on some spec apps, LD is on by default at SiFive.	2025-08-22 11:05:31 -07:00
Alex MacLean	d494eb0fa3	[NVPTX] Skip numbering unreferenced virtual registers (readability) (#154391) When assigning numbers to registers, skip any with neither uses nor defs. This will not have any impact at all on the final SASS, but it makes for slightly more readable PTX. This change should also ensure that future minor changes are less likely to cause noisy diffs in register numbering.	2025-08-19 12:27:46 -07:00
Philip Reames	4d629f9744	[MIR] Remove std::variant from multiple save/restore point handling [nfc] (#153226) In review of bbde6b, I had originally proposed that we support the legacy text format. As review evolved, it became clear this had been a bad idea (too much complexity), but in order to let that patch finally move forward, I approved the change with the variant. This change undoes the variant, and updates all the tests to just use the array form.	2025-08-12 11:23:05 -07:00
Tomer Shafir	d64e6b5e27	[utils][UpdateTestChecks] Warn about possible target triple mismatch (#149645 ) Aims to improve error reporting by printing a warning if the target function regex that has been selected finds no matches. For example, a `-mtriple=arm64-apple-darwin` runline, would map to the `arm64` prefix by `update_llc_test_checks.py` and wouldn't match Apple's function layout, generating some not understandable garbage checks. The implementation changes `common.process_run_line` to return an abstract indicator of number of functions processed, without breaking the drivers. Then `update_llc_test_checks.py` prints a driver specific error message.	2025-08-12 11:44:42 +03:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Aiden Grossman	6ba25c1a56	[llvm] Remove uses of %T in tests (#151621 ) This patch removes all uses of %T from within LLVM tests. %T has been deprecated for about seven years and use is not advised given it is not unique per test and can thus lead to races. The goal of this is to eventually remove support for %T from lit.	2025-08-01 08:24:56 -07:00
Alex MacLean	35693daa70	[NVPTX] Fix v2i8 call lowering, use generic ld/st nodes for call params (#146930 )	2025-07-28 10:41:51 -07:00
sivadeilra	b933f0c376	Fix Windows EH IP2State tables (remove +1 bias) (#144745) This changes how LLVM constructs certain data structures that relate to exception handling (EH) on Windows. Specifically this changes how IP2State tables for functions are constructed. The purpose of this change is to align LLVM to the requirements of the Windows AMD64 ABI, which requires that the IP2State table entries point to the boundaries between instructions. On most Windows platforms (AMD64, ARM64, ARM32, IA64, but not x86-32), exception handling works by looking up instruction pointers in lookup tables. These lookup tables are stored in `.xdata` sections in executables. One element of the lookup tables is the `IP2State` table (Instruction Pointer to State). If a function has any instructions that require cleanup during exception unwinding, then it will have an IP2State table. Each entry in the IP2State table describes a range of bytes in the function's instruction stream, and associates an "EH state number" with that range of instructions. A value of -1 means "the null state", which does not require any code to execute. A value other than -1 is an index into the State table. The entries in the IP2State table contain byte offsets within the instruction stream of the function. The Windows ABI requires that these offsets are aligned to instruction boundaries; they are not permitted to point to a byte that is not the first byte of an instruction. Unfortunately, CALL instructions present a problem during unwinding. CALL instructions push the address of the instruction after the CALL instruction, so that execution can resume after the CALL. If the CALL is the last instruction within an IP2State region, then the return address (on the stack) points to the next IP2State region. This means that the unwinder will use the wrong cleanup funclet during unwinding.
To fix this problem, compilers should insert a NOP after a CALL instruction, if the CALL instruction is the last instruction within an IP2State region. The NOP is placed within the same IP2State region as the CALL, so that the return address points to the NOP and the unwinder will locate the correct region. This PR modifies LLVM so that it inserts NOP instructions after CALL instructions, when needed. In performance tests, the NOP has no detectable significance. The NOP is rarely inserted, since it is only inserted when the CALL is the last instruction before an IP2State transition or the CALL is the last instruction before the function epilogue. NOP padding is only necessary on Windows AMD64 targets. On ARM64 and ARM32, instructions have a fixed size so the unwinder knows how to "back up" by one instruction. Interaction with Import Call Optimization (ICO): Import Call Optimization (ICO) is a compiler + OS feature on Windows which improves the performance and security of DLL imports. ICO relies on using a specific CALL idiom that can be replaced by the OS DLL loader. This removes a load and indirect CALL and replaces it with a single direct CALL. To achieve this, ICO also inserts NOPs after the CALL instruction. If the end of the CALL is aligned with an EH state transition, we also insert a single-byte NOP. Both forms of NOPs must be preserved. They cannot be combined into a single larger NOP; nor can the second NOP be removed. This is necessary because, if ICO is active and the call site is modified by the loader, the loader will end up overwriting the NOPs that were inserted for ICO. That means that those NOPs cannot be used for the correct termination of the exception handling region (the IP2State transition), so we still need an additional NOP instruction. The NOPs cannot be combined into a longer NOP (which is ordinarily desirable) because then ICO would split one instruction, producing a malformed instruction after the ICO call.	2025-07-22 09:18:13 -07:00
Alex MacLean	f03782dd67	[NVPTX] Fixup v2i8 parameter and return lowering (#145585 ) This change fixes v2i8 lowering for parameters and returned values. As part of this work, I move the lowering for return values to use generic ISD::STORE nodes as these are more flexible and have existing legalization handling. Note that calling a function with v2i8 arguments or returns is still not working but this is left for a subsequent change as this MR is already fairly large. Partially addresses #128853	2025-06-27 09:26:10 -07:00
Alex MacLean	70333de6cf	[NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (#145581 ) This change consolidates and cleans up various NVPTXISD target-specific nodes in order to simplify SDAG ISel. While there are some whitespace changes in the emitted PTX it is otherwise a non-functional change. NVPTXISD::Wrapper - This node was used to wrap external-symbol and global-address nodes. It is redundant and has been removed. Instead we use the non-target versions of these nodes and convert them appropriately during ISel. NVPTXISD::CALL - Much of the family of nodes used to represent a PTX call instruction have been replaced by this new single node. It corresponds to a single instruction and is therefore much simpler to create and lower.	2025-06-25 11:42:21 -07:00
Pierre van Houtryve	01848731d3	[tools] Allow RegClass/Bank in update_givaluetracking_test_checks.py (#141727 ) The script previously assumed an underscore after the :	2025-05-28 10:29:18 +02:00
David Green	a2aa88192f	[GlobalISel] Add a update_givaluetracking_test_checks.py script (#140296 ) As with the other update scripts this takes the output of -passes=print<gisel-value-tracking> and inserts the results into an existing mir file. This means that the input is a lot like update_analysis_test_checks.py, and the output needs to insert into a mir file similarly to update_mir_test_checks.py. The code used to do the inserting has been moved to common, to allow it to be reused. Otherwise it tries to reuse the existing infrastructure, and update_givaluetracking_test_checks is kept relatively short.	2025-05-22 09:06:37 +01:00
Ramkumar Ramachandra	bb2791609d	[LAA] Tweak debug output for UTC stability (#140764) UpdateTestChecks has a make_analyzer_generalizer to replace pointer addresses from the debug output of LAA with a pattern, which is an acceptable solution when there is one RUN line. However, when there are multiple RUN lines with a common pattern, UTC fails to recognize common output due to mismatched pointer addresses. Instead of hacking UTC to scrub the output before comparing the outputs from the different RUN lines, fix the issue once and for all by making LAA not output unstable pointer addresses in the first place. The removal of the now-dead make_analyzer_generalizer is left as a non-trivial exercise for a follow-up.	2025-05-21 12:01:49 +01:00
hev	746c682c4a	[LoongArch] Introduce `32s` target feature for LA32S ISA extensions (#139695) According to the official LoongArch reference manual, the 32-bit LoongArch is divided into two variants: the Reduced version (LA32R) and Standard version (LA32S). LA32S extends LA32R by adding additional instructions, and the 64-bit version (LA64) fully includes the LA32S instruction set. This patch introduces a new target feature `32s` for the LoongArch backend, enabling support for instructions specific to the LA32S variant. The LA32S extension includes the following additional instructions: - ALSL.W - {AND,OR}N - B{EQ,NE}Z - BITREV.{4B,W} - BSTR{INS,PICK}.W - BYTEPICK.W - CL{O,Z}.W - CPUCFG - CT{O,Z}.W - EXT.W.{B,H} - F{LD,ST}X.{D,S} - MASK{EQ,NE}Z - PC{ADDI,ALAU12I} - REVB.2H - ROTR{I}.W Additionally, LA32R defines three new instruction aliases: - RDCNTID.W RJ => RDTIMEL.W ZERO, RJ - RDCNTVH.W RD => RDTIMEH.W RD, ZERO - RDCNTVL.W RD => RDTIMEL.W RD, ZERO	2025-05-20 18:28:08 +08:00
Ruiling, Song	b8e5307031	update_mir_test_checks: keep comment embedded in MIR (#140016 ) We often add inline comment in mir. It is useful to keep them.	2025-05-20 09:55:18 +08:00
Alex MacLean	369891b674	[NVPTX] use untyped loads and stores wherever possible (#137698) In most cases, the type information attached to load and store instructions is meaningless and inconsistently applied. We can usually use ".b" loads and avoid the complexity of trying to assign the correct type. The one exception is sign-extending loads, which will continue to use ".s" to ensure the sign extension into a larger register is done correctly.	2025-05-10 08:26:26 -07:00
Orlando Cazalet-Hyams	234ae9bfd9	update_test_checks: indent dbg records (#139230 ) LLVM prints debug records like `#dbg_value` indented 2 additional spaces.	2025-05-09 11:23:43 +01:00
Scott Linder	e78b763568	update_test_checks: Relax DIFile filename checks (#135692 ) Avoid baking in absolute paths in check lines generated for DIFile metadata. Generated test checks cannot be sensitive to absolute paths anyway, as those vary with the environment, but there could be situations where some sensitivity to partial paths is required for certain tests. This implementation just assumes such tests aren't worth the effort to support, but it could be supported in the future. This is most useful for update_cc_test_checks with debug info enabled, where the test writer cannot manipulate the paths within the generated IR directly.	2025-04-24 13:03:33 -04:00
Pankaj Dwivedi	a25fdd7aca	Reapply "[AMDGPU] Insert readfirstlane in the function returns in sgpr." (#136678 ) Reapply #135326 and fix the target-dependent constant check. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-04-22 17:48:55 +05:30
Alex MacLean	56910a8b1b	[NVPTX] Improve kernel byval parameter lowering (#136008 ) This change introduces a new pattern for lowering kernel byval parameters in `NVPTXLowerArgs`. Each byval argument is wrapped in a call to a new intrinsic, `@llvm.nvvm.internal.addrspace.wrap`. This intrinsic explicitly equates to no instructions and is removed during operation legalization in SDAG. However, it allows us to change the addrspace of the arguments to 101 to reflect the fact that they will occupy this space when lowered by `LowerFormalArgs` in `NVPTXISelLowering`. Optionally, if a generic pointer to a param is needed, a standard `addrspacecast` is used. This approach offers several advantages: - Exposes addrspace optimizations: By using a standard `addrspacecast` back to generic space we allow InferAS to optimize this instruction, potentially sinking it through control flow or in other ways unsupported by `NVPTXLowerArgs`. This is demonstrated in several existing tests. - Clearer, more consistent semantics: Previously an `addrspacecast` from generic to param space was implicitly a no-op. This is problematic because it's not reciprocal with the inverse cast, violating LLVM semantics. Further it is very confusing given the existence of `cvta.to.param`. After this change the cast equates to this instruction. - Allow for the removal of all nvvm.ptr.* intrinsics: In a follow-up change the nvvm.ptr.gen.to.param and nvvm.ptr.param.to.gen intrinsics may be removed.	2025-04-21 15:55:06 -07:00
Shilei Tian	9968ba8652	Revert "[AMDGPU] Insert readfirstlane in the function returns in sgpr. (#135326 )" This reverts commit 76ced7fa782f0d7db9efea871fa6de74706dd9cc since it breaks a lot of bots.	2025-04-21 14:31:10 -04:00
Pankaj Dwivedi	76ced7fa78	[AMDGPU] Insert readfirstlane in the function returns in sgpr. (#135326 ) insert `readfirstlane` in the function returns in sgpr.	2025-04-21 21:57:16 +05:30
Jeremy Morse	1ebc308bba	[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855 ) During the transition from debug intrinsics to debug records, we used several different command line options to customise handling: the printing of debug records to bitcode and textual could be independent of how the debug-info was represented inside a module, whether the autoupgrader ran could be customised. This was all valuable during development, but now that totally removing debug intrinsics is coming up, this patch removes those options in favour of a single flag (experimental-debuginfo-iterators), which enables autoupgrade, in-memory debug records, and debug record printing to bitcode and textual IR. We need to do this ahead of removing the experimental-debuginfo-iterators flag, to reduce the amount of test-juggling that happens at that time. There are quite a number of weird test behaviours related to this -- some of which I simply delete in this commit. Things like print-non-instruction-debug-info.ll , the test suite now checks for debug records in all tests, and we don't want to check we can print as intrinsics. Or the update_test_checks tests -- these are duplicated with write-experimental-debuginfo=false to ensure file writing for intrinsics is correct, but that's something we're imminently going to delete. A short survey of curious test changes: * free-intrinsics.ll: we don't need to test that debug-info is a zero cost intrinsic, because we won't be using intrinsics in the future. * undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory mode while we sorted something out; it works now either way. * salvage-cast-debug-info.ll: was testing intrinsics-in-memory get salvaged, isn't necessary now * localize-constexpr-debuginfo.ll: was producing "dead metadata" intrinsics for optimised-out variable values, dbg-records takes the (correct) representation of poison/undef as an operand. 
Looks like we didn't update this in the past to avoid spurious test differences. * Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing that debug-info affected codegen, and we deferred updating the tests until now. This is just one of those silent gnochange issues that get fixed by RemoveDIs. Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc, that checks we can autoupgrade debug intrinsics that are in bitcode into the new debug records.	2025-04-01 14:27:11 +01:00
Alexey Karyakin	c0b2c10e9f	[hexagon] Bump the default version to v68 (#132304 ) Set the default processor version to v68 when the user does not specify one in the command line. This includes changes in the LLVM backed and linker (lld). Since lld normally sets the version based on inputs, this change will only affect cases when there are no inputs. Fixes #127558	2025-03-21 20:08:45 -05:00
David Sherwood	194eceff43	update_test_checks: add new --filter-out-after option (#129739 ) Whilst trying to clean up some loop vectoriser IR tests (see test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll for example) a reviewer on PR #129047 suggested it would be nice to have an option to stop generating CHECK lines after a certain point. Typically when performing a transformation with the loop vectoriser we don't usually care about any CHECK lines generated for the scalar tail of the loop, since the scalar loop is kept intact. Previously if you wanted to eliminate such unwanted CHECK lines you had to run the update script, then manually delete all the lines corresponding to the scalar loop. This can be very time consuming if the tests ever need changing. What I've tried to do here is add a new --filter-out-after option alongside the existing --filter* options that provides support for stopping the generation of any CHECK lines beyond the line that matches the filter. With the existing filter options we never generate CHECK-NEXT lines, but we still care about ordering with --filter-out-after so I've amended the code to ensure we treat this filter differently.	2025-03-18 09:46:43 +00:00
Daniel Paoliello	16e051f0b9	[win] NFC: Rename `EHCatchret` to `EHCont` to allow for EH Continuation targets that aren't `catchret` instructions (#129953 ) This change splits out the renaming and comment updates from #129612 as a non-functional change.	2025-03-06 09:28:44 -08:00
Alex MacLean	0065343159	[NVPTX] Improve device function byval parameter lowering (#129188) PTX supports 2 methods of accessing device function parameters: - "simple" case: If a parameter is only loaded, and all loads can address the parameter via a constant offset, then the parameter may be loaded via the ".param" address space. This case is not possible if the parameter is stored to or has its address taken. This method is preferable when possible. - "move param" case: For more complex cases the address of the param may be placed in a register via a "mov" instruction. This mov also implicitly moves the param to the ".local" address space and allows for it to be written to. This essentially defers the responsibility of the byval copy to the PTX calling convention. The handling of these cases in the NVPTX backend for byval pointers has some major issues. We currently attempt to determine if a copy is necessary in NVPTXLowerArgs and either explicitly make an additional copy in the IR, or insert "addrspacecast" to move the param to the param address space. Unfortunately the criteria for determining which case is possible are not correct, leading to miscompilations (https://godbolt.org/z/Gq1fP7a3G). Further, the criteria for the "simple" case aren't enforceable in LLVM IR across other transformations and instruction selection, making deciding between the 2 cases in NVPTXLowerArgs brittle and buggy. This patch aims to fix these issues and improve address space related optimization. In NVPTXLowerArgs, we conservatively assume that all parameters will use the "move param" case and the local address space. Responsibility for switching to the "simple" case is given to a new MachineIR pass, NVPTXForwardParams, which runs once it has become clear whether or not this is possible. This ensures that the correct address space is known for the "move param" case allowing for optimization, while still using the "simple" case wherever possible.	2025-02-28 14:15:25 -08:00
Alex MacLean	79261d4aab	[NVPTX][InferAS] assume alloca instructions are in local AS (#121710 )	2025-02-21 14:32:54 -08:00
Jinsong Ji	5d4998bc02	UpdateTestChecks: Don't check meta details in func definition w/--global none (#124205) With --check-globals none, we skip all the globals in check lines. However, we were still checking the meta info in function definitions, so the generated checks were still sensitive to metadata changes. This change scrubs the meta info and matches it with {{.*}} instead.	2025-02-19 20:29:51 -05:00
Daniel Paoliello	b3458fdec5	[llvm] Win x64 Unwind V2 1/n: Mark beginning and end of epilogs (#110024 ) Windows x64 Unwind V2 adds epilog information to unwind data: specifically, the length of the epilog and the offset of each epilog. The first step to do this is to add markers to the beginning and end of each epilog when generating Windows x64 code. I've modelled this after how LLVM was marking ARM and AArch64 epilogs in Windows (and unified the code between the three).	2025-01-30 13:51:30 -08:00
Ramkumar Ramachandra	3a4376b8f9	LAA: handle 0 return from getPtrStride correctly (#124539 ) getPtrStride returns 0 when the PtrScev is loop-invariant, and this is not an erroneous value: it returns std::nullopt to communicate that it was not able to find a valid pointer stride. In analyzeLoop, we call getPtrStride with a value_or(0) conflating the zero return value with std::nullopt. Fix this, handling loop-invariant loads correctly.	2025-01-27 14:21:14 +00:00
Aaditya	11b0401926	[AMDGPU] Restore SP from saved-FP or saved-BP (#124007) Currently, the AMDGPU backend bumps the Stack Pointer by fixed size offsets in the prolog of device functions, and restores it by the same amount in the epilog. Prolog: `sp += frameSize`; Epilog: `sp -= frameSize`. If a function has dynamic stack realignment, Prolog: `sp += frameSize + max_alignment`; Epilog: `sp -= frameSize + max_alignment`. These calculations are not optimal in case of dynamic stack realignment, and completely fail in case of dynamic stack readjustment. This patch uses the saved Frame Pointer to restore SP. Prolog: `fp = sp`, `sp += frameSize`; Epilog: `sp = fp`. In case of dynamic stack realignment, SP is restored from the saved Base Pointer. Prolog: `fp = sp + (max_alignment - 1)`, `fp = fp & (-max_alignment)`, `bp = sp`, `sp += frameSize + max_alignment`; Epilog: `sp = bp`. (Note: The presence of BP has been enforced in case of any dynamic stack realignment.) --------- Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-01-24 19:13:40 +05:30
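
The prolog/epilog arithmetic described in the last entry above (11b0401926) can be modelled with plain integers. The following is only an illustrative sketch, not code from the AMDGPU backend: the function names and the `dynamic_bytes` parameter (standing in for a dynamic stack adjustment made in the function body) are hypothetical, and the stack is assumed to grow upward as in the commit message.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (assumed to be a power of two)."""
    return (value + alignment - 1) & -alignment


def epilog_sp_fixed_offset(sp: int, frame_size: int, dynamic_bytes: int = 0) -> int:
    """Old scheme: the epilog subtracts the same constant the prolog added."""
    sp += frame_size       # prolog: sp += frameSize
    sp += dynamic_bytes    # body: hypothetical dynamic stack adjustment
    sp -= frame_size       # epilog: sp -= frameSize
    return sp


def epilog_sp_from_fp(sp: int, frame_size: int, dynamic_bytes: int = 0) -> int:
    """New scheme without realignment: SP is restored from the saved FP."""
    fp = sp                # prolog: fp = sp
    sp += frame_size       # prolog: sp += frameSize
    sp += dynamic_bytes    # body: hypothetical dynamic stack adjustment
    sp = fp                # epilog: sp = fp
    return sp


def epilog_sp_from_bp(sp: int, frame_size: int, max_alignment: int,
                      dynamic_bytes: int = 0) -> int:
    """New scheme with realignment: SP is restored from the saved BP."""
    fp = align_up(sp, max_alignment)   # prolog: fp = (sp + align - 1) & -align
    bp = sp                            # prolog: bp = sp
    sp += frame_size + max_alignment   # prolog: sp += frameSize + max_alignment
    sp += dynamic_bytes                # body: hypothetical dynamic stack adjustment
    assert fp % max_alignment == 0     # the frame pointer is aligned
    sp = bp                            # epilog: sp = bp
    return sp
```

In this model the fixed-offset epilog returns to the entry value of SP only when `dynamic_bytes` is zero, while the FP- and BP-based epilogs return to it regardless of what the body did to SP.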


264 Commits