llvm-project

Author	SHA1	Message	Date
Matt Arsenault	6017480461	MachineVerifier: Fix check for range type (#124894 ) We need to permit scalar extending loads with range annotations. Fix expensive_checks failures after 11db7fb09b36e656a801117d6a2492133e9c2e46	2025-01-30 10:56:12 +07:00
Matt Arsenault	97a1f494a6	DAG: Avoid breaking legal vector_shuffle with multiple uses (#123712 ) Previously this combine would undo AMDGPU's new custom legalization of wide vector shuffles into 2 element pieces. The comment also states that this combine is only done before legalization, but the case with a build_vector source was unconditional. We probably don't want to do this if the multiple uses are full scalarization of the vector, but this seems to work well enough. Scalarizing extracts should have folded out pre-legalize.	2025-01-30 10:55:21 +07:00
Carl Ritson	a3a3e6997b	[AMDGPU] Rewrite GFX12 SGPR hazard handling to dedicated pass (#118750 ) - Algorithm operates over whole IR to attempt to minimize waits. - Add support for VALU->VALU SGPR hazards via VA_SDST/VA_VCC.	2025-01-30 11:21:11 +09:00
Konstantina Mitropoulou	9adc99bcc5	[AMDGPU] Always emit SI_KILL_I1_PSEUDO for uniform floating point branches. (#124028 ) - [NFC] Use GCNPat instead of Pat. - [AMDGPU] Always emit SI_KILL_I1_PSEUDO for uniform floating point branches. --------- Co-authored-by: Konstantina Mitropoulou <KonstantinaMitropoulou@amd.com>	2025-01-29 09:00:40 -08:00
Nikita Popov	29441e4f5f	[IR] Convert from nocapture to captures(none) (#123181 ) This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.	2025-01-29 16:56:47 +01:00
Acim Maravic	3a29dfe37c	[LLVM][AMDGPU] Add Intrinsic and Builtin for ds_bpermute_fi_b32 (#124616 )	2025-01-29 14:04:10 +01:00
David Green	66e0498daf	[GlobalISel] Do not run verifier after ResetMachineFunctionPass (#124799 ) After we fall back from GlobalISel to SDAG, the verifier gets called, which calls getReservedRegs which uses SIMachineFunctionInfo::usesAGPRs which caches the result of UsesAGPRs. Because we have just fallen-back the function is empty and it incorrectly gets cached to false. This patch makes sure we don't try to run the verifier whilst the function is empty.	2025-01-29 12:48:11 +00:00
Daniil Fukalov	68d90cff58	[AMDGPU][GlobalISel] Fix assert on APInt creation. (#124608 ) Since 3494ee95902cef62f767489802e469c58a13ea04 APInt no longer implicitly truncates values, so it asserts on a big signed value that is implicitly converted to an unsigned APInt. The change explicitly marks offset as a signed value.	2025-01-28 15:53:17 +01:00
Renat Idrisov	11db7fb09b	[GlobalISel] Catching inconsistencies in load memory, result, and range metadata type (#121247 ) This is a fix for: https://github.com/llvm/llvm-project/issues/97290 Please let me know if that is the right way to address the issue. Thank you! --------- Co-authored-by: Renat Idrisov <parsifal-47@users.noreply.github.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-01-28 20:54:34 +07:00
Pierre van Houtryve	8ea018ce1d	[DAGISel] Fix MMRA Handling in copyExtraInfo (#124730 ) #78569 did not implement this correctly and an edge case breaks it by triggering `Assertion `!Leafs.empty()' failed.` Fixes SWDEV-507698	2025-01-28 13:27:26 +01:00
Aaditya	cd57c9530b	[NFC][AMDGPU] Autogenerating test cases (#124507 )	2025-01-28 13:41:59 +05:30
Shilei Tian	6e4105574e	[NFC][AMDGPU] Improve code introduced in #124607 (#124672 )	2025-01-27 22:57:16 -05:00
Shilei Tian	3b2b7ec07d	[AMDGPU] Handle invariant marks in `AMDGPUPromoteAllocaPass` (#124607 ) Fixes SWDEV-509327.	2025-01-27 17:30:50 -05:00
David Green	5a81a559d6	[GISel] Explicitly disable BF16 tablegen patterns. (#124113 ) We currently have an issue where bf16 patterns can be used to match fp16 types, as GISel does not know about the difference between the two. This patch explicitly disables them to make sure that they are never used. The opposite can also happen, where fp16 patterns are used for operators that should be bf16. So this also changes any operations with bf16 types to now cause a fallback to SDAG. The pass setup for GISel has been slightly adjusted to make sure that a verify pass does not get added between AMD-SDAG and SIFixSGPRCopiesPass, which otherwise can cause verifier issues when falling back.	2025-01-27 22:21:12 +00:00
Jeffrey Byrnes	e77d428e46	[AMDGPU] Do not remat instructions with PhysReg uses (#124366 ) This blocks rematerialization during scheduling if the instruction has a non accepted PhysReg use. Currently, there aren't any checks like this in place, and we may create invalid code: https://godbolt.org/z/xjPjdcorf	2025-01-27 10:50:06 -08:00
Brox Chen	d1139b32d2	[AMDGPU][True16][CodeGen] true16 codegen pats for v_mad_u16 (#124000 ) true16 codegen pats for v_mad_u16 (mul+add)	2025-01-27 13:47:17 -05:00
Brox Chen	ec66c4af09	[AMDGPU][True16][CodeGen] true16 codegen pattern for f16 canonicalize (#122000 ) true16 codegen pattern for f16 canonicalize	2025-01-24 10:44:00 -05:00
Aaditya	11b0401926	[AMDGPU] Restore SP from saved-FP or saved-BP (#124007 ) Currently, the AMDGPU backend bumps the Stack Pointer by fixed size offsets in the prolog of device functions, and restores it by the same amount in the epilog. Prolog: sp += frameSize Epilog: sp -= frameSize If a function has dynamic stack realignment, Prolog: sp += frameSize + max_alignment Epilog: sp -= frameSize + max_alignment These calculations are not optimal in case of dynamic stack realignment, and completely fail in case of dynamic stack readjustment. This patch uses the saved Frame Pointer to restore SP. Prolog: fp = sp sp += frameSize Epilog: sp = fp In case of dynamic stack realignment, SP is restored from the saved Base Pointer. Prolog: fp = sp + (max_alignment - 1) fp = fp & (-max_alignment) bp = sp sp += frameSize + max_alignment Epilog: sp = bp (Note: The presence of BP has been enforced in case of any dynamic stack realignment.) --------- Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com> Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-01-24 19:13:40 +05:30
Petar Avramovic	b60c118f53	MachineUniformityAnalysis: Improve isConstantOrUndefValuePhi (#112866 ) Change existing code for G_PHI to match what LLVM-IR version is doing via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI since it may appear with an undef operand and getVRegDef can fail. Most notably this improves number of values that can be allocated to sgpr in AMDGPURegBankSelect. Common case here are phis that appear in structurize-cfg lowering for cycles with multiple exits: Undef incoming value is coming from block that reached cycle exit condition, if other incoming is uniform keep the phi uniform despite the fact it is joining values from pair of blocks that are entered via divergent condition branch.	2025-01-24 12:43:40 +01:00
Petar Avramovic	4831fa8632	AMDGPU/GlobalISel: RegBankLegalize rules for load (#112882 ) Add IDs for bit width that cover multiple LLTs: B32 B64 etc. "Predicate" wrapper class for bool predicate functions used to write pretty rules. Predicates can be combined using &&, \|\| and !. Lowering for splitting and widening loads. Write rules for loads to not change existing mir tests from old regbankselect.	2025-01-24 12:36:41 +01:00
Petar Avramovic	0ee037b861	AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864 ) Lower G_ instructions that can't be inst-selected with register bank assignment from AMDGPURegBankSelect based on uniformity analysis. - Lower instruction to perform it on assigned register bank - Put uniform value in vgpr because SALU instruction is not available - Execute divergent instruction in SALU - "waterfall loop" Given LLTs on all operands after legalizer, some register bank assignments require lowering while others do not. Note: cases where all register bank assignments would require lowering are lowered in legalizer. AMDGPURegBankLegalize goals: - Define Rules: when and how to perform lowering - Goal of defining Rules is to provide high level table-like brief overview of how to lower generic instructions based on available target features and uniformity info (uniform vs divergent). - Fast search of Rules, depends on how complicated Rule.Predicate is - For some opcodes there would be too many Rules that are essentially all the same just for different combinations of types and banks. Write custom function that handles all cases. - Rules are made from enum IDs that correspond to each operand. Names of IDs are meant to give brief description what lowering does for each operand or the whole instruction. - AMDGPURegBankLegalizeHelper implements lowering algorithms Since this is the first patch that actually enables -new-reg-bank-select here is the summary of regression tests that were added earlier: - if instruction is uniform always select SALU instruction if available - eliminate back to back vgpr to sgpr to vgpr copies of uniform values - fast rules: small differences for standard and vector instruction - enabling Rule based on target feature - salu_float - how to specify lowering algorithm - vgpr S64 AND to S32 - on G_TRUNC in reg, it is up to user to deal with truncated bits G_TRUNC in reg is treated as no-op. 
- dealing with truncated high bits - ABS S16 to S32 - sgpr S1 phi lowering - new opcodes for vcc-to-scc and scc-to-vcc copies - lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC) - S1 zext and sext lowering to select - uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into SALU instruction - divergent phi with uniform inputs - divergent instruction with temporal divergent use, source instruction is defined as uniform(AMDGPURegBankSelect) - missing temporal divergence lowering - uniform phi, because of undef incoming, is assigned to vgpr. Will be fixed in AMDGPURegBankSelect via another fix in machine uniformity analysis.	2025-01-24 12:12:45 +01:00
Petar Avramovic	f8a56df36e	AMDGPU/GlobalISel: AMDGPURegBankSelect (#112863 ) Assign register banks to virtual registers. Does not use generic RegBankSelect. After register bank selection all register operands of G_ instructions have LLT and register banks exclusively. If they had register class, reassign appropriate register bank. Assign register banks using machine uniformity analysis: Sgpr - uniform values and some lane masks Vgpr - divergent, non S1, values Vcc - divergent S1 values(lane masks) AMDGPURegBankSelect does not consider available instructions and, in some cases, G_ instructions with some register bank assignment can't be inst-selected. This is solved in RegBankLegalize. Exceptions when uniformity analysis does not work: S32/S64 lane masks: - need to end up with sgpr register class after instruction selection - In most cases Uniformity analysis declares them as uniform (forced by tablegen) resulting in sgpr S32/S64 reg bank - When Uniformity analysis declares them as divergent (some phis), use intrinsic lane mask analyzer to still assign sgpr register bank temporal divergence copy: - COPY to vgpr with implicit use of $exec inside of the cycle - this copy is declared as uniform by uniformity analysis - make sure that assigned bank is vgpr Note: uniformity analysis does not consider that registers with vgpr def are divergent (you can have uniform value in vgpr). - TODO: implicit use of $exec could be implemented as indicator that instruction is divergent	2025-01-24 11:06:02 +01:00
Frederik Harwath	bfd9bc2745	[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#124131 ) This PR reapplies the changes from PR #123942 which had to be reverted because of a test failure. The test has been adjusted.	2025-01-24 09:12:32 +01:00
Chaitanya	3c79a04cc2	[AMDGPU] Add amdgpu-sw-lower-lds pass to NPM codegen addIRPasses. (#124102 ) This PR adds amdgpu-sw-lower-lds pass to AMDGPUCodeGenPassBuilder::addIRPasses()	2025-01-24 11:15:30 +05:30
Jeffrey Byrnes	acb7859f07	[MachineSink] Extend loop sinking capability (#117247 ) The current MIR cycle sinking capabilities are rather limited. They only support sinking copies into a single successor block while obeying limits. This opt-in feature adds a more aggressive option that is not limited to the above concerns. The feature will try to "sink" by duplicating any top-level preheader instruction (that we are sure is safe to sink) into any user block, then does some dead code cleanup. In particular, this is useful for high RP situations when loop bodies have control flow.	2025-01-23 17:08:23 -08:00
Lucas Ramirez	6206f5444f	[AMDGPU] Occupancy w.r.t. workgroup size range is also a range (#123748 ) Occupancy (i.e., the number of waves per EU) depends, in addition to register usage, on per-workgroup LDS usage as well as on the range of possible workgroup sizes. Mirroring the latter, occupancy should therefore be expressed as a range since different group sizes generally yield different achievable occupancies. `getOccupancyWithLocalMemSize` currently returns a scalar occupancy based on the maximum workgroup size and LDS usage. With respect to the workgroup size range, this scalar can be the minimum, the maximum, or neither of the two of the range of achievable occupancies. This commit fixes the function by making it compute and return the range of achievable occupancies w.r.t. workgroup size and LDS usage; it also renames it to `getOccupancyWithWorkGroupSizes` since it is the range of workgroup sizes that produces the range of achievable occupancies. Computing the achievable occupancy range is surprisingly involved. Minimum/maximum workgroup sizes do not necessarily yield maximum/minimum occupancies i.e., sometimes workgroup sizes inside the range yield the occupancy bounds. The implementation finds these sizes in constant time; heavy documentation explains the rationale behind the sometimes relatively obscure calculations. As a justifying example, consider a target with 10 waves / EU, 4 EUs/CU, 64-wide waves. Also consider a function with no LDS usage and a flat workgroup size range of [513,1024]. - A group of 513 items requires 9 waves per group. Only 4 groups made up of 9 waves each can fit fully on a CU at any given time, for a total of 36 waves on the CU, or 9 per EU. However, filling as much as possible the remaining 40-36=4 wave slots without decreasing the number of groups reveals that a larger group of 640 items yields 40 waves on the CU, or 10 per EU. - Similarly, a group of 1024 items requires 16 waves per group. 
Only 2 groups made up of 16 waves each can fit fully on a CU at any given time, for a total of 32 waves on the CU, or 8 per EU. However, removing as many waves as possible from the groups without being able to fit another equal-sized group on the CU reveals that a smaller group of 896 items yields 28 waves on the CU, or 7 per EU. Therefore the achievable occupancy range for this function is not [8,9] as the group size bounds directly yield, but [7,10]. Naturally this change causes a lot of test churn as instruction scheduling is driven by achievable occupancy estimates. In most unit tests the flat workgroup size range is the default [1,1024] which, ignoring potential LDS limitations, would previously produce a scalar occupancy of 8 (derived from 1024) on a lot of targets, whereas we now consider the maximum occupancy to be 10 in such cases. Most tests are updated automatically and checked manually for sanity. I also manually changed some non-automatically generated assertions when necessary. Fixes #118220.	2025-01-23 16:07:57 +01:00
Nico Weber	99d450e9f5	Revert "[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#123942 )" This reverts commit 6fdaaafd89d7cbc15dafe3ebf1aa3235d148aaab. Breaks check-llvm, see https://github.com/llvm/llvm-project/pull/123942#issuecomment-2609861953	2025-01-23 09:19:42 -05:00
Matt Arsenault	e28e93550a	AMDGPU: Make vector_shuffle legal for v2i32 with v_pk_mov_b32 (#123684 ) For VALU shuffles, this saves an instruction in some case.	2025-01-23 20:58:02 +07:00
Kareem Ergawy	ff55c9bc63	[llvm][amdgpu] Handle indirect refs to LDS GVs during LDS lowering (#124089 ) Fixes #123800 Extends LDS lowering by allowing it to discover transitive indirect/escaping references to LDS GVs. For example, given the following input: ```llvm @lds_item_to_indirectly_load = internal addrspace(3) global ptr undef, align 8 %store_type = type { i32, ptr } @place_to_store_indirect_caller = internal addrspace(3) global %store_type undef, align 8 define amdgpu_kernel void @offloading_kernel() { store ptr @indirectly_load_lds, ptr addrspace(3) getelementptr inbounds nuw (i8, ptr addrspace(3) @place_to_store_indirect_caller, i32 0), align 8 call void @call_unknown() ret void } define void @call_unknown() { %1 = alloca ptr, align 8 %2 = call i32 %1() ret void } define void @indirectly_load_lds() { call void @directly_load_lds() ret void } define void @directly_load_lds() { %2 = load ptr, ptr addrspace(3) @lds_item_to_indirectly_load, align 8 ret void } ``` With the above input, prior to this patch, LDS lowering failed to lower the reference to `@lds_item_to_indirectly_load` because: 1. it is indirectly called by a function whose address is taken in the kernel. 2. we did not check if the kernel indirectly makes any calls to unknown functions (we only checked the direct calls). Co-authored-by: Jon Chesterfield <jonathan.chesterfield@amd.com>	2025-01-23 14:53:11 +01:00
Frederik Harwath	6fdaaafd89	[AMDGPU] SIPeepholeSDWA: Disable on existing SDWA instructions (#123942 ) This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion. See the draft PR #123221 for an attempt to fix the actual issue. --------- Co-authored-by: Frederik Harwath <fharwath@amd.com>	2025-01-23 14:32:01 +01:00
Matt Arsenault	93d35ad5f5	AMDGPU: Delete FillMFMAShadowMutation (#123861 ) No test changes with this removed and it appears to be obsolete.	2025-01-22 22:41:25 +07:00
Akshat Oke	a343b8e595	[AMDGPU][NewPM] Port SILowerWWMCopies to NPM (#123695 )	2025-01-22 14:54:01 +05:30
TiborGY	3630d9ef65	[PartiallyInlineLibCalls] Add infrastructure for emitting optimization remarks from PartiallyInlineLibCalls (#122654 ) I am planning to add some optimization remarks to the `PartiallyInlineLibCalls` pass. However, since this pass does not emit any optimization remarks yet, I have to add the "infrastructure" for that first, which is what this PR is about.	2025-01-22 13:15:40 +07:00
Shoreshen	7c58d6363a	[AMDGPU] Add commute for some VOP3 inst (#121326 ) add commute for some VOP3 inst, allow commute for both inline constant operand, adjust tests Fixes #111205	2025-01-22 11:08:26 +07:00
Shoreshen	e8811ad3cc	[AMDGPU] Fix unreachable reg bit width (#122107 ) Add register class bit width for SReg_256_XNULL and SReg_128_XNULL	2025-01-22 10:05:47 +07:00
Matt Arsenault	5e79ae60a6	DAG: Fix vector_shuffle -> splat fold defining undef lanes (#123596 ) For shuffle vector splats with undef lanes in the mask, this was introducing real values. Filter out build_vector results based on the undef elements in the mask. This avoids AMDGPU test regressions in a future change. test/CodeGen/X86/urem-seteq-illegal-types.ll looks worse but I didn't investigate.	2025-01-21 23:55:50 +07:00
Brox Chen	70632f9566	[AMDGPU][True16][MC] true16 for v_cmp_xx_f16 (#122943 ) A bulk commit of true16 support for v_cmp_xx_f16 instructions including: v_cmp_f_f16 v_cmp_eq_f16 v_cmp_le_f16 v_cmp_gt_f16 v_cmp_lg_f16 v_cmp_ge_f16 v_cmp_o_f16 v_cmp_u_f16 v_cmp_nge_f16 v_cmp_nlg_f16 v_cmp_ngt_f16 v_cmp_nle_f16 v_cmp_neq_f16 v_cmp_nlt_f16 v_cmp_t_f16 Added a GFX12 runline for fcmp.f16	2025-01-21 10:06:22 -05:00
Chinmay Deshpande	9ca1323de1	[AMDGPU] Fix crash due to missing check for FLAT instructions that don't use vector registers when computing VALU hazard (#123627 )	2025-01-21 05:50:58 -08:00
lialan	5d9c717597	[GISel] Fold shifts to constant result. (#123510 ) This resolves #123212	2025-01-21 05:10:45 -08:00
Janek van Oirschot	82944595fa	[AMDGPU] Change scope of resource usage info symbols (#114810 ) Change scope of resource usage info MC symbols to align with the function linkage type	2025-01-21 13:10:06 +00:00
Akshat Oke	9b6e8df896	[AMDGPU][NewPM] Port SIFixVGPRCopies to NPM (#123592 ) Extends NPM pipeline support till PostRegAlloc passes (greedy is in the works)	2025-01-21 15:27:46 +05:30
David Stuttard	ebc5020564	[AMDGPU] Update entry point name for PAL metadata (#123581 ) Old entry-point metadata being updated. Nothing is required to account for deprecation as nothing uses the old style	2025-01-21 09:37:22 +00:00
Matt Arsenault	585858aeb6	AMDGPU: Fix asm constraints in new shuffle tests These passed prechecks but failed after cc5eba1737146a727a61b5dbe16d8c2ac453981e	2025-01-21 10:49:42 +07:00
Matt Arsenault	7786266dc7	AMDGPU: Expand shuffle testing with generated tests (#123574 ) Add some generated tests with every shuffle permutation for relevant vector element types and sizes. Not sure if this is going overboard with the number of tests. I pruned out the largest cases (16 and 32-bit cases are impractically large), and there's redundancy when testing the pointer cases (at least for SelectionDAG). This uses inline assembly to produce sample values because of how the ABI is lowered when using a function argument. Since we break all arguments into 32-bit pieces, a shuffle never ends up forming. We need separate handling to reconstruct shuffles in contexts involving physical registers in ABI contexts. I wrote a small tool to generate these, so I can easily change the exact test body. Not sure if it's worth posting anywhere. This is in preparation for making better use of v_pk_mov_b32, v_mov_b64 and s_mov_b64 in shuffles.	2025-01-21 10:08:42 +07:00
Krzysztof Drewniak	697c1883f1	Reapply "[AMDGPU] Handle natively unsupported types in addrspace(7) lowering" (#123660 ) (#123657) This reverts commit 64749fb01538fba2b56d9850497d5f3a626cabc2. Adds a constructor to VecSlice to address the failure	2025-01-20 16:12:17 -06:00
Krzysztof Drewniak	64749fb015	Revert "[AMDGPU] Handle natively unsupported types in addrspace(7) lowering" (#123657 ) Reverts llvm/llvm-project#110572 Seem to have broken a buildbot, not sure why https://lab.llvm.org/buildbot/#/builders/108/builds/8346	2025-01-20 13:14:04 -05:00
Krzysztof Drewniak	3805355ef6	[AMDGPU] Handle natively unsupported types in addrspace(7) lowering (#110572 ) The current lowering for ptr addrspace(7) assumed that the instruction selector can handle arbitrary LLVM types, which is not the case. Code generation can't deal with - Values that aren't 8, 16, 32, 64, 96, or 128 bits long - Aggregates (this commit only handles arrays of scalars, more may come) - Vectors of more than one byte - 3-word values that aren't a vector of 3 32-bit values (for example, a <6 x half>) This commit adds a buffer contents type legalizer that adds the needed bitcasts, zero-extensions, and splits into subcomponents needed to convert a load or store operation into one that can be successfully lowered through code generation. In the long run, some of the involved bitcasts (though potentially not the buffer operation splitting) ought to be handled by the instruction legalizer, but SelectionDAG makes this difficult. It also takes advantage of the new `nuw` flag on `getelementptr` when lowering GEPs to offset additions. We don't currently plumb through `nsw` on GEPs since that should likely be a separate change and would require declaring what we mean by "the address" in the context of the GEP guarantees.	2025-01-20 11:33:35 -06:00
Fabian Ritter	cc5eba1737	[AMDGPU] Reject misaligned SGPR constraints for inline asm (#123590 ) The indices of SGPR register pairs need to be 2-aligned and SGPR quadruplets need to be 4-aligned. With this patch, we report an error when inline asm register constraints specify a misaligned register index, instead of silently dropping the specified index. Fixes #123208 --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2025-01-20 15:47:11 +01:00
Fraser Cormack	9cf24652e7	[AMDGPU] Fix spurious NoAlias results (#122309 ) After a30e50fc, AMDGPUAAResult is being called in more situations where BasicAA isn't sure. This exposed some regressions where NoAlias is being incorrectly returned for two identical pointers. The fix is to check the underlying objects for equality before returning NoAlias.	2025-01-20 14:19:30 +00:00
Akshat Oke	96c4f978d0	[AMDGPU][NewPM] Port SIOptimizeExecMasking to NPM (#123572 )	2025-01-20 16:34:01 +05:30
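
Note on 6206f5444f (occupancy range): the worked example in that commit message can be checked with a brute-force sketch. This is not the constant-time algorithm the commit implements; it simply tries every workgroup size in the range. The constants are the ones from the message (10 waves/EU, 4 EUs/CU, 64-wide waves, no LDS usage). The function names and the round-up when converting waves per CU to waves per EU are assumptions made for illustration; for the sizes quoted in the message the division is exact, and rounding down gives the same overall range.

```python
# Brute-force check of the occupancy example from commit 6206f5444f.
# Hypothetical helper names; target parameters are taken from the commit message.
WAVES_PER_EU = 10   # maximum waves per execution unit
EUS_PER_CU = 4      # execution units per compute unit
WAVE_SIZE = 64      # work-items per wave


def occupancy(group_size: int) -> int:
    """Waves per EU achieved when the CU is filled with groups of this size."""
    waves_per_group = -(-group_size // WAVE_SIZE)        # ceiling division
    max_waves_per_cu = WAVES_PER_EU * EUS_PER_CU         # 40 wave slots
    groups_per_cu = max_waves_per_cu // waves_per_group  # only whole groups fit
    waves_on_cu = groups_per_cu * waves_per_group
    # Assumption: round up when spreading the waves over the EUs.
    return -(-waves_on_cu // EUS_PER_CU)


def occupancy_range(min_size: int, max_size: int) -> tuple[int, int]:
    """(min, max) occupancy over every workgroup size in [min_size, max_size]."""
    values = [occupancy(size) for size in range(min_size, max_size + 1)]
    return min(values), max(values)


if __name__ == "__main__":
    print(occupancy(513), occupancy(1024))   # range endpoints: 9 and 8
    print(occupancy(640), occupancy(896))    # interior sizes: 10 and 7
    print(occupancy_range(513, 1024))        # (7, 10)
```

The endpoints alone give occupancies 9 and 8, while the interior sizes 640 and 896 give 10 and 7, which is why the message reports [7,10] for this range.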
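
Note on 11b0401926 (restore SP from saved FP/BP): the prolog/epilog arithmetic in that commit message can be simulated with plain integers. This is a minimal sketch under assumptions: the function name, the example addresses, and the model of a dynamic stack adjustment as a single extra bump of SP in the function body are all hypothetical; only the prolog and epilog formulas come from the message.

```python
# Integer simulation of the realigned-frame prolog/epilog from commit 11b0401926.
# Hypothetical helper; the stack grows upward, as in the message (sp += frameSize).


def simulate_realigned_frame(sp_on_entry: int, frame_size: int,
                             max_alignment: int, dynamic_bump: int) -> dict:
    # Prolog (dynamic stack realignment case from the message).
    fp = (sp_on_entry + (max_alignment - 1)) & -max_alignment
    bp = sp_on_entry
    sp = sp_on_entry + frame_size + max_alignment

    # Function body: assumed dynamic stack adjustment of unknown size.
    sp += dynamic_bump

    return {
        "fp": fp,
        # Old epilog: subtract the same fixed amount that the prolog added.
        "sp_fixed_offset": sp - (frame_size + max_alignment),
        # New epilog: restore SP from the saved base pointer.
        "sp_from_bp": bp,
    }


if __name__ == "__main__":
    entry = 0x1004
    result = simulate_realigned_frame(entry, frame_size=64,
                                      max_alignment=16, dynamic_bump=32)
    print(hex(result["fp"]))               # 0x1010, aligned frame pointer
    print(hex(result["sp_fixed_offset"]))  # 0x1024, off by the dynamic bump
    print(hex(result["sp_from_bp"]))       # 0x1004, matches SP on entry
```

With no dynamic adjustment both epilogs agree; once the body moves SP by an amount the epilog does not know, only the restore from BP returns to the entry value.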

8216 Commits