llvm-project

Author	SHA1	Message	Date
Alex Wang	a947599991	[AMDGPU][GlobalISel] Add lowering for G_FMODF (#180152 ) Add generic expansion for G_FMODF matching the SelectionDAG implementation. Enable G_FMODF lowering for AMDGPU with tests. Related: #179434	2026-02-07 18:43:55 +00:00
Matt Arsenault	9ea2c5fcdd	GlobalISel: Use LibcallLowering to get libcall calling conventions (#176837 ) 0e304e6d9f306ead81fc5177b8a497af0d416a73 converted the name queries, but missed some of the calling conventions.	2026-01-19 23:29:46 +00:00
Matt Arsenault	0e304e6d9f	GlobalISel: Use LibcallLoweringInfo more in LegalizerHelper (#176411 ) Avoid using TargetLowering for libcall information.	2026-01-19 11:07:56 +01:00
Stefan Weigl-Bosker	828261ebb8	[GISel] Add G_CTLS Opcode and combines, lower to cls(w) (#175069 ) Fixes https://github.com/llvm/llvm-project/issues/174369 - Added a G_CTLS opcode and some pattern matching. This is the GlobalISel equivalent to https://github.com/llvm/llvm-project/pull/173417 - Add legalization for aarch64 and riscv ``` // Folds (ctlz (xor x, (sra x, bitwidth-1))) -> (add (ctls x), 1). // Folds (ctlz (or (shl (xor x, (sra x, bitwidth-1)), 1), 1) -> (ctls x) (clang aarch64) ```	2026-01-16 13:22:18 -08:00
Matt Arsenault	539412914a	GlobalISel: Use LibcallLoweringInfo analysis in legalizer (#170328 ) This is mostly boilerplate to move various freestanding utility functions into LegalizerHelper. LibcallLoweringInfo is currently optional, mostly because threading it through assorted other uses of LegalizerHelper is more difficult. I had a lot of trouble getting this to work in the legacy pass manager with setRequiresCodeGenSCCOrder, and am not happy with the result. A sub-pass manager is introduced and this is invalidated, so we're re-computing this unnecessarily.	2026-01-16 14:42:10 +01:00
David Green	3c844212f2	[AArch64][GlobalISel] Add disjoint to the G_OR when lowering G_ROTR/L (#172317 ) It looks like this is already handled for funnel shifts, we can do the same for the or created when lowering G_ROTR and G_ROTL. This allows some more add-like-ors to match.	2026-01-02 09:07:21 +00:00
Nathan Corbyn	b7a20c1cc4	[GlobalISel] Don't permit G_MIN/G_MAX of pointer vectors (#168872 ) - Use `LLT::changeElementType()` instead of `LLT::changeElementSize()` in `LegalizerHelper::lowerMinMax()` to avoid a crash in the case that the destination type is a pointer vector; - Reject `G_MIN`/`G_MAX` of pointers and pointer vectors in `MachineVerifier`; - Don't combine `G_SELECT`+`G_ICMP` pairs into `G_MIN`/`G_MAX` generic instructions when the operands are pointers / pointer vectors. Fixes #166556	2025-12-17 09:03:41 +00:00
Nathan Corbyn	2f9bf3f292	[GlobalISel](NFC) Refactor construction of LLTs in `LegalizerHelper` (#170664 ) I spotted a number of places where we're duplicating logic provided by the `LLT` class inline in `LegalizerHelper`. This PR tidies up these spots.	2025-12-15 12:26:27 +00:00
Evgenii Kudriashov	7ecc9ee919	[X86][GlobalISel] Set Dst register correctly when narrowing G_ICMP (#169947 ) Due to untested branch in #119335 Fixes #167326	2025-12-08 21:08:24 +01:00
Nathan Corbyn	b5f7058e91	[AArch64][GlobalISel] Don't crash when legalising vector G_SHL (#168848 )	2025-12-02 07:57:47 +00:00
Craig Topper	1157a22134	[GISel] Use getScalarSizeInBits in LegalizerHelper::lowerBitCount (#168584 ) For vectors, CTLZ, CTTZ, CTPOP all operate on individual elements. The lowering should be based on the element width. I noticed this by inspection. No tests in tree are currently affected, but I thought it would be good to fix so someone doesn't have to debug it in the future.	2025-11-18 12:26:47 -08:00
Hongyu Chen	523bd2df6d	[GISel][RISCV] Compute CTPOP of small odd-sized integer correctly (#168559 ) Fixes the assertion in #168523 This patch lifts the small, odd-sized integer to 8 bits, ensuring that the following lowering code behaves correctly.	2025-11-18 18:49:13 +00:00
David Green	4ecfaa602f	[AArch64][GlobalISel] Add better basic legalization for llround. (#168427 ) This adds handling for f16 and f128 lround/llround under LP64 targets, promoting the f16 where needed and using a libcall for f128. This codegen is now identical to the selection dag version.	2025-11-18 12:05:02 +00:00
Ryan Cowan	f8d65fd874	[AArch64][GlobalISel] Improve lowering of vector fp16 fpext (#165554 ) This PR improves the lowering of vectors of fp16 when using fpext. Previously vectors of fp16 were scalarized leading to lots of extra instructions. Now, vectors of fp16 will be lowered when extended to fp64 via the preexisting lowering logic for extends. To make use of the existing logic, we need to add elements until we reach the next power of 2.	2025-11-14 20:52:51 -08:00
Matt Arsenault	ad8f6b44be	DAG: Avoid some libcall string name comparisons (#166321 ) Move to the libcall impl based functions.	2025-11-05 07:09:02 -08:00
Yunqing Yu	059d90d08f	[Legalizer] Cache extracted element when lowering G_SHUFFLE_VECTOR. (#163893 ) Cache extracted elements in lowerShuffleVector(). For example, when lowering ``` %0:_(<2 x s32>) = G_BUILD_VECTOR %0, %1 %2:_(<N x s32>) = G_SHUFFLE_VECTOR %1, shufflemask(0, 0, 0, 0 ... x N ) ``` Currently, we generate `N` `G_EXTRACT_VECTOR_ELT` for each element in shufflemask. This is undesirable and bloats the code, especially for larger vectors. With this change, we only generate one `G_EXTRACT_VECTOR_ELT` from `%0` and reuse it for all four result elements.	2025-10-25 10:26:11 -05:00
Mirko Brkušanin	fe5f49942e	[AMDGPU][GlobalISel] Lower G_FMINIMUM and G_FMAXIMUM (#151122 ) Add GlobalISel lowering of G_FMINIMUM and G_FMAXIMUM following the same logic as in SDag's expandFMINIMUM_FMAXIMUM. Update AMDGPU legalization rules: Pre GFX12 now uses new lowering method and make G_FMINNUM_IEEE and G_FMAXNUM_IEEE legal to match SDag.	2025-10-24 14:48:27 +02:00
David Green	a1e59bdc17	[GlobalISel] Make scalar G_SHUFFLE_VECTOR illegal. (#140508 ) I'm not sure if this is the best way forward or not, but we have a lot of issues with forgetting that shuffle_vectors can be scalar again and again. (There is another example from the known-bits code added recently). As a scalar-dst shuffle vector is just an extract, and a scalar-source shuffle vector is just a build vector, this patch makes scalar shuffle vector illegal and adjusts the irbuilder to create the correct node as required. Most targets do this already through lowering or combines. Making scalar shuffles illegal simplifies gisel as a whole; it just requires transforms that create shuffles of new sizes to account for the scalar shuffle being illegal (mostly IRBuilder and LessElements).	2025-10-24 08:21:35 +01:00
Craig Topper	cc251197a0	[GISel] Use G_ZEXT when widening G_EXTRACT_VECTOR_ELT/G_INSERT_VECTOR_ELT index. (#163416 )	2025-10-15 09:02:05 -07:00
Ryan Cowan	eb803df502	[AArch64][GlobalISel] Add `G_FMODF` instruction (#160061 ) This commit adds the intrinsic `G_FMODF` to GMIR & enables its translation, legalization and instruction selection in AArch64.	2025-10-02 10:30:31 +01:00
Matt Arsenault	c80d495908	GlobalISel: Adjust insert point when expanding G_[SU]DIVREM (#160683) The insert point management is messy here. We probably should have an insert point guard, and not have the dest operand utilities modify the insert point. Fixes #159716	2025-09-25 11:00:53 +00:00
AZero13	151a80bbce	[TargetLowering][ExpandABD] Prefer selects over usubo if we do the same for ucmp (#159889 ) Same deal we use for determining ucmp vs scmp. Using selects on platforms that like selects is better than using usubo. Rename function to be more general fitting this new description.	2025-09-25 10:33:05 +09:00
woruyu	1a172b9924	[RISCV][GISel] Lower G_SSUBE (#157855 ) Summary: Try to implement lowering of G_SSUBE in LegalizerHelper::lower	2025-09-18 10:08:56 +08:00
Shaoce SUN	41d7ae84e5	[RISCV][GlobalIsel] Lower G_FMINIMUMNUM, G_FMAXIMUMNUM (#157295 ) Similar to the implementation in https://github.com/llvm/llvm-project/pull/104411 , the `fmin.s`/`fmax.s` instructions follow IEEE 754-2019 semantics, and `G_FMINIMUMNUM`/`G_FMAXIMUMNUM` are legal.	2025-09-11 10:16:42 +08:00
woruyu	c69172637e	[RISCV][GISel] Lower G_SADDE (#156865 ) Summary: Try to implement lowering of G_SADDE in LegalizerHelper::lower	2025-09-11 09:32:56 +08:00
Craig Topper	262c7b7b5a	[RISCV][GISel] Widen G_ABDS/G_ABDU before lowering when Zbb is enabled. (#157766 ) This allows us to use G_SMIN/SMAX/UMIN/UMAX in the lowering.	2025-09-10 12:17:30 -07:00
Shaoce SUN	eb623e650b	[RISCV][GISel] Lower G_ABDS and G_ABDU (#155888 ) Implementation follows the `ISD::ABDS` handling in `RISCVTargetLowering`.	2025-09-05 21:16:35 +08:00
Amara Emerson	4829dedfa9	[GlobalISel] Add multi-way splitting support for wide scalar shifts. (#155353 ) This patch implements direct N-way splitting for wide scalar shifts instead of recursive binary splitting. For example, an i512 G_SHL can now be split directly into 8 i64 operations rather than going through i256 -> i128 -> i64. The main motivation behind this is to alleviate (although not entirely fix) pathological compile time issues with huge types, like i4224. The problem we see is that the recursive splitting strategy combined with our messy artifact combiner ends up with terribly long compiles as tons of intermediate artifacts are generated, and then attempted to be combined ad nauseam. Going directly from the large shifts to the destination types short-circuits a lot of these issues, but it's still an abuse of the backend and front-ends should never be doing this sort of thing.	2025-09-03 10:25:52 -07:00
Kane Wang	7d6e72f110	[RISCV][GlobalISel] Lower G_ATOMICRMW_SUB via G_ATOMICRMW_ADD (#155972 ) RISCV does not provide a native atomic subtract instruction, so this patch lowers `G_ATOMICRMW_SUB` by negating the RHS value and performing an atomic add. The legalization rules in `RISCVLegalizerInfo` are updated accordingly, with libcall fallbacks when `StdExtA` is not available, and intrinsic legalization is extended to support `riscv_masked_atomicrmw_sub`. For example, lowering `%1 = atomicrmw sub ptr %a, i32 1 seq_cst` on riscv32a produces: ``` li a1, -1 amoadd.w.aqrl a0, a1, (a0) ``` On riscv64a, where the RHS type is narrower than XLEN, it currently produces: ``` li a1, 1 neg a1, a1 amoadd.w.aqrl a0, a1, (a0) ``` There is still a constant-folding or InstCombiner gap. For instance, lowering ``` %b = sub i32 %x, %y %1 = atomicrmw sub ptr %a, i32 %b seq_cst ``` generates: ``` subw a1, a1, a2 neg a1, a1 amoadd.w.aqrl a0, a1, (a0) ``` This sequence could be optimized further to eliminate the redundant neg. Addressing this may require improvements in the Combiner or Peephole Optimizer in future work. --------- Co-authored-by: Kane Wang <kanewang95@foxmail.com>	2025-09-03 08:42:31 -07:00
David Green	4ee80ca29e	[GlobalISel] Add support for scalarizing vector insert and extract elements (#153274 ) This Adds scalarization handling for fewer vector elements of insert and extract, so that i128 and fp128 types can be handled if they make it past combines. Inserts are unmerged with the inserted element added to the remerged vector, extracts are unmerged then the correct element is copied into the destination. With a non-constant vector the usual stack lowering is used.	2025-08-27 10:21:58 +01:00
David Green	c5105c1e0a	[GlobalISel] Fix bitcast fewerElements with scalar narrow types. (#153364 ) For a <8 x i32> -> <2 x i128> bitcast, that under aarch64 is split into two halves, the scalar i128 remainder was causing problems, causing a crash with invalid vector types. This makes sure they are handled correctly in fewerElementsBitcast.	2025-08-13 22:27:53 +01:00
Fabian Ritter	d64240b5c6	[GISel] Introduce MachineIRBuilder::(build|materialize)ObjectPtrOffset (#150392 ) These functions are for building G_PTR_ADDs when we know that the base pointer and the result are both valid pointers into (or just after) the same object. They are similar to SelectionDAG::getObjectPtrOffset. This PR also changes call sites of the generic (build|materialize)PtrAdd functions that implement pointer arithmetic to split large memory accesses to the new functions. Since memory accesses have to fit into an object in memory, pointer arithmetic to an offset into a large memory access also yields an address in that object. Currently, these (build|materialize)ObjectPtrOffset functions only add "nuw" to the generated G_PTR_ADD, but I intend to introduce an "inbounds" MIFlag in a later PR (analogous to a concurrent effort in SDAG: #131862, related: #140017, #141725) that will also be set in the (build|materialize)ObjectPtrOffset functions. Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where offsets are now folded into scratch instructions, and cases where the behavior of the check regeneration script changed, resulting, e.g., in better checks for "nusw G_PTR_ADD" instructions, matched empty lines, and the use of "CHECK-NEXT" in MIPS tests. For SWDEV-516125.	2025-07-29 13:04:04 +02:00
paperchalice	ce86ff105b	[GlobalISel] Remove `UnsafeFPMath` references (#146319 ) This is the GlobalISel part to remove `UnsafeFPMath` flag in CodeGen pipeline.	2025-07-29 12:11:52 +08:00
Pete Chou	314ce691df	[GlobalISel] Allow Legalizer to lower volatile memcpy family. (#145997 ) This change updates legalizer to allow lowering volatile memcpy family as a target might rely on lowering to legalize them.	2025-07-22 00:42:23 -07:00
Fraser Cormack	a516c60ec3	[NFC] Correct typo: invertion -> inversion (#147995 )	2025-07-11 07:37:25 +01:00
David Green	3448e9c075	[AArch64][GlobalISel] Fix lowering of i64->f32 itofp. (#132703 ) This is a GISel equivalent of #130665, preventing a double-rounding issue in sitofp/uitofp by scalarizing i64->f32 converts. Most of the changes are made in the ActionDefinitionsBuilder for G_SITOFP/G_UITOFP. Because it is legal to convert i64->f16 itofp without double-rounding, but not a fpround f64->f16, that variant is lowered to build the two extends.	2025-07-05 18:13:19 +01:00
Pete Chou	13e06403b4	[GlobalISel] Remove dead code. (NFC) (#145811 ) LegalizerHelper::lowerMemCpyFamily only expects G_MEMCPY, G_MEMMOVE, and G_MEMSET.	2025-06-26 10:48:27 +09:00
JaydeepChauhan14	c3c923c8d6	[X86][GlobalISel] Enable SINCOS with libcall mapping (#142438 )	2025-06-25 15:37:33 +09:00
Matt Arsenault	a65e0edd6a	PowerPC: Stop reporting memcpy as an alias of memmove on AIX (#143836 ) Instead of reporting ___memmove as an implementation of memcpy, make it unavailable and let the lowering logic consider memmove as a fallback path. This avoids a special case 1:N mapping for libcall implementations.	2025-06-23 22:15:37 +09:00
Matt Arsenault	48155f93dd	CodeGen: Emit error if getRegisterByName fails (#145194 ) This avoids using report_fatal_error and standardizes the error message in a subset of the error conditions.	2025-06-23 16:33:35 +09:00
David Green	437346378f	[GlobalISel] Widen vector loads from aligned ptrs (#144309 ) If the pointer is aligned to more than the size of the vector, we can widen the load up to next power of 2 size, as SDAG performs. Some of the v3 tests are currently worse - those should be addressed in other issues.	2025-06-21 07:42:54 +01:00
David Green	89f692a24f	[GlobalISel] Split Legalizer debug output into paragraphs. NFC (#143427 ) This helps keep the legalizer output easier to read, splitting each instruction's legalization into a separate block.	2025-06-15 16:43:18 +08:00
Stanley Gambarin	33974b41c7	[GlobalISel] support lowering of G_SHUFFLEVECTOR with pointer args (#141959 )	2025-06-05 09:13:51 -07:00
Matt Arsenault	2e2bbcacf8	AMDGPU/GlobalISel: Start legalizing minimumnum and maximumnum (#140900 ) This is the bare minimum to get the intrinsic to compile for AMDGPU, and it's not optimal. We need to follow along closer with the existing G_FMINNUM/G_FMAXNUM with custom lowering to handle the IEEE=0 case better. Just re-use the existing lowering for the old semantics for G_FMINNUM/G_FMAXNUM. This does not change G_FMINNUM/G_FMAXNUM's treatment, nor try to handle the general expansion without an underlying min/max variant (or with G_FMINIMUM/G_FMAXIMUM).	2025-05-21 17:00:45 +02:00
jyli0116	382ad6f2e7	[GISel][AArch64] Added more efficient lowering of Bitreverse (#139233 ) GlobalISel was previously inefficient in handling bitreverses of vector types. This deals with i16, i32, i64 vector types and converts them into i8 bitreverses and rev instructions.	2025-05-13 11:21:50 +01:00
jyli0116	fd80048738	[GlobalISel][AArch64] Handles bitreverse to prevent falling back (#138150 ) Handles bitreverse for vector types which were previously falling back onto Selection DAG. Includes 8-bit element vectors greater than 128 bits and less than 64 bits: <32 x i8>, <4 x i8>, and odd vector types: <9 x i8>.	2025-05-06 09:57:01 +01:00
Kazu Hirata	cdc9a4b5f8	[CodeGen] Use range-based for loops (NFC) (#138488 ) This is a reland of #138434 except that: - the bits for llvm/lib/CodeGen/RenameIndependentSubregs.cpp have been dropped because they caused a test failure under asan, and - the bits for llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp have been improved with structured bindings.	2025-05-05 10:08:49 -07:00
Nico Weber	1d955489c3	Revert "[CodeGen] Use range-based for loops (NFC) (#138434 )" This reverts commit a9699a334bc9666570418a3bed9520bcdc21518b. Breaks CodeGen/AMDGPU/collapse-endcf.ll in several configs (sanitizer builds; macOS; possibly more), see comments on https://github.com/llvm/llvm-project/pull/138434	2025-05-04 17:36:52 -04:00
Kazu Hirata	47f391fd0e	[CodeGen] Remove unused local variables (NFC) (#138441 )	2025-05-04 00:26:37 -07:00
Kazu Hirata	a9699a334b	[CodeGen] Use range-based for loops (NFC) (#138434 )	2025-05-04 00:26:19 -07:00

722 Commits