llvm-project

Author	SHA1	Message	Date
Amara Emerson	2d53eaff4a	[AArch64][GlobalISel] Fix legalization for <4 x i1> vector stores. This case is different from the earlier <8 x i1> case handled because it triggers a legalization failure in lowerStore() that's intended for scalar code. It also was triggering incorrect bitcast actions in the AArch64 rules that weren't expecting truncating stores. With these two fixed, more cases are handled. The code is still bad, including some missing load promotion in our combiners that result in dead stores hanging around at the end of codegen. Again, we can fix these in separate changes. Reviewers: davemgreen, madhur13490, topperc, arsenm Reviewed By: davemgreen Pull Request: https://github.com/llvm/llvm-project/pull/121185	2025-01-06 10:22:48 -08:00
Amara Emerson	6b0807fe2b	[AArch64][GlobalISel] Add support for lowering trunc stores of vector bools. This is essentially a port of TargetLowering::scalarizeVectorStore(), which is used for the case where we have something like a store of <8 x s8> truncating to <8 x s1> in memory. The naive lowering is a sequence of extracts to compute a scalar value to store. AArch64's DAG implementation has some more smarts to improve this further which we can do later. Reviewers: topperc, davemgreen Pull Request: https://github.com/llvm/llvm-project/pull/121169	2025-01-06 10:21:42 -08:00
Amara Emerson	41ebbed280	[AArch64][GlobalISel] Legalize vector boolean bitcasts to scalars by lowering via stack. Reviewers: davemgreen, topperc, arsenm Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/121171	2025-01-05 21:32:27 -08:00
Amara Emerson	7e3180a2c2	[AArch64][GlobalISel] Add support for widening vector store elements to s8. Reviewers: topperc, arsenm, davemgreen Reviewed By: arsenm Pull Request: https://github.com/llvm/llvm-project/pull/121170	2025-01-05 21:31:34 -08:00
Craig Topper	54dac27c57	[GISel][RISCV] Use isSExtCheaperThanZExt when widening G_UMAX/G_UMIN. (#120041 ) Similar to what we do for unsigned comparisons after #120032.	2024-12-15 23:16:58 -08:00
Craig Topper	115872902b	[GISel][RISCV] Use isSExtCheaperThanZExt when widening G_ICMP. (#120032 ) Sign extending i32->i64 is more efficient than zero extend for RV64.	2024-12-15 22:55:58 -08:00
Craig Topper	de1a423c23	[GISel][RISCV][AArch64] Support legalizing G_SCMP/G_UCMP to sub(isgt,islt). (#119265 ) Convert the LLT to EVT and call TargetLowering::shouldExpandCmpUsingSelects to determine if we should do this. We don't have a getSetccResultType, so I'm boolean extending the compares to the result type and using that. If the compares legalize to the same type, these extends will get removed. Unfortunately, if the compares legalize to a different type, we end up with truncates or extends that might not be optimally placed.	2024-12-15 20:47:17 -08:00
David Green	4c8c130847	[AArch64][GlobalISel] Scalarize i128 shufflevector instructions. (#119980 ) This, like other operations, scalarizes shuffle vector operations with types larger than 64 bits. ImplicitDef and Freeze are also handled the same way, to allow them to legalize. The legalization of fewerElementsVectorShuffle is adjusted to handle scalarization.	2024-12-15 10:44:40 +00:00
Craig Topper	7ece560a50	[GISel] Support narrowing G_ICMP with more than 2 parts. (#119335 ) This allows us to support i128 G_ICMP on RV32. I'm not sure how to test the "left over" part of this as RISC-V always widens to a power of 2 before narrowing.	2024-12-12 09:50:26 -08:00
Tim Gymnich	2db2dc8ab9	[GlobalISel][NFC] Fix LLT Propagation (#119587 ) Retain LLT type information by creating new LLTs from the original LLT instead of only using the original scalar size. This PR prepares for the [LLT FPInfo RFC](https://discourse.llvm.org/t/rfc-globalisel-adding-fp-type-information-to-llt/83349/24) where LLTs will carry additional floating point type information in addition to the scalar size.	2024-12-12 09:47:46 -08:00
Craig Topper	5797ed660a	[GISel][SDAG] Avoid push_back in loops for some shuffle mask handling. (#119434 ) Each call to push_back contains a check to see if the vector needs to grow. Using resize or giving the size to the constructor can reduce the number of checks for growing.	2024-12-10 22:18:46 -08:00
Craig Topper	e3284d8cc7	[GISel] Use SmallVector::append instead of copying one element at a time. (#119321 )	2024-12-10 07:18:20 -08:00
Craig Topper	7c12418021	[GISel] Avoid creating a virtual register we don't need. (#119305 ) narrowScalarAddSub was creating a virtual register and then overwriting the Register variable without using it. Add an else and only create it when needed.	2024-12-09 20:23:24 -08:00
Craig Topper	4cf2cf18c9	[RISCV][GISel] Stop over promoting G_SITOFP/UITOFP libcalls on RV64. (#118597 ) When we have legal instructions we want to promote to sXLen and let isel pattern matching remove the and/sext_inreg. When using a libcall we want to use a 'si' libcall for small types instead of 'di'. To match the RV64 ABI, we need to sign extend `unsigned int` arguments. We reuse the shouldSignExtendTypeInLibCall hook from SelectionDAG.	2024-12-04 10:42:49 -08:00
Craig Topper	a15400d05d	[RISCV][GISel] Support f32/f64 ldexp. (#117941 ) The existing libcall lowering in LegalizerHelper.cpp did not account for one operand being integer. Reuse the G_FPOWI code to fix this.	2024-12-02 13:30:46 -08:00
Craig Topper	bee33b5291	[RISCV][GISel] Support f32/f64 powi. (#117937 ) Need to force libcall legalization to treat the integer argument as signed so that it can be promoted to XLen in call lowering for RV64. Alternatively we could promote the operand before converting to libcall, but going through call lowering is closer to what SelectionDAG does.	2024-12-02 09:06:38 -08:00
Craig Topper	43b6b78771	[RISCV][GISel] Use libcalls for f32/f64 G_FCMP without F/D extensions. (#117660 ) LegalizerHelper only supported f128 libcalls and incorrectly assumed that the destination register for the G_FCMP was s32.	2024-11-26 15:48:49 -08:00
Craig Topper	ebcaa57715	[GISel] #undef macros when they are no longer needed. NFC (#117652 ) These macros are created inside a function. They should be undefined before the end of the function.	2024-11-25 18:00:03 -08:00
David Green	d3ce069572	[AArch64][GlobalISel] Legalize ptr shuffle vector to s64 (#116013 ) This converts all ptr element shuffle vectors to s64, so that the existing vector legalization handling can lower them as needed. This prevents a lot of fallbacks that currently try to generate things like `<2 x ptr> G_EXT`. I'm not sure if bitcast/inttoptr/ptrtoint is intended to be necessary for vectors of pointers, but it uses buildCast for the casts, which now generates a ptrtoint/inttoptr.	2024-11-23 17:00:51 +00:00
Tex Riddell	c03d09ce3e	[aarch64] atan2 intrinsic lowering (p5) (#112611 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing fewerElementsVector handling for the new atan2 intrinsic - `AArch64ISelLowering.cpp`: Add aarch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize atan2. Part 5 for Implement the atan2 HLSL Function #70096.	2024-10-24 17:53:12 -07:00
Michael Maitland	6bac41496e	[RISCV][GISEL] Legalize G_INSERT_SUBVECTOR (#108859 ) This code is heavily based on the SelectionDAG lowerINSERT_SUBVECTOR code.	2024-10-21 08:49:13 -04:00
Michael Maitland	f957d080e9	[RISCV][GISEL] Legalize G_EXTRACT_SUBVECTOR (#109426 ) This is heavily based on the SelectionDAG lowerEXTRACT_SUBVECTOR code.	2024-10-01 14:08:49 -04:00
David Green	9f255d863f	[AArch64][GlobalISel] Lower fp16 abs and neg without fullfp16. (#110096 ) This changes the existing promote logic to lower, so that it can use normal integer operations. A minor change was needed to fneg lower code to handle vectors.	2024-09-27 07:43:58 +01:00
Evgenii Kudriashov	e9cb44090f	[X86][GlobalISel] Enable scalar versions of G_UITOFP and G_FPTOUI (#100079 ) Also add tests for G_SITOFP and G_FPTOSI	2024-09-25 16:15:36 +02:00
Craig Topper	d5d1417659	[RISCV][GISel] Use libcalls for rint, nearbyint, trunc, round, and roundeven intrinsics. (#108779 )	2024-09-18 12:07:44 -07:00
David Green	feac761f37	[GlobalISel][AArch64] Add G_FPTOSI_SAT/G_FPTOUI_SAT (#96297 ) This is an implementation of the saturating fp to int conversions for GlobalISel. On AArch64 the conversion instructions work this way, producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is ported from SDAG. AArch64 has a lot of existing tests for fptosi_sat, covering a wide range of types. I have tried to make most of them work all at once, but a few fall back due to other missing features such as f128 handling for min/max.	2024-09-16 10:33:59 +01:00
Him188	0748f4227c	[AArch64][GlobalISel] Legalize 128-bit types for FABS (#104753 ) This patch adds a common lower action for `G_FABS`, which generates `and x8, x8, #0x7fffffffffffffff` to reset the sign bit. The action does not support vectors since `G_AND` does not support fp128. This approach is different than what SDAG is doing. SDAG stores the value onto stack, clears the sign bit in the most significant byte, and loads the value back into register. This involves multiple memory ops and sounds slower.	2024-09-03 12:47:26 +01:00
Sergei Barannikov	4d7a0abae8	[DataLayout] Change return type of `getStackAlignment` to `MaybeAlign` (#105478 ) Currently, `getStackAlignment` asserts if the stack alignment wasn't specified. This makes it inconvenient to use and complicates testing. This change also makes `exceedsNaturalStackAlignment` method redundant.	2024-08-27 22:59:33 +03:00
Sumanth Gundapaneni	e78156a0e2	Scalarize the vector inputs to llvm.lround intrinsic by default. (#101054 ) Verifier is updated in a different patch to allow vector types for the llvm.lround and llvm.llround intrinsics.	2024-08-21 12:13:56 -05:00
David Green	10fe531d6c	[GlobalISel] Add and use an Opcode variable and update match-table-cxx.td checks. NFC	2024-08-18 11:08:49 +01:00
Him188	ba461f8c62	[AArch64][GlobalISel] Legalize fp128 types as libcalls for G_FCMP (#98452 ) - Generate libcall for supported predicates. - Generate unsupported predicates as combinations of supported predicates. - Vectors are scalarized, however some cases like `v3f128_fp128` are still failing, because we failed to legalize G_OR for these types. GISel now generates the same code as SDAG, however, note the difference in the `one` case.	2024-07-25 11:07:31 +01:00
Sumanth Gundapaneni	0ee32c4573	[AMDGPU] Implement llvm.lrint intrinsic lowering (#98931 ) This patch enables the target-independent lowering of llvm.lrint via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU.	2024-07-24 23:34:31 +04:00
Sumanth Gundapaneni	fc832d5349	[AMDGPU] Implement llvm.lround intrinsic lowering. (#98970 ) This patch enables the target-independent lowering of llvm.lround via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU. In order to support vector floating point input for llvm.lround, this patch extends the target independent APIs and provides support for scalarizing. pr98950 is needed to let verifier allow vector floating point types	2024-07-23 20:34:34 +04:00
Thorsten Schütt	2d2d6853cf	[GlobalIsel][AArch64] Legalize G_SCMP and G_UCMP (#99820 ) https://github.com/llvm/llvm-project/pull/91871 https://github.com/llvm/llvm-project/pull/98774	2024-07-23 10:12:28 +02:00
Joseph Huber	615b7eeaa9	Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 )" This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5. I moved the `ISD` dependencies into the CodeGen portion of the handling, it's a little awkward but it's the easiest solution I can think of for now.	2024-07-20 09:29:31 -05:00
NAKAMURA Takumi	740161a9b9	Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 )" This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610	2024-07-20 12:36:57 +09:00
Farzon Lotfi	e2f463b5b6	[aarch64] Add hyperbolic and arc trig intrinsic lowering (#98937 ) ## The change(s) - `VecFuncs.def`: define intrinsic to sleef/armpl mapping - `LegalizerHelper.cpp`: add missing `fewerElementsVector` handling for the new trig intrinsics - `AArch64ISelLowering.cpp`: Add aarch64 specializations for lowering like neon instructions - `AArch64LegalizerInfo.cpp`: Legalize the new trig intrinsics. aarch64 has special legalization requirements in `AArch64LegalizerInfo.cpp`. If we redirect the clang builtin without handling this we will break the aarch64 compiler. ## History This change is part of an implementation of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics. Which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds wasm lowering cases for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. https://github.com/llvm/llvm-project/issues/70079 https://github.com/llvm/llvm-project/issues/70080 https://github.com/llvm/llvm-project/issues/70081 https://github.com/llvm/llvm-project/issues/70083 https://github.com/llvm/llvm-project/issues/70084 https://github.com/llvm/llvm-project/issues/95966 ## Why is aarch64 needed The last step is to redirect the `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh` to emit the intrinsic. We can't emit the intrinsic without the intrinsics becoming legal for aarch64 in `AArch64LegalizerInfo.cpp`	2024-07-19 10:18:23 -04:00
Lawrence Benson	177ce1900f	[LLVM] Add `llvm.experimental.vector.compress` intrinsic (#92289 ) This PR adds a new vector intrinsic `@llvm.experimental.vector.compress` to "compress" data within a vector based on a selection mask, i.e., it moves all selected values (i.e., where `mask[i] == 1`) to consecutive lanes in the result vector. A `passthru` vector can be provided, from which remaining lanes are filled. The main reason for this is that the existing `@llvm.masked.compressstore` has very strong constraints in that it can only write values that were selected, resulting in guard branches for all targets except AVX-512 (and even there the AMD implementation is _very_ slow). More instruction sets support "compress" logic, but only within registers. So to store the values, an additional store is needed. But this combination is likely significantly faster on many targets as it avoids branches. In follow up PRs, my plan is to add target-specific lowerings for x86, SVE, and possibly RISCV. I also want to combine this with a store instruction, as this is probably a common case and we can avoid some memory writes in that case. See [discussion in forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663) for initial discussion on the design.	2024-07-17 14:24:24 +02:00
Joseph Huber	c05126bdfc	[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512 ) Summary: The LTO pass and LLD linker have logic in them that forces extraction and prevents internalization of needed runtime calls. However, these currently take all RTLibcalls into account, even if the target does not support them. The target opts-out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO / lld can use it to determine if a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step, but do not want the overhead of uncalled functions. (This adds like a second to the link time trivially)	2024-07-16 06:22:09 -05:00
Him188	365f5b4a1d	[AArch64][GISel] Add fp128 and i128 sitofp/uitofp handling (#97691 ) Legalize sitofp/uitofp involving fp128/i128 types into a libcall. Vector with i128/fp128 types are scalarized.	2024-07-15 16:24:24 +01:00
chuongg3	0d5db4e7ba	[AArch64][GlobalISel] Bitcast and Build Illegal G_CONCAT_VECTOR Instructions (#96492 ) Attempts to handle illegal G_CONCAT_VECTOR instructions by bitcasting the source into scalar values and using G_BUILD_VECTOR instead. Treating the G_CONCAT_VECTORS instruction in the legalization artefact by folding away concat(bitcast, ...) into buildvector(...) would require a check for ImpDef created by the shuffles in llvm.	2024-07-15 12:00:47 +01:00
Farzon Lotfi	0b58f34c98	[X86][CodeGen] Add base trig intrinsic lowerings (#96222 ) This change is an implementation of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics. Which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This change adds constraint intrinsics and some lowering cases for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. The only x86 specific change was for f80. https://github.com/llvm/llvm-project/issues/70079 https://github.com/llvm/llvm-project/issues/70080 https://github.com/llvm/llvm-project/issues/70081 https://github.com/llvm/llvm-project/issues/70083 https://github.com/llvm/llvm-project/issues/70084 https://github.com/llvm/llvm-project/issues/95966 The x86 lowering is going to be done in three pr changes with this being the first. A second PR will be put up for Loop Vectorizing and then SLPVectorizer. The constraint intrinsics are also going to be in multiple parts, but just 2. This part covers just the llvm specific changes, part2 will cover clang specific changes and legalization for backends that have special legalization requirements like aarch64 and wasm.	2024-07-11 15:58:43 -04:00
Manish Kausik H	69192e0193	[LegalizeDAG] Optimize CodeGen for `ISD::CTLZ_ZERO_UNDEF` (#83039 ) Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case. The details of the optimization are outlined in #82075 Fixes #82075 Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>	2024-07-08 14:01:32 +01:00
Matt Arsenault	7032076242	GlobalISel: Drop vector range metadata on bitcast lowering (#97279 ) If we are reinterpreting the type, the range metadata also needs to be converted. I believe the DAG has the same bug.	2024-07-01 15:26:09 +02:00
Matt Arsenault	2df2373eb8	DAG/GlobalISel: Set disjoint for or in copysign lowering (#97057 ) We masked out the sign bit from one value, and the non-sign bits from the other so there should be no common bits set. No idea how to test this on the DAG path, other than scraping the debug logs. A few targets hit this path with f16 values, but the resulting i16 ors get anyext promoted and lose the disjoint flag. In the fp128 case, PPC gets further and the or loses the flag somewhere else later. Adding a haveNoCommonBits assert shows this works though.	2024-06-28 23:03:39 +02:00
isuckatcs	937d79bc9d	[GlobalISel][AArch64][AMDGPU] Expand FPOWI into series of multiplication (#95217 ) SelectionDAG already converts FPOWI into a series of optimized multiplications, this patch introduces the same optimization into GlobalISel.	2024-06-28 09:57:50 +02:00
David Green	e887624aca	[AArch64][GlobalISel] Add fp128 and i128 fptosi/fptoui handling. (#95528 ) Any fp128 needs to end up as a libcall, as will f32->i128 and f64->i128. f16 are a bit special as the maximum range of the result fits in an i17, so can be shrunk to an i64. Vector with i128/fp128 types are scalarized.	2024-06-21 10:24:57 +01:00
Nikita Popov	f2f18459d4	Revert "Intrinsic: introduce minimumnum and maximumnum (#93841 )" As far as I can tell, this pull request was not approved, and did not go through an RFC on discourse. This reverts commit 89881480030f48f83af668175b70a9798edca2fb. This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.	2024-06-21 08:34:04 +02:00
YunQiang Su	8988148003	Intrinsic: introduce minimumnum and maximumnum (#93841 ) Currently, on different platforms, the behavior of llvm.minnum is different if one operand is sNaN: When we compare sNaN vs NUM: ARM/AArch64/PowerPC: follow the IEEE754-2008's minNUM: return qNaN. RISC-V/Hexagon follow the IEEE754-2019's minimumNumber: return NUM. X86: Returns NUM but not the same as IEEE754-2019's minimumNumber as +0.0 is not always greater than -0.0. MIPS/LoongArch/Generic: return NUM. LIBCALL: returns qNaN. So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow IEEE754-2019's minimumNumber/maximumNumber. Half-fix: #93033	2024-06-21 11:53:08 +08:00
Christudasan Devadasan	27bebc1161	[GISel] Unify multiple instances of getTypeForLLT (NFC) (#95577 ) Multiple static instances of this utility function have been found in different GlobalISel files. Unifying them by adding an instance in utils.cpp.	2024-06-15 18:11:32 +05:30
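
The G_SCMP/G_UCMP entry above (de1a423c23) describes legalizing a three-way compare to sub(isgt, islt). As a rough illustration of that identity on plain scalars, here is a minimal C++ sketch. It is not the LegalizerHelper code: the function names are hypothetical, and the 32-bit operand width is an assumption made only for the example.

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM APIs): scalar models of the
// sub(isgt, islt) expansion named in the commit message.

// Signed three-way compare: -1 if a < b, 0 if a == b, 1 if a > b.
int threeWayCompareSigned(std::int32_t a, std::int32_t b) {
  int isGt = (a > b) ? 1 : 0;  // signed "greater than" compare
  int isLt = (a < b) ? 1 : 0;  // signed "less than" compare
  return isGt - isLt;          // at most one of the two is 1
}

// Unsigned variant: same shape, only the compare predicates change.
int threeWayCompareUnsigned(std::uint32_t a, std::uint32_t b) {
  int isGt = (a > b) ? 1 : 0;  // unsigned "greater than" compare
  int isLt = (a < b) ? 1 : 0;  // unsigned "less than" compare
  return isGt - isLt;
}
```

The same bit pattern can order differently under the two predicates: 0xFFFFFFFF compares greater than 1 as unsigned, while the signed value -1 compares less than 1.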

656 Commits