We can shuffle vXf16 vectors just like vXi16 vectors; no FP instructions
are needed. Update the vrgather and vslide patterns to check only the
predicates of the equivalent integer type. If we use the FP type, the
predicate check requires Zvfh and blocks Zvfhmin.
These are probably not the only patterns that need to be fixed, but the
test from the bug report no longer crashes.
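For illustration, a shuffle of this shape (a hypothetical reduction, not
the exact test from the report) should now select integer
vrgather/vslide instructions under Zvfhmin:

```llvm
; Reversing a fixed f16 vector is pure data movement; with +zvfhmin and
; without +zvfh it should use the integer shuffle patterns.
define <4 x half> @reverse_v4f16(<4 x half> %v) {
  %s = shufflevector <4 x half> %v, <4 x half> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  ret <4 x half> %s
}
```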
Fixes #97477
Implement XCValu intrinsics for CV32E40P according to the specification.
This commit is part of a patch-set to upstream the vendor specific
extensions of CV32E40P that need LLVM intrinsics to implement Clang
builtins.
Contributors: @CharKeaney, @ChunyuLiao, @jeremybennett, @lewis-revill,
@NandniJamnadas, @PaoloS02, @serkm, @simonpcook, @xingmingjie.
The splat we generate after the load doesn't use the extended bits, so it
shouldn't matter which extend type we use.
EXTLOAD is lowered as SEXTLOAD on every element type except i8.
This is a helper to avoid writing `getModule()->getDataLayout()`. I
regularly try to use this method only to remember it doesn't exist...
`getModule()->getDataLayout()` is also a common (the most common?)
reason why code has to include the Module.h header.
This applies if only bits 8, 16, 24, 32, etc. can be non-zero. This is
the form (mul X, 255) is decomposed to, and that decomposition happens
early, before the RISC-V DAG combine runs.
This patch does not support types larger than XLen, so i64 on rv32
fails to generate two orc.b instructions. It might have worked if the
mul hadn't been decomposed before it was expanded.
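A hypothetical IR-level illustration (the mask is mine, picked to
restrict which bits can be set):

```llvm
; %x can only have bits 8, 16, and 24 set, so each byte of %x is 0 or 1.
; Multiplying by 255, i.e. computing (%x << 8) - %x, turns each set bit
; into a 0xFF byte, which is exactly what orc.b produces for this input.
define i32 @orcb_via_mul(i32 %a) {
  %x = and i32 %a, 16843008 ; 0x01010100
  %m = mul i32 %x, 255
  ret i32 %m
}
```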
Partial fix for #96595.
As far as I can tell, this pull request was not approved and did not go
through an RFC on Discourse.
This reverts commit 89881480030f48f83af668175b70a9798edca2fb.
This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
Currently, the behavior of llvm.minnum differs across platforms when
one operand is sNaN. When we compare sNaN vs NUM:
- ARM/AArch64/PowerPC: follow IEEE754-2008's minNum and return qNaN.
- RISC-V/Hexagon: follow IEEE754-2019's minimumNumber and return NUM.
- X86: returns NUM, but does not match IEEE754-2019's minimumNumber,
  as +0.0 is not always treated as greater than -0.0.
- MIPS/LoongArch/Generic: return NUM.
- LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always
follow IEEE754-2019's minimumNumber/maximumNumber.
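For illustration, a sketch of the new intrinsics in use (my example,
not taken from the patch):

```llvm
declare float @llvm.minimumnum.f32(float, float)
declare float @llvm.maximumnum.f32(float, float)

; With an sNaN operand these consistently return the number, and -0.0
; is consistently ordered less than +0.0.
define float @clamp01(float %x) {
  %lo = call float @llvm.maximumnum.f32(float %x, float 0.0)
  %hi = call float @llvm.minimumnum.f32(float %lo, float 1.0)
  ret float %hi
}
```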
Half-fix: #93033
This is a three-instruction expansion, and does not depend on zba, so
most of the test changes are in base RV32/64I configurations.
With zba, this gets immediates such as 14, 28, 30, 56, 60, and 62,
which aren't covered by our other expansions.
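For example (my arithmetic; the exact expansion shape may differ),
X * 14 can be rewritten as (X << 4) - (X << 1):

```llvm
; Two shifts and a subtract: three instructions, no zba needed.
define i64 @mul14(i64 %x) {
  %a = shl i64 %x, 4  ; X * 16
  %b = shl i64 %x, 1  ; X * 2
  %r = sub i64 %a, %b ; X * 16 - X * 2 == X * 14
  ret i64 %r
}
```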
This change is a preliminary step to support trampolines on RISC-V. Trampolines are used by flang to implement obtaining the address of an internal program (i.e., a nested function in Fortran parlance).
In this change we lower `llvm.clear_cache` intrinsic on glibc targets to
`__riscv_flush_icache` which is what GCC is currently doing for Linux targets.
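A minimal IR example of the intrinsic this lowers:

```llvm
declare void @llvm.clear_cache(ptr, ptr)

define void @flush_range(ptr %begin, ptr %end) {
  ; On RISC-V glibc targets this now lowers to a call to
  ; __riscv_flush_icache, matching GCC on Linux.
  call void @llvm.clear_cache(ptr %begin, ptr %end)
  ret void
}
```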
The instructions are only defined to operate on f16 data. If the
scalar FPR isn't properly NaN-boxed, these instructions will create an
f16 NaN, not a bf16 NaN, in the vector register.
The custom lowering converts to f32, splats as f32, then narrows the
vector to bf16. None of that requires Zvfhmin.
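A hypothetical example of the kind of splat this covers:

```llvm
; Splatting a bf16 scalar: the custom lowering widens to f32, splats,
; then narrows, so no Zvfh/Zvfhmin is needed.
define <vscale x 4 x bfloat> @splat_bf16(bfloat %x) {
  %head = insertelement <vscale x 4 x bfloat> poison, bfloat %x, i32 0
  %splat = shufflevector <vscale x 4 x bfloat> %head, <vscale x 4 x bfloat> poison, <vscale x 4 x i32> zeroinitializer
  ret <vscale x 4 x bfloat> %splat
}
```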
Add new bf16 test files without Zvfh/Zvfhmin in their RUN lines. I will
remove the bf16 tests from other files in a follow-up patch.
This pattern is an obscured way to express saturating a signed value
into a smaller unsigned value.
If (setltu X, 256) is true, then the value is already in the desired
range so we can pick X. If it's false, we select (sext (setgt X, 0))
which is 0 for negative values and all ones for positive values. The all
ones value when truncated to the final type will still be all ones like
we want.
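Putting that together, the pattern looks roughly like this in IR (a
sketch, with i32 saturated into i8 as the assumed types):

```llvm
define i8 @sat_u8(i32 %x) {
  %in_range = icmp ult i32 %x, 256
  %pos = icmp sgt i32 %x, 0
  %all_ones = sext i1 %pos to i32 ; 0 for negative X, -1 otherwise
  %sel = select i1 %in_range, i32 %x, i32 %all_ones
  %r = trunc i32 %sel to i8       ; the -1 truncates to 255
  ret i8 %r
}
```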
If the smax removed all negative numbers, then we can treat the smin
like a umin.
If the smin and smax are in the other order we can swap them and use a
vnclipu as long as the smax constant is smaller than the smin constant.
This is based on similar code from X86's detectUSatPattern.
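A sketch of the kind of input, with assumed fixed-width types:

```llvm
declare <4 x i32> @llvm.smax.v4i32(<4 x i32>, <4 x i32>)
declare <4 x i32> @llvm.smin.v4i32(<4 x i32>, <4 x i32>)

; smax clamps out the negatives, so the smin behaves like a umin and
; the pair plus the truncate can become a single vnclipu.
define <4 x i8> @trunc_usat(<4 x i32> %x) {
  %lo = call <4 x i32> @llvm.smax.v4i32(<4 x i32> %x, <4 x i32> zeroinitializer)
  %hi = call <4 x i32> @llvm.smin.v4i32(<4 x i32> %lo, <4 x i32> <i32 255, i32 255, i32 255, i32 255>)
  %r = trunc <4 x i32> %hi to <4 x i8>
  ret <4 x i8> %r
}
```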
Similar to #93596, this moves the signed vnclip patterns into DAG
combine.
This will allow us to support more than one level of truncate in a
future patch.
I plan to add support for multiple layers of vnclipu. For example,
i32->i8 using 2 vnclipu instructions. First clipping to 65535, then
clipping to 255. Similar for signed vnclip.
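As a sketch of where this is headed (my example, assumed types):

```llvm
declare <4 x i32> @llvm.umin.v4i32(<4 x i32>, <4 x i32>)
declare <4 x i16> @llvm.umin.v4i16(<4 x i16>, <4 x i16>)

; i32 -> i8: clamp to 65535 and truncate to i16, then clamp to 255 and
; truncate to i8; each clamp+truncate maps to one vnclipu.
define <4 x i8> @trunc_usat_2step(<4 x i32> %x) {
  %c1 = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %x, <4 x i32> <i32 65535, i32 65535, i32 65535, i32 65535>)
  %t1 = trunc <4 x i32> %c1 to <4 x i16>
  %c2 = call <4 x i16> @llvm.umin.v4i16(<4 x i16> %t1, <4 x i16> <i16 255, i16 255, i16 255, i16 255>)
  %t2 = trunc <4 x i16> %c2 to <4 x i8>
  ret <4 x i8> %t2
}
```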
This scales poorly if we need to add patterns with 2 or 3 truncates.
Instead, move the code to DAGCombiner with new ISD opcodes to represent
VCLIP(U).
This patch just moves the existing patterns into DAG combine. Support
for multiple truncates will come as a follow-up. A similar patch series
will be made for the signed vnclip.
We checked the VL and mask of any additional TRUNCATE_VECTOR_VL
nodes we peek through, but not the outermost.
This moves the check to the outer node and then verifies all the
additional nodes have the same VL and Mask.
Stacked on #93574
Similar for i16 and i64 elements for both fixed and scalable vectors.
This reduces the number of vector instructions, but increases vl/vtype
toggles.
This reduces some code in 525.x264_r from SPEC2017. In that usage, the
vectors are fixed-length with a small number of elements, so vsetivli
can be used.
This is similar to `performMulVectorCmpZeroCombine` from AArch64.
The elements that aren't sNaNs need to get passed through this fadd
instruction unchanged. With the agnostic mask policy they might be
forced to all ones.
I think the behaviors are the same, assuming this description of them
is accurate.
AVGFLOORS sign extends the inputs by 1 bit, adds them, then does an
arithmetic shift right by 1 before truncating to the original bit width.
This is vaadd with rdn rounding mode.
AVGCEILS sign extends the inputs by 1 bit, adds them, then does an
arithmetic shift right by 1. If the bit shifted out is 1, it adds 1 to
the shifted value. Then truncates to the original bit width. This is vaadd
with rnu rounding mode.
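In IR terms, the AVGFLOORS input pattern is roughly (a sketch with i8
elements assumed):

```llvm
; AVGFLOORS: trunc(((sext x) + (sext y)) >> 1), i.e. vaadd with rdn.
; For AVGCEILS/rnu, add 1 before the shift so the result rounds up.
define <4 x i8> @avg_floor_s(<4 x i8> %x, <4 x i8> %y) {
  %xe = sext <4 x i8> %x to <4 x i16>
  %ye = sext <4 x i8> %y to <4 x i16>
  %sum = add <4 x i16> %xe, %ye
  %sh = ashr <4 x i16> %sum, <i16 1, i16 1, i16 1, i16 1>
  %r = trunc <4 x i16> %sh to <4 x i8>
  ret <4 x i8> %r
}
```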
I think this wasn't implemented previously because there was some
confusion about what average means. Some may expect average to round
towards zero, but there is no way to do that in RISC-V or with the
SelectionDAG nodes. Related issue
https://github.com/riscv/riscv-v-spec/issues/935
This patch fixes:
llvm/lib/Target/RISCV/RISCVISelLowering.cpp:19848:11: error:
enumeration value 'SW_GUARDED_BRIND' not handled in switch
[-Werror,-Wswitch]
Doing so avoids negative interactions with other combines which don't
know the shl_add is a single instruction. From the commit log, we've had
several combine loops already.
This was originally posted as part of #88791, where a bug was pointed
out. That bug was fixed by #89789 which hits the same issue from another
angle. To confirm the fix, I included the reduced test case here.
Android supports per-thread stack protectors that are individually
managed and initialized, which can provide stronger protections than
using the global stack protector cookie. This patch matches the
convention for other architectures targeting Android platforms.
The vlseg and vsseg intrinsic functions are not overloaded on pointer
type, so they cannot handle non-default address spaces.
This fixes an error we see after #90583.
This moves our last major category of tablegen-driven multiply strength
reduction into the post-legalize combine framework. The one slightly
tricky bit is making sure that we use a leading shl if we can form a
tricky bit is making sure that we use a leading shl if we can form a
slli.uw, and trailing shl otherwise. Having the trailing shl is critical
for shNadd matching, and folding any following sext.w.
As can be seen in the TD deltas, this allows us to kill off both the
actual multiply patterns and the explicit (add (mul X, C), Y) patterns.
The latter are now handled by the generic shNadd matching code, with
the exception of the THead-only C=200 case, because we don't (yet) have
a multiply expansion with two shNadd + a shift.
---------
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
This is the insert_subvector equivalent to #79949, where we can avoid
sliding up by the full LMUL amount if we know the exact subregister the
subvector will be inserted into.
This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this
in two parts:
- We handle fixed length subvector types by converting the subvector to
a scalable vector. But unlike EXTRACT_SUBVECTOR, we may also need to
convert the vector being inserted into too.
- Whenever we don't need a vslideup because either the subvector fits
exactly into a vector register group *or* the vector is undef, we need
to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select
doesn't correctly handle fixed length subvectors yet: see d7a28f7ad
A subvector exactly fits into a vector register group if its size is a
known multiple of the size of a vector register. To help reason about
this, this patch adds a new overload of TypeSize::isKnownMultipleOf for
scalable-to-scalable comparisons.
I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an
invariant), so that the insert_subvector and extract_subvector code
paths are the same.
We should teach it to properly handle fixed length subvectors in a
follow-up patch, so that the "exact subregister" logic is handled in
one place instead of being spread across both RISCVISelDAGToDAG.cpp and
RISCVISelLowering.cpp.
We can think of this as two separate combines
(czero_eqz x, (setne y, 0)) -> (czero_eqz x, y)
and
(czero_eqz x, x) -> x
Similarly, the (czero_nez x, (seteq x, 0)) -> x combine can be broken into
(czero_nez x, (seteq y, 0)) -> (czero_eqz x, y)
and
(czero_eqz x, x) -> x
isel already does the (czero_eqz x, (setne y, 0)) -> (czero_eqz x, y)
and (czero_nez x, (seteq y, 0)) -> (czero_eqz x, y) combines, but doing
them early could expose other opportunities.
This patch moves the following intrinsics out of the experimental
namespace:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
All of these intrinsics have existed in LLVM for more than a year and
are widely used, so they should no longer be considered experimental.
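For reference, the intrinsics under their non-experimental names
(fixed-width instantiations shown; treat the exact manglings as
illustrative):

```llvm
declare <8 x i32> @llvm.vector.interleave2.v8i32(<4 x i32>, <4 x i32>)
declare { <4 x i32>, <4 x i32> } @llvm.vector.deinterleave2.v8i32(<8 x i32>)
declare <4 x i32> @llvm.vector.reverse.v4i32(<4 x i32>)
declare <4 x i32> @llvm.vector.splice.v4i32(<4 x i32>, <4 x i32>, i32)
```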
According to the LangRef, llvm.maximum/minimum have -0.0 < +0.0
semantics and propagate NaN.
Expand the nodes on targets that don't support the operation, by adding
an extra check for NaN and using is_fpclass to check the signs of
zeros.
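A minimal example of the semantics involved (illustrative):

```llvm
declare float @llvm.maximum.f32(float, float)

define float @fmax(float %x, float %y) {
  ; Returns the larger value, propagates NaN, and orders -0.0 < +0.0.
  ; Targets without a native instruction now get an expansion with an
  ; extra NaN check plus llvm.is.fpclass to distinguish the zero signs.
  %r = call float @llvm.maximum.f32(float %x, float %y)
  ret float %r
}
```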