llvm-project

Author	SHA1	Message	Date
Craig Topper	644899addd	[RISCV][GISel] Port portions of float-intrinsics.ll and double-intrinsics.ll. NFC Remove the legalizer test for the same intrinsics as it is no longer interesting with end to end tests.	2024-09-18 13:13:34 -07:00
Craig Topper	d5d1417659	[RISCV][GISel] Use libcalls for rint, nearbyint, trunc, round, and roundeven intrinsics. (#108779 )	2024-09-18 12:07:44 -07:00
Craig Topper	8e4909aa19	[RISCV] Remove unnecessary vand.vi from vXi1 and nvXvi1 VECTOR_REVERSE codegen. (#109071 ) Use a setne with 0 instead of a trunc. We know we zero extended the node so we can get by with a non-zero check only. The truncate lowering doesn't know that we zero extended so has to mask the lsb. I don't think DAG combine sees the trunc before we lower it to RISCVISD nodes so we don't get a chance to use computeKnownBits to remove the AND.	2024-09-18 09:43:48 -07:00
Sarah Spall	67518a44fe	[HLSL] Implement elementwise popcount (#108121 ) Add new elementwise popcount builtin to support HLSL function 'countbits'. elementwise popcount only accepts integer types. Add hlsl intrinsic 'countbits' Closes #99094	2024-09-18 08:19:52 -07:00
Phoebe Wang	a10c9f994b	Revert "[X86][BF16] Add libcall for F80 -> BF16" (#109140 ) Reverts llvm/llvm-project#109116	2024-09-18 21:35:38 +08:00
Phoebe Wang	76eda76f9f	[X86][BF16] Add libcall for F80 -> BF16 (#109116 ) This fixes #108936, but the calling convention doesn't match with GCC. I doubt we have such a lib function for now, so leave the calling convention as is.	2024-09-18 21:23:10 +08:00
Mahesh-Attarde	311e4e3245	[X86][AVX10.2] Support AVX10.2 MOVZXC new Instructions. (#108537 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965 Chapter 14 INTEL® AVX10 ZERO-EXTENDING PARTIAL VECTOR COPY INSTRUCTIONS --------- Co-authored-by: mattarde <mattarde@intel.com>	2024-09-18 21:01:51 +08:00
Simon Pilgrim	4b529f840c	[X86] Fold extractsubvector(permv3(src0,mask,src1),c) -> extractsubvector(permv3(src0,widensubvector(extractsubvector(mask,c)),src1),0) iff c != 0 For cross-lane shuffles, extract the mask operand (upper) subvector directly, and make use of the free implicit extraction of the lowest subvector of the result.	2024-09-18 12:06:08 +01:00
Piotr Sobczak	adf02ae41f	[AMDGPU] Simplify lowerBUILD_VECTOR (#109094 ) Simplify `lowerBUILD_VECTOR` by commoning up the way the vectors are split. Also reorder the checks to avoid a long condition inside `if`.	2024-09-18 12:58:16 +02:00
Mahesh-Attarde	f5ad9e1ca5	[X86][AVX10.2] Support AVX10.2-COMEF new instructions. (#108063 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965 Chapter 8 AVX10 COMPARE SCALAR FP WITH ENHANCED EFLAGS INSTRUCTIONS --------- Co-authored-by: mattarde <mattarde@intel.com>	2024-09-18 17:55:29 +08:00
Luke Lau	edac1b2d63	[RISCV] Promote bf16 ops to f32 with zvfbfmin (#108937 ) For f16 with zvfhmin, we promote most ops and VP ops to f32. This does the same for bf16 with zvfbfmin, so the two fp types should now be in sync. There are a few places in the custom lowering where we need to check for a LMUL 8 f16/bf16 vector that can't be promoted and must be split, this extracts that out into isPromotedOpNeedingSplit. In a follow up NFC we can deduplicate the code that sets up the promotions.	2024-09-18 17:39:40 +08:00
Luke Lau	8d7d4c25cb	[RISCV] Split fp rounding ops with zvfhmin nxv32f16 (#108765 ) This adds zvfhmin test coverage for fceil, ffloor, fnearbyint, frint, fround and froundeven and splits them at nxv32f16 to avoid crashing, similarly to what we do for other nodes that we promote. This also sets ftrunc to promote which was previously missing. We already promote the VP version of it, vp_froundtozero. Marking it as promoted affects some of the cost model tests since they're no longer expanded.	2024-09-18 16:36:13 +08:00
Aditi Medhane	5a8d2dd1f9	[AMDGPU] Handle subregisters properly in generic operand legalizer (#108496 ) Fix for the issue found during COPY introduction during legalization of PHI operands for sgpr to vgpr copy when subreg is involved.	2024-09-18 13:14:49 +05:30
Ahmed S. Taei	22a2d74c0c	[NVPTX] Emit ld.v4.b16 for loading <4 x bfloat> (#109069 ) This PR enables emitting a single load instruction for <4 x bfloat>, otherwise, 2 ld.b32 loads are generated.	2024-09-17 21:06:46 -07:00
Heejin Ahn	08bba6503b	[WebAssembly] Support binary generation for new EH (#109027 ) This adds support for binary generation for the new EH proposal. So far the only case that we emitted variable immediate operands in binary has been `br_table`'s destinations. (Other `variable_ops` uses in TableGen files are register operands, such as the operands of `call`, so they don't get emitted in binary as a part of the same instruction.) With this PR, variable immediate operands can include `try_table`'s operands: - The number of catch clauses - catch clauses sub-opcodes - `catch`: 0x00 - `catch_ref`: 0x01 - `catch_all`: 0x02 - `catch_all_ref`: 0x03 - catch clauses' destinations With `try_table`, we now have variable expr operands for `try_table`'s catch clauses' tags. We treat their fixups in the same way we do for tags in other instructions such as in `throw`. Diff without whitespace will be easier to view.	2024-09-17 14:58:19 -07:00
Michael Maitland	1ebe16bf43	[RISCV] Add VL optimization related tests These tests are good candidates for VL optimization. This is a pre-commit for PR #108640, but they could probably also be improved by the peephole VL optimizations.	2024-09-17 11:14:20 -07:00
Simon Pilgrim	6153582c9c	[X86] combineX86ShuffleChainWithExtract - peek through insert_subvector(undef,vec,0) widening patterns when tracking subvector sources Helps replace a number of X86ISD::VPERMV3 nodes that are shuffling subvectors from the same source with X86ISD::VPERMV equivalents.	2024-09-17 18:43:32 +01:00
Stephen Tozer	51a29b5f16	Revert2 "[DebugInfo][DWARF] Set is_stmt on first non-line-0 instruction in BB (#105524 )" Reverted due to large .debug_line size regressions for some configurations; work currently in place to improve the output of this behaviour in PR #108251. This patch also modifies two tests that were created or modified after the original commit landed and are affected by the revert: llvm/test/CodeGen/X86/pseudo_cmov_lower2.ll llvm/test/DebugInfo/X86/empty-line-info.ll This reverts commit 5fef40c2c477e92187bd4e5c18091eca6b8465cc.	2024-09-17 18:29:20 +01:00
Michael Maitland	64972834c1	[RISCV][GISEL] Introduce the RISCVPostLegalizerLowering pass (#108991 ) This is mostly a copy of the AArch64PostLegalizerLoweringPass, except it removes all of the AArch64 combines. This pass allows us to lower instructions after the generic post-legalization combiner has had a chance to run. We will be adding combines to this pass in future patches.	2024-09-17 13:18:35 -04:00
Craig Topper	da46244e49	Revert "[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific." This reverts commit 884ff9e3f9741ac282b6cf8087b8d3f62b8e138a. Regression was reported in Halide for arm32.	2024-09-17 09:04:43 -07:00
Philip Reames	594579b7af	[RISCV] Autogenerate compress-opt-select.ll I realized after spending way too much time looking at this, that we can avoid objdump entirely here by having the assembly simply not print the aliases. Once we do that, we can simply autogen this test, and updates become trivial and understandable.	2024-09-17 08:47:20 -07:00
Farzon Lotfi	0f97b4824a	[Scalarizer][DirectX] Add support for scalarization of Target intrinsics (#108776 ) Since we are using the Scalarizer pass in the backend we needed a way to allow this pass to operate on Target intrinsics. We achieved this by adding `TargetTransformInfo` to the Scalarizer pass. This allowed us to call a function available to the DirectX backend to know if an intrinsic is a target intrinsic that should be scalarized.	2024-09-17 11:35:42 -04:00
Philip Reames	c532e6db27	[RISCV] Restructure compress-opt-select.ll Two major changes: - Remove use of sed preprocessing - this was being used to create two versions of each test, and the result is much more readable if we just duplicate the tests. - Use a regex for matching the condition. An upcoming change causes us to reverse the branch direction (which doesn't matter to the purpose of these tests at all), so using the regex makes the test more stable.	2024-09-17 08:29:03 -07:00
Philip Reames	b153cc5c2b	[RISCV] Fix boundary error in compress-opt-select.ll Per the comment, this test is intending to test the first constant which can't be encoded via a c.addi. However, -32 can be encoded in a c.addi, and all that's preventing it from doing so is the register allocator's choice to use a different destination register on the add than its source. (Which compressed doesn't support.) The current LLC codegen for this test looks like: addi a1, a0, -32 li a0, -99 bnez a1, .LBB0_2 li a0, 42 .LBB0_2: ret After https://github.com/llvm/llvm-project/pull/108889, we sink the LI, and the register allocator picks the same source and dest register for the addi resulting in the c.addi form being emitted. So, to avoid a confusing diff let's fix the test to check what was originally intended.	2024-09-17 07:43:28 -07:00
David Green	2242cd2b6a	[DAG] Fold vecreduce.or(sext(x)) to sext(vecreduce.or(x)) (#108959 ) The same is true for and / xor reductions, where the sext / zext can be sunk down through the bitwise operation. https://alive2.llvm.org/ce/z/TvzCd5	2024-09-17 15:24:00 +01:00
Mikhail R. Gadelha	d2125e1db6	[RISCV] Support STRICT_UINT_TO_FP and STRICT_SINT_TO_FP (#102503 ) This patch adds support for the missing STRICT_UINT_TO_FP and STRICT_SINT_TO_FP for riscv and adds a test case for rv32 which was previously crashing. The code is in line with how other strict_* nodes are handled (e.g., getting op(1) instead of op(0) when it's a strict node, as op(0) in a strict node is the entry token).	2024-09-17 11:21:52 -03:00
Michael Maitland	ee2add0683	[GISEL] Fix bugs and clarify spec of G_EXTRACT_SUBVECTOR (#108848 ) The implementation was missing the fact that `G_EXTRACT_SUBVECTOR` destination and source vector can be different types. Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate the correct opcode. Clarify the G_EXTRACT_SUBVECTOR specification.	2024-09-17 10:08:39 -04:00
Simon Pilgrim	b222ec1865	[X86] vector-reduce-add-mask.ll - regenerate vpmulhuw asm comments. NFC	2024-09-17 11:40:38 +01:00
Simon Pilgrim	742e04de96	[X86] combineConcatVectorOps - handle *_EXTEND nodes	2024-09-17 11:40:37 +01:00
David Green	8411214c56	[AArch64] Tests for vecreduce.or(sext(x)), with or/and/xor and sext/zext. NFC	2024-09-17 11:27:05 +01:00
Csanád Hajdú	72901fe19e	[AArch64] Fold UBFMXri to UBFMWri when it's an LSR or LSL alias (#106968 ) Using the LSR or LSL aliases of UBFM can be faster on some CPUs, so it is worth changing 64 bit UBFM instructions, that are equivalent to 32 bit LSR/LSL operations, to 32 bit variants. This change folds the following patterns: * If `Imms == 31` and `Immr <= Imms`: `UBFMXri %0, Immr, Imms` -> `UBFMWri %0.sub_32, Immr, Imms` * If `Immr == Imms + 33`: `UBFMXri %0, Immr, Imms` -> `UBFMWri %0.sub_32, Immr - 32, Imms`	2024-09-17 11:21:23 +01:00
Benjamin Kramer	3e32e45591	[NVPTX] Verify ptx in the right version	2024-09-17 10:40:21 +02:00
SpencerAbson	79d380f2ca	[AArch64][SVE2] Add codegen patterns for SVE2 FAMINMAX (#107284 ) Tablegen patterns were previously added to lower the following sequences from generic IR to NEON FAMIN/FAMAX instructions - `fminimum(abs(a), abs(b)) -> famin(a, b)` - `fmaximum(abs(a), abs(b)) -> famax(a, b)` - https://github.com/llvm/llvm-project/pull/103027 - `fminnum[nnan](abs(a), abs(b)) -> famin(a, b)` - `fmaxnum[nnan](abs(a), abs(b)) -> famax(a, b)` - https://github.com/llvm/llvm-project/pull/104766 The same idea has been applied for the scalable vector variants of [FAMIN](https://developer.arm.com/documentation/ddi0602/2024-06/SVE-Instructions/FAMIN--Floating-point-absolute-minimum--predicated--)/[FAMAX](https://developer.arm.com/documentation/ddi0602/2024-06/SVE-Instructions/FAMAX--Floating-point-absolute-maximum--predicated--). ('nnan' documentation: https://llvm.org/docs/LangRef.html#fast-math-flags). - Changes to LLVM - lib/target/AArch64/AArch64SVEInstrInfo.td - Add 'AArch64fminnm_p_nnan' and 'AArch64fmaxnm_p_nnan' patfrags (patterns predicated on the 'nnan' flag). - Add 'AArch64famax_p' and 'AArch64famin_p' - test/CodeGen/AArch64/aarch64-sve2-faminmax.ll - Add tests to verify the new patterns, including both positive and negative tests for 'nnan' predicated behavior.	2024-09-17 09:12:27 +01:00
Simon Pilgrim	c91f2a259f	[X86] Consistently use 'k' for predicate mask registers in instruction names (#108780 ) We use 'k' for move instructions and to indicate masked variants of evex instructions, but otherwise we're very inconsistent when we use 'k' vs 'r'.	2024-09-17 08:57:57 +01:00
Thorsten Schütt	acfa294b5e	[GlobalIsel] Canonicalize G_FCMP (#108891 ) As a side-effect, we start constant folding fcmps.	2024-09-17 09:42:04 +02:00
Luke Lau	6af2f225a0	[RISCV] Restrict combineOp_VLToVWOp_VL w/ bf16 to vfwmadd_vl with zvfbfwma (#108798 ) We currently make sure to check that if folding an op to an f16 widening op that we have zvfh. We need to do the same for bf16 vectors, but with the further restriction that we can only combine vfmadd_vl to vfwmadd_vl (to get vfwmaccbf16.v{v,f}). The added test case currently crashes because we try to fold an add to a bf16 widening add, which doesn't exist in zvfbfmin or zvfbfwma. This moves the checks into the extension support checks to keep it in one place.	2024-09-17 13:35:25 +08:00
Craig Topper	884ff9e3f9	[LegalizeVectorOps] Make the AArch64 hack in ExpandFNEG more specific. Only scalarize single element vectors when vector FSUB is not supported and scalar FNEG is supported.	2024-09-16 21:48:42 -07:00
bwlodarcz	f99bb02d7d	[SPIR-V] Emit DebugTypeBasic for NonSemantic DI (#106980 ) The commit introduces support for a fundamental DI instruction. Metadata handlers required for this instruction are stored inside debug records (https://llvm.org/docs/SourceLevelDebugging.html) parts of the module, which raises the necessity of its traversal.	2024-09-16 18:26:22 -07:00
Craig Topper	f022111e64	[RISCV][GISel] Restore s32 support on RV64 for DIV, and REM. This reverts commit 2599d695128381e6932b43f0e95649c533308d6d. I was planning to remove s32 as a legal type on RV64, but I'm rethinking that.	2024-09-16 16:29:20 -07:00
Craig Topper	a366323c8d	[RISCV][GISel] Restore s32 support for G_ABS on RV64. This reverts commit 5e6a1987a5d4574d3c3811f878ddbbbf7c35fa01. I was planning to remove s32 as a legal type on RV64, but I'm rethinking that.	2024-09-16 16:29:20 -07:00
Philip Reames	7e56a09278	[RISCV] Add a testcase for an unprofitable machine-sink issue This corresponds to an upcoming change which will fully explain why this is a machine-sink issue.	2024-09-16 14:17:42 -07:00
David Majnemer	49c5cebb29	[X86] Improve support for vXi8 arithmetic shifts, logical left shifts Use SWAR techniques for arithmetic shifts: we use the same technique as logical right shift but with an additional step of sign extending the result. Also, use the logical shift left technique even on AVX512 as vpmovzxbw and vpmovwb are actually quite expensive.	2024-09-16 20:33:12 +00:00
Kevin McAfee	62cdc2a347	[NVPTX] Convert calls to indirect when call signature mismatches function signature (#107644 ) When there is a function signature mismatch between a call instruction and the callee, lower the call to an indirect call. The current behavior is to produce direct calls that may or may not be valid PTX. Consider the following example with mismatching return types: ``` %struct.1 = type <{i64}> %struct.2 = type <{i64}> declare %struct.1 @callee() ... %call1 = call %struct.2 @callee() %call2 = call i64 @callee() ``` The return type of `callee` in PTX is `.b8 _[8]`. The return type of `%call1` will be the same and so the PTX has no problems. The return type of `%call2` will be `.b64`, so the types will not match and PTX will be unacceptable to ptxas. This despite all the types having the same size. The same is true for mismatching parameter types. If we instead convert these calls to indirect calls, we will generate functional PTX when the types have the same size. If they do not have the same size then the PTX will likely be incorrect, though this will not necessarily be caught by ptxas. Also, even if the sizes are the same, if the types differ then it is technically undefined behavior. This change allows for more flexibility in the bitcode that can be lowered to functioning PTX, at the cost of sometimes producing PTX that is less clearly wrong than it would have been previously (i.e. incorrect indirect calls are not as obviously wrong as incorrect direct calls). We consider it okay to generate PTX with undefined behavior as the behavior of calls with mismatching types is not explicitly defined.	2024-09-16 13:08:18 -07:00
Craig Topper	4eb9780261	[RISCV] Fix IR for store_large_offset_no_opt_i16 in make-compressible-zbc.mir. NFC The IR used loads instead of stores.	2024-09-16 12:51:07 -07:00
Philip Reames	8f023ec81d	[RISCV] Add coverage for select C, C1, C2 where (C1-C2)*[0,1] is cheap	2024-09-16 12:46:43 -07:00
Farzon Lotfi	8ee685e601	[NFC][DirectX] fix intrinsics that need IntrNoMem and test typo (#108852 ) In the process of adding scalarization support for DirectX target intrinsics I found that intrinsics that weren't marked with `IntrNoMem` did not get removed by `RecursivelyDeleteTriviallyDeadInstructionsPermissive`. So this change is to make it more clear that our intrinsics don't have side effects. I only added `IntrNoMem` to the intrinsics in `IntrinsicsDirectX.td` I was involved with. There are potentially a few other cases that might warrant this attribute, but will need input on the others.	2024-09-16 14:19:29 -04:00
David Green	960c975acd	[AArch64] Expand scmp/ucmp vector operations with sub (#108830 ) Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select, under Neon we can use the arithmetic expansion to generate fewer instructions. Notably it also prevents the scalarization of vselect during vector-legalization.	2024-09-16 18:44:52 +01:00
Thorsten Schütt	5c348f692a	[GlobalIsel] Canonicalize G_ICMP (#108755 ) As a side-effect, we start constant folding icmps. Split out from https://github.com/llvm/llvm-project/pull/105991.	2024-09-16 19:25:34 +02:00
David Green	823eab2bd5	[AArch64] Add a selection for vector scmp/ucmp tests. NFC	2024-09-16 14:25:15 +01:00
David Green	feac761f37	[GlobalISel][AArch64] Add G_FPTOSI_SAT/G_FPTOUI_SAT (#96297 ) This is an implementation of the saturating fp to int conversions for GlobalISel. On AArch64 the conversion instructions work this way, producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is ported from SDAG. AArch64 has a lot of existing tests for fptosi_sat, covering a wide range of types. I have tried to make most of them work all at once, but a few fall back due to other missing features such as f128 handling for min/max.	2024-09-16 10:33:59 +01:00
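
The two fold conditions listed in commit 72901fe19e (UBFMXri to UBFMWri) can be checked with a small model. The sketch below is only an illustration, not the LLVM implementation: `ubfm`, `fold_ubfmx` and `check_fold` are hypothetical helper names, and `ubfm` is a simplified model of the instruction (the UBFX form when `imms >= immr`, the UBFIZ form otherwise).

```python
def ubfm(x, immr, imms, width):
    """Simplified model of AArch64 UBFM on a `width`-bit register (32 or 64)."""
    reg_mask = (1 << width) - 1
    x &= reg_mask
    if imms >= immr:
        # UBFX form: extract bits [imms:immr] into the low bits of the result.
        return (x >> immr) & ((1 << (imms - immr + 1)) - 1)
    # UBFIZ form: move the low imms+1 bits up to bit position width-immr.
    return ((x & ((1 << (imms + 1)) - 1)) << (width - immr)) & reg_mask


def fold_ubfmx(immr, imms):
    """Return the 32-bit (immr, imms) pair if the commit's conditions hold, else None."""
    if imms == 31 and immr <= imms:
        return immr, imms          # LSR alias
    if immr == imms + 33:
        return immr - 32, imms     # LSL alias
    return None


def check_fold(samples):
    """Compare the 64-bit UBFM with the narrowed 32-bit UBFM on the low half."""
    for immr in range(64):
        for imms in range(64):
            folded = fold_ubfmx(immr, imms)
            if folded is None:
                continue
            for x in samples:
                wide = ubfm(x, immr, imms, 64)
                narrow = ubfm(x & 0xFFFFFFFF, folded[0], folded[1], 32)
                if wide != narrow:
                    return False
    return True
```

Under this model, every immediate pair accepted by `fold_ubfmx` produces the same value from the 64-bit operation and from the 32-bit operation applied to the low half (`%0.sub_32` in the commit message).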

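The identity behind commit 2242cd2b6a, that a sign or zero extension can be moved across an or / and / xor reduction, can be brute-forced on small bit widths. This is a minimal sketch under assumed widths (4-bit lanes extended to 8 bits); `sext`, `zext` and `fold_holds` are hypothetical helpers, not LLVM APIs.

```python
from functools import reduce
from itertools import product
import operator


def sext(v, from_bits, to_bits):
    """Sign-extend a from_bits-wide value to to_bits (result kept unsigned)."""
    v &= (1 << from_bits) - 1
    if v >> (from_bits - 1):
        v -= 1 << from_bits
    return v & ((1 << to_bits) - 1)


def zext(v, from_bits, to_bits):
    """Zero-extend: the value is unchanged once masked to from_bits."""
    return v & ((1 << from_bits) - 1)


def fold_holds(op, ext, lanes, from_bits=4, to_bits=8):
    """Check reduce(op, ext(x)) == ext(reduce(op, x)) for every input vector."""
    for vec in product(range(1 << from_bits), repeat=lanes):
        before = reduce(op, (ext(x, from_bits, to_bits) for x in vec))
        after = ext(reduce(op, vec), from_bits, to_bits)
        if before != after:
            return False
    return True
```

For example, `fold_holds(operator.or_, sext, 3)` enumerates all 3-lane vectors of 4-bit values and compares both sides of the fold.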
55141 Commits