For multipass instructions, an overlap between VDST and the SRC operands
would result in a HW race and undefined results.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
These instructions make non-standard use of the OPSEL bits to select
the destination byte to write. The src2_modifiers operand is used without
a corresponding src2 operand, by introducing a dummy src2.
OPSEL ASM Syntax: opsel:[a,b,c,d]
a & b are meaningless; c & d together decide which byte of the dst register to write.
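A minimal C sketch of the write-side byte selection described above (purely illustrative, not the compiler's implementation, and assuming the other dst bytes are preserved):

```c
#include <stdint.h>

/* Illustrative only: OPSEL[3:2] (c & d above) select which byte of the
   32-bit dst register receives the packed conversion result. */
static uint32_t write_dst_byte(uint32_t dst, uint8_t result, unsigned sel /* 0..3 */) {
    uint32_t mask = 0xFFu << (sel * 8);
    return (dst & ~mask) | ((uint32_t)result << (sel * 8));
}
```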
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
OPSEL ASM Syntax for v_cvt_scalef32_pk_{f|bf}16_fp4: opsel:[x,y,z]
where x & y, i.e. OPSEL[1:0], select which src byte to read.
Note: the conventional Inst{13}, i.e. OPSEL[2], is ignored in the asm syntax.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
OPSEL ASM Syntax for v_cvt_scalef32_pk_f32_fp4: opsel:[x,y,z]
where x & y, i.e. OPSEL[1:0], select which src byte to read.
OPSEL ASM Syntax for v_cvt_scalef32_pk_fp4_f32: opsel:[a,b,c,d]
where c & d, i.e. OPSEL[3:2], select which dst byte to write.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
OPSEL[1:0] collectively decide which byte to read
from the src input.
The builtin takes an additional imm argument which
represents the index (valid values: [0:3]) of the src
byte to read. Out-of-bounds checks will be added in the next
patch.
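A minimal C sketch of the read-side selection (purely illustrative; it only shows how the imm/OPSEL[1:0] index maps to a byte of the 32-bit source):

```c
#include <stdint.h>

/* Illustrative only: the builtin's extra imm argument (0..3), i.e. OPSEL[1:0],
   selects which byte of the 32-bit source is converted. */
static uint8_t read_src_byte(uint32_t src, unsigned sel /* 0..3 */) {
    return (uint8_t)(src >> (sel * 8));
}
```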
OPSEL ASM Syntax: opsel:[x,y,z]
where,
opsel[x] = Inst{11} = src0_modifier{2}
opsel[y] = Inst{12} = src1_modifier{2}
opsel[z] = Inst{14} = src0_modifier{3}
Note: Inst{13}, i.e. OPSEL[2], is ignored in the
asm syntax, and opsel[z] is meaningless
for v_cvt_scalef32_f32_{fp|bf}8.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
In order to align with `svext` and NEON `vext`/`vextq`, this patch
changes the immediate argument of `svextq` so that it refers to elements
of the same size as those of the source vectors, rather than to bytes. The
[spec for this
intrinsic](https://github.com/ARM-software/acle/blob/main/main/acle.md#extq)
is ambiguous about the meaning of this argument; the issue was raised
after the GCC implementers of the ACLE arrived at a differing
interpretation.
For example (with the current implementation):
`svextq_f64(zn_f64, zm_f64, 1)` would, for each 128-bit segment of
`zn_f64`, concatenate the highest 15 bytes of that segment with the
first byte of the corresponding segment of `zm_f64`.
After this patch, the behavior of `svextq_f64(zn_f64, zm_f64, 1)` would
be, for each 128-bit vector segment of `zn_f64`, to concatenate the
higher doubleword of this segment with the lower doubleword of the
corresponding segment of `zm_f64`.
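A minimal C sketch of the new element-based interpretation, assuming a toolchain and target where the SVE2p1 `svextq` intrinsics are available (the function name is illustrative):

```c
#include <arm_sve.h>

/* After this patch: for each 128-bit segment, the result is the high
   doubleword of zn_f64's segment followed by the low doubleword of the
   corresponding segment of zm_f64. */
svfloat64_t extq_by_one_element(svfloat64_t zn_f64, svfloat64_t zm_f64) {
    return svextq_f64(zn_f64, zm_f64, 1); /* imm now counts f64 elements, not bytes */
}
```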
The range of the immediate argument in `svextq` would be modified such
that it is:
- [0,15] for `svextq_{s8,u8}`
- [0,7] for `svextq_{s16,u16,f16,bf16}`
- [0,3] for `svextq_{s32,u32,f32}`
- [0,1] for `svextq_{s64,u64,f64}`
This PR fixes the failing test `CodeGen/RISCV/compress-opt-select.ll`.
It started failing after the previously merged commit `[TTI][RISCV]
Unconditionally break critical edges to sink ADDI (PR #108889)`,
so the `compress-opt-select` test has been regenerated.
Two new options for clang:
-mdiv32: Use div.w[u] and mod.w[u] instructions with inputs not
sign-extended.
-mno-div32: Do not use div.w[u] and mod.w[u] instructions with inputs not
sign-extended.
The default is -mno-div32.
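A minimal C sketch of the kind of code affected (illustrative; the flags change code generation only, not the source):

```c
/* Under -mdiv32 the backend may assume div.w/mod.w give correct results even
   when the 32-bit operands are not sign-extended in their 64-bit registers,
   so the sign extension normally emitted before the division can be dropped. */
int quotient32(int a, int b) {
    return a / b; /* candidate for a direct div.w under -mdiv32 */
}
```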
All of these immediates are signed, as the surrounding comments
indicate. This fixes an assertion failure in
CodeGen/Generic/dag-combine-ossfuzz-crash.ll when run with a
powerpc-aix triple.
znver4 512-bit instructions run at half the rate of the 128/256-bit variants (still 1 uop, though).
Confirmed with Agner/uops.info
Noticed while triaging #110308 and #117579
Split extension/truncation patterns to simplify matching.
Fix patterns to consistently match SSE/AVX1/AVX2 variants as well.
Add some missing src/dst type variants - there should be no difference in scheduling; it's purely based on dst reg width.
Confirmed with Agner/uops.info
Noticed while triaging #110308
A static analysis tool found that ModuleCost could be zero, which would
cause a divide-by-zero when it is printed. This may be unreachable
in practice, but the fix is straightforward and unlikely to be a
performance concern.
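A minimal C sketch of the kind of guard involved (names are illustrative, not the actual LLVM code):

```c
#include <stdio.h>

/* Illustrative only: compute the percentage share only when ModuleCost is
   non-zero, so printing can never divide by zero. */
static void printCostShare(const char *Name, double FnCost, double ModuleCost) {
    if (ModuleCost != 0.0)
        printf("%s: %.2f%% of module cost\n", Name, 100.0 * FnCost / ModuleCost);
    else
        printf("%s: module cost is zero\n", Name);
}
```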
Following up on issue #89822, this patch adds the opportunity to use tail
calls in the machine outliner pass.
It also enables outlining patterns that use the X5 (T0) register.
Extend the optimization that converts s_barrier to wave_barrier (a nop)
when the number of work items is not larger than the wave size.
This handles the "split barrier" form of s_barrier where the barrier
is represented by separate intrinsics (s_barrier_signal/s_barrier_wait).
Note: the version where s_barrier is used on gfx12 (and split later) already
has this optimization, but some front-ends may prefer to emit the split
intrinsics directly, which is what this patch addresses.
Fix all the places I could find that didn't do this. We were already
mostly correct for FP_ROUND after
9a976f36615dbe15e76c12b22f711b2e597a8e51, but not STRICT_FP_ROUND.
The encoding of the v_dot2c_f32_bf16 opcode is the same as that of v_mac_f32 on gfx90a,
both from the gfx9 series. This required a new DecoderNamespace, GFX950_DOT.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16.
All three instructions were part of the Dot9 instruction group in the compiler.
This patch splits the existing dot9 group (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16)
into a new dot9 group (v_dot2_f16_f16 and v_dot2_bf16_bf16) and a dot12 group (v_dot2_f32_bf16).
All necessary changes for gfx11 and gfx12 are updated to reflect this.
Co-authored-by: Sirish Pande <Sirish.Pande@amd.com>
Scale packed 16-component single-precision float vectors from
two source inputs using the exponent provided by the third
single-precision float input, then convert the values to a packed
32-component FP6 float value.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>
These instructions make non-standard use of the OPSEL bits to select
the destination byte to write. The src2_modifiers operand is used without
a corresponding src2 operand, by introducing a dummy src2.
Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>