llvm-project

Author	SHA1	Message	Date
Craig Topper	1bc9de2474	[RISCV] Add test cases for llvm.tan/asin/acos/atan/atan2/sinh/cosh/tanh. NFC	2024-11-27 10:24:34 -08:00
Craig Topper	d7643e8610	[RISCV][GISel] Support f32/f64 llvm.exp10 intrinsics.	2024-11-27 10:24:33 -08:00
Craig Topper	dae9cf3816	[RISCV] Move scalar llvm.exp10 tests into half/float/double-intrinsics.ll. NFC Improves coverage for more configurations.	2024-11-27 10:24:33 -08:00
Petar Avramovic	87503fa51c	Revert "AMDGPU/GlobalISel: Add stub custom regbankselect pass" (#113913) This reverts commit e9c49901a43f5b16c3df416460b7e4dbdd24ce03. The current AMDGPURegBankSelect does nothing different from RegBankSelect. Revert to using the generic RegBankSelect in preparation for adding new regbankselect passes. The new AMDGPURegBankSelect, which will use uniformity analysis for regbank select decisions, will not subclass RegBankSelect. Revert regression tests to use regbankselect, since amdgpu-regbankselect will be used by the new pass and its behavior will be different.	2024-11-27 13:16:22 -05:00
RolandF77	a475180498	[PowerPC] Use setbc for values from vector compare conditions (#114858) For P10, use the setbc instruction to get integer values from vector compare summary condition results.	2024-11-27 12:47:10 -05:00
knickish	c29e895ad2	[M68k] Handle 16 bit MOVs to and from CCR (#114714) Builds on @TechnoElf's CCR MOV PR https://github.com/llvm/llvm-project/pull/107591 and adds some tests. Fixes https://github.com/llvm/llvm-project/issues/106210. --------- Co-authored-by: TechnoElf <technoelf@undertheprinter.com>	2024-11-27 09:37:57 -08:00
Sander de Smalen	318c69de52	Reland "[AArch64] Define high bits of FPR and GPR registers (take 2) (#114827)" The issue with slow compile-time was caused by an assert in AArch64RegisterInfo.cpp. The assert invokes 'checkAllSuperRegsMarked' after adding all the reserved registers. This call gets very expensive after adding the _HI registers due to the way the function searches in the 'Exception' list, which is expected to be a small list but isn't (the patch added 190 _HI regs). It was possible to rewrite the code in such a way that the _HI registers are marked as reserved after the check. This makes the problem go away entirely and restores compile-time to what it was before (tested for `check-runtimes`, which previously showed a ~5x slowdown). This reverts commits 1434d2ab215e3ea9c5f34689d056edd3d4423a78 and 2704647fb7986673b89cef1def729e3b022e2607.	2024-11-27 13:31:59 +00:00
Simon Pilgrim	f30f7a084c	[X86] canonicalizeShuffleWithOp - initial support for shuffle(cvt(x),cvt(y)) -> cvt(shuffle(x,y)). Initial support is just for UNPCKL(CVTPH2PS(X),CVTPH2PS(Y)) -> CVTPH2PS(UNPCKL(X,Y)). Making this more general for other shuffles/conversions will have to be done carefully, as we have to handle changes in src/dst element width, so I just handled the CVTPH2PS regression case. Fixes #83414	2024-11-27 12:38:52 +00:00
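A minimal Python model of why the UNPCKL/CVTPH2PS fold is sound, assuming 128-bit operands: CVTPH2PS converts only the low four f16 lanes of its source, and UNPCKL interleaves the low halves of its two inputs. A 16-bit unpack of two 8-lane vectors puts x0, y0, x1, y1 in its low four lanes, so converting afterwards yields the same f32 lanes. This also illustrates the element-width caveat: after the fold the shuffle runs at f16 width instead of f32 width. The helper names are hypothetical and this is not the X86 backend's code.

```python
import struct


def f16_bits_to_float(bits):
    """Decode one IEEE-754 binary16 bit pattern to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def cvtph2ps(src):
    """Model of 128-bit CVTPH2PS: convert the low 4 of 8 f16 lanes to f32."""
    return [f16_bits_to_float(b) for b in src[:4]]


def unpckl(a, b):
    """Interleave the low halves of two equal-length vectors: a0, b0, a1, b1, ..."""
    out = []
    for i in range(len(a) // 2):
        out += [a[i], b[i]]
    return out


# Two 8 x f16 sources as raw bit patterns (1.0, 2.0, 3.0, 4.0 and -1.0, 0.5).
X = [0x3C00, 0x4000, 0x4200, 0x4400, 0, 0, 0, 0]
Y = [0xBC00, 0x3800, 0, 0, 0, 0, 0, 0]

before = unpckl(cvtph2ps(X), cvtph2ps(Y))  # shuffle of two conversions (4 x f32 lanes)
after = cvtph2ps(unpckl(X, Y))             # one conversion of a 16-bit shuffle
print(before, after)
```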
Igor Kirillov	e874c8fc27	[SelectOpt] Refactor to prepare for supporting more select-like operations (#117582) * Enables conversion of several select-like instructions within one group * Any number of auxiliary instructions depending on the same condition can be in between select-like instructions * After splitting the basic block, move select-like instructions into the relevant basic blocks and optimise them * Make it easier to add support for shift-based select-like instructions and also any mixture of zext/sext/not instructions	2024-11-27 11:35:59 +00:00
Anatoly Trosinenko	1fccba5ca1	[AArch64][PAC] Eliminate excessive MOVs when computing blend (#115185) As function calls do not generally preserve X16 and X17, it is beneficial to allow the AddrDisc operand of the BLRA instruction to reside in these registers and make use of this condition when computing the discriminator. This can save up to two MOVs in cases such as loading a (signed) virtual function pointer via a (signed) pointer to vtable. For example, `ldr x9, [x16]; mov x8, x16; mov x17, x8; movk x17, #34646, lsl #48; blraa x9, x17` can be simplified to `ldr x8, [x16]; movk x16, #34646, lsl #48; blraa x8, x16`.	2024-11-27 13:24:32 +03:00
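The `movk ..., #34646, lsl #48` in the sequences above is what forms the blended discriminator: it overwrites bits [63:48] of the address discriminator with the 16-bit integer discriminator and keeps bits [47:0]. A minimal Python model of that bit manipulation follows; the function name is hypothetical. Because `movk` modifies its destination in place, applying it directly to x16, which a call does not generally preserve anyway, is what makes the copies into x8 and x17 unnecessary.

```python
LOW48 = (1 << 48) - 1


def blend(addr_disc, int_disc):
    """Model 'movk xD, #int_disc, lsl #48' applied to a 64-bit address discriminator.

    Bits [47:0] of addr_disc are kept; bits [63:48] are replaced by int_disc.
    """
    if not 0 <= int_disc <= 0xFFFF:
        raise ValueError("integer discriminator must fit in 16 bits")
    return (addr_disc & LOW48) | (int_disc << 48)


print(hex(blend(0x0000123456789ABC, 34646)))  # 34646 == 0x8756
```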
David Green	712ef7d0ba	[AArch64][GlobalISel] Fix smull and umull intrinsics. These were the wrong way around somehow, with aarch64_neon_umull being converted to G_SMULL.	2024-11-27 10:11:06 +00:00
tangaac	427be07675	[LoongArch] Support amcas[_db].{b/h/w/d} instructions. (#114189) Two options for clang: -mlamcas & -mno-lamcas. Enable or disable amcas[_db].{b/h} instructions. The default is -mno-lamcas. Only works on LoongArch64.	2024-11-27 17:36:13 +08:00
Craig Topper	50dfb0772b	[RISCV] Support f32/f64 libcalls for sin/cos/pow/log/log2/log10/exp/exp2. Test cases copied from SelectionDAG.	2024-11-26 23:35:52 -08:00
tangaac	53c0a25db7	[LoongArch] Use div.w/mod.w to eliminate unnecessary sign-extend for sdiv/srem i32. (#117298)	2024-11-27 14:35:53 +08:00
Matt Arsenault	b4a16a78c2	AMDGPU: Match and Select BITOP3 on gfx950 (#117843) Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>	2024-11-27 01:31:19 -05:00
Matt Arsenault	6934870a13	AMDGPU: Remove FeatureCvtFP8VOP1Bug from gfx950 (#117827)	2024-11-27 01:28:09 -05:00
Durgadoss R	40d0058e6a	[NVPTX] Add TMA bulk tensor reduction intrinsics (#116854) This patch adds NVVM intrinsics and NVPTX codegen for: * cp.async.bulk.tensor.reduce.1D -> 5D variants, supporting both Tile and Im2Col modes. * These intrinsics optionally support cache_hints as indicated by the boolean flag argument. * Lit tests are added for all combinations of these intrinsics in cp-async-bulk-tensor-reduce.ll. * The generated PTX is verified with a 12.3 ptxas executable. * Added docs for these intrinsics in NVPTXUsage.rst file. PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-reduce-async-bulk-tensor Signed-off-by: Durgadoss R <durgadossr@nvidia.com>	2024-11-27 10:57:51 +05:30
Thurston Dang	0d15d46362	[ubsan] Change ubsan-unique-traps to use nomerge instead of counter (#117651) https://github.com/llvm/llvm-project/pull/65972 (continuation of https://reviews.llvm.org/D148654) had considered adding nomerge to ubsantrap, but did not proceed with that because of https://github.com/llvm/llvm-project/issues/53011. Instead, it added a counter (based on TrapBB->getParent()->size()) to each ubsantrap call. However, this counter is not guaranteed to be unique after inlining, as shown by https://github.com/llvm/llvm-project/pull/83470, which can result in ubsantraps being merged by the backend. https://github.com/llvm/llvm-project/pull/101549 has since fixed the nomerge limitation ("It sets nomerge flag for the node if the instruction has nomerge arrtibute."). This patch therefore takes advantage of nomerge instead of using the counter, guaranteeing that the ubsantraps are not merged. This patch is equivalent to https://github.com/llvm/llvm-project/pull/83470 but also adds nomerge and updates tests (https://github.com/llvm/llvm-project/pull/117649: ubsan-trap-merge.c; https://github.com/llvm/llvm-project/pull/117657: ubsan-trap-merge.ll, ubsan-trap-nomerge.ll; catch-undef-behavior.c).	2024-11-26 21:13:00 -08:00
Sergei Barannikov	61a23646c9	[SjLjEHPrepare] Configure call sites correctly (#117656) After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site` before all invoke instructions except the one in the entry block, which has the effect of bypassing landing pads on exceptions. When configuring the call site for a potentially throwing instruction, check that it is not an `InvokeInst`; invokes are handled by earlier code.	2024-11-27 08:03:47 +03:00
Matt Arsenault	5615657209	AMDGPU: Builtin & CodeGen support for v_cvt_sr_{bf16|f16}_f32 instructions (#117824) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:37:05 -05:00
Matt Arsenault	62dc8f3069	AMDGPU: Add builtins & codegen support for bitop3_b{16|32} of gfx950. (#117823) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:33:07 -05:00
Matt Arsenault	142b33c58b	AMDGPU: Allocate different registers for vdst & src in v_cvt_scalef32* (#117822) For multipass instructions, overlap between the VDST and SRC registers would result in a hardware race and undefined results. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 23:29:11 -05:00
Matt Arsenault	265e209ceb	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_{bf8|fp8}_{f16|bf16|f32} (#117821) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:24:01 -05:00
Matt Arsenault	301c8e6047	AMDGPU: Add support for v_cvt_scalef32_sr instructions (#117820) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 23:20:16 -05:00
antangelo	512defe603	[NFC][GISel][AArch64] Pre-commit baseline tests for translation of @llvm.expect.with.probability (#117842) Pre-commit of tests for generic GlobalISel translation of `@llvm.expect.with.probability` for when optimizations are not enabled	2024-11-26 22:58:56 -05:00
Brandon Wu	4a7dbede6b	[RISCV] Support `svukte` extension (#115657) This is the extension for "Address-Independent Latency of User-Mode Faults to Supervisor Addresses". Spec: https://github.com/riscv/riscv-isa-manual/pull/1564, https://lf-riscv.atlassian.net/browse/RVS-2977 The spec states that `svukte` depends on `sv39`, but we don't have `sv39` yet, so I didn't add it to the implied list.	2024-11-27 10:54:57 +08:00
Craig Topper	38a3cce90a	[RISCV][GISel] Copy fneg test cases from SelectionDAG into float/double-arith.ll. NFC. The test cases use fcmp, which was not fully supported before 43b6b78771e9ab4da912b574664e713758c43110.	2024-11-26 18:20:56 -08:00
antangelo	dd4844722d	[SelectionDAG] Add generic implementation for @llvm.expect.with.probability when optimizations are disabled (#117459) Handle @llvm.expect.with.probability in SelectionDAGBuilder, FastISel, and IntrinsicLowering in the same way @llvm.expect is handled, where the value is passed through as-is. This can be reached if the intrinsic is used without optimizations, where it would otherwise be properly transformed out. Fixes #115411 for SelectionDAG. A similar patch is likely needed for GlobalISel.	2024-11-26 20:22:25 -05:00
Sam Clegg	ea58410d0f	[WebAssembly] Implement %llvm.thread.pointer intrinsic (#117817) We can simply use the `__tls_base` global for this, which is guaranteed to be non-zero and unique per thread. Fixes: #117433	2024-11-26 17:19:14 -08:00
Matt Arsenault	76715787f4	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_sr_pk_fp4 instructions (#117798) Co-authored-by: Shilei Tian <shilei.tian@amd.com>	2024-11-26 19:59:14 -05:00
Matt Arsenault	c8ee1ee057	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk_fp4_{f|bf}16 for gfx950 (#117794) These instructions use the OPSEL bits in a non-standard way, to select which destination byte to write. The src2_modifiers operand is used without its corresponding src2 operand by introducing a dummy src2. OPSEL ASM syntax: opsel:[a,b,c,d], where a & b are meaningless and c & d together decide which byte of the dst register is written. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:38:23 -05:00
Matt Arsenault	065dc93d96	AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 for gfx950 (#117793) OPSEL[0] selects src_word to read. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:35:18 -05:00
Matt Arsenault	991dcbc468	AMDGPU: Builtin & codegen support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 for gfx950 (#117747) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:30:04 -05:00
Matt Arsenault	0f4fcca546	AMDGPU: Builtin & CodeGen support for v_cvt_scalef32_pk32_f32_[fp|bf]6 for gfx950 (#117745) Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:26:07 -05:00
Matt Arsenault	eeb76880f3	AMDGPU: Builtins & CodeGen support for v_cvt_scalef32_pk_{f|bf}16_fp4 for gfx950 (#117744) OPSEL ASM syntax for v_cvt_scalef32_pk_{f|bf}16_fp4: opsel:[x,y,z], where x & y (i.e. OPSEL[1:0]) select which src_byte to read. Note: the conventional Inst{13} (i.e. OPSEL[2]) is ignored in asm syntax. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:23:15 -05:00
Matt Arsenault	2b9e947d43	AMDGPU: Builtins & Codegen support for v_cvt_scale_fp4<->f32 for gfx950 (#117743) OPSEL ASM syntax for v_cvt_scalef32_pk_f32_fp4: opsel:[x,y,z], where x & y (i.e. OPSEL[1:0]) select which src_byte to read. OPSEL ASM syntax for v_cvt_scalef32_pk_fp4_f32: opsel:[a,b,c,d], where c & d (i.e. OPSEL[3:2]) select which dst_byte to write. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:20:09 -05:00
Matt Arsenault	4527894143	Builtins & Codegen support for v_cvt_scalef32_pk_{fp|bf}8_{f|bf}16 for gfx950 (#117742) OPSEL[3] determines whether the low or high 16 bits of the word are written. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:16:08 -05:00
Matt Arsenault	62584f32eb	AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_f32_{fp8|bf8} for gfx950 (#117741) OPSEL[0] determines whether the low or high 16 bits of src0 are read. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 19:12:18 -05:00
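Several of the gfx950 conversion rows above use OPSEL bits as lane indices into a 32-bit register rather than as ordinary source modifiers: OPSEL[1:0] picks a source byte, OPSEL[3:2] a destination byte, OPSEL[0] a source 16-bit half, and OPSEL[3] a destination 16-bit half. The Python sketch below models only that selection arithmetic. The helper names are hypothetical, and the write helper assumes the unselected bits of the destination are preserved, which these commit messages do not state explicitly.

```python
MASK32 = 0xFFFFFFFF


def read_field(src, index, width):
    """Read the index-th width-bit field (index 0 = least significant) of a 32-bit value."""
    if not 0 <= index < 32 // width:
        raise ValueError("field index out of range")
    return (src >> (index * width)) & ((1 << width) - 1)


def write_field(dst, index, width, value):
    """Write value into the index-th width-bit field of a 32-bit dst.

    Assumption: all other bits of dst are preserved.
    """
    if not 0 <= index < 32 // width:
        raise ValueError("field index out of range")
    mask = (1 << width) - 1
    shift = index * width
    return (dst & ~(mask << shift) & MASK32) | ((value & mask) << shift)


# Byte selection, as with OPSEL[1:0] on the source or OPSEL[3:2] on the destination.
src_byte = read_field(0x11223344, 3, 8)             # most significant byte: 0x11
dst_word = write_field(0x11223344, 1, 8, 0xAB)      # 0x1122AB44
# 16-bit selection, as with OPSEL[0] on the source or OPSEL[3] on the destination.
src_half = read_field(0x11223344, 1, 16)            # high half: 0x1122
dst_half = write_field(0x11223344, 0, 16, 0xBEEF)   # 0x1122BEEF
```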
Craig Topper	43b6b78771	[RISCV][GISel] Use libcalls for f32/f64 G_FCMP without F/D extensions. (#117660) LegalizerHelper only supported f128 libcalls and incorrectly assumed that the destination register for the G_FCMP was s32.	2024-11-26 15:48:49 -08:00
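Without the F/D extensions, each f32/f64 comparison becomes a call into the soft-float runtime. The Python sketch below models the usual libgcc/compiler-rt convention for the f32 comparison routines (`__eqsf2`, `__ltsf2`, `__gtsf2`, `__unordsf2`) and shows how a few fcmp predicates can be derived from their integer results; unordered-or-equal (`ueq`) and ordered-not-equal (`one`) need two calls. The Python function names are hypothetical stand-ins, and this is an illustration of the runtime convention, not the RISC-V legalizer's code.

```python
import math


def unordsf2(a, b):
    """Nonzero iff either operand is NaN."""
    return 1 if math.isnan(a) or math.isnan(b) else 0


def _cmp(a, b):
    return -1 if a < b else (1 if a > b else 0)


def eqsf2(a, b):
    """Zero iff neither operand is NaN and a == b."""
    return 1 if unordsf2(a, b) else _cmp(a, b)


def ltsf2(a, b):
    """Negative iff neither operand is NaN and a < b (returns 1 when unordered)."""
    return 1 if unordsf2(a, b) else _cmp(a, b)


def gtsf2(a, b):
    """Positive iff neither operand is NaN and a > b (returns -1 when unordered)."""
    return -1 if unordsf2(a, b) else _cmp(a, b)


def fcmp(pred, a, b):
    """Evaluate an fcmp predicate using only the runtime routines' results."""
    if pred == "oeq":
        return eqsf2(a, b) == 0
    if pred == "olt":
        return ltsf2(a, b) < 0
    if pred == "ogt":
        return gtsf2(a, b) > 0
    if pred == "uno":
        return unordsf2(a, b) != 0
    if pred == "ueq":  # unordered OR equal: two calls
        return unordsf2(a, b) != 0 or eqsf2(a, b) == 0
    if pred == "one":  # ordered AND not equal: two calls
        return unordsf2(a, b) == 0 and eqsf2(a, b) != 0
    raise ValueError(f"unsupported predicate {pred}")
```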
Pradeep Kumar	e84614833e	[LLVM][NVPTX] Add support for div.full instruction (#116482) This commit adds NVPTX support for the div.full PTX instruction, with tests in div.ll. [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-div)	2024-11-27 04:57:42 +05:30
Thurston Dang	dde7f4d024	[NFC][clang] Add ubsan-trap-merge.ll test to show absence of nomerge considered harmful (#117657) These testcases demonstrate that ubsan intrinsics are merged in the backend iff nomerge is missing from ubsantrap intrinsics. This is based on the observation and testcase by Vitaly Buka in https://github.com/llvm/llvm-project/pull/83470.	2024-11-26 14:21:05 -08:00
Matt Arsenault	803bd812b1	AMDGPU: Builtins & Codegen support for v_cvt_scalef32_pk_{fp8|bf8}_f32 for gfx950 (#117740) OPSEL[3] determines whether the low or high 16 bits of the word are written. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:57:09 -05:00
Matt Arsenault	815069c701	AMDGPU: Builtins & Codegen support for: v_cvt_scalef32_[f16|f32]_[bf8|fp8] (#117739) OPSEL[1:0] collectively decide which byte to read from the src input. The builtin takes an additional imm argument that represents the index (valid values: [0:3]) of the src byte to read. Out-of-bounds checks will be added in the next patch. OPSEL ASM syntax: opsel:[x,y,z], where opsel[x] = Inst{11} = src0_modifier{2}, opsel[y] = Inst{12} = src1_modifier{2}, opsel[z] = Inst{14} = src0_modifier{3}. Note: Inst{13} (i.e. OPSEL[2]) is ignored in asm syntax, and opsel[z] is meaningless for v_cvt_scalef32_f32_{fp|bf}8. Co-authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-11-26 14:54:10 -05:00
Matt Arsenault	7221bc74bc	AMDGPU: Make v2f16 minimum/maximum legal for gfx950 (#117738)	2024-11-26 14:51:05 -05:00
Matt Arsenault	f5e92eb04b	AMDGPU: Handle f32 minimum3/maximum3 pattern for gfx950 (#117737)	2024-11-26 14:47:52 -05:00
Matt Arsenault	e57b327be2	AMDGPU: Legalize fminimum and fmaximum f32 for gfx950 (#117634) Select to minimum3/maximum3. Leave f16/v2f16 for later since it's complicated by only having the vector version.	2024-11-26 14:44:09 -05:00
Matt Arsenault	5a3299a684	AMDGPU: Remove some -verify-machineinstrs from tests (#117736) We should leave these for EXPENSIVE_CHECKS builds. Some of these were near the top of the slowest tests.	2024-11-26 12:59:15 -05:00
Philip Reames	c55a080c08	[RISCV] Add shuffle coverage for compress, decompress, and repeat idioms. compress is intended to match vcompress from the ISA manual. Note that deinterleave is a subset of this, and is already tested elsewhere. decompress is the synthetic pattern defined in the same manual, though we can often do better than the mentioned iota/vrgather. Note that some of these can also be expressed as interleave with at least one undef source, and are already tested elsewhere. repeat repeats each input element N times in the output. It can be described as interleave operations, but we can sometimes do better lowering-wise.	2024-11-26 09:27:56 -08:00
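A minimal Python sketch of the three idioms expressed as single-source shuffle masks. The helper names are hypothetical, and -1 marks an undefined lane, an assumption chosen to match the convention LLVM's C++ shuffle-mask APIs use for poison lanes; this is not how the RISC-V tests were generated. The last test shows the deinterleave-as-compress case mentioned above: keeping every other lane yields the even elements in the low half.

```python
UNDEF = -1  # assumption: -1 marks an undefined lane in a shuffle mask


def compress_mask(keep):
    """vcompress-like: pack the selected lanes to the front, in order."""
    picked = [i for i, k in enumerate(keep) if k]
    return picked + [UNDEF] * (len(keep) - len(picked))


def decompress_mask(keep):
    """Inverse of compress: the n-th selected lane receives source element n."""
    out, nxt = [], 0
    for k in keep:
        if k:
            out.append(nxt)
            nxt += 1
        else:
            out.append(UNDEF)
    return out


def repeat_mask(num_elts, factor):
    """Repeat each source element `factor` times across num_elts output lanes."""
    return [i // factor for i in range(num_elts)]


def apply_shuffle(src, mask):
    """Apply a single-source mask; undefined lanes become None."""
    return [None if m == UNDEF else src[m] for m in mask]
```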
Zaara Syeda	b1a34b80b8	[NFC][Test] Fix PowerPC test gcov_ctr_ref_init.ll (#117577)	2024-11-26 12:09:49 -05:00
SpencerAbson	2a0162c019	[AArch64][SVE] Change the immediate argument in svextq (#115340) In order to align with `svext` and NEON `vext`/`vextq`, this patch changes the immediate argument in `svextq` so that it counts elements of the source vector's element size, rather than bytes. The [spec for this intrinsic](https://github.com/ARM-software/acle/blob/main/main/acle.md#extq) is ambiguous about the meaning of this argument; this issue was raised after the implementers of the ACLE in GCC interpreted it differently. For example (with our current implementation): `svextq_f64(zn_f64, zm_f64, 1)` would, for each 128-bit segment of `zn_f64`, concatenate the highest 15 bytes of this segment with the first byte of the corresponding segment of `zm_f64`. After this patch, the behavior of `svextq_f64(zn_f64, zm_f64, 1)` would be, for each 128-bit vector segment of `zn_f64`, to concatenate the higher doubleword of this segment with the lower doubleword of the corresponding segment of `zm_f64`. The range of the immediate argument in `svextq` would be modified such that it is: [0,15] for `svextq_{s8,u8}`; [0,7] for `svextq_{s16,u16,f16,bf16}`; [0,3] for `svextq_{s32,u32,f32}`; and [0,1] for `svextq_{s64,u64,f64}`.	2024-11-26 16:50:51 +00:00
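A minimal Python model of the post-patch `svextq` semantics, treating a scalable vector as a plain list whose length is a multiple of one 128-bit segment. The function name and the `elem_bytes` parameter are hypothetical. Because the immediate counts elements, its valid range is [0, 16 / elem_bytes - 1], which reproduces the four ranges listed for `svextq`.

```python
def svextq(zn, zm, imm, elem_bytes):
    """Per 128-bit segment: elements imm.. of zn's segment, then the first imm of zm's.

    zn and zm are equal-length lists; imm is an element count, not a byte count.
    """
    per_seg = 16 // elem_bytes  # elements per 128-bit segment
    if not 0 <= imm < per_seg:
        raise ValueError(f"imm must be in [0, {per_seg - 1}]")
    if len(zn) != len(zm) or len(zn) % per_seg:
        raise ValueError("vectors must be whole 128-bit segments of equal length")
    out = []
    for base in range(0, len(zn), per_seg):
        seg_n = zn[base:base + per_seg]
        seg_m = zm[base:base + per_seg]
        out += seg_n[imm:] + seg_m[:imm]
    return out


# f64 with two segments (a 256-bit vector): high doubleword of zn, low doubleword of zm.
print(svextq([1, 2, 3, 4], [5, 6, 7, 8], 1, 8))
```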


56301 Commits