llvm-project

Author	SHA1	Message	Date
Paul Walker	71b87d1267	[LLVM][SVE] Ensure all fixed length mask bits are defined. (#116819 ) convertFixedMaskToScalableVector expects the mask input to honour the BoolContents scheme employed by the target. For AArch64 this means a mask should be zero or all ones, and thus when promoting a mask we must use a sign extend.	2024-11-20 13:54:50 +00:00
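The mask rule in the commit above can be illustrated with a small model. This is only a sketch in plain Python, not LLVM code: the helper names `promote_mask` and `honours_zero_or_all_ones` and the 8-bit lane width are assumptions chosen for illustration.

```python
def promote_mask(bits, width=8, signed=True):
    """Widen 1-bit mask lanes to `width`-bit lanes (hypothetical helper)."""
    all_ones = (1 << width) - 1
    if signed:
        # Sign extension replicates the mask bit across the whole lane.
        return [all_ones if b else 0 for b in bits]
    # Zero extension leaves only the lowest bit set.
    return [1 if b else 0 for b in bits]


def honours_zero_or_all_ones(lanes, width=8):
    """True if every lane is either zero or all ones (hypothetical check)."""
    all_ones = (1 << width) - 1
    return all(v in (0, all_ones) for v in lanes)


mask = [1, 0, 1, 1]
print(promote_mask(mask, signed=True))
print(promote_mask(mask, signed=False))
```

Under this model only the sign-extended lanes satisfy the "zero or all ones" requirement, which is the property the commit message says the promoted mask must keep.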
Sergei Barannikov	8c56dd3040	[ARM] Stop gluing FP comparisons to FMSTAT (#116676 ) Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. There might be a case when a copy can't be avoided (although not found in existing tests). If a copy is necessary, the virtual register will be created with `cl_FPSCR_NZCV` register class. If this register class is inappropriate, `TRI::getCrossCopyRegClass` should be modified to return the correct class. Pull Request: https://github.com/llvm/llvm-project/pull/116676	2024-11-20 16:07:05 +03:00
Sam Elliott	408659c5b5	[RISCV] Merge GPRPair and GPRF64Pair (#116094 ) As suggested by Craig, this tries to merge the two sets of register classes created in #112983, GPRPair* and GPRF64Pair*. - I added some explicit annotations to `RISCVInstrInfoD.td` which fixed the type inference issues I was seeing from tablegen for select patterns. - I've had to make the behaviour of `splitValueIntoRegisterParts` and `joinRegisterPartsIntoValue` cover more cases, because you cannot bitcast to/from untyped (the bitcast would otherwise have been inserted automatically by TargetLowering code). - I apparently didn't need to change `getNumRegisters` again, which continues to tell me there's a bug in the code for tied inputs. I added some more test coverage of this case but it didn't seem to help find the asserts I was finding before - I think the difference is between the default behaviour for integers which doesn't apply to floats. - There's still a difference between BuildGPRPair and BuildPairF64 (and the same for SplitGPRPair and SplitF64). I'm not happy with this; I think it's quite confusing, as they're very similar, just differing in whether they give an `untyped` or an `f64`. I haven't really worked out how the DAGCombiner copes if one meets the other. I know we have some of this for the f64 variants already, but they're a lot more complex than the GPRPair variants anyway.	2024-11-20 10:08:55 +00:00
Diana Picus	09c41246ed	[AMDGPU] Fix restores in chain functions (#116193 ) When spilling a VGPR in `emitPrologue`, chain functions prefer to use offsets to access the stack instead of the SP. This patch fixes `emitEpilogue` to do the same. It also brings back some test coverage that was lost in #93526, when WWM registers started being shifted to the lowest available range (which meant that tests that were originally spilling v8 would shift to spill v0, which is a scratch register for chain functions and didn't get spilled). Change-Id: Icb07fccd859b563cd45f74c25ae578ecb38bdeeb	2024-11-20 10:43:59 +01:00
David Green	bca846d462	[AArch64] Improve mull generation (#114997 ) This attempts to clean up and improve where we generate smull/umull using known-bits. For v2i64 types (where no mul is present), we try to create mull more aggressively to avoid scalarization.	2024-11-20 09:12:22 +00:00
Simon Pilgrim	3a5cf6d99b	[X86] Rename AVX512 VEXTRACT/INSERT??x? to VEXTRACT/INSERT??X? (#116826 ) Use uppercase in the subvector description ("32x4" -> "32X4" etc.) - matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (and lowercase just for the arg description suffix).	2024-11-20 08:25:01 +00:00
Craig Topper	2187738508	[RISCV] Add additional CHECK prefixes to fixed-vectors-strided-load-store-asm.ll. NFC We had 2 RUN lines with conflicting output sharing prefixes. The script unfortunately did not report the error.	2024-11-19 16:54:29 -08:00
David Green	5b79152937	[AArch64] Make sure there is test coverage for ptr phis. NFC	2024-11-19 21:01:53 +00:00
Craig Topper	eff60d83b0	[RISCV][GISel] Make extended loads and truncating stores with s16 register type and s8 memory type legal. This addresses some failures I've seen in testing on real code.	2024-11-19 11:57:35 -08:00
Yashas Andaluri	b28eebf926	[RDF] Fix cover check when linking refs to defs (#113888 ) During RDF graph construction, linkRefUp method links a register ref to its upward reaching defs until all RegUnits of the ref have been covered by defs. However, when a sub-register def covers some, but not all, of the RegUnits of a previous super-register def, a super-register ref is not linked to the super-register def. This can result in certain super register defs being dead code eliminated. This patch fixes the cover check for a register ref. A def must be skipped only when all RegUnits of that def have already been covered by a previously seen def.	2024-11-19 12:38:36 -06:00
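The cover check described in the commit above can be modelled with sets of register units. This is a sketch only, not the RDF implementation: `link_ref_up`, the rule functions and the unit names are hypothetical, and the "old" rule (skip a def as soon as any of its units is already covered) is an assumption inferred from the bug description rather than taken from the source code.

```python
def link_ref_up(ref_units, defs_nearest_first, should_skip):
    """Link a ref to upward-reaching defs until all of its units are covered."""
    covered = set()
    linked = []
    for name, def_units in defs_nearest_first:
        units = def_units & ref_units      # only units the ref cares about
        if not units or should_skip(units, covered):
            continue
        linked.append(name)
        covered |= units
        if covered >= ref_units:           # every unit of the ref is covered
            break
    return linked


def fixed_rule(units, covered):
    # Skip only when ALL units of the def were already covered.
    return units <= covered


def assumed_old_rule(units, covered):
    # Assumption: skip when ANY unit of the def was already covered.
    return bool(units & covered)


ref = {"lo", "hi"}                          # super-register ref
defs = [("sub_def", {"lo"}), ("super_def", {"lo", "hi"})]
print(link_ref_up(ref, defs, fixed_rule))
print(link_ref_up(ref, defs, assumed_old_rule))
```

With the fixed rule the super-register def stays linked to the ref because its `hi` unit is still uncovered; under the assumed old rule it is skipped, which matches the situation in which the commit says such defs could be eliminated as dead code.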
Jay Foad	b3995aa338	[AMDGPU] Decrease default NSA threshold from 3 to 2 (#116624 ) In graphics shaders it is better overall to use NSA encoding for IMAGE instructions, because the benefit of less constrained register allocation outweighs the cost of larger encoding. In particular NSA form often avoids the need for extra V_MOV_B32 instructions between IMAGE instructions, which can allow the IMAGE instructions to be claused. Note that in GFX12 there is no longer a bit in the encoding to choose between NSA and non-NSA forms, so this only affects GFX10 and GFX11.	2024-11-19 15:54:27 +00:00
Zaara Syeda	8e4423eb08	[AsmPrinter] Fix handling in emitGlobalConstantImpl for AIX (#116255 ) When GlobalMerge creates a MergedGlobal of statics all initialized to zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results in just emitting zeros followed by labels for the aliases. We need to handle it more like how emitGlobalConstantStruct does by emitting each global inside the aggregate. --------- Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>	2024-11-19 09:58:25 -05:00
Sergei Barannikov	aff98e4be0	[ARM] Stop gluing 1-bit shifts (#116547 ) 1. When two (or more) nodes are glued, DAG scheduler will always schedule them as one piece, i.e. it will not allow any instructions to be scheduled between them. It does so because if nodes are glued this usually means that there is an implicit register dependency between them, and an intervening node could clobber this physical register. When emitting such nodes into machine IR, they will also be stuck together, e.g.: ``` %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr %10:gpr = RRX %3, implicit $cpsr ``` 2. If a node has Glue result, SelectionDAG will not try to CSE this node. If it did, it would break the implicit physical register dependency. In practice this means that if a node with Glue result has multiple uses, it has to be duplicated before each use. This is the reason for `ARMTargetLowering::duplicateCmp` to exist. When using normal data dependency, dependent nodes can freely be scheduled around. If there is a physical register dependency between nodes, the physical register will be copied to/from a virtual register, allowing other nodes to intervene between them. The resulting machine IR might look like this: ``` %9:gpr = LSRs1 killed %8, implicit-def $cpsr %10:gpr = COPY $cpsr %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg %12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg $cpsr = COPY %10 %13:gpr = RRX %3, implicit $cpsr ``` The two copies are likely to be eliminated by register coalescer, given that there are no instructions between them that clobber this physical register. 
If the copies are unwanted in the first place (they could be expensive or impossible), DAG scheduler will try to avoid inserting them wherever possible, and the resulting machine IR will look like this: ``` %9:gpr = LSRs1 killed %8, implicit-def $cpsr %10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg %11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg %12:gpr = RRX %3, implicit $cpsr ``` On ARM, arithmetic operations and LSLS already use the new data flow approach. This patch extends it to include 1-bit shifts. Pull Request: https://github.com/llvm/llvm-project/pull/116547	2024-11-19 17:46:48 +03:00
Yingwei Zheng	c727b48287	[SDAG][ISel][TableGen][LoongArch] Report error for trivial bitcasts when there are predicate calls (#116075 ) On loongarch64 with lsx extension, we select `VBITREV_W` for `v4i32 (xor X, (shl splat(1), Y))`: `8e66303916/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td (L1583-L1584)` And `vsplat_imm_eq_1` is defined as: `8e66303916/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td (L77-L87)` For the `(bitconvert (v4i32 (build_vector)))` case, the pattern is expected to be: ``` PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (bitconvert:{ *:[v4i32] } (build_vector:{ *:[v4i32] }))<<P:Predicate_vsplat_imm_eq_1>>, v4i32:{ *:[v4i32] }:$vk)) RESULT: (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk) ``` However, `simplifyTree` drops the `bitconvert` node and its predicates: `8e66303916/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp (L3036-L3062)` Then llvm will match `vsplat_imm_eq_1` for any v4i32 splats and cause a miscompilation: ``` PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (build_vector:{ *:[v4i32] }), v4i32:{ *:[v4i32] }:$vk)) RESULT: (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk) ``` This patch adds additional checks for predicates associated with the trivial bitconvert node. Unused patterns in the LoongArch target are also removed. Fixes https://github.com/llvm/llvm-project/issues/116008.	2024-11-19 21:24:40 +08:00
Hari Limaye	4f0403fe96	[CodeGen][AArch64] Sink splat operands of FMul instructions (#116222 ) Sink shuffle operands of FMul instructions if these are splats, as we can generate lane-indexed variants for these.	2024-11-19 12:59:22 +00:00
Sam Elliott	c4030c896d	[RISCV] Fix FP64 DinX R Regclass (#116688 ) This was a typo in llvm/llvm-project#112983 that didn't cause build failures but is still wrong.	2024-11-19 12:42:27 +00:00
Yingwei Zheng	42ed775783	[InstSimplify] Generalize `simplifyAndOrOfFCmps` to handle fabs (#116590 ) This patch generalizes https://github.com/llvm/llvm-project/issues/81027 to handle pattern `and/or (fcmp ord/uno X, 0), (fcmp pred fabs(X), Y)`. Alive2: https://alive2.llvm.org/ce/z/tsgUrz The correctness is straightforward because `fcmp ord/uno X, 0.0` is equivalent to `fcmp ord/uno fabs(X), 0.0`. We may generalize it to handle fneg as well. Address comment https://github.com/llvm/llvm-project/pull/116065#pullrequestreview-2434796846	2024-11-19 20:10:40 +08:00
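The equivalence the commit above relies on can be spot-checked numerically. This is a sketch in plain Python, not a proof and not LLVM code: `fcmp_ord` and `fcmp_uno` are hypothetical helpers that model the ordered/unordered predicates on Python floats.

```python
import math


def fcmp_ord(a, b):
    """Ordered comparison: true when neither operand is NaN."""
    return not (math.isnan(a) or math.isnan(b))


def fcmp_uno(a, b):
    """Unordered comparison: true when either operand is NaN."""
    return math.isnan(a) or math.isnan(b)


samples = [0.0, -0.0, 1.5, -2.0, math.inf, -math.inf, math.nan]
for x in samples:
    # fabs only clears the sign, so it cannot create or remove a NaN.
    assert fcmp_ord(x, 0.0) == fcmp_ord(math.fabs(x), 0.0)
    assert fcmp_uno(x, 0.0) == fcmp_uno(math.fabs(x), 0.0)
print("ord/uno agree for X and fabs(X) on all samples")
```

The Alive2 link in the commit message is the authoritative check; the loop here only shows why the result depends on whether `X` is NaN and on nothing else.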
Simon Pilgrim	95ab42661e	[X86] Attempt to canonicalize vXf64 SHUFPD shuffle masks with undef elts to improve further folding (#116419 ) Currently when creating a SHUFPD immediate mask, any undef shuffle elements are set to 0, which can limit options for further shuffle combining. This patch attempts to canonicalize the mask to improve folding: first by detecting a per-lane broadcast style mask (which can allow us to fold to UNPCK instead), and second ensure any undef elements are set to an 'inplace' value to improve chances of the SHUFPD later folding to a BLENDPD (or be bypassed in a SimplifyMultipleUseDemandedVectorElts call). This is very similar to canonicalization we already attempt in getV4X86ShuffleImm for vXi32/vXf32 SHUFPS/SHUFD shuffles.	2024-11-19 10:45:07 +00:00
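The undef handling in the commit above can be sketched as follows. This is an illustrative model, not the X86 backend code: `shufpd_imm` and its `undef_policy` parameter are hypothetical, undef elements are written as `-1`, and reading "inplace" as "select bit equals the element's own position within its pair, `i & 1`" is an assumption based on the description.

```python
def shufpd_imm(mask, undef_policy="inplace"):
    """Build a SHUFPD-style immediate: bit i picks the low or high element."""
    imm = 0
    for i, m in enumerate(mask):
        if m < 0:
            # Undef element: old behaviour picks 0, new behaviour keeps the
            # element where it already is (assumed to mean bit = i & 1).
            bit = (i & 1) if undef_policy == "inplace" else 0
        else:
            bit = m & 1
        imm |= bit << i
    return imm


mask = [0, -1, 2, -1]                       # -1 marks an undef element
print(bin(shufpd_imm(mask, undef_policy="zero")))
print(bin(shufpd_imm(mask, undef_policy="inplace")))
```

With the in-place choice every undef lane selects the element already sitting in that position, which is the shape a blend needs; that is the folding opportunity the commit message mentions for BLENDPD.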
Sergei Barannikov	6f53ae6e61	[X86] Properly chain PROBED_ALLOCA / SEG_ALLOCA (#116508 ) These nodes should appear between CALLSEQ_START / CALLSEQ_END. Previously, they could be scheduled after CALLSEQ_END because the nodes didn't update the chain. The change in a test is due to X86 call frame optimizer pass bailing out for a particular call when CALLSEQ_START / CALLSEQ_END are not in the same basic block. This happens because SEG_ALLOCA is expanded into a sequence of basic blocks early. It didn't bail out before because the closing CALLSEQ_END was scheduled before SEG_ALLOCA, in the same basic block as CALLSEQ_START. While here, simplify creation of these nodes: allocating a virtual register and copying `Size` into it were unnecessary.	2024-11-19 13:29:58 +03:00
Lukacma	61726add1b	[AArch64] Update predicate for FEXPA (#116613 ) This patch updates predicate and backend tests for FEXPA instructions to match [latest spec](https://developer.arm.com/documentation/ddi0602/2024-09/SVE-Instructions/FEXPA--Floating-point-exponential-accelerator-).	2024-11-19 10:29:19 +00:00
Mikhail Goncharov	f77126c549	Revert "[FunctionAttrs] Add the "initializes" attribute inference (#97373 )" This reverts commit 661c593850715881d2805a59e90e6d87d8b9fbb8. Multiple buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/108/builds/6096	2024-11-19 10:29:36 +01:00
Davide	8cd348c96a	[MIPS] Updated MIPS N calling conventions so that fp16 arguments no longer cause a crash (#116569 ) This PR fixes a bug introduced by #110199, which causes any half float argument to crash the compiler on MIPS64. Currently compiling this bit of code with `llc -mtriple=mips64`: ``` define void @half_args(half %a) nounwind { entry: ret void } ``` Crashes with the following log: ``` LLVM ERROR: unable to allocate function argument #0 PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: llc -mtriple=mips64 1. Running pass 'Function Pass Manager' on module '<stdin>'. 2. Running pass 'MIPS DAG->DAG Pattern Instruction Selection' on function '@half_args' #0 0x000055a3a4013df8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x32d0df8) #1 0x000055a3a401199e llvm::sys::RunSignalHandlers() (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x32ce99e) #2 0x000055a3a40144a8 SignalHandler(int) Signals.cpp:0:0 #3 0x00007f00bde558c0 __restore_rt libc_sigaction.c:0:0 #4 0x00007f00bdea462c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 #5 0x00007f00bde55822 gsignal ./signal/../sysdeps/posix/raise.c:27:6 #6 0x00007f00bde3e4af abort ./stdlib/abort.c:81:7 #7 0x000055a3a3f80e3c llvm::report_fatal_error(llvm::Twine const&, bool) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x323de3c) #8 0x000055a3a2e20dfa (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x20dddfa) #9 0x000055a3a2a34e20 llvm::MipsTargetLowering::LowerFormalArguments(llvm::SDValue, unsigned int, bool, llvm::SmallVectorImpl<llvm::ISD::InputArg> const&, llvm::SDLoc const&, llvm::SelectionDAG&, llvm::SmallVectorImpl<llvm::SDValue>&) const MipsISelLowering.cpp:0:0 #10 0x000055a3a3d896a9 llvm::SelectionDAGISel::LowerArguments(llvm::Function const&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x30466a9) #11 0x000055a3a3e0b3ec 
llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x30c83ec) #12 0x000055a3a3e09e21 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x30c6e21) #13 0x000055a3a2aae1ca llvm::MipsDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) MipsISelDAGToDAG.cpp:0:0 #14 0x000055a3a3e07706 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x30c4706) #15 0x000055a3a3051ed6 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x230eed6) #16 0x000055a3a35a3ec9 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x2860ec9) #17 0x000055a3a35ac3b2 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x28693b2) #18 0x000055a3a35a499c llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x286199c) #19 0x000055a3a262abbb main (/home/davide/Ps2/rps2-tools/prefix/bin/llc+0x18e7bbb) #20 0x00007f00bde3fc4c __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3 #21 0x00007f00bde3fd05 call_init ./csu/../csu/libc-start.c:128:20 #22 0x00007f00bde3fd05 __libc_start_main@GLIBC_2.2.5 ./csu/../csu/libc-start.c:347:5 #23 0x000055a3a2624921 _start /builddir/glibc-2.39/csu/../sysdeps/x86_64/start.S:117:0 ``` This is caused by the fact that after the change, `f16`s are no longer lowered as `f32`s in calls. Two possible fixes are available: - Update calling conventions to properly support passing `f16` as integers. - Update `useFPRegsForHalfType()` to return `true` so that `f16` are still kept in `f32` registers, as before #110199. This PR implements the first solution so as not to introduce any ABI changes beyond those #110199 already made. As for what the correct ABI for halfs is, I don't think there is a correct answer. 
GCC doesn't support halfs on MIPS, and I couldn't find any information on old MIPS ABI manuals either.	2024-11-19 10:23:32 +01:00
Sander de Smalen	3093b29b59	[RegisterCoalescer] Fix up subreg lanemasks after rematerializing. (#116191 ) In a situation like the following: ``` undef %2.subreg = INST %1 ; DefMI (rematerializable), ; DefSubIdx = subreg %3 = COPY %2 ; SrcIdx = DstIdx = 0 .... = SOMEINSTR %3, %2 ``` there are no subranges for `%3` because the entire register is copied, but after rematerialization the subrange of the rematerialized value must be fixed up with the appropriate subranges for `.subreg`. (To me this issue seemed a bit similar to the issue fixed by #96839, but then related to rematerialization)	2024-11-19 08:46:55 +00:00
Matt Arsenault	927032807d	AMDGPU: Handle gfx950 96/128-bit buffer_load_lds (#116681 ) Enforcing this limit in the clang builtin will come later.	2024-11-18 22:01:56 -08:00
Matt Arsenault	50224bd5ba	AMDGPU: Handle gfx950 global_load_lds_* instructions (#116680 ) Define global_load_lds_dwordx3 and global_load_lds_dwordx4. Oddly it seems dwordx2 was skipped.	2024-11-18 21:58:02 -08:00
Matt Arsenault	130a3150ec	AMDGPU: Define v_mfma_f32_32x32x16_bf16 for gfx950 (#116679 ) Unlike the existing gfx940 intrinsics using short/i16 in place of bfloat, this uses the natural bfloat type.	2024-11-18 21:53:56 -08:00
Matt Arsenault	738bdd4969	AMDGPU: Add V_CVT_PK_BF16_F32 for gfx950 (#116678 )	2024-11-18 21:50:54 -08:00
Haopeng Liu	661c593850	[FunctionAttrs] Add the "initializes" attribute inference (#97373 ) Add the "initializes" attribute inference. This change is expected to have ~0.09% compile time regression, which seems acceptable for interprocedural DSE. https://llvm-compile-time-tracker.com/compare.php?from=9f10252c4ad7cffbbcf692fa9c953698f82ac4f5&to=56345c1cee4375eb5c28b8e7abf4803d20216b3b&stat=instructions%3Au	2024-11-18 21:36:05 -08:00
Jim Lin	cd418030de	[RISCV] Remove +a from the attribute test for zacas and zabha. NFC. zacas and zabha don't require the 'a' or 'zaamo' extension after https://github.com/llvm/llvm-project/pull/115694.	2024-11-19 10:18:40 +08:00
David Green	36d47f8878	[AArch64][GlobalISel] Legalize ptr vector freeze and implicit defs. They can be treated the same as other s64 operations.	2024-11-18 22:35:58 +00:00
David Green	50209e9942	[AArch64][GlobalISel] Move and update freeze.ll test. NFC This adds a number of extra vector cases, notably the ptr vectors.	2024-11-18 22:31:13 +00:00
Youngsuk Kim	b083340cb6	[llvm][NVPTX] Don't reorder MIs that construct a PTX function call (#116522 ) With "-enable-misched", MachineScheduler can reorder MIs that must stick together (in initially set order) to generate legal PTX code for a function call. When generating PTX code for the attached test (using LLVM before this revision), the following invalid PTX code is generated: ``` { // callseq 0, 0 .param .b64 param0; st.param.f64 [param0], 0d0000000000000000; .param .b64 retval0; call.uni (retval0), mul.lo.s32 %r7, %r10, %r3; or.b32 %r8, %r4, %r7; mul.lo.s32 %r9, %r2, %r8; cvt.rn.f64.s32 %fd3, %r9; quux, ( param0 ); ld.param.f64 %fd1, [retval0]; } // callseq 0 ```	2024-11-18 17:12:19 -05:00
Justin Bogner	e0b522dd94	[DirectX] Fix crash in DXILFlattenArrays for function declarations (#116690 ) We were skipping intrinsics here, but really we need to skip all function declarations - if the function doesn't have a body there's nothing to walk.	2024-11-18 13:56:33 -08:00
Alex MacLean	55876278d3	[NVPTX] Add support for f16 fabs (#116107 ) Add f16 and f16x2 support for abs. See PTX ISA 9.7.4.6. Half Precision Floating Point Instructions: abs https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-abs	2024-11-18 13:47:08 -08:00
Thorsten Schütt	f8d1905a24	[GlobalISel] Combine [S,U]SUBO (#116489 ) We import the llvm.ssub.with.overflow.* Intrinsics, but the Legalizer also builds them while legalizing other opcodes, see narrowScalarAddSub.	2024-11-18 22:39:23 +01:00
Matt Arsenault	0c421687f8	AMDGPU: Add first gfx950 mfma instructions (#116312 ) Scheduling info and hazards are wrong and TBD.	2024-11-18 13:38:07 -08:00
Matt Arsenault	ca1b35a6c8	AMDGPU: Add v_prng_b32 instruction for gfx950 (#116310 ) Rand num instruction for stochastic rounding.	2024-11-18 10:54:54 -08:00
Matt Arsenault	5a556d55fb	AMDGPU: Increase the supported LDS size to 160 KB for gfx950 (#116309 )	2024-11-18 10:48:56 -08:00
Matt Arsenault	a6fc489bb7	AMDGPU: Add gfx950 subtarget definitions (#116307 ) Mostly a stub, but adds some baseline tests and tests for removed instructions.	2024-11-18 10:41:14 -08:00
Fraser Cormack	18be88e20a	[NVPTX][NFC] Regenerate some tests checks (#116605 ) Use update_llc_test_checks.py to automate the test checks in some files I was observing changes in locally.	2024-11-18 18:35:44 +00:00
Lei Huang	ed8ebad6eb	[SelectionDAG] Support integer promotion for VP_LOAD and VP_STORE (#81299 ) Add integer promotion support for VP_LOAD and VP_STORE via legalization of extend and truncate of each form. Patch commandeered from: https://reviews.llvm.org/D109377	2024-11-18 13:32:58 -05:00
Sam Elliott	4615cc38f3	[RISCV] Inline Assembly Support for GPR Pairs ('R') (#112983 ) This patch adds support for getting even-odd general purpose register pairs into and out of inline assembly using the `R` constraint as proposed in riscv-non-isa/riscv-c-api-doc#92. There are a few different pieces to this patch, each of which needs its own explanation. - Renames the Register Class used for f64 values on rv32i_zdinx from `GPRPair` to `GPRF64Pair`. These register classes are kept broadly unmodified, as their primary value type is used for type inference over selection patterns. This rename affects quite a lot of files. - Adds new `GPRPair` register classes which will be used for `R` constraints and for instructions that need an even-odd GPR pair. This new type is used for `amocas.d.*` (rv32) and `amocas.q.*` (rv64) in Zacas, instead of the `GPRF64Pair` class being used before. - Marks the new `GPRPair` class as legal for holding an `MVT::Untyped`. Two new RISCVISD node types are added for creating and destructing a pair - `BuildGPRPair` and `SplitGPRPair`, and are introduced when bitcasting to/from the pair type and `untyped`. - Adds functionality to `splitValueIntoRegisterParts` and `joinRegisterPartsIntoValue` to handle changing `i<2*xlen>` MVTs into `untyped` pairs. - Adds an override for `getNumRegisters` to ensure that `i<2*xlen>` values, when going to/from inline assembly, only allocate one (pair) register (they would otherwise allocate two). This is due to a bug in SelectionDAGBuilder.cpp which other backends also work around. - Ensures that Clang understands that `R` is a valid inline assembly constraint. - This also allows `R` to be used for `f64` types on `rv32_zdinx` architectures, where doubles are stored in a GPR pair.	2024-11-18 17:45:58 +00:00
Hugh Delaney	8f8016fe66	[NVPTX] Add patterns for fma.relu.{f16\|f16x2\|bf16\|bf16x2} (#114977 ) Add patterns to lower `fmaxnum(fma(a, b, c), 0)` to `fma.rn{.ftz}.relu` for `f16`, `f16x2`, `bf16`, `bf16x2` types, when `nnan` is used. `fma_relu` honours `NaN`, so the substitution is only made if the `fma` is `nnan`, since `fmaxnum` returns the non NaN argument when passed a NaN value. This patch also removes some `bf16` ftz instructions since `FTZ` is not supported with the `bf16` type, according to the PTX ISA docs.	2024-11-18 15:29:17 +00:00
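The NaN argument in the commit above can be made concrete with a small model. This is a sketch in plain Python, not PTX or LLVM semantics verbatim: `fmaxnum` and `relu_nan_preserving` are hypothetical helpers that model "returns the non-NaN argument" and "honours NaN" respectively.

```python
import math


def fmaxnum(a, b):
    """Model of fmaxnum: if one operand is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def relu_nan_preserving(x):
    """Model of a relu that propagates NaN instead of clamping it to zero."""
    return x if math.isnan(x) else max(x, 0.0)


for x in [-2.0, 0.0, 3.5, math.inf]:
    # Without NaN inputs the two forms agree.
    assert fmaxnum(x, 0.0) == relu_nan_preserving(x)

# With a NaN input they differ: fmaxnum yields 0, the relu yields NaN.
print(fmaxnum(math.nan, 0.0), relu_nan_preserving(math.nan))
```

The disagreement on a NaN input is why, per the commit message, the substitution is only made when the `fma` carries `nnan`.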
Steven Perron	756fe54dc7	[SPIRV] Add write to image buffer for shaders. (#115927 ) This commit adds an intrinsic that will write to an image buffer. We chose to match the name of the DXIL intrinsic for simplicity in clang. We cannot reuse the existing openCL write_image function because that is not a reserved name in HLSL. There is not much common code to factor out.	2024-11-18 09:06:05 -05:00
Akshat Oke	3f9d02aae8	[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326 ) With this, all machine SSA optimization passes are available in the new codegen pipeline.	2024-11-18 11:02:01 +05:30
Aiden Grossman	d9eda6b2f3	[MLGO] Remove extraneous check lines from test input This patch removes check lines from a test input. It was originally copied from a test that had assertions automatically generated, but given we only use it as an input, the check lines do absolutely nothing. Remove them to improve readability of the test/prevent confusion.	2024-11-18 03:52:13 +00:00
Freddy Ye	97836bed63	Reland "[X86] Support -march=diamondrapids (#113881 )" (#116564 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-18 10:40:32 +08:00
Freddy Ye	90e92239bd	Revert "[X86] Support -march=diamondrapids (#113881 )" (#116563 ) This reverts commit 826b845c9e97448395431be3e4e5da585bd98c5e.	2024-11-18 08:45:28 +08:00
Freddy Ye	826b845c9e	[X86] Support -march=diamondrapids (#113881 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-18 08:31:17 +08:00
Craig Topper	eed9af95e6	[RISCV][GISel] Make loads/stores with s16 register type and s16 memory type legal. This is needed to support Zfh loads/stores. This requires supporting extends from sext/zext from i16 and s16 G_FREEZE to support the current tests we have.	2024-11-17 11:39:59 -08:00

56130 Commits