llvm-project

Author	SHA1	Message	Date
Joe Nash	ef79d9e38e	[AMDGPU][NFC] Regenerate CHECKs as pre-commit for D157426	2023-08-11 09:55:59 -04:00
Paul Walker	ac2a7637fe	[SVE] Add test to show incorrect code generation for scalable vector struct loads and stores. Patch also includes a minor fix to AArch64::isLegalAddressingMode to ensure all scalable types have a suitable bailout.	2023-08-11 13:35:04 +00:00
Simon Pilgrim	6c119cff31	[X86] combineConcatVectorOps - extend PACKSS/PACKUS handling to 512-bit nodes on BWI targets. Fixes another TRUNCATE -> PACKSS/PACKUS regression when #63710 finally gets fixed	2023-08-11 13:25:24 +01:00
Simon Pilgrim	0464a8f4a4	[X86] Add tests showing failure to concat(pack(),pack()) 512-bit results on BWI targets	2023-08-11 13:15:06 +01:00
Matt Arsenault	29fff3e2ab	AMDGPU: Try to select fmul by power of 2 to ldexp For the f64 case, this gives us a cheaper to materialize 32-bit constant. It's less obviously a win for f32 and f16. It forces us to use a VOP3 encoding so it's a neutral code size change. GlobalISel cases don't work because of the constant-is-copy-to-vgpr problem. https://reviews.llvm.org/D157111	2023-08-11 07:57:55 -04:00
Matt Arsenault	c8a4f2a8c1	AMDGPU: Add baseline tests for fmul-to-ldexp patterns We can do better on some multiply-by-power-of-2 patterns by selecting them as ldexp.	2023-08-11 07:57:55 -04:00
Anatoly Trosinenko	81300f75f4	[AArch64][PAC] Remove the duplication of LR sign/auth implementations In the machine outliner implementation for AArch64, `signOutlinedFunction()` reimplements signing the LR value in prologue and authenticating it in epilogue of the outlined function. This patch factors out `signLR()` and `authenticateLR()` functions from AArch64FrameLowering code and reuses them in `signOutlinedFunction()`. The `mergeOutliningCandidateAttributes()` outliner callback is introduced as well to further unify signing and authentication of the LR value. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D157320	2023-08-11 14:39:18 +03:00
David Green	7720b9a7e8	[AArch64] Extend and cleanup vecreduce.fmin/max tests. NFC See D156614 and D156615. This extends the vecreduce min/max tests and makes the types they test uniform, so that they are more useful for GlobalISel.	2023-08-11 12:16:22 +01:00
Simon Pilgrim	14d1e502df	[X86] combineConcatVectorOps - fold a 512-bit splat of a 128-bit subvector to a single X86ISD::SHUF128 node. Replaces a pair of insert_subvectors with a single (implicitly widened) vector - also reduces uses of the src. Hopefully this should address most of the remaining widen subvector regressions I'm seeing while trying to aggressively convert TRUNCATE to PACKSS/PACKUS.	2023-08-11 12:14:02 +01:00
pvanhout	14cfe92975	[AArch64][GlobalISel] Regenerate select combine tests Will be modified in D157690	2023-08-11 12:45:09 +02:00
Stanislav Mekhanoshin	02046ad944	[AMDGPU] W/a for gfx940 byte0 fp8 conversion bug The VOP1 form of these does not work. Differential Revision: https://reviews.llvm.org/D157683	2023-08-11 02:21:21 -07:00
David Green	acd17ea662	[AArch64][GISel] Expand handling for G_FSQRT to more vector types Similar to G_FABS, these can reuse the existing lowering to successfully handle more types.	2023-08-11 10:16:45 +01:00
Simon Pilgrim	ef46046060	[X86] combineConcatVectorOps - add handling for X86ISD::VPERM2X128 nodes. On AVX512 targets we can concatenate these and create an X86ISD::SHUF128 node. Prevents a regression in some future work to improve codegen for concat_vectors(extract_subvector(),extract_subvector()) (mainly via vector widening) patterns.	2023-08-11 10:01:13 +01:00
Lawrence Benson	c7b537bf09	[AArch64] Add more efficient vector bitcast for v16i8 We previously split the vector into two halves and performed two vector reduce operations followed by bit shifting and bitwise or. Now, we use NEON's zip1 to concatenate the halves in a smart way and then perform only a single vector reduce. This boosts performance quite a bit for this small routine, as vector reduce is a rather expensive instruction. Original discussion for this started in: https://reviews.llvm.org/D145301 Differential Revision: https://reviews.llvm.org/D156544	2023-08-11 10:10:42 +02:00
Nikita Popov	59d558a378	[X86] Add test for PR64589 (NFC)	2023-08-11 09:52:25 +02:00
pvanhout	490a867f16	[GlobalISel] Also set dead flags of implicit defs added by BuildMI BuildMI automatically adds the implicit operands of the instruction. This meant we couldn't set the dead flag on dead implicit defs in that case. Fix it by introducing an opcode to mark a given implicit def as dead. Fixes #64565 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157515	2023-08-11 08:38:37 +02:00
pvanhout	89e91e4c0c	[AMDGPU] Remove post-PromoteAlloca SROA run PromoteAlloca now uses SSAUpdater, so it doesn't need SROA to clean up after it anymore. Internal testing shows no noticeable performance impact. Reviewed By: #amdgpu, arsenm Differential Revision: https://reviews.llvm.org/D156398	2023-08-11 08:29:21 +02:00
Yeting Kuo	69cc5a4e1a	[LegalizeTypes] Support promotion for vp bitmanip sdnodes. This supports promotion for vp.bitreverse/bswap/ctlz/ctlz_zero_undef/cttz/cttz_zero_undef/ctpop/fshr/fshl. Reviewed By: craig.topper, luke Differential Revision: https://reviews.llvm.org/D157607	2023-08-11 08:27:42 +08:00
Eduard Zingerman	e66affa17e	Revert "[BPF] support for BPF_ST instruction in codegen" This reverts commit 92e28e397d4ccf1bff075f48e22cf1e23a7d02bf. Reverting to investigate buildbot failure reported in [1]. field-reloc-st-imm.ll: * Bad machine code: Explicit definition must be a register * - function: bar - basic block: %bb.0 entry (0x742f318) - instruction: CORE_MEM 3, 416, %0:gpr, @"llvm.foo:0:4$0:2", ... - operand 0: 3 * Bad machine code: Explicit definition must be a register * - function: bar - basic block: %bb.0 entry (0x742f318) - instruction: CORE_MEM 4, 410, %0:gpr, @"llvm.foo:0:8$0:3", ... - operand 0: 4 LLVM ERROR: Found 4 machine code errors. [1] https://lab.llvm.org/buildbot/#/builders/16/builds/52877	2023-08-11 02:23:40 +03:00
Eduard Zingerman	92e28e397d	[BPF] support for BPF_ST instruction in codegen Generate store immediate instruction when CPUv4 is enabled. For example: $ cat test.c struct foo { unsigned char b; unsigned short h; unsigned int w; unsigned long d; }; void bar(volatile struct foo *p) { p->b = 1; p->h = 2; p->w = 3; p->d = 4; } $ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d - ... 0000000000000000 <bar>: 0: 72 01 00 00 01 00 00 00 *(u8 *)(r1 + 0x0) = 0x1 1: 6a 01 02 00 02 00 00 00 *(u16 *)(r1 + 0x2) = 0x2 2: 62 01 04 00 03 00 00 00 *(u32 *)(r1 + 0x4) = 0x3 3: 7a 01 08 00 04 00 00 00 *(u64 *)(r1 + 0x8) = 0x4 4: 95 00 00 00 00 00 00 00 exit Take special care to: - apply `BPFMISimplifyPatchable::checkADDrr` rewrite for BPF_ST - validate immediate value when BPF_ST write is 64-bit: BPF interprets `(BPF_ST | BPF_MEM | BPF_DW)` writes as writes with sign extension. Thus it is fine to generate such write when immediate is -1, but it is incorrect to generate such write when immediate is +0xffff_ffff. Differential Revision: https://reviews.llvm.org/D140804	2023-08-11 02:07:29 +03:00
Matt Arsenault	7575ee7167	AMDGPU: Add more test coverage for FP-typed atomicrmw xchg	2023-08-10 17:38:25 -04:00
Matt Arsenault	c8cac15613	PreISelIntrinsicLowering: Check RuntimeLibcalls instead of TLI for memory functions We need a better mechanism for expressing which calls you are allowed to emit and which calls are recognized. This should be applied to the 17 branch.	2023-08-10 16:40:04 -04:00
Changpeng Fang	1e22873ef4	[AMDGPU][NFC] Rename two LIT test files	2023-08-10 11:31:14 -07:00
Craig Topper	2df9328fe3	[RISCV] Stop performFP_TO_INTCombine from folding with ISD::FRINT. FRINT was added to matchRoundingOp after this function was written. So FRINT was not tested originally. For vectors, folding this causes us to create a CSR swap that tries to write 7 to FRM. This is an illegal value and will cause the CSR write to fail. While this might be a legal fold we could do, I'm disabling it for now so we can backport to LLVM 17 with the least risk. Differential Revision: https://reviews.llvm.org/D157583	2023-08-10 09:30:36 -07:00
Philip Reames	b1ada7a1d3	[DAG] Support store merging of vector constant stores (try 2) Original commit didn't handle the case where one of the stores was a truncating store of the build_vector. The existing codepath produced wrong code (which thankfully also failed asserts) instead of guarding against unexpected types. Original commit message follows. Ran across this when making a change to RISCV memset lowering. Seems very odd that manually merging a store into a vector prevents it from being further merged. Differential Revision: https://reviews.llvm.org/D156349	2023-08-10 08:54:05 -07:00
Philip Reames	e838471bc4	[X86] Add regression test case from pr64593 This is the case which triggered the revert of 660b740. Note that the test is extremely fragile as it depends on getting a truncating store at the right moment rather than folding the constant to a narrower bitwidth. This appears to happen on skylake, but not e.g. plain avx.	2023-08-10 08:49:41 -07:00
Patrick O'Neill	fcad2bbcfc	[RISC-V] Add proposed mapping for Ztso Currently LLVM emits Ztso code for fences, loads, and stores (behind an experimental flag) [1]. This patch updates the mapping and implements support for LR/SC and AMO ops. This updated mapping is compatible with the RVWMO ABI present in the psABI. Additional context can be found in the psABI pull request [2]. [1] https://reviews.llvm.org/D143076 [2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/391 Differential Revision: https://reviews.llvm.org/D155517	2023-08-10 15:59:06 +01:00
Philip Reames	0696a531c2	Revert "[DAG] Support store merging of vector constant stores" This reverts commit 660b740e4b3c4b23dfba36940ae0fe2ad41bfedf. Crash reported in the review thread post commit. Reverting while investigating.	2023-08-10 07:58:00 -07:00
Nabeel Omer	d43634cd74	[X86] Pre-commit test for D157513 https://reviews.llvm.org/D157513	2023-08-10 15:40:12 +01:00
Luke Lau	b165a7779d	[RISCV] Remove completed FIXME. NFC Looks like this FIXME was already taken off during the original patch in https://reviews.llvm.org/D104921	2023-08-10 15:31:05 +01:00
Sean Fertile	b37c7ed0c9	[PPC][AIX] Fix toc-data peephole bug and some related cleanup. Set the ReplaceFlags variable to false, since there is code meant only for the ADDItocHi/ADDItocL nodes. This has the side effect of disabling the peephole when the load/store instruction has a non-zero offset. This patch also fixes retrieving the `ImmOpnd` node from the AIX small code model pseudos and does the same for the register operand node. This allows cleaning up the later calls to replaceOperands. Finally move calculating the MaxOffset into the code guarded by ReplaceFlags as it is only used there and the comment is specific to the ELF ABI. Fixes https://github.com/llvm/llvm-project/issues/63927 Differential Revision: https://reviews.llvm.org/D155957	2023-08-10 10:23:15 -04:00
Jay Foad	3091bdb86d	[AMDGPU] Do not release VGPRs at -O0 This was an oversight when the GFX11 early release VGPRs optimization was reimplemented in D153279. Sending the DEALLOC_VGPRS message is a performance optimization so there is no need to do it at -O0. In addition it makes some kinds of post mortem debugging hard or impossible, since VGPR values are no longer available to inspect at the s_endpgm instruction. Differential Revision: https://reviews.llvm.org/D157599	2023-08-10 14:58:06 +01:00
Simon Pilgrim	4ed452b747	[X86] getFauxShuffleMask - handle insert_subvector(src, bitcast(extract_subvector(sub))) patterns Add bitcast handling to the existing insert_subvector(src, extract_subvector(sub)) pattern, and recognise undef src cases to allow us to detect vector widening patterns.	2023-08-10 13:38:38 +01:00
Paul Walker	3d65f8211f	[SVE] Expand scalable vector ISD::BITCASTs when targeting big-endian. Whilst suboptimal, it's better than the current selection failure. Fixes: #64406 Differential Revision: https://reviews.llvm.org/D157406	2023-08-10 11:02:01 +00:00
Jianjian GUAN	8901eb281f	[RISCV] Fix zihintntl test	2023-08-10 17:18:17 +08:00
David Green	c26459258a	[AArch64] Update check lines in neon-compare-instructions.ll -global-isel-abort=2 is no longer required, and many of the tests can now share CHECK lines between SDAG and GlobalISel.	2023-08-10 10:09:13 +01:00
Jianjian GUAN	f808788487	[RISCV] Remove experimental for zihintntl Since zihintntl is ratified now, we could remove the experimental prefix and change its version to 1.0. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D151547	2023-08-10 17:04:49 +08:00
Neumann Hon	3e139be29f	[SystemZ][z/OS] Add support for function name field of PPA1 This PR causes the PPA1 to emit the function's name if it exists. This field is not emitted for unnamed functions. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D157494	2023-08-10 04:40:19 -04:00
David Green	b720dcba92	[AArch64][GISel] Split large f64 vectors for fcmp. This adds some very basic f64 handling for larger fcmp vectors, which seemed to be missing.	2023-08-10 08:19:22 +01:00
Yunze Zhu	5f73d2b780	[RISCV] Enable alias analysis by default In LLVM, alias analysis for code generation is currently off by default. This patch enables alias analysis by default for the RISCV target during code generation, which creates more opportunities to improve performance. Related test cases are modified accordingly. Differential Revision: https://reviews.llvm.org/D157250	2023-08-10 10:48:43 +08:00
Matt Arsenault	6dbd458128	AMDGPU: Remove pointless libcall optimization of fma/mad After the library is linked and trivially inlined, the generic fma and fmuladd intrinsics already handle these cases, and with precise flag handling. This was requiring all fast math flags when we really just need nsz for the fma(a, b, 0) case. https://reviews.llvm.org/D156677	2023-08-09 19:37:52 -04:00
Matt Arsenault	6448d5ba58	AMDGPU: Remove pointless libcall recognition of native_{divide|recip} This was trying to constant fold these calls, and also turn some of them into a regular fmul/fdiv. There's no point in doing that; the underlying library implementation should be using those in the first place. Even when the library does use the rcp intrinsics, the backend handles constant folding of those. This was also only performing the folds under overly strict fast-everything-is-required conditions. The one possible plus this gained over linking in the library is if you were using all fast math flags, it would propagate them to the new instructions. We could address this in the library by adding more fast math flags to the native implementations. The constant fold case also had no test coverage. https://reviews.llvm.org/D156676	2023-08-09 18:48:46 -04:00
Matt Arsenault	58e87c961e	AMDGPU: Port AMDGPULowerKernelArguments to new pass manager https://reviews.llvm.org/D157498	2023-08-09 18:34:30 -04:00
Matt Arsenault	1ca0808db2	GlobalISel: Don't expand stacksave/stackrestore in IRTranslator In some edge cases (likely invalid anyway), it's not correct to directly copy the stack pointer register.	2023-08-09 18:33:55 -04:00
Matt Arsenault	25bc999d1f	Intrinsics: Add type overload to stacksave and stackrestore This allows use with non-0 address space stacks. llvm_ptr_ty should never be used. This could use some more percolation up through mlir, but this is enough to fix existing tests. https://reviews.llvm.org/D156666	2023-08-09 18:33:11 -04:00
priyanshi1708	b16a0f9f6e	[AArch64][Optimization] Emit FCCMP for AND of two float compares Transforms and(fcmp(a, b), fcmp(c, d)) into fccmp(fcmp(a, b), c, d) Issue link: https://github.com/llvm/llvm-project/issues/60819 Differential Revision: https://reviews.llvm.org/D152714	2023-08-09 15:58:04 +01:00
Paul Walker	b7e6e568b4	[SelectionDAG] Fix problematic call to EVT::changeVectorElementType(). The function changeVectorElementType assumes MVT input types will result in MVT output types. There's no guarantee this is possible during early code generation and so this patch converts an instance used during initial DAG construction to instead explicitly create a new EVT. NOTE: I could have added more MVTs, but that seemed unscalable as you can either have MVTs with 100% element count coverage or 100% bitwidth coverage, but not both. Differential Revision: https://reviews.llvm.org/D157392	2023-08-09 12:50:02 +00:00
Matt Devereau	175850f987	[AArch64][SVE2] Combine trunc+add+lsr to rshrnb The example sequence add z0.h, z0.h, #32 lsr z0.h, #6 st1b z0.h, x1 can be replaced with rshrnb z0.b, #6 st1b z0.h, x1 since the top half of each destination element is truncated. In a similar fashion, add z0.s, z0.s, #32 lsr z1.s, z1.s, #6 add z1.s, z1.s, #32 lsr z0.s, z0.s, #6 uzp1 z0.h, z0.h, z1.h can be replaced with rshrnb z1.h, z1.s, #6 rshrnb z0.h, z0.s, #6 uzp1 z0.h, z0.h, z1.h Differential Revision: https://reviews.llvm.org/D155299	2023-08-09 12:49:42 +00:00
Quentin Colombet	bb206cb131	[NVPTX] Apply global var demotion to private symbols When emitting the assembly we perform some late global variable demotion. Prior to this patch, this optimization was only performed on variables with the internal linkage whereas any local global variable can be demoted. Fix that by using `hasLocalLinkage` instead of `hasInternalLinkage`. Without this change, global variables with the `private` linkage wouldn't be demoted. Differential Revision: https://reviews.llvm.org/D154507	2023-08-09 14:41:01 +02:00
Sander de Smalen	ecb7b9c5c5	[Clang][AArch64] Diagnostics for SME attributes when target doesn't have 'sme' This patch adds error diagnostics to Clang when code uses the AArch64 SME attributes without specifying 'sme' as an available target attribute. * Function definitions marked as '__arm_streaming', '__arm_locally_streaming', '__arm_shared_za' or '__arm_new_za' will by definition use or require SME instructions. * Calls from non-streaming functions to streaming-functions require the compiler to enable/disable streaming-SVE mode around the call-site. In some cases we can accept the SME attributes without having 'sme' enabled: * Function declarations can have the SME attributes. * Definitions can be __arm_streaming_compatible since the generated code should execute on processing elements without SME. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D157269	2023-08-09 12:31:02 +00:00
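
Commit 29fff3e2ab relies on the identity x * 2^k == ldexp(x, k): for f64 the exponent k is a small integer, so it is a cheaper 32-bit constant to materialize than the 64-bit double 2^k. The Python below is a minimal model of that arithmetic, not the AMDGPU instruction-selection code. `power_of_two_exponent` and `select_fmul` are hypothetical names, only positive normal powers of two are recognized, and Python's `math.ldexp` raises `OverflowError` on overflow where hardware would produce infinity.

```python
import math
import struct
from typing import Optional


def power_of_two_exponent(c: float) -> Optional[int]:
    """Return k if c is exactly 2.0**k as a positive normal double, else None."""
    bits = struct.unpack("<Q", struct.pack("<d", c))[0]
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    # A positive normal power of two has an all-zero mantissa and an exponent
    # field that is neither 0 (zero/subnormal) nor 0x7FF (inf/NaN).
    if sign == 0 and mantissa == 0 and 0 < biased_exp < 0x7FF:
        return biased_exp - 1023
    return None


def select_fmul(x: float, c: float) -> float:
    """Hypothetical 'selection': use ldexp when the constant is a power of 2."""
    k = power_of_two_exponent(c)
    if k is None:
        return x * c  # keep the ordinary multiply
    # Same value as x * c (absent overflow): both are the correctly
    # rounded result of x * 2**k.
    return math.ldexp(x, k)


print(power_of_two_exponent(8.0), power_of_two_exponent(0.25), power_of_two_exponent(3.0))
# prints: 3 -2 None
```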
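
Commit 92e28e397d (later reverted by e66affa17e) explains that a 64-bit `BPF_ST` write sign-extends its immediate, so -1 can be encoded but +0xffff_ffff cannot. A minimal Python sketch of that check, assuming the 32-bit immediate field of the BPF instruction encoding; `fits_bpf_st_dw_imm` is a hypothetical helper, not the function used in the BPF backend.

```python
def fits_bpf_st_dw_imm(value: int) -> bool:
    """True if a 64-bit store of `value` can use BPF_ST | BPF_MEM | BPF_DW.

    Assumes the instruction carries a 32-bit immediate that is sign-extended
    to 64 bits, so the 64-bit pattern of `value` must equal the sign
    extension of its own low 32 bits.
    """
    value &= 0xFFFF_FFFF_FFFF_FFFF  # normalize to an unsigned 64-bit pattern
    low = value & 0xFFFF_FFFF
    if low & 0x8000_0000:
        sign_extended = low | 0xFFFF_FFFF_0000_0000  # negative imm32: fill the high half with ones
    else:
        sign_extended = low  # non-negative imm32: high half stays zero
    return sign_extended == value


print(fits_bpf_st_dw_imm(-1), fits_bpf_st_dw_imm(0xFFFF_FFFF))
# prints: True False
```

The same test accepts any value in the signed 32-bit range, such as the `p->d = 4` store in the commit's example.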
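
Commit 6dbd458128 notes that folding fma(a, b, 0) only needs the nsz flag. The reason is signed zeros: when a * b is -0.0, adding +0.0 yields +0.0 under round-to-nearest, so fma(a, b, +0.0) and a * b differ in sign. A small Python demonstration follows; because the product here is exact, `a * b + 0.0` gives the same result a fused multiply-add would. `is_negative_zero` is a hypothetical helper.

```python
import math


def is_negative_zero(x: float) -> bool:
    """True only for -0.0 (which compares equal to +0.0)."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


a, b = -1.0, 0.0
product = a * b                  # exact product: -0.0
fma_plus_zero = product + 0.0    # models fma(a, b, +0.0): -0.0 + +0.0 == +0.0
fma_minus_zero = product + -0.0  # models fma(a, b, -0.0): stays -0.0

# Replacing fma(a, b, +0.0) with a * b would turn +0.0 into -0.0 here, which
# is only acceptable under nsz; an addend of -0.0 never changes the product.
print(is_negative_zero(product), is_negative_zero(fma_plus_zero), is_negative_zero(fma_minus_zero))
# prints: True False True
```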
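
In commit 175850f987, the constant 32 equals 1 << (6 - 1), so `add #32` followed by `lsr #6` is a rounding right shift by 6, and the truncating `st1b` keeps only the low byte of each halfword. The minimal Python model of one 16-bit lane below (helper names are hypothetical) compares that sequence with a rounding shift right narrow computed without wraparound. In this model the two agree for every 16-bit input when 1 <= shift <= 8, because the only bit the wrapping add can lose (bit 16) lies above the eight bits the narrowing keeps. This is an observation about the model, not a statement of the exact conditions the LLVM combine checks.

```python
def add_lsr_trunc(x: int, shift: int) -> int:
    """Model of `add z.h, z.h, #(1 << (shift-1))`, `lsr #shift`, then a byte store."""
    rounded = (x + (1 << (shift - 1))) & 0xFFFF  # 16-bit lane: the add wraps
    return (rounded >> shift) & 0xFF             # st1b keeps the low byte


def rshrnb_lane(x: int, shift: int) -> int:
    """Model of one bottom lane of `rshrnb`: rounding shift, then narrow to 8 bits."""
    return ((x + (1 << (shift - 1))) >> shift) & 0xFF  # no wraparound before shifting


print(add_lsr_trunc(100, 6), rshrnb_lane(100, 6))
# prints: 2 2  (100 / 64 = 1.5625 rounds to 2)
```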

52796 Commits