llvm-project

Author	SHA1	Message	Date
Yusra Syeda	9a38a72f1d	[SystemZ][z/OS] This change adds support for the PPA2 section in zOS (#68926 ) This PR adds support for the PPA2 fields. --------- Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>	2023-11-27 16:30:12 -05:00
Craig Topper	179a2e0443	[RISCV][GISel] Legalize and select G_BRINDIRECT. (#73059 )	2023-11-27 13:09:47 -08:00
Craig Topper	9e86919626	[RISCV][GISel] Fix 2 indirect call bugs. (#73170 ) We can't set MO_PLT on an indirect call. We need to constrain the register class for the operand to the call instruction.	2023-11-27 12:59:01 -08:00
David Li	c2ba2b2190	Fix ISel crash when lowering BUILD_VECTOR (#73186 ) 512-bit vpbroadcastw is available only with AVX512BW. Avoid lowering BUILD_VECTOR into a vbroadcast node when the condition is not met. This fixes a crash (see the added new test).	2023-11-27 11:09:46 -08:00
Bjorn Pettersson	30afb21547	Revert "[MCP] Enhance MCP copy Instruction removal for special case (#70778 )" This reverts commit cae46f6210293ba4d3568eb21b935d438934290d. Reverted due to miscompiles. See https://github.com/llvm/llvm-project/issues/73512	2023-11-27 19:39:40 +01:00
Craig Topper	5f31dbd18d	[RISCV] Add register bank and instruction selection support for FP G_SELECT. (#72726 ) Try to pick the FP register bank based on surrounding use/defs. Code is basically copied from AArch64. Need legalizer changes to make this more useful. Right now we're stuck with only being able to FP select types less than or equal to XLen.	2023-11-27 10:38:25 -08:00
David Green	3c23ed156f	[AArch64] Add a test to show scheduling aliasing between SVE loads and stores. NFC	2023-11-27 16:22:46 +00:00
Simon Pilgrim	286905351f	[X86] vector-interleaved tests - add AVX512F/AVX512DQ/AVX512BW/AVX512DQBW-ONLY common prefixes to merge more SLOW/FAST checks Not used by many vector-interleaved tests, but it's a LOT easier to maintain if we use the same prefixes for all of them.	2023-11-27 15:06:24 +00:00
Igor Kirillov	839abdb0d2	[MachineLICM] Fix incorrect CSE on hoisted const load (#73007 ) When hoisting an invariant load, we should not combine it with an existing load through common subexpression elimination (CSE). This is because there might be memory-changing instructions between the existing load and the end of the block entering the loop. Fixes https://github.com/llvm/llvm-project/issues/72855	2023-11-27 14:37:18 +00:00
Simon Pilgrim	edf645616f	[X86] Regenerate vector-interleaved-store-i64-stride-4.ll	2023-11-27 13:48:36 +00:00
Shengchen Kan	cb112eb16c	[X86][CodeGen] Teach frame lowering to spill/reload registers w/ PUSHP/POPP, PUSH2[P]/POP2[P] (#73292 ) #73092 supported the encoding/decoding for PUSHP/POPP #73233 supported the encoding/decoding for PUSH2[P]/POP2[P] In this patch, we teach frame lowering to spill/reload registers w/ these instructions. 1. Use PPX for balanced spill/reload 2. Use PUSH2/POP2 for continuous spills/reloads 3. PUSH2/POP2 must be 16B-aligned on the stack, so pad when necessary	2023-11-27 21:37:07 +08:00
Momchil Velikov	ac06d4e4cb	Re-commit "[MachineSink][AArch64] Enable sink-and-fold by default (#72132 )" This re-commits 13fe0386454d after fixing a couple of issues in the LLDB testsuite in ef9bcace834e and 6b87d84ff45d	2023-11-27 11:28:22 +00:00
Simon Pilgrim	11276563c8	[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126 ) If we are loading the same ptr at different vector widths, then reuse the largest load and just extract the low subvector. Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value. This is mainly useful for better constant sharing.	2023-11-27 10:26:26 +00:00
David Green	295edaab13	[AArch64][GlobalISel] Better vecreduce.fadd lowering. (PR #73294 ) This changes the fadd legalization to handle fp16 types, and treats more types as legal so that the backend can produce the correct patterns. There is currently a missing identity fold for `fadd x -0.0 -> x`.	2023-11-27 08:20:54 +00:00
Shengchen Kan	27c0bc9cae	[X86][MC] Allow to specify any of the 8/16/32/64 register names interchangeably for R16-R31 (#73421 )	2023-11-27 15:25:19 +08:00
Zi Xuan Wu (Zeson)	e89324219a	[RISCV] Don't combine store of vmv.x.s/vfmv.f.s to vp_store with VL of 1 when it's indexed store (#73219 ) Because we can't support vp_store with indexed address mode by lowering to vse intrinsic later.	2023-11-27 13:39:35 +08:00
Douglas Yung	1aa1d176ba	Add "REQUIRES: asserts" to test as it requires the compiler to hit an assertion failure to pass and was failing in release builds.	2023-11-25 21:45:58 -08:00
Chen Zheng	abc405858d	[XCOFF] make related SD symbols as isFunction (#69553 ) This will help tools like llvm-symbolizer recognize more functions.	2023-11-26 11:59:09 +08:00
Craig Topper	75a9ed4246	[RISCV][GISel] Add simplest case of folding add with immediate into load/store address. This covers the simm12 offset case.	2023-11-25 10:48:35 -08:00
Craig Topper	564ff80e22	[RISCV][GISel] Test G_FRAME_INDEX folding into store address. NFC	2023-11-25 10:48:31 -08:00
David Green	9cee94b81b	[GlobalISel] Add identity fold for fadd -0.0 (#73296 ) -0.0 acts as the identity element for fadd. This doesn't try to add 0.0 too, which would require nsz fast math flags.	2023-11-25 08:35:26 +00:00
Craig Topper	26cf3aab83	[RISCV][GISel] Add more G_SEXTLOAD instruction selection tests. NFC	2023-11-24 23:58:11 -08:00
Craig Topper	f995afe7f2	[RISCV][GISel] Add G_FRAME_INDEX support to selectAddrRegImm. We can fold the G_FRAME_INDEX into a load/store address.	2023-11-24 23:57:54 -08:00
Florian Hahn	20f634f275	[Thumb] Add test case where the machine-outliner clobbers LR. Add a test case where `bl OUTLINED_FUNCTION_0` clobbers LR, which in turn is used by the later call to memcpy to return to the caller.	2023-11-24 20:27:43 +00:00
Stefan Pintilie	d896b1f5a6	[PowerPC] Do not string pool globals that are part of llvm used. (#66848 ) The string pooling pass was incorrectly pooling global variables that were part of llvm.used or llvm.compiler.used. This patch fixes the pass to prevent that by checking each candidate to make sure that it is not in either of those lists.	2023-11-24 12:21:28 -05:00
Antonio Frighetto	0ff5281c94	[GlobalISel] Treat shift amounts as unsigned in `matchShiftImmedChain` A miscompilation issue in the GISel pre-legalization phase has been addressed with improved routines. Fixes: https://github.com/llvm/llvm-project/issues/71440.	2023-11-24 18:14:52 +01:00
Craig Topper	5d501b1091	[GISel][RISCV] Fix several boundary cases in narrow G_SEXT_INREG. (#72719 ) This fixes cases when SizeInBits is a multiple of the narrow size. If SizeBits is equal to NarrowTy size, the first block would create an illegal G_SEXT_INREG where the extension size is equal to the type. I tried to turn it into G_TRUNC+G_SEXT, but that just turned back into G_SEXT_INREG causing an infinite loop. So punt to the splitting case. In the for loop we should copy when the part ends on SizeInBits. In that case there is no G_SEXT_INREG needed for partial. But we should note that register in PartialExtensionReg for the first full part to use. If the part starts on SizeInBits then we should do an AShr of PartialExtensionReg. We should only get to the G_SEXT_INREG case if the SizeInBits is in the middle of the part.	2023-11-24 08:39:38 -08:00
Florian Hahn	820b3583c9	[AArch64] Add artificial clobbers to swift async context test. Manually add clobbers for various register combinations to tests. This highlights incorrectly performing shrink-wrapping, with StoreSwiftAsyncContext expansion clobbering a live register.	2023-11-24 14:14:49 +00:00
pasmpe01	de6c9c84e2	[TLI][AArch64] Add TLI Mappings of @llvm.exp10 for ArmPL and SLEEF. Update regex to _explicitly_ show which exp versions are added. The previous regex used `exp[^e]` to avoid matching calls like: `@llvm.experimental.stepvector`. Note: ArmPL Mappings for scalable types are not yet utilized (eg, `llvm.exp10.nxv2f64`, `llvm.exp10.nxv4f32`), as `replace-with-veclib` pass needs improvements.	2023-11-24 12:24:33 +00:00
Jay Foad	28233b11ac	[AMDGPU] New AMDGPUInsertSingleUseVDST pass (#72388 ) Add support for emitting GFX11.5 s_singleuse_vdst instructions. This is a power saving feature whereby the compiler can annotate VALU instructions whose results are known to have only a single use, so the hardware can in some cases avoid writing the result back to VGPR RAM. To begin with the pass is disabled by default because of one missing feature: we need an exclusion list of opcodes that never qualify as single-use producers and/or consumers. A future patch will implement this and enable the pass by default. --------- Co-authored-by: Scott Egerton <scott.egerton@amd.com>	2023-11-24 10:23:06 +00:00
David Green	b3dd14ce07	[AArch64] Add extra vecreduce.fmul tests. NFC	2023-11-24 10:00:00 +00:00
Phoebe Wang	ea81e31aa1	[X86][AVX10] Allow AVX10 use VBMI2 instructions (#73276 )	2023-11-24 12:54:30 +08:00
Craig Topper	0a9c6bea6b	[RISCV][GISel] Support G_CTTZ/CTLZ with Zbb.	2023-11-23 14:15:11 -08:00
Craig Topper	5bb03d25f7	[RISCV][GISel] Support G_CTPOP with Zbb.	2023-11-23 13:06:23 -08:00
Björn Pettersson	3114bd32e7	[StackColoring] Do not drop AA metadata when not doing remappings (#71958 ) In the StackColoring pass we first scan for possible stack slot merges. A SlotRemap map is setup with the remappings that should be performed. Then the main work is done by calling remapInstructions and providing that map. Most of the work in remapInstructions would just be a waste of time in situations when the SlotRemap map is empty, but it turns out that the part that adjusts Alias Analysis information could end up dropping AA metadata even when there are no stack slot merges being done. This happens since all instruction's machine memory operands are considered, and if we can't determine the underlying object that is accessed (using getUnderlyingObjectsForCodeGen) then we conservatively drop AA metadata. This patch simply avoids calling remapInstructions if we don't intend to do any remappings (i.e. if SlotRemap is empty). That avoids touching AA metadata when all we do is to remove lifetime markers. That seems like a safe thing to do, as it is the same thing as happens when we bail out early due to other reasons (e.g. when only having one lifetime marker). For targets that do not care about Alias Analysis information after the StackColoring pass this shouldn't have any impact, except that it might improve compile time slightly as we now skip spending time in remapInstructions when not doing any stack merges.	2023-11-23 18:10:40 +01:00
Simon Pilgrim	381efa4960	Revert rG67275263b3b781a "[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126 )" Missed an issue that we were calling continue from within the for loop - fixed version incoming shortly.	2023-11-23 16:50:58 +00:00
Jay Foad	cf1e0c0b07	[AMDGPU] Define new targets gfx1200 and gfx1201 (#73133 ) Define target names and ELF numbers for new GFX12 targets gfx1200 and gfx1201. For now they behave identically to GFX11.	2023-11-23 16:44:05 +00:00
Simon Pilgrim	67275263b3	[X86] X86DAGToDAGISel - attempt to merge XMM/YMM loads with YMM/ZMM loads of the same ptr (#73126 ) If we are loading the same ptr at different vector widths, then reuse the larger load and just extract the low subvector. Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds which can occur in DAG, we have to wait until DAGISel otherwise we can hit infinite loops if constant folding recreates the original constant value. This is mainly useful for better constant sharing.	2023-11-23 14:10:23 +00:00
Acim Maravic	376b22a371	[LLVM] Make s_getpc_b64 rematerializable (#71823 )	2023-11-23 13:07:12 +01:00
hev	0d9f557b6c	[LoongArch] Disable mulodi4 and muloti4 libcalls (#73199 ) This library function only exists in compiler-rt not libgcc. So this would fail to link unless we were linking with compiler-rt. Fixes https://github.com/ClangBuiltLinux/linux/issues/1958	2023-11-23 19:34:50 +08:00
Thorsten Schütt	b71b32ba87	[Gisel][AArch64] legalize G_IS_FPCLASS (#72796 )	2023-11-23 10:31:05 +01:00
Craig Topper	1343d96ec1	[RISCV][GISel] Support G_BSWAP with Zbb.	2023-11-23 00:39:42 -08:00
Zhaoxuan Jiang	147c5d6686	[AArch64] Allow LDR merge with same destination register by renaming (#71908 ) The patch is based on a reverted patch: https://reviews.llvm.org/D103597. It was trying to rename registers before alias check, which is not safe and causes miscompiles. This patch does 2 things: 1. Do the renaming with necessary checks passed, including alias check. 2. Rename the register for the instructions between the pairs and combine the second load into the first. By doing so we can just check the renamability between the pairs and avoid scanning an unknown number of instructions before/after the pairs. Necessary refactoring has been made in order to reuse as much code as possible with STR renaming.	2023-11-23 08:21:27 +00:00
Pierre van Houtryve	d76d8e541d	[AMDGPU][NFC] Update GISel memory-legalizer-atomic-fence test (#72829 ) Test needs to be moved to MIR checks and use stop-after=si-memory-legalizer to avoid being optimized out in a future patch.	2023-11-23 09:09:05 +01:00
hev	7414c0db96	[LoongArch] Precommit a test for smul with overflow (NFC) (#73212 )	2023-11-23 15:15:26 +08:00
Wang Pengcheng	5973272af7	[RISCV] Add MinimumJumpTableEntries to TuneInfo (#72963 ) This is like what AArch64 has done in #71166 except that we don't handle `HasMinSize` case now.	2023-11-23 14:05:23 +08:00
Min-Yih Hsu	7c3c8a1277	[RISCV][GISel] Add support for G_IS_FPCLASS in F and D extensions (#72000 ) Add legalizer, regbankselect, and isel support for the floating-point version of G_IS_FPCLASS.	2023-11-22 16:43:20 -08:00
Craig Topper	a845061935	[AArch64] Use the same fast math preservation for MachineCombiner reassociation as X86/PowerPC/RISCV. (#72820 ) Don't blindly copy the original flags from the pre-reassociated instructions. This copied the integer poison flags which are not safe to preserve after reassociation. For the FP flags, I think we should only keep the intersection of the flags. Override setSpecialOperandAttr to do this. Fixes #72777.	2023-11-22 14:17:45 -08:00
LWenH	32903b0b6d	[MCP] fix PowerPC redundant copy instructions removal fail test cases, NFC	2023-11-23 01:54:53 +08:00
Florian Hahn	a842430c20	[AArch64] Add check that prologue insertion doesn't clobber live regs. (#71826 ) This patch extends AArch64FrameLowering::emitPrologue to check if the inserted prologue clobbers live registers. It updates `llvm/test/CodeGen/AArch64/framelayout-scavengingslot.mir` with an extra load to make x9 live before the store, preserving the original test. It uses the original `llvm/test/CodeGen/AArch64/framelayout-scavengingslot.mir` as `llvm/test/CodeGen/AArch64/emit-prologue-clobber-verification.mir`, because there x9 is marked as live on entry, but used as scratch reg as it is not callee saved. The new assertion catches a mis-compile in `store-swift-async-context-clobber-live-reg.ll` on https://github.com/apple/llvm-project/tree/next	2023-11-22 16:49:33 +00:00
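
Note on 9cee94b81b ([GlobalISel] Add identity fold for fadd -0.0): the message says -0.0 is the identity element for fadd, while folding +0.0 would need the nsz flag. The sketch below is only an illustration of that IEEE-754 property in Python, not code from the patch. It assumes the default round-to-nearest mode and leaves NaN inputs out; `same_float` and `samples` are hypothetical names introduced here.

```python
import math

def same_float(a: float, b: float) -> bool:
    """Compare value and sign bit, so +0.0 and -0.0 count as different."""
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

# Both signed zeros plus a few ordinary and infinite values (NaN excluded).
samples = [0.0, -0.0, 1.5, -2.25, math.inf, -math.inf]

# x + (-0.0) gives back x for every sample, including both zeros.
neg_zero_is_identity = all(same_float(x + -0.0, x) for x in samples)

# x + (+0.0) is not an identity: (-0.0) + (+0.0) yields +0.0, losing the sign.
pos_zero_is_identity = all(same_float(x + 0.0, x) for x in samples)

print(neg_zero_is_identity, pos_zero_is_identity)
```

The only sample that breaks the +0.0 case is x = -0.0, which is why a fold of `fadd x, 0.0` is valid only when the sign of zero may be ignored.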

52796 Commits