llvm-project

Author	SHA1	Message	Date
Sean Fertile	cef56b9318	Revert "[XCOFF][AIX] Peephole optimization for toc-data." This reverts commit 5e28d30f1fb10faf2db2f8bf0502e7fd72e6ac2e.	2023-08-15 10:40:35 -04:00
Sean Fertile	ce658829c9	Revert "[PPC][AIX] Fix toc-data peephole bug and some related cleanup." This reverts commit b37c7ed0c95c7f24758b1532f04275b4bb65d3c1.	2023-08-15 10:40:35 -04:00
Jay Foad	fdbc944385	Fix typos in comments	2023-08-15 13:57:21 +01:00
Jingu Kang	9f8dcb0706	[AArch64] Try to detect patterns with fdiv and fmul for [su]cvtf. If fmul's constant operand is the reciprocal of a power of 2 (i.e. 1/2^n) or fdiv's constant operand is a power of 2, we can try to match patterns with [su]int_to_fp for [su]cvtf. Differential Revision: https://reviews.llvm.org/D156538	2023-08-15 10:57:07 +01:00
Jay Foad	f0e5f73fdc	[MachineScheduler] Account for lane masks in basic block liveins Differential Revision: https://reviews.llvm.org/D157633	2023-08-15 09:52:43 +01:00
wangpc	ac00cca3d9	[RISCV] Fix assertion when passing f64 vectors via integer registers The vector arguments are split but assignments won't be pending. Fixes #64645 Reviewed By: asb Differential Revision: https://reviews.llvm.org/D157847	2023-08-15 12:11:08 +08:00
wangpc	61ab106f82	[RISCV] Add tune features of preferred function/loop align D144048 has added preferred function and loop alignment to RISCVSubtarget, but now we need to set them manually for different processors. Tune features that set preferred function/loop align to [2, 64] bytes (align 1 is not here since the min align is 2) are added. These features can be used in processor definitions. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D157832	2023-08-15 12:04:12 +08:00
Eduard Zingerman	08d92dedd2	[BPF] Fix in/out argument constraints for CORE_MEM instructions When LLVM is built with `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option the following C code snippet: struct t { int a; } __attribute__((preserve_access_index)); void test(struct t *t) { t->a = 42; } Causes an assertion: $ clang -g -O2 -c --target=bpf -mcpu=v2 t.c -o /dev/null Function Live Ins: $r1 in %0 bb.0.entry: liveins: $r1 DBG_VALUE $r1, $noreg, !"t", ... %0:gpr = COPY $r1 DBG_VALUE %0:gpr, $noreg, !"t", ... %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr %4:gpr = MOV_ri 42 CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... RET debug-location !25; t.c:7:1 Bad machine code: Explicit definition marked as use * - function: test - basic block: %bb.0 entry (0x6210000d8a90) - instruction: CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... - operand 0: killed %4:gpr This happens because `CORE_MEM` instruction is defined to have output operands: def CORE_MEM : TYPE_LD_ST<BPF_MEM.Value, BPF_W.Value, (outs GPR:$dst), (ins u64imm:$opcode, GPR:$src, u64imm:$offset), "$dst = core_mem($opcode, $src, $offset)", []>; As documented in [1]: > By convention, the LLVM code generator orders instruction operands > so that all register definitions come before the register uses, even > on architectures that are normally printed in other orders. In other words, the first argument for `CORE_MEM` is considered to be a "def", while in reality it is a "use": %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr %4:gpr = MOV_ri 42 '---------------. v CORE_MEM killed %4:gpr, 411, %0:gpr, @"llvm.t:0:0$0:0", ... Here is how `CORE_MEM` is constructed in `BPFMISimplifyPatchable::checkADDrr()`: BuildMI(DefInst->getParent(), DefInst, DefInst->getDebugLoc(), TII->get(COREOp)) .add(DefInst->getOperand(0)).addImm(Opcode).add(*BaseOp) .addGlobalAddress(GVal); Note that the first operand is constructed as `.add(DefInst->getOperand(0))`. 
For `LD{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a destination register of a load, so the instruction is constructed in accordance with the `outs` declaration. For `ST{D,W,H,B}` instructions the `DefInst->getOperand(0)` is a source register of a store (value to be stored), so the instruction violates the `outs` declaration. This commit fixes the issue by splitting `CORE_MEM` into three instructions: `CORE_ST`, `CORE_LD64`, `CORE_LD32` with correct `outs` specifications. [1] https://llvm.org/docs/CodeGenerator.html#the-machineinstr-class Differential Revision: https://reviews.llvm.org/D157806	2023-08-15 02:34:21 +03:00
Eduard Zingerman	27026fe563	[BPF] Reset machine register kill mark in BPFMISimplifyPatchable When LLVM is built with `LLVM_ENABLE_EXPENSIVE_CHECKS=ON` option the following C code snippet: struct t { unsigned long a; } __attribute__((preserve_access_index)); void foo(volatile struct t *t, volatile unsigned long *p) { *p = t->a; *p = t->a; } Causes an assertion: $ clang -g -O2 -c --target=bpf -mcpu=v2 t2.c -o /dev/null # After BPF PreEmit SimplifyPatchable # Machine code for function foo: IsSSA, TracksLiveness Function Live Ins: $r1 in %0, $r2 in %1 bb.0.entry: liveins: $r1, $r2 DBG_VALUE $r1, $noreg, !"t", !DIExpression() DBG_VALUE $r2, $noreg, !"p", !DIExpression() %1:gpr = COPY $r2 DBG_VALUE %1:gpr, $noreg, !"p", !DIExpression() %0:gpr = COPY $r1 DBG_VALUE %0:gpr, $noreg, !"t", !DIExpression() %2:gpr = LD_imm64 @"llvm.t:0:0$0:0" %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr %5:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0" STD killed %5:gpr, %1:gpr, 0 %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr %8:gpr = CORE_LD 344, %0:gpr, @"llvm.t:0:0$0:0" STD killed %8:gpr, %1:gpr, 0 RET # End machine code for function foo. * Bad machine code: Using a killed virtual register * - function: foo - basic block: %bb.0 entry (0x6210000e6690) - instruction: %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr - operand 2: killed %2:gpr This happens because of the way BPFMISimplifyPatchable::processDstReg() updates the second operand of the `ADD_rr` instruction. Code before `BPFMISimplifyPatchable`: .-> %2:gpr = LD_imm64 @"llvm.t:0:0$0:0" | |`----------------. | %3:gpr = LDD %2:gpr, 0 | %4:gpr = ADD_rr %0:gpr(tied-def 0), killed %3:gpr <--- (1) | %5:gpr = LDD killed %4:gpr, 0 ^^^^^^^^^^^^^ | STD killed %5:gpr, %1:gpr, 0 this is updated `----------------. 
%6:gpr = LDD %2:gpr, 0 %7:gpr = ADD_rr %0:gpr(tied-def 0), killed %6:gpr <--- (2) %8:gpr = LDD killed %7:gpr, 0 ^^^^^^^^^^^^^ STD killed %8:gpr, %1:gpr, 0 this is updated Instructions (1) and (2) would be updated to: ADD_rr %0:gpr(tied-def 0), killed %2:gpr The `killed` mark is inherited from machine operands `killed %3:gpr` and `killed %6:gpr` which are updated in place by `processDstReg()`. This commit updates `processDstReg()` to reset kill marks for updated machine operands to keep liveness information conservatively correct. Differential Revision: https://reviews.llvm.org/D157805	2023-08-15 02:23:38 +03:00
Anmol P. Paralkar	53e89f5e3f	[RISCV] Add bounds check before use on returned iterator. Check iterator validity before use; fixes a crash seen in the RISC-V Zcmp Push/Pop optimization pass when compiling an internal benchmark. Reviewed By: asb, wangpc Differential Revision: https://reviews.llvm.org/D157674	2023-08-14 16:06:09 -07:00
Matt Arsenault	1faa4797ca	AMDGPU: Handle unsafe exp.f32 with denormal handling I somehow missed this path when adding the new expansions. Saves a lot of instructions for afn + IEEE. https://reviews.llvm.org/D157867	2023-08-14 18:36:01 -04:00
Matt Arsenault	d45022b094	AMDGPU: Remove special case constant folding of divide We should probably just swap this out for the fdiv, but that's what the implementation is anyway.	2023-08-14 18:36:01 -04:00
Matt Arsenault	0eabe65bfb	AMDGPU: Replace ldexp libcalls with intrinsic	2023-08-14 18:36:01 -04:00
Matt Arsenault	f337a77c99	AMDGPU: Replace rounding libcalls with intrinsics	2023-08-14 18:36:01 -04:00
Matt Arsenault	c7876c55ac	AMDGPU: Replace fabs and copysign libcalls with intrinsics Preserves flags and metadata like the other cases.	2023-08-14 18:28:21 -04:00
Matt Arsenault	a70006c4c5	AMDGPU: Replace some libcalls with intrinsics OpenCL loses fast math information by going through libcall wrappers around intrinsics. Do this to preserve call site flags which are lost when inlining. It's not safe in general to propagate flags during inline, so avoid dealing with this by just special casing some of the useful calls.	2023-08-14 18:20:47 -04:00
Craig Topper	9cf375b310	[RISCV][GISel] Narrow G_SEXT_INREG to XLenLLT before lowering. If we lower, we need to legalize the wide shifts which is costly. This will improve the tests from https://reviews.llvm.org/D157415 too Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157677	2023-08-14 14:49:00 -07:00
Craig Topper	6299650f97	[DAGCombiner] Fold trunc(undef) -> undef. We already do this in getNode, but the undef might appear during another DAGCombine. While here remove code for handling noop truncates. getNode checks the types and won't create a noop truncate. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157910	2023-08-14 13:02:24 -07:00
Philip Reames	a63bd7e99b	[RISCV] Use NoReg in place of IMPLICIT_DEF for undefined passthru operands In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG does CSE the same case, but that only covers the same block case, not the cross block case. This led to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282. This change is a slightly ugly hack to sidestep the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers. We may end up backporting this into the 17.x release branch. Given how late in the release cycle this is landing, that's much less likely now, but still a possibility. Differential Revision: https://reviews.llvm.org/D156909	2023-08-14 12:57:38 -07:00
Joe Nash	dc242f9f1e	[AMDGPU][NFC] Convert fpto{u|s}i f16 tests to auto-gen Makes it easier to add GFX11 runline in future patch, which has significantly different output.	2023-08-14 15:28:20 -04:00
Matt Arsenault	a8376bbe53	AMDGPU: Add baseline tests for libcall to intrinsic handling Test all the different itanium mangled opencl functions that are interesting to replace with raw intrinsic calls. https://reviews.llvm.org/D157873	2023-08-14 15:15:30 -04:00
Craig Topper	e4b2f2d4a6	[RISCV][GISel] Legalize G_PHI and G_BRCOND. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D157818	2023-08-14 10:21:58 -07:00
Craig Topper	1fa858d987	[RISCV][GISel] Make G_CONSTANT of pointers legal. This is needed to support things like null pointers. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D157822	2023-08-14 10:11:29 -07:00
David Green	3f8210921e	[AArch64] Add FeatureFuseAdrpAdd for NeoverseV2 As in all the other cpus from D134521, this adds FeatureFuseAdrpAdd to NeoverseV2 to allow more linker relaxations.	2023-08-14 17:21:25 +01:00
Matt Arsenault	f44beecb78	AMDGPU: Try to use private version of sincos if available The comment was out of date, the device libs build does provide all the pointer overloads. An extremely pedantic interpretation of the spec would suggest only the flat version exists, but the overloads do exist in the implementation. https://reviews.llvm.org/D156720	2023-08-14 11:40:04 -04:00
Luke Lau	9f369a4c43	[RISCV] Lower reverse shuffles of fixed i1 vectors to vbrev.v If we can fit an entire vector of i1 into a single element, e.g. v32i1 -> v1i32, then we can reverse it via vbrev.v. We need to handle the case where the vector doesn't exactly fit into the larger element type, e.g. v4i1 -> v1i8. In this case we shift up the reversed bits afterwards. Reviewed By: fakepaper56, 4vtomat Differential Revision: https://reviews.llvm.org/D157614	2023-08-14 16:36:58 +01:00
Matt Arsenault	58fd1de09f	AMDGPU: Consider nobuiltin when querying defined libfuncs https://reviews.llvm.org/D156708	2023-08-14 11:30:12 -04:00
Matt Arsenault	42c6e4209c	AMDGPU: Handle multiple uses when matching sincos Match how the generic implementation handles this. We now will leave behind the dead other user for later passes to deal with. https://reviews.llvm.org/D156707	2023-08-14 11:28:41 -04:00
Dinar Temirbulatov	f598b616e0	[AArch64][SME] Non-streaming compatible SCVTF emitted with --force-streaming-compatible-sve For scalar integer-to-float conversions under Streaming Compatible SVE, use the non-NEON version of the convert instruction. Differential Revision: https://reviews.llvm.org/D157698	2023-08-14 13:49:57 +00:00
David Green	a3f2751f78	[AArch64][GISel] Add handling for G_VECREDUCE_FMAXIMUM and G_VECREDUCE_FMINIMUM This is a lot of copy-pasting for the existing handling of G_VECREDUCE_FMAX/G_VECREDUCE_FMIN to add handling for G_VECREDUCE_FMAXIMUM/G_VECREDUCE_FMINIMUM in the same way. Differential Revision: https://reviews.llvm.org/D156615	2023-08-14 10:03:25 +01:00
Luke Lau	6238b8ea63	[LegalizeTypes] Factor in vscale_range when widening insert_subvector Currently when widening operands for insert_subvector nodes, we check first that the indices are valid by seeing if the subvector is statically known to be smaller than or equal to the in-place vector. However if we're inserting a fixed subvector into a scalable vector we rely on the minimum vector length of the latter. This patch extends the widening logic to also take into account the minimum vscale from the vscale_range attribute, so we can handle more scenarios where we know the scalable vector is large enough to contain the subvector. Fixes https://github.com/llvm/llvm-project/issues/63437 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153519	2023-08-14 09:58:15 +01:00
David Green	d199478af4	[AArch64][GISel] Handling for G_VECREDUCE_FMIN and G_VECREDUCE_FMAX This adds legalization for G_VECREDUCE_FMIN and G_VECREDUCE_FMAX, where the selection can go via tablegen patterns. I haven't tried to get non-power2 types working yet, just the more legal types. Differential Revision: https://reviews.llvm.org/D156614	2023-08-14 09:19:47 +01:00
Nikita Popov	9deee6bffa	[SDAG] Don't transfer !range metadata without !noundef to SDAG (PR64589) D141386 changed the semantics of !range metadata to return poison on violation. If !range is combined with !noundef, violation is immediate UB instead, matching the old semantics. In theory, these IR semantics should also carry over into SDAG. In practice, DAGCombine has at least one key transform that is invalid in the presence of poison, namely the conversion of logical and/or to bitwise and/or (`c7b537bf09/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L11252)`). Ideally, we would fix this transform, but this will require substantial work to avoid codegen regressions. In the meantime, avoid transferring !range metadata without !noundef, effectively restoring the old !range metadata semantics on the SDAG layer. Fixes https://github.com/llvm/llvm-project/issues/64589. Differential Revision: https://reviews.llvm.org/D157685	2023-08-14 09:04:27 +02:00
LWenH	555e0305fd	[RISCV] Match ext + ext + srem + trunc to vrem.vv This patch matches the SDNode pattern "trunc (srem(sext, ext))" to vrem.vv. This can remove the extra "vsext", "vnsrl" and "vsetvli" instructions in cases like "c[i] = a[i] % b[i]", where the array element types are all int8_t or all int16_t. For element types like uint8_t or uint16_t, the redundant "zext + zext + urem + trunc" IR is already removed by the InstCombine pass, because the urem operation cannot overflow in LLVM. However, for signed types, the InstCombine pass cannot remove such patterns due to the potential for undefined behavior in LLVM IR. For example, -128 % -1 is undefined behavior (overflow) for the i8 type in LLVM IR, but this situation doesn't occur for i32. To address this, LLVM first sign-extends the srem operands to i32 to prevent the UB. For RVV, such overflowing operations are already defined by the specification and yield deterministic output for extreme inputs. For example, per the spec, -128 % -1 for the i8 type yields 0 in the overflow case. Therefore, such patterns can be matched in the instruction selection phase of the RVV backend rather than removed in target-independent optimization passes like InstCombine. This patch only handles the sign_ext case for srem. For more information about the C test cases compared with GCC, please see: https://gcc.godbolt.org/z/MWzE7WaT4 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156685	2023-08-13 22:14:43 -07:00
LWenH	6cb55a3d9a	[RISCV] Add Precommit test for D156685 Add baseline test for [[ https://reviews.llvm.org/D156685 | D156685 ]]. In LLVM, a signed 8-bit remainder operation first sign-extends the operands to 32 bits, and the CorrelatedValuePropagation pass then narrows the operands to a smaller type such as 16 bits to optimize the final data storage size. This sign extension for srem exists to prevent undefined behavior. For example, -128 % -1 is undefined behavior for the i8 type in LLVM IR, but not for i32, so such a pattern cannot be eliminated in the platform-independent InstCombine pass. The LLVM IR of these sext/trunc operations is translated one by one during RVV backend code generation, and redundant vsetvli instructions are inserted. In fact, according to the RVV instruction manual, the vrem.vv instruction already specifies the final output value of this type of overflowing operation. For example, the overflowing operation -128 % -1 yields 0 according to the RISC-V spec, so with this patch I think we can optimize this redundant RVV code through SDNode pattern matching at the instruction selection phase. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157592	2023-08-13 22:14:38 -07:00
wangpc	8a98f24ec5	[RISCV] Truncate constants to EltSize when combine store of BUILD_VECTOR The constants can be with larger bit width, so we need to truncate them to EltSize or we will exceed the width of fixed-length vector. Fixes #64588 Reviewed By: luke, craig.topper, bjope, michaelmaitland Differential Revision: https://reviews.llvm.org/D157603	2023-08-14 10:55:53 +08:00
Shengchen Kan	fda9a9c61e	[X86][Codegen] Remove dead code for ADCX/ADOX There is no pattern for ADCX/ADOX and they are never selected during ISEL. So we remove the cases in some MIR optimizations in this patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D157717	2023-08-14 10:23:42 +08:00
Karl-Johan Johnsson	917574d5d8	[MachineLICM][WinEH] Don't hoist register reloads out of funclets This fixes https://github.com/llvm/llvm-project/issues/60766 With MSVC style exception-handling (funclets), no registers are alive when entering the funclet so they must be reloaded from the stack. MachineLICM can sometimes hoist such reloads out of the funclet, which is not correct: the register will have been clobbered when entering the funclet. This can happen in any loop that contains a try-catch. This has been tested on x86_64-pc-windows-msvc. I'm not sure if funclets work the same on the other Windows archs. Reviewed By: rnk, arsenm Differential Revision: https://reviews.llvm.org/D153337	2023-08-13 23:58:16 +03:00
Simon Pilgrim	512a6c50e8	[X86] combineToExtendBoolVectorInReg - don't use changeVectorElementType to create the bool vector type Converting a (simple) vXf32 type to a vXi1 type isn't guaranteed to be simple, causing the MVT type to be invalid. Fixes #64627	2023-08-13 11:18:59 +01:00
Christudasan Devadasan	bd7c6e3c48	[AMDGPU] Precommit lit test for wwm-reg AV spill pseudos D155646.	2023-08-12 16:18:18 +05:30
Philip Reames	84a2b55b0d	[RISCV] Add test coverage for matching strided loads with negative offsets	2023-08-11 15:27:01 -07:00
Konrad Kusiak	4fa8a5487e	[AMDGPU] Add sanity check that fixes bad shift operation in AMD backend There is a problem with the SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to UB. This boolean function decides if two masks can be combined into one. The idea here is that the bits which are "on" in one mask don't overlap with the "on" bits of the other. Consider an example (10 bits for simplicity): Mask 1: 0101101000 Mask 2: 0000000110 Those can be combined into a single mask: 0101101110. To check if such an operation is possible, the code takes the greater mask and counts how many 0s there are, starting from the LSB and stopping at the first 1. Then, it shifts 1u by this number and compares it with the smaller mask. The problem is that when both masks are 0, the counter will find 32 zeroes in the first mask and will try to do a shift by 32 positions, which leads to UB. The fix is a simple sanity check of whether the bigger mask is 0. https://reviews.llvm.org/D155051	2023-08-11 15:26:35 -04:00
Mirko Brkusanin	1e5359c6ba	[AMDGPU] Treat KIMM32 and KIMM16 operand types as noninlinable While they represent 32/16 bit immediate values, they are already included in the encoding of the instructions that use them and are not true literals. FMAMK and FMAAK instructions that use them are marked with fixed size so getInstSizeInBytes will not increase the size for these operands. We also add tests whose logic relies on KIMM16 and KIMM32 being considered not inlinable. Differential Revision: https://reviews.llvm.org/D157624	2023-08-11 18:46:39 +02:00
Jeffrey Byrnes	f76ffc1f40	[MCP] Invalidate copy for super register in copy source We must also track the super sources of a copy, otherwise we introduce a sort of subtle bug. Consider: 1. DEF r0:r1 2. USE r1 3. r6:r9 = COPY r10:r13 4. r14:15 = COPY r0:r1 5. USE r6 6. r1:4 = COPY r6:9 BackwardCopyPropagateBlock processes the instructions from bottom up. After processing 6., we will have a propagatable copy for r1-r4 and r6-r9. After 5., we invalidate and erase the propagatable copy for r1-r4 and r6 but not for r7-r9. The issue is that when processing 3., data structures still say we have valid copies for dest regs r7-r9 (from 6.). The corresponding defs for these registers in 6. are r1:r4, which we mark as registers to invalidate. When invalidating, we find the copy that corresponds to r1 is 4. (this was added when processing 4.), and we say that r1 now maps to unpropagatable copies. Thus, when we process 2., we do not have a valid copy, but when we process 1. we do -- because the mapped copy for subregister r0 was never invalidated. The net result is to propagate the copy from 4. to 1., and replace DEF r0:r1 with DEF r14:r15. Then, we have a use before def in 2. The main issue is that we have an inconsistent state between which def regs and which src regs are valid. When processing 5., we mark all the defs in 6. as invalid, but only the subreg use as invalid. Either we must only invalidate the individual subreg for both uses and defs, or the super register for both. Differential Revision: https://reviews.llvm.org//D157564 Change-Id: I99d5e0b1a0d735e8ea3bd7d137b6464690aa9486	2023-08-11 09:01:18 -07:00
Jeffrey Byrnes	d0e54e377b	[AMDGPU] Extend CalculateByteProvider to capture vectors and signed Differential Revision: https://reviews.llvm.org/D157133 Change-Id: I9ba8727b4ac5a627de2f7d87d2169eb79e01f0ee	2023-08-11 08:47:17 -07:00
Joe Nash	2fb4bfa5ba	[AMDGPU][True16] Fix ISel for A16 Image Instructions The 16-bit VAddr arguments to A16 image instructions are packed into legal VGPR_32 operands in AMDGPULegalizerInfo::legalizeImageIntrinsic on all subtargets. With True16, we also need to pack if the number of VAddr is one because VGPR_16 is not a legal argument to those Image instructions. No change to emitted code intended on subtargets pre-GFX11, and none on GFX11 until True16 is active. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D157426	2023-08-11 11:12:16 -04:00
Matt Devereau	c52d9509d4	[AArch64][SVE] Add asm predicate constraint Uph Some instructions such as multi-vector LD1 only accept a range of PN8-PN15 predicate-as-counter. This new constraint allows more refined parsing and better decision making when parsing these instructions from ASM, instead of defaulting to Upa which incorrectly uses the whole range of registers P0-P15 from the register class PPR. Differential Revision: https://reviews.llvm.org/D157517	2023-08-11 14:48:19 +00:00
Matt Arsenault	8f18cf77e7	AMDGPU: Check for implicit defs before constant folding instruction Can't delete the constant folded instruction if scc is used. Fixes #63986 https://reviews.llvm.org/D157504	2023-08-11 10:29:53 -04:00
Matt Arsenault	1030483561	AMDGPU/GlobalISel: Handle stacksave/stackrestore https://reviews.llvm.org/D156670	2023-08-11 10:25:01 -04:00
Matt Arsenault	9a53f5f5c4	AMDGPU: Handle llvm.stacksave and llvm.stackrestore Not sure if the only valid use is to have stackrestore directly consume stacksave outputs or not. Handled exactly like a regular stack pointer so all the edge cases theoretically should work. https://reviews.llvm.org/D156669	2023-08-11 10:25:01 -04:00
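
The mask check described in commit 4fa8a5487e ([AMDGPU] Add sanity check that fixes bad shift operation in AMD backend) can be sketched in isolation as follows. This is a minimal standalone illustration, not the code from SILoadStoreOptimizer: the names `countTrailingZeros32` and `masksCanBeCombinedSketch` are hypothetical, and returning false when both masks are zero is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of zero bits below the lowest set bit.
// Returns 32 for a zero input, which is the case the commit guards against.
static unsigned countTrailingZeros32(std::uint32_t Value) {
  unsigned Count = 0;
  while (Count < 32 && ((Value >> Count) & 1u) == 0)
    ++Count;
  return Count;
}

// Hypothetical sketch of the check: the smaller mask must fit entirely
// below the lowest set bit of the greater mask.
static bool masksCanBeCombinedSketch(std::uint32_t MaskA, std::uint32_t MaskB) {
  const std::uint32_t MaxMask = std::max(MaskA, MaskB);
  const std::uint32_t MinMask = std::min(MaskA, MaskB);

  // The sanity check: without it, two zero masks would give a count of 32,
  // and shifting a 32-bit value by 32 is undefined behavior.
  if (MaxMask == 0)
    return false; // assumption of this sketch: nothing to combine

  const unsigned AllowedBitsForMin = countTrailingZeros32(MaxMask);
  assert(AllowedBitsForMin < 32 && "shift amount must stay below the bit width");
  return (1u << AllowedBitsForMin) > MinMask;
}
```

With the example from the commit message, `masksCanBeCombinedSketch(0b0101101000, 0b0000000110)` accepts the pair, because the greater mask has three trailing zeros and the smaller mask is below `1u << 3`.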


52796 Commits
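
The arithmetic behind commit 555e0305fd ([RISCV] Match ext + ext + srem + trunc to vrem.vv) can be checked with a small model. This is a sketch under stated assumptions: `remViaWidening`, `remNarrowModel` and `formsAgreeForAllNonzeroDivisors` are hypothetical names, the narrow model only encodes the overflow result of 0 that the commit message attributes to the RVV specification, and division by zero is excluded because it is undefined in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Widened form: sign-extend both operands to 32 bits, take the remainder,
// then truncate back to 8 bits. The divisor must be nonzero.
static std::int8_t remViaWidening(std::int8_t A, std::int8_t B) {
  assert(B != 0 && "division by zero is not modelled here");
  const std::int32_t Wide =
      static_cast<std::int32_t>(A) % static_cast<std::int32_t>(B);
  return static_cast<std::int8_t>(Wide);
}

// Narrow model: an 8-bit remainder whose overflow case (-128 % -1) is pinned
// to 0, as the commit message describes for vrem.vv. C++ still promotes the
// operands to int, so this function only models the intended result.
static std::int8_t remNarrowModel(std::int8_t A, std::int8_t B) {
  assert(B != 0 && "division by zero is not modelled here");
  if (A == std::numeric_limits<std::int8_t>::min() && B == -1)
    return 0;
  return static_cast<std::int8_t>(A % B);
}

// Exhaustively compare both forms over every int8_t pair with B != 0.
static bool formsAgreeForAllNonzeroDivisors() {
  for (int A = -128; A <= 127; ++A) {
    for (int B = -128; B <= 127; ++B) {
      if (B == 0)
        continue;
      const auto NarrowA = static_cast<std::int8_t>(A);
      const auto NarrowB = static_cast<std::int8_t>(B);
      if (remViaWidening(NarrowA, NarrowB) != remNarrowModel(NarrowA, NarrowB))
        return false;
    }
  }
  return true;
}
```

The exhaustive loop is cheap here because there are only 256 x 255 operand pairs, so the agreement between the widened form and the narrow model can be checked directly rather than argued case by case.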