llvm-project

Author	SHA1	Message	Date
Tuan Chuong Goh	5686f06d7f	[AArch64][GlobalISel] Select USHLL2 Instruction Select ushll2 instruction instead of using mov and ushll Differential Revision: https://reviews.llvm.org/D158420	2023-08-23 13:41:47 +01:00
David Green	ef0b8cf3f4	[AArch64][GISel] Expand coverage of FAdd and FSub. This adds some more extensive test coverage for fadd/fsub through global isel, switching the opcodes to use the more complete ActionDefinitions to handle more cases.	2023-08-23 09:51:06 +01:00
Jianjian GUAN	879e801a91	[RISCV] Apply promotion for f16 vector ops when only have zvfhmin For most fp16 vector ops, we can promote them to fp32 vectors when zvfhmin is enabled but zvfh is not. But for nxv32f16, we need to split it first since nxv32f32 is not a valid MVT. Reviewed By: michaelmaitland Differential Revision: https://reviews.llvm.org/D153848	2023-08-23 16:49:20 +08:00
Jianjian GUAN	759903568f	[RISCV] Add Zvfhmin extension support for llvm RISCV backend This patch supports Zvfhmin for RISCV codegen. Reviewed By: michaelmaitland Differential Revision: https://reviews.llvm.org/D151414	2023-08-23 16:47:47 +08:00
esmeyi	96b5ea6e00	[NFC][PowerPC] Add cases for 64-bit constants.	2023-08-23 04:10:16 -04:00
wanglei	1bb7766489	[LoongArch] Optimize stack realignment using BSTRINS instruction Prior to this change, stack realignment was achieved using the SRLI/SLLI instructions in two steps. With this patch, stack realignment is optimized using a single `BSTRINS` instruction. Reviewed By: SixWeining, xen0n Differential Revision: https://reviews.llvm.org/D158384	2023-08-23 09:21:42 +08:00
Rahman Lavaee	d7e10df605	Remove checking stats from -gc-empty-basic-blocks test. The test does not require asserts. So it can't check the stats.	2023-08-23 01:18:47 +00:00
Rahman Lavaee	d0ec03a384	Revert "[BasicBlockSections] avoid insertting redundant branch to fall through blocks" This reverts commit ab53109166c0345a79cbd6939cf7bc764a982856 which was committed by mistake.	2023-08-23 01:09:13 +00:00
Rahman Lavaee	ab53109166	[BasicBlockSections] avoid insertting redundant branch to fall through blocks	2023-08-22 23:32:02 +00:00
Rahman Lavaee	e280e406c2	Add a pass to garbage-collect empty basic blocks after code generation. Propeller and pseudo-probes map profiles back to Machine IR via basic block addresses that are stored in metadata sections. Empty basic blocks (basic blocks without real code) obfuscate the profile mapping because their addresses collide with their next basic blocks. For instance, the fallthrough block of an empty block should always be adjacent to it. Otherwise, a completely unnecessary jump would be added. This patch adds a MachineFunction pass named `GCEmptyBasicBlocks` which attempts to garbage-collect the empty blocks before the `BasicBlockSections` pass. This pass removes each empty basic block after redirecting its incoming edges to its fall-through block. The garbage-collection is not complete. We keep the empty block in 4 cases: 1. The empty block is an exception handling pad. 2. The empty block has its address taken. 3. The empty block is the last block of the function and it has predecessors. 4. The empty block is the only block of the function. The first three cases are extremely rare in normal code (no cases for the clang binary). Removing the blocks under the first two cases requires modifying exception handling structures and operands of non-terminator instructions, which is doable but not worth the additional complexity in the pass. Reviewed By: tmsriram Differential Revision: https://reviews.llvm.org/D107534	2023-08-22 22:42:19 +00:00
Daniel Hoekwater	90ab85a1b2	Reland "[CodeGen][AArch64] Make MFS testable on AArch64" Reverted by 3d22dac6c3b97d7bb92f243886dfb0d32a5c42e9 because it depended on b9d079d6188b50730e0a67267b7fee36008435ce, which broke some tests.	2023-08-22 20:21:33 +00:00
Craig Topper	d9320e22d4	[RISCV][GlobalISel] Select register banks for GPR ALU instructions This patch implements the getInstrMapping hook for RISCVRegisterBankInfo and others in order to correctly select the GPR register bank for operands of ALU instructions, and the associated operations introduced by the legalizer. Co-authored-by: Lewis Revill <lewis.revill@embecosm.com> Reviewed By: nitinjohnraj Differential Revision: https://reviews.llvm.org/D76051	2023-08-22 12:31:20 -07:00
Joseph Huber	8a20612467	[AMDGPU] Respect `nobuiltin` when converting `printf` The AMDGPU backend uses a pass to transform calls to the `printf` function to a built-in version for either HIP or OpenCL. Currently this does not respect `-fno-builtin` and is always emitted. This allows the user to turn off this functionality as is standard for these types of built-in transformations. The motivation behind this change is to allow the `libc` project to provide a linkable version of the `printf` function in the future. Reviewed By: sameerds Differential Revision: https://reviews.llvm.org/D158477	2023-08-22 12:48:16 -05:00
Changpeng Fang	930e8dea41	AMDGPU: Add s-memrealtime and s-memtime-inst to RemoveIncompatibleFunctions Summary: Under -O0, device-libs may still emit these instructions under some conditions, so we need to remove them, with a warning, if they are not compatible. Fixes: SWDEV-417219 Reviewers: arsenm, Pierre-vh and b-sumner Differential Revision: https://reviews.llvm.org/D158316	2023-08-22 10:22:41 -07:00
David Green	13c2514df3	[AArch64] Disable GlobalISel/FastISel for more SME functions The patch D136361 disabled GlobalISel and FastISel for some SME functions, as the saving and restoring of SM is not yet handled. There were several tests added for fp128 fadd, which will be expanded to a libcall, that only happened to work by accident and did not handle other cases such as f32/f64 frem libcalls. This extends the cases where GlobalISel / FastISel is disabled for functions with SME attributes, under the assumption that it is difficult to tell what will become a libcall reliably, and so should fall back for all functions until GlobalISel and/or FastISel can handle them. Differential Revision: https://reviews.llvm.org/D158490	2023-08-22 18:06:27 +01:00
David Green	ba114abec7	[AArch64] Add extra SME attribute tests for expanded intrinsics. NFC See D136361.	2023-08-22 17:51:57 +01:00
Noah Goldstein	7c9fe735d4	[ValueTracking] Strengthen analysis in `computeKnownBits` of phi Use the comparison based analysis to strengthen the standard knownbits analysis rather than choosing either/or. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D157807	2023-08-22 10:59:03 -05:00
Philip Reames	c3b48ec6ff	[RISCV] Match strided loads with reversed indexing sequences This extends the concat_vector of loads to strided_load transform to handle the reversed index pattern. The previous code expected indexing of the form (a0, a1+S, a2+S,...). However, we can also see indexing of the form (a1+S, a2+S, a3+S, .., aS). This form is a strided load starting at address aN + S(n-1) with stride -S. Note that this is also fixing what looks to be a bug in the memory location reasoning for the forward strided case. A strided load with negative stride accesses eltsize bytes past base ptr, and then bytes before base ptr. (That is, the range should extend from before base ptr to after base ptr.) Differential Revision: https://reviews.llvm.org/D157886	2023-08-22 07:59:49 -07:00
Philip Reames	ecb855a5a8	[RISCV] Reduce LMUL for vector extracts If we have a known (or bounded) index which definitely fits in a smaller LMUL register group size, we can reduce the LMUL of the slide and extract instructions. This loosens constraints on register allocation, and allows the hardware to do less work, at the potential cost of some additional VTYPE toggles. In practice, we appear (after prior patches) to do a decent job of eliminating the additional VTYPE toggles in most cases. Differential Revision: https://reviews.llvm.org/D158460	2023-08-22 07:36:17 -07:00
David Green	8f6a1a07cb	[GISel][AArch64] Combine G_BUILD_VECTOR(G_UNMERGE) with undef elements This extends the existing legalization combine to fold G_BUILD_VECTOR where the sources are all from the same G_UNMERGE, to handle cases where some of the lanes are undef. This comes up in the legalization of <3 x ..> vectors in AArch64, where they are padded with undef. There are two choices for what to create. This patch just removes the G_BUILD_VECTOR/G_UNMERGE, losing the information about which lanes are undef. The alternative would be to generate an identity G_SHUFFLE_VECTOR with undef lanes marked as undef. I think both have advantages and disadvantages. Differential Revision: https://reviews.llvm.org/D158063	2023-08-22 14:25:31 +01:00
Luke Lau	946c672fe0	[RISCV] Remove fixed length lmul max restriction from fp build_vector tests. NFC For the same reasons as D157973, remove the LMUL flag from the tests to simplify them and make the diffs in D157976 easier to read. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D158270	2023-08-22 11:13:02 +01:00
Luke Lau	4f996d7fbf	[RISCV] Add test for constant build_vector that could use vid. NFC We currently don't lower this to a vid because the addend doesn't fit into a vadd.vi immediate. An extra li here seems like a small cost to pay for a constant pool load. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D157975	2023-08-22 11:12:57 +01:00
Luke Lau	7492b54bd5	[RISCV] Split up structs in buildvec tests. NFC Some of these tests have multiple vid sequence build_vectors in them, and machine CSE tidies up their output. But now that small build_vectors are sometimes lowered to scalar `vmv.x.s`s, the output is harder to read with no CSE, e.g. see `buildvec_no_vid_v4i8`. This patch splits them up into separate functions to address this, and also makes the diff in D157976 more clear since it causes `vmv.x.s` to be lowered to more frequently. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D157974	2023-08-22 11:12:52 +01:00
Luke Lau	9918853215	[RISCV] Remove fixed length lmul max restriction from build_vector tests. NFC I'm not sure why the flag was added initially but I find the tests a bit easier to read with it off, and it matches the default behaviour. Reviewed By: craig.topper, reames Differential Revision: https://reviews.llvm.org/D157973	2023-08-22 11:12:47 +01:00
Luke Lau	007b41b393	[RISCV] Don't relax policy to ta when vmerge's VL shrinks during folding When folding a vmerge into its operands, if the resulting VL is smaller than what the vmerge had originally, then what was previously in its body gets moved to the tail. In that case, we can't relax the tail policy to agnostic when the merge operand is undefined, since we need to preserve these elements past the new VL. Fixes https://github.com/llvm/llvm-project/issues/64754 Reviewed By: craig.topper, reames Differential Revision: https://reviews.llvm.org/D158161	2023-08-22 10:39:22 +01:00
Luke Lau	6e532f94eb	[RISCV] Add test case showing vmerge fold miscompile with tail policy Reviewed By: reames Differential Revision: https://reviews.llvm.org/D158160	2023-08-22 10:39:18 +01:00
Tuan Chuong Goh	0b91b1aec4	[AArch64][GlobalISel] Legalize and Lower Funnel Shift for GlobalISel Recognise G_FSHR with constant shift amount as a legal instruction. Lowers G_FSHL with constant shift to G_FSHR. If shift amount is non-constant, generic lowering is applied to the instruction. Differential Revision: https://reviews.llvm.org/D155484	2023-08-22 10:32:50 +01:00
Nikita Popov	1c6e6432ca	[SCEVExpander] Fix incorrect reuse of more poisonous instructions (PR63763) SCEVExpander tries to reuse existing instruction with the same SCEV expression. However, doing this replacement blindly is not safe, because the instruction might be more poisonous. What we were already doing is to drop poison-generating flags on the reused instruction. But this is not the only way that more poison can be introduced. The poison-generating flag might not be directly on the reused instruction, or the poison contribution might come from something like 0 * %var, which folds to 0 but can still introduce poison. This patch fixes the issue in a principled way, by determining which values can contribute poison to the SCEV expression, and then checking whether any additional values can contribute poison to the instruction being reused. Poison-generating flags are dropped if doing that enables reuse. This is a pretty big hammer and does cause some regressions in tests, but less than I would have expected. I wasn't able to come up with a less intrusive fix that still satisfies the correctness requirements. Fixes https://github.com/llvm/llvm-project/issues/63763. Fixes https://github.com/llvm/llvm-project/issues/63926. Fixes https://github.com/llvm/llvm-project/issues/64333. Fixes https://github.com/llvm/llvm-project/issues/63727. Differential Revision: https://reviews.llvm.org/D158181	2023-08-22 09:27:07 +02:00
pvanhout	2d87319f06	[GlobalISel] Rewrite some simple rules using MIR Patterns Rewrites some simple rules that cause little to no codegen regressions as MIR patterns. I may have missed some easy cases, but some other rules have intentionally been left as-is because bigger changes are needed to make them work. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D157690	2023-08-22 09:09:54 +02:00
Mikael Holmen	d1e685df45	[test] Add -verify-coalescing to testcase and fix problems Apparently the testcase coalesce-partial-redundant-reguse-terminator.mir was broken in a way that -verify-coalescing detected. Update the testcase so -verify-coalescing doesn't complain and so that it still exposes the problem originally fixed in 6c062b7641623. Differential Revision: https://reviews.llvm.org/D158397	2023-08-22 07:20:53 +02:00
Eduard Zingerman	651e644595	[BPF] Replace BPFMIPeepholeTruncElim by custom logic in isZExtFree()

Replace `BPFMIPeepholeTruncElim` by adding an overload for `TargetLowering::isZExtFree()` aware that zero extension is free for `ISD::LOAD`.

Short description
=================

The `BPFMIPeepholeTruncElim` handles two patterns:

Pattern #1:

  %1 = LDB %0, ...              %1 = LDB %0, ...
  %2 = AND_ri %1, 0xff    ->    %2 = MOV_ri %1    <-- (!)

Pattern #2:

  bb.1:                         bb.1:
    %a = LDB %0, ...              %a = LDB %0, ...
    br %bb3                       br %bb3
  bb.2:                         bb.2:
    %b = LDB %0, ...      ->      %b = LDB %0, ...
    br %bb3                       br %bb3
  bb.3:                         bb.3:
    %1 = PHI %a, %b               %1 = PHI %a, %b
    %2 = AND_ri %1, 0xff          %2 = MOV_ri %1  <-- (!)

Plus variations:
- AND_ri_32 instead of AND_ri
- SLL/SLR instead of AND_ri
- LDH, LDW, LDB32, LDH32, LDW32

Both patterns could be handled by built-in transformations at the instruction selection phase if a suitable `isZExtFree()` implementation is provided. The idea is borrowed from `ARMTargetLowering::isZExtFree`.

When evaluating on BPF kernel selftests and remove_truncate_*.ll LLVM test cases this revision performs slightly better than BPFMIPeepholeTruncElim, see the "Impact" section below for details.

The commit also adds a few test cases to make sure that the patterns in question are handled.

Long description
================

Why this works: Pattern #1
--------------------------

Consider the following example:

  define i1 @foo(ptr %p) {
  entry:
    %a = load i8, ptr %p, align 1
    %cond = icmp eq i8 %a, 0
    ret i1 %cond
  }

Log for `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command:

  ...
  Type-legalized selection DAG: %bb.0 'foo:entry'
  SelectionDAG has 13 nodes:
    t0: ch,glue = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
    t19: i64 = and t16, Constant:i64<255>
    t17: i64 = setcc t19, Constant:i64<0>, seteq:ch
    t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
    t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
  ...
  Replacing.1 t19: i64 = and t16, Constant:i64<255>
  With: t16: i64,ch = load<(load (s8) from %ir.p), anyext from i8> t0, t2, undef:i64
   and 0 other values
  ...
  Optimized type-legalized selection DAG: %bb.0 'foo:entry'
  SelectionDAG has 11 nodes:
    t0: ch,glue = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t20: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
    t17: i64 = setcc t20, Constant:i64<0>, seteq:ch
    t11: ch,glue = CopyToReg t0, Register:i64 $r0, t17
    t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
  ...

Note:
- Optimized type-legalized selection DAG:
  - `t19 = and t16, 255` had been replaced by `t16` (load).
  - Patterns like `(and (load ... i8), 255)` are replaced by `load` in `DAGCombiner::BackwardsPropagateMask` called from `DAGCombiner::visitAND`.
  - Similarly patterns like `(shl (srl ..., 56), 56)` are replaced by `(and ..., 255)` in `DAGCombiner::visitSRL` (this function is huge, look for the `TLI.shouldFoldConstantShiftPairToMask()` call).

Why this works: Pattern #2
--------------------------

Consider the following example:

  define i1 @foo(ptr %p) {
  entry:
    %a = load i8, ptr %p, align 1
    br label %next

  next:
    %cond = icmp eq i8 %a, 0
    ret i1 %cond
  }

Consider the log for the `llc -mcpu=v2 -mtriple=bpfel -debug-only=isel` command.

Log for the first basic block:

  Initial selection DAG: %bb.0 'foo:entry'
  SelectionDAG has 9 nodes:
    t0: ch,glue = EntryToken
    t3: i64 = Constant<0>
    t2: i64,ch = CopyFromReg t0, Register:i64 %1
    t5: i8,ch = load<(load (s8) from %ir.p)> t0, t2, undef:i64
    t6: i64 = zero_extend t5
    t8: ch = CopyToReg t0, Register:i64 %0, t6
  ...
  Replacing.1 t6: i64 = zero_extend t5
  With: t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
   and 0 other values
  ...
  Optimized lowered selection DAG: %bb.0 'foo:entry'
  SelectionDAG has 7 nodes:
    t0: ch,glue = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %1
    t9: i64,ch = load<(load (s8) from %ir.p), zext from i8> t0, t2, undef:i64
    t8: ch = CopyToReg t0, Register:i64 %0, t9

Note:
- Initial selection DAG:
  - `%a = load ...` is lowered as `t6 = (zero_extend (load ...))`; without the special `isZExtFree()` overload added by this commit it is instead lowered as `t6 = (any_extend (load ...))`.
  - The decision to generate `zero_extend` or `any_extend` is made in `RegsForValue::getCopyToRegs` called from `SelectionDAGBuilder::CopyValueToVirtualRegister`:
    - if `isZExtFree()` for the load returns true, `zero_extend` is used;
    - `any_extend` is used otherwise.
- Optimized lowered selection DAG:
  - `t6 = (zero_extend (load ...))` is replaced by `t9 = load ..., zext from i8`. This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad()` called from `DAGCombiner::visitZERO_EXTEND`.

Log for the second basic block:

  Initial selection DAG: %bb.1 'foo:next'
  SelectionDAG has 13 nodes:
    t0: ch,glue = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t4: i64 = AssertZext t2, ValueType:ch:i8
    t5: i8 = truncate t4
    t8: i1 = setcc t5, Constant:i8<0>, seteq:ch
    t9: i64 = any_extend t8
    t11: ch,glue = CopyToReg t0, Register:i64 $r0, t9
    t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
  ...
  Replacing.2 t18: i64 = and t4, Constant:i64<255>
  With: t4: i64 = AssertZext t2, ValueType:ch:i8
  ...
  Type-legalized selection DAG: %bb.1 'foo:next'
  SelectionDAG has 13 nodes:
    t0: ch,glue = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t4: i64 = AssertZext t2, ValueType:ch:i8
    t18: i64 = and t4, Constant:i64<255>
    t16: i64 = setcc t18, Constant:i64<0>, seteq:ch
    t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
    t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
  ...
  Optimized type-legalized selection DAG: %bb.1 'foo:next'
  SelectionDAG has 11 nodes:
    t0: ch,glue = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t4: i64 = AssertZext t2, ValueType:ch:i8
    t16: i64 = setcc t4, Constant:i64<0>, seteq:ch
    t11: ch,glue = CopyToReg t0, Register:i64 $r0, t16
    t12: ch = BPFISD::RET_GLUE t11, Register:i64 $r0, t11:1
  ...

Note:
- Initial selection DAG:
  - `t0` is an input value for this basic block, it corresponds to the load instruction (`t9`) from the first basic block.
  - It is accessed within the basic block via `t4` (AssertZext (CopyFromReg t0, ...)).
  - The `AssertZext` is generated by `RegsForValue::getCopyFromRegs` called from `SelectionDAGBuilder::getCopyFromRegs`; it is generated only when `LiveOutInfo` with a known number of leading zeros is present for `t0`.
  - Known register bits in `LiveOutInfo` are computed by `SelectionDAG::computeKnownBits` called from `SelectionDAGISel::ComputeLiveOutVRegInfo`.
  - `computeKnownBits()` generates leading zeros information for `(load ..., zext from ...)` but does not generate leading zeros information for `(load ..., anyext from ...)`. This is why the `isZExtFree()` added in this commit is important.
- Type-legalized selection DAG:
  - `t5 = truncate t4` is replaced by `t18 = and t4, 255`.
- Optimized type-legalized selection DAG:
  - `t18 = and t4, 255` is replaced by `t4`; this is done by `DAGCombiner::SimplifyDemandedBits` called from `DAGCombiner::visitAND`, which simplifies patterns like `(and (assertzext ...))`.

Impact
------

This change covers all remove_truncate_*.ll test cases:
- for -mcpu=v4 there are no changes in the generated code;
- for -mcpu=v2 the code generated for remove_truncate_7 and remove_truncate_8 improved slightly, for other tests it is unchanged.

For remove_truncate_7:

  Before this revision              After this revision
  --------------------              -------------------
  r1 <<= 0x20                       r1 <<= 0x20
  r1 >>= 0x20                       r1 >>= 0x20
  if r1 == 0x0 goto +0x2 <LBB0_2>   if r1 == 0x0 goto +0x2 <LBB0_2>
  r1 = *(u32 *)(r2 + 0x0)           r0 = *(u32 *)(r2 + 0x0)
  goto +0x1 <LBB0_3>                goto +0x1 <LBB0_3>
  <LBB0_2>:                         <LBB0_2>:
  r1 = *(u32 *)(r2 + 0x4)           r0 = *(u32 *)(r2 + 0x4)
  <LBB0_3>:                         <LBB0_3>:
  r0 = r1                           exit
  exit

For remove_truncate_8:

  Before this revision              After this revision
  --------------------              -------------------
  r2 = *(u32 *)(r1 + 0x0)           r2 = *(u32 *)(r1 + 0x0)
  r3 = r2                           r3 = r2
  r3 <<= 0x20                       r3 <<= 0x20
  r4 = r3                           r3 s>>= 0x20
  r4 s>>= 0x20
  if r4 s> 0x2 goto +0x5 <LBB0_3>   if r3 s> 0x2 goto +0x4 <LBB0_3>
  r4 = *(u32 *)(r1 + 0x4)           r3 = *(u32 *)(r1 + 0x4)
  r3 >>= 0x20
  if r3 >= r4 goto +0x2 <LBB0_3>    if r2 >= r3 goto +0x2 <LBB0_3>
  r2 += 0x2                         r2 += 0x2
  *(u32 *)(r1 + 0x0) = r2           *(u32 *)(r1 + 0x0) = r2
  <LBB0_3>:                         <LBB0_3>:
  r0 = 0x3                          r0 = 0x3
  exit                              exit

For kernel BPF selftests the statistics are as follows:
- For -mcpu=v4: 9 out of 655 object files have differences, in all cases the total number of instructions marginally decreased (-27 instructions).
- For -mcpu=v2: 9 out of 655 object files have differences:
  - For 19 object files the number of instructions decreased (-129 instructions in total): some redundant `rX &= 0xffff` and register to register assignments removed;
  - For 2 object files the number of instructions increased, by 2 instructions in each file.

Both -mcpu=v2 instruction increases could be reduced to the same example:

  define void @foo(ptr %p) {
  entry:
    %a = load i32, ptr %p, align 4
    %b = sext i32 %a to i64
    %c = icmp ult i64 1, %b
    br i1 %c, label %next, label %end

  next:
    call void inttoptr (i64 62 to ptr)(i32 %a)
    br label %end

  end:
    ret void
  }

Note that this example uses the value loaded to `%a` both as sign extended (`%b`) and as zero extended (`%a` passed as parameter).

Here is the difference in the final assembly code:

  Before this revision              After this revision
  --------------------              -------------------
  r1 = *(u32 *)(r1 + 0)             r1 = *(u32 *)(r1 + 0)
  r1 <<= 32                         r1 <<= 32
  r1 s>>= 32                        r1 s>>= 32
  if r1 < 2 goto <LBB0_2>           if r1 < 2 goto <LBB0_2>
                                    r1 <<= 32
                                    r1 >>= 32
  call 62                           call 62
  <LBB0_2>:                         <LBB0_2>:
  exit                              exit

Before this commit `%a` is passed to the call as a sign extended value, after this commit `%a` is passed to the call as a zero extended value; both are correct as the 32-bit sub-register is the same.

The difference comes from `DAGCombiner` operation on the initial DAG:

Initial selection DAG before this commit:

  t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
  t6: i64 = any_extend t5         <--------------------- (1)
  t8: ch = CopyToReg t0, Register:i64 %0, t6
  t9: i64 = sign_extend t5
  t12: i1 = setcc Constant:i64<1>, t9, setult:ch

Initial selection DAG after this commit:

  t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
  t6: i64 = zero_extend t5        <--------------------- (2)
  t8: ch = CopyToReg t0, Register:i64 %0, t6
  t9: i64 = sign_extend t5
  t12: i1 = setcc Constant:i64<1>, t9, setult:ch

The node `t9` is processed before node `t6` and the `load` instruction is combined to a load with sign extension:

  Replacing.1 t9: i64 = sign_extend t5
  With: t30: i64,ch = load<(load (s32) from %ir.p), sext from i32> t0, t2, undef:i64
   and 0 other values
  Replacing.1 t5: i32,ch = load<(load (s32) from %ir.p)> t0, t2, undef:i64
  With: t31: i32 = truncate t30
   and 1 other values

This is done by `DAGCombiner.cpp:tryToFoldExtOfLoad` called from `DAGCombiner::visitSIGN_EXTEND`. Note that `t5` is used by `t6`, which is `any_extend` in (1) and `zero_extend` in (2). `tryToFoldExtOfLoad()` rewrites such uses of `t5` differently:
- `any_extend` is simply removed
- `zero_extend` is replaced by `and t30, 0xffffffff`, which is later converted to a pair of shifts. This pair of shifts survives till the end of translation.

Differential Revision: https://reviews.llvm.org/D157870	2023-08-22 00:04:51 +03:00
Fangrui Song	77596e6b16	Revert D157750 "[Driver][CodeGen] Properly handle -fsplit-machine-functions for fatbinary compilation." This reverts commit 317a0fe5bd7113c0ac9d30b2de58ca409e5ff754. This reverts commit 30c4b97aec60895a6905816670f493cdd1d7c546. See post-commit discussions on https://reviews.llvm.org/D157750 that we should use a different mechanism to handle the error with --cuda-gpu-arch= The IR/DiagnosticInfo.cpp, warn_drv_for_elf_only, codegen tests in clang/test/Driver, and the following driver behavior (downgrading error to warning) changes are undesired. ``` % clang --target=riscv64 -fsplit-machine-functions -c a.c warning: -fsplit-machine-functions is not valid for riscv64 [-Wbackend-plugin] ```	2023-08-21 13:54:15 -07:00
Philip Reames	dd0d36d09f	[RISCVInsertVSETVLI] Handle vl-preserve case in backwards rewrite This updates the backwards mutation code to handle the case where the previous vset was in vl-preserving (x0, x0) form, but that VL was never used before the next vset which changes the VL. Since this requires writing both VL operands, eliminate the restriction on removing GPR producing vsetv as well. (The register will now be written by the earlier vsetv.) Differential Revision: https://reviews.llvm.org/D158019	2023-08-21 12:28:28 -07:00
Craig Topper	479716d954	[RISCV][GISel] Make G_SEXT_INREG with source size of 32 legal for RV64. This maps to the sext.w instruction. As far as I could tell this needs custom lowering to check the immediate. Reviewed By: nitinjohnraj Differential Revision: https://reviews.llvm.org/D158350	2023-08-21 10:43:10 -07:00
Daniel Hoekwater	e223e45677	Reland "[AArch64][CodeGen] Avoid inverting hot branches during relaxation"" This is a reland of 46d2d7599d9ed5e68fb53e910feb10d47ee2667b, which was reverted because of breaking build https://lab.llvm.org/buildbot/#/builders/21/builds/78779. However, this buildbot is spuriously broken due to Flang::underscoring.f90 being nondeterministic.	2023-08-21 17:29:47 +00:00
Daniel Hoekwater	0303137bfc	Revert "[AArch64][CodeGen] Avoid inverting hot branches during relaxation" This reverts commit 46d2d7599d9ed5e68fb53e910feb10d47ee2667b. Breaks build https://lab.llvm.org/buildbot/#/builders/21/builds/78779	2023-08-21 17:13:35 +00:00
Daniel Hoekwater	46d2d7599d	[AArch64][CodeGen] Avoid inverting hot branches during relaxation Current behavior for relaxing out-of-range conditional branches is to invert the conditional and insert a fallthrough unconditional branch to the original destination. This approach biases the branch predictor in the wrong direction, which can degrade performance. Machine function splitting introduces many rarely-taken cross-section conditional branches, which are improperly relaxed. Avoid inverting these branches; instead, retarget them to trampolines at the end of the function. Doing so increases the runtime cost of jumping to cold code but eliminates the misprediction cost of jumping to hot code. Differential Revision: https://reviews.llvm.org/D156837	2023-08-21 16:41:02 +00:00
Harvin Iriawan	7ba4896ecf	[AArch64][NFC] Fix stack-guard-sysreg.ll Fix test updated by commit db158c7c830807caeeb0691739c41f1d522029e9 Differential Revision: https://reviews.llvm.org/D158432	2023-08-21 17:30:53 +01:00
Stefan Pintilie	d0e1e7649b	[NFC][PowerPC] Add a test case for rotate and clear. Added a test case for situations where a rotate is followed by a clear. NFC because only a test case is added.	2023-08-21 11:01:47 -04:00
Harvin Iriawan	db034da211	[AArch64][NFC] Update test related to a510 sched update Missed updating the win64_vararg2.ll test on db158c7c830807caeeb0691739c41f1d522029e9 Differential Revision: https://reviews.llvm.org/D158410	2023-08-21 13:03:56 +01:00
Harvin Iriawan	db158c7c83	[AArch64] Update generic sched model to A510 Refresh of the generic scheduling model to use A510 instead of A55. Main benefits are to the little core, and introducing SVE scheduling information. Changes tested on various OoO cores, no performance degradation is seen. Differential Revision: https://reviews.llvm.org/D156799	2023-08-21 12:25:15 +01:00
Martin Storsjö	955d7615bd	[AArch64] [GlobalISel] Fix clobbered callee saved registers with win64 varargs This fixes a regression since 1c10d5b175992a9d056a2d763a932e5652386fc1 / https://reviews.llvm.org/D130903 by applying the same fix from SelectionDAG from 8cb3667541a94c4fa11b06e19020f753414c1d03 / https://reviews.llvm.org/D35720. This could possibly have been detected if the existing testcases in win64_vararg.ll had been tested with GlobalISel too, but all the IR snippets there fail to be translated with GlobalISel. This adds a separate testcase based on real world LLVM IR (instead of hand-reduced IR), which GlobalISel does translate happily - tested with both SelectionDAG and GlobalISel. Before this change, the stack object locations (visible in MIR with "llc -print-after-all") didn't match with what the prologue emitted by AArch64FrameLowering actually looked like, which caused clobbered callee saved registers when function local stack objects aliased the actual location of the callee saved registers. This fixes https://github.com/llvm/llvm-project/issues/64740. Differential Revision: https://reviews.llvm.org/D158272	2023-08-21 14:08:23 +03:00
wangpc	dc60003ec8	[RISCV] Support global address as inline asm memory operand of `m` In D146245, we have supported lowering inline asm `m` with offset to `register+imm`, but we didn't handle the case that the offset is the low part of global address. This patch will emit `%lo(g)` when `g` is a global address. Fixes #64656 Reviewed By: asb Differential Revision: https://reviews.llvm.org/D157839	2023-08-21 18:59:52 +08:00
wangpc	a3b11ce786	[RISCV][NFC] Move tests of inline asm memory constraints to separate file We will need to check the output of medium code model. Reviewed By: wangpc Differential Revision: https://reviews.llvm.org/D157965	2023-08-21 18:59:52 +08:00
Diana Picus	5272ae667d	[AMDGPU] Add IsChainFunction to the MachineFunctionInfo This will represent functions with the amdgpu_cs_chain or amdgpu_cs_chain_preserve calling conventions. Differential Revision: https://reviews.llvm.org/D156410	2023-08-21 12:37:32 +02:00
Simon Pilgrim	ba818c4019	[DAG] replaceStoreOfInsertLoad - don't fold if the inserted element is implicitly truncated D152276 wasn't handling the case where the inserted element is implicitly truncated into the vector, resulting in an i1 element (implicitly truncated from i8) overwriting 8 bits instead of 1 bit. This patch is intended to be merged into 17.x so I've just disallowed any vector element vs inserted element type mismatch. Technically we could be more elegant and permit truncated stores (as long as the store is still byte sized), but the use cases for that are so limited I'd prefer to play it safe for now. Candidate patch for #64655 17.x merge Differential Revision: https://reviews.llvm.org/D158366	2023-08-21 11:22:07 +01:00
Nikita Popov	69bd66b3ce	[Tests] Remove some and/or constant expressions in tests (NFC) In preparation for their removal in D158081.	2023-08-21 12:05:32 +02:00
Nikita Popov	fa1b6e6b34	[X86] Fix i128 argument passing under SysV ABI The x86_64 SysV ABI specifies that __int128 is passed either in two registers (if available) or in a 16 byte aligned stack slot. GCC implements this behavior. However, if only one free register is available, LLVM will instead pass one half of the i128 in a register, and the other on the stack. Make sure that either both are passed in registers or both on the stack. Fixes https://github.com/llvm/llvm-project/issues/41784. The patch is basically what craig.topper proposed to do there. Differential Revision: https://reviews.llvm.org/D158169	2023-08-21 11:44:35 +02:00
Diana Picus	26dc284498	[AMDGPU] ISel for amdgpu_cs_chain[_preserve] functions Lower formal arguments and returns for functions with the `amdgpu_cs_chain` and `amdgpu_cs_chain_preserve` calling conventions: * Put `inreg` arguments into SGPRs, starting at s0, and other arguments into VGPRs, starting at v8. No arguments should end up on the stack, if we don't have enough registers we should error out. * Lower the return (which is always void) as an S_ENDPGM. * Set the ScratchRSrc register to s48:51, as described in the docs. * Set the SP to s32, matching amdgpu_gfx. This might be revisited in a future patch. Differential Revision: https://reviews.llvm.org/D153517	2023-08-21 11:16:17 +02:00
Tuan Chuong Goh	a40c984976	[AArch64][GlobalISel] Support more legal types for EXTEND Expand (s/z/any)ext instructions to be compatible with more types for GlobalISel. This patch mainly focuses on 64-bit and 128-bit vectors with element size of powers of 2. It also notably handles larger than legal vectors. Differential Revision: https://reviews.llvm.org/D157113	2023-08-21 09:51:17 +01:00
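
The keep/remove rule described in the `GCEmptyBasicBlocks` commit (e280e406c2) in the log above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the LLVM pass: `Block`, `gc_empty_blocks`, and the representation of the CFG as a layout-ordered list with successor names are hypothetical, and "last block" is interpreted here as "no surviving block follows it in the layout".

```python
from dataclasses import dataclass, field, replace


@dataclass
class Block:
    """Hypothetical stand-in for a machine basic block."""
    name: str
    instructions: list = field(default_factory=list)  # empty list == empty block
    is_eh_pad: bool = False
    address_taken: bool = False
    successors: list = field(default_factory=list)    # names of successor blocks


def gc_empty_blocks(blocks):
    """Drop removable empty blocks from a layout-ordered list of blocks.

    Incoming edges of a removed block are redirected to its fall-through
    block (the next surviving block in layout order).
    """
    preds = {b.name: set() for b in blocks}
    for b in blocks:
        for s in b.successors:
            preds[s].add(b.name)

    redirect = {}          # removed block name -> surviving fall-through name
    next_kept = None       # nearest surviving block after the current one
    remaining = len(blocks)

    # Walk the layout backwards so each fall-through target is already resolved.
    for b in reversed(blocks):
        keep = (
            bool(b.instructions)                            # has real code
            or b.is_eh_pad                                  # case 1
            or b.address_taken                              # case 2
            or (next_kept is None and bool(preds[b.name]))  # case 3
            or remaining == 1                               # case 4
        )
        if keep:
            next_kept = b.name
        else:
            redirect[b.name] = next_kept
            remaining -= 1

    result = []
    for b in blocks:
        if b.name in redirect:
            continue
        succs = []
        for s in b.successors:
            target = redirect.get(s, s)
            if target not in succs:   # avoid duplicate edges after redirection
                succs.append(target)
        result.append(replace(b, successors=succs))
    return result


if __name__ == "__main__":
    cfg = [
        Block("entry", ["cmp", "br"], successors=["empty", "exit"]),
        Block("empty", successors=["body"]),
        Block("body", ["add"], successors=["exit"]),
        Block("exit", ["ret"]),
    ]
    for blk in gc_empty_blocks(cfg):
        print(blk.name, "->", blk.successors)
```

In this model the `empty` block is dropped and the edge `entry -> empty` becomes `entry -> body`, while an empty exception-handling pad, an address-taken block, a trailing empty block with predecessors, or a sole block is left in place.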


52796 Commits