llvm-project

Author	SHA1	Message	Date
Philip Reames	d9942319d7	[RISCV] Adjust check lines to reduce duplication	2023-09-25 11:25:36 -07:00
Philip Reames	0eaf11ff41	[RISCV] Add test coverage for buildvec-of-binops	2023-09-25 11:12:04 -07:00
Matthias Braun	740ee00a4c	PPCBranchCoalescing: Fix invalid branch weights (#67211) Re-normalize branch-weights after removing a block successor to avoid branch-weights not adding up to 100%. This changes MIR for the `test/CodeGen/PowerPC/branch_coalesce.ll` test like this: ```diff - successors: %bb.6(0x40000000); %bb.6(50.00%) + successors: %bb.6(0x80000000); %bb.6(100.00%) ``` This doesn't affect codegen on its own but fixing this helps with fluctuations I have with some of my upcoming changes.	2023-09-25 10:41:04 -07:00
Ivan Kosarev	053478bbd0	[AMDGPU] Switch to using real True16 operands. The DPP source and e64 destination operands remain unchanged for now. Reviewed By: Joe_Nash Differential Revision: https://reviews.llvm.org/D156104	2023-09-25 18:21:13 +01:00
Austin Kerbow	0455596e1e	[AMDGPU] Add DAG ISel support for preloaded kernel arguments This patch adds the DAG isel changes for kernel argument preloading. These changes are not usable with older firmware but subsequent patches in the series will make the codegen backwards compatible. This patch should only be submitted alongside that subsequent patch. Preloading here begins from the start of the kernel arguments and continues up to the number of arguments indicated by the CL flag amdgpu-kernarg-preload-count. Aggregates and arguments passed by-ref are not supported. Special care for the alignment of the kernarg segment is needed as well as consideration of the alignment of addressable SGPR tuples when we cannot directly use misaligned large tuples that the arguments are loaded to. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D158579	2023-09-25 09:32:59 -07:00
Austin Kerbow	7b70af297a	[AMDGPU] Add IR lowering changes for preloaded kernargs Preloaded kernel arguments should not be lowered in the IR pass AMDGPULowerKernelArguments. Therefore it's necessary to calculate the total number of user SGPRs that are available for preloading and how many SGPRs would be required to preload each argument to determine whether we should skip lowering, i.e. whether the argument will be preloaded instead. Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D156853	2023-09-25 08:54:07 -07:00
Philip Reames	95ce3c23c2	[RISCV] Be more aggressive about shrinking constant build_vector etype (#67175) If LMUL is more than m1, we can be more aggressive about narrowing the build_vector via a vsext if legal. If the narrow build_vector gets lowered as a load, while both are linear in lmul, load uops are generally more expensive than extend uops. If the narrow build_vector gets lowered via dominant values, that work is linear in both #unique elements and LMUL. So provided the number of unique values > 2, this is a net win in work performed.	2023-09-25 08:09:46 -07:00
Mark Harley	eb96d6e2fb	[AArch64][GlobalISel] Vector Constant Materialization Previously, vector constants were always lowered via constant pool loads. This patch selects MOVI/MVNI in more cases where appropriate.	2023-09-25 13:40:33 +01:00
Diana Picus	327fdcf789	Revert "AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882)" This reverts commit a04603993b43e5ebac1531293d288315f1885886 because it broke the OpenMP buildbot.	2023-09-25 13:40:38 +02:00
Diana	a04603993b	AMDGPU: Duplicate instead of COPY constants from VGPR to SGPR (#66882) Teach the si-fix-sgpr-copies pass to deal with REG_SEQUENCE, PHI or INSERT_SUBREG where the result is an SGPR, but some of the inputs are constants materialized into VGPRs. This may happen in cases where for instance several instructions use an immediate zero and SelectionDAG chooses to put it in a VGPR to satisfy all of them. This however causes the si-fix-sgpr-copies to try to switch the whole chain to VGPR and may lead to illegal VGPR-to-SGPR copies. Rematerializing the constant into an SGPR fixes the issue.	2023-09-25 13:20:08 +02:00
Momchil Velikov	c649fd34e9	[MachineSink][AArch64] Sink instruction copies when they can replace copy into hard register or folded into addressing mode This patch adds a new code transformation to the `MachineSink` pass that tries to sink copies of an instruction when the copies can be folded into the addressing modes of load/store instructions, or replace another instruction (currently, copies into a hard register). The criteria for performing the transformation are that: * the register pressure at the sink destination block must not exceed the register pressure limits * the latency and throughput of the load/store or the copy must not deteriorate * the original instruction must be deleted Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D152828	2023-09-25 10:49:44 +01:00
David Green	54e5de08d4	[ARM][LSR] Exclude uses outside the loop when favoring postinc. (#67090) Extra uses for variables outside the loop can mess with the generation of postinc variables. This patch alters the collection of loop invariant fixups in LSR when the target is optimizing for PostInc, to exclude the collection of these extra uses. It is expected that the variable can be rematerialized, which will lead to a more optimal sequence of instructions in the loop.	2023-09-25 10:09:36 +01:00
chuongg3	45f51f9f7c	[AArch64][GlobalISel] Select UMULL instruction (#65469) Global ISel now selects `UMULL` and `UMULL2` instructions. G_MUL instructions whose input operands come from `SEXT` or `ZEXT` operations are turned into UMULL. G_MUL instructions with a v2s64 result type are always scalarised, except for: `mul (unmerge (ext), unmerge (ext))`. In that case the extend can be unmerged, folding away the unmerge in the middle: `mul (unmerge (ext), unmerge (ext))` => `mul (unmerge (merge (ext (unmerge))), unmerge (merge (ext (unmerge))))` => `mul (ext (unmerge), ext (unmerge))`	2023-09-25 09:34:51 +01:00
Paulo Matos	0564065709	[SPIRV] Implement support for SPV_KHR_expect_assume (#66217) Adds the new extension SPV_KHR_expect_assume, the new capability ExpectAssumeKHR, as well as the new instructions: * OpExpectKHR * OpAssumeTrueKHR These are lowered from the llvm.expect.<ty> and llvm.assume intrinsics, respectively. Previously https://reviews.llvm.org/D157696	2023-09-25 09:52:42 +02:00
Brandon Wu	408b0810ba	[RISCV] Support floating point VCIX (#67094)	2023-09-25 13:19:21 +08:00
Yeting Kuo	7c70e50b8e	[RISCV] Fix wrong offset use caused by missing the size of Zcmp push. (#66613) This fixes two wrong offset uses: 1. the .cfi_offset of callee saves that are not pushed by cm.push; 2. references to frame objects via the frame pointer.	2023-09-25 12:05:05 +08:00
XChy	fc86d031fe	[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275) This patch extends the function TryToSimplifyUncondBranchFromEmptyBlock to handle similar cases like the one below. ```llvm define i8 @src(i8 noundef %arg) { start: switch i8 %arg, label %unreachable [ i8 0, label %case012 i8 1, label %case1 i8 2, label %case2 i8 3, label %end ] unreachable: unreachable case1: br label %case012 case2: br label %case012 case012: %phi1 = phi i8 [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ] br label %end end: %phi2 = phi i8 [ %phi1, %case012 ], [ 4, %start ] ret i8 %phi2 } ``` The phis here should be merged into one phi, so that we can better optimize it: ```llvm define i8 @tgt(i8 noundef %arg) { start: switch i8 %arg, label %unreachable [ i8 0, label %end i8 1, label %case1 i8 2, label %case2 i8 3, label %case3 ] unreachable: unreachable case1: br label %end case2: br label %end case3: br label %end end: %phi = phi i8 [ 4, %case3 ], [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ] ret i8 %phi } ``` Proof: [normal](https://alive2.llvm.org/ce/z/vAWi88) [multiple stages](https://alive2.llvm.org/ce/z/DDBQqp) [multiple stages 2](https://alive2.llvm.org/ce/z/nGkeqN) [multiple phi combinations](https://alive2.llvm.org/ce/z/VQeEdp) The lookup table optimization should then convert it into add %arg 1. This patch just matches similar CFG structures and merges the phis in the different cases. Maybe such a transform can be applied to other situations besides switch, but I'm not sure whether it's better than not merging. Therefore, I only apply it to switches. Related issue: #63876 [Migrated](https://reviews.llvm.org/D155940)	2023-09-25 10:13:45 +08:00
Simon Pilgrim	8b36d082c4	[DAG] getNode() - fold (zext (trunc x)) -> x iff the upper bits are known zero - add SRL support This is part of the work to address the D155472 regressions; there are a number of issues with generalizing this fold, which is why I'm just adding SRL support atm. Differential Revision: https://reviews.llvm.org/D159533	2023-09-24 13:40:07 +01:00
Simon Pilgrim	142efd6d61	[AMDGPU] Add ISD::FSHR Handling to AMDGPUISD::PERM matching Pulled out of D159533, which encourages (zext (trunc x)) -> x folds, leading to more ISD::FSHR nodes, which was breaking some existing AMDGPUISD::PERM tests Differential Revision: https://reviews.llvm.org/D159533	2023-09-24 13:40:07 +01:00
Ivan Kosarev	fab28e0e14	Reapply "[AMDGPU] Introduce real and keep fake True16 instructions." Reverts 6cb3866b1ce9d835402e414049478cea82427cf1. Analysis of failures on buildbots with expensive checks enabled showed that the problem was triggered by changes in another commit, 469b3bfad20550968ac428738eb1f8bb8ce3e96d, and was caused by the bug addressed in #67245.	2023-09-23 22:07:41 +01:00
Fangrui Song	e01df8716a	[NVPTX] Test crash introduced by #67073 The test is adapted from https://reviews.llvm.org/D46008	2023-09-23 10:42:02 -07:00
Noah Goldstein	bc38c427d4	[DAGCombiner][AArch64] Fix incorrect cast VT in `takeInexpensiveLog2` Previously, we were taking `CurVT` before finalizing `ToCast`, which meant potentially returning an `SDValue` with an illegal `ValueType` for the operation. The fix is to just take `CurVT` after we have finalized `ToCast` with `PeekThroughCastsAndTrunc`.	2023-09-23 09:50:42 -05:00
Zhuojia Shen	bcc5b48b0f	Reapply "[AArch64] Merge LDRSWpre-LD[U]RSW pair into LDPSWpre" This reverts commit 0def4e6b0f638b97a73bd4674365961d8fabda28, applies a quick fix that disallows merging two pre-indexed loads, and adds MIR regression tests. Differential Revision: https://reviews.llvm.org/D152407	2023-09-22 21:08:07 -07:00
Fangrui Song	d9a0163e27	Revert "[NVPTX] Improve lowering of v2i16 logical ops. (#67073)" This reverts commit 648579006234b7608549cf708c07aac4d6283a1f. Caused xla/tests:float8_test_gpu to fail ``` LLVM ERROR: Cannot select: t118: v2i16 = or t375, t401 t375: v2i16 = BUILD_VECTOR t374, t372 t374: i16 = select t247, Constant:i16<8960>, t360 t247: i1 = setcc t199, Constant:i16<7>, seteq:ch t199: i16 = extract_vector_elt t187, Constant:i64<0> t187: v2i16 = and t183, t410 t183: v2i16 = BUILD_VECTOR t383, t384 ... ``` Acked by author to revert	2023-09-22 19:24:18 -07:00
Craig Topper	972df2cecc	[RISCV][GISel] Emit G_CONSTANT 0 as a copy from X0. (#67202) We need to use a COPY so the register coalescer can replace reads of the register we copy to with X0. This is needed so that we use X0 on instructions that don't have an immediate form. This was reviewed as #67202.	2023-09-22 17:04:11 -07:00
Craig Topper	7cd01afb73	[RISCV][GISel] Add test showing missed opportunity to use X0 for the LHS of sub for negate. I had to disable the late copy propagation pass that can see through the ADDI we were previously emitting. We really want to get this in the register coalescer if not even earlier.	2023-09-22 17:04:11 -07:00
Rahman Lavaee	897a0b01d6	[BasicBlockSections] Split cold parts of custom-section functions. (#66731) This PR makes `-basic-block-sections` handle functions with custom non-dot-text sections correctly. Cold parts of such functions must be placed in the same section (not in `.text.split`) but with a unique id.	2023-09-22 13:49:12 -07:00
Philip Reames	233b6ef66c	[RISCV] Handle EltType > XLEN case in VMV_V_X_VL to VMV_S_X_VL fold I'd guarded this case in D158874 to avoid regressions, and decided to go investigate what was going on. The solution turns out to be a generic splat matching extension to handle INSERT_SUBVECTOR. In theory, we could see these from other sources as well, but for some reason we only seem to see the i64 extract on rv32 case in practice. Not sure why that is to be honest. Differential Revision: https://reviews.llvm.org/D159230	2023-09-22 13:43:43 -07:00
Craig Topper	98eb28b621	[RISCV][GISel] Implement instruction selection for G_PHI and G_BRCOND. This uses a naive lowering for G_BRCOND to a BNE instruction comparing the register to X0.	2023-09-22 13:18:42 -07:00
Artem Belevich	6485790062	[NVPTX] Improve lowering of v2i16 logical ops. (#67073) Bitwise logical ops can always be done as b32, regardless of the availability of other v2i16 ops, which would need a newer GPU.	2023-09-22 13:05:39 -07:00
Matt Harding	64d1ceaa38	Add command line option --no-trap-after-noreturn (#67051) Add the command line option --no-trap-after-noreturn, which exposes the pre-existing TargetOption `NoTrapAfterNoreturn`. This pull request was split off from this one: https://github.com/llvm/llvm-project/pull/65876	2023-09-22 22:03:21 +02:00
Rahman Lavaee	6ac71a0149	[BasicBlockSections] Introduce the basic block sections profile version 1. (#65506) This patch introduces a new version for the basic block sections profile as was requested in D158442, while keeping backward compatibility for the old version. The new encoding is as follows: ``` m <module_name> f <function_name_1> <function_name_2>... c <bb_id_1> <bb_id_2> <bb_id_3> c <bb_id_4> <bb_id_5> ... ``` Module name specifier (starting with 'm') is optional and allows distinguishing profiles for internal-linkage functions with the same name. If not specified, the profile will be applied to any function with the same name. Function name specifier (starting with 'f') can specify multiple function name aliases. Finally, basic block clusters are specified by 'c' and specify the cluster of basic blocks, and the internal order in which they must be placed in the same section.	2023-09-22 12:37:04 -07:00
Nemanja Ivanovic	46d5d264fc	[PowerPC] Improve kill flag computation and add verification after MI peephole The MI Peephole pass has grown to include a large number of transformations over the years. Many of the transformations require re-computation of kill flags but don't do a good job of re-computing them. This causes us to have very common failures when the compiler is built with expensive checks. Over time, we added and augmented a function that is supposed to go and fix up kill flags after each transformation but we keep missing cases. This patch does the following: - Removes the function to re-compute kill flags - Adds LiveVariables to compute and maintain kill flags while transforming code - Adds re-computation of kill flags for the post-RA peepholes for each block that contains a transformed instruction Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D133103	2023-09-22 15:26:39 -04:00
Craig Topper	8e87dc10b8	[RISCV][GISel] Add a post legalizer combiner and enable a couple combines. (#67053) We have an existing test that shows benefit from the redundant_and and identity combines, so use them as a starting point.	2023-09-22 10:13:56 -07:00
Craig Topper	ec5b0ef7d7	[RISCV] Truncate constants to eltwidth before checking simm5 when converting VMV_V_X to VMV_X_S. (#67062) Instruction selection knows the bits past EltWidth are ignored, so we should do the same here.	2023-09-22 10:12:12 -07:00
Luke Lau	3510552df6	[RISCV] Check for COPY_TO_REGCLASS in usesAllOnesMask (#67037) Sometimes with mask vectors that have been widened, there is a CopyToRegClass node in between the VMSET and the CopyToReg. This is a resurrection of https://reviews.llvm.org/D148524, and is needed to remove the mask operand when it's extracted from a subvector as planned in https://github.com/llvm/llvm-project/pull/66267#discussion_r1331998919	2023-09-22 16:30:43 +01:00
Ivan Kosarev	6cb3866b1c	Revert "[AMDGPU] Introduce real and keep fake True16 instructions." This reverts commit 0f864c7b8bc9323293ec3d85f4bd5322f8f61b16 due to failures on expensive checks.	2023-09-22 15:40:26 +01:00
Mirko Brkusanin	72e3713009	[IRTranslator] Set NUW flag for inbounds gep and load/store offsets Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D159515	2023-09-22 16:16:28 +02:00
Simon Pilgrim	5b8204b221	[X86] SandyBridge ymm broadcast loads use port5 + port23 Unlike the per-lane mov*dup broadcast shuffles, broadcastsd/ss need port5 to splat across lanes Found while reviewing a llvm-exegesis capture (and matches Agner + uops.info numbers) - I can't find any more easy wins from these captures so that will be it for now.	2023-09-22 15:10:27 +01:00
Paulo Matos	e7651e60a2	[SPIRV] Add support for SPV_KHR_bit_instructions (#66215) Adds support for SPV_KHR_bit_instructions. It is only used whenever we don't need the whole Shader capability, which is a superset of this extension.	2023-09-22 14:44:21 +02:00
David Green	8b4ca0aa4e	[AArch64] Expand log/exp tests. NFC This is extra testing for exp, exp2, log, log10 and log2 under GlobalISel.	2023-09-22 13:33:23 +01:00
Anatoly Trosinenko	eb02ee44d3	[AArch64] Move PAuth codegen down the machine pipeline To simplify handling PAuth in the machine outliner, introduce a separate AArch64PointerAuth pass that is executed after both Prologue/Epilogue Inserter and Machine Outliner passes. After moving to AArch64PointerAuth, signLR and authenticateLR are not used outside of their class anymore, so make them private and simplify accordingly. The new pass is added via AArch64PassConfig::addPostBBSections(), so that it can change the code size before branch relaxation occurs. AArch64BranchTargets is placed there too, so it can take into account any PACI(A|B)SP instructions and not excessively add BTIs at the start of functions. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D159357	2023-09-22 14:49:14 +03:00
David Green	963268c52b	[AArch64] Expand Sin/Cos GlobalISel testing. NFC This fills out some extra cases for sin/cos testing for various types under Global ISel, which seem to all do OK. The existing tests in sincospow-vector-expansion.ll can be removed, as they are now covered elsewhere.	2023-09-22 12:27:30 +01:00
Mirko Brkusanin	a657deb42e	[AMDGPU] Update RUN line in test (NFC)	2023-09-22 12:41:54 +02:00
DianQK	d200bd1a7d	Reland "[SimplifyCFG] Hoist common instructions on switch" (#67077) This relands commit 96ea48ff5dcba46af350f5300eafd7f7394ba606.	2023-09-22 18:29:59 +08:00
Nikita Popov	aa70f4d8cf	[StackColoring] Handle fixed object index This is a followup to #66988. The implementation there did not account for the possibility of the catch object frame index referring to a fixed object, which is the case on win64.	2023-09-22 12:28:38 +02:00
Ivan Kosarev	c62f208c05	[AMDGPU] Don't suppress printing the .l and .h register suffixes. We don't seem to have a use for the -amdgpu-keep-16-bit-reg-suffixes option anymore. Was introduced in <https://reviews.llvm.org/D79435>. Reviewed By: Joe_Nash, foad Differential Revision: https://reviews.llvm.org/D156102	2023-09-22 11:13:05 +01:00
Simon Pilgrim	b61b2426ac	[DAG] getNode() - remove oneuse limit from (zext (trunc (assertzext x))) -> (assertzext x) fold (REAPPLIED) Noticed on D159533 and I've finally dealt with the x86 regressions - MatchingStackOffset wasn't peeking through AssertZext nodes while trying to find CopyFromReg/Load sources, it was only removing them if they were part of a (trunc (assertzext x)) pattern. Reapplied after being reverted at 4389252c58b783ce5b - which should be addressed by D159537 / 6d2679992e58b	2023-09-22 11:01:38 +01:00
Ivan Kosarev	0f864c7b8b	[AMDGPU] Introduce real and keep fake True16 instructions. The existing fake True16 instructions using 32-bit VGPRs are supposed to co-exist with real ones until all the necessary True16 functionality is implemented and relevant tests are updated. Reviewed By: arsenm, Joe_Nash Differential Revision: https://reviews.llvm.org/D156101	2023-09-22 10:57:56 +01:00
Nikita Popov	b3cb4f069c	[StackColoring] Handle SEH catch object stack slots conservatively The write to the SEH catch object happens before cleanuppads are executed, while the first reference to the object will typically be in a catchpad. If we make use of first-use analysis, we may end up allocating an alloca used inside the cleanuppad and the catch object at the same stack offset, which would be incorrect. https://reviews.llvm.org/D86673 was a previous attempt to fix it. It used the heuristic "a slot loaded in a WinEH pad and never written" to detect catch objects. However, because it checks for more than one load (while probably more than zero was intended), the fix does not actually work. The general approach also seems dubious to me, so this patch reverts that change entirely, and instead marks all catch object slots as conservative (i.e. excluded from first-use analysis) based on the WinEHFuncInfo. As far as I can tell we don't need any heuristics here, we know exactly which slots are affected. Fixes https://github.com/llvm/llvm-project/issues/66984.	2023-09-22 11:50:30 +02:00
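
As a rough illustration of the re-normalization described in commit 740ee00a4c (PPCBranchCoalescing), the Python sketch below rescales the probabilities of the successors that remain after one is removed, so that they sum to 100% again. It assumes LLVM's fixed-point convention in which a branch probability is a numerator over 2^31 (`0x80000000` is 100%), which matches the `0x40000000` (50%) to `0x80000000` (100%) change in the MIR diff. The function names and the choice to give any rounding remainder to the first successor are assumptions of this sketch, not the code from the patch.

```python
# Hypothetical sketch of branch-probability re-normalization.
# Probabilities are fixed-point numerators over 2**31 (0x80000000 == 100%).
DENOM = 0x80000000


def renormalize(numerators):
    """Rescale the remaining successors' probabilities so they sum to DENOM."""
    if not numerators:
        return []
    total = sum(numerators)
    if total == 0:
        # No information left: split the probability evenly.
        scaled = [DENOM // len(numerators)] * len(numerators)
    else:
        # Scale each probability by DENOM / total, rounding down.
        scaled = [(p * DENOM) // total for p in numerators]
    # Hand any rounding remainder to the first successor so the sum is exact.
    scaled[0] += DENOM - sum(scaled)
    return scaled


def as_percent(p):
    return 100.0 * p / DENOM


# %bb.6 kept 50% (0x40000000); the removed successor had the other 50%.
after = renormalize([0x40000000])
print(hex(after[0]), f"{as_percent(after[0]):.2f}%")  # 0x80000000 100.00%
```

Integer arithmetic is used deliberately: floating-point scaling could leave the weights a few units short of 100%, which is exactly the kind of inconsistency the commit fixes.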
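
The profile encoding described in commit 6ac71a0149 is line-oriented, so a reader can dispatch on the first token of each line. The Python sketch below is a hypothetical parser, not LLVM's reader: it assumes whitespace-separated fields, handles only the `m`, `f`, and `c` specifiers shown in the commit message (no version header or old-format input), and assumes that an `m` line applies only to the `f` line that follows it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionProfile:
    # Hypothetical container for one `f` entry and the clusters that follow it.
    module: Optional[str]  # from a preceding `m` line, if any
    aliases: List[str]  # every name listed on the `f` line
    clusters: List[List[int]] = field(default_factory=list)


def parse_profile(text: str) -> List[FunctionProfile]:
    profiles: List[FunctionProfile] = []
    module: Optional[str] = None
    current: Optional[FunctionProfile] = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        kind, args = tokens[0], tokens[1:]
        if kind == "m":
            if len(args) != 1:
                raise ValueError(f"line {lineno}: 'm' needs one module name")
            module = args[0]
        elif kind == "f":
            if not args:
                raise ValueError(f"line {lineno}: 'f' needs a function name")
            # Assumption: an `m` line applies only to the next `f` line.
            current = FunctionProfile(module, args)
            profiles.append(current)
            module = None
        elif kind == "c":
            if current is None:
                raise ValueError(f"line {lineno}: 'c' before any 'f'")
            # Block IDs in the order they must be laid out in this cluster.
            current.clusters.append([int(a) for a in args])
        else:
            raise ValueError(f"line {lineno}: unknown specifier {kind!r}")
    return profiles


example = """\
m foo.cc
f fooA fooB
c 0 1 2
c 3 4
f bar
c 0 2
"""
for p in parse_profile(example):
    print(p.module, p.aliases, p.clusters)
# foo.cc ['fooA', 'fooB'] [[0, 1, 2], [3, 4]]
# None ['bar'] [[0, 2]]
```

Keeping every alias from the `f` line on one record mirrors the commit's note that a function specifier can list multiple aliases, so a consumer could match a function by any of those names, narrowed to the module when one was given.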

52796 Commits