llvm-project

Author	SHA1	Message	Date
spupyrev	cc2fbc648d	[CodeLayout] Faster basic block reordering, ext-tsp (#68617 ) Aggressive inlining might produce huge functions with >10K basic blocks. Since BFI treats _all_ blocks and jumps as "hot", having non-negative (but perhaps small) weight, the current implementation can be slow, taking minutes to produce a layout. This change introduces a few modifications that significantly (up to 50x on some instances) speed up the computation. Some notable changes:
- reduced the maximum chain size to 512 (from the prior 4096);
- introduced MaxMergeDensityRatio param to avoid merging chains with very different densities;
- dropped a couple of params that seem unnecessary.
Looking at some "offline" metrics (e.g., the number of created fall-throughs), there shouldn't be problems; in fact, I do see some metrics go up. But it might be hard/impossible to measure perf difference for such small changes. I did test the performance of the clang-14 binary and did not record any perf or i-cache-related differences. My 6 benchmarks, with ext-tsp runtime (the lower the better) and "tsp-score" (the higher the better).
Before:
- benchmark 1: num functions: 13,047; reordering running time is 2.4 seconds; score: 125503458 (128.3102%)
- benchmark 2: num functions: 16,438; reordering running time is 3.4 seconds; score: 12613997277 (129.7495%)
- benchmark 3: num functions: 12,359; reordering running time is 1.9 seconds; score: 1315881613 (105.8991%)
- benchmark 4: num functions: 96,588; reordering running time is 7.3 seconds; score: 89513906284 (100.3413%)
- benchmark 5: num functions: 1; reordering running time is 372 seconds; score: 21292505965077 (99.9979%)
- benchmark 6: num functions: 71,155; reordering running time is 314 seconds; score: 29795381626270671437824 (102.7519%)
After:
- benchmark 1: reordering running time is 2.2 seconds; score: 125510418 (128.3130%)
- benchmark 2: reordering running time is 2.6 seconds; score: 12614502162 (129.7525%)
- benchmark 3: reordering running time is 1.6 seconds; score: 1315938168 (105.9024%)
- benchmark 4: reordering running time is 4.9 seconds; score: 89518095837 (100.3454%)
- benchmark 5: reordering running time is 4.8 seconds; score: 21292295939119 (99.9971%)
- benchmark 6: reordering running time is 104 seconds; score: 29796710925310302879744 (102.7565%)	2023-10-25 07:52:26 -07:00
Vladislav Dzhidzhoev	2dea7bd8a0	[AArch64][GlobalISel] Legalize NEON smin,smax,umin,umax,fmin,fmax intrinsics Replace these intrinsics with the corresponding GISel operators during legalization stage to reuse available selection patterns.	2023-10-25 16:02:44 +02:00
Simon Pilgrim	ac534d2a16	[X86] combineArithReduction - use PACKUSWB directly for PSADBW(TRUNCATE(v8i16 X)) reduction patterns Avoids a crash in the D152928 patch due to a reduction pattern appearing after legalization. We can probably extend this further to avoid truncating to sub-128-bit vXi8 (and then calling WidenToV16I8) entirely, but we can't currently hit other cases.	2023-10-25 14:56:58 +01:00
Nikita Popov	d9cfb82207	[AArch64] Add test for #70207 (NFC)	2023-10-25 15:43:33 +02:00
Simon Pilgrim	a8913f8e04	[X86] Regenerate pr38539.ll Even though we're only interested in the X64 codegen for the first test, it's much easier to maintain if we just let the update script generate the codegen checks for X86 as well.	2023-10-25 13:25:15 +01:00
Jay Foad	c82ebfb97a	Revert "[AMDGPU] Accept arbitrary sized sources in CalculateByteProvider" This reverts commit ef33659492325de7871c8c85e35bd9c1c37f7347. It was causing incorrect codegen for some Vulkan CTS tests.	2023-10-25 11:11:27 +01:00
Momchil Velikov	9d35387811	[AArch64] Disable by default MachineSink sink-and-fold (#70101 ) There is a report about a large compile time regression in V8 when generating debug info.	2023-10-25 10:58:31 +01:00
Oliver Stannard	7e8eccd990	[AArch64] Move SLS later in pass pipeline Currently, the SLS hardening pass is run before the machine outliner, which means that the outliner creates new functions and calls which do not have the SLS hardening applied. The fix for this is to move the SLS passes to after the outliner, as has recently been done for the return address signing pass. This also avoids a bug where the SLS outliner emits code with instructions after a return, which the outliner doesn't correctly handle. Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D158511	2023-10-25 10:45:12 +01:00
Oliver Stannard	5640d28201	[AArch64] Add test showing incorrect code-gen Differential Revision: https://reviews.llvm.org/D158512	2023-10-25 10:28:50 +01:00
Craig Topper	34af57c5c1	[RISCV][GISel] Add G_SEXTLOAD to legalizer and regbank select. Add instruction selection tests. This updates our G_SEXTLOAD support to the same level as G_ZEXTLOAD. Still missing some legalizer rules for both though.	2023-10-25 00:13:21 -07:00
Craig Topper	35d771fd4f	[RISCV][GISel] Fix failure to legalize non-power of 2 shifts between i32 and i64 on RV64. We weren't legalizing the shift amount to i64.	2023-10-24 23:30:41 -07:00
Matthias Braun	e3cf80c5c1	BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that: * Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiple frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room. * Spread the difference between hottest/coldest block as much as possible to increase precision. * If the hot/cold spread cannot be represented, lose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.	2023-10-24 20:27:39 -07:00
Ruiling, Song	ac24238002	[LowerSwitch] Don't let pass manager handle the dependency (#68662 ) Some passes have the limitation that they only support simple terminators: branch/unreachable/return. Right now, they ask the pass manager to add the LowerSwitch pass to eliminate `switch`. Let's manage this kind of pass dependency ourselves. Also add the assertion in the related passes.	2023-10-25 09:24:36 +08:00
Min-Yih Hsu	cdcaef876c	[RISCV][GISel] Add ISel support for SHXADD_UW and SLLI.UW (#69972 ) This patch also includes: - Remove legacy non_imm12 PatLeaf from RISCVInstrInfoZb.td - Implement a custom GlobalISel operand renderer for TrailingZeros SDNodeXForm	2023-10-24 16:26:38 -07:00
Luke Lau	b2accb9d8e	[RISCV] Mark V0 regclasses as larger superclasses of non-V0 classes (#70109 )	2023-10-24 22:13:17 +01:00
Amara Emerson	1b11729dc0	[AArch64][GlobalISel] Add support for post-indexed loads/stores. (#69532 ) Gives small code size improvements across the board at -Os CTMark. Much of the work is porting the existing heuristics in the DAGCombiner.	2023-10-24 13:51:59 -07:00
Benjamin Kramer	4c600bd117	[NVPTX] Add a test to verify the .version with sm_90(a)	2023-10-24 18:01:48 +02:00
Mircea Trofin	ec0645939b	Revert "[mlgo] Fix tests post 760e7d0" This reverts commit ab91e05e48d9ea47b60858dc259bdbf00dfde7fa. This is because 760e7d0 has been reverted in 3fb5b18.	2023-10-24 08:43:32 -07:00
Brandon Wu	7cce908367	[RISCV][GISel][NFC] Correct the test case in constant32.mir (#70003 )	2023-10-24 22:57:05 +08:00
Simon Pilgrim	2df69ed14c	[X86] Add scalar isel test coverage for AND/OR/XOR types Even something as simple as bitlogic ops are showing differences between DAG/Fast/Global ISel - promotion, commutation, load/rmw folding etc.	2023-10-24 15:13:17 +01:00
Simon Pilgrim	f2eef3fab6	[DAG] Add test case for Issue #69965	2023-10-24 13:58:18 +01:00
Benjamin Kramer	858d6a15a0	[wasm] Don't crash on non-simple value types during shuffle combine These still exist during the DAGCombine phase.	2023-10-24 12:35:43 +02:00
Stanislav Mekhanoshin	945e943db7	[AMDGPU] Fix subreg check in the SIFixSGPRCopies (#70007 ) It checks for the copy of subregs, but it checks destination which may never happen in SSA. It misses the subreg check and happily produces S_MOV_B64 out of a subreg COPY. The affected test should have never been formed in the first place because the pass is running in SSA and copies into a subreg shall never happen.	2023-10-24 01:44:58 -07:00
Nikita Popov	eb86de63d9	[IR] Require that ptrmask mask matches pointer index size (#69343 ) Currently, we specify that the ptrmask intrinsic allows the mask to have any size, which will be zero-extended or truncated to the pointer size. However, what the semantics of the specified GEP expansion actually imply is that the mask is only meaningful up to the pointer type index size -- any higher bits of the pointer will always be preserved. In other words, the mask gets 1-extended from the index size to the pointer size. This is also the behavior we want for CHERI architectures. This PR makes two changes: * It spells out the interaction with the pointer type index size more explicitly. * It requires that the mask matches the pointer type index size. The intention here is to make handling of this intrinsic more robust, to avoid accidental mix-ups of pointer size and index size in code generating this intrinsic. If a zero-extend or truncate of the mask is desired, it should just be done explicitly in IR. This also cuts down on the amount of testing we have to do, and the things transforms need to check for. As far as I can tell, we don't actually support pointers with different index type size at the SDAG level, so I'm just asserting the sizes match there for now. Out-of-tree targets using different index sizes may need to adjust that code.	2023-10-24 09:54:29 +02:00
Mogball	3fb5b18e81	Revert 24633ea and 760e7d0 "Enable FoldImmediate for X86" This reverts commits 24633eac38d46cd4b253ba53258165ee08d886cd and 760e7d00d142ba85fcf48c00e0acc14a355da7c3. I have confirmed that these commits are introducing a new crash in the peephole optimizer. I have minimized a test case, which you can find below.
```llvmir
; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "/mnt/big/modular/Kernels/mojo/Mogg/MOGG.mojo"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare dso_local void @foo({ { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } }, { ptr }, { ptr, i64, i8 })

define dso_local void @bad_fn(ptr %0, ptr %1, ptr %2) {
  %4 = load i64, ptr null, align 8
  %5 = insertvalue [4 x i64] poison, i64 12, 1
  %6 = insertvalue [4 x i64] %5, i64 poison, 2
  %7 = insertvalue [4 x i64] %6, i64 poison, 3
  %8 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } poison, [4 x i64] %7, 1
  %9 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %8, [4 x i64] poison, 2
  %10 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %9, i1 poison, 3
  %11 = icmp ne i64 %4, 1
  %12 = or i1 false, %11
  %13 = select i1 %12, i64 %4, i64 0
  %14 = zext i1 %12 to i64
  %15 = insertvalue [4 x i64] poison, i64 12, 1
  %16 = insertvalue [4 x i64] %15, i64 poison, 2
  %17 = insertvalue [4 x i64] %16, i64 %13, 3
  %18 = insertvalue [4 x i64] poison, i64 %14, 3
  %19 = icmp eq i64 0, 0
  %20 = icmp eq i64 0, 0
  %21 = icmp eq i64 %13, 0
  %22 = and i1 %20, %19
  %23 = select i1 %22, i1 %21, i1 false
  %24 = select i1 %23, i1 %12, i1 false
  %25 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } poison, [4 x i64] %17, 1
  %26 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %25, [4 x i64] %18, 2
  %27 = insertvalue { ptr, [4 x i64], [4 x i64], i1 } %26, i1 %24, 3
  %28 = insertvalue { { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } undef, { ptr, [4 x i64], [4 x i64], i1 } %10, 0
  %29 = insertvalue { { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } %28, { ptr, [4 x i64], [4 x i64], i1 } %27, 1
  br label %31

30:                                               ; preds = %3
  br label %softmax_pass

31:                                               ; preds = %31
  %exitcond.not.i = icmp eq i64 poison, 3
  br i1 %exitcond.not.i, label %37, label %31

32:                                               ; preds = %31
  br i1 poison, label %34, label %33

33:                                               ; preds = %32
  br label %34

34:                                               ; preds = %33, %32
  br i1 poison, label %35, label %36

35:                                               ; preds = %34
  br label %softmax_pass

36:                                               ; preds = %34
  br i1 poison, label %37, label %.critedge.i

37:                                               ; preds = %36
  br i1 poison, label %38, label %.critedge.i

38:                                               ; preds = %37
  br i1 poison, label %40, label %39

39:                                               ; preds = %38
  br label %40

40:                                               ; preds = %39, %38
  br i1 poison, label %.lr.ph28.i, label %._crit_edge.i

.lr.ph28.i:                                       ; preds = %40
  br label %41

41:                                               ; preds = %51, %.lr.ph28.i
  br i1 poison, label %.thread, label %42

42:                                               ; preds = %41
  br i1 poison, label %43, label %44

43:                                               ; preds = %42
  br label %45

44:                                               ; preds = %42
  br label %45

45:                                               ; preds = %44, %43
  br i1 poison, label %46, label %.thread

46:                                               ; preds = %45
  br label %47

.thread:                                          ; preds = %45, %41
  br label %47

47:                                               ; preds = %.thread, %46
  br i1 poison, label %51, label %48

48:                                               ; preds = %47
  br i1 poison, label %49, label %50

49:                                               ; preds = %48
  br label %51

50:                                               ; preds = %48
  br label %51

51:                                               ; preds = %50, %49, %47
  call void @foo({ { ptr, [4 x i64], [4 x i64], i1 }, { ptr, [4 x i64], [4 x i64], i1 } } %29, { ptr } poison, { ptr, i64, i8 } poison)
  br i1 poison, label %._crit_edge.i, label %41

._crit_edge.i:                                    ; preds = %51, %40
  br label %softmax_pass

.critedge.i:                                      ; preds = %37, %36
  br i1 poison, label %.lr.ph.i, label %softmax_pass

.lr.ph.i:                                         ; preds = %.lr.ph.i, %.critedge.i
  store { ptr, [4 x i64], [4 x i64], i1 } %10, ptr poison, align 8
  br i1 poison, label %.lr.ph.i, label %softmax_pass

softmax_pass:                                     ; preds = %.lr.ph.i, %.critedge.i, %._crit_edge.i, %35, %30
  ret void
}
```	2023-10-24 07:08:38 +00:00
pvanhout	300190ffa7	[AMDGPU] Regenerate udiv.ll	2023-10-24 07:59:41 +02:00
Pierre van Houtryve	2bc93584f5	[DAG] Constant Folding for U/SMUL_LOHI (#69437 )	2023-10-24 07:37:55 +02:00
huhu233	dbe8def9cc	[AArch64] Lower mathlib call ldexp into fscale when sve is enabled (#67552 ) The function of 'fscale' is equivalent to mathlib call ldexp, but has better performance. This patch lowers ldexp into fscale when sve is enabled.	2023-10-24 10:17:04 +08:00
Evgenii Kudriashov	cc455033d4	[X86][GlobalISel] Reorganize shift scalar tests (NFC) (#68232 ) Removed duplicated tests from GlobalISel directory	2023-10-24 02:00:30 +02:00
Jeffrey Byrnes	ef33659492	[AMDGPU] Accept arbitrary sized sources in CalculateByteProvider This allows working with e.g. v8i8 / v16i8 sources. It is generally useful, but is primarily beneficial when allowing e.g. v8i8s to be passed to branches directly through registers. As such, this is the first in a series of patches to enable that work. However, it affects https://reviews.llvm.org/D155995, so it has been implemented on top of that. Differential Revision: https://reviews.llvm.org/D159036 Change-Id: Idfcb57dacd0c32cab040fe4dd4ac2ec762750664	2023-10-23 16:07:54 -07:00
Artem Belevich	6115b1b907	[NVPTX] Add lowering for bitcasts float<->v4i8 (#69960 ) .. and move bitcast from a constant for integer-based types into a better suited location. It solves the mystery of why we sometimes used `mov.u32` and sometimes `mov.b32` for loading constants. Now they all should use `.b32`	2023-10-23 13:54:39 -07:00
alex-t	2973febe10	[AMDGPU] Force the third source operand of the MAI instructions to VGPR if no AGPRs are used. (#69720 ) eaf85b9c28 "[AMDGPU] Select VGPR versions of MFMA if possible" prevents the compiler from reserving AGPRs if a kernel has no inline asm explicitly using AGPRs, no calls, and runs at least 2 waves with not more than 256 VGPRs. This, in turn, makes it impossible to allocate AGPR if necessary. As a result, regalloc fails in case we have an MAI instruction that has at least one AGPR operand. This change checks if we have AGPRs and forces operands to VGPR if we do not have them. --------- Co-authored-by: Alexander Timofeev <alexander.timofeev@amd.com>	2023-10-23 19:41:07 +02:00
Philip Reames	25da9bb7d4	[RISCV] Allow swapped operands in reduction formation (#68634 ) Very straightforward, but worth landing on its own in advance of a more complicated generalization.	2023-10-23 10:37:56 -07:00
David Green	2e69407547	[AArch64] Don't generate st2 for 64bit store that can use stp (#69901 ) D142966 made it so that st2 that do not start at element 0 use zip2 instead of st2. This extends that to any 64bit store that has a nearby load that can better become a LDP operation, which is expected to have a higher throughput. It searches up to 20 instructions away for a store to p+16 or p-16.	2023-10-23 18:15:36 +01:00
Sundeep	4554eac5d4	Update call-long1.ll [llvm][test][Hexagon] NFC: test commit	2023-10-23 11:55:42 -05:00
Michael Maitland	4458ba8cef	[RISCV][GISel] Select G_SELECT (G_ICMP, A, B) (#68247 ) If MI is a G_SELECT(G_ICMP(tst, A, B), C, D) then we can use (A, B, tst) as the (LHS, RHS, CC) of the Select_GPR_Using_CC_GPR.	2023-10-23 10:07:15 -04:00
Igor Kirillov	b507509f6a	[AArch64] Allow SVE code generation for fixed-width vectors (#67122 ) This patch allows the generation of SVE code with masks that mimic Neon.	2023-10-23 12:41:34 +01:00
Hans Wennborg	e2fc68c3db	Typos: 'maxium', 'minium'	2023-10-23 10:42:28 +02:00
Nikita Popov	2ad9fde418	[MemDep] Use EarliestEscapeInfo (#69727 ) Use BatchAA with EarliestEscapeInfo instead of callCapturesBefore() in MemDepAnalysis. The advantage of this is that it will also take not-captured-before information into account for non-calls (see test_store_before_capture for a representative example), and that this is a cached analysis. The disadvantage is that EII is slightly less precise than full CapturedBefore analysis. In practice the impact is positive, with gvn.NumGVNLoad going from 22022 to 22808 on test-suite. The impact to compile-time is also positive, mainly in the ThinLTO configuration.	2023-10-23 09:57:26 +02:00
Jon Roelofs	461918e290	[CodeGen][Remarks] Add the function name to the stack size remark (#69346 ) It is already present in the yaml, but missing from the printed diagnostics.	2023-10-22 11:39:02 -07:00
Ashley Nelson	47f0f8ca47	[WebAssembly] Add exp10 libcall signatures (#69661 ) The llvm.exp.* family of intrinsics and their corresponding libcalls were recently added, which means we need to know their signatures.	2023-10-20 12:15:48 -07:00
Craig Topper	cfdafc1e70	[RISCV][GISel] Support G_PTRTOINT and G_INTTOPTR (#69542 ) Legalizer, register bank selection, and instruction selection.	2023-10-20 12:03:09 -07:00
Craig Topper	f533e8ca9f	Recommit "[RISCV][GISel] Disable call lowering for integers larger than 2XLen. (#69144 )" Remove bad test for >2x XLen scalar. Don't restrict struct returns if they aren't homogenous. Original commit message: Types larger than 2XLen are passed indirectly which is not supported yet. Currently, we will incorrectly pass X10 multiple times.	2023-10-20 11:51:47 -07:00
Luke Lau	b4729f79ed	[RISCV] Use LMUL=1 for vmv_s_x_vl with non-undef passthru (#66659 ) We currently shrink the type of vmv_s_x_vl to LMUL=1 when its passthru is undef to avoid constraining the register allocator since it ignores LMUL. This patch relaxes it for non-undef passthrus, which occurs when lowering insert_vector_elt.	2023-10-20 14:19:04 -04:00
weiguozhi	24633eac38	[Peephole] Check instructions from CopyMIs are still COPY (#69511 ) Function foldRedundantCopy records COPY instructions in CopyMIs and uses them later. But other optimizations may delete or modify them. So before using a recorded instruction, we should check that it still exists and is still a COPY instruction.	2023-10-20 08:34:43 -07:00
Ivan Kosarev	b6ecdf0a6b	[AMDGPU] Segregate 16-bit fix-sgpr-copies tests. (#69353 ) The 16-bit instructions used in them are not available on the generic target. This patch makes them run for GFX11.	2023-10-20 14:09:30 +01:00
Wang Pengcheng	f24d9490e5	[RISCV] Match prefetch address with offset (#66072 ) A new ComplexPattern `AddrRegImmLsb00000` is added, which is like `AddrRegImm` except that if the least significant 5 bits aren't all zeros, we will fall back to offset 0.	2023-10-20 14:22:48 +08:00
Wang Pengcheng	af3ead4ccf	[RISCV] Add more prefetch tests (#67644 ) We should be able to merge the offset later.	2023-10-20 14:19:49 +08:00
yubingex007-a11y	f2517cbcee	[X86][AMX] remove related code of X86PreAMXConfigPass (#69569 ) In https://reviews.llvm.org/D125075, we switched to using FastPreTileConfig at O0 and abandoned X86PreAMXConfigPass. We can safely remove the code related to X86PreAMXConfigPass.	2023-10-20 13:43:34 +08:00
Brandon Wu	d1985e3d1f	[RISCV] Support Xsfvqmaccdod and Xsfvqmaccqoq extensions (#68295 ) SiFive Int8 Matrix Multiplication Extensions Specification https://sifive.cdn.prismic.io/sifive/c4f0e51d-4dd3-402a-98bc-1ffad6011259_int8-matmul-spec.pdf	2023-10-20 11:16:20 +08:00
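
The ptrmask semantics described for eb86de63d9 in the log above (the mask is as wide as the pointer's index size and is effectively 1-extended to the pointer size) can be modeled on plain integers. This is only a rough sketch, not LLVM code: the function name `ptrmask_model` and its parameters are hypothetical, and pointers are treated as unsigned integers.

```python
def ptrmask_model(ptr: int, mask: int, ptr_bits: int, index_bits: int) -> int:
    """Hypothetical integer model of the ptrmask rule from the commit message.

    The mask is index_bits wide. Bits of the pointer above the index size
    are always preserved, which is the same as 1-extending the mask.
    """
    assert 0 < index_bits <= ptr_bits
    assert 0 <= ptr < (1 << ptr_bits)
    assert 0 <= mask < (1 << index_bits)

    all_ptr_bits = (1 << ptr_bits) - 1
    index_only = (1 << index_bits) - 1
    # Ones in bit positions [index_bits, ptr_bits): the "1-extension".
    high_ones = all_ptr_bits & ~index_only
    return ptr & (mask | high_ones)


# Assumed example: a 128-bit pointer with a 64-bit index (CHERI-like layout).
p = (0xABCD << 64) | 0x12345677
aligned = ptrmask_model(p, 0xFFFFFFFFFFFFFFF0, ptr_bits=128, index_bits=64)
print(hex(aligned))
```

In this example only the low 64 bits are affected by the mask, so the upper half of the pointer passes through unchanged.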

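
The address-matching rule described for f24d9490e5 in the log above can be illustrated with a small model. This is a sketch under assumptions, not the actual selection code: the name `select_prefetch_operands` is hypothetical, the signed 12-bit range check is assumed from the usual RISC-V immediate format, and "falling back to offset 0" is modeled as folding the offset into the base value.

```python
def select_prefetch_operands(base: int, offset: int) -> tuple:
    """Hypothetical model of an AddrRegImmLsb00000-style match.

    Returns (base, imm). The offset is kept as an immediate only when it
    fits in a signed 12-bit field (assumption) and its low 5 bits are zero;
    otherwise the full address becomes the base and the immediate is 0.
    """
    fits_simm12 = -2048 <= offset <= 2047
    low5_clear = (offset & 0x1F) == 0
    if fits_simm12 and low5_clear:
        return base, offset
    # Fallback: offset 0, with the address computed separately.
    return base + offset, 0


print(select_prefetch_operands(0x1000, 64))  # offset usable as an immediate
print(select_prefetch_operands(0x1000, 68))  # low 5 bits set, falls back
```

Either way the prefetched address is the same; only the split between base and immediate changes.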

52796 Commits