llvm-project

Author	SHA1	Message	Date
Alexey Bataev	76422385c3	[SLP]Support reordered buildvector nodes for better clustering
Patch adds reordering of the buildvector nodes for better clustering of the compatible operations and future vectorization. Includes basic cost estimation and if the transformation is not profitable - reverts it.
AVX512, -O3+LTO
Metric: size..text
Program                                                                          size..text
                                                                                 results      results0     diff
test-suite :: External/SPEC/CINT2006/401.bzip2/401.bzip2.test                    74565.00     75701.00     1.5%
test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test            75773.00     76397.00     0.8%
test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test           75773.00     76397.00     0.8%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test           2014462.00   2024494.00   0.5%
test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test                     395219.00    396979.00    0.4%
test-suite :: MultiSource/Applications/JM/lencod/lencod.test                     857795.00    859667.00    0.2%
test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test                800472.00    802440.00    0.2%
test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test                   590699.00    591403.00    0.1%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test    203006.00    203102.00    0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test        42408.00     42424.00     0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test         12451575.00  12451927.00  0.0%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test        1396480.00   1396448.00   -0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test         1396480.00   1396448.00   -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test                    1047708.00   1047580.00   -0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test    111344.00    111328.00    -0.0%
test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test            1087660.00   1087500.00   -0.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   280664.00    280616.00    -0.0%
test-suite :: MultiSource/Applications/sqlite3/sqlite3.test                      502646.00    502006.00    -0.1%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test                  1033135.00   1031567.00   -0.2%
test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test    2070917.00   2065845.00   -0.2%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   2070917.00   2065845.00   -0.2%
test-suite :: External/SPEC/CINT2006/473.astar/473.astar.test                    33893.00     33797.00     -0.3%
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test      39677.00     39549.00     -0.3%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test             39674.00     39546.00     -0.3%
test-suite :: MultiSource/Benchmarks/MiBench/security-blowfish/security-blowfish.test  11560.00  11512.00  -0.4%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test             653867.00    649275.00    -0.7%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test              653867.00    649275.00    -0.7%
CINT2006/401.bzip2 - extra code vectorized
CINT2017rate/541.leela_r
CINT2017speed/641.leela_s - function _ZN9FastBoard25get_pattern3_augment_specEiib not inlined anymore, better vectorization
CFP2017rate/510.parest_r - better vectorization
JM/ldecod - better vectorization
JM/lencod - same
CINT2006/464.h264ref - extra code vectorized
CFP2006/447.dealII - extra vector code
MiBench/consumer-lame - vectorized 2 loops previously scalar
DOE-ProxyApps-C/miniGMG - small changes
Benchmarks/7zip - extra code vectorized, better vectorization
CFP2017rate/526.blender_r - extra vectorization
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vectorization
MiBench/consumer-jpeg - extra vectorization
CINT2006/400.perlbench - extra vectorization
Prolangs-C/TimberWolfMC - small variations
Applications/sqlite3 - extra function vectorized and inlined
Benchmarks/tramp3d-v4 - extra code vectorized
CINT2017rate/500.perlbench_r
CINT2017speed/600.perlbench_s - extra code vectorized, function digcpy gets vectorized and inlined
CINT2006/473.astar - extra code vectorized
MiBench/telecomm-gsm - extra code vectorized, better vector code
mediabench/gsm - same
MiBench/security-blowfish - extra code vectorized
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - sub4x4_dct function vectorized and gets inlined
RISCV-V, SiFive-p670, O3+LTO
CFP2017rate/510.parest_r - extra vectorization
CFP2017rate/526.blender_r - extra vectorization
MiBench/consumer-lame - extra vectorized code
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/114284	2024-11-06 10:51:15 -05:00
Hari Limaye	fbd89bcc66	Reland "[LTO] Run Argument Promotion before IPSCCP" (#111853 ) Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas. Relands #111163.	2024-11-06 13:54:48 +00:00
Hari Limaye	88e9b373c0	[FuncSpec] Query SCCPSolver in more places (#114964 ) When traversing the use-def chain of an Argument in a candidate specialization, also query the SCCPSolver to see if a Value is constant. This allows us to better estimate the codesize savings of a candidate in the presence of instructions that are a user of the argument we are estimating savings for which also use arguments that have been found constant by IPSCCP. Similarly when estimating the dead basic blocks from branch and switch instructions which become constant, also query the SCCPSolver to see if a predecessor is unreachable.	2024-11-06 13:25:15 +00:00
Simon Pilgrim	e3a0775651	[VectorCombine] foldExtractedCmps - (re-)enable fold on non-commutative binops #114901 exposed that foldExtractedCmps didn't account for non-commutative binops, and the fold was disabled for them by 05e838f428555bcc4507bd37912da60ea9110ef6. This patch re-enables support for non-commutative binops by ensuring that the LHS/RHS arg order of the binop is retained.	2024-11-06 12:10:31 +00:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548 )	2024-11-06 11:53:33 +00:00
Simon Pilgrim	d8354d63db	[VectorCombine] Extend test coverage for #114901 with commuted test case	2024-11-06 11:27:05 +00:00
Mel Chen	4480a22c2b	[LV][EVL] Emit vp.merge intrinsic to enable out-loop reduction in EVL vectorization. (#101641) Following #90184, this patch emits the vp.merge intrinsic, which sets the inactive lanes of a select operation to the RHS instead of undef. Currently, it is applied to out-loop reductions for EVL vectorization. This patch converts select(header_mask, LHS, RHS) into vp.merge(all-true, LHS, RHS, EVL), and always uses the predicated reduction select to set the incoming value of the reduction phi, to support out-loop reduction when using tail folding with EVL. TODO: Postpone the adjustment of the predicated reduction select to VPlanTransform. The current adjustment might be too early, which could lead to a situation where the predicated reduction select is adjusted, but the EVL recipes cannot be successfully generated during VPlanTransform.	2024-11-06 14:53:49 +08:00
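The select-to-vp.merge conversion in the commit above can be illustrated with a small lane-by-lane model. This is only a sketch in Python, not LLVM code: `select` and `vp_merge` are hypothetical helpers that mimic the documented lane semantics (lanes at or past the EVL pivot take the on-false operand), and the header mask is assumed to be "lane index < EVL", as it would be under tail folding.

```python
def select(mask, lhs, rhs):
    # Per-lane select: take lhs where the mask bit is set, rhs otherwise.
    return [l if m else r for m, l, r in zip(mask, lhs, rhs)]


def vp_merge(mask, on_true, on_false, evl):
    # Model of vp.merge: lanes below the pivot (evl) select by mask,
    # lanes at or above the pivot always take the on_false value.
    return [
        t if (i < evl and m) else f
        for i, (m, t, f) in enumerate(zip(mask, on_true, on_false))
    ]


VF = 4
EVL = 3                                   # assumed: only the first 3 lanes are active
header_mask = [i < EVL for i in range(VF)]
lhs = [10, 20, 30, 40]                    # e.g. the updated reduction value
rhs = [1, 2, 3, 4]                        # e.g. the previous reduction phi value

via_select = select(header_mask, lhs, rhs)
via_merge = vp_merge([True] * VF, lhs, rhs, EVL)
```

Under that assumption both forms yield `[10, 20, 30, 4]`: the inactive lane keeps the RHS value, so the reduction can still be completed outside the loop.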
vporpo	320389d428	[SandboxVec][BottomUpVec] Generate vector instructions (#115087 ) This patch implements some very basic code generation, for some opcodes.	2024-11-05 16:27:24 -08:00
Alexey Bataev	c1cec8c0dc	[SLP][NFC]Add a test with missed splat ordering for loads, NFC	2024-11-05 14:08:17 -08:00
Florian Hahn	a353e258ba	[LAA] Don't require Stride == 1/-1 for inbounds pointer AddRecs nowrap. (#113126) If we have a pointer AddRec, the maximum increment is 2^(pointer-index-width - 1) - 1. This means that if incrementing the AddRec wraps, the distance between the previously accessed location and the wrapped location is > 2^(pointer-index-width - 1), i.e. if the GEP for the AddRec is inbounds, this would be poison due to the object being larger than half the pointer index type space. The poison would be immediate UB when the memory access gets executed. Similar reasoning can be applied for decrements. PR: https://github.com/llvm/llvm-project/pull/113126	2024-11-05 22:45:56 +01:00
Andreas Jonson	6d6287af84	[NFC] Fix test for zext(shl(trunc)) fold (#113778) This fold already exists, but there is a call to [shouldChangeType](`91fdfec263/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp (L1202)`) that blocks it if the target layout is missing a definition of the types in the casts. Closes https://github.com/llvm/llvm-project/issues/61650	2024-11-05 18:13:33 +01:00
Alexey Bataev	0c18def2c1	[SLP]Allow interleaving check only if it is less than number of elements Need to check if the interleaving factor is less than total number of elements in loads slice to handle it correctly and avoid compiler crash. Fixes report https://github.com/llvm/llvm-project/pull/112361#issuecomment-2457227670	2024-11-05 07:06:15 -08:00
Nuno Lopes	6070aeb3b7	[Coro] Use poison instead of undef as placeholder [NFC]	2024-11-05 13:51:36 +00:00
Nuno Lopes	928460afc1	[ArgPromotion] Use poison instead of undef as placeholder in deleted metadata [NFC]	2024-11-05 13:44:34 +00:00
Yongtao Huang	f6948e8f9d	[LoopVectorize] Fix typo in branch-weights.ll test CEHCK->CHECK (NFC) (#113574 ) Fix the typo CEHCK. Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>	2024-11-05 14:22:24 +01:00
Simon Pilgrim	05e838f428	[VectorCombine] foldExtractedCmps - disable fold on non-commutative binops The fold needs to be adjusted to correctly track the LHS/RHS operands, which will take some refactoring, for now just disable the fold in this case. Fixes #114901	2024-11-05 11:42:30 +00:00
Simon Pilgrim	a88be11eef	[VectorCombine] Add test coverage for #114901	2024-11-05 11:42:30 +00:00
Nikita Popov	46ccefb123	[CVP] Fix APInt ctor assertion with i1 urem This is creating an APInt with value 2, which may not be well-defined for i1 values. Fix this by replacing the Y.umul_sat(2) with Y.uadd_sat(Y).	2024-11-05 10:03:37 +01:00
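Why the constant 2 is a problem for i1 can be seen in a small model of fixed-width unsigned saturating arithmetic. This is a Python sketch, not the APInt API: `make_const`, `uadd_sat` and `umul_sat` are hypothetical stand-ins, and the range check in `make_const` only imitates the kind of constructor assertion the commit mentions.

```python
def umax(bits):
    # Largest unsigned value representable in `bits` bits.
    return (1 << bits) - 1


def make_const(value, bits):
    # Stand-in for constructing a fixed-width constant; rejects values that do not fit.
    if value > umax(bits):
        raise ValueError(f"{value} does not fit in i{bits}")
    return value


def uadd_sat(a, b, bits):
    # Unsigned saturating add.
    return min(a + b, umax(bits))


def umul_sat(a, b, bits):
    # Unsigned saturating multiply.
    return min(a * b, umax(bits))


def double_via_mul(y, bits):
    # Old form: needs the constant 2 in the same width as y.
    return umul_sat(y, make_const(2, bits), bits)


def double_via_add(y, bits):
    # New form: no constant needed, so it is also well defined for i1.
    return uadd_sat(y, y, bits)
```

For every width of at least 2 bits the two forms agree; for `bits == 1` only `double_via_add` can be evaluated, since 2 is not representable in one bit.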
Matt Arsenault	ea859005b5	SafeStack: Respect alloca addrspace (#112536 ) Just insert addrspacecast in cases where the alloca uses a different address space, since I don't know what else you could possibly do.	2024-11-04 17:51:30 -08:00
Matt Arsenault	30dd1297fa	AMDGPU: Custom expand flat cmpxchg which may access private (#109410 ) 64-bit flat cmpxchg instructions do not work correctly for scratch addresses, and need to be expanded as non-atomic. Allow custom expansion of cmpxchg in AtomicExpand, as is already the case for atomicrmw.	2024-11-04 09:29:38 -08:00
Ramkumar Ramachandra	cd16b077bf	IR: introduce CmpInst::isEquivalence (#111979 ) Steal impliesEquivalanceIf{True,False} (sic) from GVN, and extend it for floating-point constant vectors, and accounting for denormal values. Since InstCombine also performs GVN-like replacements, introduce CmpInst::isEquivalence, and remove the corresponding code in GVN, with the intent of using it in more places. The code in GVN also has a bad FIXME saying that the optimization may be valid in the nsz case, but this is not the case. Alive2 proof: https://alive2.llvm.org/ce/z/vEaK8M	2024-11-04 15:54:59 +00:00
Jay Foad	f8559751fc	[llvm-project] Fix typo "propogate" (#114795 )	2024-11-04 15:33:19 +00:00
zhijian lin	a51712751c	[PowerPC][LLC] Utilize PPC::getNormalizedPPCTargetCPU() to set CPU (#113943 ) Utilize common API in PPCTargetParser (https://github.com/llvm/llvm-project/pull/97541) to set default CPU with same interfaces for LLC. This will update AIX default CPU to pwr7 and LoP powerppc64 default CPU to ppc64.	2024-11-04 09:40:54 -05:00
Alexey Bataev	899336735a	[SLP]Be more pessimistic about poisonous reductions Consider all possible reductions ops as being non-poisoning boolean logical operations, which require freeze to be fully correct. https://alive2.llvm.org/ce/z/TKWDMP Fixes #114738	2024-11-04 06:13:52 -08:00
Alexey Bataev	a15bf88d53	[SLP][NFC]Add a test with missing freeze instruction before reduction, NFC	2024-11-04 04:38:09 -08:00
Hari Limaye	5ed3f46359	[FuncSpec] Improve handling of Comparison Instructions (#114073) When visiting comparison instructions during computation of a specialization's bonus, make use of information from the lattice value of the other operand in the case where we have not found this to have a specific constant value.	2024-11-04 11:39:00 +00:00
Simon Pilgrim	ac1869aa70	[CostModel][X86] Add initial costs for non-lane-crossing one/two input shuffles (#114680) Most of the x86 shuffle instructions operate within each 128-bit subvector lane, but our shuffle costs struggle to handle this and have to fall back to worst case shuffles that reference elements from any lane. This patch detects shuffle masks that we know are "inlane" and enables us to assume a cheaper shuffle cost.	2024-11-04 10:19:02 +00:00
Nikita Popov	8851ea64a5	[ConstantHoist] Fix APInt ctor assertion The result here may require truncation. Fix this by removing the calculateOffsetDiff() helper entirely. As far as I can tell, this code does not actually have to deal with different bitwidths. findBaseConstants() will produce ranges of constants with equal types, which is what maximizeConstantsInRange() will then work on. Fixes assertion reported at: https://github.com/llvm/llvm-project/pull/114539#issuecomment-2453008679	2024-11-04 11:17:24 +01:00
Hari Limaye	daa9af179f	[FuncSpec] Handle ssa_copy intrinsic calls in InstCostVisitor (#114247) Look through ssa_copy intrinsic calls when computing codesize bonus for a specialization. Also remove redundant logic to skip computing codesize bonus for ssa_copy intrinsics, now that these are considered zero-cost by TTI (in PR #75294).	2024-11-04 10:09:50 +00:00
Luke Lau	beb12f92c7	[RISCV] Add +optimized-nfN-segment-load-store (#114414 ) This is a follow up to #111511, where after benchmarking we learnt that the Banana Pi F3 has fast segmented loads for not just NF=2, but also NF=3 and NF=4: https://github.com/preames/bp3-microarch#vlseg_lmul_x_sew_throughput This adds tuning features to allow these segment loads and stores to be costed cheaper and enables it for the spacemit-x60. It also enables +optimized-nf2-segment-load-store by default in the generic tuning to maintain the previous behaviour when compiled without -mcpu or -mtune.	2024-11-04 06:43:58 +08:00
Hubert Tong	5091a359d9	[ConstantFold] Special case log1p +/-0.0 (#114635 ) C's Annex F specifies that log1p +/-0.0 returns the input value; however, this behavior is optional and host C libraries may behave differently. This change applies the Annex F behavior to constant folding by LLVM.	2024-11-02 20:06:39 -04:00
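The Annex F rule from the commit above amounts to returning a zero input unchanged, sign included, rather than deferring to the host library. A minimal Python sketch of that special case follows; `fold_log1p` is a hypothetical name and does not correspond to any function in LLVM.

```python
import math


def fold_log1p(x: float) -> float:
    # Annex F: log1p(+/-0.0) returns the input, so the sign of zero is preserved.
    if x == 0.0:
        return x
    # All other inputs are left to the host math library in this sketch.
    return math.log1p(x)


neg_zero = fold_log1p(-0.0)
pos_zero = fold_log1p(0.0)
```

Handling the zero case before calling the library keeps the folded result independent of how the host implementation treats signed zero.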
Jorge Botto	fcd51dee42	[InstCombine] Factorise Add and Min/Max using distributivity (#101717) This PR fixes part of https://github.com/llvm/llvm-project/issues/92433. It specifically adds the 4 cases mentioned in https://github.com/llvm/llvm-project/issues/92433#issuecomment-2117064459. I've added 8 positive tests, 4 of which are mentioned in the comment above and 4 of which are their commutative equivalents. Alive proof: https://alive2.llvm.org/ce/z/z6eFTb I've also added 8 negative tests, because we want to make sure we do not optimise when the relevant flags are not present, since the optimisation wouldn't be sound. Alive proof that the optimisation is invalid: https://alive2.llvm.org/ce/z/NvNjTD I did have to make the integer types `i4` to keep Alive from timing out and to fit them all on one page.	2024-11-02 17:08:12 +01:00
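The role of the no-wrap flags can be checked exhaustively on `i4`, the width the commit uses for its Alive proofs. The Python sketch below assumes one representative pattern, smax(a + c, b + c) rewritten to smax(a, b) + c, with the flag in question taken to be nsw on both adds; the four cases actually handled are listed in the linked issue comment, not here. Inputs that would violate nsw are skipped, which models them as poison.

```python
from itertools import product

BITS = 4
LO, HI = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1   # i4 range: -8..7


def wrap(v):
    # Two's-complement wraparound to BITS bits.
    v &= (1 << BITS) - 1
    return v - (1 << BITS) if v > HI else v


def no_signed_wrap(a, b):
    # True when a + b does not overflow the signed range (the nsw condition).
    return LO <= a + b <= HI


def count_mismatches(require_nsw):
    # Count inputs where smax(a+c, b+c) differs from smax(a, b) + c.
    bad = 0
    for a, b, c in product(range(LO, HI + 1), repeat=3):
        if require_nsw and not (no_signed_wrap(a, c) and no_signed_wrap(b, c)):
            continue  # nsw violated: the original is poison, so any result is allowed
        before = max(wrap(a + c), wrap(b + c))
        after = wrap(max(a, b) + c)
        if before != after:
            bad += 1
    return bad
```

With the no-wrap requirement the rewrite never changes the result; without it, inputs such as a = 7, b = 0, c = 1 give 1 before and -8 after, which is the kind of case the negative tests guard against.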
serge-sans-paille	01a103b0b9	[llvm] Fix __builtin_object_size interaction between Negative Offset … (#111827 ) …and Select/Phi When picking a SizeOffsetAPInt through combineSizeOffset, the behavior differs if we're going to apply a constant offset that's positive or negative: If it's positive, then we need to compare the remaining bytes (i.e. Size - Offset), but if it's negative, we need to compare the preceding bytes (i.e. Offset). Fix #111709	2024-11-02 09:14:35 +00:00
Yingwei Zheng	f1e1055c84	[ValueTracking] Compute known bits from recursive select/phi (#113707 ) This patch is inspired by https://github.com/llvm/llvm-project/pull/113686. I found that it removes a lot of unnecessary "and X, 1" in some applications that represent boolean values with int.	2024-11-02 15:45:46 +08:00
Florian Hahn	17bad1a9da	[LV] Bail out on header phis in shouldConsiderInvariant. This fixes an infinite recursion in rare cases. Fixes https://github.com/llvm/llvm-project/issues/113794.	2024-11-01 20:51:25 +00:00
Han-Kuan Chen	a795a18bba	[SLP][REVEC] VF should be scaled when ScalarTy is FixedVectorType. (#114551 )	2024-11-02 03:03:52 +08:00
Simon Pilgrim	718d50d6d0	[VectorCombine] foldPermuteOfBinops - prefer the new fold for matching costs. Minor tweak to #114101 - as we're reducing the instruction count, we should prefer the fold if the old/new costs are the same.	2024-11-01 17:28:37 +00:00
Min-Yih Hsu	64314dedeb	[InlineCost] Print inline cost for invoke call sites as well (#114476) Previously InlineCostAnnotationPrinter only printed inline cost for call instructions. I don't think there is any reason not to analyze invoke and its callee, and this patch adds such support.	2024-11-01 09:55:17 -07:00
Yingwei Zheng	a77dedcacb	[InstSimplify][InstCombine][ConstantFold] Move vector div/rem by zero fold to InstCombine (#114280) Previously we folded `div/rem X, C` into `poison` if any element of the constant divisor `C` is zero or undef. However, it is incorrect when threading udiv over a vector select: https://alive2.llvm.org/ce/z/3Ninx5
```
define <2 x i32> @vec_select_udiv_poison(<2 x i1> %x) {
  %sel = select <2 x i1> %x, <2 x i32> <i32 -1, i32 -1>, <2 x i32> <i32 0, i32 1>
  %div = udiv <2 x i32> <i32 42, i32 -7>, %sel
  ret <2 x i32> %div
}
```
In this case, `threadBinOpOverSelect` folds `udiv <i32 42, i32 -7>, <i32 -1, i32 -1>` and `udiv <i32 42, i32 -7>, <i32 0, i32 1>` into `zeroinitializer` and `poison`, respectively. One solution is to introduce a new flag indicating that we are threading over a vector select, but that requires modifying both `InstSimplify` and `ConstantFold`. However, this optimization doesn't provide benefits to real-world programs: https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/IR/ConstantFold.cpp.html#L908 https://dtcxzyw.github.io/llvm-opt-benchmark/coverage/data/zyw/opt-ci/actions-runner/_work/llvm-opt-benchmark/llvm-opt-benchmark/llvm/llvm-project/llvm/lib/Analysis/InstructionSimplify.cpp.html#L1107 This patch moves the fold into InstCombine to avoid breaking numerous existing tests. Fixes #114191 and #113866 (only poison-safety issue).	2024-11-01 22:56:22 +08:00
Yingwei Zheng	e577f14b67	[InstCombine] Use `m_NotForbidPoison` when folding `(X u< Y) ? -1 : (~X + Y) --> uadd.sat(~X, Y)` (#114345) Alive2: https://alive2.llvm.org/ce/z/mTGCo- We cannot reuse `~X` if `m_AllOnes` matches a vector constant with some poison elts. An alternative solution is to create a new not instead of reusing `~X`. But it isn't worth the effort because we need to add a one-use check. Fixes https://github.com/llvm/llvm-project/issues/113869.	2024-11-01 22:18:44 +08:00
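The scalar identity behind this fold can be confirmed exhaustively for a small width. The Python sketch below uses 8-bit unsigned values and hypothetical helper names; it covers only the arithmetic equivalence of the two forms and does not model the poison-element issue with vector constants that the commit actually fixes.

```python
BITS = 8
MASK = (1 << BITS) - 1   # all-ones value, i.e. -1 as an unsigned i8


def select_form(x, y):
    # (X u< Y) ? -1 : (~X + Y), with wrapping 8-bit arithmetic.
    return MASK if x < y else ((~x & MASK) + y) & MASK


def uadd_sat(a, b):
    # Unsigned saturating add on 8-bit values.
    return min(a + b, MASK)


def sat_form(x, y):
    # uadd.sat(~X, Y)
    return uadd_sat(~x & MASK, y)


all_equal = all(
    select_form(x, y) == sat_form(x, y)
    for x in range(MASK + 1)
    for y in range(MASK + 1)
)
```

The equivalence follows because `~X + Y` equals `MASK - X + Y`, which overflows exactly when `X u< Y`; in that case the saturating add clamps to all ones, matching the select's true arm.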
David Green	0f919444ad	[ValueTracking] Handle recursive phis in knownFPClass (#114008 ) As a follow-on to 113686, this breaks the recursion between phi nodes that have p1 = phi(x, p2) and p2 = phi(y, p1). The knownFPClass can be calculated from the classes of p1 and p2.	2024-11-01 13:38:29 +00:00
Han-Kuan Chen	e4aeeba84c	[SLP][REVEC] When ScalarTy is FixedVectorType, the insertion index should consider the number of elements of ScalarTy. (#114526 )	2024-11-01 21:17:57 +08:00
Nuno Lopes	344d972736	AssumeBundleBuilder: switch placeholder from undef to poison [NFC]	2024-11-01 10:12:10 +00:00
Yingwei Zheng	f16bff1261	[GVN][NewGVN][Local] Handle attributes for function calls after CSE (#114011 ) This patch intersects attributes of two calls to avoid introducing UB. It also skips incompatible call pairs in GVN/NewGVN. However, I cannot provide negative tests for these changes. Fixes https://github.com/llvm/llvm-project/issues/113997.	2024-11-01 12:44:33 +08:00
Lei Wang	bef3b54ea1	[InstrPGO] Avoid using global variable to fix potential data race (#114364) https://github.com/llvm/llvm-project/pull/109837 set a global variable (`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp, which introduced a data race detected by TSan. To fix this, I decouple the flag setting: the flags are now set separately (`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).	2024-10-31 21:28:13 -07:00
Yingwei Zheng	96b14f2ccb	[Reland][InstCombine] Fix FMF propagation in `foldSelectIntoOp` (#114499 ) Relands #114356. Compared to the last version, this patch only merges poison-generating/nsz flags from the select to fix LV regression in `llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.	2024-11-01 12:22:57 +08:00
c8ef	cf0b6cc711	Revert "[ConstantFold] Fold `tgamma` and `tgammaf` when the input parameter is a constant value." (#114496 ) Reverts llvm/llvm-project#114065	2024-11-01 09:26:11 +08:00
c8ef	1f07f995cc	[ConstantFold] Fold `tgamma` and `tgammaf` when the input parameter is a constant value. (#114065 ) This patch adds support for constant folding for the `tgamma` and `tgammaf` libc functions.	2024-11-01 09:07:55 +08:00
Ruiling, Song	54d31bde32	Reapply "StructurizeCFG: Optimize phi insertion during ssa reconstruction (#101301 )" (#114347 ) This reverts commit be40c723ce2b7bf2690d22039d74d21b2bd5b7cf.	2024-11-01 08:29:59 +08:00
Florian Hahn	b021464d35	[VPlan] Introduce scalar loop header in plan, remove VPLiveOut. (#109975 ) Update VPlan to include the scalar loop header. This allows retiring VPLiveOut, as the remaining live-outs can now be handled by adding operands to the wrapped phis in the scalar loop header. Note that the current version only includes the scalar loop header, no other loop blocks and also does not wrap it in a region block. PR: https://github.com/llvm/llvm-project/pull/109975	2024-10-31 21:36:44 +01:00

30286 Commits