llvm-project

Author	SHA1	Message	Date
Mingming Liu	dda73336ad	[ThinLTO]Record import type in GlobalValueSummary::GVFlags (#87597) The motivating use case is to support importing function declarations across modules to construct call graph edges for indirect calls [1] when importing the function definition costs too much compile time (e.g., the function is too large or has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function definition but its postlink combined summary contains the function summary or a global alias summary with this function as aliasee, the function definition will be imported from the source module by IRMover. The implementation is in FunctionImporter::importFunctions [2].
2. In order for FunctionImporter to import a declaration of a function, both the function summary and the alias summary need to carry the def / decl state. Specifically, none of the existing summary fields differ across import modules, but the def / decl state is decided by `<ImportModule, Function>`.
This change encodes the def/decl state in `GlobalValueSummary::GVFlags`. In the subsequent changes:
1. The indexing step `computeImportForModule` [3] will compute the set of definitions and the set of declarations for each module, and pass the information on to the bitcode writer.
2. The bitcode writer will look up the def/decl state and set the state when it writes out the flag value. This is demonstrated in https://github.com/llvm/llvm-project/pull/87600
3. The function importer will read the def/decl state when reading the combined summary to figure out two sets of global values, and IRMover will be updated to import the declaration (aka linkGlobalValuePrototype [4]) into the destination module.
- The next change is https://github.com/llvm/llvm-project/pull/87600
[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] `3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)`
[3] `3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)`
[4] `3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)`	2024-04-10 19:46:01 -07:00
Vitaly Buka	d927d1867f	[UBSAN] Emit optimization remarks (#88304)	2024-04-10 16:30:42 -07:00
Oskar Wirga	a9d4ddd98a	[MergeFuncs/CFI] Ensure all type metadata is propagated for CFI (#88218) I noticed that we weren't propagating ALL type metadata that was attached to CFI functions:
# BEFORE
```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028
```
# AFTER
```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028 !type !34029 !type !34030
```
This patch makes sure that the entire vector of metadata is copied over.	2024-04-10 15:37:27 -07:00
Alexey Bataev	2b00a73f62	[SLP]Buildvector for alternate instructions with non-profitable gather operands. If the operands of the potentially alternate node are going to produce buildvector sequences, which result in more instructions than the original code, then such instructions should not be vectorized as an alternate node; it is better to end up with the buildvector node.
Left column - experimental, Right - reference.
Metric: size..text
Program  results  results0  diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test  413680.00  416272.00  0.6%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  12351788.00  12354844.00  0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  664901.00  664949.00  0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  664901.00  664949.00  0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1171371.00  1171355.00  -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1036396.00  1036284.00  -0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  111280.00  111248.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1392113.00  1391361.00  -0.1%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1392113.00  1391361.00  -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  281676.00  281452.00  -0.1%
test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test  3025.00  3019.00  -0.2%
test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test  6351.00  6335.00  -0.3%
Metric: SLP.NumVectorInstructions
Program  results  results0  diff
test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test  15.00  16.00  6.7%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1703.00  1707.00  0.2%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1703.00  1707.00  0.2%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  26241.00  26239.00  -0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  11761.00  11754.00  -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  824.00  822.00  -0.2%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5668.00  5654.00  -0.2%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5668.00  5654.00  -0.2%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  792.00  790.00  -0.3%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  792.00  790.00  -0.3%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  1389.00  1384.00  -0.4%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  596.00  590.00  -1.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test  6.00  5.00  -16.7%
Metric: exec_time
Program  results  results0  diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  99.14  100.00  0.9%
Other changes are not significant (less than 0.1% with exec_time under 5 secs).
SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns remain scalar, smaller code.
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small changes, some extra stores get vectorized.
External/SPEC/CINT2017speed/625.x264_s/625.x264_s, External/SPEC/CINT2017rate/525.x264_r/525.x264_r - x264 has one change in a loop body, in function ssim_end4; some code remains scalar, resulting in smaller code size.
External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code gets vectorized, looks like some other patterns were matched.
MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were vectorized (looks like the graphs become profitable).
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small changes in vectorized code (some small part remains scalar).
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r, External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - many changes caused by the fact that the code of one function becomes smaller (ConvertLCHabToRGB) and this function gets inlined after that.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small changes here and there, some extra code is vectorized, some remains scalar (2 x vectors).
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars + 2 insertelems instead of insert, broadcast, alt code (3 instructions, total 5 insts).
MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph becomes profitable and gets vectorized.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r, External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s - some small graph becomes profitable and gets vectorized.
MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code.
Reviewers: RKSimon, dtcxzyw Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84978	2024-04-10 14:33:56 -04:00
Noah Goldstein	81cdd35c0c	[ValueTracking] Add support for `xor`/`disjoint or` in `isKnownNonZero` Handles cases like `X ^ Y == X` / `X disjoint| Y == X`. Both of these cases have identical logic to the existing `add` case, so this just converts the `add` code to a more general helper. Proofs: https://alive2.llvm.org/ce/z/Htm7pe Closes #87706	2024-04-10 13:13:43 -05:00
Noah Goldstein	2646790155	[ValueTracking] Add tests for `xor`/`disjoint or` in `isKnownNonZero`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	0c57a2e4b4	[ValueTracking] Add support for `xor`/`disjoint or` in `getInvertibleOperands` This strengthens our `isKnownNonEqual` logic with some fairly trivial cases. Proofs: https://alive2.llvm.org/ce/z/4pxRTj Closes #87705	2024-04-10 13:13:43 -05:00
Noah Goldstein	195d278d50	[ValueTracking] Add tests for `xor`/`disjoint or` in `getInvertibleOperands`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	9c545a14c0	[ValueTracking] Add support for `insertelement` in `isKnownNonZero` Inserts don't modify the data, so if all elements that end up in the destination are non-zero the result is non-zero. Closes #87703	2024-04-10 13:13:43 -05:00
Noah Goldstein	8a28b9b8ec	[ValueTracking] Add tests for `insertelement` in `isKnownNonZero`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	87528bfefb	[ValueTracking] Add support for `shufflevector` in `isKnownNonZero` Shuffles don't modify the data, so if all elements that end up in the destination are non-zero the result is non-zero. Closes #87702	2024-04-10 13:13:42 -05:00
Noah Goldstein	c1d3f39ae9	[ValueTracking] Add tests for `shufflevector` in `isKnownNonZero`	2024-04-10 13:13:42 -05:00
Noah Goldstein	f1ee458ddb	[ValueTracking] improve `isKnownNonZero` precision for `smax` Instead of relying on known-bits for strictly positive, use the `isKnownPositive` API. This will use `isKnownNonZero` which is more accurate. Closes #88170	2024-04-10 10:40:49 -05:00
Noah Goldstein	2ff82c2c64	[ValueTracking] Add tests for improving `isKnownNonZero` of `smax`; NFC	2024-04-10 10:40:49 -05:00
Noah Goldstein	37ca6fa1e2	[ValueTracking] Add support for overflow detection functions in `isKnownNonZero` Adds support for: `{s,u}{add,sub,mul}.with.overflow` The logic is identical to the non-overflow binops; we were just missing the cases. Closes #87701	2024-04-10 10:40:48 -05:00
Noah Goldstein	a02b3c0182	[ValueTracking] Add tests for overflow detection functions in `isKnownNonZero`; NFC	2024-04-10 10:40:48 -05:00
Noah Goldstein	41c52217b0	[ValueTracking] Add support for `vector_reduce_{s,u}{min,max}` in `computeKnownBits` Previously missing. We compute by just applying the reduce function on the knownbits of each element. Closes #88169	2024-04-10 10:40:48 -05:00
Noah Goldstein	77d668451a	[ValueTracking] Add support for `vector_reduce_{s,u}{min,max}` in `isKnownNonZero` Previously missing, proofs for all implementations: https://alive2.llvm.org/ce/z/G8wpmG	2024-04-10 10:40:48 -05:00
Noah Goldstein	f9f4aba547	[InstCombine] Add tests for non-zero/knownbits of `vector_reduce_{s,u}{min,max}`; NFC	2024-04-10 10:40:48 -05:00
Alexey Bataev	6ca5a410d2	[SLP]Fix PR87358: broken module, Instruction does not dominate all uses. If the first node is a gather node with extractelement instructions, still need to put the vector value after all instructions, not after the very first one.	2024-04-10 08:24:15 -07:00
Florian Hahn	94ed57dab6	[PhaseOrdering] Add test for #85551. Add test for missed hoisting of checks from std::span https://github.com/llvm/llvm-project/issues/85551	2024-04-10 13:30:30 +01:00
Paschalis Mpeis	e50c4c83b6	[AArch64][TLI] Add TLI mappings for ArmPL modf, sincos, sincospi (#83143) ArmPL 24.04 release fixes a bug concerning these methods, so now they can be re-introduced to TLI mappings.	2024-04-10 09:34:46 +01:00
XChy	313a33b9df	[InstCombine] Reduce nested logical operator if poison is implied (#86823) Fixes #76623 Alive2 proof: https://alive2.llvm.org/ce/z/gX6znJ (I'm not sure how to write a proof for such a transform, maybe there are mistakes) In most cases, `icmp(a, C1) && (other_cond && icmp(a, C2))` will be reduced to `icmp(a, C1) & (other_cond && icmp(a, C2))`, since the latter icmp always implies the poison of the former. After reduction, it's easier to simplify the icmp chain. Similarly, this patch does the same thing for `(A && B) && C --> A && (B & C)`. Maybe we could constrain such reduction to icmps only if there are regressions in benchmarks.	2024-04-10 14:19:44 +08:00
hanbeom	44c79da3ae	[InstCombine] Remove shl if we only demand known signbits of shift source (#79014) This patch resolves the TODO written in commit: `5909c67883` Proof: https://alive2.llvm.org/ce/z/C3VNoR	2024-04-10 11:19:09 +09:00
Noah Goldstein	9170e38575	Add support for `nneg` flag with `uitofp` As noted when #82404 was pushed (canonicalizing `sitofp` -> `uitofp`), different signedness on fp casts can have dramatic performance implications on different backends. So, it makes sense to create a reliable means for the backend to pick its cast signedness if either is correct. Further, this allows us to start canonicalizing `sitofp` -> `uitofp`, which may ease middle-end analysis. Closes #86141	2024-04-09 18:12:33 -05:00
Teresa Johnson	a332cfc986	[MemProf] Perform cloning for each allocation separately (#87112) Restructures the cloning slightly to perform all cloning for each allocation separately. The prior algorithm would sometimes miss cloning opportunities in cases where trimmed cold contexts partially overlapped with longer contexts for different allocations. Most of the change is isolated to the helpers that move edges to new or existing clones, which now support moving a subset of context ids.	2024-04-09 14:12:32 -07:00
Noah Goldstein	71ef04d7cd	[InstCombine] fold `(icmp eq/ne (or disjoint x, C0), C1)` -> `(icmp eq/ne x, C0^C1)` Proof: https://alive2.llvm.org/ce/z/m3xoo_ Closes #87734	2024-04-09 15:38:18 -05:00
Noah Goldstein	5b58eb68ed	[InstCombine] Add tests for folding `(icmp eq/ne (or disjoint x, C0), C1)`; NFC	2024-04-09 15:38:18 -05:00
Florian Hahn	a8ec1eb843	[VPlan] Don't assign slots to VPValues with an underlying value. This makes sure the numbering for VPValues without underlying values is consecutive.	2024-04-09 21:30:51 +01:00
Noah Goldstein	7599d478ef	[InstCombine] Fold `(icmp eq/ne (add nuw x, y), 0)` -> `(icmp eq/ne (or x, y), 0)` `(icmp eq/ne (or x, y), 0)` is probably easier to analyze than `(icmp eq/ne x, -y)` Proof: https://alive2.llvm.org/ce/z/2-VTb6 Closes #88088	2024-04-09 13:56:28 -05:00
Noah Goldstein	759bab0681	[InstCombine] Add tests for folding `(icmp eq/ne (add nuw x, y), 0)`; NFC	2024-04-09 13:56:28 -05:00
Alexey Bataev	910d2de357	[SLP]Fix PR88103: consider the sign of the compare for non-negative operands. Need to improve detection of number of bits, required for the operand, before doing a reduction. If the instruction is incoming operand of the signed compare, need to consider adding an extra bit for signedness.	2024-04-09 10:47:47 -07:00
Alexey Bataev	4dfc55f7e7	[SLP][NFC]Add a test for PR88103, where zext is incoming to signed comparison.	2024-04-09 10:39:25 -07:00
Alexey Bataev	e8e67957fa	[SLP]Fix PR88123: use vectorized operands consistently. Need to use vectorized operands, not the vecop of the extractelement instructions, to avoid false detection of the extra vector operand in the extractelements shuffling.	2024-04-09 08:42:57 -07:00
Simon Pilgrim	3bfd5c6424	[TTI] getCommonMaskedMemoryOpCost - consistently use getScalarizationOverhead instead of ExtractElement costs for address/mask extraction. (#87771) These aren't unknown extraction indices, we will be extracting every address/mask element in sequence.	2024-04-09 15:42:51 +01:00
Florian Hahn	c836983671	[VPlan] Remove unused first mask op from VPBlendRecipe. (#87770) VPBlendRecipe does not use the first mask operand. Removing it allows VPlan-based DCE to remove unused mask computations. This also fixes #87410, where unused Not VPInstructions are considered having only their first lane demanded, but some of their operands providing a vector value due to other users. Fixes https://github.com/llvm/llvm-project/issues/87410 PR: https://github.com/llvm/llvm-project/pull/87770	2024-04-09 11:14:05 +01:00
Noah Goldstein	964df099e1	[ValueTracking] Support non-constant idx for `computeKnownBits` of `insertelement` It's the same logic as before; we just need to intersect what we know about the new Elt and the entire pre-existing Vec. Closes #87707	2024-04-09 01:01:41 -05:00
Noah Goldstein	3a2367561d	[ValueTracking] Add tests for non-constant idx in `computeKnownBits` of `insertelement`; NFC	2024-04-09 01:01:41 -05:00
Matthias Braun	4a812b5912	Verify threadlocal_address constraints (#87841) Check invariants for `llvm.threadlocal.address` intrinsic in IR Verifier.	2024-04-08 17:47:57 -07:00
Vitaly Buka	0646344062	[HWASAN][UBSAN] Reverse random logic (#88070) It feels more intuitive to make a higher P keep more checks.	2024-04-08 17:23:47 -07:00
Alexey Bataev	01d9528ef9	[SLP]Improve final minbitwidth analysis attempt. Added part for demanded bits analysis in the IsPotentiallyTruncated to improve minbitwidth analysis final attempts.
Metric: size..text
Program  results  results0  diff
test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  43069.00  42973.00  -0.2%
test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  43066.00  42970.00  -0.2%
Extra trunc instructions are emitted to operate with <32 x i8> instead of <32 x i16>, will be removed in the next patches.
Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87786	2024-04-08 15:54:30 -04:00
Florian Hahn	977c0a6d29	[LAA] Add tests with non-constant strides & distances. Add a number of LAA test cases with both forward and backward dependences with non-constant strides and dependence distances. This includes test coverage for https://github.com/llvm/llvm-project/issues/87336 Also includes a LoopLoadElimination test to make sure the pass does not crash on non-constant dependence distances.	2024-04-08 19:18:38 +01:00
Min-Yih Hsu	f6315a9572	[AArch64][LoopIdiom] Disable LoopIdiomTransform when NoImplicitFloat is present (#87677) This behavior is aligned with both LoopVectorizer and SLPVectorizer.	2024-04-08 09:10:23 -07:00
Alexey Bataev	4a1c53f9fa	[SLP]Improve minbitwidth analysis for abs/smin/smax/umin/umax intrinsics. https://alive2.llvm.org/ce/z/ivPZ26 for the abs transformations. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/86135	2024-04-08 08:32:35 -07:00
Matt Arsenault	bdf428af98	ValueTracking: Consider demanded elts for vector constants in computeKnownFPClass	2024-04-08 09:32:14 -04:00
Matt Arsenault	2bc637b1ce	ValueTracking: Handle ConstantAggregateZero in computeKnownFPClass	2024-04-08 09:26:12 -04:00
Matt Arsenault	0832b85e0f	ValueTracking: Add baseline tests for vector fpclass handling	2024-04-08 09:26:12 -04:00
Nikita Popov	91189afef5	Revert "[indvars] Missing variables at Og: (#69920)" This reverts commit 739fa1c84b92b8af7dceedf2e5ad808a64e85a57. This introduces a layering violation by using IR in Support headers.	2024-04-08 14:31:52 +09:00
Carlos Alberto Enciso	739fa1c84b	[indvars] Missing variables at Og: (#69920)
https://bugs.llvm.org/show_bug.cgi?id=51735
https://github.com/llvm/llvm-project/issues/51077
In the given test case:
```
 4 ...
 5 void bar() {
 6   int End = 777;
 7   int Index = 27;
 8   char Var = 1;
 9   for (; Index < End; ++Index)
10     ;
11   nop(Index);
12 }
13 ...
```
Missing local variable `Index` after loop `Induction Variable Elimination`. When adding a breakpoint at line `11`, LLDB does not have information on the variable. But it has info on `Var` and `End`.	2024-04-08 05:31:56 +01:00
Alexey Bataev	a612524197	[SLP]Fix the cost of the reduction result to the final type. Need to fix the way the cost is calculated, otherwise wrong cast opcode can be selected and lead to the over-optimistic vector cost. Plus, need to take into account reduction type size. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/87528	2024-04-07 09:51:47 -04:00
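
Two of the InstCombine folds listed above (7599d478ef and 71ef04d7cd) are simple enough to sanity-check by brute force. The following is a minimal sketch in Python, separate from the Alive2 proofs linked in the commit messages. It assumes small fixed-width unsigned integers and models poison by skipping inputs that violate the `nuw` or `disjoint` flag; the function names and the default bit width are hypothetical choices for illustration, not anything taken from LLVM.

```python
# Brute-force check of two icmp folds on small unsigned integers.
# Assumption of this sketch: inputs that would make the flagged
# instruction poison (nuw overflow, non-disjoint or) are simply skipped.


def check_add_nuw_eq_zero(bits: int = 6) -> bool:
    """(icmp eq (add nuw x, y), 0) agrees with (icmp eq (or x, y), 0)."""
    mask = (1 << bits) - 1
    for x in range(mask + 1):
        for y in range(mask + 1):
            if x + y > mask:  # unsigned wrap: `add nuw` would be poison
                continue
            if ((x + y) == 0) != ((x | y) == 0):
                return False
    return True


def check_or_disjoint_eq_const(bits: int = 6) -> bool:
    """(icmp eq (or disjoint x, C0), C1) agrees with (icmp eq x, C0 ^ C1)."""
    mask = (1 << bits) - 1
    for c0 in range(mask + 1):
        for c1 in range(mask + 1):
            for x in range(mask + 1):
                if x & c0:  # shared bits: `or disjoint` would be poison
                    continue
                if ((x | c0) == c1) != (x == (c0 ^ c1)):
                    return False
    return True


if __name__ == "__main__":
    print(check_add_nuw_eq_zero(), check_or_disjoint_eq_const())
```

The `ne` variants follow by negating both sides of each comparison, so they are not checked separately. An exhaustive check at a small width is only a sanity test, not a proof for all bit widths.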

28387 Commits