llvm-project

Author	SHA1	Message	Date
Philip Reames	0517772b4a	Delete unused PoisonChecking utility pass This was introduced ~5yrs ago (by me), and has never really gotten any adoption. By now, it's significantly out of sync with new/changed poison propagation rules. The idea is still reasonable, but the imagined use case is largely covered by alive2 these days anyways.	2024-12-19 14:23:38 -08:00
Florian Hahn	5f096fd221	Revert "[LoopVectorizer] Add support for partial reductions (#92418 )" This reverts commit 060d62b48aeb5080ffcae1dc56e41a06c6f56701. It looks like this is triggering an assertion when building llvm-test-suite on ARM64 macOS. Reproducer from MultiSource/Benchmarks/Ptrdist/bc/number.c target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-n32:64-S128-Fn32" target triple = "arm64-apple-macosx15.0.0" define void @test(i64 %idx.neg, i8 %0) #0 { entry: br label %while.body while.body: ; preds = %while.body, %entry %n1ptr.0.idx131 = phi i64 [ %n1ptr.0.add, %while.body ], [ %idx.neg, %entry ] %n2ptr.0.idx130 = phi i64 [ %n2ptr.0.add, %while.body ], [ 0, %entry ] %sum.1129 = phi i64 [ %add99, %while.body ], [ 0, %entry ] %n1ptr.0.add = add i64 %n1ptr.0.idx131, 1 %conv = sext i8 %0 to i64 %n2ptr.0.add = add i64 %n2ptr.0.idx130, 1 %1 = load i8, ptr null, align 1 %conv97 = sext i8 %1 to i64 %mul = mul i64 %conv97, %conv %add99 = add i64 %mul, %sum.1129 %cmp94 = icmp ugt i64 %n1ptr.0.idx131, 0 %cmp95 = icmp ne i64 %n2ptr.0.idx130, -1 %2 = and i1 %cmp94, %cmp95 br i1 %2, label %while.body, label %while.end.loopexit while.end.loopexit: ; preds = %while.body %add99.lcssa = phi i64 [ %add99, %while.body ] ret void } attributes #0 = { "target-cpu"="apple-m1" } > opt -p loop-vectorize Assertion failed: ((VF.isScalar() \|\| V->getType()->isVectorTy()) && "scalar values must be stored as (0, 0)"), function set, file VPlan.h, line 284.	2024-12-19 21:46:51 +00:00
Thurston Dang	d33a2c5811	[BoundsSan] Update BoundsChecking.cpp to use no-merge attribute where applicable (#120620 ) https://github.com/llvm/llvm-project/pull/65972 introduced -ubsan-unique-traps and -bounds-checking-unique-traps, which attach the function size to the ubsantrap intrinsic. https://github.com/llvm/llvm-project/pull/117651 changed ubsan-unique-traps to use nomerge instead of the function size, but did not update -bounds-checking-unique-traps. This patch adds nomerge to bounds-checking-unique-traps.	2024-12-19 13:31:29 -08:00
Finn Plummer	45c01e8a33	[NFC][TargetTransformInfo][VectorUtils] Consolidate `isVectorIntrinsic...` api (#117635 ) - update `VectorUtils:isVectorIntrinsicWithScalarOpAtArg` to use TTI for all uses, to allow specification of target specific intrinsics - add TTI to the `isVectorIntrinsicWithStructReturnOverloadAtField` api - update TTI api to provide `isTargetIntrinsicWith...` functions and consistently name them - move `isTriviallyScalarizable` to VectorUtils - update all uses of the api and provide the TTI parameter Resolves #117030	2024-12-19 11:54:26 -08:00
Kazu Hirata	2886576944	[memprof] clang-format MemProf-related files (NFC) (#120504 )	2024-12-19 10:25:29 -08:00
Abhay Kanhere	cc246d4a29	[Transforms][CodeExtraction] bug fix regions with stackrestore (#118564 ) Ensure code extraction for outlining to a function does not create a function with stacksave of caller to restore stack (e.g. tail call).	2024-12-19 09:19:11 -07:00
Veera	6f8afafd30	[InstCombine] Fold `A == MIN_INT ? B != MIN_INT : A < B` to `A < B` (#120177 ) This PR folds: `A == MIN_INT ? B != MIN_INT : A < B` to `A < B` `A == MAX_INT ? B != MAX_INT : A > B` to `A > B` Proof: https://alive2.llvm.org/ce/z/bR6E2s This helps in optimizing comparison of optional unsigned non-zero types in https://github.com/rust-lang/rust/issues/49892. Rust compiler's current output: https://rust.godbolt.org/z/9fxfq3Gn8	2024-12-19 22:52:55 +08:00
Nicholas Guy	060d62b48a	[LoopVectorizer] Add support for partial reductions (#92418 ) Following on from https://github.com/llvm/llvm-project/pull/94499, this patch adds support to the Loop Vectorizer to emit the partial reduction intrinsics where they may be beneficial for the target. --------- Co-authored-by: Samuel Tebbs <samuel.tebbs@arm.com>	2024-12-19 11:42:40 +00:00
David Sherwood	c18fda02e1	[LoopVectorize] Use new single string variant of reportVectorizationFailure (#120414 )	2024-12-19 10:07:13 +00:00
DianQK	e7a4d78ad3	[SLP] Check if instructions exist after vectorization (#120434 ) Fixes #120433.	2024-12-19 06:21:57 +08:00
Kazu Hirata	ac8a9f8fff	[memprof] Undrift MemProfRecord (#120138 ) This patch undrifts source locations in MemProfRecord before readMemprof starts the matching process. The theory of operation is as follows: 1. Collect the lists of direct calls, one from the IR and the other from the profile. 2. Compute the correspondence (called undrift map in the patch) between the two lists with longestCommonSequence. 3. Apply the undrift map just before readMemprof consumes MemProfRecord. The new function is gated by a flag that is off by default.	2024-12-18 14:21:25 -08:00
Florian Hahn	5ca3794e82	[VPlan] Move initial VPlan block creation to constructor. (NFC) This sets up the initial blocks needed to initialize a VPlan directly in the constructor. This will allow tracking of all created blocks directly in VPlan, simplifying block deletion.	2024-12-18 22:00:30 +00:00
Teresa Johnson	2916352936	[MemProf] Skip unmatched callers when cloning (#120455 ) Don't unnecessarily clone for a caller that wasn't matched to a call instruction. This necessitated updating a couple of tests that were either unnecessarily cloning or unnecessarily processing an allocation and hinting it not cold.	2024-12-18 12:47:19 -08:00
Florian Hahn	6910aec097	[VPlan] Don't use VPlan ctor taking trip count in most unit tests (NFC). Update tests to use constructor not passing a trip count VPValue. The tests don't need that and are simpler as a result.	2024-12-18 19:57:09 +00:00
Alexander Kornienko	23a239267e	Revert "[InstCombine] Infer nuw for gep inbounds from base of object" (#120460 ) Reverts llvm/llvm-project#119225 due to the lack of sanitizer support, large potential of breaking code containing latent UB, non-trivial localization and investigation, and what seems to be a bad interaction with msan (a test is in the works). Related discussions: https://github.com/llvm/llvm-project/pull/119225#issuecomment-2551904822 https://github.com/llvm/llvm-project/pull/118472#issuecomment-2549986255	2024-12-18 19:06:34 +01:00
Florian Hahn	0e8d022ffe	[VPlan] Handle exit phis with multiple operands in addUsersInExitBlocks. (#120260 ) Currently the addUsersInExitBlocks incorrectly assumes exit phis only have a single operand, which may not be the case for loops with early exits when they share a common exit block. Also further relax the assertion in fixupIVUsers to allow exit values if they come from the loop latch/middle.block. PR: https://github.com/llvm/llvm-project/pull/120260	2024-12-18 14:47:16 +00:00
Simon Pilgrim	fbc18b85d6	Revert "[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different" (#120422 ) Reverts llvm/llvm-project#115209 - investigating a reported regression	2024-12-18 13:32:53 +00:00
Yingwei Zheng	6f68010f91	[InstCombine] Drop samesign flags in `foldLogOpOfMaskedICmps_NotAllZeros_BMask_Mixed` (#120373 ) Counterexamples: https://alive2.llvm.org/ce/z/6Ks8Qz Closes https://github.com/llvm/llvm-project/issues/120361.	2024-12-18 20:40:33 +08:00
David Sherwood	13107cb094	[LoopVectorize] Enable more early exit vectorisation tests (#117008 ) PR #112138 introduced initial support for dispatching to multiple exit blocks via split middle blocks. This patch fixes a few issues so that we can enable more tests to use the new enable-early-exit-vectorization flag. Fixes are: 1. The code to bail out for any loop live-out values happens too late. This is because collectUsersInExitBlocks ignores induction variables, which get dealt with in fixupIVUsers. I've moved the check much earlier in processLoop by looking for outside users of loop-defined values. 2. We shouldn't yet be interleaving when vectorising loops with uncountable early exits, since we've not added support for this yet. 3. Similarly, we also shouldn't be creating vector epilogues. 4. Similarly, we shouldn't enable tail-folding. 5. The existing implementation doesn't yet support loops that require scalar epilogues, although I plan to add that as part of PR #88385. 6. The new split middle blocks weren't being added to the parent loop.	2024-12-18 09:25:45 +00:00
hanbeom	b7a8d9584c	[VectorCombine] Combine scalar fneg with insert/extract to vector fneg when length is different (#115209 ) insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index -> shuffle DestVec, (shuffle (fneg SrcVec), poison, SrcMask), Mask Original combining left the combine between vectors of different lengths as a TODO.	2024-12-18 07:47:42 +00:00
Vitaly Buka	55e87a79b9	[BoundsChecking] Add parameters to pass (#119894 ) This check is a part of UBSAN, but does not support verbose output like other UBSAN checks. This is a step to fix that.	2024-12-17 22:07:14 -08:00
Luke Lau	c2a879ecaa	[VPlan] Fix VPTypeAnalysis cache clobbering in EVL transform (#120252 ) When building SPEC CPU 2017 with RISC-V and EVL tail folding, this assertion in VPTypeAnalysis would trigger during the transformation to EVL recipes: `d8a0709b10/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (L135-L142)` It was caused by this recipe: ``` WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6> ``` Having its type inferred as i16, when ir<%add33> and ir<0> had inferred types of i32 somehow. The cause of this turned out to be because the VPTypeAnalysis cache was getting clobbered: In this transform we were erasing recipes but keeping around the same mapping from VPValue* to Type. In the meantime, new recipes would be created which would have the same address as the old value. They would then incorrectly get the old erased VPValue's cached type: ``` --- before --- 0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6> 0x600001ec5450: <badref> <- some value that was erased --- after --- 0x600001ec5030: WIDEN ir<%mul21.neg> = vp.mul vp<%11>, ir<0>, vp<%6> 0x600001ec5450: WIDEN ir<%shr> = vp.or ir<%add33>, ir<0>, vp<%6> <- a new value that happens to have the same address ``` This fixes this by deferring the erasing of recipes till after the transformation. The test case might be a bit flakey since it just happens to have the right conditions to recreate this. I tried to add an assert in inferScalarType that every VPValue in the cache was valid, but couldn't find a way of telling if a VPValue had been erased. --------- Co-authored-by: Florian Hahn <flo@fhahn.com>	2024-12-18 11:28:28 +08:00
Luke Lau	4a7f60d328	[VPlan] Handle VPWidenCastRecipe without underlying value in EVL transform (#120194 ) This fixes a crash that shows up when building SPEC CPU 2017 with EVL tail folding on RISC-V. A VPWidenCastRecipe doesn't always have an underlying value, and in the case of this crash this happens whenever a widened cast is created via truncateToMinimalBitwidths. Fix this by just using the opcode stored in the recipe itself. I think a similar issue exists with VPWidenIntrinsicRecipe and how it's widened, but I haven't run into any crashes with it just yet.	2024-12-18 11:28:07 +08:00
Teresa Johnson	a15e7b11da	[MemProf] Add option to hint allocations at a given cold byte percentage (#120301 ) Optionally unconditionally hint allocations as cold or not cold during the matching step if the percentage of bytes allocated is at least that of the given threshold.	2024-12-17 15:53:56 -08:00
Florian Hahn	eb59fe8d04	[VPlan] Remove redundant assignment in VPReductionPHIRecipe (NFC) Suggested post-commit for 0e528ac404e13ed2d952a2d83aaf8383293c851e.	2024-12-17 21:32:40 +00:00
Florian Hahn	4ad0fdd163	[VPlan] Remove reverse() of predecessors from VPInstruction::generate. This was originally done to reduce the diff for the change. Remove it and update the remaining tests. NFC modulo reordering of incoming values. Clean up after https://github.com/llvm/llvm-project/pull/114292.	2024-12-17 20:44:32 +00:00
Simon Pilgrim	5287299f88	[VectorCombine] foldShuffleOfBinops - prefer same cost fold if it reduces instruction count (#120216 ) We don't fold "shuffle (binop), (binop)" -> "binop (shuffle), (shuffle)" if the old/new costs are equal, but we can relax this if either new shuffle will constant fold as it will reduce instruction count.	2024-12-17 18:10:20 +00:00
Simon Pilgrim	146240ef1c	Fix MSVC " 32-bit shift implicitly converted to 64 bits" warnings. NFC.	2024-12-17 16:01:19 +00:00
Florian Hahn	a487b792e2	[TySan] Add initial Type Sanitizer (LLVM) (#76259 ) This patch introduces the LLVM components of a type sanitizer: a sanitizer for type-based aliasing violations. It is based on Hal Finkel's https://reviews.llvm.org/D32198. C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit these given TBAA metadata added by Clang. Roughly, a pointer of given type cannot be used to access an object of a different type (with, of course, certain exceptions). Unfortunately, there's a lot of code in the wild that violates these rules (e.g. for type punning), and such code often must be built with -fno-strict-aliasing. Performance is often sacrificed as a result. Part of the problem is the difficulty of finding TBAA violations. Hopefully, this sanitizer will help. For each TBAA type-access descriptor, encoded in LLVM's IR using metadata, the corresponding instrumentation pass generates descriptor tables. Thus, for each type (and access descriptor), we have a unique pointer representation. Excepting anonymous-namespace types, these tables are comdat, so the pointer values should be unique across the program. The descriptors refer to other descriptors to form a type aliasing tree (just like LLVM's TBAA metadata does). The instrumentation handles the "fast path" (where the types match exactly and no partial-overlaps are detected), and defers to the runtime to handle all of the more-complicated cases. The runtime, of course, is also responsible for reporting errors when those are detected. The runtime uses essentially the same shadow memory region as tsan, and we use 8 bytes of shadow memory, the size of the pointer to the type descriptor, for every byte of accessed data in the program. The value 0 is used to represent an unknown type. The value -1 is used to represent an interior byte (a byte that is part of a type, but not the first byte). 
The instrumentation first checks for an exact match between the type of the current access and the type for that address recorded in the shadow memory. If it matches, it then checks the shadow for the remainder of the bytes in the type to make sure that they're all -1. If not, we call the runtime. If the exact match fails, we next check if the value is 0 (i.e. unknown). If it is, then we check the shadow for the remainder of the bytes in the type (to make sure they're all 0). If they're not, we call the runtime. We then set the shadow for the access address and set the shadow for the remaining bytes in the type to -1 (i.e. marking them as interior bytes). If the type indicated by the shadow memory for the access address is neither an exact match nor 0, we call the runtime. The instrumentation pass inserts calls to the memset intrinsic to set the memory updated by memset, memcpy, and memmove, as well as allocas/byval (and for lifetime.start/end) to reset the shadow memory to reflect that the type is now unknown. The runtime intercepts memset, memcpy, etc. to perform the same function for the library calls. The runtime essentially repeats these checks, but uses the full TBAA algorithm, just as the compiler does, to determine when two types are permitted to alias. In a situation where access overlap has occurred and aliasing is not permitted, an error is generated. Clang's TBAA representation currently has a problem representing unions, as demonstrated by the one XFAIL'd test in the runtime patch. We'll update the TBAA representation to fix this, and at the same time, update the sanitizer. When the sanitizer is active, we disable actually using the TBAA metadata for AA. This way we're less likely to use TBAA to remove memory accesses that we'd like to verify. As a note, this implementation does not use the compressed shadow-memory scheme discussed previously (http://lists.llvm.org/pipermail/llvm-dev/2017-April/111766.html). 
That scheme would not handle the struct-path (i.e. structure offset) information that our TBAA represents. I expect we'll want to further work on compressing the shadow-memory representation, but I think it makes sense to do that as follow-up work. It goes together with the corresponding clang changes (https://github.com/llvm/llvm-project/pull/76260) and compiler-rt changes (https://github.com/llvm/llvm-project/pull/76261) PR: https://github.com/llvm/llvm-project/pull/76259	2024-12-17 13:57:34 +00:00
Nikita Popov	1157187496	[VPlan] Propagate all GEP flags (#119899 ) Store GEPNoWrapFlags instead of only InBounds and propagate them.	2024-12-17 13:48:50 +01:00
David Green	2a7ed2c1aa	[SROA] Protect against calling the alloca ptr In case we are calling the alloca ptr directly, check that the Use is a normal operand to the call. Fortran is a funny language.	2024-12-17 09:21:15 +00:00
Artem Pianykh	fbdbb13d5b	[NFC][Utils] Eliminate DISubprogram set from BuildDebugInfoMDMap (#118625 ) Summary: Previously, we'd add all SPs distinct from the cloned one into a set. Then when cloning a local scope we'd check if it's from one of those 'distinct' SPs by checking if it's in the set. We don't need to do that. We can just check against the cloned SP directly and drop the set. Test Plan: ninja check-llvm-unit check-llvm	2024-12-17 08:57:59 +00:00
Florian Hahn	58cfa39861	[VPlan] Remove legacy VPlan() constructors (NFC). The constructors were retained to reduce the diff during transition. Remove them now.	2024-12-17 08:22:22 +00:00
Luke Lau	fba3e069b4	[VPlan] Remove overlapping VPInstruction::mayWriteToMemory. NFCI (#120039 ) VPInstruction has a definition of mayWriteToMemory, which seems to only be used by VPlanSLP. However VPInstructions are already handled in VPRecipeBase::mayWriteToMemory, and everywhere else seems to use this definition. I think these should be the same for all intents and purposes. The VPRecipeBase definition is more conservative but returns true for stores/calls/invokes/SLPStores.	2024-12-17 11:02:55 +08:00
Florian Hahn	0e528ac404	[VPlan] Use start value operand for FindLastIV reduction phis. Update VPReductionPHIRecipe::execute to use the start value from the start value operand of the recipe. This is needed to make sure we resume from the correct value during epilogue vectorization. At the moment, the start value is set to the sentinel value in adjustRecipesForReductions, as the original start value needs to be used when creating ResumePhi recipes. Fixes a mis-compile introduced by b3cba9be41bfa8 in SPEC2017 on AArch64.	2024-12-16 23:29:49 +00:00
Artem Pianykh	8402a0fab0	[NFC][Utils] Extract CloneFunctionBodyInto from CloneFunctionInto (#118624 ) Summary: This and previously extracted `CloneFunction*Into` functions will be used in later diffs. Test Plan: ninja check-llvm-unit check-llvm	2024-12-16 22:30:56 +00:00
Artem Pianykh	a9237b1a10	[NFC][Utils] Extract CloneFunctionMetadataInto from CloneFunctionInto (#118623 ) Summary: The new API expects the caller to populate the VMap. We need it this way for a subsequent change around coroutine cloning. Test Plan: ninja check-llvm-unit check-llvm	2024-12-16 20:50:05 +00:00
Florian Hahn	f9120dc2a6	[VPlan] Make sure vector trip count is ready for prepareToExecute (NFC) Split off from https://github.com/llvm/llvm-project/pull/112145. This ensures that getOrCreateVectorTripCount creates the trip count as needed when induction resume value creation is moved to VPlan and no longer creates the vector trip count early.	2024-12-16 20:44:20 +00:00
Florian Hahn	89d5272841	[VPlan] Remove getPreheader(). (NFC) The preheader is now the entry block, connected to the vector.ph. Clean up after https://github.com/llvm/llvm-project/pull/114292.	2024-12-16 19:48:02 +00:00
Kazu Hirata	1dac0cd41f	[memprof] Use ListSeparator (NFC) (#120047 ) ListSeparator from StringExtras.h is essentially the same as FieldSeparator being removed in this patch. ListSeparator returns the empty string on the first use via "operator StringRef()". It returns ", " on subsequent uses.	2024-12-16 09:41:16 -08:00
Simon Pilgrim	8217c2eaef	[VectorCombine] foldShuffleOfBinops - extend to handle icmp/fcmp ops as well (#120075 ) Extend binary instructions matching to match compare instructions + predicate as well.	2024-12-16 17:23:04 +00:00
Vedant Paranjape	b21fa18b44	[LoopVersioning] Add a check to see if the input loop is in LCSSA form (#116443 ) Loop Optimizations expect the input loop to be in LCSSA form. But it seems that LoopVersioning doesn't have any check to see if the loop is actually in LCSSA form. As a result, if we give it a loop which is not in LCSSA form but still correct semantically, the resulting transformation fails to pass through the verifier pass with the following error. Instruction does not dominate all uses! %inc = add nsw i16 undef, 1 store i16 %inc, ptr @c, align 1 Because the loop is not in LCSSA form, LoopVersioning's transformations lead to invalid IR, as some instructions do not dominate all their uses. This patch checks if a loop is in LCSSA form, if not it will call formLCSSARecursively on the loop before passing it to LoopVersioning. Fixes: #36998	2024-12-16 11:55:19 -05:00
Alexey Bataev	d1a7225076	[SLP]Check if the node must keep its original bitwidth Need to check if during previous analysis the node has requested to keep its original bitwidth to avoid incorrect codegen. Fixes #120076	2024-12-16 08:01:22 -08:00
Yingwei Zheng	7d25bcef09	[InstCombine] Recursively replace condition with constant in select arms (#120011 ) This patch is proposed to reduce the number of selects with undefs introduced by https://github.com/llvm/llvm-project/pull/119884.	2024-12-16 21:11:59 +08:00
Florian Hahn	95e509a989	[VPlan] Add VPWidenInduction recipe as common base class (NFC). (#120008 ) This helps to simplify some existing code and new code (https://github.com/llvm/llvm-project/pull/112145) PR: https://github.com/llvm/llvm-project/pull/120008	2024-12-16 09:40:03 +00:00
Yingwei Zheng	003fb2aeb4	[ConstraintElim] Decompose `sub nsw` (#118219 ) Closes https://github.com/llvm/llvm-project/issues/118211.	2024-12-16 16:41:04 +08:00
Luke Lau	4746395bd7	[VPlan] Omit zero add in VPWidenIntOrFpInductionRecipe (#119668 ) I'm not sure if getStepVector was used for other things in the past where StartIdx was non-zero, but nowadays VPWidenIntOrFpInductionRecipe is the only user of it, and just passes zero to it. I presume InstCombine was already catching this so hopefully removing this won't affect codegen.	2024-12-16 11:55:48 +08:00
Florian Hahn	43045051d4	[VPlan] Modernize VPWidenIntOrFpInductionRecipe printing (NFC). Modernize VPWidenIntOrFpInductionRecipe printing by including the result VPValue and all operand VPValues, similar to VPScalarIVStepsRecipe and VPDerivedIVRecipe.	2024-12-15 20:46:52 +00:00
Florian Hahn	e64650d702	[VPlan] Get types and step from VPWidenPointerInductionRecipe (NFC). Use information directly from operands instead of going through IVDescriptor.	2024-12-15 18:52:10 +00:00
Ramkumar Ramachandra	a22578d38c	ConstraintElim: teach fact-transfer about samesign (#115893 ) When the samesign flag is present on an icmp, we can transfer all the facts on the unsigned system to the signed system, and vice-versa: we do this by specializing transferToOtherSystem when samesign is present.	2024-12-15 17:31:58 +00:00
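
Note on 6f8afafd30 ([InstCombine] fold of `A == MIN_INT ? B != MIN_INT : A < B`): the commit links an alive2 proof. As a rough sanity check only, the identity can also be brute-forced over a small bit width. The sketch below assumes signed 8-bit values are representative; the function names are hypothetical and do not come from the patch.

```python
def select_form_min(a, b, lo):
    # A == MIN_INT ? B != MIN_INT : A < B
    return (b != lo) if a == lo else (a < b)


def select_form_max(a, b, hi):
    # A == MAX_INT ? B != MAX_INT : A > B
    return (b != hi) if a == hi else (a > b)


def fold_holds(bits=8):
    """Exhaustively compare both select forms with the plain comparison."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            if select_form_min(a, b, lo) != (a < b):
                return False
            if select_form_max(a, b, hi) != (a > b):
                return False
    return True
```

The check works because when A is the minimum, `A < B` is true exactly when B is not the minimum (and symmetrically for the maximum), so the select adds no information.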

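Note on ac8a9f8fff ([memprof] Undrift MemProfRecord): the three steps in that message (collect two call lists, align them, apply the resulting map) can be sketched with a textbook longest-common-subsequence alignment. This is an illustration under assumptions, not the patch's code: the `(location, callee)` tuple shape and the name `undrift_map` are hypothetical, and LLVM's own `longestCommonSequence` helper has a different interface.

```python
def undrift_map(ir_calls, profile_calls):
    """Map profile locations to IR locations by aligning callee names.

    Both inputs are lists of (location, callee) pairs in source order
    (a hypothetical shape chosen for this sketch).
    """
    n, m = len(ir_calls), len(profile_calls)
    # dp[i][j] = LCS length of ir_calls[i:] and profile_calls[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if ir_calls[i][1] == profile_calls[j][1]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table to recover one optimal alignment.
    mapping = {}
    i = j = 0
    while i < n and j < m:
        if ir_calls[i][1] == profile_calls[j][1]:
            mapping[profile_calls[j][0]] = ir_calls[i][0]
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return mapping
```

For example, with IR calls `[(3, "malloc"), (5, "foo"), (9, "bar")]` and profile calls `[(2, "malloc"), (4, "foo"), (6, "baz"), (8, "bar")]`, the stale profile entry for `baz` is skipped and the remaining three locations are remapped.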

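Note on a487b792e2 ([TySan] Add initial Type Sanitizer): the instrumentation "fast path" described in that message can be modeled in a few lines. This is a simplified sketch, assuming shadow memory is a dictionary from address to descriptor value and that a `False` return stands in for "call the runtime"; the names are hypothetical, and real descriptors are pointers rather than small integers.

```python
UNKNOWN = 0    # shadow value for an unknown type
INTERIOR = -1  # shadow value for a non-first byte of a typed object


def fast_path_check(shadow, addr, size, type_id):
    """Return True if the access passes the inline check, False if the
    (not modeled) runtime would have to be called.

    type_id must be neither UNKNOWN nor INTERIOR.
    """
    head = shadow.get(addr, UNKNOWN)
    rest = [shadow.get(addr + i, UNKNOWN) for i in range(1, size)]

    if head == type_id:
        # Exact match: the remaining bytes must all be interior bytes.
        return all(s == INTERIOR for s in rest)

    if head == UNKNOWN:
        # Unknown type: the remaining bytes must be unknown as well.
        if any(s != UNKNOWN for s in rest):
            return False
        # Record the type for this access (simplification: only done
        # here when the inline check succeeds).
        shadow[addr] = type_id
        for i in range(1, size):
            shadow[addr + i] = INTERIOR
        return True

    # Neither an exact match nor unknown.
    return False
```

A first 4-byte access to untouched memory records the type; a second access of the same type passes, while an access of a different type, or one starting at an interior byte, falls through to the runtime.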
38440 Commits