llvm-project

Author	SHA1	Message	Date
Veera	9d1fbbd2b9	[SROA][NFC] Remove Unused Parameter in `promoteAllocas()` (#128382 ) Removing it because `Function &F` is not used by `promoteAllocas()`.	2025-02-23 11:17:43 -05:00
Florian Hahn	b72bbfc293	[VPlan] Remove fixHeaderPhis (NFC). Removes unneeded code after https://github.com/llvm/llvm-project/pull/124432.	2025-02-23 10:51:20 +00:00
Yingwei Zheng	2ebc69a521	[InstCombine] Add support for GEPs in `simplifyNonNullOperand` (#128365 ) Alive2: https://alive2.llvm.org/ce/z/2KE8zG	2025-02-23 17:19:31 +08:00
Florian Hahn	0859df4e42	[VPlan] Use operands from initial VPInstructions directly (NFC). Use operands from VPInstructions directly during recipe creation. Follow-up as discussed and planned after https://github.com/llvm/llvm-project/pull/124432.	2025-02-22 22:34:35 +00:00
Florian Hahn	30f44c9627	[VPlan] Set values for non-header phis at construction. (NFC) Update HCFG builder to set the incoming values directly at construction for non-header phis. Simplification/clarification as suggested independently in https://github.com/llvm/llvm-project/pull/126388.	2025-02-22 17:27:10 +00:00
Teresa Johnson	eb92157399	[MemProf] Add ability to export or highlight only a portion of graph (#128255 ) To simplify debugging and analysis, particularly for very large applications with large graphs, this patch adds support for either highlighting a single context id or allocation's context ids, and/or only exporting the nodes/edges for a single context id or allocation's context ids. When highlighting, the specified nodes and edges are a brighter color and larger. This can be controlled by the new -memprof-dot-scope={all,alloc,context} flag which controls how much to export, along with two companion flags: -memprof-dot-alloc-id=ID -memprof-dot-context-id=ID These two are interpreted differently depending on the value of -memprof-dot-scope (where "all" is the default). If exporting all, one of the above flags can optionally be passed to highlight the nodes/edges for the given context id or allocation's context ids. If exporting alloc scope, an alloc id must be provided. A context id can optionally be provided to highlight that context. If exporting context scope, a context id must be provided. The ids to use can be obtained either by looking at the full graph, or a context id can be identified from the -memprof-report-hinted-sizes output after PR128188 is merged.	2025-02-22 05:42:46 -08:00
Teresa Johnson	9d6f2647de	[MemProf] Print internal context id when reporting bytes hinted (#128188 ) During the whole program reporting of contexts when hinted byte reporting is enabled via -memprof-report-hinted-sizes, also print the internal context id. This is useful for debugging, as well as for guiding the dot file dumping with some upcoming changes that will accept a context id to focus the graph on a context of interest.	2025-02-22 05:42:28 -08:00
Luke Lau	e23ab73335	[VPlan] Don't convert widen recipes to VP intrinsics in EVL transform (#127180 ) This is a copy of #126177, since it was automatically and permanently closed because I messed up the source branch on my remote. This patch proposes to avoid converting widening recipes to VP intrinsics during the EVL transform. IIUC we initially did this to avoid `vl` toggles on RISC-V. However, we now have the RISCVVLOptimizer pass, which mostly makes this redundant. Emitting regular IR instead of VP intrinsics allows more generic optimisations, both in the middle end and DAGCombiner, and we generally have better patterns in the RISC-V backend for non-VP nodes. Sticking to regular IR instructions is likely a lot less work than reimplementing all of these optimisations for VP intrinsics, and on SPEC CPU 2017 we get noticeably better code generation.	2025-02-22 19:38:11 +08:00
Florian Hahn	b74413bf91	[VPlan] Use VPSingleDef instead of VPValue in HCFG builder (NFC). Use VPSingleDef to remove unneeded casts to a recipe type.	2025-02-22 11:15:37 +00:00
Mikhail Gudim	f5d153ef26	[VectorCombine] Fold binary op of reductions. (#121567 ) Replace a binary op of two reductions with one reduction of the binary op applied to the vectors. For example: ``` %v0_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v0) %v1_red = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %v1) %res = add i32 %v0_red, %v1_red ``` gets transformed to: ``` %1 = add <16 x i32> %v0, %v1 %res = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %1) ```	2025-02-22 06:11:33 -05:00
Florian Hahn	26afa2deea	[VPlan] Create VPInstructions after setting preds in HCFG builder (NFC) Set VPBBs predecessors before creating VPInstructions, as setting incoming values for non-header phis directly there will require predecessors to be available.	2025-02-22 10:33:17 +00:00
Antonio Frighetto	93b263a01c	[SimplifyCFG] Drop unused `LockstepReverseIterator` class (NFC) Unmaintained code has been removed.	2025-02-22 11:26:13 +01:00
Antonio Frighetto	48a6df3604	Reapply "[Utils] Consolidate `LockstepReverseIterator` into own header (NFC)" Common code has been unified and generalized. Original commit: 123dca9b56e1359d8ec7771ea3bd0afd4b1ea6af. Previously reverted because it was accidentally merged incomplete. The issue has been addressed by restoring the missing code.	2025-02-22 11:21:36 +01:00
cooperp	f4e8f6da41	[Reassociate] Use a reference to DataLayout instead of copying the underlying string data (NFC) (#128269 ) I noticed this when looking at all allocations by clang. For a medium sized file this was around 6000 calls to operator new, although I suspect there were more allocations in total, as the SmallVectors in DataLayout may have their own allocations in some cases. In a follow-up I'm tempted to make the DataLayout copy constructor private, to avoid this in future. There are a few tests which copy the DataLayout, and perhaps need to (I didn't check yet), but we could provide a clone() method for them if needed. It's only accidental copying I think we should consider avoiding, not copies by people who really do need them.	2025-02-22 10:37:24 +01:00
Yingwei Zheng	126016b662	[InstCombine] Simplify nonnull pointers (#128111 ) This patch is the follow-up of https://github.com/llvm/llvm-project/pull/127979. It introduces a helper `simplifyNonNullOperand` to avoid duplicate logic. It also addresses the one-use issue in `visitLoadInst`, as discussed in https://github.com/llvm/llvm-project/pull/127979#issuecomment-2671013972. The `nonnull` attribute is also supported. Proof: https://alive2.llvm.org/ce/z/MCKgT9	2025-02-22 15:30:04 +08:00
Alexey Bataev	8ffdc3b207	[SLP]Fix a crash when checking a scalar in a reordered buildvector node Need to check reordered scalars, not the original ones, to correctly check proper scalar.	2025-02-21 14:59:43 -08:00
Teresa Johnson	92e02ad9dc	[MemProf] Display backedges with dotted line in dot graphs (#128235 ) Add checking of this behavior in the postbuild dot graphs, facilitated by PR128226 which marked these edges at the end of the graph building.	2025-02-21 14:49:28 -08:00
Florian Hahn	236fa506d4	Revert "[Utils] Consolidate `LockstepReverseIterator` into own header (NFC) (#116657 )" This reverts commit 123dca9b56e1359d8ec7771ea3bd0afd4b1ea6af. This breaks building on macOS with clang and multiple build bots, including https://lab.llvm.org/buildbot/#/builders/175/builds/13585 llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp: In function ‘bool sinkCommonCodeFromPredecessors(llvm::BasicBlock, llvm::DomTreeUpdater)’: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:2503:3: error: reference to ‘LockstepReverseIterator’ is ambiguous 2503 \| LockstepReverseIterator<true> LRI(UnconditionalPreds); \| ^~~~~~~~~~~~~~~~~~~~~~~	2025-02-21 21:00:28 +00:00
Teresa Johnson	c3d5070086	[MemProf] Refactor backedge computation and invoke earlier (#128226 ) Invoke the backedge computation (refactored as a new method) at the end of the graph construction, instead of at the start of cloning. That makes more logical sense, and it also makes it easier to look at the results in the postbuild dot graph with a follow on change to display those differently.	2025-02-21 12:57:40 -08:00
Antonio Frighetto	123dca9b56	[Utils] Consolidate `LockstepReverseIterator` into own header (NFC) (#116657 ) Common code has been unified and generalized. Not sure whether it is worth generalizing this further, since it looks closely tied to Blocks (it might make sense to rename it to `LockstepReverseInstructionIterator`).	2025-02-21 12:21:33 -08:00
Teresa Johnson	741f923fac	[MemProf] Minor fixes to dot graph printing (#128217 ) Two misc cleanup/improvements to the dot printing. Remove a redundant "style=filled" in the Node attributes. No effect on resulting graph. Add a "color" attribute to the Edge, with the same color name as "fillcolor". The latter only fills in the arrowhead, and the former is what affects the line. This makes the edge colors more visible, previously it was a black edge with a colored in arrowhead. For the second change, I added the new Edge color attributes to the checking in the two "basic.ll" tests, so we get some testing coverage of the full printing. For the other affected tests I removed the final "]'" after the fillcolor so it matches up through that attribute and ignores the rest of the line.	2025-02-21 12:02:06 -08:00
Kazu Hirata	34cebaf73a	[Instrumentation] Avoid repeated hash lookups (NFC) (#128128 )	2025-02-21 11:08:12 -08:00
Ramkumar Ramachandra	2d38be5fd4	[LV] Strip redundant casts (NFC) (#128177 )	2025-02-21 17:37:39 +00:00
Alexey Bataev	894935cb51	[SLP]Represent SLP graph as a tree We can stop using a graph representation of the SLP structure and switch directly to a tree by relying on a single user of each tree node. If the node has multiple uses, other uses must be represented as a separate gather/buildvector node, which then will be combined with the existing vectorized node(s) upon cost estimation/codegen. This allows simplifying the inner structure and turning on some extra optimizations, which could not be turned on for nodes with multiple users (reordering, minbitwidth analysis). AVX512, -O3+LTO Metric: size..text results results0 diff test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 253453.00 254253.00 0.3% test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test 251411.00 252051.00 0.3% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 19114.00 19146.00 0.2% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1399200.00 1399520.00 0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1399200.00 1399520.00 0.0% test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test 304310.00 304326.00 0.0% test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test 304662.00 304678.00 0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12566919.00 12567511.00 0.0% test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1146300.00 1146316.00 0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1159864.00 1159880.00 0.0% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 9407880.00 9407864.00 -0.0% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 9407880.00 9407864.00 -0.0% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1011612.00 1011596.00 -0.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 280584.00 280536.00 -0.0% test-suite :: 
MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 93016.00 93000.00 -0.0% ASCI_Purple/SMG2000 - extra code vectorized, small variations CFP2006/444.namd - small variations, less shuffles Benchmarks/Misc/oourafft - small variations CFP2017rate/538.imagick_r CFP2017speed/638.imagick_s - small variations, less shuffles LCALS/SubsetALambdaLoops - less shuffles LCALS/SubsetARawLoops - less shuffles CFP2017rate/526.blender_r - small variations, extra vector code CFP2006/453.povray - small variations CFP2017rate/511.povray_r - small variations CINT2017rate/502.gcc_r CINT2017speed/602.gcc_s - small variations Benchmarks/tramp3d-v4 - small variations Prolangs-C/TimberWolfMC - small variations DOE-ProxyApps-C++/miniFE - extra code vectorized, small variations DOE-ProxyApps-C++/CLAMR - extra code vectorized, small variations ASCI_Purple/SMG2000 - no significant changes RISCV, -O3+LTO Metric: size..text results results0 diff test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-pr28982b.test 1812.00 1866.00 3.0% test-suite :: MultiSource/Benchmarks/Olden/health/health.test 3946.00 4016.00 1.8% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 513180.00 513550.00 0.1% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 513180.00 513550.00 0.1% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7672198.00 7672202.00 0.0% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7672198.00 7672202.00 0.0% test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 746060.00 746044.00 -0.0% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9497716.00 9497364.00 -0.0% test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 948266.00 948214.00 -0.0% test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 89874.00 89862.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 835492.00 835346.00 -0.0% test-suite :: 
MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 66230.00 66202.00 -0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 946090.00 944206.00 -0.2% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1136404.00 1131854.00 -0.4% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1136404.00 1131854.00 -0.4% gcc-c-torture/execute/GCC-C-execute-pr28982b - better vector code Olden/health - extra vector code CINT2017speed/625.x264_s CINT2017rate/525.x264_r - small variation + improvements in reordering, @pixel_hadamard_ac stopped being vectorized because of some non-effective shuffle recognition by the compiler CINT2017rate/502.gcc_r CINT2017speed/602.gcc_s - small variations CFP2017rate/508.namd_r - small variations CFP2017rate/526.blender_r - small variations CFP2006/453.povray - extra vector code Benchmarks/7zip - extra vector code DOE-ProxyApps-C++/miniFE - small variations CFP2017rate/511.povray_r - extra vector code CFP2017speed/638.imagick_s CFP2017rate/538.imagick_r - extra vector code Reviewers: RKSimon, hiraditya Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/126771	2025-02-21 07:15:02 -05:00
Aleksandr Popov	41437a6067	[LoopSimplifyCFG] Fix SCEV invalidation after removing dead exit (#127536 ) Fixes #127534	2025-02-21 12:26:39 +01:00
vporpo	4d92975b5c	[SandboxVec][Scheduler] Don't allow rescheduling of already scheduled (#128050 ) This patch implements the check for not allowing re-scheduling of instructions that have already been scheduled in a scheduling bundle. Rescheduling should only happen if the instructions were temporarily scheduled in singleton bundles during a previous call to `trySchedule()`.	2025-02-20 16:16:34 -08:00
Vasileios Porpodas	2ff80d2448	[SandboxVec][Scheduler] Fix reassignment of SchedBundle to DGNode When assigning a bundle to a DAG Node that is already assigned to a SchedBundle we need to remove the node from the old bundle.	2025-02-20 15:28:16 -08:00
vporpo	10b99e97ff	[SandboxVec][BottomUpVec] Separate vectorization decisions from code generation (#127727 ) Up until now the generation of vector instructions was taking place during the top-down post-order traversal of vectorizeRec(). The issue with this approach is that the vector instructions emitted during the traversal can be reordered by the scheduler, making it challenging to place them without breaking the def-before-uses rule. With this patch we separate the vectorization decisions (done in `vectorizeRec()`) from the code generation phase (`emitVectors()`). The vectorization decisions are stored in the `Actions` vector and are used by `emitVectors()` to drive code generation.	2025-02-20 10:21:25 -08:00
Simon Pilgrim	2fab6db728	[VectorCombine] foldSelectShuffle - remove extra adds of old shuffles to worklist (#127999 ) We already push the old shuffles to the worklist as part of the replaceValue calls, so we shouldn't need to add them to the deferred list as well - my guess is this was to ensure that the instructions got erased first to help cleanup unused instructions, but eraseInstruction should handle this now.	2025-02-20 18:02:34 +00:00
Kazu Hirata	4a8f414565	[Utils] Avoid repeated hash lookups (NFC) (#127959 )	2025-02-20 08:56:56 -08:00
Kazu Hirata	506b31ec36	[IPO] Avoid repeated hash lookups (NFC) (#127957 )	2025-02-20 08:55:52 -08:00
Florian Hahn	404af37175	[VPlan] Remove stale assertion in HCFG builder. The assertion was left over from a time when VPBBs still had an associated condition bit. This is not the case any more (comment was stale). In case a branch on condition is needed, a BranchOnCond VPInstruction is added when constructing recipes. That's also where it is checked if the condition is available. Exposed by 38376dee9.	2025-02-20 17:01:49 +01:00
Yingwei Zheng	1b78ff6972	[InstCombine] Simplify the pointer operand of store if writing to null is UB (#127979 ) Proof: https://alive2.llvm.org/ce/z/mzVj-u I will add some follow-up patches to avoid duplicate code, support more memory instructions, and bypass gep instructions.	2025-02-20 23:53:45 +08:00
Kazu Hirata	2130b9cea4	[Coroutines] Avoid repeated hash lookups (NFC) (#127956 )	2025-02-19 23:29:46 -08:00
Kazu Hirata	6342095bce	[memprof] Fix a warning This patch fixes: llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:3409:8: error: unused variable 'I' [-Werror,-Wunused-variable]	2025-02-19 14:28:13 -08:00
Teresa Johnson	92b07520bc	[MemProf] Support cloning through recursive cycles (#127429 ) In order to facilitate cloning of recursive cycles, we first identify backedges using a standard DFS search from the root callers, then initially defer recursively invoking the cloning function via those edges. This is because the cloning opportunity along the backedge may not be exposed until the current node is cloned for other non-backedge callers that are cold after the earlier recursive cloning, resulting in a cold predecessor of the backedge. So we recursively invoke the cloning function for the backedges during the cloning of the current node for its caller edges (which were sorted to enable handling cold callers first). There was no significant time or memory overhead measured for several large applications.	2025-02-19 12:44:33 -08:00
Craig Topper	1761066fc6	[GlobalOpt] Remove Function* argument from tryWidenGlobalArrayAndDests. NFC (#127848 ) This is only used to get the Module and the LLVMContext. We can get both of those from the GlobalVariable*.	2025-02-19 12:37:54 -08:00
Björn Pettersson	c833746c6c	[DSE] Make iter order deterministic in removePartiallyOverlappedStores. NFC (#127678 ) In removePartiallyOverlappedStores we iterate over InstOverlapIntervalsTy which is a DenseMap. Change that map into using MapVector to ensure that we apply the transforms in a deterministic order. I've only seen that the order matters if starting to use names for the instructions created when doing the transforms. But such things are a bit annoying when debugging etc.	2025-02-19 21:24:49 +01:00
Craig Topper	2bf473bd54	[GlobalOpt] Don't query TTI on a llvm.memcpy declaration. (#127760 ) Querying TTI creates a Subtarget object, but an llvm.memcpy declaration doesn't have target-cpu and target-feature attributes like functions with definitions. This can cause a warning to be printed on RISC-V because the target-abi in the Module requires floating point, but the subtarget features don't enable floating point. So far we've only seen this in LTO when an -mcpu is not supplied for the TargetMachine. To fix this, get TTI for the calling function instead. Fixes the issue reported here https://github.com/llvm/llvm-project/issues/69780#issuecomment-2665273161	2025-02-19 10:17:07 -08:00
Florian Hahn	a96444af44	[VPlan] Remove dead exit block handling code in HCFGBuilder. The mapping of IR ExitBB to a VPBB isn't used. It also sets an incorrect VPBB for the ExitBB; the region's successor is the middle block, not the exit block. It also unnecessarily triggers an assertion after 38376dee922.	2025-02-19 18:51:45 +01:00
Andreas Jonson	aa847ced07	[InstCombine] handle trunc to i1 in foldSelectICmpAndBinOp (#127390 ) For `trunc nuw` this saves an instruction; otherwise it only produces other instructions without the select, the same behavior as for the bit test before. Proof: https://alive2.llvm.org/ce/z/a6QmyV	2025-02-19 18:29:47 +01:00
Andreas Jonson	8fc03e4ff1	[InstCombine] avoid extra instructions in foldSelectICmpAnd (#127398 ) Disable fold when it will result in more instructions.	2025-02-19 18:09:24 +01:00
Nico Weber	e2ba1b6ffd	Revert "Reapply [CaptureTracking][FunctionAttrs] Add support for CaptureInfo (#125880 )" This reverts commit 0fab404ee874bc5b0c442d1841c7d2005c3f8729. Seems to break LTO builds of clang on Windows, see comments on https://github.com/llvm/llvm-project/pull/125880	2025-02-19 11:32:57 -05:00
Yingwei Zheng	b2659ca44b	[InstCombine] Propagate flags in `foldSelectICmpAndBinOp` (#127437 ) It is always safe to add poison-generating flags for `BinOp Y, Identity`. Proof: https://alive2.llvm.org/ce/z/8BLEpq and https://alive2.llvm.org/ce/z/584Bb4 Then we can propagate flags from one of the arms: ``` select Cond, Y, (BinOp flags Y, Z) -> select Cond, (BinOp flags Y, Identity), (BinOp flags Y, Z) -> BinOp flags Y, (select Cond, Identity, Z) ``` This patch is proposed to avoid information loss caused by https://github.com/llvm/llvm-project/pull/127390.	2025-02-19 09:22:15 +08:00
vporpo	0cc7381543	[SandboxVec][Scheduler] Don't insert scheduled instrs into the ready list (#127688 ) In a particular scenario (see test) we used to insert scheduled instructions into the ready list. This patch fixes this by fixing the trimSchedule() function.	2025-02-18 16:17:46 -08:00
vporpo	0f6c18e8c6	[SandboxVec] Replace hard-coded context save() with transaction-save pass (#127690 ) This patch implements a small region pass that saves the context's state. The pass is now used in the default pipeline to save the context state instead of the hard-coded call to Context::save(). The concept behind this is that the passes themselves should not have to do the actual saving/restoring of the IR state, because that would make it challenging to reorder them in the pipeline. Having separate save/restore passes makes the transformation passes more composable as parts of arbitrary pipelines.	2025-02-18 13:34:51 -08:00
vporpo	5ecce45ea2	[SandboxVec] Move seed collection into its own separate pass (#127132 ) This patch moves the seed collection logic from the BottomUpVec pass into a new Sandbox IR Function pass. The new "seed-collection" pass collects the seeds, builds a region and runs the region pass pipeline.	2025-02-18 11:11:07 -08:00
vporpo	426148b269	[SandboxVec][DAG] Implement DAG maintenance on Instruction removal (#127361 ) This patch implements dependency maintenance upon receiving the notification that an instruction gets deleted.	2025-02-18 10:59:31 -08:00
Kazu Hirata	5d4eb08379	[Analysis] Remove skipSCC (#127412 ) The last use was removed in: commit fa6ea7a419f37befbed04368bcb8af4c718facbb Author: Arthur Eubanks <aeubanks@google.com> Date: Mon Mar 20 11:18:35 2023 -0700	2025-02-18 09:59:12 -08:00
Björn Pettersson	74016728e3	[DSE] Update dereferenceable attributes when adjusting memintrinsic ptr (#125073 ) Consider IR like this: call void @llvm.memset.p0.i64(ptr dereferenceable(28) %p, i8 0, i64 28, i1 false) store i32 1, ptr %p In the past it has been optimized like this: %p2 = getelementptr inbounds i8, ptr %p, i64 4 call void @llvm.memset.p0.i64(ptr dereferenceable(28) %p2, i8 0, i64 24, i1 false) store i32 1, ptr %p As the input IR doesn't guarantee that it is OK to deref 28 bytes starting at the adjusted pointer %p2, the transformation has been a bit flawed. With this patch we make sure to drop any dereferenceable/dereferenceable_or_null attributes when doing such transforms. An alternative would have been to adjust the amount of dereferenceable bytes, but since a memset with a constant length already implies dereferenceability by itself, it is simpler to just drop the attributes. The new filtering of attributes is done using a helper that only keeps attributes that we explicitly handle. For the adjusted mem intrinsic pointers that currently involves "NonNull", "NoUndef" and "Alignment" (when the alignment is known to be fulfilled also after offsetting the pointer). Fixes #115976	2025-02-18 17:51:14 +01:00
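
The VectorCombine fold in f5d153ef26 relies on integer addition being associative and commutative even with wraparound, so adding two reduced sums gives the same result as reducing the lane-wise sum. The following is a minimal sketch of that identity in Python, not the pass itself: `reduce_add`, `before_fold` and `after_fold` are hypothetical helper names, and i32 lanes are modelled by masking to 32 bits.

```python
import random

MASK = 0xFFFFFFFF  # model i32 wraparound arithmetic


def reduce_add(v):
    """Hypothetical model of llvm.vector.reduce.add on i32 lanes."""
    total = 0
    for x in v:
        total = (total + x) & MASK
    return total


def before_fold(v0, v1):
    # Two reductions followed by a scalar add (the input pattern).
    return (reduce_add(v0) + reduce_add(v1)) & MASK


def after_fold(v0, v1):
    # One lane-wise vector add followed by a single reduction.
    lanes = [(a + b) & MASK for a, b in zip(v0, v1)]
    return reduce_add(lanes)


# Spot-check the identity on random 16-lane vectors.
rng = random.Random(0)
for _ in range(100):
    v0 = [rng.randrange(1 << 32) for _ in range(16)]
    v1 = [rng.randrange(1 << 32) for _ in range(16)]
    assert before_fold(v0, v1) == after_fold(v0, v1)
```

The sketch only covers `add`; whether the same rewrite is valid or profitable for other binary ops and reduction kinds is decided by the pass and is not modelled here.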


38986 Commits
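
Commit 92b07520bc describes identifying backedges with a standard DFS from the root callers. As a rough illustration of that textbook technique only, the sketch below marks an edge as a backedge when it points at a node that is still on the DFS stack. The dictionary-based graph and the name `find_backedges` are assumptions made for this example and do not reflect the MemProf context graph data structures.

```python
def find_backedges(callees, roots):
    """Return edges (src, dst) whose dst is still on the DFS stack.

    `callees` maps a node to the list of nodes it has edges to;
    `roots` lists the nodes the search starts from. Both are
    hypothetical inputs for this sketch.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on stack, finished
    state = {}
    backedges = []

    def visit(node):
        state[node] = GRAY
        for succ in callees.get(node, []):
            s = state.get(succ, WHITE)
            if s == GRAY:
                # succ is an ancestor on the current DFS path.
                backedges.append((node, succ))
            elif s == WHITE:
                visit(succ)
        state[node] = BLACK

    for root in roots:
        if state.get(root, WHITE) == WHITE:
            visit(root)
    return backedges


# A small graph with one recursive cycle: a -> b -> a.
graph = {"main": ["a"], "a": ["b"], "b": ["a", "c"], "c": []}
print(find_backedges(graph, ["main"]))
```

The recursive formulation keeps the sketch short; a very deep graph would need an explicit stack instead.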