llvm-project

Author	SHA1	Message	Date
Augie Fackler	9029bda041	[Attributor] Don't crash if getAnalysisResultForFunction() returns null LoopInfo I have no idea what's going on here. This code was moved around/introduced in change cb26b01d57f5 and starts crashing with a NULL dereference once I apply https://reviews.llvm.org/D123090. I assume that I've unwittingly taught the attributor enough that it's able to do more clever things than in the past, and it's able to trip on this case. I make no claims about the correctness of this patch, but it passes tests and seems to fix all the crashes I've been seeing. Differential Revision: https://reviews.llvm.org/D129589	2022-07-12 16:44:06 -04:00
Yuanfang Chen	fcb7d76d65	[coroutine] add nomerge function attribute to `llvm.coro.save` It is illegal to merge two `llvm.coro.save` calls unless their `llvm.coro.suspend` users are also merged. Marks it "nomerge" for the moment. This reverts D129025. Alternative to D129025, which affects other token type users like WinEH. Reviewed By: ChuanqiXu Differential Revision: https://reviews.llvm.org/D129530	2022-07-12 10:39:38 -07:00
Nick Desaulniers	2240d72f15	[X86] initial -mfunction-return=thunk-extern support Adds support for: * `-mfunction-return=<value>` command line flag, and * `__attribute__((function_return("<value>")))` function attribute Where the supported <value>s are: * keep (disable) * thunk-extern (enable) thunk-extern enables clang to change ret instructions into jmps to an external symbol named __x86_return_thunk, implemented as a new MachineFunctionPass named "x86-return-thunks", keyed off the new IR attribute fn_ret_thunk_extern. The symbol __x86_return_thunk is expected to be provided by the runtime the compiled code is linked against and is not defined by the compiler. Enabling this option alone doesn't provide mitigations without corresponding definitions of __x86_return_thunk! This new MachineFunctionPass is very similar to "x86-lvi-ret". The <value>s "thunk" and "thunk-inline" are currently unsupported. It's not clear yet that they are necessary: whether the thunk pattern they would emit is beneficial or used anywhere. Should the <value>s "thunk" and "thunk-inline" become necessary, x86-return-thunks could probably be merged into x86-retpoline-thunks which has pre-existing machinery for emitting thunks (which could be used to implement the <value> "thunk"). Has been found to build+boot with corresponding Linux kernel patches. This helps the Linux kernel mitigate RETBLEED. * CVE-2022-23816 * CVE-2022-28693 * CVE-2022-29901 See also: * "RETBLEED: Arbitrary Speculative Code Execution with Return Instructions." * AMD SECURITY NOTICE AMD-SN-1037: AMD CPU Branch Type Confusion * TECHNICAL GUIDANCE FOR MITIGATING BRANCH TYPE CONFUSION REVISION 1.0 2022-07-12 * Return Stack Buffer Underflow / Return Stack Buffer Underflow / CVE-2022-29901, CVE-2022-28693 / INTEL-SA-00702 SystemZ may eventually want to support "thunk-extern" and "thunk"; both options are used by the Linux kernel's CONFIG_EXPOLINE. 
This functionality has been available in GCC since the 8.1 release, and was backported to the 7.3 release. Many thanks to the folks who provided discreet review off list due to the embargoed nature of this hardware vulnerability. Many Bothans died to bring us this information. Link: https://www.youtube.com/watch?v=IF6HbCKQHK8 Link: https://github.com/llvm/llvm-project/issues/54404 Link: https://gcc.gnu.org/legacy-ml/gcc-patches/2018-01/msg01197.html Link: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html Link: https://arstechnica.com/information-technology/2022/07/intel-and-amd-cpus-vulnerable-to-a-new-speculative-execution-attack/?comments=1 Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce114c866860aa9eae3f50974efc68241186ba60 Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html Link: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00707.html Reviewed By: aaron.ballman, craig.topper Differential Revision: https://reviews.llvm.org/D129572	2022-07-12 09:17:54 -07:00
David Sherwood	6b694d600a	[LoopVectorize] Change PredicatedBBsAfterVectorization to be per VF When calculating the cost of Instruction::Br in getInstructionCost we query PredicatedBBsAfterVectorization to see if there is a scalar predicated block. However, this meant that the decisions being made for a given fixed-width VF were affecting the cost for a scalable VF. As a result we were returning InstructionCost::Invalid pointlessly for a scalable VF that should have a low cost. I encountered this for some loops when enabling tail-folding for scalable VFs. Test added here: Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll Differential Revision: https://reviews.llvm.org/D128272	2022-07-12 14:53:20 +01:00
Nikita Popov	3d475dfeb9	[Mem2Reg] Consistently preserve nonnull assume for uninit load When performing a !nonnull load from uninitialized memory, we should preserve the nonnull assume just like in all other cases. We already do this correctly in the generic mem2reg code, but don't handle this case when using the optimized single-block implementation. Make sure that the optimized implementation exhibits the same behavior as the generic implementation.	2022-07-12 12:53:08 +02:00
Kazu Hirata	ec9a0e36d9	[IPO] Remove addLTOOptimizationPasses and addLateLTOOptimizationPasses (NFC) The last uses were removed on Apr 15, 2022 in commit 2e6ac54cf48aa04f7b05c382c33135b16d3f01ea. Differential Revision: https://reviews.llvm.org/D129460	2022-07-11 20:15:24 -07:00
Florian Hahn	5d135041c5	[LV] Move VPBlendRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-11 16:01:07 -07:00
Justin Cady	3d438ceed1	[InstrProf] Mark __llvm_profile_runtime hidden to match libclang_rt.profile definition Mark the symbol hidden to match INSTR_PROF_PROFILE_RUNTIME_VAR in compiler-rt. Fixes second issue discussed at https://discourse.llvm.org/t/63090 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D128842	2022-07-11 11:29:20 -07:00
David Sherwood	03fee6712a	[LoopVectorize] Add option to use active lane mask for loop control flow Currently, for vectorised loops that use the get.active.lane.mask intrinsic we only use the mask for predicated vector operations, such as masked loads and stores, etc. The loop itself is still controlled by comparing the canonical induction variable with the trip count. However, for some targets this is inefficient when it's cheap to use the mask itself to control the loop. This patch adds support for using the active lane mask for control flow by: 1. Generating the active lane mask for the next iteration of the vector loop, rather than the current one. If there are still any remaining iterations then at least the first bit of the mask will be set. 2. Extract the first bit of this mask and use this bit for the conditional branch. I did this by creating a new VPActiveLaneMaskPHIRecipe that sets up the initial PHI values in the vector loop pre-header. I've also made use of the new BranchOnCond VPInstruction for the final instruction in the loop region. Differential Revision: https://reviews.llvm.org/D125301	2022-07-11 13:46:55 +01:00
David Sherwood	02d6950d84	[LoopVectorize][NFC] Add optional Name parameter to VPInstruction This patch is a simple piece of refactoring that now permits users to create VPInstructions and specify the name of the value being generated. This is useful for creating more readable/meaningful names in IR. Differential Revision: https://reviews.llvm.org/D128982	2022-07-11 09:23:24 +01:00
Florian Hahn	6a4bc452f8	[LV] Move VPWidenGEPRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-10 17:10:17 -07:00
Florian Hahn	13ae213469	[LV] Move VPWidenRecipe::execute to VPlanRecipes.cpp (NFC).	2022-07-09 18:46:57 -07:00
Paul Osmialowski	b17754bcaa	[SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer value Since the backend's codegen is capable of expanding powi into fmul's, it is no longer necessary to do so in the ::optimizePow() function of SimplifyLibCalls.cpp. It is sufficient to always turn pow(x, n) into powi(x, n) when n is a constant integer value. Dropping the current expansion code allowed relaxation of the folding conditions, so this can now also happen at optimization levels below Ofast. The added CodeGen/AArch64/powi.ll test case ensures that powi is actually expanded into fmul's, confirming that this refactor did not cause any performance degradation. Following an idea proposed by David Sherwood <david.sherwood@arm.com>. Differential Revision: https://reviews.llvm.org/D128591	2022-07-09 12:00:22 -04:00
Florian Hahn	0c27b38849	[VPlan] Move VPWidenSelectRecipe::execute to VPlanRecipes.cpp (NFC). Depends on D127968. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D127970	2022-07-08 09:35:23 -07:00
Nikita Popov	d287051404	[InstCombine] Avoid ConstantExpr::get() in vector binop fold (NFCI) Use the ConstantFoldBinaryOpOperands() API instead. This case would bail out on a non-folded result anyway.	2022-07-08 17:20:14 +02:00
Nikita Popov	29c6bf45c3	[InstCombine] Avoid ConstantExpr::get() call Avoid calling ConstantExpr::get() for associative/commutative binops, call ConstantFoldBinaryOpOperands() instead. We only want to perform the reassociation if the constants actually fold.	2022-07-08 17:13:06 +02:00
Nikita Popov	fc18a88231	[InstCombine] Avoid creating float binop ConstantExprs Replace ConstantExpr:getFAdd etc with call to ConstantFoldBinaryOpOperands(). I'm using the constant folding API rather than IRBuilder here to ensure that this does actually constant fold. These transforms don't use m_ImmConstant(), so this would not otherwise be guaranteed (and apparently, they can't use m_ImmConstant because they want to handle scalable vector splats). There is an opportunity here to further migrate these to the ConstantFoldFPInstOperands() API, which would respect the denormal mode. I've held off on doing so here, because some of this code explicitly checks for denormal results, and I don't want to touch it in a mostly NFC change.	2022-07-08 16:36:04 +02:00
Sanjay Patel	79bb915fb6	[InstCombine] enhance fold for subtract-from-constant -> xor A low-bit mask is not required: https://alive2.llvm.org/ce/z/yPShss This matches the SDAG implementation that was updated at: 8b756713140f	2022-07-08 10:02:19 -04:00
zhongyunde	716e1b856a	[IndVars] Eliminate redundant type cast between integer and float Recompute the range: match for fptosi of sitofp, and then query the range of the input to the sitofp, according to the comment on D129140. Fixes https://github.com/llvm/llvm-project/issues/55505. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D129191	2022-07-08 17:07:20 +08:00
ChenYang Li	6d036b83d1	[JumpThreading] Avoid threadThroughTwoBasicBlocks when PredPred BB ends with indirectbranch Since we can't change the destination of an indirectbr, we should skip this transform when PredPredBB ends with an indirectbr terminator. Differential Revision: https://reviews.llvm.org/D129193	2022-07-08 09:29:17 +02:00
Nikita Popov	34a5c2bcf2	[BasicBlockUtils] Allow critical edge splitting with callbr terminators After D129205, we support SplitBlockPredecessors() for predecessors with callbr terminators. This means that it is now also safe to invoke critical edge splitting for an edge coming from a callbr terminator. Remove checks in various passes that were protecting against that. Differential Revision: https://reviews.llvm.org/D129256	2022-07-08 09:20:44 +02:00
Craig Topper	0266773464	[SLP] Add missing space to optimization remark. Reviewed By: vporpo Differential Revision: https://reviews.llvm.org/D129330	2022-07-07 23:29:11 -07:00
Johannes Doerfert	f6e0c05e3d	Revert "[Attributor] Replace AAValueSimplify with AAPotentialValues" This reverts commit f17639ea0cd30f52ac853ba2eb25518426cc3bb8 as three AMDGPU tests haven't been updated. Will need to verify the changes are not regressions we should avoid.	2022-07-08 00:53:38 -05:00
Johannes Doerfert	f17639ea0c	[Attributor] Replace AAValueSimplify with AAPotentialValues For the longest time we used `AAValueSimplify` and `genericValueTraversal` to determine "potential values". This was problematic for many reasons: - We recomputed the result a lot as there was no caching for the 9 locations calling `genericValueTraversal`. - We added the idea of "intra" vs. "inter" procedural simplification only as an afterthought. `genericValueTraversal` did offer an option but `AAValueSimplify` did not. Thus, we might end up with "too much" simplification in certain situations and then gave up on it. - Because `genericValueTraversal` was not a real `AA` we ended up with problems like the infinite recursion bug (#54981) as well as code duplication. This patch introduces `AAPotentialValues` and replaces the `AAValueSimplify` uses with it. `genericValueTraversal` is folded into `AAPotentialValues` as are the instruction simplifications performed in `AAValueSimplify` before. We further distinguish "intra" and "inter" procedural simplification now. `AAValueSimplify` was not deleted as we haven't ported the re-materialization of instructions yet. There are other differences over the former handling, e.g., we may not fold trivially foldable instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2` but if an operand would be simplified to `i32 1` we would fold it still. We are also even more aware of function/SCC boundaries in CGSCC passes, which is good even if some tests look like they regress. Fixes: https://github.com/llvm/llvm-project/issues/54981 Note: A previous version was flawed and consequently reverted in 6555558a80589d1c5a1154b92cc3af9495f8f86c.	2022-07-08 00:38:27 -05:00
Johannes Doerfert	cb26b01d57	[Attributor] Make heap2stack record alloca placement We recently learned to place the alloca during the heap2stack transformation in the entry block but we did not account for other concurrent modifications. We need to record our decision rather than checking (then outdated) passes during the manifest stage. This will also allow us to use a custom (=optimistic) "loop info" in the future.	2022-07-07 16:49:22 -05:00
Johannes Doerfert	efe8c581ff	[Attributor][NFC] Improve heap2stack result readability and code style	2022-07-07 16:49:22 -05:00
Johannes Doerfert	c771eaf07e	[OpenMP] Ensure to not use SPMD mode in the absence of parallel regions	2022-07-07 16:49:22 -05:00
Leonard Chan	0f589826a3	[hwasan] Refactor frame record info into function This way it can be reused easily in D128387. Note this changes the IR slightly. Before, the steps for calculating and storing the frame record info were: 1. getPC 2. getSP 3. inttoptr 4. or SP, PC 5. store Now the steps are: 1. getPC 2. getSP 3. or SP, PC 4. inttoptr 5. store Differential Revision: https://reviews.llvm.org/D129315	2022-07-07 14:44:39 -07:00
Martin Sebor	516915beb5	[InstCombine] Fold memchr and strchr equality with first argument Enhance memchr and strchr handling to simplify calls to the functions used in equality expressions with the first argument to at most two integer comparisons: - memchr(A, C, N) == A to N && *A == C for either a dereferenceable A or a nonzero N, - strchr(S, C) == S to *S == C for any S and C, and - strchr(S, '\0') == 0 to true for any S Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D128939	2022-07-07 15:14:23 -06:00
Zaara Syeda	58b9666dc1	[LSR] Fix bug - check if loop has preheader before calling isInductionPHI Fix bug exposed by https://reviews.llvm.org/D125990 rewriteLoopExitValues calls InductionDescriptor::isInductionPHI which requires the PHI node to have an incoming edge from the loop preheader. This adds checks before calling InductionDescriptor::isInductionPHI to see that the loop has a preheader. Also did some refactoring. Differential Revision: https://reviews.llvm.org/D129297	2022-07-07 15:11:33 -04:00
Daniel Bertalan	ef7aed3e11	[InstCombine] Do not fold 'and (sext (ashr X, Shift)), C' if Shift < 0 The 'and (sext (ashr X, ShiftC)), C' --> 'lshr (sext X), ShiftC' transformation would access out of bounds bits in APInt::getLowBitsSet if the shift count was larger than X's bit width or if it was negative. Fixes #56424	2022-07-07 19:13:55 +02:00
Joseph Huber	41fba3c107	[Metadata] Add 'exclude' metadata to add the exclude flags on globals This patch adds a new metadata kind `exclude` which implies that the global variable should be given the necessary flags during code generation to not be included in the final executable. This is done using the ``SHF_EXCLUDE`` flag on ELF for example. This should make it easier to specify this flag on a variable without needing to explicitly check the section name in the target backend. Depends on D129053 D129052 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D129151	2022-07-07 12:20:40 -04:00
Joseph Huber	ed801ad5e5	[Clang] Use metadata to make identifying embedded objects easier Currently we use the `embedBufferInModule` function to store binary strings containing device offloading data inside the host object to create a fatbinary. In the case of LTO, we need to extract this object from the LLVM-IR. This patch adds a metadata node for the embedded objects containing the embedded pointers and the sections they were stored at. This should create a cleaner interface for identifying these values. In the future it may be worthwhile to also encode an `ID` in the metadata corresponding to the object's special section type if relevant. This would allow us to extract the data from an object file and LLVM-IR using the same ID. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D129033	2022-07-07 12:20:25 -04:00
Florian Hahn	bc19b7c3cc	[LV] Remove collectTriviallyDeadInstructions, already handled by VP DCE. Now that removeDeadRecipes can remove most dead recipes across a whole VPlan, there is no need to first collect some dead instructions. Instead removeDeadRecipes can simply clean them up. Depends D127580. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D128408	2022-07-07 08:40:27 -07:00
Sander de Smalen	519d7876cb	[VectorCombine] Avoid creating shuffle for extract-extract pattern on scalable vector. This addresses https://github.com/llvm/llvm-project/issues/56377 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D129136	2022-07-07 08:37:04 +00:00
Nikita Popov	40a4078e14	[BasicBlockUtils] Allow splitting predecessors with callbr terminators SplitBlockPredecessors currently asserts if one of the predecessor terminators is a callbr. This limitation was originally necessary, because just like with indirectbr, it was not possible to replace successors of a callbr. However, this is no longer the case since D67252. As the requirement nowadays is that callbr must reference all blockaddrs directly in the call arguments, and these get automatically updated when setSuccessor() is called, we no longer need this limitation. The only thing we need to do here is use replaceSuccessorWith() instead of replaceUsesOfWith(), because only the former does the necessary blockaddr updating magic. I believe there's other similar limitations that can be removed, e.g. related to critical edge splitting. Differential Revision: https://reviews.llvm.org/D129205	2022-07-07 09:13:25 +02:00
Chuanqi Xu	66e15d4c01	[NFC] [Coroutines] Update the comments for lowering coro.save The original comment is not right. We don't store 0 all the time.	2022-07-07 14:57:41 +08:00
Florian Hahn	17d48c3169	[VPlan] Move remove dead recipes before merging regions. This can enable additional region merging, while not losing opportunities as region merging does not produce dead recipes. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D128831	2022-07-06 20:38:38 -07:00
Chuanqi Xu	e3b4452e07	[Debug] [Coroutines] Get rid of DW_ATE_address Closing https://github.com/llvm/llvm-project/issues/55916 This patch tries to get rid of DW_ATE_address and enhance the test coverage. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D127625	2022-07-07 10:47:09 +08:00
Chuanqi Xu	7137ebc4ce	[Debug] [Coroutine] Adjust the scope and name for coroutine frame Previously the scope of the debug type of __coro_frame was limited to the current function. This looked good at first sight, but it prevented us from printing the type in split functions and other functions. Also, the debug type is different for different coroutine functions, so it makes sense to rename the debug type to relate it to the function name. After this patch, we can access the coroutine frame type in a function by `function_name.coro_frame_ty`. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D127623	2022-07-07 10:35:32 +08:00
Vir Narula	89a99ec900	[GVN] Bug fix to reportMayClobberedLoad remark Bug fix to avoid an assertion failure when generating remarks for GVN. The intention of the assert is correct, but it ignores the edge case of the instructions being equivalent. Reduced input that causes the crash when remarks are turned on: ``` target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128" target triple = "arm64-apple-macosx12.0.0" define ptr @ReplaceWithTidy(ptr %zz_hold) { cond.end480.us: %0 = load ptr, ptr null, align 8 store ptr %0, ptr %0, align 8 store ptr null, ptr %zz_hold, align 8 %1 = load ptr, ptr %0, align 8 store ptr %1, ptr null, align 8 ret ptr null } ``` Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D129235	2022-07-06 17:42:05 -07:00
Wolfgang Pieb	ff87ee4dee	[Metadata] Utilize the resizing capability of MDNodes in Moduleflag processing. This mostly affects PGO/LTO builds which use module flags describing the call graph. Fixes Issue #51893. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D125999	2022-07-06 10:18:33 -07:00
Nikola Tesic	b5b6d3a41b	[Debugify] Port verify-debuginfo-preserve to NewPM Debugify in OriginalDebugInfo mode, introduced with D82545, runs only with legacy PassManager. This patch enables this utility for the NewPM. Differential Revision: https://reviews.llvm.org/D115351	2022-07-06 17:07:20 +02:00
Shilei Tian	1023ddaf77	[LLVM] Add the support for fmax and fmin in atomicrmw instruction This patch adds the support for `fmax` and `fmin` operations in `atomicrmw` instruction. For now (at least in this patch), the instruction will be expanded to CAS loop. There are already a couple of targets supporting the feature. I'll create another patch(es) to enable them accordingly. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D127041	2022-07-06 10:57:53 -04:00
Nikita Popov	20962c1240	[SimplifyCFG] Don't split predecessors of callbr terminator This addresses the assertion failure reported in https://reviews.llvm.org/D124159#3631240. I believe that this limitation in SplitBlockPredecessors is not actually necessary (because unlike with indirectbr, callbr is restricted in a way that does allow updating successors), but for now fix the assertion failure the same way we do everywhere else, by also skipping callbr.	2022-07-06 15:38:53 +02:00
Dimitrije Milosevic	9f492a9ae5	[MIPS] Fix the ASAN shadow offset hook for the N32 ABI Currently, LLVM doesn't have the correct shadow offset mapping for the n32 ABI. This patch introduces the correct shadow offset value for the n32 ABI - 1ULL << 29. Differential Revision: https://reviews.llvm.org/D127096	2022-07-06 12:44:28 +02:00
Nikita Popov	f96cb66d19	[ValueTracking] Accept Instruction in isSafeToSpeculativelyExecute() (NFC) As constant expressions can no longer trap, it only makes sense to call isSafeToSpeculativelyExecute on Instructions, so limit the API to accept only them, rather than general Operators or Values.	2022-07-06 11:12:49 +02:00
Chenbing Zheng	851447cb32	[InstCombine] remove useless insertelement extractelement (bitcast (insertelement (Vec, b)), a) -> extractelement (bitcast (Vec), a) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128890	2022-07-06 17:05:27 +08:00
Nikita Popov	1ed8b29302	[LoopVectorizationLegality] Drop unused variable (NFC)	2022-07-06 10:43:39 +02:00
Nikita Popov	8ee913d83b	[IR] Remove Constant::canTrap() (NFC) As integer div/rem constant expressions are no longer supported, constants can no longer trap and are always safe to speculate. Remove the Constant::canTrap() method and its usages.	2022-07-06 10:36:47 +02:00


30991 Commits