Implement the [tuple.apply] part of the P2255R2 wording for `std::make_from_tuple`.
```
Mandates: If tuple_size_v<remove_reference_t<Tuple>> is 1, then reference_constructs_from_temporary_v<T, decltype(get<0>(declval<Tuple>()))> is false.
```
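A minimal sketch (assuming a library that implements this wording) of the dangling-reference construction the new Mandates clause rejects; the example is illustrative and not taken from the patch:
```cpp
#include <tuple>

int main() {
  std::tuple<long> t{42};
  // Ill-formed under P2255R2: binding int&& to the long element would
  // materialize a temporary int, so
  // reference_constructs_from_temporary_v<int&&, long&> is true and the
  // Mandates clause fires.
  // auto&& r = std::make_from_tuple<int&&>(t);
  (void)t;
}
```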
Fixes #154274
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
This makes the optimization in optimizeStringLength for strlen(gep
@glob, %x) -> sub (end of @glob), %x a little more resilient, and perhaps a
bit more correct for GEPs with non-array types.
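For intuition, a hedged source-level illustration of the pattern (names hypothetical; the transform itself operates on IR):
```cpp
#include <cstring>

static const char glob[] = "hello world";

// For an in-bounds offset x into a constant NUL-terminated array, strlen of
// the offset pointer can be folded to a subtraction from the array end:
// strlen(glob + x) == (sizeof(glob) - 1) - x.
std::size_t len_from(std::size_t x) {
  return std::strlen(glob + x);
}
```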
According to the instruction manual, when `vr0` is modified, the high 128
bits of `xr0` are undefined. Using `vinsgr2vr.b/h` to insert an `i8`/`i16`
into the low 128 bits of a 256-bit vector may therefore cause undefined
behavior when the high 128 bits are used by later instructions.
Support the following BCD format conversion builtins for PowerPC.
- `__builtin_bcdcopysign` – Conversion that returns the decimal value of
the first parameter combined with the sign code of the second parameter.
- `__builtin_bcdsetsign` – Conversion that sets the sign code of the
input parameter in packed decimal format.
> Note: These built-in functions are valid only when all of the following
conditions are met:
> - `-qarch` is set to utilize POWER9 technology.
> - The `bcd.h` file is included.
## Prototypes
```c
vector unsigned char __builtin_bcdcopysign(vector unsigned char, vector unsigned char);
vector unsigned char __builtin_bcdsetsign(vector unsigned char, unsigned char);
```
## Usage Details
`__builtin_bcdsetsign`: Returns the packed decimal value of the first
parameter combined with the sign code.
The sign code is set according to the following rules:
- If the packed decimal value of the first parameter is positive, the
following rules apply:
  - If the second parameter is 0, the sign code is set to 0xC.
  - If the second parameter is 1, the sign code is set to 0xF.
- If the packed decimal value of the first parameter is negative, the
sign code is set to 0xD.
> Notes:
> The second parameter can only be 0 or 1.
> You can determine whether a packed decimal value is positive or
negative as follows:
> - Packed decimal values with sign codes **0xA, 0xC, 0xE, or 0xF** are
interpreted as positive.
> - Packed decimal values with sign codes **0xB or 0xD** are interpreted
as negative.
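A hedged usage sketch based on the prototypes above (compiled for a POWER9 target, e.g. with `-mcpu=pwr9`; the function names are illustrative):
```cpp
#include <altivec.h>

// Force a sign code onto a packed decimal value: with 0 as the second
// argument a positive value gets the sign code 0xC, with 1 it gets 0xF;
// a negative value always gets 0xD.
vector unsigned char set_sign(vector unsigned char packed) {
  return __builtin_bcdsetsign(packed, 0);
}

// Combine the decimal magnitude of `value` with the sign code of `sign_src`.
vector unsigned char copy_sign(vector unsigned char value,
                               vector unsigned char sign_src) {
  return __builtin_bcdcopysign(value, sign_src);
}
```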
---------
Co-authored-by: Aditi-Medhane <aditi.medhane@ibm.com>
This PR aims to resolve #152144.
In `SelectionDAG::canCreateUndefOrPoison`, the `ISD::SCMP`/`ISD::UCMP` cases
are added to always return false, as these nodes cannot generate poison or undef.
The `freeze-binary.ll` test file now covers the SCMP/UCMP cases.
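A minimal sketch of the shape of the change (simplified; the real function has a different signature and takes additional parameters):
```cpp
#include "llvm/CodeGen/ISDOpcodes.h"

// Simplified sketch: returns true if a node with this opcode can introduce
// undef or poison.
static bool canCreateUndefOrPoisonSketch(unsigned Opcode) {
  switch (Opcode) {
  case llvm::ISD::SCMP:
  case llvm::ISD::UCMP:
    // scmp/ucmp only ever produce -1, 0, or 1 from well-defined operands and
    // do not introduce undef or poison themselves.
    return false;
  default:
    return true; // conservative default in this sketch
  }
}
```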
---------
Co-authored-by: Temperz87 <temperz871@gmail.com>
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
When HwModes are involved, an instruction encoding that does not belong to
any HwMode can end up being duplicated multiple times. We can do better by
mapping each HwMode to the list of encoding IDs it contains (that is,
duplicating IDs instead of encodings).
The encodings that were duplicated are still processed multiple times
(e.g., we call an expensive populateInstruction() on each instance).
This will be fixed in subsequent patches.
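A hedged sketch of the data-structure direction (names hypothetical, not the actual DecoderEmitter code):
```cpp
#include <map>
#include <vector>

// Keep a single table of encodings and let each HwMode refer to entries by
// ID, so an encoding shared by several HwModes is stored once and only its
// ID is duplicated.
using EncodingID = unsigned;
using HwModeID = unsigned;
std::map<HwModeID, std::vector<EncodingID>> HwModeToEncodingIDs;
```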
## Short Summary
This patch adds a new pass `aarch64-machine-sme-abi` to handle the ABI
for ZA state (e.g., lazy saves and agnostic ZA functions). It is
currently not enabled by default (but we aim to enable it by default in
LLVM 22). The goal is for this new pass to place ZA saves/restores more
optimally and to work with exception handling.
## Long Description
This patch reimplements management of ZA state for functions with
private and shared ZA state. Agnostic ZA functions will be handled in a
later patch. For now, this is under the flag `-aarch64-new-sme-abi`;
however, we intend for this to replace the current SelectionDAG
implementation once complete.
The approach taken here is to mark instructions as needing ZA to be in a
specific state ("ACTIVE" or "LOCAL_SAVED"). Machine instructions implicitly
defining or using ZA registers (such as $zt0 or $zab0) require the
"ACTIVE" state. Function calls may need the "LOCAL_SAVED" or "ACTIVE"
state depending on the callee (having shared or private ZA).
We already add ZA register uses/definitions to machine instructions, so
no extra work is needed to mark these.
Calls need to be marked by gluing AArch64ISD::INOUT_ZA_USE or
AArch64ISD::REQUIRES_ZA_SAVE to the CALLSEQ_START.
These markers are then used by the MachineSMEABIPass to find
instructions where there is a transition between required ZA states.
These are the points where we need to insert code to set up or restore a
ZA save (or initialize ZA).
To handle control flow between blocks (which may have different ZA state
requirements), we bundle the incoming and outgoing edges of blocks.
Bundles are formed by assigning each block an incoming and outgoing
bundle (initially, all blocks have their own two bundles). Bundles are
then combined by joining the outgoing bundle of a block with the
incoming bundle of all successors.
These bundles are then assigned a ZA state based on the blocks that
participate in the bundle. Blocks whose incoming edges are in a bundle
"vote" for a ZA state that matches the state required at the first
instruction in the block, and likewise, blocks whose outgoing edges are
in a bundle vote for the state required at the last instruction in
the block. The ZA state with the most votes is used, which aims to
minimize the number of state transitions.
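A hedged sketch of the voting step (illustrative only, not the actual pass code; the state names follow the description above):
```cpp
#include <array>

enum class ZAState { Active, LocalSaved };

// Each participating block contributes one vote to a bundle: its required
// entry state if its incoming edges are in the bundle, its required exit
// state if its outgoing edges are. The state with the most votes wins,
// which keeps the number of transitions on the bundled edges small.
ZAState pickBundleState(const std::array<unsigned, 2> &Votes) {
  // Votes[0] counts "ACTIVE" votes, Votes[1] counts "LOCAL_SAVED" votes.
  return Votes[0] >= Votes[1] ? ZAState::Active : ZAState::LocalSaved;
}
```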
SCCP can use PredicateInfo to constrain ranges based on assume and
branch conditions. Currently, this is only enabled during IPSCCP.
This enables it for SCCP as well, which runs after functions have
already been simplified, while IPSCCP runs pre-inline. To a large
degree, CVP already handles range-based optimizations, but SCCP is more
reliable for the cases it can handle. In particular, SCCP works reliably
inside loops, which is something that CVP struggles with due to LVI
cycles.
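A hedged source-level illustration of the kind of loop-internal fold this enables (example constructed for this note, not taken from the patch):
```cpp
int sum_small(const int *a, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) {
    int v = a[i];
    if (v >= 0 && v < 16) {
      // With branch conditions feeding PredicateInfo, SCCP knows v is in
      // [0, 16) here, so this check folds away even inside the loop, where
      // LVI/CVP can give up because of cycles in its value graph.
      if (v < 100)
        total += v;
    }
  }
  return total;
}
```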
I have made various optimizations to make PredicateInfo more efficient,
but unfortunately this still has significant compile-time cost (around
0.1-0.2%).
This attempts https://github.com/llvm/llvm-project/issues/132795 again.
Last time we tried this, we didn't have enough infra capacity, so we had to
revert. According to recent communication from the Infrastructure Area
Team, we should now have enough capacity to re-enable the LLDB tests.
There are a couple of places in the loop vectoriser where we
want to calculate the cost of extracting the last lane in a
vector. However, we wrongly assume that asking for the cost
of extracting lane (VF.getKnownMinValue() - 1) is an accurate
representation of the cost of extracting the last lane. For
SVE at least, this is non-trivial as it requires the use of
whilelo and lastb instructions.
To solve this problem I have added a new
getReverseVectorInstrCost interface where the index is counted
in reverse from the end of the vector. Given a vector with
ElementCount EC, the extracted/inserted lane is
EC - 1 - Index. For scalable vectors this lane is unknown at
compile time. I've added an AArch64 hook that better represents
the cost, and also a RISCV hook that maintains compatibility
with the behaviour prior to this PR.
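For intuition, a small illustration of the reverse-index mapping (hypothetical helper, not part of the patch):
```cpp
#include <cassert>

// For a fixed-width vector the "reverse" index maps to an absolute lane.
// For scalable vectors EC is a runtime multiple of vscale, so the absolute
// lane cannot be computed at compile time and the target hook has to model
// the cost differently (e.g. whilelo + lastb on SVE).
unsigned reverseIndexToLane(unsigned EC, unsigned Index) {
  assert(Index < EC && "reverse index out of range");
  return EC - 1 - Index;
}
```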
I've also taken the liberty of adding support in vplan for
calculating the cost of VPInstruction::ExtractLastElement.
It can happen that the call is originally created as a MemoryDef,
and then later transforms show it is actually read-only and could
be a MemoryUse -- however, this is not guaranteed to be reflected
in MSSA.
This is the first step in untangling the variable step transform and
header mask optimizations as described in #152541.
Currently we replace all VF users globally in the plan, including
VPVectorEndPointerRecipe. However, this leaves reversed loads and stores
in an incorrect state until they are adjusted in optimizeMaskToEVL.
This moves the VPVectorEndPointerRecipe transform so that it is updated
in lockstep with the actual load/store recipe.
One thought that crossed my mind was that VPInterleaveRecipe could also
use VPVectorEndPointerRecipe, in which case we would also have been
computing the wrong address, because we don't transform it to an EVL
recipe that accounts for the reversed address.
Depends on #153625
This patch adds support for statement expressions. It also changes
emitCompoundStmt and emitCompoundStmtWithoutScope to accept an Address
that the optional result is written to. This allows the alloca to be
created ahead of the scope, which saves us from having to hoist the
alloca to its parent scope.
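For reference, a small example of the GNU statement-expression feature being supported (illustrative source, not from the patch):
```cpp
int twice_plus_one(int x) {
  // GNU statement expression: the value of the last expression statement
  // becomes the value of the whole ({ ... }) expression, so its result
  // needs a slot that outlives the inner scope.
  return ({ int t = x * 2; t + 1; });
}
```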
The check for ABI differences for inlined calls involves the caller, the
callee and the nested callee. Before inlining, the ABI is determined by
the target features of the callee. After inlining it is determined by
the caller. The features of the nested callee should never actually
matter.
InputArg/OutputArg now contain the OrigTy, so use that directly
instead of trying to recover it.
CC_RISCV is now *nearly* a normal CC assignment function. However, it
still differs by having an IsRet flag.
Replace Mips custom logic for retaining information about original types
in calling convention lowering by directly querying the OrigTy that is
now available.
There is one change in behavior here: If the return type is a struct
containing fp128 plus additional members, the result is now different,
as we no longer special case to a single fp128 member. I believe this is
fine, because this is a fake ABI anyway: Such cases should actually use
sret, and as such are a frontend responsibility, and Clang will indeed
emit these as sret, not as a return value struct. So this only impacts
manually written IR tests.
Generate QC_INSB/QC_INSBI from `or (and X, MaskImm), OrImm` iff the
value being inserted only sets bits that are known to be zero. This is based
on a similar DAG-to-DAG transform done in `AArch64`.
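A hedged source-level illustration of the pattern (constants chosen for this note):
```cpp
// (x & 0xFFFF00FF) | 0x00004200 inserts the byte 0x42 into bits [15:8].
// The OR immediate only sets bits the AND mask is known to have cleared,
// which is exactly the condition for selecting a single bit-field insert.
unsigned insert_field(unsigned x) {
  return (x & 0xFFFF00FFu) | 0x00004200u;
}
```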
PR #149247 made the MD accessible to the backend, so we can now leverage
it in the memory model. The first use case here is detecting whether a flat
op can access scratch memory.
This benefits both the MemoryLegalizer and InsertWaitCnt.