llvm-project

Author	SHA1	Message	Date
Luke Lau	7d39664a6a	Revert "[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle splitting" (#188220 ) Reverts llvm/llvm-project#185605 Buildbot failures caused by ISel crashes in https://lab.llvm.org/buildbot/#/builders/157/builds/45416 and https://lab.llvm.org/buildbot/#/builders/10/builds/25156	2026-03-24 11:35:14 +00:00
Luke Lau	fe105347e2	[SelectionDAG] Expand CTTZ_ELTS[_ZERO_POISON] and handle splitting (#185605 ) Currently a cttz.elts of e.g. nxv32i1 will get expanded to a reduction of nxv32i64 or equivalent, but we can split it into two legal nxv16i1 cttz.elts once we have dedicated SelectionDAG nodes. This implements the splitting for them the same way we implement type splitting for vp.cttz.elts, i.e. check if the low result is VF, and if so add it to the result of the high result. It also implements operand type promotion for NEON which needs to promote i1 vectors to something larger first. We also need to move expansion into LegalizeVectorOps so it doesn't get expanded before type legalization can do splitting. This uses LegalizeVectorOps in case the scalar reduction type, which depends on the minimum bitwidth needed to store the result, still needs type promotion. The TTI costs should be updated after this to reflect the more efficient codegen, but that is deferred to another PR.	2026-03-24 10:11:46 +00:00
Luke Lau	7a8903566d	[SelectionDAG] Add CTTZ_ELTS[_ZERO_POISON] nodes. NFCI (#185600 ) Currently llvm.experimental.cttz.elts is lowered directly from the intrinsic. If the type isn't legal then the target tells SelectionDAGBuilder to expand it into a reduction, but this means we can't split the operation. E.g. it's possible to split a cttz.elts nxv32i1 into two nxv16i1, instead of expanding it into a nxv32i64 reduction. vp.cttz.elts can be split because it has a dedicated SelectionDAG node. This adds CTTZ_ELTS and CTTZ_ELTS_ZERO_POISON nodes and just enough legalization to get tests passing. A follow-up patch will add splitting and move the expansion into LegalizeDAG.	2026-03-16 14:39:35 +08:00
Alexis Engelke	4fd826d1f9	[IR] Split Br into UncondBr and CondBr (#184027 ) BranchInst currently represents both unconditional and conditional branches. However, these are quite different operations that are often handled separately. Therefore, split them into separate opcodes and classes to allow distinguishing these operations in the type system. Additionally, this also slightly improves compile-time performance.	2026-03-11 12:31:10 +00:00
Dmitry Sidorov	a636928bb4	[SelectionDAG] Add expansion for llvm.convert.from.arbitrary.fp (#179318 ) The expansion converts an arbitrary-precision FP value represented as an integer using the following algorithm: 1. Extract sign, exponent, and mantissa bit fields via masks and shifts. 2. Classify the input (zero, denormal, normal, Inf, NaN) using the exponent and mantissa fields. 3. Normal path: adjust the exponent bias and left-shift the mantissa to fit the wider destination format. 4. Denormal path: normalize by finding the MSB position of the mantissa (via count-leading-zeros), computing the correct exponent from that position, stripping the implicit leading 1, and shifting the fraction into the destination mantissa field. 5. Assemble the destination IEEE bit pattern (sign | exponent | mantissa) and select among the normal, denormal, and special-value results. Currently only conversions from OCP floats are covered; in LLVM terms these are: Float8E5M2, Float8E4M3FN, Float6E3M2FN, Float6E2M3FN, Float4E2M1FN. OCP spec: https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf AI has assisted in X86 E2E testing.	2026-03-04 10:40:47 +01:00
David Sherwood	0b36d4265e	[AArch64] Add vector expansion support for ISD::FCBRT when using ArmPL (#183750 ) This patch teaches the backend how to lower the FCBRT DAG node to the vector math library function when using ArmPL. This is similar to what we already do for llvm.pow/FPOW, however the only way to expose this is via a DAG combine that converts FPOW(<2 x double> %x, <2 x double> <double 1.0/3.0, double 1.0/3.0>) into FCBRT(<2 x double> %x) when the appropriate fast math flags are present on the node. I've updated the DAG combine to handle vector types and only perform the transformation if there exists a vector library variant of cbrt.	2026-03-03 10:39:21 +00:00
David Sherwood	9e95cff515	[AArch64] Add vector expansion support for ISD::FPOW when using ArmPL (#183526 ) This patch is split off from PR #183319 and teaches the backend how to lower the FPOW DAG node to the vector math library function when using ArmPL. This is similar to what we already do for llvm.sincos/FSINCOS today.	2026-02-27 09:43:05 +00:00
Matt Arsenault	c435e8bde7	TargetLowering: Replace android triple check with libcall check (#148800 ) Instead of directly checking if the target is android, check if __safestack_pointer_address is available and configure android to have the call. Maintain the -safestack-use-pointer-address cl::opt in an unclean way by ignoring libcall availability. Also add a RuntimeLibcallsInfo entry for __safestack_unsafe_stack_ptr, similar to other special globals. Also add this unconditionally to most targets, even though this seems contrary to reality. A few tests rely on unsupported OSes, so leave that alone for now.	2026-02-20 07:14:41 +01:00
JaydeepChauhan14	5df173263b	[NFC] Initialize AtomicLoadExtActions array (#180752 )	2026-02-10 22:52:35 +05:30
Matt Arsenault	a7d48bd305	DAG: Remove TypePromoteFloat (#177427 ) Remove the now unimplemented target hook and associated DAG machinery for the old half legalization path. Really fixes #97975	2026-01-26 22:02:24 +00:00
Chuanqi Xu	524589119a	[LLVM] Update the default value for MaxLargeFPConvertBitWidthSupported to 128 (#176851 ) Previously, we couldn't compile programs that convert 256-bit values to floating point and vice versa (we would crash). After this change, we are able to compile them.	2026-01-22 16:37:06 +08:00
Matt Arsenault	aa57ee958d	CodeGen: Use LibcallLoweringInfo for stack protector insertion (#176829 ) Thread LibcallLoweringInfo into the TargetLowering hooks used by the stack protector passes.	2026-01-20 12:37:31 +01:00
Matt Arsenault	85b6d43bf7	TargetLowering: Avoid getLibcallName in getSafeStackPointerLocation (#176362 )	2026-01-16 13:37:02 +00:00
moorabbit	a5fa246435	[Clang] Add `__builtin_stack_address` (#148281 ) Add support for `__builtin_stack_address` builtin. The semantics match those of GCC's builtin with the same name. `__builtin_stack_address` returns the starting address of the stack region that may be used by called functions. It may or may not include the space used for on-stack arguments passed to a callee (See [GCC Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)). Fixes #82632.	2026-01-12 10:01:57 +01:00
Ties Stuij	b28eeb28be	[CodeGen] Generalise Hexagon flags for memop inline thresholds (#172829 ) Generalise the Hexagon cmdline options to control if memset, memcpy or memmove intrinsics should be inlined versus calling library functions, so they can be used by all backends: • -max-store-memset • -max-store-memcpy • -max-store-memmove These flags override the target-specific defaults set in TargetLowering (e.g., MaxStoresPerMemcpy) and allow fine-tuning of the inlining threshold for performance analysis and optimization. The optsize variants (-max-store-memset-Os, -max-store-memcpy-Os, -max-store-memmove-Os) from the Hexagon backend were removed, and now the above options control both. The threshold is specified as a number of store operations, which is backend-specific. Operations requiring more stores than the threshold will call the corresponding library function instead of being inlined.	2026-01-09 12:08:35 +00:00
Luke Lau	ad4bfac732	[IR] Split vector.splice into vector.splice.left and vector.splice.right (#170796 ) This PR implements the first change outlined in https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel In order to allow non-immediate offsets in the llvm.vector.splice intrinsic, we need to separate out the "shift left" and "shift right" modes into two separate intrinsics, which were previously determined by whether the offset is positive or negative. The description in the LangRef has also been reworded in terms of sliding elements left or right and extracting either the upper or lower half as opposed to extracting from a certain index, which brings it in line with the definition of `llvm.fshr.`/`llvm.fshl.`. This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into their new equivalents based on their offset, so existing uses of vector.splice should still work. Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced in this PR to keep the diff small and kick the tyres on the AutoUpgrader a bit. I planned to do this in a follow-up NFC but can include it in this PR if reviewers prefer. Similarly the shuffle costing kind `SK_Splice` has just been kept the same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight` later.	2026-01-06 15:41:26 +08:00
Ramkumar Ramachandra	9e5e267a03	[ISel] Introduce llvm.clmul intrinsic (#168731 ) In line with a std proposal, introduce the llvm.clmul family of intrinsics corresponding to carry-less multiply operations. This work builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives), and follow-up patches will introduce custom-lowering on supported targets, replacing target-specific clmul intrinsics. Testing is done on the RISC-V target, which should be sufficient to prove that the intrinsics work, since no RISC-V specific lowering has been added. Ref: https://isocpp.org/files/papers/P3642R3.html Co-authored-by: Craig Topper <craig.topper@sifive.com>	2026-01-05 20:24:06 +00:00
Craig Topper	1b43f5cec6	[RISCV][SelectionDAG] Add an ISD::CTLS node for count leading redundant sign bits. Use it to select CLS(W). (#173417 ) The RISC-V P extension adds an instruction equivalent to __builtin_clrsb. AArch64 has a similar instruction that we currently fail to select when using the builtin. This patch adds a combine based on the canonical version of the pattern emitted by clang for the builtin, (add (ctlz (xor x, (sra x, bw-1))), -1). I'm starting the combine at the ctlz because the outer add can easily be combined into other nodes, obscuring the full pattern. So we generate (add (ctls x), 1) and hope the add will be combined away. I've also added a combine for the pattern AArch64 recognizes, (ctlz_zero_undef (or (shl (xor x, (sra x, bw-1)), 1), 1)). I've only enabled the combines when the target has a Legal or Custom action for the operation, taking into account type promotion. We can relax this in the future by adding a default expansion to LegalizeDAG and adding more type legalization rules.	2026-01-04 18:00:00 -08:00
Folkert de Vries	a587ccd87d	fix `llvm.fma.f16` double rounding issue when there is no native support (#171904 ) fixes https://github.com/llvm/llvm-project/issues/98389 As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does not work, because there is not enough precision to handle the repeated rounding. `f64` does have sufficient space. So this PR explicitly promotes the 16-bit fma to a 64-bit fma. I could not find examples of a libcall being used for fma, but that's something that could be looked into separately to work around code size issues.	2025-12-17 22:03:01 +01:00
Matt Arsenault	a3aaa1a391	DAG: Use RuntimeLibcalls to legalize vector frem calls (#170719 ) This continues the replacement of TargetLibraryInfo uses in codegen with RuntimeLibcallsInfo started in 821d2825a4f782da3da3c03b8a002802bff4b95c. The series there handled all of the multiple result calls. This extends for the other handled case, which happened to be frem. For some reason the Libcall for these are prefixed with "REM_", for the instruction "frem", which maps to the libcall "fmod".	2025-12-11 13:33:27 +00:00
Matt Arsenault	62832593b7	DAG: Use more RTLIB helper functions for getting libcall from type (#170563 ) We had a set of utilities which was only used for some set of floating point libcalls. Add more, most of which are for integer operations. Ideally we would generate these functions from tablegen.	2025-12-04 17:49:27 +01:00
Matt Arsenault	1c5b1501ca	CodeGen: Move libcall lowering configuration to subtarget (#168621 ) Previously libcall lowering decisions were made directly in the TargetLowering constructor. Pull these into the subtarget to facilitate turning LibcallLoweringInfo into a separate analysis in the future.	2025-11-25 11:59:56 -05:00
Matt Arsenault	1d73b68463	TargetLowering: Avoid hardcoding OpenBSD + __guard_local name (#167744 ) Query RuntimeLibcalls for the support and the name. The check that the implementation is exactly __guard_local instead of unsupported feels a bit strange.	2025-11-20 20:44:25 -05:00
Matt Arsenault	a757c4e74e	CodeGen: Add subtarget to TargetLoweringBase constructor (#168620 ) Currently LibcallLoweringInfo is defined inside of TargetLowering, which is owned by the subtarget. Pass in the subtarget so we can construct LibcallLoweringInfo with the subtarget. This is a temporary step that should be revertable in the future, after LibcallLoweringInfo is moved out of TargetLowering.	2025-11-19 19:18:13 +00:00
Matt Arsenault	862d34666f	opt: Fix bad merge of #167996 (#168110 ) After the base branch was moved to main, this somehow ended up adding a second definition of RTLCI, instead of modifying the existing one. Also fix other build error with gcc bots.	2025-11-14 12:03:26 -08:00
Matt Arsenault	590ab43e8a	RuntimeLibcalls: Move VectorLibrary handling into TargetOptions (#167996 ) This fixes the -fveclib flag getting lost on its way to the backend. Previously this was its own cl::opt with a random boolean. Move the flag handling into CommandFlags with other backend ABI-ish options, and have clang directly set it, rather than forcing it to go through command line parsing. Prior to de68181d7f, codegen used TargetLibraryInfo to find the vector function. Clang has special handling for TargetLibraryInfo, where it would directly construct one with the vector library in the pass pipeline. RuntimeLibcallsInfo currently is not used as an analysis in codegen, and needs to know the vector library when constructed. RuntimeLibraryAnalysis could follow the same trick that TargetLibraryInfo is using in the future, but a lot more boilerplate changes are needed to thread that analysis through codegen. Ideally this would come from an IR module flag, and nothing would be in TargetOptions. For now, it's better for all of these sorts of controls to be consistent.	2025-11-14 11:19:21 -08:00
Matt Arsenault	4b9771e41a	DAG: Use modf vector libcalls through RuntimeLibcalls (#166986 ) Copy new process from sincos/sincospi	2025-11-11 18:05:35 -08:00
Matt Arsenault	de68181d7f	DAG: Use sincos vector libcalls through RuntimeLibcalls (#166984 ) Copy new process from sincospi.	2025-11-11 10:51:23 -08:00
serge-sans-paille	af146462f9	Remove unused <iterator> inclusion Per https://llvm.org/docs/CodingStandards.html#include-as-little-as-possible this improves compilation time, while not being too intrusive on the codebase.	2025-11-11 13:33:38 +01:00
Matt Arsenault	b7423af8da	RuntimeLibcalls: Add entries for vector sincospi functions (#166981 ) Add libcall entries for sleef and armpl sincospi implementations. This is the start of adding the vector library functions; eventually they should all be tracked here. I'm starting with this case because this is a prerequisite to fix reporting sincospi calls which do not exist on any common targets without regressing vector codegen when these libraries are available.	2025-11-10 10:56:33 -08:00
Matt Arsenault	056d2c12f7	RuntimeLibcalls: Split lowering decisions into LibcallLoweringInfo (#164987 ) Introduce a new class for the TargetLowering usage. This tracks the subtarget specific lowering decisions for which libcall to use. RuntimeLibcallsInfo is a module level property, which may have multiple implementations of a particular libcall available. This attempts to be a minimum boilerplate patch to introduce the new concept. In the future we should have a tablegen way of selecting which implementations should be used for a subtarget. Currently we do have some conflicting implementations added, it just happens to work out that the default cases to prefer is alphabetically first (plus some of these still are using manual overrides in TargetLowering constructors).	2025-11-05 17:10:36 +00:00
Matt Arsenault	28e9a2832f	DAG: Consider __sincos_stret when deciding to form fsincos (#165169 )	2025-10-28 08:28:09 -07:00
Shimin Cui	531fd45e92	[PPC] Set minimum of largest number of comparisons to use bit test for switch lowering (#155910 ) Currently it is considered suitable to lower to a bit test for a set of switch case clusters when the number of unique destinations (`NumDests`) and the number of total comparisons (`NumCmps`) satisfy: `(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) || (NumDests == 3 && NumCmps >= 6)` However, in some cases on powerpc, for example when NumDests is 3 and the number of comparisons for each destination is 2, it's not profitable to lower the switch to a bit test. This adds an option to set the minimum of largest number of comparisons to use bit test for switch lowering. --------- Co-authored-by: Shimin Cui <scui@xlperflep9.rtp.raleigh.ibm.com>	2025-10-28 10:24:32 -04:00
Matt Arsenault	f5a2e6bb8f	CodeGen: Remove overrides of getSSPStackGuardCheck (NFC) (#164044 ) All 3 implementations are just checking if this has the windows check function, so merge that as the only implementation.	2025-10-24 21:17:34 +09:00
Sam Parker	1820102167	Wasm fmuladd relaxed (#163177 ) Reland #161355, after fixing up the cross-projects-tests for the wasm simd intrinsics. Original commit message: Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 16:50:53 +01:00
Sam Parker	30d3441cf0	Revert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171 ) Reverts llvm/llvm-project#161355 Looks like I've broken some intrinsic code generation.	2025-10-13 11:53:40 +01:00
Sam Parker	a4eb7ea225	[WebAssembly] Lower fmuladd to madd and nmadd (#161355 ) Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.	2025-10-13 10:36:08 +01:00
Daniel Paoliello	f99b0f3de4	[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850 ) As noted in #153256, TableGen is generating reserved names for RuntimeLibcalls, which resulted in a build failure for Arm64EC since `vcruntime.h` defines `__security_check_cookie` as a macro. To avoid using reserved names, all impl names will now be prefixed with `Impl_`. `NumLibcallImpls` was lifted out as a `constexpr size_t` instead of being an enum field. While I was churning the dependent code, I also removed the TODO to move the impl enum into its own namespace and use an `enum class`: I experimented with using an `enum class` and adding a namespace, but we decided it was too verbose so it was dropped.	2025-09-02 09:57:33 -07:00
Sam Tebbs	569d738d4e	[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007 ) It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects if the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.	2025-09-02 15:35:15 +01:00
daniel-trujillo-bsc	658a931c5b	[CodeGen][RISCV] Add support of RISCV nontemporal to vector predication instructions. (#153033 ) This PR adds support for VP intrinsics to be aware of the nontemporal metadata information.	2025-08-27 15:48:33 -07:00
Matt Arsenault	65d12622fa	RuntimeLibcalls: Add entries for stackprotector globals (#154930 ) Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie, and __guard_local. As far as I can tell these are all just different names for the same shaped functionality on different systems. These aren't really functions, but special global variable names. They should probably be treated the same way; all the same contexts that need to know about emittable function names also need to know about this. This avoids a special case check in IRSymtab. This isn't a complete change; there's a lot more cleanup which should be done. The stack protector configuration system is a complete mess. There are multiple overlapping controls, used in 3 different places. Some of the target control implementations overlap with conditions used in the emission points, and some use correlated but not identical conditions in different contexts, i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and insertSSPDeclarations are all used in inconsistent ways, so I don't know if I've tracked the intention of the system correctly. The PowerPC test change is a bug fix on linux. Previously the manual conditions were based around !isOSOpenBSD, which is not the condition where __stack_chk_guard is used. Now getSDagStackGuard returns the proper global reference, resulting in LOAD_STACK_GUARD getting a MachineMemOperand which allows scheduling.	2025-08-23 10:21:00 +09:00
Nikita Popov	498ef361fe	[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414 ) https://github.com/llvm/llvm-project/pull/152709 exposed the original IR argument type to the CC lowering logic. However, in SDAG, this used the raw type, prior to aggregate splitting. This PR changes it to use the non-aggregate type instead. (This matches what happened in the GlobalISel case already.) I've also added some more detailed documentation on the InputArg/OutputArg fields, to explain how they differ. In most cases ArgVT is going to be the EVT of OrigTy, so they encode very similar information (OrigTy just preserves some additional information lost in EVTs, like pointer types). One case where they do differ is in post-legalization lowering of libcalls, where ArgVT is going to be a legalized type, while OrigTy is going to be the original non-legalized type.	2025-08-13 18:42:26 +02:00
Stephen Long	19ada02086	PreISelIntrinsicLowering: Lower llvm.log to a loop if scalable vec arg (#129744 ) Similar to ab976a1, but for llvm.log.	2025-08-12 01:04:28 +09:00
Nikita Popov	e92b7e9641	[CodeGen] Provide original IR type to CC lowering (NFC) (#152709 ) It is common to have ABI requirements for illegal types: For example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than ones that came from a i128 argument. The current calling convention lowering does not provide access to this information, so backends come up with various hacks to support it (like additional pre-analysis cached in CCState, or bypassing the default logic entirely). This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet, this just does the mechanical changes to thread through the new argument.	2025-08-11 08:57:53 +02:00
Alexander Richardson	3a4b351ba1	[IR] Introduce the `ptrtoaddr` instruction This introduces a new `ptrtoaddr` instruction which is similar to `ptrtoint` but has two differences: 1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance 2) `ptrtoaddr` only extracts (and then extends/truncates) the low index-width bits of the pointer For most architectures, difference 2) does not matter since index (address) width and pointer representation width are the same, but this does make a difference for architectures that have pointers that aren't just plain integer addresses such as AMDGPU fat pointers or CHERI capabilities. This commit introduces textual and bitcode IR support as well as basic code generation, but optimization passes do not handle the new instruction yet so it may result in worse code than using ptrtoint. Follow-up changes will update capture tracking, etc. for the new instruction. RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54 Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/139357	2025-08-08 10:12:39 -07:00
Kazu Hirata	4be22dabc5	[CodeGen] Remove an unnecessary cast (NFC) (#152441 ) getActiveBits() already returns unsigned.	2025-08-07 07:22:42 -07:00
Nikita Popov	406d9b1dd6	[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319 ) The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).	2025-08-07 09:12:40 +02:00
Paul Walker	94d374ab6c	[LLVM][CGP] Allow finer control for sinking compares. (#151366 ) Compare sinking is selectable based on the result of hasMultipleConditionRegisters. This function is too coarse grained by not taking into account the differences between scalar and vector compares. This PR extends the interface to take an EVT to allow finer control. The new interface is used by AArch64 to disable sinking of scalable vector compares, but with isProfitableToSinkOperands updated to maintain the cases that are specifically tested.	2025-08-05 11:43:41 +01:00
Abhishek Kaushik	1c0ac80d4a	[DAG] Combine `store + vselect` to `masked_store` (#145176 ) Add a new combine to replace ``` (store ch (vselect cond truevec (load ch ptr offset)) ptr offset) ``` to ``` (mstore ch truevec ptr offset cond) ``` This saves a blend operation on targets that support conditional stores.	2025-08-04 19:05:36 +05:30
jeremyd2019	28b3190053	[LLVM][Cygwin] Enable conditions that are shared with MinGW (#149638 ) Cygwin and MinGW share the auto import behavior that could result in __stack_check_guard being non-dso-local. Allow windres to assume a Cygwin target as well as a MinGW one, so defines like _WIN32 would not be present on Cygwin.	2025-07-29 10:01:04 -07:00
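
The two patterns named in the ISD::CTLS entry (1b43f5cec6) can be checked on plain integers. The following is a minimal sketch in Python, not the SelectionDAG combine itself: the helper names (ctlz, sra, ctls_via_add, ctls_via_or) are hypothetical, and fixed-width values are modelled with explicit masks.

```python
def ctlz(x, bw):
    """Count leading zeros of x viewed as an unsigned bw-bit value."""
    x &= (1 << bw) - 1
    return bw - x.bit_length()


def sra(x, shift, bw):
    """Arithmetic shift right of a bw-bit two's-complement value."""
    mask = (1 << bw) - 1
    x &= mask
    if x >> (bw - 1):          # sign bit set: reinterpret as negative
        x -= 1 << bw
    return (x >> shift) & mask


def ctls_via_add(x, bw):
    """(add (ctlz (xor x, (sra x, bw-1))), -1)"""
    x &= (1 << bw) - 1
    return ctlz(x ^ sra(x, bw - 1, bw), bw) - 1


def ctls_via_or(x, bw):
    """(ctlz_zero_undef (or (shl (xor x, (sra x, bw-1)), 1), 1))"""
    mask = (1 << bw) - 1
    x &= mask
    t = x ^ sra(x, bw - 1, bw)
    t = ((t << 1) | 1) & mask  # never zero, so ctlz is well defined
    return ctlz(t, bw)


if __name__ == "__main__":
    for value in (0, 1, 0x40000000, 0x80000000, 0xFFFFFFFF):
        print(hex(value), ctls_via_add(value, 32), ctls_via_or(value, 32))
```

Both forms count how many bits below the sign bit are copies of it, which is why the first form needs the trailing subtraction of one and the second does not.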


571 Commits
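
The five steps listed in the llvm.convert.from.arbitrary.fp entry (a636928bb4) can be illustrated for a single format. This is a sketch under stated assumptions, not the SelectionDAG expansion: it handles only Float8E5M2 (1 sign bit, 5 exponent bits, 2 mantissa bits, bias 15, with Inf and NaN) converted to IEEE binary32, works on scalar Python integers, and uses int.bit_length in place of a count-leading-zeros node. The function names are hypothetical.

```python
import struct

SRC_EXP_BITS, SRC_MAN_BITS, SRC_BIAS = 5, 2, 15     # Float8E5M2
DST_EXP_BITS, DST_MAN_BITS, DST_BIAS = 8, 23, 127   # IEEE binary32


def e5m2_to_f32_bits(byte):
    """Return the binary32 bit pattern for a Float8E5M2 byte."""
    # 1. Extract the fields with masks and shifts.
    sign = (byte >> (SRC_EXP_BITS + SRC_MAN_BITS)) & 1
    exp = (byte >> SRC_MAN_BITS) & ((1 << SRC_EXP_BITS) - 1)
    man = byte & ((1 << SRC_MAN_BITS) - 1)
    widen = DST_MAN_BITS - SRC_MAN_BITS

    # 2. Classify, then build exponent and mantissa per class.
    if exp == (1 << SRC_EXP_BITS) - 1:       # Inf (man == 0) or NaN
        out_exp = (1 << DST_EXP_BITS) - 1
        out_man = man << widen
    elif exp == 0 and man == 0:              # signed zero
        out_exp, out_man = 0, 0
    elif exp == 0:                           # 4. denormal path
        msb = man.bit_length() - 1           # position of the leading 1
        # value = man * 2^(1 - bias - man_bits) = 1.frac * 2^(that + msb)
        out_exp = (1 - SRC_BIAS - SRC_MAN_BITS + msb) + DST_BIAS
        frac = man ^ (1 << msb)              # strip the implicit leading 1
        out_man = frac << (DST_MAN_BITS - msb)
    else:                                    # 3. normal path
        out_exp = exp - SRC_BIAS + DST_BIAS  # rebias the exponent
        out_man = man << widen

    # 5. Assemble sign | exponent | mantissa.
    return (sign << 31) | (out_exp << DST_MAN_BITS) | out_man


def e5m2_to_float(byte):
    """Decode a Float8E5M2 byte to a Python float via its binary32 bits."""
    return struct.unpack("<f", struct.pack("<I", e5m2_to_f32_bits(byte)))[0]


if __name__ == "__main__":
    for b in (0x00, 0x01, 0x03, 0x3C, 0xC0, 0x7B, 0x7C):
        print(f"{b:#04x} -> {e5m2_to_float(b)!r}")
```

Because every E5M2 value is exactly representable in binary32, no rounding step is needed in this direction; the FN formats in the list have no Inf encoding and would need a different special-value branch.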