This patch extends #149095 for EOR and ORR.
It uses a simple partition scheme to try to find two suitable disjoint
bitmasks that can be used with EOR/ORR to reconstruct the original mask.
Fixes: #148987.
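As a rough sketch of the partition idea (illustrative only; the actual patch checks encodability as AArch64 logical immediates rather than the simplified single-run predicate used here), the split can be thought of as cutting the mask at the first gap between set bits and validating both halves:
```
#include <cstdint>
#include <optional>
#include <utility>

// Stand-in predicate for illustration: accept only a single contiguous run
// of ones. The real check would be whether the value is encodable as an
// AArch64 logical immediate.
static bool isSingleRunOfOnes(uint64_t M) {
  if (M == 0)
    return false;
  uint64_t Shifted = M >> __builtin_ctzll(M);
  return (Shifted & (Shifted + 1)) == 0;
}

// Try to split Mask into two disjoint halves, each cheap to materialise on
// its own; ORR (or EOR, since the halves are disjoint) of the two halves
// then reconstructs the original mask.
static std::optional<std::pair<uint64_t, uint64_t>>
trySplitMask(uint64_t Mask) {
  if (Mask == 0 || isSingleRunOfOnes(Mask))
    return std::nullopt; // zero, or a single instruction already suffices
  unsigned LowZeros = __builtin_ctzll(Mask);
  unsigned RunLen = __builtin_ctzll(~(Mask >> LowZeros)); // low run of ones
  uint64_t Lo = ((uint64_t(1) << RunLen) - 1) << LowZeros;
  uint64_t Hi = Mask & ~Lo;
  if (isSingleRunOfOnes(Hi))
    return std::make_pair(Lo, Hi);
  return std::nullopt;
}
```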
Whether a specific argument is a vararg or a fixed argument is currently
stored separately from all the other argument information in ArgFlags.
This means it is not accessible from CCAssign, and backends have
developed all kinds of workarounds to get at it anyway.
Move this information to ArgFlags to make it directly available in all
relevant places.
I've opted to invert the sense and store it as IsVarArg, as I think this
both makes the meaning more obvious and provides a better default
(IsVarArg = false).
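As a minimal sketch of what this enables (the `isVarArg()` accessor name is my assumption, not necessarily what the patch adds), a hand-written calling-convention helper can now branch on the flag directly instead of threading a separate IsVarArg parameter around:
```
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/TargetCallingConv.h"
using namespace llvm;

// Sketch only: an illustrative assignment rule, not code from the patch.
static void assignArgSketch(unsigned ValNo, MVT ValVT, MVT LocVT,
                            CCValAssign::LocInfo LocInfo,
                            ISD::ArgFlagsTy ArgFlags, CCState &State) {
  if (ArgFlags.isVarArg()) {
    // Illustrative rule: variadic arguments always go on the stack.
    int64_t Offset = State.AllocateStack(8, Align(8));
    State.addLoc(CCValAssign::getMem(ValNo, ValVT, Offset, LocVT, LocInfo));
    return;
  }
  // Fixed arguments would be assigned by the usual register/stack logic.
}
```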
We can generate an ext from shuffles such as <2, 3, 0, 1> of a single
vector source. Add handling to isShuffleMaskLegal to allow DAG combines
to optimize to it.
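For reference, the mask shape in question is a lane rotation of a single source; a standalone sketch of the check (not the in-tree isShuffleMaskLegal code):
```
#include <cstddef>
#include <vector>

// A single-source ext corresponds to a shuffle mask that rotates the lanes,
// e.g. <2, 3, 0, 1> on a 4-element vector is a rotation by 2. Undef lanes
// are ignored here for simplicity.
static bool isSingleSourceRotation(const std::vector<int> &Mask) {
  size_t N = Mask.size();
  if (N == 0 || Mask[0] < 0)
    return false;
  size_t Rot = static_cast<size_t>(Mask[0]);
  for (size_t I = 0; I < N; ++I)
    if (Mask[I] >= 0 && Mask[I] != static_cast<int>((Rot + I) % N))
      return false;
  return true;
}
```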
The current disassembly of `ldt{add,set,clr}` instructions when using
`xzr/wzr` is incorrect. The Armv9.6-A Memory Systems specification says:
```
For each of LDT{ADD|SET|CLR}{L}, there is the corresponding STT{ADD|SET|CLR}{L}
alias, for the case where the register selected by the Rt field is XZR or WZR
```
and:
```
LDT{ADD|SET|CLR}{A}{L} is equivalent to LD{ADD|SET|CLR}{A}{L} except that: <..conditions..>
```
The Arm ARM specifies the preferred form of disassembly for these
aliases:
```
STADD <Xs>, [<Xn|SP>]
is equivalent to
LDADD <Xs>, XZR, [<Xn|SP>]
and is always the preferred disassembly.
```
(ref: DDI 0487L.b C6-2317)
This means that `sttadd w0, [x2]` is the preferred disassembly of
`ldtadd w0, wzr, [x2]`, and likewise whenever Rt is `xzr` or `wzr`.
This change also aligns LLVM's disassembly with GNU binutils, as shown by
the following examples.
LLVM before this change:
```
% cat test.s
stadd w0, [sp]
sttadd w0, [sp]
ldadd w0, wzr, [sp]
ldtadd w0, wzr, [sp]
% llvm-mc-20 -triple aarch64 -mattr=+lse,+lsui test.s
stadd w0, [sp]
ldtadd w0, wzr, [sp]
stadd w0, [sp]
ldtadd w0, wzr, [sp]
```
LLVM after this change:
```
% llvm-mc -triple aarch64 -mattr=+lse,+lsui test.s
stadd w0, [sp]
sttadd w0, [sp]
stadd w0, [sp]
sttadd w0, [sp]
```
GNU binutils (from the GCC 15 toolchain) for comparison:
```
% gas test.s -march=armv8-a+lsui+lse -o test.o
% objdump -dr test.o
0: b82003ff stadd w0, [sp]
4: 192007ff sttadd w0, [sp]
8: b82003ff stadd w0, [sp]
c: 192007ff sttadd w0, [sp]
```
Many thanks to Ezra Sitorus and Alice Carlotti for reporting and
confirming this issue.
This patch avoids a comparison against zero when lowering abs(sub(a, b))
patterns, instead reusing the condition flags set by a subs of the
operands directly.
For example, currently:
```
sxtb w8, w0
sub w8, w8, w1, sxtb
cmp w8, #0
cneg w0, w8, mi
```
becomes:
```
sxtb w8, w0
subs w8, w8, w1, sxtb
cneg w0, w8, mi
```
Together with #151177, this should handle the remaining patterns in
#118413.
As a follow-up to #151177, when lowering SELECT_CC nodes of absolute
difference patterns, drop poison-generating flags from the negated
operand to avoid inadvertently propagating poison.
As discussed in the PR above, I didn't find practical issues with the
current code, but it seems safer to do this preemptively.
The main change in this patch is that we go from emitting the expression:
`@ cfa - NumBytes - NumScalableBytes * VG`
to:
`@ cfa - VG * NumScalableBytes - NumBytes`
That is, VG now comes first in the expression. This is in preparation for
a future patch that adds an alternative way to resolve VG (which uses the
CFA, so it is convenient for the CFA to be at the top of the stack).
Since doing this is fairly churn-heavy, I took the opportunity to also
save up to 4 bytes per SVE CFI expression. This is done by folding
LEB128 constants in the range 0 to 31 into literal operations, and by
using the offset field of `DW_OP_breg*` expressions.
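A minimal sketch of the small-constant folding (illustrative only, not the emission code from this patch):
```
#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/Support/LEB128.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Constants 0..31 fit in a single DW_OP_lit<n> byte; anything else falls
// back to DW_OP_consts followed by an SLEB128 payload.
static void emitConstOp(raw_ostream &OS, int64_t V) {
  if (V >= 0 && V <= 31) {
    OS << static_cast<char>(dwarf::DW_OP_lit0 + V); // one byte total
    return;
  }
  OS << static_cast<char>(dwarf::DW_OP_consts); // opcode byte
  encodeSLEB128(V, OS);                         // variable-length payload
}
```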
* `tools/llvm-objcopy/MachO/update-section-object.test` was failing on
Windows because the input file (`macho_sections.s`) might be checked out
with the wrong line endings, resulting in a difference in the size of the
sections being checked.
* Removed the Windows check in `AArch64Arm64ECCallLowering`: when
`llc` is run without an explicit target, the module's target triple is
unknown, so the assert fires.
* Expect `llvm/test/CodeGen/Generic/allow-check.ll` to fail for Arm64EC:
GlobalISel is not supported.
Some vector instructions override AsmString in their TableGen description
but did not include the Apple syntax variant, so they were printed without
operands.
Fixes #151330
Compare sinking is selected based on the result of
hasMultipleConditionRegisters. This function is too coarse-grained, as it
does not take into account the differences between scalar and vector
compares. This PR extends the interface to take an EVT to allow finer
control.
The new interface is used by AArch64 to disable sinking of scalable
vector compares, with isProfitableToSinkOperands updated to maintain
the cases that are specifically tested.
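A sketch of the kind of policy this enables (names and signature are assumptions based on the description above, not the actual hook):
```
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// With a type-aware hook, a target can keep sinking scalar compares (NZCV
// is effectively a single flags register) while opting out for scalable
// vector compares, whose results live in predicate registers.
static bool hasMultipleConditionRegistersFor(EVT VT) {
  return VT.isScalableVector();
}
```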
While testing Arm64EC, I observed that LLVM crashes when an empty
function name is used. My original fix in #151409 was to raise an error,
but this change now handles the empty name via
`Mangler::getNameWithPrefix` (which assigns a name to the function).
To get this working, I had to create the `Mangler` in
`TargetLoweringObjectFile` early so it would be available to Arm64EC's
lowering. There's no reason for the `Mangler` to be created only when
`Initialize` is called (or re-created if it already exists), so I moved
its creation to the constructor and replaced the raw pointer with a
`unique_ptr` to avoid the explicit `delete` in the destructor.
Certain fcmp predicates need to be expanded into multiple operations that
are then or'd together. This adds more accurate cost modelling for them
based on the predicate. Unsupported operations are given the cost of a
libcall, and the latency is set to 2 as that seemed fairly common across
different CPUs.
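For example, the `one` predicate (ordered and not equal) is one of the cases that expands to two compares or'd together; in scalar C++ terms:
```
// "one" == ordered and not equal, which decomposes into two ordered
// comparisons joined by an OR. An unordered input (NaN) makes both false.
static bool fcmp_one(double a, double b) { return (a > b) || (a < b); }
```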
This prevents them from generating Invalid costs, as generating the
instructions seems to work fine with and without +bf16. The costs are
mostly taken from the number of instructions (minus ptrue and constants).
The object file format specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast, and we
want to eliminate MCSymbol::Kind in the base class.
The object file format specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast, and we
want to eliminate MCSymbol::Kind in the base class.
The object file format specific derived classes are used in contexts
where the type is statically known. We don't use isa/dyn_cast, and we
want to eliminate MCSymbol::Kind in the base class.
`Data` now points to the first byte at the fixup offset within the current fragment.
MCAssembler::layout asserts that the fixup offset is within either the
fixed-size content or the optional variable-size tail, as this is the
most the generic code can validate without knowing the target-specific
fixup size.
Many backends' applyFixup implementations assert:
```
assert(Offset + Size <= F.getSize() && "Invalid fixup offset!");
```
This refactoring allows a subsequent change to move the fixed-size
content outside of MCSection::ContentStorage, fixing the
-fsanitize=pointer-overflow issue from #150846.
Pull Request: https://github.com/llvm/llvm-project/pull/151724
While testing Arm64EC, I observed that LLVM crashes when an
`available_externally` function is used, as it tries to place the
function in a COMDAT, which is not permitted by the verifier.
This is the fix from #151409 plus a dedicated test.
Without this change, the following test would fail to compile
with `-march=armv8-a+sme`:
```
void func1(const svuint32_t *in, svuint32_t *out) {
  [&]() __arm_streaming { *out = *in; }();
}
```
But in general, it's probably better never to inline
streaming functions into non-streaming functions, because
they will have been marked as 'streaming' for a reason
by the user.
This change facilitates replacing `MutableArrayRef<char> Data` (the
fragment content) with the relocated location. It is necessary to fix
the pointer-overflow sanitizer issue and reland #150846.
#147420 changed the unrolling preferences to permit unrolling of
non-auto-vectorized loops by checking for the isvectorized attribute.
However, when a loop is vectorized this attribute is put on both the
vector loop and the scalar epilogue, so that change prevented the scalar
epilogue from being unrolled.
Restore the previous behaviour of unrolling the scalar epilogue by
checking both for the isvectorized attribute and for vector instructions
in the loop.
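A sketch of the combined check (using generic LoopInfo helpers; the exact helper and predicate used in the patch may differ):
```
#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch only: treat a loop as "already vectorized" only if it both carries
// the llvm.loop.isvectorized metadata and actually contains vector
// instructions. The scalar epilogue carries the metadata but is scalar, so
// it stays eligible for unrolling.
static bool isVectorizedBody(const Loop &L) {
  if (!getBooleanLoopAttribute(&L, "llvm.loop.isvectorized"))
    return false;
  for (BasicBlock *BB : L.blocks())
    for (Instruction &I : *BB)
      if (I.getType()->isVectorTy() ||
          any_of(I.operands(),
                 [](const Use &U) { return U->getType()->isVectorTy(); }))
        return true;
  return false;
}
```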
This extends the existing push_add_through_zext to handle mul, similar
to performVectorExtCombine in SDAG. This allows muls to be pushed up
through the tree of extends, operating on smaller vector types whilst
keeping the result the same (provided the output has more than 2x the
bits of the inputs).
matchExtAddvToUdotAddv needs to be adjusted to make sure it keeps
generating dot instructions from add(ext(mul(ext, ext))).
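A scalar analogue of why the narrower multiply is legal when the output has more than 2x the input bits:
```
#include <cstdint>

// An 8-bit * 8-bit product needs at most 16 bits, so multiplying in 16 bits
// and extending afterwards gives the same value as extending both operands
// to 32 bits first.
static uint32_t widening_mul(uint8_t a, uint8_t b) {
  uint16_t narrow = static_cast<uint16_t>(a) * static_cast<uint16_t>(b);
  return narrow; // equal to uint32_t(a) * uint32_t(b)
}
```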
Stores can be issued faster if the result is kept in the SIMD/FP
registers.
The `HasOneUse` check guards against creating two floating-point
conversions when, for example, some arithmetic is also done on the
converted value. Another approach would be to inspect the user
instructions during lowering, but I don't see that type of check used in
lowering very often.
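A small example of the two situations the `HasOneUse` check distinguishes (illustrative, not from the patch):
```
#include <cstdint>

// Single use: the converted value only feeds the store, so the convert and
// store can stay on the SIMD/FP side.
void store_only(float f, int32_t *p) { *p = static_cast<int32_t>(f); }

// Extra use: the integer result is also needed for arithmetic, so it ends
// up in a GPR anyway; converting once for the store (FP side) and once for
// the arithmetic would create two conversions, hence the combine bails out.
int32_t store_and_add(float f, int32_t *p) {
  int32_t i = static_cast<int32_t>(f);
  *p = i;
  return i + 1;
}
```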
The current code generated for absolute difference patterns
(a > b ? a - b : b - a) typically consists of sequences of:
```
sub w8, w1, w0
subs w9, w0, w1
csel w0, w9, w8, hi
```
The first sub is redundant if the csel is replaced by a cneg:
```
subs w8, w0, w1
cneg w0, w8, ls
```
This is achieved by canonicalising
```
select_cc lhs, rhs, sub(lhs, rhs), sub(rhs, lhs), cc ->
select_cc lhs, rhs, sub(lhs, rhs), neg(sub(lhs, rhs)), cc
select_cc lhs, rhs, sub(rhs, lhs), sub(lhs, rhs), cc ->
select_cc lhs, rhs, neg(sub(lhs, rhs)), sub(lhs, rhs), cc
```
as the second forms can already be matched.
This helps with some of the patterns in #118413.
The selection code for aarch64_sve_ld[nt]1_pn_x{2,4} intrinsics gates
the use of strided load instructions behind the SME2 target feature.
However, the instructions are only available in streaming mode.
The object file format specific derived classes are used in contexts like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast, and we want to eliminate
MCSection::SectionVariant in the base class.
The object file format specific derived classes are used in contexts like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast, and we want to eliminate
MCSection::SectionVariant in the base class.
Introduce a pseudo instruction carrying the address and immediate
modifiers as separate operands, to be selected instead of a pair of
`MOVKXi` and `PAC[ID][AB]`. The new pseudo instruction is expanded in
AsmPrinter so that `MOVKXi` is emitted immediately before `PAC[ID][AB]`.
This way, an attacker cannot control the immediate modifier used to sign
the value, even if the address modifier can be substituted.
To simplify the instruction selection, select the `AArch64::PAC` pseudo
using a TableGen pattern and post-process its `$AddrDisc` operand in a
custom inserter hook; this eliminates duplicating the logic for DAGISel
and GlobalISel. Furthermore, this improves cross-BB analysis in the case
of DAGISel.
I think it is OK that we always use v2i64 as the type of LDNP/STNP
nodes. Bitcasting the type should be fine for little endian. This helps
with #150125.
Unable to reproduce yet, but this definitely seems wrong. Better safe
than sorry.
No effect on codegen as far as I know (because I have not been able to
repro).
Consistently use `FlagsVT` for operands/results of nodes that
consume/produce NZCV flags.
Previously, some of the operands/results had the incorrect `MVT::Glue`
type, while others had the `MVT_CC` type, which is supposed to be used
for condition codes (the `AArch64CC::CondCode` enum).
Found by #150125.
Previously, separate load, zext and FMOV instructions were emitted. This
patch adds a new TableGen pattern to avoid the unnecessary FMOV. A test
is included in test/CodeGen/AArch64/load_u64_from_u32.ll.
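The kind of source pattern involved, reduced to illustrative C++ (the .ll test itself is IR; this is just the shape of the computation):
```
#include <cstdint>
#include <cstring>

// Load a u32, zero-extend it to u64, and use the bits as an FP/SIMD value.
// Ideally this lowers to a single 32-bit load into an FP/SIMD register
// rather than an integer load + zext + fmov.
static double u64_bits_from_u32(const uint32_t *p) {
  uint64_t bits = *p;               // load + implicit zero-extension
  double d;
  std::memcpy(&d, &bits, sizeof d); // reinterpret the bits in an FP register
  return d;
}
```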
These unused includes were identified by misc-include-cleaner. I've
filtered out those whose removal breaks builds. I'm also staying away
from llvm-config.h, config.h, and Compiler.h, as removing them would
likely cause platform- or compiler-specific build failures.