llvm-project

Author	SHA1	Message	Date
yingopq	754ed95b66	[Mips] Fix compiler crash when returning fp128 after calling a functi… (#117525 ) …on returning { i8, i128 } Fixes https://github.com/llvm/llvm-project/issues/96432.	2025-01-20 16:47:40 +08:00
David Green	f87a9db832	[ARM] Expand fp64 bf16 converts similarly to f32 This helps with +fp64 targets where the f64s are legal and not previously lowered. It can treat fpextends as a shift + cvt and fptrunc can use a libcall.	2025-01-03 11:28:31 +00:00
David Green	4472648998	[ARM] Expand bf16 expanding/rounding fp loads/stores As with other fp types, these should be expanded to prevent nodes that are illegal for Arm.	2024-12-20 09:03:28 +00:00
Craig Topper	f139bde8d8	[SelectionDAG] Move SDNode::use_iterator::getOperandNo to SDUse. (#120536 ) This allows us to write more range based for loops because we no longer need the iterator. It also matches IR's Use class.	2024-12-19 09:07:42 -08:00
Craig Topper	e6b2495545	[SelectionDAG] Split SDNode::use_iterator into user_iterator and use_iterator. (#120531 ) SDNode::use_iterator now returns an SDUse& when dereferenced. SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses work on use_iterator. SDNode::user_begin/user_end/users work on user_iterator. We can now write range based for loops using SDUse& and SDNode::uses(). I've converted many of these in this patch. I didn't update loops that have additional variables updated in their for statement. Some loops use SDNode::use_iterator::getOperandNo() which also prevents using range based for loops. I plan to move this into SDUse in a follow up patch.	2024-12-19 08:35:32 -08:00
David Green	e020f46027	[ARM] Fix BF16 lowering with FullFP16 This adds test coverage for bf16 instructions, making sure that lowering bf16 works with and without +fullfp16.	2024-12-19 10:20:35 +00:00
Craig Topper	bd261ecc5a	[SelectionDAG] Add SDNode::user_begin() and use it in some places (#120509 ) Most of these are just places that want the first user and aren't iterating over the whole list. While there I changed some use_size() == 1 to hasOneUse() which is more efficient. This is part of an effort to rename use_iterator to user_iterator and provide a use_iterator that dereferences to SDUse&. This patch helps reduce the diff on later patches.	2024-12-18 22:13:04 -08:00
Craig Topper	104ad9258a	[SelectionDAG] Rename SDNode::uses() to users(). (#120499 ) This function is most often used in range based loops or algorithms where the iterator is implicitly dereferenced. The dereference returns an SDNode * of the user rather than SDUse * so users() is a better name. I've long beeen annoyed that we can't write a range based loop over SDUse when we need getOperandNo. I plan to rename use_iterator to user_iterator and add a use_iterator that returns SDUse& on dereference. This will make it more like IR.	2024-12-18 20:09:33 -08:00
David Green	a35db2880a	[NFC] Remove some unnecessary semicolons All inside LLVM_DEBUG, some of which have been cleaned up by adding block scopes to allow them to format more nicely.	2024-12-16 08:48:57 +00:00
LiqinWeng	3083acc215	[DAGCombine] Remove oneuse restrictions for RISCV in folding (shl (add_nsw x, c1)), c2) and folding (shl(sext(add x, c1)), c2) in some scenarios (#101294 ) This patch remove the restriction for folding (shl (add_nsw x, c1)), c2) and folding (shl(sext(add x, c1)), c2), and test case from dhrystone , see this link: riscv32: https://godbolt.org/z/o8GdMKrae riscv64: https://godbolt.org/z/Yh5bPz56z	2024-12-10 11:17:54 +08:00
Sergei Barannikov	e0ed0333f0	Reland "[ARM] Stop gluing ALU nodes to branches / selects" (#118887 ) Re-landing #116970 after fixing miscompilation error. The original change made it possible for CMPZ to have multiple uses; `ARMDAGToDAGISel::SelectCMPZ` was not prepared for this. Pull Request: https://github.com/llvm/llvm-project/pull/118887 Original commit message: Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.	2024-12-07 10:14:36 +03:00
Martin Storsjö	2a5e1da57a	Revert "[ARM] Stop gluing ALU nodes to branches / selects" (#118232 ) Reverts llvm/llvm-project#116970. This change broke Wine compiled for armv7, causing segfaults when starting Wine. See llvm/llvm-project#116970 for more detailed discussion about the issue.	2024-12-02 00:02:25 +02:00
Sergei Barannikov	a348f223ca	[ARM] Stop gluing ALU nodes to branches / selects (#116970 ) Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change. Pull Request: https://github.com/llvm/llvm-project/pull/116970	2024-11-30 08:14:24 +03:00
Sergei Barannikov	ad9dcd96dc	Reland "[ARM] Stop gluing FP comparisons to FMSTAT" (#117248 ) Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. `TRI::getCrossCopyRegClass` is modified in a way that prevents DAG scheduler from copying FPSCR into a virtual register. The register allocator might need to spill the virtual register, but that only seem to work in Thumb mode.	2024-11-22 22:29:58 +03:00
Sergei Barannikov	5d32a1409d	Revert "[ARM] Stop gluing FP comparisons to FMSTAT" (#117175 ) Reverts llvm/llvm-project#116676 Reverting per post-commit feedback (causes miscompilation errors and/or assertion failures).	2024-11-21 18:26:53 +03:00
Sergei Barannikov	8c56dd3040	[ARM] Stop gluing FP comparisons to FMSTAT (#116676 ) Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPCSR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. There might be a case when a copy can't be avoided (although not found in existing tests). If a copy is necessary, the virtual register will be created with `cl_FPSCR_NZCV` register class. If this register class is inappropriate, `TRI::getCrossCopyRegClass` should be modified to return the correct class. Pull Request: https://github.com/llvm/llvm-project/pull/116676	2024-11-20 16:07:05 +03:00
Sergei Barannikov	aff98e4be0	[ARM] Stop gluing 1-bit shifts (#116547 ) 1. When two (or more) nodes are glued, DAG scheduler will always schedule them as one piece, i.e. it will not allow any instructions to be scheduled between them. It does so because if nodes are glued this usually means that there is an implicit register dependency between them, and an intervening node could clobber this physical register. When emitting such nodes into machine IR, they will also be stuck together, e.g.: ``` %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr %10:gpr = RRX %3, implicit $cpsr ``` 2. If a node has Glue result, SelectionDAG will not try to CSE this node. If it did, it would break the implicit physical register dependency. In practice this means that if a node with Glue result has multiple uses, it has to be duplicated before each use. This the reason for `ARMTargetLowering::duplicateCmp` to exist. When using normal data dependency, dependent nodes can freely be scheduled around. If there is a physical register dependency between nodes, the physical register will be copied to/from a virtual register, allowing other nodes to intervene between them. The resulting machine IR might look like this: ``` %9:gpr = LSRs1 killed %8, implicit-def $cpsr %10:gpr = COPY $cpsr %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al /, $noreg, $noreg %12:gpr = BICri killed %11, -2147483648, 14 / CC::al /, $noreg, $noreg $cpsr = COPY %10 %13:gpr = RRX %3, implicit $cpsr ``` The two copies are likely to be eliminated by register coalescer, given that there are no instructions between them that clobber this physical register. If the copies are unwanted in the first place (they could be expensive or impossible), DAG scheduler will try to avoid inserting them wherever possible, and the resulting machine IR will look like this: ``` %9:gpr = LSRs1 killed %8, implicit-def $cpsr %10:gpr = ORRrsi killed %9, %3, 242, 14 / CC::al /, $noreg, $noreg %11:gpr = BICri killed %10, -2147483648, 14 / CC::al */, $noreg, $noreg %12:gpr = RRX %3, implicit $cpsr ``` On ARM, arithmetic operations and LSLS already use the new data flow approach. This patch extends it to include 1-bit shifts. Pull Request: https://github.com/llvm/llvm-project/pull/116547	2024-11-19 17:46:48 +03:00
Craig Topper	cde4ae789e	[ARM] Use getSignedTargetConstant. NFC	2024-11-18 13:12:23 -08:00
Sergei Barannikov	baf59be89b	[SelectionDAG] Fix return types of TC_RETURN for several targets (#116504 ) TC_RETURN nodes do not have a glue result.	2024-11-17 02:14:05 +03:00
Kazu Hirata	9571cc2b28	[ARM] Remove unused includes (NFC) (#115995 ) Identified with misc-include-cleaner.	2024-11-12 23:15:21 -08:00
David Green	750247bc1c	[ARM] Fix APInt assert for CSNEG known bits. Use APInt::getAllOnes(32), to avoid the assert when constructing an APInt from -1.	2024-11-11 19:39:02 +00:00
dnsampaio	28d0718033	[DAGCombiner] Add combine avg from shifts (#113909 ) This teaches dagcombiner to fold: `(asr (add nsw x, y), 1) -> (avgfloors x, y)` `(lsr (add nuw x, y), 1) -> (avgflooru x, y)` as well the combine them to a ceil variant: `(avgfloors (add nsw x, y), 1) -> (avgceils x, y)` `(avgflooru (add nuw x, y), 1) -> (avgceilu x, y)` iff valid for the target. Removes some of the ARM MVE patterns that are now dead code. It adds the avg opcodes to `IsQRMVEInstruction` as to preserve the immediate splatting as before.	2024-10-31 10:57:27 +01:00
Craig Topper	50896e7ef5	[ARM] Use getSignedConstant. NFC	2024-10-30 21:43:16 -07:00
Oliver Stannard	376d7b27fa	[ARM] Optimise byval arguments in tail-calls We don't need to copy byval arguments to tail calls via a temporary, if we can prove that we are not copying from the outgoing argument area. This patch does this when the source if the argument is one of: * Memory in the local stack frame, which can't be used for tail-call arguments. * A global variable. We can also avoid doing the copy completely if the source and destination are the same memory location, which is the case when the caller and callee have the same signature, and pass some arguments through unmodified.	2024-10-25 09:34:09 +01:00
Oliver Stannard	914a3990d1	[ARM] Avoid clobbering byval arguments when passing to tail-calls When passing byval arguments to tail-calls, we need to store them into the stack memory in which this the caller received it's arguments. If any of the outgoing arguments are forwarded from incoming byval arguments, then the source of the copy is from the same stack memory. This can result in the copy corrupting a value which is still to be read. The fix is to first make a copy of the outgoing byval arguments in local stack space, and then copy them to their final location. This fixes the correctness issue, but results in extra copying, which could be optimised.	2024-10-25 09:34:09 +01:00
Oliver Stannard	78ec2e2ed5	[ARM] Allow tail calls with byval args Byval arguments which are passed partially in registers get stored into the local stack frame, but it is valid to tail-call them because the part which gets spilled is always re-loaded into registers before doing the tail-call, so it's OK for the spill area to be deallocated.	2024-10-25 09:34:08 +01:00
Oliver Stannard	82e6472197	[ARM] Allow functions with sret returns to be tail-called It is valid to tail-call a function which returns through an sret argument, as long as we have an incoming sret pointer to pass on.	2024-10-25 09:34:08 +01:00
Oliver Stannard	c1eb790cd2	[ARM] Tail-calls do not require caller and callee arguments to match The ARM backend was checking that the outgoing values for a tail-call matched the incoming argument values of the caller. This isn't necessary, because the caller can change the values in both registers and the stack before doing the tail-call. The actual limitation is that the callee can't need more stack space for it's arguments than the caller does. This is needed for code using the musttail attribute, as well as enabling tail calls as an optimisation in more cases.	2024-10-25 09:34:08 +01:00
Oliver Stannard	246baeb5fe	[ARM] Add debug trace for tail-call optimisation There are lots of reasons a call might not be eligible for tail-call optimisation, this adds debug trace to help understand the compiler's decisions here.	2024-10-25 09:34:08 +01:00
Oliver Stannard	8e289e4fa6	[ARM] Fix comment typo	2024-10-25 09:34:07 +01:00
Alex Rønne Petersen	5785cbb405	[llvm] Ensure that soft float targets don't emit `fma()` libcalls. (#106615 ) The previous behavior could be harmful in some edge cases, such as emitting a call to `fma()` in the `fma()` implementation itself. Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`. This was already done for PowerPC; this commit just extends that to Arm, z/Arch, and x86. MIPS and SPARC already got it right, but I added tests for them too, for good measure. Note: I don't have commit access.	2024-10-19 06:13:15 -07:00
Keith Packard	44b020a381	[PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928 ) Add support for using a thread-local variable with a specified offset for holding the stack guard canary value. This supports both 32- and 64- bit PowerPC targets. This mirrors changes from #108942 but targeting PowerPC instead of RISCV. Because both of these PRs modify the same driver functions, this series is stack on top of the RISC-V one. --------- Signed-off-by: Keith Packard <keithp@keithp.com>	2024-10-17 19:06:47 -07:00
Jay Foad	85c17e4092	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706 ) Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.	2024-10-17 16:20:43 +01:00
Jay Foad	9255850e89	[LLVM] Remove unused variables after #112546	2024-10-16 16:15:34 +01:00
Jay Foad	d9c95efb6c	[LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112546 ) Convert almost every instance of: CreateCall(Intrinsic::getOrInsertDeclaration(...), ...) to the equivalent CreateIntrinsic call.	2024-10-16 15:43:30 +01:00
Rahul Joshi	fa789dffb1	[NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752 ) Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).	2024-10-11 05:26:03 -07:00
Jeffrey Byrnes	853c43d04a	[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564 ) Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.	2024-10-09 14:30:09 -07:00
Nashe Mncube	439dcfafc5	[llvm][ARM][NFC] Renaming FeaturePrefLoopAlignment (#109932 ) The feature 'FeaturePrefLoopAlignment' was misleading as it was used to set the alignment of branch targets such as functions. Renamed to FeaturePreferfBranchAlignment.	2024-09-26 13:36:12 +01:00
David Green	f1bbabd628	[ARM] Lower arm_neon_vbsl to ARMISD::VBSP and fold (vbsl x, y, y) to y (#109761 ) This helps clean up the patterns a little and will help share combines on both the intrinsic and VBSP. A combine is then added to fold away the VBSP if both the selected operands are the same.	2024-09-25 10:03:39 +01:00
David Green	78ff736db2	[ARM] Fix VMOVRRD combine with non-canonical inserts. (#109639 ) In some situations, in the test case here with the multiple calls being late legalized, we can see inserts of the form: ``` b = insert a, x, 0 c = insert b, y, 1 d = insert c, z, 0 bc = bitcast d e = extract bc, 0 r = vmovrrd e ``` The redundant insert will usually be removed, but in some cases are not prior to PerformVMOVRRDCombine. The code was finding the first insert from each lane (x and y), as opposed to the last (z and y).	2024-09-24 08:10:50 +01:00
Kazu Hirata	1e4e1ceeeb	[Target] Avoid repeated hash lookups (NFC) (#108677 )	2024-09-14 07:39:09 -07:00
David Green	637aa61732	[ARM] Fix VBICimm and VORRimm generation under Big endian. (#107813 ) This is a smaller follow on to #105519 that fixes VBICimm and VORRimm too. The logic behind lowering vector immediates under big endian Neon/MVE is to treat them in natural lane ordering (same as little endian), and VECTOR_REG_CAST them to the correct type (as opposed to creating the constants in big endian form and bitcasting them). This makes sure that is done when creating VORRIMM and VBICIMM.	2024-09-13 10:59:57 +01:00
Austin	3242e77841	[ARM][Codegen] Fix vector data miscompilation in arm32be (#105519 ) Fix #102418， resolved the issue of generating incorrect vrev during vectorization in big-endian scenarios	2024-09-07 14:09:29 +08:00
Nikita Popov	a7697c8655	[ARM] Do not assume alignment in vld1xN and vst1xN intrinsics (#106984 ) These intrinsics currently assume natural alignment. Instead, respect the alignment attribute on the intrinsic. Teach InstCombine to improve that alignment. If desired I could also adjust the clang frontend to add alignment annotations equivalent to the previous behavior, but I don't see any indication that such an assumption is correct in the ARM intrinsics docs. Fixes https://github.com/llvm/llvm-project/issues/59081.	2024-09-05 09:26:53 +02:00
Oliver Stannard	9cf68679c4	[ARM] Fix failure to register-allocate CMP_SWAP_64 pseudo-inst (#106721 ) This test case was failing to compile with a "ran out of registers during register allocation" error at -O0. This was because CMP_SWAP_64 has 3 operands which must be an even-odd register pair, and two other GPR operands. All of the def operands are also early-clobber, so registers can't be shared between uses and defs. Because the function has an over-aligned alloca it needs frame and base pointers, so r6 and r11 are both reserved. That leaves r0/r1, r2/r3, r4/r5 and r8/r9 as the only valid register pairs, and if the two individual GPR operands happen to get allocated to registers in different pairs then only 2 pairs will be available for the three GPRPair operands. To fix this, I've merged the two GPR operands into a single GPRPair operand. This means that the instruction now has 4 GPRPair operands, which can always be allocated without relying on luck. This does constrain register allocation a bit more, but this pseudo instruction is only used at -O0, so I don't think that's a problem.	2024-09-02 08:54:10 +01:00
Sergei Barannikov	4d7a0abae8	[DataLayout] Change return type of `getStackAlignment` to `MaybeAlign` (#105478 ) Currently, `getStackAlignment` asserts if the stack alignment wasn't specified. This makes it inconvenient to use and complicates testing. This change also makes `exceedsNaturalStackAlignment` method redundant.	2024-08-27 22:59:33 +03:00
Kiran	c50d11e6d9	Revert "[ARM] musttail fixes" committed by accident, see #104795 This reverts commit a2088a24dad31ebe44c93751db17307fdbe1f0e2.	2024-08-27 11:17:17 +01:00
Kiran	a2088a24da	[ARM] musttail fixes Backend: - Caller and callee arguments no longer have to match, just to take up the same space, as they can be changed before the call - Allowed tail calls if callee and callee both (or neither) use sret, wheras before it would be dissalowed if either used sret - Allowed tail calls if byval args are used - Added debug trace for IsEligibleForTailCallOptimisation Frontend (clang): - Do not generate extra alloca if sret is used with musttail, as the space for the sret is allocated already Change-Id: Ic7f246a7eca43c06874922d642d7dc44bdfc98ec	2024-08-27 10:44:06 +01:00
Craig Topper	4b0c0ec6b8	[CodeGen] Use MCRegister for CCState::AllocateReg and CCValAssign::getReg. NFC (#106032 )	2024-08-26 11:40:25 -07:00
David Green	b9a0276550	[ARM] Add VECTOR_REG_CAST identity fold. v16i8 VECTOR_REG_CAST (v16i8 Op) can use v16i8 Op directly, as the VECTOR_REG_CAST is a noop.	2024-08-24 21:21:27 +01:00

1 2 3 4 5 ...

2326 Commits