llvm-project

Author	SHA1	Message	Date
Zaara Syeda	22067a8eb4	[PowerPC] Fix assert exposed by PR 95931 in LowerBITCAST (#108062 ) Hit Assertion failed: Num < NumOperands && "Invalid child # of SDNode!" Fix by checking opcode and value type before calling getOperand.	2024-09-10 14:14:01 -04:00
Qiu Chaofan	06c331163e	[PowerPC] Implement llvm.set.rounding intrinsic (#67302 )	2024-09-10 14:30:31 +08:00
Jeremy Morse	7a930ce327	[DWARF] Emit a minimal line-table for totally empty functions (#107267 ) In degenerate but legal inputs, we can have functions that have no source locations at all -- all the DebugLocs attached to instructions are empty. LLVM didn't produce any source location for the function; with this patch it will at least emit the function-scope source location. Demonstrated by empty-line-info.ll The XCOFF test modified has similar symptoms -- with this patch, the size of the ".dwline" section grows a bit, thus shifting some of the file internal offsets, which I've updated.	2024-09-09 12:54:45 +01:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
Matt Arsenault	100d9b8994	Reapply "AtomicExpand: Allow incrementally legalizing atomicrmw" (#107307 ) This reverts commit 63da545ccdd41d9eb2392a8d0e848a65eb24f5fa. Use reverse iteration in the instruction loop to avoid sanitizer errors. This also has the side effect of avoiding the AArch64 codegen quality regressions. Closes #107309	2024-09-06 18:37:34 +04:00
Simon Pilgrim	6ec889e53f	[DAG] Add support for neg(abd(x,y)) patterns. Currently limited to cases which have legal/custom ABDS/ABDU handling - I'll extend this for all targets in future (similar to how we support neg(abs(x))) once I've addressed some outstanding regressions on aarch64/riscv. Helps avoid a lot of extra cmov instructions on x86 in particular, and allows us to more easily improve the codegen in future commits.	2024-09-06 13:16:09 +01:00
Matt Arsenault	fc3e6a8186	DAG: Handle lowering unordered compare with inf (#100378 ) Try to take advantage of the nan check behavior of fcmp. x86_64 looks better, x86_32 looks worse.	2024-09-05 19:54:32 +04:00
RolandF77	26ba186bd0	[PowerPC] Improve pwr7 codegen for v4i8 load (#104507 ) There are no partial vector loads on pwr7 so current v4i8 codegen is an int load then store to vector sized temp and re-load as vector. Try to use lfiwax to load 32 bits into an FP reg and take advantage of VSX FP and vector reg sharing to move the result to the right vector position.	2024-09-04 12:55:27 -04:00
Christudasan Devadasan	6c143a86cd	[CodeGen][NewPM] Port MachineCSE pass to new pass manager. (#106605 )	2024-09-04 18:54:07 +05:30
paperchalice	69657eb7f6	[llc] Provide `opt` like verifier options (#106665 ) - Support `verify-each` option. - Default behavior is verifying output only.	2024-09-04 17:37:34 +08:00
Michael Marjieh	00c198b2ca	[MachinePipeliner] Make Recurrence MII More Accurate (#105475 ) Current RecMII calculation is bigger than it needs to be. The calculation was refined in this patch.	2024-09-03 16:15:17 +09:00
Craig Topper	aa91d90cb0	[LegalizeVectorOps][PowerPC] Use xor to expand fneg. (#106595 ) This preserves the semantics of fneg and matches what we do in LegalizeDAG. I kept the legal FSUB check to force unrolling for some targets that don't have FSUB but have XOR. On AArch64, using xor broke some tests that expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg (extractvectorelt X)))))) pattern.	2024-08-29 15:00:23 -07:00
Stephen Tozer	3d08ade7bd	[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149 ) This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>	2024-08-29 17:53:32 +01:00
Matt Arsenault	7b7b0b95b2	DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (#105577 ) For some reason, isOperationLegalOrCustom is not the same as isOperationLegal || isOperationCustom. Unfortunately, it checks if the type is legal, which makes it useless for custom lowering on non-legal types (which is always ppcf128). Really the DAG builder shouldn't be going to expand this in the builder, it makes it difficult to work with. It's only here to work around the DAG requiring legal integer types the same size as the FP type after type legalization.	2024-08-29 14:05:43 +04:00
RolandF77	89bbcbe285	[PowerPC] fix legalization crash (#105563 ) If v2i64 scalar_to_vector is made custom, llc can crash in certain legalization cases where v2i64 vectors are injected, even if they weren't otherwise present. The code generated would be fine, but that operation is not handled in ReplaceNodeResults. Add handling.	2024-08-28 11:22:23 -04:00
Kai Luo	8e901c255d	[PowerPC] Retire PPCExpandISel pass (#84289 ) We can decide whether to expand isel or not in instruction selection pass and early-if-conversion pass. The transformation implemented in PPCExpandISel can be retired considering PPC backend doesn't generate `isel` instructions post-RA. Also if we are seeking performant branch-or-isel decision, we can turn to selectoptimize pass. --------- Co-authored-by: Kai Luo <lkail@cn.ibm.com>	2024-08-27 09:43:52 +08:00
Zaara Syeda	327edbe07a	[PowerPC] Fix mask for __st[d/w/h/b]cx builtins (#104453 ) These builtins are currently returning CR0 which will have the format [0, 0, flag_true_if_saved, XER]. We only want to return flag_true_if_saved. This patch adds a shift to remove the XER bit before returning.	2024-08-22 09:55:46 -04:00
Sergei Barannikov	c91cc459d3	[DataLayout] Refactor the rest of `parseSpecification` (#104545 ) The aim is to improve test coverage of data layout string parsing. Pull Request: https://github.com/llvm/llvm-project/pull/104545	2024-08-20 11:25:49 +03:00
Qiu Chaofan	b6d1df2afd	[PowerPC] Support -mno-red-zone option (#94581 )	2024-08-19 17:58:08 +08:00
Amy Kwan	cf721e29c6	[PowerPC] Do not merge TLS constants within PPCMergeStringPool.cpp (#94059 ) This patch prevents thread-local constants from being merged within PPCMergeStringPool.cpp. The PPCMergeStringPool pass primarily merges non-thread-local constants together, and thread-local constants should not be mixed together with other (non-thread-local) constants. In the event that thread-local and other non-thread-local constants are pooled together, the llvm.threadlocal.address intrinsic can fail as it expects its argument to be a thread-local global value, but the merged string structure created by the PPCMergeStringPool pass is not thread-local as a whole.	2024-08-16 15:06:50 -04:00
Amy Kwan	9325381998	[PowerPC][GlobalMerge] Enable GlobalMerge by default on AIX (#101226 ) This patch turns on the GlobalMerge pass by default on AIX and updates LIT tests accordingly.	2024-08-15 15:25:54 -04:00
Craig Topper	abc1acf8df	[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751 ) If the upper bits of the shr aren't demanded. This helps with cases where the outer srl was originally an sra and was converted to a srl by SimplifyDemandedBits before it had a chance to combine with the inner sra. This can occur when the inner sra was part of a sign_extend_inreg expansion. There are some regressions in ARM and Thumb2.	2024-08-14 08:44:57 -07:00
Amy Kwan	5e990b0b7f	[PowerPC][GlobalMerge] Reduce TOC usage by merging internal and private global data (#101224 ) This patch aims to reduce TOC usage by merging internal and private global data. Moreover, we also add the GlobalMerge pass within the PPCTargetMachine pipeline, which is disabled by default. This transformation can be enabled by -ppc-global-merge.	2024-08-14 10:14:33 -04:00
RolandF77	8b6e9de3dd	[PowerPC] improve P10 store forwarding on P7 scalar to vector (#102330 ) Try to make P7 code with scalar to vector operations that use store/re-load to run smoother on P10 by supplying enough store width to cover the load and allow hardware store forwarding.	2024-08-12 12:30:06 -04:00
Peter Rong	74e4694b8c	[LTO] enable `ObjCARCContractPass` only on optimized build (#101114 ) #92331 tried to enable `ObjCARCContractPass` by default, but it caused a regression on O0 builds and was reverted. This patch tries to bring that back by: 1. reverting the [revert](`1579e9ca9c`). 2. running `createObjCARCContractPass` only on optimized builds. Tests are updated to reflect the changes. Specifically, all `O0` tests should not include `ObjCARCContractPass`. Signed-off-by: Peter Rong <PeterRong@meta.com>	2024-08-09 13:04:25 -07:00
Simon Pilgrim	13d04fa560	[DAG] Add legalization handling for ABDS/ABDU (#92576 ) (REAPPLIED) Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization	2024-08-08 11:39:05 +01:00
Tim Gymnich	408d82d352	[PowerPC] Respect endianness when bitcasting to fp128 (#95931 ) Fixes #92246 Match the behaviour of `bitcast v2i64 (BUILD_PAIR %lo %hi)` when encountering `bitcast fp128 (BUILD_PAIR %lo %hi)` by inserting a missing swap of the arguments based on endianness. ### Current behaviour: fp128 bitcast fp128 (BUILD_PAIR %lo %hi) => BUILD_FP128 %lo %hi BUILD_FP128 %lo %hi => MTVSRDD %hi %lo v2i64 bitcast v2i64 (BUILD_PAIR %lo %hi) => BUILD_VECTOR %hi %lo BUILD_VECTOR %hi %lo => MTVSRDD %lo %hi	2024-08-08 08:51:04 +08:00
Lei Huang	64510c1411	[PPC] Implement BCD assist builtins (#101390 ) Implement BCD assist builtins for XL and GCC compatibility. GCC compat: ``` unsigned int __builtin_cdtbcd (unsigned int); unsigned int __builtin_cbcdtd (unsigned int); unsigned int __builtin_addg6s (unsigned int, unsigned int); ``` 64BIT XL compat: ``` long long __cdtbcd (long long); long long __cbcdtd (long long); long long __addg6s (long long source1, long long source2) ```	2024-08-07 13:38:48 -04:00
Simon Pilgrim	e4e96b3e26	Revert b1234ddbe2652aa7948242a57107ca7ab12fd2f8. "[DAG] Add legalization handling for ABDS/ABDU (#92576 )" Reverting #92576 while we identify a reported regression	2024-08-07 17:11:25 +01:00
Zaara Syeda	d07f106e51	[PPC][AIX] Save/restore r31 when using base pointer (#100182 ) When the base pointer r30 is used to hold the stack pointer, r30 is spilled in the prologue. On AIX registers are saved from highest to lowest, so r31 also needs to be saved. Fixes https://github.com/llvm/llvm-project/issues/96411	2024-08-07 09:59:45 -04:00
Ricardo Jesus	fc157522c5	[LICM] Prevent fold and hoist of binary ops with over 2 uses (#102114 ) This limits folding and hoisting associative binary ops to cases where the intermediate op has at most two uses. The more uses the intermediate op has, the more new ops we have to create to potentially reduce the loop's critical path. We keep the limit to two uses to minimise undesirable increases in code size.	2024-08-07 09:52:30 +01:00
Simon Pilgrim	b1234ddbe2	[DAG] Add legalization handling for ABDS/ABDU (#92576 ) Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv	2024-08-06 10:18:06 +01:00
Alexis Engelke	fa92d51f9e	[VP] Merge ExpandVP pass into PreISelIntrinsicLowering (#101652 ) Similar to #97727; avoid an extra pass over the entire IR by performing the lowering as part of the pre-isel-intrinsic-lowering pass.	2024-08-06 09:27:59 +02:00
Alexis Engelke	b5fc083dc3	[CodeGen] Merge lowerConstantIntrinsics into pre-isel lowering (#97727 ) Currently, the LowerConstantIntrinsics pass does an RPO traversal of every function... only to find that many functions don't have constant intrinsics (is.constant, objectsize). In the CodeGen pipeline, there is already a pre-isel intrinsic lowering pass, which iterates over intrinsic declarations and lowers all users. Call lowerConstantIntrinsics from this pass to avoid the extra iteration over the entire IR and the RPO traversal.	2024-08-01 17:44:32 +02:00
Stefan Pintilie	53c37f300d	[PowerPC] Add phony subregisters to cover the high half of the VSX registers. (#94628 ) On PowerPC there are 128 bit VSX registers. These registers are half overlapped with 64 bit floating point registers (FPR). The 64 bit half of the VSX register that does not overlap with the FPR does not overlap with any other register class. The FPR are the only subregisters of the VSX registers but they do not fully cover the 128 bit super register. This leads to incorrect lane masks being created. This patch adds phony registers for the other half of the VSX registers in order to fully cover them and to make sure that the lane masks are not the same for the VSX and the floating point register.	2024-07-29 11:17:04 -04:00
Ricardo Jesus	25da8e5a97	Reapply "[LICM] Fold associative binary ops to promote code hoisting (#81608 )" (#100377 ) This reapplies a more strict version of `f2ccf80136`. Perform the transformation "(LV op C1) op C2" ==> "LV op (C1 op C2)" where op is an associative binary op, LV is a loop variant, and C1 and C2 are loop invariants, and hoist (C1 op C2) into the preheader. For now this fold is restricted to ADDs.	2024-07-26 10:12:25 +01:00
Qiu Chaofan	20957d2091	[AIX] Add -msave-reg-params to save arguments to stack (#97524 ) In the PowerPC ABI, a few initial arguments are passed through registers, but their places in the parameter save area are reserved; arguments passed by memory go after the reserved location. For debugging purposes, we may want to save a copy of the pass-by-reg arguments into the correct places on the stack. The new option achieves this by adding a new function-level attribute and making the argument lowering aware of it.	2024-07-24 20:58:37 +08:00
Stefan Pintilie	26fa399012	[RegisterCoalescer] Fix SUBREG_TO_REG handling in the RegisterCoalescer. (#96839 ) The issue with the handling of the SUBREG_TO_REG is that we don't join the subranges correctly when we join live ranges across the SUBREG_TO_REG. For example when joining across this: ``` 32B %2:gr64_nosp = SUBREG_TO_REG 0, %0:gr32, %subreg.sub_32bit ``` we want to join these live ranges: ``` %0 [16r,32r:0) 0@16r weight:0.000000e+00 %2 [32r,112r:0) 0@32r weight:0.000000e+00 ``` Before the fix the range for the resulting merged `%2` is: ``` %2 [16r,112r:0) 0@16r weight:0.000000e+00 ``` After the fix it is now this: ``` %2 [16r,112r:0) 0@16r L000000000000000F [16r,112r:0) 0@16r weight:0.000000e+00 ``` Two tests are added to this fix. The X86 test fails without the patch. The PowerPC test passes with and without the patch but is added as a way to track possible future failures when register classes are changed in a future patch.	2024-07-23 21:59:27 -04:00
Wesley Wiser	ca076f7a63	[LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263 ) Rebase of #84114. I've only included the core changes to frame layout calculation & CFI generation which sidesteps the regressions found after merging #84114. Since these changes are a necessary precursor to the overall fix and are themselves slightly beneficial as CFI is now generated correctly, I think it is reasonable to merge this first step. --- For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code. This patch updates the frame lowering code to calculate the offsets as 64-bit values and fixes CFI to use the corrected sizes. After this patch, additional work is needed to fix offset truncations in each target's codegen.	2024-07-23 09:43:30 -07:00
azhan92	1df4d866cc	[PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511 ) This PR adds support for -mcpu=pwr11/power11 and -mtune=pwr11/power11 in clang and llvm.	2024-07-23 09:49:41 -04:00
Nikita Popov	b48819dbcd	Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608 )" This reverts commit f2ccf80136a01ca69f766becafb329db6c54c0c8. The flag propagation code is incorrect.	2024-07-23 12:01:22 +02:00
Ricardo Jesus	f2ccf80136	[LICM] Fold associative binary ops to promote code hoisting (#81608 ) Perform the transformation "(LV op C1) op C2" ==> "LV op (C1 op C2)" where op is an associative binary op, LV is a loop variant, and C1 and C2 are loop invariants to hoist. Similar patterns could be folded (left in comment) but this one seems to be the most impactful.	2024-07-23 10:03:26 +01:00
Chen Zheng	43213002b9	[PowerPC] Support -fpatchable-function-entry (#92997 ) For now only PPC big endian Linux 32 and 64 bit are supported. PPC little endian Linux has XRAY support for 64-bit. PPC AIX has different patchable function entry implementations. Fixes #63220 Fixes #57031	2024-07-22 08:51:51 +08:00
paperchalice	1b873e565e	[CodeGen][NewPM] Port `phi-node-elimination` to new pass manager (#98867 ) - Add `PHIEliminationPass`. - Support new pass manager in `MachineBasicBlock::SplitCriticalEdge`	2024-07-17 11:26:56 +08:00
Volodymyr Vasylkun	e094abde42	[SelectionDAG] Expand [US]CMP using arithmetic on boolean values instead of selects (#98774 ) The previous expansion of [US]CMP was done using two selects and two compares. It produced decent code, but on many platforms it is better to implement [US]CMP nodes by performing the following operation: ``` [us]cmp(x, y) = (x [us]> y) - (x [us]< y) ``` This patch adds this new expansion, as well as a hook in TargetLowering to allow some targets to still use the select-based approach. AArch64 and SystemZ are currently the only targets to prefer the former approach, but other targets may also start to use it if it provides for better codegen.	2024-07-16 20:56:18 +01:00
Lei Huang	266a784cce	[PowerPC] Ensure MI peephole knows about instr modified by combineRLWINM() (#97134 ) Ensure registers used in instructions modified by `combineRLWINM()` are added to list of `RegsToUpdate`.	2024-07-16 11:46:37 -04:00
esmeyi	c119da23af	[PowerPC] Function descriptor symbol may be omitted for external symbol. #97526 If a function's address is taken, which means it may be called via a function pointer, we need the function descriptor for it. Otherwise, the function descriptor can be omitted for external symbols.	2024-07-08 03:47:33 -04:00
Matt Arsenault	db9252b115	DAG: Call SimplifyDemandedBits on fcopysign sign value (#97151 ) Math library code has quite a few places with complex bit logic that are ultimately fed into a copysign. This helps avoid some regressions in a future patch. This assumes the position in the float type, which should at least be valid for IEEE types. Not sure if we need to guard against ppc_fp128 or anything else weird. There appears to be some value in simplifying the value operand as well, but I'll address that separately.	2024-07-01 12:19:17 +02:00
Chen Zheng	e1c03ddc9b	[PowerPC] use r1 as the frame pointer when there is dynamic alloca On PPC, when there is dynamic alloca, only r1 points to the backchain.	2024-06-20 22:26:52 -04:00
Chen Zheng	abaaa48ce6	[PowerPC] fix frameaddress error when there is dynamic alloca call, NFC	2024-06-20 22:26:48 -04:00
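
Several messages in the log above quote small arithmetic identities: the `abdu` expansion (13d04fa560), the `[us]cmp` expansion (e094abde42), and the `usub_cond`/`usub_sat` atomicrmw operations (4af249fe6e). The sketch below models them in Python on fixed-width integers so they can be checked by hand. It is not LLVM code. The function names, the `bits` parameter, and the helpers `_mask` and `_to_signed` are hypothetical. It also assumes that the `usub_overflow` term in the `abdu` formula stands for the borrow flag widened to an all-ones mask, which is one reading of the abbreviated formula in the commit message.

```python
def _mask(bits):
    """All-ones mask for a `bits`-wide integer (hypothetical helper)."""
    return (1 << bits) - 1


def _to_signed(x, bits):
    """Reinterpret an unsigned `bits`-wide value as two's complement."""
    x &= _mask(bits)
    return x - (1 << bits) if x >> (bits - 1) else x


def abdu(lhs, rhs, bits=8):
    """Unsigned absolute difference via sub(xor(sub(l, r), ov), ov).

    Assumption: `ov` is the unsigned-subtract borrow, widened to an
    all-ones mask when lhs < rhs and zero otherwise.
    """
    m = _mask(bits)
    lhs &= m
    rhs &= m
    diff = (lhs - rhs) & m          # wrapping subtraction
    ov = m if lhs < rhs else 0      # borrow as a full-width mask
    return ((diff ^ ov) - ov) & m   # conditional negate of diff


def ucmp(x, y, bits=8):
    """Three-way unsigned compare: (x u> y) - (x u< y)."""
    m = _mask(bits)
    x &= m
    y &= m
    return int(x > y) - int(x < y)


def scmp(x, y, bits=8):
    """Three-way signed compare: (x s> y) - (x s< y)."""
    sx, sy = _to_signed(x, bits), _to_signed(y, bits)
    return int(sx > sy) - int(sx < sy)


def usub_cond(old, val, bits=8):
    """Subtract only if it does not underflow; otherwise keep `old`."""
    m = _mask(bits)
    old &= m
    val &= m
    return old - val if old >= val else old


def usub_sat(old, val, bits=8):
    """Subtract, clamping to zero if the difference would be negative."""
    m = _mask(bits)
    old &= m
    val &= m
    return old - val if old >= val else 0
```

For 8-bit inputs, `abdu(3, 5)` and `abdu(5, 3)` both give 2, while `scmp(200, 100)` and `ucmp(200, 100)` differ in sign because 200 reads as -56 when treated as signed.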


3928 Commits