llvm-project

Author	SHA1	Message	Date
Maryam Moghadas	8a0cb9ac86	[PowerPC] Add custom lowering for ssubo (#111748 ) This patch is to improve the codegen for ssubo node for i32 in 64-bit mode by custom lowering.	2024-10-29 15:43:05 -04:00
Simon Pilgrim	32aa782ea2	[PowerPC] copysignl.ll - regenerate to reduce the diff in #111269	2024-10-29 10:57:43 +00:00
Serge Pavlov	819abe412d	[Test] Fix usage of constrained intrinsics (#113523 ) Some tests contain errors in constrained intrinsic usage, such as missing or extra type parameters, wrong type parameter order, and others. --------- Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>	2024-10-28 14:07:32 +07:00
Zaara Syeda	f3131c99bf	[GlobalMerge] Aggressively merge constants to reduce TOC entries (#111756 ) Symbols that get mapped into the read-only section are loaded as part of the text segment and will always need a TOC entry to be addressable. Add an option to aggressively merge these read only globals to reduce TOC usage.	2024-10-24 10:16:39 -04:00
Lei Huang	522f34cfff	[PowerPC] Expand global named register support (#113482 ) Enable all valid registers for intrinsics that read from and write to global named registers.	2024-10-24 10:05:18 -04:00
Lei Huang	a19f05b9ec	Revert "[PowerPC] Expand global named register support" (#113457 ) Reverts llvm/llvm-project#112603	2024-10-23 09:36:28 -04:00
Lei Huang	06d192925d	[PowerPC] Expand global named register support (#112603 ) Enable all valid registers for intrinsics that read from and write to global named registers.	2024-10-22 14:34:24 -04:00
RolandF77	fc59f2cc0f	[PowerPC] special case small int constant for custom scalar_to_vector (#109850 ) Special case small int constant in the PPC custom lowering of scalar_to_vector.	2024-10-21 12:19:07 -04:00
Zaara Syeda	c5ca1b8626	[PPC] Add custom lowering for uaddo (#110137 ) Improve the codegen for uaddo node for i64 in 64-bit mode and i32 in 32-bit mode by custom lowering.	2024-10-21 11:13:16 -04:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
Keith Packard	44b020a381	[PowerPC][ISelLowering] Support -mstack-protector-guard=tls (#110928 ) Add support for using a thread-local variable with a specified offset for holding the stack guard canary value. This supports both 32- and 64-bit PowerPC targets. This mirrors changes from #108942 but targeting PowerPC instead of RISCV. Because both of these PRs modify the same driver functions, this series is stacked on top of the RISC-V one. --------- Signed-off-by: Keith Packard <keithp@keithp.com>	2024-10-17 19:06:47 -07:00
Qiongsi Wu	f9d0789064	[PGO] Initialize GCOV Writeout and Reset Functions in the Runtime on AIX (#108570 ) This PR registers the writeout and reset functions for `gcov` for all modules in the PGO runtime, instead of registering them using global constructors in each module. The change is made for AIX only, but the same mechanism works on Linux on Power. When registering such functions using global constructors in each module without `-ffunction-sections`, the AIX linker cannot garbage collect unused undefined symbols, because such symbols are grouped in the same section as the `__sinit` symbol. Keeping such undefined symbols causes link errors (see test case https://github.com/llvm/llvm-project/pull/108570/files#diff-500a7e1ba871e1b6b61b523700d5e30987900002add306e1b5e4972cf6d5a4f1R1 for this scenario). This PR implements the initialization in the runtime, hence avoiding introducing `__sinit` into each module. The implementation adds a new global variable `__llvm_covinit_functions` to each module. This new global variable contains the function pointers to the `Writeout` and `Reset` functions. `__llvm_covinit_functions`'s section is the named section `__llvm_covinit`. The linker will aggregate all the `__llvm_covinit` sections from each module to form one single named section in the final binary. The pair of functions ``` const __llvm_gcov_init_func_struct __llvm_profile_begin_covinit(); const __llvm_gcov_init_func_struct __llvm_profile_end_covinit(); ``` are implemented to return the start and end address of this named section in the final binary, and they are used in function ``` __llvm_profile_gcov_initialize() ``` (which is a constructor function in the runtime) so the runtime knows the addresses of all the `Writeout` and `Reset` functions from all the modules. 
One noticeable implementation detail relevant to AIX is that to preserve the `__llvm_covinit` from the linker's garbage collection, a `.ref` pseudo instruction is inserted into them, referring to the section that contains the `__llvm_gcov_ctr` variables, which are used in the instrumented code. The `__llvm_gcov_ctr` variables did not belong to named sections before, but this PR added them to the `__llvm_gcov_ctr_section` named section, so we can add a `.ref` pseudo instruction that refers to them in the `__llvm_covinit` section.	2024-10-17 09:32:10 -04:00
Stefan Pintilie	dcc5ba4a4d	[PowerPC] Add missing patterns for lround when i32 is returned. (#111863 ) The patch adds support for lround when the output type of the rounding is i32. The support for a rounding result of type i64 existed before this patch.	2024-10-16 10:25:09 -04:00
Christudasan Devadasan	488d3924dd	[CodeGen][NewPM] Port EarlyIfConversion pass to NPM. (#108508 )	2024-10-16 13:22:57 +05:30
Akshat Oke	8b20f1b924	[MIR] Fix tests for flags in register info (#112179 ) [MIR] Serialize virtual register flags #110228 introduces register flags which appear empty in .mir dumps. Future tests should use `-simplify-mir`.	2024-10-14 18:28:54 +05:30
Oliver Stannard	1e49670b31	[DAGISel] Keep flags when converting FP load/store to integer (#111679 ) This DAG combine replaces a floating-point load/store pair which has no other uses with an integer one, but did not copy the memory operand flags to the new instructions, resulting in it dropping the volatile flag. This optimisation is still valid if one or both of the instructions is volatile, so we can copy over the whole MachineMemOperand to generate volatile integer loads and stores where needed.	2024-10-10 09:17:50 +01:00
Simon Pilgrim	55890968ac	[PowerPC] vec-min-max.ll - regenerate with common check prefixes to reduce duplication. NFC.	2024-10-08 17:36:35 +01:00
Ramkumar Ramachandra	45817aa726	LICM: hoist BO assoc for and, or, xor (#111146 ) Trivially lift the Opcode limitation on hoistBOAssociation to also hoist and, or, and xor. Alive2 proofs: https://alive2.llvm.org/ce/z/rVNP2X	2024-10-04 19:13:51 +01:00
Matt Arsenault	187dcd8e22	DAG: Preserve disjoint flag when emitting final instructions (#110795 )	2024-10-02 19:37:04 +04:00
Craig Topper	92a8b81bdf	[LegalizeVectorOps] Enable ExpandFABS/COPYSIGN to use integer ops for fixed vectors in some cases. (#109232 ) Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and ARM.	2024-09-30 11:44:49 -07:00
Timothy Pearson	90c1474863	[SDAG] Honor signed arguments in floating point libcalls (#109134 ) In ExpandFPLibCall, an assumption is made that all floating point libcalls that take integer arguments use unsigned integers. In the case of ldexp and frexp, this assumption is incorrect, leading to miscompilation and subsequent target-dependent incorrect operation. Indicate that ldexp and frexp utilize signed arguments in ExpandFPLibCall. Fixes #108904 Signed-off-by: Timothy Pearson <tpearson@solidsilicon.com>	2024-09-25 11:09:50 +04:00
futog	3e0a76b1fd	[Codegen][LegalizeIntegerTypes] Improve shift through stack (#96151 ) Minor improvement on cc39c3b17fb2598e20ca0854f9fe6d69169d85c7. Use an aligned stack slot to store the shifted value. Use the native register width as shifting unit, so the load of the shift result is aligned. If the shift amount is a multiple of the native register width, there is no need to do a follow-up shift after the load. I added new tests for these cases. Co-authored-by: Gergely Futo <gergely.futo@hightec-rt.com>	2024-09-23 11:45:43 +02:00
Akshat Oke	d2d78e584b	[NewPM][CodeGen] Port MachineLICM to NPM (#107376 )	2024-09-20 11:34:18 +05:30
Zaara Syeda	22067a8eb4	[PowerPC] Fix assert exposed by PR 95931 in LowerBITCAST (#108062 ) Hit Assertion failed: Num < NumOperands && "Invalid child # of SDNode!" Fix by checking opcode and value type before calling getOperand.	2024-09-10 14:14:01 -04:00
Qiu Chaofan	06c331163e	[PowerPC] Implement llvm.set.rounding intrinsic (#67302 )	2024-09-10 14:30:31 +08:00
Jeremy Morse	7a930ce327	[DWARF] Emit a minimal line-table for totally empty functions (#107267 ) In degenerate but legal inputs, we can have functions that have no source locations at all -- all the DebugLocs attached to instructions are empty. LLVM didn't produce any source location for the function; with this patch it will at least emit the function-scope source location. Demonstrated by empty-line-info.ll. The XCOFF test modified has similar symptoms -- with this patch, the size of the ".dwline" section grows a bit, thus shifting some of the file internal offsets, which I've updated.	2024-09-09 12:54:45 +01:00
anjenner	4af249fe6e	Add usub_cond and usub_sat operations to atomicrmw (#105568 ) These both perform conditional subtraction, returning the minuend and zero respectively, if the difference is negative.	2024-09-06 16:19:20 +01:00
Matt Arsenault	100d9b8994	Reapply "AtomicExpand: Allow incrementally legalizing atomicrmw" (#107307 ) This reverts commit 63da545ccdd41d9eb2392a8d0e848a65eb24f5fa. Use reverse iteration in the instruction loop to avoid sanitizer errors. This also has the side effect of avoiding the AArch64 codegen quality regressions. Closes #107309	2024-09-06 18:37:34 +04:00
Simon Pilgrim	6ec889e53f	[DAG] Add support for neg(abd(x,y)) patterns. Currently limited to cases which have legal/custom ABDS/ABDU handling - I'll extend this for all targets in future (similar to how we support neg(abs(x))) once I've addressed some outstanding regressions on aarch64/riscv. Helps avoid a lot of extra cmov instructions on x86 in particular, and allows us to more easily improve the codegen in future commits.	2024-09-06 13:16:09 +01:00
Matt Arsenault	fc3e6a8186	DAG: Handle lowering unordered compare with inf (#100378 ) Try to take advantage of the nan check behavior of fcmp. x86_64 looks better, x86_32 looks worse.	2024-09-05 19:54:32 +04:00
RolandF77	26ba186bd0	[PowerPC] Improve pwr7 codegen for v4i8 load (#104507 ) There are no partial vector loads on pwr7 so current v4i8 codegen is an int load then store to vector sized temp and re-load as vector. Try to use lfiwax to load 32 bits into an FP reg and take advantage of VSX FP and vector reg sharing to move the result to the right vector position.	2024-09-04 12:55:27 -04:00
Christudasan Devadasan	6c143a86cd	[CodeGen][NewPM] Port MachineCSE pass to new pass manager. (#106605 )	2024-09-04 18:54:07 +05:30
paperchalice	69657eb7f6	[llc] Provide `opt` like verifier options (#106665 ) - Support `verify-each` option. - Default behavior is verifying output only.	2024-09-04 17:37:34 +08:00
Michael Marjieh	00c198b2ca	[MachinePipeliner] Make Recurrence MII More Accurate (#105475 ) Current RecMII calculation is bigger than it needs to be. The calculation was refined in this patch.	2024-09-03 16:15:17 +09:00
Craig Topper	aa91d90cb0	[LegalizeVectorOps][PowerPC] Use xor to expand fneg. (#106595 ) This preserves the semantics of fneg and matches what we do in LegalizeDAG. I kept the legal FSUB check to force unrolling for some targets that don't have FSUB but have XOR. On AArch64, using xor broke some tests that expected to see a (v1f64 (fma (insertvector_elt (f64 (fneg (extractvectorelt X)))))) pattern.	2024-08-29 15:00:23 -07:00
Stephen Tozer	3d08ade7bd	[ExtendLifetimes] Implement llvm.fake.use to extend variable lifetimes (#86149 ) This patch is part of a set of patches that add an `-fextend-lifetimes` flag to clang, which extends the lifetimes of local variables and parameters for improved debuggability. In addition to that flag, the patch series adds a pragma to selectively disable `-fextend-lifetimes`, and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes` for this pointers only. All changes and tests in these patches were written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer) has handled review and merging. The extend lifetimes flag is intended to eventually be set on by `-Og`, as discussed in the RFC here: https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850 This patch implements a new intrinsic instruction in LLVM, `llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand and has no effect other than "using" its operand, to ensure that its operand remains live until after the fake use. This patch does not emit fake uses anywhere; the next patch in this sequence causes them to be emitted from the clang frontend, such that for each variable (or this) a fake.use operand is inserted at the end of that variable's scope, using that variable's value. This patch covers everything post-frontend, which is largely just the basic plumbing for a new intrinsic/instruction, along with a few steps to preserve the fake uses through optimizations (such as moving them ahead of a tail call or translating them through SROA). Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>	2024-08-29 17:53:32 +01:00
Matt Arsenault	7b7b0b95b2	DAG: Check if is_fpclass is custom, instead of isLegalOrCustom (#105577 ) For some reason, isOperationLegalOrCustom is not the same as isOperationLegal || isOperationCustom. Unfortunately, it checks if the type is legal which makes it useless for custom lowering on non-legal types (which is always ppcf128). Really the DAG builder shouldn't be going to expand this in the builder, it makes it difficult to work with. It's only here to work around the DAG requiring legal integer types the same size as the FP type after type legalization.	2024-08-29 14:05:43 +04:00
RolandF77	89bbcbe285	[PowerPC] fix legalization crash (#105563 ) If v2i64 scalar_to_vector is made custom, llc can crash in certain legalization cases where v2i64 vectors are injected, even if they weren't otherwise present. The code generated would be fine, but that operation is not handled in ReplaceNodeResults. Add handling.	2024-08-28 11:22:23 -04:00
Kai Luo	8e901c255d	[PowerPC] Retire PPCExpandISel pass (#84289 ) We can decide whether to expand isel or not in instruction selection pass and early-if-conversion pass. The transformation implemented in PPCExpandISel can be retired considering PPC backend doesn't generate `isel` instructions post-RA. Also if we are seeking performant branch-or-isel decision, we can turn to selectoptimize pass. --------- Co-authored-by: Kai Luo <lkail@cn.ibm.com>	2024-08-27 09:43:52 +08:00
Zaara Syeda	327edbe07a	[PowerPC] Fix mask for __st[d/w/h/b]cx builtins (#104453 ) These builtins are currently returning CR0 which will have the format [0, 0, flag_true_if_saved, XER]. We only want to return flag_true_if_saved. This patch adds a shift to remove the XER bit before returning.	2024-08-22 09:55:46 -04:00
Sergei Barannikov	c91cc459d3	[DataLayout] Refactor the rest of `parseSpecification` (#104545 ) The aim is to improve test coverage of data layout string parsing. Pull Request: https://github.com/llvm/llvm-project/pull/104545	2024-08-20 11:25:49 +03:00
Qiu Chaofan	b6d1df2afd	[PowerPC] Support -mno-red-zone option (#94581 )	2024-08-19 17:58:08 +08:00
Amy Kwan	cf721e29c6	[PowerPC] Do not merge TLS constants within PPCMergeStringPool.cpp (#94059 ) This patch prevents thread-local constants from being merged within PPCMergeStringPool.cpp. The PPCMergeStringPool pass primarily merges non-thread-local constants together, and thread-local constants should not be mixed together with other (non-thread-local) constants. In the event that thread-local and other non-thread-local constants are pooled together, the llvm.threadlocal.address intrinsic can fail as it expects its argument to be a thread-local global value, but the merged string structure created by the PPCMergeStringPool pass is not thread-local as a whole.	2024-08-16 15:06:50 -04:00
Amy Kwan	9325381998	[PowerPC][GlobalMerge] Enable GlobalMerge by default on AIX (#101226 ) This patch turns on the GlobalMerge pass by default on AIX and updates LIT tests accordingly.	2024-08-15 15:25:54 -04:00
Craig Topper	abc1acf8df	[TargetLowering][AMDGPU][ARM][RISCV][X86] Teach SimplifyDemandedBits to combine (srl (sra X, C1), ShAmt) -> sra(X, C1+ShAmt) (#101751 ) If the upper bits of the shr aren't demanded. This helps with cases where the outer srl was originally an sra and was converted to a srl by SimplifyDemandedBits before it had a chance to combine with the inner sra. This can occur when the inner sra was part of a sign_extend_inreg expansion. There are some regressions in ARM and Thumb2.	2024-08-14 08:44:57 -07:00
Amy Kwan	5e990b0b7f	[PowerPC][GlobalMerge] Reduce TOC usage by merging internal and private global data (#101224 ) This patch aims to reduce TOC usage by merging internal and private global data. Moreover, we also add the GlobalMerge pass within the PPCTargetMachine pipeline, which is disabled by default. This transformation can be enabled by -ppc-global-merge.	2024-08-14 10:14:33 -04:00
RolandF77	8b6e9de3dd	[PowerPC] improve P10 store forwarding on P7 scalar to vector (#102330 ) Try to make P7 code with scalar to vector operations that use store/re-load to run smoother on P10 by supplying enough store width to cover the load and allow hardware store forwarding.	2024-08-12 12:30:06 -04:00
Peter Rong	74e4694b8c	[LTO] enable `ObjCARCContractPass` only on optimized build (#101114 ) #92331 tried to enable `ObjCARCContractPass` by default, but it caused a regression on O0 builds and was reverted. This patch tries to bring that back by: 1. reverting the [revert](`1579e9ca9c`). 2. running `createObjCARCContractPass` only on optimized builds. Tests are updated to reflect the changes. Specifically, all `O0` tests should not include `ObjCARCContractPass`. Signed-off-by: Peter Rong <PeterRong@meta.com>	2024-08-09 13:04:25 -07:00
Simon Pilgrim	13d04fa560	[DAG] Add legalization handling for ABDS/ABDU (#92576 ) (REAPPLIED) Always match ABD patterns pre-legalization, and use TargetLowering::expandABD to expand again during legalization. abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)) Alive2: https://alive2.llvm.org/ce/z/dVdMyv REAPPLIED: Fix regression issue with "abs(ext(x) - ext(y)) -> zext(abd(x, y))" fold failing after type legalization	2024-08-08 11:39:05 +01:00
Tim Gymnich	408d82d352	[PowerPC] Respect endianness when bitcasting to fp128 (#95931 ) Fixes #92246 Match the behaviour of `bitcast v2i64 (BUILD_PAIR %lo %hi)` when encountering `bitcast fp128 (BUILD_PAIR %lo %hi)` by inserting a missing swap of the arguments based on endianness. Current behaviour: fp128: `bitcast fp128 (BUILD_PAIR %lo %hi)` => `BUILD_FP128 %lo %hi`; `BUILD_FP128 %lo %hi` => `MTVSRDD %hi %lo`. v2i64: `bitcast v2i64 (BUILD_PAIR %lo %hi)` => `BUILD_VECTOR %hi %lo`; `BUILD_VECTOR %hi %lo` => `MTVSRDD %lo %hi`.	2024-08-08 08:51:04 +08:00
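
The ABDS/ABDU legalization commit listed above (13d04fa560) gives the expansion abdu(lhs, rhs) -> sub(xor(sub(lhs, rhs), usub_overflow(lhs, rhs)), usub_overflow(lhs, rhs)). A minimal sketch of why that identity holds, modeled on 8-bit unsigned values in Python. The commit message does not say how the overflow bit is widened; treating it as an all-ones mask is an assumption here, and the helper names are hypothetical, not LLVM APIs.

```python
# Sketch only: models the abdu expansion on fixed-width unsigned integers.
# BITS, usub_overflow_mask and abdu_expanded are illustrative names, not LLVM code.
BITS = 8
MASK = (1 << BITS) - 1


def usub_overflow_mask(lhs, rhs):
    """Return all-ones when lhs - rhs wraps, else 0.

    Assumption: the overflow bit is sign-extended to the full width.
    """
    return MASK if lhs < rhs else 0


def abdu_expanded(lhs, rhs):
    """sub(xor(sub(lhs, rhs), ov), ov), truncated to BITS bits."""
    diff = (lhs - rhs) & MASK          # wrapping subtraction
    ov = usub_overflow_mask(lhs, rhs)  # 0 or all-ones
    # With ov == all-ones this computes ~diff + 1, i.e. the negated difference;
    # with ov == 0 it leaves diff unchanged.
    return ((diff ^ ov) - ov) & MASK


def check_all():
    """Exhaustively compare against the mathematical absolute difference."""
    return all(
        abdu_expanded(a, b) == abs(a - b)
        for a in range(1 << BITS)
        for b in range(1 << BITS)
    )
```

Calling `check_all()` compares the expansion with `abs(a - b)` for every pair of 8-bit inputs; the Alive2 link in the commit message remains the authoritative proof.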

3951 Commits