llvm-project

Author	SHA1	Message	Date
David Green	4ac2721e51	[AArch64] Add costs for ST3 and ST4 instructions, modelled as store(shuffle). (#87934 ) This tries to add some costs for the shuffle in a ST3/ST4 instruction, which are represented in LLVM IR as store(interleaving shuffle). In order to detect the store, it needs to add a CxtI context instruction to check the users of the shuffle. LD3 and LD4 are added, LD2 should be a zip1 shuffle, which will be added in another patch. It should help fix some of the regressions from #87510.	2024-04-09 16:36:08 +01:00
Chen Zheng	f33a6dcf95	[PPC][NFC] add an option for GatherAllAliasesMaxDepth (#87071 ) GatherAllAliases is time consuming. Add an debug option on PPC to control the complexity of the function. This is useful when debuging compile time related issues.	2024-04-02 08:40:28 +08:00
Amy Kwan	a3efc53f16	[AIX][TLS] Produce a faster local-exec access sequence for the "aix-small-tls" global variable attribute (#83053 ) Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows the backend to produce a faster access sequence for the local-exec TLS model, where loading from the TOC can be avoided, for local-exec TLS variables that are annotated with the "aix-small-tls" attribute. The expectation is for local-exec TLS variables to be set with this attribute through PGO. Furthermore, the optimized access sequence is only generated for local-exec TLS variables annotated with "aix-small-tls", only if they are less than ~32KB in size.	2024-03-28 09:18:45 -04:00
Il-Capitano	308ed0233a	[Intrinsics] Make `patchpoint.i64` generic on its return type (#85911 ) Currently patchpoints can only have two result types, `void` and `i64`. This limits the result to general purpose registers. This patch makes `patchpoint.i64` an overloadable intrinsic, allowing result values that can fit in a single register (e.g. integers, pointers, floats).	2024-03-26 19:08:52 +05:30
Sergei Barannikov	5e5b656102	[MC] Make `MCParsedAsmOperand::getReg()` return `MCRegister` (#86444 )	2024-03-25 05:13:48 +03:00
Chen Zheng	90454a6098	[PowerPC][AIX] support explicit sections for -ffunction-sections (#85351 ) Fix crashes in https://godbolt.org/z/6voEa1o6Y	2024-03-22 13:23:36 +08:00
Christudasan Devadasan	1f1f569b29	[PowerPC] Clang format (NFC).	2024-03-21 01:56:41 +05:30
Qiu Chaofan	d03263814a	[PowerPC] Fix operand regclass of XSTSTDCSP	2024-03-20 17:45:19 +08:00
Jeremy Morse	b9d83eff25	[NFC][RemoveDIs] Use iterators for insertion at various call-sites (#84736 ) These are the last remaining "trivial" changes to passes that use Instruction pointers for insertion. All of this should be NFC, it's just changing the spelling of how we identify a position. In one or two locations, I'm also switching uses of getNextNode etc to using std::next with iterators. This too should be NFC. --------- Merged by: Stephen Tozer <stephen.tozer@sony.com>	2024-03-19 16:36:29 +00:00
Jonas Paulsson	8b8e1adbde	[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879 ) - Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE nodes to regular LOAD/STORE nodes, make them legal and select those nodes properly instead. This avoids exposing them to the DAGCombiner. - AtomicExpand pass no longer casts float/double atomic load/stores to integer (FP128 is still casted).	2024-03-18 17:21:50 -04:00
Jonas Paulsson	09bc6abba6	[MachineFrameInfo] Refactoring around computeMaxcallFrameSize() (NFC) (#78001 ) - Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code. - Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().	2024-03-18 10:37:59 -04:00
Qiu Chaofan	e5b20c83e5	[PowerPC] Update chain uses when emitting lxsizx (#84892 )	2024-03-18 22:31:05 +08:00
Qiu Chaofan	65ae09eeb6	[PowerPC] Fix behavior of rldimi/rlwimi/rlwnm builtins (#85040 ) rldimi is 64-bit instruction, so the corresponding builtin should not be available in 32-bit mode. Rotate amount should be in range and cases when mask is zero needs special handling. This change also swaps the first and second operands of rldimi/rlwimi to match previous behavior. For masks not ending at bit 63-SH, rotation will be inserted before rldimi.	2024-03-18 14:17:16 +08:00
David Green	601e102bdb	[CodeGen] Use LocationSize for MMO getSize (#84751 ) This is part of #70452 that changes the type used for the external interface of MMO to LocationSize as opposed to uint64_t. This means the constructors take LocationSize, and convert ~UINT64_C(0) to LocationSize::beforeOrAfter(). The getSize methods return a LocationSize. This allows us to be more precise with unknown sizes, not accidentally treating them as unsigned values, and in the future should allow us to add proper scalable vector support but none of that is included in this patch. It should mostly be an NFC. Global ISel is still expected to use the underlying LLT as it needs, and are not expected to see unknown sizes for generic operations. Most of the changes are hopefully fairly mechanical, adding a lot of getValue() calls and protecting them with hasValue() where needed.	2024-03-17 18:15:56 +00:00
Sean Fertile	2d80505401	[AIX] Support per global code model. (#79202 ) Exploit the per global code model attribute on AIX. On AIX we need to update both the code sequence used to access the global (either 1 or 2 instructions for small and large code model respectively) and the storage mapping class that we emit the toc entry. --------- Co-authored-by: Amy Kwan <akwan0907@gmail.com>	2024-03-15 12:52:04 -04:00
Zaara Syeda	00ba2a6f1e	[AIX][TOC] Fix buildbot failures from commit b4ae8df (#85303 ) The following tests fail when built with Address and Undefined sanitizers: CodeGen/PowerPC/basic-toc-data-def.ll CodeGen/PowerPC/toc-data-large-array2.ll Subtarget may be null in emitGlobalVariable, for example in the testcase where we have no functions in the IR. The fix moves this function from PPCSubtarget to a static helper function. This only fails with sanitizers because the Subtarget is not used in the member function.	2024-03-14 16:26:58 -04:00
Zaara Syeda	cc761a7c35	[PowerPC][NFC] Rename ADDItocL to match the 64-bit naming convention (#85099 ) In preparation of adding a similar instruction for large code model on AIX for 32-bit, rename the exisitng ADDItocL 64-instruction to ADDItocL8 to match the naming convention of other instructions with 32-bit and 64-bit variants.	2024-03-13 11:57:07 -04:00
Zaara Syeda	37b5eb0a0a	[AIX][TOC] Add -mtocdata/-mno-tocdata options on AIX (#67999 ) This patch enables support that the XL compiler had for AIX under -qdatalocal/-qdataimported.	2024-03-13 10:26:31 -04:00
Arthur Eubanks	94c988bcfd	[NFC] Remove unused parameter from shouldAssumeDSOLocal()	2024-03-11 19:48:17 +00:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875 ) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on it's own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).	2024-03-06 17:40:13 +00:00
Jeremy Morse	f33f66be7d	[NFC][RemoveDIs] Always use iterators for inserting PHIs It's becoming potentially unsafe to insert a PHI instruction using a plain Instruction pointer. Switch all the remaining sites that create and insert PHIs to use iterators instead. For example, the code in ComplexDeinterleavingPass.cpp is definitely at-risk of mixing PHIs and debug-info.	2024-03-05 17:00:12 +00:00
Qiu Chaofan	906580bad3	[PowerPC] Add intrinsics for rldimi/rlwimi/rlwnm (#82968 ) These builtins are already there in Clang, however current codegen may produce suboptimal results due to their complex behavior. Implement them as intrinsics to ensure expected instructions are emitted.	2024-03-04 21:13:59 +08:00
Chen Zheng	8d1046ae49	[PowerPC] adjust cost for extract i64 from vector on P9 and above (#82963 ) https://godbolt.org/z/Ma347Tx1W	2024-03-04 09:37:11 +08:00
George Koehler	6b70c5d79f	[PowerPC] provide CFI for ELF32 to unwind cr2, cr3, cr4 (#83098 ) Delete the code that skips the CFI for the condition register on ELF32. The code checked !MustSaveCR, which happened only when Subtarget.is32BitELFABI(), where spillCalleeSavedRegisters is spilling cr in a different way. The spill was missing CFI. After deleting this code, a spill of cr2 to cr4 gets CFI in the same way as a spill of r14 to r31. Fixes #83094	2024-03-02 22:18:24 -05:00
Felix (Ting Wang)	5b05870953	[PowerPC] Support local-dynamic TLS relocation on AIX (#66316 ) Supports TLS local-dynamic on AIX, generates below sequence of code: ``` .tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier .tc mh[TC],mh[TC]@ml # Module handle for the caller lwz 3,mh[TC]$2$ $$ For 64-bit: ld 3,mh[TC]$2$ bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0 #r3 = &TLS for module lwz 4,foo[TC]$2$ $$ For 64-bit: ld 4,foo[TC]$2$ add 5,3,4 # Compute &foo .rename mh[TC], "\_$TLSML" # Symbol for the module handle must have the name "_$TLSML" ``` --------- Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan> Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>	2024-03-01 08:09:40 +08:00
Kai Luo	d1924f0474	[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction (#72845 ) When expand `select_cc` in finalize-isel, we should not generate `isel` for targets not feature it.	2024-03-01 08:03:06 +08:00
Rishabh Bali	fe42e72db2	[CodeGen] Port AtomicExpand to new Pass Manager (#71220 ) Port the `atomicexpand` pass to the new Pass Manager. Fixes #64559	2024-02-25 18:42:22 +05:30
Chen Zheng	80f3bb4cf2	[PowerPC] adjust cost for vector insert/extract with non const index (#79092 ) P9 has vxform `Vector Extract Element Instructions` like `vextuwrx` and P10 has vxform `Vector Insert Element instructions` like `vinsd`. Update the instruction cost reflecting these instructions. Fixes https://github.com/llvm/llvm-project/issues/50249	2024-02-19 09:57:49 +08:00
Qiu Chaofan	292d9e869f	[PowerPC] Mask constant operands in ValueBit tracking (#67653 ) In IR or C code, shift amount larger than value size is undefined behavior. But in practice, backend lowering for shift_parts produces add/sub of shift amounts, thus constant shift amounts might be negative or larger than value size, which depends on ISA definition. PowerPC ISA says, the lowest 7 bits (6 bits for 32-bit instruction) will be taken, and if the highest among them is 1, result will be zero, otherwise the low 6 bits (or 5 on 32-bit) are used as shift amount. This commit emulates the behavior and avoids array overflow in bit permutation's value bits calculator.	2024-02-06 18:37:31 +08:00
Kai Luo	a9670fb0de	[PowerPC] Fix assertion of InstDisp for local-exec TLS. NFC. Fixes https://github.com/llvm/llvm-project/issues/80557.	2024-02-05 13:54:36 +08:00
Mikael Holmen	42ec9934e1	[PowerPC] Fix gcc -Wparentheses warning [NFC] Without this gcc warns like ../lib/Target/PowerPC/PPCAsmPrinter.cpp:1650:33: warning: suggest parentheses around '&&' within '\|\|' [-Wparentheses] 1650 \| (InstDisp >= -32768) && \| ~~~~~~~~~~~~~~~~~~~~~^~ 1651 \| "Expecting the instruction displacement for local-exec TLS " \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1652 \| "variables to be between [-32768, 32768)!"); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	2024-02-02 10:50:47 +01:00
Philip Reames	3ff7caea33	[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339 )	2024-02-01 17:52:35 -08:00
Jie Fu	40f6b7d476	[PowerPC] Fix -Wunused-variable in PPCAsmPrinter.cpp and PPCISelDAGToDAG.cpp (NFC) llvm-project/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp:1648:15: error: unused variable 'InstDisp' [-Werror,-Wunused-variable] ptrdiff_t InstDisp = TLSVarAddress + Offset - Delta; ^ llvm-project/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7624:19: error: unused variable 'TPReg' [-Werror,-Wunused-variable] RegisterSDNode *TPReg = dyn_cast<RegisterSDNode>(TPRegNode.getNode()); ^ llvm-project/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7625:23: error: unused variable 'Subtarget' [-Werror,-Wunused-variable] const PPCSubtarget &Subtarget = ^	2024-02-01 22:50:14 +08:00
Amy Kwan	2a50921553	[AIX][TLS] Optimize the small local-exec access sequence for non-zero offsets (#71485 ) This patch utilizes the -maix-small-local-exec-tls option to produce a faster, non-TOC-based access sequence for the local-exec TLS model. Specifically, for when the offsets from the TLS variable are non-zero. In particular, this patch produces either a single: - addi/la with a displacement off of R13 plus a non-zero offset for when an address is calculated, or - load or store off of R13 plus a non-zero offset for when an address is calculated and used for further access where R13 is the thread pointer, respectively. In order to produce a single addi or load/store off of the thread pointer with a non-zero offset, this patch also adds the necessary support in the assembly printer when printing these instructions. Specifically: - The non-zero offset is added to the TLS variable address when the address of the TLS variable + it's offset is less than 32KB. - Otherwise, when the address of the TLS variable + its offset is greater than 32KB, the non-zero offset (and a multiple of 64KB) is subtracted from the TLS address. This handling in the assembly printer is necessary to ensure that the TLS address + the non-zero offset is between [-32768, 32768), so that the total displacement can fit within the addi/load/store instructions. This patch is meant to be a follow-up to 3f46e5453d9310b15d974e876f6132e3cf50c4b1 (where the optimization occurs for when the offset is zero).	2024-02-01 09:29:21 -05:00
Zaara Syeda	a03a6e9964	[AIX] [XCOFF] Add support for common and local common symbols in the TOC (#79530 ) This patch adds support for common and local symbols in the TOC for AIX. Note that we need to update isVirtualSection so as a common symbol in TOC will have the symbol type XTY_CM and will be initialized when placed in the TOC so sections with this type are no longer virtual. --------- Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>	2024-01-31 16:34:21 -05:00
Shimin Cui	1bab570e9b	Move the PowerPC/PPCMergeStringPool work to initializer (#77352 ) Currently, the `PPCMergeStringPool` merges the global variable after the `AsmPrinter` initializer adds the global variables to its symbol list. This is to move the merging work of `PPCMergeStringPool` to its initializer, just like what GlobalMerge does, to avoid adding merged global variables to the `AsmPrinter` symbol lis.	2024-01-31 10:27:07 -05:00
Oskar Wirga	ff4636a4ab	Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940 ) This is a fix for the regression seen in https://github.com/llvm/llvm-project/pull/79498 > Currently, the way that recomputeLiveIns works is that it will recompute the livein registers for that MachineBasicBlock but it matters what order you call recomputeLiveIn which can result in incorrect register allocations down the line. Now we do not recompute the entire CFG but we do ensure that the newly added MBB do reach convergence.	2024-01-30 19:33:04 -08:00
Kazu Hirata	ae46855f53	[Target] Use getConstantOperand (NFC)	2024-01-28 18:03:38 -08:00
Kazu Hirata	1f5934a901	[PowerPC] Directly call Instruction::getMetadata (NFC)	2024-01-28 18:03:36 -08:00
Nikita Popov	07a1925b8b	Revert "Refactor recomputeLiveIns to operate on whole CFG (#79498 )" This reverts commit 59bf60519fc30d9d36c86abd83093b068f6b1e4b. Introduces a major compile-time regression.	2024-01-26 22:33:17 +01:00
Oskar Wirga	59bf60519f	Refactor recomputeLiveIns to operate on whole CFG (#79498 ) Currently, the way that recomputeLiveIns works is that it will recompute the livein registers for that MachineBasicBlock but it matters what order you call recomputeLiveIn which can result in incorrect register allocations down the line. This PR fixes that by simply recomputing the liveins for the entire CFG until convergence is achieved. This makes it harder to introduce subtle bugs which alter liveness.	2024-01-26 11:25:36 -08:00
Amy Kwan	d5fe1bd081	[AIX][TLS] Disallow the use of -maix-small-local-exec-tls and -fno-data-sections (#79252 ) This patch disallows the use of the -maix-small-local-exec-tls and -fno-data-sections options within clang, and also disallows the use of the aix-small-local-exec-tls attribute with the -data-sections=false option in llc. This is because having data sections off when using the aix-small-local-exec-tls feature is not ideal for performance. As the small-local-exec-tls region is a limited resource, this space should not used for variables that may be replaced. Note, that on AIX, data sections is turned on by default, so this patch makes it so that a diagnostic is emitted when users explicitly turn off data sections while using the aix-small-local-exec-tls feature.	2024-01-26 12:39:25 -05:00
Nemanja Ivanovic	67c1c1dbb6	[PowerPC][X86] Make cpu id builtins target independent and lower for PPC (#68919 ) Make __builtin_cpu_{init\|supports\|is} target independent and provide an opt-in query for targets that want to support it. Each target is still responsible for their specific lowering/code-gen. Also provide code-gen for PowerPC. I originally proposed this in https://reviews.llvm.org/D152914 and this addresses the comments I received there. --------- Co-authored-by: Nemanja Ivanovic <nemanjaivanovic@nemanjas-air.kpn> Co-authored-by: Nemanja Ivanovic <nemanja@synopsys.com>	2024-01-26 11:24:50 -05:00
Shimin Cui	e278c67096	Add support to meger strings used by metadata (#77364 ) Currently if the merged string is used by metadata, its metadata uses are not replaced if the string is merged. This is to add code support for the metadata use replacement.	2024-01-26 09:22:37 -05:00
Shengchen Kan	550f0eb2ce	[NFC] Rename TargetInstrInfo::FoldImmediate to TargetInstrInfo::foldImmediate and simplify implementation for X86	2024-01-26 20:50:58 +08:00
Nico Weber	184ca39529	[llvm] Move CodeGenTypes library to its own directory (#79444 ) Finally addresses https://reviews.llvm.org/D148769#4311232 :) No behavior change.	2024-01-25 12:01:31 -05:00
RolandF77	4beea6b195	[PowerPC] lower partial vector store cost (#78358 ) There are matching store opcodes (stfd, stxsiwx) for the load opcodes that make 32-bit and 64-bit vector operations cheap with VSX, so stores should also be cheap.	2024-01-23 16:07:18 -05:00
Chen Zheng	1f61507401	[NFC][PowerPC] remove the redundant spill related flags setting	2024-01-18 23:07:42 -05:00
Nikita Popov	87bc91d425	[PowerPC] Fix shuffle combine with undef elements (#77787 ) This custom DAG combine works on a shuffle where one source vector is a zero splat, which means we can adjust the shuffle indices to refer to any element of the splat -- as long as we stay in the same vector. In the case where an undef (-1) index into the non-splat vector was used, we ended up adjusting the splat index to -1+NumElements, which points into the wrong vector. Fix this by using the first element from the splat if the other one is undef. There are four cases this theoretically affects, but in practice I only managed to demonstrate a miscompile with one of them. I think two of theses are effectively dead due to the operand canonicalization at the start of the transform. Fixes https://github.com/llvm/llvm-project/issues/77748.	2024-01-15 10:12:33 +01:00
Qiu Chaofan	85071a3c74	[PowerPC] Implement fence builtin (#76495 )	2024-01-15 11:19:16 +08:00

1 2 3 4 5 ...

7291 Commits