llvm-project

Author	SHA1	Message	Date
Nadav Rotem	e2cc091a7d	Fix a missed opportunity to merge stores. This commit fixes a missed opportunity in merging consecutive stores. The code that searches for stores skipped the case of stores that directly connect to the root. The comment above the implementation lists this case but the code did not handle it. I found this pattern when looking into the shared_ptr destructor. GCC generates the right sequence. Here is a small repo: int foo(int* buff) { buff[0] = 0; int x = buff[1]; buff[1] = 0; return x; } Differential Revision: https://reviews.llvm.org/D116895	2022-01-10 13:49:02 -08:00
Qiu Chaofan	9b5e2b5261	[PowerPC] Implement basic macro fusion in Power10 Including basic fusion types around arithmetic and logical instructions. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D111693	2021-11-08 17:23:56 +08:00
Kamau Bridgeman	8737c74fab	[PowerPC][MMA] Allow MMA builtin types in pre-P10 compilation units This patch allows the use of __vector_quad and __vector_pair, PPC MMA builtin types, on all PowerPC 64-bit compilation units. When these types are made available the builtins that use them automatically become available so semantic checking for mma and pair vector memop __builtins is also expanded to ensure these builtin function call are only allowed on Power10 and new architectures. All related test cases are updated to ensure test coverage. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D109599	2021-10-05 07:59:32 -05:00
Amy Kwan	5041a485b9	[PowerPC] Exploit Prefixed Load/Stores using the refactored Load/Store Implementation This patch exploits the prefixed load and store instructions utilizing the refactored load/store implementation introduced in D93370. Prefixed load and store instructions are emitted whenever we are loading or storing a value with an offset that fits into a 34-bit signed immediate. Patterns for the prefixed load and stores are added in this patch, as well as the implementation that detects when we are loading and storing a value with an offset that fits in 34-bits. Differential Revision: https://reviews.llvm.org/D96075	2021-09-14 08:39:49 -05:00
Qiu Chaofan	a08fc1361a	[PowerPC] Change VSRpRC allocation order On PowerPC, VSRpRC represents the pairs of even and odd VSX register, and VRRC corresponds to higher 32 VSX registers. In some cases, extra copies are produced when handling incoming VRRC arguments with VSRpRC. This patch changes allocation order of VSRpRC to eliminate this kind of copy. Stack frame sizes may increase if allocating non-volatile registers, and some other vector copies happen. They need fix in future changes. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D104855	2021-06-25 16:04:41 +08:00
Fangrui Song	88cadb894c	[PowerPC][test] Add explicit dso_local to definitions in ELF static relocation model tests TargetMachine::shouldAssumeDSOLocal currently implies dso_local for such definitions. Adding explicit dso_local makes these tests align with the clang -fpic behavior and allow the removal of the TargetMachine::shouldAssumeDSOLocal special case. Rewrite preemption.ll to dsolocal-static.ll and dsolocal-pic.ll, and add "PIC Level" metadata.	2020-12-30 10:32:34 -08:00
Baptiste Saleil	18db29ea6f	[PowerPC] Add peephole to remove redundant accumulator prime/unprime instructions In some situations, the compiler may insert an accumulator prime instruction and an accumulator unprime instruction with no use of that accumulator between the two. That's for example the case when we store an accumulator after assembling it or restoring it. This patch adds a peephole to remove these prime and unprime instructions. Differential Revision: https://reviews.llvm.org/D91386	2020-11-18 15:01:07 -06:00
Baptiste Saleil	0156914275	[PowerPC] Legalize v256i1 and v512i1 and implement load and store of these types This patch legalizes the v256i1 and v512i1 types that will be used for MMA. It implements loads and stores of these types. v256i1 is a pair of VSX registers, so for this type, we load/store the two underlying registers. v512i1 is used for MMA accumulators. So in addition to loading and storing the 4 associated VSX registers, we generate instructions to prime (copy the VSX registers to the accumulator) after loading and unprime (copy the accumulator back to the VSX registers) before storing. This patch also adds the UACC register class that is necessary to implement the loads and stores. This class represents accumulator in their unprimed form and allow the distinction between primed and unprimed accumulators to avoid invalid copies of the VSX registers associated with primed accumulators. Differential Revision: https://reviews.llvm.org/D84968	2020-09-28 14:39:37 -05:00

8 Commits