llvm-project

Author	SHA1	Message	Date
Matt Arsenault	80e2c26dfd	RegisterCoalescer: Fix name of pass I finally snapped and fixed this inconsistency.	2023-06-21 10:30:43 -04:00
Jay Foad	0b8a2eaf62	[AMDGPU] Add some positive tests for merging S_LOAD instructions	2023-06-21 13:56:03 +01:00
Pravin Jagtap	8e1e871e2f	[AMDGPU] Preserve dom-tree analysis in atomic optimizer. AMDGPUAtomicOptimizer updates the dominator tree whenever it modifies the control flow, so the analysis can be preserved, as is already done with the legacy PM. Reviewed By: arsenm, yassingh, #amdgpu Differential Revision: https://reviews.llvm.org/D153349	2023-06-21 08:02:43 -04:00
Kishan Parmar	c42f0a6e64	PowerPC/SPE: Add phony registers for high halves of SPE SuperRegs The intent of this patch is to make upper halves of SPE SuperRegs(s0,..,s31) as artificial regs, similar to how X86 has done it. And emit store /reload instructions for the required halves. PR : https://github.com/llvm/llvm-project/issues/57307 Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D152437	2023-06-21 10:24:40 +00:00
WANG Xuerui	00786d3a5f	[LoongArch] Support CodeModel::Large codegen This is intended to behave like GCC's `-mcmodel=extreme`. Technically the true GCC equivalent would be `-mcmodel=large` which is not yet implemented there, and we probably do not want to take the "Large" name until things settle in GCC side, but: * LLVM does not have a `CodeModel::Extreme`, and it seems too early to have such a variant added just for enabling LoongArch; and * `CodeModel::Small` is already being used for GCC `-mcmodel=normal` which is already a case of divergent naming. Regarding the codegen, loads/stores immediately after a PC-relative large address load (that ends with something like `add.d $addr, $addr, $tmp`) should get merged with the addition into corresponding `ldx/stx` ops, but is currently not done. This is because pseudo-instructions are expanded after instruction selection, and is best fixed with a separate change. Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D150522	2023-06-21 16:41:10 +08:00
WuXinlong	c9e08fa606	[RISCV] Add a pass to merge moving parameter registers instructions for Zcmp This patch adds a pass to generate `cm.mvsa01` & `cm.mva01s`. The pass lives in RISCVMoveOptimizer.cpp and combines two mv instructions into one `cm.mvsa01` or `cm.mva01s`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D150415	2023-06-21 15:41:51 +08:00
tianleli	1c27275813	[DAG] Unroll and expand illegal result of LDEXP and POWI instead of widen. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D153104	2023-06-21 14:27:39 +08:00
Fangrui Song	e0a6561ec9	[XRay] Make xray_fn_idx entries PC-relative As mentioned by commit c5d38924dc6688c15b3fa133abeb3626e8f0767c (Apr 2020), PC-relative entries avoid dynamic relocations and can therefore make the section read-only. This is similar to D78082 and D78590. We cannot commit to support compiler/runtime built at different versions, so just don't play with versions. For Mach-O support (incomplete yet), we use non-temporary `lxray_fn_idx[0-9]+` symbols. Label differences are represented as a pair of UNSIGNED and SUBTRACTOR relocations. The SUBTRACTOR external relocation requires r_extern==1 (needs to reference a symbol table entry) which can be satisfied by `lxray_fn_idx[0-9]+`. A `lxray_fn_idx[0-9]+` symbol also serves as the atom for this dead-strippable section (follow-up to commit b9a134aa629de23a1dcf4be32e946e4e308fc64d). Differential Revision: https://reviews.llvm.org/D152661	2023-06-20 22:40:56 -07:00
Krzysztof Parzyszek	dbc283bb9e	[Hexagon] Handle 64-bit operands when lowering ADDO/SUBO	2023-06-20 12:43:37 -07:00
eopXD	9ed668ad93	[RISCV] Model vxrm control for vsmul, vssra, vssrl, vnclip, and vnclipu Depends on D151397. This patch follows the patch-set of D151395. This patch seeks to update all the remaining fixed-point intrinsics to model vxrm control, adding rounding mode control for `vsmul`, `vssra`, `vssrl`, `vnclip`, and `vnclipu`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D152879	2023-06-20 11:09:24 -07:00
eopXD	5510f0b8f4	[3/3][RISCV][POC] Model vxrm in C intrinsics for RVV fixed-point instruction vaadd, vasub Depends on D151396. This is the 3rd patch of the patch-set. For the cover letter of the patch-set, please check out D151395. This commit consists of changes in both the clang front-end and the RISC-V back-end. In the front-end, this commit adds an additional operand to the C intrinsics of `vaadd`, `vaaddu`, `vasub`, and `vasubu`, that models the control of the rounding mode. In the back-end, using `vaadd` as an example, this commit replaces the existing `int.riscv.vaadd.` with `int.riscv.vaadd.rm.` that was introduced in the previous patch, with the extra operand that models the control of the rounding mode (`vxrm`) for RVV fixed-point intrinsics. Note: The first 3 commits of the patch-set show the intent to model the rounding mode for fixed-point intrinsics by applying the change to `vaadd`, `vaaddu`, `vasub`, and `vasubu`. The following patch will apply the change to the rest of the fixed-point instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151397	2023-06-20 11:08:09 -07:00
eopXD	7c8365121a	[2/3][RISCV][POC] Model vxrm in LLVM intrinsics and machine instructions for RVV fixed-point instructions Depends on D151395. This is the 2nd patch of the patch-set. For the cover letter of the patch-set, please check out D151395. This patch originates from D121376. This commit models vxrm by adding an immediate operand to the intrinsics and machine instructions of the RVV fixed-point instructions `vaadd`, `vaaddu`, `vasub`, and `vasubu`. This commit only covers the intrinsics of these four instructions; the following patches of the patch-set will do the same for the other RVV fixed-point instructions. The current naive approach is to have a write to vxrm inserted before every fixed-point instruction. This is done by the newly added pass `RISCVInsertReadWriteCSR`. The pass is named in a more general way because we will also model the rounding mode for the RVV floating-point instructions. The approach will be improved in the future by implementing partial redundancy elimination. The original LLVM intrinsics and machine instructions, which do not model the rounding mode, are not removed in this patch. That is, taking `vaadd` as an example, `int.riscv.vaadd.` co-exists with `int.riscv.vaadd.rm.` after this patch. The next patch will add C intrinsics of vaadd with an additional operand that models the control of the rounding mode; in that patch, `int.riscv.vaadd.rm.` will replace `int.riscv.vaadd.`. Authored-by: ShihPo Hung <shihpo.hung@sifive.com> Co-Authored-by: eop Chen <eop.chen@sifive.com> Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D151396	2023-06-20 11:07:01 -07:00
Matt Arsenault	e777da468c	AMDGPU: Delete old AMDGPUPropagateAttributes pass The optimizing, non-broken features have all been moved to AMDGPUAttributor. The only remaining piece of functionality was the broken propagation of the wavesize features. This was fundamentally broken and a hack for device library linking. It doesn't matter when the device libraries are correctly linked and internalized. In case of linked-as-normal-bitcode (as comgr still does), we're reliant on the global subtarget anyway. If we can get away without forcing target-cpu, we should just as well be able to get away without propagating target-features.	2023-06-20 13:05:45 -04:00
Amy Kwan	f5ae075048	[AIX][TLS] Generate 32-bit local-exec access code sequence This patch adds support for the TLS local-exec access model on AIX to allow for the ability to generate the 32-bit (specifically, non-optimized) code sequence. This work is a follow up of D149722. The code sequence generated for this access model is as follows: ``` .tc var[TC],var[TL]@le. // variable offset, with the le relocation specifier bla .__get_tpointer() // get the thread pointer, modifies r3 lwz reg1, var[TC](2) // load the variable offset add reg2, r3, reg1 // add the variable offset to the retrieved thread pointer ``` Differential Revision: https://reviews.llvm.org/D152669	2023-06-20 11:57:38 -05:00
Craig Topper	8680c28add	[RISCV] Remove mask from vrgatherei16 in lowerVECTOR_INTERLEAVE. Unless I'm missing something we need to update the whole vector not just where OddMask is true. Reviewed By: luke Differential Revision: https://reviews.llvm.org/D153087	2023-06-20 09:36:38 -07:00
Matt Arsenault	7dcb9c0f09	InlineSpiller: Consider copy bundles when looking for snippet copies This was looking for full copies produced by SplitKit, but SplitKit introduces copy bundles if not all lanes are live. The scan for uses needs to look at bundles, not individual instructions. This is a prerequisite to avoiding some redundant spills due to subregisters which will help avoid an allocation failure in a future patch.	2023-06-20 12:26:27 -04:00
Simon Pilgrim	ff23856c1c	[DAG] Fold (abds x, y) -> (abdu x, y) iff both args are known positive This is a generic DAG combine version of D151055 which recognizes when a signed ABDS can be safely replaced with an unsigned ABDU instruction if it is legal. Alive2: https://alive2.llvm.org/ce/z/pb5BjG Differential Revision: https://reviews.llvm.org/D153328	2023-06-20 15:31:22 +01:00
Jingu Kang	cce08185b4	[AArch64] Try to fold uaddlv and uaddlp Add tablegen pattern for uaddlv(uaddlp(x)) ==> uaddlv(x). Differential Revision: https://reviews.llvm.org/D153323	2023-06-20 15:14:27 +01:00
Weining Lu	3dd319ecf3	[LoongArch] Optimize conditional selection of integer This patch optimizes code generation by leveraging the zeroing behavior of the `maskeqz`/`masknez` instructions. ``` int sel(int a, int b) { return (a < b) ? a : 0; } ``` ``` slt $a1,$a0,$a1 masknez $a2,$r0,$a1 maskeqz $a0,$a0,$a1 or $a0,$a0,$a2 ``` => ``` slt $a1,$a0,$a1 maskeqz $a0,$a0,$a1 ``` Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D153193	2023-06-20 21:54:40 +08:00
Pravin Jagtap	699addeff0	[AMDGPU] Use verify<domtree> instead of intra-pass asserts. Verifying dominator tree is expensive using intra-pass asserts. Asserts added during D147408 are increasing the build time of libc significantly. This change does the verification after the atomic optimizer pass and should fix the regression reported in D153232. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D153261	2023-06-20 09:52:58 -04:00
Weining Lu	2efdacf74c	[LoongArch] Add missing chains and remove unnecessary `SDNPSideEffect` property for some intrinsic nodes	2023-06-20 21:16:26 +08:00
David Green	1d27ad2077	[AArch64] Add tablegen patterns for fp16 fcvtn2. Similar to the existing f32 pattern, this adds a tablegen pattern for the fp16 fcvtn2.	2023-06-20 14:10:25 +01:00
Ivan Kosarev	2d3e6c4402	[AMDGPU] Drop GFX11 runs for dagcombine-fma-fmad.ll and fma.f16.ll. They cause failures on the llvm-clang-x86_64-expensive-checks-debian buildbot. This partially reverts D153269 [AMDGPU][GFX11] Add test coverage for FMA instructions.	2023-06-20 11:32:44 +01:00
Francesco Petrogalli	c7430ff9bf	[CodeGen][test] Add missing `REQUIRES`. Differential Revision: https://reviews.llvm.org/D153325	2023-06-20 12:00:07 +02:00
Ivan Kosarev	dec42ffa28	[AMDGPU][GFX11] Add test coverage for FMA instructions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153269	2023-06-20 10:50:03 +01:00
Francesco Petrogalli	37db9cae2b	[llc][MISched] Add `-misched-detail-resource-booking` to llc. The option `-misched-detail-resource-booking` prints the following information every time the method `SchedBoundary::getNextResourceCycle` is invoked: 1. counters of the resources that have already been booked; 2. the values returned by `getNextResourceCycle`, which is the next available cycle in which a resource can be booked. The method is useful to debug low-level checks inside the machine scheduler that make decisions based on the values returned by `getNextResourceCycle`. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D153116	2023-06-20 11:46:27 +02:00
Francesco Petrogalli	25f8b1a0a8	Revert "[llc][MISched] Add `-misched-detail-resource-booking` to llc." Reverting because of https://lab.llvm.org/buildbot#builders/75/builds/32485: llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:2374:7: error: use of undeclared identifier 'MischedDetailResourceBooking' if (MischedDetailResourceBooking) This reverts commit fc06262c1c365777e71207b6a5de281cba927c96.	2023-06-20 11:28:45 +02:00
Francesco Petrogalli	fc06262c1c	[llc][MISched] Add `-misched-detail-resource-booking` to llc. The option `-misched-detail-resource-booking` prints the following information every time the method `SchedBoundary::getNextResourceCycle` is invoked: 1. counters of the resources that have already been booked; 2. the values returned by `getNextResourceCycle`, which is the next available cycle in which a resource can be booked. The method is useful to debug low-level checks inside the machine scheduler that make decisions based on the values returned by `getNextResourceCycle`. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D153116	2023-06-20 11:13:39 +02:00
Ben Shi	6d05f3f56e	[CSKY] Optimize multiplication with immediates Try to break a multiplication by a specific immediate into an addition or subtraction of left shifts. Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D153106	2023-06-20 16:03:31 +08:00
Ben Shi	56e33d9881	[CSKY][test][NFC] Add more tests of multiplication with immediates Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D153105	2023-06-20 16:03:15 +08:00
Bing1 Yu	516e32678d	[X86][AMX] set Stride to Tile's Col when doing combine amxcast and store into tilestore %tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 32, i8* %src_ptr, i64 64) %vec = call <256 x i8> @llvm.x86.cast.tile.to.vector.v256i8(x86_amx...%tile) store <256 x i8> %vec, <256 x i8>* %dst_ptr, align 256 => %tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 32, i8* %src_ptr, i64 64) %stride = sext i16 32 to i64 call void @llvm.x86.tilestored64.internal(i16 8, i16 32, i8* %dst_ptr, i64 32, x86_amx %tile) Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D153002	2023-06-20 11:55:25 +08:00
Fangrui Song	dafaa8463e	[XRay] Make llvm.xray.customevent parameter type match __xray_customevent The intrinsic has a smaller integer type than the parameter type of builtin-function/API. Fix this similar to commit 3fa3cb408d8d0f1365b322262e501b6945f7ead9.	2023-06-19 20:38:16 -07:00
Fangrui Song	3fa3cb408d	[XRay] Make llvm.xray.typedevent parameter type match __xray_typedevent The Clang built-in function is void __xray_typedevent(size_t, const void *, size_t), but the LLVM intrinsic has smaller integer types. Since we only allow 64-bit ELF/Mach-O targets, we can change llvm.xray.typedevent to i64/ptr/i64. This allows encoding more information and avoids i16 legalization for many non-X86 targets. fdrLoggingHandleTypedEvent only supports uint16_t event type.	2023-06-19 20:28:39 -07:00
Jay Foad	4b6d41cd1d	[AMDGPU] Do not release VGPRs if there may be pending scratch stores Differential Revision: https://reviews.llvm.org/D153295	2023-06-19 21:12:43 +01:00
Amy Kwan	d5659808b2	[AIX][TLS] Generate 64-bit local-exec access code sequence This patch adds support for the TLS local-exec access model on AIX to allow for the ability to generate the 64-bit (specifically, non-optimized) code sequence. For this patch in particular, the sequence that is generated involves a load of the variable offset, followed by an add of the loaded variable offset to r13 (which holds the thread pointer). This code sequence looks like the following: ``` ld reg1,var[TC](2) add reg2, reg1, r13 // r13 contains the thread pointer ``` The TOC (.tc pseudo-op) entries generated in the assembly files are also changed where we add the @le relocation for the variable offset. Differential Revision: https://reviews.llvm.org/D149722	2023-06-19 12:17:30 -05:00
Florian Hahn	dae5cd73cb	Recommit "[LSR] Consider post-inc form when creating extends/truncates." This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2. The recommitted version now checks the PostIncLoopSets for all fixups and returns nullptr if the result doesn't match for all fixups.	2023-06-19 17:57:06 +01:00
Jeffrey Byrnes	ac2d6df2d6	[AMDGPU] Add basic support for extended i8 perm matching Differential Revision: https://reviews.llvm.org/D142782 Change-Id: Ibb95224f7885839e8b77a705f487f10b47a258a6	2023-06-19 09:53:25 -07:00
Jay Foad	eb7491769a	[AMDGPU] Reimplement the GFX11 early release VGPRs optimization Implement this optimization in SIInsertWaitcnts, where we already have information about whether there might be outstanding VMEM store instructions. This has the following advantages: - Correctly handles atomics-with-return. - Correctly handles call instructions. - Should be faster because it does not require running a separate pass. Differential Revision: https://reviews.llvm.org/D153279	2023-06-19 17:12:54 +01:00
Matt Arsenault	7c8958118c	AMDGPU: Remove amdgpu-waves-per-eu support in old attribute pass AMDGPUAttributor now handles this attribute with value merging, so delete the old approach which could only apply this to functions which did not set it, or cloned the function.	2023-06-19 11:50:50 -04:00
Krzysztof Parzyszek	734881a6d5	[Hexagon] Fix range checks for immediate operands The output assembly (textual) contains the instruction r29 = add(r29,#4294967136) The value 4294967136 is -160 when interpreted as a signed 32-bit integer, so it fits in the range of the immediate operand without a constant extender. The range check in HexagonInstrInfo was putting the operand value into an int variable, reporting no need for an extender. This resulted in a packet with 4 instructions, including the "add". The corresponding check in HexagonMCInstrInfo was using an int64_t variable, causing the range check to fail, and an extender to be emitted when lowering to MCInst, resulting in a packet with too many instructions.	2023-06-19 08:22:41 -07:00
David Green	d0f56c3e5c	[AArch64] Add and expand the testing of fmin/fmax reduction. NFC For both CodeGen and CostModelling, this adds extra testing for the new llvm.vector.reduce.fmaximum and llvm.vector.reduce.fminimum intrinsics, as well as making sure there is test coverage for all the various cases.	2023-06-19 15:47:21 +01:00
David Green	16b46dde0b	[AArch64] More tablegen patterns for addp of two extracts Similar to D152245, this adds integer addp patterns, using the larger v4i32 addp from addp extractlow, extracthi.	2023-06-19 07:52:46 +01:00
David Green	68f34e4d39	[AArch64] Add tablegen patterns for faddp of two extracts This adds some simple tablegen patterns for converting `faddp v2f32 extractlow(Rn), v2f32 extracthigh(Rn)` to `faddp v4f32 Rn, v4f32 Rn` using the q variants of the instructions, avoiding the extra ext needed to extract the high lanes. Only the bottom lanes of the new faddp are used; the second Rn operand is used as a placeholder. It uses Rn to prevent any false dependencies, but could equally be undef. Differential Revision: https://reviews.llvm.org/D152245	2023-06-19 07:48:31 +01:00
Fangrui Song	b9a134aa62	[XRay] Mark Mach-O xray_instr_map and xray_fn_idx as S_ATTR_LIVE_SUPPORT Add the `S_ATTR_LIVE_SUPPORT` attribute to the sections so that `ld -dead_strip` will retain subsections that reference live functions, once we add linker private "l" symbols as atoms.	2023-06-18 19:30:16 -07:00
Jianjian GUAN	04ed822dcc	[RISCV] Match shl (ext v, splat 1) to vector widening add. Since we use match shl (v, splat 1) to vadd, we could also expand to widening add. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153112	2023-06-19 09:46:36 +08:00
Fangrui Song	49b61ead47	[XRay][test] Make tests less sensitive to .Ltmp/Ltmp label changes	2023-06-18 13:32:40 -07:00
Simon Pilgrim	fb60dda189	[GlobalIsel][X86] selectDivRem - fix typo in 64-bit AH handling code This function was lifted from fast-isel, and still referred to the Instruction::SRem/URem opcodes, instead of the G_SREM/G_UREM opcodes. But it turns out these aren't necessary at all as only the G_SREM/G_UREM codepaths will use the AH register for DivRemResultReg anyhow.	2023-06-18 17:37:17 +01:00
Simon Pilgrim	46479ea785	[GlobalIsel][X86] Regenerate srem/urem select test coverage	2023-06-18 17:06:32 +01:00
Yingwei Zheng	315e3001c0	[CodeGenPrepare][RISCV] Remove asserting VH references before erasing the dead GEP Fixes issue https://github.com/llvm/llvm-project/issues/63365 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153194	2023-06-18 23:40:47 +08:00
Simon Pilgrim	e1164c7a92	[X86] Regenerate tls.ll and reuse common linux check prefixes	2023-06-18 16:02:59 +01:00
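
The range-check mismatch described in commit 734881a6d5 above can be illustrated with a small sketch. It is not the Hexagon code: the helper names `as_int32` and `fits_signed`, and the 16-bit immediate width, are assumptions chosen only for illustration. It shows how the same operand value passes a signed range check when first narrowed to 32 bits, but fails it when held in a 64-bit variable.

```python
def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def fits_signed(value: int, bits: int) -> bool:
    """Return True if `value` is representable as a signed `bits`-bit integer."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


IMM_BITS = 16  # assumed immediate width, for illustration only

raw = 4294967136          # operand value from the commit message
narrow = as_int32(raw)    # models storing the value in a 32-bit int
wide = raw                # models storing the value in an int64_t

print(narrow)                         # signed 32-bit reading of the operand
print(fits_signed(narrow, IMM_BITS))  # check on the 32-bit value
print(fits_signed(wide, IMM_BITS))    # check on the 64-bit value
```

With these assumptions the two checks disagree, which matches the symptom in the commit message: one check reports that no constant extender is needed, while the other requests one.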


52796 Commits