llvm-project

Author	SHA1	Message	Date
Craig Topper	ffa2810f7b	[RISCV] Optimize lowering of VECREDUCE_FMINIMUM/VECREDUCE_FMAXIMUM. (#85165 ) Use a normal min/max reduction that doesn't propagate nans and force the result to nan at the end if any elements were nan.	2024-03-14 12:51:29 -07:00
Thorsten Schütt	4f873730d6	precommit test	2024-03-14 19:18:40 +01:00
Florian Mayer	58a20a0b96	[MTE] fix bug that prevented stack coloring with MTE (#84422 )	2024-03-14 09:26:58 -07:00
Jonas Paulsson	6588ac3017	[MachineCombiner] Don't ignore PHI depths (#82025 ) The depths of the Root and the NewRoot are to be compared in MachineCombiner::improvesCriticalPathLen(), and while the call to BlockTrace.getInstrCycles(*Root) includes the Depth of a PHI, for some reason PHI nodes have been ignored in getOperandDef(). This patch removes the special handling of PHIs in getOperandDef() so that Root and NewRoot get a fair comparison. This does not affect loop headers as MachineTraceMetrics handles that case by ignoring incoming PHI edges.	2024-03-14 11:47:29 -04:00
Craig Topper	23323e2837	[TargetLowering][RISCV] Propagate fastmath flags for the vector operations emitted in expandVecReduce. (#85164 ) We used the fastmath flags for any scalar ops created, but not vector.	2024-03-14 08:39:32 -07:00
Michael Maitland	818e0272f5	[RISCV] Model integer min max instructions from Zbb execute in late-B ALU We don't model the early vs late ALU so we just need to remove usage of SiFivePipeA for these instructions.	2024-03-14 06:02:53 -07:00
Thorsten Schütt	5f774619ea	[GlobalIsel] Combine ADDO (#82927 ) Perform the requested arithmetic and produce a carry output in addition to the normal result. Clang has them as builtins (__builtin_add_overflow_p). The middle end has intrinsics for them (sadd_with_overflow). AArch64: ADDS Add and set flags On Neoverse V2, they run at half the throughput of basic arithmetic and have a limited set of pipelines.	2024-03-14 12:45:19 +01:00
Vyacheslav Levytskyy	afec257d36	[SPIRV] Add type inference of function parameters by call instances (#85077 ) This PR adds type inference of function parameters by call instances. Two use cases that demonstrate the problem are added.	2024-03-14 10:50:11 +01:00
paperchalice	edc2066465	[CodeGen][GC] Skip function without GC in `GCLoweringPass` (#84421 )	2024-03-14 13:07:41 +08:00
Carl Ritson	c29b265eb9	Reapply "[AMDGPU] Add pal metadata 3.0 support to callable pal funcs (#67104 )" This reverts commit 7d508eb5d38f4bbbab4230a666d9e742e271af61.	2024-03-14 10:56:43 +09:00
Kolya Panchenko	aa68e2814d	[RISCV] Support `llvm.masked.compressstore` intrinsic (#83457 ) The changeset enables lowering of `llvm.masked.compressstore(%data, %ptr, %mask)` for RVV for fixed vector types into: ``` %0 = vcompress %data, %mask, %vl %new_vl = vcpop %mask, %vl vse %0, %ptr, %1, %new_vl ``` Such lowering is only possible when `%data` fits into the available LMULs; otherwise `llvm.masked.compressstore` is scalarized by the `ScalarizeMaskedMemIntrin` pass. Even though section `15.8` of the RVV spec provides an alternative sequence for compressstore, `vcompress + vcpop` should be a proper canonical form to lower `llvm.masked.compressstore`. If a RISC-V target finds the sequence from `15.8` better, a peephole optimization can transform `vcompress + vcpop` into that sequence.	2024-03-13 15:18:51 -04:00
Usman Nadeem	0b46884036	Revert "Revert "[AArch64] Improve lowering of truncating uzp1"" (#85119 ) Reverts llvm/llvm-project#85115 The fix was already merged in `79cd2c0bb9`	2024-03-13 11:58:10 -07:00
Mehdi Amini	06e310fee1	Revert "[AArch64] Improve lowering of truncating uzp1" (#85115 ) Reverts llvm/llvm-project#82457 The bot is broken, likely because of mid-air collision.	2024-03-13 11:32:53 -07:00
Nadeem, Usman	79cd2c0bb9	[AArch64] Fix tests after PR82457 Change-Id: I44a7e4a10af750b3339d6564c6ce6c2e5c17778e	2024-03-13 09:55:22 -07:00
Usman Nadeem	57b991ab39	[AArch64] Improve lowering of truncating uzp1 (#82457 ) There were two existing patterns: `concat_vectors(trunc(x), trunc(y)) -> uzp1(x, y)` `concat_vectors(assertzext(trunc(x)), assertzext(trunc(y))) -> uzp1(x, y)` Move them into a class and add the following `assertsext` pattern to it: `concat_vectors(assertsext(trunc(x)), assertsext(trunc(y))) -> uzp1(x, y)` Add the following transform for v8i8 and v4i16 result types to help with pattern matching: `truncating uzp1(x, y) -> trunc(concat(x, y))` And a pattern to go with it: `trunc(concat_vectors(x, y)) -> uzp1 (x, y)` Add another isel pattern for v8i8 and v4i16 result vector types, similar to the existing concat pattern, but with a trunc node at the beginning: `trunc(concat_vectors(assertext_trunc(x), assertext_trunc(y))) -> xtn(uzp1(x, y))`	2024-03-13 09:05:55 -07:00
Zaara Syeda	cc761a7c35	[PowerPC][NFC] Rename ADDItocL to match the 64-bit naming convention (#85099 ) In preparation for adding a similar instruction for the large code model on 32-bit AIX, rename the existing 64-bit ADDItocL instruction to ADDItocL8 to match the naming convention of other instructions with 32-bit and 64-bit variants.	2024-03-13 11:57:07 -04:00
Zaara Syeda	37b5eb0a0a	[AIX][TOC] Add -mtocdata/-mno-tocdata options on AIX (#67999 ) This patch enables support that the XL compiler had for AIX under -qdatalocal/-qdataimported.	2024-03-13 10:26:31 -04:00
Harald van Dijk	ceb744eb2f	[AMDGPU] Fix canonicalization of truncated values. (#83054 ) We were relying on roundings to implicitly canonicalize, which is generally safe, except with roundings that may be optimized away. Fixes #82937.	2024-03-13 12:08:39 +00:00
Simon Pilgrim	a7af53e99b	[DAG] visitSUB - convert some folds to use SDPatternMatch General cleanup and allows us to handle several commutable matches with a single pattern	2024-03-13 12:00:24 +00:00
Nikita Popov	20b15e645c	[Tests] Drop inrange attribute from some tests (NFC) These don't actually test anything related to inrange, so drop the attribute.	2024-03-13 11:49:16 +01:00
Sander de Smalen	e42e97a4ad	[AArch64][SME] Don't mark 'smstart za' as using/defining VG. (#84775 ) VG is only used/defined when changing the streaming mode, using 'smstart sm' or plainly 'smstart' (same for smstop).	2024-03-13 08:21:33 +00:00
Vyacheslav Levytskyy	0a443f13b4	[SPIR-V] Add implementation of G_SPLAT_VECTOR opcode and fix invalid types processing (#84766 ) This PR: * adds support for the G_SPLAT_VECTOR generic opcode that may be legally generated instead of G_BUILD_VECTOR by previous passes of the translator (see https://github.com/llvm/llvm-project/pull/80378 for the source of breaking changes); * improves deduction of types for opaque pointers. This PR also fixes the following issues: * if a function has ptr argument(s), two functions that have different SPIR-V type definitions may get identical LLVM function types and break agreements of global register and duplicate checker; * checks for pointer types do not account for TypedPointerType. Update of tests: * A test case is added to cover the issue with function ptr parameters. * The first case, that is support for the G_SPLAT_VECTOR generic opcode, is covered by existing test cases. * Multiple additional checks by `spirv-val` are added to cover more possibilities of generation of invalid code.	2024-03-13 08:32:01 +01:00
Lu Weining	e4edbae0aa	Revert "[llvm][LoongArch] Improve loongarch_lasx_xvpermi_q instrinsic" (#84708 ) Reverts llvm/llvm-project#82984 See the discussion in https://github.com/llvm/llvm-project/pull/83540.	2024-03-13 11:51:47 +08:00
4ast	2aacb56e83	BPF address space insn (#84410 ) This commit aims to support BPF arena kernel side [feature](https://lore.kernel.org/bpf/20240209040608.98927-1-alexei.starovoitov@gmail.com/): - arena is a memory region accessible from both BPF program and userspace; - base pointers for this memory region differ between kernel and user spaces; - `dst_reg = addr_space_cast(src_reg, dst_addr_space, src_addr_space)` translates src_reg, a pointer in src_addr_space to dst_reg, equivalent pointer in dst_addr_space, {src,dst}_addr_space are immediate constants; - number 0 is assigned to kernel address space; - number 1 is assigned to user address space. On the LLVM side, the goal is to make load and store operations on arena pointers "transparent" for BPF programs: - assume that pointers with non-zero address space are pointers to arena memory; - assume that arena is identified by address space number; - assume that address space zero corresponds to kernel address space; - assume that every BPF-side load or store from arena is done via pointer in user address space, thus convert base pointers using `addr_space_cast(src_reg, 0, 1)`; Only load, store, cmpxchg and atomicrmw IR instructions are handled by this transformation. For example, the following C code: ```c #define __as __attribute__((address_space(1))) void copy(int __as *from, int __as *to) { *to = *from; } ``` Compiled to the following IR: ```llvm define void @copy(ptr addrspace(1) %from, ptr addrspace(1) %to) { entry: %0 = load i32, ptr addrspace(1) %from, align 4 store i32 %0, ptr addrspace(1) %to, align 4 ret void } ``` Is transformed to: ```llvm %to2 = addrspacecast ptr addrspace(1) %to to ptr ;; ! %from1 = addrspacecast ptr addrspace(1) %from to ptr ;; ! %0 = load i32, ptr %from1, align 4, !tbaa !3 store i32 %0, ptr %to2, align 4, !tbaa !3 ret void ``` And compiled as: ```asm r2 = addr_space_cast(r2, 0, 1) r1 = addr_space_cast(r1, 0, 1) r1 = *(u32 *)(r1 + 0) *(u32 *)(r2 + 0) = r1 exit ``` Co-authored-by: Eduard Zingerman <eddyz87@gmail.com>	2024-03-13 02:27:25 +02:00
Michael Maitland	2f400a2fd7	[GISEL] Add G_VSCALE instruction (#84542 )	2024-03-12 20:22:49 -04:00
S. Bharadwaj Yadavalli	54f631d116	[DirectX][NFC] Model precise overload type specification of DXIL Ops (#83917 ) Implement an abstraction to specify precise overload types supported by DXIL ops. These overload types are typically a subset of LLVM intrinsics. Implement the corresponding changes in DXILEmitter backend. Add tests to verify expected errors for unsupported overload types at code generation time. Add tests to check for correct overload error output.	2024-03-12 16:51:18 -04:00
Arthur Eubanks	6bbb73b4cb	[X86] Fix determining if globals with size <8 bits are large (#84975 ) Previously any global under 8 bits would accidentally be considered 0 sized, which is considered a large global.	2024-03-12 12:43:29 -07:00
Arthur Eubanks	45219702e7	[test][X86] Precommit test for large data threshold and i1 global	2024-03-12 19:08:40 +00:00
Simon Pilgrim	c1af6ab505	[X86] getFauxShuffleMask - recognise CONCAT(SUB0, SUB1) style patterns Handles the INSERT_SUBVECTOR(INSERT_SUBVECTOR(UNDEF,SUB0,0),SUB1,N) pattern Currently limited to v8i64/v8f64 cases as only AVX512 has decent cross lane 2-input shuffles, the plan is to relax this as I deal with some regressions	2024-03-12 17:40:19 +00:00
Jun Wang	c4e517f59c	[AMDGPU] Adding the amdgpu_num_work_groups function attribute (#79035 ) A new function attribute named amdgpu_num_work_groups is added. This attribute, which consists of three integers, allows programmers to let the compiler know the number of workgroups to be launched in each of the three dimensions and do optimizations based on that information. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-03-12 10:30:39 -07:00
Matt Arsenault	bd72ebd8d1	AMDGPU: Add some more mfma hazard recognizer tests (#84727 )	2024-03-12 22:05:47 +05:30
Jake Egan	fa1d13590c	[AIX][tests] Disable failing tests on AIX These new tests are failing on the AIX bot because the -I option isn't supported. Disable these tests for now until they can be fixed.	2024-03-12 12:11:18 -04:00
Nemanja Ivanovic	08dd645c15	[RISC-V] Bad immediate value for Zcmp instructions with E extension (#84925 ) When we are using the Zcmp extension together with the E extension in 32-bit mode and we need to spill both callee-saved registers as well as needing a couple of 32-bit stack slots, we emit a meaningless stack adjustment with cm.push/cm.popret. Furthermore this leads to the stack slot for the ra being clobbered so control returns to a random location. This is just a pre-commit test so that the PR for the fix shows the difference in code generation.	2024-03-12 16:26:49 +01:00
Bjorn Pettersson	4d0f79e346	Pre commit test cases SRL/SRA support in canCreateUndefOrPoison. NFC Add test cases to show that we can't push freeze through SRA/SRL with 'exact' flag when there are multiple uses.	2024-03-12 16:03:18 +01:00
Danial Klimkin	afd4758703	Revert "[NVPTX] Add support for atomic add for f16 type" (#84918 ) Reverts llvm/llvm-project#84295 due to breakages.	2024-03-12 15:01:18 +01:00
Adrian Kuegel	8e0f4b943f	[NVPTX] Add support for atomic add for f16 type (#84295 ) atom.add.noftz.f16 is supported since SM 7.0	2024-03-12 09:12:44 +01:00
Dhruv Chawla (work)	1d900e2984	[AArch64][GlobalISel] Avoid generating inserts for undefs when selecting G_BUILD_VECTOR (#84452 ) It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef values choose random registers for copying values from.	2024-03-12 11:57:07 +05:30
Phoebe Wang	e89b4bcf32	[X86] Remove SlowDivide tuning from GRTTuning (#84676 ) DIV32/64 throughput has improved in the Atom architecture since Goldmont, and Alder Lake-E shows similar numbers. So we shouldn't add such tunings to Gracemont and later products. Checked against Agner Fog's table and uops.info.	2024-03-12 13:41:49 +08:00
Craig Topper	884b051a42	Recommit "[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690 )" With a special case for when the Add constant is 0. Original message: We can support these by changing the sext promotion to -zext(-C) and replacing a sgt check with ugt. This reframes the logic in terms of how the unsigned range is affected. More comments in the patch. The new cases check isLegalAddImmediate to avoid some regressions in lit tests.	2024-03-11 12:39:38 -07:00
Michael Maitland	034cc2f5d0	[GISEL] Add G_INSERT_SUBVECTOR and G_EXTRACT_SUBVECTOR (#84538 ) G_INSERT and G_EXTRACT are not sufficient to use to represent both INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector. We would like to be able to INSERT/EXTRACT on vectors in cases that INSERT/EXTRACT on vector subregisters are not sufficient, so we add these opcodes. I tried to do a patch where we treated G_EXTRACT as both G_EXTRACT_SUBVECTOR and G_EXTRACT_SUBREG, but ran into an infinite loop at this [point](`8b5b294ec2/llvm/lib/Target/RISCV/RISCVISelLowering.cpp (L9932)`) in the SDAG equivalent code.	2024-03-11 13:47:30 -04:00
Simon Pilgrim	6cd68c2f87	[X86] Add base SSE2 coverage to SRL/SRA combines tests	2024-03-11 16:25:05 +00:00
Simon Pilgrim	7dc4d5f6a0	[X86] Add AVX512 (x86-64-v4) coverage to generic shift combines tests	2024-03-11 16:22:47 +00:00
Sivan Shani	5e688f0dbd	[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm Re-land 634b0243b8f7acc85af4f16b70e91d86ded4dc83. T1 allows an optional register list, which must be {d0-d15}. T2 defines a mandatory register list, which must be {d0-d31}. The requirements for T1/T2 are as follows: T1 requires v8-M.Main and secure state; T2 requires v8.1-M.Main and secure state. With 16 D regs, both T1 and T2 are valid. With 32 D regs, T1 is UNDEFINED and T2 is valid. With no D regs, both T1 and T2 are NOP.	2024-03-11 14:27:28 +00:00
Pierre van Houtryve	d4569d42b5	[AMDGPU] Let LowerModuleLDS run twice on the same module (#81729 ) If all variables in the module are absolute, this means we're running the pass again on an already lowered module, and that works. If none of them are absolute, lowering can proceed as usual. Only diagnose cases where we have a mix of absolute/non-absolute GVs, which means we added LDS GVs after lowering, which is broken. See #81491 Split from #75333	2024-03-11 09:20:01 +01:00
Craig Topper	561ddb1687	Revert "[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690 )" This reverts commit 0813b90ff5d195d8a40c280f6b745f1cc43e087a. Fixes miscompile reported in #84718.	2024-03-11 00:51:21 -07:00
AtariDreams	4e0e9b17c6	[SelectionDAG] Switch to LiveRegUnits (#84197 )	2024-03-11 12:47:39 +05:30
Carl Ritson	4a21e3afa2	[LiveIntervals] repairIntervalsInRange: recompute width changes (#78564 ) Extend repairIntervalsInRange to completely recompute the interval for a register if subregister defs exist without precise subrange matches (LaneMask exactly matching subregister). This occurs when register sequences are lowered to copies such that the size of the copies does not match any uses of the subregisters formed (i.e. during twoaddressinstruction). The subranges without this change are probably legal, but do not match those generated by live interval computation. This creates problems with other code that assumes subranges precisely cover all subregisters defined, e.g. shrinkToUses().	2024-03-11 15:24:17 +09:00
Kito Cheng	b7f97d3661	[RISCV] Place mergeable small read only data into srodata section (#82214 ) Small mergeable read-only data was previously placed in sdata, which meant it lost the mergeable property and, with it, some code size optimization opportunities at link time.	2024-03-11 13:57:06 +08:00
Carl Ritson	d9e6aa7048	[AMDGPU] Update LiveInterval def index for early-clobber (#79285 ) On converting an instruction to an early-clobber definition in convertToThreeAddress, we must also update live intervals for the register to start at the early-clobber index.	2024-03-11 14:54:11 +09:00
Craig Topper	d8d2dea7fc	[RISCV] Handle FP riscv_masked_strided_load with 0 stride. (#84576 ) Previously, we tried to create an integer extending load. We need to create a non-extending FP load instead. Fixes #84541.	2024-03-10 21:22:37 -07:00
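
The NaN-handling strategy described in the VECREDUCE_FMINIMUM/VECREDUCE_FMAXIMUM commit above (ffa2810f7b) can be illustrated with a small scalar model. This is only a sketch of the idea stated in the commit message, not the RISC-V lowering itself: the function names `fmin_ignoring_nan` and `reduce_fminimum` are hypothetical, the input is assumed to be a non-empty list of floats, and the signed-zero ordering that FMINIMUM also defines is deliberately not modelled.

```python
import math


def fmin_ignoring_nan(a, b):
    """Model of a min operation that does not propagate NaN (hypothetical helper).

    If exactly one operand is NaN, the other operand is returned.
    """
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def reduce_fminimum(values):
    """Sketch of the two-step approach: plain min reduction, then force NaN.

    Assumes `values` is a non-empty list of floats.
    """
    # Step 1: ordinary min reduction that ignores NaN elements.
    result = values[0]
    for v in values[1:]:
        result = fmin_ignoring_nan(result, v)

    # Step 2: if any element was NaN, force the final result to NaN.
    if any(math.isnan(v) for v in values):
        return math.nan
    return result
```

For inputs without NaN this behaves like a plain minimum. With at least one NaN element the result is NaN, which is the propagating behaviour the commit message describes.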

52796 Commits