llvm-project

Author	SHA1	Message	Date
wanglei	a60a5421b6	Reland "[LoongArch] Support CTLZ with lsx/lasx" This patch simultaneously adds tests for `CTPOP`. This relands 07cec73dcd095035257eec1f213d273b10988130 with fixed tests.	2023-12-02 17:22:40 +08:00
wanglei	63e6bba0c3	Revert "[LoongArch] Support CTLZ with lsx/lasx" This reverts commit 07cec73dcd095035257eec1f213d273b10988130.	2023-12-02 17:17:48 +08:00
wanglei	07cec73dcd	[LoongArch] Support CTLZ with lsx/lasx This patch simultaneously adds tests for `CTPOP`.	2023-12-02 17:13:36 +08:00
wanglei	66a3e4fafb	[LoongArch] Override TargetLowering::isShuffleMaskLegal By default, `isShuffleMaskLegal` always returns true, which can result in the expansion of `BUILD_VECTOR` into a `VECTOR_SHUFFLE` node in certain situations. Subsequently, the `VECTOR_SHUFFLE` node is expanded again into a `BUILD_VECTOR`, leading to an infinite loop. To address this, we always return false, allowing the expansion of `BUILD_VECTOR` through the stack.	2023-12-02 14:25:17 +08:00
Arthur Eubanks	d8a04398f9	Reland [X86] With large code model, put functions into .ltext with large section flag (#73037 ) So that when mixing small and large text, large text stays out of the way of the rest of the binary. This is useful for mixing precompiled small code model object files and built-from-source large code model binaries so that the text sections don't get merged. The reland fixes an issue where a function in the large code model would reference small data without GOTOFF. This was incorrectly reverted in 76f78ecc789d58baa3a88b2fe2a57428f07e5362.	2023-12-01 14:23:44 -08:00
Craig Topper	7e7aaa53a1	[RISCV][GISel] Support G_ABS with Zbb. (#72939 ) We can use neg+max or negw+max.	2023-12-01 11:13:45 -08:00
David Green	aa7e873f2f	[AArch64] Regenerate fmin/fmax/memcpy legalization tests. NFC	2023-12-01 19:04:29 +00:00
Philip Reames	e817966718	[RISCV] Collapse fast unaligned access into a single feature [nfc-ish] (#73971 ) When we'd originally added unaligned-scalar-mem and unaligned-vector-mem, they were separated into two parts under the theory that some processor might implement one, but not the other. At the moment, we don't have evidence of such a processor. The C/C++ level interface, and the clang driver command lines have settled on a single unaligned flag which indicates both scalar and vector support unaligned. Given that, let's remove the test matrix complexity for a set of configurations which don't appear useful. Given these are internal feature names, I don't think we need to provide any forward compatibility. Anyone disagree? Note: The immediate trigger for this patch was finding another case where the unaligned-vector-mem wasn't being properly serialized to IR from clang which resulted in problems reproducing assembly from clang's -emit-llvm feature. Instead of fixing this, I decided getting rid of the complexity was the better approach.	2023-12-01 11:00:59 -08:00
Craig Topper	f866fde598	[RISCV][GISel] Lower G_FCONSTANT to constant pool load without F or D. (#73034 ) I used an IR test because it was easier than constructing different MIR test for each type of addressing.	2023-12-01 10:24:26 -08:00
Mircea Trofin	7832a8582a	[mlgo] Fix test post PR #73899 Opcode value change.	2023-12-01 09:05:22 -08:00
Dmitri Gribenko	76f78ecc78	Revert "Reland [X86] With large code model, put functions into .ltext with large section flag (#73037 )" This reverts commit 4bf8a688956a759b7b6b8d94f42d25c13c7af130. This commit seems to be breaking the semantics of the ObjectFile::isSectionText method, which breaks numba/llvmlite bindings.	2023-12-01 17:18:14 +01:00
Jon Roelofs	39d15a7d3b	[AArch64][SME] Remove implicit-def's on smstart (#69012 ) When we lower calls, the sequence of argument copy-to-reg nodes are glued to the smstart. In the InstrEmitter, these glued copies are turned into implicit defs, since the actual call instruction uses those physregs, resulting in the register allocator adding unnecessary copies of regs that are preserved anyway.	2023-12-01 07:34:22 -08:00
Matthew Devereau	e59a0cd7d8	[AArch64][SME2] Add SME2 builtins for zero { zt0 } (#72274 ) See https://github.com/ARM-software/acle/pull/217 Patch by: Kerry McLaughlin kerry.mclaughlin@arm.com	2023-12-01 14:30:39 +00:00
Matt Devereau	5fe7ae848c	[AArch64][SME2] Add ldr_zt, str_zt builtins and intrinsics (#72849 ) Adds the builtins: void svldr_zt(uint64_t zt, const void *rn) void svstr_zt(uint64_t zt, void *rn) And the intrinsics: call void @llvm.aarch64.sme.ldr.zt(i32, ptr) tail call void @llvm.aarch64.sme.str.zt(i32, ptr) Patch by: Kerry McLaughlin kerry.mclaughlin@arm.com	2023-12-01 09:34:38 +00:00
Ramkumar Ramachandra	4d1dc7770a	AMDGPU/load-global-i32: regenerate test using UTC (NFC) (#73962 ) Fix the RUN lines so that UTC runs cleanly, and regenerate the test load-global-i32.ll using utils/update_llc_test_checks.py.	2023-12-01 09:22:13 +00:00
Ramkumar Ramachandra	d48d1edcf3	PowerPC/aix-cc-abi: regenerate test using UTC (NFC) (#73963 ) Split out the parts of aix-cc-abi.ll that need to be regenerated by utils/update_mir_test_checks.py into aix-cc-abi-mir.ll, and regenerate it using the script. Regenerate aix-cc-abi.ll using utils/update_llc_test_checks.py.	2023-12-01 08:22:18 +00:00
Valery Pykhtin	604c29e934	[AMDGPU] NFC. Add test for debug info on CFG annotation instructions. (#73959 )	2023-12-01 09:10:29 +01:00
Piyou Chen	5ecb37b45a	[RISCV] Make InitUndef handle undef operand (#65755 ) Address https://github.com/llvm/llvm-project/issues/65704. If the operand is marked as undef, the InitUndef misses this case. This patch makes InitUndef pass handle the undef operand case.	2023-12-01 01:04:42 -06:00
Uday Bondhugula	173fcf7da5	[NVPTX] Lower 16xi8 and 8xi8 stores efficiently (#73646 ) Lower 16xi8 vector stores in NVPTX ISel efficiently using st.v4.b32 instead of multiple st.v4.u8 along the lines of vector loads and 8xf16. Similarly, 8xi8 using st.v2.u32.	2023-12-01 11:00:01 +05:30
leecheechen	dbbc7c31c8	[LoongArch] Add some binary IR instructions testcases for LASX (#74031 ) The IR instructions include: - Binary Operations: add fadd sub fsub mul fmul udiv sdiv fdiv - Bitwise Binary Operations: shl lshr ashr	2023-12-01 13:14:11 +08:00
Craig Topper	755c28a940	[GISel][Mips] Infer alignment when creating memory operand for G_VASTART. (#74004 )	2023-11-30 19:55:23 -08:00
Piyou Chen	d0a39e617b	[RISCV] default enable splitting regalloc between RVV and other (#72950 ) This patch makes riscv-split-regalloc true by default. It will not affect the codegen result if there is no vector register allocation. If there is vector register allocation, it may affect the non-RVV register LiveInterval's segment/weight, which makes the allocation happen in a different order.	2023-11-30 21:12:46 -06:00
wanglei	ca66df3b02	[LoongArch] Add more and/or/xor patterns for vector types	2023-12-01 10:28:41 +08:00
wanglei	add224c0a0	[LoongArch] Custom lowering `ISD::BUILD_VECTOR`	2023-12-01 09:13:39 +08:00
wanglei	f2cbd1fdf7	[LoongArch] Add codegen support for insertelement	2023-12-01 09:13:39 +08:00
Paul Kirth	cfe1ece833	[clang][llvm][fatlto] Avoid cloning modules in FatLTO (#72180 ) https://github.com/llvm/llvm-project/issues/70703 pointed out that cloning LLVM modules could lead to miscompiles when using FatLTO. This is due to an existing issue when cloning modules with labels (see #55991 and #47769). Since this can lead to miscompilation, we can avoid cloning the LLVM modules, which was desirable anyway. This patch modifies the EmbedBitcodePass to no longer clone the module or run an input pipeline over it. Further, it makes FatLTO always perform UnifiedLTO, so we can still defer the Thin/Full LTO decision to link-time. Lastly, it removes dead/obsolete code related to now defunct options that do not work with the EmbedBitcodePass implementation any longer.	2023-11-30 17:09:34 -08:00
Arthur Eubanks	4bf8a68895	Reland [X86] With large code model, put functions into .ltext with large section flag (#73037 ) So that when mixing small and large text, large text stays out of the way of the rest of the binary. This is useful for mixing precompiled small code model object files and built-from-source large code model binaries so that the text sections don't get merged. The reland fixes an issue where a function in the large code model would reference small data without GOTOFF.	2023-11-30 15:17:17 -08:00
Jeffrey Byrnes	1b02f594b3	[AMDGPU] Rework dot4 signedness checks (#68757 ) Using the known/unknown value of the sign bit, reason about the signedness version of the dot4 instruction.	2023-11-30 13:38:05 -08:00
Eduard Zingerman	2484469803	Revert "[BPF] Attribute preserve_static_offset for structs" This reverts commit cb13e9286b6d4e384b5d4203e853d44e2eff0f0f. Buildbot reports MSAN failures in tests added in this commit: https://lab.llvm.org/buildbot/#/builders/5/builds/38806 Failing tests: LLVM :: CodeGen/BPF/preserve-static-offset/load-arr-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-ptr-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-struct-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/load-union-pai.ll LLVM :: CodeGen/BPF/preserve-static-offset/store-pai.ll	2023-11-30 22:29:45 +02:00
Natalie Chouinard	f8a21dff70	[SPIR-V] Mark currently failing tests as XFAIL (#73858 ) These tests are currently failing and their fix is being tracked in Issue #60133. Marking them as XFAIL for now will get the test suite to a passing state so we can work on adding a GitHub action to automatically run these tests on a PR bot to help keep the tree green. Also removed the no-longer supported -opaque-pointers=0 flag from the couple tests where it was remaining.	2023-11-30 15:17:32 -05:00
Douglas Yung	c12de14876	Revert "[X86] Canonicalize fp zero vectors from bitcasted integer zero vectors" This reverts commit 169db80e41936811c6744f2c513a1ed00d97f10e. This change is causing many test failures on Windows bots: - https://lab.llvm.org/buildbot/#/builders/235/builds/3616 - https://lab.llvm.org/buildbot/#/builders/233/builds/4883 - https://lab.llvm.org/buildbot/#/builders/216/builds/31174	2023-11-30 11:59:50 -08:00
Simon Pilgrim	169db80e41	[X86] Canonicalize fp zero vectors from bitcasted integer zero vectors Generic code is supposed to handle this but can be blocked by hasOneUse checks. Noticed while investigating #26392	2023-11-30 18:33:52 +00:00
Simon Pilgrim	539e60c34a	[X86] X86FixupVectorConstantsPass - consistently use non-DQI 128/256-bit subvector broadcasts Without the predicate there's no benefit to using the DQI variants instead of the default AVX512F instructions	2023-11-30 18:33:52 +00:00
Eduard Zingerman	cb13e9286b	[BPF] Attribute preserve_static_offset for structs This commit adds a new BPF specific structure attribute `__attribute__((preserve_static_offset))` and a pass to deal with it. This attribute may be attached to a struct or union declaration, where it notifies the compiler that this structure is a "context" structure. The following limitations apply to context structures: - runtime environment might patch access to the fields of this type by updating the field offset; BPF verifier limits access patterns allowed for certain data types. E.g. `struct __sk_buff` and `struct bpf_sock_ops`. For these types only `LD/ST <reg> <static-offset>` memory loads and stores are allowed. This is so because offsets of the fields of these structures do not match real offsets in the running kernel. During BPF program load/verification loads and stores to the fields of these types are rewritten so that offsets match real offsets. For this rewrite to happen static offsets have to be encoded in the instructions. See `kernel/bpf/verifier.c:convert_ctx_access` function in the Linux kernel source tree for details. - runtime environment might disallow access to the field of the type through modified pointers. During BPF program verification a tag `PTR_TO_CTX` is tracked for register values. If a register with such a tag is modified, BPF programs are not allowed to read or write memory using that register. See `kernel/bpf/verifier.c:check_mem_access` function in the Linux kernel source tree for details. Access to the structure fields is translated to IR as a sequence: - `(load (getelementptr %ptr %offset))` or - `(store (getelementptr %ptr %offset))` During instruction selection phase such sequences are translated as a single load instruction with embedded offset, e.g. `LDW %ptr, %offset`, which matches access pattern necessary for the restricted set of types described above (when `%offset` is static). 
Multiple optimizer passes might separate these instructions, this includes: - SimplifyCFGPass (sinking) - InstCombine (sinking) - GVN (hoisting) The `preserve_static_offset` attribute marks structures for which the following transformations happen: - at the early IR processing stage: - `(load (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.load`; - `(store (getelementptr ...))` replaced by call to intrinsic `llvm.bpf.getelementptr.and.store`; - at the late IR processing stage this modification is undone. Such handling prevents various optimizer passes from generating sequences of instructions that would be rejected by BPF verifier. The __attribute__((preserve_static_offset)) has a priority over __attribute__((preserve_access_index)). When preserve_access_index attribute is present preserve access index transformations are not applied. This addresses the issue reported by the following thread: https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 Differential Revision: https://reviews.llvm.org/D133361	2023-11-30 19:45:03 +02:00
Momchil Velikov	cc944f502f	[AArch64] Stack probing for function prologues (#66524 ) This adds code to AArch64 function prologues to protect against stack clash attacks by probing (writing to) the stack at regular enough intervals to ensure that the guard page cannot be skipped over. The patch depends on and maintains the following invariants: Upon function entry the caller guarantees that it has probed the stack (e.g. performed a store) at some address [sp, #N], where `0 <= N <= 1024`. This invariant comes from a requirement for compatibility with GCC. Any address range in the allocated stack, no smaller than stack-probe-size bytes, contains at least one probe. At any time the stack pointer is above or in the guard page. Probes are performed in decreasing address order. The stack-probe-size is a function attribute that can be set by a platform to correspond to the guard page size. By default, the stack probe size is 4KiB, which is a safe default as this is the smallest possible page size for AArch64. Linux uses a 64KiB guard for AArch64, so this can be overridden by the stack-probe-size function attribute. For small frames without a frame pointer (<= 240 bytes), no probes are needed. For larger frame sizes, LLVM always stores x29 to the stack. This serves as an implicit stack probe. Thus, while allocating stack objects the compiler assumes that the stack has been probed at [sp]. There are multiple probing sequences that can be emitted, depending on the size of the stack allocation: A straight-line sequence of subtracts and stores, used when the allocation size is smaller than 5 guard pages. A loop allocating and probing one page size per iteration, plus at most a single probe to deal with the remainder, used when the allocation size is larger but still known at compile time. 
A loop which moves the SP down to the target value held in a register (or a loop, moving a scratch register to the target value held in SP), used when the allocation size is not known at compile-time, such as when allocating space for SVE values, or when over-aligning the stack. This is emitted in AArch64InstrInfo because it will also be used for dynamic allocas in a future patch. A single probe where the amount of stack adjustment is unknown, but is known to be less than or equal to a page size. --------- Co-authored-by: Oliver Stannard <oliver.stannard@linaro.org>	2023-11-30 17:41:51 +00:00
Michael Maitland	12326f5ff0	[RISCV][GISEL] lowerFormalArguments for variadic arguments (#73064 ) This is based off the varargs code in RISCVTargetLowering::LowerFormalArguments.	2023-11-30 12:15:35 -05:00
David Green	4d80122598	[AArch64] Teach areMemAccessesTriviallyDisjoint about scalable widths. (#73655 ) The base change here is to change getMemOperandWithOffsetWidth to return a TypeSize Width, which in turn allows areMemAccessesTriviallyDisjoint to reason about trivially disjoint widths.	2023-11-30 16:54:28 +00:00
Michael Maitland	a6f7278595	[RISCV][GISEL] legalize, regbankselect, and instruction-select G_PTRMASK (#73062 ) This is done in instruction-select instead of in legalization for the sake of alias analysis.	2023-11-30 11:54:01 -05:00
Michael Maitland	dbb9043dea	[RISCV][GISEL] legalize, regbankselect, and instruction-select for G_… (#73061 ) …[UN]MERGE_VALUES When MERGE or UNMERGE s64 on a subtarget that is non-64bit, it must have the D extension and use FPR in order to be legal. All other instances of MERGE and UNMERGE that can be made legal should be narrowed, widened, or replaced by the combiner.	2023-11-30 11:53:25 -05:00
Michael Maitland	6976dac09d	[RISCV][GISEL] regbankselect and instruction-select for G_IMPLICIT_DEF (#73060 ) This is similar to the selection of G_IMPLICIT_DEF in AArch64. Regbankselect may need to be improved in a future patch.	2023-11-30 11:38:02 -05:00
Philip Reames	ff5e536b5e	[RISCV] Add combines to form binop from tail insert idioms (#72675 ) This patch contains two related combines: 1) If we have a scalar vector insert into the result of a concat_vector, sink the insert into the operand of the concat. 2) If we have an insert of a scalar binop into a vector binop of the same opcode and the RHS of both are constant, perform the insert and then the binop. The common theme to both is pushing inserts closer to the sources of the computation graph. The goal is to enable forming vector bin ops from inserts of scalar binops at the end of another vector. For RISCV specifically, the concat_vector transform will push inserts to smaller vectors. This will have the effect of reducing lmul for the vslides, and usually doesn't require an additional vsetvli since the source vectors are already working in the narrower VL. I tried that one as a target independent combine first, and it doesn't appear profitable on all targets. This is only one approach to the problem. Another idea would be to aggressively form build_vectors and subvector inserts from the individual scalar inserts, and then have a transform which sunk a subvector_insert down through the concat. The advantage of the alternate approach is that we expose parallelism in the insert sequence, even if the source vector isn't a concat_vector. If reviewers are okay with it, I'd like to start with this approach, and then explore that direction in a follow up patch.	2023-11-30 07:32:42 -08:00
Piotr Sobczak	73d9f5fda6	[AMDGPU] Add test for GCNRegPressure tracker bug (#73786 ) Add a test to document an existing problem in GCNRegPressure tracker. The upward tracker does not count the registers used (16 of them) in movrel instruction (for example V_INDIRECT_REG_WRITE_MOVREL_B32_V16). The downward tracker counts the registers but reports a mismatch: %0:L0000000000000C00 isn't found in LIS reported set	2023-11-30 16:26:44 +01:00
Shengchen Kan	a4e1aa256b	[X86][tablgen] Auto-gen broadcast tables (#73654 ) 1. Add TB_BCAST_SH for FP16 2. Auto-gen 4 broadcast tables BroadcastTable[1-4] issue: https://github.com/llvm/llvm-project/issues/66360	2023-11-30 22:24:31 +08:00
David Spickett	99d485917a	[llvm][AArch64] Preserve regmask when expanding the BLR_BTI pseudo instruction (#73927 ) Fixes #73787 Not doing so led to us making use of a register after the call, which has been clobbered by the call. Added an MIR test that runs only the pseudo expansion pass.	2023-11-30 14:23:26 +00:00
leecheechen	29a0f3ec2b	[LoongArch] Add some binary IR instructions testcases for LSX (#73929 ) The IR instructions include: - Binary Operations: add fadd sub fsub mul fmul udiv sdiv fdiv - Bitwise Binary Operations: shl lshr ashr	2023-11-30 21:41:18 +08:00
Matt Arsenault	c44dca15a4	MachineVerifier: Reject extra non-register operands on instructions (#73758 ) We were allowing extra immediate arguments, and only bothering to check if registers were implicit or not. Also consolidate extra operand checks in verifier, to make this testable. We had 3 different places checking if you were trying to build an instruction with more operands than allowed by the definition. We had an assertion in addOperand, a direct check in the MIRParser to avoid the assertion, and the machine verifier checks. Remove the assert and parser check so the verifier can provide a consistent verification experience, which will also handle instructions modified in place.	2023-11-30 22:33:42 +09:00
David Green	269e3049ea	[AArch64] Remove invalid check lines from sme-aarch64-svcount.ll. NFC	2023-11-30 12:15:54 +00:00
Paul Walker	4db451a87d	[LLVM][SVE] Honour calling convention when using SVE for fixed length vectors. (#70847 ) NOTE: I'm not sure how many of the corner cases are part of the documented ABI but that shouldn't matter because my goal is for `-msve-vector-bits` to have no effect on the way arguments and returns are processed.	2023-11-30 12:09:58 +00:00
Simon Pilgrim	1d20b009a0	[X86] Enable v8f16/v16f16/v32f16 FCOPYSIGN custom lowering on SSE2/AVX/AVX512 targets	2023-11-30 11:48:33 +00:00
Simon Pilgrim	e653e0303d	[X86] Add fcopysign vector test coverage	2023-11-30 11:33:39 +00:00
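
Commit 7e7aaa53a1 in the list above notes that G_ABS can be lowered with Zbb as neg+max or negw+max. The arithmetic identity behind that can be sketched in C++ as follows. This is only an illustration of the identity, not the GlobalISel code from the patch; the function names `abs_neg_max` and `abs_negw_max` are hypothetical, and the signed conversions assume two's-complement wraparound (guaranteed from C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <algorithm>
#include <cstdint>

// 64-bit case (neg + max): abs(x) == max(x, -x).
// The negation is done in unsigned arithmetic so that INT64_MIN wraps
// to itself instead of triggering signed-overflow undefined behaviour.
int64_t abs_neg_max(int64_t x) {
  int64_t neg = static_cast<int64_t>(0 - static_cast<uint64_t>(x));
  return std::max(x, neg);
}

// 32-bit value on a 64-bit register (negw + max): negate in 32 bits,
// sign-extend the result to 64 bits, then take a 64-bit signed max
// against the sign-extended input.
int64_t abs_negw_max(int32_t x) {
  int64_t wide = x;  // sign-extended input
  int64_t neg = static_cast<int32_t>(0u - static_cast<uint32_t>(x));
  return std::max(wide, neg);
}
```

In both functions the most negative input maps to itself, which is the usual wrapping behaviour of an integer absolute value.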


52796 Commits