llvm-project

Author	SHA1	Message	Date
Thorsten Schütt	deefe3fbc9	[GlobalIsel] Post-review combine ADDO (#85961) https://github.com/llvm/llvm-project/pull/82927	2024-03-21 03:56:40 +01:00
Freddy Ye	07a5e31cb3	Move pre-commit test for #85737 (#86062)	2024-03-21 10:55:26 +08:00
Freddy Ye	35a66f965c	Precommit test for #85737 (#86056) Copied from llvm/test/CodeGen/X86/domain-reassignment.mir	2024-03-21 10:19:28 +08:00
Paul Kirth	f6f474c4ef	[llvm][lld] Pre-commit tests for RISCV TLSDESC symbols Currently, we mistakenly mark the local labels used in RISC-V TLSDESC as TLS symbols, when they should not be. This patch adds tests with the current incorrect behavior, and subsequent patches will address the issue. Reviewers: MaskRay, topperc Reviewed By: MaskRay Pull Request: https://github.com/llvm/llvm-project/pull/85816	2024-03-20 13:39:39 -07:00
S. Bharadwaj Yadavalli	3f39571228	[DirectX][DXIL] Distinguish return type for overload type resolution. (#85646) The return type of a DXIL Op may differ from the valid overload type of its parameters, if any. Such DXIL Ops are correctly represented in DXIL.td. However, DXILEmitter assumes the return type to be the same as the parameter overload type, if one exists. This results in an incorrect overload index value in DXILOperation.inc for the DXIL Op and an incorrect DXIL operation function call in the DXILOpLowering pass. This change correctly distinguishes return types from parameter overload types in the DXILEmitter backend to handle such DXIL ops. Add specification for DXIL Op `isinf` and corresponding tests to verify the above change. Fixes issue #85125	2024-03-20 14:48:16 -04:00
Craig Topper	891172d9be	[RISCV] Use 'riscv-isa' module flag to set ELF flags and attributes. (#85155) Walk all the ISA strings and set the subtarget bits for any extension we find in any string. This allows LTO output to have ELF attributes from the union of all of the files used to compile it.	2024-03-20 11:35:19 -07:00
Vyacheslav Levytskyy	c2483ed52d	[SPIRV] Add __spirv_ builtins for existing instructions (#85654) This PR: * adds __spirv_ builtins for existing instructions; * fixes parsing of "syncscope" values in atomic instructions; * fixes a special case of binary header emission.	2024-03-20 19:28:29 +01:00
Vyacheslav Levytskyy	949d70d5e0	[SPIR-V] Fix incorrect bitwise instructions applied to the bool type (#85929) This PR ensures that LLVM IR bitwise instructions result in logical SPIR-V instructions when applied to i1 type.	2024-03-20 19:23:12 +01:00
Jonas Paulsson	9ebd329ad8	Revert "Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698)" This reverts commit 05bde30585710a51592eee0a6cf6df8184d09c92. Reverting due to verifier complaints with expensive checks on build-bot.	2024-03-20 11:48:30 -04:00
Craig Topper	576d81baa5	[RISCV] Use REG_SEQUENCE/EXTRACT_SUBREG to move between individual GPRs and GPRPair. (#85887) Previously we used memory like we do to move between GPRs and FPR64 with the D extension on RV32. We can instead use REG_SEQUENCE/EXTRACT_SUBREG to inform register allocation how to do the copy without memory.	2024-03-20 08:44:24 -07:00
Thomas Lively	767e0c8bce	[WebAssembly] Select BUILD_VECTOR with large unsigned lane values (#85880) Previously we expected lane constants to be in the range of signed values for each lane size, but the included test case produced large unsigned values that fall outside that range. Allow instruction selection to proceed in this case rather than failing. Fixes #63817.	2024-03-20 08:42:42 -07:00
Neumann Hon	5fb2797f23	[GOFF][z/OS] Change PrivateGlobalPrefix and PrivateLabelPrefix to be L# (#85730) The current values for PrivateGlobalPrefix and PrivateLabelPrefix (@@ and @ respectively) are, in hindsight, poor choices for multiple reasons: First, there exist externally visible routines from the language environment that begin with @@. These functions are certainly not local/private by any means and they should not share a prefix with private globals. Secondly, both private globals and private labels should be handled the same way by GOFF, so it doesn't make much sense for them to have separate prefixes. GOFF remains the only file format where these are different and there is no reason for that to be the case	2024-03-20 10:30:30 -04:00
Jonas Paulsson	05bde30585	Move assertion for AdjustsStack from PEI to MachineVerifier. (#85698) Have the verifier report a missing AdjustsStack flag rather than waiting until PEI asserts.	2024-03-20 10:29:12 -04:00
Benjamin Kramer	5f5a64134b	Revert "[DAGCombiner] Simplifying `{si|ui}tofp` when only signbit is needed" This reverts commit 353fbeb0a294d2c7cef6d88607fa0fd50ee81462. It crashes when it encounters an UINT_TO_FP. llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1618 in SDValue llvm::SelectionDAG::getConstant(const ConstantInt &, const SDLoc &, EVT, bool, bool): VT.isInteger() && "Cannot create FP integer constant!"	2024-03-20 15:08:37 +01:00
Pravin Jagtap	e52a687871	[AMDGPU][NFC] Test clean up (#85922) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-20 17:29:42 +05:30
Pravin Jagtap	070d1e8321	[AMDGPU] Add test for fpext & fptrunc with bf16. (#85909) Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-20 14:45:38 +05:30
YunQiang Su	d7e28cd82b	MIPS: Support -m(no-)unaligned-access for r6 (#85174) The MIPSr6 ISA requires normal load/store instructions to support unaligned memory access, but the hardware does not always do so. On some microarchitectures, or in some corner cases, it may need support from the OS. Don't confuse this with pre-R6's lwl/lwr family: MIPSr6 doesn't support them; instead, r6 requires the lw instruction to support unaligned memory access. So, if -mstrict-align is used for pre-R6, lwl/lwr won't be disabled. If -mstrict-align is used for r6 and the access is not well aligned, some lb/lh instructions will be used to replace lw. This is useful for OS kernels. To be backward-compatible with GCC, -m(no-)unaligned-access are also added as Neg-Alias of -m(no-)strict-align.	2024-03-20 14:18:24 +08:00
Peter Rong	4a026b5092	[AMDGCN] Use ZExt when handling indices in insertelement (#85718) When i1 true is used as an index, SExt extends it to i32 -1. This would cause BitVector to overflow. The language manual specifies that the index shall be treated as an unsigned number; this patch fixes that. (https://llvm.org/docs/LangRef.html#insertelement-instruction) This patch fixes #85717 --------- Signed-off-by: Peter Rong <PeterRong96@gmail.com>	2024-03-19 21:44:08 -07:00
Jiahan Xie	4bf06bebb9	[GISEL][RISCV] IRTranslator for scalable vector load (#80006) Add IRTranslator support for the scalable vector load instruction and include corresponding tests with an alignment argument, which can be smaller/equal/larger than the element size or smaller/equal/larger than the minimum total vector size.	2024-03-19 20:12:26 -04:00
Alex MacLean	888e284903	[NVPTX] Use PTX prmt for llvm.bswap (#85545)	2024-03-19 15:18:53 -07:00
Noah Goldstein	353fbeb0a2	[DAGCombiner] Simplifying `{si|ui}tofp` when only signbit is needed If we only need the signbit, `uitofp` simplifies to 0, and `sitofp` simplifies to `bitcast`. Closes #85138	2024-03-19 17:17:35 -05:00
Noah Goldstein	ebd1379663	[DAGCombiner] Add tests for simplifying `{si|ui}tofp`; NFC	2024-03-19 17:17:35 -05:00
quic-areg	31f4b329c8	[Hexagon] ELF attributes for Hexagon (#85359) Defines a subset of attributes and emits them to a section called .hexagon.attributes. The current attributes recorded are the attributes needed by llvm-objdump to automatically determine target features and eliminate the need to manually pass features.	2024-03-19 16:22:30 -05:00
Simon Pilgrim	2377b9773d	[DAG] SimplifyShift - shift i1/vXi1 X, Y --> X (any non-zero shift amount is undefined). Alive2: https://alive2.llvm.org/ce/z/SdESbg Fixes #85681	2024-03-19 20:18:37 +00:00
Changpeng Fang	ab76052fa9	AMDGPU: Treat SWMMAC the same as MFMA and other WMMA for sched_barrier (#85721)	2024-03-19 09:58:09 -07:00
Luke Lau	e59f120e3a	[RISCV] Add test for strided load combine regression. NFC This adds a reduced test case for the regression seen in x264 with #83035. If the intermediate concatenating shuffles are large enough, then the splitting combine will prevent the strided load combine, which is preferable.	2024-03-20 00:38:23 +08:00
Farzon Lotfi	081a66ffac	[DXIL] implement dot intrinsic lowering for integers (#85662) This implements part 1 of 2 for #83626 - `CGBuiltin.cpp` - modified to have separate cases for signed and unsigned integers. - `SemaChecking.cpp` - modified to prevent the generation of a double dot product intrinsic if the builtin were to be called directly. - `IntrinsicsDirectX.td` - creation of the signed and unsigned dot intrinsics needed for instruction expansion. - `DXILIntrinsicExpansion.cpp` - handle instruction expansion cases for integer dot product.	2024-03-19 12:03:43 -04:00
Simon Pilgrim	66125ad8e9	[X86] Add test coverage for vector avgceils/avgceilu/avgfloors/avgflooru test patterns SSE only has AVGCEILU vXi8/vXi16 support - but for other types we should be trying to use the fixed width expansion instead of extensions	2024-03-19 15:32:40 +00:00
ostannard	ef395a492a	[AArch64] Add soft-float ABI (#84146) This is a re-working of #74460, which adds a soft-float ABI for AArch64. That was reverted because it caused errors when building the Linux and Fuchsia kernels. The problem is that GCC's implementation of the ABI compatibility checks when using the hard-float ABI on a target without FP registers does its checks after optimisation. The previous version of this patch reported errors for all uses of floating-point types, which is stricter than what GCC does in practice. This changes two things compared to the first version: * Only check the types of function arguments and returns, not the types of other values. This is more relaxed than GCC, while still guaranteeing ABI compatibility. * Move the check from Sema to CodeGen, so that inline functions are only checked if they are actually used. There are some cases in the Linux kernel which depend on this behaviour of GCC.	2024-03-19 13:58:51 +00:00
Ulrich Weigand	335f365982	Reapply: [SystemZ] Fix overflow flag for i128 USUBO We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement USUBO/USUBO_CARRY for the i128 data type. However, these instructions use an inverted sense of the borrow indication flag (a value of 1 indicates no borrow, while a value of 0 indicates borrow). This does not match the semantics of the boolean "overflow" flag of the USUBO/USUBO_CARRY ISD nodes. Fix this by generating code to explicitly invert the flag. These inversions cancel out if the result of USUBO feeds into a USUBO_CARRY. To avoid unnecessary zero-extend operations, also improve the DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc))) sequences where appropriate. Fixes: https://github.com/llvm/llvm-project/issues/83268	2024-03-19 14:07:08 +01:00
Shourya Goel	92764c99e9	[DAG] Matched Fixedwidth Pattern for ISD::AVGCEILU (#85031) Fixes: #84753	2024-03-19 13:02:37 +00:00
Luke Lau	ef520ca6b1	Revert "[RISCV] Recursively split concat_vector into smaller LMULs (#83035)" This reverts commit c59129a7c79448837d665de8f2743ad4b14666f6. This causes regressions in some x264 workloads like pixel_var_8x8 due to it interfering with the strided load combine. Reverting so I can try to rework it as a lowering instead.	2024-03-19 20:59:03 +08:00
Simon Pilgrim	9f433bf8ca	[X86] Add PAVG(0,x) test coverage for PR #85581	2024-03-19 12:39:47 +00:00
Pravin Jagtap	08701e35ed	[AMDGPU][NFC] Test clean up. (#85775) Added common checks for the DPP and Iterative strategies for the uniform value case, since the optimization applied is the same. Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>	2024-03-19 18:00:34 +05:30
Pierre van Houtryve	953c13b5c9	[AMDGPU][PromoteAlloca] Whole-function alloca promotion to vector (#84735) Update PromoteAllocaToVector so it considers the whole function before promoting allocas. Allocas are scored & sorted so the highest value ones are seen first. The budget is now per function instead of per alloca. Passed internal performance testing.	2024-03-19 11:49:22 +01:00
Ulrich Weigand	d1c3795968	Revert "Fix overflow flag for i128 USUBO" This reverts commit d9c31ee9568277e4303715736b40925e41503596.	2024-03-19 11:43:05 +01:00
Ulrich Weigand	d9c31ee956	Fix overflow flag for i128 USUBO We use the VSCBIQ/VSBIQ/VSBCBIQ family of instructions to implement USUBO/USUBO_CARRY for the i128 data type. However, these instructions use an inverted sense of the borrow indication flag (a value of 1 indicates no borrow, while a value of 0 indicates borrow). This does not match the semantics of the boolean "overflow" flag of the USUBO/USUBO_CARRY ISD nodes. Fix this by generating code to explicitly invert the flag. These inversions cancel out if the result of USUBO feeds into a USUBO_CARRY. To avoid unnecessary zero-extend operations, also improve the DAGCombine handling of ZERO_EXTEND to optimize (zext (xor (trunc))) sequences where appropriate. Fixes: https://github.com/llvm/llvm-project/issues/83268	2024-03-19 11:20:52 +01:00
Shourya Goel	703920d413	[DAG] Matched FixedWidth pattern for ISD::AVGFLOORU (#84903) Fixes: #84749	2024-03-19 08:29:55 +00:00
Adrian Kuegel	f0a5e50550	[llvm][NVPTX] Add missing feature guard.	2024-03-19 06:53:14 +00:00
Jonas Paulsson	8b8e1adbde	[SystemZ] Don't lower ATOMIC_LOAD/STORE to LOAD/STORE (#75879) - Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE nodes to regular LOAD/STORE nodes, make them legal and select those nodes properly instead. This avoids exposing them to the DAGCombiner. - AtomicExpand pass no longer casts float/double atomic load/stores to integer (FP128 is still cast).	2024-03-18 17:21:50 -04:00
David Green	9a784303a3	[AArch64][GlobalISel] Legalize small G_TRUNC (#85625) This is an alternative to #85610 that applies moreElements to small G_TRUNC vectors to widen them. It needs to disable one of the existing Unmerge(Trunc(..)) combines, and some of the code is not as optimal as it could be. I believe with some extra optimizations it could look better (I was thinking of combining trunc(buildvector) -> buildvector and possibly improving buildvector lowering by generating insert_vector_element earlier).	2024-03-18 10:04:31 -07:00
Jonas Paulsson	09bc6abba6	[MachineFrameInfo] Refactoring around computeMaxCallFrameSize() (NFC) (#78001) - Use computeMaxCallFrameSize() in PEI::calculateCallFrameInfo() instead of duplicating the code. - Set AdjustsStack in FinalizeISel instead of in computeMaxCallFrameSize().	2024-03-18 10:37:59 -04:00
Qiu Chaofan	e5b20c83e5	[PowerPC] Update chain uses when emitting lxsizx (#84892)	2024-03-18 22:31:05 +08:00
Vyacheslav Levytskyy	59f34e8c2b	[SPIRV] Add Lifetime intrinsics/instructions (#85391) This PR: * adds Lifetime intrinsics/instructions; * fixes how the binary header is emitted (correct version and better approximation of Bound); * adds validation to more test cases.	2024-03-18 11:42:44 +01:00
Yingwei Zheng	38a44bdc93	[CodeGenPrepare] Reverse the canonicalization of isInf/isNanOrInf (#81572) In commit `2b582440c1`, we canonicalized the isInf/isNanOrInf idiom into fabs+fcmp for better analysis/codegen (See also the discussion in https://github.com/llvm/llvm-project/pull/76338). This patch reverses the fabs+fcmp to `is.fpclass`. If the `is.fpclass` is not supported by the target, it will be expanded by TLI. Fixes the regression introduced by `2b582440c1` and https://github.com/llvm/llvm-project/pull/80414#issuecomment-1936374206.	2024-03-18 18:27:45 +08:00
pvanhout	3493438605	Revert "[AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#75333)" This reverts commit 9b98692eedb78aa106539c36ba02944f32cae1ff.	2024-03-18 11:18:57 +01:00
Sander de Smalen	7fad304a03	[AArch64][SME] Make coalescer barrier available without +sme. (#85311) For each call that changes the streaming mode, ISel inserts a COALESCER_BARRIER node for the FP and (non-scalable) vector arguments to the callee. When calling a non-streaming function from a streaming-compatible function, it's not required to have +sme (in case the SME code-path is not actually executed at runtime). The patterns to match the COALESCER_BARRIER however were still predicated with `HasSME`, which is incorrect. This patch tries to fix that.	2024-03-18 09:43:03 +00:00
Pierre van Houtryve	9b98692eed	[AMDGPU] Run LowerLDS at the end of the fullLTO pipeline (#75333) This change allows us to use `--lto-partitions` in some cases (not at all guaranteed it works perfectly), as LDS is lowered before the module is split for parallel codegen. We must run LowerLDS before splitting modules as it needs to see all callers of functions with LDS to properly lower them.	2024-03-18 09:09:43 +01:00
Qiu Chaofan	65ae09eeb6	[PowerPC] Fix behavior of rldimi/rlwimi/rlwnm builtins (#85040) rldimi is a 64-bit instruction, so the corresponding builtin should not be available in 32-bit mode. The rotate amount should be in range, and cases where the mask is zero need special handling. This change also swaps the first and second operands of rldimi/rlwimi to match previous behavior. For masks not ending at bit 63-SH, rotation will be inserted before rldimi.	2024-03-18 14:17:16 +08:00
Sameer Sahasrabuddhe	ec34699f75	[GlobalISel] convergence control tokens and intrinsics (#67006) [GlobalISel] Implement convergence control tokens and intrinsics in GMIR In the IR translator, convert the LLVM token type to LLT::token(), which is an alias for the s0 type. These show up as implicit uses on convergent operations. Differential Revision: https://reviews.llvm.org/D158147	2024-03-18 10:34:11 +05:30
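The SystemZ USUBO commits (335f365982 and d9c31ee956) describe a borrow indication whose sense is inverted relative to the USUBO/USUBO_CARRY overflow flag. The following Python sketch models only that flag arithmetic, assuming the semantics stated in the commit message (1 = no borrow, 0 = borrow). The helper names `borrow_indication`, `usubo`, `usubo_carry`, and `sub256` are hypothetical. They are not LLVM or SystemZ APIs, and the code does not emulate the actual VSCBIQ/VSBIQ/VSBCBIQ instructions.

```python
# Hypothetical model of the inverted-borrow fix described in the SystemZ
# i128 USUBO commits. Not LLVM code; it only illustrates the flag senses.

MASK128 = (1 << 128) - 1


def borrow_indication(a: int, b: int, borrow_in_ind: int = 1) -> int:
    """Assumed hardware-style borrow indication: 1 = no borrow, 0 = borrow.

    borrow_in_ind uses the same inverted sense (1 = no incoming borrow).
    """
    borrow_in = 1 - borrow_in_ind
    return 1 if a - b - borrow_in >= 0 else 0


def usubo(a: int, b: int) -> tuple[int, int]:
    """USUBO-style result: (difference mod 2**128, overflow = 1 on borrow)."""
    ind = borrow_indication(a, b)
    return (a - b) & MASK128, ind ^ 1  # explicitly invert into overflow sense


def usubo_carry(a: int, b: int, borrow_in: int) -> tuple[int, int]:
    """USUBO_CARRY-style result with an incoming overflow flag (1 = borrow)."""
    ind = borrow_indication(a, b, borrow_in ^ 1)  # invert into hardware sense
    return (a - b - borrow_in) & MASK128, ind ^ 1  # invert back to overflow sense


def sub256(x: int, y: int) -> tuple[int, int]:
    """Subtract two 256-bit values as a USUBO feeding a USUBO_CARRY."""
    lo, ov = usubo(x & MASK128, y & MASK128)
    hi, ov = usubo_carry(x >> 128, y >> 128, ov)
    return (hi << 128) | lo, ov
```

In `sub256`, the `^ 1` that `usubo` applies to produce its overflow flag is undone by the `^ 1` that `usubo_carry` applies to convert that flag back to the hardware sense. This double inversion is the redundancy the commit message says cancels out when a USUBO result feeds a USUBO_CARRY.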


52521 Commits