llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	53b9d479d5	[X86] i256-add - replace i386 triple X32 check prefixes with X86 and add gnux32 triple tests	2024-01-31 11:04:19 +00:00
Vyacheslav Levytskyy	5a07774fe1	[SPIR-V] Improve how lowering of formal arguments in SPIR-V Backend interprets a value of 'kernel_arg_type' (#78730 ) The goal of this PR is to tolerate differences between description of formal arguments by function metadata (represented by "kernel_arg_type") and LLVM actual parameter types. A compiler may use "kernel_arg_type" of function metadata fields to encode detailed type information, whereas LLVM IR may utilize for an actual parameter a more general type, in particular, opaque pointer type. This PR proposes to resolve this by a fallback to LLVM actual parameter types during the lowering of formal function arguments in cases when the type can't be created by string content of "kernel_arg_type", i.e., when "kernel_arg_type" contains a type unknown for the SPIR-V Backend. An example of the issue manifestation is https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/main/test/transcoding/KernelArgTypeInOpString.ll, where a compiler generates for the following kernel function detailed `kernel_arg_type` info in a form of `!{!"image_kernel_data", !"myInt", !"struct struct_name"}`, and in LLVM IR same arguments are referred to as `@foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData)`. Both definitions are correct, and the resulting LLVM IR is correct, but lowering stage of SPIR-V Backend fails to generate SPIR-V type. ``` typedef int myInt; typedef struct { int width; int height; } image_kernel_data; struct struct_name { int i; int y; }; void kernel foo(__global image_kernel_data* in, __global struct struct_name outData, myInt out) {} ``` ``` define spir_kernel void @foo(ptr addrspace(1) %in, i32 %out, ptr addrspace(1) %outData) ... !kernel_arg_type !7 ... { entry: ret void } ... !7 = !{!"image_kernel_data", !"myInt", !"struct struct_name"} ``` The PR changes a contract of `SPIRVType getArgSPIRVType(...)` in a way that it may return `nullptr` to signal that the metadata string content is not recognized, so corresponding comments are added and a couple of checks for `nullptr` are inserted where appropriate.	2024-01-31 02:58:50 -08:00
Jay Foad	c2c650f62e	[AMDGPU] Stop combining arbitrary offsets into PAL relocs (#80034 ) PAL uses ELF REL (not RELA) relocations which can only store a 32-bit addend in the instruction, even for reloc types like R_AMDGPU_ABS32_HI which require the upper 32 bits of a 64-bit address calculation to be correct. This means that it is not safe to fold an arbitrary offset into a GlobalAddressSDNode, so stop doing that. In practice this is mostly a problem for small negative offsets which do not work as expected because PAL treats the 32-bit addend as unsigned.	2024-01-31 10:28:23 +00:00
Yingwei Zheng	50e80e06d1	[ValueTracking] Merge `cannotBeOrderedLessThanZeroImpl` into `computeKnownFPClass` (#76360 ) This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into `computeKnownFPClass` to improve the signbit inference. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2024-01-31 18:26:50 +08:00
Yingwei Zheng	89f87c3876	[RISCV][MC] Add MC layer support for the experimental zabha extension (#80005 ) This patch implements the zabha (Byte and Halfword Atomic Memory Operations) v1.0-rc1 extension. See also https://github.com/riscv/riscv-zabha/blob/v1.0-rc1/zabha.adoc.	2024-01-31 17:06:43 +08:00
Sander de Smalen	dd73666182	[SME] Stop RA from coalescing COPY instructions that transcend beyond smstart/smstop. (#78294 ) This patch introduces a 'COALESCER_BARRIER' which is a pseudo node that expands to a 'nop', but which stops the register allocator from coalescing a COPY node when its use/def crosses a SMSTART or SMSTOP instruction. For example: %0:fpr64 = COPY killed $d0 undef %2.dsub:zpr = COPY %0 // <- Do not coalesce this COPY ADJCALLSTACKDOWN 0, 0 MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0 $d0 = COPY killed %0 BL @use_f64, csr_aarch64_aapcs If the COPY would be coalesced, that would lead to: $d0 = COPY killed %0 being replaced by: $d0 = COPY killed %2.dsub which means the whole ZPR reg would be live up to the call, causing the MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register: str q0, [sp] // 16-byte Folded Spill smstop sm ldr z0, [sp] // 16-byte Folded Reload bl use_f64 which would be incorrect for two reasons: 1. The program may load more data than it has allocated. 2. If there are other SVE objects on the stack, the compiler might use the 'mul vl' addressing modes to access the spill location. By disabling the coalescing, we get the desired results: str d0, [sp, #8] // 8-byte Folded Spill smstop sm ldr d0, [sp, #8] // 8-byte Folded Reload bl use_f64	2024-01-31 09:04:13 +00:00
Chia	dc5dca1d01	[RISCV][Isel] Remove redundant vmerge for the scalable vwadd(u).wv (#80079 ) Similar to #78403, but for scalable `vwadd(u).wv`, given that #76785 is recommitted. ### Code ``` define <vscale x 8 x i64> @vwadd_wv_mask_v8i32(<vscale x 8 x i32> %x, <vscale x 8 x i64> %y) { %mask = icmp slt <vscale x 8 x i32> %x, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 42, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer) %a = select <vscale x 8 x i1> %mask, <vscale x 8 x i32> %x, <vscale x 8 x i32> zeroinitializer %sa = sext <vscale x 8 x i32> %a to <vscale x 8 x i64> %ret = add <vscale x 8 x i64> %sa, %y ret <vscale x 8 x i64> %ret } ``` ### Before this patch [Compiler Explorer](https://godbolt.org/z/xsoa5xPrd) ``` vwadd_wv_mask_v8i32: li a0, 42 vsetvli a1, zero, e32, m4, ta, ma vmslt.vx v0, v8, a0 vmv.v.i v12, 0 vmerge.vvm v24, v12, v8, v0 vwadd.wv v8, v16, v24 ret ``` ### After this patch ``` vwadd_wv_mask_v8i32: li a0, 42 vsetvli a1, zero, e32, m4, ta, ma vmslt.vx v0, v8, a0 vsetvli zero, zero, e32, m4, tu, mu vwadd.wv v16, v16, v8, v0.t vmv8r.v v8, v16 ret ```	2024-01-31 17:11:07 +09:00
Changpeng Fang	3564666fe1	[AMDGPU]: Fix type signatures for wmma intrinsics, NFC (#80087 ) Make the wmma intrinsic type signatures canonical. We need a type signature as long as the type is not fixed. However, when an argument's type matches a previous argument's type, we do not need the signature for this argument. This patch fixes three general cases: 1. add missing signatures 2. remove signatures for matching arguments 3. reorder the signatures -- return type signature should always appear first	2024-01-30 23:17:35 -08:00
Craig Topper	8a98091162	[RISCV] Use disjoint flag in or_is_add.	2024-01-30 22:12:28 -08:00
Shengchen Kan	8e77390c06	[X86][CodeGen] Support folding memory broadcast in X86InstrInfo::foldMemoryOperandImpl (#79761 )	2024-01-31 12:51:03 +08:00
Oskar Wirga	ff4636a4ab	Refactor recomputeLiveIns to converge on added MachineBasicBlocks (#79940 ) This is a fix for the regression seen in https://github.com/llvm/llvm-project/pull/79498 > Currently, the way that recomputeLiveIns works is that it will recompute the livein registers for that MachineBasicBlock but it matters what order you call recomputeLiveIn which can result in incorrect register allocations down the line. Now we do not recompute the entire CFG but we do ensure that the newly added MBB do reach convergence.	2024-01-30 19:33:04 -08:00
Congcong Cai	c43fda3efc	Revert "[WebAssembly] avoid to use explicit disabled feature" This reverts commit 1a17f2beb9cd1f5bbaa64502ab5c02ff74c199a4.	2024-01-31 11:20:34 +08:00
Congcong Cai	1a17f2beb9	[WebAssembly] avoid to use explicit disabled feature In `CoalesceFeaturesAndStripAtomics`, feature string is converted to FeatureBitset and back to feature string. It will lose information about explicitly disabled features.	2024-01-31 11:14:40 +08:00
Billy Laws	c761b4a5e4	[AArch64] Fix variadic tail-calls on ARM64EC (#79774 ) ARM64EC varargs calls expect that x4 = sp at entry, special handling is needed to ensure this with tail calls since they occur after the epilogue and the x4 write happens before. I tried going through AArch64MachineFrameLowering for this, hoping to avoid creating the dummy object but this was the best I could do since the stack info that uses isn't populated at this stage, CreateFixedObject also explicitly forbids 0 sized objects.	2024-01-30 18:32:15 -08:00
PiJoules	a356e6ccad	[SelectionDAG] Expand fixed point multiplication into libcall (#79352 ) 32-bit ARMv6 with thumb doesn't support MULHS/MUL_LOHI as legal/custom nodes during expansion which will cause fixed point multiplication of _Accum types to fail with fixed point arithmetic. Prior to this, we just happen to use fixed point multiplication on platforms that happen to support these MULHS/MUL_LOHI. This patch attempts to check if the multiplication can be done via libcalls, which are provided by the arm runtime. These libcall attempts are made elsewhere, so this patch refactors that libcall logic into its own functions and the fixed point expansion calls and reuses that logic.	2024-01-30 13:58:55 -08:00
Vyacheslav Levytskyy	9e02e8f1a7	fix producing multiple identical opaque pointer types (#79060 ) This PR fixes https://github.com/llvm/llvm-project/issues/79057 and improves code generation for opaque pointers by replacing the culprit SPIRVGlobalRegistry::getOpTypePointer() call with a more appropriate SPIRVGlobalRegistry::getOrCreateSPIRVPointerType() call. The latter function works together with the `DuplicatesTracker` (`SPIRVGeneralDuplicatesTracker DT;` from `class SPIRVGlobalRegistry`) to trace existence of previous definitions of opaque pointers. This allows to produce just one `OpTypePointer` command for all identical opaque pointers definitions and to return the very same type record for subsequent `SPIRVGlobalRegistry::createSPIRVType()` invocations. This PR alone improves code generation by producing a single needed definition per all opaque pointers to i8 of the same address space instead of multiple identical definitions produced before the patch. From the root cause analysis of https://github.com/llvm/llvm-project/issues/79057 we see also that this PR resolves the problem of inconsistency between keeping multiple instructions for identical opaque pointer types and just a single record for all such instructions in the `DuplicatesTracker`, and so it also resolves the issue with crashes on creation of a struct with opaque pointer fields due to the fact that now such struct fields refer to the same operand `<id>` having a required record in the data structure used for dependencies analysis (see https://github.com/llvm/llvm-project/issues/79057).	2024-01-30 18:11:53 +01:00
Vyacheslav Levytskyy	39483797b8	prevent undefined behaviour of SPIR-V Backend non-asserts builds when dealing with token type (#78437 ) The goal of this PR is to fix the issue when use of token type in LLVM intrinsic causes undefined behavior of SPIR-V Backend code generator when assertions are disabled: https://github.com/llvm/llvm-project/issues/78434 Among possible fix options, discussed in the https://github.com/llvm/llvm-project/issues/78434 issue description, the option to generate a meaningful error before execution arrives at the `llvm_unreachable` call looks like a better solution for now, because SPIR-V doesn't support token type anyway without additional extensions. The PR is to generate a user-friendly error message and exit without generating a stack dump when such a usage of token type was detected that would lead to undefined behavior of SPIR-V Backend code generator.	2024-01-30 18:10:57 +01:00
Vyacheslav Levytskyy	b9d623105d	generate a name of an unnamed global variable for Instruction Selection (#78293 ) The goal of this PR is to fix the issue of global unnamed variables causing SPIR-V Backend code generation to crash: https://github.com/llvm/llvm-project/issues/78278 The reason for the crash is that GlobalValue's getGlobalIdentifier() would fail for unnamed global variable when trying to access the first character of the name (see lib/IR/Globals.cpp:150). This leads to assert in Debug and undefined behaviour in Release builds. The proposed fix generates a name of an unnamed global variable as __unnamed_<unsigned number>, in a style of similar existing LLVM implementation (see lib/IR/Mangler.cpp:131). A new class member variable is added into `SPIRVInstructionSelector` class to keep track of the number we give to anonymous global values to generate the same name every time when this is needed. The patch adds a new LIT test with the smallest implementation of reproducer ll code.	2024-01-30 18:09:52 +01:00
Florian Hahn	d1e162e5d9	[AArch64] Add custom lowering for load <3 x i8>. (#78632 ) Add custom combine to lower load <3 x i8> as the more efficient sequence below: ldrb wX, [x0, #2] ldrh wY, [x0] orr wX, wY, wX, lsl #16 fmov s0, wX At the moment, there are almost no cases in which such vector operations will be generated automatically. The motivating case is non-power-of-2 SLP vectorization: https://github.com/llvm/llvm-project/pull/77790	2024-01-30 14:04:27 +00:00
Florian Hahn	6251b6bd8d	[AArch64] Add tests with sext of vec3 loads. Another round of additional tests for https://github.com/llvm/llvm-project/pull/7863 with different sext/zext and use variants.	2024-01-30 13:21:51 +00:00
Shengchen Kan	e5054fb5c6	[X86][test] Update CodeGen/X86/popcnt.ll after #78545	2024-01-30 18:38:16 +08:00
XinWang10	1a219e989f	[X86] Support EVEX compression from MOVBErr to BSWAP (#79775 ) APX promoted MOVBE instructions were supported in #77431. The reg2reg variants of MOVBE are newly introduced by APX and can be optimized to BSWAP instruction when the 2 register operands are same. This patch adds manual entries for MOVBErr instructions when we do ndd to non-ndd compression #77731. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2024-01-30 16:57:51 +08:00
XinWang10	5910e34a2f	[X86][MC] Support encoding optimization & assembler relaxation about immediate operands for APX instructions (#78545 ) Encoding optimization: ``` mi/mi32 -> mi8 ri/ri32 -> ri8 ``` if the immediate operand is 8-bit wide. Assembler relaxation: ``` mi8 -> mi/mi32 ri8 -> ri/ri32 ``` If the immediate operand is a symbol expression and its value is unknown.	2024-01-30 14:21:06 +08:00
Arthur Eubanks	198652a0ff	[X86] Treat __start_/__stop_ symbols as large (#79909 ) Followup to #79884. The linker adds __start_foo/__stop_foo symbols pointing to the beginning/end of the foo section. These can be far away from text, so treat them as large symbols under the medium/large code models. Performance to access these is almost certainly not important.	2024-01-29 21:00:16 -07:00
Liao Chunyu	45188c64db	[DAGCombiner] Use generalized pattern matcher in foldBoolSelectToLogic (#79101 ) support vp.select TODO: Possibly other functions could be supported, eg: SimplifySelect()	2024-01-30 10:26:51 +08:00
Justin Fargnoli	577738a12d	Revert "Disable incorrect peephole optimizations" (#79916 ) This reverts commit ff77058141e8026357ca514ad0d45c6c50921290.	2024-01-29 16:22:07 -08:00
Justin Fargnoli	ff77058141	Disable incorrect peephole optimizations	2024-01-29 15:54:40 -08:00
Jivan Hakobyan	0461448313	[RISCV][ISel] Add ISel support for experimental Zimop extension (#77089 ) This implements ISel support for mopr[0-31] and moprr[0-7] instructions for 32 and 64 bits --------- Co-authored-by: ln8-8 <lyut.nersisyan@gmail.com>	2024-01-29 15:24:00 -08:00
Craig Topper	7855703194	[RISCV] Move vp.splice tests into rvv directory. NFC	2024-01-29 15:01:52 -08:00
Arthur Eubanks	d6e07e0845	[X86] Treat __ehdr_start as large (#79884 ) The __ehdr_start symbol is added by the linker and points to the ELF file headers, which can be very far away from text. Treat it as a large symbol under the medium/large code models. Performance to access __ehdr_start is almost certainly not important. There are a couple of other symbols that the linker adds [1], but this is the most relevant one that may be far away from text. [1] `547c395b27/lld/ELF/Writer.cpp (L226)`	2024-01-29 14:25:40 -08:00
Joseph Huber	e633807a1f	[NVPTX] Add builtin support for 'globaltimer' (#79765 ) Summary: This patch adds support for `globaltimer` to match `clock` and `clock64`. See the PTX ISA reference for details. This patch does not implement the `hi` or `lo` variants for brevity as they can be obtained from this with the cost of an additional register. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#special-registers-globaltimer-globaltimer-lo-globaltimer-hi	2024-01-29 14:11:54 -06:00
Joseph Huber	ea8014046c	[NVPTX] Add builtin for 'exit' handling (#79777 ) Summary: The PTX ISA has always supported the 'exit' instruction to terminate individual threads. This patch adds a builtin to handle it. See the PTX documentation for further details. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-exit	2024-01-29 14:09:34 -06:00
Joseph Huber	5f12cc912a	[NVPTX] Add builtin support for 'nanosleep' PTX instruction (#79888 ) Summary: This patch adds a builtin for the `nanosleep` PTX function. It takes either an immediate or a register and sleeps for [0, 2t] nanoseconds given t. More information at the documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-nanosleep	2024-01-29 14:07:58 -06:00
Joseph Huber	d492faa7aa	[NVPTX] Add 'activemask' builtin and intrinsic support (#79768 ) Summary: This patch adds support for getting the 'activemask' instruction's value without needing to use inline assembly. See the relevant PTX reference for details. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-activemask	2024-01-29 14:07:30 -06:00
Simon Pilgrim	3ab5dbb199	[X86] sext-i1.ll - replace X32 check prefixes with X86 We try to only use X32 for gnux32 triple tests.	2024-01-29 18:00:36 +00:00
Simon Pilgrim	2aef33230d	[X86] fast-isel-store.ll - cleanup check prefixes 32/64-bit triples and check prefixes were inverted, and missing unwind attribute to strip cfi noise	2024-01-29 18:00:35 +00:00
Simon Pilgrim	cbe5985ff7	[X86] Replace X32 check prefixes with X86 We try to only use X32 for gnux32 triple tests.	2024-01-29 16:50:32 +00:00
David Green	9520773c46	[AArch64] Don't generate neon integer complex numbers with +sve2. NFC (#79829 ) The condition for allowing integer complex number support could also allow neon fixed length complex numbers if +sve2 was specified. This tightens the condition to only allow integer complex number support for scalable vectors. We could generalize this in the future to generate SVE intrinsics for fixed-length vectors, but for the moment this opts for the simpler fix.	2024-01-29 16:46:22 +00:00
Alex Bradbury	d833b9d677	[RISCV] Graduate Zicond to non-experimental (#79811 ) The Zicond extension was ratified in the last few months, with no changes that affect the LLVM implementation. Although there's surely more tuning that could be done about when to select Zicond or not, there are no known correctness issues. Therefore, we should mark support as non-experimental.	2024-01-29 15:58:54 +00:00
Pierre van Houtryve	ce72f78f37	[AMDGPU] Fix mul combine for MUL24 (#79110 ) MUL24 can now return an i64 for i32 operands, but the combine was never updated to handle this case. Extend the operand when rewriting the ADD to handle it. Fixes SWDEV-436654	2024-01-29 16:37:20 +01:00
Simon Pilgrim	06f5b956a0	[X86] pmovsx-inreg.ll - replace X32 check prefixes with X86 We try to only use X32 for gnux32 triple tests.	2024-01-29 14:23:08 +00:00
Simon Pilgrim	ccb2810ee3	[X86] anyext.ll - replace X32 check prefixes with X86 We try to only use X32 for gnux32 triple tests.	2024-01-29 14:23:08 +00:00
Simon Pilgrim	8a074c84ff	[X86] fixup-bw-copy.ll - replace X32 check prefixes with X86 We try to only use X32 for gnux32 triple tests.	2024-01-29 14:23:08 +00:00
Simon Pilgrim	bc879a9019	[X86] mul-i256.ll - simplify function attributes and remove cfi noise	2024-01-29 13:19:47 +00:00
Simon Pilgrim	3a4a7dcd62	[X86] Replace X32 check prefixes with X86 We try to only use X32 for gnux32 triple tests.	2024-01-29 13:19:47 +00:00
Michal Paszkowski	0fbaf03f70	[SPIR-V] Cast ptr kernel args to i8* when used as Store's value operand (#78603 ) Handle a special case when StoreInst's value operand is a kernel argument of a pointer type. Since these arguments could have either a basic element type (e.g. float*) or OpenCL builtin type (sampler_t), bitcast the StoreInst's value operand to default pointer element type (i8). This pull request addresses the issue https://github.com/llvm/llvm-project/issues/72864	2024-01-28 19:30:14 -08:00
chuongg3	a7cfff8dc6	[AArch64][GlobalISel] Lower Shuffle Vector to REV (#79591 ) Add lowering for i16 and i32 vectors for Shuffle Vector instructions with REV mask	2024-01-28 20:35:02 +00:00
chuongg3	2c552d319a	[AArch64][GlobalISel] Legalize G_ABS for Larger/Smaller Vectors (#79117 ) Legalize G_ABS for larger/smaller width vectors with legal element sizes Falls back for the smaller width vector tests because it is unable to legalize for G_ANYEXT smaller width vectors	2024-01-28 20:21:38 +00:00
David Green	915c3d9e5a	Revert "[AArch64] merge index address with large offset into base address" This reverts commit 32878c2065c8005b3ea30c79e16dfd7eed55d645 due to #79756 and #76202.	2024-01-28 17:01:21 +00:00
Shengchen Kan	6d7c8a6e06	[X86][test] Update failed tests in 60dbb2cec1bbf65aacf6752a59b0666a23aaa3ae after rebase	2024-01-29 00:32:30 +08:00

52796 Commits
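
Commit c2c650f62e explains why folding a negative offset into a PAL relocation goes wrong. The minimal sketch below models only the arithmetic, assuming the relocated value is the high 32 bits of a 64-bit `symbol + offset` (as for R_AMDGPU_ABS32_HI). `hi32_signed_addend` and `hi32_unsigned_addend` are hypothetical helpers, not LLVM or PAL code.

```c
#include <stdint.h>

/* Intended result (hypothetical helper): the folded offset is a signed
   64-bit quantity, so a negative offset lowers the address. */
uint32_t hi32_signed_addend(uint64_t sym, int32_t offset) {
    return (uint32_t)((sym + (uint64_t)(int64_t)offset) >> 32);
}

/* REL-style model (hypothetical helper): only 32 bits of addend are
   stored, and they are read back as unsigned, i.e. zero-extended. */
uint32_t hi32_unsigned_addend(uint64_t sym, int32_t offset) {
    return (uint32_t)((sym + (uint64_t)(uint32_t)offset) >> 32);
}
```

With a symbol at 0x0000000100000010 and an offset of -4, the signed model yields a high word of 1 and the unsigned model yields 2. In this model any negative offset gains an extra 2^32 once its 32-bit addend is zero-extended, so the high word is always one too large, while non-negative offsets agree.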
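
The load <3 x i8> lowering in d1e162e5d9 can be modelled in scalar C to show what the `ldrh`/`ldrb`/`orr` sequence computes: it reads exactly three bytes, never a fourth, and leaves the elements in the low 24 bits of a 32-bit value, which `fmov s0, wX` then places in the low 32 bits of a SIMD register. The function name is hypothetical, and the byte order assumes little-endian AArch64.

```c
#include <stdint.h>

/* Hypothetical scalar model of the d1e162e5d9 sequence (little-endian). */
uint32_t load_v3i8_packed(const uint8_t *p) {
    uint32_t lo = (uint32_t)p[0] | ((uint32_t)p[1] << 8); /* ldrh wY, [x0]      */
    uint32_t hi = (uint32_t)p[2];                          /* ldrb wX, [x0, #2]  */
    return lo | (hi << 16);                                /* orr wX, wY, wX, lsl #16 */
}
```

For the bytes 0x11, 0x22, 0x33 the packed value is 0x00332211, so byte lanes 0, 1 and 2 hold the three elements and the top byte is zero.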
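
The immediate-width choice in 5910e34a2f reduces to a range check. This simplified sketch decides up front, whereas the real assembler relaxes an already-emitted short form; it also assumes, as with the legacy 0x83 ALU encodings, that the imm8 forms sign-extend their operand. `pick_imm_form` and `enum imm_form` are hypothetical names, not X86 MC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the mi/ri immediate-width decision. */
enum imm_form { IMM8, IMM32 };

enum imm_form pick_imm_form(bool value_known, int64_t value) {
    /* Relaxation case: an unresolved symbol expression keeps the wide form. */
    if (!value_known)
        return IMM32;
    /* Optimization case: the value survives sign-extension from 8 bits. */
    if (value >= INT8_MIN && value <= INT8_MAX)
        return IMM8;
    return IMM32;
}
```

In this model 127 and -128 select the 8-bit form, 128 does not, and an unresolved symbol stays in the wide form.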