llvm-project

Author	SHA1	Message	Date
Fangrui Song	9e9907f1cf	[AMDGPU,test] Change llc -march= to -mtriple= (#75982 ) Similar to 806761a7629df268c8aed49657aeccffa6bca449. For IR files without a target triple, -mtriple= specifies the full target triple while -march= merely sets the architecture part of the default target triple, leaving a target triple which may not make sense, e.g. amdgpu-apple-darwin. Therefore, -march= is error-prone and not recommended for tests without a target triple. The issue has been benign as we recognize $unknown-apple-darwin as ELF instead of rejecting it outrightly. This patch changes AMDGPU tests to not rely on the default OS/environment components. Tests that need fixes are not changed: ``` LLVM :: CodeGen/AMDGPU/fabs.f64.ll LLVM :: CodeGen/AMDGPU/fabs.ll LLVM :: CodeGen/AMDGPU/floor.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll LLVM :: CodeGen/AMDGPU/fneg-fabs.ll LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll LLVM :: CodeGen/AMDGPU/schedule-if-2.ll ```	2024-01-16 21:54:58 -08:00
Phoebe Wang	8d6e82d501	[X86] Use vXi1 for `k` constraint in inline asm (#77733 ) Fixes #77172	2024-01-17 11:40:32 +08:00
Phoebe Wang	9745c13ca8	[X86][BF16] Improve float -> bfloat lowering under AVX512BF16 and AVXNECONVERT (#78042 )	2024-01-17 10:09:26 +08:00
Florian Mayer	f3190c78ec	Revert "[AMDGPU] Sign extend simm16 in setreg intrinsic" (#78372 ) Reverts llvm/llvm-project#77997 Broke UBSan bots.	2024-01-16 16:37:48 -08:00
Rahman Lavaee	e1616ef9d7	[BasicBlockSections] Always keep the entry block in the beginning of the function. (#74696 ) BasicBlockSections must enforce placing the entry block at the beginning of the function regardless of the basic block sections profile.	2024-01-16 14:15:33 -08:00
mmoadeli	aa23e493f2	[NVPTX] Fix generating permute bytes from register pair when the initial values are undefined (#74437 ) When generating the permute bytes for the prmt instruction, the existence of an undefined initial value initialises the int32 that holds the mask with all 1's (0xFFFFFFFF). That initialization subsequently leads to complications during the subsequent OR operation, leading to inaccuracies in populating mask values for the following bytes. Consequently, the final value persists as a constant -1, irrespective of the actual mask values that succeed the initial set value.	2024-01-16 11:05:41 -08:00
Simon Pilgrim	d6ee91b110	[X86] Add test case for Issue #77805	2024-01-16 17:23:45 +00:00
Stanislav Mekhanoshin	371fdbaa57	[AMDGPU] Sign extend simm16 in setreg intrinsic (#77997 ) We currently force users to use a negative constant in the intrinsic call. Changing it to zext would break existing programs, so just sign extend the argument.	2024-01-16 09:17:18 -08:00
Craig Topper	7fe5269b54	[RISCV] Bump Zfbfmin, Zvfbfmin, and Zvfbfwma to 1.0. (#78021 )	2024-01-16 08:42:21 -08:00
Simon Pilgrim	662d1cb86b	[X86] Add test case for Issue #78109	2024-01-16 16:33:21 +00:00
Koakuma	118d4234ac	[SPARC] Prefer RDPC over CALL to implement GETPCX for 64-bit target On 64-bit target, prefer using RDPC over CALL to get the value of %pc. This is faster on modern processors (Niagara T1 and newer) and avoids polluting the processor's predictor state. The old behavior of using a fake CALL is still done when tuning for classic UltraSPARC processors, since RDPC is much slower there. A quick pgbench test on a SPARC T4 shows about 2% speedup on SELECT loads, and about 7% speedup on INSERT/UPDATE loads. Reviewed By: @s-barannikov Github PR: https://github.com/llvm/llvm-project/pull/78280	2024-01-16 22:46:39 +07:00
Luke Lau	93d39657f5	[RISCV] Remove -riscv-v-vector-bits-min flag that was left behind. NFC This should have been removed in 74f985b793bf4005e49736f8c2cef8b5cbf7c1ab	2024-01-16 21:30:32 +07:00
Wang Pengcheng	3ac9fe69f7	[RISCV] CodeGen of RVE and ilp32e/lp64e ABIs (#76777 ) This commit includes the necessary changes to clang and LLVM to support codegen of `RVE` and the `ilp32e`/`lp64e` ABIs. The differences between `RVE` and `RVI` are: * `RVE` reduces the integer register count to 16 (x0-x15). * The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits. `RVE` can be combined with all current standard extensions. The central changes in the ilp32e/lp64e ABI, compared to ilp32/lp64, are: * Only 6 integer argument registers (rather than 8). * Only 2 callee-saved registers (rather than 12). * A stack alignment of 32 bits (rather than 128 bits). * ilp32e isn't compatible with the D ISA extension. If `ilp32e` or `lp64e` is used with an ISA that has any of the registers x16-x31 and f0-f31, then these registers are considered temporaries. To be compatible with the implementation of ilp32e in GCC, we don't use aligned registers to pass variadic arguments and set stack alignment to 4 bytes for types with length of 2*XLEN. FastCC is also supported on RVE, while GHC isn't since there is only one available register. Differential Revision: https://reviews.llvm.org/D70401	2024-01-16 20:44:30 +08:00
Shengchen Kan	b1eaffd389	[X86][test] Add test for lowering NDD AND We supported encoding/decoding for APX AND in #76319 This test should be added in #77564 but was missing.	2024-01-16 20:09:40 +08:00
Sander de Smalen	289999bad7	[Clang] Make sdot builtins available to SME (#77792 ) See the specification for more details: * https://github.com/ARM-software/acle/blob/main/main/acle.md#udot-sdot-fdot-vectors * https://github.com/ARM-software/acle/blob/main/main/acle.md#udot-sdot-fdot-indexed	2024-01-16 10:32:30 +00:00
Alfie Richards	60c775769b	[ARM] Add missing earlyclobber to sqrshr and uqrshl instructions. (#77782 ) This avoids possible undefined behavior using the same register for Rm and Rda. Additionally adds a check in MC to produce an error upon parsing this case.	2024-01-16 10:30:16 +00:00
David Green	1074b94f5d	[ARM] Fix phi operand order issue in MVEGatherScatterLowering (#78208 ) With commuted operands on the phi node, the two old incoming values could be removed in the wrong order, removing newly added operand instead of the old one.	2024-01-16 10:15:05 +00:00
Pierre van Houtryve	4b0a76a3d7	[GlobalISel] Fix buildCopyFromRegs for split vectors (#77448 ) Fixes #77055	2024-01-16 10:04:20 +01:00
Matt Arsenault	480cc413b7	AMDGPU/GlobalISel: Handle inreg arguments as SGPRs (#78123 ) This is the missing GISel part of 54470176afe20b16e6b026ab989591d1d19ad2b7	2024-01-16 15:13:31 +07:00
Alex Bradbury	84f7fb6217	[MachineScheduler] Add option to control reordering for store/load clustering (#75338 ) Reordering based on the sort order of the MemOpInfo array was disabled in <https://reviews.llvm.org/D72706>. However, it's not clear this is desirable for all targets. It also makes it more difficult to compare the incremental benefit of enabling load clustering in the selectiondag scheduler as well as the machinescheduler, as the sdag scheduler does seem to allow this reordering. This patch adds a parameter that can control the behaviour on a per-target basis. Split out from #73789.	2024-01-16 07:17:41 +00:00
Luke Lau	286a366d05	[RISCV] Remove vmv.s.x and vmv.x.s lmul pseudo variants (#71501 ) vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX and PseudoVMV_X_S_MX with just one pseudo each. These pseudos use the VR register class (just like the actual instruction), so we now only have TableGen patterns for vectors of LMUL <= 1. We now rely on the existing combines that shrink LMUL down to 1 for vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines later and just inserting the nodes with the correct type in a later patch. The test diff is due to the fact that a PseudoVMV_S_X/PseudoVMV_X_S no longer carries any information about LMUL, so if it's the only vector pseudo instruction in a block then it now defaults to LMUL=1.	2024-01-16 13:36:24 +07:00
Shilei Tian	d63c2e52e6	[AMDGPU][MC] Remove incorrect `_e32` suffix from `v_dot2c_f32_f16` and `v_dot4c_i32_i8` (#77993 ) The two VOP2 instructions cannot be encoded as VOP3. Fix #54691.	2024-01-15 23:11:50 -05:00
Michal Paszkowski	59e5cb7b83	[SPIR-V] Do not emit spv_ptrcast if GEP result is of expected type (#78122 ) Prior to this change spv_ptrcast (and OpBitcast) was never emitted for GEP resulting pointers. While such SPIR-V was (mostly) accepted by the NEO GPU driver, the generated SPIR-V was incorrect. The newly added test (pointers/getelementptr-bitcast-load.ll) verifies that a correct bitcast is added for more complex cases and passes spirv-val. The test is based on an OpenCL CTS test (basic/prefetch).	2024-01-15 19:56:11 -08:00
Amara Emerson	eb009ed249	[GlobalISel] Fix the select->minmax combine from trying to operate on pointer types.	2024-01-15 18:20:18 -08:00
David Green	6719a5a3f6	[ARM] Extra test for MVE gather optimization with commuted phi operands. NFC	2024-01-15 19:28:55 +00:00
Jonas Paulsson	1d1893097a	[SystemZ] Don't use FP Load and Test as comparisons to same reg (#78074 ) The usage of FP Load and Test instructions as a comparison against zero with the assumption that the dest reg will always reflect the source reg is actually incorrect: Unfortunately, a SNaN will be converted to a QNaN, so the instruction may actually change the value as opposed to being a pure register move with a test. This patch - changes instruction selection to always emit FP LT with a scratch def reg, which will typically be allocated to the same reg if dead. - Removes the conversions into FP LT in SystemZElimcompare.	2024-01-15 19:36:40 +01:00
chuongg3	927b8a0f4f	[AArch64][GlobalISel] Combine vecreduce(ext) to {U/S}ADDLV (#75832 )	2024-01-15 18:26:27 +00:00
Jay Foad	ba131b7017	[AMDGPU] Do not generate s_set_inst_prefetch_distance for GFX12 (#78190 ) GFX12 can still encode the s_set_inst_prefetch_distance instruction but it has no effect.	2024-01-15 18:20:45 +00:00
Jay Foad	ed60cb8fb9	[AMDGPU] Disable hasVALUPartialForwardingHazard for GFX12 (#78188 )	2024-01-15 18:20:10 +00:00
Jay Foad	85705bbf1d	[AMDGPU] Disable hasVALUMaskWriteHazard for GFX12 (#78187 )	2024-01-15 18:19:32 +00:00
Jonas Paulsson	e2ce91f48c	Fix test output for 3b16d8c	2024-01-15 12:04:00 -06:00
Tuan Chuong Goh	22c24be37c	[AArch64][GlobalISel] Pre-commit for Combine vecreduce(ext) to {U/S}ADDLV	2024-01-15 17:54:52 +00:00
chuongg3	fcfe1b6482	[GlobalISel] Refactor extractParts() (#75223 ) Moved extractParts() and extractVectorParts() from LegalizerHelper to Utils to be able to use it in different passes. extractParts() will also try to use unmerge when doing irregular splits where possible, falling back to extract elements when not.	2024-01-15 16:40:39 +00:00
Dávid Ferenc Szabó	0ff3d729f9	[GlobalISel] Make IRTranslator able to handle PHIs with empty types. (#73235 ) SelectionDAG already handles this since e53b7d1a11d180ed7b33190a837d8898ab2a0b71.	2024-01-15 23:26:30 +07:00
Hans Wennborg	677ced8af2	Require asserts for llvm/test/CodeGen/PowerPC/fence.ll	2024-01-15 17:25:49 +01:00
Jonas Paulsson	3b16d8c8ea	[SystemZ] Don't crash on undef source in shouldCoalesce() (#78056 ) SystemZRegisterInfo::shouldCoalesce() has to be able to handle an undef source.	2024-01-15 17:24:38 +01:00
Jay Foad	f3d07881c8	[AMDGPU] Remove functions with incompatible gws attribute (#78143 ) This change is to remove incompatible gws related functions in order to make device-libs work correctly under -O0 for gfx1200+ Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>	2024-01-15 16:23:39 +00:00
Amara Emerson	c32d02efd2	[AArch64][GlobalISel] Fix not extending GPR32->GPR64 result of anyext indexed load. Was causing assertions to fail.	2024-01-15 08:22:39 -08:00
Nathan Gauër	0e1037edbf	[SPIR-V] Strip convergence intrinsics before ISel (#75948 ) The structurizer will require the frontend to emit convergence intrinsics. Once used to restructurize the control-flow, those intrinsics shall be removed, as they cannot be converted to SPIR-V. This commit adds a new pass to the SPIR-V backend which strips those intrinsics. Those 2 new steps are not limited to Vulkan as OpenCL could also benefit from not crashing if a convergent operation is in the IR (even though the frontend doesn't generate such intrinsics). Signed-off-by: Nathan Gauër <brioche@google.com>	2024-01-15 11:35:35 +01:00
Nikita Popov	87bc91d425	[PowerPC] Fix shuffle combine with undef elements (#77787 ) This custom DAG combine works on a shuffle where one source vector is a zero splat, which means we can adjust the shuffle indices to refer to any element of the splat -- as long as we stay in the same vector. In the case where an undef (-1) index into the non-splat vector was used, we ended up adjusting the splat index to -1+NumElements, which points into the wrong vector. Fix this by using the first element from the splat if the other one is undef. There are four cases this theoretically affects, but in practice I only managed to demonstrate a miscompile with one of them. I think two of these are effectively dead due to the operand canonicalization at the start of the transform. Fixes https://github.com/llvm/llvm-project/issues/77748.	2024-01-15 10:12:33 +01:00
Qiu Chaofan	ce1f9465b0	[NFC] Pre-commit case of ppcf128 extractelt soften	2024-01-15 15:27:36 +08:00
Luke Lau	3b7abf38fb	[RISCV] Add disjoint flag to or ops in RISCVGatherScatterLowering tests. NFC InstCombine will add the disjoint flag to these or instructions. This patch adds them to the tests so that it matches the input RISCVGatherScatterLowering will receive in practice, allowing us to rely on said disjoint flag: https://github.com/llvm/llvm-project/pull/77800#discussion_r1449231844	2024-01-15 14:09:27 +07:00
Luke Lau	0cf768e7f1	[RISCV] Handle disjoint or in RISCVGatherScatterLowering (#77800 ) This patch adds support for the disjoint flag in the non-recursive case, as well as adding an additional check for it in the recursive case. Note that haveNoCommonBitsSet should be equivalent to having the disjoint flag set, and the check can be removed in a follow-up patch. Co-authored-by: Philip Reames <preames@rivosinc.com>	2024-01-15 13:37:09 +07:00
Luke Lau	c07a1fe7b4	[RISCV] Lower vfmv.s.f intrinsics to VFMV_S_F_VL first (#76699 ) Currently vfmv.s.f intrinsics are directly selected to their pseudos via a tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their corresponding VL SDNode, then get selected from a pattern in RISCVInstrInfoVVLPatterns.td. This patch brings vfmv.s.f in line with the other move instructions. Split out from #71501, where we did this to preserve the behaviour of selecting vmv_s_x for VFMV_S_F_VL for small enough immediates.	2024-01-15 12:07:29 +07:00
Qiu Chaofan	85071a3c74	[PowerPC] Implement fence builtin (#76495 )	2024-01-15 11:19:16 +08:00
Nicholas Mosier	49138d97c0	[X86] Fix SLH crash on llvm.eh.sjlj.longjmp (#77959 ) Fix #60081.	2024-01-14 12:03:18 +08:00
Heejin Ahn	d871f40deb	[WebAssembly] Use DebugValueManager only when subprogram exists (#77978 ) We previously scanned the whole BB for `DBG_VALUE` instruction even when the program doesn't have debug info, i.e., the function doesn't have a subprogram associated with it, which can make compilation unnecessarily slow. This disables `DebugValueManager` when a `DISubprogram` doesn't exist for a function. This only reduces unnecessary work in non-debug mode and does not change output, so it's hard to add a test to test this behavior. Test changes were necessary because their `DISubprogram`s were not correctly linked with the functions, so with this PR the compiler incorrectly assumed the functions didn't have a subprogram and the tests started to fail. Fixes https://github.com/emscripten-core/emscripten/issues/21048.	2024-01-13 14:55:54 -08:00
Durgadoss R	8d817f6479	[LLVM][NVPTX]: Add aligned versions of cluster barriers (#77940 )	2024-01-13 10:41:19 +01:00
Usman Nadeem	792fa23c1b	[AArch64][SVE2] Lower OR to SLI/SRI (#77555 ) Code builds on NEON code and the tests are adapted from NEON tests minus the tests for illegal types.	2024-01-12 11:23:56 -08:00
Min-Yih Hsu	2f2217a8f7	[RISCV] Add missing tests for inttoptr/ptrtoint on scalable vectors (#77857 ) Add missing tests for inttoptr/ptrtoint on scalable vectors. Previously we only had inttoptr/ptrtoint tests for fixed vectors.	2024-01-12 09:52:07 -08:00
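
The NVPTX entry above (aa23e493f2) describes a mask that is built by OR-ing selectors together and that gets stuck at all 1's once an undefined value sets it to 0xFFFFFFFF. The sketch below only illustrates that arithmetic; it is not the LLVM code. The function name, the `UNDEF` marker, the 4-bit field width, and the per-lane handling of undefined values in the second mode are all assumptions made for illustration, not taken from the patch.

```python
from typing import Optional, Sequence

UNDEF = None      # hypothetical stand-in for an undefined initial value
FIELD_BITS = 4    # assumed width of one selector field in the mask


def build_mask(selectors: Sequence[Optional[int]], all_ones_on_undef: bool) -> int:
    """OR selector fields into a 32-bit mask, one field per lane.

    all_ones_on_undef=True mimics the behaviour described in the commit
    message: an undefined value overwrites the whole mask with 0xFFFFFFFF.
    all_ones_on_undef=False is one plausible alternative (an assumption):
    only the field belonging to the undefined lane is set to all 1's.
    """
    mask = 0
    field_max = (1 << FIELD_BITS) - 1
    for lane, sel in enumerate(selectors):
        shift = FIELD_BITS * lane
        if sel is UNDEF:
            if all_ones_on_undef:
                mask = 0xFFFFFFFF          # every later OR is now a no-op
            else:
                mask |= field_max << shift
        else:
            mask |= (sel & field_max) << shift
    return mask


stuck = build_mask([UNDEF, 1, 2, 3], all_ones_on_undef=True)
per_lane = build_mask([UNDEF, 1, 2, 3], all_ones_on_undef=False)
print(hex(stuck), hex(per_lane))
```

With the whole mask set to all 1's, the selectors 1, 2 and 3 leave no trace in the result, which matches the "constant -1" symptom in the commit message. Confining the all-1's pattern to one field keeps the later selectors visible.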


52796 Commits