llvm-project

Author	SHA1	Message	Date
Sami Tolvanen	83835e22c7	[RISCV] Implement KCFI operand bundle lowering With `-fsanitize=kcfi` (Kernel Control-Flow Integrity), Clang emits "kcfi" operand bundles to indirect call instructions. Similarly to the target-specific lowering added in D119296, implement KCFI operand bundle lowering for RISC-V. This patch disables the generic KCFI pass for RISC-V in Clang, and adds the KCFI machine function pass in `RISCVPassConfig::addPreSched` to emit target-specific `KCFI_CHECK` pseudo instructions before calls that have KCFI operand bundles. The machine function pass also bundles the instructions to ensure we emit the checks immediately before the calls, which is not possible with the generic pass. `KCFI_CHECK` instructions are lowered in `RISCVAsmPrinter` to a contiguous code sequence that traps if the expected hash in the operand bundle doesn't match the hash before the target function address. This patch emits an `ebreak` instruction for error handling to match the Linux kernel's `BUG()` implementation. Just like for X86, we also emit trap locations to a `.kcfi_traps` section to support error handling, as we cannot embed additional information to the trap instruction itself. Relands commit 62fa708ceb027713b386c7e0efda994f8bdc27e2 with fixed tests. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D148385	2023-06-23 22:57:56 +00:00
Sami Tolvanen	e809ebeb6c	Revert "[RISCV] Implement KCFI operand bundle lowering" This reverts commit 62fa708ceb027713b386c7e0efda994f8bdc27e2. Reverting to investigate -verify-machineinstrs errors in MIR tests.	2023-06-23 21:42:57 +00:00
xortoast	bb648c9177	[WebAssembly] Add lowering for llvm.rint and llvm.roundeven WebAssembly doesn't expose inexact exceptions, so frint can be mapped to fnearbyint. Likewise, WebAssembly always rounds ties-to-even, so froundeven can be mapped to fnearbyint. Differential Revision: https://reviews.llvm.org/D153451	2023-06-23 14:07:11 -07:00
Amara Emerson	1ec30106a5	Darwin: Use the GOT to reference ___stack_chk_guard. e018cbf7208b changed the default behaviour for Darwin, and this breaks some existing software. rdar://110350601	2023-06-23 14:05:40 -07:00
Sami Tolvanen	62fa708ceb	[RISCV] Implement KCFI operand bundle lowering With `-fsanitize=kcfi` (Kernel Control-Flow Integrity), Clang emits "kcfi" operand bundles to indirect call instructions. Similarly to the target-specific lowering added in D119296, implement KCFI operand bundle lowering for RISC-V. This patch disables the generic KCFI pass for RISC-V in Clang, and adds the KCFI machine function pass in `RISCVPassConfig::addPreSched` to emit target-specific `KCFI_CHECK` pseudo instructions before calls that have KCFI operand bundles. The machine function pass also bundles the instructions to ensure we emit the checks immediately before the calls, which is not possible with the generic pass. `KCFI_CHECK` instructions are lowered in `RISCVAsmPrinter` to a contiguous code sequence that traps if the expected hash in the operand bundle doesn't match the hash before the target function address. This patch emits an `ebreak` instruction for error handling to match the Linux kernel's `BUG()` implementation. Just like for X86, we also emit trap locations to a `.kcfi_traps` section to support error handling, as we cannot embed additional information to the trap instruction itself. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D148385	2023-06-23 18:25:24 +00:00
Artem Belevich	60941f1d28	[NVPTX] Lower v2f16 and v2bf16 stores as 32-bit scalars. This avoids unnecessary vector splitting that was needed for vectorized store instruction. Differential Revision: https://reviews.llvm.org/D152593	2023-06-23 10:58:44 -07:00
Artem Belevich	7e5d7d208f	[NVPTX] Correctly lower extending loads for fp16 vectors. Fixes https://github.com/llvm/llvm-project/issues/63436 Improves lowering of extending FP vector loads. We were previously splitting them unnecessarily. Differential Revision: https://reviews.llvm.org/D153477	2023-06-23 10:45:49 -07:00
Fangrui Song	f9fd0062b6	[XRay][AArch64] Support __xray_customevent/__xray_typedevent `__xray_customevent` and `__xray_typedevent` are built-in functions in Clang. With -fxray-instrument, they are lowered to intrinsics llvm.xray.customevent and llvm.xray.typedevent, respectively. These intrinsics are then lowered to TargetOpcode::{PATCHABLE_EVENT_CALL,PATCHABLE_TYPED_EVENT_CALL}. The target is responsible for generating a code sequence that calls either `__xray_CustomEvent` (with 2 arguments) or `__xray_TypedEvent` (with 3 arguments). Before patching, the code sequence is prefixed by a branch instruction that skips the rest of the code sequence. After patching (compiler-rt/lib/xray/xray_AArch64.cpp), the branch instruction becomes a NOP and the function call takes effect. This patch implements the lowering process for {PATCHABLE_EVENT_CALL,PATCHABLE_TYPED_EVENT_CALL} and implements the runtime. ``` // Lowering of PATCHABLE_EVENT_CALL .Lxray_sled_N: b #24 stp x0, x1, [sp, #-16]! x0 = reg of op0 x1 = reg of op1 bl __xray_CustomEvent ldp x0, x1, [sp], #16 ``` As a result, two updated tests in compiler-rt/test/xray/TestCases/Posix/ now pass on AArch64. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D153320	2023-06-23 09:24:18 -07:00
Ties Stuij	5ddd561cb5	disable execute-only tests which are failing with expensive checks Temporarily disabling the execute-only tests. We recently added codegen for armv6-m, which is still in heavy development (D152795). Disabling the tests while we're figuring out what's going on is probably the least disruptive option, as a patch dependent on it also already landed.	2023-06-23 16:35:24 +01:00
David Green	589c940eb3	[DAG] Fix and expand fmin/fmax reassociation fold. This call to reassociateReduction is used by both fminnum/fmaxnum and fminimum/fmaximum. In adding support for fminimum/fmaximum we appear to be fixing the use of an incorrect reduction type, which should have only applied to minnum/maxnum. I also believe that it doesn't need nsz and reassoc to perform the reassociation. For float min/max it should always be valid. Differential Revision: https://reviews.llvm.org/D153247	2023-06-23 14:45:14 +01:00
Alex Bradbury	690b1c847f	[RISCV] Implement support for bf16 truncate/extend on hard FP targets For the same reasons as D151284, this requires custom lowering of the truncate libcall on hard float ABIs (the normal libcall code path is used on soft ABIs). The extend operation is implemented by a shift just as in the standard legalisation, but needs to be custom lowered because i32 isn't a legal type on RV64. This patch aims to make the minimal changes that result in correct codegen for the bfloat.ll tests. Differential Revision: https://reviews.llvm.org/D151663	2023-06-23 14:18:59 +01:00
Matt Arsenault	2449931b01	AMDGPU: Don't use old form of fneg in some tests	2023-06-23 09:11:06 -04:00
Matt Arsenault	c56e4a8c42	AMDGPU: Modernize exp codegen tests Find and replace on the new log tests (plus <3 x half> which was missing). Apparently exp10 never worked.	2023-06-23 09:11:06 -04:00
Matt Arsenault	89ccfa1b39	AMDGPU: Use correct lowering for llvm.log2.f32 We previously directly codegened to v_log_f32, which is broken for denormals. The lowering isn't complicated, you simply need to scale denormal inputs and adjust the result. Note log and log10 are still not accurate enough, and will be fixed separately.	2023-06-23 08:37:37 -04:00
Ivan Kosarev	813f6a495b	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 12. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152905	2023-06-23 13:33:06 +01:00
Matt Arsenault	089f652f17	AMDGPU: Add more log vector tests	2023-06-23 08:28:42 -04:00
Ivan Kosarev	9435942447	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 10. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152903	2023-06-23 12:24:52 +01:00
Michael Platings	041ffc155f	[Clang][Driver] Warn on invalid Arm or AArch64 baremetal target triple A common user mistake is specifying a target of aarch64-none-eabi or arm-none-elf whereas the correct names are aarch64-none-elf & arm-none-eabi. Currently if a target of aarch64-none-eabi is specified then the Generic_ELF toolchain is used, unlike aarch64-none-elf which will use the BareMetal toolchain. This is unlikely to be intended by the user so issue a warning that the target is invalid. The target parser is liberal in what input it accepts so invalid triples may yield behaviour that's sufficiently close to what the user intended. Therefore invalid triples were used in many tests. This change updates those tests to use valid triples. One test (gnu-mcount.c) relies on the Generic_ELF toolchain behaviour so change it to explicitly specify aarch64-unknown-none-gnu as the target. Reviewed By: peter.smith, DavidSpickett Differential Revision: https://reviews.llvm.org/D153430	2023-06-23 11:54:29 +01:00
David Green	71ac2a8e23	[AArch64] Add tests for double reducts of vector.reduce.fmaximum/fminimum. NFC Including some tests with mixed minnum/minimum reductions and removing the fast from fmin/fmax reductions as those should not be needed.	2023-06-23 11:28:09 +01:00
Igor Kirillov	04a8070b46	Revert "Revert "[CodeGen] Extend reduction support in ComplexDeinterleaving pass to support predication"" Adds the capability to recognize SelectInst that appear in the IR. These instructions are generated during scalable vectorization for reduction and when the code contains conditions inside the loop body or when "-prefer-predicate-over-epilogue=predicate-dont-vectorize" is set. Differential Revision: https://reviews.llvm.org/D152558 This reverts commit ab09654832dba5cef8baa6400fdfd3e4d1495624. Reason: Reapplying after removing unnecessary default case in switch expression.	2023-06-23 10:13:22 +00:00
Ties Stuij	2273741ea2	[ARM] generate armv6m eXecute Only (XO) code [ARM] generate armv6m eXecute Only (XO) code for immediates, globals Previously eXecute Only (XO) support was implemented for targets that support MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449 XO prevents the compiler from generating data accesses to code sections. This patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and must resort to the following general pattern to avoid loads: movs r3, :upper8_15:foo lsls r3, #8 adds r3, :upper0_7:foo lsls r3, #8 adds r3, :lower8_15:foo lsls r3, #8 adds r3, :lower0_7:foo ldr r3, [r3] This is equivalent to the code pattern generated by GCC. The above relocations are new to LLVM and have been implemented in a parent patch: https://reviews.llvm.org/D149443. This patch limits itself to implementing codegen for this pattern and enabling XO for armv6-M in the backend. Separate patches will follow for: - switch tables - replacing specific loads from constant islands which are spread out over the ARM backend codebase. Amongst others: FastISel, call lowering, stack frames. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D152795	2023-06-23 10:50:47 +01:00
Jolanta Jensen	c5ed93f975	[SVE ACLE] Remove DAG combines that are no longer relevant. This patch removes DAG combines that are no longer relevant because equivalent IR combines have been added. Differential Revision: https://reviews.llvm.org/D153445	2023-06-23 09:09:07 +00:00
Dhruv Chawla	3f77724de7	[TargetLowering] Better code generation for ISD::SADDSAT/SSUBSAT when operand sign is known When the sign of either of the operands is known, it is possible to determine what the saturating value will be without having to compute it using the sign bits. Differential Revision: https://reviews.llvm.org/D153575	2023-06-23 13:20:36 +05:30
Dhruv Chawla	911df1e8dd	[AArch64] Pre-commit test for D153575	2023-06-23 13:08:26 +05:30
Amaury Séchet	34d8c5b9ce	[DAG] Peek through trunc when combining select into shifts. This fixes a regression in D127115 Depends on D127115 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D151916	2023-06-23 00:35:39 +00:00
Sheng	65b710efc1	[m68k] Fix incorrect handling of TLS when matching addressing mode. `TargetGlobalTLSAddress` is not considered and handled correctly when matching addressing mode, which leads to an incorrect result of instruction selection. fixes #63162. Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D153103	2023-06-23 08:30:53 +08:00
Vitaly Buka	ab09654832	Revert "[CodeGen] Extend reduction support in ComplexDeinterleaving pass to support predication" ComplexDeinterleavingPass.cpp:1849:3: error: default label in switch which covers all enumeration values This reverts commit 116953b82130df1ebd817b3587b16154f659c013.	2023-06-22 11:29:38 -07:00
Fangrui Song	bef8294650	[XRay] Make xray_instr_map compatible with Mach-O The `__DATA,xray_instr_map` section has label differences like `.quad Lxray_sled_0-Ltmp0` that is represented as a pair of UNSIGNED and SUBTRACTOR relocations. LLVM integrated assembler attempts to rewrite A-B into A-B'+offset where B' can be included in the symbol table. B' is called an atom and should be a non-temporary symbol in the same section. However, since `xray_instr_map` does not define a non-temporary symbol, the SUBTRACTOR relocation will have no associated symbol, and its `r_extern` value will be 0. Therefore, we will see linker errors like: error: SUBTRACTOR relocation must be extern at offset 0 of __DATA,xray_instr_map in a.o To fix this issue, we need to define a non-temporary symbol in the section. We can accomplish this by renaming `Lxray_sleds_start0` to `lxray_sleds_start0` ("L" to "l"). `lxray_sleds_start0` serves as the atom for this dead-strippable subsection. With the `S_ATTR_LIVE_SUPPORT` attribute, `ld -dead_strip` will retain subsections that reference live functions. Special thanks to Oleksii Lozovskyi for reporting the issue and providing initial analysis. Differential Revision: https://reviews.llvm.org/D153239	2023-06-22 10:03:17 -07:00
Igor Kirillov	116953b821	[CodeGen] Extend reduction support in ComplexDeinterleaving pass to support predication Adds the capability to recognize SelectInst that appear in the IR. These instructions are generated during scalable vectorization for reduction and when the code contains conditions inside the loop body or when "-prefer-predicate-over-epilogue=predicate-dont-vectorize" is set. Differential Revision: https://reviews.llvm.org/D152558	2023-06-22 16:49:40 +00:00
Craig Topper	08f1aa8728	[RISCV] Move Zca/Zcb/Zcd/Zcf/Zcmp/Zcmt out of experimental status. According to https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions these were ratified in April 2023. Reviewed By: VincentWu Differential Revision: https://reviews.llvm.org/D153161	2023-06-22 09:22:58 -07:00
Paul Kirth	3ea8f25265	[RISCV] Strengthen atomic ordering for sequentially consistent stores This is a similar change to one proposed for GCC: https://inbox.sourceware.org/gcc-patches/20230414170942.1695672-1-patrick@rivosinc.com/ The changes in this patch are based on the proposal by Hans Boehm to more closely match the intended semantics for sequentially consistent stores and to allow some platforms to avoid an ABI break when switching to more performant atomic instructions. Platforms that have already compiled code using the existing mappings will also have more time to gradually replace that code in preparation of the switch. Further details can be found in the psABI proposal: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/378. This patch implements a mapping that is stronger than the one outlined in table A.6 of the RISC-V unprivileged spec to be future compatible with table A.7 of the same document. The related discussion can be found at https://lists.riscv.org/g/tech-unprivileged/topic/risc_v_memory_model_topics/92916241 The major change to RISC-V code generation is that we will now emit a trailing fence for sequentially consistent stores. The new code sequence should have the following form: ``` fence rw,w; s{b|h|w|d}; fence rw,rw; ``` Other changes and optimizations like using amoswap will be handled separately. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D149486	2023-06-22 15:42:17 +00:00
Nikita Popov	81ec494c36	[SDAGBuilder] Handle multi-part arguments in argument copy elision (PR63430) When eliding an argument copy, we need to update the chain to ensure the argument reads are performed before later writes. However, the code doing this only handled this for the first part of the argument. If the argument had multiple parts, the chains of the later parts were dropped. Make sure we preserve all chains. Fixes https://github.com/llvm/llvm-project/issues/63430.	2023-06-22 17:04:56 +02:00
Jay Foad	c85923190f	[AMDGPU] Regenerate some checks	2023-06-22 13:28:30 +01:00
David Green	68a09c9290	[AArch64] Remove G_VECREDUCE_FADD from selectReduction I believe that for fp reductions we can use the imported tablegen patterns for selection, as opposed to going via selectReduction. Integer reductions are more difficult, as the return types in selection DAG will be promoted to i32. Differential Revision: https://reviews.llvm.org/D153244	2023-06-22 12:46:54 +01:00
Pravin Jagtap	597fb7fb46	[AMDGPU] Switch to the new cl option amdgpu-atomic-optimizer-strategy. The atomic optimizer is turned on by default through D152649. This patch removes the usage of the old command line option amdgpu-atomic-optimizations and transfers the responsibility to `amdgpu-atomic-optimizer-strategy`. We can safely remove the old option once LLPC removes all of its uses. Reviewed By: foad, arsenm, #amdgpu, cdevadas Differential Revision: https://reviews.llvm.org/D153007	2023-06-22 07:06:42 -04:00
Matt Arsenault	18b93562cf	DAG: Expand legalization of is.fpclass to fcmp for DAZ Try to use a compare with 0 if DAZ is assumed. FPClassTest really needs to be marked as a bitmask enum, but the API for that is currently broken.	2023-06-22 06:18:02 -04:00
Simon Pilgrim	411deb97cf	[DAG] ScalarizeVectorResult - add ISD::MULHS/ISD::MULHU handling Fixes #63439	2023-06-22 11:09:55 +01:00
Ivan Kosarev	e67288fa5a	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 14. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152907	2023-06-22 10:52:16 +01:00
Ivan Kosarev	3bf0041dd9	[AMDGPU][GFX11] Add test coverage for 16-bit conversions, part 13. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D152906	2023-06-22 10:42:59 +01:00
David Green	31c8901f5c	[AArch64] Mark vecreduce_fminimum and vecreduce_fmaximum as legal This adds some simple lowering of vecreduce_fminimum and vecreduce_fmaximum to fminv/fmaxv instructions, in the same way that vecreduce_fmax / vecreduce_fmin is lowered to fminnmv / fmaxnmv. Differential Revision: https://reviews.llvm.org/D153246	2023-06-22 09:52:02 +01:00
Matt Arsenault	92ee60b66f	AMDGPU: Drop and upgrade llvm.amdgcn.atomic.inc/dec to atomicrmw	2023-06-21 21:20:26 -04:00
David Green	400b3c47c2	[ARM] Repair check lines in sub-cmp-peephole.ll test. NFC Commit ec77747fbdca901e0fded58f940dae62e0f6b726 regenerated the check lines without being very careful about which lines were updated. This attempts to fix them to make sure the V7 and V8 lines are emitted as needed.	2023-06-21 22:47:30 +01:00
Guozhi Wei	1bcb6a3da2	[MBP] Enable duplicating return block to remove jump to return Sometimes LLVM generates a branch to a return instruction, as in PR63227. This is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds we avoid duplicating a BB into another already placed BB to prevent destroying the computed layout. But if the successor BB is a return block, duplicating it will only reduce taken branches without hurting any other branches. Differential Revision: https://reviews.llvm.org/D153093	2023-06-21 18:54:31 +00:00
Tim Besard	1ee4d880e8	NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG. PTX does not have a notion of `unreachable`, which results in emitted basic blocks having an edge to the next block: ``` block1: call @does_not_return(); // unreachable block2: // ptxas will create a CFG edge from block1 to block2 ``` This may result in significant changes to the control flow graph, e.g., when LLVM moves unreachable blocks to the end of the function. That's a problem in the context of divergent control flow, as `ptxas` uses the CFG to determine divergent regions, while some instructions may not be executed divergently. For example, `bar.sync` is not allowed to be executed divergently on Pascal or earlier. If we start with the following: ``` entry: // start of divergent region @%p0 bra cont; @%p1 bra unlikely; ... bra.uni cont; unlikely: ... // unreachable cont: // end of divergent region bar.sync 0; bra.uni exit; exit: ret; ``` it is transformed by the branch-folder and block-placement passes to: ``` entry: // start of divergent region @%p0 bra cont; @%p1 bra unlikely; ... bra.uni cont; cont: bar.sync 0; bra.uni exit; unlikely: ... // unreachable exit: // end of divergent region ret; ``` After moving the `unlikely` block to the end of the function, it has an edge to the `exit` block, which widens the divergent region and makes the `bar.sync` instruction happen divergently. That causes wrong computations, as we've been running into for years with Julia code (which emits a lot of `trap` + `unreachable` code all over the place). To work around this, add an `exit` instruction before every `unreachable`, as `ptxas` understands that exit terminates the CFG. Note that `trap` is not equivalent, and only future versions of `ptxas` will model it like `exit`. Another alternative would be to emit a branch to the block itself, but emitting `exit` seems like a cleaner solution to represent `unreachable` to me.
Also note that this may not be sufficient, as it's possible that the block with unreachable control flow is branched to from different divergent regions, e.g. after block merging, in which case it may still be the case that `ptxas` could reconstruct a CFG where divergent regions are merged (I haven't confirmed this, but also haven't encountered this pattern in the wild yet): ``` entry: // start of divergent region 1 @%p0 bra cont1; @%p1 bra unlikely; bra.uni cont1; cont1: // intended end of divergent region 1 bar.sync 0; // start of divergent region 2 @%p2 bra cont2; @%p3 bra unlikely; bra.uni cont2; cont2: // intended end of divergent region 2 bra.uni exit; unlikely: ... exit; exit: // possible end of merged divergent region? ``` I originally tried to avoid the above by cloning paths towards `unreachable` and splitting the outgoing edges, but that quickly became too complicated. I propose we go with the simple solution first, also because modern GPUs with more flexible hardware thread schedulers don't even suffer from this issue. Finally, although I expect this to fix most of https://bugs.llvm.org/show_bug.cgi?id=27738, I do still encounter miscompilations with Julia's unreachable-heavy code when targeting these older GPUs using an older `ptxas` version (specifically, from CUDA 11.4 or below). This is likely due to related bugs in `ptxas` which have been fixed since, as I have filed several reproducers with NVIDIA over the past couple of years. I'm not inclined to look into fixing those issues over here, and will instead be recommending our users to upgrade CUDA to 11.5+ when using these GPUs. Also see: - https://github.com/JuliaGPU/CUDAnative.jl/issues/4 - https://github.com/JuliaGPU/CUDA.jl/issues/1746 - https://discourse.llvm.org/t/llvm-reordering-blocks-breaks-ptxas-divergence-analysis/71126 Reviewed By: jdoerfert, tra Differential Revision: https://reviews.llvm.org/D152789	2023-06-21 11:40:31 -07:00
Luke Lau	485d25007a	[RISCV] Custom lower fixed vector undef to scalable undef This avoids undefs from being expanded to a build vector of zeroes. As noted by @craig.topper in D153399 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153411	2023-06-21 17:14:57 +01:00
Craig Topper	aae155c50b	[RISCV] Use a build_vector instead of a chain of insert_vector_elts for vXi1 build_vector lowering. A build_vector is the canonical representation rather than multiple insert_vector_elts. Unfortunately, this regresses quite a few tests now primarily due to not having a vmv.s.x special case, but I hope we can improve this with future patches. Stress testing in our downstream found an infinite loop in DAG combine. This patch breaks the infinite loop. The insert_vector_element chain starts with a fixed vector undef. Fixed vector undef is currently expanded to a build_vector of 0s which gets lowered to a vmv.v.i. The insert chain overwrites all elements so SimplifyDemandedVectorElts turns the vmv.v.i back into undef and the cycle repeats. We probably should custom lower fixed vector undef to scalable vector undef. I think that would also fix the infinite loop, but I didn't test that. Reviewed By: luke Differential Revision: https://reviews.llvm.org/D153399	2023-06-21 08:57:46 -07:00
Craig Topper	ddf3f1b3b2	[RISCV] Stop isInterleaveShuffle from producing illegal extract_subvectors. The definition for ISD::EXTRACT_SUBVECTOR says the index must be aligned to the known minimum elements of the extracted type. We mostly got away with this but it turns out there are places that depend on this. For example, this code in getNode for ISD::EXTRACT_SUBVECTOR ``` // EXTRACT_SUBVECTOR of CONCAT_VECTOR can be simplified if the pieces of // the concat have the same type as the extract. if (N1.getOpcode() == ISD::CONCAT_VECTORS && N1.getNumOperands() > 0 && VT == N1.getOperand(0).getValueType()) { unsigned Factor = VT.getVectorMinNumElements(); return N1.getOperand(N2C->getZExtValue() / Factor); } ``` This depends on N2C->getZExtValue() being evenly divisible by Factor. Reviewed By: luke Differential Revision: https://reviews.llvm.org/D153380	2023-06-21 08:52:28 -07:00
Nikita Popov	565c7525b9	[X86] Add test for PR63430 (NFC)	2023-06-21 17:12:57 +02:00
Matt Arsenault	6e8911e4c6	RISCV: Update test	2023-06-21 11:08:57 -04:00
Matt Arsenault	bb8649691d	X86: Fix asserts only test This test should really check the MIR result rather than rely on the debug output.	2023-06-21 10:40:48 -04:00
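
Note on 3f77724de7 ([TargetLowering] Better code generation for ISD::SADDSAT/SSUBSAT when operand sign is known): the message states that when the sign of an operand is known, the saturating value can be determined without computing it from the sign bits. The following is a minimal Python sketch of that idea, not the code from D153575. The 32-bit width and the helper names (`wrap`, `saddsat_generic`, `saddsat_rhs_nonneg`) are assumptions made for illustration, and the "generic" path is only a simplified model of a sign-based selection.

```python
BITS = 32  # assumed width, chosen for illustration
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1


def wrap(x):
    """Wrap an arbitrary Python int to a two's-complement BITS-wide value."""
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x >> (BITS - 1) else x


def saddsat_generic(a, b):
    """Saturating add where the bound is selected from the wrapped sum's sign."""
    s = wrap(a + b)
    overflow = (a + b) != s
    # On overflow the wrapped sum has the "wrong" sign, so a negative
    # wrapped sum means the true result was too large, and vice versa.
    sat = INT_MAX if s < 0 else INT_MIN
    return sat if overflow else s


def saddsat_rhs_nonneg(a, b):
    """Saturating add when b is known to be non-negative.

    Adding a non-negative value can only overflow upwards, so the
    saturation bound is the constant INT_MAX and no sign test is needed.
    """
    assert b >= 0
    s = wrap(a + b)
    overflow = (a + b) != s
    return INT_MAX if overflow else s


if __name__ == "__main__":
    for a in (INT_MIN, -5, 0, 7, INT_MAX - 1, INT_MAX):
        for b in (0, 1, 2, INT_MAX):
            assert saddsat_generic(a, b) == saddsat_rhs_nonneg(a, b)
    print("known-sign variant matches the generic model")
```

The same reasoning applies in mirror image when an operand is known to be negative: overflow can then only go downwards, so the bound is the constant `INT_MIN`.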


52796 Commits