llvm-project

Author	SHA1	Message	Date
Fangrui Song	157ba33007	[CSKY,test] Update switch.ll	2023-10-31 15:47:05 -07:00
Fangrui Song	5888dee7d0	[ARM,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#70014 ) The ELF code from https://reviews.llvm.org/D112811 emits LDRLIT_ga_pcrel when `TM.isPositionIndependent()` but uses a different condition `Subtarget.isGVIndirectSymbol(GV)` (aka dso_preemptable on ELF targets). This would cause incorrect access for dso_preemptable `__stack_chk_guard` with the static relocation model. Regarding whether `__stack_chk_guard` gets the dso_local specifier, https://reviews.llvm.org/D150841 switched to `M.getDirectAccessExternalData()` (implied by "PIC Level") instead of `TM.getRelocationModel() == Reloc::Static`. The result is that when non-zero "PIC Level" is used with static relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie linking), `__stack_chk_guard` accesses are incorrect. ``` ldr r0, .LCPI0_0 ldr r0, [r0] ldr r0, [r0] // incorrectly dereferences __stack_chk_guard ... .LCPI0_0: .long __stack_chk_guard ``` To fix this, for dso_preemptable `__stack_chk_guard`, emit a GOT PIC code sequence like for -fpic using `LDRLIT_ga_pcrel`: ``` ldr r0, .LCPI0_0 .LPC0_0: add r0, pc, r0 ldr r0, [r0] ldr r0, [r0] ... LCPI0_0: .Ltmp0: .long __stack_chk_guard(GOT_PREL)-((.LPC0_0+8)-.Ltmp0) ``` Technically, `LDRLIT_ga_abs` with `R_ARM_GOT_ABS` could be used, but `R_ARM_GOT_ABS` does not have GNU or integrated assembler support. (Note, `.LCPI0_0: .long __stack_chk_guard@GOT` produces an `R_ARM_GOT_BREL`, which is not desired). 
This patch fixes #6499 while not changing behavior for the following configurations: ``` run arm.linux.nopic --target=arm-linux-gnueabi -fno-pic run arm.linux.pie --target=arm-linux-gnueabi -fpie run arm.linux.pic --target=arm-linux-gnueabi -fpic run armv6.darwin.nopic --target=armv6-apple-darwin -fno-pic run armv6.darwin.dynamicnopic --target=armv6-apple-darwin -mdynamic-no-pic run armv6.darwin.pic --target=armv6-apple-darwin -fpic run armv7.darwin.nopic --target=armv7-apple-darwin -mcpu=cortex-a8 -fno-pic run armv7.darwin.dynamicnopic --target=armv7-apple-darwin -mcpu=cortex-a8 -mdynamic-no-pic run armv7.darwin.pic --target=armv7-apple-darwin -mcpu=cortex-a8 -fpic run arm64.darwin.pic --target=arm64-apple-darwin ```	2023-10-31 15:37:26 -07:00
Fangrui Song	6ae7b735db	[ARM][test] Improve stack-protector tests llvm/test/LTO/ARM/ssp-static-reloc.ll is more about using the static relocation model with "PIC Level" and unrelated to the LTO infrastructure. Move the test. Update stack_guard_remat.ll to clearly test "PIC Level" with the relevant relocation models.	2023-10-31 15:30:08 -07:00
Ramkumar Ramachandra	7a76038835	CodeGen/RISCV: increase test coverage of lrint, llrint (#70826 ) To follow up on 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering), increase the test coverage to test the codegen of the i32-variant of lrint on RV64, and llrint on RV32.	2023-10-31 19:16:39 +00:00
Sander de Smalen	73498d2608	[AArch64] Also implement PNR -> PNR copies. (#70682 ) Previously we only implemented PNR -> PPR and PPR -> PNR copies.	2023-10-31 16:52:42 +00:00
Simon Pilgrim	51d4ad6701	[AMDGPU] amdgpu-codegenprepare-idiv.ll - regenerate checks. NFC. Reduces diffs in a future patch	2023-10-31 13:24:27 +00:00
Matt Arsenault	e62d25e37d	RegisterCoalescer: Relax assert for super register def rematerialization (#69088 )	2023-10-31 21:52:36 +09:00
Jay Foad	a6dabed348	[AMDGPU] Fix nondeterminism in SIFixSGPRCopies (#70644 ) There are a couple of loops that iterate over V2SCopies. The iteration order needs to be deterministic, otherwise we can call moveToVALU in different orders, which causes temporary vregs to be allocated in different orders, which can affect register allocation heuristics.	2023-10-31 11:47:42 +00:00
Kerry McLaughlin	3b786f2c76	[AArch64] Add intrinsic to count trailing zero elements This patch introduces an experimental intrinsic for counting the trailing zero elements in a vector. The intrinsic has generic expansion in SelectionDAGBuilder, and for AArch64 there is a pattern which matches to brkb & cntp instructions where SVE is enabled. The intrinsic has a second operand, is_zero_poison, similar to the existing cttz intrinsic. These changes have been split out from D158291.	2023-10-31 10:48:08 +00:00
Jessica Del	b8d3ccdff1	[AMDGPU] - Add s_bitreplicate intrinsic (#69209 ) Add intrinsic for s_bitreplicate. Lower to S_BITREPLICATE_B64_B32 machine instruction in both GISel and Selection DAG. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-10-31 11:26:45 +01:00
Ilya Leoshkevich	03934e70ef	[SystemZ] Enable AtomicExpand pass (#70398 ) The upcoming OpenMP support for SystemZ requires handling of IR insns like `atomicrmw fadd`. Normally atomic float operations are expanded by Clang and such insns do not occur, but OpenMP generates them directly. Other architectures handle this using the AtomicExpand pass, which SystemZ did not need so far. Enable it. Currently AtomicExpand treats atomic load and stores of floats pessimistically: it casts them to integers, which SystemZ does not need, since the floating point load and store instructions are already atomic. However, the way Clang currently expands them is pessimistic as well, so this change does not make things worse. Optimizing operations on atomic floats can be a separate change in the future. This change does not create any differences in the Linux kernel build.	2023-10-31 09:51:06 +01:00
Luke Lau	03c8fbf092	[RISCV] Add _RM pseudos to pseudos table (#70693 )	2023-10-31 08:41:56 +00:00
Fangrui Song	5908559c10	[X86] Don't set SHF_X86_64_LARGE for variables with explicit section name of a well-known small data section prefix (#70748 ) Commit f3ea73133f91c1c23596d45680c8f2269c1dd289 allows SHF_X86_64_LARGE for all global variables with an explicit section. For the following variables, their data sections will be annotated as SHF_X86_64_LARGE. ``` const char relro[512] __attribute__((section(".rodata"))) = "a"; const char *const relro __attribute__((section(".data.rel.ro"))) = "a"; char data[512] __attribute__((section(".data"))) = "a"; ``` The typical linker requirement is that we do not create more than one output section with the same name, and the only output section should have the bitwise OR value of all input section flags. Therefore, the output .data section will have the SHF_X86_64_LARGE flag and be moved away from the regular sections. This is undesired but benign. However, .data.rel.ro having the SHF_X86_64_LARGE flag is problematic because dynamic loaders do not support more than one PT_GNU_RELRO program header, and LLD produces the error `error: section: .jcr is not contiguous with other relro sections`. I believe the most appropriate solution is to disallow SHF_X86_64_LARGE on variables with an explicit section of certain prefixes ( .bss/.data/.bss) and allow others (e.g. metadata sections for various instrumentation). Fortunately, global variables with an explicit .bss/.data/.bss section are rare, so they should not cause excessive relocation overflow pressure.	2023-10-30 17:03:04 -07:00
Philip Reames	784a2cd561	[RISCV] Rewrite RISCVCodeGenPrepare using zext nneg [nfc-ish] (#70739 ) This stacks on #70725. Once we have lowering for zext nneg, we can rewrite all of the existing RISCVCodeGenPrepare logic in terms of zext nneg instead of sext. The change isn't NFC from the perspective of the individual pass, but should be from the perspective of codegen as a whole. As noted in the TODO, one piece can be moved to instcombine, but I'll leave that to a separate commit.	2023-10-30 16:35:30 -07:00
Philip Reames	83c560b3bf	[SDAG] Prefer forming sign_extend for zext nneg per target preference (#70725 ) Builds on #67982 which recently introduced the nneg flag on a zext instruction. Note that this change is the first point where the flag is being used for an optimization, and thus may expose latent miscompiles. We've recently taught both CVP and InstCombine to infer the flag when forming zext, but nothing else is using the flag just yet.	2023-10-30 15:29:57 -07:00
Justin Bogner	428af867d8	[DirectX] Update test after `opt` learned to infer datalayout (#70726 ) Since e39f6c1844fa "[opt] Infer DataLayout from triple if not specified", this test (correctly) emits a load of an i64 with 8 byte alignment, rather than with 4 byte alignment.	2023-10-30 14:04:15 -07:00
Philip Reames	c92c86f66a	[RISCV] Add test coverage for "zext nneg" [nfc] This IR feature was recently added in #67982. An upcoming change will improve our lowering on these examples.	2023-10-30 13:35:58 -07:00
Philip Reames	cc6f9cf5a2	[RISCV] Add zbb coverage to test file [nfc]	2023-10-30 13:18:35 -07:00
Michael Maitland	04dd2ac03a	[RISCV][GlobalISel] Select G_GLOBAL_VALUE (#70091 ) G_GLOBAL_VALUE should be lowered into an absolute address if `-codemodel=small` is used or into a PC-relative address if `-codemodel=medium` is used. PR #68380 tried to create special instructions to do this, but I don't see why we need to do that.	2023-10-30 15:46:36 -04:00
Igor Kirillov	849f963e31	[CodeGen] Improve ExpandMemCmp for more efficient non-register aligned sizes handling (#70469 ) * Enhanced the logic of ExpandMemCmp pass to merge contiguous subsequences in LoadSequence, based on sizes allowed in `AllowedTailExpansions`. * This enhancement seeks to minimize the number of basic blocks and produce optimized code when using memcmp with non-register aligned sizes. * Enable this feature for AArch64 with memcmp sizes modulo 8 equal to 3, 5, and 6. Reapplication of #69942 after fixing a bug	2023-10-30 18:40:48 +00:00
Antonio Frighetto	9fe5700611	[AArch64] Add support for v8.4a `ldapur`/`stlur` AArch64 backend now features v8.4a atomic Load-Acquire RCpc and Store-Release register unscaled support.	2023-10-30 19:27:48 +01:00
Antonio Frighetto	a8799719f7	[AArch64] Introduce tests for PR67879 (NFC)	2023-10-30 19:27:48 +01:00
Nikita Popov	e46dd6fbc0	Revert "[InstCombine] Simplify and/or of icmp eq with op replacement (#70335 )" This reverts commit 1770a2e325192f1665018e21200596da1904a330. Stage 2 llvm-tblgen crashes when generating X86GenAsmWriter.inc and other files.	2023-10-30 18:33:03 +01:00
Craig Topper	9a7c26a399	[GISel] Restrict G_BSWAP to multiples of 16 bits. (#70245 ) This is consistent with the IR verifier and SelectionDAG's getNode. Update tests accordingly. I tried to keep some coverage of non-pow2 when possible. X86 didn't like a G_UNMERGE_VALUES from s48 to 3 s16 that got created when I tried s48.	2023-10-30 10:27:57 -07:00
Craig Topper	77e88db6b7	[RISCV][GISel] Add missing curly brace to test. NFC	2023-10-30 10:12:56 -07:00
Craig Topper	284d136c4a	[RISCV] Teach copyPhysReg to allow copies between GPR<->FPR32/FPR64 (#70525 ) This is needed because GISel emits copies instead of bitcasts like SelectionDAG.	2023-10-30 09:58:51 -07:00
Jay Foad	101008be83	[AMDGPU] CodeGen for 64-bit buffer atomic cmpswap intrinsics (#70475 ) Implement codegen for: llvm.amdgcn.raw.buffer.atomic.cmpswap.i64 llvm.amdgcn.raw.ptr.buffer.atomic.cmpswap.i64 llvm.amdgcn.struct.buffer.atomic.cmpswap.i64 llvm.amdgcn.struct.ptr.buffer.atomic.cmpswap.i64	2023-10-30 16:44:22 +00:00
Jessica Del	849297c97d	[AMDGPU][wmma] - Add tied wmma intrinsic (#69903 ) These new intrinsics, `amdgcn_wmma_tied_f16_16x16x16_f16` and `amdgcn_wmma_tied_bf16_16x16x16_bf16`, explicitly tie the destination accumulator matrix to the input accumulator matrix. The `wmma_f16` and `wmma_bf16` intrinsics only write to 16 bits of the 32-bit destination VGPRs. Which half is determined via the `op_sel` argument. The other half of the destination registers remains unchanged. In some cases however, we expect the destination to copy the other halves from the input accumulator. For instance, when packing two separate accumulator matrices into one. In that case, the two matrices are tied into the same registers, but separate halves. Then it is important to copy the other matrix values to the new destination.	2023-10-30 16:23:49 +01:00
Luke Lau	72e6c1c70d	[RISCV] Begin moving post-isel vector peepholes to a MF pass (#70342 ) We currently have three postprocess peephole optimisations for vector pseudos: 1) Masked pseudo with all ones mask -> unmasked pseudo 2) Merge vmerge pseudo into operand pseudo's mask 3) vmerge pseudo with all ones mask -> vmv.v.v pseudo This patch aims to move these peepholes out of SelectionDAG and into a separate RISCVFoldMasks MachineFunction pass. There are a few motivations for doing this: * The current SelectionDAG implementation operates on MachineSDNodes, which are essentially MachineInstrs but require a bunch of logic to reason about chain and glue operands. The RISCVII::hasOp helper functions also don't exactly line up with the SDNode operands. Mutating these pseudos and their operands in place becomes a good bit easier at the MachineInstr level. For example, we would no longer need to check for cycles in the DAG during performCombineVMergeAndVOps. Although it's further down the line, moving this code out of SelectionDAG allows it to be reused by GlobalISel later on. * In performCombineVMergeAndVOps, it may be possible to commute the operands to enable folding in more cases (see test/CodeGen/RISCV/rvv/vmadd-vp.ll). There is existing machinery to commute operands in TII::commuteInstruction, but it's implemented on MachineInstrs. The pass runs straight after ISel, before any of the other machine SSA optimization passes run. This is so that dead-mi-elimination can mop up any vmsets that are no longer used (but if preferred we could try and erase them from inside RISCVFoldMasks itself). This also means that these peepholes are no longer run at codegen -O0, so this patch isn't strictly NFC. Only the performVMergeToVMv peephole is refactored in this patch, the remaining two would be implemented later. And as noted by @preames, it should be possible to move doPeepholeSExtW out of SelectionDAG as well.	2023-10-30 15:17:00 +00:00
Stanislav Mekhanoshin	fe8335babb	[AMDGPU] Select 64-bit imm moves if can be encoded as 32 bit operand (#70395 ) This allows folding of 64-bit operands if they fit into 32 bits. Fixes https://github.com/llvm/llvm-project/issues/67781	2023-10-30 08:12:28 -07:00
Stanislav Mekhanoshin	ee6d62db99	[AMDGPU] Prevent folding of the negative i32 literals as i64 (#70274 ) We can use sign extended 64-bit literals, but only for signed operands. At the moment we do not know if an operand is signed. Such an operand will be encoded as its low 32 bits and then either correctly sign extended or incorrectly zero extended by HW.	2023-10-30 08:07:43 -07:00
Nikita Popov	292f34b0d3	[AArch64][GlobalISel] Fix incorrect ABI when tail call not supported (#70215 ) The check for whether a tail call is supported calls determineAssignments(), which may modify argument flags. As such, even though the check fails and a non-tail call will be emitted, it will now have a different (incorrect) ABI. Fix this by operating on a separate copy of the arguments. Fixes https://github.com/llvm/llvm-project/issues/70207.	2023-10-30 15:01:01 +01:00
Simon Pilgrim	432649700d	[X86] vec_insert-5.ll - ensure we build with +mmx as we reference x86_mmx types Enabling SSE doesn't guarantee MMX is enabled on all targets Avoids a crash in D152928 (although we still currently see a regression with that patch applied resulting in MMX codegen)	2023-10-30 12:43:18 +00:00
Nikita Popov	1770a2e325	[InstCombine] Simplify and/or of icmp eq with op replacement (#70335 ) and/or in logical (select) form benefit from generic simplifications via simplifyWithOpReplaced(). However, the corresponding fold for plain and/or currently does not exist. Similar to selects, there are two general cases for this fold (illustrated with `and`, but there are `or` conjugates). The basic case is something like `(a == b) & c`, where the replacement of a with b or b with a inside c allows it to fold to true or false. Then the whole operation will fold to either false or `a == b`. The second case is something like `(a != b) & c`, where the replacement inside c allows it to fold to false. In that case, the operand can be replaced with c, because in the case where a == b (and thus the icmp is false), c itself will already be false. As the test diffs show, this catches quite a lot of patterns in existing test coverage. This also obsoletes quite a few existing special-case and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst), but I haven't removed anything as part of this patch in the interest of risk mitigation. Fixes #69050. Fixes #69091.	2023-10-30 10:05:39 +01:00
Cullen Rhodes	54732a3e0b	[AArch64] Use TargetRegisterClass::hasSubClassEq in tryToFindRegisterToRename When renaming store operands for pairing in the load/store optimizer it tries to find an available register from the minimal physical register class of the original register. For each register it compares the equality of minimal physical register class of all sub/super registers with the minimal physical register class of the original register. Simply checking for register class equality can break once additional register classes are added, as was the case when adding: def foo : RegisterClass<"AArch64", [i32], 32, (sequence "W%u", 12, 15)> which broke: CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir CodeGen/AArch64/stp-opt-with-renaming.mir Since the introduction of the register class above, the rename register in test1 of the reserved regs test changed from x12 to x18. The reason for this is the minimal physical register class of x12 (as well as x13-x15) and its sub/super registers no longer matches that of x9 (GPR64noip_and_tcGPR64). Rather than selecting a matching register based on a comparison of the minimal physical register classes of the original and rename registers, this patch selects based on `MachineInstr::getRegClassConstraint` for the original register. It's worth mentioning the parameter passing registers (r0-r7) could now be used as rename registers since the GPR32arg and GPR64arg register classes are subclasses of the minimal physical register class for x8 for example. I'm not entirely sure if we want to exclude those registers, if so maybe we could explicitly exclude those register classes. Reviewed By: efriedma, paulwalker-arm Differential Revision: https://reviews.llvm.org/D88663	2023-10-30 08:47:39 +00:00
David Green	072a7edec3	[AArch64] Add additional concat trunc -> UZP1 patterns These extra patterns come from the lowering of fptoi, where an extra assertzext is present between some of the vectors.	2023-10-29 22:58:57 +00:00
Simon Pilgrim	d96529af3c	[DAG] Attempt shl narrowing in SimplifyDemandedBits (REAPPLIED) If a shl node leaves the upper half bits zero / undemanded, then see if we can profitably perform this with a half-width shl and a free trunc/zext. Followup to D146121 Reapplied - moved after the ShrinkDemandedOp call; reuse the existing KnownBits result; ensure that we only attempt this if all the upper bits are demanded; 547dc461225ba should address the remaining regressions that were noticed in the previous commit. Differential Revision: https://reviews.llvm.org/D155472	2023-10-29 15:38:46 +00:00
Phoebe Wang	b5281afe42	[X86] Avoid returning the same shuffle operation for broadcast (#70592 ) This is to fix a crash since aab8b2eb080d, which generates a new pattern ``` t35: v8i32 = xor t11, t14 t36: v8i32 = vector_shuffle<0,1,0,1,0,1,0,1> t35, undef:v8i32 ``` The pattern exposed a bug introduced since f885c08034, which breaks element widen but doesn't handle the broadcast case. The patch just solved the crash issue. I observed a performance regression caused by the above patches in the test, which may need further investigation.	2023-10-29 21:55:00 +08:00
Craig Topper	133e50db31	[RISCV][GISel] Directly emit X0 from getICMPOperandsForBranch instead of using buildConstant+selectConstant. This is simpler to implement and matches what SelectionDAG emits. Also allows us to make getICMPOperandsForBranch a static function.	2023-10-28 23:50:46 -07:00
Craig Topper	4ac304242b	[RISCV][GISel] Support G_FPEXT/G_FPTRUNC for F and D extension.	2023-10-28 16:22:17 -07:00
Craig Topper	49ae2efb80	[RISCV][GISel] Support G_FMA/NEG/ABS/SQRT/MAXNUM/MINNUM for F and D extension.	2023-10-28 15:53:15 -07:00
XChy	fc6bdb8549	[SimplifyCFG] Reland transform for redirecting phis between unmergeable BB and SuccBB (#68473 ) Reland #67275 with #68953 resolved.	2023-10-28 17:10:20 +08:00
Rahman Lavaee	f70e39ec17	[BasicBlockSections] Apply path cloning with -basic-block-sections. (#68860 ) `28b9126879` introduced the path cloning format in the basic-block-sections profile. This PR validates and applies path clonings. A path cloning is valid if all of these conditions hold: 1. All bb ids in the path are mapped to existing blocks. 2. Each two consecutive bb ids in the path have a successor relationship in the CFG. 3. The path does not include a block with indirect branches, except possibly as the last block. Applying a path cloning involves cloning all blocks in the path (except the first one) and setting up their branches. Once all clonings are applied, the cluster information is used to guide block layout in the modified function.	2023-10-27 21:49:39 -07:00
Changpeng Fang	8ceb72ffe5	[AMDGPU] make v32i16/v32f16 legal (#70484 ) Some upcoming intrinsics will be using these new types	2023-10-27 15:28:31 -07:00
Mircea Trofin	840bf2a0bb	[mlgo][regalloc] Fix tests post 9a091de7	2023-10-27 14:03:25 -07:00
Stanislav Mekhanoshin	d136432038	[AMDGPU] Remove unneeded implicit-def from shrink-i32-kimm.mir. NFC. (#70489 )	2023-10-27 13:32:48 -07:00
Guozhi Wei	9a091de7fe	[X86, Peephole] Enable FoldImmediate for X86 Enable FoldImmediate for X86 by implementing X86InstrInfo::FoldImmediate. Also enhanced peephole by deleting identical instructions after FoldImmediate. Differential Revision: https://reviews.llvm.org/D151848	2023-10-27 19:47:23 +00:00
Michael Maitland	4b581125ed	[RISCV][GISel] Fix failing test case for G_BSWAP The test was not updated correctly in #70226. This patch resolves that problem.	2023-10-27 10:22:09 -07:00
Michael Maitland	4c43c1eeef	[RISCV][GISEL] Add legalizer for G_BSWAP (#70226 ) Lower G_BSWAP into simpler instructions that can be selected in instruction selection. A future patch can handle when there is Zbb.	2023-10-27 13:03:25 -04:00
Paul Walker	7c90be2857	[SVE] Fix incorrect offset calculation when rewriting an instruction's frame index. (#70315 ) When partially packing an offset into an SVE load/store instruction we are incorrectly calculating the remainder.	2023-10-27 16:53:30 +01:00
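
The three validity conditions listed in the message of f70e39ec17 ([BasicBlockSections] Apply path cloning) can be sketched as a small check. This is an illustrative Python sketch only, not LLVM's implementation: the CFG model (a dict from block id to successor ids), the `has_indirect_branch` set, and the name `is_valid_path_cloning` are all assumptions made for the example. Rejecting an empty path is also an assumption; the commit message does not cover that case.

```python
from typing import Dict, List, Set


def is_valid_path_cloning(
    path: List[int],
    successors: Dict[int, List[int]],
    has_indirect_branch: Set[int],
) -> bool:
    """Check the three path-cloning conditions on a toy CFG (hypothetical model)."""
    # Assumption: an empty path is treated as invalid.
    if not path:
        return False
    # 1. Every bb id in the path must map to an existing block.
    if any(bb not in successors for bb in path):
        return False
    # 2. Each pair of consecutive bb ids must be an edge in the CFG.
    for pred, succ in zip(path, path[1:]):
        if succ not in successors[pred]:
            return False
    # 3. No block except possibly the last may have an indirect branch.
    return all(bb not in has_indirect_branch for bb in path[:-1])


# Toy CFG: 0 -> {1, 2}, 1 -> {3}, 2 -> {3}, 3 -> {}
cfg = {0: [1, 2], 1: [3], 2: [3], 3: []}

# Valid: all blocks exist, 0->1 and 1->3 are edges, and the only
# indirect branch sits in the last block.
print(is_valid_path_cloning([0, 1, 3], cfg, has_indirect_branch={3}))

# Invalid: 0 -> 3 is not an edge in the CFG.
print(is_valid_path_cloning([0, 3], cfg, has_indirect_branch=set()))

# Invalid: block 1 has an indirect branch but is not the last block.
print(is_valid_path_cloning([0, 1, 3], cfg, has_indirect_branch={1}))
```

The checks run in the order the commit message lists them, so a failure on an unknown block id is reported before any edge lookup is attempted.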

52796 Commits