llvm-project

Author	SHA1	Message	Date
Craig Topper	6e4be7e12a	[RISCV] Split double out of compress-float.ll. Add Zcf and Zcd RUN lines. Make Zcf/Zcd depend on Zca. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D153826	2023-06-27 09:44:51 -07:00
Amy Kwan	11b71ade51	[PowerPC][TLS] Add additional TLS X-Form loads/store instructions This patch is a follow up to D43315, and adds the following new load/store TLS specific instructions for integer and floating point scalar types: ``` LHAXTLS LWAXTLS LHAXTLS_32 LWAXTLS_32 LFSXTLS LFDXTLS STFSXTLS STFDXTLS ``` These instructions can be used to optimize TLS sequences when D-Form loads/stores follow an ADD_TLS instruction. Duplicate versions of these instructions are also added within an isAsmParserOnly=1 block (similar to D47382) to allow llvm-mc to assemble these instructions. Differential Revision: https://reviews.llvm.org/D153645	2023-06-27 11:33:38 -05:00
Philip Reames	65de6b0b0e	[RISCV] Remove legacy TA/TU pseudo distinction for VID This is a follow on to D152740. The focus of this patch is on actually removing the old TA (unsuffixed) version. I realized we already had plumbing for combined TA/TU pseudos - used by some of the ternary instructions. As such, we can go ahead and fully remove the old TA, and rename the _TU variant to be unsuffixed. (The rename must happen in this patch for the table structure to work out as expected.) The scheduling difference comes from an omission in D152740. If we selected a _MASK variant - either from manual ISEL or intrinsics - we were going through doPeepholeMaskedRVV and still getting the TA variant. The use of the IsCombined flag in the MaskedPseudo table causes us to use the TU (now unsuffixed) variant instead. Differential Revision: https://reviews.llvm.org/D153155	2023-06-27 09:12:00 -07:00
Igor Kirillov	1fce8df53a	Fix the ComplexDeinterleaving bug when handling mixed reductions. Add a missing check that ensures that ComplexDeinterleaving for reduction is only analyzed for Real and Imaginary Instructions of the same type. Differential Revision: https://reviews.llvm.org/D153862	2023-06-27 14:40:49 +00:00
Ties Stuij	03db28edbb	[ARM] in ExpandTMOV32BitImm, CPSR register ops should be `Define`d The CPSR registers ops of the instructions constructed in ExpandTMOV32BitImm were marked as kill, instead of define. Best to use the pre-existing t1CondCodeOp fn to construct CPSRs. Reviewed By: simonwallis2 Differential Revision: https://reviews.llvm.org/D153763	2023-06-27 14:58:22 +01:00
Simon Pilgrim	7b77dd6afd	[X86] SimplifyDemandedBitsForTargetNode - add X86ISD::ANDNP handling Add X86ISD::ANDNP handling to targetShrinkDemandedConstant as well, which allows us to replace a lot of truncated masks with (rematerializable) allones values	2023-06-27 14:23:00 +01:00
Diana Picus	d98e44b343	[AMDGPU][DAGISel] Be more flexible about what calls are allowed Remove DAGISel checks on calling conventions. GlobalISel doesn't have these checks either and we prefer it that way (see D152794). Add a simple test like the one introduced in D117479 for GlobalISel. Differential Revision: https://reviews.llvm.org/D153535	2023-06-27 09:49:38 +02:00
Alex MacLean	17aa37dd30	[SelectionDAG] Add memory size for CSEMap ID calculation In NVPTX `ReplaceVectorLoad()`, i1 and i8 types are promoted to i16, followed by a truncate operation. Thus, v2i8 (or v2i1) and v2i16 will have the same VTList, which causes a collision in CSEMap. To differentiate the original VTList, let's add the size in generating an ID. Otherwise the compiler crashes in refineAlignment: `MMO->getSize() == getSize() && "Size mismatch!"` Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153712	2023-06-26 16:12:48 -07:00
Eduard Zingerman	6a6db74b77	[BPF] Propagate NoMerge attribute when lowering function calls `NoMerge` attribute on machine instructions prevents certain transformations from merging these instructions. One of such transformations is 'llvm/lib/CodeGen/BranchFolding.cpp'. This attribute should be copied from IR `call` instructions to machine level instructions. See `X86TargetLowering::LowerCall` as another example. Differential Revision: https://reviews.llvm.org/D152987	2023-06-27 01:15:45 +03:00
David Green	aaca8e2c34	[AArch64] Don't recreate nodes in tryCombineLongOpWithDup If we don't find a node with either operand through isEssentiallyExtractHighSubvector, there is little point recreating the node with the same operands. Returning SDValue better communicates that no changes were made. This fixes #63491 by not recreating uabd nodes with swapped operands. As noted in the ticket there are other fixes that might be useful to make too, but this should prevent the infinite combine.	2023-06-26 22:41:18 +01:00
Matthias Braun	02ba5b8c6b	Ignore load/store until stack address computation No longer conservatively assume a load/store accesses the stack when we can prove that we did not compute any stack-relative address up to this point in the program. We do this in a cheap not-quite-a-dataflow-analysis: Assume `NoStackAddressUsed` when all predecessors of a block already guarantee it. Process blocks in reverse post order to guarantee that except for loop headers we have processed all predecessors of a block before processing the block itself. For loops we accept the conservative answer as they are unlikely to be shrink-wrappable anyway. Differential Revision: https://reviews.llvm.org/D152213	2023-06-26 13:50:36 -07:00
Matthias Braun	759b217626	Switch tests to use update_llc_test_checks Switch and update some tests to use `update_llc_test_checks` to reduce clutter in upcoming change. Differential Revision: https://reviews.llvm.org/D152215	2023-06-26 13:50:36 -07:00
Philip Reames	237efe7eaa	[RISCV] Regen rvv/fixed-vectors-fmf.ll to avoid spurious test deltas	2023-06-26 12:53:33 -07:00
Craig Topper	4afa2ab7a5	[RISCV][SelectionDAGBuilder] Fix an implicit scalable TypeSize to fixed size conversion in getUniformBase. If the index needs to be scaled by a scalable size, just give up. Fixes #63459 Reviewed By: frasercrmck, RKSimon Differential Revision: https://reviews.llvm.org/D153601	2023-06-26 11:56:17 -07:00
Matt Arsenault	f2596b754c	SeparateConstOffsetFromGEP: Don't use SCEV This was only using the SCEV expressions as a map key, which we can do just as well with the value pointers. This also allows it to handle vectors.	2023-06-26 13:58:06 -04:00
Maurice Heumann	249bd9eab0	[ARM] Fix codegen of unaligned volatile load/store of i64 Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE. However, these instructions require the addresses to be aligned. Unaligned loads/stores therefore should be ignored by this handling. Differential Revision: https://reviews.llvm.org/D152790	2023-06-26 10:45:41 -07:00
Eli Friedman	bc7f11ccb0	[SelectionDAG] Improve expansion of wide min/max The current implementation tries to handle the high and low halves separately, but that's less efficient in most cases; use a wide SETCC instead. Differential Revision: https://reviews.llvm.org/D151358	2023-06-26 10:45:41 -07:00
Ahmed Bougacha	b3272f5ddb	[AArch64][PAC] Select MOVK for ptrauth.blend intrinsic. Blend combines two discriminator values used by other ptrauth ops. On AArch64 here, it does that by replacing the high 16 bits of the LHS with the low 16 bits of the RHS. Usually the RHS is a constant, which lets us do this efficiently in a single MOVK. When the RHS isn't constant, we can do a BFI. In a sense, this is implementing an ABI decision (how to lower the software construct of "blend"), but if there are interesting variants to consider, this could be made object-file-format-specific in some way. Differential Revision: https://reviews.llvm.org/D132384	2023-06-26 09:43:37 -07:00
Simon Pilgrim	868351f894	[X86] combineMul - ensure getTargetConstantFromNode splat extraction is the correct element width The extracted Constant and Constant::getSplatValue can both be any bitwidth - they don't necessarily match the original ConstantSDNode type Fixes #63507	2023-06-26 16:50:14 +01:00
Simon Pilgrim	6756947ac6	[X86] lowerV8I16Shuffle - use PACKSS(SEXT_INREG(X),SEXT_INREG(Y)) for pre-SSSE3 truncation shuffles The comment about PSHUFLW+PSHUFHW+PSHUFD was outdated as that referred to a single input case, but that is now always handled earlier. Another step towards removing premature combines to vector truncation combines to PACK.	2023-06-26 16:50:13 +01:00
Luke Lau	0e9384a6c6	[RISCV] Teach doPeepholeMaskedRVV to handle vslide{up,down} We already handle vslide1{up,down}, so this extends it to vslide{up,down}. This was unintentionally added in https://reviews.llvm.org/D150463 and then removed in 37cfcfcef76bb615b941d7077ca81168bd7ad080, but unless I'm missing something this should still be ok as the mask only controls what destination elements are written to. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153631	2023-06-26 09:36:03 +01:00
Luke Lau	0d0bfa8a14	[RISCV] Add test cases for vmerge peephole with vslides Currently vslide1{up,down}s can have vmerges folded into them, but not vslide{up,down}s. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D153630	2023-06-26 09:36:00 +01:00
Craig Topper	b105b3266f	[RISCV] Properly handle partial writes in isConvertibleToVMV_V_V. We were only checking for the previous instructions to write exactly the register or a super register. We ignored writes to a subregister and continued searching for the producing instruction. We need to abort instead. There's another check inside the if body to abort if the registers don't match exactly. So we just need to check for overlap so we enter the if body. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D153490	2023-06-25 23:08:47 -07:00
Craig Topper	43e57bda4c	[RISCV] Add test case for D153490. NFC	2023-06-25 22:58:18 -07:00
Wang Rui	4262ae20d8	[LoongArch] Optimize conditional selection of integer This patch optimizes code generation by leveraging the zeroing behavior of the `maskeqz`/`masknez` instructions. ``` int sel(int a, int b) { return (a < b) ? a : 0; } ``` ``` slt $a1,$a0,$a1 masknez $a2,$r0,$a1 maskeqz $a0,$a0,$a1 or $a0,$a0,$a2 ``` => ``` slt $a1,$a0,$a1 maskeqz $a0,$a0,$a1 ``` Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D153193	2023-06-26 10:33:51 +08:00
Weining Lu	fb563717fb	Revert "[LoongArch] Optimize conditional selection of integer" This reverts commit 3dd319ecf3be64598ea84d1730033854cade7123. Sorry, I forgot to amend the author name and email when merging this patch.	2023-06-26 10:30:42 +08:00
Matt Arsenault	c3b27c236d	RegAllocGreedy: Fix assert with remarks on unassigned subregisters This tried to query the physical subregister on virtual registers if they were left unassigned.	2023-06-25 19:26:25 -04:00
Matt Arsenault	9f274939db	AMDGPU: Handle the easy parts of strict fptrunc f64->f16 is hard. The expansion is all integer but we need to raise exceptions. Also doesn't handle the illegal f16 targets.	2023-06-25 19:26:25 -04:00
Matt Arsenault	3d409e55a1	AMDGPU: Handle constrained fpext	2023-06-25 19:26:25 -04:00
Amaury Séchet	391a95fdb1	[NFC] Autogenerate CodeGen/AMDGPU/combine-reg-or-const.ll	2023-06-25 22:56:42 +00:00
Amaury Séchet	632a8aca07	[NFC] Autogenerate CodeGen/PowerPC/tail-dup-break-cfg.ll	2023-06-25 22:55:49 +00:00
Niwin Anto	10b1f58cba	[AArch64][GlobalISel] IR translate support for a return instruction of type <1 x i8> or <1 x i16> when using GlobalISel. Code generation for return instruction of type <1 x i8> or <1 x i16> when using GlobalISel causes internal compiler crash Could not handle ret ty. Fixes: https://github.com/llvm/llvm-project/issues/58211 Differential Revision: https://reviews.llvm.org/D153300	2023-06-25 14:40:48 -07:00
Tobias Stadler	84a6a057e6	[AArch64][GlobalISel] Select G_UADDE/G_SADDE/G_USUBE/G_SSUBE This implements the remaining overflow generating instructions in the AArch64 GlobalISel selector. Now wide add/sub operations do not fallback to SelectionDAG anymore. We make use of PostSelectOptimize to cleanup the hereby generated flag-setting operations when the carry-out is unused. Since we do not fallback anymore when selecting add/sub atomics on O0 some test changes were required there. Fixes: https://github.com/llvm/llvm-project/issues/59407 Differential Revision: https://reviews.llvm.org/D153164	2023-06-25 14:32:00 -07:00
Amaury Séchet	e345b9ca7a	[NFC] Autogenerate CodeGen/PowerPC/pr40922.ll	2023-06-25 21:05:06 +00:00
David Green	6fcc562fc7	[AArch64] Add SVE tests for double reducts of vector.reduce.fmaximum/fminimum. NFC Now that the SVE parts are in, we can fill in the double reduction tests without them causing problems.	2023-06-25 08:44:43 +01:00
Fangrui Song	2a61ceddb3	[BPF] Remove unused legacy passes after TargetMachine::adjustPassManager removal D137796 made these passes unused. `opt --bpf-ir-peephole` is specified in one test. Add a `registerPipelineParsingCallback` so that we can change the test to use `opt --passes=bpf-ir-peephole` instead.	2023-06-24 22:44:06 -07:00
Amaury Séchet	93af6bdcaf	[NFC] Autogenerate CodeGen/PowerPC/select-i1-vs-i1.ll	2023-06-25 01:27:29 +00:00
Amaury Séchet	8412a17b79	[NFC] Autogenerate CodeGen/ARM/2013-07-29-vector-or-combine.ll	2023-06-25 01:05:21 +00:00
Amaury Séchet	7457acb842	[NFC] Autogenerate CodeGen/ARM/2011-03-15-LdStMultipleBug.ll	2023-06-25 01:02:49 +00:00
Amaury Séchet	e271a539c5	[NFC] Autogenerate CodeGen/ARM/and-sext-combine.ll	2023-06-25 00:55:03 +00:00
Amaury Séchet	78c1985f99	[NFC] Autogenerate CodeGen/ARM/machine-cse-cmp.ll	2023-06-25 00:44:30 +00:00
Amaury Séchet	2e8111d4c4	[NFC] Autogenerate CodeGen/ARM/pr35103.ll	2023-06-25 00:29:14 +00:00
Thorsten Schütt	e0e998f8d8	[GlobalIsel][X86] Legalize G_CONSTANT_FOLD_BARRIER Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D153684	2023-06-24 17:19:51 +02:00
David Green	f8003689f1	[AArch64] Add SVE lowering for vector.reduce.fminimum and fmaximum Following what is already performed for vector.reduce.fmin/fmax, this adds lowering for the new vector.reduce.fminimum/fmaximum nodes to the SVE fminv and fmaxv instructions via the existing FMINV_PRED/FMAXV_PRED nodes. Differential Revision: https://reviews.llvm.org/D153288	2023-06-24 11:12:58 +01:00
Thorsten Schütt	68ed9d9472	[GlobalIsel][X86] G_STORE extension Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153643	2023-06-24 05:51:30 +02:00
Alex Brachet	6085eb3084	Revert "Reland [llvm] Preliminary fat-lto-objects support" This reverts commit 44265dc3554ef40920b587eeb787a400663af6c7.	2023-06-24 01:15:50 +00:00
Alex Bradbury	929124993a	Recommit "[RISCV] Implement support for bf16 truncate/extend on hard FP targets" Without the changes from D153598. Original commit message: For the same reasons as D151284, this requires custom lowering of the truncate libcall on hard float ABIs (the normal libcall code path is used on soft ABIs). The extend operation is implemented by a shift just as in the standard legalisation, but needs to be custom lowered because i32 isn't a legal type on RV64. This patch aims to make the minimal changes that result in correct codegen for the bfloat.ll tests. Differential Revision: https://reviews.llvm.org/D151663	2023-06-23 17:23:12 -07:00
Craig Topper	076759f068	Revert "[RISCV] Implement support for bf16 truncate/extend on hard FP targets" This was committed with D153598 merged into it. Reverting to recommit as separate patches. This reverts commit 690b1c847f0b188202a86dc25a0a76fd8c4618f4.	2023-06-23 17:23:12 -07:00
Teresa Johnson	200cc952a2	[LTO][GlobalDCE] Use pass parameter instead of module flag for LTO phase D63932 added a module flag to indicate that we are executing the regular LTO post merge pipeline, so that GlobalDCE could perform more aggressive optimization for Dead Virtual Function Elimination. This caused issues trying to reuse bitcode that had already been through the LTO pipeline (see context in D139816). Instead support this by passing down a parameter flag to the GlobalDCEPass constructor, which is the more usual way for indicating this information. Most test changes are to remove incidental uses of this flag. Of the 2 real uses, llvm/test/LTO/ARM/lto-linking-metadata.ll is now obsolete and removed in this patch, and the virtual-functions-visibility-post-lto.ll test is updated to use the regular LTO default pipeline where this parameter is set to true. Differential Revision: https://reviews.llvm.org/D153655	2023-06-23 17:05:07 -07:00
Paul Kirth	44265dc355	Reland [llvm] Preliminary fat-lto-objects support Fat LTO objects contain both LTO compatible IR, as well as generated object code. This allows users to defer the choice of whether to use LTO or not to link-time. This is a feature available in GCC for some time, and makes the existing -ffat-lto-objects flag functional in the same way as GCC's. Within LLVM, we add a new EmbedBitcodePass that serializes the module to the object file, and expose a new pass pipeline for compiling fat objects. The new pipeline initially clones the module and runs the selected (Thin)LTOPrelink pipeline, after which it will serialize the module into a `.llvm.lto` section of an ELF file. When compiling for (Thin)LTO, this is normally the point at which the compiler would emit an object file containing the bitcode and metadata. After that point we compile the original module using the PerModuleDefaultPipeline used for non-LTO compilation. We generate standard object files at the end of this pipeline, which contain machine code and the new `.llvm.lto` section containing bitcode. Since the two pipelines operate on different copies of the module, we can be sure that the bitcode in the `.llvm.lto` section and object code in `.text` are congruent with the existing output produced by the default and LTO pipelines. Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977 Earlier versions of this patch were missing REQUIRES lines for llc related tests in Transforms/EmbedBitcode. Those tests are now under CodeGen/X86, which should avoid running the check on unsupported platforms. Reviewed By: tejohnson, MaskRay, nikic Differential Revision: https://reviews.llvm.org/D146776	2023-06-23 23:23:58 +00:00
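
Commit b3272f5ddb in the log above describes the AArch64 lowering of the ptrauth.blend intrinsic as replacing the high 16 bits of the LHS with the low 16 bits of the RHS. The following is a minimal Python sketch of that bit manipulation on 64-bit values, based only on the wording of the commit message. The names `blend` and `LOW_48_MASK` are hypothetical and are not taken from LLVM; the sketch models the arithmetic, not the instruction selection.

```python
# Illustrative model of the blend described in commit b3272f5ddb.
# Assumption: operands are treated as unsigned 64-bit integers.

LOW_48_MASK = 0x0000_FFFF_FFFF_FFFF  # keeps bits 0..47 of the LHS


def blend(lhs: int, rhs: int) -> int:
    """Replace the high 16 bits of lhs with the low 16 bits of rhs."""
    low_part = lhs & LOW_48_MASK       # drop the original bits 48..63
    high_part = (rhs & 0xFFFF) << 48   # move the low 16 bits of rhs to the top
    return low_part | high_part


if __name__ == "__main__":
    print(hex(blend(0x1234_5678_9ABC_DEF0, 0xABCD)))
```

Only the low 16 bits of the RHS contribute to the result, which is consistent with the commit's note that a constant RHS allows a single MOVK.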


52796 Commits