llvm-project

Author	SHA1	Message	Date
Fangrui Song	eabaee0c59	[RISCV] Omit "@plt" in assembly output "call foo@plt" (#72467 ) R_RISCV_CALL/R_RISCV_CALL_PLT distinction is not necessary and R_RISCV_CALL has been deprecated. Since https://reviews.llvm.org/D132530 `call foo` assembles to R_RISCV_CALL_PLT. The `@plt` suffix is not useful and can be removed now (matching AArch64 and PowerPC). GNU assembler assembles `call foo` to R_RISCV_CALL_PLT since 2022-09 (70f35d72ef04cd23771875c1661c9975044a749c). Without this patch, unconditionally changing MO_CALL to MO_PLT could create `jump .L1@plt, a0`, which is invalid in LLVM integrated assembler and GNU assembler.	2024-01-07 12:09:44 -08:00
David Green	780a5116ba	[AArch64] Fix condition for combining UADDV and Add. (#76809 ) This should have been checking that the transform was valid, but used incorrect conditions letting through invalid combinations of lo/hi extracts. Hopefully fixes #76769	2024-01-07 08:23:17 +00:00
Luke Lau	274f8332b9	[RISCV] Don't attempt PRE if available info is SEW/LMUL ratio only (#77063 )	2024-01-07 14:23:01 +07:00
Thorsten Schütt	a085402ef5	Revert "[GlobalIsel] Combine select of binops (#76763 )" This reverts commit 1687555572ee4fb435da400dde02e7a1e60b742c.	2024-01-06 17:04:24 +01:00
Thorsten Schütt	1687555572	[GlobalIsel] Combine select of binops (#76763 )	2024-01-06 11:28:10 +01:00
hev	16094cb629	[llvm][LoongArch] Support per-global code model attribute for LoongArch (#72079 ) This patch gets the code model from the global variable's attribute if it has one; otherwise the target's code model is used. --------- Signed-off-by: WANG Rui <wangrui@loongson.cn>	2024-01-06 13:36:09 +08:00
Shengchen Kan	a5902a4d24	[X86][NFC] Rename variables/passes for EVEX compression optimization RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031 APX introduces EGPR, NDD and NF instructions. In addition to compressing EVEX encoded AVX512 instructions into VEX encoding, we also have several more possible optimizations. a. Promoted instruction (EVEX space) -> pre-promotion instruction (legacy space) b. NDD (EVEX space) -> non-NDD (legacy space) c. NF_ND (EVEX space) -> NF (EVEX space) The first two types of compression can usually reduce code size, while the third type of compression can help hardware decode although the instruction length remains unchanged. So we do the renaming for the upcoming APX optimizations. BTW, I clang-format the code in X86CompressEVEX.cpp, X86CompressEVEXTablesEmitter.cpp. This patch also extracts the NFC in #77065 into a separate commit.	2024-01-06 12:41:09 +08:00
Mikhail Gudim	ba3ef331b4	[RISCV][GlobalISel] Zbkb support for G_BSWAP (#77050 ) This instruction is legal in the presence of the Zbkb extension.	2024-01-05 23:19:46 -05:00
Daniel Hoekwater	def42537ee	[NFC][CodeGen][AArch64] Add tests for unconditional branch duplication c9f3288 introduced unconditional branch deduplication for basic block sections and machine function splitting, but it didn't add tests for AArch64 since prior behavior crashed the test. This change adds tests for AArch64 and has no functional change.	2024-01-05 23:39:01 +00:00
David Green	365fbbfbcf	[AArch64] Add some extra tests for SelectOpt. NFC	2024-01-05 21:04:01 +00:00
Simon Pilgrim	070ac1dcd5	[SystemZ] vec-perm-14.ll - partially regenerate checks so we can see all the vperm codegen We can't use the script as we need to keep the shuffle mask constant pool checks, but do more than just check that a second vperm isn't generated	2024-01-05 18:00:08 +00:00
Craig Topper	4dd5d96797	[RISCV] Don't call use_nodbg_operands for physical registers in RISCVOptWInstrs hasAllNBitUsers. (#77032 ) The ADDIW in the new test case was incorrectly removed due to incorrectly following the x10 register from the return value back to the argument. This is due to use_nodbg_operands returning every instruction that uses a physical register regardless of the data flow.	2024-01-05 09:22:54 -08:00
Mircea Trofin	c49965b97e	[mlgo] Fix post PR #76919 Relaxed a bit the opcode checks to make the test less sensitive to changes resulting in opcode numbering.	2024-01-05 09:10:03 -08:00
Orlando Cazalet-Hyams	10b03e6662	[RemoveDIs] Handle DPValues in FastISel (#76952 ) The change is fairly mechanical: 1. Factor code from `FastISel::selectIntrinsicCall`, which converts debug intrinsics into debug instructions, into functions (NFC). 2. Call those functions for DPValues attached to instructions too. The test updates look the same as other RemoveDIs changes: re-run the tests with `--try-experimental-debuginfo-iterators`, which checks the output is identical using the new debug info format (if it has been enabled in the cmake configuration). Depends on #76941 (otherwise some modified tests spuriously fail).	2024-01-05 15:11:47 +00:00
Simon Pilgrim	ae81400a0f	[X86] keylocker-intrinsics.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-05 13:38:44 +00:00
Simon Pilgrim	b51130a331	[X86] combine-fneg.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-05 13:38:44 +00:00
Simon Pilgrim	4ecd6384af	[X86] fp128-cast.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-05 13:38:44 +00:00
Simon Pilgrim	c307147660	[X86] vec_fptrunc.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-05 13:38:43 +00:00
Simon Pilgrim	1dbdf7658a	[X86] vec_fpext.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-05 13:38:43 +00:00
Florian Hahn	da148a0805	[AArch64] Add tests showing unnecessary cast promotion.	2024-01-05 13:32:00 +00:00
David Green	d187dfe515	[AArch64] Add some tests for addLikeOr with csinc. NFC	2024-01-05 12:39:32 +00:00
Simon Pilgrim	7648371c25	Revert 4d7c5ad58467502fcbc433591edff40d8a4d697d "[NewPM] Update CodeGenPreparePass reference in CodeGenPassBuilder (#77054 )" Revert e0c554ad87d18dcbfcb9b6485d0da800ae1338d1 "Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380)" Revert #75380 and #77054 as they were breaking EXPENSIVE_CHECKS buildbots: https://lab.llvm.org/buildbot/#/builders/104	2024-01-05 12:28:10 +00:00
Jay Foad	e96e7a9a86	[AMDGPU] Implement readcyclecounter for GFX12 (#76965 )	2024-01-05 08:20:52 +00:00
David Green	77b124cc57	[AArch64][GlobalISel] Add legalization for G_VECREDUCE_SEQ_FADD. (#76238 ) And G_VECREDUCE_SEQ_FMUL at the same time. They require the elements of the vector operand to be accumulated in order, so just need to be scalarized. Some of the operands are not simplified as much as they can quite yet due to not canonicalizing constant operands post-legalization.	2024-01-05 08:11:44 +00:00
XinWang10	f5f66e26b5	[X86]Support lowering for APX Promoted SHA/MOVDIR/CRC32/INVPCID/CET instructions (#76786 ) R16-R31 were added to GPRs in https://github.com/llvm/llvm-project/pull/70958. This patch supports the lowering for promoted SHA/MOVDIR/CRC32/INVPCID/CET. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2024-01-05 15:56:15 +08:00
Phoebe Wang	59af659ee3	[X86][BF16] Try to use `f16` for lowering (#76901 ) This patch fixes BF16 32-bit ABI problem: https://godbolt.org/z/6dMnh8jGG	2024-01-05 15:25:18 +08:00
Nick Anderson	e0c554ad87	Port CodeGenPrepare to new pass manager (and BasicBlockSectionsProfil… (#75380 ) Port CodeGenPrepare to new pass manager and dependency BasicBlockSectionsProfileReader Fixes: #64560 Co-authored-by: Krishna-13-cyber <84722531+Krishna-13-cyber@users.noreply.github.com>	2024-01-05 13:47:56 +07:00
Benoit Jacob	054b5fc0fd	X86: add some missing lowerings for shuffles on `bf16` element type. (#76076 ) Some shuffles with `bf16` as element type were running into a `llvm_unreachable`. Key to reproducing was to chain two shuffles. ```llvm define <2 x bfloat> @shuffle_chained_v32bf16_v2bf16(<32 x bfloat> %a) { %s = shufflevector <32 x bfloat> %a, <32 x bfloat> zeroinitializer, <32 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23, i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31> %s2 = shufflevector <32 x bfloat> %s, <32 x bfloat> zeroinitializer, <2 x i32> <i32 0, i32 1> ret <2 x bfloat> %s2 } ``` This was hitting this UNREACHABLE: ``` Not a valid 512-bit x86 vector type! UNREACHABLE executed at /home/benoit/iree/third_party/llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:17124! PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: /home/benoit/mlir-build/bin/llc -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f,+avx512vl,+avx512bw,+avx512bf16 1. Running pass 'Function Pass Manager' on module '<stdin>'. 2. Running pass 'X86 DAG->DAG Instruction Selection' on function '@shuffle_chained_v32bf16_v2bf16' ```	2024-01-04 22:39:40 -05:00
Michal Paszkowski	b4cfb50c65	[SPIR-V] Emit SPIR-V bitcasts between source/expected pointer type (#69621 ) This patch introduces a new spv_ptrcast intrinsic for tracking expected pointer types. The change fixes multiple OpenCL CTS regressions due to the switch to opaque pointers (e.g. basic/hiloeo).	2024-01-04 19:31:15 -08:00
Dávid Ferenc Szabó	f68647997b	[GlobalISel] Adding support for handling G_ASSERT_{SEXT,ZEXT,ALIGN} i… (#74196 ) …n artifact combiner These instructions are hints for optimizations and can be treated as copies, and are handled as such with this change. Without it, it is possible to run into an assertion, since tryCombineUnmergeValues rightfully uses getDefIgnoringCopies to get the source MI, which already handles these hint instructions and treats them as copies. The problem is that markDefDead only considers COPYs, which leads to an assertion failure for cases like testUnmergeHintOfTrunc.	2024-01-05 10:13:39 +07:00
wanglei	c56a5e895a	[LoongArch] Reimplement the expansion of PseudoLA_LARGE instructions (#76555 ) According to the description of the psABI v2.30: https://github.com/loongson/la-abi-specs/releases/tag/v2.30, moved the expansion of relevant pseudo-instructions from `LoongArchPreRAExpandPseudo` pass to `LoongArchExpandPseudo` pass, to ensure that the code sequences of `PseudoLA_LARGE` instructions and Medium code model's function call are not scheduled.	2024-01-05 10:57:53 +08:00
wanglei	3d6fc35b90	[LoongArch] Pre-commit test for #76555 . NFC	2024-01-05 10:57:40 +08:00
wanglei	2cf420d5b8	[LoongArch] Emit function call code sequence as `PCADDU18I+JIRL` in medium code model According to the description of the psABI v2.20: https://github.com/loongson/la-abi-specs/releases/tag/v2.20, adjustments are made to the function call instructions under the medium code model. AsmParser already supports parsing the call36 and tail36 macro instructions.	2024-01-05 10:56:47 +08:00
Matt Arsenault	597086c609	DAG: Implement promotion for strict_fp_round (#74332 ) Needs an AMDGPU hack to get the selection to work. The ordinary variant is custom lowered through an almost equivalent target node that would need a strict variant for additional known bits optimizations.	2024-01-05 08:44:19 +07:00
Matt Arsenault	47685633a7	AMDGPU: Make v4bf16 a legal type (#76217 ) Gets a few code quality improvements. A few cases are worse from losing load narrowing. Depends #76213 #76214 #76215	2024-01-05 08:35:07 +07:00
Simon Pilgrim	2cbf652615	[X86] avx512-pmovxrm.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-04 17:17:08 +00:00
Simon Pilgrim	63e3074781	[X86] aligned-variadic.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-04 17:17:07 +00:00
Simon Pilgrim	ce4459d590	[X86] 64-bit-shift-by-32-minus-y.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-04 17:17:07 +00:00
Simon Pilgrim	076dbc0272	[X86] SimplifyDemandedVectorEltsForTargetNode - add X86ISD::VZEXT_LOAD handling. Simplify to a scalar_to_vector(load()) if we don't demand any of the upper vector elements.	2024-01-04 17:17:07 +00:00
Simon Pilgrim	5cd3cf1072	[X86] cvtv2f32.ll - replace X32 checks with X86. NFC. We try to use X32 for gnux32 triples only.	2024-01-04 17:17:06 +00:00
Matt Arsenault	460ffcddd9	AMDGPU: Make bf16/v2bf16 legal types (#76215 ) Some intrinsics are using i16 vectors in place of bfloat vectors. Move towards making bf16 vectors legal so these can migrate. Leave the larger vectors for a later change. Depends #76213 #76214	2024-01-04 22:31:18 +07:00
Chaitanya	9803de0e8e	[AMDGPU] Add dynamic LDS size implicit kernel argument to CO-v5 (#65273 ) The "hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout. Add an "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify whether a function uses dynamic LDS. The hidden argument will be added in the following cases: - LDS global is used in the kernel. - Kernel calls a function which uses LDS global. - LDS pointer is passed as argument to kernel itself.	2024-01-04 19:05:12 +05:30
Jay Foad	26ff659c39	[AMDGPU] Remove some unused check prefixes	2024-01-04 13:16:46 +00:00
HaohaiWen	e147dcbcbc	[SEH] Add test to track EHa register liveness verification (#76921 ) This test tracks a MachineVerifier bug in checking live range segments for EHa. An async exception can happen anywhere within an SEH scope, not only at a call instruction. MachineVerifier needs to be taught that.	2024-01-04 20:49:11 +08:00
Thomas Preud'homme	ce61b0e9a4	Add out-of-line-atomics support to GlobalISel (#74588 ) This patch implements the GlobalISel counterpart to 4d7df43ffdb460dddb2877a886f75f45c3fee188.	2024-01-04 10:15:16 +00:00
sstipanovic	b4ac4d2264	[NFC][AMDGPU] Move image-atomic-attributes test to test/Assembler. (#76917 )	2024-01-04 10:38:18 +01:00
Chen Zheng	dd4dc2111e	nfc add cases for pr47156 and pr47155	2024-01-04 03:56:40 -05:00
Phoebe Wang	176c341198	[X86][BF16] Add 32-bit tests to show ABI problem, NFC	2024-01-04 15:43:34 +08:00
David Green	5550e9c841	[GlobalISel][AArch64] Add libcall lowering for fpowi. (#67114 ) This adds legalization, notably libcall lowering for fpowi. It is a little different to other methods as the function takes both a float and integer register. Otherwise all vectors get scalarized and fp16 is promoted to fp32.	2024-01-04 07:26:23 +00:00
sstipanovic	55395f5c83	[AMDGPU] Remove `nosync` from image atomic intrinsics. (#76814 ) Remove `nosync` as discussed in https://github.com/llvm/llvm-project/pull/73613	2024-01-04 08:22:05 +01:00

52796 Commits