llvm-project

Author	SHA1	Message	Date
Alex Bradbury	1cffd26483	[TargetLowering][RISCV] Improve codegen for saturating bf16 to int conversion. Extending to f32 first (as is done for f16) results in better generated code for RISC-V (and affects no other in-tree tests). Additionally, performing the FP_EXTEND first seems equally justified for bf16 as for f16. Differential Revision: https://reviews.llvm.org/D156944	2023-08-07 11:21:25 +01:00
Alex Bradbury	7a1b2adc45	[RISCV] Implement straightforward bf16<->int conversion cases. This ports over the test cases from half-convert.ll and implements patterns or RISCVISelLowering.cpp changes for all of the most straightforward cases (those that don't require changes outside of lib/Target/RISCV). The remaining cases and the poor codegen noted for saturating conversions will be handled in follow-up patches. Differential Revision: https://reviews.llvm.org/D156943	2023-08-07 11:12:51 +01:00
Jingu Kang	f580901d5d	[MachineCSE] Add an option to override the profitability heuristics Differential Revision: https://reviews.llvm.org/D157002	2023-08-07 10:06:02 +01:00
Simon Pilgrim	9d3b19e8e9	[X86] ReplaceNodeResults - relax the value type constraints for TRUNCATE widening. With SSSE3, widen the truncation for anything other than vXi64 -> vXi8 smaller than v8i64 (where PSHUFB would be better).	2023-08-07 09:41:38 +01:00
Yashwant Singh	3dc413e25d	[AMDGPU] Skip debug instruction uses while optimizing live range of a reg in SIOptimizeVGPRLiveRange. This will prevent the `assert(!O.readsReg())` from firing in SIOptimizeVGPRLiveRange::optimizeLiveRange. Fix for #64163. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D156893	2023-08-07 11:35:39 +05:30
David Green	ffc5ed976a	[AArch64][GISel] Expand handling for G_FABS to more vector types. This now reuses the existing lowering for G_FMIN/MAX for G_FABS too, which can handle more types successfully. We can hopefully reuse the same pattern action definition for other fp operations too.	2023-08-06 14:58:25 +01:00
David Green	0e757122a1	[AArch64][GISel] Expand lowering for fminimum and fmaximum. This replicates the G_FMINNUM and G_FMAXNUM lowering for G_FMINIMUM and G_FMAXIMUM, reusing the same action definition for lowering.	2023-08-06 14:36:52 +01:00
David Green	6df2c2b4a2	[AArch64] Add a more extensive fabs test. NFC. Now covers GlobalISel as well as SelectionDAG, and more types are tested. The existing tests for combines to fabs are moved to fabs-combine.ll.	2023-08-06 14:02:57 +01:00
Simon Pilgrim	ce2ec06516	[X86] Only fold broadcast with extract_vector_elt/scalar_to_vector if the scalar type matches the vector element type. Avoid handling implicit extension/truncation with scalar<->vector transfers. Fixes #64439	2023-08-05 16:01:22 +01:00
Simon Pilgrim	ef4330f4f3	[X86] truncateVectorWithPACK - handle vector truncations to sub-64-bit vector widths. Extend the existing 128-bit -> 64-bit truncation handling by widening/narrowing the src/dst vectors and using the lower half operand/result for PACKSS/PACKUS instructions.	2023-08-05 16:01:22 +01:00
Stanislav Mekhanoshin	0c7e8c06bc	[AMDGPU] Change syncscopes.mir not to use undefined cpol bits. NFC.	2023-08-04 11:19:12 -07:00
Simon Pilgrim	e22908692c	[X86] ReplaceNodeResults - widen sub-128-bit vector truncations if it would allow them to use PACKSS/PACKUS. We currently just scalarize sub-128-bit vector truncations, but if the input vector has sufficient signbits/zerobits then we should try to use PACKSS/PACKUS with a widened vector with don't-care upper elements. Shuffle lowering will struggle to detect this if we wait until the scalarization has been revectorized as a shuffle. Another step towards issue #63710.	2023-08-04 17:36:19 +01:00
Craig Topper	814250191d	[RISCV] Add vector legalization for fmaximum/fminimum. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D156937	2023-08-04 08:07:14 -07:00
Ben Shi	a133fb289a	[CSKY][NFC] Fix broken tests in eac78fdf68f58e113b2cf18a14baccb8f5ebcf50	2023-08-04 22:01:44 +08:00
David Green	bbe945b8a1	[AArch64][GISel] Expand G_DUP and G_DUPLANE to v8s8 and v4s16. This fills in the gaps for v8s8 and v4s16 vectors for G_DUP and G_DUPLANE, generalizing the existing code to more types.	2023-08-04 12:43:53 +01:00
Vladislav Dzhidzhoev	19d7ab14ec	[GlobalISel] Handle sequences of trunc(sext/zext/anyext...) in artifact combiner. The trunc(sext/zext/anyext... x) -> x pattern is handled in the artifact combiner to avoid extra copy instructions in https://reviews.llvm.org/D156831.	2023-08-04 13:29:49 +02:00
Jay Foad	34ffc30a90	[AMDGPU] Fix typo in comment in test	2023-08-04 11:04:54 +01:00
Ben Shi	eac78fdf68	[CSKY][test][NFC] Add tests of conditional branch and value select. These tests will be optimized with BTSTI16/BTSTI32 in the future. Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D154767	2023-08-04 16:10:57 +08:00
Ben Shi	528831dd1a	[CSKY] Optimize ANDI/ORI to BSETI/BCLRI for specific immediates Reviewed By: zixuan-wu Differential Revision: https://reviews.llvm.org/D153614	2023-08-04 16:10:36 +08:00
David Green	f3b9b94a8b	[AArch64][GISel] Expand arm64-dup and arm64-rev tests for global isel. NFC	2023-08-04 09:06:47 +01:00
Craig Topper	40f3708205	[RISCV] Add a test case that would have failed before D156974. NFC. Tweak the immediate on two vror.vi test cases to use a uimm6 immediate that would have failed before D156974, when we were looking for a simm6 immediate.	2023-08-03 11:23:55 -07:00
Craig Topper	a8c502a589	[RISCV] Add bf16 to isFPImmLegal. Part of this test file was stolen from D156895. We should merge them when committing. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156926	2023-08-03 08:27:38 -07:00
pvanhout	62ea799e6c	[AMDGPU] Break Large PHIs: Take whole PHI chains into account. Previous heuristics had a big flaw: they only looked at a single PHI at a time, and didn't take into account the whole "chain". The concept of "chain" is important because if we only break a chain partially, we risk forcing regalloc to reserve twice as many registers for that vector. We also risk adding a lot of copies that shouldn't be there and can inhibit backend optimizations. The solution I found is to consider the whole "PHI chain" when looking at a PHI. That is, we recursively look at the PHI's incoming values & users for other PHIs, then make a decision about the chain as a whole. The current threshold requires that at least `ceil(chain size * (2/3))` PHIs have at least one interesting incoming value. In simple terms, two-thirds (rounded up) of the PHIs should be breakable. This seems to work well. A lower threshold such as 50% is too aggressive because chains can often have 7 or 9 PHIs, and breaking 3+ or 4+ PHIs in those cases often causes performance issues. Fixes SWDEV-409648, SWDEV-398393, SWDEV-413487 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D156414	2023-08-03 16:41:11 +02:00
Tamir Duberstein	055893beac	[BPF] Don't crash on missing line info. When compiling Rust code we may end up with calls to functions provided by other code units. Presently this code crashes on a null pointer dereference; this patch avoids that crash and adds a test. Reviewed By: ast Differential Revision: https://reviews.llvm.org/D156446	2023-08-03 09:18:12 -04:00
Piyou Chen	05041b78a7	[RISCV] emit .option directive for functions with target features which differ from module default. When a function has different attributes from the module, emit the .option <attribute> before the function body. This allows non-integrated assemblers to properly assemble the functions (which may contain instructions dependent on the extra target features). Reviewed By: craig.topper, reames Differential Revision: https://reviews.llvm.org/D155155	2023-08-03 04:22:39 -07:00
Oliver Stannard	f2e7285b03	[AArch64][PtrAuth] Fix unwind state for tail calls. When generating unwind tables for code which uses return-address signing, we need to toggle the RA_SIGN_STATE DWARF register around any tail-calls, because these require the return address to be authenticated before the call, and could throw an exception. This is done using the .cfi_negate_ra_state directive before the call, and .cfi_restore_state at the start of the next basic block. However, since D153098, the .cfi_restore_state isn't being inserted, because the CFIFixup pass isn't being run. This re-enables that pass when return-address signing is enabled. Reviewed By: ikudrin, MaskRay Differential Revision: https://reviews.llvm.org/D156428	2023-08-03 11:45:51 +01:00
Jay Foad	0da19a2be5	[PEI][WebAssembly] Switch to backwards frame index elimination. Backwards frame index elimination uses backwards register scavenging, which is preferred because it does not rely on accurate kill flags. Differential Revision: https://reviews.llvm.org/D156691	2023-08-03 10:21:43 +01:00
Simon Pilgrim	7f9b94c044	[X86] LowerBuildVectorv16i8 - attempt to merge lowest 2 x i16 insertions into an i32 MOVD scalar_to_vector. Similar to D156350, if we were going to create 2 x i16 insertions (MOVD+PINSRW), try to merge them into a single MOVD to reduce the amount of GPR<->VEC traffic.	2023-08-03 10:20:20 +01:00
Jim Lin	a2938ba707	[RISCV] Add tests with the M extension enabled to extractelt-int-rv64.ll. NFC. They have already been added to extractelt-int-rv32.ll.	2023-08-03 15:34:44 +08:00
Yeting Kuo	f68c6879ad	[RISCV] Use max pushed register to get pushed register number. Previously we used the number of registers that needed saving and were pushable as the number of pushed registers. We also used the pushed register number to calculate the stack size. This is incorrect because Zcmp pushes registers from $ra up to the highest register that needs saving, and there is no guarantee that the registers needing saving form a contiguous sequence starting at $ra. For example, in the following, PushPopRegs should be 6 (ra, s0-s4) instead of 1. ``` ; llc -mtriple=riscv32 -mattr=+zcmp define void @foo() { entry: ; Old: .cfi_def_cfa_offset 16 ; New: .cfi_def_cfa_offset 32 tail call void asm sideeffect "li s4, 0", "~{s4}"() ret void } ``` Reviewed By: Jim, kito-cheng Differential Revision: https://reviews.llvm.org/D156407	2023-08-03 14:49:15 +08:00
Alex Bradbury	8a71f44e00	[RISCV] Expand test coverage of bf16 operations with Zfbfmin and fix gaps. This doesn't bring us to parity with the test/CodeGen/RISCV/half-* test cases; it simply picks off an initial set that can be supported especially easily. In order to make the review more manageable, I'll follow up with other cases. There is zero innovation in the test cases: they simply take the existing half/float cases and replace f16->bf16 and half->bfloat. Differential Revision: https://reviews.llvm.org/D156895	2023-08-03 07:06:57 +01:00
Bing1 Yu	6ee497aa0b	[X86][Regcall] Add an option to respect regcall ABI v.4 in win64&win32 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D155863	2023-08-03 13:58:33 +08:00
Jim Lin	40cc106fa0	[RISCV] Scalarize binop followed by extractelement to custom lowered instruction. isOperationLegalOrCustomOrPromote returns true only if VT is Other or legal and the operation action is Legal, Custom or Promote. Permit a vector binary operation to be converted to a scalar binary operation that is custom lowered with an illegal type. One such case is that i32 isn't a legal type on RV64 and its ALU operations are set to custom lowering, so a vadd with element type i32 can be converted to addw. Reviewed By: jacquesguan, craig.topper Differential Revision: https://reviews.llvm.org/D156692	2023-08-03 13:02:49 +08:00
Craig Topper	c1c5da8f1f	[RISCV] Merge fp-imm.ll and zfh-imm.ll into float/double/half-imm.ll. NFC. fp-imm.ll and zfh-imm.ll tested 0.0 and -0.0, while float/double/half-imm.ll tested other non-zero constants. It seems like they should all be tested together. There are slight coverage changes due to different command lines, but I'm not sure it's meaningful. For example, we now don't test double 0.0 and -0.0 with only the F extension. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D156929	2023-08-02 20:16:50 -07:00
Yeting Kuo	cd79599304	[RISCV] Teach lowerScalarInsert to handle a scalar value that is the first element of a fixed vector. D155929 taught lowerScalarInsert to handle the start value (extractelement scalable_vector, 0) and specifically converted fixed extracted vectors to scalable vectors when lowering vector reductions. That is not enough, because there is another way to create (extractelement fixed_vector, 0) as a start value of lowerScalarInsert, as in #64327. #64327: https://github.com/llvm/llvm-project/issues/64327. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156863	2023-08-03 10:53:14 +08:00
Phoebe Wang	4d6f4c9c93	[X86] Special handling for v1i1 in ExtractBitFromMaskVector. Fixes #64322 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D156855	2023-08-03 09:50:31 +08:00
Luke Lau	0834355227	[RISCV] Add VP patterns for vwsll.[vv,vx,vi]. This patch adds patterns for the existing riscv_shl_vl VL node. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D156915	2023-08-03 00:43:13 +01:00
Matt Arsenault	54bda79335	AMDGPU: Simplify and improve sincos matching. The first trivial example I tried failed to merge due to the user scan logic. Replace the complicated scan of users, which used distance thresholds, with a same-block restriction. The actual expansion of sincos is basically the same size as sin or cos individually. Copy the technique the generic optimization uses, which is to just use the input instruction as the insert point or just insert at the start of the entry block. https://reviews.llvm.org/D156706	2023-08-02 17:48:35 -04:00
Philip Reames	660b740e4b	[DAG] Support store merging of vector constant stores. Ran across this when making a change to RISCV memset lowering. It seems very odd that manually merging a store into a vector prevents it from being further merged. Differential Revision: https://reviews.llvm.org/D156349	2023-08-02 14:41:46 -07:00
Alex Bradbury	667602793b	[RISCV] Implement support for bf16 select when zfbfmin is enabled. These test cases previously caused an error. RISCVInstrInfo::copyPhysReg also needed a tweak in order to account for copying bf16 values in FPR16 registers. Differential Revision: https://reviews.llvm.org/D156883	2023-08-02 20:04:30 +01:00
4vtomat	346c1f2641	[RISCV] Support vector crypto extension LLVM IR Depends on D141672 Differential Revision: https://reviews.llvm.org/D138809	2023-08-02 10:25:36 -07:00
Philip Reames	fe4c99d1d6	[RISCV] Add test case showing CSE regression from issue 64282	2023-08-02 09:12:46 -07:00
Matt Arsenault	b953155b49	AMDGPU: Fix counting debug instructions in execz skip threshold	2023-08-02 08:09:41 -04:00
Mirko Brkusanin	acdc503d6c	[AMDGPU][GlobalISel] Update applyMappingImpl for G_ABS and type v2s16. For G_ABS with type v2s16 and SGPR inputs, break it down into two s32 G_ABS instructions. Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D155867	2023-08-02 12:27:06 +02:00
Mirko Brkusanin	fadf3e7f2b	[AMDGPU][GlobalISel] Update legalizer for G_ABS, G_SMIN, G_SMAX, G_UMIN, G_UMAX. There is no need to increase the size of odd sized vectors if they are going to be scalarized by a different rule. Patch by: Acim Maravic Differential Revision: https://reviews.llvm.org/D155865	2023-08-02 12:18:18 +02:00
Alex Bradbury	8acb8a143f	[RISCV] Make Zcf and Zcd imply the F and D extensions respectively. This was an omission in the spec that has now been addressed in https://github.com/riscv/riscv-code-size-reduction/pull/224. Differential Revision: https://reviews.llvm.org/D156314	2023-08-02 10:40:38 +01:00
Alex Bradbury	be0dac268d	[RISCV] Improve codegen for i8/i16 'atomicrmw xchg a, {0,-1}'. As noted in <https://github.com/llvm/llvm-project/issues/64090>, it's more efficient to lower a partword `atomicrmw xchg a, 0` to an amoand with an appropriate mask. There are a range of possible ways to go about this, e.g. writing a combine based on the `llvm.riscv.masked.atomicrmw.xchg` intrinsic, introducing a new interface to AtomicExpandPass to allow target-specific atomics conversions, or trying to lift the conversion into AtomicExpandPass itself based on querying some target hook. Ultimately I've gone with what appears to be the simplest approach: just covering this case in emitMaskedAtomicRMWIntrinsic. I perhaps should have given that hook a different name way back when it was introduced. This also handles the `atomicrmw xchg a, -1` case suggested by Craig during review. Fixes https://github.com/llvm/llvm-project/issues/64090 Differential Revision: https://reviews.llvm.org/D156801	2023-08-02 09:48:50 +01:00
Jay Foad	c2093b8504	[AMDGPU] Add target features for GDS and GWS. GFX9 subtargets from GFX90A onwards lack GDS but still have GWS. Differential Revision: https://reviews.llvm.org/D156713	2023-08-02 09:02:07 +01:00
Jay Foad	8f973d5c45	[DebugInfo] Fix crash when printing malformed DBG machine instructions. MachineVerifier does not check that DBG_VALUE, DBG_VALUE_LIST and DBG_INSTR_REF have the expected number of operands, so printing them (e.g. with -print-after-all) should not crash. Differential Revision: https://reviews.llvm.org/D156226	2023-08-02 08:28:20 +01:00
Jim Lin	d6a48a348a	[RISCV] Fix the CFI offset for callee-saved registers stored by Zcmp push. Issue mentioned: https://github.com/riscv/riscv-code-size-reduction/issues/182 The order of callee-saved registers stored by Zcmp push in memory is reversed. Pseudo code for cm.push in https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.4-1/Zc.1.0.4-1.pdf: ``` if (XLEN==32) bytes=4; else bytes=8; addr=sp-bytes; for(i in 27,26,25,24,23,22,21,20,19,18,9,8,1) { //if register i is in xreg_list if (xreg_list[i]) { switch(bytes) { 4: asm("sw x[i], 0(addr)"); 8: asm("sd x[i], 0(addr)"); } addr-=bytes; } } ``` The placement order for push is s11, s10, ..., ra. The CFI offsets should be calculated in this reversed order for correct stack unwinding. Reviewed By: fakepaper56, kito-cheng Differential Revision: https://reviews.llvm.org/D156437	2023-08-02 13:03:21 +08:00

52796 Commits
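The PHI-chain threshold described in 62ea799e6c reduces to a small piece of arithmetic. The following is a minimal Python sketch that assumes only the `ceil(chain size * (2/3))` rule stated in that message; `min_breakable_phis` and `should_break_chain` are hypothetical names, not the actual AMDGPU pass code. Using integer arithmetic avoids floating-point rounding in the ceiling.

```python
def min_breakable_phis(chain_size: int) -> int:
    """ceil(chain_size * 2 / 3), computed with integers only."""
    return (2 * chain_size + 2) // 3


def should_break_chain(num_interesting: int, chain_size: int) -> bool:
    """Hypothetical helper: break the whole PHI chain only if enough of its
    PHIs have at least one interesting incoming value."""
    return num_interesting >= min_breakable_phis(chain_size)
```

For the 7- and 9-PHI chains mentioned in the message, this rule requires 5 and 6 breakable PHIs respectively.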
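The stack-size change in f68c6879ad can be reproduced with a short Python sketch. It assumes that a Zcmp register list is always the contiguous run `ra, s0, ..., s<N>`, that `{ra, s0-s10}` has no encoding (so a highest register of s10 is rounded up to s11), and that the pushed area is rounded up to 16 bytes. The function names are hypothetical.

```python
def pushed_reg_count(max_saved_s):
    """Registers saved by cm.push when the highest callee-saved register
    needed is s<max_saved_s>; None means only ra is needed."""
    if max_saved_s is None:
        return 1
    if max_saved_s == 10:
        # Assumption: {ra, s0-s10} is not encodable, so {ra, s0-s11} is used.
        max_saved_s = 11
    return 2 + max_saved_s  # ra plus s0..s<max_saved_s>


def push_stack_size(count, xlen=32, align=16):
    """Bytes occupied by `count` pushed registers, rounded up to `align`."""
    raw = count * (xlen // 8)
    return (raw + align - 1) // align * align
```

With only s4 clobbered, the old count of 1 gives 4 bytes rounded to 16. The corrected count of 6 (ra, s0-s4) gives 24 bytes rounded to 32, which matches the Old/New `.cfi_def_cfa_offset` values quoted in the message.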
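The reversed placement fixed in d6a48a348a follows directly from the cm.push pseudocode quoted there. Below is a Python transcription of that loop. The x-register numbers and ABI names are the standard RISC-V ones, and `push_cfi_offsets` is a hypothetical name. It returns each saved register's offset from the stack pointer before the push. This is the offset CFI records, assuming the CFA equals that incoming sp.

```python
# ABI names of the registers cm.push can save, keyed by x-register number.
ABI_NAME = {1: "ra", 8: "s0", 9: "s1", **{18 + k: f"s{2 + k}" for k in range(10)}}

# Store order from the cm.push pseudocode: highest-numbered register first.
PUSH_ORDER = [27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 9, 8, 1]


def push_cfi_offsets(xreg_list, xlen=32):
    """Map each pushed register's ABI name to its offset from the pre-push sp."""
    nbytes = 4 if xlen == 32 else 8
    offsets = {}
    addr = -nbytes  # first store goes to sp - bytes
    for i in PUSH_ORDER:
        if i in xreg_list:
            offsets[ABI_NAME[i]] = addr
            addr -= nbytes
    return offsets
```

For example, pushing {ra, s0, s1} on RV32 places s1 at -4, s0 at -8 and ra at -12. So ra, the first register in the list, ends up at the lowest address.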
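The identity behind be0dac268d can be checked on a plain 32-bit word. Exchanging an i8/i16 lane with 0 is the same as AND-ing the containing aligned word with the inverted lane mask, which is what an amoand.w would do. Exchanging with -1 is the same as OR-ing with the mask. This Python sketch models only that arithmetic on a little-endian aligned word. It assumes the -1 case maps to an OR (amoor-style). The function names are hypothetical, not LLVM's.

```python
WORD_MASK = 0xFFFFFFFF


def lane_mask(byte_offset: int, width_bits: int):
    """Mask and shift selecting an i8/i16 lane inside an aligned 32-bit word
    (little-endian, as on RISC-V)."""
    shift = byte_offset * 8
    return ((1 << width_bits) - 1) << shift, shift


def xchg_zero(word: int, byte_offset: int, width_bits: int):
    """Model 'atomicrmw xchg lane, 0' as word & ~mask (amoand-style).
    Returns (new_word, old_lane_value)."""
    mask, shift = lane_mask(byte_offset, width_bits)
    new_word = word & ~mask & WORD_MASK
    return new_word, (word & mask) >> shift


def xchg_all_ones(word: int, byte_offset: int, width_bits: int):
    """Model 'atomicrmw xchg lane, -1' as word | mask (amoor-style).
    Returns (new_word, old_lane_value)."""
    mask, shift = lane_mask(byte_offset, width_bits)
    new_word = word | mask
    return new_word, (word & mask) >> shift
```

The appeal is that each case becomes a single AMO on the containing word, rather than the LR/SC loop a general masked exchange would need.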