llvm-project

Author	SHA1	Message	Date
Min-Yih Hsu	1e39575a98	[RISCV] CSE by swapping conditional branches (#71111 ) DAGCombiner, as well as InstCombine, tend to canonicalize GE/LE into GT/LT, namely: ``` X >= C --> X > (C - 1) ``` which sometimes generates off-by-one constants that could have been CSE'd with surrounding constants. Instead of changing such canonicalization, this patch tries to swap those branch conditions post-isel, in the hope of resurfacing more constant CSE opportunities. More specifically, it performs the following optimization: For two constants C0 and C1 from ``` li Y, C0 li Z, C1 ``` To remove the redundant `li Y, C0`, 1. if C1 = C0 + 1 we can turn: (a) blt Y, X -> bge X, Z (b) bge Y, X -> blt X, Z 2. if C1 = C0 - 1 we can turn: (a) blt X, Y -> bge Z, X (b) bge X, Y -> blt Z, X This optimization will be done by PeepholeOptimizer through RISCVInstrInfo::optimizeCondBranch.	2023-11-03 09:03:52 -07:00
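The four branch rewrites listed in 1e39575a98 can be sanity-checked by brute force. The following is a minimal sketch, not code from the patch: `blt`, `bge`, and `swaps_are_equivalent` are hypothetical helpers that model the signed compare-and-branch conditions with plain Python integers, and it assumes C0 + 1 and C0 - 1 do not overflow the register width.

```python
def blt(a, b):
    """Model of 'blt a, b': branch taken when a < b (signed)."""
    return a < b


def bge(a, b):
    """Model of 'bge a, b': branch taken when a >= b (signed)."""
    return a >= b


def swaps_are_equivalent(c0, xs):
    """Check all four rewrites for Y = c0 against every X in xs."""
    y = c0
    ok = True

    # Case 1: Z holds C1 = C0 + 1.
    z = c0 + 1
    ok &= all(blt(y, x) == bge(x, z) for x in xs)  # (a) blt Y, X -> bge X, Z
    ok &= all(bge(y, x) == blt(x, z) for x in xs)  # (b) bge Y, X -> blt X, Z

    # Case 2: Z holds C1 = C0 - 1.
    z = c0 - 1
    ok &= all(blt(x, y) == bge(z, x) for x in xs)  # (a) blt X, Y -> bge Z, X
    ok &= all(bge(x, y) == blt(z, x) for x in xs)  # (b) bge X, Y -> blt Z, X
    return ok


if __name__ == "__main__":
    print(all(swaps_are_equivalent(c0, range(-50, 51)) for c0 in range(-20, 21)))
```

Each rewrite replaces a use of Y with a use of Z, which is what lets the `li Y, C0` become dead once Z is already live.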
Philip Reames	f6f769203d	[tests] Autogenerate a couple of tests As usual, making it easier for an upcoming test delta to be seen. Note that several of these are examples of extremely bad testing practice. Checking internal debug output (for no real purpose), and checking the result of a fully O2 + llc run instead of reducing the specific problematic pass.	2023-11-03 08:42:23 -07:00
Paul Walker	de88371d9d	[LLVM][AArch64] Add ASM constraints for reduced GPR register ranges. (#70970 ) The patch adds the following ASM constraints: Uci => w8-w11/x8-x11 Ucj => w12-w15/x12-x15 These constraints are required for SME load/store instructions where a reduced set of GPRs are used to specify ZA array vectors. NOTE: GCC has agreed to use the same constraint syntax.	2023-11-03 15:34:45 +00:00
Jessica Del	6e4692c9ee	[AMDGPU] - Add s_wqm intrinsics (#71048 ) Add intrinsics to generate `s_wqm_b32` and `s_wqm_b64`. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-11-03 14:48:59 +01:00
Paul Walker	17970df6dc	[LLVM][SVE] Move ADDVL isel patterns under UseScalarIncVL feature flag. (#71173 ) Also removes a duplicate pattern.	2023-11-03 13:23:02 +00:00
Nikita Popov	e4a4122eb6	[IR] Remove zext and sext constant expressions (#71040 ) Remove support for zext and sext constant expressions. All places creating them have been removed beforehand, so this just removes the APIs and uses of these constant expressions in tests. There is some additional cleanup that can be done on top of this, e.g. we can remove the ZExtInst vs ZExtOperator footgun. This is part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.	2023-11-03 10:46:07 +01:00
Nikita Popov	060de415af	Reapply [InstCombine] Simplify and/or of icmp eq with op replacement (#70335 ) Relative to the first attempt, this contains two changes: First, we only handle the case where one side simplifies to true or false, instead of calling simplification recursively. The previous approach would return poison if one operand simplified to poison (under the equality assumption), which is incorrect. Second, we do not fold llvm.is.constant in simplifyWithOpReplaced(). We may be assuming that a value is constant, if the equality holds, but it may not actually be constant. This is nominally just a QoI issue, but the std::list implementation in libstdc++ relies on the precise behavior in a way that causes miscompiles. ----- and/or in logical (select) form benefit from generic simplifications via simplifyWithOpReplaced(). However, the corresponding fold for plain and/or currently does not exist. Similar to selects, there are two general cases for this fold (illustrated with `and`, but there are `or` conjugates). The basic case is something like `(a == b) & c`, where the replacement of a with b or b with a inside c allows it to fold to true or false. Then the whole operation will fold to either false or `a == b`. The second case is something like `(a != b) & c`, where the replacement inside c allows it to fold to false. In that case, the operand can be replaced with c, because in the case where a == b (and thus the icmp is false), c itself will already be false. As the test diffs show, this catches quite a lot of patterns in existing test coverage. This also obsoletes quite a few existing special-case and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst), but I haven't removed anything as part of this patch in the interest of risk mitigation. Fixes #69050. Fixes #69091.	2023-11-03 10:16:15 +01:00
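The two fold shapes described in 060de415af can be illustrated with concrete predicates. This is a minimal sketch under assumptions, not the InstCombine implementation: the choices `c = (a <= b)` and `c = (a < b)` are hypothetical examples picked so that replacing `a` with `b` inside `c` folds to true and false respectively.

```python
from itertools import product

VALUES = range(-3, 4)  # small illustrative domain


def case_eq(a, b):
    """(a == b) & c with c = (a <= b); replacing a with b gives b <= b, i.e. true."""
    return (a == b) and (a <= b)


def case_eq_folded(a, b):
    """Basic case: the whole operation folds to a == b."""
    return a == b


def case_ne(a, b):
    """(a != b) & c with c = (a < b); replacing a with b gives b < b, i.e. false."""
    return (a != b) and (a < b)


def case_ne_folded(a, b):
    """Second case: the operation is replaced by c alone."""
    return a < b


def folds_hold():
    """Exhaustively compare original and folded forms on the domain."""
    return all(
        case_eq(a, b) == case_eq_folded(a, b)
        and case_ne(a, b) == case_ne_folded(a, b)
        for a, b in product(VALUES, repeat=2)
    )


if __name__ == "__main__":
    print(folds_hold())
```

In the second case the `a != b` operand is redundant because `c` is already false whenever `a == b`, which matches the reasoning in the commit message.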
Brandon Wu	74f38df1d1	[RISCV] Support Xsfvfnrclipxfqf extensions (#68297 ) FP32-to-int8 Ranged Clip Instructions https://sifive.cdn.prismic.io/sifive/0aacff47-f530-43dc-8446-5caa2260ece0_xsfvfnrclipxfqf-spec.pdf	2023-11-03 10:52:37 +08:00
Brandon Wu	945d2e6e60	[RISCV] Support Xsfvfwmaccqqq extensions (#68296 ) Bfloat16 Matrix Multiply Accumulate Instruction https://sifive.cdn.prismic.io/sifive/c391d53e-ffcf-4091-82f6-c37bf3e883ed_xsfvfwmaccqqq-spec.pdf	2023-11-03 10:08:26 +08:00
Michael Maitland	801a30aa8f	[CodeGen][MIR] Support parsing of scalable vectors in MIR (#70893 ) This patch builds on the support for vectors by adding ability to parse scalable vectors in MIR and updates error messages to reflect that ability.	2023-11-02 21:49:18 -04:00
Brandon Wu	65dc96c2cf	[RISCV] Fix wrong implication for zvknhb. (#66860 )	2023-11-03 09:32:21 +08:00
Zhaoxuan Jiang	1f54ef78d5	[AArch64] Only clear kill flags if necessary when merging str (#69680 ) Previously the kill flags of the source register were unconditionally cleared when a `str` pair was merged, which results in suboptimal register allocation and inhibits some renaming opportunities which may allow further merging `str`.	2023-11-02 17:03:21 -07:00
Kai Luo	7b5505b0d5	[PowerPC] Change registers used in test due to ABI breakage. NFC. (#70758 ) Usage of `r30` and `r31` has broken current traceback table's convention on AIX. Avoid using CSRs in livein list.	2023-11-03 07:08:33 +08:00
Craig Topper	9769026858	[RISCV] Add (i32 (and GPR:, TrailingOnesMask:)) pattern for RV64 with legal i32.	2023-11-02 15:03:05 -07:00
Craig Topper	014390d937	[RISCV] Implement cross basic block VXRM write insertion. (#70382 ) This adds a new pass to insert VXRM writes for vector instructions, with the goal of avoiding redundant writes. The pass runs two dataflow algorithms. The first is a forward dataflow to calculate where a VXRM value is available. The second is a backward dataflow to determine where a VXRM value is anticipated. Finally, we use the results of these two dataflows to insert VXRM writes where a value is anticipated, but not available. The pass does not split critical edges so we aren't always able to eliminate all redundancy. The pass will only insert VXRM writes on paths that always require it.	2023-11-02 14:09:27 -07:00
qcolombet	839f1e40b1	[X86][SDAG] Improve the lowering of `s|uitofp i8|i16 to half` (#70834 ) Prior to this patch, vector `s|uitofp` from narrow types (`<= i16`) were scalarized when the hardware doesn't support fp16 conversions natively. This patch fixes that by avoiding using `i16` as an intermediate type when there is no hardware support for conversion from this type to half. In other words, when the target doesn't support `avx512fp16`, we avoid using intermediate `i16` vectors for `s|uitofp` conversions. Instead we extend the narrow type to `i32`, which will be converted to `float` and downcast to `half`. Put differently, we go from: ``` s|uitofp iNarrow %src to half ``` To ``` %tmp = s|zext iNarrow %src to i32 %tmpfp = s|uitofp i32 %tmp to float fptrunc float %tmpfp to half ``` Note that this patch: - Doesn't change the actual lowering of i32 to half. I.e., the `float` intermediate step and the final downcasting are what existed for this input type to half. - Changes only the intermediate type for the lowering of `s|uitofp`. I.e., the first `s|zext` from i16 to i32. Remark: The vector and scalar lowering of `s|uitofp` don't use the same code path. Not super happy about that, but I'm not planning to fix that, at least in this PR. This fixes https://github.com/llvm/llvm-project/issues/67080	2023-11-02 21:25:36 +01:00
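The three-step sequence in 839f1e40b1 (extend to `i32`, convert to `float`, truncate to `half`) yields the same value as a direct integer-to-half conversion, because every 8-bit or 16-bit integer is exactly representable in `float` and so only the final truncation rounds. The following is a minimal sketch of that argument using Python's `struct` module to emulate IEEE single and half precision; `to_f32`, `to_f16`, `lower_via_i32`, and `direct` are hypothetical helper names, and the check is restricted to signed 16-bit inputs because larger unsigned values exceed the finite range of `half`.

```python
import struct


def to_f32(x):
    """Round a Python float to IEEE binary32 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f16(x):
    """Round a Python float to IEEE binary16 and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def lower_via_i32(v):
    """Emulate: sext/zext to i32, s|uitofp i32 to float, fptrunc float to half."""
    tmp = int(v)                 # extension keeps the integer value
    tmpfp = to_f32(float(tmp))   # exact for any 16-bit integer
    return to_f16(tmpfp)         # the only rounding step


def direct(v):
    """Reference: convert the integer straight to half."""
    return to_f16(float(v))


if __name__ == "__main__":
    print(all(lower_via_i32(v) == direct(v) for v in range(-32768, 32768)))
```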
Amara Emerson	d62c6ad2b0	Fix more RISCV GISel tests using -march instead of -mtriple	2023-11-02 12:42:00 -07:00
Nico Weber	6acd1671e6	Revert "[AMDGPU] Generate wwm-reserved.ll (NFC)" This reverts commit b3523d7e6d8834468cfcb66e629adbe17da90ea5. Breaks tests on mac, see: https://github.com/llvm/llvm-project/commit/b3523d7e6d88344#commitcomment-131547708	2023-11-02 14:55:41 -04:00
Simon Pilgrim	4c41e7ce20	[X86] Add the second test case mentioned on Issue #65895	2023-11-02 16:19:34 +00:00
Ramkumar Ramachandra	5e1d81ac68	LegalizeIntegerTypes: implement PromoteIntRes for xrint (#71055 ) Recently, 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering) introduced vector variants of llvm.lrint, llvm.llrint, and bundled several tests along with the code change. However, it forgot to test lrint and llrint on fixed vectors on RISC-V, and it turns out that that fixed-vectors-lrint.ll requires PromoteIntRes_XRINT to be implemented. Implement it, and add tests for fixed-vector lrint, llrint.	2023-11-02 15:53:56 +00:00
Paul Walker	c95253b1ba	[LLVM][SVE] Clean VLS tests to not use wide vectors as function return types.	2023-11-02 12:41:37 +00:00
Jay Foad	b90cfe4601	[AMDGPU] New ttracedata intrinsics (#70235 ) Add llvm.amdgcn.s.ttracedata and llvm.amdgcn.s.ttracedata.imm which map directly to the corresponding instructions s_ttracedata and s_ttracedata_imm. These are inherently whole-wave operations so any non-uniform inputs are readfirstlaned.	2023-11-02 10:35:15 +00:00
Jay Foad	65bad23e43	[AMDGPU] Fix test for #70532 (Implement moveToVALU for S_CSELECT_B64)	2023-11-02 10:31:02 +00:00
Jay Foad	1590cac494	[AMDGPU] Implement moveToVALU for S_CSELECT_B64 (#70352 ) moveToVALU previously only handled S_CSELECT_B64 in the trivial case where it was semantically equivalent to a copy. Implement the general case using V_CNDMASK_B64_PSEUDO and implement post-RA expansion of V_CNDMASK_B64_PSEUDO with immediate as well as register operands.	2023-11-02 10:08:09 +00:00
Jessica Del	41cf94e6b8	[AMDGPU] - Add s_quadmask intrinsics (#70804 ) Add intrinsics to generate `s_quadmask_b32` and `s_quadmask_b64`. Support VGPR arguments by inserting a `v_readfirstlane`.	2023-11-02 10:37:52 +01:00
Thomas Symalla	18839aec4e	[AMDGPU] Detect kills in register sets when trying to form V_CMPX instructions. (#68293 ) During the SIOptimizeExecMasking pass, we try to form V_CMPX instructions by detecting S_AND_SAVEEXEC and V_MOV instructions. Generally, we require the input operand of the V_MOV, which is the input operand to the to-be-formed V_CMPX, to be alive. This is forced by clearing the kill flags on the operand after V_CMPX has been generated. However, if we have a kill of a register set that contains said register, this will not be detected by clearKillFlags. With this change, possible additional kill-flag candidates will be detected during the final call to findInstrBackwards and then, the kill flag will be removed to keep all registers in the set alive. Co-authored-by: Thomas Symalla <thomas.symalla@amd.com>	2023-11-02 10:36:27 +01:00
Carl Ritson	b3523d7e6d	[AMDGPU] Generate wwm-reserved.ll (NFC)	2023-11-02 17:50:42 +09:00
Carl Ritson	0eb516817d	[AMDGPU] Remove dom tree requirements from SIWholeQuadMode pass (#71012 ) SIWholeQuadMode preserves dominator and post dominator trees, but does not require them.	2023-11-02 17:16:19 +09:00
Matt Arsenault	5a9b99630b	X86: Move ExpandLargeFpConvert tests to test/Transforms	2023-11-02 15:50:31 +09:00
Matt Arsenault	d636d73f94	Revert "Move ExpandLargeDivRem to llvm/test/CodeGen/X86 because they need a triple" This reverts commit 6bf1b4e8e0776e6f27013434d8b632016ccc795c. Requiring a triple does not require moving these to a codegen test directory. Move these to an x86 specific subdirectory of a transforms test.	2023-11-02 14:38:12 +09:00
Tobias Stadler	ba0763e4cb	[GlobalISel][M68k] Update test after 373c343 Missed test case in experimental target, which was not covered by pre-merge checks.	2023-11-02 03:32:47 +01:00
Tobias Stadler	373c343a77	Reland: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND Reland 3686a0b after fixing an exposed miscompile in #68840 Differential Revision: https://reviews.llvm.org/D159140	2023-11-02 00:18:19 +01:00
Valery Pykhtin	e808f8a616	[AMDGPU] GCNRegPressurePrinter pass to print GCNRegPressure values for testing. (#70031 ) Using GCNDownwardRPTracker or GCNUpwardRPTracker the pass collects register pressure values for a function and prints these values next to instructions. Output can be used to generate Filecheck rules in mir tests.	2023-11-01 23:01:39 +01:00
Craig Topper	cfb791aa4b	[RISCV] Add RV64 i32 patterns for bseti/bclri/binvi. Needed for -riscv-experimental-rv64-legal-i32 and probably GISel.	2023-11-01 13:30:47 -07:00
Jay Foad	86f2e09250	[AMDGPU] Tweak handling of GlobalAddress operands in SI_PC_ADD_REL_OFFSET (#70960 ) When SI_PC_ADD_REL_OFFSET is expanded to S_GETPC/S_ADD/S_ADDC, the GlobalAddress operands have to be adjusted by 4 or 12 bytes to account for the offset from the end of the S_GETPC instruction to the literal operands. Do this all in SIInstrInfo::expandPostRAPseudo instead of duplicating the adjustment code in both AMDGPULegalizerInfo and SITargetLowering. NFCI.	2023-11-01 19:48:30 +00:00
Craig Topper	c4649d05cf	[RISCV] Teach RISCVOptWInstrs that 'bset x0, 30-0' satisfies isSignExtendingOpW. Constant materialization can use bset x0, 11 to create 2048.	2023-11-01 12:29:37 -07:00
Fangrui Song	a62b86a3e6	[AArch64,ELF] Restrict MOVZ/MOVK to non-PIC large code model (#70178 ) There is no PIC support for -mcmodel=large (https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst) and Clang recently rejects -mcmodel= with PIC (#70262). The current backend code assumes that the large code model is non-PIC. This patch adds `!getTargetMachine().isPositionIndependent()` conditions to clarify that the support is non-PIC only. In addition, add some tests as change detectors in case PIC large code model is supported in the future. If other front-ends/JITs use the large code model with PIC, they will get small code model code sequence, instead of potentially-incorrect MOVZ/MOVK sequence, which is only suitable for non-PIC. The sequence will cause text relocations using ELF linkers. (The small code model code sequence is usually sufficient as ADRP+ADD or ADRP+LDR targets [-2^32, 2^32), which has a doubled range of x86-64 R_X86_64_REX_GOTPCRELX/R_X86_64_PC32 [-2^31, 2^31).)	2023-11-01 12:10:44 -07:00
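The range comparison at the end of a62b86a3e6 follows from the instruction encodings. As a minimal sketch, assuming the usual AArch64 encoding in which ADRP carries a signed 21-bit immediate counted in 4 KiB pages, and a signed 32-bit displacement for the x86-64 PC-relative relocations (the constant and function names below are illustrative, not taken from LLVM):

```python
ADRP_IMM_BITS = 21   # assumption: signed page-count immediate of ADRP
PAGE_SHIFT = 12      # 4 KiB pages
PC32_BITS = 32       # signed 32-bit PC-relative displacement


def signed_range(bits):
    """Half-open range [lo, hi) of a two's-complement value with `bits` bits."""
    return -(1 << (bits - 1)), 1 << (bits - 1)


def adrp_range():
    """Byte range reachable by ADRP, relative to the page of the PC."""
    lo, hi = signed_range(ADRP_IMM_BITS)
    return lo << PAGE_SHIFT, hi << PAGE_SHIFT


def pc32_range():
    """Byte range reachable by a 32-bit PC-relative displacement."""
    return signed_range(PC32_BITS)


if __name__ == "__main__":
    a_lo, a_hi = adrp_range()
    p_lo, p_hi = pc32_range()
    print((a_hi - a_lo) // (p_hi - p_lo))  # ratio of the two spans
```

The low 12 bits supplied by the following ADD or LDR select a byte within the target page, so they do not extend the page-level reach computed here.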
Craig Topper	5570d3250f	[RISCV] Don't promote i32 and/or/xor with -riscv-experimental-rv64-legal-i32. Some test improvements, but also some regressions that need to be fixed.	2023-11-01 11:36:46 -07:00
Craig Topper	8912200966	[RISCV] Add experimental support for making i32 a legal type on RV64 in SelectionDAG. (#70357 ) This will select i32 operations directly to W instructions without custom nodes. Hopefully this can allow us to be less dependent on hasAllNBitUsers to recover i32 operations in RISCVISelDAGToDAG.cpp. This support is enabled with a command line option that is off by default. Generated code is still not optimal. I've duplicated many test cases for this, but it's not complete. Enabling this runs all existing lit tests without crashing.	2023-11-01 09:36:41 -07:00
Sander de Smalen	7dc20abed0	[AArch64] Fix spillfill-sve.mir with expensive checks. This fixes an issue introduced by PR #70679. Using constrainRegClass() is not strong enough to actually force the use of a register to be a PPR register class. It will need an actual COPY to do the conversion. The downside is that this introduces an extra register, which is an issue we may want to fix at a later point using a custom copy operation where the register allocator uses the same register when it can.	2023-11-01 16:29:44 +00:00
Shengchen Kan	860f9e5170	[NFC][X86] Reorder the registers to reduce unnecessary iterations (#70222 ) * Introduce field `PositionOrder` for class `Register` and `RegisterTuples` * If register A's `PositionOrder` < register B's `PositionOrder`, then A is placed before B in the enum in X86GenRegisterInfo.inc * The new order of registers in the enum for X86 will be 1. Registers before AVX512, 2. AVX512 registers (X/YMM16-31, ZMM0-31, K registers) 3. AMX registers (TMM) 4. APX registers (R16-R31) * Add a new target hook `getNumSupportedRegs()` to return the number of registers for the function (may overestimate). * Replace `getNumRegs()` with `getNumSupportedRegs()` in LiveVariables to eliminate iterations on unsupported registers This patch can reduce 0.3% instruction count regression for sqlite3 during compile-stage (O3) by not iterating on APX registers for #67702	2023-11-02 00:12:05 +08:00
Nikita Popov	57384aeb37	[ConstantFold] Avoid creating undesirable cast expressions Similar to what we do for binops, for undesirable casts we should call the constant folding API instead of the constant expr API, to avoid indirect creation of undesirable cast ops.	2023-11-01 16:50:52 +01:00
Nikita Popov	7a5c14cb27	[X86] Regenerate test checks (NFC)	2023-11-01 16:37:30 +01:00
Sander de Smalen	2efea512c2	[AArch64] Fix spilling/filling of virtual registers in PNR regclass. (#70679 ) We made the assumption that the registers were always physical registers, which doesn't have to be true.	2023-11-01 10:57:12 +00:00
Simon Pilgrim	f471f6ff2f	[X86] combineTruncateWithSat - relax minimum truncation size for PACKSS/PACKUS truncateVectorWithPACK handling of sub-128-bit result types was improved some time ago, so remove the old 64-bit limit Fixes #68466	2023-11-01 10:33:35 +00:00
Simon Pilgrim	432e11478a	[X86] fpclamptosat_vec.ll - add AVX2/AVX512 test coverage	2023-11-01 10:04:28 +00:00
Simon Pilgrim	dc5e6e4c07	[X86] Add fpclamptosat to vXi8 test coverage Adds additional test coverage for Issue #68466	2023-11-01 10:04:28 +00:00
Craig Topper	2862d17b30	[RISCV][GISel] Add test case for FP load/store legalization. NFC	2023-10-31 23:49:56 -07:00
Qiu Chaofan	b46e768455	[DAGCombine] Fold setcc_eq infinity into is.fpclass (#67829 )	2023-11-01 11:51:15 +09:00
Min-Yih Hsu	87f671756d	[RISCV] Use FLI + FNEG to materialize some negative FP constants (#70825 ) Most of the FP constants supported by FLI are positive. For negative FP constants X whose positive value is supported by FLI, we can use `(FNEG (FLI -X))` to materialize X.	2023-10-31 17:52:50 -07:00
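The selection rule in 87f671756d can be sketched as a small lookup. This is an illustration only, not the RISC-V backend code: `FLI_CONSTANTS` is a hypothetical, deliberately incomplete subset of the constants FLI can load, and `materialize` is a made-up helper that returns the instruction sequence as tuples.

```python
# Hypothetical subset of FLI-loadable constants, for illustration only.
FLI_CONSTANTS = {-1.0, 0.5, 1.0, 2.0}


def materialize(x):
    """Return a list of pseudo-ops that produce x, or None if neither form applies."""
    if x in FLI_CONSTANTS:
        # The constant is directly loadable.
        return [("fli", x)]
    if x < 0 and -x in FLI_CONSTANTS:
        # (FNEG (FLI -X)): load the positive counterpart, then negate it.
        return [("fli", -x), ("fneg",)]
    return None


if __name__ == "__main__":
    for value in (2.0, -0.5, -1.0, 3.7):
        print(value, materialize(value))
```

A constant that FLI supports directly, such as -1.0 in this subset, takes the single-instruction path, so the FNEG form is used only when it is actually needed.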


52796 Commits