llvm-project

Author	SHA1	Message	Date
Matthew Devereau	23ea98f155	[AArch64][SVE2] Do not emit RSHRNB for large shifts (#66672 ) rshrnb's shift amount operand must be between 1-EltSizeInBits. This patch stops RSHRNB ISD nodes being emitted in this case	2023-09-22 10:36:54 +01:00
Ivan Kosarev	469b3bfad2	[AMDGPU] Add True16 register classes. Reviewed By: rampitec, Joe_Nash Differential Revision: https://reviews.llvm.org/D156099	2023-09-22 10:17:02 +01:00
Fangrui Song	4389252c58	Revert "[DAG] getNode() - remove oneuse limit from (zext (trunc (assertzext x))) -> (assertzext x) fold" This reverts commit 05926a5a557878aa233ac8431b3acddf54422e58. Caused AArch64 crash #12 0x00007f09eec09181 skipExtensionForVectorMULL(llvm::SDNode, llvm::SelectionDAG&) #13 0x00007f09eec08289 llvm::AArch64TargetLowering::LowerMUL(llvm::SDValue, llvm::SelectionDAG&) const #14 0x00007f09eec1a3fd llvm::AArch64TargetLowering::LowerOperation(llvm::SDValue, llvm::SelectionDAG&) const #15 0x00007f09dc8586a7 (anonymous namespace)::VectorLegalizer::LowerOperationWrapper(llvm::SDNode, llvm::SmallVectorImpl<llvm::SDValue>&)	2023-09-22 00:14:31 -07:00
David Green	22f423aa46	[ARM] Add some extra testing for MVE postinc loops. NFC	2023-09-22 07:08:49 +01:00
Amara Emerson	985362e2f3	[AArch64][GlobalISel] Avoid running the shl(zext(a), C) -> zext(shl(a, C)) combine. (#67045 )	2023-09-22 09:37:52 +08:00
Amara Emerson	ddddf7f35e	[AArch64][GlobalISel] Split offsets of consecutive stores to aid STP … (#66980 )	2023-09-22 09:35:43 +08:00
Arthur Eubanks	9b6b2a0cec	[X86] Use RIP-relative for non-globals in medium code model in classifyLocalReference() (#67070 ) We only want to treat globals as potentially far away, not other things like constants in the constant pool. This matches the object file emission that only puts the large section flag on globals. Remove FIXME since the remaining differences are accesses to 0 sized globals which are intentional.	2023-09-21 16:50:33 -07:00
Artem Belevich	d06b3e3b6a	[NVPTX] improve lowering for common byte-extraction operations. (#66945 ) Some critical code paths we have depend on efficient byte extraction from data loaded as integers. By default LLVM tries to extract bytes by storing/loading from stack, which is very inefficient on GPU.	2023-09-21 13:48:54 -07:00
Matthew Devereau	b967f3a1d7	[AArch64] Separate PNR into its own Register Class (#65306 ) This patch separates PNR registers into their own register class instead of sharing a register class with PPR registers. This primarily allows us to return more accurate register classes when applying assembly constraints, but also more protection from supplying an incorrect predicate type to an invalid register operand.	2023-09-21 19:53:16 +01:00
Momchil Velikov	3769aaaf1f	[AArch64] Pre-commit some tests for D152828 (NFC) Generate a few of the relevant tests with `update_llc_test_checks.py` and pre-commit. Makes it easier to spot the differences in D152828. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D157116	2023-09-21 18:49:14 +01:00
Momchil Velikov	0eb0a65d0f	[AArch64] Correctly determine if {ADD,SUB}{W,X}rs instructions are cheap These are marked to be "as cheap as a move". According to publicly available Software Optimization Guides, they have one cycle latency and maximum throughput only on some microarchitectures, only for `LSL` and only for some shift amounts. This patch uses the subtarget feature `FeatureALULSLFast` to determine how cheap the instructions are. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D152827 Change-Id: I8f0d7e79bcf277ebf959719991c29a1bc7829486	2023-09-21 18:44:24 +01:00
Momchil Velikov	ededcb0041	[AArch64] Refactor AArch64InstrInfo::isAsCheapAsAMove (NFC) - remove `FeatureCustomCheapAsMoveHandling`: when you have target features affecting `isAsCheapAsAMove` that can be given on the command line or passed via attributes, then every sub-target effectively has custom handling - remove special handling of `FMOVD0`/etc: `FVMOV` with an immediate zero operand is never[1] more expensive than an `FMOV` with a register operand. - remove special handling of `COPY` - copy is trivially as cheap as itself - make the function default to the `MachineInstr` attribute `isAsCheapAsAMove` - remove special handling of `ANDWrr`/etc and of `ANDWri`/etc: the fallback `MachineInstr` attribute is already non-zero. - remove special handling of `ADDWri`/`SUBWri`/`ADDXri`/`SUBXri` - these always[1] have one cycle latency with maximum (for the micro-architecture) throughput - check if `MOVi32Imm`/`MOVi64Imm` can be expanded into a "cheap" sequence of instructions There is a little twist with determining whether a `MOVi32Imm`/`MOVi64Imm` is "as-cheap-as-a-move". Even if one of these pseudo-instructions needs to be expanded to more than one MOVZ, MOVN, or MOVK instruction, materialisation may be preferable to allocating a register to hold the constant. For the moment a cutoff at two instructions seems like a reasonable compromise. [1] according to 19 software optimisation manuals Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D154722	2023-09-21 18:30:01 +01:00
Sirish Pande	e6f9483f77	[SelectionDAG] Flags are dropped when creating a new FMUL (#66701 ) While simplifying some vector operators in DAG combine, we may need to create new instructions for simplified vectors. At that time, we need to make sure that all the flags of the new instruction are copied/modified from the old instruction. If "contract" is dropped from an instruction like FMUL, it may not generate FMA instruction which would impact performance. Here's an example where "contract" flag is dropped when FMUL is created. Replacing.2 t42: v2f32 = fmul contract t41, t38 With: t48: v2f32 = fmul t38, t38 Co-authored-by: Sirish Pande <sirish.pande@amd.com>	2023-09-21 10:26:34 -05:00
Luke Lau	b5ff71e261	[RISCV] Shrink vslideup's LMUL when lowering fixed insert_subvector (#65997 ) Similar to #65598, if we're using a vslideup to insert a fixed length vector into another vector, then we can work out the minimum number of registers it will need to slide up across given the minimum VLEN, and shrink the type operated on to reduce LMUL accordingly. This is somewhat dependent on #66211 , since it introduces a subregister copy that triggers a crash with -early-live-intervals in one of the tests. Stacked upon #66211	2023-09-21 13:55:49 +01:00
David Spickett	1778d6802b	Revert "[AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG" This reverts commit fb8f59156f0f208f6192ed808fc223eda6c0e7ec and b8e9450acb1ad10d002a85b7dafa9d14c764478f. Due to test suite failures on AArch64: https://lab.llvm.org/buildbot/#/builders/183/builds/16057	2023-09-21 12:33:11 +00:00
Jeffrey Byrnes	acb4854563	[AMDGPU] Precommit test for D159533 (#66965 ) Precommit test ahead of https://reviews.llvm.org/D159533 for ISD::FSHR / AMDGPUISD::PERM combine	2023-09-21 12:17:59 +01:00
Simon Pilgrim	05926a5a55	[DAG] getNode() - remove oneuse limit from (zext (trunc (assertzext x))) -> (assertzext x) fold Noticed on D159533 and I've finally dealt with the x86 regressions - MatchingStackOffset wasn't peeking through AssertZext nodes while trying to find CopyFromReg/Load sources, it was only removing them if they were part of a (trunc (assertzext x)) pattern.	2023-09-21 12:07:49 +01:00
Paulo Matos	0495cd89fc	[UpdateTestChecks] Add support for SPIRV in update_llc_test_checks.py (#66213 ) Support for SPIRV added, updated test SPV_INTEL_optnone.ll using the script. Previously https://reviews.llvm.org/D157858	2023-09-21 12:51:42 +02:00
Mirko Brkušanin	ecfdc23dd2	[AMDGPU] Select gfx1150 SALU Float instructions (#66885 )	2023-09-21 12:22:55 +02:00
David Green	af56c4a4cb	[AArch64] Add an aarch64-enable-ext-to-tbl option. NFC This transform has caused a few issues with operations that can naturally be extended. This patch just adds a debug option for disabling the transform, useful for testing cases where it might not be profitable.	2023-09-21 11:20:19 +01:00
Nathan Gauër	6bad175a87	[SPIRV][DX] Share one test between backends (#65975 ) One big issue with DirectXShaderCompiler was test coverage: DXIL and SPIR-V backends had their own tests. When a bug was found in one, the other wasn't always checked. This led to unequal support of HLSL for both backends. We'd like to avoid those issues here, hence the test-sharing. By default, all the tests in this folder are marked as requiring DirectX. But as SPIR-V support grows, each test drops this requirement and checks the SPIR-V behavior. I would have preferred to mark new tests as XFAIL for SPIR-V by default, so we could differentiate real unsupported tests (as SPIR-V has no equivalent) from newly added tests. But the way LIT is built, I don't think this is possible. --------- Signed-off-by: Nathan Gauër <brioche@google.com>	2023-09-21 12:15:55 +02:00
Pierre van Houtryve	fe2f67e4ba	[AMDGPU] Remove Code Object V2 (#65715 ) Code Object V2 has been deprecated for more than a year now. We can safely remove it from LLVM. - [clang] Remove support for the `-mcode-object-version=2` option. - [lld] Remove/refactor tests that were still using COV2 - [llvm] Update AMDGPUUsage.rst - Code Object V2 docs are left for informational purposes because those code objects may still be supported by the runtime/loaders for a while. - [AMDGPU] Remove COV2 emission capabilities. - [AMDGPU] Remove `MetadataStreamerYamlV2` which was only used by COV2 - [AMDGPU] Update all tests that were still using COV2 - They are either deleted or ported directly to code object v4 (as v3 is also planned to be removed soon).	2023-09-21 12:00:45 +02:00
Nikita Popov	8b4e29b35d	[X86] Add test for #66984 (NFC)	2023-09-21 09:14:10 +02:00
Craig Topper	cbd4596168	Recommit "[RISCV] Improve contant materialization to end with 'not' if the cons… (#66950 )" With MC test updates. Original commit message We can invert the value and treat it as if it had leading zeroes.	2023-09-20 18:51:51 -07:00
Craig Topper	ea064ba6a2	Revert "[RISCV] Improve contant materialization to end with 'not' if the cons… (#66950 )" This reverts commit a8b8e9476451e125e81bd24fbde6605246c59a0e. Forgot to update MC tests.	2023-09-20 17:05:00 -07:00
Craig Topper	a8b8e94764	[RISCV] Improve contant materialization to end with 'not' if the cons… (#66950 ) …tant has leading ones. We can invert the value and treat it as if it had leading zeroes.	2023-09-20 16:52:51 -07:00
Matheus Izvekov	8b04f1e49a	[X86] fix combineSubSetcc to handle a large constant (#66941 )	2023-09-20 23:17:33 +02:00
Aiden Grossman	3dc2f2618b	[MLGO] Move MBB Profile Dump test to Generic (#66856 ) This patch moves the MBB Profile Dump to ./llvm/test/CodeGen/Generic from ./llvm/test/CodeGen/MlRegAlloc as the profile dump doesn't have anything to do with the ML guided register allocation heuristic.	2023-09-20 11:50:33 -07:00
Fangrui Song	9f4c9b90c9	Revert D155711 "[SimplifyCFG] Hoist common instructions on Switch." This reverts commit 96ea48ff5dcba46af350f5300eafd7f7394ba606. The change may cause Verifier.cpp error "musttail call must precede a ret with an optional bitcast"	2023-09-20 11:49:20 -07:00
Noah Goldstein	6d6314ba64	[DAGCombiner] Extend `combineFMulOrFDivWithIntPow2` to work for non-splat float vecs Do so by extending `matchUnaryPredicate` to also work for `ConstantFPSDNode` types then encapsulate the constant checks in a lambda and pass it to `matchUnaryPredicate`. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154868	2023-09-20 13:28:24 -05:00
Noah Goldstein	47c642f9a0	[DAGCombiner] Fold IEEE `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp Note: This is moving D154678 which previously implemented this in InstCombine. Concerns were brought up that this was de-canonicalizing and really targeting a codegen improvement, so placing in DAGCombiner. This implements: ``` (fmul C, (uitofp Pow2)) -> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa)) (fdiv C, (uitofp Pow2)) -> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa)) ``` The motivation is mostly fdiv where 2^(-p) is a fairly common expression. The patch is intentionally conservative about the transform, only doing so if we: 1) have IEEE floats 2) C is normal 3) add/sub of max(Log2(Pow2)) stays in the min/max exponent bounds. Alive2 can't realistically prove this, but did test float16/float32 cases (within the bounds of the above rules) exhaustively. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154805	2023-09-20 13:28:24 -05:00
Noah Goldstein	32a46919a2	[AMDGPU] Add tests for folding `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp; NFC Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D159405	2023-09-20 13:28:24 -05:00
Noah Goldstein	6ec53b4567	[X86] Add tests for folding `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp; NFC Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D154804	2023-09-20 13:28:24 -05:00
Sirish Pande	cc3491fd45	[SelectionDAG] [NFC] Add pre-commit test for PR66701. (#66796 ) [SelectionDAG] [NFC] Add pre-commit test for PR66701. Co-authored-by: Sirish Pande <sirish.pande@amd.com>	2023-09-20 11:37:18 -05:00
David Green	46a1908c26	[AArch64] Add some tests for setcc known bits fold. NFC	2023-09-20 17:36:01 +01:00
Vladislav Dzhidzhoev	b8e9450acb	Cleanup fallback NOT checks	2023-09-20 18:22:54 +02:00
Vladislav Dzhidzhoev	fb8f59156f	[AArch64][GlobalISel] Adopt dup(load) -> LD1R patterns from SelectionDAG Follow-up of #65630.	2023-09-20 18:22:54 +02:00
Simon Pilgrim	ad762f2a9f	[X86] Regenerate pr39098.ll	2023-09-20 16:58:00 +01:00
Simon Pilgrim	2ec697b4c7	[AMDGPU] Regenerate always-uniform.ll	2023-09-20 16:58:00 +01:00
Natalie Chouinard	47a377d5e0	[SPIRV] Fix OpConstant float and double printing Print OpConstant floats as formatted decimal floating points, with special case exceptions to print infinity and NaN as hexfloats. This change follows from the fixes in https://github.com/llvm/llvm-project/pull/66686 to correct how constant values are printed generally. Differential Revision: https://reviews.llvm.org/D159376	2023-09-20 15:26:41 +00:00
Joe Nash	2c0f2b510c	[AMDGPU] Convert tests rotr.ll and rotl.ll to be auto-generated (#66828 ) and add GFX11 coverage. NFC	2023-09-20 10:32:04 -04:00
Matt Devereau	d297399b35	[AArch64][SME] Enable TPIDR2 lazy-save for za_preserved This change makes callees with the __arm_preserves_za type attribute comply with the dormant state requirements when its caller has the __arm_shared_za type attribute. Several external SME functions also do not need to lazy save. `5e67092434/aapcs64/aapcs64.rst (L1381)` Differential Revision: https://reviews.llvm.org/D159186	2023-09-20 13:34:41 +00:00
Natalie Chouinard	f7bfa583b7	[SPIR-V] Fix 64-bit integer literal printing (#66686 ) Previously, the SPIR-V instruction printer was always printing the first operand of an `OpConstant`'s literal value as one of the fixed operands. This is incorrect for 64-bit values, where the first operand is actually the value's lower-order word and should be combined with the following higher-order word before printing. This change fixes that issue by waiting to print the last fixed operand of `OpConstant` instructions until the variadic operands are ready to be printed, then using `NumFixedOps - 1` as the starting operand index for the literal value operands. Depends on D156049	2023-09-20 09:31:14 -04:00
Luke Lau	450dfab8c3	[RISCV] Add tests where bin ops of splats could be scalarized. NFC (#65747 ) This adds tests for fixed and scalable vectors where we have a binary op on two splats that could be scalarized. Normally this would be scalarized in the middle-end by VectorCombine, but as noted in https://reviews.llvm.org/D159190, this pattern can crop up during CodeGen afterwards. Note that a combine already exists for this, but on RISC-V currently it only works on scalable vectors where the element type == XLEN. See #65068 and #65072	2023-09-20 13:23:56 +01:00
Simon Pilgrim	3b7dfda79d	[X86] Add test cases for gnux32 large constants Issue #55061 Test file showing current codegen for D124406	2023-09-20 12:24:51 +01:00
Simon Pilgrim	170ba6ee12	[X86] combineINSERT_SUBVECTOR - attempt to combine concatenated shuffles If all the concatenated subvectors are target shuffle nodes, then call combineX86ShufflesRecursively to attempt to combine them. Unlike the existing shuffle concatenation in collectConcatOps, this isn't limited to splat cases and won't attempt to concat the source nodes prior to creating the larger shuffle node, so will usually only combine to create cross-lane shuffles. This exposed a hidden issue in matchBinaryShuffle that wasn't limiting v64i8/v32i16 UNPACK nodes to AVX512BW targets.	2023-09-20 12:17:51 +01:00
Simon Pilgrim	0662791a13	[X86] vector-interleaved tests - add AVX512-SLOW/AVX512-FAST common prefixes to reduce duplication These aren't always used but it's a lot more manageable to keep the vector-interleaved files using the same RUN lines wherever possible	2023-09-20 12:17:51 +01:00
Jay Foad	a68c7241ec	[AMDGPU] Run twoaddr tests with -early-live-intervals (#66775 ) Sample test case: %3 = V_FMAC_F32_e32 killed %0, %1, %2, implicit $mode, implicit $exec With LiveVariables this is converted to three-address form just because there is no "killed" flag on %2. To make it do the same thing with LiveIntervals I added a later use of %2: %3 = V_FMAC_F32_e32 killed %0, %1, %2, implicit $mode, implicit $exec S_ENDPGM 0, implicit %2	2023-09-20 08:22:00 +01:00
Yeting Kuo	976df42e6a	[RISCV] Fix bugs about register list of Zcmp push/pop. (#66073 ) The PR does two things. One is to fix an internal compiler error when we need to spill callee saves but none of them is a GPR; the other is to fix the wrong register number when the pushed registers are {ra, s0-s11}.	2023-09-20 15:20:13 +08:00
Dhruv Chawla	3e992d81af	[InferAlignment] Enable InferAlignment pass by default This gives an improvement of 0.6%: https://llvm-compile-time-tracker.com/compare.php?from=7d35fe6d08e2b9b786e1c8454cd2391463832167&to=0456c8e8a42be06b62ad4c3e3cf34b21f2633d1e&stat=instructions:u Differential Revision: https://reviews.llvm.org/D158600	2023-09-20 12:08:52 +05:30

52796 Commits
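
The fold described in commit 47c642f9a0 above can be illustrated outside of LLVM. The following is only a sketch of the underlying bit trick for IEEE-754 binary32, not the DAGCombiner code: the helper names (`f32_to_bits`, `bits_to_f32`, `fmul_by_pow2`, `fdiv_by_pow2`) are hypothetical, and the sketch assumes the conditions listed in the commit message hold (the constant is a normal float and the adjusted exponent stays within bounds). It does not check them.

```python
import struct

MANTISSA_BITS = 23  # width of the binary32 fraction field


def f32_to_bits(x: float) -> int:
    """Reinterpret a value rounded to binary32 as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def fmul_by_pow2(c: float, log2_pow2: int) -> float:
    """Compute c * 2**log2_pow2 by adding to the exponent field.

    Assumes c is a normal binary32 value and the result stays normal.
    """
    return bits_to_f32(f32_to_bits(c) + (log2_pow2 << MANTISSA_BITS))


def fdiv_by_pow2(c: float, log2_pow2: int) -> float:
    """Compute c / 2**log2_pow2 by subtracting from the exponent field.

    Assumes c is a normal binary32 value and the result stays normal.
    """
    return bits_to_f32(f32_to_bits(c) - (log2_pow2 << MANTISSA_BITS))


if __name__ == "__main__":
    print(fmul_by_pow2(1.5, 3))   # same value as 1.5 * 8
    print(fdiv_by_pow2(1.5, 3))   # same value as 1.5 / 8
```

Because the shift only touches the exponent field, the sign and fraction bits pass through unchanged, which is why the commit restricts the transform to normal constants and in-range exponents.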