llvm-project

Author	SHA1	Message	Date
Tim Northover	6b98824a58	AArch64: emit `fcmp ord %a, zeroinitializer` as a single fcmeq. Most "ord" checks need two real-world compares to implement, but this is the canonical form of a "!isnan" check, which is equivalent to comparing the input for equality against itself.	2022-12-07 19:17:30 +00:00
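The equivalence this commit relies on can be checked outside the compiler. Below is a minimal Python sketch, not LLVM code: `is_ordered_with_zero` and `self_equal` are hypothetical helper names that model `fcmp ord %a, 0.0` and a single self-comparison respectively.

```python
import math

def is_ordered_with_zero(a: float) -> bool:
    """Model of `fcmp ord a, 0.0`: true when neither operand is NaN."""
    return not math.isnan(a) and not math.isnan(0.0)

def self_equal(a: float) -> bool:
    """Model of one self-comparison (a == a), which is false only for NaN."""
    return a == a

# Sample inputs covering zeros, finite values, infinities and NaN.
SAMPLES = [0.0, -0.0, 1.5, -3.25, math.inf, -math.inf, math.nan]
RESULTS = [(is_ordered_with_zero(a), self_equal(a)) for a in SAMPLES]
```

Both predicates agree on every sample, which is why a single compare of the input against itself is enough for this form of "ord".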
Amara Emerson	53445f5b1c	[GlobalISel] Add a new G_INVOKE_REGION_START instruction to fix an EH bug. We currently have a bug where the legalizer, when dealing with phi operands, may create instructions in the phi's incoming blocks at points which are effectively dead due to a possible exception throw. Say we have: throwbb: EH_LABEL x0 = %callarg1 BL @may_throw_call EH_LABEL B returnbb bb: %v = phi i1 %true, throwbb, %false.... When legalizing we may need to widen the i1 %true value, and to do that we need to create new extension instructions in the incoming block. Our insertion point currently is the MBB::getFirstTerminator() which puts the IP before the unconditional branch terminator in throwbb. These extensions may never be executed if the call throws, and therefore we need to emit them before the call (but not too early, since our new instruction may need values defined within throwbb as well). throwbb: EH_LABEL x0 = %callarg1 BL @may_throw_call EH_LABEL %true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws! B returnbb bb: %v = phi i32 %true, throwbb, %false.... To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START is a terminator, which tries to model the fact that in the IR, the original invoke inst is actually a terminator as well. By using that as the new insertion point, we make sure to place new instructions on always executing paths. Unfortunately we still need to make the legalizer use a new insertion point API that I've added, since the existing `getFirstTerminator()` method does a reverse walk up the block, and any non-terminator instructions cause it to bail out. To avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new method that does a forward walk instead. Differential Revision: https://reviews.llvm.org/D137905	2022-12-07 10:28:51 -08:00
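The difference between the reverse and forward walks described in this commit can be illustrated with a toy model. This is a sketch under stated assumptions, not the MachineBasicBlock API: `Instr`, `first_terminator_reverse` and `first_terminator_forward` are hypothetical names, and the position of G_INVOKE_REGION_START at the start of the block is assumed for illustration.

```python
from typing import List, NamedTuple

class Instr(NamedTuple):
    name: str
    is_terminator: bool

def first_terminator_reverse(block: List[Instr]) -> int:
    """Walk up from the end of the block and stop at the first non-terminator.

    Returns the index of the first instruction in the trailing run of terminators.
    """
    idx = len(block)
    while idx > 0 and block[idx - 1].is_terminator:
        idx -= 1
    return idx

def first_terminator_forward(block: List[Instr]) -> int:
    """Walk down from the start and return the index of the first terminator."""
    for idx, instr in enumerate(block):
        if instr.is_terminator:
            return idx
    return len(block)

# Toy version of throwbb; the placement of G_INVOKE_REGION_START is assumed.
THROW_BB = [
    Instr("G_INVOKE_REGION_START", True),
    Instr("EH_LABEL", False),
    Instr("x0 = %callarg1", False),
    Instr("BL @may_throw_call", False),
    Instr("EH_LABEL", False),
    Instr("B returnbb", True),
]

REVERSE_IP = first_terminator_reverse(THROW_BB)
FORWARD_IP = first_terminator_forward(THROW_BB)
```

In this model the reverse walk bails out at the non-terminators after the call and returns the branch, while the forward walk finds the region-start marker before the call.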
Anton Sidorenko	f8ed709345	[MachineCombiner] Extend reassociation logic to handle inverse instructions Machine combiner supports generic reassociation only of associative and commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we can extend this generic support to handle patterns like (X + A) - Y => (X - Y) + A, where `-` is the inverse of `+`. This patch adds interface functions to process reassociation patterns of associative/commutative instructions and their inverse variants with minimal changes in backends. Differential Revision: https://reviews.llvm.org/D136754	2022-12-07 13:50:28 +03:00
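The inverse-instruction pattern can be checked with wrapping integer arithmetic. The following Python sketch assumes 32-bit two's-complement operands; the helper names are hypothetical, and the identity is only shown for integers (floating-point reassociation generally needs relaxed FP semantics).

```python
MASK32 = 0xFFFFFFFF

def add32(a: int, b: int) -> int:
    """32-bit wrapping addition."""
    return (a + b) & MASK32

def sub32(a: int, b: int) -> int:
    """32-bit wrapping subtraction, the inverse of add32."""
    return (a - b) & MASK32

def original(x: int, a: int, y: int) -> int:
    """(X + A) - Y"""
    return sub32(add32(x, a), y)

def reassociated(x: int, a: int, y: int) -> int:
    """(X - Y) + A"""
    return add32(sub32(x, y), a)

# A few operand triples, including values that wrap around.
CASES = [(1, 2, 3), (0xFFFFFFFF, 1, 5), (0, 0x80000000, 0x7FFFFFFF), (7, 7, 7)]
```

In the reassociated form, X - Y no longer depends on A, which can shorten the critical path when A is the operand that becomes available last.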
David Sherwood	bfb6f47e9e	[SVE] Change some bfloat lane intrinsics to use i32 immediates Almost all of the other SVE LLVM IR intrinsics take i32 values for lane indices or other immediates. We should bring the bfloat intrinsics in line with that. It will also make it easier to add support for the SVE2.1 float intrinsics in future, since they reuse the same underlying instruction classes. I've maintained backwards compatibility with the old i64 variants and used the autoupgrade mechanism. Differential Revision: https://reviews.llvm.org/D138788	2022-12-07 09:19:54 +00:00
Sanjay Patel	adc7c589c3	[SDAG] try to convert bit set/clear to signbit test when trunc is free (X & Pow2MaskC) == 0 --> (trunc X) >= 0 (X & Pow2MaskC) != 0 --> (trunc X) < 0 This was noted as a regression in the post-commit feedback for D112634 (where we canonicalized IR differently). For x86, this saves a few instruction bytes. AArch64 seems neutral. Differential Revision: https://reviews.llvm.org/D139363	2022-12-06 11:34:48 -05:00
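The bit-test rewrite can be modelled in a few lines. This Python sketch assumes the mask selects a single bit that becomes the sign bit after truncation (bit 7, 15 or 31); `to_signed` and the two predicates are hypothetical helper names.

```python
def to_signed(value: int, bits: int) -> int:
    """Truncate to `bits` bits and reinterpret as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def bit_is_clear_masked(x: int, bit: int) -> bool:
    """(X & Pow2MaskC) == 0"""
    return (x & (1 << bit)) == 0

def bit_is_clear_signbit(x: int, bit: int) -> bool:
    """(trunc X) >= 0, truncating so that `bit` becomes the sign bit."""
    return to_signed(x, bit + 1) >= 0

SAMPLES = [0, 0x7F, 0x80, 0xFF, 0x7FFF, 0x8000, 0x80000000, 0xFFFFFFFF, 0x123456789]
```

The `!= 0` form follows by negating both sides, giving `(trunc X) < 0`.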
Sanjay Patel	772c2f461b	[AArch64][RISCV][x86] add tests for masked val equality with 0; NFC	2022-12-06 11:34:48 -05:00
Sander de Smalen	5922a04dbd	[AArch64][SVE2p1] Make use of REVD instruction. Reversing double-words within a quad-word is possible using the REVD instruction when SVE2p1 is enabled. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D139119	2022-12-06 15:42:32 +00:00
chenglin.bi	5cd900ce3c	[AArch64] Transform shift+and to shift+shift to select more shifted register and (shl/srl/sra, x, c), mask --> shl (srl/sra, x, c1), c2 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D138904	2022-12-06 23:29:56 +08:00
bipmis	bda1f0b96c	Add tests which can be matched to umull	2022-12-06 12:49:05 +00:00
Archibald Elliott	83b3304dd2	[AArch64] Implement __arm_rsr128/__arm_wsr128 This only contains the SelectionDAG implementation. GlobalISel to follow. The broad approach is: - Introduce new builtins for 128-bit wide instructions. - Lower these to @llvm.read_register.i128/@llvm.write_register.i128 - Introduce target-specific ISD nodes which have legal operands (two i64s rather than an i128). These are named AArch64::{MRRS, MSRR} to match the instructions they are for. These are a little complex as they need to match the "shape" of what they're replacing or the legaliser complains. - Select these using the existing tryReadRegister/tryWriteRegister to share the MDString parsing code, and introduce additional code to ensure these are selected into the right MRRS/MSRR instructions. What makes this hard is ensuring that the two i64s end up in an XSeqPair register pair, because SelectionDAG doesn't care that much about register classes if it can avoid doing so. The main change to existing code is the reorganisation of tryReadRegister and tryWriteRegister to try to keep the string parsing code separate from the instruction creating code. This also includes the changes to clang to define and use the ACLE feature macro named `__ARM_FEATURE_SYSREG128`. Contributors: Sam Elliott Lucas Prates Differential Revision: https://reviews.llvm.org/D139086	2022-12-06 11:39:05 +00:00
Ties Stuij	94e7e58fa4	[AArch64] implement GPR (U/S)(MIN/MAX) instruction SDag support Using SelectionDag, lower umin, umax, smin, smax intrinsics to corresponding UMIN, UMAX, SMIN, SMAX instructions when feat CSSC is available. See specs for corresponding immediate and register versions in: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/ Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D138813	2022-12-06 10:57:49 +00:00
Ties Stuij	eaea4608e6	[AArch64] lower abs intrinsic to new ABS instruction in SelDag When feature CSSC is available, the SelectionDag abs intrinsic should map to the new scalar ABS instruction. Additionally, the SIMDTwoScalarD tablegen defm includes a pattern match for scalar i64, which we don't want to use when CSSC is enabled. spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/ABS--Absolute-value- Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D138812	2022-12-06 10:48:21 +00:00
Ties Stuij	2f778e60c9	[AArch64] SelectionDag codegen for gpr CTZ instruction When feature CSSC is available we should use instruction CTZ in SelectionDag where applicable: - CTTZ intrinsics are lowered to using the gpr CTZ instruction - BITREVERSE -> CTLZ instruction pattern gets replaced by CTZ spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CTZ--Count-Trailing-Zeros- Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D138811	2022-12-06 10:42:07 +00:00
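The BITREVERSE -> CTLZ pattern mentioned above computes the same value as a count of trailing zeros. A minimal Python sketch for 32-bit values follows; the helper names are hypothetical, and all three helpers are defined to return 32 for a zero input where applicable.

```python
def bitreverse32(x: int) -> int:
    """Reverse the order of the low 32 bits."""
    return int(f"{x & 0xFFFFFFFF:032b}"[::-1], 2)

def clz32(x: int) -> int:
    """Count leading zeros in a 32-bit value (32 for zero)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()

def ctz32(x: int) -> int:
    """Count trailing zeros in a 32-bit value (32 for zero)."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    return (x & -x).bit_length() - 1

SAMPLES = [0, 1, 2, 8, 0x80000000, 0xFFFFFFFF, 12345, 0x00F00000]
```

Since reversing the bits turns trailing zeros into leading zeros, a single CTZ can stand in for the two-instruction sequence.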
Kristina Bessonova	4e958b4d7c	[llvm-objdump] Avoid using mapping symbols as branch target labels The main motivation for this change is to avoid ambiguity because mapping symbol names may not be unique across a binary and do not allow uniquely identifying target address. As a result, mapping symbols used as branch target labels make llvm-objdump output less readable. Another point is that mapping symbols sometimes appear in non-allocatable sections, such as debug info sections, which makes objdump output even more confusing. For example, a small AArch64 executable may contain plenty of `$d[.*]` symbols and none of them would be useful as a label for resolving a branch or a memory operand target address: ``` 0000000000000254 l .note.ABI-tag 0000000000000000 $d 00000000000008d4 l .eh_frame 0000000000000000 $d 0000000000000868 l .rodata 0000000000000000 $d 0000000000011028 l .data 0000000000000000 $d 0000000000010db8 l .fini_array 0000000000000000 $d 0000000000010db0 l .init_array 0000000000000000 $d 00000000000008e8 l .eh_frame 0000000000000000 $d 0000000000011034 l .bss 0000000000000000 $d ``` Note that GNU objdump doesn't use mapping symbols as branch target labels for all targets that support such symbols (ARM, AArch64, CSKY). Differential Revision: https://reviews.llvm.org/D139131	2022-12-06 12:19:12 +02:00
Leonard Chan	d629db535f	Reland "[llvm] Teach FastISel for AArch64 about tagged globals" This reverts commit aacf17aa0ca8a67efc0ad2d4cfd90e551b5d6a7f. Fixed by using the right register class for the movk.	2022-12-05 23:27:36 +00:00
Leonard Chan	aacf17aa0c	Revert "[llvm] Teach FastISel for AArch64 about tagged globals" This reverts commit 7358c29a42714eb8d7d7bcdb58688d20430689e4. This broke an upstream builder: https://lab.llvm.org/buildbot/#/builders/16/builds/39356	2022-12-05 22:45:04 +00:00
Leonard Chan	7358c29a42	[llvm] Teach FastISel for AArch64 about tagged globals This addresses https://github.com/llvm/llvm-project/issues/57750. For some globals, the tag wasn't propagated correctly because the necessary movk wasn't emitted sometimes. Differential Revision: https://reviews.llvm.org/D138615	2022-12-05 22:16:55 +00:00
Philip Reames	186c192261	[SDAG] Allow scalable vectors in SimplifyDemanded routines This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines. The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations. Differential Revision: https://reviews.llvm.org/D137190	2022-12-05 12:42:16 -08:00
Jonas Paulsson	5ecd363295	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." This reverts commit 122efef8ee9be57055d204d52c38700fe933c033. - Patch fixed to not reuse definitions from predecessors in EH landing pads. - Late review suggestions (by MaskRay) have been addressed. - M68k/pipeline.ll test updated. - Init captures added in processBlock() to avoid capturing structured bindings. - RISCV has this disabled for now. Original commit message: A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameIndex(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-05 12:53:50 -06:00
Philip Reames	7969ab85e0	[SDAG] Allow scalable vectors in ComputeKnownBits (try 2) This was previously reverted due to a hang on a Hexagon bot. This turned out to be a bug in the Hexagon backend around how splat_vectors are legalized (which they're using for fixed length vectors!). I adjusted this patch to remove the implicit truncate support. This hides the hexagon bug for now, and unblocks the rest of the change. Original commit message: This is the SelectionDAG equivalent of D136470, and is thus an alternate patch to D128159. The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations. This patch also includes an implementation for SPLAT_VECTOR as without it, the lane wise reasoning has no base case. The original patch which inspired this (D128159), also included STEP_VECTOR. I plan to do that as a separate patch. Differential Revision: https://reviews.llvm.org/D137140	2022-12-05 08:52:37 -08:00
bipmis	081b7f6b03	[AArch64] Optimize muls with operands having enough sign bits. For muls with 64-bit operands where each operand has more than 32 sign bits, we can generate a single smull instruction on 32-bit operands. Differential Revision: https://reviews.llvm.org/D138817	2022-12-05 15:08:31 +00:00
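The sign-bit condition in this commit can be illustrated numerically. The Python sketch below is a model, not the DAG combine itself: `num_sign_bits64`, `mul64` and `smull` are hypothetical helpers, with `smull` modelling a signed 32x32 -> 64-bit multiply.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def num_sign_bits64(value: int) -> int:
    """Number of leading bits (including the sign bit) equal to the sign bit."""
    value &= MASK64
    sign = value >> 63
    count = 0
    for bit in range(63, -1, -1):
        if ((value >> bit) & 1) != sign:
            break
        count += 1
    return count

def mul64(a: int, b: int) -> int:
    """Full 64-bit wrapping multiply."""
    return (a * b) & MASK64

def smull(a32: int, b32: int) -> int:
    """Signed 32x32 -> 64-bit multiply of the low 32 bits of each operand."""
    return (sext(a32, 32) * sext(b32, 32)) & MASK64

# 64-bit bit patterns whose values fit in a signed 32-bit integer.
OPERANDS = [v & MASK64 for v in (-5, 7, 2**31 - 1, -(2**31), 0, -1)]
```

Each operand here has at least 33 sign bits, so the low 32 bits carry the whole value and the two multiplies agree.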
Dmitry Vyukov	dbe8c2c316	Use-after-return sanitizer binary metadata Currently per-function metadata consists of: (start-pc, size, features) This adds a new UAR feature and if it's set an additional element: (start-pc, size, features, stack-args-size) Reviewed By: melver Differential Revision: https://reviews.llvm.org/D136078	2022-12-05 14:40:31 +01:00
Vladislav Dzhidzhoev	f32cafedf0	[GlobalISel][DebugInfo] Propagate debug location for localized constants After the IRTranslator pass, constants are deduplicated and translated into instructions in the entry block, losing their debug locations. Localization of constants may cause emission of extra zero lines in the debug_line section, like here https://godbolt.org/z/ecvsxxfKn. In this example, a constant gets placed as the first instruction in the entry block, and although it has no debug location, AsmPrinter emits a zero line for it. If a localized constant has only one user, we can assume that it has the same debug location as its user, since they are placed consecutively. Differential Revision: https://reviews.llvm.org/D128192	2022-12-05 16:38:24 +03:00
Sander de Smalen	4d2f0f723a	[AArch64][SME] Avoid going through memory for streaming-compatible splats Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D139111	2022-12-05 13:04:30 +00:00
Tiehu Zhang	7927722a74	[AArch64][SVE2] Add patterns for eor3 Add patterns for: eor x, (eor y, z) -> eor3 x, y, z Reviewed By: dmgreen, sdesmalen Differential Revision: https://reviews.llvm.org/D138793	2022-12-05 18:16:55 +08:00
Jonas Paulsson	122efef8ee	Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."" This reverts commit 17db0de330f943833296ae72e26fa988bba39cb3. Some more bots got broken - need to investigate.	2022-12-05 00:52:00 +01:00
Jonas Paulsson	17db0de330	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang). RISCV has this disabled for now until problems relating to post RA pseudo expansions are resolved.	2022-12-03 14:15:15 -06:00
David Green	16a72a0f87	[AArch64] Enable the select optimize pass for AArch64 This enabled the select optimize patch for ARM Out of order AArch64 cores. It is trying to solve a problem that is difficult for the compiler to fix. The criteria for when a csel is better or worse than a branch depends heavily on whether the branch is well predicted and the amount of ILP in the loop (as well as other criteria like the core in question and the relative performance of the branch predictor). The pass seems to do a decent job though, with the inner loop heuristics being well implemented and doing a better job than I had expected in general, even without PGO information. I've been doing quite a bit of benchmarking. The headline numbers are these for SPEC2017 on a Neoverse N1: 500.perlbench_r -0.12% 502.gcc_r 0.02% 505.mcf_r 6.02% 520.omnetpp_r 0.32% 523.xalancbmk_r 0.20% 525.x264_r 0.02% 531.deepsjeng_r 0.00% 541.leela_r -0.09% 548.exchange2_r 0.00% 557.xz_r -0.20% Running benchmarks with a combination of the llvm-test-suite plus several versions of SPEC gave between a 0.2% and 0.4% geomean improvement depending on the core/run. The instruction count went down by 0.1% too, which is a good sign, but the results can be a little noisy. Some issues from other benchmarks I had run were improved in rGca78b5601466f8515f5f958ef8e63d787d9d812e. In summary well predicted branches will see an improvement, badly predicted branches may get worse, and on average performance seems to be a little better overall. This patch enables the pass for AArch64 under -O3 for cores that will benefit from it. i.e. not in-order cores that do not fit into the "Assume infinite resources that allow to fully exploit the available instruction-level parallelism" cost model. It uses a subtarget feature for specifying when the pass will be enabled, which I have enabled under cpu=generic as the performance increases for out of order cores seems larger than any decreases for inorder, which were minor. Differential Revision: https://reviews.llvm.org/D138990	2022-12-03 16:08:58 +00:00
Matt Arsenault	a74c5707be	Fix some test files with executable permissions	2022-12-02 17:12:03 -05:00
Matt Arsenault	46584de02c	AArch64/GlobalISel: Convert tests to opaque pointers inttoptr_add.ll had a mangled bitcast constantexpr. translate-gep.ll: Restored a 0 GEP	2022-12-02 16:19:38 -05:00
Matt Arsenault	9585500fea	GlobalISel: Replace bitcast test pointer usage This won't be meaningful with opaque pointers (I guess we could leave a ptr to ptr bitcast, or allow same sized address space bitcasts).	2022-12-02 16:19:38 -05:00
Matt Arsenault	8c04c78cfa	AArch64/GlobalISel: Regenerate test checks Try to shrink the diff in the opaque pointer conversion. Had to work around some update_mir_test_checks bugs. It seems to struggle when the successor list is empty around the blank line checks it inserts.	2022-12-02 16:19:38 -05:00
Ties Stuij	82a5f1c62b	[AArch64] use CNT for ISD::popcnt and ISD::parity if available These are the two places where we explicitly want to use cnt in SelectionDAG when feature CSSC is available: ISD::popcnt and ISD::parity For both, we need to make sure we're emitting optimized code for i32 (and lower), i64 and i128. The most optimal way is of course using the GPR CNT instruction. If we don't have CSSC, but we do have neon, we'll use floating point CNT. If all fails, we'll fall back on the general GPR popcnt and parity implementations. spec: https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/CNT--Count-bits- Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D138808	2022-12-02 11:27:14 +00:00
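The relationship between the two nodes handled here is that parity is the low bit of the population count. A minimal Python sketch follows, with hypothetical `popcount` and `parity` helpers parameterised on an assumed bit width.

```python
def popcount(x: int, bits: int = 64) -> int:
    """Number of set bits in the low `bits` bits of x."""
    return bin(x & ((1 << bits) - 1)).count("1")

def parity(x: int, bits: int = 64) -> int:
    """1 if an odd number of bits are set, else 0."""
    return popcount(x, bits) & 1

def popcount128(x: int) -> int:
    """i128 popcount expressed as the sum of two 64-bit counts."""
    return popcount(x, 64) + popcount(x >> 64, 64)
```

For example, `popcount(0b1011)` counts three set bits, so `parity(0b1011)` is 1. Splitting an i128 into two 64-bit halves is one way a wide count can reuse a 64-bit instruction; whether the backend does exactly this is not stated in the commit message.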
chenglin.bi	e63f64bd14	[AArch64] Precommit test for D138904; NFC shift + and -> shift + shift to select more shifted registers.	2022-12-02 10:59:03 +08:00
Jonas Paulsson	8ef4632681	Revert "[CodeGen] Add new pass for late cleanup of redundant definitions." Temporarily revert and fix buildbot failure. This reverts commit 6d12599fd4134c1da63198c74a25490d28c733f6.	2022-12-01 13:29:24 -05:00
Jonas Paulsson	6d12599fd4	[CodeGen] Add new pass for late cleanup of redundant definitions. A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameIndex(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-01 13:21:35 -05:00
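The single-scan idea behind this pass can be sketched on a toy instruction list. This is a simplified Python model under stated assumptions, not the MachineLateInstrsCleanup implementation: each instruction is a hypothetical `(dest_reg, operation)` pair with no register inputs (as with immediate loads), and control flow, clobbers and kill flags are ignored.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, str]  # (destination register, operation text)

def late_cleanup(instrs: List[Instr]) -> List[Instr]:
    """Scan once, dropping an instruction if its register already holds the
    value produced by an identical earlier definition."""
    last_def: Dict[str, str] = {}
    kept: List[Instr] = []
    for reg, op in instrs:
        if last_def.get(reg) == op:
            continue  # Redundant: same register, same defining operation.
        last_def[reg] = op  # Any other definition replaces the tracked one.
        kept.append((reg, op))
    return kept

PROGRAM = [
    ("r1", "li 5"),
    ("r2", "li 7"),
    ("r1", "li 5"),  # identical to the live definition of r1: removable
    ("r1", "li 9"),
    ("r1", "li 5"),  # r1 was redefined in between: must be kept
]
CLEANED = late_cleanup(PROGRAM)
```

The last entry shows why the map must be updated on every definition: an identical instruction is only redundant while the earlier definition is still the current one.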
Sander de Smalen	d32c9e8384	Reland "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca." Phabricator review for this patch was D138791	2022-12-01 14:48:30 +00:00
Sander de Smalen	ea11f4ff0a	Reland "[AArch64][SME]: Add precursory tests for D138791" This reverts commit 06846596eb1768eea06778a5b6da31145e84e461.	2022-12-01 14:48:30 +00:00
David Sherwood	06846596eb	Revert "[AArch64][SME]: Add precursory tests for D138791" This reverts commit 45adca0f52af346a131163d1cc3e4a08baf7f0f1.	2022-12-01 11:14:01 +00:00
David Sherwood	4a5ccf4e93	Revert "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca." This reverts commit 279c0a83aa22cd35d4b7c7c52b85d2a86f2528a7.	2022-12-01 10:22:21 +00:00
Freddy Ye	89f36dd8f3	[X86] Add ExpandLargeFpConvert Pass and enable for X86 As stated in https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528, this implementation is very similar to ExpandLargeDivRem, which expands ‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions with a bitwidth above a threshold into auto-generated functions. This is useful for targets like x86_64 that cannot lower fp conversions with more than 128 bits. The expanded nodes are based on the IR generated by `compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`, etc. Corner cases: 1. For fp16: as there are no related builtins in compiler-rt, I mainly utilized the fp32 <-> fp16 lib calls to implement it. 2. For fp80: as this pass is soft fp emulation and no fp80 instructions can help in this problem. I recommend users to deprecate this usage. For now, the implementation uses fp128 as the temporary conversion type and inserts fptrunc/ext at top/end of the function. 3. For bf16: as clang FE currently doesn't support bf16 algorithm operations (convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for now. 4. For unsigned FPToI: since both default hardware behaviors and libgcc are ignoring "returns 0 for negative input" spec. This pass follows this old way to ignore unsigned FPToI. See this example: https://gcc.godbolt.org/z/bnv3jqW1M The end-to-end tests are uploaded at https://reviews.llvm.org/D138261 Reviewed By: LuoYuanke, mgehre-amd Differential Revision: https://reviews.llvm.org/D137241	2022-12-01 13:47:43 +08:00
Hassnaa Hamdi	2bda5a6287	[AArch64][SME][NFC]: Enable lowering truncate for enhancement. Enable lowering truncate to enhance the generated code.	2022-12-01 03:54:28 +00:00
Hassnaa Hamdi	279c0a83aa	[AArch64][SME]: Generate streaming-compatible code for ld2-alloca. To generate code compatible to streaming mode: - disable lowering interleaved load to avoid generating invalid NEON intrinsics. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D138791	2022-12-01 02:31:01 +00:00
Hassnaa Hamdi	45adca0f52	[AArch64][SME]: Add precursory tests for D138791 Testing files: - ld2-alloca.ll	2022-12-01 02:31:01 +00:00
Hassnaa Hamdi	1dee88fac1	[AArch64][SME]: Add streaming-compatible testing files. Testing files: - int-compares.ll - int-immediates.ll - log-reduce.ll Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D138717	2022-12-01 01:37:36 +00:00
Hassnaa Hamdi	43eef96833	[AArch64][SME]: Add streaming-compatible testing files. Testing files: - limit-duplane.ll - optimize-ptrue.ll - ptest.ll Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D138768	2022-12-01 01:15:44 +00:00
Hassnaa Hamdi	de7222b2dd	[AArch64][SME]: Add streaming-compatible testing files. Testing files: - subvector.ll - permute-rev.ll - permute-zip-uzp-trn.ll - vector-shuffle.ll Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D138683	2022-12-01 00:52:08 +00:00
Marco Elver	b95646fe70	Revert "Use-after-return sanitizer binary metadata" This reverts commit d3c851d3fc8b69dda70bf5f999c5b39dc314dd73. Some bots broke: - https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview - https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio	2022-11-30 23:35:50 +01:00
Dmitry Vyukov	d3c851d3fc	Use-after-return sanitizer binary metadata Currently per-function metadata consists of: (start-pc, size, features) This adds a new UAR feature and if it's set an additional element: (start-pc, size, features, stack-args-size) Reviewed By: melver Differential Revision: https://reviews.llvm.org/D136078	2022-11-30 14:50:22 +01:00
Philip Reames	fc0efb7e78	[SDAG] Allow scalable vectors in ComputeNumSignBits (try 2) I had reverted this before the holiday week because a problem was reported with a related change (D137140 - scalable vector known bits in DAG). I had initially confused the two patches, and then decided to leave this reverted out of an abundance of caution. Now that we're through the holiday week, reapplying. I also rolled in fixes for several post commit review comments that hadn't landed with the original change. Original commit message This is a continuation of the series of patches adding lane wise support for scalable vectors in various knownbit-esq routines. The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations. Differential Revision: https://reviews.llvm.org/D137141	2022-11-29 08:25:05 -08:00

6194 Commits