llvm-project

Author	SHA1	Message	Date
Andrew Lazarev	cfbb9a66ae	Revert "[msan] Switch switch() from strict handling to (icmp eq)-style handling" (#180636 ) Reverts llvm/llvm-project#179851 Breaks https://lab.llvm.org/buildbot/#/builders/164/builds/18551 and https://lab.llvm.org/buildbot/#/builders/94/builds/15188	2026-02-09 19:23:52 -05:00
Thurston Dang	64de25d183	[msan] Handle NEON floating-point absolute compare greater than/equal (#180120 ) Uses existing handleVectorComparePackedIntrinsic()	2026-02-06 09:43:25 -08:00
Thurston Dang	bd19bfeb26	[msan] Switch switch() from strict handling to (icmp eq)-style handling (#179851 ) Currently, the SwitchInst: ``` switch i32 %Val, label %else [ i32 0, label %A i32 1, label %B i32 2, label %C ] ``` is strictly handled i.e., MSan will check that %Val is fully initialized. This is appropriate nearly all the time. However, sometimes the compiler may convert (icmp + br) into a switch statement. (icmp + br) has different semantics: MSan allows icmp eq/ne with partly initialized inputs to still result in a fully initialized output, if there exists a bit that is initialized in both inputs with a differing value e.g., suppose: ``` %A = 00000000 00001010 %B = 00000000 00000110 %C = 00000000 00000011 %Val = 00000001 ???????? (where ? denotes an uninitialized bit) ``` Even though %Val has uninitialized bits, the initialized '1' bit immediately to the left, compared to the corresponding initialized '0' bit in %A/%B/%C suffices to prove that %Val does not match any of those cases. This is similar to a real-world case with std::optional (where the has_value bit may be initialized but the value is not). This patch adds this relaxed icmp logic to the switch instrumentation as well, to make MSan's behavior equivalent under optimization. Note that this edge case only applies if the switch input value definitively does not match any of the cases (matching any of the cases requires an exact, fully initialized match). If it is uncertain whether the switch input value could, depending on the uninitialized bits, match one of the cases or not, MSan will report use-of-uninitialized memory.	2026-02-06 09:14:50 -08:00
Thurston Dang	7211938492	[msan][NFCI] Refactor icmp eq/ne into propagateEqualityComparison() (#180115 ) This will be useful for handling switch (https://github.com/llvm/llvm-project/pull/179851).	2026-02-05 23:22:46 -08:00
Thurston Dang	3fd046440c	[msan] Handle NEON bfmmla (#176264 ) This patch adapts handleNEONMatrixMultiply() (used for integer matrix multiply: smmla/ummla/usmmla) to floating-point (bfmmla).	2026-02-05 21:35:38 -08:00
Thurston Dang	c41f956884	[msan][NFCI] Generalize handleAVX512VectorGenericMaskedFP (#179850 ) handleAVX512VectorGenericMaskedFP() assumes there is one vector of data (excluding the mask). This patch generalizes it to allow multiple vectors of data, which we assume will be munged together. Future work can apply this to intrinsics such as: ``` <16 x float> @llvm.x86.avx512.mask.scalef.ps.512 (<16 x float>, <16 x float>, <16 x float>, i16, i32) WriteThru A B Mask Rounding ```	2026-02-05 20:34:24 -08:00
Thurston Dang	bb7d1efbbd	[msan] Add intermediate verbosity instruction dump (#178771 ) This patch does not change MSan's instrumentation. -msan-dump-{heuristic,strict}-instructions currently prints out two lines per instruction: 1) instruction name only e.g., `call llvm.aarch64.neon.uqsub.v16i8` 2) the full instruction, including actual variables e.g., `%vqsubq_v.i15 = call noundef <16 x i8> @llvm.aarch64.neon.uqsub.v16i8(<16 x i8> %vext21.i, <16 x i8> splat (i8 1)), !dbg !66` Option 1) is too sparse for some uses, because it does not contain the return types or parameter types (although `.v16i8` is part of the function name in this example, in general, the function name does not describe the types completely; e.g., `<16 x float> llvm.x86.avx512.mask.scalef.ps.512(<16 x float>, <16 x float>, <16 x float>, i16, i32)`). OTOH option 2) can be too verbose because it contains the actual variables. This patch adds a Goldilocks-verbosity line: - instruction prototype (including return type and parameter types) e.g., `call <16 x i8> @llvm.aarch64.neon.uqsub(<16 x i8>, <16 x i8>)`. Note that this uses the base/non-overloaded function name, since the parameter types are already included in the output.	2026-02-04 16:50:11 -08:00
Thurston Dang	61c7d9e0b2	[msan] Support Arm NEON usdot (#178982 ) Handle tariff-free dot-product using the existing handleVectorDotProductIntrinsic() instead of with the default handler.	2026-01-30 18:02:49 -08:00
Thurston Dang	ac47d8c227	[msan] Handle Arm NEON BFloat16 multiply-add to single-precision (#178510 ) aarch64.neon.bfmlalb/t perform dot-products after zeroing out the odd/even-indexed values. We handle these by generalizing handleVectorDotProductIntrinsic() and (mis-)using getPclmulMask().	2026-01-29 20:09:24 -08:00
Jameson Nash	eef6b62dcc	[NFCI][IRBuilder] Add CreateAllocationSize helper (#178346 ) Create a new `IRBuilderBase::CreateAllocationSize` method to compute the runtime size of an alloca as a Value. This handles both static and dynamic allocas by computing `ArraySize ElementSize`, and using CreateTypeSize to properly handle scalable vectors. This de-duplicates code across multiple instrumentation and analysis passes and increases consistency. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 22:46:25 -05:00
Thurston Dang	bf43c9813b	[msan][NFCI] Check number of operands in handleVectorComparePackedIntrinsic() (#177291 ) handleVectorComparePackedIntrinsic() can currently handle x86 and Arm NEON vector comparisons, but is a bit lax about checking the number of operands. This patch parameterizes the handler to check for the correct number of operands, and also that the 3rd operand in x86 vector comparisons is an ImmArg.	2026-01-22 21:38:22 -08:00
Thurston Dang	0381093140	[msan] Handle NEON vsli/vsri (vector shift left/right and insert) (#177283 ) This does a shift and combine on the two vectors, hence we can precisely propagate the shadow by applying the intrinsic to the input shadows.	2026-01-22 10:18:30 -08:00
Thurston Dang	b887b523a2	[msan] Handle aarch64_neon_vcvt* (#177243 ) This fills in missing gaps in MSan's AArch64 NEON vector conversion intrinsic handling (intrinsics named aarch64_neon_vcvt* instead of aarch64_neon_fcvt*). SVE support sold separately. It also generalizes handleNEONVectorConvertIntrinsic to handle conversions to/from fixed-point.	2026-01-21 14:50:50 -08:00
Thurston Dang	792e3398f1	[msan] Handle NEON dot product intrinsics (#176084 ) Propagate shadow by reusing existing `handleVectorPmaddIntrinsic()` (used for analogous x86 instructions; renamed to `handleVectorDotProductIntrinsic()`), instead of strictly handling.	2026-01-21 09:44:12 -08:00
Thurston Dang	4d9624cab1	[msan][NFCI] Refactor visitAnd body into helper function (#176031 ) This allows reuse of the core visitAnd logic e.g., in handleVectorPmaddIntrinsic().	2026-01-14 13:52:12 -08:00
Thurston Dang	ad94750383	[msan] Handle NEON matrix multiplication (integral) (#174510 ) Instead of strictly handling smmla/ummla/usmmla, this patch propagates the shadow, with each output element considered initialized if all its constituent inputs are fully initialized.	2026-01-14 10:47:54 -08:00
Nikita Popov	8fd85ba9e6	[LLVM] Temporarily allow implicit truncation in some places Split out from https://github.com/llvm/llvm-project/pull/171456. This explicitly allows implicit truncation in a number of places, prior to switching the default. This limits the scope of the initial change.	2026-01-05 09:52:57 +01:00
Thurston Dang	b0fce8ec2e	[msan][NFCI] Remove element-size override for VNNI intrinsics (#172762 ) MSan's handleVectorPmaddIntrinsic had an EltSizeInBits parameter to override the incorrect element size for VNNI intrinsics. Now that the element size has been corrected (https://github.com/llvm/llvm-project/issues/97271), it is no longer necessary to override the element size. This patch also updates the comments.	2025-12-19 13:09:56 -08:00
BaiXilin	4f79552d25	[x86][AVX-VNNI] Fix VPDPWXXD Argument Types (#169456 ) Fixed the argument types of the following intrinsics to match with the ISA: - vpdpwssd_128, vpdpwssd_256, vpdpwssd_512, - vpdpwssds_128, vpdpwssds_256, vpdpwssds_512 - vpdpwsud_128, vpdpwsud_256, vpdowsud_512 - vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512 - vpdpwusd_128, vpdpwusd_256, vpdpwusd_512 - vpdpwusds_128, vpdpwusds_256, vpdpwusds_512 - vpdpwuud_128, vpdpwuud_256, vpdpwuud_512 - vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512 Fixes #97271. Note that this is the last PR for the issue.	2025-12-09 17:10:20 +00:00
Thurston Dang	1056584746	[msan] Fix handling of 256-bit hadd/hsub instructions (#168121 ) These horizontal add/sub instructions are currently handled by adding/subtracting tuples of the first operand, followed by tuples of the second operand. This is not the correct semantics for the 256-bit insructions: they process the first half of the first operand, then the first half of the second operand, then the second half of the first operand, and finally the second half of the second operand (trust me bro []). This patch fixes the issue by applying the "shards" functionality that was added in https://github.com/llvm/llvm-project/pull/167954, to handle the top and bottom 128-bit "shards" in turn. [] clang/test/CodeGen/X86/avx2-builtins.c: ``` TEST_CONSTEXPR(match_v8si(_mm256_hadd_epi32( (__m256i)(__v8si){10, 20, 30, 40, 50, 60, 70, 80}, (__m256i)(__v8si){5, 15, 25, 35, 45, 55, 65, 75}), 30,70,20,60,110,150,100,140)); ```	2025-11-20 13:23:06 -08:00
Thurston Dang	30d8f69de9	[msan][NFCI] Generalize handlePairwiseShadowOrIntrinsic to have shards (#167954 ) This will allow fixing up the handling of AVX2 phadd/phsub instructions in a future patch, by setting Shards = 2. Currently, the extra functionality is not used.	2025-11-13 20:51:50 -08:00
Thurston Dang	f6004aea30	[msan] Support x86_avx512bf16_dpbf16ps (#166862 ) Use the generalized handleVectorPmaddIntrinsic(), but multiplication by an initialized zero does not guarantee that the result is zero (counter-example: multiply zero by NaN).	2025-11-13 20:51:15 -08:00
Thurston Dang	cdf52a1325	[msan][NFCI] Generalize handleVectorPmaddIntrinsic() (#166282 ) This generalizes `handleVectorPmaddIntrinsic()`: - potentially handle floating-point type intrinsics (e.g., `llvm.x86.avx512bf16.dpbf16ps.512`). This usage is not enabled yet. - "multiplication with an initialized zero guarantees that the corresponding output becomes initialized" is now gated by a parameter	2025-11-04 19:52:25 -08:00
Yi-Chi Lee	964b4abe6c	[Instrumentation] Fix typos across files in Transforms/Instrumentation (#165251 ) Closes #165240.	2025-10-27 16:23:45 +01:00
Thurston Dang	b6e6a4dc6d	[msan] Convert target("aarch64.svcount") from compile-time crash to MSan false negatives (#165028 ) MSan currently crashes at compile-time when it encounters target("aarch64.svcount") (e.g., https://github.com/llvm/llvm-project/pull/164315). This patch duct-tapes MSan so that it won't crash at compile-time, and instead propagates a clean shadow (resulting in false negatives but not false positives).	2025-10-24 14:03:08 -07:00
Nikita Popov	573ca36753	[IR] Replace alignment argument with attribute on masked intrinsics (#163802 ) The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter` intrinsics currently accept a separate alignment immarg. Replace this with an `align` attribute on the pointer / vector of pointers argument. This is the standard representation for alignment information on intrinsics, and is already used by all other memory intrinsics. This means the signatures now match llvm.expandload, llvm.vp.load, etc. (Things like llvm.memcpy used to have a separate alignment argument as well, but were already migrated a long time ago.) It's worth noting that the masked.gather and masked.scatter intrinsics previously accepted a zero alignment to indicate the ABI type alignment of the element type. This special case is gone now: If the align attribute is omitted, the implied alignment is 1, as usual. If ABI alignment is desired, it needs to be explicitly emitted (which the IRBuilder API already requires anyway).	2025-10-20 08:50:09 +00:00
Nikita Popov	a4767e63ee	[MemorySanitizer] Use getelementptr instead of ptrtoint+add+inttoptr (#161392 ) MemorySanitizer currently does a lot of pointer arithmetic using ptrtoint+add+inttoptr instead of using getelementptr. As far as I can tell, there is no need to use this pattern -- msan is not trying to synthesize pointers with different provenance here. The pointers in question stay within one object (like the TLS parameter area). I suspect that this is just a leftover from pre-opaque-pointer types where this was a natural way to perform offset arithmetic. Nowadays we should just emit a getelementptr i8, aka ptradd.	2025-10-02 09:16:08 +02:00
BaiXilin	0d9dd60815	[x86][AVX-VNNI] Fix VPDPBXXD Argument Type (#159222 ) Fixed intrinsic VPDP[SS,SU,UU]D[,S]_128/256/512's argument types to match with the ISA. Fixes part of #97271.	2025-09-30 09:41:12 +00:00
Thurston Dang	7ad70d2793	[msan] Handle AVX512/AVX10 vrndscale (#160624 ) Uses the updated handleAVX512VectorGenericMaskedFP() from https://github.com/llvm/llvm-project/pull/159966	2025-09-25 17:59:05 -07:00
Thurston Dang	475e0ee7fa	[msan][NFCI] Generalize handleAVX512VectorGenericMaskedFP() operands (#159966 ) This generalizes handleAVX512VectorGenericMaskedFP() (introduced in #158397), to potentially handle intrinsics that have A/WriteThru/Mask in an operand order that is different to AVX512/AVX10 rcp and rsqrt. Any operands other than A and WriteThru must be fully initialized. For example, the generalized handler could be applied in follow-up work to many of the AVX512 rndscale intrinsics: ``` <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512(<32 x half>, i32, <32 x half>, i32, i32) <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512(<16 x float>, i32, <16 x float>, i16, i32) <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512(<8 x double>, i32, <8 x double>, i8, i32) A Imm WriteThru Mask Rounding <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256(<8 x float>, i32, <8 x float>, i8) <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128(<4 x float>, i32, <4 x float>, i8) <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256(<4 x double>, i32, <4 x double>, i8) <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128(<2 x double>, i32, <2 x double>, i8) A Imm WriteThru Mask ```	2025-09-24 16:32:40 -07:00
Thurston Dang	ddd49c61cc	[msan] Handle AVX512/AVX10 rcp and rsqrt (#158397 ) Adds a new handler, handleAVX512VectorGenericMaskedFP(), and applies it to AVX512/AVX10 rcp and rsqrt	2025-09-15 12:19:14 -07:00
Thurston Dang	2fca446779	[msan] Handle AVX512 pack with saturation intrinsics (#157984 ) Approximately handle avx512_{packssdw/packsswb/packusdw/packuswb} with the existing handleVectorPackIntrinsic(), instead of relying on the default (strict) handler.	2025-09-11 10:41:40 -07:00
Kazu Hirata	752c1cf805	[Instrumentation] Fix formatting of MemorySanitizer.cpp	2025-09-10 08:23:13 -07:00
Karlo Basioli	3c60c03f53	Mark variable as maybe unused (only used in debug mode) (#157875 )	2025-09-10 16:10:32 +01:00
BaiXilin	94e2c19f86	[x86][AVX-VNNI] Fix VPDPBUSD Argument Types (#155194 ) Fixed intrinsic VPDPBUSD[,S]_128/256/512's argument types to match with the ISA. Fixes part of #97271	2025-09-10 12:24:16 +00:00
Thurston Dang	1cc84bcc08	[msan] Fix multiply-add-accumulate (#153927 ) to use ReductionFactor (#155748 ) https://github.com/llvm/llvm-project/pull/153927 incorrectly cast using a hardcoded reduction factor of two, rather than using the parameter. This caused false negatives but not false positives. (The only incorrect case was a reduction factor of four; if four values {A,B,C,D} are being reduced, the result is fully zero iff {A,B} and {C,D} are both zero after pairwise reduction. If only one of those reduced pairs is zero, then the quadwise reduction is non-zero.)	2025-09-02 10:16:57 -07:00
Thurston Dang	5dafe66f07	[msan][NFCI] Refactor visitIntrinsicInst() into instruction families (#154878 ) Currently visitIntrinsicInst() is a long, partly unsorted list. This patch groups them into cross-platform, X86 SIMD, and Arm SIMD families, making the overall intent of visitIntrinsicInst() clearer: ``` void visitIntrinsicInst(IntrinsicInst &I) { if (maybeHandleCrossPlatformIntrinsic(I)) return; if (maybeHandleX86SIMDIntrinsic(I)) return; if (maybeHandleArmSIMDIntrinsic(I)) return; if (maybeHandleUnknownIntrinsic(I)) return; visitInstruction(I); } ``` There is one disadvantage: the compiler will not tell us if the switch statements in the handlers have overlapping coverage.	2025-08-25 12:11:48 -07:00
Thurston Dang	e45210afe2	[msan] Handle AVX512 VCVTPS2PH (#154460 ) This extends handleAVX512VectorConvertFPToInt() from 556c8467d15a131552e3c84478d768bafd95d4e6 (https://github.com/llvm/llvm-project/pull/147377) to handle AVX512 VCVTPS2PH.	2025-08-21 15:03:01 -07:00
Thurston Dang	4220538e25	[msan] Handle multiply-add-accumulate; apply to AVX Vector Neural Network Instructions (VNNI) (#153927 ) This extends the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to three-operand intrinsics (multiply-add-accumulate), and applies it to the AVX Vector Neural Network Instructions. Updates the tests from https://github.com/llvm/llvm-project/pull/153135	2025-08-18 13:18:27 -07:00
Thurston Dang	ade755d62b	[msan] Add Instrumentation for Avx512 Instructions: pmaddw, pmaddubs (#153919 ) This applies the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to the Avx512 equivalent of the pmaddw and pmaddubs intrinsics: <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>) <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)	2025-08-18 11:31:15 -07:00
Thurston Dang	638bd11c13	[msan] Handle SSE/AVX pshuf intrinsic by applying to shadow (#153895 ) llvm.x86.sse.pshuf.w(<1 x i64>, i8) and llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>) are currently handled strictly, which is suboptimal. llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>) llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>) and llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>) are currently heuristically handled using maybeHandleSimpleNomemIntrinsic, which is incorrect. Since the second argument is the shuffle order, we instrument all these intrinsics using `handleIntrinsicByApplyingToShadow(..., /trailingVerbatimArgs=/1)` (https://github.com/llvm/llvm-project/pull/114490).	2025-08-15 20:28:30 -07:00
Thurston Dang	2b75ff192d	[msan] Reland with even more improvement: Improve packed multiply-add instrumentation (#153353 ) This reverts commit cf002847a464c004a57ca4777251b1aafc33d958 i.e., relands ba603b5e4d44f1a25207a2a00196471d2ba93424. It was reverted because it was subtly wrong: multiplying an uninitialized zero should not result in an initialized zero. This reland fixes the issue by using instrumentation analogous to visitAnd (bitwise AND of an initialized zero and an uninitialized value results in an initialized value). Additionally, this reland expands a test case; fixes the commit message; and optimizes the change to avoid the need for horizontalReduce. The current instrumentation has false positives: it does not take into account that multiplying an initialized zero value with an uninitialized value results in an initialized zero value This change fixes the issue during the multiplication step. The horizontal add step is modeled using bitwise OR. Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX VNNI intrinsics.	2025-08-15 16:35:42 -07:00
Thurston Dang	cf002847a4	Revert "[msan] Improve packed multiply-add instrumentation" (#153343 ) Reverts llvm/llvm-project#152941 Buildbot breakage: https://lab.llvm.org/buildbot/#/builders/66/builds/17843	2025-08-12 21:32:07 -07:00
Thurston Dang	ba603b5e4d	[msan] Improve packed multiply-add instrumentation (#152941 ) The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with zero results in an initialized zero value. This step allows elements that are zero to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR. Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX VNNI intrinsics.	2025-08-12 19:13:48 -07:00
Jie Fu	2fc1b3dd9f	[MemorySanitizer] Fix an unused-variable warning (NFC) /llvm-project/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:2752:22: error: unused variable 'ParamType' [-Werror,-Wunused-variable] FixedVectorType *ParamType = ^ 1 error generated.	2025-08-12 07:51:53 +08:00
Thurston Dang	ef5022745c	[NFCI][msan] Refactor into 'horizontalReduce' (#152961 ) The functionality is used by two helper functions, and will be used even more in the future (e.g., https://github.com/llvm/llvm-project/pull/152941).	2025-08-11 15:48:20 -07:00
Nikita Popov	c23b4fbdbb	[IR] Remove size argument from lifetime intrinsics (#150248 ) Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).	2025-08-08 11:09:34 +02:00
Nikita Popov	86727fe9a1	[IR] Allow poison argument to lifetime markers (#151148 ) This slightly relaxes the invariant established in #149310, by also allowing the lifetime argument to be poison. This is to support the typical pattern of RAUWing with poison when removing an instruction. It's worth noting that this does not require any conservative assumptions, lifetimes with poison arguments can simply be skipped. Fixes https://github.com/llvm/llvm-project/issues/151119.	2025-08-04 10:02:04 +02:00
Thurston Dang	56944e606a	[msan] Approximately handle AVX Galois Field Affine Transformation (#150794 ) e.g., <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8) <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8) <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8) Out A x b where A and x are packed matrices, b is a vector, Out = A * x + b in GF(2) Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix computation also includes a parity calculation. For the bitwise AND of bits V1 and V2, the exact shadow is: Out_Shadow = (V1_Shadow & V2_Shadow) \| (V1 & V2_Shadow) \| (V1_Shadow & V2) We approximate the shadow of gf2p8affine using: Out_Shadow = _mm512_gf2p8affine_epi64_epi8(x_Shadow, A_shadow, 0) \| _mm512_gf2p8affine_epi64_epi8(x, A_shadow, 0) \| _mm512_gf2p8affine_epi64_epi8(x_Shadow, A, 0) \| _mm512_set1_epi8(b_Shadow) This approximation has false negatives: if an intermediate dot-product contains an even number of 1's, the parity is 0. It has no false positives. Updates the test from https://github.com/llvm/llvm-project/pull/149258	2025-07-30 08:06:50 -07:00
Kazu Hirata	3e53d4d386	[llvm] Remove unused includes (NFC) (#150265 ) These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.	2025-07-23 15:18:46 -07:00

1 2 3 4 5 ...

646 Commits