llvm-project

Author	SHA1	Message	Date
Mingming Liu	dda73336ad	[ThinLTO]Record import type in GlobalValueSummary::GVFlags (#87597 ) The motivating use case is to support importing the function declaration across modules to construct call graph edges for indirect calls [1] when importing the function definition costs too much compile time (e.g., the function is too large has no `noinline` attribute). 1. Currently, when the compiled IR module doesn't have a function definition but its postlink combined summary contains the function summary or a global alias summary with this function as aliasee, the function definition will be imported from the source module by IRMover. The implementation is in FunctionImporter::importFunctions [2] 2. In order for FunctionImporter to import a declaration of a function, both function summary and alias summary need to carry the def / decl state. Specifically, all existing summary fields don't differ across import modules, but the def / decl state is decided by `<ImportModule, Function>`. This change encodes the def/decl state in `GlobalValueSummary::GVFlags`. In the subsequent changes 1. The indexing step `computeImportForModule` [3] will compute the set of definitions and the set of declarations for each module, and pass on the information to bitcode writer. 2. Bitcode writer will look up the def/decl state and set the state when it writes out the flag value. This is demonstrated in https://github.com/llvm/llvm-project/pull/87600 3. Function importer will read the def/decl state when reading the combined summary to figure out two sets of global values, and IRMover will be updated to import the declaration (aka linkGlobalValuePrototype [4]) into the destination module. 
- The next change is https://github.com/llvm/llvm-project/pull/87600 [1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5 [2] `3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)` [3] `3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)` [4] `3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)`	2024-04-10 19:46:01 -07:00
Jianjian Guan	fd50151180	[RISCV] Only support SPLAT_VECTOR for Zvfhmin when the scalar half-precision FP extension is also enabled (#88275 )	2024-04-11 10:23:26 +08:00
Freddy Ye	f4509cf284	[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473 ) apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266 apx-syntax-recommendation: https://cdrdv2.intel.com/v1/dl/getContent/817241	2024-04-11 10:18:29 +08:00
Vitaly Buka	d927d1867f	[UBSAN] Emit optimization remarks (#88304 )	2024-04-10 16:30:42 -07:00
Matthias Braun	acb7ddc5cf	[WebAssembly] Remove threadlocal.address when disabling TLS (#88209 ) Remove `llvm.threadlocal.address` intrinsic usage when disabling TLS. This fixes errors revealed by the stricter IR verification introduced in PR #87841.	2024-04-10 16:24:02 -07:00
Oskar Wirga	a9d4ddd98a	[MergeFuncs/CFI] Ensure all type metadata is propagated for CFI (#88218 ) I noticed that we weren't propagating ALL type metadata that was attached to CFI functions: # BEFORE ``` ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030 ... fn merging ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028 ``` # AFTER ``` ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030 ... fn merging ; Function Attrs: minsize nounwind optsize ssp uwtable(sync) define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !type !34028 !type !34029 !type !34030 ``` This patch makes sure that the entire vector of metadata is copied over.	2024-04-10 15:37:27 -07:00
Farzon Lotfi	05093e2438	[Spirv][HLSL] Add OpAll lowering and float vec support (#87952 ) The main point of this change was to add support for HLSL's all intrinsic. In the process of doing that I found a few issues around creating an `OpConstantComposite` via `buildZerosVal`. First the current code didn't support floats so the process of adding `buildZerosValF` meant I needed a float version of `getOrCreateIntConstVector`. After doing so I renamed both versions to `getOrCreateConstVector`. That meant I needed to create a float type version of `getOrCreateIntCompositeOrNull`. Luckily the type information was low for this function so I was able to split it out into a helper and rename `getOrCreateIntCompositeOrNull` to `getOrCreateCompositeOrNull`. With the exception of type handling differences of the code and Null vs 0 Constant Op codes these functions should be identical. To handle scalar floats I could not use `buildConstantFP` like this PR did: https://github.com/llvm/llvm-project/commit/0a2aaab5aba46#diff-733a189c5a8c3211f3a04fd6e719952a3fa231eadd8a7f11e6ecf1e584d57411R1603 because that would create too many superfluous registers (that causes problems in the validator), I had to create a float version of `getOrCreateConstInt` which I called `getOrCreateConstFP`. There are similar problems with doing it like this: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp#L1540. `buildZerosValF` also has a use of a function `getZeroFP`. This is because half, float, and double scalar values of 0 would collide in `SPIRVDuplicatesTracker<Constant> CT` if you use `APFloat(0.0f)`. `getOrCreateConstFP` needed its own version of `getOrCreateConstIntReg` which I called `getOrCreateConstFloatReg`. The one difference in this function is `getOrCreateConstFloatReg` returns a bit width so we don't have to call `getScalarOrVectorBitWidth` twice, i.e. when it is used again in `getOrCreateConstFP` for `OpConstantF` `addNumImm`. 
`getOrCreateConstFloatReg` needed an `assignFloatTypeToVReg` helper which called a `getOrCreateSPIRVFloatType` helper. There was no equivalent IntegerType::get for floats so I handled this with a switch statement on bit widths to get the right LLVM float type. Finally, there is the use of `bool ZeroAsNull = STI.isOpenCLEnv();` This is partly a cosmetic change. When Zeros are treated as nulls, we don't create `OpConstantComposite` vectors which is something we do in the DXCs SPIRV backend. The DXC SPIRV backend also does not use `OpConstantNull`. Finally, I needed a means to test the behavior of the OpConstantNull and `OpConstantComposite` changes and this was one way I could do that via the same tests.	2024-04-10 16:27:44 -04:00
shamithoke	e3ef4612c1	Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764 ) Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors. Extend the operation to i32 and i64. --------- Co-authored-by: shami <shami_thoke@yahoo.com>	2024-04-10 20:22:44 +01:00
Alexey Bataev	2b00a73f62	[SLP]Buildvector for alternate instructions with non-profitable gather operands. If the operands of the potentially alternate node are going to produce buildvector sequences, which result in more instructions than the original code, then such instructions should be vectorized as alternate node, better to end up with the buildvector node. Left column - experimental, Right - reference. Metric: size..text Program size..text results results0 diff test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 413680.00 416272.00 0.6% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12351788.00 12354844.00 0.0% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664901.00 664949.00 0.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664901.00 664949.00 0.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1171371.00 1171355.00 -0.0% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1036396.00 1036284.00 -0.0% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111280.00 111248.00 -0.0% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1392113.00 1391361.00 -0.1% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1392113.00 1391361.00 -0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 281676.00 281452.00 -0.1% test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 3025.00 3019.00 -0.2% test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test 6351.00 6335.00 -0.3% Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 15.00 16.00 6.7% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1703.00 1707.00 0.2% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1703.00 1707.00 0.2% test-suite :: 
External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26241.00 26239.00 -0.0% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11761.00 11754.00 -0.1% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 824.00 822.00 -0.2% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 5668.00 5654.00 -0.2% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 5668.00 5654.00 -0.2% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 792.00 790.00 -0.3% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 792.00 790.00 -0.3% test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 1389.00 1384.00 -0.4% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 596.00 590.00 -1.0% test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test 6.00 5.00 -16.7% Metric: exec_time Program exec_time results results0 diff test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 99.14 100.00 0.9% Other changes are not significant (less than 0.1% with exec time less than 5 secs). SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns remain scalar, smaller code. External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small changes, some extra stores get vectorized. External/SPEC/CINT2017speed/625.x264_s/625.x264_s External/SPEC/CINT2017rate/525.x264_r/525.x264_r x264 has one change in a loop body, in function ssim_end4, some code remains scalar, resulting in less code size. External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code gets vectorized, looks like some other patterns were matched. MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were vectorized (looks like the graphs become profitable) MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small changes in vectorized code (some small part remains scalar). 
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s Many changes caused by the fact that the code of one function becomes smaller (ConvertLCHabToRGB) and this function gets inlined after that. MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small changes here and there, some extra code is vectorized, some remain scalar (2 x vectors) MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars + 2 insertelems instead of insert, broadcast, alt code (3 instructions, total 5 insts) MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph becomes profitable and gets vectorized. External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s Some small graph becomes profitable and gets vectorized. MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code. Reviewers: RKSimon, dtcxzyw Reviewed By: RKSimon Pull Request: https://github.com/llvm/llvm-project/pull/84978	2024-04-10 14:33:56 -04:00
Noah Goldstein	81cdd35c0c	[ValueTracking] Add support for `xor`/`disjoint or` in `isKnownNonZero` Handles cases like `X ^ Y == X` / `X disjoint| Y == X`. Both of these cases have identical logic to the existing `add` case, so just converting the `add` code to a more general helper. Proofs: https://alive2.llvm.org/ce/z/Htm7pe Closes #87706	2024-04-10 13:13:43 -05:00
Noah Goldstein	2646790155	[ValueTracking] Add tests for `xor`/`disjoint or` in `isKnownNonZero`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	0c57a2e4b4	[ValueTracking] Add support for `xor`/`disjoint or` in `getInvertibleOperands` This strengthens our `isKnownNonEqual` logic with some fairly trivial cases. Proofs: https://alive2.llvm.org/ce/z/4pxRTj Closes #87705	2024-04-10 13:13:43 -05:00
Noah Goldstein	195d278d50	[ValueTracking] Add tests for `xor`/`disjoint or` in `getInvertibleOperands`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	9c545a14c0	[ValueTracking] Add support for `insertelement` in `isKnownNonZero` Inserts don't modify the data, so if all elements that end up in the destination are non-zero the result is non-zero. Closes #87703	2024-04-10 13:13:43 -05:00
Noah Goldstein	8a28b9b8ec	[ValueTracking] Add tests for `insertelement` in `isKnownNonZero`; NFC	2024-04-10 13:13:43 -05:00
Noah Goldstein	87528bfefb	[ValueTracking] Add support for `shufflevector` in `isKnownNonZero` Shuffles don't modify the data, so if all elements that end up in the destination are non-zero the result is non-zero. Closes #87702	2024-04-10 13:13:42 -05:00
Noah Goldstein	c1d3f39ae9	[ValueTracking] Add tests for `shufflevector` in `isKnownNonZero`	2024-04-10 13:13:42 -05:00
Jun Wang	86842e1f72	[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236 ) This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counter values are always 0, but which counters are involved depends on the memory instruction. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-04-10 10:47:04 -07:00
Craig Topper	f27f369710	[RISCV] Remove interrupt handler special case from RISCVFrameLowering::determineCalleeSaves. (#88069 ) This code was trying to save temporary argument registers in interrupt handler functions that contain calls. With the exception that all FP registers are saved including the normally callee saved registers. If all of the callees use an FP ABI and the interrupt handler doesn't touch the normally callee saved FP registers, we don't need to save them. It doesn't appear that we need to special case functions with calls. The normal callee saved register handling will already check each of the calls and consider a register clobbered if the call doesn't explicitly say it is preserved. All of the test changes are from the removal of the FP callee saved registers. There are tests for interrupt handlers with F and D extension that use ilp32 or lp64 ABIs that are not affected by this change. They still save the FP callee saved registers as they should. gcc appears to have a bug where the D extension being enabled with the ilp32f or lp64f ABI does not save the FP callee saved regs. The callee would only save/restore the lower 32 bits and clobber the upper bits. LLVM saves the FP callee saved regs in this case and there is an unchanged test for it. The unnecessary save/restore was raised in this thread https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1	2024-04-10 10:28:54 -07:00
David Green	4dcf33b6c2	[AArch64] Cleanup and GISel coverage for lrint tests. NFC	2024-04-10 18:13:57 +01:00
Vyacheslav Levytskyy	335d5d5f47	[SPIRV] Tweak parsing of base type name in builtins (#88255 ) This PR is a small improvement to the parsing of base type names in builtins, allowing it to understand `unsigned ...` types. The test case that fails without the fix is attached.	2024-04-10 19:04:31 +02:00
Evgenii Stepanov	e72c949c15	[msan] Overflow intrinsics. (#88210 )	2024-04-10 09:12:25 -07:00
Craig Topper	323d3ab257	[RISCV] Optimize undef Even vector in getWideningInterleave. (#88221 ) We recently optimized the code when the Odd vector was undef to fix a poison bug. There are additional optimizations we can do if the even vector is undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a vzext.vf2 and a vsll.	2024-04-10 09:08:50 -07:00
Noah Goldstein	f1ee458ddb	[ValueTracking] improve `isKnownNonZero` precision for `smax` Instead of relying on known-bits for strictly positive, use the `isKnownPositive` API. This will use `isKnownNonZero` which is more accurate. Closes #88170	2024-04-10 10:40:49 -05:00
Noah Goldstein	2ff82c2c64	[ValueTracking] Add tests for improving `isKnownNonZero` of `smax`; NFC	2024-04-10 10:40:49 -05:00
Noah Goldstein	37ca6fa1e2	[ValueTracking] Add support for overflow detection functions in `isKnownNonZero` Adds support for: `{s,u}{add,sub,mul}.with.overflow` The logic is identical to the non-overflow binops, we were just missing the cases. Closes #87701	2024-04-10 10:40:48 -05:00
Noah Goldstein	a02b3c0182	[ValueTracking] Add tests for overflow detection functions in `isKnownNonZero`; NFC	2024-04-10 10:40:48 -05:00
Noah Goldstein	41c52217b0	[ValueTracking] Add support for `vector_reduce_{s,u}{min,max}` in `computeKnownBits` Previously missing. We compute by just applying the reduce function on the knownbits of each element. Closes #88169	2024-04-10 10:40:48 -05:00
Noah Goldstein	77d668451a	[ValueTracking] Add support for `vector_reduce_{s,u}{min,max}` in `isKnownNonZero` Previously missing, proofs for all implementations: https://alive2.llvm.org/ce/z/G8wpmG	2024-04-10 10:40:48 -05:00
Noah Goldstein	f9f4aba547	[InstCombine] Add tests for non-zero/knownbits of `vector_reduce_{s,u}{min,max}`; NFC	2024-04-10 10:40:48 -05:00
Craig Topper	7f1b9adfc8	[RISCV] Add MachineCombiner to fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X). (#87884 ) This improves a pattern that occurs in 531.deepsjeng_r, reducing the dynamic instruction count by 0.5%. This may be possible to improve in SelectionDAG, but given the special cases around shXadd formation, it's not obvious it can be done in a robust way without adding multiple special cases. I've used a GEP with 2 indices because that most closely resembles the motivating case. Most of the test cases are the simplest GEP case. One test has a logical right shift on an index which is closer to the deepsjeng code. This requires special handling in isel to reverse a DAGCombiner canonicalization that turns a pair of shifts into (srl (and X, C1), C2).	2024-04-10 08:39:56 -07:00
Alexey Bataev	6ca5a410d2	[SLP]Fix PR87358: broken module, Instruction does not dominate all uses. If the first node is a gather node with extractelement instructions, still need to put the vector value after all instructions, not after the very first one.	2024-04-10 08:24:15 -07:00
annamthomas	54a9f0007c	[SCEV] Fix BinomialCoefficient Iteration to fit in W bits (#88010 ) BinomialCoefficient computes the value of W-bit IV at iteration It of a loop. When W is 1, we can call multiplicative inverse on 0 which triggers an assert since 1b76120. Since the arithmetic is supposed to wrap if It or K does not fit in W bits, do the truncation into W bits after we do the shift. Fixes #87798	2024-04-10 09:02:23 -04:00
Florian Hahn	94ed57dab6	[PhaseOrdering] Add test for #85551 . Add test for missed hoisting of checks from std::span https://github.com/llvm/llvm-project/issues/85551	2024-04-10 13:30:30 +01:00
Dinar Temirbulatov	990c4bc95f	[AArch64][SVE2] Generate SVE2 BSL instruction in LLVM for bit-twiddling. (#83514 ) Allow folding or/and-and to a BSL instruction for scalable vectors.	2024-04-10 11:07:59 +01:00
Simon Pilgrim	0e7d14d2e8	[X86] Regenerate mmx-intrinsics.ll test checks	2024-04-10 10:42:01 +01:00
hev	0d17e1f0e5	[LoongArch] Revert `sp` adjustment in prologue (#88110 ) After commit 18c5f3c3 ("[RegisterScavenger][RISCV] Don't search for FrameSetup instrs if we were searching from Non-FrameSetup instrs"), we can revert the `sp` adjustment 4e2364a2 ("[LoongArch] Add emergency spill slot for GPR for large frames") to generate better code, as the issue with `RegScavenger` has been resolved. Fixes #88109	2024-04-10 17:13:25 +08:00
Paschalis Mpeis	e50c4c83b6	[AArch64][TLI] Add TLI mappings for ArmPL modf, sincos, sincospi (#83143 ) ArmPL 24.04 release fixes a bug concerning these methods, so now they can be re-introduced to TLI mappings.	2024-04-10 09:34:46 +01:00
Chia	469caa31e7	[RISCV] Use vwadd.vx for splat vector with extension (#87249 ) This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like `(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can use `vwadd.vx` and `vwadd.w` for such a case. ### Source code ``` define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) { %sb = sext i32 %b to i64 %head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0 %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer %vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64> %ve = add <vscale x 8 x i64> %vc, %splat ret <vscale x 8 x i64> %ve } ``` ### Before this patch [Compiler Explorer](https://godbolt.org/z/sq191PsT4) ``` vwadd_vx_splat_sext: sext.w a0, a0 vsetvli a1, zero, e64, m8, ta, ma vmv.v.x v16, a0 vsetvli zero, zero, e32, m4, ta, ma vwadd.wv v16, v16, v8 vmv8r.v v8, v16 ret ``` ### After this patch ``` vwadd_vx_splat_sext: vsetvli a1, zero, e32, m4, ta, ma vwadd.vx v16, v8, a0 vmv8r.v v8, v16 ret ```	2024-04-10 15:26:17 +09:00
XChy	313a33b9df	[InstCombine] Reduce nested logical operator if poison is implied (#86823 ) Fixes #76623 Alive2 proof: https://alive2.llvm.org/ce/z/gX6znJ (I'm not sure how to write a proof for such a transform, maybe there are mistakes) In most cases, `icmp(a, C1) && (other_cond && icmp(a, C2))` will be reduced to `icmp(a, C1) & (other_cond && icmp(a, C2))`, since the latter icmp always implies the poison of the former. After reduction, it's easier to simplify the icmp chain. Similarly, this patch does the same thing for `(A && B) && C --> A && (B & C)`. Maybe we could constrain such reduction to icmps only if there is a regression in benchmarks.	2024-04-10 14:19:44 +08:00
Shih-Po Hung	3d985a6f1b	[RISCV][TTI] Scale the cost of Select with LMUL (#88098 ) Use the Val type to estimate the instruction cost for SelectInst.	2024-04-10 14:18:15 +08:00
Noah Goldstein	6c40d463c2	[X86] Use `nneg` flag when trying to convert `uitofp` -> `sitofp` Closes #86694	2024-04-09 23:06:55 -05:00
Noah Goldstein	84a5332a68	[X86] Add tests for `uitofp nneg` -> `sitofp`; NFC	2024-04-09 23:06:55 -05:00
hanbeom	44c79da3ae	[InstCombine] Remove shl if we only demand known signbits of shift source (#79014 ) This patch resolves the TODO written in commit: `5909c67883` Proof: https://alive2.llvm.org/ce/z/C3VNoR	2024-04-10 11:19:09 +09:00
Shih-Po Hung	ee52add6cb	[RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931 ) This patch uses the argument type to infer the LMUL cost for the index generation, add, and comparison.	2024-04-10 10:08:33 +08:00
Evgenii Stepanov	e8a3b72272	[msan] Precommit tests. Precommit tests for overflowing and saturating arithmetic intrinsics.	2024-04-09 16:32:48 -07:00
Noah Goldstein	9170e38575	Add support for `nneg` flag with `uitofp` As noted when #82404 was pushed (canonicalizing `sitofp` -> `uitofp`), different signedness on fp casts can have dramatic performance implications on different backends. So, it makes sense to create a reliable means for the backend to pick its cast signedness if either is correct. Further, this allows us to start canonicalizing `sitofp` -> `uitofp` which may ease middle-end analysis. Closes #86141	2024-04-09 18:12:33 -05:00
Teresa Johnson	a332cfc986	[MemProf] Perform cloning for each allocation separately (#87112 ) Restructures the cloning slightly to perform all cloning for each allocation separately. The prior algorithm would sometimes miss cloning opportunities in cases where trimmed cold contexts partially overlapped with longer contexts for different allocations. Most of the change is isolated to the helpers that move edges to new or existing clones, which now support moving a subset of context ids.	2024-04-09 14:12:32 -07:00
Noah Goldstein	71ef04d7cd	[InstCombine] fold `(icmp eq/ne (or disjoint x, C0), C1)` -> `(icmp eq/ne x, C0^C1)` Proof: https://alive2.llvm.org/ce/z/m3xoo_ Closes #87734	2024-04-09 15:38:18 -05:00
Noah Goldstein	5b58eb68ed	[InstCombine] Add tests for folding `(icmp eq/ne (or disjoint x, C0), C1)`; NFC	2024-04-09 15:38:18 -05:00
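
The fold in 71ef04d7cd, `(icmp eq/ne (or disjoint x, C0), C1)` -> `(icmp eq/ne x, C0^C1)`, can be sanity-checked by brute force at a small bit width. The sketch below is not LLVM code and not a replacement for the linked Alive2 proof; the 5-bit width and the names `fold_holds` and `check_all` are assumptions made for illustration. It relies on the fact that `or disjoint` is poison when the operands share a set bit, so only triples with `x & C0 == 0` are compared.

```python
# Exhaustive check of the `or disjoint` icmp fold at a small bit width.
# Illustrative only: BITS, fold_holds and check_all are hypothetical names.
from itertools import product

BITS = 5  # assumption: a small width keeps the exhaustive search fast
MASK = (1 << BITS) - 1


def fold_holds(x: int, c0: int, c1: int) -> bool:
    """Compare the original comparison with the folded one for one triple."""
    original = ((x | c0) & MASK) == c1
    folded = x == (c0 ^ c1)
    return original == folded


def check_all() -> int:
    """Return the number of (x, C0, C1) triples checked; raise on a mismatch."""
    checked = 0
    for x, c0, c1 in product(range(1 << BITS), repeat=3):
        if x & c0:
            # `or disjoint` yields poison when operands share a bit; skip these.
            continue
        if not fold_holds(x, c0, c1):
            raise AssertionError(f"fold fails for x={x}, C0={c0}, C1={c1}")
        checked += 1
    return checked


if __name__ == "__main__":
    print(check_all(), "triples checked")
```

The check passes because, when `x` and `C0` share no bits, `x | C0` equals `x ^ C0`, and xor-ing both sides of `x ^ C0 == C1` with `C0` gives `x == C0 ^ C1`. The `ne` form follows by negating both sides.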

112068 Commits