llvm-project

Author	SHA1	Message	Date
Mingming Liu	dda73336ad	[ThinLTO]Record import type in GlobalValueSummary::GVFlags (#87597 ) The motivating use case is to support importing the function declaration across modules to construct call graph edges for indirect calls [1] when importing the function definition costs too much compile time (e.g., the function is too large has no `noinline` attribute). 1. Currently, when the compiled IR module doesn't have a function definition but its postlink combined summary contains the function summary or a global alias summary with this function as aliasee, the function definition will be imported from source module by IRMover. The implementation is in FunctionImporter::importFunctions [2] 2. In order for FunctionImporter to import a declaration of a function, both function summary and alias summary need to carry the def / decl state. Specifically, all existing summary fields don't differ across import modules, but the def / decl state is decided by `<ImportModule, Function>`. This change encodes the def/decl state in `GlobalValueSummary::GVFlags`. In the subsequent changes 1. The indexing step `computeImportForModule` [3] will compute the set of definitions and the set of declarations for each module, and pass on the information to bitcode writer. 2. Bitcode writer will look up the def/decl state and set the state when it writes out the flag value. This is demonstrated in https://github.com/llvm/llvm-project/pull/87600 3. Function importer will read the def/decl state when reading the combined summary to figure out two sets of global values, and IRMover will be updated to import the declaration (aka linkGlobalValuePrototype [4]) into the destination module. 
- The next change is https://github.com/llvm/llvm-project/pull/87600 [1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5 [2] `3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)` [3] `3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)` [4] `3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)`	2024-04-10 19:46:01 -07:00
Craig Topper	999b9e6ddb	[RISCV] Use vector getConstant instead of getSplatVector+getConstant. NFC	2024-04-10 19:39:41 -07:00
Jianjian Guan	fd50151180	[RISCV] Only support SPLAT_VECTOR for Zvfhmin when also enable the scalar extension of half fp (#88275 )	2024-04-11 10:23:26 +08:00
Freddy Ye	f4509cf284	[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473 ) apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266 apx-syntax-recommendation: https://cdrdv2.intel.com/v1/dl/getContent/817241	2024-04-11 10:18:29 +08:00
Jordan Rupprecht	6b46166ef2	[llvm][NFC] Suppress `-Wunused-result` call to `write` Commit 87e6f87fe7e343eb656e9b49d30cbb065c086651 adds a call to `::write()`, which may be annotated w/ `warn_unused_result`, leading to `-Wunused-result` failures.	2024-04-11 02:14:07 +00:00
Vitaly Buka	d927d1867f	[UBSAN] Emit optimization remarks (#88304 )	2024-04-10 16:30:42 -07:00
Matthias Braun	acb7ddc5cf	[WebAssembly] Remove threadlocal.address when disabling TLS (#88209 ) Remove `llvm.threadlocal.address` intrinsic usage when disabling TLS. This fixes errors revealed by the stricter IR verification introduced in PR #87841.	2024-04-10 16:24:02 -07:00
Oskar Wirga	a9d4ddd98a	[MergeFuncs/CFI] Ensure all type metadata is propagated for CFI (#88218 ) I noticed that we weren't propagating ALL type metadata that was attached to CFI functions:
# BEFORE
```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028
```
# AFTER
```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !type !34028 !type !34029 !type !34030
```
This patch makes sure that the entire vector of metadata is copied over.	2024-04-10 15:37:27 -07:00
Craig Topper	d8f1e5d289	[APInt] Remove accumulator initialization from tcMultiply and tcFullMultiply. NFCI (#88202 ) The tcMultiplyPart routine has a flag that says whether to add to the accumulator or overwrite it. By using the overwrite mode on the first iteration we don't need to initialize the accumulator to zero. Note, the initialization in tcFullMultiply was only initializing the first rhsParts of dst. tcMultiplyPart always overwrites the rhsParts+1 part that just contains the last carry. The first write to each part of dst past rhsParts is a carry write so that's how the upper part of dst is initialized.	2024-04-10 15:07:16 -07:00
Stanislav Mekhanoshin	2fdfea088c	[AMDGPU] Add v2i32 to the VS_64 types. NFCI. (#88318 ) I am trying to use VOP3Inst with an intrinsic taking a v2i32 operand and it fails to create a pattern without it.	2024-04-10 14:50:54 -07:00
Farzon Lotfi	05093e2438	[Spirv][HLSL] Add OpAll lowering and float vec support (#87952 ) The main point of this change was to add support for HLSL's all intrinsic. In the process of doing that I found a few issues around creating an `OpConstantComposite` via `buildZerosVal`. First the current code didn't support floats so the process of adding `buildZerosValF` meant I needed a float version of `getOrCreateIntConstVector`. After doing so I renamed both versions to `getOrCreateConstVector`. That meant I needed to create a float type version of `getOrCreateIntCompositeOrNull`. Luckily the type information was low for this function so I was able to split it out into a helper and rename `getOrCreateIntCompositeOrNull` to `getOrCreateCompositeOrNull`. With the exception of type handling differences of the code and Null vs 0 Constant Op codes these functions should be identical. To handle scalar floats I could not use `buildConstantFP` like this PR did: https://github.com/llvm/llvm-project/commit/0a2aaab5aba46#diff-733a189c5a8c3211f3a04fd6e719952a3fa231eadd8a7f11e6ecf1e584d57411R1603 because that would create too many superfluous registers (that causes problems in the validator), I had to create a float version of `getOrCreateConstInt` which I called `getOrCreateConstFP`. Similar problems with doing it like this: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp#L1540. `buildZerosValF` also has a use of a function `getZeroFP`. This is because half, float, and double scalar values of 0 would collide in `SPIRVDuplicatesTracker<Constant> CT` if you use `APFloat(0.0f)`. `getOrCreateConstFP` needed its own version of `getOrCreateConstIntReg` which I called `getOrCreateConstFloatReg`. The one difference in this function is `getOrCreateConstFloatReg` returns a bit width so we don't have to call `getScalarOrVectorBitWidth` twice, i.e. when it is used again in `getOrCreateConstFP` for `OpConstantF` `addNumImm`. 
`getOrCreateConstFloatReg` needed an `assignFloatTypeToVReg` helper which called a `getOrCreateSPIRVFloatType` helper. There was no equivalent IntegerType::get for floats so I handled this with a switch statement on bit widths to get the right LLVM float type. Finally, there is the use of `bool ZeroAsNull = STI.isOpenCLEnv();` This is partly a cosmetic change. When Zeros are treated as nulls, we don't create `OpConstantComposite` vectors which is something we do in the DXCs SPIRV backend. The DXC SPIRV backend also does not use `OpConstantNull`. Finally, I needed a means to test the behavior of the OpConstantNull and `OpConstantComposite` changes and this was one way I could do that via the same tests.	2024-04-10 16:27:44 -04:00
shamithoke	e3ef4612c1	Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764 ) Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors. Extend the operation to i32 and i64. --------- Co-authored-by: shami <shami_thoke@yahoo.com>	2024-04-10 20:22:44 +01:00
Alexey Bataev	2b00a73f62	[SLP]Buildvector for alternate instructions with non-profitable gather operands. If the operands of the potentially alternate node are going to produce buildvector sequences, which result in more instructions, than the original code, then such instructions should be vectorized as alternate node, better to end up with the buildvector node. Left column - experimental, Right - reference.

Metric: size..text
Program   results   results0   diff
test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test 413680.00 416272.00 0.6%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12351788.00 12354844.00 0.0%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 664901.00 664949.00 0.0%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 664901.00 664949.00 0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1171371.00 1171355.00 -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 1036396.00 1036284.00 -0.0%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 111280.00 111248.00 -0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1392113.00 1391361.00 -0.1%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1392113.00 1391361.00 -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 281676.00 281452.00 -0.1%
test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 3025.00 3019.00 -0.2%
test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test 6351.00 6335.00 -0.3%

Metric: SLP.NumVectorInstructions
Program   results   results0   diff
test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test 15.00 16.00 6.7%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1703.00 1707.00 0.2%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1703.00 1707.00 0.2%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26241.00 26239.00 -0.0%
test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11761.00 11754.00 -0.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 824.00 822.00 -0.2%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 5668.00 5654.00 -0.2%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 5668.00 5654.00 -0.2%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 792.00 790.00 -0.3%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 792.00 790.00 -0.3%
test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test 1389.00 1384.00 -0.4%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 596.00 590.00 -1.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test 6.00 5.00 -16.7%

Metric: exec_time
Program   results   results0   diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 99.14 100.00 0.9%

Other changes are not significant (less than 0.1% with exec time under 5 secs).

SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns remain scalar, smaller code.
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small changes, some extra stores get vectorized.
External/SPEC/CINT2017speed/625.x264_s/625.x264_s, External/SPEC/CINT2017rate/525.x264_r/525.x264_r - x264 has one change in a loop body, in function ssim_end4, some code remains scalar, resulting in less code size.
External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code gets vectorized, looks like some other patterns were matched.
MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were vectorized (looks like the graphs become profitable)
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small changes in vectorized code (some small part remains scalar).
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r, External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s - many changes caused by the fact that the code of one function becomes smaller (ConvertLCHabToRGB) and this function gets inlined after that.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small changes here and there, some extra code is vectorized, some remains scalar (2 x vectors)
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars + 2 insertelems instead of insert, broadcast, alt code (3 instructions, total 5 insts)
MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph becomes profitable and gets vectorized.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r, External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s - some small graph becomes profitable and gets vectorized.
MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code.

Reviewers: RKSimon, dtcxzyw
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/84978	2024-04-10 14:33:56 -04:00
Noah Goldstein	81cdd35c0c	[ValueTracking] Add support for `xor`/`disjoint or` in `isKnownNonZero` Handles cases like `X ^ Y == X` / `X disjoint| Y == X`. Both of these cases have identical logic to the existing `add` case, so just converting the `add` code to a more general helper. Proofs: https://alive2.llvm.org/ce/z/Htm7pe Closes #87706	2024-04-10 13:13:43 -05:00
Noah Goldstein	0c57a2e4b4	[ValueTracking] Add support for `xor`/`disjoint or` in `getInvertibleOperands` This strengthens our `isKnownNonEqual` logic with some fairly trivial cases. Proofs: https://alive2.llvm.org/ce/z/4pxRTj Closes #87705	2024-04-10 13:13:43 -05:00
Noah Goldstein	9c545a14c0	[ValueTracking] Add support for `insertelement` in `isKnownNonZero` Inserts don't modify the data, so if all elements that end up in the destination are non-zero the result is non-zero. Closes #87703	2024-04-10 13:13:43 -05:00
Noah Goldstein	87528bfefb	[ValueTracking] Add support for `shufflevector` in `isKnownNonZero` Shuffles don't modify the data, so if all elements that end up in the destination are non-zero the result is non-zero. Closes #87702	2024-04-10 13:13:42 -05:00
Jun Wang	86842e1f72	[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236 ) This patch introduces a new command-line option for clang, namely, amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt instruction is generated after each memory load/store instruction. The counter values are always 0, but which counters are involved depends on the memory instruction. --------- Co-authored-by: Jun Wang <jun.wang7@amd.com>	2024-04-10 10:47:04 -07:00
Craig Topper	f27f369710	[RISCV] Remove interrupt handler special case from RISCVFrameLowering::determineCalleeSaves. (#88069 ) This code was trying to save temporary argument registers in interrupt handler functions that contain calls. With the exception that all FP registers are saved including the normally callee saved registers. If all of the callees use an FP ABI and the interrupt handler doesn't touch the normally callee saved FP registers, we don't need to save them. It doesn't appear that we need to special case functions with calls. The normal callee saved register handling will already check each of the calls and consider a register clobbered if the call doesn't explicitly say it is preserved. All of the test changes are from the removal of the FP callee saved registers. There are tests for interrupt handlers with F and D extension that use ilp32 or lp64 ABIs that are not affected by this change. They still save the FP callee saved registers as they should. gcc appears to have a bug where the D extension being enabled with the ilp32f or lp64f ABI does not save the FP callee saved regs. The callee would only save/restore the lower 32 bits and clobber the upper bits. LLVM saves the FP callee saved regs in this case and there is an unchanged test for it. The unnecessary save/restore was raised in this thread https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1	2024-04-10 10:28:54 -07:00
Vyacheslav Levytskyy	335d5d5f47	[SPIRV] Tweak parsing of base type name in builtins (#88255 ) This PR is a small improvement of parsing of base type name in builtins, allowing to understand `unsigned ...` types. The test case that fails without the fix is attached.	2024-04-10 19:04:31 +02:00
Evgenii Stepanov	e72c949c15	[msan] Overflow intrinsics. (#88210 )	2024-04-10 09:12:25 -07:00
Craig Topper	323d3ab257	[RISCV] Optimize undef Even vector in getWideningInterleave. (#88221 ) We recently optimized the code when the Odd vector was undef to fix a poison bug. There are additional optimizations we can do if the even vector is undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a vzext.vf2 and a vsll.	2024-04-10 09:08:50 -07:00
Noah Goldstein	f1ee458ddb	[ValueTracking] improve `isKnownNonZero` precision for `smax` Instead of relying on known-bits for strictly positive, use the `isKnownPositive` API. This will use `isKnownNonZero` which is more accurate. Closes #88170	2024-04-10 10:40:49 -05:00
Noah Goldstein	37ca6fa1e2	[ValueTracking] Add support for overflow detection functions in `isKnownNonZero` Adds support for: `{s,u}{add,sub,mul}.with.overflow` The logic is identical to the non-overflow binops, we were just missing the cases. Closes #87701	2024-04-10 10:40:48 -05:00
Noah Goldstein	f0a487d7e2	[ValueTracking] Split `isNonZero(mul)` logic to a helper; NFC	2024-04-10 10:40:48 -05:00
Noah Goldstein	41c52217b0	[ValueTracking] Add support for `vector_reduce_{s,u}{min,max}` in `computeKnownBits` Previously missing. We compute by just applying the reduce function on the knownbits of each element. Closes #88169	2024-04-10 10:40:48 -05:00
Noah Goldstein	77d668451a	[ValueTracking] Add support for `vector_reduce_{s,u}{min,max}` in `isKnownNonZero` Previously missing, proofs for all implementations: https://alive2.llvm.org/ce/z/G8wpmG	2024-04-10 10:40:48 -05:00
Craig Topper	7f1b9adfc8	[RISCV] Add MachineCombiner to fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X). (#87884 ) This improves a pattern that occurs in 531.deepsjeng_r, reducing the dynamic instruction count by 0.5%. This may be possible to improve in SelectionDAG, but given the special cases around shXadd formation, it's not obvious it can be done in a robust way without adding multiple special cases. I've used a GEP with 2 indices because that most closely resembles the motivating case. Most of the test cases are the simplest GEP case. One test has a logical right shift on an index which is closer to the deepsjeng code. This requires special handling in isel to reverse a DAGCombiner canonicalization that turns a pair of shifts into (srl (and X, C1), C2).	2024-04-10 08:39:56 -07:00
Alexey Bataev	6ca5a410d2	[SLP]Fix PR87358: broken module, Instruction does not dominate all uses. If the first node is a gather node with extractelement instructions, still need to put the vector value after all instructions, not after the very first one.	2024-04-10 08:24:15 -07:00
Philip Reames	5ae9ffbd18	[RISCV] Address review comment from 88062 As pointed out by Fraser, KillSrcReg is always false at this point in code, and having the inconsistency on whether we check the flag between the if and else blocks is confusing.	2024-04-10 07:21:41 -07:00
Alexey Bataev	938a73422e	[SLP][NFC]Walk over entries, not single values. Better to walk over SLP nodes rather than single values. Matching a value to a node is not a 1-to-1 relation, one value may be part of several nodes and compiler may get wrong node, when trying to map it. Currently there are no such issues detected, but they may appear in future.	2024-04-10 06:03:26 -07:00
annamthomas	54a9f0007c	[SCEV] Fix BinomialCoefficient Iteration to fit in W bits (#88010 ) BinomialCoefficient computes the value of W-bit IV at iteration It of a loop. When W is 1, we can call multiplicative inverse on 0 which triggers an assert since 1b76120. Since the arithmetic is supposed to wrap if It or K does not fit in W bits, do the truncation into W bits after we do the shift. Fixes #87798	2024-04-10 09:02:23 -04:00
Dinar Temirbulatov	990c4bc95f	[AArch64][SVE2] Generate SVE2 BSL instruction in LLVM for bit-twiddling. (#83514 ) Allow folding or/and-and to BSL instruction for scalable vectors.	2024-04-10 11:07:59 +01:00
Florian Hahn	cac4c14ecf	[LAA] Replace std::tuple with struct (NFCI). As suggested in https://github.com/llvm/llvm-project/pull/88039, replace the tuple with a struct, to make it easier to extend.	2024-04-10 10:28:43 +01:00
hev	0d17e1f0e5	[LoongArch] Revert `sp` adjustment in prologue (#88110 ) After commit 18c5f3c3 ("[RegisterScavenger][RISCV] Don't search for FrameSetup instrs if we were searching from Non-FrameSetup instrs"), we can revert the `sp` adjustment 4e2364a2 ("[LoongArch] Add emergency spill slot for GPR for large frames") to generate better code, as the issue with `RegScavenger` has been resolved. Fixes #88109	2024-04-10 17:13:25 +08:00
Chia	469caa31e7	[RISCV] Use vwadd.vx for splat vector with extension (#87249 ) This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like `(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can use `vwadd.vx` and `vwadd.w` for such a case.
### Source code
```
define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) {
  %sb = sext i32 %b to i64
  %head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0
  %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
  %vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64>
  %ve = add <vscale x 8 x i64> %vc, %splat
  ret <vscale x 8 x i64> %ve
}
```
### Before this patch
[Compiler Explorer](https://godbolt.org/z/sq191PsT4)
```
vwadd_vx_splat_sext:
  sext.w a0, a0
  vsetvli a1, zero, e64, m8, ta, ma
  vmv.v.x v16, a0
  vsetvli zero, zero, e32, m4, ta, ma
  vwadd.wv v16, v16, v8
  vmv8r.v v8, v16
  ret
```
### After this patch
```
vwadd_vx_splat_sext:
  vsetvli a1, zero, e32, m4, ta, ma
  vwadd.vx v16, v8, a0
  vmv8r.v v8, v16
  ret
```
	2024-04-10 15:26:17 +09:00
XChy	313a33b9df	[InstCombine] Reduce nested logical operator if poison is implied (#86823 ) Fixes #76623 Alive2 proof: https://alive2.llvm.org/ce/z/gX6znJ (I'm not sure how to write a proof for such a transform, maybe there are mistakes) In most cases, `icmp(a, C1) && (other_cond && icmp(a, C2))` will be reduced to `icmp(a, C1) & (other_cond && icmp(a, C2))`, since the latter icmp always implies the poison of the former. After reduction, it's easier to simplify the icmp chain. Similarly, this patch does the same thing for `(A && B) && C --> A && (B & C)`. Maybe we could constrain such reduction to icmps only if there is a regression in benchmarks.	2024-04-10 14:19:44 +08:00
Shih-Po Hung	3d985a6f1b	[RISCV][TTI] Scale the cost of Select with LMUL (#88098 ) Use the Val type to estimate the instruction cost for SelectInst.	2024-04-10 14:18:15 +08:00
Congzhe	b0662a7a7d	[CodeMoverUtils] Enhance CodeMoverUtils to sink an entire BB (#87857 ) When moving an entire basic block after `InsertPoint`, currently we check for each instruction whether its users are dominated by `InsertPoint`. However, this can be improved: even if a user is not dominated by `InsertPoint`, as long as it appears as a subsequent instruction in the same BB, it is safe to move. This patch is similar to commit 751be2a064f119af74c7b9b1e52bc904d8aa114d that enhanced hoisting an entire BB, and this patch enhances sinking an entire BB. Please refer to the added functionality in test case `llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp` that was not supported without this patch.	2024-04-10 00:28:21 -04:00
Noah Goldstein	6c40d463c2	[X86] Use `nneg` flag when trying to convert `uitofp` -> `sitofp` Closes #86694	2024-04-09 23:06:55 -05:00
Noah Goldstein	7013638978	[DAG] Add support for `nneg` flag with `uitofp` Copy `nneg` flag when building `UINT_TO_FP` from `uitofp` and use `nneg` flag in the one place we transform `UINT_TO_FP` -> `SINT_TO_FP` if the operand is non-negative.	2024-04-09 23:06:55 -05:00
Connor Sughrue	87e6f87fe7	[llvm][Support] Improvements to ListeningSocket functionality and documentation (#84710 ) Improvements include * Enable `ListeningSocket::accept` to timeout after a specified amount of time or block indefinitely * Enable `ListeningSocket::createUnix` to handle instances where the target socket address already exists and differentiate between situations where the existing file does and does not already have a bound socket * Doxygen comments Functionality added for the module build daemon --------- Co-authored-by: Michael Spencer <bigcheesegs@gmail.com>	2024-04-09 23:41:18 -04:00
Pengcheng Wang	8dc006ea40	[RISCV] Make EmitToStreamer return whether Inst is compressed This is helpful to reduce calls of `RISCVRVC::compress` in #77337. Reviewers: asb, lukel97, topperc Reviewed By: topperc Pull Request: https://github.com/llvm/llvm-project/pull/88120	2024-04-10 11:02:55 +08:00
Lei Wang	1aceee7bb6	Remove unused variable (#88223 ) fix the CI	2024-04-09 19:25:08 -07:00
hanbeom	44c79da3ae	[InstCombine] Remove shl if we only demand known signbits of shift source (#79014 ) This patch resolves the TODO written in commit: `5909c67883` Proof: https://alive2.llvm.org/ce/z/C3VNoR	2024-04-10 11:19:09 +09:00
Shih-Po Hung	ee52add6cb	[RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931 ) This patch uses the argument type to infer the LMUL cost for the index generation, add, and comparison.	2024-04-10 10:08:33 +08:00
Noah Goldstein	9170e38575	Add support for `nneg` flag with `uitofp` As noted when #82404 was pushed (canonicalizing `sitofp` -> `uitofp`), different signedness on fp casts can have dramatic performance implications on different backends. So, it makes sense to create a reliable means for the backend to pick its cast signedness if either is correct. Further, this allows us to start canonicalizing `sitofp` -> `uitofp` which may ease middle end analysis. Closes #86141	2024-04-09 18:12:33 -05:00
Teresa Johnson	a332cfc986	[MemProf] Perform cloning for each allocation separately (#87112 ) Restructures the cloning slightly to perform all cloning for each allocation separately. The prior algorithm would sometimes miss cloning opportunities in cases where trimmed cold contexts partially overlapped with longer contexts for different allocations. Most of the change is isolated to the helpers that move edges to new or existing clones, which now support moving a subset of context ids.	2024-04-09 14:12:32 -07:00
Joseph Huber	470aefb240	[Offload][NFC] Remove `omp_` prefix from offloading entries (#88071 ) Summary: These entries are generic for offloading with the new driver now. Having the `omp` prefix was a historical artifact and is confusing when used for CUDA. This patch just renames them for now, future patches will rework the binary format to make it more common.	2024-04-09 15:50:15 -05:00
Noah Goldstein	71ef04d7cd	[InstCombine] fold `(icmp eq/ne (or disjoint x, C0), C1)` -> `(icmp eq/ne x, C0^C1)` Proof: https://alive2.llvm.org/ce/z/m3xoo_ Closes #87734	2024-04-09 15:38:18 -05:00
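
The fold in the last entry above, `(icmp eq/ne (or disjoint x, C0), C1)` -> `(icmp eq/ne x, C0^C1)`, can be sanity-checked by brute force on small bit widths. This is only an illustrative sketch, not the InstCombine implementation and not a substitute for the linked Alive2 proof. The function name `disjoint_or_fold_holds` and the modelling of `or disjoint` as "skip inputs with overlapping bits" are assumptions made for this example.

```python
def disjoint_or_fold_holds(bits: int = 4) -> bool:
    """Exhaustively compare both sides of the fold for all `bits`-wide x, C0, C1."""
    limit = 1 << bits
    for x in range(limit):
        for c0 in range(limit):
            if x & c0:
                # Assumption: `or disjoint` promises no common set bits.
                # When that promise is broken the result is poison, so the
                # fold is allowed to produce anything; skip these inputs.
                continue
            for c1 in range(limit):
                before = (x | c0) == c1   # icmp eq (or disjoint x, C0), C1
                after = x == (c0 ^ c1)    # icmp eq x, C0 ^ C1
                if before != after:
                    return False
    return True


if __name__ == "__main__":
    print(disjoint_or_fold_holds(4))
```

The check works because, with no overlapping bits, `x | C0` equals `x ^ C0`, and xor with a constant is invertible. The `ne` form follows by negating both sides.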


180384 Commits