180384 Commits

Author SHA1 Message Date
Mingming Liu
dda73336ad
[ThinLTO]Record import type in GlobalValueSummary::GVFlags (#87597)
The motivating use case is to support import the function declaration
across modules to construct call graph edges for indirect calls [1]
when importing the function definition costs too much compile time
(e.g., the function is too large has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function
definition but its postlink combined summary contains the function
summary or a global alias summary with this function as aliasee, the
function definition will be imported from source module by IRMover. The
implementation is in FunctionImporter::importFunctions [2]
2. In order for FunctionImporter to import a declaration of a function,
both function summary and alias summary need to carry the def / decl
state. Specifically, all existing summary fields doesn't differ across
import modules, but the def / decl state of is decided by
`<ImportModule, Function>`.

This change encodes the def/decl state in `GlobalValueSummary::GVFlags`.

In the subsequent changes
1. The indexing step `computeImportForModule` [3]
will compute the set of definitions and the set of declarations for each
module, and passing on the information to bitcode writer.
2. Bitcode writer will look up the def/decl state and sets the state
when it writes out the flag value. This is demonstrated in
https://github.com/llvm/llvm-project/pull/87600
3. Function importer will read the def/decl state when reading the
combined summary to figure out two sets of global values, and IRMover
will be updated to import the declaration (aka linkGlobalValuePrototype [4])
into the destination module.

- The next change is https://github.com/llvm/llvm-project/pull/87600

[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)
[3] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)
[4] 3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)
2024-04-10 19:46:01 -07:00
Craig Topper
999b9e6ddb [RISCV] Use vector getConstant instead of getSplatVector+getConstant. NFC 2024-04-10 19:39:41 -07:00
Jianjian Guan
fd50151180
[RISCV] Only support SPLAT_VECTOR for Zvfhmin when also enable the scalar extension of half fp (#88275) 2024-04-11 10:23:26 +08:00
Freddy Ye
f4509cf284
[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-04-11 10:18:29 +08:00
Jordan Rupprecht
6b46166ef2 [llvm][NFC] Suppress -Wunused-result call to write
Commit 87e6f87fe7e343eb656e9b49d30cbb065c086651 adds a call to `::write()`, which may be annotated w/ `warn_unused_result`, leading to `-Wunused-result` failures.
2024-04-11 02:14:07 +00:00
Vitaly Buka
d927d1867f
[UBSAN] Emit optimization remarks (#88304) 2024-04-10 16:30:42 -07:00
Matthias Braun
acb7ddc5cf
[WebAssembly] Remove threadlocal.address when disabling TLS (#88209)
Remove `llvm.threadlocal.address` intrinsic usage when disabling TLS.
This fixes errors revealed by the stricter IR verification introduced in
PR #87841.
2024-04-10 16:24:02 -07:00
Oskar Wirga
a9d4ddd98a
[MergeFuncs/CFI] Ensure all type metadata is propogated for CFI (#88218)
I noticed that we weren't propagating ALL type metadata that was
attached to CFI functions:

# BEFORE

```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028
```

# AFTER

```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !type !34028 !type !34029 !type !34030
```

This patch makes sure that the entire vector of metadata is copied over.
2024-04-10 15:37:27 -07:00
Craig Topper
d8f1e5d289
[APInt] Remove accumulator initialization from tcMultiply and tcFullMultiply. NFCI (#88202)
The tcMultiplyPart routine has a flag that says whether to add to the
accumulator or overwrite it. By using the overwrite mode on the first
iteration we don't need to initialize the accumulator to zero.

Note, the initialization in tcFullMultiply was only initializing the
first rhsParts of dst. tcMultiplyPart always overwrites the rhsParts+1
part that just contains the last carry. The first write to each part of
dst past rhsParts is a carry write so that's how the upper part of dst
is initialized.
2024-04-10 15:07:16 -07:00
Stanislav Mekhanoshin
2fdfea088c
[AMDGPU] Add v2i32 to the VS_64 types. NFCI. (#88318)
I am trying to use VOP3Inst with intrinsic taking v2i32 operand and it
fails to create patterm without it.
2024-04-10 14:50:54 -07:00
Farzon Lotfi
05093e2438
[Spirv][HLSL] Add OpAll lowering and float vec support (#87952)
The main point of this change was to add support for HLSL's all
intrinsic.
In the process of doing that I found a few issues around creating an
`OpConstantComposite` via `buildZerosVal`.

First the current code didn't support floats so the process of adding
`buildZerosValF` meant I needed a
float version of `getOrCreateIntConstVector`. After doing so I renamed
both versions to `getOrCreateConstVector`. That meant I needed to create
a float type version of `getOrCreateIntCompositeOrNull`. Luckily the
type information was low for this function so was able to split it out
into a helpwe and rename `getOrCreateIntCompositeOrNull` to
`getOrCreateCompositeOrNull` With the exception of type handling
differences of the code and Null vs 0 Constant Op codes these functions
should be identical.

To handle scalar floats I could not use `buildConstantFP` like this PR
did:
https://github.com/llvm/llvm-project/commit/0a2aaab5aba46#diff-733a189c5a8c3211f3a04fd6e719952a3fa231eadd8a7f11e6ecf1e584d57411R1603
because that would create too many superfluous registers (that causes
problems in the validator), I had to create a float version of
`getOrCreateConstInt` which I called `getOrCreateConstFP`.
similar problems with doing it like this:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp#L1540.

`buildZerosValF` also has a use of a function `getZeroFP`. This is
because half, float, and double scalar values of 0 would collide in
`SPIRVDuplicatesTracker<Constant> CT` if you use `APFloat(0.0f)`.

`getORCreateConstFP` needed its own version of `getOrCreateConstIntReg`
which I called `getOrCreateConstFloatReg` The one difference in this
function is `getOrCreateConstFloatReg` returns a bit width so we don't
have to call `getScalarOrVectorBitWidth` twice ie when it is used again
in `getOrCreateConstFP` for `OpConstantF` `addNumImm`.

`getOrCreateConstFloatReg` needed an `assignFloatTypeToVReg` helper
which called a `getOrCreateSPIRVFloatType` helper. There was no
equivalent IntegerType::get for floats so I handled this with a switch
statement on bit widths to get the right LLVM float type.

Finally, there is the use of `bool ZeroAsNull = STI.isOpenCLEnv();` This
is partly a cosmetic change. When Zeros are treated as nulls, we don't
create `OpConstantComposite` vectors which is something we do in the
DXCs SPIRV backend. The DXC SPIRV backend also does not use
`OpConstantNull`. Finally, I needed a means to test the behavior of the
OpConstantNull and `OpConstantComposite` changes and this was one way I
could do that via the same tests.
2024-04-10 16:27:44 -04:00
shamithoke
e3ef4612c1
Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764)
Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors

Extend the operation to i32 and i64.

---------

Co-authored-by: shami <shami_thoke@yahoo.com>
2024-04-10 20:22:44 +01:00
Alexey Bataev
2b00a73f62
[SLP]Buildvector for alternate instructions with non-profitable gather operands.
If the operands of the potentially alternate node are going to produce
buildvector sequences, which result in more instructions, than the
original code, then suhinstructions should be vectorized as alternate
node, better to end up with the buildvector node.

Left column - experimental, Right - reference.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413680.00   416272.00  0.6%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12351788.00 12354844.00  0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   664901.00   664949.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   664901.00   664949.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1171371.00  1171355.00 -0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1036396.00  1036284.00 -0.0%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   111280.00   111248.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1392113.00  1391361.00 -0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1392113.00  1391361.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281676.00   281452.00 -0.1%
                                                                                    test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test     3025.00     3019.00 -0.2%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test     6351.00     6335.00 -0.3%

Metric: SLP.NumVectorInstructions

Program                                                                                                                                                SLP.NumVectorInstructions
                                                                                                                                                       results                   results0 diff
                                                                                    test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test    15.00                     16.00   6.7%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1703.00                   1707.00   0.2%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1703.00                   1707.00   0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26241.00                  26239.00  -0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11761.00                  11754.00  -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   824.00                    822.00  -0.2%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5668.00                   5654.00  -0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5668.00                   5654.00  -0.2%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   792.00                    790.00  -0.3%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   792.00                    790.00  -0.3%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  1389.00                   1384.00  -0.4%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   596.00                    590.00  -1.0%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test     6.00                      5.00 -16.7%

Metric: exec_time

Program                                                                                                                                                exec_time
                                                                                                                                                       results   results0  diff
                                                                               test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test     99.14    100.00    0.9%

Other changes are not significant (less than 0.1% percent with exectime
less 5 secs).

SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns
remain scalar, smaller code.
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small
changes, some extra stores gets vectorized.
External/SPEC/CINT2017speed/625.x264_s/625.x264_s
External/SPEC/CINT2017rate/525.x264_r/525.x264_r
x264 has one change in a loop body, in function ssim_end4, some code
remain scalar, resulting in less code size.
External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code
gets vectorized, looks like some other patterns were matched.
MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were
vectorized (looks like the graphs become profitable)
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small
changes in vectorized code (some small part remain scalar).
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s
Many changes cause by the fact that the code of one function becomes
smaller (onvertLCHabToRGB) and this functions gets inlined after that.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small
changes here and there, some extra code is vectorized, some remain
scalar (2 x vectors)
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars
+ 2 insertelems instead of insert, broadcast, alt code (3 instructions,
  total 5 insts)
MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph
becomes profitable and gets vectorized.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r
External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s
Some small graph becomes profitable and gets vectorized.
MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code.

Reviewers: RKSimon, dtcxzyw

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84978
2024-04-10 14:33:56 -04:00
Noah Goldstein
81cdd35c0c [ValueTracking] Add support for xor/disjoint or in isKnownNonZero
Handles cases like `X ^ Y == X` / `X disjoint| Y == X`.

Both of these cases have identical logic to the existing `add` case,
so just converting the `add` code to a more general helper.

Proofs: https://alive2.llvm.org/ce/z/Htm7pe

Closes #87706
2024-04-10 13:13:43 -05:00
Noah Goldstein
0c57a2e4b4 [ValueTracking] Add support for xor/disjoint or in getInvertibleOperands
This strengthens our `isKnownNonEqual` logic with some fairly
trivial cases.

Proofs: https://alive2.llvm.org/ce/z/4pxRTj

Closes #87705
2024-04-10 13:13:43 -05:00
Noah Goldstein
9c545a14c0 [ValueTracking] Add support for insertelement in isKnownNonZero
Inserts don't modify the data, so if all elements that end up in the
destination are non-zero the result is non-zero.

Closes #87703
2024-04-10 13:13:43 -05:00
Noah Goldstein
87528bfefb [ValueTracking] Add support for shufflevector in isKnownNonZero
Shuffles don't modify the data, so if all elements that end up in the
destination are non-zero the result is non-zero.

Closes #87702
2024-04-10 13:13:42 -05:00
Jun Wang
86842e1f72
[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236)
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-04-10 10:47:04 -07:00
Craig Topper
f27f369710
[RISCV] Remove interrupt handler special case from RISCVFrameLowering::determineCalleeSaves. (#88069)
This code was trying to save temporary argument registers in interrupt
handler functions that contain calls. With the exception that all FP
registers are saved including the normally callee saved registers.

If all of the callees use an FP ABI and the interrupt handler doesn't
touch the normally callee saved FP registers, we don't need to save
them.

It doesn't appear that we need to special case functions with calls. The
normal callee saved register handling will already check each of the calls
and consider a register clobbered if the call doesn't explicitly say it is preserved.

All of the test changes are from the removal of the FP callee saved
registers. There are tests for interrupt handlers with F and D extension
that use ilp32 or lp64 ABIs that are not affected by this change. They
still save the FP callee saved registers as they should.

gcc appears to have a bug where the D extension being enabled with the
ilp32f or lp64f ABI does not save the FP callee saved regs. The callee
would only save/restore the lower 32 bits and clobber the upper bits.
LLVM saves the FP callee saved regs in this case and there is an
unchanged test for it.

The unnecessary save/restore was raised in this thread
https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1
2024-04-10 10:28:54 -07:00
Vyacheslav Levytskyy
335d5d5f47
[SPIRV] Tweak parsing of base type name in builtins (#88255)
This PR is a small improvement of parsing of base type name in builtins,
allowing to understand `unsigned ...` types. The test case that fails
without the fix is attached.
2024-04-10 19:04:31 +02:00
Evgenii Stepanov
e72c949c15
[msan] Overflow intrinsics. (#88210) 2024-04-10 09:12:25 -07:00
Craig Topper
323d3ab257
[RISCV] Optimize undef Even vector in getWideningInterleave. (#88221)
We recently optimized the code when the Odd vector was undef to fix a
poison bug.

There are additional optimizations we can do if the even vector is
undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a
vzext.vf2 and a vsll.
2024-04-10 09:08:50 -07:00
Noah Goldstein
f1ee458ddb [ValueTracking] improve isKnownNonZero precision for smax
Instead of relying on known-bits for strictly positive, use the
`isKnownPositive` API. This will use `isKnownNonZero` which is more
accurate.

Closes #88170
2024-04-10 10:40:49 -05:00
Noah Goldstein
37ca6fa1e2 [ValueTracking] Add support for overflow detection functions is isKnownNonZero
Adds support for: `{s,u}{add,sub,mul}.with.overflow`

The logic is identical to the the non-overflow binops, we where just
missing the cases.

Closes #87701
2024-04-10 10:40:48 -05:00
Noah Goldstein
f0a487d7e2 [ValueTracking] Split isNonZero(mul) logic to a helper; NFC 2024-04-10 10:40:48 -05:00
Noah Goldstein
41c52217b0 [ValueTracking] Add support for vector_reduce_{s,u}{min,max} in computeKnownBits
Previously missing. We compute by just applying the reduce function on
the knownbits of each element.

Closes #88169
2024-04-10 10:40:48 -05:00
Noah Goldstein
77d668451a [ValueTracking] Add support for vector_reduce_{s,u}{min,max} in isKnownNonZero
Previously missing, proofs for all implementations:
https://alive2.llvm.org/ce/z/G8wpmG
2024-04-10 10:40:48 -05:00
Craig Topper
7f1b9adfc8
[RISCV] Add MachineCombiner to fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X). (#87884)
This improves a pattern that occurs in 531.deepsjeng_r. Reducing the
dynamic instruction count by 0.5%.

This may be possible to improve in SelectionDAG, but given the special
cases around shXadd formation, it's not obvious it can be done in a
robust way without adding multiple special cases.

I've used a GEP with 2 indices because that mostly closely resembles the
motivating case. Most of the test cases are the simplest GEP case. One
test has a logical right shift on an index which is closer to the
deepsjeng code. This requires special handling in isel to reverse a
DAGCombiner canonicalization that turns a pair of shifts into (srl (and
X, C1), C2).
2024-04-10 08:39:56 -07:00
Alexey Bataev
6ca5a410d2 [SLP]Fix PR87358: broken module, Instruction does not dominate all uses.
If the first node is a gather node with extractelement instructions,
still need to put the vector value after all instructions, not after the
very first one.
2024-04-10 08:24:15 -07:00
Philip Reames
5ae9ffbd18 [RISCV] Address review comment from 88062
As pointed out by Fraser, KillSrcReg is always false at this point in
code, and having the inconcistency on whether we check the flag between
the if and else blocks is confusing.
2024-04-10 07:21:41 -07:00
Alexey Bataev
938a73422e [SLP][NFC]Walk over entries, not single values.
Better to walk over SLP nodes rather than single values. Matching
a value to a node is not a 1-to-1 relation, one value may be part of
several nodes and compiler may get wrong node, when trying to map it.
Currently there are no such issues detected, but they may appear in
future.
2024-04-10 06:03:26 -07:00
annamthomas
54a9f0007c
[SCEV] Fix BinomialCoefficient Iteration to fit in W bits (#88010)
BinomialCoefficient computes the value of W-bit IV at iteration It of a loop. When W is 1, we can call multiplicative inverse on 0 which triggers an assert since 1b76120.
    
Since the arithmetic is supposed to wrap if It or K does not fit in W bits, do the truncation into W bits after we do the shift.
    
 Fixes #87798
2024-04-10 09:02:23 -04:00
Dinar Temirbulatov
990c4bc95f
[AArch64][SVE2] Generate SVE2 BSL instruction in LLVM for bit-twiddling. (#83514)
Allow to fold or/and-and to BSL instuction for scalable vectors.
2024-04-10 11:07:59 +01:00
Florian Hahn
cac4c14ecf
[LAA] Replace std::tuple with struct (NFCI).
As suggested in https://github.com/llvm/llvm-project/pull/88039, replace
the tuple with a struct, to make it easier to extend.
2024-04-10 10:28:43 +01:00
hev
0d17e1f0e5
[LoongArch] Revert sp adjustment in prologue (#88110)
After commit 18c5f3c3 ("[RegisterScavenger][RISCV] Don't search for
FrameSetup instrs if we were searching from Non-FrameSetup instrs"), we
can revert the `sp` adjustment 4e2364a2 ("[LoongArch] Add emergency
spill slot for GPR for large frames") to generate better code, as the
issue with `RegScavenger` has been resolved.

Fixes #88109
2024-04-10 17:13:25 +08:00
Chia
469caa31e7
[RISCV] Use vwadd.vx for splat vector with extension (#87249)
This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like
`(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can
use `vwadd.vx` and `vwadd.w` for such a case.

### Source code
```
define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) {
     %sb = sext i32 %b to i64
     %head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0
     %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
     %vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64>
     %ve = add <vscale x 8 x i64> %vc, %splat
     ret <vscale x 8 x i64> %ve
}
```

### Before this patch
[Compiler Explorer](https://godbolt.org/z/sq191PsT4)
```
vwadd_vx_splat_sext:
  sext.w a0, a0
  vsetvli a1, zero, e64, m8, ta, ma
  vmv.v.x v16, a0
  vsetvli zero, zero, e32, m4, ta, ma
  vwadd.wv v16, v16, v8
  vmv8r.v v8, v16
  ret
```
### After this patch
```
vwadd_vx_splat_sext
  vsetvli a1, zero, e32, m4, ta, ma
  vwadd.vx v16, v8, a0
  vmv8r.v v8, v16
  ret
```
2024-04-10 15:26:17 +09:00
XChy
313a33b9df
[InstCombine] Reduce nested logical operator if poison is implied (#86823)
Fixes #76623
Alive2 proof: https://alive2.llvm.org/ce/z/gX6znJ (I'm not sure how to
write a proof for such transform, maybe there are mistakes)

In most cases, `icmp(a, C1) && (other_cond && icmp(a, C2))` will be
reduced to `icmp(a, C1) & (other_cond && icmp(a, C2))`, since latter
icmp always implies the poison of the former. After reduction, it's
easier to simplify the icmp chain.
Similarly, this patch does the same thing for `(A && B) && C --> A && (B
& C)`. Maybe we could constraint such reduction only on icmps if there
is regression in benchmarks.
2024-04-10 14:19:44 +08:00
Shih-Po Hung
3d985a6f1b
[RISCV][TTI] Scale the cost of Select with LMUL (#88098)
Use the Val type to estimate the instruction cost for SelectInst.
2024-04-10 14:18:15 +08:00
Congzhe
b0662a7a7d
[CodeMoverUtils] Enhance CodeMoverUtils to sink an entire BB (#87857)
When moving an entire basic block after `InsertPoint`, currently we
check each instruction whether their users are dominated by
`InsertPoint`, however, this can be improved such that even a user is
not dominated by `InsertPoint`, as long as it appears as a subsequent
instruction in the same BB, it is safe to move.

This patch is similar to commit 751be2a064f119af74c7b9b1e52bc904d8aa114d
that enhanced hoisting an entire BB, and this patch enhances sinking an
entire BB. Please refer to the added functionality in test case
`llvm/unittests/Transforms/Utils/CodeMoverUtilsTest.cpp` that was not
supported without this patch.
2024-04-10 00:28:21 -04:00
Noah Goldstein
6c40d463c2 [X86] Use nneg flag when trying to convert uitofp -> sitofp
Closes #86694
2024-04-09 23:06:55 -05:00
Noah Goldstein
7013638978 [DAG] Add support for nneg flag with uitofp
Copy `nneg` flag when building `UINT_TO_FP` from `uitofp` and use
`nneg` flag in the one place we transform `UINT_TO_FP` -> `SINT_TO_FP`
if the operand is non-negative.
2024-04-09 23:06:55 -05:00
Connor Sughrue
87e6f87fe7
[llvm][Support] Improvements to ListeningSocket functionality and documentation (#84710)
Improvements include
* Enable `ListeningSocket::accept` to timeout after a specified amount
of time or block indefinitely
* Enable `ListeningSocket::createUnix` to handle instances where the
target socket address already exists and differentiate between
situations where the existing file does and does not already have a
bound socket
* Doxygen comments

Functionality added for the module build daemon

---------

Co-authored-by: Michael Spencer <bigcheesegs@gmail.com>
2024-04-09 23:41:18 -04:00
Pengcheng Wang
8dc006ea40
[RISCV] Make EmitToStreamer return whether Inst is compressed
This is helpful to reduce calls of `RISCVRVC::compress` in #77337.

Reviewers: asb, lukel97, topperc

Reviewed By: topperc

Pull Request: https://github.com/llvm/llvm-project/pull/88120
2024-04-10 11:02:55 +08:00
Lei Wang
1aceee7bb6
Remove unused variable (#88223)
fix the CI
2024-04-09 19:25:08 -07:00
hanbeom
44c79da3ae
[InstCombine] Remove shl if we only demand known signbits of shift source (#79014)
This patch resolve TODO written in commit:
5909c67883

Proof: https://alive2.llvm.org/ce/z/C3VNoR
2024-04-10 11:19:09 +09:00
Shih-Po Hung
ee52add6cb
[RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931)
This patch uses the argument type to infer the LMUL cost for the index
generation, add, and comparison.
2024-04-10 10:08:33 +08:00
Noah Goldstein
9170e38575 Add support for nneg flag with uitofp
As noted when #82404 was pushed (canonicalizing `sitofp` -> `uitofp`),
different signedness on fp casts can have dramatic performance
implications on different backends.

So, it makes to create a reliable means for the backend to pick its
cast signedness if either are correct.

Further, this allows us to start canonicalizing `sitofp`- > `uitofp`
which may easy middle end analysis.

Closes #86141
2024-04-09 18:12:33 -05:00
Teresa Johnson
a332cfc986
[MemProf] Perform cloning for each allocation separately (#87112)
Restructures the cloning slightly to perform all cloning for each
allocation separately. The prior algorithm would sometimes miss cloning
opportunities in cases where trimmed cold contexts partially overlapped
with longer contexts for different allocations.

Most of the change is isolated to the helpers that move edges to new or
existing clones, which now support moving a subset of context ids.
2024-04-09 14:12:32 -07:00
Joseph Huber
470aefb240
[Offload][NFC] Remove omp_ prefix from offloading entries (#88071)
Summary:
These entires are generic for offloading with the new driver now. Having
the `omp` prefix was a historical artifact and is confusing when used
for CUDA. This patch just renames them for now, future patches will
rework the binary format to make it more common.
2024-04-09 15:50:15 -05:00
Noah Goldstein
71ef04d7cd [InstCombine] fold (icmp eq/ne (or disjoint x, C0), C1) -> (icmp eq/ne x, C0^C1)
Proof: https://alive2.llvm.org/ce/z/m3xoo_

Closes #87734
2024-04-09 15:38:18 -05:00