112068 Commits

Author SHA1 Message Date
Mingming Liu
dda73336ad
[ThinLTO]Record import type in GlobalValueSummary::GVFlags (#87597)
The motivating use case is to support import the function declaration
across modules to construct call graph edges for indirect calls [1]
when importing the function definition costs too much compile time
(e.g., the function is too large has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function
definition but its postlink combined summary contains the function
summary or a global alias summary with this function as aliasee, the
function definition will be imported from source module by IRMover. The
implementation is in FunctionImporter::importFunctions [2]
2. In order for FunctionImporter to import a declaration of a function,
both function summary and alias summary need to carry the def / decl
state. Specifically, all existing summary fields doesn't differ across
import modules, but the def / decl state of is decided by
`<ImportModule, Function>`.

This change encodes the def/decl state in `GlobalValueSummary::GVFlags`.

In the subsequent changes
1. The indexing step `computeImportForModule` [3]
will compute the set of definitions and the set of declarations for each
module, and passing on the information to bitcode writer.
2. Bitcode writer will look up the def/decl state and sets the state
when it writes out the flag value. This is demonstrated in
https://github.com/llvm/llvm-project/pull/87600
3. Function importer will read the def/decl state when reading the
combined summary to figure out two sets of global values, and IRMover
will be updated to import the declaration (aka linkGlobalValuePrototype [4])
into the destination module.

- The next change is https://github.com/llvm/llvm-project/pull/87600

[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)
[3] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)
[4] 3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)
2024-04-10 19:46:01 -07:00
Jianjian Guan
fd50151180
[RISCV] Only support SPLAT_VECTOR for Zvfhmin when also enable the scalar extension of half fp (#88275) 2024-04-11 10:23:26 +08:00
Freddy Ye
f4509cf284
[X86][MC] Support enc/dec for SETZUCC and promoted SETCC. (#86473)
apx-spec: https://cdrdv2.intel.com/v1/dl/getContent/784266
apx-syntax-recommendation:
https://cdrdv2.intel.com/v1/dl/getContent/817241
2024-04-11 10:18:29 +08:00
Vitaly Buka
d927d1867f
[UBSAN] Emit optimization remarks (#88304) 2024-04-10 16:30:42 -07:00
Matthias Braun
acb7ddc5cf
[WebAssembly] Remove threadlocal.address when disabling TLS (#88209)
Remove `llvm.threadlocal.address` intrinsic usage when disabling TLS.
This fixes errors revealed by the stricter IR verification introduced in
PR #87841.
2024-04-10 16:24:02 -07:00
Oskar Wirga
a9d4ddd98a
[MergeFuncs/CFI] Ensure all type metadata is propogated for CFI (#88218)
I noticed that we weren't propagating ALL type metadata that was
attached to CFI functions:

# BEFORE

```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !34028
```

# AFTER

```
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !dbg !62311 !type !34028 !type !34029 !type !34030
... fn merging
; Function Attrs: minsize nounwind optsize ssp uwtable(sync)
define internal void @foo(ptr nocapture noundef readonly %0) #0 !type !type !34028 !type !34029 !type !34030
```

This patch makes sure that the entire vector of metadata is copied over.
2024-04-10 15:37:27 -07:00
Farzon Lotfi
05093e2438
[Spirv][HLSL] Add OpAll lowering and float vec support (#87952)
The main point of this change was to add support for HLSL's all
intrinsic.
In the process of doing that I found a few issues around creating an
`OpConstantComposite` via `buildZerosVal`.

First the current code didn't support floats so the process of adding
`buildZerosValF` meant I needed a
float version of `getOrCreateIntConstVector`. After doing so I renamed
both versions to `getOrCreateConstVector`. That meant I needed to create
a float type version of `getOrCreateIntCompositeOrNull`. Luckily the
type information was low for this function so was able to split it out
into a helpwe and rename `getOrCreateIntCompositeOrNull` to
`getOrCreateCompositeOrNull` With the exception of type handling
differences of the code and Null vs 0 Constant Op codes these functions
should be identical.

To handle scalar floats I could not use `buildConstantFP` like this PR
did:
https://github.com/llvm/llvm-project/commit/0a2aaab5aba46#diff-733a189c5a8c3211f3a04fd6e719952a3fa231eadd8a7f11e6ecf1e584d57411R1603
because that would create too many superfluous registers (that causes
problems in the validator), I had to create a float version of
`getOrCreateConstInt` which I called `getOrCreateConstFP`.
similar problems with doing it like this:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SPIRV/SPIRVBuiltins.cpp#L1540.

`buildZerosValF` also has a use of a function `getZeroFP`. This is
because half, float, and double scalar values of 0 would collide in
`SPIRVDuplicatesTracker<Constant> CT` if you use `APFloat(0.0f)`.

`getORCreateConstFP` needed its own version of `getOrCreateConstIntReg`
which I called `getOrCreateConstFloatReg` The one difference in this
function is `getOrCreateConstFloatReg` returns a bit width so we don't
have to call `getScalarOrVectorBitWidth` twice ie when it is used again
in `getOrCreateConstFP` for `OpConstantF` `addNumImm`.

`getOrCreateConstFloatReg` needed an `assignFloatTypeToVReg` helper
which called a `getOrCreateSPIRVFloatType` helper. There was no
equivalent IntegerType::get for floats so I handled this with a switch
statement on bit widths to get the right LLVM float type.

Finally, there is the use of `bool ZeroAsNull = STI.isOpenCLEnv();` This
is partly a cosmetic change. When Zeros are treated as nulls, we don't
create `OpConstantComposite` vectors which is something we do in the
DXCs SPIRV backend. The DXC SPIRV backend also does not use
`OpConstantNull`. Finally, I needed a means to test the behavior of the
OpConstantNull and `OpConstantComposite` changes and this was one way I
could do that via the same tests.
2024-04-10 16:27:44 -04:00
shamithoke
e3ef4612c1
Perform bitreverse using AVX512 GFNI for i32 and i64. (#81764)
Currently, the lowering operation for bitreverse using Intel AVX512 GFNI only supports byte vectors

Extend the operation to i32 and i64.

---------

Co-authored-by: shami <shami_thoke@yahoo.com>
2024-04-10 20:22:44 +01:00
Alexey Bataev
2b00a73f62
[SLP]Buildvector for alternate instructions with non-profitable gather operands.
If the operands of the potentially alternate node are going to produce
buildvector sequences, which result in more instructions, than the
original code, then suhinstructions should be vectorized as alternate
node, better to end up with the buildvector node.

Left column - experimental, Right - reference.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/loop_unroll.test   413680.00   416272.00  0.6%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12351788.00 12354844.00  0.0%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   664901.00   664949.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   664901.00   664949.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1171371.00  1171355.00 -0.0%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  1036396.00  1036284.00 -0.0%
                                                                         test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   111280.00   111248.00 -0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1392113.00  1391361.00 -0.1%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1392113.00  1391361.00 -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   281676.00   281452.00 -0.1%
                                                                                    test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test     3025.00     3019.00 -0.2%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test     6351.00     6335.00 -0.3%

Metric: SLP.NumVectorInstructions

Program                                                                                                                                                SLP.NumVectorInstructions
                                                                                                                                                       results                   results0 diff
                                                                                    test-suite :: MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes.test    15.00                     16.00   6.7%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  1703.00                   1707.00   0.2%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  1703.00                   1707.00   0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 26241.00                  26239.00  -0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 11761.00                  11754.00  -0.1%
                                                                        test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   824.00                    822.00  -0.2%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  5668.00                   5654.00  -0.2%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  5668.00                   5654.00  -0.2%
                                                                                     test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   792.00                    790.00  -0.3%
                                                                                    test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   792.00                    790.00  -0.3%
                                                                                       test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test  1389.00                   1384.00  -0.4%
                                                                                         test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   596.00                    590.00  -1.0%
                                                                                test-suite :: MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig.test     6.00                      5.00 -16.7%

Metric: exec_time

Program                                                                                                                                                exec_time
                                                                                                                                                       results   results0  diff
                                                                               test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test     99.14    100.00    0.9%

Other changes are not significant (less than 0.1% percent with exectime
less 5 secs).

SingleSource/Benchmarks/Adobe-C++/loop_unroll - same small patterns
remain scalar, smaller code.
External/SPEC/CFP2017rate/526.blender_r/526.blender_r - many small
changes, some extra stores gets vectorized.
External/SPEC/CINT2017speed/625.x264_s/625.x264_s
External/SPEC/CINT2017rate/525.x264_r/525.x264_r
x264 has one change in a loop body, in function ssim_end4, some code
remain scalar, resulting in less code size.
External/SPEC/CFP2017rate/511.povray_r/511.povray_r - some extra code
gets vectorized, looks like some other patterns were matched.
MultiSource/Benchmarks/7zip/7zip-benchmark - extra stores were
vectorized (looks like the graphs become profitable)
MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg - small
changes in vectorized code (some small part remain scalar).
External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r
External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s
Many changes cause by the fact that the code of one function becomes
smaller (onvertLCHabToRGB) and this functions gets inlined after that.
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc - some small
changes here and there, some extra code is vectorized, some remain
scalar (2 x vectors)
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes - emits 2 scalars
+ 2 insertelems instead of insert, broadcast, alt code (3 instructions,
  total 5 insts)
MultiSource/Benchmarks/Prolangs-C/plot2fig/plot2fig - small graph
becomes profitable and gets vectorized.
External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r
External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s
Some small graph becomes profitable and gets vectorized.
MultiSource/Benchmarks/FreeBench/pifft/pifft - no changes in final code.

Reviewers: RKSimon, dtcxzyw

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/84978
2024-04-10 14:33:56 -04:00
Noah Goldstein
81cdd35c0c [ValueTracking] Add support for xor/disjoint or in isKnownNonZero
Handles cases like `X ^ Y == X` / `X disjoint| Y == X`.

Both of these cases have identical logic to the existing `add` case,
so just converting the `add` code to a more general helper.

Proofs: https://alive2.llvm.org/ce/z/Htm7pe

Closes #87706
2024-04-10 13:13:43 -05:00
Noah Goldstein
2646790155 [ValueTracking] Add tests for xor/disjoint or in isKnownNonZero; NFC 2024-04-10 13:13:43 -05:00
Noah Goldstein
0c57a2e4b4 [ValueTracking] Add support for xor/disjoint or in getInvertibleOperands
This strengthens our `isKnownNonEqual` logic with some fairly
trivial cases.

Proofs: https://alive2.llvm.org/ce/z/4pxRTj

Closes #87705
2024-04-10 13:13:43 -05:00
Noah Goldstein
195d278d50 [ValueTracking] Add tests for xor/disjoint or in getInvertibleOperands; NFC 2024-04-10 13:13:43 -05:00
Noah Goldstein
9c545a14c0 [ValueTracking] Add support for insertelement in isKnownNonZero
Inserts don't modify the data, so if all elements that end up in the
destination are non-zero the result is non-zero.

Closes #87703
2024-04-10 13:13:43 -05:00
Noah Goldstein
8a28b9b8ec [ValueTracking] Add tests for insertelement in isKnownNonZero; NFC 2024-04-10 13:13:43 -05:00
Noah Goldstein
87528bfefb [ValueTracking] Add support for shufflevector in isKnownNonZero
Shuffles don't modify the data, so if all elements that end up in the
destination are non-zero the result is non-zero.

Closes #87702
2024-04-10 13:13:42 -05:00
Noah Goldstein
c1d3f39ae9 [ValueTracking] Add tests for shufflevector in isKnownNonZero 2024-04-10 13:13:42 -05:00
Jun Wang
86842e1f72
[AMDGPU] New clang option for emitting a waitcnt instruction after each memory instruction (#79236)
This patch introduces a new command-line option for clang, namely,
amdgpu-precise-mem-op (or precise-memory in the backend). When this option is specified, a waitcnt
instruction is generated after each memory load/store instruction. The
counter values are always 0, but which counters are involved depends on
the memory instruction.

---------

Co-authored-by: Jun Wang <jun.wang7@amd.com>
2024-04-10 10:47:04 -07:00
Craig Topper
f27f369710
[RISCV] Remove interrupt handler special case from RISCVFrameLowering::determineCalleeSaves. (#88069)
This code was trying to save temporary argument registers in interrupt
handler functions that contain calls. With the exception that all FP
registers are saved including the normally callee saved registers.

If all of the callees use an FP ABI and the interrupt handler doesn't
touch the normally callee saved FP registers, we don't need to save
them.

It doesn't appear that we need to special case functions with calls. The
normal callee saved register handling will already check each of the calls
and consider a register clobbered if the call doesn't explicitly say it is preserved.

All of the test changes are from the removal of the FP callee saved
registers. There are tests for interrupt handlers with F and D extension
that use ilp32 or lp64 ABIs that are not affected by this change. They
still save the FP callee saved registers as they should.

gcc appears to have a bug where the D extension being enabled with the
ilp32f or lp64f ABI does not save the FP callee saved regs. The callee
would only save/restore the lower 32 bits and clobber the upper bits.
LLVM saves the FP callee saved regs in this case and there is an
unchanged test for it.

The unnecessary save/restore was raised in this thread
https://discourse.llvm.org/t/has-bugs-when-optimizing-save-restore-csrs-by-changing-csr-xlen-f32-interrupt/78200/1
2024-04-10 10:28:54 -07:00
David Green
4dcf33b6c2 [AArch64] Cleanup and GISel coverage for lrint tests. NFC 2024-04-10 18:13:57 +01:00
Vyacheslav Levytskyy
335d5d5f47
[SPIRV] Tweak parsing of base type name in builtins (#88255)
This PR is a small improvement of parsing of base type name in builtins,
allowing to understand `unsigned ...` types. The test case that fails
without the fix is attached.
2024-04-10 19:04:31 +02:00
Evgenii Stepanov
e72c949c15
[msan] Overflow intrinsics. (#88210) 2024-04-10 09:12:25 -07:00
Craig Topper
323d3ab257
[RISCV] Optimize undef Even vector in getWideningInterleave. (#88221)
We recently optimized the code when the Odd vector was undef to fix a
poison bug.

There are additional optimizations we can do if the even vector is
undef. With Zvbb, we can use a single vwsll. Without Zvbb, we can use a
vzext.vf2 and a vsll.
2024-04-10 09:08:50 -07:00
Noah Goldstein
f1ee458ddb [ValueTracking] improve isKnownNonZero precision for smax
Instead of relying on known-bits for strictly positive, use the
`isKnownPositive` API. This will use `isKnownNonZero` which is more
accurate.

Closes #88170
2024-04-10 10:40:49 -05:00
Noah Goldstein
2ff82c2c64 [ValueTracking] Add tests for improving isKnownNonZero of smax; NFC 2024-04-10 10:40:49 -05:00
Noah Goldstein
37ca6fa1e2 [ValueTracking] Add support for overflow detection functions is isKnownNonZero
Adds support for: `{s,u}{add,sub,mul}.with.overflow`

The logic is identical to the the non-overflow binops, we where just
missing the cases.

Closes #87701
2024-04-10 10:40:48 -05:00
Noah Goldstein
a02b3c0182 [ValueTracking] Add tests for overflow detection functions is isKnownNonZero; NFC 2024-04-10 10:40:48 -05:00
Noah Goldstein
41c52217b0 [ValueTracking] Add support for vector_reduce_{s,u}{min,max} in computeKnownBits
Previously missing. We compute by just applying the reduce function on
the knownbits of each element.

Closes #88169
2024-04-10 10:40:48 -05:00
Noah Goldstein
77d668451a [ValueTracking] Add support for vector_reduce_{s,u}{min,max} in isKnownNonZero
Previously missing, proofs for all implementations:
https://alive2.llvm.org/ce/z/G8wpmG
2024-04-10 10:40:48 -05:00
Noah Goldstein
f9f4aba547 [InstCombine] Add tests for non-zero/knownbits of vector_reduce_{s,u}{min,max}; NFC 2024-04-10 10:40:48 -05:00
Craig Topper
7f1b9adfc8
[RISCV] Add MachineCombiner to fold (sh3add Z, (add X, (slli Y, 6))) -> (sh3add (sh3add Y, Z), X). (#87884)
This improves a pattern that occurs in 531.deepsjeng_r. Reducing the
dynamic instruction count by 0.5%.

This may be possible to improve in SelectionDAG, but given the special
cases around shXadd formation, it's not obvious it can be done in a
robust way without adding multiple special cases.

I've used a GEP with 2 indices because that mostly closely resembles the
motivating case. Most of the test cases are the simplest GEP case. One
test has a logical right shift on an index which is closer to the
deepsjeng code. This requires special handling in isel to reverse a
DAGCombiner canonicalization that turns a pair of shifts into (srl (and
X, C1), C2).
2024-04-10 08:39:56 -07:00
Alexey Bataev
6ca5a410d2 [SLP]Fix PR87358: broken module, Instruction does not dominate all uses.
If the first node is a gather node with extractelement instructions,
still need to put the vector value after all instructions, not after the
very first one.
2024-04-10 08:24:15 -07:00
annamthomas
54a9f0007c
[SCEV] Fix BinomialCoefficient Iteration to fit in W bits (#88010)
BinomialCoefficient computes the value of W-bit IV at iteration It of a loop. When W is 1, we can call multiplicative inverse on 0 which triggers an assert since 1b76120.
    
Since the arithmetic is supposed to wrap if It or K does not fit in W bits, do the truncation into W bits after we do the shift.
    
 Fixes #87798
2024-04-10 09:02:23 -04:00
Florian Hahn
94ed57dab6
[PhaseOrdering] Add test for #85551.
Add test for missed hoisting of checks from std::span
https://github.com/llvm/llvm-project/issues/85551
2024-04-10 13:30:30 +01:00
Dinar Temirbulatov
990c4bc95f
[AArch64][SVE2] Generate SVE2 BSL instruction in LLVM for bit-twiddling. (#83514)
Allow to fold or/and-and to BSL instuction for scalable vectors.
2024-04-10 11:07:59 +01:00
Simon Pilgrim
0e7d14d2e8 [X86] Regenerate mmx-intrinsics.ll test checks 2024-04-10 10:42:01 +01:00
hev
0d17e1f0e5
[LoongArch] Revert sp adjustment in prologue (#88110)
After commit 18c5f3c3 ("[RegisterScavenger][RISCV] Don't search for
FrameSetup instrs if we were searching from Non-FrameSetup instrs"), we
can revert the `sp` adjustment 4e2364a2 ("[LoongArch] Add emergency
spill slot for GPR for large frames") to generate better code, as the
issue with `RegScavenger` has been resolved.

Fixes #88109
2024-04-10 17:13:25 +08:00
Paschalis Mpeis
e50c4c83b6
[AArch64][TLI] Add TLI mappings for ArmPL modf, sincos, sincospi (#83143)
ArmPL 24.04 release fixes a bug concerning these methods,
so now they can be re-introduced to TLI mappings.
2024-04-10 09:34:46 +01:00
Chia
469caa31e7
[RISCV] Use vwadd.vx for splat vector with extension (#87249)
This patch allows `combineBinOp_VLToVWBinOp_VL` to handle patterns like
`(splat_vector (sext op))` or `(splat_vector (zext op))`. Then we can
use `vwadd.vx` and `vwadd.w` for such a case.

### Source code
```
define <vscale x 8 x i64> @vwadd_vx_splat_sext(<vscale x 8 x i32> %va, i32 %b) {
     %sb = sext i32 %b to i64
     %head = insertelement <vscale x 8 x i64> poison, i64 %sb, i32 0
     %splat = shufflevector <vscale x 8 x i64> %head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
     %vc = sext <vscale x 8 x i32> %va to <vscale x 8 x i64>
     %ve = add <vscale x 8 x i64> %vc, %splat
     ret <vscale x 8 x i64> %ve
}
```

### Before this patch
[Compiler Explorer](https://godbolt.org/z/sq191PsT4)
```
vwadd_vx_splat_sext:
  sext.w a0, a0
  vsetvli a1, zero, e64, m8, ta, ma
  vmv.v.x v16, a0
  vsetvli zero, zero, e32, m4, ta, ma
  vwadd.wv v16, v16, v8
  vmv8r.v v8, v16
  ret
```
### After this patch
```
vwadd_vx_splat_sext
  vsetvli a1, zero, e32, m4, ta, ma
  vwadd.vx v16, v8, a0
  vmv8r.v v8, v16
  ret
```
2024-04-10 15:26:17 +09:00
XChy
313a33b9df
[InstCombine] Reduce nested logical operator if poison is implied (#86823)
Fixes #76623
Alive2 proof: https://alive2.llvm.org/ce/z/gX6znJ (I'm not sure how to
write a proof for such transform, maybe there are mistakes)

In most cases, `icmp(a, C1) && (other_cond && icmp(a, C2))` will be
reduced to `icmp(a, C1) & (other_cond && icmp(a, C2))`, since latter
icmp always implies the poison of the former. After reduction, it's
easier to simplify the icmp chain.
Similarly, this patch does the same thing for `(A && B) && C --> A && (B
& C)`. Maybe we could constraint such reduction only on icmps if there
is regression in benchmarks.
2024-04-10 14:19:44 +08:00
Shih-Po Hung
3d985a6f1b
[RISCV][TTI] Scale the cost of Select with LMUL (#88098)
Use the Val type to estimate the instruction cost for SelectInst.
2024-04-10 14:18:15 +08:00
Noah Goldstein
6c40d463c2 [X86] Use nneg flag when trying to convert uitofp -> sitofp
Closes #86694
2024-04-09 23:06:55 -05:00
Noah Goldstein
84a5332a68 [X86] Add tests for uitofp nneg -> sitofp; NFC 2024-04-09 23:06:55 -05:00
hanbeom
44c79da3ae
[InstCombine] Remove shl if we only demand known signbits of shift source (#79014)
This patch resolve TODO written in commit:
5909c67883

Proof: https://alive2.llvm.org/ce/z/C3VNoR
2024-04-10 11:19:09 +09:00
Shih-Po Hung
ee52add6cb
[RISCV][TTI] Implement cost of intrinsic active_lane_mask (#87931)
This patch uses the argument type to infer the LMUL cost for the index
generation, add, and comparison.
2024-04-10 10:08:33 +08:00
Evgenii Stepanov
e8a3b72272 [msan] Precommit tests.
Precommit tests for overflowing and saturating arithmetic intrinsics.
2024-04-09 16:32:48 -07:00
Noah Goldstein
9170e38575 Add support for nneg flag with uitofp
As noted when #82404 was pushed (canonicalizing `sitofp` -> `uitofp`),
different signedness on fp casts can have dramatic performance
implications on different backends.

So, it makes to create a reliable means for the backend to pick its
cast signedness if either are correct.

Further, this allows us to start canonicalizing `sitofp`- > `uitofp`
which may easy middle end analysis.

Closes #86141
2024-04-09 18:12:33 -05:00
Teresa Johnson
a332cfc986
[MemProf] Perform cloning for each allocation separately (#87112)
Restructures the cloning slightly to perform all cloning for each
allocation separately. The prior algorithm would sometimes miss cloning
opportunities in cases where trimmed cold contexts partially overlapped
with longer contexts for different allocations.

Most of the change is isolated to the helpers that move edges to new or
existing clones, which now support moving a subset of context ids.
2024-04-09 14:12:32 -07:00
Noah Goldstein
71ef04d7cd [InstCombine] fold (icmp eq/ne (or disjoint x, C0), C1) -> (icmp eq/ne x, C0^C1)
Proof: https://alive2.llvm.org/ce/z/m3xoo_

Closes #87734
2024-04-09 15:38:18 -05:00
Noah Goldstein
5b58eb68ed [InstCombine] Add tests for folding (icmp eq/ne (or disjoint x, C0), C1); NFC 2024-04-09 15:38:18 -05:00