54094 Commits

Author SHA1 Message Date
Philip Reames
b5657d6dc7
[RISCV] Reverse default assumption about performance of vlseN.v vd, (rs1), x0 (#98205)
Some cores implement an optimization for a strided load with an x0
stride, which results in fewer memory operations being performed than
implied by VL, since all addresses are the same. This appears to hold
only for a minority of available implementations. We know that
sifive-x280 does, but sifive-p670 and spacemit-x60 both do not.

(To be more precise, measurements on the x60 appear to indicate that a
 stride of x0 has similar latency to a non-zero stride, and that both
 are about twice that of a vleN.v.  I'm taking this to mean the x0
 case is not optimized.)

We had an existing flag by which a processor could opt out of this
assumption, but it had no upstream users. Instead of adding this flag to the
p670 and x60, this patch reverses the default and adds the opt-in flag
only to the x280.
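
A minimal Python model of the addressing of a strided load such as
vlse8.v, assuming element i is read from base + i * stride. It only
illustrates why an x0 stride makes every element come from one address
(a splat); the function names are hypothetical and the model says
nothing about how any particular core schedules the accesses.

```python
# Hypothetical scalar model of a strided vector load (vlse8.v-style):
# element i is read from address base + i * stride.


def strided_load_addresses(base: int, stride: int, vl: int) -> list[int]:
    """Addresses touched by a strided load of vl elements."""
    return [base + i * stride for i in range(vl)]


def strided_load(memory: bytes, base: int, stride: int, vl: int) -> list[int]:
    """Return the vl bytes such a load would read."""
    return [memory[a] for a in strided_load_addresses(base, stride, vl)]


if __name__ == "__main__":
    mem = bytes(range(32))
    # Non-zero stride: vl distinct addresses.
    print(strided_load(mem, base=4, stride=2, vl=4))  # [4, 6, 8, 10]
    # Zero stride (rs2 = x0): every element comes from the same address,
    # so the result is a splat of a single byte.
    print(strided_load(mem, base=7, stride=0, vl=4))  # [7, 7, 7, 7]
    print(len(set(strided_load_addresses(7, 0, 4))))  # 1
```

Whether hardware actually performs fewer memory operations for the x0
case is exactly the per-core property this commit makes opt-in.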
2024-07-10 07:35:56 -07:00
Alex Bradbury
f8dbe1d09d
Revert "[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default" (#98328)
Reverts llvm/llvm-project#89927 while we investigate performance
regressions reported by @dtcxzyw
2024-07-10 15:33:20 +01:00
Allen
d1006315b5
[AArch64] Lower for power of 2 signed divides with scalar type (#97879)
Expect the same assembly for code that doesn't use SVE registers when it
is compiled with and without -msve-vector-bits=256.

Fix https://github.com/llvm/llvm-project/issues/97821
2024-07-10 21:52:09 +08:00
Alex Bradbury
af47a4ec50
[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default (#89927)
This avoids some cases where LSR produces results that lead to very poor
codegen. There's a chance we'll see minor degradations for some inputs
in the case that our metrics say the found solution is worse, but in
reality it's better than the starting point.

Per the review thread, at least one vendor has been enabling this by
default for some time and found that overall it's an improvement. As such,
we'll enable by default and aim to fix any as-yet-unknown regressions
in-tree.
2024-07-10 13:23:31 +01:00
paperchalice
abde52aa66
[CodeGen][NewPM] Port LiveIntervals to new pass manager (#98118)
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.

This would be the last analysis required by `PHIElimination`.
2024-07-10 19:34:48 +08:00
Fabian Ritter
17316a5989
Revert "[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy" (#98295)
Reverts llvm/llvm-project#97998
This seems to cause a buildbot failure on clang-hip-vega20 in the HIP
test-suite, which needs investigation.
2024-07-10 12:16:20 +02:00
Daniel Kiss
1782810b84 [Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819)
So far, the branch protection, sign return address, and guarded control
stack attributes have only been emitted as module flags to indicate that
the functions need to be generated with those features.
The problem is that in an LTO build the module flags are merged with the
`min` rule, which means that if one of the modules is not built with sign
return address, the features will be turned off for all functions, because
the functions take the branch-protection and sign-return-address features
from the module flags. Sign return address is a function-level option, so
functions from files compiled with -mbranch-protection=pac-ret are expected
to be protected.
The inliner might also inline functions with a different set of flags, as
it doesn't consider the module flags.

This patch adds the attributes to all functions and drops the checking of
the module flags for code generation. The module flag is still used for
generating the ELF markers.
This also drops the "true"/"false" values from the
branch-protection-enforcement, branch-protection-pauth-lr, and
guarded-control-stack attributes: the presence of the attribute means on,
its absence means off, and there is no other option.

Reland with test fixes.
2024-07-10 11:32:41 +02:00
Fabian Ritter
6c84bba218
[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy (#97998)
Memcpy intrinsics with statically unknown loop sizes are lowered with
two load/store loops: one with access widths specified by the target,
and a residual loop that copies remaining bytes individually.

As the residual loop operates byte-wise, its accesses are only
1-aligned. However, we currently use the alignment that is optimal for
the first loop in both, which is unsound. With this patch, we use the
correct alignment in the residual loop.

The lowering of memcpy with a static size already handles alignments for
the residual correctly.
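
A minimal Python model of the alignment reasoning above, assuming the
source/destination base is `base_align`-aligned, the main loop copies
`loop_op_size`-byte chunks, and the residual loop copies single bytes. The
helpers are hypothetical, not LowerMemIntrinsics code; they only compute
which power-of-two alignment can be proven for each access offset.

```python
# Hypothetical model: provable alignment of each access in the two-loop
# lowering of a memcpy whose length is only known at run time.


def common_alignment(base_align: int, offset: int) -> int:
    """Largest power of two known to divide base + offset, given that base
    is a multiple of base_align (itself a power of two)."""
    if offset == 0:
        return base_align
    low_bit = offset & -offset  # largest power of two dividing offset
    return min(base_align, low_bit)


def access_alignments(base_align: int, loop_op_size: int, length: int):
    """Alignments of main-loop and residual-loop accesses for one length."""
    main_bytes = length - length % loop_op_size
    main = [common_alignment(base_align, off)
            for off in range(0, main_bytes, loop_op_size)]
    residual = [common_alignment(base_align, off)
                for off in range(main_bytes, length)]
    return main, residual


main, residual = access_alignments(base_align=16, loop_op_size=16, length=35)
print(main)           # [16, 16]
print(residual)       # [16, 1, 2]
print(min(residual))  # 1
```

Since one residual loop body has to be correct for every iteration (and
every run-time length), the only alignment it can safely claim is the
minimum, 1, rather than the main loop's 16.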
2024-07-10 11:29:26 +02:00
Madhur Amilkanthwar
42672199ec
[GISel][AArch64] Libcall support for G_FPEXT 128-bit types (#97735)
This patch adds support for generating libcalls
for 128-bit types of G_FPEXT.

This fixes ~10 fallbacks in RajaPerf benchmark.
2024-07-10 14:58:24 +05:30
Luke Lau
8ab19d2e70 [RISCV] Add -verify-machineinstrs to RISCVInsertVSETVLI MIR tests. NFC
Now that we're working with LiveIntervals, make sure that they're correct.
2024-07-10 16:30:57 +08:00
Daniel Kiss
4b2daeccc7
Revert "[Clang][ARM][AArch64] Alway emit protection attributes for functions." (#98284)
Reverts llvm/llvm-project#82819
2024-07-10 10:22:38 +02:00
Daniel Kiss
e15d67cfc2
[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819)
So far, the branch protection, sign return address, and guarded control
stack attributes have only been emitted as module flags to indicate that
the functions need to be generated with those features.
The problem is that in an LTO build the module flags are merged with the
`min` rule, which means that if one of the modules is not built with sign
return address, the features will be turned off for all functions, because
the functions take the branch-protection and sign-return-address features
from the module flags. Sign return address is a function-level option, so
functions from files compiled with -mbranch-protection=pac-ret are expected
to be protected.
The inliner might also inline functions with a different set of flags, as
it doesn't consider the module flags.

This patch adds the attributes to all functions and drops the checking of
the module flags for code generation. The module flag is still used for
generating the ELF markers.
This also drops the "true"/"false" values from the
branch-protection-enforcement, branch-protection-pauth-lr, and
guarded-control-stack attributes: the presence of the attribute means on,
its absence means off, and there is no other option.
2024-07-10 10:06:14 +02:00
Jianjian Guan
9af1f8fbad
[RISCV] Match vector fp-int convert intrinsics with specific RTZ rounding mode to the rtz variants (#98120) 2024-07-10 10:51:20 +08:00
Anatoly Trosinenko
a937d2918e
[AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (#98062)
Make SLS Hardening pass handle BLRA* instructions the same way it
handles BLR. The thunk names have the form

    __llvm_slsblr_thunk_xN            for BLR thunks
    __llvm_slsblr_thunk_(aaz|abz)_xN  for BLRAAZ and BLRABZ thunks
    __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks

There are now about 1800 possible thunk names, so instead of relying on a
linear lookup by thunk function name, parse the name.
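
A minimal Python sketch of parsing the three thunk-name forms listed
above into (mnemonic, Xn, Xm). The regex, the tuple layout, and the
restriction of both registers to x0-x30 are assumptions for illustration;
this is not the pass's actual C++ parser.

```python
import re

# Hypothetical parser for __llvm_slsblr_thunk_* names.
_THUNK_RE = re.compile(
    r"^__llvm_slsblr_thunk_"
    r"(?:(?P<kind>aaz|abz|aa|ab)_)?"  # optional PAuth variant
    r"x(?P<xn>\d+)"                   # branch target register
    r"(?:_x(?P<xm>\d+))?$"            # modifier register (BLRAA/BLRAB only)
)

_MNEMONICS = {None: "BLR", "aaz": "BLRAAZ", "abz": "BLRABZ",
              "aa": "BLRAA", "ab": "BLRAB"}


def parse_thunk_name(name: str):
    """Return (mnemonic, xn, xm_or_None), or None if name is not a thunk."""
    m = _THUNK_RE.match(name)
    if m is None:
        return None
    kind = m.group("kind")
    xn = int(m.group("xn"))
    xm = int(m.group("xm")) if m.group("xm") is not None else None
    # Only BLRAA/BLRAB take a modifier register.
    if (kind in ("aa", "ab")) != (xm is not None):
        return None
    # Assumption: only general-purpose registers x0-x30 appear in names.
    if xn > 30 or (xm is not None and xm > 30):
        return None
    return (_MNEMONICS[kind], xn, xm)


print(parse_thunk_name("__llvm_slsblr_thunk_ab_x1_x2"))  # ('BLRAB', 1, 2)
```

Parsing costs time proportional to the name's length, independent of how
many thunk names exist.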

This patch reapplies llvm/llvm-project#97605.
2024-07-09 22:51:49 +03:00
Daniil Kovalev
746f572615
[test][PAC][AArch64] Add ELF tests for subtarget-neutral codegen (#98020)
Many parts of PAuth-related codegen are not MachO- or ELF-specific. Add
RUN lines against ELF targets to ensure that codegen works for ELF as
well as for MachO.
2024-07-09 21:00:55 +03:00
Min-Yih Hsu
7e2f96194f
[MachineSink] Fix missing sinks along critical edges (#97618)
4e0bd3f improved early MachineLICM's ability to hoist COPYs from
physical registers out of a loop. However, it accidentally broke one of
MachineSink's preconditions for sinking cheap instructions (in this case,
COPY), which considered those instructions profitable to sink only
when there are at least two of them in the same def-use chain in the
same basic block. So if early MachineLICM hoisted one of them out,
MachineSink no longer sank the rest of the cheap instructions. This results
in redundant load-immediate instructions in the motivating example
we've seen on RISC-V.

This patch fixes this by teaching MachineSink that if there is more than
one demand to sink a register into the same block from different
critical edges, it should be considered profitable, as it increases the
CSE opportunities.
This change also improves two AArch64 cases.
2024-07-09 10:48:22 -07:00
Philip Reames
90d79e258e Reapply "[RISCV] Remove experimental from Ztso. (#96465)"
This was reverted in f985a8826bfa4ca3d23e654185de35e30ea6dc79.  Since then,
the default WMO lowering has moved to A67 compatible, the ABI attribute
emission has landed (off by default), and the LLD change to merge said
attributes has landed.  Our ztso lowering is believed to also be A67
compatible, and no known issues remain.

Original commit message:

Ztso 1.0 was ratified in January 2023.
Documentation:
https://github.com/riscv/riscv-isa-manual/blob/main/src/ztso-st-ext.adoc
2024-07-09 10:45:56 -07:00
Min Hsu
4283566663 [test][MachineSink][RISCV] Pre-commit test for #97618 2024-07-09 10:44:44 -07:00
Shengchen Kan
a9183b8899 [X86][MC] Fix encoding bug for CCMP introduced in #85175 2024-07-09 20:12:47 +08:00
David Spickett
9856af634d Revert "[AArch64][GlobalISel] Make G_DUP immediate 32-bits or larger (#96780)"
This reverts commit 5a5cd3f0bcdf37a32eadd85d6e57c642cb829402.

Due to test suite failures on AArch64:
https://lab.llvm.org/buildbot/#/builders/125/builds/541
2024-07-09 11:52:52 +00:00
Luke Lau
19cc46144d
[RISCV] Use VP strided load in concat_vectors combine (#98131) 2024-07-09 18:36:00 +08:00
Shengchen Kan
a8a21bbec2 [X86][test] Pre-update test for the encoding bug introduced in #85175 2024-07-09 17:25:55 +08:00
Malay Sanghi
a77d3ea310
[X86][GlobalISel] Add instruction selection support for x87 ld/st (#97016)
Add x87 G_LOAD/G_STORE selection support to existing C++ lowering.
2024-07-09 10:54:25 +02:00
Jianjian Guan
3259768557
[RISCV] Remove experimental for bf16 extensions (#97996)
They have now been ratified.
2024-07-09 14:34:03 +08:00
Craig Topper
bb8998dd3b [RISCV] Don't custom legalize vXf16 SPLAT_VECTOR with Zvfhmin without Zfhmin.
Marking SPLAT_VECTOR as Custom enables generic DAGCombine to turn
BUILD_VECTOR into SPLAT_VECTOR. We need to custom type legalize BUILD_VECTOR
without Zfhmin since we don't have the scalar f16 type. If we allow
SPLAT_VECTOR to be formed, we'll need to custom type legalize it too.

The easiest fix is to only enable SPLAT_VECTOR with Zvfhmin+Zfhmin. There
is still the issue that BUILD_VECTOR needs proper support with Zvfhmin+Zfhmin.

Should fix the new case reported in #97849.

I've also changed the predicates to Zfhmin instead of ZfhminOrZhinxmin
since Zhinx isn't compatible with Zvfhmin.
2024-07-08 22:44:58 -07:00
Carl Ritson
7eb1a320cc
[AMDGPU] Update EXECZ retention in SIPreEmitPeephole for GFX10/12 (#97676)
The check to maintain EXECZ branches only checks S_WAITCNT.
Add handling for new waitcnt instructions in GFX10 and GFX12.
2024-07-09 14:44:31 +09:00
Luke Lau
3f83a69bcb
[RISCV] Allow folding vmerge into masked ops when mask is the same (#97989)
We currently only fold a vmerge into a masked true operand if the vmerge
has an all-ones mask, since we end up keeping the mask from the true
operand.

But if the masks are the same then we can still fold, because vmerge and
true have the same passthru. If an element was masked off in the
original vmerge, it will also be masked off in the resulting true, and
will have the same passthru value.

The motivation for this is to lower masked VP loads and stores with
passthrus to masked RVV instructions. Normally you can express a masked
RVV instruction with a mask undisturbed passthru via a combination of a
VP op with an all-ones mask and a vp.merge. But for loads and stores you
need the same mask on the VP op as well as the vp.merge.
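
A minimal Python model of why the fold is sound when the masks match,
assuming a masked op with a mask-undisturbed policy (inactive lanes keep
the passthru) and vmerge selecting `true` in active lanes and `false`
elsewhere. `masked_add` and `vmerge` are hypothetical lane-by-lane
models, not LLVM or RVV APIs.

```python
# Hypothetical lane-wise models of RVV masked ops and vmerge.


def masked_add(mask, a, b, passthru):
    """Masked add, mask undisturbed: inactive lanes keep passthru."""
    return [x + y if m else p for m, x, y, p in zip(mask, a, b, passthru)]


def vmerge(mask, true_vals, false_vals):
    """Select true_vals where mask is set, false_vals elsewhere."""
    return [t if m else f for m, t, f in zip(mask, true_vals, false_vals)]


mask = [1, 0, 1, 0]
a, b = [1, 2, 3, 4], [10, 20, 30, 40]
passthru = [-1, -2, -3, -4]

true_op = masked_add(mask, a, b, passthru)
# Same mask, same passthru: the vmerge is a no-op and can be folded away.
print(vmerge(mask, true_op, passthru) == true_op)  # True
```

With a different vmerge mask the two results can disagree, which is why
the fold still needs either an all-ones vmerge mask or an identical one.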
2024-07-09 12:12:02 +08:00
paperchalice
4010f894a1
[CodeGen][NewPM] Port SlotIndexes to new pass manager (#97941)
- Add `SlotIndexesAnalysis`.
- Add `SlotIndexesPrinterPass`.
- Use `SlotIndexesWrapperPass` in legacy pass.
2024-07-09 12:09:11 +08:00
paperchalice
ac0b2814c3
[CodeGen][NewPM] Port LiveVariables to new pass manager (#97880)
- Port `LiveVariables` to new pass manager.
- Convert to `LiveVariablesWrapperPass` in legacy pass manager.
2024-07-09 10:50:43 +08:00
paperchalice
79d0de2ac3
[CodeGen][NewPM] Port machine-loops to new pass manager (#97793)
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
2024-07-09 09:11:18 +08:00
Philip Reames
c95935789d
[RISCV] Directly use pack* in build_vector lowering (#98084)
In 03d4332, we extended build_vector lowering to pack elements into the
largest size which doesn't exceed either ELEN or XLEN. The zbkb
extension - ratified under scalar crypto, but otherwise not really
connected to crypto per se - adds the packh, packw, and pack
instructions. These instructions are designed for exactly this pairwise
packing.

I ended up choosing to directly lower to machine nodes. A combination of
the slightly non-uniform semantics of these instructions (packw *sign*
extends the result, whereas packh *zero* extends it), and our generic
dag canonicalization (which sinks shl through or nodes), make pattern
matching these tricky and not particularly robust. Another alternative
was to have an ISD node for them, but that didn't seem to add much in
practice.
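
A minimal Python model of the three Zbkb instructions on RV64, following
the semantics described above (packh zero-extends, packw sign-extends),
and of how they pair up eight i8 elements into one 64-bit value. The
helper names are hypothetical; element 0 is assumed to land in the low
bits.

```python
# Hypothetical RV64 models of the Zbkb pack instructions.
MASK64 = (1 << 64) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend a bits-wide value to a 64-bit register (as unsigned)."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value & MASK64


def packh(rs1: int, rs2: int) -> int:
    """Low bytes of rs1 and rs2 -> 16 bits, zero-extended."""
    return ((rs2 & 0xFF) << 8) | (rs1 & 0xFF)


def packw(rs1: int, rs2: int) -> int:
    """Low halves of rs1 and rs2 -> 32 bits, sign-extended."""
    return sext(((rs2 & 0xFFFF) << 16) | (rs1 & 0xFFFF), 32)


def pack(rs1: int, rs2: int) -> int:
    """Low words of rs1 and rs2 -> 64 bits."""
    return ((rs2 & 0xFFFFFFFF) << 32) | (rs1 & 0xFFFFFFFF)


def pack_bytes_rv64(elems):
    """Pairwise packing of eight i8 elements: 4 packh, 2 packw, 1 pack."""
    assert len(elems) == 8
    halves = [packh(elems[i], elems[i + 1]) for i in range(0, 8, 2)]
    words = [packw(halves[0], halves[1]), packw(halves[2], halves[3])]
    return pack(words[0], words[1])


print(hex(pack_bytes_rv64([1, 2, 3, 4, 5, 6, 7, 8])))  # 0x807060504030201
```

The sign extension from packw is harmless here because pack only reads
the low 32 bits of each operand.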
2024-07-08 16:10:25 -07:00
Philip Reames
07bb0444dd [RISCV] Add build_vector coverage when zbkb is available
An upcoming change will make much more complete use of packh, packw, and
pack during element packing inside build_vector lowering.
2024-07-08 14:24:44 -07:00
Jon Roelofs
7f0d9bae9d
[llvm][AArch64] Fix a crash with an incorrect asm constraint (#98071)
Fixes: rdar://130887714
2024-07-08 14:00:29 -07:00
Paul Kirth
a4fec164bf
Reapply "[llvm][RISCV] Enable trailing fences for seq-cst stores by default (#87376)" (#90267)
With the tag merging in place, we can safely make
+seq-cst-trailing-fence the default, following the recommendation
in
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-atomic.adoc

This patch changes the default for the feature flag, and moves to more
consistent naming with respect to existing features.

This was reverted with https://github.com/llvm/llvm-project/pull/84597,
because ld.bfd would segfault on unknown RISC-V attributes. Now that
attribute emission is guarded by a backend flag,
`--riscv-abi-attributes`, this should be safe to reland, since it won't
introduce ABI tags unless the user opts into them.
2024-07-08 13:35:36 -07:00
Amy Huang
ae7ab043f2
Add __hlt intrinsic for Windows ARM. (#96578)
Add __hlt, which is an MSVC ARM64 intrinsic.

This intrinsic is just the HLT instruction. MSVC's version seems to
return something undefined; in this patch it will just return zero.

MSVC intrinsics are defined here
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics.
I used unsigned int as the return type, because that is what the MSVC
intrin.h header uses, even though it conflicts with the documentation.
2024-07-08 12:59:02 -07:00
David Green
e0012a0b3b [AArch64] Regenerate cmp-to-cmn.ll. NFC 2024-07-08 19:09:23 +01:00
Philip Reames
03d4332625
[RISCV] Pack build_vectors into largest available element type (#97351)
Our worst case build_vector lowering is a serial chain of vslide1down.vx
operations which creates a serial dependency chain through a relatively
high latency operation. We can instead pack together elements into ELEN
sized chunks, and move them from integer to scalar in a single
operation.

This reduces the length of the serial chain on the vector side, and
costs at most three scalar instructions per element. This is a win for
all cores when the sum of the latencies of the scalar instructions is
less than the vslide1down.vx being replaced, and is particularly
profitable for out-of-order cores which can overlap the scalar
computation.

This patch is restricted to configurations with zba and zbb. Without
both, the zero extend might require two instructions, which would bring
the total scalar instructions per element to 4. zba and zbb are both
present in the rva22u64 baseline, which is looking to be quite common for
hardware in practice; we could extend this to systems without bitmanip
with a bit of extra effort.
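
A minimal Python sketch of the packing step, assuming the chunk width is
the largest size not exceeding ELEN or XLEN and that element 0 goes in
the low bits. Each element costs a zero extend, a shift, and an or, the
"at most three scalar instructions per element" mentioned above.
`pack_elements` is a hypothetical helper, not the lowering code.

```python
# Hypothetical model of packing build_vector elements into wider chunks.


def pack_elements(elems, elt_bits, chunk_bits):
    """Pack elt_bits-wide elements into chunk_bits-wide integers."""
    assert chunk_bits % elt_bits == 0
    per_chunk = chunk_bits // elt_bits
    assert len(elems) % per_chunk == 0
    elt_mask = (1 << elt_bits) - 1
    chunks = []
    for start in range(0, len(elems), per_chunk):
        acc = 0
        for lane, e in enumerate(elems[start:start + per_chunk]):
            # zero extend (mask), shift into place, or into the accumulator
            acc |= (e & elt_mask) << (lane * elt_bits)
        chunks.append(acc)
    return chunks


# Eight i8 elements with 64-bit chunks: one vector insert instead of eight.
print([hex(c) for c in pack_elements([1, 2, 3, 4, 5, 6, 7, 8], 8, 64)])
```

The zero extend matters for negative elements: without masking, the sign
bits of lane 0 would overwrite the higher lanes.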
2024-07-08 10:38:15 -07:00
chuongg3
5a5cd3f0bc
[AArch64][GlobalISel] Make G_DUP immediate 32-bits or larger (#96780)
G_DUP's immediate operand gets extended in RegBankSelect to allow for
better pattern matching in TableGen for #96782
2024-07-08 14:25:39 +01:00
Mahesh-Attarde
854bbc50fc
[X86][CodeGen] security check cookie execute only when needed (#95904)
On Windows, without this fixup, __security_check_cookie is called every time a function returns. Since this function is defined in the runtime library, each return incurs the cost of a call into the DLL, which most of the time simply does a comparison and returns. With the fixup, we selectively call into the DLL only if the comparison fails.
2024-07-08 14:11:21 +01:00
Manish Kausik H
69192e0193
[LegalizeDAG] Optimize CodeGen for ISD::CTLZ_ZERO_UNDEF (#83039)
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.

The details of the optimization are outlined in #82075
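
A minimal Python sketch of one way CTLZ and CTLZ_ZERO_UNDEF can
legitimately be lowered differently, assuming an i8 count promoted to
i32. This illustrates the general idea only; it is an assumption, not a
transcription of the optimization described in #82075.

```python
# Hypothetical illustration: promoting an i8 leading-zero count to i32.


def ctlz(x: int, bits: int) -> int:
    """Leading zeros of an unsigned bits-wide value; ctlz(0) == bits."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()


def ctlz_i8_via_i32_sub(x: int) -> int:
    # Zero-extend, count in i32, then subtract the 24 extra leading zeros.
    return ctlz(x & 0xFF, 32) - 24


def ctlz_i8_via_i32_shl(x: int) -> int:
    # Shift the value into the top byte instead: no subtraction, but an
    # input of 0 yields 32 rather than 8, acceptable only if 0 is undef.
    return ctlz((x & 0xFF) << 24, 32)


print(ctlz_i8_via_i32_sub(0), ctlz_i8_via_i32_shl(0))  # 8 32
```

Both agree on every non-zero input, so the cheaper form is only valid
for the zero-undef variant.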

Fixes #82075

Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
2024-07-08 14:01:32 +01:00
Simon Pilgrim
92083e855b [X86] Allow VPERMV3 -> VPERMV folds to handle extraction from a wider source vector (e.g. v16i32 -> v4i32)
We don't need to restrict this to double width vectors, as long as we correctly bitcast the types

Improves the fix for #97968
2024-07-08 13:10:45 +01:00
Simon Pilgrim
8ac6b415e4 [X86] Ensure VPERMV3 -> VPERMV fold comes from a double width vector
#96414 + #97206 didn't ensure that we were extracting subvectors from a vector double the width of the destination.

We can relax this in a future patch, but fix the #97968 crash first.

Fixes #97968
2024-07-08 12:04:11 +01:00
Momchil Velikov
a497e987e5 Reapply "[AArch64] Lower extending sitofp using tbl (#92528)"
This re-commits d1a4f0c9fb559eb4c2fb56112e56343bcd333edc after
an issue was fixed in f92bfca9fc217cad9026598ef6755e711c0be070
("[AArch64] All bits of an exact right shift are demanded (#97448)").
2024-07-08 11:55:29 +01:00
esmeyi
c119da23af [PowerPC] Function descriptor symbol may be omitted for external symbol. #97526
If a function's address is taken, which means it may be called via a function pointer,
we need the function descriptor for it.
Otherwise, the function descriptor can be omitted for external symbols.
2024-07-08 03:47:33 -04:00
hstk30-hw
ef465bf8b1
[ARM] Fix arm32be softfp mode miscompilation for neon sdiv (#97883)
Related issue: https://github.com/llvm/llvm-project/issues/97782
2024-07-08 14:18:38 +08:00
Vikram Hegde
2a9607168b
[AMDGPU] Cleanup bitcast spam in atomic optimizer (#96933) 2024-07-08 10:53:16 +05:30
Feng Zou
e603451f3c
[X86] Support branch hint (#97721)
For more details about this feature, please refer to latest Intel 64 and
IA-32 Architectures Optimization Reference Manual Volume 1:
https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
2024-07-08 13:12:50 +08:00
Craig Topper
e4ee9bf0d2
[RISCV] Custom legalize vXf16 BUILD_VECTOR without Zfhmin. (#97874)
If we don't have Zfhmin, we will call `SoftPromoteHalfOperand` on the
BUILD_VECTOR. This operation is not supported by the generic code.

Instead, custom lower to a vXi16 BUILD_VECTOR using bitcasts.
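
A minimal Python sketch of the bitcast idea, assuming IEEE 754 binary16
elements: each f16 operand is reinterpreted as its 16-bit integer
pattern, so the vector can be built as vXi16 without any scalar f16
support. The helper names are hypothetical; Python's `struct` "e" format
does the f16 encoding.

```python
import struct

# Hypothetical model: build a vXf16 vector as vXi16 via bit patterns.


def f16_bits(value: float) -> int:
    """Bit pattern of value rounded to IEEE 754 binary16."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def build_vector_f16_as_i16(values):
    """Integer lanes whose bits equal the f16 lanes (a 'bitcast')."""
    return [f16_bits(v) for v in values]


print([hex(b) for b in build_vector_f16_as_i16([1.0, -2.0, 0.5, 0.0])])
# ['0x3c00', '0xc000', '0x3800', '0x0']
```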

Fixes #97849.
2024-07-07 20:25:09 -07:00
Anatoly Trosinenko
f90bac99e1
Revert "[AArch64][PAC] Support BLRA* instructions in SLS Hardening pass" (#97887)
This reverts commit 88b26293a24bdd85fce2b2f7191cc0a5bc0cecfe due to
failures of

    CodeGen/AArch64/speculation-hardening-sls-blra.mir
2024-07-06 13:55:12 +03:00
Anatoly Trosinenko
88b26293a2
[AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (#97605)
Make SLS Hardening pass handle BLRA* instructions the same way it
handles BLR. The thunk names have the form

    __llvm_slsblr_thunk_xN            for BLR thunks
    __llvm_slsblr_thunk_(aaz|abz)_xN  for BLRAAZ and BLRABZ thunks
    __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks

There are now about 1800 possible thunk names, so instead of relying on a
linear lookup by thunk function name, parse the name.
2024-07-06 13:36:02 +03:00