76770 Commits

Author SHA1 Message Date
Simon Pilgrim
ca827d53c5
[X86] Convert logicalshift(x, C) -> and(x, M) iff x is allsignbits (#83596)
If we're logical shifting an all-signbits value, then we can just mask out the shifted bits.

This helps remove some unnecessary bitcasted vXi16 shifts used for vXi8 shifts (which SimplifyDemandedBits will struggle to remove through the bitcast), and allows some AVX1 shifts of 256-bit values to stay as a YMM instruction.
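As a rough scalar illustration of the rewrite (a sketch on one 8-bit lane, not the SelectionDAG code): an all-signbits lane is either 0 or all-ones, so a logical shift right gives the same result as masking with an all-ones value shifted by the same amount (shl is analogous with the mask shifted left).

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // An all-signbits lane has every bit equal to the sign bit: 0x00 or 0xFF.
  for (uint8_t x : {uint8_t(0x00), uint8_t(0xFF)}) {
    for (unsigned c = 0; c < 8; ++c) {
      uint8_t shifted = uint8_t(x >> c);           // lshr(x, C)
      uint8_t masked  = uint8_t(x & (0xFFu >> c)); // and(x, M)
      assert(shifted == masked);
    }
  }
  return 0;
}
```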

Noticed in codegen from #82290
2024-03-02 12:44:33 +00:00
Shih-Po Hung
fb67dce1cb
[RISCV] Fix crash when unrolling loop containing vector instructions (#83384)
When MVT is not a vector type, TCK_CodeSize should return an invalid
cost. This patch adds a check at the beginning to make sure all cost
kinds return invalid costs consistently.

Before this patch, TCK_CodeSize returns a valid cost on scalar MVT but
the other cost kinds don't.

This fixes issue #83294, where a loop contains vector instructions
and the MVT is scalar after type legalization when the vector extension is
not enabled.
2024-03-02 12:33:55 +08:00
Sergei Barannikov
057e725260
[Sparc] Use generated MatchRegisterName (NFCI) (#82165) 2024-03-02 05:17:41 +03:00
Fangrui Song
21c83feca5 [ARM] Simplify shouldAssumeDSOLocal for ELF. NFC 2024-03-01 16:14:48 -08:00
Noah Goldstein
ae76dfb747 [X86] Don't always separate conditions in (br (and/or cond0, cond1)) into separate branches
It makes sense to split if the cost of computing `cond1` is high
(proportionally to how likely `cond0` is), but it doesn't really make
sense to introduce a second branch if it's only a few instructions.

Splitting can also get in the way of potentially folding patterns.

This patch introduces some logic to try to check if the cost of
computing `cond1` is relatively low, and if so don't split the
branches.
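As a rough source-level illustration of the two shapes being chosen between (a sketch with hypothetical predicates, not the cost model in this patch):

```cpp
// Hypothetical expensive predicate; stand-in for a costly computation.
bool expensiveCond(int x) {
  for (int i = 0; i < 1000; ++i)
    x = x * 1103515245 + 12345;
  return x > 0;
}

// Hypothetical cheap predicate; only a couple of instructions.
inline bool cheapCond(int x) { return (x & 1) != 0; }

// Split form: keep two branches so the second condition is only evaluated
// when `a` is true. Worthwhile when cond1 is costly relative to how often
// `a` holds.
int splitForm(bool a, int x) {
  if (a)
    if (expensiveCond(x))
      return 1;
  return 0;
}

// Merged form: compute the cheap condition unconditionally and branch once
// on the AND. Trades a few extra instructions for one fewer potentially
// mispredicted branch.
int mergedForm(bool a, int x) {
  if (a & cheapCond(x))
    return 1;
  return 0;
}
```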

Modest improvement on clang bootstrap build:
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=cycles
Average stage2-O3:   0.59% Improvement (cycles)
Average stage2-O0-g: 1.20% Improvement (cycles)

Likewise on llvm-test-suite on SKX saw a net 0.84% improvement  (cycles)

There is also a modest compile time improvement with this patch:
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=instructions%3Au

Note that the stage2 instruction count increase is expected; this
patch trades instructions for fewer branch misses (which are
proportionately lower):
https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=branch-misses

NB: This will also likely help for APX targets with the new `CCMP` and
`CTEST` instructions.

Closes #81689
2024-03-01 15:35:34 -06:00
Changpeng Fang
84927a6728
AMDGPU: Simplify instruction definitions for global_load_tr_b64(b128) (#83601)
WaveSizePredicate is copied from pseudo to real
2024-03-01 10:03:54 -08:00
Farzon Lotfi
e741d889f4
[DXIL] Add frac unary lowering (#83465)
This change adds lowering for HLSL's frac intrinsic to DXIL.

This change should complete #70099
2024-03-01 12:53:05 -05:00
Farzon Lotfi
b542501ad7
[HLSL][DXIL] Implementation of round intrinsic (#83570)
hlsl_intrinsics.h - add the round API
DXIL.td - add the LLVM intrinsic to DXIL lowering mapping
This change reuses LLVM's existing intrinsic
`__builtin_elementwise_round`/`int_round`
This change implements: #70077
2024-03-01 12:27:25 -05:00
Jay Foad
53f89a0bb7
[AMDGPU] Remove AtomicNoRet class and getAtomicNoRetOp table (#83593) 2024-03-01 17:18:55 +00:00
Martin Wehking
92fe6c61f9
Silence illegal address computation warning (#83244)
Add an assertion before an access of ValMappings to ensure that it is
within the array bounds.
This silences a static analyzer warning.
2024-03-01 21:41:12 +05:30
Simon Pilgrim
765a5d62bc
[X86] Pre-SSE42 v2i64 sgt lowering - check if representable as v2i32 (#83560)
Without PCMPGTQ, if the i64 elements are sign-extended enough to be representable as i32, then we can compare the lower i32 bits with PCMPGTD and splat the results into the upper elements.

Value tracking has meant we already get pretty close with this, but this allows us to remove a lot of unnecessary bit flipping.
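A minimal scalar sketch of the observation, assuming both i64 values are sign-extensions of i32 values (not the actual vector lowering):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // If both 64-bit values are sign-extensions of 32-bit values, a signed
  // compare of the low 32 bits gives the same result as the full 64-bit
  // compare.
  int64_t a = int64_t(int32_t(-5));
  int64_t b = int64_t(int32_t(7));
  bool full = a > b;
  bool low  = int32_t(a) > int32_t(b);
  assert(full == low);
  return 0;
}
```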
2024-03-01 14:29:12 +00:00
Martin Wehking
dfec4ef1a2
Use object directly instead of accessing ArrayRef (#83263)
Use RegOp directly inside debug code to silence a static analyzer warning
about accessing it through its ArrayRef wrapper.
2024-03-01 19:15:42 +05:30
Shengchen Kan
924ad198f5 [X86][CodeGen] Add missing patterns for APX NDD instructions about encoding trick 2024-03-01 21:26:10 +08:00
Alfie Richards
b8e0f3e81e
[ARM] Change the type of CC and VCC code in splitMnemonic. (#83413)
This changes the type of `PredicationCode` and `VPTPredicationCode` from
`unsigned` to `ARMCC::CondCodes` and `ARMVCC::VPTCodes` respectively, for
clarity and correctness.
2024-03-01 13:12:06 +00:00
Pierre van Houtryve
756166e342
[AMDGPU] Improve detection of non-null addrspacecast operands (#82311)
Use IR analysis to infer when an addrspacecast operand is nonnull, then
lower it to an intrinsic that the DAG can use to skip the null check.

I did this using an intrinsic as it's non-intrusive. An alternative
would have been to allow something like `!nonnull` on `addrspacecast`
then lower that to a custom opcode (or add an operand to the
addrspacecast MIR/DAG opcodes), but it's a lot of boilerplate for just
one target's use case IMO.

I'm hoping that when we switch to GISel we can move all this logic
to the MIR level without losing info, but currently the DAG doesn't see
enough, so we need to act in CGP.

Fixes: SWDEV-316445
2024-03-01 14:01:10 +01:00
David Green
0e9a102129 [AArch64] Remove unused AArch64ISD::BIT. NFC
These were last used in the fcopysign lowering, which now uses AArch64ISD::BSP.
2024-03-01 11:44:58 +00:00
David Green
d458a19317
[AArch64] Mark AESD and AESE instructions as commutative. (#83390)
This comes from
https://discourse.llvm.org/t/combining-aes-and-xor-can-be-improved-further/77248.

These instructions start out with:
```
  XOR Vd, Vn
  <some complicated math>
```
The initial XOR means that they can be treated as commutative, removing
some of the unnecessary movs introduced during register allocation.
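A small sketch of why swapping the operands is safe, using a hypothetical stand-in for the rest of the round: since the instruction begins with `Vd ^ Vn`, and everything after depends only on that XOR result, exchanging the two inputs cannot change the output.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for "<some complicated math>": any function of the
// XORed state alone.
uint8_t restOfRound(uint8_t state) { return uint8_t(state * 31 + 7); }

// AESE/AESD-like shape: XOR the two inputs, then transform the result.
uint8_t aeseLike(uint8_t d, uint8_t n) { return restOfRound(uint8_t(d ^ n)); }

int main() {
  // XOR is commutative, so the operands can be swapped freely.
  assert(aeseLike(0xA5, 0x3C) == aeseLike(0x3C, 0xA5));
  return 0;
}
```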
2024-03-01 10:24:27 +00:00
Jay Foad
4c8c335bcd
[AMDGPU] Rename hasGFX12Enc to hasRestrictedSOffset in BUF definitions. NFC. (#83434)
This just renames a tablegen argument to match the corresponding
subtarget feature.
2024-03-01 10:14:37 +00:00
chuongg3
4a5ec3cec8
Revert "[AArch64][GlobalISel] Legalize G_SHUFFLE_VECTOR for Odd-Sized Vectors" (#83544)
Reverts llvm/llvm-project#83038 due to a failing build in the Fuchsia build:
https://lab.llvm.org/staging/#/builders/187/builds/1695
2024-03-01 08:56:34 +00:00
Nick Anderson
ba8e9ace13
[AMDGPU] promote i1 arg type for amdgpu_cs (#82971)
Fixes #68087.
Not sure where to put regression tests for this PR? Also, should i1 args
not in registers also be promoted?
2024-03-01 14:25:46 +05:30
Shengchen Kan
420928b2fa [X86][CodeGen] Fix compile crash in EVEX compression for corner case
The base register of OPmi_ND may be allocated to the same physical
register as the ND operand.

OPmi_ND is not compressible because it has different semantics from OPmi.
In this case, `isRedundantNewDataDest` should return false, otherwise
we would get the error:

Assertion `!IsNDLike && "Missing entry for ND-like instruction"' failed.
2024-03-01 16:13:09 +08:00
Dhruv Chawla (work)
6c39fa9e9f
[AArch64][GlobalISel] Expand abs.v4i8 to v4i16 and abs.v2s16 to v2s32 (#81231)
GISel was previously falling back to SDAG for these operations, and this
matches the way SDAG currently generates code for them.
2024-03-01 13:01:55 +05:30
Wang Pengcheng
2023a230d1
[RISCV] Move V0 to the end of register allocation order (#82967)
According to

https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html:

> The v0 register defined by the RISC-V vector extension is special in
> that it can be used both as a general purpose vector register and also
> as a mask register. As a preference, use registers other than v0 for
> non-mask values. Otherwise data will have to be moved out of v0 when a
> mask is required in an operation. v0 may be used when all other
> registers are in use, and using v0 would avoid spilling register state
> to memory.

Using the V0 register may also stall the masking pipeline and stop chaining
on some microarchitectures.

So we should avoid using V0, and register groups containing it, as much
as possible. We achieve this by moving V0 to the end of the register
allocation order.
2024-03-01 12:17:56 +08:00
Felix (Ting Wang)
5b05870953
[PowerPC] Support local-dynamic TLS relocation on AIX (#66316)
Supports TLS local-dynamic on AIX, generating the code sequence below:

```
.tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier
.tc mh[TC],mh[TC]@ml # Module handle for the caller
lwz 3,mh[TC](2) # For 64-bit: ld 3,mh[TC](2)
bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0
#r3 = &TLS for module
lwz 4,foo[TC](2) # For 64-bit: ld 4,foo[TC](2)
add 5,3,4 # Compute &foo
.rename mh[TC], "_$TLSML" # Symbol for the module handle must have the name "_$TLSML"
```

---------

Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan>
Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>
2024-03-01 08:09:40 +08:00
Kai Luo
d1924f0474
[PowerPC] Do not generate isel instruction if target doesn't have this instruction (#72845)
When expanding `select_cc` in finalize-isel, we should not generate `isel`
for targets that do not have the feature.
2024-03-01 08:03:06 +08:00
Sumanth Gundapaneni
ca9d2e923b
[Hexagon] Add Loop Alignment pass. (#83379)
Inspect a basic block and, if it is a single-basic-block loop with a small
number of instructions, set the loop alignment to 32 bytes. This avoids a
cache line break in the first packet of the loop, which would cause a stall
on each execution of the loop.
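A small sketch of the cache-line reasoning, assuming a hypothetical 16-byte first packet and 32-byte cache lines (illustrative only, not the pass itself):

```cpp
#include <cstdint>
#include <cstdio>

// Returns true if a packet of `packetBytes` starting at `addr` crosses a
// 32-byte cache line boundary.
bool crossesLine(uint64_t addr, uint64_t packetBytes) {
  return (addr % 32) + packetBytes > 32;
}

int main() {
  const uint64_t packet = 16; // hypothetical first packet of a small loop
  // Unaligned loop start: the first packet straddles two cache lines.
  std::printf("start=0x14 crosses=%d\n", crossesLine(0x14, packet));
  // 32-byte aligned loop start: the first packet fits in one cache line.
  std::printf("start=0x20 crosses=%d\n", crossesLine(0x20, packet));
  return 0;
}
```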
2024-02-29 16:57:33 -06:00
Leon Clark
5b07fd4799
[AMDGPU] Fix OpenCL conformance test failures for ctlz. (#83170)
Remove LSH transform and restore previous lowering.

Fixes conformance issue in
[77615](https://github.com/llvm/llvm-project/pull/77615) where OpenCL
integer_ops tests fail for integer_clz.

Co-authored-by: Leon Clark <leoclark@amd.com>
2024-02-29 22:28:13 +00:00
Craig Topper
6afda56faa
[RISCV] Store RVC and TSO ELF flags explicitly in RISCVTargetStreamer. NFCI (#83344)
Instead of caching STI in the RISCVELFTargetStreamer, store the two
flags we need from it.

My goal is to allow RISCVAsmPrinter to override these flags using IR
module metadata for LTO. So they need to be separated from the STI used
to construct the TargetStreamer.

This patch should be NFC as long as no one is changing the contents of
the STI that was used to construct the TargetStreamer between the
constructor and the use of the flags.
2024-02-29 08:55:52 -08:00
chuongg3
a344db793a
[AArch64][GlobalISel] Legalize G_SHUFFLE_VECTOR for Odd-Sized Vectors (#83038)
Legalize smaller- and larger-than-legal vectors with i8 and i16 element
sizes.
Vectors with elements smaller than i8 will be widened to i8 elements.
2024-02-29 16:31:05 +00:00
Jonathan Thackray
147dc81c1d
[ARM][AArch64] Enable FEAT_FHM for Arm Neoverse N2 (#82613)
Correct an issue with Arm Neoverse N2 after it was
changed to a v9a core in change
f576cbe44eabb8a5ac0af817424a0d1e7c8fbf85:

  * FEAT_FHM should be enabled for this core.
2024-02-29 15:57:50 +00:00
Sumanth Gundapaneni
962f3e4e5b
[Hexagon] Use the correct call to detect debug instructions (#83373) 2024-02-29 09:51:52 -06:00
Sander de Smalen
5bd01ac822
[AArch64] Re-enable rematerialization for streaming-mode-changing functions. (#83235)
We can add implicit defs/uses of the 'VG' register to the instructions
to prevent the register allocator from rematerializing values in between
streaming-mode changes, as the def/use of VG will further nail down the
ordering that comes out of ISel. This avoids the heavy-handed approach
of preventing any kind of rematerialization.

While we could add 'VG' as a Use to all SVE instructions, we only really
need to do this for instructions that are rematerializable, as the
smstart/smstop instructions and pseudos act as scheduling barriers which
is sufficient to prevent other instructions from being scheduled in
between the streaming-mode-changing call sequence. However, we may
revisit this in the future.
2024-02-29 15:35:46 +00:00
Simon Pilgrim
80a328b011 [X86] SimplifyDemandedVectorEltsForTargetNode - add basic PCMPEQ/PCMPGT handling 2024-02-29 15:22:12 +00:00
Michael Maitland
4f132dca71
[RISCV] Enable PostRAScheduler for SiFive7 (#83166)
Based on numbers collected in our downstream toolchain.
2024-02-29 09:57:15 -05:00
RicoAfoat
1e6627ecef
[X86] matchAddressRecursively - ensure dead nodes are replaced before matching the index register (#82881)
Fixes #82431 - see #82431 for more information.
2024-02-29 14:55:51 +00:00
Petar Avramovic
0d572c41f9
AMDGPU/GlobalISel: remove amdgpu-global-isel-risky-select flag (#83426)
AMDGPUInstructionSelector should no longer attempt to select S1 G_PHIs.
Remove MIR test that attempts to inst-select divergent vcc(S1) G_PHI.
Lane mask merging algorithm for GlobalISel is now responsible for
selecting divergent S1 G_PHIs in AMDGPUGlobalISelDivergenceLowering.
Uniform S1 G_PHIs should be lowered to S32 G_PHIs in the reg bank select
pass. In summary, S1 G_PHIs should not reach AMDGPUInstructionSelector.
2024-02-29 15:38:54 +01:00
S. Bharadwaj Yadavalli
b1c8b9f89c
[DirectX][NFC] Leverage LLVM and DirectX intrinsic description in DXIL Op records (#83193)
* Leverage TableGen record descriptions of LLVM or DirectX intrinsics
that can be directly mapped in the DXIL Ops TableGen description. As a
result, such DXIL Ops can be succinctly described without duplication.
The DXILEmitter backend can derive the properties of DXIL Ops accordingly.
* Ensured that corresponding lit tests pass.
2024-02-29 06:21:44 -08:00
Petar Avramovic
6c2eec5cea
AMDGPU/GlobalISel: lane masks merging (#73337)
Basic implementation of lane mask merging for GlobalISel.
Lane masks on GlobalISel are registers with the sgpr register class
and S1 LLT, as required by machine uniformity analysis.
Implements the equivalent of lowerPhis from SILowerI1Copies.cpp in:
patch 1: https://github.com/llvm/llvm-project/pull/75340
patch 2: https://github.com/llvm/llvm-project/pull/75349
patch 3: https://github.com/llvm/llvm-project/pull/80003
patch 4: https://github.com/llvm/llvm-project/pull/78431
patch 5: is in this commit:

AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers

Previously, in PHIs that represent lane masks, incoming registers
taken as-is were not selected as lane masks. Such registers are not
being merged with another lane mask and most often only have S1 LLT.
Implement constrainAsLaneMask by constraining incoming registers
taken as-is with lane mask attributes, essentially transforming them
to lane masks. This is the final step in having PHI instructions created
in this pass be fully instruction-selected.
2024-02-29 13:57:59 +01:00
Simon Pilgrim
139bcda542 [X86] SimplifyDemandedVectorEltsForTargetNode - add basic CVTPH2PS/CVTPS2PH handling
Allows us to peek through the F16 conversion nodes, mainly to simplify shuffles.

An easy part of #83414
2024-02-29 12:33:49 +00:00
Jay Foad
20fe83bc85
[AMDGPU] Add new aliases ds_subrev_rtn_u32/u64 for ds_rsub_rtn_u32/u64 (#83408)
Following on from #83118, this adds aliases for the "rtn" forms of these
instructions. The fact that they were missing from SP3 was an oversight
which has been fixed now.
2024-02-29 12:02:06 +00:00
Jay Foad
02bad7a858 [AMDGPU] Simplify !if condition. NFC. 2024-02-29 11:45:20 +00:00
Simon Pilgrim
7ff3f9760d [X86] getFauxShuffleMask - handle insert_vector_elt(bitcast(extract_vector_elt(x))) shuffle patterns
If the bitcast is between types of equal scalar size (i.e. fp<->int bitcasts), then we can safely peek through them.
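A small standalone illustration of why the peek-through is safe when scalar sizes match, using fp<->int reinterpretation on 32-bit lanes (assumed example values, not the X86 shuffle code): a bitcast between equally sized scalars is a per-lane reinterpretation, so extracting a lane and bitcasting commute.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret one float lane as its 32-bit integer pattern (an fp<->int
// bitcast of equal scalar size).
uint32_t bitcastLane(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

int main() {
  float vec[4] = {1.0f, -2.5f, 3.25f, 0.0f}; // a v4f32 "vector"
  uint32_t asInt[4];
  std::memcpy(asInt, vec, sizeof(vec));      // bitcast v4f32 -> v4i32

  // extract_vector_elt(bitcast(x), i) == bitcast(extract_vector_elt(x, i))
  for (int i = 0; i < 4; ++i)
    assert(asInt[i] == bitcastLane(vec[i]));
  return 0;
}
```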

Fixes #83289
2024-02-29 10:32:49 +00:00
Tomas Matheson
03420f570e Revert "[llvm][arm] add T1 and T2 assembly options for vlldm and vlstm (#83116)"
This reverts commit 634b0243b8f7acc85af4f16b70e91d86ded4dc83.

Failing EXPENSIVE_CHECKS builds with "undefined physical register".
2024-02-29 09:48:29 +00:00
Matt Arsenault
c757ca7417 AMDGPU: Remove dead declaration 2024-02-29 14:40:11 +05:30
Shih-Po Hung
6ee9c8afbc
[RISCV][CostModel] Updates reduction and shuffle cost (#77342)
- Make `andi` cost 1 in SK_Broadcast
- Query the cost of VID_V, VRSUB_VX/VRSUB_VI which would scale with LMUL
2024-02-29 15:41:19 +08:00
Craig Topper
95aab69c10
[RISCV] Remove experimental from Zacas. (#83195)
Document that we don't use the double compare and swap instructions due
to ABI concerns.
2024-02-28 21:46:58 -08:00
Yeting Kuo
14d8c4563e
[RISCV] Add more intrinsics into canSplatOperand. (#83106)
This patch adds smin/smax/umin/umax/sadd_sat/ssub_sat/uadd_sat/usub_sat
to canSplatOperand. It helps LLVM fold .vv instructions with one
splat operand into .vx instructions.
2024-02-29 12:57:34 +08:00
XinWang10
ffa48f0c94
[X86][MC] Teach disassembler to recognize APX instructions which ignore the W bit (#82747)
Extended VMX instructions and 8-bit APX extended instructions don't need
the W bit; they are marked as W-ignored in the spec.
RFC:
https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4
2024-02-29 11:44:41 +08:00
Shengchen Kan
265e49d1f4 [X86][NFC] Lowercase first letter of function names in X86ExpandPseudo.cpp 2024-02-29 10:29:32 +08:00
Shengchen Kan
273cfd377b [X86][NFC] Avoid duplicated code in X86ExpandPseudo.cpp by using macro GET_EGPR_IF_ENABLED 2024-02-29 10:05:05 +08:00