52796 Commits

Author SHA1 Message Date
Matt Arsenault
80e2c26dfd RegisterCoalescer: Fix name of pass
I finally snapped and fixed this inconsistency.
2023-06-21 10:30:43 -04:00
Jay Foad
0b8a2eaf62 [AMDGPU] Add some positive tests for merging S_LOAD instructions 2023-06-21 13:56:03 +01:00
Pravin Jagtap
8e1e871e2f [AMDGPU] Preserve dom-tree analysis in atomic optimizer.
AMDGPUAtomicOptimizer updates the dominator tree whenever
it modifies the control flow. Therefore, preserve the
analysis, similar to the legacy PM.

Reviewed By: arsenm, yassingh, #amdgpu

Differential Revision: https://reviews.llvm.org/D153349
2023-06-21 08:02:43 -04:00
Kishan Parmar
c42f0a6e64 PowerPC/SPE: Add phony registers for high halves of SPE SuperRegs
The intent of this patch is to make the upper halves of the SPE SuperRegs (s0, ..., s31)
artificial registers, similar to how X86 has done it,
and to emit store/reload instructions for the required halves.

PR : https://github.com/llvm/llvm-project/issues/57307

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D152437
2023-06-21 10:24:40 +00:00
WANG Xuerui
00786d3a5f [LoongArch] Support CodeModel::Large codegen
This is intended to behave like GCC's `-mcmodel=extreme`.

Technically the true GCC equivalent would be `-mcmodel=large` which is
not yet implemented there, and we probably do not want to take the
"Large" name until things settle in GCC side, but:

* LLVM does not have a `CodeModel::Extreme`, and it seems too early to
  have such a variant added just for enabling LoongArch; and
* `CodeModel::Small` is already being used for GCC `-mcmodel=normal`
  which is already a case of divergent naming.

Regarding the codegen, loads/stores immediately after a PC-relative
large address load (that ends with something like `add.d $addr, $addr,
$tmp`) should get merged with the addition into corresponding `ldx/stx`
ops, but this is currently not done. This is because pseudo-instructions are
expanded after instruction selection, and it is best fixed with a separate
change.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D150522
2023-06-21 16:41:10 +08:00
WuXinlong
c9e08fa606 [RISCV] Add a pass to merge moving parameter registers instructions for Zcmp
This patch adds a pass to generate `cm.mvsa01` & `cm.mva01s`.

RISCVMoveOptimizer.cpp combines two mv instructions into one cm.mvsa01 or cm.mva01s.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D150415
2023-06-21 15:41:51 +08:00
tianleli
1c27275813 [DAG] Unroll and expand illegal result of LDEXP and POWI instead of widen.
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D153104
2023-06-21 14:27:39 +08:00
Fangrui Song
e0a6561ec9 [XRay] Make xray_fn_idx entries PC-relative
As mentioned by commit c5d38924dc6688c15b3fa133abeb3626e8f0767c (Apr 2020),
PC-relative entries avoid dynamic relocations and can therefore make the
section read-only.
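
As a rough illustration of why PC-relative entries need no dynamic relocation, here is a minimal sketch. It is not the XRay implementation; the helper names and addresses are hypothetical. It only shows that a field storing "target minus the field's own address" holds the same value regardless of where the image is loaded.

```python
def encode_pc_relative(field_address: int, target_address: int) -> int:
    """Value stored in a PC-relative field: distance from the field to the target."""
    return target_address - field_address


def decode_pc_relative(field_address: int, stored: int) -> int:
    """Recover the absolute target at run time from the field's own address."""
    return field_address + stored


# Hypothetical link-time offsets of an index entry and the data it points to.
FIELD_OFFSET = 0x1000
TARGET_OFFSET = 0x4000

# The stored value is identical for any load base, so the loader never has to
# patch the section and it can stay read-only.
stored_at_zero = encode_pc_relative(FIELD_OFFSET, TARGET_OFFSET)
stored_at_bias = encode_pc_relative(0x7F0000 + FIELD_OFFSET, 0x7F0000 + TARGET_OFFSET)
```

An absolute pointer, by contrast, would differ between the two load bases and would need a relocation applied at load time.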

This is similar to D78082 and D78590. We cannot commit to support
compiler/runtime built at different versions, so just don't play with versions.

For Mach-O support (incomplete yet), we use non-temporary `lxray_fn_idx[0-9]+`
symbols. Label differences are represented as a pair of UNSIGNED and SUBTRACTOR
relocations. The SUBTRACTOR external relocation requires r_extern==1 (needs to
reference a symbol table entry) which can be satisfied by `lxray_fn_idx[0-9]+`.
A `lxray_fn_idx[0-9]+` symbol also serves as the atom for this dead-strippable
section (follow-up to commit b9a134aa629de23a1dcf4be32e946e4e308fc64d).

Differential Revision: https://reviews.llvm.org/D152661
2023-06-20 22:40:56 -07:00
Krzysztof Parzyszek
dbc283bb9e [Hexagon] Handle 64-bit operands when lowering ADDO/SUBO 2023-06-20 12:43:37 -07:00
eopXD
9ed668ad93 [RISCV] Model vxrm control for vsmul, vssra, vssrl, vnclip, and vnclipu
Depends on D151397.

This patch follows the patch-set of D151395. This patch seeks to
update all the remaining fixed-point intrinsics to model vxrm
control, adding rounding mode control for `vsmul`, `vssra`,
`vssrl`, `vnclip`, and `vnclipu`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D152879
2023-06-20 11:09:24 -07:00
eopXD
5510f0b8f4 [3/3][RISCV][POC] Model vxrm in C intrinsics for RVV fixed-point instruction vaadd, vasub
Depends on D151396.

This is the 3rd patch of the patch-set. For the cover letter of the
patch-set, please check out D151395.

This commit consists of changes in both the clang front-end and the RISC-V
back-end.

In the front-end, this commit adds an additional operand to the C
intrinsics of `vaadd`, `vaaddu`, `vasub`, and `vasubu`, that models
the control of the rounding mode.

In the back-end, using `vaadd` as an example, this commit replaces the
existing `int.riscv.vaadd.*` with `int.riscv.vaadd.rm.*` that was
introduced in the previous patch, with the extra operand that models
the control of the rounding mode (`vxrm`) for RVV fixed-point
intrinsics.
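
To make the effect of the rounding-mode operand concrete, the following is a minimal sketch of an unsigned averaging add under the four `vxrm` modes. It follows the rounding-increment rules as I understand them from the RVV specification and is not taken from the LLVM patches; the function and constant names are assumptions made for illustration.

```python
# Rounding-mode encodings, assumed from the RVV specification.
RNU, RNE, RDN, ROD = 0, 1, 2, 3


def roundoff_unsigned(v: int, d: int, vxrm: int) -> int:
    """Shift v right by d bits and add a rounding increment selected by vxrm."""
    if d == 0:
        return v
    bit_d = (v >> d) & 1                    # lowest bit that survives the shift
    bit_half = (v >> (d - 1)) & 1           # most significant discarded bit
    below_half = v & ((1 << (d - 1)) - 1)   # discarded bits below bit_half
    discarded = v & ((1 << d) - 1)          # all discarded bits
    if vxrm == RNU:    # round to nearest, ties up
        r = bit_half
    elif vxrm == RNE:  # round to nearest, ties to even
        r = bit_half & int(below_half != 0 or bit_d == 1)
    elif vxrm == RDN:  # round down (truncate)
        r = 0
    elif vxrm == ROD:  # round to odd ("jam")
        r = int(bit_d == 0 and discarded != 0)
    else:
        raise ValueError("unknown rounding mode")
    return (v >> d) + r


def vaaddu(a: int, b: int, vxrm: int) -> int:
    """Scalar model of one lane of an unsigned averaging add."""
    return roundoff_unsigned(a + b, 1, vxrm)
```

For example, averaging 2 and 3 (exactly 2.5) gives 3 under `RNU`, 2 under `RNE` and `RDN`, and 3 under `ROD`, which is why the intrinsic needs to carry the mode explicitly.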

Note: The first 3 commits of the patch-set show the intent to model the
rounding mode for fixed-point intrinsics by applying the change to
`vaadd`, `vaaddu`, `vasub`, and `vasubu`. The following patches will
apply the change to the rest of the fixed-point instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151397
2023-06-20 11:08:09 -07:00
eopXD
7c8365121a [2/3][RISCV][POC] Model vxrm in LLVM intrinsics and machine instructions for RVV fixed-point instructions
Depends on D151395.

This is the 2nd patch of the patch-set. For the cover letter of the
patch-set, please check out D151395. This patch originates from
D121376.

This commit models vxrm by adding an immediate operand into intrinsics
and machine instructions of RVV fixed-point instruction `vaadd`,
`vaaddu`, `vasub`, and `vasubu`. This commit only covers intrinsics of
the four instructions; the following patches of the patch-set will do
the same for other RVV fixed-point instructions.

The current naive approach is to have a write to vxrm inserted before
every fixed-point instruction. This is done by the newly added pass
`RISCVInsertReadWriteCSR`. The pass has a more general name
because we will also model the rounding mode for the RVV floating-point
instructions. The approach will be improved in the future by
implementing partial redundancy elimination.

The original LLVM intrinsics and machine instructions, which do not model
the rounding mode (take `vaadd` as an example), are not removed in this
patch. That is, `int.riscv.vaadd.*` co-exists with
`int.riscv.vaadd.rm.*` after this patch. The next patch will add C
intrinsics of vaadd with an additional operand that models the control
of the rounding mode; in that patch, `int.riscv.vaadd.rm.*` will
replace `int.riscv.vaadd.*`.

Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D151396
2023-06-20 11:07:01 -07:00
Matt Arsenault
e777da468c AMDGPU: Delete old AMDGPUPropagateAttributes pass
The optimizing, non-broken features have all been moved to
AMDGPUAttributor. The only remaining piece of functionality was the
broken propagation of the wavesize features. This was fundamentally
broken and a hack for device library linking. It doesn't matter when
the device libraries are correctly linked and internalized.

In case of linked-as-normal-bitcode (as comgr still does), we're
reliant on the global subtarget anyway. If we can get away without
forcing target-cpu, we should just as well be able to get away without
propagating target-features.
2023-06-20 13:05:45 -04:00
Amy Kwan
f5ae075048 [AIX][TLS] Generate 32-bit local-exec access code sequence
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 32-bit (specifically, non-optimized) code sequence.
This work is a follow up of D149722.

The particular code sequence that is generated is as follows:
```
.tc var[TC],var[TL]@le.   // variable offset, with the le relocation specifier

bla .__get_tpointer()     // get the thread pointer, modifies r3
lwz reg1, var[TC](2)      // load the variable offset
add reg2, r3, reg1        // add the variable offset to the retrieved thread pointer
```

Differential Revision: https://reviews.llvm.org/D152669
2023-06-20 11:57:38 -05:00
Craig Topper
8680c28add [RISCV] Remove mask from vrgatherei16 in lowerVECTOR_INTERLEAVE.
Unless I'm missing something we need to update the whole vector
not just where OddMask is true.

Reviewed By: luke

Differential Revision: https://reviews.llvm.org/D153087
2023-06-20 09:36:38 -07:00
Matt Arsenault
7dcb9c0f09 InlineSpiller: Consider copy bundles when looking for snippet copies
This was looking for full copies produced by SplitKit, but SplitKit
introduces copy bundles if not all lanes are live. The scan for uses
needs to look at bundles, not individual instructions.

This is a prerequisite to avoiding some redundant spills due to
subregisters which will help avoid an allocation failure in a future
patch.
2023-06-20 12:26:27 -04:00
Simon Pilgrim
ff23856c1c [DAG] Fold (abds x, y) -> (abdu x, y) iff both args are known positive
This is a generic DAG combine version of D151055 which recognizes when a signed ABDS can be safely replaced with an unsigned ABDU instruction if it is legal.
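
The equivalence behind this fold can be checked with a small model. The sketch below is not the DAG combine itself; it models 8-bit `abds`/`abdu` as plain Python functions (names and bit width chosen for illustration) and shows that the two agree whenever both operands have a clear sign bit.

```python
BITS = 8
MASK = (1 << BITS) - 1


def to_signed(v: int) -> int:
    """Reinterpret an 8-bit pattern as a two's-complement signed value."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def abds(x: int, y: int) -> int:
    """Signed absolute difference of two 8-bit patterns, as an 8-bit pattern."""
    return abs(to_signed(x) - to_signed(y)) & MASK


def abdu(x: int, y: int) -> int:
    """Unsigned absolute difference of two 8-bit patterns."""
    return abs(x - y)


# With both sign bits clear, the signed and unsigned views of each operand
# coincide, so the two operations always agree.
agree_when_non_negative = all(
    abds(x, y) == abdu(x, y) for x in range(128) for y in range(128)
)
```

When one operand has its sign bit set the results can differ: `abds(0xFF, 0x01)` is 2 (the distance between -1 and 1) while `abdu(0xFF, 0x01)` is 254, which is why the fold is guarded by the known-bits check.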

Alive2: https://alive2.llvm.org/ce/z/pb5BjG

Differential Revision: https://reviews.llvm.org/D153328
2023-06-20 15:31:22 +01:00
Jingu Kang
cce08185b4 [AArch64] Try to fold uaddlv and uaddlp
Add tablegen pattern for uaddlv(uaddlp(x)) ==> uaddlv(x).
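
A quick way to see why the inner `uaddlp` is redundant: summing adjacent pairs and then summing the results gives the same total as summing every lane directly. The sketch below models both operations on Python lists of unsigned 8-bit lanes; it ignores register widths and is only an illustration, not the tablegen pattern.

```python
def uaddlp(lanes: list[int]) -> list[int]:
    """Pairwise widening add: each output lane is the sum of two adjacent inputs."""
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


def uaddlv(lanes: list[int]) -> int:
    """Widening add across the vector: the sum of all lanes."""
    return sum(lanes)


# Sixteen unsigned 8-bit lanes, including the extreme values.
x = [255, 0, 17, 255, 1, 2, 3, 4, 200, 100, 50, 25, 255, 255, 255, 255]
folded = uaddlv(x)
unfolded = uaddlv(uaddlp(x))
```

Because the widened results cannot overflow (sixteen lanes of at most 255 sum to at most 4080), the two forms produce the same value.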

Differential Revision: https://reviews.llvm.org/D153323
2023-06-20 15:14:27 +01:00
Weining Lu
3dd319ecf3 [LoongArch] Optimize conditional selection of integer
This patch optimizes code generation by leveraging the zeroing behavior of the `maskeqz`/`masknez` instructions.

```
int sel(int a, int b)
{
    return (a < b) ? a : 0;
}
```

```
slt	$a1,$a0,$a1
masknez	$a2,$r0,$a1
maskeqz	$a0,$a0,$a1
or	$a0,$a0,$a2
```

=>

```
slt	$a1,$a0,$a1
maskeqz	$a0,$a0,$a1
```
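
The two sequences can be compared with a small simulation. This sketch assumes the usual semantics of the instructions (`maskeqz` yields zero when the condition register is zero, `masknez` yields zero when it is non-zero) and models registers as Python integers; the function names are made up for the example.

```python
def maskeqz(rj: int, rk: int) -> int:
    """Result is 0 if rk == 0, otherwise rj."""
    return 0 if rk == 0 else rj


def masknez(rj: int, rk: int) -> int:
    """Result is 0 if rk != 0, otherwise rj."""
    return 0 if rk != 0 else rj


def sel_before(a: int, b: int) -> int:
    cond = 1 if a < b else 0      # slt
    other = masknez(0, cond)      # always 0, because the source is $r0
    kept = maskeqz(a, cond)
    return kept | other           # or


def sel_after(a: int, b: int) -> int:
    cond = 1 if a < b else 0      # slt
    return maskeqz(a, cond)       # the zero arm comes for free


samples = [(-5, 3), (3, -5), (0, 0), (7, 8), (8, 7)]
same_result = all(sel_before(a, b) == sel_after(a, b) for a, b in samples)
```

Since one arm of the select is the constant zero, the `masknez` result is always zero and the final `or` is a no-op, so both instructions can be dropped.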

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D153193
2023-06-20 21:54:40 +08:00
Pravin Jagtap
699addeff0 [AMDGPU] Use verify<domtree> instead of intra-pass asserts.
Verifying dominator tree is expensive using intra-pass
asserts. Asserts added during D147408 are
increasing the build time of libc significantly. This change
does the verification after the atomic optimizer pass
and should fix the regression reported in D153232.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D153261
2023-06-20 09:52:58 -04:00
Weining Lu
2efdacf74c [LoongArch] Add missing chains and remove unnecessary SDNPSideEffect property for some intrinsic nodes 2023-06-20 21:16:26 +08:00
David Green
1d27ad2077 [AArch64] Add tablegen patterns for fp16 fcvtn2.
Similar to the existing f32 pattern, this adds a tablegen pattern for the fp16
fcvtn2.
2023-06-20 14:10:25 +01:00
Ivan Kosarev
2d3e6c4402 [AMDGPU] Drop GFX11 runs for dagcombine-fma-fmad.ll and fma.f16.ll.
They cause failures on the llvm-clang-x86_64-expensive-checks-debian
buildbot.

This partially reverts
D153269 [AMDGPU][GFX11] Add test coverage for FMA instructions.
2023-06-20 11:32:44 +01:00
Francesco Petrogalli
c7430ff9bf [CodeGen][test] Add missing REQUIRES.
Differential Revision: https://reviews.llvm.org/D153325
2023-06-20 12:00:07 +02:00
Ivan Kosarev
dec42ffa28 [AMDGPU][GFX11] Add test coverage for FMA instructions.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D153269
2023-06-20 10:50:03 +01:00
Francesco Petrogalli
37db9cae2b [llc][MISched] Add -misched-detail-resource-booking to llc.
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:

1. counters of the resources that have already been booked;

2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.

The option is useful for debugging low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D153116
2023-06-20 11:46:27 +02:00
Francesco Petrogalli
25f8b1a0a8 Revert "[llc][MISched] Add -misched-detail-resource-booking to llc."
Reverting because of https://lab.llvm.org/buildbot#builders/75/builds/32485:

llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:2374:7: error: use of undeclared identifier 'MischedDetailResourceBooking'
 if (MischedDetailResourceBooking)

This reverts commit fc06262c1c365777e71207b6a5de281cba927c96.
2023-06-20 11:28:45 +02:00
Francesco Petrogalli
fc06262c1c [llc][MISched] Add -misched-detail-resource-booking to llc.
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:

1. counters of the resources that have already been booked;

2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.

The option is useful for debugging low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D153116
2023-06-20 11:13:39 +02:00
Ben Shi
6d05f3f56e [CSKY] Optimize multiplication with immediates
Try to break a multiplication by a specific immediate into
an addition or subtraction of left shifts.
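
The general idea can be sketched as follows. Which immediates the patch actually handles is not stated here, so this example assumes the simplest case: constants of the form 2^hi + 2^lo or 2^hi - 2^lo. The helper names are hypothetical.

```python
def decompose(c: int):
    """Return ('add'|'sub', hi, lo) if c == 2**hi +/- 2**lo, else None."""
    for hi in range(c.bit_length() + 1):
        for lo in range(hi):
            if (1 << hi) + (1 << lo) == c:
                return ("add", hi, lo)
            if (1 << hi) - (1 << lo) == c:
                return ("sub", hi, lo)
    return None


def mul_by_shifts(x: int, c: int) -> int:
    """Multiply x by c using two left shifts and one add or subtract."""
    parts = decompose(c)
    if parts is None:
        return x * c  # fall back to a real multiplication
    kind, hi, lo = parts
    if kind == "add":
        return (x << hi) + (x << lo)
    return (x << hi) - (x << lo)
```

For instance, 10 decomposes as 8 + 2, so `x * 10` becomes `(x << 3) + (x << 1)`, and 7 decomposes as 8 - 1, so `x * 7` becomes `(x << 3) - x`.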

Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153106
2023-06-20 16:03:31 +08:00
Ben Shi
56e33d9881 [CSKY][test][NFC] Add more tests of multiplication with immediates
Reviewed By: zixuan-wu

Differential Revision: https://reviews.llvm.org/D153105
2023-06-20 16:03:15 +08:00
Bing1 Yu
516e32678d [X86][AMX] set Stride to Tile's Col when doing combine amxcast and store into tilestore
%tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 32, i8* %src_ptr, i64 64)
%vec = call <256 x i8> @llvm.x86.cast.tile.to.vector.v256i8(x86_amx...%tile)
store <256 x i8> %vec, <256 x i8>* %dst_ptr, align 256
=>
%tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 32, i8* %src_ptr, i64 64)
%stride = sext i16 32 to i64
call void @llvm.x86.tilestored64.internal(i16 8, i16 32, i8* %dst_ptr, i64 32, x86_amx %tile)

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D153002
2023-06-20 11:55:25 +08:00
Fangrui Song
dafaa8463e [XRay] Make llvm.xray.customevent parameter type match __xray_customevent
The intrinsic has a smaller integer type than the parameter type of
builtin-function/API. Fix this similar to commit 3fa3cb408d8d0f1365b322262e501b6945f7ead9.
2023-06-19 20:38:16 -07:00
Fangrui Song
3fa3cb408d [XRay] Make llvm.xray.typedevent parameter type match __xray_typedevent
The Clang built-in function is void __xray_typedevent(size_t, const void *, size_t),
but the LLVM intrinsic has smaller integer types. Since we only allow
64-bit ELF/Mach-O targets, we can change llvm.xray.typedevent to
i64/ptr/i64.

This allows encoding more information and avoids i16 legalization for
many non-X86 targets.

fdrLoggingHandleTypedEvent only supports uint16_t event type.
2023-06-19 20:28:39 -07:00
Jay Foad
4b6d41cd1d [AMDGPU] Do not release VGPRs if there may be pending scratch stores
Differential Revision: https://reviews.llvm.org/D153295
2023-06-19 21:12:43 +01:00
Amy Kwan
d5659808b2 [AIX][TLS] Generate 64-bit local-exec access code sequence
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 64-bit (specifically, non-optimized) code sequence.

For this patch in particular, the sequence that is generated involves a load of the
variable offset, followed by an add of the loaded variable offset to r13 (which holds the
thread pointer). This code sequence looks like the following:
```
ld reg1,var[TC](2)
add reg2, reg1, r13     // r13 contains the thread pointer
```
The TOC (.tc pseudo-op) entries generated in the assembly files are also
changed where we add the @le relocation for the variable offset.

Differential Revision: https://reviews.llvm.org/D149722
2023-06-19 12:17:30 -05:00
Florian Hahn
dae5cd73cb Recommit "[LSR] Consider post-inc form when creating extends/truncates."
This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2.

The recommitted version now checks the PostIncLoopSets for all fixups
and returns nullptr if the result doesn't match for all fixups.
2023-06-19 17:57:06 +01:00
Jeffrey Byrnes
ac2d6df2d6 [AMDGPU] Add basic support for extended i8 perm matching
Differential Revision: https://reviews.llvm.org/D142782

Change-Id: Ibb95224f7885839e8b77a705f487f10b47a258a6
2023-06-19 09:53:25 -07:00
Jay Foad
eb7491769a [AMDGPU] Reimplement the GFX11 early release VGPRs optimization
Implement this optimization in SIInsertWaitcnts, where we already have
information about whether there might be outstanding VMEM store
instructions. This has the following advantages:
- Correctly handles atomics-with-return.
- Correctly handles call instructions.
- Should be faster because it does not require running a separate pass.

Differential Revision: https://reviews.llvm.org/D153279
2023-06-19 17:12:54 +01:00
Matt Arsenault
7c8958118c AMDGPU: Remove amdgpu-waves-per-eu support in old attribute pass
AMDGPUAttributor now handles this attribute with value merging, so
delete the old approach which could only apply this to functions which
did not set it, or cloned the function.
2023-06-19 11:50:50 -04:00
Krzysztof Parzyszek
734881a6d5 [Hexagon] Fix range checks for immediate operands
The output assembly (textual) contains the instruction
  r29 = add(r29,#4294967136)
The value 4294967136 is -160 when interpreted as a signed 32-bit
integer, so it fits in the range of the immediate operand without
a constant extender. The range check in HexagonInstrInfo was putting
the operand value into an int variable, reporting no need for an
extender. This resulted in a packet with 4 instructions, including
the "add". The corresponding check in HexagonMCInstrInfo was using
an int64_t variable, causing the range check to fail, and an extender
to be emitted when lowering to MCInst, resulting in a packet with
too many instructions.
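
The mismatch can be reproduced with a small model of the two range checks. The 16-bit signed immediate width below is an assumption made for illustration, not a value taken from the Hexagon sources, and the helper names are hypothetical.

```python
def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


IMM = 4294967136
IMM_BITS = 16  # assumed width of the non-extended immediate field

# Check performed on a 32-bit int: the value wraps to -160 and fits.
fits_as_int = fits_signed(as_int32(IMM), IMM_BITS)

# Check performed on an int64_t: the value stays large and does not fit,
# so a constant extender is requested.
fits_as_int64 = fits_signed(IMM, IMM_BITS)
```

The two checks disagree on the same operand, which is how one stage could build a full packet without an extender while a later stage tried to add one.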
2023-06-19 08:22:41 -07:00
David Green
d0f56c3e5c [AArch64] Add and expand the testing of fmin/fmax reduction. NFC
For both CodeGen and CostModelling, this adds extra testing for the new
llvm.vector.reduce.fmaximum and llvm.vector.reduce.fminimum intrinsics, as well
as making sure there is test coverage for all the various cases.
2023-06-19 15:47:21 +01:00
David Green
16b46dde0b [AArch64] More tablegen patterns for addp of two extracts
Similar to D152245, this adds integer addp patterns, using the larger
v4i32 addp from addp extractlow, extracthi.
2023-06-19 07:52:46 +01:00
David Green
68f34e4d39 [AArch64] Add tablegen patterns for faddp of two extracts
This adds some simple tablegen patterns for converting
`faddp v2f32 extractlow(Rn), v2f32 extracthigh(Rn)` to
`faddp v4f32 Rn, v4f32 Rn` using the q variants of the
instructions, avoiding the extra ext needed to extract
the high lanes. Only the bottom lanes of the new faddp
are used, the second Rn operand is used as a placeholder.
It uses Rn to prevent any false dependencies, but could
equally be undef.
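
The lane arithmetic behind this pattern can be sketched with lists standing in for vector registers. The model below assumes the usual pairwise-add behaviour (adjacent pairs of the first operand fill the low lanes, pairs of the second operand fill the high lanes); the function name simply mirrors the instruction.

```python
def faddp(a: list[float], b: list[float]) -> list[float]:
    """Pairwise add: sums of adjacent pairs of a, followed by those of b."""
    joined = a + b
    return [joined[i] + joined[i + 1] for i in range(0, len(joined), 2)]


rn = [1.0, 2.0, 3.0, 4.0]          # a v4f32 value
low, high = rn[:2], rn[2:]         # extractlow(Rn), extracthigh(Rn)

narrow = faddp(low, high)          # faddp v2f32 on the two extracted halves
wide = faddp(rn, rn)               # faddp v4f32 Rn, Rn
```

The bottom two lanes of `wide` equal `narrow`, and the top two lanes depend only on the second operand, which is why it can be any placeholder.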

Differential Revision: https://reviews.llvm.org/D152245
2023-06-19 07:48:31 +01:00
Fangrui Song
b9a134aa62 [XRay] Mark Mach-O xray_instr_map and xray_fn_idx as S_ATTR_LIVE_SUPPORT
Add the `S_ATTR_LIVE_SUPPORT` attribute to the sections so that `ld -dead_strip`
will retain subsections that reference live functions, once we add linker
private "l" symbols as atoms.
2023-06-18 19:30:16 -07:00
Jianjian GUAN
04ed822dcc [RISCV] Match shl (ext v, splat 1) to vector widening add.
Since we match shl (v, splat 1) to vadd, we can also match shl (ext v, splat 1) to a widening add.
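
The identity being used is that shifting an extended value left by one equals adding the narrow value to itself with a widening add. Here is a minimal sketch for the unsigned case with 8-bit lanes widened to 16 bits; the function names are illustrative and do not correspond to LLVM APIs.

```python
WIDE_MASK = 0xFFFF  # lanes are widened from 8 to 16 bits


def shl_of_zext(v: list[int]) -> list[int]:
    """Zero-extend each 8-bit lane to 16 bits, then shift left by one."""
    return [(x << 1) & WIDE_MASK for x in v]


def widening_add(a: list[int], b: list[int]) -> list[int]:
    """Unsigned widening add: 8-bit inputs, 16-bit sums."""
    return [(x + y) & WIDE_MASK for x, y in zip(a, b)]


lanes = list(range(256))  # every possible 8-bit lane value
matches = shl_of_zext(lanes) == widening_add(lanes, lanes)
```

Doing the doubling in the widening add avoids materialising the extended vector first.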

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153112
2023-06-19 09:46:36 +08:00
Fangrui Song
49b61ead47 [XRay][test] Make tests less sensitive to .Ltmp/Ltmp label changes 2023-06-18 13:32:40 -07:00
Simon Pilgrim
fb60dda189 [GlobalIsel][X86] selectDivRem - fix typo in 64-bit AH handling code
This function was lifted from fast-isel, and still referred to the Instruction::SRem/URem opcodes, instead of the G_SREM/G_UREM opcodes.

But it turns out these aren't necessary at all as only the G_SREM/G_UREM codepaths will use the AH register for DivRemResultReg anyhow.
2023-06-18 17:37:17 +01:00
Simon Pilgrim
46479ea785 [GlobalIsel][X86] Regenerate srem/urem select test coverage 2023-06-18 17:06:32 +01:00
Yingwei Zheng
315e3001c0 [CodeGenPrepare][RISCV] Remove asserting VH references before erasing the dead GEP
Fixes issue https://github.com/llvm/llvm-project/issues/63365

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D153194
2023-06-18 23:40:47 +08:00
Simon Pilgrim
e1164c7a92 [X86] Regenerate tls.ll and reuse common linux check prefixes 2023-06-18 16:02:59 +01:00