AMDGPUAtomicOptimizer updates the dominator tree whenever
it modified the control flow. Therefore preserving the
analysis similar to legacy PM.
Reviewed By: arsenm, yassingh, #amdgpu
Differential Revision: https://reviews.llvm.org/D153349
This is intended to behave like GCC's `-mcmodel=extreme`.
Technically the true GCC equivalent would be `-mcmodel=large` which is
not yet implemented there, and we probably do not want to take the
"Large" name until things settle in GCC side, but:
* LLVM does not have a `CodeModel::Extreme`, and it seems too early to
have such a variant added just for enabling LoongArch; and
* `CodeModel::Small` is already being used for GCC `-mcmodel=normal`
which is already a case of divergent naming.
Regarding the codegen, loads/stores immediately after a PC-relative
large address load (that ends with something like `add.d $addr, $addr,
$tmp`) should get merged with the addition into corresponding `ldx/stx`
ops, but is currently not done. This is because pseudo-instructions are
expanded after instruction selection, and is best fixed with a separate
change.
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D150522
This patch adds a pass to generate `cm.mvsa01` & `cm.mva01s`.
RISCVMoveOptimizer.cpp which combines two mv inst into one cm.mva01s or cm.mva01s.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D150415
As mentioned by commit c5d38924dc6688c15b3fa133abeb3626e8f0767c (Apr 2020),
PC-relative entries avoid dynamic relocations and can therefore make the
section read-only.
This is similar to D78082 and D78590. We cannot commit to support
compiler/runtime built at different versions, so just don't play with versions.
For Mach-O support (incomplete yet), we use non-temporary `lxray_fn_idx[0-9]+`
symbols. Label differences are represented as a pair of UNSIGNED and SUBTRACTOR
relocations. The SUBTRACTOR external relocation requires r_extern==1 (needs to
reference a symbol table entry) which can be satisfied by `lxray_fn_idx[0-9]+`.
A `lxray_fn_idx[0-9]+` symbol also serves as the atom for this dead-strippable
section (follow-up to commit b9a134aa629de23a1dcf4be32e946e4e308fc64d).
Differential Revision: https://reviews.llvm.org/D152661
Depends on D151397.
This patch follows the patch-set of D151395. This patch seeks to
update all the remaining fixed-point intrinsics to model vxrm
control, adding rounding mode control for `vsmul`, `vssra`,
`vssrl`, `vnclip`, and `vnclipu`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D152879
Depends on D151396.
This is the 3rd patch of the patch-set. For the cover letter of the
patch-set, please checkout D151395.
This commit consists of change in both clang front-end and RISC-
back-end.
In the front-end, this commit adds an additional operand to the C
intrinsics of `vaadd`, `vaaddu`, `vasub`, and `vasubu`, that models
the control of the rounding mode.
In the back-end, using `vaadd` as an example, this commit replaces the
existing `int.riscv.vaadd.*` with `int.riscv.vaadd.rm.*` that was
introduced in the previous patch, with the extra operand that models
the control of the rounding mode (`vxrm`) for RVV fixed-point
intrinsics.
Note: The first 3 commit of the patch-set shows the intent to model the
rounding mode for fixed-point intrinsics by applying change to
`vaadd`, `vaaddu`, `vasub`, and `vasubu`. The proceeding patch will
apply the change to the rest of the other fixed-point instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D151397
Depends on D151395.
This is the 2nd patch of the patch-set. For the cover letter of the
patch-set, please checkout D151395. This patch originates from
D121376.
This commit models vxrm by adding an immediate operand into intrinsics
and machine instructions of RVV fixed-point instruction `vaadd`,
`vaaddu`, `vasub`, and `vasubu`. This commit only covers intrinsics of
the four instructions, the proceeding patches of the patch-set will do
the same to other RVV fixed-point instructions.
The current naiive approach is to have a write to vxrm inserted before
every fixed-point instruction. This is done by the new added pass
`RISCVInsertReadWriteCSR`. The reason to name the pass in a more general
term is because we will also model rounding mode for the RVV floating-
point instructions. The approach will be improved in the future,
implementing partial redundancy elimination algorithms to it.
The original LLVM intrinsics and machine instructions, take `vaadd` as
an example, does not model the rounding mode is not removed in this
patch. That is, `int.riscv.vaadd.*` co-exists with
`int.riscv.vaadd.rm.*` after this patch. The next patch will add C
intrinsics of vaadd with an additional operand that models the control
of the rounding mode, in this patch, `int.riscv.vaadd.rm.*` will
replace `int.riscv.vaadd.*`.
Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D151396
The optimizing, non-broken features have all been moved to
AMDGPUAttributor. The only remaining piece of functionality was the
broken propagation of the wavesize features. This was fundamentally
broken and a hack for device library linking. It doesn't matter when
the device libraries are correctly linked and internalized.
In case of linked-as-normal-bitcode (as comgr still does), we're
reliant on the global subtarget anyway. If we can get away without
forcing target-cpu, we should just as well be able to get away without
propagating target-features.
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 32-bit (specifically, non-optimized) code sequence.
This work is a follow up of D149722.
The particular sequence that is generated for this sequence is as follows:
```
.tc var[TC],var[TL]@le. // variable offset, with the le relocation specifier
bla .__get_tpointer() // get the thread pointer, modifies r3
lwz reg1, var[TC](2) // load the variable offset
add reg2, r3, reg1 // add the variable offset to the retrieved thread pointer
```
Differential Revision: https://reviews.llvm.org/D152669
Unless I'm missing something we need to update the whole vector
not just where OddMask is true.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D153087
This was looking for full copies produced by SplitKit, but SplitKit
introduces copy bundles if not all lanes are live. The scan for uses
needs to look at bundles, not individual instructions.
This is a prerequisite to avoiding some redundant spills due to
subregisters which will help avoid an allocation failure in a future
patch.
This patch optimizes code generation by leveraging the zeroing behavior of the `maskeqz`/`masknez` instructions.
```
int sel(int a, int b)
{
return (a < b) ? a : 0;
}
```
```
slt $a1,$a0,$a1
masknez $a2,$r0,$a1
maskeqz $a0,$a0,$a1
or $a0,$a0,$a2
```
=>
```
slt $a1,$a0,$a1
maskeqz $a0,$a0,$a1
```
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D153193
Verifying dominator tree is expensive using intra-pass
asserts. Asserts added during D147408 are
increasing the build time of libc significantly. This change
does the verification after the atomic optimizer pass
and should fix the regression reported in D153232.
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D153261
They cause failures on the llvm-clang-x86_64-expensive-checks-debian
buildbot.
This partially reverts
D153269 [AMDGPU][GFX11] Add test coverage for FMA instructions.
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:
1. counters of the resources that have already been booked;
2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.
The method is useful to debug low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D153116
Reverting because of https://lab.llvm.org/buildbot#builders/75/builds/32485:
llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:2374:7: error: use of undeclared identifier 'MischedDetailResourceBooking'
if (MischedDetailResourceBooking)
This reverts commit fc06262c1c365777e71207b6a5de281cba927c96.
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:
1. counters of the resources that have already been booked;
2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.
The method is useful to debug low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.
Reviewed By: andreadb
Differential Revision: https://reviews.llvm.org/D153116
Try to break a multiplication with a specific immediate to
an/a addition/subtraction of left shifts.
Reviewed By: zixuan-wu
Differential Revision: https://reviews.llvm.org/D153106
The intrinsic has a smaller integer type than the parameter type of
builtin-function/API. Fix this similar to commit 3fa3cb408d8d0f1365b322262e501b6945f7ead9.
The Clang built-in function is void __xray_typedevent(size_t, const void *, size_t),
but the LLVM intrinsics has smaller integer types. Since we only allow
64-bit ELF/Mach-O targets, we can change llvm.xray.typedevent to
i64/ptr/i64.
This allows encoding more information and avoids i16 legalization for
many non-X86 targets.
fdrLoggingHandleTypedEvent only supports uint16_t event type.
This patch adds support for the TLS local-exec access model on AIX to allow
for the ability to generate the 64-bit (specifically, non-optimized) code sequence.
For this patch in particular, the sequence that is generated involves a load of the
variable offset, followed by an add of the loaded variable offset to r13 (which is
thread pointer, respectively). This code sequence looks like the following:
```
ld reg1,var[TC](2)
add reg2, reg1, r13 // r13 contains the thread pointer
```
The TOC (.tc pseudo-op) entries generated in the assembly files are also
changed where we add the @le relocation for the variable offset.
Differential Revision: https://reviews.llvm.org/D149722
This reverts the revert commit 1797ab36efc9c90c921cd725831f8c3f6a7125a2.
The recommitted version now checks the PostIncLoopSets for all fixups
and returns nullptr if the result doesn't match for all fixups.
Implement this optimization in SIInsertWaitcnts, where we already have
information about whether there might be outstanding VMEM store
instructions. This has the following advantages:
- Correctly handles atomics-with-return.
- Correctly handles call instructions.
- Should be faster because it does not require running a separate pass.
Differential Revision: https://reviews.llvm.org/D153279
AMDGPUAttributor now handles this attribute with value merging, so
delete the old approach which could only apply this to functions which
did not set it, or cloned the function.
The output assembly (textual) contains the instruction
r29 = add(r29,#4294967136)
The value 4294967136 is -160 when interpreted as a signed 32-bit
integer, so it fits in the range of the immediate operand without
a constant extender. The range check in HexagonInstrInfo was putting
the operand value into an int variable, reporting no need for an
extender. This resulted in a packet with 4 instructions, including
the "add". The corresponding check in HexagonMCInstrInfo was using
an int64_t variable, causing the range check to fail, and an extender
to be emitted when lowering to MCInst, resulting in a packet with
too many instructions.
For both CodeGen and CostModelling, this adds extran testing for the new
lvm.vector.reduce.fmaximum and lvm.vector.reduce.fminimum intrinsics, as well
as making sure there is test coverage for all the various cases.
This adds some simple tablegen patterns for converting
`faddp v2f32 extractlow(Rn), v2f32 extracthigh(Rn)` to
`faddp v4f32 Rn, v4f32 Rn` using the q variants of the
instructions, avoiding the extra ext needed to extract
the high lanes. Only the bottom lanes of the new faddp
are used, the second Rn operand is used as a placeholder.
It uses Rn to prevent any false dependencies, but could
equally by undef.
Differential Revision: https://reviews.llvm.org/D152245
Add the `S_ATTR_LIVE_SUPPORT` attribute to the sections so that `ld -dead_strip`
will retain subsections that reference live functions, once we we add linker
private "l" symbols as atoms.
Since we use match shl (v, splat 1) to vadd, we could also expand to widening add.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153112
This function was lifted from fast-isel, and still referred to the Instruction::SRem/URrem opcodes, instead of the G_SREM/G_UREM opcodes.
But it turns out these aren't necessary at all as only the G_SREM/G_UREM codepaths will use the AH register for DivRemResultReg anyhow.