Similar to 806761a7629df268c8aed49657aeccffa6bca449.
For IR files without a target triple, -mtriple= specifies the full
target triple while -march= merely sets the architecture part of the
default target triple, leaving a target triple which may not make sense,
e.g. amdgpu-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without
a target triple. The issue has been benign as we recognize
$unknown-apple-darwin as ELF instead of rejecting it outrightly.
This patch changes AMDGPU tests to not rely on the default
OS/environment components. Tests that need fixes are not changed:
```
LLVM :: CodeGen/AMDGPU/fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fabs.ll
LLVM :: CodeGen/AMDGPU/floor.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.f64.ll
LLVM :: CodeGen/AMDGPU/fneg-fabs.ll
LLVM :: CodeGen/AMDGPU/r600-infinite-loop-bug-while-reorganizing-vector.ll
LLVM :: CodeGen/AMDGPU/schedule-if-2.ll
```
When generating the permute bytes for the prmt instruction, the
existence of an undefined initial value initialises the int32 that holds
the mask with all 1's (0xFFFFFFFF). That initialization subsequently
leads to complications during the subsequent OR operation, leading to
inaccuracies in populating mask values for the following bytes.
Consequently, the final value persists as a constant -1, irrespective of
the actual mask values that succeed the initial set value.
We currently force users to use a negative contant in the intrinsic
call. Changing it zext would break existing programs, so just sign
extend an argument.
On 64-bit target, prefer using RDPC over CALL to get the value of %pc.
This is faster on modern processors (Niagara T1 and newer) and avoids
polluting the processor's predictor state.
The old behavior of using a fake CALL is still done when tuning for
classic UltraSPARC processors, since RDPC is much slower there.
A quick pgbench test on a SPARC T4 shows about 2% speedup on SELECT
loads, and about 7% speedup on INSERT/UPDATE loads.
Reviewed By: @s-barannikov
Github PR: https://github.com/llvm/llvm-project/pull/78280
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.
The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16(x0-x16).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.
`RVE` can be combined with all current standard extensions.
The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.
If `ilp32e` or `lp64` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.
To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment\
to 4-bytes for types with length of 2*XLEN.
FastCC is also supported on RVE, while GHC isn't since there is only one
avaiable register.
Differential Revision: https://reviews.llvm.org/D70401
This avoids possible undefined behavior using the same register for Rm
and Rda.
Additionally adds a check in MC to produce an error upon parsing this
case.
With commuted operands on the phi node, the two old incoming values
could be removed in the wrong order, removing newly added operand
instead of the old one.
Reordering based on the sort order of the MemOpInfo array was disabled
in <https://reviews.llvm.org/D72706>. However, it's not clear this is
desirable for al targets. It also makes it more difficult to compare the
incremental benefit of enabling load clustering in the selectiondag
scheduler as well was the machinescheduler, as the sdag scheduler does
seem to allow this reordering.
This patch adds a parameter that can control the behaviour on a
per-target basis.
Split out from #73789.
vmv.s.x and vmv.x.s ignore LMUL, so we can replace the PseudoVMV_S_X_MX
and
PseudoVMV_X_S_MX with just one pseudo each. These pseudos use the VR
register
class (just like the actual instruction), so we now only have TableGen
patterns for vectors of LMUL <= 1.
We now rely on the existing combines that shrink LMUL down to 1 for
vmv_s_x_vl (and vfmv_s_f_vl). We could look into removing these combines
later and just inserting the nodes with the correct type in a later
patch.
The test diff is due to the fact that a PseudoVMV_S_X/PsuedoVMV_X_S no
longer
carries any information about LMUL, so if it's the only vector pseudo
instruction in a block then it now defaults to LMUL=1.
Prior to this change spv_ptrcast (and OpBitcast) was never emitted for
GEP resulting pointers. While such SPIR-V was (mostly) accepted by the
NEO GPU driver, the generated SPIR-V was incorrect.
The newly added test (pointers/getelementptr-bitcast-load.ll) verifies
that a correct bitcast is added for more complex cases and passes
spirv-val. The test is based on an OpenCL CTS test (basic/prefetch).
The usage of FP Load and Test instructions as a comparison against zero
with the assumption that the dest reg will always reflect the source reg is
actually incorrect: Unfortunately, a SNaN will be converted to a QNaN, so the
instruction may actually change the value as opposed to being a pure register
move with a test.
This patch
- changes instruction selection to always emit FP LT with a scratch def
reg, which will typically be allocated to the same reg if dead.
- Removes the conversions into FP LT in SystemZElimcompare.
Moved extractParts() and extractVectorParts() from LegalizerHelper
to Utils to be able to use it in different passes.
extractParts() will also try to use unmerge when doing irregular
splits where possible, falling back to extract elements when not.
This change is to remove incompatible gws related functions
in order to make device-libs work correctly under -O0 for
gfx1200+
Co-authored-by: Changpeng Fang <changpeng.fang@amd.com>
The structurizer will require the frontend to emit convergence
intrinsics. Once uses to restructurize the control-flow, those
intrinsics shall be removed, as they cannot be converted to
SPIR-V.
This commit adds a new pass to the SPIR-V backend which strips those
intrinsics.
Those 2 new steps are not limited to Vulkan as OpenCL could
also benefit from not crashing if a convertent operation is in
the IR (even though the frontend doesn't generate such intrinsics).
Signed-off-by: Nathan Gauër <brioche@google.com>
This custom DAG combine works on a shuffle where one source vector is a
zero splat, which means we can adjust the shuffle indices to refer to
any element of the splat -- as long as we stay in the same vector.
In the case where an undef (-1) index into the non-splat vector was
used, we ended up adjusting the splat index to -1+NumElements, which
points into the wrong vector.
Fix this by using the first element from the splat if the other one is undef.
There are four cases this theoretically affects, but in practice I only
managed to demonstrate a miscompile with one of them. I think two of
theses are effectively dead due to the operand canonicalization at the
start of the transform.
Fixes https://github.com/llvm/llvm-project/issues/77748.
InstCombine will add the disjoint flag to these or instructions. This patch
adds them to the tests so that it matches the input RISCVGatherScatterLowering
will receive in practice, allowing us to rely on said disjoint flag:
https://github.com/llvm/llvm-project/pull/77800#discussion_r1449231844
This patch adds support for the disjoint flag in the non-recursive case,
as well as adding an additional check for it in the recursive case. Note
that haveNoCommonBitsSet should be equivalent to having the disjoint
flag set, and the check can be removed in a follow-up patch.
Co-authored-by: Philip Reames <preames@rivosinc.com>
---------
Co-authored-by: Philip Reames <preames@rivosinc.com>
Currently vfmv.s.f intrinsics are directly selected to their pseudos via
a
tablegen pattern in RISCVInstrInfoVPseudos.td, whereas the other move
instructions (vmv.s.x/vmv.v.x/vmv.v.f etc.) first get lowered to their
corresponding VL SDNode, then get selected from a pattern in
RISCVInstrInfoVVLPatterns.td
This patch brings vfmv.s.f inline with the other move instructions.
Split out from #71501, where we did this to preserve the behaviour of
selecting
vmv_s_x for VFMV_S_F_VL for small enough immediates.
We previously scanned the whole BB for `DBG_VALUE` instruction even when
the program doesn't have debug info, i.e., the function doesn't have a
subprogram associated with it, which can make compilation unnecessarily
slow. This disables `DebugValueManager` when a `DISubprogram` doesn't
exist for a function.
This only reduces unnecessary work in non-debug mode and does not change
output, so it's hard to add a test to test this behavior.
Test changes were necessary because their `DISubprogram`s were not
correctly linked with the functions, so with this PR the compiler
incorrectly assumed the functions didn't have a subprogram and the tests
started to fail.
Fixes https://github.com/emscripten-core/emscripten/issues/21048.