In review of bbde6b, I had originally proposed that we support the
legacy text format. As review evolved, it bacame clear this had been a
bad idea (too much complexity), but in order to let that patch finally
move forward, I approved the change with the variant. This change undoes
the variant, and updates all the tests to just use the array form.
The test does not check for anything related to cfi information so we
don't really need them in the test checks. Also it looks like there were
some failures on the Alpine Linux builders due to the placement of the
cfi information in the output assembly.
I have also changed `-march` to `-mtriple` in the run line similar to
2208c97
This patch fixes a bug introduced in #145878. A dependency was added in
the wrong direction, causing an assertion failure due to broken
topological order.
The pattern i32xi32->i64, should be matched to the sign-extended
multiply op, instead of explicit sign- extension of the operands
followed by non-widening multiply (this takes 4 operations instead of
one). Currently, if one of the operands of multiply inside a loop is a
constant, the sign-extension of this constant is hoisted out of the loop
by LICM pass and this pattern is not matched by the ISEL.
This change handles multiply operand with Opcode of the type AssertSext
which is seen when the sign-extension is hoisted out-of the loop.
Modifies the DetectUseSxtw() to check for this.
UGP contains the pointer for thread data:
> The TLS area is accessed at the processor level through the special
register UGP This register is set to the address one location above the
TLS area, which grows downwards from UGP.
From the Hexagon ABI spec -
https://docs.qualcomm.com/bundle/publicresource/80-N2040-23_REV_K_Qualcomm_Hexagon_Application_Binary_Interface_User_Guide.pdf
Also: disable clang-format for `NodeType` enum in
`llvm/lib/Target/Hexagon/HexagonISelLowering.h` to avoid disruptive
formatting.
Hexagon currently has an untested global flag to control fast
math variants of libcalls. Add fast variants as explicit libcall
options so this can be a flag based lowering decision, and implement
it. I have no idea what fast math flags the hexagon case requires,
so I picked the maximally potentially relevant set of flags although
this probably is refinable per call. Looking in compiler-rt, I'm not
sure if the fast variants are anything more than aliases.
Generate the saturating add instructions for sadd.sat for scalar and
vector instructions
Co-authored-by: aankit-quic <aankit@quicinc.com>
Co-authored-by: Jyotsna Verma <jverma@quicinc.com>
This patch adds an additional validation step to ensure that the
generated schedule does not violate loop-carried memory dependencies.
Prior to this patch, incorrect schedules could be produced due to the
lack of checks for the following types of dependencies:
- load-to-store backward (from bottom to top within the BB) dependencies
- store-to-load dependencies
- store-to-store dependencies
One possible solution to this issue is to add these dependencies
directly to the dependency graph, although doing so may lead to
performance degradation. In addition, no known cases of incorrect code
generation caused by these missing dependencies have been observed in
practice. Given these factors, this patch introduces a post-scheduling
validation phase to check for such previously missed dependencies,
instead of adding them to the graph before searching for a schedule.
Since no actual problems have been identified so far, it is likely that
most generated schedules are already valid. Therefore, this additional
validation is not expected to cause performance degradation in practice.
Split off from #135148 .
The remaining tasks are as follows:
- Address other missing loop-carried dependencies (e.g., output
dependencies between physical registers, barrier instructions, and
instructions that may raise floating-point exceptions)
- Remove code that are currently retained to maintain the existing
behavior but probably unnecessary.
- Eliminate `SwingSchedulerDAG::isLoopCarriedDep` and use
`SwingSchedulerDDG` to traverse edges after dependency analysis part.
This commit updates the Hexagon backend to handle vxi1 call operands
Without HVX enabled. It ensures compatibility for vector types of sizes
4, 8, 16, 32, 64, and 128 x i1 when HVX is not enabled.
Change undef branch conditions to the values that loop-simplify gives
them, and handle other undef values by using extra arguments. I'm making
this change because of an upcoming loop strength reduction change that
results in instsimplify removing more instructions due to them using
undef, causing the test checks to fail.
This will convert loads of constant strings to immediate values. Put
this behind a flag that is enabled by default so that we can toggle it
if need be.
The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout, see the regression test in the first
commit.
This change affects many architectures but the amount of total
instructions in the test cases seems too be slightly lower.
In MachinePipeliner, loop-carried memory dependencies are represented by
DAG, which makes things complicated and causes some necessary
dependencies to be missing. This patch introduces a new class to manage
loop-carried memory dependencies to simplify the logic. The ultimate
goal is to add currently missing dependencies, but this is a first step
of that, and this patch doesn't intend to change current behavior. This
patch also adds new tests that show the missed dependencies, which
should be fixed in the future.
Split off from #135148
MachinePipeliner uses AliasAnalysis to collect loop-carried memory
dependencies. To analyze loop-carried dependencies, we need to
explicitly tell AliasAnalysis that the values may come from different
iterations. Before this patch, MachinePipeliner didn't do this, so some
loop-carried dependencies might be missed. For example, in the following
case, there is a loop-carried dependency from the load to the store, but
it wasn't considered.
```
def @f(ptr noalias %p0, ptr noalias %p1) {
entry:
br label %body
loop:
%idx0 = phi ptr [ %p0, %entry ], [ %p1, %body ]
%idx1 = phi ptr [ %p1, %entry ], [ %p0, %body ]
%v0 = load %idx0
...
store %v1, %idx1
...
}
```
Further, the handling of the underlying objects was not sound. If there
is no information about memory operands (i.e., `memoperands()` is
empty), it must be handled conservatively. However, Machinepipeliner
uses a dummy value (namely `UnknownValue`). It is distinguished from
other "known" objects, causing necessary dependencies to be missed.
(NOTE: in such cases, `buildSchedGraph` adds non-loop-carried
dependencies correctly, so perhaps a critical problem has not occurred.)
This patch fixes the above problems. This change has increased false
dependencies that didn't exist before. Therefore, this patch also
introduces additional alias checks with the underlying objects.
Split off from #135148
This patch removes UB from some tests for MachinePipeliner. This patch
fixes following cases.
- Branching on an `undef` value.
- Using `undef`/`null` as a pointer operand of a load/store.
There are other tests of pipeliner that contain the same UB, but for
now, this patch fixes particularly unstable cases when I developed
pipeliner.
Add check to make sure Dist > 0 or Dist < 0 for appropriate cmp cases to
hexagon hardware loops pass. The change modifies the
HexagonHardwareLoops pass to add runtime checks to make sure that
end_value > initial_value for less than comparisons and end_value <
initial_value for greater than comparisons.
Fix for https://github.com/llvm/llvm-project/issues/133241
@androm3da @iajbar PTAL
---------
Co-authored-by: aankit-quic <aankit@quicinc.com>
Bug relates to `early-if-predicator` and `early-ifcvt` passes. If
virtual register has "killed" flag in both basic blocks to be merged
into head, both instructions in head basic block will have "killed" flag
for this register. It makes MIR incorrect.
Example:
```
bb.0: ; if
...
%0:intregs = COPY $r0
J2_jumpf %2, %bb.2, implicit-def dead $pc
J2_jump %bb.1, implicit-def dead $pc
bb.1: ; if.then
...
S4_storeiri_io killed %0, 0, 1
J2_jump %bb.3, implicit-def dead $pc
bb.2: ; if.else
...
S4_storeiri_io killed %0, 0, 1
J2_jump %bb.3, implicit-def dead $pc
```
After early-if-predicator will become:
```
bb.0:
%0:intregs = COPY $r0
S4_storeirif_io %1, killed %0, 0, 1
S4_storeirit_io %1, killed %0, 0, 1
```
Having `killed` flag set twice in bb.0 for `%0` is an incorrect MIR.
Set the default processor version to v68 when the user does not specify
one in the command line. This includes changes in the LLVM backed and
linker (lld). Since lld normally sets the version based on inputs, this
change will only affect cases when there are no inputs.
Fixes#127558
The interval of newly generated reg in ModuloScheduleExpander is empty.
This will cause crash at some corner case. This patch recalculate the
live intervals of these regs.
For the ordered FP compare bitcode instructions, the Hexagon backend was
assuming that no operand could be a NaN. This assumption is flawed. This
patch fixes the code-generation to produce fpcmp.uo and and appropriate
bit comparison operators to account for the case when an operand to a FP
compare is a NaN.
Fix for https://github.com/llvm/llvm-project/issues/129391
Co-authored-by: aankit-quic <aankit@quicinc.com>
EnableTailMerge is false by default and is handled by the pass builder.
Passes are independent of target pipeline options.
This completes the generic `MachineLateOptimization` passes for the NPM
pipeline.
Fixes a crash with extract_subvectors in Hexagon backend seen when the
source vector is a vector-pair and result vector is not hvx vector size.
LLVM Issue: https://github.com/llvm/llvm-project/issues/128775Fixes#128775
---------
Co-authored-by: aankit-quic <aankit@quicinc.com>
This commit updates the Hexagon backend to handle
vxi1 call operands. It ensures compatibility for
vector types of sizes 4, 8, 16, 32, 64, and 128 x i1 when HVX is
enabled.
~Fixes #59009 and #118879~
Code in the HexagonBitTracker checks for a specific register class when
processing sub-registers. A crash occurred due to a register class that
was not handled. The register class is
DoubleRegs_with_isub_hi_in_IntRegsLow8RegClassID, which is a class
formed by creating a register pair when one of the sub registers is a
Low8 integer register.
Fixes#128078
Patch by: Brendon Cahoon
After #117558 landed, this code would assert "Value is not an N-bit
unsigned value" in getConstant(), from a test case in zig.
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Fixes#127296
When a def in a block A reaches another block B that is in A's iterated
dominance frontier, a phi node is added to B for the def register.
A clobbering def can be created at a call instruction, for a register
clobbered by a call.
However, phi nodes are not created for a register, when one of the
reaching defs of the register is a clobbering def.
This patch adds phi nodes for registers that have a clobbering reaching
def. These additional phis help in checking reaching defs for an
instruction in RDF based copy propagation and addressing mode
optimizations.
The previous implementation had false positive/negative cases in the
analysis of the loop carried dependency.
A missed dependency case is caused by incorrect analysis of address
increments. This is fixed by strict analysis of recursive definitions.
See added test swp-carried-dep4.mir.
Excessive dependency detection is fixed by improving the formula
for determining the overlap of address ranges to be accessed. See added test
swp-carried-dep5.mir.
This PR removes the old `nocapture` attribute, replacing it with the new
`captures` attribute introduced in #116990. This change is
intended to be essentially NFC, replacing existing uses of `nocapture`
with `captures(none)` without adding any new analysis capabilities.
Making use of non-`none` values is left for a followup.
Some notes:
* `nocapture` will be upgraded to `captures(none)` by the bitcode
reader.
* `nocapture` will also be upgraded by the textual IR reader. This is to
make it easier to use old IR files and somewhat reduce the test churn in
this PR.
* Helper APIs like `doesNotCapture()` will check for `captures(none)`.
* MLIR import will convert `captures(none)` into an `llvm.nocapture`
attribute. The representation in the LLVM IR dialect should be updated
separately.
At some corner cases, the cloned MI still retains an old slot index,
which leads to the compiler crashing. This patch update the slot index
map before delete the recycled MI.
https://github.com/llvm/llvm-project/issues/123165