Since we already know which register we want to extend, we don't have to ask its defining MI about it.
---------
Co-authored-by: Emil Tywoniak <Emil.Tywoniak@hightec-rt.com>
InstCombine canonicalizes selects to floating-point and integer min/max. This combiner and the DAG combiner canonicalize to floating-point min/max only; neither canonicalizes to integer min/max. On Neoverse V2, basic integer arithmetic and integer min/max have the same cost.
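For illustration only (made-up function, not part of the patch), the kind of compare-and-select this canonicalization is about:
```
// Illustration: a compare-and-select that is equivalent to a signed integer
// min, e.g. the llvm.smin intrinsic / G_SMIN.
int selectMin(int A, int B) {
  return A < B ? A : B;
}
```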
The pre-index matcher just needs some small heuristics to make sure it
doesn't cause regressions. Apart from that it's a simple change, since
the only difference is an immediate operand of '1' vs '0' in the
instruction.
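As a rough source-level illustration (made-up function, not from the patch), the shape a pre-indexed access corresponds to:
```
// Illustration only: the pointer is advanced first and the access then uses
// the updated address, which is what a pre-indexed load/store (writeback,
// then access) expresses in a single instruction; the post-indexed form
// would access the old address instead.
int loadPreIndexed(int *&P) {
  P += 1;
  return *P;
}
```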
There isn't a test for this yet since the combines aren't used atm, but it will be tested as part of a future commit. I'm just making this a separate change for tidiness reasons.
Combine any funnel shift with a shift amount of 0 into a copy. The shift amount is taken modulo the instruction's bitwidth if it is larger than the bitwidth.
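A minimal sketch of the condition, assuming the shift amount is a known constant (the helper name is made up, not the actual CombinerHelper code):
```
#include <cstdint>

// Sketch only: a funnel shift whose amount is 0 after taking it modulo the
// bitwidth leaves the corresponding input unchanged, so the whole
// instruction can be replaced by a copy of that input.
bool fshAmountFoldsToCopy(uint64_t ShiftAmt, unsigned BitWidth) {
  return ShiftAmt % BitWidth == 0;
}
```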
Differential Revision: https://reviews.llvm.org/D157591
uses when looking for load/store users. This was a simple logic bug during translation
of the equivalent function in SelectionDAG:
```
for (SDNode *Node : N->uses()) {
  if (auto *LoadStore = dyn_cast<MemSDNode>(Node)) {
```
After D157690 we are seeing some crashes from GlobalISel, which seem to be
related to the shift_of_shifted_logic_chain combine that can remove too many
instructions if the shift amount is zero.
This limits the fold to non-zero shifts, under the assumption that it is better
in that case to fold away the shift to a COPY.
Differential Revision: https://reviews.llvm.org/D158596
Rewrites some simple rules as MIR patterns; the conversions cause little to no codegen regressions.
I may have missed some easy cases, but some other rules have intentionally been left as-is because bigger
changes are needed to make them work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D157690
This check was unnecessary/incorrect: it was already being done by the target
hook's default implementation, and the one in the matcher was checking for a
completely different thing. This change:
1) Removes the check and updates affected tests which now do some more reassociations.
2) Modifies the AMDGPU hooks which were stubbed with "return true" to also do the oneuse
check. Not sure why I didn't do this the first time.
There is no case where those functions return false; they always return true.
Even if they were to return false, it's not really something we should rely on, I think.
With the current combiner implementation, it would just make `tryCombineAll` return false without trying any more rules.
I also believe that if an apply function were to return false, it would mean that the match function is not good enough. Asserting on failure in an apply function is a better idea, IMO.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153619
- (op (op X, C1), C2) -> (op X, (op C1, C2))
- (op (op X, C1), Y) -> (op (op X, Y), C1)
There is some code duplication with the G_PTR_ADD reassociations, unfortunately, but I don't see an easy way to avoid it.
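As plain C++ for illustration (not the GlobalISel implementation), the two rewrites above for an associative/commutative op such as add:
```
// Illustration only, using unsigned add to stand in for 'op'.
unsigned reassocConstConst(unsigned X, unsigned C1, unsigned C2) {
  // (X + C1) + C2 -> X + (C1 + C2): the two constants fold into a single one.
  return X + (C1 + C2);
}

unsigned reassocConstVar(unsigned X, unsigned C1, unsigned Y) {
  // (X + C1) + Y -> (X + Y) + C1: moves the constant to the outer op so a
  // later combine (or another round of this one) can fold it further.
  return (X + Y) + C1;
}
```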
Differential Revision: https://reviews.llvm.org/D150230
matchCombineShlOfExtend did not check whether the size of the new shift would be
wider than the size of the operand. The existing condition did not work if the value
being shifted was zero. Updated to support vector splats.
Patch by: Acim Maravic
Differential Revision: https://reviews.llvm.org/D151122
In some rare corner cases where, in between the div/rem pair, there's a def of
the second instruction's source (but a different vreg due to the combine's
equivalence checks), it will place the DIVREM at the first instruction's point,
causing a use-before-def. There wasn't an obvious fix that stood out to me
without doing more involved analysis than a combine should really be doing.
Fixes issue #60516
I'm open to new suggestions on how to approach this, as I'm not too happy
about bailing out here. It's not the first time we've run into issues with value liveness
that the DAG world isn't affected by.
Differential Revision: https://reviews.llvm.org/D144336
I don't really understand what the point of wip_match_opcode is.
It doesn't seem to have any purpose other than listing opcodes so that
all the logic can live in pure C++. You can't seem to use it to
select multiple opcodes in the same way you use match.
Something is wrong with it, since the match emitter prints
"errors" if an opcode is covered by wip_match_opcode and
then appears in another pattern. For example, with this patch,
you see this several times in the build:
error: Leaf constant_fold_fabs is unreachable
note: Leaf idempotent_prop will have already matched
The combines are actually produced and the tests for them
do pass, so this seems to just be a broken warning.
Without the fix, gcc complains with:
../lib/CodeGen/GlobalISel/CombinerHelper.cpp:1652:52: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
1652 | SrcDef->getOpcode() == TargetOpcode::G_OR && "Unexpected op");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
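The fix is presumably just the parenthesization gcc suggests; a generic sketch (not the exact CombinerHelper line):
```
#include <cassert>

// Generic sketch: '&&' binds tighter than '||', so without the extra
// parentheses the "Unexpected op" literal would be &&-ed with the second
// comparison only, not with the whole condition.
void checkOpcode(unsigned Opc, unsigned OpcA, unsigned OpcB) {
  assert((Opc == OpcA || Opc == OpcB) && "Unexpected op");
}
```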
There's a target hook that's called in DAGCombiner that we stub here; I'll
implement the equivalent override for AArch64 in a subsequent patch since it's
used by a different shift combine.
This change by itself has minor code size improvements on arm64 -Os CTMark:
Program                                   size.__text
                                          outputg181ppyy  output8av1cxfn    diff
consumer-typeset/consumer-typeset              410648.00       410648.00    0.0%
tramp3d-v4/tramp3d-v4                          364176.00       364176.00    0.0%
kimwitu++/kc                                   449216.00       449212.00   -0.0%
7zip/7zip-benchmark                            576128.00       576120.00   -0.0%
sqlite3/sqlite3                                285108.00       285100.00   -0.0%
SPASS/SPASS                                    411720.00       411688.00   -0.0%
ClamAV/clamscan                                379868.00       379764.00   -0.0%
Bullet/bullet                                  452064.00       451928.00   -0.0%
mafft/pairlocalalign                           246184.00       246108.00   -0.0%
lencod/lencod                                  428524.00       428152.00   -0.1%
Geomean difference                                                         -0.0%
Differential Revision: https://reviews.llvm.org/D150086
If we have a set of mergeable stores of shifts, but the original source value being shifted
is wider than the merged size, we should still be able to merge if we truncate first. To do this,
however, we need to search for stores speculatively up the block, without knowing exactly how
many stores we should see before we stop. The old algorithm has to match an exact number of
stores to fit the wide type, or it dies. The new one will try to set the wide type to however
many stores we found in the upwards block traversal and use later checks to verify if they're
a valid mergeable set.
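As a hypothetical little-endian source-level example (not taken from the tests), the kind of pattern this enables:
```
#include <cstdint>

// Illustration only: four byte stores of shifted pieces of a 64-bit value.
// They can merge into a single 32-bit store, but only of a truncated copy of
// the wide source, since the merged width (32) is narrower than the source
// width (64).
void storeLow32(uint64_t X, uint8_t *P) {
  P[0] = uint8_t(X);
  P[1] = uint8_t(X >> 8);
  P[2] = uint8_t(X >> 16);
  P[3] = uint8_t(X >> 24);
}
```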
The reason I need to move this to LoadStoreOpt is that the combiner works going top down
inside a block, which means that we end up doing partial merges because we haven't seen all
the possible stores before we mutate the MIR. In LoadStoreOpt we can go bottom up.
As a side effect of this change, we also end up doing better on an existing test case (missing_store)
since we manage to do a partial merge there.
We use this combine in the AArch64 postlegalizer combiner, which causes this
function to query the legalizer rules for the action for an invalid opcode/type
combination (G_AND and p0). Moving the legalizer query to after the validity
check in matchHoistLogicOpWithSameOpcodeHands() fixes this.
The extending loads combine prefers folding sign-extends into loads over
zero-extends, and in cases where a G_ZEXTLOAD is first used by a G_ZEXT and then used
by a G_SEXT, it would select the G_SEXT even though the load is already
zero-extending.
Fixes issue #59630