The testcase I added previously failed because a SelectInst with invalid
operands was created (one side `addrspace(4)`, the other
`addrspace(5)`).
PointerReplacer needs to dig deeper if the true and/or false
instructions of the select are not available.
Fixes SWDEV-542957
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In
the case of PHIs with incoming values which have not yet been visited,
they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal,
instead of a simple traversal over the collected users. This reordering
ensures that the uses of an instruction are replaced before replacing
the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does
not have a replacement.
This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
This reland PR includes a fix for an assertion failure caused by
https://github.com/llvm/llvm-project/pull/137215, which was reverted.
The failing test involved a phi and gep depending on each other, in
which case the PtrReplacer did not order them correctly for replacement.
This patch fixes it by adding a check during the definition of
`PostOrderWorklist`.
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In the case of PHIs with incoming values which have not yet been visited, they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal, instead of a simpletraversal over the collected users. This reordering ensures that the uses of an instruction are replaced before replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does not have a replacement.
This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
This is a reland of https://github.com/llvm/llvm-project/pull/137215.
Now we only support D16 folding for `image sample` instructions with a
single user: a `fptrunc` to half.
However, we can actually support D16 folding for image.sample
instructions with multiple users,
as long as each user follows the pattern of extractelement followed by
fptrunc to half.
For example:
```
%sample = call <4 x float> @llvm.amdgcn.image.sample
%e0 = extractelement <4 x float> %sample, i32 0
%h0 = fptrunc float %e0 to half
%e1 = extractelement <4 x float> %sample, i32 1
%h1 = fptrunc float %e1 to half
%e2 = extractelement <4 x float> %sample, i32 2
%h2 = fptrunc float %e2 to half
```
This change enables D16 folding for such cases and avoids generating
`v_cvt_f16_f32_e32` instructions.
This patch enhances the PtrReplacer as follows:
1. Users are now collected iteratively to be generous on the stack. In
the case of PHIs with incoming values which have not yet been visited,
they are pushed back into the stack for reconsideration.
2. Replace users of the pointer root in a reverse-postorder traversal,
instead of a simple traversal over the collected users. This reordering
ensures that the operands of an instruction are replaced before
replacing the instruction itself.
3. During the replacement of PHI, use the same incoming value if it does
not have a replacement.
This patch specifically fixes the case when an incoming value of a PHI
is addrspacecasted.
When a read(first)lane is used on a binary operator and the intrinsic is
the only user of the operator, we can move the read(first)lane into the
operand if the other operand is uniform.
Unfortunately IC doesn't let us access UniformityAnalysis and thus we
can't truly check uniformity, we have to do with a basic uniformity
check which only allows constants or trivially uniform intrinsics calls.
We can also do the same for unary and cast operators.
By the pseudocode in the ISA manual, if any input is a nan it acts
like min3, which will fold to min2 of the other operands. The other
cases fold to min, I'm not sure how this one was wrong.
Since e39f6c1844fab59c638d8059a6cf139adb42279a opt will infer the
correct datalayout when given a triple. Avoid explicitly specifying it
in tests that depend on the AMDGPU target being present to avoid the
string becoming out of sync with the TargetInfo value.
Only tests with REQUIRES: amdgpu-registered-target or a local lit.cfg
were updated to ensure that tests for non-target-specific passes that
happen to use the AMDGPU layout still pass when building with a limited
set of targets.
Reviewed By: shiltian, arsenm
Pull Request: https://github.com/llvm/llvm-project/pull/137921
When constructing a PHI node in `PointerReplacer::replace`, the incoming
operands are expected to have already been replaced and in the
replacement map. However, when one of the incoming operands is a load,
the search of the map is unsuccessful, and a nullptr is returned from
`getReplacement`. The reason is that, when a load is replaced, all the
uses of the load has been actually replaced by the new load. It is
useless to insert the original load into the map. Instead, we should
place the new load into the map to meet the expectation of the later map
search.
Fixes: SWDEV-516420