Found using https://github.com/codespell-project/codespell
```
codespell RISCV --write-changes \
--ignore-words-list=FPR,fpr,VAs,ORE,WorstCase,hart,sie,MIs,FLE,fle,CarryIn,vor,OLT,VILL,vill,bu,pass-thru
```
Change the intersect for the anticipated algorithm to ignore unknown
when anticipating. This effectively allows VXRM writes speculatively
because it could do a VXRM write even when there's branches where VXRM
is unneeded.
The importance of this change is because VXRM writes causes pipeline
flushes in some micro-architectures and so it makes sense to allow more
aggressive hoisting even if it causes some degradation for the slow
path.
An example is this code:
```
typedef unsigned char uint8_t;
__attribute__ ((noipa))
void foo (uint8_t *dst, int i_dst_stride,
uint8_t *src1, int i_src1_stride,
uint8_t *src2, int i_src2_stride,
int i_width, int i_height )
{
for( int y = 0; y < i_height; y++ )
{
for( int x = 0; x < i_width; x++ )
dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
dst += i_dst_stride;
src1 += i_src1_stride;
src2 += i_src2_stride;
}
}
```
With this patch, the code above generates a hoisting VXRM writes out of
the outer loop.
Fixes#82659
There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411.
Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact.
After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.
If there's no shift being performed, the rounding mode doesn't matter.
We could do the same for vssra and vssrl, but they are no-ops with a
shift of 0 so would be better off being removed earlier.
This adds a new pass to insert VXRM writes for vector instructions. With
the goal of avoiding redundant writes.
The pass does 2 dataflow algorithms. The first is a forward data flow to
calculate where a VXRM value is available. The second is a backwards
dataflow to determine where a VXRM value is anticipated.
Finally, we use the results of these two dataflows to insert VXRM writes
where a value is anticipated, but not available.
The pass does not split critical edges so we aren't always able to
eliminate all redundancy.
The pass will only insert vxrm writes on paths that always require it.