This fixes#108936, but the calling convention doesn't match with GCC. I
doubt we have such a lib function for now, so leave the calling
convention as is.
fmod will be folded to frem in clang under -fno-math-errno and can be constant
folded in llvm if the operands are known. It can be relatively common to have
fp code that handles special values before doing some calculation:
```
if (isnan(f))
return handlenan;
if (isinf(f))
return handleinf;
..
fmod(f, 2.0)
```
This patch enables the folding of fmod to frem in instcombine if the first
parameter is not inf and the second is not zero. Other combinations do not set
errno.
The same transform is performed for fmod with the nnan flag, which implies the
input is known to not be inf/zero.
Properly demangle `_T` and `_P` return type manglings for MSVC 1920+.
Also added a unit test for `@` return type that is used when mangling
non-template auto placeholder return type function.
Tested the output against the undname shipped with MSVC 19.40.
This patch adds the basic API for the Legality component of the
vectorizer. It also adds some very basic code in the bottom-up
vectorizer that uses the API.
A region identifies a set of vector instructions generated by
vectorization passes. The vectorizer can then run a series of
RegionPasses on the region, evaluate the cost, and commit/reject the
transforms on a region-by-region basis, instead of an entire basic
block.
This is heavily based ov @vporpo's prototype. In particular, the doc
comment for the Region class is all his. The rest of this commit is
mostly boilerplate around a SetVector: getters, iterators, and some
debug helpers.
This is primarily to unblock Rust, which could potentially use
ImportListsTy::operator[] on a module that's not in ListsImpl and
cause concurrency problems.
This patch fixes a regression in the sense that it restores
ImportListsTy::lookup, which was available when ImportListsTy was just
a plain DenseMap.
The docs for Glob wasn't rendered correctly, I believe because the `\`
was not properly escaped. I haven't built these docs locally, so I'll
follow up to see if this is fixed after it lands.
https://llvm.org/doxygen/classllvm_1_1GlobPattern.html
Change the constant to INT_MAX instead of our own large number. Any
value larger than a valid frame index should work.
I'm a bit puzzled why it was using a shift of 30. A long time ago when
it was first created, the value was INT_MAX. Then it was changed in
e2b77d57c0c13 to (~0 >> 1) which I guess was trying to be INT_MAX
without using the constant. But ~0 is an `int` so that produced -1.
I'm not sure what the 'l' suffix was for. Unless that was an attempt to
avoid undefined behavior had the shift been 31 instead of 30. But 'long'
is 32 bits on some targets so that wouldn't have worked for all
platforms.
Using INT_MAX is straightforward and avoids any mysteries.
Since we are using the Scalarizer pass in the backend we needed a way to
allow this pass to operate on Target intrinsics.
We achieved this by adding `TargetTransformInfo ` to the Scalarizer
pass. This allowed us to call a function available to the DirectX
backend to know if an intrinsic is a target intrinsic that should be
scalarized.
The implementation was missing the fact that `G_EXTRACT_SUBVECTOR`
destination and source vector can be different types.
Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate
the correct opcode.
Clarify the G_EXTRACT_SUBVECTOR specification.
In the process of adding scalarization support for DirectX target
intrinsics I found that intrinsics that weren't marked with `IntrNoMem`
did not get removed by
`RecursivelyDeleteTriviallyDeadInstructionsPermissive`. So this change
is to make it more clear that our intrinsics don't have side effects.
I only added `IntrNoMem` to the intrinics in `IntrinsicsDirectX.td` I
was involved with. There a potentially a few other cases that might
warrant this attribute, but will need input on the others.
Unlike scalar, where AArch64 prefers expanding scmp/ucmp with select,
under Neon we can use the arithmetic expansion to generate fewer
instructions. Notably it also prevents the scalarization of vselect
during vector-legalization.
This patch removes from the list of allowed clauses for a handful of
compound constructs those that are specifically disallowed by the OpenMP
spec. In particular, the following restrictions are followed:
- (regarding combined constructs) If _directive-name-A_ is `target`, the
`copyin` clause must not be specified.
- (regarding composite constructs) If _directive-name-A_ is
`distribute`, the `ordered` clause must not be specified.
These restrictions are listed in the OpenMP Specification version 5.2,
sections 17.4 and 17.5. Since it's a similar case as PR #90754, I'm
adding people involved in that decision as reviewers here.
This is an implementation of the saturating fp to int conversions for
GlobalISel. On AArch64 the converstion instrctions work this way,
producing saturating results. LegalizerHelper::lowerFPTOINT_SAT is
ported from SDAG.
AArch64 has a lot of existing tests for fptosi_sat, covering a wide
range of types. I have tried to make most of them work all at once, but
a few fall back due to other missing features such as f128 handling for
min/max.
Before this patch, the pipeline selection logic in
ResourceManager::issueInstruction() didn't know how to correctly handle
instructions which consume multiple partially overlapping resource
groups. In some cases (like the test case from #108157), the inability
to correctly allocate resources on instruction issue was leading to
crashes.
The presence of multiple partially overlapping groups complicates the
selection process by introducing extra constraints. For those cases, the
issue logic now prioritizes groups which are more constrained than
others.
Fixes#108157
During inter-procedural SCCP, also infer attributes on arguments, not
just return values. This allows other non-interprocedural passes to make
use of the information later.
The InitUndef pass works around a register allocation issue, where undef
operands can be allocated to the same register as early-clobber result
operands. This may lead to ISA constraint violations, where certain
input and output registers are not allowed to overlap.
Originally this pass was implemented for RISCV, and then extended to ARM
in #77770. I've since removed the target-specific parts of the pass in
#106744 and #107885. This PR reduces the pass to use a single
requiresDisjointEarlyClobberAndUndef() target hook and enables it by
default. The hook is disabled for AMDGPU, because overlapping
early-clobber and undef operands are known to be safe for that target,
and we get significant codegen diffs otherwise.
The motivating case is the one in arm64-ldxr-stxr.ll, where we were
previously incorrectly allocating a stxp input and output to the same
register.
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.
This PR has two parts:
* `SDPatternMatch/MatchContext`: Modify a little bit the code to match
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of only `m_Opc`
supported. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detect and rewrite the
patterns using matching context.
Remaining Tasks:
- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead ?
- [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another
location for style consistency purpose ?
@topperc
---------
Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>