EdgeOnly indicates a phi def, which can already be identified by
LN_Last with non-null PInfo. Most of the code already reasons in
terms of LN_Last instead of EdgeOnly.
As far as I can tell, this set is pointless: It just represents whether
the target block has multiple predecessors, and the way it is
constructed and queried, we're not even reducing the number of
getSinglePredecessor() calls or something like that.
This probably should emit a DiagnosticInfoUnsupported and use the
default calling convention instead, but then we would need to pass
in the context.
Also move where CCAssignFnForCall is called. It was unnecessarily
called for each argument, so the error wouldn't trigger for functions
with 0 arguments.
This only ensures the error occurs for functions defined with the
calling convention. The error is still missed for outgoing calls
with no arguments. The lowering logic here is convoluted, calling
CCAssignFnForCall for each argument and it does not mirror
LowerFormalArguments so I'm not sure what's going on here.
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch takes care of the clang side of the migration.
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch takes care of the llvm side of the migration.
I invoked yaml2obj with a mistyped filename and received no error
message. I nearly thought the program had succeeded, but the shell's
exit code prompt tipped me off.
Pull Request: https://github.com/llvm/llvm-project/pull/144835
This patch adds support for parsing symbols in the XAndesPerf branch
immediate instructions. The branch immediate instructions use
`R_RISCV_NDS_BRANCH_10` relocation. It uses a 10-bit PC-relative branch
offset.
Updates the Root Signature metadata parser to extract version
information. This requirement was added after the initial parser
implementation.
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Construct RuntimeLibcallsInfo instead of manually creating a map.
This was repeating the setting of the RETURN_ADDRESS. This removes
an obstacle to generating libcall information with tablegen.
This is also not great, since it's setting a static map which
would be broken if there were ever a triple with a different libcall
configuration.
These are module level properties, and querying them through
a function-level subtarget context is confusing. Plus we don't
need an aliased name. This doesn't avoid all the uses, just the
ones in the TargetLowering constructor.
When a customer class inherits from a libc++ class, and is built with
"-flto -fwhole-program-vtables -static-libstdc++ \
-Wl,-plugin-opt=-whole-program-visibility", the libc++ class's vtable is
available_externally, meanwhile the customer class vtable is private.
And
both of them are !vcall_visibility == Linkage Unit.
In this case, icall.branch.funnel might be generated.
But the icall.branch.funnel would cause crash in LowerTypeTests because
available_externally Global_Object's GlobalTypeMember would not be
saved and finally leads to a NULL GlobalTypeMember which causes a crash.
Even saving the available_externally GO's GlobalTypeMember so that it is
not NULL to avoid the crash in LowerTypeTests, it still will crash in
SelectionDAGBuilder or Verifier, because operands linkage type
consistency
check of icall.branch.funnel can not pass.
So any one of available externally vtable would stop to generate
icall.branch.funnel.
This patch fixes FullLTO mode and split-LTO-unit ThinLTO mode.
Some SPARC ISA levels and/or extensions are defined in a way such that
the availability of it implies the availability of other, more fundamental
ISA features (for example, targeting 64-bit environment implies that
V9 instructions are available).
Properly set those in the TableGen definitions.
Fixes https://github.com/llvm/llvm-project/issues/142388.
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
Updating x87 floating point register significantly affect the
performance of the functions.
All the floating point exception reads will merge the results from both
mxcsr and x87 registers anyway.
Rather than having a number of static local variables, we now use
a single `OffloadContext` struct to store global state. This is
initialised by `olInit`, but is never deleted (de-initialization of
Offload isn't yet implemented).
The error reporting mechanism has not been moved to the struct, since
that's going to cause issues with teardown (error messages must outlive
liboffload).
Some new code was added to flang/Semantics that only depends on
facilities in flang/Evaluate. Move it into Evaluate and clean up some
minor stylistic problems.
None of the constructors touched in this patch has a corresponding
definition. This patch explicitly deletes them with "= delete" while
moving them to the public section of respective classes. Note that "=
delete" itself serves as documentation.
Identified with modernize-use-equals-delete.
Barrier instructions are no-ops in single-wave workgroups. This includes
s_barrier_signal_isfirst, which will leave SCC unmodified.
Model this correctly (via an implicit use of SCC) and ensure SCC==1
before the barrier instruction (if the wave is the only one of the
workgroup, then it is the first).
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Add missing test coverage for interleaving with
VPExtendedReduction/VPMulAccumulateReduction recipes.
Adds missing test coverage in preparation for
https://github.com/llvm/llvm-project/pull/144281.
New gfx950 MFMA allows bf16 operands.
c0cc81cdc0/llvm/include/llvm/IR/IntrinsicsAMDGPU.td (L3434)
When running `amdgpu-to-rocdl`, Current logic converts bf16 to i16
always which fails to compile for newer bf16 MFMA e.g.
`v_mfma_f32_16x16x32bf16`.
Backend expects bf16 type for the operands for those newer MFMAs. This
patch fixes it.
CC: @krzysz00 @dhernandez0 @giuseros @antiagainst @kuhar
This pattern can be often met in Flang generated LLVM IR,
for example, for the counts of the loops generated for array
expressions like: `a(x:x+y)` or `a(x+z:x+z)` or their variations.
In order to compute the loop count, Flang needs to subtract
the lower bound of the array slice from the upper bound
of the array slice. To avoid the sign wraps, it sign extends
the original values (that may be of any user data type)
to `i64`.
This peephole is really helpful in CPU2017/548.exchange2,
where we have multiple following statements like this:
```
block(row+1:row+2, 7:9, i7) = block(row+1:row+2, 7:9, i7) - 10
```
While this is just a 2x3 iterations loop nest, LLVM cannot
figure it out, ending up vectorizing the inner loop really
hard (with a vector epilog and scalar remainder). This, in turn,
causes problems for LSR that ends up creating too many loop-carried
values in the loop containing the above statement, which are then
causing too many spills/reloads.
Alive2: https://alive2.llvm.org/ce/z/gLgfYX
Related to #143219.
This patch updates the following ops to use `result` (instead of `res`)
as the name for their result argument:
* `vector.scalable.insert`
* `vector.scalable.extract`
* `vector.insert_strided_slice`
This change ensures naming consistency with other ops in the `vector`
dialect. It addresses part of:
* https://github.com/llvm/llvm-project/issues/131602