Since D110065, the 'R' profile support is added to LLVM. It turns the
`generic` cpu into the intersection of v8-a and v8-r. However, this
makes some backward compatibility problems. The original patch makes
the clang driver implicitly pass -march=armv8-a when only the triple
is specified. Since it only applies to clang, other tools like
llvm-objdump still faces the backward compatibility problem.
This patch applies the same idea to MC related tools by enabling '+v8a'
feature when nothing is specified (both CPU and FS are empty) for
MCSubtargetInfo creation.
This patch should fix PR53956.
Reviewed by: labrinea
Differential Revision: https://reviews.llvm.org/D124319
This patch permits Swing Modulo Scheduling for ARM targets
turns it on by default for the Cortex-M7. The t2Bcc
instruction is recognized as a loop-ending branch.
MachinePipeliner is extended by adding support for
"unpipelineable" instructions. These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.
The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.
It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In additional, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D122672
For the AIX linker, under default options, global or weak symbols which
have no visibility bits set to zero (i.e. no visibility, similar to ELF
default) are only exported if specified on an export list provided to
the linker. So AIX has an additional visibility style called
"exported" which indicates to the linker that the symbol should
be explicitly globally exported.
This change maps "dllexport" in the LLVM IR to correspond to XCOFF
exported as we feel this best models the intended semantic (discussion
on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853)
and allows us to enable writing this visibility for the AIX target
in the assembly path.
Reviewed By: DiggerLin
Differential Revision: https://reviews.llvm.org/D123951
In MASM, if a QWORD symbol is passed to a jmp or call instruction in
64-bit mode or a DWORD or WORD symbol is passed in 32-bit mode, then
MSVC's assembler recognizes that as an indirect call. Additionally, if
the operand is qualified as a ptr, then that should also be an indirect
call.
Furthermore, in 64-bit mode, such operands are implicitly rip-relative
(in fact, MSVC's assembler ml64.exe does not allow explicitly specifying
rip as a base register.)
To keep this patch managable, this patch does not include:
* error messages for wrong operand types (e.g. passing a QWORD in 32-bit
mode)
* resolving indirect calls if the symbol is declared after it's first
use (llvm-ml currently only runs a single pass).
* imlementing the extern keyword (required to resolve
https://crbug.com/762167.)
This patch is likely missing a bunch of edge cases, so please do point
them out in the review.
Reviewed By: epastor, hans, MaskRay
Committed By: epastor (on behalf of ayzhao)
Differential Revision: https://reviews.llvm.org/D124413
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.
Differential Revision: https://reviews.llvm.org/D100486
This improves opportunities to use bset/bclr/binv. Unfortunately,
there are no W versions of these instrcutions so this isn't always
a clear win. If we use SLLW we get free sign extend and shift masking,
but need to put a 1 in a register and can't remove an or/xor. If
we use bset/bclr/binv we remove the immediate materializationg and
logic op, but might need a mask on the shift amount and sext.w.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D124096
By clearing the HasDummyMask flag from mask register binary operations
and mask load/store.
HasDummyMask was causing an extra operand to get appended when
converting from MachineInstr to MCInst. This extra operand doesn't
appear in the assembly string so was mostly ignored, but it prevented
the alias instruction printing from working correctly.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D124424
Ideally we'd fold this with generic DAGCombiner, but that only works for !isTruncateFree cases - we might be able to adapt IsDesirableToPromoteOp to find truncated src ops in the future, but for now just use this peephole.
Noticed in Issue #55138
The `llvm.x86.cast.tile.to.vector` intrinsic is lowered to
`llvm.x86.tilestored64.internal` and `load <256 x i32>`. The
`llvm.x86.cast.vector.to.tile` is lowered to `store <256 x i32>` and
`llvm.x86.tileloadd64.internal`. When `llvm.x86.cast.tile.to.vector` is
used by `store <256 x i32>` or `load <256 x i32>` is used by
`llvm.x86.cast.vector.to.tile`, they can be combined by
`llvm.x86.tilestored64.internal` and `llvm.x86.tileloadd64.internal`.
Differential Revision: https://reviews.llvm.org/D124378
The default implementation of findCommutedOpIndices picks the
first two source operands. That's exactly what we want for the
scalar FMA instructions.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D124463
Introduced masks where they are not added and improved target dependent
cost models to avoid returning of the incorrect cost results after
adding masks.
Differential Revision: https://reviews.llvm.org/D100486
DXIL doesn't support attributes added after LLVM 3.7. The DXILPrepare
pass removes those attributes so they should never be present by the
time we reach the DXIL bitcode writer.
In the event that we somehow try to write a newer attribute in the DXIL
writer, we should fail hard (crash), because the output would be
invalid. This case should only be possible if the DXIL writer were
called without DXILPrepare being run first, which shouldn't be possible.
This patch also adds a default case to the switch statement over the
attribute list which covers all the removed cases and any new attribute
kinds that may be added in the future. The default case is handled like
other unsupported cases by a call to llvm_unreachable.
As older waves execute long sequences of VALU instructions, this may
prevent younger waves from address calculation and then issuing their
VMEM loads, which in turn leads the VALU unit to idle. This patch tries
to prevent this by temporarily raising the wave's priority.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D124246
Given a larger-than-legal shuffle mask, the final codegen will split
into multiple sub-vectors. This attempts to model that in
AArch64TTIImpl::getShuffleCost, splitting masks up according to the size
of the legalized vectors. If the sub-masks have at most 2 input sources
we can call getShuffleCost on them and sum the costs, to get a more
accurate final cost for the entire shuffle. The call to
improveShuffleKindFromMask helps to improve the shuffle kind for the
sub-mask cost call.
Differential Revision: https://reviews.llvm.org/D123414
Given a shuffle with 4 elements size 16 or 32, we can use the costs
directly from the PerfectShuffle tables to get a slightly more accurate
cost for the resulting shuffle.
Differential Revision: https://reviews.llvm.org/D123409
Following D118810 that reduced the size of ISel table,
this patch optimizes allone-masked RVV pseudos with TU policy and
swap them out to their unmasked TU pseudos.
Since the UNDEF merge operand is not preserved, we turn it into TA
pseudo regardless of the policy operand.
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D121881
Add these bits to the MUBUF and FLAT LDS DMA instructions:
- LGKM_CNT - these operate on LDS;
- VALU - SPG 3.9.8: This instruction acts as both a MUBUF and
VALU instruction;
Codegen currently does not produce any of this, so the change is NFC.
Differential Revision: https://reviews.llvm.org/D124472
Default behavior for .file directory was changed in D105856, but
ptxas (CUDA 11.5 release) refuses to parse it:
$ llc -march=nvptx64 llvm/test/DebugInfo/NVPTX/debug-file-loc.ll
$ ptxas debug-file-loc.s
ptxas debug-file-loc.s, line 42; fatal : Parsing error near
'"foo.h"': syntax error
Added a new field to MCAsmInfo to control default value of
UseDwarfDirectory. This value is used if -dwarf-directory command line
option is not specified.
Differential Revision: https://reviews.llvm.org/D121299
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
Switch statements that cover all cases should not have a `default`
case. When a switch covers all cases and includes a `default` case,
clang emits a diagnostic. Omitting the `default` case allows the
compiler to instead emit a diagnostic on unhandled enum values.
This change removes default cases from all the places that they
shouldn't be, and adds a missing enum case for one switch statement
that wasn't covering all values.
Instead of report fatal error, this patch emit error message and exit
when shapes are not pre-defined. This would cause the compiling fail but
not crash.
Differential Revision: https://reviews.llvm.org/D124342
As implemented this patch assumes that Typed pointer support remains in
the llvm::PointerType class, however this could be modified to use a
different subclass of llvm::Type that could be disallowed from use in
other contexts.
This does not rely on inserting typed pointers into the Module, it just
uses the llvm::PointerType class to track and unique types.
Fixes#54918
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D122268
Make sure NVPTX backend can handle bitcasting between `float` and `<2 x half>` types.
This was discovered through: https://github.com/intel/llvm/issues/5969
I'm not suggesting that such bitcasts make much sense, but it feels like the compiler should not hard crash on them.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124171