CodeGenPrepare modifies and restructures loops and control flow, so it
shouldn't preserve LoopAnalysis.
The test `llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll` shows
CodeGenPrepare modifying loop structure, hence we cannot preserve
LoopAnalysis.
Currently, the register coalescer may try to commute an instruction
like:
```
%0.sub_lo32:gpr64 = AND %0.sub_lo32:gpr64(tied-def 0), %1.sub_lo32:gpr64
USE %0:gpr64
```
resulting in:
```
%1.sub_lo32:gpr64 = AND %1.sub_lo32:gpr64(tied-def 0), %0.sub_lo32:gpr64
USE %1:gpr64
```
However, this is not correct if the instruction doesn't define the
entire register, as the value of the upper 32 bits of the register used
in `USE` will not be the same.
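The problem can be modelled outside of LLVM with plain integers. The
sketch below is only an illustration: it assumes `sub_lo32` is the low
32 bits of a 64-bit register, and the helper names (`writeLo32`,
`useAfterOriginal`, `useAfterCommuted`) are made up for this example.

```cpp
#include <cstdint>

// Hypothetical model: writing sub_lo32 replaces the low 32 bits of a 64-bit
// register and leaves the upper 32 bits untouched.
constexpr std::uint64_t writeLo32(std::uint64_t Reg, std::uint32_t Lo) {
  return (Reg & 0xFFFFFFFF00000000ULL) | Lo;
}

constexpr std::uint32_t lo32(std::uint64_t Reg) {
  return static_cast<std::uint32_t>(Reg);
}

// %0.sub_lo32 = AND %0.sub_lo32(tied-def 0), %1.sub_lo32 ; USE %0
constexpr std::uint64_t useAfterOriginal(std::uint64_t R0, std::uint64_t R1) {
  return writeLo32(R0, lo32(R0) & lo32(R1));
}

// %1.sub_lo32 = AND %1.sub_lo32(tied-def 0), %0.sub_lo32 ; USE %1
constexpr std::uint64_t useAfterCommuted(std::uint64_t R0, std::uint64_t R1) {
  return writeLo32(R1, lo32(R1) & lo32(R0));
}

// The low halves agree, but the full 64-bit value seen by USE differs
// because the upper half now comes from the other register.
static_assert(useAfterOriginal(0xAAAAAAAA000000FFULL, 0xBBBBBBBB0000000FULL) ==
                  0xAAAAAAAA0000000FULL,
              "original keeps the upper half of %0");
static_assert(useAfterCommuted(0xAAAAAAAA000000FFULL, 0xBBBBBBBB0000000FULL) ==
                  0xBBBBBBBB0000000FULL,
              "commuted keeps the upper half of %1");
```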
SelectionDAG offered no way to widen TRUNCATE for pathological types
like <vscale x 1 x ...> as they do not allow scalarisation.
One way to go further is to widen to an intermediate type, which allows
the element type to be promoted in a later run of legalisation.
I recently observed that LLVM generates the following code:
```
addi a1, a0, -1
sltu a0, a0, a1
addi a0, a0, -1
and a0, a0, a1
ret
```
This could be optimized using the snez instruction instead.
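For reference, the sequence above computes `x == 0 ? 0 : x - 1`. The
sketch below mirrors it instruction by instruction and compares it with
one possible `snez`-based form (`snez t, a0` followed by `sub a0, a0, t`).
That two-instruction form is an assumption made for illustration, not
necessarily the exact output of the patch, and 64-bit registers are
assumed.

```cpp
#include <cstdint>

// Mirrors the four-instruction sequence from the listing above.
constexpr std::uint64_t currentSequence(std::uint64_t A0) {
  std::uint64_t A1 = A0 - 1;  // addi a1, a0, -1
  A0 = (A0 < A1) ? 1 : 0;     // sltu a0, a0, a1 (1 only when a0 was 0)
  A0 = A0 - 1;                // addi a0, a0, -1 (0, or an all-ones mask)
  return A0 & A1;             // and  a0, a0, a1
}

// Assumed shorter form: snez t, a0 ; sub a0, a0, t
constexpr std::uint64_t withSnez(std::uint64_t A0) {
  std::uint64_t T = (A0 != 0) ? 1 : 0;  // snez t, a0
  return A0 - T;                        // sub  a0, a0, t
}

static_assert(currentSequence(0) == 0 && withSnez(0) == 0, "x == 0 stays 0");
static_assert(currentSequence(5) == 4 && withSnez(5) == 4, "x != 0 gives x - 1");
```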
The pass now contains a non-fp expansion and should
be used for any similar expansions regardless of the
types involved. Hence a generic name seems apt.
Rename the source files, pass, and adjust the pass
description. Move all tests for the expansions
that have previously been merged into the pass
to a single directory.
Both passes expand instructions at the IR level.
They use the same kind of instruction visitation
logic and contain significant code duplication e.g.
for scalarization.
This PR implements emission of the post-link CFG information in the PGO
analysis map, as explained in the
[RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617).
This is enabled by a flag `pgo-analysis-map-emit-bb-sections-cfg`.
This PR bumps the SHT_LLVM_BB_ADDR_MAP version to 5.
Also includes some refactoring changes related to storing the CFG in the
Basic block sections profile reader.
Fixes https://github.com/llvm/llvm-project/issues/98389
As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does
not work, because there is not enough precision to handle the repeated
rounding. `f64` does have sufficient space. So this PR explicitly
promotes the 16-bit fma to a 64-bit fma.
I could not find examples of a libcall being used for fma, but that's
something that could be looked into separately to work around code size
issues.
- Use `LLT::changeElementType()` instead of `LLT::changeElementSize()`
in `LegalizerHelper::lowerMinMax()` to avoid a crash in the case that
the destination type is a pointer vector;
- Reject `G_*MIN`/`G_*MAX` of pointers and pointer vectors in
`MachineVerifier`;
- Don't combine `G_SELECT`+`G_ICMP` pairs into `G_*MIN`/`G_*MAX` generic
instructions when the operands are pointers / pointer vectors.
Fixes #166556
If the sign bit of the denominator is known 0, do not emit the fabs.
Also, extend this to handle min/max with fabs inputs.
I originally tried to do this as the general combine on fabs, but
it proved to be too much trouble at this time. This is mostly
complexity introduced by expanding the various min/maxes into
canonicalizes, and then not being able to assume the sign bit
of canonicalize (fabs x) without nnan.
This defends against future code size regressions in the atan2 and
atan2pi library functions.
The existing condition for checking whether or not to expand an frem
instruction in expand-fp is not sufficiently precise.
Expansion on targets other than AMDGPU - which is the only intended
user right now - is only prevented due to the interaction with the
MaxLegalFpConvertBitWidth check. Relying on this is conceptually wrong
and limits the use of the pass for other targets and further expansions
(e.g. merging with the similar ExpandLargeDivRem pass).
Change the expansion criterion to always expand frem of a given type
for targets that use "Expand" as the legalization action for the
underlying scalar type and use this to exit the pass early for targets
which do not require any expansions. This requires changing the
frem legalization action for all targets which do not want frem to
be expanded in this pass from "Expand" to "LibCall".
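The new criterion can be sketched as follows. This is a simplified
model rather than the pass's code: `LegalizeAction`, `shouldExpandFrem`
and `passHasWork` are hypothetical names standing in for the target
lowering query, and only three scalar types are considered.

```cpp
// Hypothetical stand-in for the target's legalization action of frem on a
// given scalar floating-point type.
enum class LegalizeAction { Legal, Expand, LibCall };

// frem of a type is expanded in the IR pass only if the target marks the
// underlying scalar type as "Expand"; "LibCall" opts out.
constexpr bool shouldExpandFrem(LegalizeAction ScalarAction) {
  return ScalarAction == LegalizeAction::Expand;
}

// Early exit: the pass has nothing to do if no scalar type requests expansion.
constexpr bool passHasWork(LegalizeAction F16, LegalizeAction F32,
                           LegalizeAction F64) {
  return shouldExpandFrem(F16) || shouldExpandFrem(F32) ||
         shouldExpandFrem(F64);
}

static_assert(!passHasWork(LegalizeAction::LibCall, LegalizeAction::LibCall,
                           LegalizeAction::LibCall),
              "targets that use libcalls for frem skip the pass");
```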
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
c9821abfc0
added extra fences after sequentially consistent stores for
compatibility with MSVC's seq_cst loads (ldr+dmb). These extra fences
should not be needed for ARM LSE instructions that have both
acquire+release semantics, which results in a two-way barrier, and
should be enough for sequential consistency.
Fixes https://github.com/llvm/llvm-project/issues/162345
Change-Id: I9148c73d0dcf3bf1b18a0915f96cac71ac1800f2
This was discovered while looking at the codegen for x64 when Control
Flow Guard is enabled.
When using `SelectionDAG`, LLVM would generate the following sequence
for a CF guarded indirect call:
```
leaq target_func(%rip), %rax
rex64 jmpq *__guard_dispatch_icall_fptr(%rip) # TAILCALL
```
However, when Fast ISel was used, the following was generated:
```
leaq target_func(%rip), %rax
movq __guard_dispatch_icall_fptr(%rip), %rcx
rex64 jmpq *%rcx # TAILCALL
```
This was happening despite Fast ISel aborting and falling back to
`SelectionDAG`.
The root cause for this codegen is that `SelectionDAGISel` has a
special case when Fast ISel aborts when lowering a `CallInst` where it
tries to lower the instruction as its own basic block, which for such a
CF Guard call means that it is lowering an indirect call to
`__guard_dispatch_icall_fptr` without observing that the function was
being loaded into a pointer in the preceding (and bundled) instruction.
The fix for this is to not use the special case when a `CallInst` has
bundled instructions: it's better to allow the call and its bundled
instructions to be lowered together by `SelectionDAG` instead.
`vfs::FileSystem` is forward-declared in `SanitizerBinaryMetadata.h`.
The corresponding header must be included in any source file that
includes that header, or we risk issues when building with
`LLVM_BUILD_LLVM_DYLIB` to build LLVM as a DLL on Windows.
This effort is tracked in #109483.
This was using getSigned() with an unsigned (not sign extended)
argument. Using plain get() would be correct here. We can go
one step further and use getSignMask() to avoid the issue entirely.
The use of nested m_Reassociatable matchers by #169644 can result in
high compile times as the inner m_Reassociatable call is being repeated
a lot while the outer call is trying to match. Place the inner
m_ReassociatableAnd at the beginning of the pattern so it is not
repeatedly matched in recursion.
Without this patch DW_AT_call_target is used for all indirect call address
location expressions. The DWARF spec says:
> For indirect calls or jumps where the address is not computable without use
> of registers or memory locations that might be clobbered by the call the
> DW_AT_call_target_clobbered attribute is used instead of the
> DW_AT_call_target attribute.
This patch implements that behaviour.
PHIs that are larger than a legal integer type are split into multiple
virtual registers that are numbered sequentially. We can propagate the
known bits for each of these registers individually.
Big endian is not supported yet because the register order needs to be
reversed.
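As a rough illustration of the per-register propagation, the sketch
below splits the known-zero/known-one masks of a 64-bit value into two
32-bit parts. The struct and function names are hypothetical, the
widths are picked for the example, and part 0 is assumed to hold the
low bits (the little-endian order described above).

```cpp
#include <cstdint>

// Hypothetical 32-bit known-bits pair: a set bit in Zero/One means that bit
// of the value is known to be 0/1.
struct KnownBits32 {
  std::uint32_t Zero;
  std::uint32_t One;
};

// The split PHI occupies sequentially numbered registers.
constexpr unsigned regForPart(unsigned BaseReg, unsigned Part) {
  return BaseReg + Part;
}

// Known bits for part `Part` (0 or 1) of a 64-bit value; part 0 is assumed
// to be the low half.
constexpr KnownBits32 knownBitsForPart(std::uint64_t Zero, std::uint64_t One,
                                       unsigned Part) {
  return KnownBits32{static_cast<std::uint32_t>(Zero >> (32 * Part)),
                     static_cast<std::uint32_t>(One >> (32 * Part))};
}

// Upper half known zero, bit 0 known one: each part gets its own slice.
static_assert(knownBitsForPart(0xFFFFFFFF00000000ULL, 0x1ULL, 1).Zero ==
                  0xFFFFFFFFu,
              "high part is entirely known zero");
static_assert(knownBitsForPart(0xFFFFFFFF00000000ULL, 0x1ULL, 0).One == 0x1u,
              "low part keeps the known-one bit");
```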
Fixes #171671
Fixes #70949. Prior to PR #151378 memory locations were incorrect; that
patch prevented the emission of the incorrect locations.
This patch fixes the underlying issue.
As per @arsenm's instructions, I've separated the non-functional
changes from https://github.com/llvm/llvm-project/pull/169958.
Afterwards I'll tackle the functional ones one by one. I hope I did
everything right this time.
Full descriptions in the article:
https://pvs-studio.com/en/blog/posts/cpp/1318/
3. Array overrun is possible.
The PVS-Studio warning: V557 Array overrun is possible. The value of
'regIdx' index could reach 31. VEAsmParser.cpp 696
10. Excessive check.
The PVS-Studio warning: V547 Expression 'IsLeaf' is always false.
PPCInstrInfo.cpp 419
11. Doubling the same check.
The PVS-Studio warning: V581 The conditional expressions of the 'if'
statements situated alongside each other are identical. Check lines:
5820, 5823. PPCInstrInfo.cpp 5823
15. Excessive check.
The PVS-Studio warning: V547 Expression 'i != e' is always true.
MachineFunction.cpp 1444
17. Excessive assignment.
The PVS-Studio warning: V1048 The 'FirstOp' variable was assigned the
same value. MachineInstr.cpp 1995
18. Excessive check.
The PVS-Studio warning: V547 Expression 'AllSame' is always true.
SimplifyCFG.cpp 1914
19. Excessive check.
The PVS-Studio warning: V547 Expression 'AbbrevDecl' is always true.
LVDWARFReader.cpp 398
Both `LOOP_DEPENDENCE_WAR_MASK` and `LOOP_DEPENDENCE_RAW_MASK` are
currently hard to split correctly, and there are a number of incorrect
cases.
The difficulty comes from how the intrinsics are defined. For example,
take `LOOP_DEPENDENCE_WAR_MASK`.
It is defined as the OR of:
* `(ptrB - ptrA) <= 0`
* `elementSize * lane < (ptrB - ptrA)`
Now, if we want to split a loop dependence mask for the high half of the
mask we want to compute:
* `(ptrB - ptrA) <= 0`
* `elementSize * (lane + LoVT.getElementCount()) < (ptrB - ptrA)`
However, with the current opcode definitions, we can only modify ptrA or
ptrB, which may change the result of the first case, which should be
invariant to the lane.
This patch resolves these cases by adding a "lane offset" to the ISD
opcodes. The lane offset is always a constant. For scalable masks, it is
implicitly multiplied by vscale.
This makes splitting trivial as we increment the lane offset by
`LoVT.getElementCount()` now.
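The effect of the lane offset can be checked with a small scalar model
of the WAR mask. The sketch below is illustrative only: the function
names are invented, pointers are plain integers, and only fixed-length
masks are modelled (no vscale scaling). It also shows why bumping
`ptrA` instead is wrong for the high half.

```cpp
#include <cstdint>

// Lane is active if (ptrB - ptrA) <= 0, or if
// elementSize * (lane + laneOffset) < (ptrB - ptrA).
constexpr bool warLaneActive(std::int64_t PtrA, std::int64_t PtrB,
                             std::int64_t EltSize, std::int64_t Lane,
                             std::int64_t LaneOffset) {
  return (PtrB - PtrA) <= 0 || EltSize * (Lane + LaneOffset) < (PtrB - PtrA);
}

// Builds a bitmask with one bit per lane (bit 0 = lane 0).
constexpr std::uint32_t warMask(std::int64_t PtrA, std::int64_t PtrB,
                                std::int64_t EltSize, unsigned NumLanes,
                                std::int64_t LaneOffset) {
  std::uint32_t Mask = 0;
  for (unsigned Lane = 0; Lane < NumLanes; ++Lane)
    if (warLaneActive(PtrA, PtrB, EltSize, Lane, LaneOffset))
      Mask |= 1u << Lane;
  return Mask;
}

// 8 lanes of 4-byte elements, ptrB - ptrA == 8: only lanes 0 and 1 are active.
static_assert(warMask(0, 8, 4, 8, 0) == 0x03u, "full mask");
// Splitting with a lane offset of 4 for the high half reproduces the mask.
static_assert((warMask(0, 8, 4, 4, 0) | (warMask(0, 8, 4, 4, 4) << 4)) == 0x03u,
              "lo | hi << 4 matches the full mask");
// Adjusting ptrA by 4 elements instead flips the first condition and
// wrongly activates every lane of the high half.
static_assert(warMask(16, 8, 4, 4, 0) == 0x0Fu, "naive split is wrong");
```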
Note: In the AArch64 backend, we only support zero lane offsets (as
other cases are tricky to lower to whilewr/rw).
---------
Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>
Explicitly cast the value to (int) before negating, so it gets properly
sign extended. Otherwise we end up with a large unsigned value instead
of a negative value for large bit widths.
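The difference is easy to reproduce in isolation. The sketch below
assumes a 32-bit unsigned input and a 64-bit destination; the function
names are made up, and `std::int32_t` stands in for the `(int)` cast.

```cpp
#include <cstdint>

// Negating the unsigned value wraps modulo 2^32; widening the result then
// zero-extends, giving a large positive number.
constexpr std::int64_t negateThenWiden(std::uint32_t V) {
  return static_cast<std::int64_t>(-V);
}

// Casting to a signed type first makes the negation signed, so widening
// sign-extends. (Shown for small V; values above INT32_MAX are out of scope.)
constexpr std::int64_t castThenNegate(std::uint32_t V) {
  return -static_cast<std::int32_t>(V);
}

static_assert(negateThenWiden(5) == 4294967291LL, "large unsigned value");
static_assert(castThenNegate(5) == -5, "properly sign extended");
```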
This was found while working on
https://github.com/llvm/llvm-project/pull/171456.
This PR introduces a new mechanism for enforcing a sandbox around
filesystem reads coming from the compiler. A fatal error is raised
whenever the `llvm::sys::fs` or `llvm::MemoryBuffer::getFile*()` APIs get
used directly instead of going through the "blessed" virtual interface
of `llvm::vfs::FileSystem`.
SwitchInst case values must be ConstantInt, which have no use list.
Therefore it is not necessary to store these as Use; instead, store them
more efficiently as a simple array of pointers after the uses, similar
to how PHINode stores basic blocks.
After this change, the successors of all terminators are stored
consecutively in the operand list. This is preparatory work for
improving the performance of successor access.
Add new C API functions so that switch case values remain accessible
from bindings for other languages.
While this could also be achieved by merely changing the order of
operands (i.e., first all successors, then all constants), doing so
would increase the asymptotic runtime of addCase from O(1) to O(n)
(i.e., adding n cases would be O(n^2)), because it would need to shift
all constants by one slot. Having null/invalid operands is also a bad
idea and would cause much more breakage.
Pull Request: https://github.com/llvm/llvm-project/pull/170984
Running optimizeTerminators prior to other optimizations like branch
layout can lead to more folding and better codegen, but is not on its
own able to capture all cases. There is benefit to running it in both
places. This adds the existing code from #161508 into the
AArch64RedundantCopyElimination pass, which sounds like a sensible
enough place for it.
This is a recommit with an extra fix for shrink-wrapping domtree use.
This continues the replacement of TargetLibraryInfo uses in codegen
with RuntimeLibcallsInfo started in
821d2825a4f782da3da3c03b8a002802bff4b95c.
The series there handled all of the multiple result calls. This
extends it to the other handled case, which happened to be frem.
For some reason the Libcalls for these are prefixed with "REM_", for
the instruction "frem", which maps to the libcall "fmod".
The fast instruction selector should not force an SDAG fallback
to potentially make use of optimized libcall implementations.
Looking at
3e6fa462f3,
part of the motivation was to avoid libcalls in unoptimized builds for
targets that don't have them, but I believe this should be handled by
Clang directly emitting intrinsics instead of libcalls (which it already
does). FastISel should not second guess this.
Followup to https://github.com/llvm/llvm-project/pull/171288.