38854 Commits

Author SHA1 Message Date
Teja Alaghari
b4b5bfaf40
[CodeGen][NPM] Update MPDT similar to MDT after unreachable BB elimination (#172421)
After unreachable machine basic blocks are removed, MPDT should also be
updated with the latest block numbers alongside MDT.
2025-12-19 11:09:49 +05:30
Teja Alaghari
4e89e710d9
[CodeGenPrepare][NPM] Remove incorrect LoopAnalysis preservation in CodeGenPrepare (#172418)
CodeGenPrepare modifies and restructures loops & control flow. So, it
shouldn't preserve LoopAnalysis.

The test `llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll` shows
CodeGenPrepare modifying loop structure, hence we cannot preserve
LoopAnalysis.
2025-12-19 11:08:31 +05:30
KRM7
c9aea6248a
[RegisterCoalescer] Don't commute two-address instructions which only define a subregister (#169031)
Currently, the register coalescer may try to commute an instruction
like:
```
%0.sub_lo32:gpr64 = AND %0.sub_lo32:gpr64(tied-def 0), %1.sub_lo32:gpr64
USE %0:gpr64
```
resulting in:
```
%1.sub_lo32:gpr64 = AND %1.sub_lo32:gpr64(tied-def 0), %0.sub_lo32:gpr64
USE %1:gpr64
```
However, this is not correct if the instruction doesn't define the
entire register, as the value of the upper 32 bits
of the register used in `USE` will not be the same.
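
The difference can be sketched with a small model of the two sequences above. This is only an illustration, not the coalescer's logic: registers are modelled as plain 64-bit integers, and the helper names and input values are made up for the example.

```python
MASK32 = 0xFFFFFFFF
UPPER32 = 0xFFFFFFFF00000000

def write_sub_lo32(reg, value):
    """Replace the low 32 bits of a 64-bit register, keeping the upper 32 bits."""
    return (reg & UPPER32) | (value & MASK32)

def original(r0, r1):
    # %0.sub_lo32 = AND %0.sub_lo32, %1.sub_lo32 ; USE %0
    return write_sub_lo32(r0, (r0 & MASK32) & (r1 & MASK32))

def commuted(r0, r1):
    # %1.sub_lo32 = AND %1.sub_lo32, %0.sub_lo32 ; USE %1
    return write_sub_lo32(r1, (r1 & MASK32) & (r0 & MASK32))

r0 = 0xAAAAAAAA0000FFFF
r1 = 0xBBBBBBBB00FF00FF
print(hex(original(r0, r1)))  # 0xaaaaaaaa000000ff
print(hex(commuted(r0, r1)))  # 0xbbbbbbbb000000ff
```

The low halves agree, but the value seen by `USE` differs in its upper half, which is the miscompile described above.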
2025-12-18 23:24:44 +01:00
Gaëtan Bossu
ef58e6f6af
[SDAG] Widen TRUNCATE to intermediate type to avoid ISel failure (#172473)
SelectionDAG offered no way to widen TRUNCATE for pathological types
like <vscale x 1 x ...> as they do not allow scalarisation.

One way to go further is to widen to an intermediate type, which
allows the element type to be promoted in a later run of legalisation.
2025-12-18 17:19:34 +00:00
guan jian
4e675a0c45
[SelectionDAG] Lowering usub.sat(a, 1) to a - (a != 0) (#170076)
I recently observed that LLVM generates the following code:
```
	addi	a1, a0, -1
	sltu	a0, a0, a1
	addi	a0, a0, -1
	and	a0, a0, a1
	ret
```
This could be optimized using the snez instruction instead.
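
As a quick check of the identity in the title, the following sketch compares `usub.sat(a, 1)` with `a - (a != 0)` exhaustively for 8-bit and 16-bit values. The function names are illustrative only and the widths are chosen to keep the loop small.

```python
def usub_sat(a, b):
    """Unsigned saturating subtraction: clamps at zero instead of wrapping."""
    return max(a - b, 0)

def lowered(a, bits):
    """a - (a != 0), where the comparison yields 0 or 1."""
    return (a - int(a != 0)) & ((1 << bits) - 1)

for bits in (8, 16):
    assert all(usub_sat(a, 1) == lowered(a, bits) for a in range(1 << bits))
print("usub.sat(a, 1) == a - (a != 0) for all 8-bit and 16-bit a")
```

The `a != 0` term is what the `snez` instruction mentioned above would compute.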
2025-12-18 14:31:53 +00:00
Frederik Harwath
5c05824d2b
[CodeGen] Rename expand-fp to expand-ir-insts (#172681)
The pass now contains a non-fp expansion and should
be used for any similar expansions regardless of the
types involved. Hence a generic name seems apt.

Rename the source files, pass, and adjust the pass
description. Move all tests for the expansions
that have previously been merged into the pass
to a single directory.
2025-12-18 11:15:04 +00:00
Frederik Harwath
71760f324f
[CodeGen] Merge ExpandLargeDivRem into ExpandFp (#172680)
Both passes expand instructions at the IR level.
They use the same kind of instruction visitation
logic and contain significant code duplication e.g.
for scalarization.
2025-12-18 09:22:47 +01:00
Kevin Per
0036c67445
[RISCV]: Implemented softening of FCANONICALIZE (#169234)
The `ISD::FCANONICALIZE` is mapped to `llvm.minnum(x, x)`.

Closes https://github.com/llvm/llvm-project/issues/169216
2025-12-17 16:38:18 -08:00
Rahman Lavaee
53005fd435
Use the Propeller CFG profile in the PGO analysis map if it is available. (#163252)
This PR implements emission of the post-link CFG information in the PGO
analysis map, as explained in the
[RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617).
This is enabled by a flag `pgo-analysis-map-emit-bb-sections-cfg`.

This PR bumps the SHT_LLVM_BB_ADDR_MAP version to 5.
Also includes some refactoring changes related to storing the CFG in the
Basic block sections profile reader.
2025-12-17 14:19:18 -08:00
Valeriy Savchenko
e7892d702f
[DAGCombiner] Fix assertion failure in vector division lowering (#172321)
2025-12-17 22:09:54 +00:00
Folkert de Vries
a587ccd87d
fix llvm.fma.f16 double rounding issue when there is no native support (#171904)
fixes https://github.com/llvm/llvm-project/issues/98389

As the issue describes, promoting `llvm.fma.f16` to `llvm.fma.f32` does
not work, because there is not enough precision to handle the repeated
rounding. `f64` does have sufficient space. So this PR explicitly
promotes the 16-bit fma to a 64-bit fma.
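
The double rounding can be reproduced with a rounding-only model. This is a sketch, not the lowering itself: the inputs are hand-picked for the example, and `struct` format codes stand in for the conversions between `f16`, `f32`, and `f64`.

```python
import struct

def round_to(fmt, x):
    """Round a Python float (binary64) to the format named by a struct code."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

# Hand-picked inputs, all exactly representable in f16.
a, b, c = 3.0, 683 / 2048, 2.0 ** -24
assert all(round_to("<e", v) == v for v in (a, b, c))

# a * b + c == 1 + 2**-11 + 2**-24, which binary64 holds exactly.
exact = a * b + c

via_f32 = round_to("<e", round_to("<f", exact))  # round to f32, then to f16
via_f64 = round_to("<e", exact)                  # round once from f64 to f16

print(via_f32)  # 1.0
print(via_f64)  # 1.0009765625
```

Rounding to `f32` first lands exactly on the midpoint between two `f16` values, so the second rounding goes the wrong way; going through `f64` keeps the tie-breaking bit for this input.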

I could not find examples of a libcall being used for fma, but that's
something that could be looked into separately to work around code size
issues.
2025-12-17 22:03:01 +01:00
Pan Tao
b6bfa85686
[aarch64] Mix the frame pointer with the stack cookie when protecting the stack (#161114)
This strengthens the guard and matches MSVC.

Fixes #156573.
2025-12-17 12:52:28 -08:00
natanelh-mobileye
fa78d6a5f1
[SDAG] Shrink (abd? (?ext x) (?ext y)) (#171865)
Alive2 test: https://alive2.llvm.org/ce/z/maryYU
Lit test before change: https://godbolt.org/z/nEKWdPbMv

Fixes #171640
2025-12-17 16:30:52 +00:00
Nikita Popov
edb45d8ae4
[SDAG] Allow implicit trunc in BUILD_VECTOR legalization
BUILD_VECTOR may have operands larger than the result element type,
in which case it is specified to truncate. As such, allow implicit
truncation.
2025-12-17 15:22:00 +01:00
Nathan Corbyn
b7a20c1cc4
[GlobalISel] Don't permit G_*MIN/G_*MAX of pointer vectors (#168872)
- Use `LLT::changeElementType()` instead of `LLT::changeElementSize()`
in `LegalizerHelper::lowerMinMax()` to avoid a crash in the case that
the destination type is a pointer vector;
- Reject `G_*MIN`/`G_*MAX` of pointers and pointer vectors in
`MachineVerifier`;
- Don't combine `G_SELECT`+`G_ICMP` pairs into `G_*MIN`/`G_*MAX` generic
instructions when the operands are pointers / pointer vectors.

Fixes #166556
2025-12-17 09:03:41 +00:00
Craig Topper
816c9d64a7
[TargetLowering] Use getNegative. NFC (#172526)
This also fixes the type for the SUB to be ShVT instead of VT. I guess
we only test this when ShVT == VT.
2025-12-16 16:45:18 -08:00
Matt Arsenault
68aea8e202
AMDGPU: Avoid introducing unnecessary fabs in fast fdiv lowering (#172553)
If the sign bit of the denominator is known 0, do not emit the fabs.
Also, extend this to handle min/max with fabs inputs.

I originally tried to do this as the general combine on fabs, but
it proved to be too much trouble at this time. This is mostly
complexity introduced by expanding the various min/maxes into
canonicalizes, and then not being able to assume the sign bit
of canonicalize (fabs x) without nnan.

This defends against future code size regressions in the atan2 and
atan2pi library functions.
2025-12-17 00:22:12 +01:00
Matt Arsenault
eb1876c960
DAG: Fix arith_fence handling in SignBitIsZeroFP (#172537)
2025-12-16 20:10:38 +00:00
Frederik Harwath
6ad41bcc49
[CodeGen] expand-fp: Change frem expansion criterion (#158285)
The existing condition for checking whether or not to expand an frem
instruction in expand-fp is not sufficiently precise.
The expansion on targets other than AMDGPU - which is the only intended
user right now - is only prevented due to the interaction with the
MaxLegalFpConvertBitWidth check.  Relying on this is conceptually wrong
and limits the use of the pass for other targets and further expansions
(e.g. merging with the similar ExpandLargeDivRem pass).

Change the expansion criterion to always expand frem of a given type
for targets that use "Expand" as the legalization action for the
underlying scalar type and use this to exit the pass early for targets
which do not require any expansions. This requires changing the
frem legalization action for all targets which do not want frem to
be expanded in this pass from "Expand" to "LibCall".

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-12-16 17:31:26 +01:00
Usman Nadeem
1ea201d73b
[WoA] Remove extra barriers after ARM LSE instructions with MSVC (#169596)
c9821abfc0
added extra fences after sequentially consistent stores for
compatibility with MSVC's seq_cst loads (ldr+dmb). These extra fences
should not be needed for ARM LSE instructions that have both
acquire+release semantics, which results in a two way barrier, and
should be enough for sequential consistency.

Fixes https://github.com/llvm/llvm-project/issues/162345

Change-Id: I9148c73d0dcf3bf1b18a0915f96cac71ac1800f2
2025-12-15 17:19:40 -08:00
Daniel Paoliello
644fd3b665
[FastISel] Don't select a CallInst as a BasicBlock in the SelectionDAG fallback if it has bundled ops (#162895)
This was discovered while looking at the codegen for x64 when Control
Flow Guard is enabled.

When using `SelectionDAG`, LLVM would generate the following sequence
for a CF guarded indirect call:
```
	leaq	target_func(%rip), %rax
	rex64 jmpq	*__guard_dispatch_icall_fptr(%rip) # TAILCALL
```

However, when Fast ISel was used the following is generated:
```
	leaq	target_func(%rip), %rax
	movq	__guard_dispatch_icall_fptr(%rip), %rcx
	rex64 jmpq	*%rcx                   # TAILCALL
```

This was happening despite Fast ISel aborting and falling back to
`SelectionDAG`.

The root cause for this code gen is that `SelectionDAGISel` has a
special case when Fast ISel aborts when lowering a `CallInst` where it
tries to lower the instruction as its own basic block, which for such a
CF Guard call means that it is lowering an indirect call to
`__guard_dispatch_icall_fptr` without observing that the function was
being loaded into a pointer in the preceding (and bundled) instruction.

The fix for this is to not use the special case when a `CallInst` has
bundled instructions: it's better to allow the call and its bundled
instructions to be lowered together by `SelectionDAG` instead.
2025-12-15 14:38:20 -08:00
Orlando Cazalet-Hyams
3e32735020
[DWARF] Add support for DW_GNU_call_target_clobbered (#172336)
Fixes assertion trip introduced in #172167

See https://issues.chromium.org/issues/468825583#comment2
2025-12-15 18:24:21 +00:00
Fabrice de Gans
28e9954a44
llvm: Add missing VirtualFileSystem.h include (#171848)
`vfs::FileSystem` is forward-declared in `SanitizerBinaryMetadata.h`.
The corresponding header must be included in any source file that
includes that header, or we risk issues when building with
`LLVM_BUILD_LLVM_DYLIB` to build LLVM as a DLL on Windows.

This effort is tracked in #109483.
2025-12-15 11:45:13 -05:00
Benjamin Maxwell
1847a4efae
[SDAG] Fix incorrect usage of VECREDUCE_ADD (#171459)
The mask needs to be extended to `i32` before reducing or the reduction
can be incorrectly optimized to a VECREDUCE_XOR.
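
A minimal sketch of why the element width matters, assuming the reduction is meant to count the set lanes (the commit does not state this). An add over 1-bit elements wraps modulo 2, so it is equivalent to an XOR. The helper and the sample mask are illustrative only.

```python
from functools import reduce
from operator import xor

def reduce_add(lanes, bits):
    """Integer add reduction that wraps at the given element width."""
    return sum(lanes) & ((1 << bits) - 1)

mask = [1, 0, 1, 1, 0, 1, 1, 1]  # i1 lanes, six of them set

as_i1 = reduce_add(mask, 1)    # wraps modulo 2, i.e. the parity of the mask
as_i32 = reduce_add(mask, 32)  # the number of set lanes

print(as_i1, reduce(xor, mask))  # 0 0
print(as_i32)                    # 6
```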
2025-12-15 15:01:31 +00:00
Nikita Popov
3f82a8a784
[ExpandFp] Use getSignMask() (NFC)
This was using getSigned() with an unsigned (not sign extended)
argument. Using plain get() would be correct here. We can go
one step further and use getSignMask() to avoid the issue entirely.
2025-12-15 15:44:03 +01:00
Simon Pilgrim
a68fde5780
[DAG] foldAddToAvg - optimize nested m_Reassociatable matchers (#171681)
The use of nested m_Reassociatable matchers by #169644 can result in
high compile times as the inner m_Reassociatable call is being repeated
a lot while the outer call is trying to match. Place the inner
m_ReassociatableAnd at the beginning of the pattern so it is not
repeatedly matched in recursion.
2025-12-15 13:41:02 +00:00
Orlando Cazalet-Hyams
792704038a
[DebugInfo][DWARF] Use DW_AT_call_target_clobbered for exprs with volatile regs (#172167)
Without this patch DW_AT_call_target is used for all indirect call address
location expressions. The DWARF spec says:

    For indirect calls or jumps where the address is not computable without use
    of registers or memory locations that might be clobbered by the call the
    DW_AT_call_target_clobbered attribute is used instead of the
    DW_AT_call_target attribute.

This patch implements that behaviour.
2025-12-15 12:54:18 +00:00
Nathan Corbyn
2f9bf3f292
[GlobalISel](NFC) Refactor construction of LLTs in LegalizerHelper (#170664)
I spotted a number of places where we're duplicating logic provided by
the `LLT` class inline in `LegalizerHelper`. This PR tidies up these
spots.
2025-12-15 12:26:27 +00:00
Nikita Popov
ce1b04720a
[SelectOptimize] Respect optnone (#170858)
Add the missing skipFunction() call so that optnone attributes and
opt-bisect-limit are respected.
2025-12-15 09:21:02 +01:00
Mingjie Xu
681dbf9941
[WinEH] Use removeIncomingValueIf() in UpdatePHIOnClonedBlock() (NFC) (#171962)
2025-12-14 09:41:13 +08:00
Craig Topper
0cdc1b6dd4
[SelectionDAG] Support integer types with multiple registers in ComputePHILiveOutRegInfo. (#172081)
PHIs that are larger than a legal integer type are split into multiple
virtual registers that are numbered sequentially. We can propagate the
known bits for each of these registers individually.

Big endian is not supported yet because the register order needs to be
reversed.
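
The per-register propagation can be pictured as slicing the known-zero and known-one masks of the wide value. This is a sketch under assumptions: the function is hypothetical rather than LLVM's API, and it uses the low-part-first register order implied by the big-endian note above.

```python
def split_known_bits(known_zero, known_one, total_bits, reg_bits):
    """Split known-bit masks of a wide value into per-register masks, low part first."""
    mask = (1 << reg_bits) - 1
    parts = []
    for shift in range(0, total_bits, reg_bits):
        parts.append(((known_zero >> shift) & mask, (known_one >> shift) & mask))
    return parts

# A 64-bit PHI whose top 56 bits are known zero, split into two 32-bit registers.
parts = split_known_bits(0xFFFFFFFFFFFFFF00, 0, total_bits=64, reg_bits=32)
for index, (zero, one) in enumerate(parts):
    print(index, hex(zero), hex(one))
```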

Fixes #171671
2025-12-13 13:24:41 -08:00
Matt Arsenault
b2d9356719
DAG: Make more use of the LibcallImpl overload of getExternalSymbol (#172171)
Also add a new copy for TargetExternalSymbol that AArch64 needs.
2025-12-13 19:16:47 +00:00
Orlando Cazalet-Hyams
fa1dceb67f
[DebugInfo][DWARF] Allow memory locations in DW_AT_call_target expressions (#171183)
Fixes #70949. Prior to PR #151378 memory locations were incorrect; that
patch prevented the emission of the incorrect locations.

This patch fixes the underlying issue.
2025-12-13 17:37:35 +00:00
Matt Arsenault
d8b03f282a
DAG: Use the LibcallImpl to get calling conv in ExpandDivRemLibCall (#172152)
2025-12-13 11:41:24 +00:00
Seraphimt
0603d4af1d
Fix misprint in computeKnownFPClass in GISelValueTracking.cpp (#171566)
Fix the wrong value (from the Instruction enum) in a conditional and add
a test check.
Related to https://github.com/llvm/llvm-project/issues/169959
2025-12-12 20:59:07 +01:00
KRM7
e0e5b6e1f7
[GISel][Inlineasm] Support inlineasm i/s constraint for symbols (#170094)
2025-12-12 20:16:17 +01:00
Seraphimt
112a6126ef
Fixes non-functional changes found by static analyzer (#171197)
As per @arsenm's instructions, I've separated the non-functional
changes from https://github.com/llvm/llvm-project/pull/169958.
Afterwards I'll tackle the functional ones one by one. I hope I did
everything right this time.

Full descriptions in the article:
https://pvs-studio.com/en/blog/posts/cpp/1318/
3. Array overrun is possible.
The PVS-Studio warning: V557 Array overrun is possible. The value of
'regIdx' index could reach 31. VEAsmParser.cpp 696
10. Excessive check.
The PVS-Studio warning: V547 Expression 'IsLeaf' is always false.
PPCInstrInfo.cpp 419
11. Doubling the same check.
The PVS-Studio warning: V581 The conditional expressions of the 'if'
statements situated alongside each other are identical. Check lines:
5820, 5823. PPCInstrInfo.cpp 5823
15. Excessive check.
The PVS-Studio warning: V547 Expression 'i != e' is always true.
MachineFunction.cpp 1444
17. Excessive assignment.
The PVS-Studio warning: V1048 The 'FirstOp' variable was assigned the
same value. MachineInstr.cpp 1995
18. Excessive check.
The PVS-Studio warning: V547 Expression 'AllSame' is always true.
SimplifyCFG.cpp 1914
19. Excessive check.
The PVS-Studio warning: V547 Expression 'AbbrevDecl' is always true.
LVDWARFReader.cpp 398
2025-12-12 20:03:02 +01:00
Nikita Popov
1d7bfb752f
[SafeStack] Use getSigned() for negative value
2025-12-12 11:15:44 +01:00
Sam Tebbs
19e1011df5
[SelectionDAG] Fix unsafe cases for loop.dependence.{war/raw}.mask (#168565)
Both `LOOP_DEPENDENCE_WAR_MASK` and `LOOP_DEPENDENCE_RAW_MASK` are
currently hard to split correctly, and there are a number of incorrect
cases.

The difficulty comes from how the intrinsics are defined. For example,
take `LOOP_DEPENDENCE_WAR_MASK`.

It is defined as the OR of:

* `(ptrB - ptrA) <= 0`
* `elementSize * lane < (ptrB - ptrA)`

Now, if we want to split a loop dependence mask for the high half of the
mask we want to compute:

* `(ptrB - ptrA) <= 0`
* `elementSize * (lane + LoVT.getElementCount()) < (ptrB - ptrA)`

However, with the current opcode definitions, we can only modify ptrA or
ptrB, which may change the result of the first case, which should be
invariant to the lane.

This patch resolves these cases by adding a "lane offset" to the ISD
opcodes. The lane offset is always a constant. For scalable masks, it is
implicitly multiplied by vscale.

This makes splitting trivial as we increment the lane offset by
`LoVT.getElementCount()` now.
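
The splitting argument can be sketched for fixed-length masks as follows. This is an illustration only: the function names are hypothetical, the definition is the two-condition OR given above, and the implicit multiplication by vscale for scalable masks is not modelled.

```python
def war_mask(ptr_a, ptr_b, element_size, num_lanes, lane_offset=0):
    """Per-lane mask: (ptrB - ptrA) <= 0 OR elementSize * lane < (ptrB - ptrA)."""
    diff = ptr_b - ptr_a
    return [diff <= 0 or element_size * (lane + lane_offset) < diff
            for lane in range(num_lanes)]

def split_war_mask(ptr_a, ptr_b, element_size, num_lanes):
    """Split into low and high halves by bumping only the lane offset."""
    half = num_lanes // 2
    lo = war_mask(ptr_a, ptr_b, element_size, half)
    hi = war_mask(ptr_a, ptr_b, element_size, half, lane_offset=half)
    return lo, hi

def naive_high_half(ptr_a, ptr_b, element_size, num_lanes):
    """Unsafe alternative: advance ptrA instead of using a lane offset."""
    half = num_lanes // 2
    return war_mask(ptr_a + element_size * half, ptr_b, element_size, half)

full = war_mask(0, 8, 4, 8)
lo, hi = split_war_mask(0, 8, 4, 8)
print(lo + hi == full)               # True
print(naive_high_half(0, 8, 4, 8))   # [True, True, True, True]
print(hi)                            # [False, False, False, False]
```

Advancing `ptrA` makes the difference non-positive here, so the first condition wrongly becomes true for every lane of the high half, while the lane offset leaves it untouched.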

Note: In the AArch64 backend, we only support zero lane offsets (as
other cases are tricky to lower to whilewr/rw).

---------

Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>
2025-12-12 08:44:33 +00:00
Nikita Popov
43a4442fac
[ExpandFp] Fix incorrect ConstantInt construction (#171861)
Explicitly cast the value to (int) before negating, so it gets properly
sign extended. Otherwise we end up with a large unsigned value instead
of a negative value for large bit widths.
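
The effect of the cast can be modelled with explicit bit masks. This is a sketch that assumes the original value had a 32-bit unsigned type, which the commit message does not state; the helper names are illustrative.

```python
def to_bits(value, bits):
    """Two's-complement bit pattern of value at the given width."""
    return value & ((1 << bits) - 1)

def negate_unsigned_then_widen(x, bits):
    # Negation wraps at 32 bits; widening an unsigned value zero-extends it.
    return to_bits(to_bits(-x, 32), bits)

def negate_signed_then_widen(x, bits):
    # Negation yields a true negative value; widening sign-extends it.
    return to_bits(-x, bits)

x = 5
print(hex(negate_unsigned_then_widen(x, 64)))  # 0xfffffffb
print(hex(negate_signed_then_widen(x, 64)))    # 0xfffffffffffffffb
```

At 32 bits both paths produce the same pattern, which is why the problem only shows up for larger bit widths.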

This was found while working on
https://github.com/llvm/llvm-project/pull/171456.
2025-12-12 08:54:44 +01:00
Jan Svoboda
8e999e3d78
[llvm][clang] Sandbox filesystem reads (#165350)
This PR introduces a new mechanism for enforcing a sandbox around
filesystem reads coming from the compiler. A fatal error is raised
whenever the `llvm::sys::fs`, `llvm::MemoryBuffer::getFile*()` APIs get
used directly instead of going through the "blessed" virtual interface
of `llvm::vfs::FileSystem`.
2025-12-11 15:42:13 -08:00
Craig Topper
3e414b940a
[FunctionLoweringInfo] Use KnownBits::intersectWith. NFC (#171893)
2025-12-11 13:21:02 -08:00
Craig Topper
98a8072a65
[FunctionLoweringInfo] Remove unnecessary check for isVectorTy when isIntegerTy is true. NFC (#171880)
isIntegerTy is only true for scalars.
2025-12-11 13:20:41 -08:00
Alexis Engelke
6813f8f037
[IR] Don't store switch case values as operands
SwitchInst case values must be ConstantInt, which have no use list.
Therefore it is not necessary to store these as Use, instead store them
more efficiently as a simple array of pointers after the uses, similar
to how PHINode stores basic blocks.

After this change, the successors of all terminators are stored
consecutively in the operand list. This is preparatory work for
improving the performance of successor access.

Add new C API functions so that switch case values remain accessible
from bindings for other languages.

While this could also be achieved by merely changing the order of
operands (i.e., first all successors, then all constants), doing so
would increase the asymptotic runtime of addCase from O(1) to O(n)
(i.e., adding n cases would be O(n^2)), because it would need to shift
all constants by one slot. Having null/invalid operands is also a bad
idea and would cause much more breakage.

Pull Request: https://github.com/llvm/llvm-project/pull/170984
2025-12-11 18:38:39 +01:00
David Green
8a4cc440f2
[AArch64] Run optimizeTerminators earlier too. (#170907)
Running optimizeTerminators prior to other optimizations like branch
layout can lead to more folding and better codegen, but is not on its
own able to capture all cases. There is benefit to running it in both
places. This adds the existing code from #161508 into the
AArch64RedundantCopyElimination pass, which sounds like a sensible
enough place for it.

This is a recommit with an extra fix for shrink-wrapping domtree use.
2025-12-11 15:33:15 +00:00
Ramkumar Ramachandra
85fafd5db0
[SCEVExp] Get DL from SE, strip constructor arg (NFC) (#171823)
2025-12-11 14:26:47 +00:00
Matt Arsenault
a3aaa1a391
DAG: Use RuntimeLibcalls to legalize vector frem calls (#170719)
This continues the replacement of TargetLibraryInfo uses in codegen
with RuntimeLibcallsInfo started in
821d2825a4f782da3da3c03b8a002802bff4b95c.
The series there handled all of the multiple result calls. This
extends for the other handled case, which happened to be frem.

For some reason the Libcalls for these are prefixed with "REM_", for
the instruction "frem", which maps to the libcall "fmod".
2025-12-11 13:33:27 +00:00
Nikita Popov
d33d80fae6
[FastISel] Don't force SDAG fallback for libcalls (#171782)
The fast instruction selector should not force an SDAG fallback
to potentially make use of optimized libcall implementations.

Looking at
3e6fa462f3,
part of the motivation was to avoid libcalls in unoptimized builds for
targets that don't have them, but I believe this should be handled by
Clang directly emitting intrinsics instead of libcalls (which it already
does). FastISel should not second guess this.

Followup to https://github.com/llvm/llvm-project/pull/171288.
2025-12-11 14:14:06 +01:00
JaydeepChauhan14
9b6b52b534
[AsmPrinter][NFC] Reuse Target Triple variable (#171612)
2025-12-11 12:28:59 +01:00
Shubham Sandeep Rastogi
16e6055273
Revert "[SelectionDAG] Salvage debuginfo when combining load and sext… (#171745)
… instrs. (#169779)"

This reverts commit 2b958b9ee24b8ea36dcc777b2d1bcfb66c4972b6.

I might have broken the sanitizer-x86_64-linux bot


/home/b/sanitizer-x86_64-linux/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_procmaps_linux.cpp
clang++:
/home/b/sanitizer-x86_64-linux/build/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:248:
const T &llvm::ArrayRef<llvm::DbgValueLocEntry>::operator[](size_t)
const [T = llvm::DbgValueLocEntry]: Assertion `Index < Length &&
"Invalid index!"' failed.
2025-12-10 16:49:59 -08:00