setType is unneeded (and AsmPrinter tries not to modify symbols).
AsmPrinter. MCSA_ELF_TypeFunction is available on all
targets using getSymbolPreferLocal.
Pull Request: https://github.com/llvm/llvm-project/pull/128138
`N` may get merged with existing nodes inside the loop. Early exit when
it is deleted to avoid the crash.
Alternative solution: use `DAGNodeDeletedListener` to refresh the value
of N.
Closes https://github.com/llvm/llvm-project/issues/128143.
This reverts commit 34167f99668ce4d4d6a1fb88453a8d5b56d16ed5.
Different set of verifier errors appears after other regalloc failure
tests with EXPENSIVE_CHECKS.
In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.
We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.
The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.
https://reviews.llvm.org/D122616
Callee saved registers should always be phyiscal registers. They are
often passed directly to other functions that take MCRegister like
getMinimalPhysRegClass or TargetRegisterClass::contains.
Unfortunately, sometimes the MCRegister is compared to a Register which
gave an ambiguous comparison error when the MCRegister is on the LHS.
Adding a MCRegister==Register comparison operator created more ambiguous
comparison errors elsewhere. These cases were usually comparing against
a base or frame pointer register that is a physical register in a
Register. For those I added an explicit conversion of Register to
MCRegister to fix the error.
Use nonstatic member instead. This requires explicit conversions, but
many will go away as we continue converting unsigned to Register.
In a few places where it was simple, I changed unsigned to Register.
The SDAG version uses fminnum/fmaxnum, in converting it to fcmp+select
it appears the order of the operands was chosen badly. This switches the
conditions used to keep the constant on the RHS.
Similar to #117309.
The advisor and logger are accessed through the provider, which is
served by the new PM. Legacy PM forwards calls to the provider.
New PM is a machine function analysis that lazily initializes the
provider.
This reverts commit 36eaf0daf5d6dd665d7c7a9ec38ea22f27709fed.
This is not a sound approach to dealing with this instruction change.
The new behavior is a different opcode pair, not a modifier on the
existing opcode.
For targets that support IEEE fminimum_num/fmaximum_num, the
corresponding *_min_num_fXY/*_max_num_fXY instructions themselves
already did the canonicalization for the inputs. As a result, we do not
need to explicitly canonicalize the inputs for fminnum/fmaxnum.
Legacy pass used to provide the advisor, so this extracts that logic
into a provider class used by both analysis passes.
All three (Default, Release, Development) legacy passes
`*AdvisorAnalysis` are basically renamed to `*AdvisorProvider`, so the
actual legacy wrapper passes are `*AdvisorAnalysisLegacy`.
There is only one NPM analysis `RegAllocEvictionAnalysis` that switches
between the three providers in the `::run` method, to be cached by the
NPM.
Also adds `RequireAnalysis<RegAllocEvictionAnalysis>` to the optimized
target reg alloc codegen builder.
Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line
argument (aarch64-enable-partial-reduce-nodes) that indicates whether the
intrinsic experimental_vector_partial_ reduce_add will be transformed
into the new ISD node. Lowering with the new ISD nodes will, for now,
always be done as an expand.
Shift amounts in SelectionDAG don't have to match the result type
of the shift. SelectionDAGBuilder will aggressively truncate shift
amounts to the target's preferred type. This may result in a zero extend
that existed in IR being removed.
If we look through a truncate here, we can't guarantee the upper bits
of the truncate input are zero. There may have been a zext that was
removed. Unfortunately, this regresses tests where no truncate was
involved. The only way I can think to fix this is to add an assertzext
when SelectionDAGBuilder truncates a shift amount or remove the
early truncation of shift amounts from SelectionDAGBuilder all together.
Fixes#126889.
Previously this would give up on folding subregister copies through
a reg_sequence if the input operand already had a subregister index.
d246cc618adc52fdbd69d44a2a375c8af97b6106 stopped introducing these
subregister uses, and this is the first step to lifting that restriction.
I was expecting to be able to implement this only purely with compose /
reverse compose, but I wasn't able to make it work so relies on testing
the lanemasks for whether the copy reads a subset of the input.
The last use was removed in:
commit 05e6bb40ebfd285cc87f7ce326b7ba76c3c7f870
Author: Roger Ferrer Ibáñez <rofirrim@gmail.com>
Date: Thu May 30 14:55:32 2024 +0200
In C++20, non-type template parameters can be float/double. Clang didn't
emit those constants in DWARF. This patch emits floating point constants
the same way we do other integral template value parameters.
In 5235973ee03aca4148ecabe5eff64da2af1e034e, an ICE was fixed in
getMemsetStringVal where f128 wasn't handled. It was noted at the time
[1] that the code below this also looks suspect, since it assumes the
element type of VT is either an f32 or f64.
This part of getMemsetStringVal relates to memcpy operations where the
source is a copy from a zero constant. The VT in question is determined
by TargetLowering::findOptimalMemOpLowering, which in turn calls a
further TLI hook getOptimalMemOpType.
For AArch64, getOptimalMemOpType returns either a v16i8, f128, i64, i32
or Other. For Other, TargetLowering::findOptimalMemOpLowering will then
pick an integer VT. So on AArch64 at least, I don't believe the suspect
code can be reached.
For other targets, ARM and x86 are the only ones that return a FP vector
type from getOptimalMemOpType. For both targets, the only such type is
v2f64, but given f64 is already handled it should also be fine.
To defend this, I considered adding an assert as mentioned in [1], but
given getConstantFP handles vector types, I figured using this to fully
handle the FP types makes the code simpler and more robust.
For test coverage I added unreachables to both of the branches handling
FP types in this code, but found neither fired with check-llvm across
all targets.
Test coverage was added to llvm/test/CodeGen/AArch64/memcpy-f128.ll in
5235973ee03aca4148ecabe5eff64da2af1e034e to defend ICE on f128, but at
some point it stopped hitting this code.
AArch64TargetLowering::getOptimalMemOpType was updated in
29200611055f49a0d37243caa5f8bba1df9d57a6, so I suspect this is when it
happened, although I haven't verified this. Although I did find by
updating the test to disable NEON, getOptimalMemOpType returns an f128
and the branch is once again hit.
For the final branch noted as suspect in [1], as far as I can tell this
has never had any test coverage, so I've added a test to the ARM backend
for this.
Fixes: https://github.com/llvm/llvm-project/issues/20521 [1]
This patch attempts to reland
https://github.com/llvm/llvm-project/pull/120780 while addressing the
issues that caused the patch to be reverted.
Namely:
1. The patch had included code from the llvm/Passes directory in the
llvm/CodeGen directory.
2. The patch increased the backend compile time by 2% due to adding a
very expensive include in MachineFunctionPass.h
The patch has been re-structured so that there is no dependency between
the llvm/Passes and llvm/CodeGen directory, by moving the base class,
`class DroppedVariableStats` to the llvm/IR directory.
The expensive include in MachineFunctionPass.h has been changed to
contain forward declarations instead of other header includes which was
pulling a ton of code into MachineFunctionPass.h and should resolve any
issues when it comes to compile time increase.
`RegisterClassInfo` was supposed to be kept alive between pass runs,
which wasn't being done leading to recomputations increasing the compile
time.
Now the Impl class is a member of the legacy and new passes so that it
is not reconstructed on every pass run.
---------
Co-authored-by: Christudasan Devadasan <christudasan.devadasan@amd.com>
They have same semantics. NonUniqueID is more friendly for isUnique
implementation in MCSectionELF.
History: 97837b7 added support for unique IDs in sections and added
GenericSectionID. Later, 1dc16c7 added NonUniqueID.
- **Precommit tests for synchronous uwtable CFI fixup**
- **[CFIFixup] Fixup CFI for split functions with synchronous uwtables**
Commit
6e54fccede
disables CFI fixup for
functions with synchronous tables, breaking CFI for split functions.
Instead, we can disable *block-level* CFI fixup for functions with
synchronous tables.
Unwind tables can be:
- N/A (not present)
- Asynchronous
- Synchronous
Functions without unwind tables don't need CFI fixup (since they don't
care about CFI).
Functions with asynchronous unwind tables must be accurate for each
basic block, so full CFI fixup is necessary.
Functions with synchronous unwind tables only need to be accurate for
each function (specifically, the portion of a function in a given
section). Disabling CFI fixup entirely for functions with synchronous
uwtables may break CFI for a function split between two sections. The
portion in the first section may have valid CFI, while the portion in
the second section is missing a call frame.
Ex:
```
(.text.hot)
Foo (BB1):
<Call frame information>
...
BB2:
...
(.text.split)
BB3:
...
BB4:
<epilogue>
```
Even if `Foo` has a synchronous unwind table, we still need to insert
call frame information into `BB3` so that unwinding the call stack from
`BB3` or `BB4` works properly.