String pool merging currently, for a reason that's not entirely clear to
me, tries to create GEP instructions instead of GEP constant expressions
when replacing constant references. It only uses constant expressions in
cases where this is required. However, it does not catch all cases where
such a requirement exists. For example, the landingpad catch clause has
to be a constant.
Fix this by always using the constant expression variant, which also
makes the implementation simpler.
Additionally, there are some edge cases where even replacement with a
constant GEP is not legal. The one I am aware of is the
llvm.eh.typeid.for intrinsic, so add a special case to forbid
replacements for it.
Fixes https://github.com/llvm/llvm-project/issues/88844.
Under some circumstances (the library is loaded together with the main
program), the TLS initial-exec model can be applied to local-dynamic
accesses. We can use a simple heuristic to decide on the update at
function level:
* If the number of TLS local-dynamic accesses in the function is less
than or equal to a threshold, use the TLS initial-exec model. (The
threshold, which defaults to 1, is controlled by a hidden option.)
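The heuristic can be sketched as follows. This is a minimal model, not the pass itself: the names `TLSModel` and `choose_tls_model` are hypothetical, and the access count is assumed to have been computed elsewhere.

```python
from enum import Enum


class TLSModel(Enum):
    LOCAL_DYNAMIC = "local-dynamic"
    INITIAL_EXEC = "initial-exec"


def choose_tls_model(num_local_dynamic_accesses: int, threshold: int = 1) -> TLSModel:
    """Pick the TLS model for one function (hypothetical helper).

    The default threshold of 1 mirrors the default described above.
    """
    if num_local_dynamic_accesses <= threshold:
        return TLSModel.INITIAL_EXEC
    return TLSModel.LOCAL_DYNAMIC
```

With the default threshold, a function with a single local-dynamic access is switched to initial-exec, while a function with two or more keeps local-dynamic.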
If `LLVM_APPEND_VC_REV` is on, add the git revision to the `.file`
string. The revision can be set with `LLVM_FORCE_VC_REVISION`.
Before:
`.file "git_revision.cpp",,"LLVM version 19.0.0git"`
After:
`.file "git_revision.cpp",,"LLVM version 19.0.0git (LLVM_REVISION)"`
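A small sketch of how the two strings relate. The function `file_directive` and its parameters are hypothetical and only reproduce the Before/After text shown above; the real emission happens in the asm printer.

```python
from typing import Optional


def file_directive(filename: str, version: str, revision: Optional[str] = None) -> str:
    """Build the `.file` line; append the revision only when one is given."""
    description = f"LLVM version {version}"
    if revision:
        description += f" ({revision})"
    # The empty middle field is kept exactly as in the examples above.
    return f'.file "{filename}",,"{description}"'
```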
Add a pre-commit MIR test for the PR "[Promote Pseudo Opcode from 32-bit to
64-bit after eliminating the extsw instruction in PPCMIPeepholes
optimization](https://github.com/llvm/llvm-project/pull/85451)", which
fixes the bug reported in the issue "[Inconsistent Output at -O1 and -O2
Optimization Levels on PowerPC64 Due to Complex Type Casting and Nested
Loop Structure](https://github.com/llvm/llvm-project/issues/71030)".
According to langref, llvm.maximum/minimum has -0.0 < +0.0 semantics and
propagates NaN.
Expand the nodes on targets not supporting the operation, by adding an
extra check for NaN and using is_fpclass to check zero signs.
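The semantics being expanded can be modelled in a few lines. This is a reference sketch of the behavior described above (NaN propagates, -0.0 orders below +0.0), not the DAG expansion; the function names are hypothetical.

```python
import math


def _is_neg_zero(x: float) -> bool:
    # Sign check on a zero, playing the role of the is_fpclass test.
    return x == 0.0 and math.copysign(1.0, x) < 0


def fmaximum(a: float, b: float) -> float:
    if math.isnan(a) or math.isnan(b):
        return math.nan  # NaN propagates
    if a == 0.0 and b == 0.0:
        return b if _is_neg_zero(a) else a  # prefer +0.0
    return a if a > b else b


def fminimum(a: float, b: float) -> float:
    if math.isnan(a) or math.isnan(b):
        return math.nan  # NaN propagates
    if a == 0.0 and b == 0.0:
        return a if _is_neg_zero(a) else b  # prefer -0.0
    return a if a < b else b
```

A plain compare-and-select is not enough here because `-0.0 == +0.0` compares equal, which is why the zero signs need a separate check.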
This improves handling of `threadlocal.address` intrinsic in analyses:
The thread-id cannot change within a function with the exception of
suspend points of pre-split coroutines. This changes
`llvm::getUnderlyingObject` to look through `threadlocal.address` in
these cases.
`GlobalsAAResult::AnalyzeUsesOfPointer` checks whether an address can be
traced to simple loads/stores or escapes to other places. When the
analysis starts from a thread-local `GlobalValue`, the
`threadlocal.address` intrinsic is safe to skip here.
This improves the situation for issue #87437.
The load narrowing part of TargetLowering::SimplifySetCC is updated
as follows:
1) The offset calculation (for big endian) did not work properly for
non byte-sized types. This is basically solved by an early exit
if the memory type isn't byte-sized. But the code is also corrected
to use the store size when calculating the offset.
2) To still allow some optimizations for non-byte-sized types the
TargetLowering::isPaddedAtMostSignificantBitsWhenStored hook is
added. By default it assumes that scalar integer types are padded
starting at the most significant bits, if the type needs padding
when being stored to memory.
3) Allow optimizing when isPaddedAtMostSignificantBitsWhenStored is
true, as that hook makes it possible for TargetLowering to know
how the non byte-sized value is aligned in memory.
4) Update the algorithm to always search for a narrowed load with
a power-of-2 byte-sized type. In the past the algorithm started
with the width of the original load, and then divided it by
two for each iteration. But for a type such as i48 that would
just end up trying to narrow the load into a i24 or i12 load,
and then we would fail sooner or later due to not finding a
newVT that fulfilled newVT.isRound().
With this new approach we can narrow the i48 load into either
an i8, i16 or i32 load. By checking if such a load is allowed
(e.g. alignment wise) for any "multiple of 8 offset", then we can find
more opportunities for the optimization to trigger. So even for a
byte-sized type such as i32 we may now end up narrowing the load
into loading the 16 bits starting at offset 8 (if that is allowed
by the target). The old algorithm did not even consider that case.
5) Also start using getObjectPtrOffset instead of getMemBasePlusOffset
when creating the new ptr. This way we get "nsw" on the add.
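Item 4 can be illustrated with a rough sketch of the two candidate searches. The function names are hypothetical, the halving loop is an assumption reconstructed from the i24/i12 example, and the mask, legality and alignment checks of the real code are left out.

```python
def is_round(bits: int) -> bool:
    """At least 8 bits wide and a power of two."""
    return bits >= 8 and (bits & (bits - 1)) == 0


def old_candidate_widths(orig_bits: int) -> list:
    """Old search (assumed shape): keep halving the original width."""
    widths = []
    width = orig_bits // 2
    while width >= 8:
        widths.append(width)
        width //= 2
    return widths


def new_candidates(orig_bits: int) -> list:
    """New search: power-of-2 widths at every multiple-of-8 bit offset."""
    candidates = []
    width = 8
    while width < orig_bits:
        for offset in range(0, orig_bits - width + 1, 8):
            candidates.append((width, offset))
        width *= 2
    return candidates
```

For i48 the old search only visits widths 24 and 12, neither of which is round, whereas the new search offers i8, i16 and i32 pieces. For i32 the new search also includes the 16-bit piece at bit offset 8.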
These test cases show some miscompiles for big-endian when dealing
with non byte-sized loads. One part of the problem is that LLVM IR
isn't really telling where the padding goes for non byte-sized
loads/stores. So currently TargetLowering::SimplifySetCC can't assume
anything about it. But the implementation also does not consider that
the TypeStoreSize could be larger than the TypeSize, resulting in
the offset calculation being wrong for big-endian.
Pre-commit for https://github.com/llvm/llvm-project/pull/87646
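A worked example of the offset problem, as a sketch under a loudly stated assumption: the padding sits at the most significant bits of the stored value. LLVM IR does not guarantee this in general, and the helper names are hypothetical.

```python
def store_size_in_bits(type_bits: int) -> int:
    """Round the type size up to a whole number of bytes."""
    return (type_bits + 7) // 8 * 8


def big_endian_byte_offset(container_bits: int, narrow_bits: int, bit_offset: int) -> int:
    """Byte offset of a narrowed piece on big-endian.

    bit_offset counts from the least significant bit of the value.
    """
    return (container_bits - narrow_bits - bit_offset) // 8


# i20 is stored in 3 bytes. The low 8 bits live in the last byte (offset 2).
correct = big_endian_byte_offset(store_size_in_bits(20), 8, 0)
# Using the type size (20) instead of the store size (24) gives offset 1.
wrong = big_endian_byte_offset(20, 8, 0)
```

The two results differ only because the store size is larger than the type size, which is the case the offset calculation did not account for.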
Following the aix-small-local-exec-tls target attribute, this patch adds
a target attribute for an AIX-specific option in llc that informs the
compiler that it can use a faster access sequence for the local-dynamic
TLS model (formally named aix-small-local-dynamic-tls) when TLS
variables are less than ~32KB in size.
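The ~32KB bound is consistent with the 16-bit signed displacement field of PowerPC D-form instructions such as addi. A minimal sketch of that range check follows; the helper name is hypothetical, and treating this as the reason for the bound is an assumption rather than something the patch description states.

```python
def fits_simm16(displacement: int) -> bool:
    """True if the value fits a 16-bit signed immediate."""
    return -(1 << 15) <= displacement < (1 << 15)
```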
The patch either produces an addi/la with a displacement off of module
handle (return value from .__tls_get_mod) when the address is
calculated, or it produces an addi/la followed by a load/store when the
address is calculated and used for further accesses.
---------
Co-authored-by: Amy Kwan <amy.kwan1@ibm.com>
If the V2 of the vector_shuffle is undef, the two vector inputs are
expected to be the same when doing the VECINSERT transformation.
Currently the first operand of VECINSERT is set to undef, which is not
right. This patch fixes this bug.
rldimi is a 64-bit instruction. For backward compatibility, it needs to
be expanded into a series of rotate and mask operations in 32-bit
environments. In the future, we may improve the bit permutation selector
and remove such direct codegen.
`RegisterClassInfo::getRegPressureSetLimit` has been changed to return a
smaller value than before so the limit may become negative in later
calculations. As a workaround, change to use
`TargetRegisterInfo::getRegPressureSetLimit`.
Also improve tests.
Previously we wouldn't remove dead copies from basic blocks with
successors. The comment said we didn't want to trust the live-in lists.
The comment is very old so I'm not sure if that's still a concern today.
This patch checks the live-in lists and removes copies from
MaybeDeadCopies if they are referenced by any live-ins in any
successors. We only do this if the tracksLiveness property is set. If
that property is not set, we retain the old behavior.
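The new rule can be sketched with plain sets. All names here are hypothetical, a copy is assumed to be "referenced" when its destination register appears in a successor's live-in list, and sub-/super-register aliasing is ignored.

```python
def removable_copies(maybe_dead: dict, successor_live_ins: list, tracks_liveness: bool) -> set:
    """Return the copies that may be deleted at the end of a block.

    maybe_dead maps a copy name to its destination register;
    successor_live_ins holds one live-in set per successor block.
    """
    if not successor_live_ins:
        # No successors: every candidate is dead (behavior before the patch).
        return set(maybe_dead)
    if not tracks_liveness:
        # Live-in lists cannot be trusted: keep everything, as before.
        return set()
    live = set().union(*successor_live_ins)
    return {copy for copy, reg in maybe_dead.items() if reg not in live}
```

For example, with candidates `{"c1": "r3", "c2": "r4"}` and a single successor whose live-ins are `{"r4"}`, only `c1` is removable.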
Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.
The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
only generated for local-exec TLS variables annotated with
"aix-small-tls", only if they are less than ~32KB in size.
For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.
This patch updates the frame lowering code to calculate the offsets as 64-bit values and resolves the overflows, resulting in the correct codegen for very large frames.
Fixes #48911
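The overflow can be reproduced with a small model of 32-bit versus 64-bit signed arithmetic. The helper names are hypothetical and the offset is an arbitrary value just past 2^31.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Interpret value as a two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


offset = 2**31 + 16                   # local that sits just past 2 GiB
as_int32 = wrap_signed(offset, 32)    # what a 32-bit `int` would hold
as_int64 = wrap_signed(offset, 64)    # what a 64-bit offset holds
```

Stored in 32 bits the offset turns negative, so the frame lowering code would address the wrong location; stored in 64 bits it is preserved.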
rldimi is a 64-bit instruction, so the corresponding builtin should not
be available in 32-bit mode. The rotate amount should be in range, and
cases where the mask is zero need special handling.
This change also swaps the first and second operands of rldimi/rlwimi
to match previous behavior. For masks not ending at bit 63-SH,
rotation will be inserted before rldimi.
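For reference, the instruction's behavior can be modelled as below. This sketch follows the Power ISA description of rldimi (rotate left by SH, then insert under MASK(MB, 63-SH)) with IBM bit numbering, where bit 0 is the most significant bit. The Python function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, n: int) -> int:
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def ppc_mask(mb: int, me: int) -> int:
    """Ones in bits mb..me (IBM numbering), wrapping around if mb > me."""
    from_mb = MASK64 >> mb                    # bits mb..63
    up_to_me = (MASK64 << (63 - me)) & MASK64  # bits 0..me
    return from_mb & up_to_me if mb <= me else from_mb | up_to_me


def rldimi(ra: int, rs: int, sh: int, mb: int) -> int:
    """Rotate rs left by sh, then insert it into ra under the mask."""
    mask = ppc_mask(mb, 63 - sh)
    return (rotl64(rs, sh) & mask) | (ra & ~mask & MASK64)
```

Because the mask always ends at bit 63-SH, a requested mask that ends elsewhere cannot be encoded directly, which matches the note above that an extra rotation is inserted before rldimi in that case.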
Exploit the per global code model attribute on AIX. On AIX we need to
update both the code sequence used to access the global (either 1 or 2
instructions for small and large code model respectively) and the
storage mapping class used when emitting the TOC entry.
---------
Co-authored-by: Amy Kwan <akwan0907@gmail.com>
In preparation for adding a similar instruction for large code model on
AIX for 32-bit, rename the existing 64-bit ADDItocL instruction to
ADDItocL8 to match the naming convention of other instructions with
32-bit and 64-bit variants.
This restores commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Previously reverted in f010b1bef4dda2c7082cbb41dbabf1f149cce306.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
This reverts commit c7fdd8c11e54585dc9d15d63de9742067e0506b9.
Reason: Broke the sanitizer buildbots. See the comments at
https://github.com/llvm/llvm-project/pull/71785
for more information.
These builtins are already there in Clang, however current codegen may
produce suboptimal results due to their complex behavior. Implement them
as intrinsics to ensure expected instructions are emitted.
Original commit 79889734b940356ab3381423c93ae06f22e772c9.
Previously reverted in commit a2afcd5721869d1d03c8146bae3885b3385ba15e.
LLVM function calls carry convergence control tokens as operand bundles, where
the tokens themselves are produced by convergence control intrinsics. This patch
implements convergence control tokens in MIR as follows:
1. Introduce target-independent ISD opcodes and MIR opcodes for convergence
control intrinsics.
2. Model token values as untyped virtual registers in MIR.
The change also introduces an additional ISD opcode CONVERGENCECTRL_GLUE and a
corresponding machine opcode with the same spelling. This glues the convergence
control token to SDNodes that represent calls to intrinsics. The glued token is
later translated to an implicit argument in the MIR.
The lowering of calls to user-defined functions is target-specific. On AMDGPU,
the convergence control operand bundle at a non-intrinsic call is translated to
an explicit argument to the SI_CALL_ISEL instruction. Post-selection adjustment
converts this explicit argument to an implicit argument on the SI_CALL
instruction.
Delete the code that skips the CFI for the condition register on ELF32.
The code checked !MustSaveCR, which happened only when
Subtarget.is32BitELFABI(), where spillCalleeSavedRegisters is spilling
cr in a different way. The spill was missing CFI. After deleting this
code, a spill of cr2 to cr4 gets CFI in the same way as a spill of r14
to r31.
Fixes #83094