LBARX loads a byte from memory into a register, automatically setting
the remaining bits of the register to zero. If a subsequent RLWINM
instruction is used to clear those same bits (which LBARX has already
set to zero), the RLWINM is redundant and can be eliminated.
these redundant clear instructions are introduced by 85a9f2e14859b.
This is to fix an assertion failure with PeepholePPC64. The load/store
offset can be negative. A reduced case from one of our failures is added
as well.
DenseSet, SmallPtrSet, SmallSet, SetVector, and StringSet recently
gained C++23-style insert_range. This patch replaces:
Dest.insert(Src.begin(), Src.end());
with:
Dest.insert_range(Src);
This patch does not touch custom begin like succ_begin for now.
https://reviews.llvm.org/D111433 introduced PPCISD::CALL_RM for
-frounding-math. -msecure-plt -frounding-math {-fpic,-fPIC} codegen for
PPC32 became incorrect when a function contains function calls but no
global variable references (GlobalBaseReg).
As reported by @q66 , musl/src/dirent/closedir.c implements such a
function, which is miscompiled.
PPCISD::CALL has custom logic to set up the base register
(https://reviews.llvm.org/D42112). Add an extra case for CALL_RM.
While here, improve the test to
* actually test `case PPCISD::CALL`: we need a non-leaf function that
doesn't access global variables (global variables lead to
GlobalBaseReg, which call `getGlobalBaseReg()` as well).
* test `ExternalSymbolSDNode` with a memset.
Supersedes: #72758
Pull Request: https://github.com/llvm/llvm-project/pull/121281
SDNode::use_iterator now returns an SDUse& when dereferenced.
SDNode::user_iterator returns SDNode*. SDNode::use_begin/use_end/uses
work on use_iterator. SDNode::user_begin/user_end/users work on
user_iterator.
We can now write range based for loops using SDUse& and SDNode::uses().
I've converted many of these in this patch. I didn't update loops that
have additional variables updated in their for statement.
Some loops use SDNode::use_iterator::getOperandNo() which also prevents
using range based for loops. I plan to move this into SDUse in a follow
up patch.
Most of these are just places that want the first user and aren't
iterating over the whole list.
While there I changed some use_size() == 1 to hasOneUse() which
is more efficient.
This is part of an effort to rename use_iterator to user_iterator
and provide a use_iterator that dereferences to SDUse&. This patch
helps reduce the diff on later patches.
This function is most often used in range based loops or algorithms
where the iterator is implicitly dereferenced. The dereference returns
an SDNode * of the user rather than SDUse * so users() is a better name.
I've long beeen annoyed that we can't write a range based loop over
SDUse when we need getOperandNo. I plan to rename use_iterator to
user_iterator and add a use_iterator that returns SDUse& on dereference.
This will make it more like IR.
This is to prevent assertion failures when we disable implicit
truncation in getConstant().
getCanonicalConstSplat() works with a mix of unsigned and signed values,
so I explicitly truncate the APInt there.
- Fix build with `EXPENSIVE_CHECKS`
- Remove unused `PassName::ID` to resolve warning
- Mark `~SelectionDAGISel` virtual so AArch64 backend can work properly
This reverts commit de37c06f01772e02465ccc9f538894c76d89a7a1 to
de37c06f01772e02465ccc9f538894c76d89a7a1
It still breaks EXPENSIVE_CHECKS build. Sorry.
Port selection dag isel to new pass manager.
Only `AMDGPU` and `X86` support new pass version. `-verify-machineinstrs` in new pass manager belongs to verify instrumentation, it is enabled by default.
This patch adds support for toc-data for 64-bit large code-model on AIX.
The sequence ADDIStocHA8/ADDItocL8 is used to access the data directly
from the TOC.
When emitting the instruction ADDIStocHA8, we check if the symbol has
toc-data attribute before creating a toc entry for it. When emitting the
instruction ADDItocL8, we use the LA8 instruction to load the address.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
38 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
Similar to 3f46e5453d9310b15d974e876f6132e3cf50c4b1, this patch allows
the backend to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided, for local-exec TLS
variables that are annotated with the "aix-small-tls" attribute.
The expectation is for local-exec TLS variables to be set with this
attribute through PGO. Furthermore, the optimized access sequence is
only generated for local-exec TLS variables annotated with
"aix-small-tls", only if they are less than ~32KB in size.
Exploit the per global code model attribute on AIX. On AIX we need to
update both the code sequence used to access the global (either 1 or 2
instructions for small and large code model respectively) and the
storage mapping class that we emit the toc entry.
---------
Co-authored-by: Amy Kwan <akwan0907@gmail.com>
In preparation of adding a similar instruction for large code model on
AIX for 32-bit, rename the exisitng ADDItocL 64-instruction to ADDItocL8
to match the naming convention of other instructions with 32-bit and
64-bit variants.
In IR or C code, shift amount larger than value size is undefined
behavior. But in practice, backend lowering for shift_parts produces
add/sub of shift amounts, thus constant shift amounts might be
negative or larger than value size, which depends on ISA definition.
PowerPC ISA says, the lowest 7 bits (6 bits for 32-bit instruction)
will be taken, and if the highest among them is 1, result will be
zero, otherwise the low 6 bits (or 5 on 32-bit) are used as shift
amount.
This commit emulates the behavior and avoids array overflow in bit
permutation's value bits calculator.
This patch utilizes the -maix-small-local-exec-tls option to produce a
faster,
non-TOC-based access sequence for the local-exec TLS model.
Specifically, for
when the offsets from the TLS variable are non-zero.
In particular, this patch produces either a single:
- addi/la with a displacement off of R13 plus a non-zero offset for when
an address is calculated, or
- load or store off of R13 plus a non-zero offset for when an address is
calculated and used for further
access where R13 is the thread pointer, respectively.
In order to produce a single addi or load/store off of the thread
pointer with a non-zero offset,
this patch also adds the necessary support in the assembly printer when
printing these instructions.
Specifically:
- The non-zero offset is added to the TLS variable address when the
address of the
TLS variable + it's offset is less than 32KB.
- Otherwise, when the address of the TLS variable + its offset is
greater than 32KB, the
non-zero offset (and a multiple of 64KB) is subtracted from the TLS
address.
This handling in the assembly printer is necessary to ensure that the
TLS address + the non-zero offset
is between [-32768, 32768), so that the total displacement can fit
within the addi/load/store instructions.
This patch is meant to be a follow-up to
3f46e5453d9310b15d974e876f6132e3cf50c4b1 (where the
optimization occurs for when the offset is zero).
This patch adds support for common and local symbols in the TOC for AIX.
Note that we need to update isVirtualSection so as a common symbol in
TOC will have the symbol type XTY_CM and will be initialized when placed
in the TOC so sections with this type are no longer virtual.
---------
Co-authored-by: Zaara Syeda <syzaara@ca.ibm.com>
This follows on from #76708, allowing
`cast<ConstantSDNode>(N)->getZExtValue()` to be replaced with just
`N->getAsZextVal();`
Introduced via `git grep -l "cast<ConstantSDNode>\(.*\).*getZExtValue" |
xargs sed -E -i
's/cast<ConstantSDNode>\((.*)\)->getZExtValue/\1->getAsZExtVal/'` and
then using `git clang-format` on the result.
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.
Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
This reverts commit ee643b706be2b6bef9980b25cc9cc988dab94bb5.
Fix up build failures in targets I missed in #66003
Kept as 3 commits for reviewers to see better what's changed. Will
squash when
merging.
- reland [InlineAsm] wrap ConstraintCode in enum class NFC (#66003)
- fix all the targets I missed in #66003
- fix off by one found by llvm/test/CodeGen/SystemZ/inline-asm-addr.ll
This patch utilizes the -maix-small-local-exec-tls option added in
D155544 to produce a faster access sequence for the local-exec TLS
model, where loading from the TOC can be avoided.
The patch either produces an addi/la with a displacement off of r13
(the thread pointer) when the address is calculated, or it produces an
addi/la followed by a load/store when the address is calculated and
used for further accesses.
This patch also optimizes this sequence a bit more where we can remove
the addi/la when the load/store offset is 0. A follow up patch will
be posted to account for when the load/store offset is non-zero, and
currently in these situations we keep the addi/la that precedes the
load/store.
Furthermore, this access sequence is only performed for TLS variables
that are less than ~32KB in size.
Differential Revision: https://reviews.llvm.org/D155600
Summary: Materialization a 64-bit constant with High32=Low32 only requires 2 instructions instead of 3 when Low32 can be materialized in 1 instruction.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D158495
This patch tries to catch a codegen opportunity where the rotate and
mask can be merged into a single RLDCL instruction.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D158328
Set the ReplaceFlags variable to false, since there is code meant only
for the ADDItocHi/ADDItocL nodes. This has the side effect of disabling
the peephole when the load/store instruction has a non-zero offset.
This patch also fixes retrieving the `ImmOpnd` node from the AIX small
code model pseduos and does the same for the register operand node.
This allows cleaning up the later calls to replaceOperands.
Finally move calculating the MaxOffset into the code guarded by
ReplaceFlags as it is only used there and the comment is specific to the ELF
ABI.
Fixes https://github.com/llvm/llvm-project/issues/63927
Differential Revision: https://reviews.llvm.org/D155957
Followup to D101178 - peephole optimization that converts a
load address instruction and a consuming load/store into just the
load/store when its safe to do so.
eg: converts the 2 instruction code sequence
la 4, i[TD](2)
stw 3, 0(4)
to
stw 3, i[TD](2)
Differential Revision: https://reviews.llvm.org/D101470
This patch is a follow up to D149722, D152669 and D153645, where a slightly more
optimized code sequence is generated for 64-bit and 32-bit local-exec accesses
when optimizations are turned on.
Handling is added PPCISelDAGToDAG.cpp in order to check if any D-form loads or
stores that follow an PPCISD::ADD_TLS can be optimized to use an X-Form load or
store. In this particular situation, this allows the ADD_TLS node to be removed
completely.
Differential Revision: https://reviews.llvm.org/D150367
This patch is a follow up to D43315, and adds the following new load/store
TLS specific instructions for integer and floating point scalar types:
```
LHAXTLS
LWAXTLS
LHAXTLS_32
LWAXTLS_32
LFSXTLS
LFDXTLS
STFSXTLS
STFDXTLS
```
These instructions can be used to optimized TLS sequences when D-Form
loads/stores follow an ADD_TLS instruction.
Duplicate versions of these instructions are also added within an isAsmParserOnly=1
block (similar to D47382) to allow llvm-mc to assemble these instructions.
Differential Revision: https://reviews.llvm.org/D153645
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
This is rework of;
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachinveValueType.h`
can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024