1037 Commits

Author SHA1 Message Date
Mikhail Gudim
562146499c
[CodeGen][NewPM] Port ReachingDefAnalysis to new pass manager. (#159572)
In this commit:
  (1) Added new pass manager support for `ReachingDefAnalysis`.
  (2) Added printer pass.
  (3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
2025-09-19 09:38:34 -04:00
Folkert de Vries
8a9e3333dd
s390x: optimize 128-bit fshl and fshr by high values (#154919)
Turn a funnel shift by N in the range `121..128` into a funnel shift in
the opposite direction by `128 - N`. Because there are dedicated
instructions for funnel shifts by values smaller than 8, this emits
fewer instructions.

This additional rule is useful because LLVM appears to canonicalize
`fshr` into `fshl`, meaning that the rules for `fshr` on values less
than 8 would not match on organic input.
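
The rewrite relies on the identity `fshl(a, b, N) == fshr(a, b, 128 - N)` for shift amounts that are nonzero modulo 128. A minimal sketch of that identity, assuming the LangRef semantics of the funnel-shift intrinsics and modelling i128 values as Python integers (the helper names are illustrative, not LLVM APIs):

```python
import random

BITS = 128
MASK = (1 << BITS) - 1

def fshl(a: int, b: int, n: int) -> int:
    """Funnel shift left: top BITS bits of (a:b) shifted left by n mod BITS."""
    n %= BITS
    wide = (a << BITS) | b          # a is the most significant half
    return ((wide << n) >> BITS) & MASK

def fshr(a: int, b: int, n: int) -> int:
    """Funnel shift right: low BITS bits of (a:b) shifted right by n mod BITS."""
    n %= BITS
    wide = (a << BITS) | b
    return (wide >> n) & MASK

def check_identity(trials: int = 100) -> bool:
    """Check both directions of the rewrite for N in 121..127 on random inputs."""
    rng = random.Random(0)
    for _ in range(trials):
        a, b = rng.getrandbits(BITS), rng.getrandbits(BITS)
        for n in range(121, 128):
            if fshl(a, b, n) != fshr(a, b, BITS - n):
                return False
            if fshr(a, b, n) != fshl(a, b, BITS - n):
                return False
    return True
```

The sketch only checks the arithmetic; it says nothing about which instructions the backend selects for either form.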
2025-08-27 09:31:49 +02:00
Folkert de Vries
558657298a
s390x: pattern match saturated truncation (#155377)
Simplify min/max instruction matching by making the related
SelectionDAG operations legal.

Add patterns to match (signed and unsigned) saturated
truncation based on open-coded min/max patterns.
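
As an illustration of what an open-coded saturated truncation computes, here is a small sketch on plain integers. The function names and the choice of a 16-bit target width in the usage are assumptions for the example and are not taken from the patch:

```python
def sat_trunc_signed(x: int, bits: int) -> int:
    """Clamp a signed value into the signed range of a `bits`-wide integer."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return min(max(x, lo), hi)      # the max/min pair is the matched pattern

def sat_trunc_unsigned(x: int, bits: int) -> int:
    """Clamp a non-negative value into the unsigned range of `bits` bits."""
    return min(x, (1 << bits) - 1)  # unsigned saturation only needs a min
```

For example, `sat_trunc_signed(40000, 16)` clamps to the largest signed 16-bit value, while an in-range input passes through unchanged.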

Fixes https://github.com/llvm/llvm-project/issues/153655
2025-08-26 17:19:58 +02:00
Nikita Popov
63e7766047
[SystemZ] Allow forming overflow op for i128 (#153557)
Allow matching i128 overflow pattern into UADDO, which then allows use
of vaccq.
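
A common open-coded form of unsigned add overflow is "the wrapped sum is smaller than an operand". The sketch below, which models i128 as Python integers and uses illustrative helper names, checks that this form agrees with the carry-out that a UADDO-style operation reports:

```python
BITS = 128
MASK = (1 << BITS) - 1

def uaddo(a: int, b: int):
    """Return (wrapped sum, carry-out) for an unsigned BITS-wide addition."""
    total = a + b
    return total & MASK, total > MASK

def open_coded_overflow(a: int, b: int) -> bool:
    """Overflow test written without a carry flag: the wrapped sum is below a."""
    wrapped = (a + b) & MASK
    return wrapped < a
```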
2025-08-14 16:15:22 +02:00
KRM7
ee47427386
[RegisterCoalescer] Fix subrange update when rematerialization widens a def (#151974)
Currently, when an instruction rematerialized by the register coalescer
defines more subregs of the destination register
than the original COPY instruction did, we only add dead defs for the
newly defined subregs if they were not defined anywhere
else. For example, consider something like this before
rematerialization:
```
 %0:reg64 = CONSTANT 1
 %1:reg128.sub_lo64_lo32 = COPY %0.lo32
 %1:reg128.sub_lo64_hi32 = ...
 ...
```
that would look like this after rematerializing `%0`:
```
 %0:reg64 = CONSTANT 1
 %1:reg128.sub_lo64 = CONSTANT 1
 %1:reg128.sub_lo64_hi32 = ...
 ...
```
A dead def would not be added for `%1.sub_lo64_hi32` at the 2nd
instruction because its subrange wasn't empty beforehand.
2025-08-05 22:32:31 +09:00
Matt Arsenault
12568b6a4f
SystemZ: Add sincos intrinsic test (#147473)
The ZOS run line is mostly broken. update_test_checks seems
to not work on it and I have no idea what I'm looking at here.
It's not obvious to me what the calls are. I added some checks
for the references to the libcalls printed at the end of the module,
but didn't check anything in the function body. half also just
asserts somewhere.
2025-08-05 12:55:26 +09:00
sujianIBM
fc12fc635b
[SystemZ] Fix code in widening vector multiplication (#150836)
Commit cdc7864 has an error which would wrongly fold widening
multiplications into an even/odd widening operation.
This PR fixes it and adds tests to check that scenarios which should not
be folded into an even/odd widening operation are in fact not folded.
2025-07-31 13:18:23 -04:00
Simon Pilgrim
c37942df00
[DAG] visitFREEZE - limit freezing of multiple operands (#149797)
This is a partial revert of #145939 (I've kept the BUILD_VECTOR(FREEZE(UNDEF), FREEZE(UNDEF), elt2, ...) canonicalization) as we're getting reports of infinite loops (#148084).

The issue appears to be due to deep chains of nodes and how visitFREEZE replaces all instances of an operand with a common frozen version - other users of the original frozen node then get added back to the worklist but might no longer be able to confirm a node isn't poison due to recursion depth limits on isGuaranteedNotToBeUndefOrPoison.

The issue still exists with the old implementation but by only allowing a single frozen operand it helps prevent cases of interdependent frozen nodes.

I'm still working on supporting multiple operands as it's critical for topological DAG handling but need to get a fix in for trunk and 21.x.

Fixes #148084
2025-07-22 15:40:55 +01:00
Trevor Gross
0db197adef
[Test] Mark a number of libcall tests nounwind (#148329)
Many tests for floating point libcalls include CFI directives, which
aren't needed for the purpose of these tests. Mark some of the relevant
test functions `nounwind` in order to remove this noise.
2025-07-12 11:57:28 +02:00
Vikram Hegde
fcd4a2fe7a
[CodeGen][NewPM] Port "PostRAMachineSink" pass to NPM (#129690)
2025-07-10 13:10:46 +05:30
Fangrui Song
68494ae072
[XRay] xray_fn_idx: fix alignment directive
Use `emitValueToAlignment` as the section does not contain code.
`emitCodeAlignment` would lead to ALIGN relocations on RISC-V and
LoongArch with linker relaxation.

In addition, change the alignment to wordsize, sufficient for the
runtime requirement (`XRayFunctionSledIndex`).

Related to #147322
2025-07-08 21:52:53 -07:00
Matt Arsenault
026307958b
SystemZ: Remove unnecessary requires asserts from test (#147477)
2025-07-09 09:28:57 +09:00
Matt Arsenault
1e26443cf9
CodeGen: Remove redundant REQUIRES registered-target from backend tests (#147475)
These are already applied to all the tests in the target subdirectory
2025-07-09 09:25:53 +09:00
Dominik Steenken
acdf1c7526
[DAG] Add generic expansion for ISD::FCANONICALIZE nodes (#142105)
This PR takes the work previously done by @pawan-nirpal-031 on X86 in
#106370, and makes it available in common code. This should enable all
targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data
types.

Canonicalization is implemented here as multiplication by `1.0`, as
suggested in [the
docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
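
A minimal sketch of the "multiply by `1.0`" idea, using Python floats (IEEE binary64) as a stand-in for the `f(16|32|64|128)` types. The function name is illustrative, and the sketch does not demonstrate signaling-NaN quieting, which depends on the hardware:

```python
import math

def canonicalize(x: float) -> float:
    """Model of the generic expansion: an fmul by 1.0 leaves the value unchanged."""
    return x * 1.0

# Ordinary values, signed zeros, infinities and NaN-ness all survive the multiply.
examples = [1.5, -0.0, float("inf"), float("-inf")]
preserved = all(
    canonicalize(v) == v
    and math.copysign(1.0, canonicalize(v)) == math.copysign(1.0, v)
    for v in examples
)
```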
2025-07-08 16:12:17 +01:00
Guy David
76274eb2b3
[PHIElimination] Revert #131837 #146320 #146337 (#146850)
Reverting because mis-compiles:
- https://github.com/llvm/llvm-project/pull/131837
- https://github.com/llvm/llvm-project/pull/146320
- https://github.com/llvm/llvm-project/pull/146337
2025-07-03 07:48:08 -04:00
Kai Nacke
ebcf7f91ff
[SystemZ][HLASM] Emit END instruction (#146110)
A HLASM source file must end with the END instruction. It is implemented
by adding a new function to the target streamer. This change also turns
SystemZHLASMSAsmString.h into a proper header file, and only uses the
SystemZTargetHLASMStreamer when HLASM output is generated.
2025-07-02 10:08:25 -04:00
woruyu
bbcebec3af
[DAG] Refactor X86 combineVSelectWithAllOnesOrZeros fold into a generic DAG Combine (#145298)
This PR resolves https://github.com/llvm/llvm-project/issues/144513

The modification includes five patterns:
1. vselect Cond, 0, 0 → 0
2. vselect Cond, -1, 0 → bitcast Cond
3. vselect Cond, -1, x → or Cond, x
4. vselect Cond, x, 0 → and Cond, x
5. vselect Cond, 000..., X → andn Cond, X

Patterns 1-4 have been migrated to DAGCombine. Pattern 5 is still in x86 code.
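
The five folds listed above can be checked on a toy model where each condition lane is either all zeros or all ones. This is a sketch under that assumption, with a 32-bit lane width and helper names chosen for the example only:

```python
LANE_BITS = 32
ALL_ONES = (1 << LANE_BITS) - 1

def vselect(cond, t, f):
    """Per-lane select; each cond lane must be 0 or ALL_ONES."""
    return [tv if c == ALL_ONES else fv for c, tv, fv in zip(cond, t, f)]

def check_folds(cond, x) -> bool:
    zeros = [0] * len(cond)
    ones = [ALL_ONES] * len(cond)
    return (
        vselect(cond, zeros, zeros) == zeros                                   # fold 1
        and vselect(cond, ones, zeros) == cond                                 # fold 2
        and vselect(cond, ones, x) == [c | v for c, v in zip(cond, x)]         # fold 3
        and vselect(cond, x, zeros) == [c & v for c, v in zip(cond, x)]        # fold 4
        and vselect(cond, zeros, x) == [(~c & ALL_ONES) & v for c, v in zip(cond, x)]  # fold 5
    )
```

Fold 5 is the and-not form, which is why a generic combine would have to express it as and+xor.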

The reason is that the andn instruction cannot be used directly in
DAGCombine; only and+xor is available, which introduces optimization
order issues. For example, in the x86 backend, for select Cond, 0, x →
(~Cond) & x, the backend first checks whether the cond node of
(~Cond) is a setcc node. If so, it modifies the comparison operator
of the condition, so the x86 backend cannot complete the andn
optimization. In short, I think it is a better choice to keep the pattern
vselect Cond, 000..., X instead of and+xor in combineDAG.

For the commits, the first contains the code changes and x86 tests
(note 1), and the second contains tests in other backends (note 2).

---------

Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2025-07-02 15:07:48 +01:00
Simon Pilgrim
38200e94f1
[DAG] visitFREEZE - always allow freezing multiple operands (#145939)
Always try to fold freeze(op(....)) -> op(freeze(),freeze(),freeze(),...).

This patch proposes we drop the opt-in limit for opcodes that are allowed to push a freeze through the op to freeze all its operands, through the tree towards the roots.

I'm struggling to find a strong reason for this limit apart from the DAG freeze handling being immature for so long - as we've improved coverage in canCreateUndefOrPoison/isGuaranteedNotToBeUndefOrPoison it looks like the regressions are not as severe.

Hopefully this will help some of the regression issues in #143102 etc.
2025-07-02 11:28:37 +01:00
Guy David
f5c62ee0fa
[PHIElimination] Reuse existing COPY in predecessor basic block (#131837)
The insertion point of COPY isn't always optimal and could eventually
lead to a worse block layout, see the regression test in the first
commit.

This change affects many architectures but the total number of
instructions in the test cases seems to be slightly lower.
2025-06-29 21:28:42 +03:00
Kai Nacke
655d04859b
[GOFF] Add writing of text records (#137235)
Sections which are not allowed to carry data are marked as virtual. The
only complication when writing out the text is that it must be written in
chunks of 32k-1 bytes, which is done by having a wrapper stream writing
those records.
Data of BSS sections is not written, since the contents are known to be
zero. Instead, the fill byte value is used.
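
The chunking constraint can be sketched as follows. This only models splitting a payload into pieces of at most 32k-1 bytes; record headers and the actual wrapper stream in the patch are not modelled, and the function name is an assumption for the example:

```python
MAX_CHUNK = 32 * 1024 - 1  # 32k-1 bytes per text record payload

def split_text(data: bytes, max_chunk: int = MAX_CHUNK) -> list:
    """Split `data` into consecutive chunks of at most `max_chunk` bytes."""
    return [data[i:i + max_chunk] for i in range(0, len(data), max_chunk)]
```

Joining the chunks in order reproduces the original payload, and an empty payload yields no chunks.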
2025-06-26 13:50:40 -04:00
Kai Nacke
33872f1218
[GOFF] Add writing of section symbols (#133799)
Unlike other formats, the GOFF object file format uses a 2 dimensional structure
to define the location of data. For example, the equivalent of the ELF .text
section is made up of a Section Definition (SD) and a class (Element Definition;
ED). The name of the SD symbol depends on the application, while the class has
the predefined name C_CODE/C_CODE64 in AMODE31 and AMODE64 respectively.

Data can be placed into this structure in 2 ways. First, the data (in a text
record) can be associated with an ED symbol. To refer to data, a Label
Definition (LD) is used to give an offset into the data a name. When binding,
the whole data is pulled into the resulting executable, and the addresses
given by the LD symbols are resolved.

The alternative is to use a Part Definition (PR). In this case, the data (in
a text record) is associated with the part. When binding, only the data of
referenced PRs is pulled into the resulting binary.

Both approaches are used. SD, ED, and PR elements are modeled by nested
MCSectionGOFF instances, while LD elements are associated with MCSymbolGOFF
instances.

At the binary level, a record called "External Symbol Definition" (ESD) is used. The
ESD has a type (SD, ED, PR, LD), and depending on the type a different subset of
the fields is used.
2025-06-26 11:52:14 -04:00
Simon Pilgrim
eeb206d688
[SystemZ] vec-max-min-zerosplat.ll - regenerate checks
Reduces codegen diff in #145298
2025-06-25 11:18:00 +01:00
Stephen Tozer
757a0e6d3b
[SystemZ] Treat FAKE_USE instructions as instructions without a size (#144390)
This patch fixes an error in which `FAKE_USE` instructions would trigger
an assertion in SystemZLongBranch due to them having a size of 0 without
being excepted in the assertion that each instruction, other than a set
of known 0-size instruction types, should have a non-0 size.

`FAKE_USE` instructions are no-op instructions that are emitted into
LLVM by the `-fextend-variable-liveness` clang flag to help preserve the
liveness of source variables in optimized code, and therefore they
should be understood as being valid size 0 instructions.
2025-06-18 10:29:23 +01:00
Iris Shi
24d730b380
Reland "[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets" (#143651)
2025-06-11 15:56:37 +08:00
Iris Shi
8c890eaa3f
Revert "[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets" (#143648)
2025-06-11 10:19:12 +08:00
Iris Shi
bfb48363b0
[SelectionDAG] Make (a & x) | (~a & y) -> (a & (x ^ y)) ^ y available for all targets (#137641)
2025-06-09 17:57:15 +08:00
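
The fold named in this commit title is a bitwise identity: where a bit of `a` is set the result takes the bit from `x`, otherwise from `y`. A quick sketch that checks both forms agree on 32-bit values (the width and the function names are assumptions for the example):

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit values

def select_with_or(a: int, x: int, y: int) -> int:
    """Original form: (a & x) | (~a & y)."""
    return ((a & x) | (~a & y)) & MASK

def select_with_xor(a: int, x: int, y: int) -> int:
    """Rewritten form: (a & (x ^ y)) ^ y, which needs no inverted copy of a."""
    return ((a & (x ^ y)) ^ y) & MASK

def forms_agree(trials: int = 1000) -> bool:
    rng = random.Random(0)
    return all(
        select_with_or(a, x, y) == select_with_xor(a, x, y)
        for a, x, y in (
            (rng.getrandbits(32), rng.getrandbits(32), rng.getrandbits(32))
            for _ in range(trials)
        )
    )
```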
Craig Topper
e4b4a939f8
[MCP] Disable BackwardCopyPropagateBlock for copies with implicit registers. (#137687)
If there's an implicit-def of a super register, the propagation
must preserve this implicit-def. Knowing how and when to do this
may require target specific knowledge so just disable it for now.

Prior to 2def1c4, we checked that the copy had exactly 2 explicit
operands; when that check was removed we started allowing implicit
operands through. This patch adds a check for implicit operands, but
still allows extra explicit operands, which was the goal of 2def1c4.

Fixes #131478.
2025-05-08 16:27:08 -07:00
Dominik Steenken
173ec728d2
[SystemZ] Fix a bug introduced by #135767 (#138280)
Commit `083b4a3d66` introduced a store-and-load pair around the `BRASL`
call to mcount. That load instruction did not properly declare its
target register as defined, leading to a bad machine instruction.

This commit fixes this by explicitly labeling `%r14` on the load as
`def`.
2025-05-02 19:01:13 +02:00
Dominik Steenken
083b4a3d66
[SystemZ] Add proper mcount handling (#135767)
When compiling with `-pg`, the `EntryExitInstrumenterPass` will insert
calls to the glibc function `mcount` at the beginning of each
`MachineFunction`.

On SystemZ, these calls require special handling:

- The call to `mcount` needs to happen at the beginning of the prologue.
- Prior to the call to `mcount`, register `%r14`, the return address of
the callee function, must be stored 8 bytes above the stack pointer
`%r15`. After the call to `mcount` returns, that register needs to be
restored.

This commit adds some special handling to the EntryExitInstrumenterPass
that keeps the insertion of the mcount function into the module, but
skips over insertion of the actual call in order to perform this
insertion in the `emitPrologue` function. There, a simple sequence of
store/call/load is inserted, which implements the above.

The desired change in the `EntryExitInstrumenterPass` necessitated the
addition of a new attribute and attribute kind to each function, which
is used to trigger the postprocessing, aka call insertion, in
`emitPrologue`. Note that the new attribute must be of a different kind
than the `mcount` attribute, since otherwise it would replace that
attribute and later be deleted by the code that intended to delete
`mcount`. The new attribute is called `insert-mcount`, while the
attribute kind is `systemz-backend`, to clearly mark it as a
SystemZ-specific backend concern.

This PR should address issue #121137 . The test inserted here is derived
from the example given in that issue.
2025-05-02 12:42:58 +02:00
Ulrich Weigand
be7ef6c52b
[MachineLICM] Recognize registers clobbered at EH landing pad entry (#122446)
EH landing pad entry implicitly clobbers target-specific exception
pointer and exception selector registers. The post-RA MachineLICM pass
needs to take these into account when deciding whether to hoist an
instruction out of the loop that initializes one of these registers.

Fixes: https://github.com/llvm/llvm-project/issues/122315
2025-04-25 22:27:27 +02:00
Jonas Paulsson
94a14f9f0d
[SystemZ] Add DAGCombine for FCOPYSIGN to remove rounding. (#136131)
Add a DAGCombine for FCOPYSIGN that removes the rounding which is never
needed as the sign bit is already in the correct place. This helps in particular the
rounding to f16 case which needs a libcall.

Also remove the roundings for other FP VTs and simplify the CPSDR
patterns correspondingly.

fp-copysign-03.ll test updated, now also covering the other FP VT
combinations.
2025-04-24 11:05:51 +02:00
Jonas Paulsson
1ec22fae7e
[SystemZ] Handle f16 load positive/negative/complement without libcalls. (#136286)
This can be done directly with the (64-bit) target instruction as only the sign bit
is changed.
2025-04-24 10:49:40 +02:00
Jonas Paulsson
6d03f51f0c
[SystemZ] Add support for 16-bit floating point. (#109164)
- _Float16 is now accepted by Clang.

- The half IR type is fully handled by the backend.

- These values are passed in FP registers and converted to/from float around
  each operation.

- Compiler-rt conversion functions are now built for s390x including the missing
  extendhfdf2 which was added.

Fixes #50374
2025-04-16 20:02:56 +02:00
Sergei Barannikov
3050061793
[AsmPrinter] Link .section_sizes to the correct section (#135583)
AsmPrinter may switch the current section when e.g., emitting a jump
table for a switch. `.stack_sizes` should still be linked to the
function section. If the section is wrong, readelf emits a warning
"relocation symbol is not in the expected section".
2025-04-14 20:04:50 +03:00
Dominik Steenken
e071233fa5
[SystemZ] Consider VST/VL as SimpleBDXStore/Load (#135623)
Previously `vst` and `vl` were not considered "simple" BDX stores and
loads, leading to, among other things, some opportunities for `mvc`
optimization to be missed.

This PR addresses this and updates some tests to account for additional
`mvc` instructions being emitted.

This is observed to have a neutral or slightly beneficial effect
performance-wise.
2025-04-14 18:58:57 +02:00
Ulrich Weigand
80267f8148
Support z17 processor name and scheduler description (#135254)
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.

This patch also adds the scheduler description for the z17 processor,
provided by Jonas Paulsson.
2025-04-11 00:20:58 +02:00
tltao
18189430ab
[SystemZ] Add check for INIT_UNDEF in getInstSizeInBytes (#134661)
Due to some optimization changes, INIT_UNDEF is making its way to
`getInstSizeInBytes` in `llvm/lib/Target/SystemZ/SystemZLongBranch.cpp`
but we do not have an exception there in the assert. Since INIT_UNDEF is
described as being similar to IMPLICIT_DEF and there is a check for
IMPLICIT_DEF, it seems logical to also add a check for INIT_UNDEF.

---------

Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
2025-04-10 16:16:20 -04:00
Dominik Steenken
e9a3ea2218
[SystemZ, DebugInfo] Instrument SystemZ backend passes for Instr-Ref DebugInfo (#133061)
This PR instruments the optimization passes in the SystemZ backend with
calls to `MachineFunction::substituteDebugValuesForInst` where
instruction substitutions are made to instructions that may compute
tracked values.

Tests are also added for each of the substitutions that were inserted.
Details on the individual passes follow.

### systemz-copy-physregs
When a copy targets an access register, we redirect the copy via an
auxiliary register. This leads to the final result being written by a
newly inserted SAR instruction, rather than the original MI, so we need
to update the debug value tracking to account for this.

### systemz-long-branch
This pass relaxes relative branch instructions based on the actual
locations of blocks. Only one of the branch instructions qualifies for
debug value tracking: BRCT, i.e. branch-relative-on-count, which
subtracts 1 from a register and branches if the result is not zero. This
is relaxed into an add-immediate and a conditional branch, so any
`debug-instr-number` present must move to the add-immediate instruction.

### systemz-post-rewrite
This pass replaces `LOCRMux` and `SELRMux` pseudoinstructions with
either the real versions of those instructions, or with branching
programs that implement the intent of the Pseudo. In all these cases,
any `debug-instr-number` attached to the pseudo needs to be reallocated
to the appropriate instruction in the result, either LOCR, SELR, or a
COPY.

### systemz-elim-compare
Similar to systemz-long-branch, for this pass, only a few substitutions
are necessary, since it mainly deals with conditional branch
instructions. The only exceptions are again branch-relative-on-count, as
it modifies a counter as part of the instruction, as well as any of the
load instructions that are affected.
2025-03-31 19:30:06 +02:00
Dominik Steenken
f24cf59d7a
[SystemZ] Add is(LoadFrom|StoreTo)StackSlotPostFE to SystemZBackend (#132928)
As part of an effort to enable instr-ref-based debug value tracking,
this PR implements `SystemZInstrInfo::isLoadFromStackSlotPostFE`, as
well as `SystemZInstrInfo::isStoreToStackSlotPostFE`. The implementation
relies upon the presence of MachineMemoryOperands on the relevant
`MachineInstr`s in order to access the `FrameIndex` post frame index
elimination.

Since these new functions are only meant to be called after frame-index
elimination, they assert against the presence of a frame index on the
base register operand of the instruction.

Outside of the utility of these functions to enable instr-ref-based
debug value tracking, they also change the behavior of the AsmPrinter,
since it will now be able to properly detect non-folded spills and
reloads, so this changes a number of tests that were checking
specifically for folded reloads.

Note that there are some tests that still check for `vst` and `vl` as
folded spills/reloads even though they should be straight reloads. This
will be addressed in a future PR.

Co-authored-by: Dominik Steenken <dominik.steenken@gmail.com>
2025-03-25 15:03:54 +01:00
tltao
f7a32b85b5
[MC][SystemZ] Introduce Target Specific HLASM Streamer for z/OS (#130535)
A more fleshed out version of a previous PR
https://github.com/llvm/llvm-project/pull/107415. The goal is to provide
platforms an alternative to the current MCAsmStreamer which only
supports the GNU Asm syntax.

RFC:
https://discourse.llvm.org/t/rfc-llvm-add-support-for-target-specific-asm-streamer/85095

---------

Co-authored-by: Tony Tao <tonytao@ca.ibm.com>
2025-03-21 11:36:35 -04:00
Ulrich Weigand
f4ea1055ad
[SystemZ] Implement i128 funnel shifts
These can be handled via the VECTOR SHIFT LEFT/RIGHT DOUBLE
family of instructions, depending on architecture level.

Fixes: https://github.com/llvm/llvm-project/issues/129955
2025-03-15 18:28:44 +01:00
Ulrich Weigand
4155cc0fb3
[SystemZ] Recognize carry/borrow computation
Generate code using the VECTOR ADD COMPUTE CARRY and
VECTOR SUBTRACT COMPUTE BORROW INDICATION instructions
to implement open-coded IR with those semantics.

Handles integer vector types as well as i128.

Fixes: https://github.com/llvm/llvm-project/issues/129608
2025-03-15 18:28:44 +01:00
Ulrich Weigand
4a4987be36
[SystemZ] Optimize vector zero/sign extensions
Generate more efficient code for zero or sign extensions where
the source is a subvector generated via SHUFFLE_VECTOR.

Specifically, recognize patterns corresponding to (series of)
VECTOR UNPACK instructions, or the VECTOR SIGN EXTEND TO
DOUBLEWORD instruction.

As a special case, also handle zero or sign extensions of a
vector element to i128.

Fixes: https://github.com/llvm/llvm-project/issues/129576
Fixes: https://github.com/llvm/llvm-project/issues/129899
2025-03-15 18:28:44 +01:00
Ulrich Weigand
cdc7864986
[SystemZ] Optimize widening and high-word vector multiplication
Detect (non-intrinsic) IR patterns corresponding to the semantics
of the various widening and high-word multiplication instructions.

Specifically, this is done by:
- Recognizing even/odd widening multiplication patterns in DAGCombine
- Recognizing widening multiply-and-add on top during ISel
- Implementing the standard MULHS/MULHU IR opcodes
- Detecting high-word multiply-and-add (which common code does not)

Depending on architecture level, this can support all integer
vector types as well as the scalar i128 type.
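
For reference, the high-word multiplication semantics mentioned in the list above can be sketched on plain integers: the result is the upper half of the double-width product, computed with signed or unsigned operands. The helper names and the explicit `bits` parameter are assumptions for the example:

```python
def to_signed(v: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of v as a two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v

def mulhu(a: int, b: int, bits: int) -> int:
    """Unsigned high-word multiply: upper `bits` bits of the 2*bits-wide product."""
    mask = (1 << bits) - 1
    return (((a & mask) * (b & mask)) >> bits) & mask

def mulhs(a: int, b: int, bits: int) -> int:
    """Signed high-word multiply; Python's >> on negative ints is arithmetic."""
    mask = (1 << bits) - 1
    product = to_signed(a, bits) * to_signed(b, bits)
    return (product >> bits) & mask
```

The two differ only when an operand has its top bit set, for instance when the all-ones pattern is read as -1 rather than as the largest unsigned value.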

Fixes: https://github.com/llvm/llvm-project/issues/129705
2025-03-15 18:28:44 +01:00
Ulrich Weigand
7af3d3929e
[SystemZ] Optimize vector comparison reductions
Generate efficient code using the condition code set by the
VECTOR (FP) COMPARE family of instructions to implement
vector comparison reductions, e.g. as resulting from
__builtin_reduce_and/or of some vector comparison.

Fixes: https://github.com/llvm/llvm-project/issues/129434
2025-03-15 18:28:44 +01:00
Jonas Paulsson
85318bae28
[MachineLateInstrsCleanup] Handle multiple kills for a preceding definition. (#119132)
When removing a redundant definition in order to reuse an earlier
identical one it is necessary to remove any earlier kill flag as well.

Previously, the assumption has been that any register that kills the
defined Reg is enough to handle for this purpose, but this is actually
not quite enough. A kill of a super-register does not necessarily imply
that all of its subregs (including Reg) are defined at that point: a
partial definition of a register is legal. This means Reg may have been
killed earlier and is not live at that point.

This patch changes the tracking of kill flags to allow for multiple
flags to be removed: instead of remembering just the single / latest
kill flag, a vector is now used to track and remove them all.
TinyPtrVector seems ideal for this as there is only very rarely more
than one kill flag, and it doesn't seem to give much difference in
compile time.

The kill flags handling here is making this pass much more complicated
than it would have to be. This pass does not depend on kill flags for
its own use, so an interesting alternative to all this handling would be
to just remove them all. If there actually is a serious user, maybe that pass
could instead recompute them.

Also adding an assertion which is unrelated to kill flags, but it seems
to make sense (according to liberal assertion policy), to verify that
the preceding definition is in fact identical in clearKillsForDef().

Fixes #117783
2025-03-13 15:50:54 +01:00
Akshat Oke
af4ec59f8d
[CodeGen][NPM] Port ExpandPostRAPseudos to NPM (#129509)
2025-03-04 11:49:09 +05:30
Akshat Oke
77f44a9642
[CodeGen][NewPM] Port MachineSink to NPM (#115434)
Targets can set the EnableSinkAndFold option in CGPassBuilderOptions for
the NPM pipeline in buildCodeGenPipeline(... &Opts, ...)
2025-03-03 15:49:37 +05:30
Akshat Oke
aa1fe57b19
[RegAlloc][NewPM] Plug Greedy RA in codegen pipeline (#120557)
Use `-passes="regallocgreedy<[all|sgpr|wwm|vgpr]>"` to insert the greedy
RA with a filter and `-regalloc-npm=<type>` to control which RA to use
in the existing pipeline.
2025-03-03 11:06:15 +05:30
Jonas Paulsson
c298f71ea6
[SystemZ] Fix regstate of SELRMux operand in selectSLRMux(). (#128555)
It seems that there can be other cases with this that also can lead to
wrong code (discovered with csmith). This time it involved not the kill
flag but the undef flag.

Use the intersection of the flags from both MachineOperand:s instead
of the RegState from just one of them.
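
Taking the intersection of two flag sets keeps a flag only when both operands carry it. A small sketch of that idea, using a hypothetical `RegFlag` enum with made-up values rather than LLVM's actual `RegState` constants:

```python
from enum import IntFlag

class RegFlag(IntFlag):
    """Hypothetical operand flags; the values are illustrative only."""
    NONE = 0
    KILL = 1
    UNDEF = 2

def merged_flags(first: RegFlag, second: RegFlag) -> RegFlag:
    """Keep only the flags that are set on both source operands."""
    return first & second
```

With this rule, a kill or undef flag present on only one of the two source operands is dropped from the merged operand.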
2025-02-28 15:03:04 +01:00