5010 Commits

Author SHA1 Message Date
David Green
ab9a80a3ad
[DAG] Allow AssertZExt to scalarize. (#122463)
With range and undef metadata on a call we can have vector AssertZExt
generated on a target with no vector operations. The AssertZExt needs to
scalarize to a normal `AssertZext tin, ValueType`. I have added
AssertSext too, although I do not have a test case.

Fixes #110374
2025-01-11 16:29:06 +00:00
Antonio Frighetto
446a426436 [ARM] Record store with pre/post-indexed addressing as mayStore
Fix a miscompilation observed during machine sinking by recording stores
with pre/post-indexed addressing as mayStore.

Fixes: https://github.com/llvm/llvm-project/issues/121299.
2025-01-07 09:39:05 +01:00
Antonio Frighetto
7810e6a3a8 [ARM] Introduce test for PR121565 (NFC) 2025-01-07 09:39:05 +01:00
Björn Pettersson
3ad2399148
[DAGCombiner] Refactor and improve ReduceLoadOpStoreWidth (#119564)
This patch makes a couple of improvements to ReduceLoadOpStoreWidth.

When determining the minimum size of "NewBW" we now take byte boundaries
into account. If we for example touch bits 6-10 we shouldn't accept
NewBW=8, because we would fail later when detecting that we can't access
bits from two different bytes in memory using a single load. Instead we
make sure to align LSB/MSB according to byte size boundaries up front
before searching for a viable "NewBW".

In the past we only tried to find a "ShAmt" that was a multiple of
"NewBW", but now we use a sliding window technique to scan for a viable
"ShAmt" that is a multiple of the byte size. This can help find
more opportunities for optimization (especially if the original type
isn't byte sized, and for big-endian targets when the original
load/store is aligned on the most significant bit).
2024-12-16 12:15:11 +01:00
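The byte-boundary alignment and sliding-window search described in #119564 can be sketched roughly as follows. This is a minimal Python model and not the DAGCombiner code: `find_narrow_access`, its parameters, and the list of candidate widths are hypothetical, bit numbering is little-endian only, and all legality and profitability checks are ignored.

```python
def find_narrow_access(lsb, msb, orig_bits, widths=(8, 16, 32, 64)):
    """Return (new_bw, sh_amt) for the narrowest byte-aligned window covering
    bits lsb..msb (inclusive) of an orig_bits-wide value, or None.

    Hypothetical helper: little-endian bit numbering, no legality checks.
    """
    # Align the touched range outwards to byte boundaries up front.
    lo = (lsb // 8) * 8        # round LSB down to a byte boundary
    hi = (msb // 8 + 1) * 8    # one past MSB, rounded up to a byte boundary
    for new_bw in widths:
        # The window must cover the aligned range and be a real narrowing.
        if new_bw < hi - lo or new_bw >= orig_bits:
            continue
        # Slide the window in byte-sized steps rather than NewBW-sized steps.
        for sh_amt in range(0, orig_bits - new_bw + 1, 8):
            if sh_amt <= lo and hi <= sh_amt + new_bw:
                return new_bw, sh_amt
    return None
```

Under these assumptions, touching bits 6-10 of a 32-bit value yields a 16-bit access at offset 0 rather than an 8-bit one, and touching bits 12-19 yields a 16-bit access at offset 8, which a search restricted to multiples of NewBW (0 or 16) would not find.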
Pengcheng Wang
da71203e6f
[MISched] Unify the way to specify scheduling direction (#119518)
For pre-ra scheduling, we use two options `-misched-topdown` and
`-misched-bottomup` to force the direction.

While for post-ra scheduling, we use `-misched-postra-direction`
with enumerated values (`topdown`, `bottomup` and `bidirectional`).

This is inconsistent and adds mental burden. Here we replace
these two options `-misched-topdown` and `-misched-bottomup` with
`-misched-prera-direction` with the same enumerated values.

To avoid the condition of `getNumOccurrences() > 0`, we add a new
enum value `Unspecified` and make it the default initial value.

These options are hidden, so we need not preserve compatibility.
2024-12-12 11:24:07 +08:00
Bjorn Pettersson
22780f808a [DAGCombiner] Fix to avoid writing outside original store in ReduceLoadOpStoreWidth (#119203)
DAGCombiner::ReduceLoadOpStoreWidth could replace memory accesses
with narrower loads/stores, although sometimes the new load/store
would touch memory outside the original object. That seemed wrong,
and this patch simply avoids doing the DAG combine in such
situations.

Also simplifying the expression used to align ShAmt down to a multiple
of NewBW. Subtracting (ShAmt % NewBW) should do the same thing as the
old more complicated expression.

The intention is to follow up with a patch that makes more attempts,
trying to align the memory accesses at other offsets, allowing the
transform to trigger in more situations. The current strategy for
deciding size (NewBW) and offset (ShAmt) for the narrowed operations is
a bit ad hoc, and does not really consider big-endian memory order in
the same way as little-endian.
2024-12-11 15:07:16 +01:00
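The two ideas in #119203 (rounding ShAmt down by subtracting the remainder, and rejecting a narrowed access that would reach past the original store) can be illustrated with a small Python sketch. Both function names are hypothetical, and sizes are in bits; the real combine works on store sizes and has further conditions not modelled here.

```python
def align_sh_amt_down(sh_amt, new_bw):
    """Round sh_amt down to a multiple of new_bw by subtracting the remainder."""
    return sh_amt - (sh_amt % new_bw)


def stays_inside_original(sh_amt, new_bw, orig_bits):
    """True if bits [sh_amt, sh_amt + new_bw) lie within the original value."""
    return sh_amt + new_bw <= orig_bits
```

For example, with a 24-bit original store, a 16-bit access at offset 16 would cover bits 16..31 and is rejected, while the same width at offset 0 is accepted.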
Bjorn Pettersson
bc1f3eb593 [DAGCombiner] Pre-commit test case for ReduceLoadOpStoreWidth. NFC
Adding test cases related to narrowing of load-op-store sequences.
ReduceLoadOpStoreWidth isn't careful enough, so it may end up
creating load/store operations that access memory outside the region
touched by the original load/store. Using ARM as a target for the
test cases to show what happens for both little-endian and big-endian.

This patch also adds a way to override the TLI.isNarrowingProfitable
check in DAGCombiner::ReduceLoadOpStoreWidth by using the option
-combiner-reduce-load-op-store-width-force-narrowing-profitable.
The idea is that this should make it simpler to, for example, add lit
tests verifying that the code is correct for big-endian (which is
otherwise difficult since there are no in-tree big-endian targets that
override TLI.isNarrowingProfitable).

This is a pre-commit for
  https://github.com/llvm/llvm-project/pull/119203
2024-12-11 15:07:15 +01:00
Sergei Barannikov
e0ed0333f0
Reland "[ARM] Stop gluing ALU nodes to branches / selects" (#118887)
Re-landing #116970 after fixing miscompilation error.

The original change made it possible for CMPZ to have multiple uses;
`ARMDAGToDAGISel::SelectCMPZ` was not prepared for this.

Pull Request: https://github.com/llvm/llvm-project/pull/118887


Original commit message:

Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.
2024-12-07 10:14:36 +03:00
Oliver Stannard
2d8e8dd2b8
[ARM] Add Cortex-A510 CPU for AArch32 (#118811)
This core was originally AArch64-only, but the r1p0 revision added
optional support for AArch32 at EL0.

TRM: https://developer.arm.com/documentation/101604/0103
2024-12-06 08:51:22 +00:00
Oliver Stannard
99b862efba
[DAGISel][ARM] Fix vector truncate combine for big-endian (#118101)
This DAG combine was incorrect for big-endian targets, because it
assumes that when a bitcast changes the lane width, the
least-significant bits of the wider lanes are in the lower-numbered
lanes of the smaller type, which is only true for little-endian.
2024-12-04 14:32:15 +00:00
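The endianness assumption behind #118101 can be reproduced outside the compiler. The sketch below models a lane-width-changing bitcast as a store followed by a load, which is how LLVM IR defines `bitcast` between vector types; the helper name is hypothetical and the example says nothing about how ARM vector registers are laid out.

```python
import struct


def bitcast_i32_to_2xi16(value, big_endian):
    """Model bitcast i32 -> <2 x i16> as a store followed by a load."""
    order = ">" if big_endian else "<"
    raw = struct.pack(order + "I", value)          # store the wide lane
    return list(struct.unpack(order + "2H", raw))  # reload as two narrow lanes
```

On little-endian, lane 0 of the result holds the least-significant half of the wide lane; on big-endian it holds the most-significant half, so a combine that picks "the low lanes" to implement a truncate is only correct for little-endian.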
Simon Pilgrim
e6eac65ad6 [ARM] 2012-03-13-DAGCombineBug.ll - regenerate checks 2024-12-02 11:46:49 +00:00
Martin Storsjö
2a5e1da57a
Revert "[ARM] Stop gluing ALU nodes to branches / selects" (#118232)
Reverts llvm/llvm-project#116970.

This change broke Wine compiled for armv7, causing segfaults when
starting Wine. See llvm/llvm-project#116970 for more detailed discussion
about the issue.
2024-12-02 00:02:25 +02:00
Sergei Barannikov
a348f223ca
[ARM] Stop gluing ALU nodes to branches / selects (#116970)
Following #116547 and #116676, this PR changes the type of results and
operands of some nodes to accept / return a normal type instead of Glue.

Unfortunately, changing the result type of one node requires changing
the operand types of all potential consumer nodes, which in turn
requires changing the result types of all other possible producer nodes.
So this is a bulk change.

Pull Request: https://github.com/llvm/llvm-project/pull/116970
2024-11-30 08:14:24 +03:00
Sergei Barannikov
61a23646c9
[SjLjEHPrepare] Configure call sites correctly (#117656)
After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site`
before all invoke instructions except the one in the entry block, which
has the effect of bypassing landing pads on exceptions.

When configuring the call site for a potentially throwing instruction
check that it is not `InvokeInst` -- they are handled by earlier code.
2024-11-27 08:03:47 +03:00
Sergei Barannikov
ad9dcd96dc
Reland "[ARM] Stop gluing FP comparisons to FMSTAT" (#117248)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

`TRI::getCrossCopyRegClass` is modified in a way that prevents DAG
scheduler from copying FPSCR into a virtual register. The register
allocator might need to spill the virtual register, but that only seems
to work in Thumb mode.
2024-11-22 22:29:58 +03:00
Sergei Barannikov
5d32a1409d
Revert "[ARM] Stop gluing FP comparisons to FMSTAT" (#117175)
Reverts llvm/llvm-project#116676

Reverting per post-commit feedback (causes miscompilation errors and/or
assertion failures).
2024-11-21 18:26:53 +03:00
Sergei Barannikov
8c56dd3040
[ARM] Stop gluing FP comparisons to FMSTAT (#116676)
Following #116547, this changes the result of `ARMISD::CMPFP*` and the
operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type.

This change allows comparisons to be CSEd and scheduled around as can be
seen in the test changes.

Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is
going to be changed in a separate patch.

This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a
negative value. The reason is the same as for CCR register class: it
makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV`
register to / from virtual registers. Previously, this was not
necessary, since no attempt was made to create copies in the first
place.

There might be a case when a copy can't be avoided (although not found
in existing tests). If a copy is necessary, the virtual register will be
created with `cl_FPSCR_NZCV` register class. If this register class is
inappropriate, `TRI::getCrossCopyRegClass` should be modified to return
the correct class.

Pull Request: https://github.com/llvm/llvm-project/pull/116676
2024-11-20 16:07:05 +03:00
Sergei Barannikov
aff98e4be0
[ARM] Stop gluing 1-bit shifts (#116547)
1. When two (or more) nodes are glued, DAG scheduler will always
schedule them as one piece, i.e. it will not allow any instructions to
be scheduled between them. It does so because if nodes are glued this
usually means that there is an implicit register dependency between
them, and an intervening node could clobber this physical register. When
emitting such nodes into machine IR, they will also be stuck together,
e.g.:
```
    %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr
    %10:gpr = RRX %3, implicit $cpsr
```

2. If a node has Glue result, SelectionDAG will not try to CSE this
node. If it did, it would break the implicit physical register
dependency. In practice this means that if a node with Glue result has
multiple uses, it has to be duplicated before each use. This is the reason
for `ARMTargetLowering::duplicateCmp` to exist.

When using normal data dependency, dependent nodes can freely be
scheduled around. If there is a physical register dependency between
nodes, the physical register will be copied to/from a virtual register,
allowing other nodes to intervene between them. The resulting machine IR
might look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = COPY $cpsr
    %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg
    $cpsr = COPY %10
    %13:gpr = RRX %3, implicit $cpsr
```

The two copies are likely to be eliminated by register coalescer, given
that there are no instructions between them that clobber this physical
register. If the copies are unwanted in the first place (they could be
expensive or impossible), DAG scheduler will try to avoid inserting them
wherever possible, and the resulting machine IR will look like this:
```
    %9:gpr = LSRs1 killed %8, implicit-def $cpsr
    %10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg
    %11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg
    %12:gpr = RRX %3, implicit $cpsr
```

On ARM, arithmetic operations and LSLS already use the new data flow
approach. This patch extends it to include 1-bit shifts.

Pull Request: https://github.com/llvm/llvm-project/pull/116547
2024-11-19 17:46:48 +03:00
Akshat Oke
3f9d02aae8
[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326)
With this, all machine SSA optimization passes are available in the new codegen pipeline.
2024-11-18 11:02:01 +05:30
Serge Pavlov
f97f96492d
[GlobalISel][ARM] Legalize reset_fpmode (#115859)
Implement lowering of the `reset_fpmode` intrinsic in GlobalISel for the
ARM target.
2024-11-16 17:21:33 +07:00
Tex Riddell
5c2a133b13
Emit constrained atan2 intrinsic for clang builtin (#113636)
This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

- `Builtins.td` - Add f16 support for libm atan2 builtin
- `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin
- `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of
atan2 for clang builtin to lib call calling convention check, now that
atan2 maps to an intrinsic.
- add atan2 cases to llvm.experimental.constrained tests for more
backends: ARM, PowerPC, RISCV, SystemZ.
- LangRef.rst: add llvm.experimental.constrained.atan2, revise
llvm.atan2 description.

Last part of Implement the atan2 HLSL Function. Fixes #70096.
2024-11-12 13:34:29 -08:00
abhishek-kaushik22
d2aff182d3
Revert "TLS loads opimization (hoist)" (#114740)
This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4.

Based on the discussions in #112772, this pass is not needed after the
introduction of `llvm.threadlocal.address` intrinsic.

Fixes https://github.com/llvm/llvm-project/issues/112771.
2024-11-07 10:10:28 +01:00
Paul Walker
38fffa630e
[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548) 2024-11-06 11:53:33 +00:00
Oliver Stannard
2d56de9e7e Revert "[ARM] Add extra tests for CVE-2024-7883 with undef/poison"
Reverting because this causes a test failure in the expensive-checks
buildbot.

This reverts commit ed9dab67e2932baf11bfa514b07b159c3bffd518.
2024-11-06 10:35:44 +00:00
Oliver Stannard
ed9dab67e2 [ARM] Add extra tests for CVE-2024-7883 with undef/poison 2024-11-06 09:28:14 +00:00
Jon Roelofs
4c3e1e3c4a
[llvm][AsmPrinter] Add an option to print instruction latencies (#113243)
... matching what we have in the disassembler. This isn't turned on by
default since several of the scheduling models are not completely
accurate, and we don't want to be misleading.
2024-11-05 17:28:52 -08:00
Simon Pilgrim
aef0e77c76
[DAG] visitAND - Fold (and (srl X, C), 1) -> (srl X, BW-1) for signbit extraction (#114992)
If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we can change the SRL to shift down the MSB directly.

These patterns can occur during legalisation when we've sign extended to a wider type but the SRL is still shifting from the subreg.

Alternative to #114967

Fixes the remaining regression in #112588
2024-11-05 14:42:15 +00:00
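The fold in #114992 rests on a simple bit-level fact: once a value has been sign-extended, every bit from the original sign position upward equals the MSB. A brute-force check in Python, with hypothetical helper names and 8-to-16-bit extension chosen purely for illustration:

```python
def sext(value, from_bits, to_bits):
    """Sign-extend the low from_bits of value to an unsigned to_bits pattern."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def fold_holds(x, c, bw):
    """Check (and (srl x, c), 1) == (srl x, bw - 1) for one bw-bit value."""
    return ((x >> c) & 1) == (x >> (bw - 1))
```

`fold_holds(sext(v, 8, 16), c, 16)` is true for every 8-bit `v` and every `c` from 7 to 15, whereas a value that is not sign-extended (for example `0x0080` with `c = 7`) breaks the equality, which is why the fold needs the extended-sign-bit precondition.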
Yingwei Zheng
917b3d13b5
[SDAG] Intersect poison-generating flags after CSE (#114650)
This patch intersects poison-generating flags after CSE to fix the assertion
failure reported in
https://github.com/llvm/llvm-project/pull/112354#issuecomment-2452369552.

Co-authored-by: Antonio Frighetto <me@antoniofrighetto.com>
2024-11-02 19:06:27 +08:00
Oliver Stannard
33411d5207
[ARM] Fix CMSE S->NS calls when CONTROL_S.SFPA==0 (CVE-2024-7883) (#114433)
When doing a call from CMSE secure state to non-secure state for
v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and
restore the FP registers around the call. These instructions both check
the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents
of the FP registers are not secret) they execute as no-ops.

This causes a problem when CONTROL_S.SFPA==0 before the call, which
happens if there are no floating-point instructions executed between
entry to secure state and the call. If this is the case, then the VLSTM
instruction will do nothing, leaving the save area in the stack
uninitialised. If the called function returns a value in floating-point
registers, the call sequence includes an instruction to copy the return
value from a floating-point register to a GPR, which must be before the
VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM
will fully execute, and load the uninitialised stack memory into the FP
registers.

This causes two problems:
* The FP register file is clobbered, including all of the callee-saved
  registers, which might contain live values.
* The stack region might contain secret values, which will be leaked to
  non-secure state through the floating-point registers if/when we
  return to non-secure state.

The fix is to insert a `vmov s0, s0` instruction before the VLSTM
instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and
VLSTM instruction.

CVE: https://www.cve.org/cverecord?id=CVE-2024-7883
Security bulletin:
https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability
2024-11-01 09:36:13 +00:00
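The failure sequence in #114433 can be walked through with a toy state machine. The sketch below encodes only what the commit message states (VLSTM and VLLDM are no-ops when SFPA is clear, and any FP instruction sets SFPA); the class, its fields, and the `"stale"` marker for uninitialised stack memory are hypothetical, and it is not an architectural model of v8-M.

```python
class ToyFPState:
    """Toy model of the behaviour described above; not an architectural model."""

    def __init__(self, regs):
        self.sfpa = False                       # CONTROL_S.SFPA
        self.regs = list(regs)                  # FP register file
        self.save_area = ["stale"] * len(regs)  # uninitialised stack memory

    def fp_instruction(self):
        self.sfpa = True                        # any FP instruction sets SFPA

    def vlstm(self):
        if not self.sfpa:
            return                              # executes as a no-op
        self.save_area = list(self.regs)        # save the registers ...
        self.regs = [0] * len(self.regs)        # ... and clear them

    def vlldm(self):
        if not self.sfpa:
            return                              # executes as a no-op
        self.regs = list(self.save_area)        # restore from the save area


def secure_to_nonsecure_call(state, with_fix):
    """Run the call sequence and return the FP registers seen afterwards."""
    if with_fix:
        state.fp_instruction()                  # models `vmov s0, s0`
    state.vlstm()
    # ... non-secure callee runs and returns a value in an FP register ...
    state.fp_instruction()                      # models copying the result to a GPR
    state.vlldm()
    return state.regs
```

Without the fix the registers come back as the stale save-area contents; with the extra FP instruction before VLSTM they come back as the values that were live before the call.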
Benjamin Maxwell
c3260c65e8
[IR] Add llvm.sincos intrinsic (#109825)
This adds the `llvm.sincos` intrinsic, legalization, and lowering.

The `llvm.sincos` intrinsic takes a floating-point value and returns
both the sine and cosine (as a struct).

```
declare { float, float }          @llvm.sincos.f32(float  %Val)
declare { double, double }        @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 }    @llvm.sincos.f80(x86_fp80  %Val)
declare { fp128, fp128 }          @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 }  @llvm.sincos.ppcf128(ppc_fp128  %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float>  %Val)
```

The lowering is built on top of the existing FSINCOS ISD node, with
additional type legalization to allow for f16, f128, and vector values.
2024-10-29 10:52:20 +00:00
Serge Pavlov
819abe412d
[Test] Fix usage of constrained intrinsics (#113523)
Some tests contain errors in constrained intrinsic usage, such as missing
or extra type parameters, wrong type parameter order, and others.

---------

Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>
2024-10-28 14:07:32 +07:00
Oliver Stannard
376d7b27fa [ARM] Optimise byval arguments in tail-calls
We don't need to copy byval arguments to tail calls via a temporary, if
we can prove that we are not copying from the outgoing argument area.
This patch does this when the source of the argument is one of:
* Memory in the local stack frame, which can't be used for tail-call
  arguments.
* A global variable.

We can also avoid doing the copy completely if the source and
destination are the same memory location, which is the case when the
caller and callee have the same signature, and pass some arguments
through unmodified.
2024-10-25 09:34:09 +01:00
Oliver Stannard
914a3990d1 [ARM] Avoid clobbering byval arguments when passing to tail-calls
When passing byval arguments to tail-calls, we need to store them into
the stack memory in which the caller received its arguments. If
any of the outgoing arguments are forwarded from incoming byval
arguments, then the source of the copy is from the same stack memory.
This can result in the copy corrupting a value which is still to be
read.

The fix is to first make a copy of the outgoing byval arguments in local
stack space, and then copy them to their final location. This fixes the
correctness issue, but results in extra copying, which could be
optimised.
2024-10-25 09:34:09 +01:00
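The clobbering problem and the two-step fix can be shown with a list standing in for stack memory. This is a sketch only: both function names, the `(src, dst)` move representation, and the fixed block size are hypothetical and unrelated to the ARM lowering code.

```python
def copy_direct(stack, moves, size):
    """Copy each (src, dst) block in order, straight into the argument area."""
    for src, dst in moves:
        stack[dst:dst + size] = stack[src:src + size]


def copy_via_temporaries(stack, moves, size):
    """First snapshot every source into local temporaries, then write them out."""
    temps = [stack[src:src + size] for src, _ in moves]
    for (_, dst), temp in zip(moves, temps):
        stack[dst:dst + size] = temp
```

With `stack = list("AABB")` and `moves = [(0, 2), (2, 0)]` (two incoming byval arguments forwarded in swapped order), the direct copy leaves `AAAA` because the second source was overwritten before it was read, while the version with temporaries produces the intended `BBAA`.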
Oliver Stannard
78ec2e2ed5 [ARM] Allow tail calls with byval args
Byval arguments which are passed partially in registers get stored into
the local stack frame, but it is valid to tail-call them because the
part which gets spilled is always re-loaded into registers before doing
the tail-call, so it's OK for the spill area to be deallocated.
2024-10-25 09:34:08 +01:00
Oliver Stannard
82e6472197 [ARM] Allow functions with sret returns to be tail-called
It is valid to tail-call a function which returns through an sret
argument, as long as we have an incoming sret pointer to pass on.
2024-10-25 09:34:08 +01:00
Oliver Stannard
c1eb790cd2 [ARM] Tail-calls do not require caller and callee arguments to match
The ARM backend was checking that the outgoing values for a tail-call
matched the incoming argument values of the caller. This isn't
necessary, because the caller can change the values in both registers
and the stack before doing the tail-call. The actual limitation is that
the callee can't need more stack space for its arguments than the
caller does.

This is needed for code using the musttail attribute, as well as
enabling tail calls as an optimisation in more cases.
2024-10-25 09:34:08 +01:00
Oliver Stannard
e3f218096c [ARM] Re-generate a test 2024-10-25 09:34:07 +01:00
Vladimir Radosavljevic
401d123a1f
[MCP] Optimize copies when src is used during backward propagation (#111130)
Before this patch, redundant COPY couldn't be removed for the following
case:
```
  $R0 = OP ...
  ... // Read of $R0
  $R1 = COPY killed $R0
```
This patch adds support for tracking the users of the source register
during backward propagation, so that we can remove the redundant COPY in
the above case and optimize it to:
```
  $R1 = OP ...
  ... // Replace all uses of $R0 with $R1
```
2024-10-23 13:37:02 +02:00
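The rewrite described in #111130 can be mimicked on a toy instruction list. The sketch below is an assumption-laden illustration, not MachineCopyPropagation: instructions are `(dest, opcode, operands)` tuples, the function name is hypothetical, the trailing COPY is assumed to kill its source, and the only safety check is that the destination register is untouched in between.

```python
def propagate_copy_backward(block):
    """Fold a trailing `dst = COPY src` into the instruction that defines src.

    Hypothetical toy: assumes a non-empty block whose COPY kills src.
    Returns the block unchanged when the rewrite does not apply.
    """
    *body, (dst, opcode, operands) = block
    if opcode != "COPY" or len(operands) != 1:
        return block
    src = operands[0]
    # Find the most recent definition of src.
    defs = [i for i, (d, _, _) in enumerate(body) if d == src]
    if not defs:
        return block
    start = defs[-1]
    # dst must be neither read nor written between the def and the COPY.
    for d, _, ops in body[start + 1:]:
        if d == dst or dst in ops:
            return block
    # Rename the def, then every intervening use of src; drop the COPY.
    renamed = [(dst, body[start][1], body[start][2])]
    for d, op, ops in body[start + 1:]:
        renamed.append((d, op, tuple(dst if o == src else o for o in ops)))
    return body[:start] + renamed
```

Tracking the intervening readers of the source register is what lets the COPY be removed even though `$R0` is used between its definition and the COPY.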
David Spickett
dd76d9b1bb
[llvm][ARM] Correct the properties of trap instructions (#113287)
Fixes #113154

The encodings used for llvm.trap() on ARM were all marked as barriers
and terminators. This led to stack frame destroy code being inserted
before the trap if the trap was the last thing in the function and it
had no return statement.
```
void fn() {
  volatile int i = 0;
  __builtin_trap();
}
```
Produced:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        mov     sp, r11
        pop     {r11, lr}   << stack frame destroy
        .inst   0xe7ffdefe  << trap
        bx      lr
```
All the other targets don't mark them this way, instead they mark them
with isTrap. I've changed ARM to do this, which fixes the code
generation:
```
fn:
        push    {r11, lr}   << stack frame create
<...>
        .inst   0xe7ffdefe  << trap
        mov     sp, r11
        pop     {r11, lr}   << stack frame destroy
        bx      lr
```
I've updated the existing trap test to force the need for a stack frame,
then check that the instruction immediately after the trap is resetting
the stack pointer.

debugtrap was already working but I've added the same checks for it
anyway.
2024-10-23 09:06:12 +01:00
Simon Pilgrim
94cddcfc1c [ARM] Add reduced regression test for infinite-loop due to #112710 2024-10-20 13:53:26 +01:00
Alex Rønne Petersen
5785cbb405
[llvm] Ensure that soft float targets don't emit fma() libcalls. (#106615)
The previous behavior could be harmful in some edge cases, such as
emitting a call to `fma()` in the `fma()` implementation itself.

Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`.
This was already done for PowerPC; this commit just extends that to Arm,
z/Arch, and x86. MIPS and SPARC already got it right, but I added tests
for them too, for good measure.

Note: I don't have commit access.
2024-10-19 06:13:15 -07:00
Alex Rønne Petersen
ad4a582fd9
[llvm] Consistently respect naked fn attribute in TargetFrameLowering::hasFP() (#106014)
Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.

Note: I don't have commit access.
2024-10-18 09:35:42 +04:00
gxlayer
4a2bd78f5b
[ARM] Fix -mno-omit-leaf-frame-pointer flag doesn't works on 32-bit ARM (#109628)
Make the -mno-omit-leaf-frame-pointer flag work on 32-bit ARM
architectures. This addresses the bug reported in #108019.
2024-10-17 20:25:06 +08:00
Albert Huang
aa2c0f35a1
[ARM] [AArch32] Add support for Arm China STAR-MC1 CPU (#110085)
STAR-MC1 is an Armv8-M CPU.

Technical specifications available at:

https://www.armchina.com/download/Documents/Application-Notes/Technical-Reference-Manual?infoId=160
2024-10-14 15:48:12 +01:00
Akshat Oke
8b20f1b924
[MIR] Fix tests for flags in register info (#112179)
[MIR] Serialize virtual register flags #110228 introduces register flags
which appear empty in .mir dumps. Future tests should use
`-simplify-mir`.
2024-10-14 18:28:54 +05:30
Serge Pavlov
52e5683ddd
[GlobalISel][ARM] Legalization of G_CONSTANT using constant pool (#98308)
ARM uses a complex encoding of immediate values in a small number of
bits. As a result, some values cannot be represented as immediate
operands and need to be synthesized in a register. This change
implements legalization of such constants by loading the values from
the constant pool.

---------

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2024-10-14 16:40:21 +07:00
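To see why some constants cannot be immediate operands, recall that an A32 data-processing immediate is an 8-bit value rotated right by an even amount. The check below is a sketch of that rule only; the function names are hypothetical, Thumb encodings differ, and the actual legalizer may accept further forms (for example inverted or multi-instruction sequences) that are not modelled here.

```python
def is_arm_modified_immediate(value):
    """True if a 32-bit value is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 0x100:
            return True
    return False


def materialize(value):
    """Pick a strategy name: immediate operand or constant-pool load."""
    return "immediate" if is_arm_modified_immediate(value) else "constant-pool load"
```

Values such as `0xFF`, `0xFF000000`, and `0xF000000F` pass the check, while `0x101`, `0x1FE`, and `0x12345678` do not and would fall back to the constant-pool path in this simplified model.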
Oliver Stannard
1e49670b31
[DAGISel] Keep flags when converting FP load/store to integer (#111679)
This DAG combine replaces a floating-point load/store pair which has no
other uses with an integer one, but did not copy the memory operand
flags to the new instructions, resulting in it dropping the volatile
flag. This optimisation is still valid if one or both of the
instructions is volatile, so we can copy over the whole
MachineMemOperand to generate volatile integer loads and stores where
needed.
2024-10-10 09:17:50 +01:00
YunQiang Su
d52c8408ff
SelectionDAG/expandFMINNUM_FMAXNUM: skips vector if SETCC/VSELECT is not legal (#109570)
If SETCC or VSELECT is not legal for the vector type, we should not
expand it; instead we can split the vectors.

That way, simple scalar instructions can be emitted instead of
pairs of comparison+selection.
2024-10-10 08:39:25 +08:00
Ard Biesheuvel
2e47b93fd2
[ARM] Honour -mno-movt in stack protector handling (#109022)
When -mno-movt is passed to Clang, the ARM codegen correctly avoids
movt/movw pairs to take the address of __stack_chk_guard in the stack
protector code emitted into the function pro- and epilogues. However,
the Thumb2 codegen fails to do so, and happily emits movw/movt pairs
unless it is generating an ELF binary and the symbol might be in a
different DSO. Let's incorporate a check for useMovt() in the logic
here, so movt/movw are never emitted when -mno-movt is specified.

Suggestions welcome for how/where to add a test case for this.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-10-09 09:34:17 -07:00
Ramkumar Ramachandra
3fee3e83a8
KnownBits: refine srem for high-bits (#109121)
KnownBits::srem does not correctly set the leading zero bits, omitting
the fact that LHS may be known-negative or known-non-negative. Fix this.

Alive2 proof: https://alive2.llvm.org/ce/z/Ugh-Dq
2024-09-27 12:00:50 +01:00
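The arithmetic fact that makes the refinement in #109121 possible is that a truncating signed remainder takes the sign of its left-hand side and never exceeds it in magnitude. The Python sketch below checks this by brute force; it is not the KnownBits algorithm, and both helper names are hypothetical.

```python
def srem(a, b):
    """Signed remainder that truncates toward zero, like LLVM's srem."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def leading_zeros(value, bits):
    """Count leading zero bits of a non-negative value in a bits-wide word."""
    return bits - value.bit_length()
```

For a known-non-negative LHS, the result is non-negative with at least as many leading zeros as the LHS; for a known-negative LHS, the result lies between the LHS and zero.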