llvm-project

Author	SHA1	Message	Date
David Green	ab9a80a3ad	[DAG] Allow AssertZExt to scalarize. (#122463 ) With range and undef metadata on a call we can have vector AssertZExt generated on a target with no vector operations. The AssertZExt needs to scalarize to a normal `AssertZext tin, ValueType`. I have added AssertSext too, although I do not have a test case. Fixes #110374	2025-01-11 16:29:06 +00:00
Antonio Frighetto	446a426436	[ARM] Record store with pre/post-indexed addressing as `mayStore` A miscompilation issue observed during machine sinking has been addressed with improved handling. Fixes: https://github.com/llvm/llvm-project/issues/121299.	2025-01-07 09:39:05 +01:00
Antonio Frighetto	7810e6a3a8	[ARM] Introduce test for PR121565 (NFC)	2025-01-07 09:39:05 +01:00
Björn Pettersson	3ad2399148	[DAGCombiner] Refactor and improve ReduceLoadOpStoreWidth (#119564 ) This patch makes a couple of improvements to ReduceLoadOpStoreWidth. When determining the minimum size of "NewBW" we now take byte boundaries into account. If we for example touch bits 6-10 we shouldn't accept NewBW=8, because we would fail later when detecting that we can't access bits from two different bytes in memory using a single load. Instead we make sure to align LSB/MSB according to byte size boundaries up front before searching for a viable "NewBW". In the past we only tried to find a "ShAmt" that was a multiple of "NewBW", but now we use a sliding window technique to scan for a viable "ShAmt" that is a multiple of the byte size. This can help find more opportunities for optimization (especially if the original type isn't byte sized, and for big-endian targets when the original load/store is aligned on the most significant bit).	2024-12-16 12:15:11 +01:00
Pengcheng Wang	da71203e6f	[MISched] Unify the way to specify scheduling direction (#119518 ) For pre-ra scheduling, we use two options `-misched-topdown` and `-misched-bottomup` to force the direction. While for post-ra scheduling, we use `-misched-postra-direction` with enumerated values (`topdown`, `bottomup` and `bidirectional`). This is not unified and adds some mental burdens. Here we replace these two options `-misched-topdown` and `-misched-bottomup` with `-misched-prera-direction` with the same enumerated values. To avoid the condition of `getNumOccurrences() > 0`, we add a new enum value `Unspecified` and make it the default initial value. These options are hidden, so we needn't keep the compatibility.	2024-12-12 11:24:07 +08:00
Bjorn Pettersson	22780f808a	[DAGCombiner] Fix to avoid writing outside original store in ReduceLoadOpStoreWidth (#119203 ) DAGCombiner::ReduceLoadOpStoreWidth could replace memory accesses with narrower loads/stores, although sometimes the new load/store would touch memory outside the original object. That seemed wrong and this patch simply avoids doing the DAG combine in such situations. Also simplifying the expression used to align ShAmt down to a multiple of NewBW. Subtracting (ShAmt % NewBW) should do the same thing as the old more complicated expression. The intention is to follow up with a patch that makes more attempts, trying to align the memory accesses at other offsets, allowing the transform to trigger in more situations. The current strategy for deciding size (NewBW) and offset (ShAmt) for the narrowed operations is a bit ad-hoc, and does not really consider big-endian memory order in the same way as little-endian.	2024-12-11 15:07:16 +01:00
Bjorn Pettersson	bc1f3eb593	[DAGCombiner] Pre-commit test case for ReduceLoadOpStoreWidth. NFC Adding test cases related to narrowing of load-op-store sequences. ReduceLoadOpStoreWidth isn't careful enough, so it may end up creating load/store operations that access memory outside the region touched by the original load/store. Using ARM as a target for the test cases to show what happens for both little-endian and big-endian. This patch also adds a way to override the TLI.isNarrowingProfitable check in DAGCombiner::ReduceLoadOpStoreWidth by using the option -combiner-reduce-load-op-store-width-force-narrowing-profitable. The idea is that it should be simpler to, for example, add lit tests verifying that the code is correct for big-endian (which otherwise is difficult since there are no in-tree big-endian targets that override TLI.isNarrowingProfitable). This is a pre-commit for https://github.com/llvm/llvm-project/pull/119203	2024-12-11 15:07:15 +01:00
Sergei Barannikov	e0ed0333f0	Reland "[ARM] Stop gluing ALU nodes to branches / selects" (#118887 ) Re-landing #116970 after fixing miscompilation error. The original change made it possible for CMPZ to have multiple uses; `ARMDAGToDAGISel::SelectCMPZ` was not prepared for this. Pull Request: https://github.com/llvm/llvm-project/pull/118887 Original commit message: Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change.	2024-12-07 10:14:36 +03:00
Oliver Stannard	2d8e8dd2b8	[ARM] Add Cortex-A510 CPU for AArch32 (#118811 ) This core was originally AArch64-only, but the r1p0 revision added optional support for AArch32 at EL0. TRM: https://developer.arm.com/documentation/101604/0103	2024-12-06 08:51:22 +00:00
Oliver Stannard	99b862efba	[DAGISel][ARM] Fix vector truncate combine for big-endian (#118101 ) This DAG combine was incorrect for big-endian targets, because it assumes that when a bitcast changes the lane width, the least-significant bits of the wider lanes are in the lower-numbered lanes of the smaller type, which is only true for little-endian.	2024-12-04 14:32:15 +00:00
Simon Pilgrim	e6eac65ad6	[ARM] 2012-03-13-DAGCombineBug.ll - regenerate checks	2024-12-02 11:46:49 +00:00
Martin Storsjö	2a5e1da57a	Revert "[ARM] Stop gluing ALU nodes to branches / selects" (#118232 ) Reverts llvm/llvm-project#116970. This change broke Wine compiled for armv7, causing segfaults when starting Wine. See llvm/llvm-project#116970 for more detailed discussion about the issue.	2024-12-02 00:02:25 +02:00
Sergei Barannikov	a348f223ca	[ARM] Stop gluing ALU nodes to branches / selects (#116970 ) Following #116547 and #116676, this PR changes the type of results and operands of some nodes to accept / return a normal type instead of Glue. Unfortunately, changing the result type of one node requires changing the operand types of all potential consumer nodes, which in turn requires changing the result types of all other possible producer nodes. So this is a bulk change. Pull Request: https://github.com/llvm/llvm-project/pull/116970	2024-11-30 08:14:24 +03:00
Sergei Barannikov	61a23646c9	[SjLjEHPrepare] Configure call sites correctly (#117656 ) After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site` before all invoke instructions except the one in the entry block, which has the effect of bypassing landing pads on exceptions. When configuring the call site for a potentially throwing instruction, check that it is not an `InvokeInst` -- they are handled by earlier code.	2024-11-27 08:03:47 +03:00
Sergei Barannikov	ad9dcd96dc	Reland "[ARM] Stop gluing FP comparisons to FMSTAT" (#117248 ) Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. `TRI::getCrossCopyRegClass` is modified in a way that prevents DAG scheduler from copying FPSCR into a virtual register. The register allocator might need to spill the virtual register, but that only seems to work in Thumb mode.	2024-11-22 22:29:58 +03:00
Sergei Barannikov	5d32a1409d	Revert "[ARM] Stop gluing FP comparisons to FMSTAT" (#117175 ) Reverts llvm/llvm-project#116676 Reverting per post-commit feedback (causes miscompilation errors and/or assertion failures).	2024-11-21 18:26:53 +03:00
Sergei Barannikov	8c56dd3040	[ARM] Stop gluing FP comparisons to FMSTAT (#116676 ) Following #116547, this changes the result of `ARMISD::CMPFP*` and the operand of `ARMISD::FMSTAT` from a special `Glue` type to a normal type. This change allows comparisons to be CSEd and scheduled around as can be seen in the test changes. Note that `ARMISD::FMSTAT` is still glued to its consumer nodes; this is going to be changed in a separate patch. This patch also sets `CopyCost` of `cl_FPSCR_NZCV` register class to a negative value. The reason is the same as for CCR register class: it makes DAG scheduler and InstrEmitter try to avoid copies of `FPSCR_NZCV` register to / from virtual registers. Previously, this was not necessary, since no attempt was made to create copies in the first place. There might be a case when a copy can't be avoided (although not found in existing tests). If a copy is necessary, the virtual register will be created with `cl_FPSCR_NZCV` register class. If this register class is inappropriate, `TRI::getCrossCopyRegClass` should be modified to return the correct class. Pull Request: https://github.com/llvm/llvm-project/pull/116676	2024-11-20 16:07:05 +03:00
Sergei Barannikov	aff98e4be0	[ARM] Stop gluing 1-bit shifts (#116547 ) 1. When two (or more) nodes are glued, DAG scheduler will always schedule them as one piece, i.e. it will not allow any instructions to be scheduled between them. It does so because if nodes are glued this usually means that there is an implicit register dependency between them, and an intervening node could clobber this physical register. When emitting such nodes into machine IR, they will also be stuck together, e.g.: ``` %9:gpr = MOVsrl_glue killed %8, implicit-def $cpsr %10:gpr = RRX %3, implicit $cpsr ``` 2. If a node has Glue result, SelectionDAG will not try to CSE this node. If it did, it would break the implicit physical register dependency. In practice this means that if a node with Glue result has multiple uses, it has to be duplicated before each use. This is the reason for `ARMTargetLowering::duplicateCmp` to exist. When using normal data dependency, dependent nodes can freely be scheduled around. If there is a physical register dependency between nodes, the physical register will be copied to/from a virtual register, allowing other nodes to intervene between them. The resulting machine IR might look like this: ``` %9:gpr = LSRs1 killed %8, implicit-def $cpsr %10:gpr = COPY $cpsr %11:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg %12:gpr = BICri killed %11, -2147483648, 14 /* CC::al */, $noreg, $noreg $cpsr = COPY %10 %13:gpr = RRX %3, implicit $cpsr ``` The two copies are likely to be eliminated by register coalescer, given that there are no instructions between them that clobber this physical register. If the copies are unwanted in the first place (they could be expensive or impossible), DAG scheduler will try to avoid inserting them wherever possible, and the resulting machine IR will look like this: ``` %9:gpr = LSRs1 killed %8, implicit-def $cpsr %10:gpr = ORRrsi killed %9, %3, 242, 14 /* CC::al */, $noreg, $noreg %11:gpr = BICri killed %10, -2147483648, 14 /* CC::al */, $noreg, $noreg %12:gpr = RRX %3, implicit $cpsr ``` On ARM, arithmetic operations and LSLS already use the new data flow approach. This patch extends it to include 1-bit shifts. Pull Request: https://github.com/llvm/llvm-project/pull/116547	2024-11-19 17:46:48 +03:00
Akshat Oke	3f9d02aae8	[CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326 ) With this, all machine SSA optimization passes are available in the new codegen pipeline.	2024-11-18 11:02:01 +05:30
Serge Pavlov	f97f96492d	[GlobalISel][ARM] Legalize reset_fpmode (#115859 ) Implement lowering intrinsic `reset_fpmode` in Global Selector for ARM target.	2024-11-16 17:21:33 +07:00
Tex Riddell	5c2a133b13	Emit constrained atan2 intrinsic for clang builtin (#113636 ) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 - `Builtins.td` - Add f16 support for libm atan2 builtin - `CGBuiltin.cpp` - Emit constrained atan2 intrinsic for clang builtin - `clang/test/CodeGenCXX/builtin-calling-conv.cpp` - Use erff instead of atan2 for clang builtin to lib call calling convention check, now that atan2 maps to an intrinsic. - add atan2 cases to llvm.experimental.constrained tests for more backends: ARM, PowerPC, RISCV, SystemZ. - LangRef.rst: add llvm.experimental.constrained.atan2, revise llvm.atan2 description. Last part of Implement the atan2 HLSL Function. Fixes #70096.	2024-11-12 13:34:29 -08:00
abhishek-kaushik22	d2aff182d3	Revert "TLS loads opimization (hoist)" (#114740 ) This reverts commit c31014322c0b5ae596da129cbb844fb2198b4ef4. Based on the discussions in #112772, this pass is not needed after the introduction of `llvm.threadlocal.address` intrinsic. Fixes https://github.com/llvm/llvm-project/issues/112771.	2024-11-07 10:10:28 +01:00
Paul Walker	38fffa630e	[LLVM][IR] Use splat syntax when printing Constant[Data]Vector. (#112548 )	2024-11-06 11:53:33 +00:00
Oliver Stannard	2d56de9e7e	Revert "[ARM] Add extra tests for CVE-2024-7883 with undef/poison" Reverting because this causes a test failure in the expensive-checks buildbot. This reverts commit ed9dab67e2932baf11bfa514b07b159c3bffd518.	2024-11-06 10:35:44 +00:00
Oliver Stannard	ed9dab67e2	[ARM] Add extra tests for CVE-2024-7883 with undef/poison	2024-11-06 09:28:14 +00:00
Jon Roelofs	4c3e1e3c4a	[llvm][AsmPrinter] Add an option to print instruction latencies (#113243 ) ... matching what we have in the disassembler. This isn't turned on by default since several of the scheduling models are not completely accurate, and we don't want to be misleading.	2024-11-05 17:28:52 -08:00
Simon Pilgrim	aef0e77c76	[DAG] visitAND - Fold (and (srl X, C), 1) -> (srl X, BW-1) for signbit extraction (#114992 ) If we're masking the LSB of a SRL node result and that is shifting down an extended sign bit, see if we can change the SRL to shift down the MSB directly. These patterns can occur during legalisation when we've sign extended to a wider type but the SRL is still shifting from the subreg. Alternative to #114967 Fixes the remaining regression in #112588	2024-11-05 14:42:15 +00:00
Yingwei Zheng	917b3d13b5	[SDAG] Intersect poison-generating flags after CSE (#114650 ) This patch intersects poison-generating flags after CSE to fix assertion failure reported in https://github.com/llvm/llvm-project/pull/112354#issuecomment-2452369552. Co-authored-by: Antonio Frighetto <me@antoniofrighetto.com>	2024-11-02 19:06:27 +08:00
Oliver Stannard	33411d5207	[ARM] Fix CMSE S->NS calls when CONTROL_S.SFPA==0 (CVE-2024-7883) (#114433 ) When doing a call from CMSE secure state to non-secure state for v8-M.main, we use the VLLDM and VLSTM instructions to save, clear and restore the FP registers around the call. These instructions both check the CONTROL_S.SFPA bit, and if it is clear (meaning the current contents of the FP registers are not secret) they execute as no-ops. This causes a problem when CONTROL_S.SFPA==0 before the call, which happens if there are no floating-point instructions executed between entry to secure state and the call. If this is the case, then the VLSTM instruction will do nothing, leaving the save area in the stack uninitialised. If the called function returns a value in floating-point registers, the call sequence includes an instruction to copy the return value from a floating-point register to a GPR, which must be before the VLLDM instruction. This copy sets CONTROL_S.SFPA, meaning that the VLLDM will fully execute, and load the uninitialised stack memory into the FP registers. This causes two problems: * The FP register file is clobbered, including all of the callee-saved registers, which might contain live values. * The stack region might contain secret values, which will be leaked to non-secure state through the floating-point registers if/when we return to non-secure state. The fix is to insert a `vmov s0, s0` instruction before the VLSTM instruction, to ensure that CONTROL_S.SFPA is set for both the VLLDM and VLSTM instruction. CVE: https://www.cve.org/cverecord?id=CVE-2024-7883 Security bulletin: https://developer.arm.com/Arm%20Security%20Center/Cortex-M%20Security%20Extensions%20Vulnerability	2024-11-01 09:36:13 +00:00
Benjamin Maxwell	c3260c65e8	[IR] Add `llvm.sincos` intrinsic (#109825 ) This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct). ``` declare { float, float } @llvm.sincos.f32(float %Val) declare { double, double } @llvm.sincos.f64(double %Val) declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val) declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val) declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val) declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val) ``` The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.	2024-10-29 10:52:20 +00:00
Serge Pavlov	819abe412d	[Test] Fix usage of constrained intrinsics (#113523 ) Some tests contain errors in constrained intrinsic usage, such as missing or extra type parameters, wrong type parameter order, and others. --------- Co-authored-by: Andy Kaylor <andy_kaylor@yahoo.com>	2024-10-28 14:07:32 +07:00
Oliver Stannard	376d7b27fa	[ARM] Optimise byval arguments in tail-calls We don't need to copy byval arguments to tail calls via a temporary, if we can prove that we are not copying from the outgoing argument area. This patch does this when the source of the argument is one of: * Memory in the local stack frame, which can't be used for tail-call arguments. * A global variable. We can also avoid doing the copy completely if the source and destination are the same memory location, which is the case when the caller and callee have the same signature, and pass some arguments through unmodified.	2024-10-25 09:34:09 +01:00
Oliver Stannard	914a3990d1	[ARM] Avoid clobbering byval arguments when passing to tail-calls When passing byval arguments to tail-calls, we need to store them into the stack memory in which the caller received its arguments. If any of the outgoing arguments are forwarded from incoming byval arguments, then the source of the copy is from the same stack memory. This can result in the copy corrupting a value which is still to be read. The fix is to first make a copy of the outgoing byval arguments in local stack space, and then copy them to their final location. This fixes the correctness issue, but results in extra copying, which could be optimised.	2024-10-25 09:34:09 +01:00
Oliver Stannard	78ec2e2ed5	[ARM] Allow tail calls with byval args Byval arguments which are passed partially in registers get stored into the local stack frame, but it is valid to tail-call them because the part which gets spilled is always re-loaded into registers before doing the tail-call, so it's OK for the spill area to be deallocated.	2024-10-25 09:34:08 +01:00
Oliver Stannard	82e6472197	[ARM] Allow functions with sret returns to be tail-called It is valid to tail-call a function which returns through an sret argument, as long as we have an incoming sret pointer to pass on.	2024-10-25 09:34:08 +01:00
Oliver Stannard	c1eb790cd2	[ARM] Tail-calls do not require caller and callee arguments to match The ARM backend was checking that the outgoing values for a tail-call matched the incoming argument values of the caller. This isn't necessary, because the caller can change the values in both registers and the stack before doing the tail-call. The actual limitation is that the callee can't need more stack space for its arguments than the caller does. This is needed for code using the musttail attribute, as well as enabling tail calls as an optimisation in more cases.	2024-10-25 09:34:08 +01:00
Oliver Stannard	e3f218096c	[ARM] Re-generate a test	2024-10-25 09:34:07 +01:00
Vladimir Radosavljevic	401d123a1f	[MCP] Optimize copies when src is used during backward propagation (#111130 ) Before this patch, redundant COPY couldn't be removed for the following case: ``` $R0 = OP ... ... // Read of %R0 $R1 = COPY killed $R0 ``` This patch adds support for tracking the users of the source register during backward propagation, so that we can remove the redundant COPY in the above case and optimize it to: ``` $R1 = OP ... ... // Replace all uses of %R0 with $R1 ```	2024-10-23 13:37:02 +02:00
David Spickett	dd76d9b1bb	[llvm][ARM] Correct the properties of trap instructions (#113287 ) Fixes #113154 The encodings used for llvm.trap() on ARM were all marked as barriers and terminators. This led to stack frame destroy code being inserted before the trap if the trap was the last thing in the function and it had no return statement. ``` void fn() { volatile int i = 0; __builtin_trap(); } ``` Produced: ``` fn: push {r11, lr} << stack frame create <...> mov sp, r11 pop {r11, lr} << stack frame destroy .inst 0xe7ffdefe << trap bx lr ``` All the other targets don't mark them this way, instead they mark them with isTrap. I've changed ARM to do this, which fixes the code generation: ``` fn: push {r11, lr} << stack frame create <...> .inst 0xe7ffdefe << trap mov sp, r11 pop {r11, lr} << stack frame destroy bx lr ``` I've updated the existing trap test to force the need for a stack frame, then check that the instruction immediately after the trap is resetting the stack pointer. debugtrap was already working but I've added the same checks for it anyway.	2024-10-23 09:06:12 +01:00
Simon Pilgrim	94cddcfc1c	[ARM] Add reduced regression test for infinite-loop due to #112710	2024-10-20 13:53:26 +01:00
Alex Rønne Petersen	5785cbb405	[llvm] Ensure that soft float targets don't emit `fma()` libcalls. (#106615 ) The previous behavior could be harmful in some edge cases, such as emitting a call to `fma()` in the `fma()` implementation itself. Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`. This was already done for PowerPC; this commit just extends that to Arm, z/Arch, and x86. MIPS and SPARC already got it right, but I added tests for them too, for good measure. Note: I don't have commit access.	2024-10-19 06:13:15 -07:00
Alex Rønne Petersen	ad4a582fd9	[llvm] Consistently respect `naked` fn attribute in `TargetFrameLowering::hasFP()` (#106014 ) Some targets (e.g. PPC and Hexagon) already did this. I think it's best to do this consistently so that frontend authors don't run into inconsistent results when they emit `naked` functions. For example, in Zig, we had to change our emit code to also set `frame-pointer=none` to get reliable results across targets. Note: I don't have commit access.	2024-10-18 09:35:42 +04:00
gxlayer	4a2bd78f5b	[ARM] Fix -mno-omit-leaf-frame-pointer flag doesn't works on 32-bit ARM (#109628 ) This patch makes the -mno-omit-leaf-frame-pointer flag work on 32-bit ARM architectures, addressing the bug reported in #108019	2024-10-17 20:25:06 +08:00
Albert Huang	aa2c0f35a1	[ARM] [AArch32] Add support for Arm China STAR-MC1 CPU (#110085 ) STAR-MC1 is an Armv8m CPU. Technical specifications available at: https://www.armchina.com/download/Documents/Application-Notes/Technical-Reference-Manual?infoId=160	2024-10-14 15:48:12 +01:00
Akshat Oke	8b20f1b924	[MIR] Fix tests for flags in register info (#112179 ) [MIR] Serialize virtual register flags #110228 introduces register flags which appear empty in .mir dumps. Future tests should use `-simplify-mir`.	2024-10-14 18:28:54 +05:30
Serge Pavlov	52e5683ddd	[GlobalISel][ARM] Legalization of G_CONSTANT using constant pool (#98308 ) ARM uses complex encoding of immediate values using small number of bits. As a result, some values cannot be represented as immediate operands, they need to be synthesized in a register. This change implements legalization of such constants with loading values from constant pool. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>	2024-10-14 16:40:21 +07:00
Oliver Stannard	1e49670b31	[DAGISel] Keep flags when converting FP load/store to integer (#111679 ) This DAG combine replaces a floating-point load/store pair which has no other uses with an integer one, but did not copy the memory operand flags to the new instructions, resulting in it dropping the volatile flag. This optimisation is still valid if one or both of the instructions is volatile, so we can copy over the whole MachineMemOperand to generate volatile integer loads and stores where needed.	2024-10-10 09:17:50 +01:00
YunQiang Su	d52c8408ff	SelectionDAG/expandFMINNUM_FMAXNUM: skips vector if SETCC/VSELECT is not legal (#109570 ) If SETCC or VSELECT is not legal for a vector type, we should not expand it; instead we can split the vectors, so that simple scalar instructions can be emitted instead of pairs of comparison+selection.	2024-10-10 08:39:25 +08:00
Ard Biesheuvel	2e47b93fd2	[ARM] Honour -mno-movt in stack protector handling (#109022 ) When -mno-movt is passed to Clang, the ARM codegen correctly avoids movt/movw pairs to take the address of __stack_chk_guard in the stack protector code emitted into the function pro- and epilogues. However, the Thumb2 codegen fails to do so, and happily emits movw/movt pairs unless it is generating an ELF binary and the symbol might be in a different DSO. Let's incorporate a check for useMovt() in the logic here, so movt/movw are never emitted when -mno-movt is specified. Suggestions welcome for how/where to add a test case for this. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2024-10-09 09:34:17 -07:00
Ramkumar Ramachandra	3fee3e83a8	KnownBits: refine srem for high-bits (#109121 ) KnownBits::srem does not correctly set the leading zero bits, omitting the fact that LHS may be known-negative or known-non-negative. Fix this. Alive2 proof: https://alive2.llvm.org/ce/z/Ugh-Dq	2024-09-27 12:00:50 +01:00
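
The sign-based reasoning behind the srem refinement in 3fee3e83a8 can be checked by brute force at a small bit width. The following is a minimal sketch, not the LLVM implementation: the helpers `srem`, `leading_zeros`, `leading_ones` and `check_all` are hypothetical names introduced here, and the 8-bit width is an assumption chosen only to keep the search exhaustive. It verifies that a signed remainder never loses leading zero bits of a non-negative LHS, and that a negative remainder never loses leading one bits of a negative LHS.

```python
BITS = 8  # assumed small width so every operand pair can be enumerated


def srem(a: int, b: int) -> int:
    """Signed remainder with truncating semantics: the result takes the sign of a."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def leading_zeros(v: int, bits: int = BITS) -> int:
    """Leading zero bits of v viewed as an unsigned value of the given width."""
    v &= (1 << bits) - 1
    return bits - v.bit_length()


def leading_ones(v: int, bits: int = BITS) -> int:
    """Leading one bits of v viewed as a two's-complement value of the given width."""
    mask = (1 << bits) - 1
    return leading_zeros(~v & mask, bits)


def check_all(bits: int = BITS) -> bool:
    """Exhaustively check the high-bit property for every (LHS, RHS) pair."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            if b == 0:
                continue  # remainder by zero is undefined, skip it
            r = srem(a, b)
            if a >= 0:
                # 0 <= r <= a, so r keeps every leading zero of a
                if leading_zeros(r, bits) < leading_zeros(a, bits):
                    return False
            elif r < 0:
                # a <= r < 0, so r keeps every leading one of a
                if leading_ones(r, bits) < leading_ones(a, bits):
                    return False
    return True


if __name__ == "__main__":
    print("property holds for all 8-bit pairs:", check_all())
```

The negative case is only asserted when the remainder is non-zero, since a zero remainder of a negative LHS has no leading one bits at all.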

5010 Commits