llvm-project

Author	SHA1	Message	Date
Philip Reames	b5657d6dc7	[RISCV] Reverse default assumption about performance of vlseN.v vd, (rs1), x0 (#98205 ) Some cores implement an optimization for a strided load with an x0 stride, which results in fewer memory operations being performed than implied by VL, since all addresses are the same. This appears to hold for only a minority of available implementations. We know that sifive-x280 does, but sifive-p670 and spacemit-x60 both do not. (To be more precise, measurements on the x60 appear to indicate that a stride of x0 has similar latency to a non-zero stride, and that both are about twice a vleN.v. I'm taking this to mean the x0 case is not optimized.) We had an existing flag by which a processor could opt out of this assumption, but no upstream users. Instead of adding this flag to the p670 and x60, this patch reverses the default and adds the opt-in flag only to the x280.	2024-07-10 07:35:56 -07:00
Alex Bradbury	f8dbe1d09d	Revert "[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default" (#98328 ) Reverts llvm/llvm-project#89927 while we investigate performance regressions reported by @dtcxzyw	2024-07-10 15:33:20 +01:00
Allen	d1006315b5	[AArch64] Lower for power of 2 signed divides with scalar type (#97879 ) Code that doesn't use SVE registers is expected to produce the same assembly whether or not it is compiled with -msve-vector-bits=256. Fixes https://github.com/llvm/llvm-project/issues/97821	2024-07-10 21:52:09 +08:00
Alex Bradbury	af47a4ec50	[RISCV] Enable TTI::shouldDropLSRSolutionIfLessProfitable by default (#89927 ) This avoids some cases where LSR produces results that lead to very poor codegen. There's a chance we'll see minor degradations for some inputs in the case that our metrics say the found solution is worse, but in reality it's better than the starting point. Per the review thread, at least one vendor has been enabling this by default for some time and found that overall it's an improvement. As such, we'll enable by default and aim to fix any as-yet-unknown regressions in-tree.	2024-07-10 13:23:31 +01:00
paperchalice	abde52aa66	[CodeGen][NewPM] Port `LiveIntervals` to new pass manager (#98118 ) - Add `LiveIntervalsAnalysis`. - Add `LiveIntervalsPrinterPass`. - Use `LiveIntervalsWrapperPass` in legacy pass manager. - Use `std::unique_ptr` instead of raw pointer for `LICalc`, so destructor and default move constructor can handle it correctly. This would be the last analysis required by `PHIElimination`.	2024-07-10 19:34:48 +08:00
Fabian Ritter	17316a5989	Revert "[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy" (#98295 ) Reverts llvm/llvm-project#97998 This seems to cause a buildbot failure on clang-hip-vega20, in the HIP test-suite, need to investigate.	2024-07-10 12:16:20 +02:00
Daniel Kiss	1782810b84	[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819 ) So far, the branch protection, sign return address, and guarded control stack attributes have only been emitted as module flags to indicate that functions need to be generated with those features. The problem is that in an LTO build the module flags are merged with the `min` rule, which means that if one of the modules is not built with sign return address, the features are turned off for all functions, because functions take the branch-protection and sign-return-address features from the module flags. sign-return-address is a function-level option, so functions from files compiled with -mbranch-protection=pac-ret are expected to be protected. The inliner might also inline functions with a different set of flags, as it doesn't consider the module flags. This patch adds the attributes to all functions and drops the checking of the module flags for code generation. The module flag is still used for generating the ELF markers. It also drops the "true"/"false" values from the branch-protection-enforcement, branch-protection-pauth-lr, and guarded-control-stack attributes, since presence of the attribute means on and absence means off, with no other option. Relanded with test fixes.	2024-07-10 11:32:41 +02:00
Fabian Ritter	6c84bba218	[LowerMemIntrinsics] Use correct alignment in residual loop for variable llvm.memcpy (#97998 ) Memcpy intrinsics with statically unknown loop sizes are lowered with two load/store loops: one with access widths specified by the target, and a residual loop that copies remaining bytes individually. As the residual loop operates byte-wise, its accesses are only 1-aligned. However, we currently use the alignment that is optimal for the first loop in both, which is unsound. With this patch, we use the correct alignment in the residual loop. The lowering of memcpy with a static size already handles alignments for the residual correctly.	2024-07-10 11:29:26 +02:00
Madhur Amilkanthwar	42672199ec	[GISel][AArch64] Libcall support for G_FPEXT 128-bit types (#97735 ) This patch adds support for generating libcall for 128-bit types of G_FPEXT. This fixes ~10 fallbacks in RajaPerf benchmark.	2024-07-10 14:58:24 +05:30
Luke Lau	8ab19d2e70	[RISCV] Add -verify-machineinstrs to RISCVInsertVSETVLI MIR tests. NFC Now that we're working with LiveIntervals, make sure that they're correct.	2024-07-10 16:30:57 +08:00
Daniel Kiss	4b2daeccc7	Revert "[Clang][ARM][AArch64] Alway emit protection attributes for functions." (#98284 ) Reverts llvm/llvm-project#82819	2024-07-10 10:22:38 +02:00
Daniel Kiss	e15d67cfc2	[Clang][ARM][AArch64] Alway emit protection attributes for functions. (#82819 ) So far, the branch protection, sign return address, and guarded control stack attributes have only been emitted as module flags to indicate that functions need to be generated with those features. The problem is that in an LTO build the module flags are merged with the `min` rule, which means that if one of the modules is not built with sign return address, the features are turned off for all functions, because functions take the branch-protection and sign-return-address features from the module flags. sign-return-address is a function-level option, so functions from files compiled with -mbranch-protection=pac-ret are expected to be protected. The inliner might also inline functions with a different set of flags, as it doesn't consider the module flags. This patch adds the attributes to all functions and drops the checking of the module flags for code generation. The module flag is still used for generating the ELF markers. It also drops the "true"/"false" values from the branch-protection-enforcement, branch-protection-pauth-lr, and guarded-control-stack attributes, since presence of the attribute means on and absence means off, with no other option.	2024-07-10 10:06:14 +02:00
Jianjian Guan	9af1f8fbad	[RISCV] Match vector fp-int convert intrinsics with specific RTZ rounding mode to the rtz variants (#98120 )	2024-07-10 10:51:20 +08:00
Anatoly Trosinenko	a937d2918e	[AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (#98062 ) Make SLS Hardening pass handle BLRA* instructions the same way it handles BLR. The thunk names have the form: __llvm_slsblr_thunk_xN for BLR thunks; __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks; __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks. Now there are about 1800 possible thunk names, so do not rely on linear thunk function's name lookup and parse the name instead. This patch reapplies llvm/llvm-project#97605.	2024-07-09 22:51:49 +03:00
Daniil Kovalev	746f572615	[test][PAC][AArch64] Add ELF tests for subtarget-neutral codegen (#98020 ) Many parts of PAuth-related codegen are not MachO- or ELF-specific. Add RUN lines against ELF targets to ensure that codegen works for ELF as well as for MachO.	2024-07-09 21:00:55 +03:00
Min-Yih Hsu	7e2f96194f	[MachineSink] Fix missing sinks along critical edges (#97618 ) 4e0bd3f improved early MachineLICM's capabilities to hoist COPY from physical registers out of a loop. However, it accidentally broke one of MachineSink's preconditions on sinking cheap instructions (in this case, COPY), which considered those instructions profitable to sink only when there are at least two of them in the same def-use chain in the same basic block. So if early MachineLICM hoisted one of them out, MachineSink no longer sinks the rest of the cheap instructions. This results in redundant load immediate instructions in the motivating example we've seen on RISC-V. This patch fixes this by teaching MachineSink that if there is more than one demand to sink a register into the same block from different critical edges, it should be considered profitable, as it increases the CSE opportunities. This change also improves two of the AArch64 cases.	2024-07-09 10:48:22 -07:00
Philip Reames	90d79e258e	Reapply "[RISCV] Remove experimental from Ztso. (#96465 )" This was reverted in f985a8826bfa4ca3d23e654185de35e30ea6dc79. Since then, the default WMO lowering has moved to A67 compatible, the ABI attribute emission has landed (off by default), and the LLD change to merge said attributes has landed. Our ztso lowering is believed to also be A67 compatible, and no known issues remain. Original commit message: Ztso 1.0 was ratified in January 2023. Documentation: https://github.com/riscv/riscv-isa-manual/blob/main/src/ztso-st-ext.adoc	2024-07-09 10:45:56 -07:00
Min Hsu	4283566663	[test][MachineSink][RISCV] Pre-commit test for #97618	2024-07-09 10:44:44 -07:00
Shengchen Kan	a9183b8899	[X86][MC] Fix encoding bug for CCMP introduced in #85175	2024-07-09 20:12:47 +08:00
David Spickett	9856af634d	Revert "[AArch64][GlobalISel] Make G_DUP immediate 32-bits or larger (#96780 )" This reverts commit 5a5cd3f0bcdf37a32eadd85d6e57c642cb829402. Due to test suite failures on AArch64: https://lab.llvm.org/buildbot/#/builders/125/builds/541	2024-07-09 11:52:52 +00:00
Luke Lau	19cc46144d	[RISCV] Use VP strided load in concat_vectors combine (#98131 )	2024-07-09 18:36:00 +08:00
Shengchen Kan	a8a21bbec2	[X86][test] Pre-update test for the encoding bug introduced in #85175	2024-07-09 17:25:55 +08:00
Malay Sanghi	a77d3ea310	[X86][GlobalISel] Add instruction selection support for x87 ld/st (#97016 ) Add x87 G_LOAD/G_STORE selection support to existing C++ lowering.	2024-07-09 10:54:25 +02:00
Jianjian Guan	3259768557	[RISCV] Remove experimental for bf16 extensions (#97996 ) They are already ratified now.	2024-07-09 14:34:03 +08:00
Craig Topper	bb8998dd3b	[RISCV] Don't custom legalize vXf16 SPLAT_VECTOR with Zvfhmin without Zfhmin. Marking SPLAT_VECTOR as Custom enables generic DAGCombine to turn BUILD_VECTOR into SPLAT_VECTOR. We need to custom type legalize BUILD_VECTOR without Zfhmin since we don't have the scalar f16 type. If we allow SPLAT_VECTOR to be formed, we'll need to custom type legalize it too. Easiest fix is to only enable SPLAT_VECTOR with Zvfhmin+Zfhmin. There's still an issue that we need to properly support BUILD_VECTOR with Zvfhmin+Zfhmin. Should fix the new case reported in #97849. I've also changed the predicates to Zfhmin instead of ZfhminOrZhinxmin since Zhinx isn't compatible with Zvfhmin.	2024-07-08 22:44:58 -07:00
Carl Ritson	7eb1a320cc	[AMDGPU] Update EXECZ retention in SIPreEmitPeephole for GFX10/12 (#97676 ) The check to maintain EXECZ branches only checks S_WAITCNT. Add handling for new waitcnt instructions in GFX10 and GFX12.	2024-07-09 14:44:31 +09:00
Luke Lau	3f83a69bcb	[RISCV] Allow folding vmerge into masked ops when mask is the same (#97989 ) We currently only fold a vmerge into a masked true operand if the vmerge has an all-ones mask, since we end up keeping the mask from the true operand. But if the masks are the same then we can still fold, because vmerge and true have the same passthru. If an element was masked off in the original vmerge, it will also be masked off in the resulting true, and will have the same passthru value. The motivation for this is to lower masked VP loads and stores with passthrus to masked RVV instructions. Normally you can express a masked RVV instruction with a mask undisturbed passthru via a combination of a VP op with an all-ones mask and a vp.merge. But for loads and stores you need the same mask on the VP op as well as the vp.merge.	2024-07-09 12:12:02 +08:00
paperchalice	4010f894a1	[CodeGen][NewPM] Port `SlotIndexes` to new pass manager (#97941 ) - Add `SlotIndexesAnalysis`. - Add `SlotIndexesPrinterPass`. - Use `SlotIndexesWrapperPass` in legacy pass.	2024-07-09 12:09:11 +08:00
paperchalice	ac0b2814c3	[CodeGen][NewPM] Port `LiveVariables` to new pass manager (#97880 ) - Port `LiveVariables` to new pass manager. - Convert to `LiveVariablesWrapperPass` in legacy pass manager.	2024-07-09 10:50:43 +08:00
paperchalice	79d0de2ac3	[CodeGen][NewPM] Port `machine-loops` to new pass manager (#97793 ) - Add `MachineLoopAnalysis`. - Add `MachineLoopPrinterPass`. - Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.	2024-07-09 09:11:18 +08:00
Philip Reames	c95935789d	[RISCV] Directly use pack* in build_vector lowering (#98084 ) In 03d4332, we extended build_vector lowering to pack elements into the largest size which doesn't exceed either ELEN or XLEN. The zbkb extension - ratified under scalar crypto, but otherwise not really connected to crypto per se - adds the packh, packw, and pack instructions. These instructions are designed for exactly this pairwise packing. I ended up choosing to directly lower to machine nodes. A combination of the slightly non-uniform semantics of these instructions (packw sign extends the result, whereas packh zero extends it), and our generic dag canonicalization (which sinks shl through or nodes), make pattern matching these tricky and not particularly robust. Another alternative was to have an ISD node for them, but that didn't seem to add much in practice.	2024-07-08 16:10:25 -07:00
Philip Reames	07bb0444dd	[RISCV] Add build_vector coverage when zbkb is available An upcoming change will make much more complete use of packh, packw, and pack during element packing inside build_vector lowering.	2024-07-08 14:24:44 -07:00
Jon Roelofs	7f0d9bae9d	[llvm][AArch64] Fix a crash with an incorrect asm constraint (#98071 ) Fixes: rdar://130887714	2024-07-08 14:00:29 -07:00
Paul Kirth	a4fec164bf	Reapply "[llvm][RISCV] Enable trailing fences for seq-cst stores by default (#87376 )" (#90267 ) With the tag merging in place, we can safely enable +seq-cst-trailing-fence by default, according to the recommendation in https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-atomic.adoc This patch changes the default for the feature flag, and moves to more consistent naming with respect to existing features. This was reverted with https://github.com/llvm/llvm-project/pull/84597, because ld.bfd would segfault with unknown riscv attributes. Now that attribute emission is guarded with a backend flag, `--riscv-abi-attributes`, this should be safe to reland, since it won't introduce ABI tags unless the user opts into them.	2024-07-08 13:35:36 -07:00
Amy Huang	ae7ab043f2	Add __hlt intrinsic for Windows ARM. (#96578 ) Add __hlt, which is a MSVC ARM64 intrinsic. This intrinsic is just the HLT instruction. MSVC's version seems to return something undefined; in this patch it will just return zero. MSVC intrinsics are defined here https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics. I used unsigned int as the return type, because that is what the MSVC intrin.h header uses, even though it conflicts with the documentation.	2024-07-08 12:59:02 -07:00
David Green	e0012a0b3b	[AArch64] Regenerate cmp-to-cmn.ll. NFC	2024-07-08 19:09:23 +01:00
Philip Reames	03d4332625	[RISCV] Pack build_vectors into largest available element type (#97351 ) Our worst case build_vector lowering is a serial chain of vslide1down.vx operations which creates a serial dependency chain through a relatively high latency operation. We can instead pack together elements into ELEN sized chunks, and move them from integer to scalar in a single operation. This reduces the length of the serial chain on the vector side, and costs at most three scalar instructions per element. This is a win for all cores when the sum of the latencies of the scalar instructions is less than the vslide1down.vx being replaced, and is particularly profitable for out-of-order cores which can overlap the scalar computation. This patch is restricted to configurations with zba and zbb. Without both, the zero extend might require two instructions which would bring the total scalar instructions per element to 4. zba and zbb are both present in the rva22u64 baseline which is looking to be quite common for hardware in practice; we could extend this to systems without bitmanip with a bit of extra effort.	2024-07-08 10:38:15 -07:00
chuongg3	5a5cd3f0bc	[AArch64][GlobalISel] Make G_DUP immediate 32-bits or larger (#96780 ) G_DUP's immediate operand gets extended in RegBankSelect to allow for better pattern matching in TableGen for #96782	2024-07-08 14:25:39 +01:00
Mahesh-Attarde	854bbc50fc	[X86][CodeGen] security check cookie execute only when needed (#95904 ) On Windows, without this fixup, __security_check_cookie is called every time a function returns. Since this function is defined in the runtime library, it incurs the cost of a call into a DLL that simply does a comparison and returns most of the time. With the fixup, we selectively call into the DLL only if the comparison fails.	2024-07-08 14:11:21 +01:00
Manish Kausik H	69192e0193	[LegalizeDAG] Optimize CodeGen for `ISD::CTLZ_ZERO_UNDEF` (#83039 ) Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case. The details of the optimization are outlined in #82075 Fixes #82075 Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>	2024-07-08 14:01:32 +01:00
Simon Pilgrim	92083e855b	[X86] Allow VPERMV3 -> VPERMV folds to handle extraction from a wider source vector (e.g. v16i32 -> v4i32) We don't need to restrict this to double width vectors, as long as we correctly bitcast the types Improves the fix for #97968	2024-07-08 13:10:45 +01:00
Simon Pilgrim	8ac6b415e4	[X86] Ensure VPERMV3 -> VPERMV fold comes from a double width vector #96414 + #97206 didn't ensure that we were extracting subvectors from a vector double the width of the destination. We can relax this in a future patch, but fix the #97968 crash first. Fixes #97968	2024-07-08 12:04:11 +01:00
Momchil Velikov	a497e987e5	Reapply "[AArch64] Lower extending sitofp using tbl (#92528 )" This re-commits d1a4f0c9fb559eb4c2fb56112e56343bcd333edc after an issue was fixed in f92bfca9fc217cad9026598ef6755e711c0be070 ("[AArch64] All bits of an exact right shift are demanded (#97448)").	2024-07-08 11:55:29 +01:00
esmeyi	c119da23af	[PowerPC] Function descriptor symbol may be omitted for external symbol. #97526 If a function's address is taken, which means it may be called via a function pointer, we need the function descriptor for it. Otherwise, the function descriptor can be omitted for external symbols.	2024-07-08 03:47:33 -04:00
hstk30-hw	ef465bf8b1	[ARM] Fix arm32be softfp mode miscompilation for neon sdiv (#97883 ) Related issue: https://github.com/llvm/llvm-project/issues/97782	2024-07-08 14:18:38 +08:00
Vikram Hegde	2a9607168b	[AMDGPU] Cleanup bitcast spam in atomic optimizer (#96933 )	2024-07-08 10:53:16 +05:30
Feng Zou	e603451f3c	[X86] Support branch hint (#97721 ) For more details about this feature, please refer to latest Intel 64 and IA-32 Architectures Optimization Reference Manual Volume 1: https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html	2024-07-08 13:12:50 +08:00
Craig Topper	e4ee9bf0d2	[RISCV] Custom legalize vXf16 BUILD_VECTOR without Zfhmin. (#97874 ) If we don't have Zfhmin, we will call `SoftPromoteHalfOperand` on the BUILD_VECTOR. This operation is not supported by the generic code. Instead, custom lower to a vXi16 BUILD_VECTOR using bitcasts. Fixes #97849.	2024-07-07 20:25:09 -07:00
Anatoly Trosinenko	f90bac99e1	Revert "[AArch64][PAC] Support BLRA* instructions in SLS Hardening pass" (#97887 ) This reverts commit 88b26293a24bdd85fce2b2f7191cc0a5bc0cecfe due to failures of CodeGen/AArch64/speculation-hardening-sls-blra.mir	2024-07-06 13:55:12 +03:00
Anatoly Trosinenko	88b26293a2	[AArch64][PAC] Support BLRA* instructions in SLS Hardening pass (#97605 ) Make SLS Hardening pass handle BLRA* instructions the same way it handles BLR. The thunk names have the form: __llvm_slsblr_thunk_xN for BLR thunks; __llvm_slsblr_thunk_(aaz|abz)_xN for BLRAAZ and BLRABZ thunks; __llvm_slsblr_thunk_(aa|ab)_xN_xM for BLRAA and BLRAB thunks. Now there are about 1800 possible thunk names, so do not rely on linear thunk function's name lookup and parse the name instead.	2024-07-06 13:36:02 +03:00
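
The SLS Hardening entries above (#97605, reapplied as #98062) describe parsing a thunk name instead of searching a list of all possible names. The following is a minimal Python sketch of that idea, not the actual LLVM implementation (which is C++). The function `parse_thunk_name` and the `ThunkKind` tuple are hypothetical names introduced here for illustration. Only the three name shapes quoted in the commit message are taken from the source, and the sketch does not check which register numbers are actually valid.

```python
import re
from typing import NamedTuple, Optional


class ThunkKind(NamedTuple):
    """Hypothetical result type: which call instruction a thunk stands for."""
    opcode: str                  # "BLR", "BLRAAZ", "BLRABZ", "BLRAA" or "BLRAB"
    target_reg: int              # N in xN
    modifier_reg: Optional[int]  # M in xM, only for BLRAA / BLRAB


_PREFIX = "__llvm_slsblr_thunk_"
# Optional key (aaz|abz|aa|ab), then xN, then an optional _xM.
_PATTERN = re.compile(r"(?:(aaz|abz|aa|ab)_)?x(\d+)(?:_x(\d+))?")


def parse_thunk_name(name: str) -> Optional[ThunkKind]:
    """Decode a thunk name directly; return None if it has no known shape."""
    if not name.startswith(_PREFIX):
        return None
    match = _PATTERN.fullmatch(name[len(_PREFIX):])
    if match is None:
        return None
    key, xn, xm = match.group(1), int(match.group(2)), match.group(3)
    # Only the aa/ab forms carry a second (modifier) register.
    needs_modifier = key in ("aa", "ab")
    if needs_modifier != (xm is not None):
        return None
    opcode = "BLR" if key is None else "BLR" + key.upper()
    return ThunkKind(opcode, xn, int(xm) if xm is not None else None)
```

Decoding the name this way takes time proportional to the length of the name, independent of how many thunk variants exist, which is the motivation the commit message gives for dropping the linear lookup.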


54094 Commits