llvm-project

Author	SHA1	Message	Date
Michal Paszkowski	43222bd309	[SPIR-V] Do not use OpenCL metadata for ptr element type resolution (#82678 ) This pull request aims to remove any dependency on OpenCL/SPIR-V type information in LLVM IR metadata. While using metadata might simplify and prettify the resulting SPIR-V output (and restore some of the information missed in the transformation to opaque pointers), the overall methodology for resolving kernel parameter types is highly inefficient. The high-level strategy is to assign kernel parameter types in this order: 1. Resolving the types using builtin function calls, as mangled names must contain type information, or by looking up the builtin definition in SPIRVBuiltins.td. Then: - Assigning the type temporarily using an intrinsic and later setting the right SPIR-V type in SPIRVGlobalRegistry after IRTranslation - Inserting a bitcast 2. Defaulting to LLVM IR types (in case of pointers the generic i8* type or types from byval/byref attributes) In case of type incompatibility (e.g. parameter defined initially as sampler_t and later used as image_t) the error will be found early on before IRTranslation (in the SPIRVEmitIntrinsics pass).	2024-03-03 22:38:59 -08:00
Shilei Tian	8300f30a92	[SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056 ) This patch adds the support for `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16`.	2024-03-04 01:08:49 -05:00
Shilei Tian	2c5d01c2cf	Revert "[SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056 )" This reverts commit b0c158bd947c360a4652eb0de3a4794f46deb88b. The changes in `compiler-rt` broke tests.	2024-03-04 00:33:31 -05:00
Shilei Tian	b0c158bd94	[SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056 ) This patch adds the support for `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16`.	2024-03-04 00:01:50 -05:00
Phoebe Wang	ccc48d45b8	[X86][NFC] Replace X32 check prefixes with X86 We try to only use X32 for gnux32 triple tests.	2024-03-04 11:13:13 +08:00
Phoebe Wang	ff72c83b01	[X86] Add missing subvector_subreg_lowering for BF16 (#83720 ) Fixes: #83358	2024-03-04 10:15:43 +08:00
Yeting Kuo	61c283db4b	[ScalarizeMaskedMemIntrin] Use pointer alignment from pointer of masked.compressstore/expandload. (#83519 ) Previously we used Align(1) for all scalarized loads/stores from masked.compressstore/expandload. For targets that do not support unaligned accesses, this forces the backend to split aligned wide loads/stores into byte loads/stores. To solve this performance issue, this patch preserves the alignment of the base pointer after scalarizing.	2024-03-04 09:41:16 +08:00
Lu Weining	5f058aa211	[LoongArch] Override LoongArchTargetLowering::getExtendForAtomicCmpSwapArg (#83656 ) This patch aims to solve Firefox issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1882301 Similar to 616289ed2922. Currently LoongArch uses an ll.[wd]/sc.[wd] loop for ATOMIC_CMP_XCHG. Because the comparison in the loop is full-width (i.e. the `bne` instruction), we must sign extend the input comparison argument. Note that LoongArch ISA manual V1.1 has introduced compare-and-swap instructions. We would change the implementation (return `ANY_EXTEND`) when we support them.	2024-03-04 08:38:52 +08:00
David Majnemer	3dd6750027	[AArch64] Add more complete support for BF16 We can use a small amount of integer arithmetic to round FP32 to BF16 and extend BF16 to FP32. While a number of operations still require promotion, this can be reduced for some rather simple operations like abs, copysign, fneg but these can be done in a follow-up. A few neat optimizations are implemented: - round-inexact-to-odd is used for F64 to BF16 rounding. - quieting signaling NaNs for f32 -> bf16 tries to detect if a prior operation makes it unnecessary.	2024-03-03 22:39:50 +00:00
David Green	5f058398ab	[ARM] Mark AESD and AESE instructions as commutative. Similar to #83390, this marks AESD and AESE as commutative, as the logic of the instructions starts as a XOR between the two operands.	2024-03-03 16:56:21 +00:00
NAKAMURA Takumi	5b4759f9fd	Revert "[X86] Don't always separate conditions in `(br (and/or cond0, cond1))` into separate branches" This has been buggy for a while. Reverts #81689 This reverts commit ae76dfb74701e05e5ab4be194e20e49f10768e46.	2024-03-03 22:31:28 +09:00
Shengchen Kan	37293e69e6	[X86][CodeGen] Support long instruction fixup for APX NDD instructions (#83578 ) RFC: https://discourse.llvm.org/t/rfc-support-long-instruction-fixup-for-x86/76539	2024-03-03 13:03:35 +08:00
George Koehler	6b70c5d79f	[PowerPC] provide CFI for ELF32 to unwind cr2, cr3, cr4 (#83098 ) Delete the code that skips the CFI for the condition register on ELF32. The code checked !MustSaveCR, which happened only when Subtarget.is32BitELFABI(), where spillCalleeSavedRegisters is spilling cr in a different way. The spill was missing CFI. After deleting this code, a spill of cr2 to cr4 gets CFI in the same way as a spill of r14 to r31. Fixes #83094	2024-03-02 22:18:24 -05:00
Craig Topper	4dd9c2ed32	[RISCV] Use NewVL in splatPartsI64WithVL. (#83690 ) In 7b5cf52f32c09, I added this NewVL and checked that it had been set, but I didn't use it for the VL of the splat.	2024-03-02 17:08:48 -08:00
Bjorn Pettersson	da591d390e	[GlobalISel][TableGen] Take first result for multi-output instructions (#81130 ) Previously, tblgen would reject patterns where one of its nested instructions produced more than one result. These arise when the instruction definition contains 'outs' as well as 'Defs'. This patch fixes that by always taking the first result, which is how these situations are handled in SelectionDAG. Original patch: https://reviews.llvm.org/D86617 Continued as: https://github.com/llvm/llvm-project/pull/81130	2024-03-02 20:10:02 +01:00
Fangrui Song	d89b771ef5	[ARM] Add alias tests for ROPI/RWPI https://reviews.llvm.org/D23195 does not test aliases.	2024-03-02 10:33:57 -08:00
Simon Pilgrim	ca827d53c5	[X86] Convert logicalshift(x, C) -> and(x, M) iff x is allsignbits (#83596 ) If we're logical shifting an all-signbits value, then we can just mask out the shifted bits. This helps remove some unnecessary bitcasted vXi16 shifts used for vXi8 shifts (which SimplifyDemandedBits will struggle to remove through the bitcast), and allows some AVX1 shifts of 256-bit values to stay as a YMM instruction. Noticed in codegen from #82290	2024-03-02 12:44:33 +00:00
Noah Goldstein	ae76dfb747	[X86] Don't always separate conditions in `(br (and/or cond0, cond1))` into separate branches It makes sense to split if the cost of computing `cond1` is high (proportionally to how likely `cond0` is), but it doesn't really make sense to introduce a second branch if it's only a few instructions. Splitting can also get in the way of potentially folding patterns. This patch introduces some logic to try to check if the cost of computing `cond1` is relatively low, and if so don't split the branches. Modest improvement on clang bootstrap build: https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=cycles Average stage2-O3: 0.59% Improvement (cycles) Average stage2-O0-g: 1.20% Improvement (cycles) Likewise on llvm-test-suite on SKX saw a net 0.84% improvement (cycles) There is also a modest compile time improvement with this patch: https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=instructions%3Au Note that the stage2 instruction count increase is expected: this patch trades instructions for decreasing branch-misses (which is proportionately lower): https://llvm-compile-time-tracker.com/compare.php?from=79ce933114e46c891a5632f7ad4a004b93a5b808&to=978278eabc0bafe2f390ca8fcdad24154f954020&stat=branch-misses NB: This will also likely help for APX targets with the new `CCMP` and `CTEST` instructions. Closes #81689	2024-03-01 15:35:34 -06:00
Noah Goldstein	ec415aff63	[X86] Regenerate X86/lsr-addrecloops.ll test; NFC	2024-03-01 15:35:34 -06:00
Michael Liao	a490bbf539	[M68k] Fix compilation pipeline check - Add 'Init Undef Pass', which is target-independent now.	2024-03-01 14:49:27 -05:00
Simon Pilgrim	1e8d3c357e	[X86] cmp-shiftX-maskX.ll - add additional tests for #83596 Shows cases where logical shifts of allsignbits values can be profitably converted to masks	2024-03-01 19:20:14 +00:00
Simon Pilgrim	582718fe61	[X86] cmp-shiftX-maskX.ll - add AVX1 test coverage	2024-03-01 19:20:14 +00:00
Farzon Lotfi	e741d889f4	[DXIL] Add frac unary lowering (#83465 ) This change adds lowering for HLSL's frac intrinsic to DXIL. This change should complete #70099	2024-03-01 12:53:05 -05:00
Farzon Lotfi	b542501ad7	[HLSL][DXIL] Implementation of round intrinsic (#83570 ) hlsl_intrinsics.h: add the round API. DXIL.td: add the LLVM intrinsic to DXIL lowering mapping. This change reuses LLVM's existing intrinsic `__builtin_elementwise_round`/`int_round`. This change implements: #70077	2024-03-01 12:27:25 -05:00
Craig Topper	0813b90ff5	[TypePromotion] Support positive addition amounts in isSafeWrap. (#81690 ) We can support these by changing the sext promotion to -zext(-C) and replacing a sgt check with ugt. The logic is reframed in terms of how the unsigned ranges are affected. More comments in the patch. The new cases check isLegalAddImmediate to avoid some regressions in lit tests.	2024-03-01 09:17:14 -08:00
Simon Pilgrim	765a5d62bc	[X86] Pre-SSE42 v2i64 sgt lowering - check if representable as v2i32 (#83560 ) Without PCMPGTQ, if the i64 elements are sign-extended enough to be representable as i32 then we can compare the lower i32 bits with PCMPGTD and splat the results into the upper elements. Value tracking has meant we already get pretty close with this, but this allows us to remove a lot of unnecessary bit flipping.	2024-03-01 14:29:12 +00:00
Shengchen Kan	924ad198f5	[X86][CodeGen] Add missing patterns for APX NDD instructions about encoding trick	2024-03-01 21:26:10 +08:00
Pierre van Houtryve	756166e342	[AMDGPU] Improve detection of non-null addrspacecast operands (#82311 ) Use IR analysis to infer when an addrspacecast operand is nonnull, then lower it to an intrinsic that the DAG can use to skip the null check. I did this using an intrinsic as it's non-intrusive. An alternative would have been to allow something like `!nonnull` on `addrspacecast` then lower that to a custom opcode (or add an operand to the addrspacecast MIR/DAG opcodes), but it's a lot of boilerplate for just one target's use case IMO. I'm hoping that when we switch to GISel that we can move all this logic to the MIR level without losing info, but currently the DAG doesn't see enough so we need to act in CGP. Fixes: SWDEV-316445	2024-03-01 14:01:10 +01:00
David Green	d458a19317	[AArch64] Mark AESD and AESE instructions as commutative. (#83390 ) This comes from https://discourse.llvm.org/t/combining-aes-and-xor-can-be-improved-further/77248. These instructions start out with: ``` XOR Vd, Vn <some complicated math> ``` The initial XOR means that they can be treated as commutative, removing some of the unnecessary mov's introduced during register allocation.	2024-03-01 10:24:27 +00:00
chuongg3	4a5ec3cec8	Revert "[AArch64][GlobalISel] Legalize G_SHUFFLE_VECTOR for Odd-Sized Vectors" (#83544 ) Reverts llvm/llvm-project#83038 due to failing build in Fuchsia build https://lab.llvm.org/staging/#/builders/187/builds/1695	2024-03-01 08:56:34 +00:00
Nick Anderson	ba8e9ace13	[AMDGPU] promote i1 arg type for amdgpu_cs (#82971 ) fixes #68087 Not sure where to put regression tests for this pr? Also, should i1 args not in reg also be promoted?	2024-03-01 14:25:46 +05:30
Shengchen Kan	420928b2fa	[X86][CodeGen] Fix compile crash in EVEX compression for corner case The base register of OPmi_ND may be allocated to the same physical register as the ND operand. OPmi_ND is not compressible because it has different semantics from OPmi. In this case, `isRedundantNewDataDest` should return false, otherwise we would get the error Assertion `!IsNDLike && "Missing entry for ND-like instruction"' failed.	2024-03-01 16:13:09 +08:00
Dhruv Chawla (work)	6c39fa9e9f	[AArch64][GlobalISel] Expand abs.v4i8 to v4i16 and abs.v2s16 to v2s32 (#81231 ) GISel was currently falling back to SDAG for these functions, and this matches the way SDAG currently generates code for these functions.	2024-03-01 13:01:55 +05:30
Douglas Yung	edd0ef4f3c	Add "REQUIRES: asserts" to 2 tests added in #83379 using "-debug-only" run arguments.	2024-03-01 01:06:42 -05:00
Wang Pengcheng	2023a230d1	[RISCV] Move V0 to the end of register allocation order (#82967 ) According to https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html: > The v0 register defined by the RISC-V vector extension is special in > that it can be used both as a general purpose vector register and also > as a mask register. As a preference, use registers other than v0 for > non-mask values. Otherwise data will have to be moved out of v0 when a > mask is required in an operation. v0 may be used when all other > registers are in use, and using v0 would avoid spilling register state > to memory. Using the V0 register may also stall the masking pipeline and stop chaining on some microarchitectures. So we should avoid using V0, and register groups containing it, as much as possible. We achieve this by moving V0 to the end of the RA order.	2024-03-01 12:17:56 +08:00
Felix (Ting Wang)	5b05870953	[PowerPC] Support local-dynamic TLS relocation on AIX (#66316 ) Supports TLS local-dynamic on AIX, generates below sequence of code: ``` .tc foo[TC],foo[TL]@ld # Variable offset, ld relocation specifier .tc mh[TC],mh[TC]@ml # Module handle for the caller lwz 3,mh[TC](2) $$ For 64-bit: ld 3,mh[TC](2) bla .__tls_get_mod # Modifies r0,r3,r4,r5,r11,lr,cr0 #r3 = &TLS for module lwz 4,foo[TC](2) $$ For 64-bit: ld 4,foo[TC](2) add 5,3,4 # Compute &foo .rename mh[TC], "_$TLSML" # Symbol for the module handle must have the name "_$TLSML" ``` --------- Co-authored-by: tingwang <tingwang@tingwangs-MBP.lan> Co-authored-by: tingwang <tingwang@tingwangs-MacBook-Pro.local>	2024-03-01 08:09:40 +08:00
Kai Luo	d1924f0474	[PowerPC] Do not generate `isel` instruction if target doesn't have this instruction (#72845 ) When expanding `select_cc` in finalize-isel, we should not generate `isel` for targets that do not support it.	2024-03-01 08:03:06 +08:00
Sumanth Gundapaneni	ca9d2e923b	[Hexagon] Add Loop Alignment pass. (#83379 ) Inspect a basic block and, if it is a single-basic-block loop with a small number of instructions, set the Loop Alignment to 32 bytes. This avoids a cache line break in the first packet of the loop, which would cause a stall on each execution of the loop.	2024-02-29 16:57:33 -06:00
Leon Clark	5b07fd4799	[AMDGPU] Fix OpenCL conformance test failures for ctlz. (#83170 ) Remove LSH transform and restore previous lowering. Fixes conformance issue in [77615](https://github.com/llvm/llvm-project/pull/77615) where OpenCL integer_ops tests fail for integer_clz. Co-authored-by: Leon Clark <leoclark@amd.com>	2024-02-29 22:28:13 +00:00
chuongg3	a344db793a	[AArch64][GlobalISel] Legalize G_SHUFFLE_VECTOR for Odd-Sized Vectors (#83038 ) Legalize Smaller/Larger than legal vectors with i8 and i16 element sizes. Vectors with elements smaller than i8 will get widened to i8 elements.	2024-02-29 16:31:05 +00:00
Sander de Smalen	5bd01ac822	[AArch64] Re-enable rematerialization for streaming-mode-changing functions. (#83235 ) We can add implicit defs/uses of the 'VG' register to the instructions to prevent the register allocator from rematerializing values in between streaming-mode changes, as the def/use of VG will further nail down the ordering that comes out of ISel. This avoids the heavy-handed approach to prevent any kind of rematerialization. While we could add 'VG' as a Use to all SVE instructions, we only really need to do this for instructions that are rematerializable, as the smstart/smstop instructions and pseudos act as scheduling barriers which is sufficient to prevent other instructions from being scheduled in between the streaming-mode-changing call sequence. However, we may revisit this in the future.	2024-02-29 15:35:46 +00:00
Simon Pilgrim	80a328b011	[X86] SimplifyDemandedVectorEltsForTargetNode - add basic PCMPEQ/PCMPGT handling	2024-02-29 15:22:12 +00:00
Tuan Chuong Goh	92e5f13ad1	[AArch64][GlobalISel] Legalize G_SHUFFLE_VECTOR for Odd-Sized Vectors (#83038 )	2024-02-29 15:16:48 +00:00
Michael Maitland	4f132dca71	[RISCV] Enable PostRAScheduler for SiFive7 (#83166 ) Based on numbers collected in our downstream toolchain.	2024-02-29 09:57:15 -05:00
RicoAfoat	1e6627ecef	[X86] matchAddressRecursively - ensure dead nodes are replaced before matching the index register (#82881 ) Fixes #82431 - see #82431 for more information.	2024-02-29 14:55:51 +00:00
Petar Avramovic	0d572c41f9	AMDGPU/GlobalISel: remove amdgpu-global-isel-risky-select flag (#83426 ) AMDGPUInstructionSelector should no longer attempt to select S1 G_PHIs. Remove MIR test that attempts to inst-select divergent vcc(S1) G_PHI. Lane mask merging algorithm for GlobalISel is now responsible for selecting divergent S1 G_PHIs in AMDGPUGlobalISelDivergenceLowering. Uniform S1 G_PHIs should be lowered to S32 G_PHIs in reg bank select pass. In summary S1 G_PHIs should not reach AMDGPUInstructionSelector.	2024-02-29 15:38:54 +01:00
Petar Avramovic	6c2eec5cea	AMDGPU/GlobalISel: lane masks merging (#73337 ) Basic implementation of lane mask merging for GlobalISel. Lane masks on GlobalISel are registers with sgpr register class and S1 LLT - required by machine uniformity analysis. Implements equivalent of lowerPhis from SILowerI1Copies.cpp in: patch 1: https://github.com/llvm/llvm-project/pull/75340 patch 2: https://github.com/llvm/llvm-project/pull/75349 patch 3: https://github.com/llvm/llvm-project/pull/80003 patch 4: https://github.com/llvm/llvm-project/pull/78431 patch 5: is in this commit: AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers Previously, in PHIs that represent lane masks, incoming registers taken as-is were not selected as lane masks. Such registers are not being merged with another lane mask and most often only have S1 LLT. Implement constrainAsLaneMask by constraining incoming registers taken as-is with lane mask attributes, essentially transforming them to lane masks. This is the final step in having PHI instructions created in this pass fully instruction-selected.	2024-02-29 13:57:59 +01:00
David Green	dbca8a49b6	[DAG] Improve known bits of Zext/Sext loads with range metadata (#80829 ) This extends the known bits for extending loads which have range metadata, handling the range metadata on the original memory type, extending that to the correct BitWidth.	2024-02-29 12:53:13 +00:00
Simon Pilgrim	139bcda542	[X86] SimplifyDemandedVectorEltsForTargetNode - add basic CVTPH2PS/CVTPS2PH handling Allows us to peek through the F16 conversion nodes, mainly to simplify shuffles An easy part of #83414	2024-02-29 12:33:49 +00:00
Simon Pilgrim	b50b50bfbf	[X86] cmov-fp.ll - regenerate with common 'NOSSE' prefix to reduce duplication	2024-02-29 12:16:52 +00:00
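
The ca827d53c5 entry above ([X86] Convert logicalshift(x, C) -> and(x, M) iff x is allsignbits) rests on a small identity: when every bit of a lane is a copy of its sign bit, the lane is either all zeros or all ones, so a logical shift by C gives the same result as masking with an all-ones value shifted by C. The sketch below checks that identity exhaustively for 16-bit lanes. It is an illustration only, not LLVM code; the helper names (`lshr`, `shl`, `shift_as_mask`, `identity_holds`) and the 16-bit lane width are assumptions chosen for the example.

```python
# Hypothetical illustration, not LLVM code: check the shift-to-mask identity
# for 16-bit lanes whose bits are all copies of the sign bit (0x0000 or 0xFFFF).
BITS = 16
ALL_ONES = (1 << BITS) - 1


def lshr(x: int, c: int) -> int:
    """Logical shift right of a BITS-wide unsigned value."""
    return (x & ALL_ONES) >> c


def shl(x: int, c: int) -> int:
    """Shift left of a BITS-wide unsigned value, truncated to BITS bits."""
    return (x << c) & ALL_ONES


def shift_as_mask(x: int, c: int, left: bool) -> int:
    """Replace the shift with and(x, M), where M is all-ones shifted by c."""
    mask = shl(ALL_ONES, c) if left else lshr(ALL_ONES, c)
    return x & mask


def identity_holds() -> bool:
    """True if the mask form matches the shift for every all-signbits lane."""
    for x in (0x0000, ALL_ONES):
        for c in range(BITS):
            if lshr(x, c) != shift_as_mask(x, c, left=False):
                return False
            if shl(x, c) != shift_as_mask(x, c, left=True):
                return False
    return True
```

The rewrite is only valid under the all-signbits precondition: for an arbitrary lane such as 0x1234, `lshr(0x1234, 4)` and `shift_as_mask(0x1234, 4, left=False)` differ, which is why the commit title says "iff x is allsignbits".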

52796 Commits