llvm-project

Author	SHA1	Message	Date
Simon Pilgrim	3a5cf6d99b	[X86] Rename AVX512 VEXTRACT/INSERT??x? to VEXTRACT/INSERT??X? (#116826 ) Use uppercase in the subvector description ("32x2" -> "32X4" etc.) - matches what we already do in VBROADCAST??X?, and we try to use uppercase for all x86 instruction mnemonics anyway (and lowercase just for the arg description suffix).	2024-11-20 08:25:01 +00:00
Simon Pilgrim	7dcefb37a4	[X86] Tidyup up AVX512 FPCLASS instruction naming (#116661 ) FPCLASS is a unary instruction with an immediate operand - update the naming to match similar instructions (e.g. VPSHUFD) by only using the source reg/mem and immediate in the instruction name	2024-11-19 11:26:46 +00:00
Daniel Zabawa	6fb7cdff3d	[X86] Recognize POP/ADD/SUB modifying rsp in getSPAdjust. (#114265 ) This code assumed only PUSHes would appear in call sequences. However, if calls require frame-pointer/base-pointer spills, only the PUSH operations inserted by spillFPBP will be recognized, and the adjustments to frame object offsets in prologepilog will be incorrect. This change correctly reports the SP adjustment for POP and ADD/SUB to rsp, and an assertion for unrecognized instructions that modify rsp.	2024-11-14 17:20:16 +01:00
Phoebe Wang	08af115d97	Fix mistakes in #113532 (#115631 ) Found during review #115151	2024-11-10 12:46:21 +08:00
Kazu Hirata	dfe43bd1ca	[X86] Remove unused includes (NFC) (#115593 ) Identified with misc-include-cleaner.	2024-11-09 08:23:46 -08:00
Phoebe Wang	c72a751dab	[X86][AMX] Support AMX-TRANSPOSE (#113532 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/671368	2024-11-01 16:45:03 +08:00
Simon Pilgrim	c59ac1a2f6	[X86] Cleanup AVX512 VBROADCAST subvector instruction names. (#108888 ) This patch makes the `VBROADCAST*X` subvector broadcast instructions consistent - the `*X` section represents the original subvector type/size, but we were not correctly using the AVX512 Z/Z256/Z128 suffix to consistently represent the destination width (or we missed it entirely).	2024-09-18 10:34:35 +01:00
Simon Pilgrim	c91f2a259f	[X86] Consistently use 'k' for predicate mask registers in instruction names (#108780 ) We use 'k' for move instructions and to indicate masked variants of evex instructions, but otherwise we're very inconsistent when we use 'k' vs 'r'.	2024-09-17 08:57:57 +01:00
Simon Pilgrim	614a064cac	[X86] Add missing immediate qualifier to the (V)INSERT/EXTRACT/PERM2 instruction names (#108593 ) Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on	2024-09-15 11:42:13 +01:00
Simon Pilgrim	ba8e4246e2	[X86] Add missing immediate qualifier to the (V)INSERTPS instruction names (#108568 ) Matches (V)BLENDPS etc and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on	2024-09-15 11:27:36 +01:00
Kyungwoo Lee	93b8d07a75	[MachineOutliner][NFC] Refactor (#105398 ) This patch prepares the NFC groundwork for global outlining using CGData, which will follow https://github.com/llvm/llvm-project/pull/90074. - The `MinRepeats` parameter is now explicitly passed to the `getOutliningCandidateInfo` function, rather than relying on a default value of 2. For local outlining, the minimum number of repetitions is typically 2, but for the global outlining (mentioned above), we will optimistically create a single `Candidate` for each `OutlinedFunction` if stable hashes match a specific code sequence. This parameter is adjusted accordingly in global outlining scenarios. - I have also implemented `unique_ptr` for `OutlinedFunction` to ensure safe and efficient memory management within `FunctionList`, avoiding unnecessary implicit copies. This depends on https://github.com/llvm/llvm-project/pull/101461. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.	2024-08-27 14:38:36 -07:00
Piyou Chen	b01c006f73	[TII][RISCV] Add renamable bit to copyPhysReg (#91179 ) The renamable flag is useful during MachineCopyPropagation but renamable flag will be dropped after lowerCopy in some case. This patch introduces extra arguments to pass the renamable flag to copyPhysReg.	2024-08-27 10:08:43 +08:00
Temperatureblock	db3c3fc90a	Simple check to ignore Inline asm fwait insertion (#101686 ) Just a simple check to ignore Inline asm fwait insertion Fixes #101613	2024-08-12 22:36:58 +08:00
Phoebe Wang	b0329206db	[X86][AVX10.2] Support AVX10.2 VNNI FP16/INT8/INT16 new instructions (#101783 ) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965	2024-08-05 18:57:42 +08:00
Shengchen Kan	50cf413426	[X86,CodeGen] Return the correct condition code for SETZUCC llvm-issue: https://github.com/llvm/llvm-project/issues/101288	2024-07-31 14:09:08 +08:00
Pengcheng Wang	ed4e75d5e5	[CodeGen] Remove AA parameter of isSafeToMove (#100691 ) This `AA` parameter is not used and for most uses they just pass a nullptr. The use of `AA` was removed since 8d0383e.	2024-07-26 15:47:47 +08:00
Matt Arsenault	3cb5604d2c	MachineOutliner: Use PM to query MachineModuleInfo (#99688 ) Avoid getting this from the MachineFunction	2024-07-24 13:22:56 +04:00
Nikita Popov	4169338e75	[IR] Don't include Module.h in Analysis.h (NFC) (#97023 ) Replace it with a forward declaration instead. Analysis.h is pulled in by all passes, but not all passes need to access the module.	2024-06-28 14:30:47 +02:00
Haohai Wen	be00190ce3	[TII][X86] Do not schedule frame-setup/frame-destory instructions (#96611 ) frame-setup/frame-destroy instruction can not be scheduled around by PostRAScheduler. Their order is critical for SEH.	2024-06-26 17:08:59 +08:00
Shengchen Kan	bdc7840c57	[X86][CodeGen] Share code between CompressEVEX pass and ND2NonND transform, NFCI	2024-06-19 16:03:57 +08:00
Shengchen Kan	1216cde81a	[X86][mem-fold] Support memory folding from MOV32r0 to MOV64mi32	2024-06-12 22:06:10 +08:00
paperchalice	837dc542b1	[CodeGen][NewPM] Split `MachineDominatorTree` into a concrete analysis result (#94571 ) Prepare for new pass manager version of `MachineDominatorTreeAnalysis`. We may need a machine dominator tree version of `DomTreeUpdater` to handle `SplitCriticalEdge` in some CodeGen passes.	2024-06-11 21:27:14 +08:00
Shengchen Kan	22c572eae0	[X86][CodeGen] Support memory folding for NDD -> RMW	2024-05-30 19:06:22 +08:00
Shengchen Kan	7f524f7ef2	[X86][CodeGen] Simplify the code in foldMemoryOperandImpl, NFCI In preparation for the coming NDD -> RMW fold.	2024-05-30 14:57:38 +08:00
Shengchen Kan	a9e8a3a18e	[X86][CodeGen] Extend X86CompressEVEX for NF transform	2024-05-29 15:41:31 +08:00
Shengchen Kan	331eb8a004	[X86][CodeGen] Support lowering for CCMP/CTEST (#91747 ) DAG combine for `CCMP` and `CTESTrr`: ``` and/or(setcc(cc0, flag0), setcc(cc1, sub (X, Y))) -> setcc(cc1, ccmp(X, Y, ~cflags/cflags, cc0/~cc0, flag0)) and/or(setcc(cc0, flag0), setcc(cc1, cmp (X, 0))) -> setcc(cc1, ctest(X, X, ~cflags/cflags, cc0/~cc0, flag0)) ``` where `cflags` is determined by `cc1`. Generic DAG combine: ``` cmp(setcc(cc, X), 0) brcond ne -> X brcond cc sub(setcc(cc, X), 1) brcond ne -> X brcond ~cc ``` Post DAG transform: `ANDrr/rm + CTESTrr -> CTESTrr/CTESTmr` Pattern match for `CTESTri`: ``` X= and A, B ctest(X, X, cflags, cc0/, flag0) -> ctest(A, B, cflags, cc0/, flag0) ``` `CTESTmi` is already handled by the memory folding mechanism in MIR.	2024-05-26 18:32:23 +08:00
Shengchen Kan	4b62afca64	[X86][CodeGen] Support flags copy lowering for CCMP/CTEST (#91849 ) ``` %1:gr64 = COPY $eflags OP1 may update eflags $eflags = COPY %1 OP2 may use eflags ``` To use eflags as input at 4th instruction, we need to use SETcc to preserve the eflags before 2, and update the source condition of OP2 according to value in GPR %1. In this patch, we support CCMP/CTEST as OP2.	2024-05-18 19:50:16 +08:00
Kazu Hirata	c18bcd0a57	[Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072 ) (#91138 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 38 under llvm/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-05 13:43:10 -07:00
Xu Zhang	f6d431f208	[CodeGen] Make the parameter TRI required in some functions. (#85968 ) Fixes #82659 There are some functions, such as `findRegisterDefOperandIdx` and `findRegisterDefOperand`, that have too many default parameters. As a result, we have encountered some issues due to the lack of TRI parameters, as shown in issue #82411. Following @RKSimon 's suggestion, this patch refactors 9 functions, including `{reads, kills, defines, modifies}Register`, `registerDefIsDead`, and `findRegister{UseOperandIdx, UseOperand, DefOperandIdx, DefOperand}`, adjusting the order of the TRI parameter and making it required. In addition, all the places that call these functions have also been updated correctly to ensure no additional impact. After this, the caller of these functions should explicitly know whether to pass the `TargetRegisterInfo` or just a `nullptr`.	2024-04-24 14:24:14 +01:00
Pengcheng Wang	b564036933	[MachineCombiner][NFC] Split target-dependent patterns We split target-dependent MachineCombiner patterns into their target folder. This makes MachineCombiner much more target-independent. Reviewers: davemgreen, asavonic, rotateright, RKSimon, lukel97, LuoYuanke, topperc, mshockwave, asi-sc Reviewed By: topperc, mshockwave Pull Request: https://github.com/llvm/llvm-project/pull/87991	2024-04-11 12:20:27 +08:00
Simon Pilgrim	ecb34599bd	[X86] Add missing immediate qualifier to the (V)ROUND instructions (#87636 ) Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on	2024-04-04 15:20:16 +01:00
Freddy Ye	36b4b9d988	[X86] Support immediate folding for CCMP/CTEST (#86616 ) E.g. %0:gr32 = MOV32ri 81 CTEST32rr %0, %1, 2, 10, implicit-def $eflags, implicit $eflags => CTEST32ri %1, 81, 2, 10, implicit-def $eflags, implicit $eflags	2024-03-28 18:54:32 +08:00
XinWang10	7b766a6f50	[X86] Support APX CMOV/CFCMOV instructions (#82592 ) This patch support ND CMOV instructions and CFCMOV instructions. RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4	2024-03-17 20:18:56 +08:00
Ganesh	61fadd0b09	[X86] Fast AVX-512-VNNI vpdpwssd tuning (#85375 ) Adding a tuning feature to fix https://github.com/llvm/llvm-project/issues/84182 Generates vpdpwssd (instead of vpmaddwd + vpaddd sequence)	2024-03-15 16:45:41 +05:30
Simon Pilgrim	1ec5b1f483	[X86] Add missing immediate qualifier to the (V)PCLMULQDQ instruction names	2024-03-11 13:39:25 +00:00
Simon Pilgrim	92d7aca441	[X86] Add missing immediate qualifier to the (V)CMPSS/D instructions (#84496 ) Matches (V)CMPPS/D and makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on	2024-03-09 16:21:25 +00:00
David Green	44be5a7fdc	[Codegen] Make Width in getMemOperandsWithOffsetWidth a LocationSize. (#83875 ) This is another part of #70452 which makes getMemOperandsWithOffsetWidth use a LocationSize for Width, as opposed to the unsigned it currently uses. The advantages on it's own are not super high if getMemOperandsWithOffsetWidth usually uses known sizes, but if the values can come from an MMO it can help be more accurate in case they are Unknown (and in the future, scalable).	2024-03-06 17:40:13 +00:00
AtariDreams	3e40c96d89	[X86] Resolve FIXME: Add FPCW as a rounding control register (#82452 ) To prevent tests from breaking, another fix had to be made: Now, we check if the instruction after a waiting instruction is a call, and if so, we insert the wait.	2024-03-05 08:47:05 +08:00
Simon Pilgrim	448fe73428	[X86] Add X86::getVectorRegisterWidth helper. NFC. Replaces internal helper used by addConstantComments to allow reuse in a future patch.	2024-02-08 12:42:33 +00:00
Shengchen Kan	e270ec47cd	[X86] X86InstrInfo.cpp - Remove dead code for memory folding, NFCI `commuteInstruction(MI, false, OpNum, CommuteOpIdx2)` should never create any new instruction, so we don't need to check and erase it.	2024-02-02 11:14:07 +08:00
Philip Reames	3ff7caea33	[TTI] Use Register in isLoadFromStackSlot and isStoreToStackSlot [nfc] (#80339 )	2024-02-01 17:52:35 -08:00
Shengchen Kan	c82a645ef2	[X86][NFC] Simplify the code for memory fold	2024-02-01 13:43:25 +08:00
Shengchen Kan	e3c9327bc4	[X86][CodeGen] Set isReMaterializable = 1 for AVX broadcast load Broadcast of a single float should not be any slower than loading 32B using vmovaps. So remat it can help reduce register spill when there is big register pressure.	2024-01-31 20:55:56 +08:00
Kazu Hirata	5d7a0a734a	[X86] Use a range-based for loop (NFC)	2024-01-30 22:12:05 -08:00
Shengchen Kan	8e77390c06	[X86][CodeGen] Support folding memory broadcast in X86InstrInfo::foldMemoryOperandImpl (#79761 )	2024-01-31 12:51:03 +08:00
Shengchen Kan	2960656eb9	[X86][NFC] Extract code for commute in foldMemoryOperandImpl into functions To share code for folding broadcast in #79761	2024-01-31 00:09:08 +08:00
Shengchen Kan	02a275cca1	[X86][CodeGen] Add entries for TB_BCAST_SH in getBroadcastOpcode	2024-01-30 21:01:31 +08:00
Shengchen Kan	f28430d577	[X86][CodeGen] Add entries for TB_BCAST_W in getBroadcastOpcode and fix typo	2024-01-30 01:03:32 +08:00
Shengchen Kan	169553688c	[X86][NFC] Remove TB_FOLDED_BCAST and format code in X86InstrFoldTables.cpp	2024-01-30 00:27:16 +08:00
Shengchen Kan	7089c012ec	[X86][NFC] Replace if-else with switch-case in X86InstrInfo::foldMemoryOperandImpl	2024-01-28 10:30:26 +08:00

1697 Commits