Some targets (e.g. PPC and Hexagon) already did this. I think it's best
to do this consistently so that frontend authors don't run into
inconsistent results when they emit `naked` functions. For example, in
Zig, we had to change our emit code to also set `frame-pointer=none` to
get reliable results across targets.
Note: I don't have commit access.
This patch adds icmp+select patterns for integer min/max matchers in
SDPatternMatch, similar to those in IR PatternMatch.
Reapply #111774.
Closes#108218.
The implementation was missing the fact that `G_EXTRACT_SUBVECTOR`
destination and source vector can be different types.
Also fix a bug in the MIR builder for `G_EXTRACT_SUBVECTOR` to generate
the correct opcode.
Clarify the G_EXTRACT_SUBVECTOR specification.
This PR is related to #99591. In this PR, instead of modifying how the
legalisation occurs depending on surrounding instructions, we refine
after legalisation.
This PR has two parts:
* `SDPatternMatch/MatchContext`: Modify a little bit the code to match
Operands (used by `m_Node(...)`) and Unary/Binary/Ternary Patterns to
make it compatible with `VPMatchContext`, instead of only `m_Opc`
supported. Some tests were added to ensure no regressions.
* `DAGCombiner`: Add a `foldSubCtlzNot` which detect and rewrite the
patterns using matching context.
Remaining Tasks:
- [ ] GlobalISel
- [ ] Currently the pattern matching will occur even before
legalisation. Should I restrict it to specific stages instead ?
- [ ] Style: Add a visitVP_SUB ?? Move `foldSubCtlzNot` in another
location for style consistency purpose ?
@topperc
---------
Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
While working on a MIR unittest, I noticed that parseMIR includes an
unused argument that sets a function name. This is not only redundant
but also irrelevant, as parseMIR is designed to parse entire module, not
specific functions, even though most unittests contain a single function
per module. To streamline the API, I have removed this unnecessary
argument from parseMIR. However, if this argument was originally
included to enhance readability or for any other purpose, please let me
know.
Add support for matching with `SDNodeFlags` i.e `add` with `nuw`.
This patch adds helpers for `or disjoint` or `zext nneg` with the same
names as we have in IR/PatternMatch api.
Closes#103060
Currently, when using a VP match context with `sd_context_match`, only Opcode matching is possible (`m_Opc(Opcode)`).
This PR suggest a way to make patterns with Operands (eg `m_Node`, `m_Add`, ...) works with a VP context.
This PR blocks another PR https://github.com/llvm/llvm-project/pull/102877.
Co-authored-by: v01dxyz <v01dxyz@v01d.xyz>
This avoids another unserializable field. Move the DbgInfoAvailable
field into the AsmPrinter, which is only really a cache/convenience
bit for checking a direct IR module metadata check.
Now revised to actually make the unit test compile, which I'd been
ignoring. No actual functional change, it's a type difference.
Original commit message follows.
[DebugInfo][InstrRef] Index DebugVariables and some DILocations (#99318)
A lot of time in LiveDebugValues is spent computing DenseMap keys for
DebugVariables, and they're made up of three pointers, so are large.
This patch installs an index for them: for the SSA and value-to-location
mapping parts of InstrRefBasedLDV we don't need to access things like
the variable declaration or the inlining site, so just use a uint32_t
identifier for each variable fragment that's tracked. The compile-time
performance improvements are substantial (almost 0.4% on the tracker).
About 80% of this patch is just replacing DebugVariable references with
DebugVariableIDs instead, however there are some larger consequences. We
spend lots of time fetching DILocations when emitting DBG_VALUE
instructions, so index those with the DebugVariables: this means all
DILocations on all new DBG_VALUE instructions will normalise to the
first-seen DILocation for the variable (which should be fine).
We also used to keep an ordering of when each variable was seen first in
a DBG_* instruction, in the AllVarsNumbering collection, so that we can
emit new DBG_* instructions in a stable order. We can hang this off the
DebugVariable index instead, so AllVarsNumbering is deleted.
Finally, rather than ordering by AllVarsNumbering just before DBG_*
instructions are linked into the output MIR, store instructions along
with their DebugVariableID, so that they can be sorted by that instead.
A lot of time in LiveDebugValues is spent computing DenseMap keys for
DebugVariables, and they're made up of three pointers, so are large.
This patch installs an index for them: for the SSA and value-to-location
mapping parts of InstrRefBasedLDV we don't need to access things like
the variable declaration or the inlining site, so just use a uint32_t
identifier for each variable fragment that's tracked. The compile-time
performance improvements are substantial (almost 0.4% on the tracker).
About 80% of this patch is just replacing DebugVariable references with
DebugVariableIDs instead, however there are some larger consequences. We
spend lots of time fetching DILocations when emitting DBG_VALUE
instructions, so index those with the DebugVariables: this means all
DILocations on all new DBG_VALUE instructions will normalise to the
first-seen DILocation for the variable (which should be fine).
We also used to keep an ordering of when each variable was seen first in
a DBG_* instruction, in the AllVarsNumbering collection, so that we can
emit new DBG_* instructions in a stable order. We can hang this off the
DebugVariable index instead, so AllVarsNumbering is deleted.
Finally, rather than ordering by AllVarsNumbering just before DBG_*
instructions are linked into the output MIR, store instructions along
with their DebugVariableID, so that they can be sorted by that instead.
Previously we had the same instructions being generated for `ISD::CTLZ` and `ISD::CTLZ_ZERO_UNDEF` which did not take advantage of the fact that zero is an invalid input for `ISD::CTLZ_ZERO_UNDEF`. This commit separates codegen for the two cases to allow for the optimization for the latter case.
The details of the optimization are outlined in #82075Fixes#82075
Co-authored-by: Manish Kausik H <hmamishkausik@gmail.com>
Timers are an out-of-line function call and a global variable access,
here twice per emitted instruction. At this granularity, not only the
time results become skewed, but the timers also add a performance
overhead when profiling is disabled. Also outside of the innermost loop,
timers add a measurable overhead. As this is quite expensive for a
mostly unused profiling facility, remove the timers.
Fixes#39650.
Currently, an AsmPrinterHandler has several methods that allow to
dynamically hook in unwind or debug info emission, e.g. at begin/end of
every function or instruction. The class hierarchy and the actually
overridden functions are as follows:
(SymSz=setSymbolSize, mFE=markFunctionEnd, BBS=BasicBlockSection,
FL=Funclet; b=beginX, e=endX)
SymSz Mod Fn mFE BBS FL Inst
AsmPrinterHandler - - - - - - -
` PseudoProbeHandler - - - - - - -
` WinCFGuard - e e - - - -
` EHStreamer - - - - - - -
` DwarfCFIException - e be - be - -
` ARMException - - be e - - -
` AIXException - - e - - - -
` WinException - e be e - be -
` WasmException - e be - - - -
` DebugHandlerBase - b be - be - be
` BTFDebug - e - - - - b
` CodeViewDebug - be - - - - b
` DWARFDebug yes be - - - - b
Doing virtual function calls per instruction is costly and useless when
the called function does nothing.
This commit performs the following clean-up/improvements:
- PseudoProbeHandler is no longer an AsmPrinterHandler -- it used
nothing of its functionality to hook in at the possible points. This
avoids virtual function calls when a pseudo probe printer is present.
- DebugHandlerBase is no longer an AsmPrinterHandler, but a separate
base class. DebugHandlerBase is the only remaining "hook" for begin/end
instruction and setSymbolSize (only used by DWARFDebug). begin/end for
function and basic block sections are never overriden and therefore are
no longer virtual. (Originally I intended there to be only one debug
handler, but BPF as the only target supports two at the same time: DWARF
and BTF.)
- AsmPrinterHandler no longer has begin/end instruction and
setSymbolSize hooks -- these were only used by DebugHandlerBase. This
avoid iterating over handlers in every instruction.
AsmPrinterHandler Mod Fn mFE BBS FL
` WinCFGuard e e - - -
` EHStreamer - - - - -
` DwarfCFIException e be - be -
` ARMException - be e - -
` AIXException - e - - -
` WinException e be e - be
` WasmException e be - - -
SymSz Mod Fn BBS Inst
DebugHandlerBase - b be be be
` BTFDebug - e b
` CodeViewDebug - be b
` DWARFDebug yes be b
PseudoProbeHandler (no shared methods)
To continue allowing external users (e.g., Julia) to hook in at every
instruction, a new method addDebugHandler is exposed.
This results in a performance improvement, especially in the -O0 -g0
case with unwind information (e.g., JIT baseline).
This reverts commit 0f8849349ae3d3f2f537ad6ab233a586fb39d375.
Resolve conflict in `MachinePostDominators.h` There is a conflict after
merging #96378, resolved in #96852. Both PRs modified
`MachinePostDominators.h` and triggered build failure.
This commit converts most of `DomTreeUpdater` into
`GenericDomTreeUpdater` class template, so IR and MIR can reuse some
codes.
There are some differences between interfaces of `BasicBlock` and
`MachineBasicBlock`, so subclasses still need to implement some
functions, like `forceFlushDeletedBB`.
simple_ilist does not take ownership of its nodes, which is fine for
SlotIndexes because the IndexListEntry nodes are allocated with a
BumpPtrAllocator and do not need to be freed.
Context: https://github.com/llvm/llvm-project/pull/95365#discussion_r1638236603
The current implementation of `m_SExt` matches both `ISD::SIGN_EXTEND` and `ISD::SIGN_EXTEND_INREG`. However, in cases where we specifically need to match _only_ `ISD::SIGN_EXTEND`, such as in the SelectionDAG graph below, this can lead to issues and unintended combinations.
```
SelectionDAG has 13 nodes:
t0: ch,glue = EntryToken
t2: v2i32,ch = CopyFromReg t0, Register:v2i32 %0
t21: v2i32 = sign_extend_inreg t2, ValueType:ch:v2i8
t4: v2i32,ch = CopyFromReg t0, Register:v2i32 %1
t22: v2i32 = sign_extend_inreg t4, ValueType:ch:v2i8
t23: v2i32 = avgfloors t21, t22
t24: v2i32 = sign_extend_inreg t23, ValueType:ch:v2i8
t15: ch,glue = CopyToReg t0, Register:v2i32 $d0, t24
t16: ch = AArch64ISD::RET_GLUE t15, Register:v2i32 $d0, t15:1
```
- Remove some cases where ULEB128 isn't needed
- Add a fastDecodeULEB128 tailored for GlobalISel which does unchecked
decoding optimized for the common case, which is 1 byte values. We
rarely have >1 byte Inst IDs, OpIdx, etc. and those are the most common
ULEB users by far.
This specific LEB128 decode function generates almost 2x less
instructions than the generic one.
In new pass system, `MachineFunction` could be an analysis result again,
machine module pass can now fetch them from analysis manager.
`MachineModuleInfo` no longer owns them.
Remove `FreeMachineFunctionPass`, replaced by
`InvalidateAnalysisPass<MachineFunctionAnalysis>`.
Now `FreeMachineFunction` is replaced by
`InvalidateAnalysisPass<MachineFunctionAnalysis>`, the workaround in
`MachineFunctionPassManager` is no longer needed, there is no difference
between `unittests/MIR/PassBuilderCallbacksTest.cpp` and
`unittests/IR/PassBuilderCallbacksTest.cpp`.