Several Problems when running tutorial-7.py #358

@cheng-wei-huang0612

Abstract

I encountered three problems when running tutorial-7.py:

  1. slothy.targets.common.FatalParsingException: Inconsistent dt: <dt1>
  2. slothy.targets.common.UnknownInstruction: Multiple matches found for <class 'slothy.targets.aarch64.aarch64_neon.x_str_sp_imm'> for str x1, [sp, #STACK_A_32]
  3. slothy.helper.LLVM_Mc_Error

I fixed the first two in my branch https://github.com/cheng-wei-huang0612/slothy/tree/Patch--fixing-tutorial-7.py
but I do not know how to deal with the third.

Description

On the main branch, running python3 tutorial-7.py gives:

(venv) huangchengwei@huangchengweis-MacBook-Air tutorial % python3 tutorial-7.py
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
Traceback (most recent call last):
  File "/Users/huangchengwei/Documents/slothy/tutorial/tutorial-7.py", line 24, in <module>
    slothy.optimize(start="mainloop", end="end_label")
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/slothy.py", line 397, in optimize
    early, core, late, num_exceptional = Heuristics.periodic(body, logger, c)
                                         ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 364, in periodic
    dfg = DFG(
        body, logger.getChild("dfg_generate_outputs"), DFGConfig(conf.copy())
    )
  File "/Users/huangchengwei/Documents/slothy/slothy/core/dataflow.py", line 720, in __init__
    self.src = self._parse_source(src)
               ~~~~~~~~~~~~~~~~~~^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/dataflow.py", line 798, in _parse_source
    return list(map(self._parse_line, src_lines))
  File "/Users/huangchengwei/Documents/slothy/slothy/core/dataflow.py", line 787, in _parse_line
    insts = self.arch.Instruction.parser(line)
  File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/aarch64_neon.py", line 884, in parser
    inst = inst_class.make(src)
  File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/aarch64_neon.py", line 1246, in make
    return AArch64Instruction.build(cls, src)
           ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/aarch64_neon.py", line 1230, in build
    AArch64Instruction._enforce_datatype_matching(pattern, res)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/aarch64_neon.py", line 942, in _enforce_datatype_matching
    raise FatalParsingException(f"Inconsistent dt: {dt}")
slothy.targets.common.FatalParsingException: Inconsistent dt: <dt1>

It turns out that SLOTHY fails while parsing the usra instruction, whose parser definition was originally:

class vusra(AArch64Instruction):
    pattern = "usra <Vd>.<dt0>, <Va>.<dt1>, <imm>"
    inputs = ["Va"]
    in_outs = ["Vd"]

I changed it to:

class vusra(AArch64Instruction):
    pattern = "usra <Vd>.<dt>, <Va>.<dt>, <imm>"
    inputs = ["Va"]
    in_outs = ["Vd"]
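
The difference between the two patterns can be illustrated with a toy placeholder matcher. This is a simplified sketch, not SLOTHY's parser: `pattern_to_regex` is a hypothetical helper, and treating a repeated placeholder name as "must be the same text" is an assumption made for illustration only.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a pattern such as 'usra <Vd>.<dt>, <Va>.<dt>, <imm>'.

    Hypothetical helper (not SLOTHY code). The first use of a placeholder
    becomes a named group; a repeated placeholder becomes a backreference,
    so both occurrences must carry identical text.
    """
    seen = set()
    parts = []
    # Split into literal text and <placeholder> tokens.
    for token in re.split(r"(<\w+>)", pattern):
        if token.startswith("<") and token.endswith(">"):
            name = token[1:-1]
            if name in seen:
                parts.append(f"(?P={name})")
            else:
                seen.add(name)
                parts.append(f"(?P<{name}>[#\\w]+)")
        else:
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")


shared = pattern_to_regex("usra <Vd>.<dt>, <Va>.<dt>, <imm>")
split = pattern_to_regex("usra <Vd>.<dt0>, <Va>.<dt1>, <imm>")

line = "usra v0.2d, v1.2d, #32"
print(shared.match(line).groupdict())
print(split.match(line).groupdict())
```

With a single `<dt>` name, both operands are tied to one arrangement, whereas `<dt0>` and `<dt1>` are captured independently and need a separate consistency check.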

With that change, the second issue arises:

(venv) huangchengwei@huangchengweis-MacBook-Air tutorial % python3 tutorial-7.py
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
INFO:slothy.mainloop.slothy:Objective: None (any satisfying solution is fine)
INFO:slothy.mainloop.slothy:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.slothy:[74.0876s]: Found 1 solutions so far... objective (no objective): currently 0.0, bound 0.0
INFO:slothy.mainloop.slothy:OPTIMAL, wall time: 74.309375 s
INFO:slothy.mainloop.slothy:Booleans in result: 16950
INFO:slothy.mainloop.slothy.selfcheck:OK!
INFO:slothy.mainloop.slothy.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
INFO:slothy.mainloop.split.0_96:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.0_96:Attempt optimization with max 32 stalls...
INFO:slothy.mainloop.split.0_96:Objective: minimize cycles
INFO:slothy.mainloop.split.0_96:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.0_96:[2.3609s]: Found 1 solutions so far... objective (minimize cycles): currently  (Cycles ~ 60.0, IPC ~ 1.60), bound  (Cycles ~ 54.0, IPC ~ 1.78)
INFO:slothy.mainloop.split.0_96:OPTIMAL, wall time: 3.732789 s
INFO:slothy.mainloop.split.0_96:Booleans in result: 8125
INFO:slothy.mainloop.split.0_96.selfcheck:OK!
INFO:slothy.mainloop.split.0_96.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop.split.0_96:Minimum number of stalls: 12
INFO:slothy.mainloop.split.96_192:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.96_192:Attempt optimization with max 32 stalls...
Traceback (most recent call last):
  File "/Users/huangchengwei/Documents/slothy/tutorial/tutorial-7.py", line 36, in <module>
    slothy.optimize(start="mainloop", end="end_label")
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/slothy.py", line 397, in optimize
    early, core, late, num_exceptional = Heuristics.periodic(body, logger, c)
                                         ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 373, in periodic
    res = Heuristics.linear(body, logger=logger, conf=conf)
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 462, in linear
    return Heuristics._split(body, logger, conf)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 986, in _split
    return Heuristics._split_inner(body, logger, c)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 882, in _split_inner
    cur_body, stalls, _ = optimize_chunks_many(
                          ~~~~~~~~~~~~~~~~~~~~^
        idx_lst, cur_body, stalls, show_stalls=False
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 831, in optimize_chunks_many
    body, stalls, cur_stalls, local_perm = optimize_chunk(
                                           ~~~~~~~~~~~~~~^
        start_idx, end_idx, body, stalls, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 795, in optimize_chunk
    result = Heuristics.optimize_binsearch(
        cur_body,
    ...<3 lines>...
        suffix_len=suffix_len,
    )
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 145, in optimize_binsearch
    return Heuristics.optimize_binsearch_internal(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        source, logger, conf, flexible=flexible, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 290, in optimize_binsearch_internal
    success = core.optimize(source, **kwargs)
  File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 1701, in optimize
    self._add_variables_functional_units()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 2633, in _add_variables_functional_units
    cycles_unit_occupied = self.target.get_inverse_throughput(t.inst)
  File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/cortex_a55.py", line 659, in get_inverse_throughput
    return lookup_multidict(inverse_throughput, src, instclass_src)
  File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1874, in lookup_multidict
    raise UnknownInstruction(
        f"Multiple matches found for {instclass} for {inst}"
    )
slothy.targets.common.UnknownInstruction: Multiple matches found for <class 'slothy.targets.aarch64.aarch64_neon.x_str_sp_imm'> for str x1, [sp, #STACK_A_32]

It turns out that the instructions x_str_sp_imm and x_ldr_stack_imm are already covered by Str_X and Ldr_X, respectively, so their numbers (latency, etc.) are specified twice.
In addition, the execution unit of q_ldr1_stack is defined in two places:

    # q-form vector instructions
    (
        ...
        Str_Q,
        q_ldr1_stack,
        q_ldr1_post_inc,
        ...
    ): [
        [ExecutionUnit.VEC0, ExecutionUnit.VEC1]
    ],  # these instructions use both VEC0 and VEC1
...
    # non-q-form vector instructions
    (
        ...
        d_ldr_stack_with_inc,
        q_ldr1_stack,
        Q_Ld2_Lane_Post_Inc,
        ...
    ): [
        ExecutionUnit.VEC0,
        ExecutionUnit.VEC1,
    ],  # these instructions use VEC0 or VEC1
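
The "Multiple matches" failure can be reproduced in miniature. The following is a minimal sketch, not SLOTHY's `lookup_multidict`: the classes are empty stand-ins named after those in this issue, the table values are placeholders rather than real Cortex-A55 numbers, and matching by `issubclass` is an assumption about how a class can be "contained in" another entry.

```python
# Stand-in classes; the subclass relation is assumed for illustration.
class Str_X: ...
class Ldr_X: ...
class x_str_sp_imm(Str_X): ...


# Placeholder values only. Keys may be a single class or a tuple of classes.
table = {
    (Str_X, Ldr_X): 1,
    x_str_sp_imm: 1,  # redundant: already covered by Str_X above
}


def matching_keys(table: dict, inst_class: type) -> list:
    """Return every key of the table that covers inst_class."""
    hits = []
    for key in table:
        members = key if isinstance(key, tuple) else (key,)
        if any(issubclass(inst_class, m) for m in members):
            hits.append(key)
    return hits


def lookup(table: dict, inst_class: type):
    """Look up inst_class, insisting on exactly one matching key."""
    hits = matching_keys(table, inst_class)
    if len(hits) != 1:
        raise LookupError(
            f"{len(hits)} matches found for {inst_class.__name__}"
        )
    return table[hits[0]]


for cls in (Str_X, Ldr_X, x_str_sp_imm):
    print(cls.__name__, len(matching_keys(table, cls)))
```

Running a check like `matching_keys` over every instruction class before optimization would surface duplicated entries such as q_ldr1_stack up front instead of partway through a long solver run.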

After fixing these two, I finally got the LLVM_Mc_Error as follows:

(venv) huangchengwei@huangchengweis-MacBook-Air tutorial % python3 tutorial-7.py
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
INFO:slothy.mainloop.slothy:Objective: None (any satisfying solution is fine)
INFO:slothy.mainloop.slothy:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.slothy:[75.8771s]: Found 1 solutions so far... objective (no objective): currently 0.0, bound 0.0
INFO:slothy.mainloop.slothy:OPTIMAL, wall time: 76.141257 s
INFO:slothy.mainloop.slothy:Booleans in result: 17265
INFO:slothy.mainloop.slothy.selfcheck:OK!
INFO:slothy.mainloop.slothy.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
INFO:slothy.mainloop.split.0_96:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.0_96:Attempt optimization with max 32 stalls...
INFO:slothy.mainloop.split.0_96:Objective: minimize cycles
INFO:slothy.mainloop.split.0_96:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.0_96:[2.9688s]: Found 1 solutions so far... objective (minimize cycles): currently  (Cycles ~ 60.0, IPC ~ 1.60), bound  (Cycles ~ 54.0, IPC ~ 1.78)
INFO:slothy.mainloop.split.0_96:OPTIMAL, wall time: 3.721717 s
INFO:slothy.mainloop.split.0_96:Booleans in result: 8125
INFO:slothy.mainloop.split.0_96.selfcheck:OK!
INFO:slothy.mainloop.split.0_96.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop.split.0_96:Minimum number of stalls: 12
INFO:slothy.mainloop.split.96_192:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.96_192:Attempt optimization with max 32 stalls...
INFO:slothy.mainloop.split.96_192:Objective: minimize cycles



...



INFO:slothy.mainloop.split.479_575:OPTIMAL, wall time: 0.180550 s
INFO:slothy.mainloop.split.479_575:Booleans in result: 67
INFO:slothy.mainloop.split.479_575.selfcheck:OK!
INFO:slothy.mainloop.split.479_575.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop.split.479_575:Minimum number of stalls: 77
INFO:slothy.mainloop.split.575_671:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.575_671:Attempt optimization with max 32 stalls...
INFO:slothy.mainloop.split.575_671:Objective: minimize cycles
INFO:slothy.mainloop.split.575_671:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.575_671:INFEASIBLE, wall time: 0.007577 s
INFO:slothy.mainloop.split.575_671:Attempt optimization with max 64 stalls...
INFO:slothy.mainloop.split.575_671:Objective: minimize cycles
INFO:slothy.mainloop.split.575_671:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.575_671:INFEASIBLE, wall time: 0.040110 s
INFO:slothy.mainloop.split.575_671:Attempt optimization with max 128 stalls...
INFO:slothy.mainloop.split.575_671:Objective: minimize cycles
INFO:slothy.mainloop.split.575_671:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.575_671:[0.1387s]: Found 1 solutions so far... objective (minimize cycles): currently  (Cycles ~ 116.0, IPC ~ 0.83), bound  (Cycles ~ 110.0, IPC ~ 0.87)
INFO:slothy.mainloop.split.575_671:OPTIMAL, wall time: 0.152324 s
INFO:slothy.mainloop.split.575_671:Booleans in result: 30
INFO:slothy.mainloop.split.575_671.selfcheck:OK!
INFO:slothy.mainloop.split.575_671.selftest:Running selftest (10 iterations)...
INFO:slothy.mainloop.split.575_671.selftest:Inferred that the following registers seem to act as pointers: set()
INFO:slothy.mainloop.split.575_671.selftest:Using default buffer size of 1024 bytes. If you want different buffer sizes, set selftest_address_registers manually.
ERROR:slothy.mainloop.split.575_671.selftest:llvm-mc failed to handle the following code
ERROR:slothy.mainloop.split.575_671.selftest:.global harness
harness:
        umlal v19.2D, v1.2S, v18.2S // ...............................................................................................................................................................................................................................................................................................*...............................................................................................................................................................................................
        mul v25.2S, v5.2S, v31.2S // ................................................................................................................................................................................................................................................................................................*..............................................................................................................................................................................................
        umull v13.2D, v16.2S, v20.2S // ................................................................................................................................................................................................................................................................................................*..............................................................................................................................................................................................


     ......


        zip2 v6.4S, v0.4S, v11.4S // ............................................................................................................................................................................................................................................................................................................................................*..................................................................................................................................................
        zip1 v22.4S, v25.4S, v12.4S // .............................................................................................................................................................................................................................................................................................................................................*.................................................................................................................................................
        zip2 v23.4S, v25.4S, v12.4S // .............................................................................................................................................................................................................................................................................................................................................*.................................................................................................................................................
        zip1 v25.2S, v13.2S, v2.2S // ..............................................................................................................................................................................................................................................................................................................................................*................................................................................................................................................
        zip2 v0.2S, v13.2S, v2.2S // ..............................................................................................................................................................................................................................................................................................................................................*................................................................................................................................................
        mov v16.d[0], v18.d[1] // ...............................................................................................................................................................................................................................................................................................................................................*...............................................................................................................................................
ERROR:slothy.mainloop.split.575_671.selftest:Output from llvm-mc
ERROR:slothy.mainloop.split.575_671.selftest:<stdin>:76:23: error: index must be an integer in range [-256, 255].
        ldr b29, [sp, #STACK_MASK2] // ....................................................................................................................................................................................................................................................................................................................................*..........................................................................................................................................................  // @slothy:reads=mask2
                      ^

Traceback (most recent call last):
  File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1383, in assemble
    r = subprocess.run(
        ["llvm-mc"] + args, input=code.encode(), capture_output=True, check=True
    )
  File "/opt/homebrew/Cellar/python@3.13/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/subprocess.py", line 577, in run
    raise CalledProcessError(retcode, process.args,
                             output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['llvm-mc', '--arch=aarch64', '--assemble', '--filetype=obj', '--mattr=aes']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/huangchengwei/Documents/slothy/tutorial/tutorial-7.py", line 36, in <module>
    slothy.optimize(start="mainloop", end="end_label")
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/slothy.py", line 397, in optimize
    early, core, late, num_exceptional = Heuristics.periodic(body, logger, c)
                                         ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 373, in periodic
    res = Heuristics.linear(body, logger=logger, conf=conf)
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 462, in linear
    return Heuristics._split(body, logger, conf)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 986, in _split
    return Heuristics._split_inner(body, logger, c)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 882, in _split_inner
    cur_body, stalls, _ = optimize_chunks_many(
                          ~~~~~~~~~~~~~~~~~~~~^
        idx_lst, cur_body, stalls, show_stalls=False
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 831, in optimize_chunks_many
    body, stalls, cur_stalls, local_perm = optimize_chunk(
                                           ~~~~~~~~~~~~~~^
        start_idx, end_idx, body, stalls, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 795, in optimize_chunk
    result = Heuristics.optimize_binsearch(
        cur_body,
    ...<3 lines>...
        suffix_len=suffix_len,
    )
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 145, in optimize_binsearch
    return Heuristics.optimize_binsearch_internal(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        source, logger, conf, flexible=flexible, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 290, in optimize_binsearch_internal
    success = core.optimize(source, **kwargs)
  File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 1738, in optimize
    self._extract_result()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 2198, in _extract_result
    self._result.selftest(self.logger.getChild("selftest"))
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 978, in selftest
    SelfTest.run(
    ~~~~~~~~~~~~^
        self.config,
        ^^^^^^^^^^^^
    ...<5 lines>...
        self.config.selftest_iterations,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1580, in run
    final_regs_old, final_mem_old = run_code(codeA, txt="old")
                                    ~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1492, in run_code
    objcode, offset = LLVM_Mc.assemble(
                      ~~~~~~~~~~~~~~~~^
        code,
        ^^^^^
    ...<5 lines>...
        include_paths=config.compiler_include_paths,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1391, in assemble
    raise LLVM_Mc_Error from exc
slothy.helper.LLVM_Mc_Error

I think this is an issue with SLOTHY's functionality, since the error occurs on an instruction carrying the comment @slothy:reads=mask2.
I did not attempt to fix this because I am not familiar with how the @slothy:reads=mask2 syntax is processed in SLOTHY.
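
The caret in the llvm-mc message (line 76, column 23) sits under the `#STACK_MASK2` operand, not under the trailing `@slothy:reads=mask2` comment. One possible explanation, which is an assumption and not confirmed by the log, is that `STACK_MASK2` has no definition in the standalone harness handed to llvm-mc. A minimal sketch for checking a dumped harness for such symbols is shown below; `undefined_symbolic_immediates` is a hypothetical helper, not part of SLOTHY, and it only recognizes `.equ`, `.set`, and `#define` definitions.

```python
import re

# '#NAME' immediates whose name starts with a letter or underscore.
SYMBOLIC_IMM = re.compile(r"#([A-Za-z_]\w*)")
# Definitions of the form '.equ NAME, value', '.set NAME, value',
# or '#define NAME value'.
DEFINITION = re.compile(
    r"^\s*(?:\.equ|\.set)\s+([A-Za-z_]\w*)\s*,|^\s*#define\s+([A-Za-z_]\w*)"
)


def undefined_symbolic_immediates(asm: str) -> set:
    """Return symbolic immediates used in asm but never defined in it."""
    defined, used = set(), set()
    for line in asm.splitlines():
        m = DEFINITION.match(line)
        if m:
            defined.add(m.group(1) or m.group(2))
            continue
        code = line.split("//", 1)[0]  # ignore trailing comments
        used.update(SYMBOLIC_IMM.findall(code))
    return used - defined


# Two instructions taken from the harness in the log above.
harness = """.global harness
harness:
        mov v16.d[0], v18.d[1]
        ldr b29, [sp, #STACK_MASK2] // @slothy:reads=mask2
"""
print(undefined_symbolic_immediates(harness))
```

If the reported symbol disappears once a definition is prepended to the harness, that would point at the selftest input rather than at the annotation handling.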
