Description
Abstract
I encountered three problems when running tutorial-7.py:
- slothy.targets.common.FatalParsingException: Inconsistent dt:
- slothy.targets.common.UnknownInstruction: Multiple matches found for <class 'slothy.targets.aarch64.aarch64_neon.x_str_sp_imm'> for str x1, [sp, #STACK_A_32]
- slothy.helper.LLVM_Mc_Error
I fixed the first two in my branch https://github.com/cheng-wei-huang0612/slothy/tree/Patch--fixing-tutorial-7.py
but I do not know how to deal with the third.
Description
On the main branch, running python3 tutorial-7.py gives:
(venv) huangchengwei@huangchengweis-MacBook-Air tutorial % python3 tutorial-7.py
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
Traceback (most recent call last):
File "/Users/huangchengwei/Documents/slothy/tutorial/tutorial-7.py", line 24, in <module>
slothy.optimize(start="mainloop", end="end_label")
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/slothy.py", line 397, in optimize
early, core, late, num_exceptional = Heuristics.periodic(body, logger, c)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 364, in periodic
dfg = DFG(
body, logger.getChild("dfg_generate_outputs"), DFGConfig(conf.copy())
)
File "/Users/huangchengwei/Documents/slothy/slothy/core/dataflow.py", line 720, in __init__
self.src = self._parse_source(src)
~~~~~~~~~~~~~~~~~~^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/dataflow.py", line 798, in _parse_source
return list(map(self._parse_line, src_lines))
File "/Users/huangchengwei/Documents/slothy/slothy/core/dataflow.py", line 787, in _parse_line
insts = self.arch.Instruction.parser(line)
File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/aarch64_neon.py", line 884, in parser
inst = inst_class.make(src)
File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/aarch64_neon.py", line 1246, in make
return AArch64Instruction.build(cls, src)
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/aarch64_neon.py", line 1230, in build
AArch64Instruction._enforce_datatype_matching(pattern, res)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/aarch64_neon.py", line 942, in _enforce_datatype_matching
raise FatalParsingException(f"Inconsistent dt: {dt}")
slothy.targets.common.FatalParsingException: Inconsistent dt: <dt1>
It turns out that SLOTHY is complaining while parsing the usra instruction, whose parser definition was originally:
class vusra(AArch64Instruction):
    pattern = "usra <Vd>.<dt0>, <Va>.<dt1>, <imm>"
    inputs = ["Va"]
    in_outs = ["Vd"]
I changed it to:
class vusra(AArch64Instruction):
    pattern = "usra <Vd>.<dt>, <Va>.<dt>, <imm>"
    inputs = ["Va"]
    in_outs = ["Vd"]
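For illustration: both vector operands of usra take the same arrangement specifier (for example, usra v0.4s, v1.4s, #3), which is why a single placeholder fits. The following is a minimal standalone sketch and not SLOTHY's parser; the regex and the name parse_usra are hypothetical. It uses a named-group backreference to require that both operands share one datatype.

```python
import re

# Hypothetical toy matcher, independent of SLOTHY's real parsing code.
# (?P=dt) is a backreference: the second arrangement must equal the first.
USRA_RE = re.compile(
    r"^\s*usra\s+v(?P<vd>\d+)\.(?P<dt>\w+)\s*,\s*"
    r"v(?P<va>\d+)\.(?P=dt)\s*,\s*#(?P<imm>\d+)\s*$"
)


def parse_usra(line):
    """Return the operands as a dict, or None if the line does not match."""
    m = USRA_RE.match(line)
    if m is None:
        return None
    return {
        "Vd": int(m["vd"]),
        "Va": int(m["va"]),
        "dt": m["dt"],
        "imm": int(m["imm"]),
    }


ok = parse_usra("usra v0.4s, v1.4s, #3")    # both operands use 4s
bad = parse_usra("usra v0.4s, v1.2d, #3")   # mismatched arrangements
```

Here ok holds the parsed operands, while bad is None because the two arrangements differ.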
After that change, the second issue arises:
(venv) huangchengwei@huangchengweis-MacBook-Air tutorial % python3 tutorial-7.py
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
INFO:slothy.mainloop.slothy:Objective: None (any satisfying solution is fine)
INFO:slothy.mainloop.slothy:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.slothy:[74.0876s]: Found 1 solutions so far... objective (no objective): currently 0.0, bound 0.0
INFO:slothy.mainloop.slothy:OPTIMAL, wall time: 74.309375 s
INFO:slothy.mainloop.slothy:Booleans in result: 16950
INFO:slothy.mainloop.slothy.selfcheck:OK!
INFO:slothy.mainloop.slothy.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
INFO:slothy.mainloop.split.0_96:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.0_96:Attempt optimization with max 32 stalls...
INFO:slothy.mainloop.split.0_96:Objective: minimize cycles
INFO:slothy.mainloop.split.0_96:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.0_96:[2.3609s]: Found 1 solutions so far... objective (minimize cycles): currently (Cycles ~ 60.0, IPC ~ 1.60), bound (Cycles ~ 54.0, IPC ~ 1.78)
INFO:slothy.mainloop.split.0_96:OPTIMAL, wall time: 3.732789 s
INFO:slothy.mainloop.split.0_96:Booleans in result: 8125
INFO:slothy.mainloop.split.0_96.selfcheck:OK!
INFO:slothy.mainloop.split.0_96.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop.split.0_96:Minimum number of stalls: 12
INFO:slothy.mainloop.split.96_192:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.96_192:Attempt optimization with max 32 stalls...
Traceback (most recent call last):
File "/Users/huangchengwei/Documents/slothy/tutorial/tutorial-7.py", line 36, in <module>
slothy.optimize(start="mainloop", end="end_label")
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/slothy.py", line 397, in optimize
early, core, late, num_exceptional = Heuristics.periodic(body, logger, c)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 373, in periodic
res = Heuristics.linear(body, logger=logger, conf=conf)
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 462, in linear
return Heuristics._split(body, logger, conf)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 986, in _split
return Heuristics._split_inner(body, logger, c)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 882, in _split_inner
cur_body, stalls, _ = optimize_chunks_many(
~~~~~~~~~~~~~~~~~~~~^
idx_lst, cur_body, stalls, show_stalls=False
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 831, in optimize_chunks_many
body, stalls, cur_stalls, local_perm = optimize_chunk(
~~~~~~~~~~~~~~^
start_idx, end_idx, body, stalls, **kwargs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 795, in optimize_chunk
result = Heuristics.optimize_binsearch(
cur_body,
...<3 lines>...
suffix_len=suffix_len,
)
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 145, in optimize_binsearch
return Heuristics.optimize_binsearch_internal(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
source, logger, conf, flexible=flexible, **kwargs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 290, in optimize_binsearch_internal
success = core.optimize(source, **kwargs)
File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 1701, in optimize
self._add_variables_functional_units()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 2633, in _add_variables_functional_units
cycles_unit_occupied = self.target.get_inverse_throughput(t.inst)
File "/Users/huangchengwei/Documents/slothy/slothy/targets/aarch64/cortex_a55.py", line 659, in get_inverse_throughput
return lookup_multidict(inverse_throughput, src, instclass_src)
File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1874, in lookup_multidict
raise UnknownInstruction(
f"Multiple matches found for {instclass} for {inst}"
)
slothy.targets.common.UnknownInstruction: Multiple matches found for <class 'slothy.targets.aarch64.aarch64_neon.x_str_sp_imm'> for str x1, [sp, #STACK_A_32]
It turns out that the instructions x_str_sp_imm and x_ldr_stack_imm are already covered by Str_X and Ldr_X, respectively, so their numbers (latency, etc.) are specified twice.
In addition, the execution unit of q_ldr1_stack is defined in two places:
# q-form vector instructions
(
...
Str_Q,
q_ldr1_stack,
q_ldr1_post_inc,
...
): [
[ExecutionUnit.VEC0, ExecutionUnit.VEC1]
], # these instructions use both VEC0 and VEC1
...
# non-q-form vector instructions
(
...
d_ldr_stack_with_inc,
q_ldr1_stack,
Q_Ld2_Lane_Post_Inc,
...
): [
ExecutionUnit.VEC0,
ExecutionUnit.VEC1,
], # these instructions use VEC0 or VEC1
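A small check along the following lines could flag such double entries before a lookup fails. This is only a sketch: the classes are empty stand-ins, the table values are placeholders, find_overlaps is a hypothetical helper rather than part of SLOTHY, and it assumes that "covered by" means "is the same class or a subclass".

```python
# Stand-in classes; the subclass relation between x_str_sp_imm and Str_X
# is an assumption made for this illustration.
class Str_X: pass
class x_str_sp_imm(Str_X): pass
class Str_Q: pass
class q_ldr1_stack: pass


def find_overlaps(table):
    """Return {class name: [covering entry names]} for classes covered twice.

    `table` maps a class, or a tuple of classes, to some value. A class counts
    as covered by an entry if it is that entry or a subclass of it.
    """
    # Flatten all keys into one list of classes, keeping repeats.
    entries = []
    for key in table:
        entries.extend(key if isinstance(key, tuple) else (key,))

    overlaps = {}
    for cls in set(entries):
        covering = [other for other in entries if issubclass(cls, other)]
        if len(covering) > 1:
            overlaps[cls.__name__] = [c.__name__ for c in covering]
    return overlaps


# Placeholder table mimicking the two situations described above.
table = {
    (Str_X, x_str_sp_imm): 1,
    (Str_Q, q_ldr1_stack): "both units",
    (q_ldr1_stack,): "either unit",
}
overlaps = find_overlaps(table)
```

With this toy table, overlaps reports x_str_sp_imm (covered by Str_X and by itself) and q_ldr1_stack (listed under two keys).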
After fixing these two problems, I finally got the following llvm-mc error:
(venv) huangchengwei@huangchengweis-MacBook-Air tutorial % python3 tutorial-7.py
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
INFO:slothy.mainloop.slothy:Objective: None (any satisfying solution is fine)
INFO:slothy.mainloop.slothy:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.slothy:[75.8771s]: Found 1 solutions so far... objective (no objective): currently 0.0, bound 0.0
INFO:slothy.mainloop.slothy:OPTIMAL, wall time: 76.141257 s
INFO:slothy.mainloop.slothy:Booleans in result: 17265
INFO:slothy.mainloop.slothy.selfcheck:OK!
INFO:slothy.mainloop.slothy.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop:SLOTHY version: unknown
INFO:slothy:Instructions in body: 958
INFO:slothy.mainloop.split.0_96:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.0_96:Attempt optimization with max 32 stalls...
INFO:slothy.mainloop.split.0_96:Objective: minimize cycles
INFO:slothy.mainloop.split.0_96:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.0_96:[2.9688s]: Found 1 solutions so far... objective (minimize cycles): currently (Cycles ~ 60.0, IPC ~ 1.60), bound (Cycles ~ 54.0, IPC ~ 1.78)
INFO:slothy.mainloop.split.0_96:OPTIMAL, wall time: 3.721717 s
INFO:slothy.mainloop.split.0_96:Booleans in result: 8125
INFO:slothy.mainloop.split.0_96.selfcheck:OK!
INFO:slothy.mainloop.split.0_96.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop.split.0_96:Minimum number of stalls: 12
INFO:slothy.mainloop.split.96_192:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.96_192:Attempt optimization with max 32 stalls...
INFO:slothy.mainloop.split.96_192:Objective: minimize cycles
...
INFO:slothy.mainloop.split.479_575:OPTIMAL, wall time: 0.180550 s
INFO:slothy.mainloop.split.479_575:Booleans in result: 67
INFO:slothy.mainloop.split.479_575.selfcheck:OK!
INFO:slothy.mainloop.split.479_575.selftest:Skipping selftest as input contains symbolic registers.
INFO:slothy.mainloop.split.479_575:Minimum number of stalls: 77
INFO:slothy.mainloop.split.575_671:Perform internal binary search for minimal number of stalls...
INFO:slothy.mainloop.split.575_671:Attempt optimization with max 32 stalls...
INFO:slothy.mainloop.split.575_671:Objective: minimize cycles
INFO:slothy.mainloop.split.575_671:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.575_671:INFEASIBLE, wall time: 0.007577 s
INFO:slothy.mainloop.split.575_671:Attempt optimization with max 64 stalls...
INFO:slothy.mainloop.split.575_671:Objective: minimize cycles
INFO:slothy.mainloop.split.575_671:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.575_671:INFEASIBLE, wall time: 0.040110 s
INFO:slothy.mainloop.split.575_671:Attempt optimization with max 128 stalls...
INFO:slothy.mainloop.split.575_671:Objective: minimize cycles
INFO:slothy.mainloop.split.575_671:Invoking external constraint solver (OR-Tools CP-SAT v9.12.4544) ...
INFO:slothy.mainloop.split.575_671:[0.1387s]: Found 1 solutions so far... objective (minimize cycles): currently (Cycles ~ 116.0, IPC ~ 0.83), bound (Cycles ~ 110.0, IPC ~ 0.87)
INFO:slothy.mainloop.split.575_671:OPTIMAL, wall time: 0.152324 s
INFO:slothy.mainloop.split.575_671:Booleans in result: 30
INFO:slothy.mainloop.split.575_671.selfcheck:OK!
INFO:slothy.mainloop.split.575_671.selftest:Running selftest (10 iterations)...
INFO:slothy.mainloop.split.575_671.selftest:Inferred that the following registers seem to act as pointers: set()
INFO:slothy.mainloop.split.575_671.selftest:Using default buffer size of 1024 bytes. If you want different buffer sizes, set selftest_address_registers manually.
ERROR:slothy.mainloop.split.575_671.selftest:llvm-mc failed to handle the following code
ERROR:slothy.mainloop.split.575_671.selftest:.global harness
harness:
umlal v19.2D, v1.2S, v18.2S // ...............................................................................................................................................................................................................................................................................................*...............................................................................................................................................................................................
mul v25.2S, v5.2S, v31.2S // ................................................................................................................................................................................................................................................................................................*..............................................................................................................................................................................................
umull v13.2D, v16.2S, v20.2S // ................................................................................................................................................................................................................................................................................................*..............................................................................................................................................................................................
......
zip2 v6.4S, v0.4S, v11.4S // ............................................................................................................................................................................................................................................................................................................................................*..................................................................................................................................................
zip1 v22.4S, v25.4S, v12.4S // .............................................................................................................................................................................................................................................................................................................................................*.................................................................................................................................................
zip2 v23.4S, v25.4S, v12.4S // .............................................................................................................................................................................................................................................................................................................................................*.................................................................................................................................................
zip1 v25.2S, v13.2S, v2.2S // ..............................................................................................................................................................................................................................................................................................................................................*................................................................................................................................................
zip2 v0.2S, v13.2S, v2.2S // ..............................................................................................................................................................................................................................................................................................................................................*................................................................................................................................................
mov v16.d[0], v18.d[1] // ...............................................................................................................................................................................................................................................................................................................................................*...............................................................................................................................................
ERROR:slothy.mainloop.split.575_671.selftest:Output from llvm-mc
ERROR:slothy.mainloop.split.575_671.selftest:<stdin>:76:23: error: index must be an integer in range [-256, 255].
ldr b29, [sp, #STACK_MASK2] // ....................................................................................................................................................................................................................................................................................................................................*.......................................................................................................................................................... // @slothy:reads=mask2
^
Traceback (most recent call last):
File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1383, in assemble
r = subprocess.run(
["llvm-mc"] + args, input=code.encode(), capture_output=True, check=True
)
File "/opt/homebrew/Cellar/python@3.13/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/subprocess.py", line 577, in run
raise CalledProcessError(retcode, process.args,
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['llvm-mc', '--arch=aarch64', '--assemble', '--filetype=obj', '--mattr=aes']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/huangchengwei/Documents/slothy/tutorial/tutorial-7.py", line 36, in <module>
slothy.optimize(start="mainloop", end="end_label")
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/slothy.py", line 397, in optimize
early, core, late, num_exceptional = Heuristics.periodic(body, logger, c)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 373, in periodic
res = Heuristics.linear(body, logger=logger, conf=conf)
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 462, in linear
return Heuristics._split(body, logger, conf)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 986, in _split
return Heuristics._split_inner(body, logger, c)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 882, in _split_inner
cur_body, stalls, _ = optimize_chunks_many(
~~~~~~~~~~~~~~~~~~~~^
idx_lst, cur_body, stalls, show_stalls=False
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 831, in optimize_chunks_many
body, stalls, cur_stalls, local_perm = optimize_chunk(
~~~~~~~~~~~~~~^
start_idx, end_idx, body, stalls, **kwargs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 795, in optimize_chunk
result = Heuristics.optimize_binsearch(
cur_body,
...<3 lines>...
suffix_len=suffix_len,
)
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 145, in optimize_binsearch
return Heuristics.optimize_binsearch_internal(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
source, logger, conf, flexible=flexible, **kwargs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/huangchengwei/Documents/slothy/slothy/core/heuristics.py", line 290, in optimize_binsearch_internal
success = core.optimize(source, **kwargs)
File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 1738, in optimize
self._extract_result()
~~~~~~~~~~~~~~~~~~~~^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 2198, in _extract_result
self._result.selftest(self.logger.getChild("selftest"))
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/core/core.py", line 978, in selftest
SelfTest.run(
~~~~~~~~~~~~^
self.config,
^^^^^^^^^^^^
...<5 lines>...
self.config.selftest_iterations,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1580, in run
final_regs_old, final_mem_old = run_code(codeA, txt="old")
~~~~~~~~^^^^^^^^^^^^^^^^^^
File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1492, in run_code
objcode, offset = LLVM_Mc.assemble(
~~~~~~~~~~~~~~~~^
code,
^^^^^
...<5 lines>...
include_paths=config.compiler_include_paths,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/huangchengwei/Documents/slothy/slothy/helper.py", line 1391, in assemble
raise LLVM_Mc_Error from exc
slothy.helper.LLVM_Mc_Error
I think this issue is related to SLOTHY's own functionality, since the error occurs on an instruction carrying the comment @slothy:reads=mask2.
I did not attempt to fix it because I am not familiar with how the @slothy:reads=mask2 syntax is processed in SLOTHY.
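A possible first triage step is to list the symbolic immediates in the snippet handed to llvm-mc. This assumes, without verification, that the failure stems from the offset STACK_MASK2 not being defined in the standalone selftest harness. The sketch below is a hypothetical helper, not part of SLOTHY; it scans assembly lines for immediates of the form #NAME and reports those not in a given set of known symbols.

```python
import re

# Matches immediates such as "#STACK_MASK2": a '#' followed by an identifier
# rather than a number. Hypothetical helper for illustration only.
SYMBOLIC_IMM = re.compile(r"#([A-Za-z_]\w*)")


def unresolved_symbols(asm_lines, defined=()):
    """Return symbolic immediates, in order of first use, not in `defined`."""
    known = set(defined)
    missing = []
    for line in asm_lines:
        code = line.split("//", 1)[0]  # ignore trailing // comments
        for name in SYMBOLIC_IMM.findall(code):
            if name not in known and name not in missing:
                missing.append(name)
    return missing


lines = [
    "ldr b29, [sp, #STACK_MASK2] // ...*... // @slothy:reads=mask2",
    "mov v16.d[0], v18.d[1]",
    "str x1, [sp, #STACK_A_32]",
]
missing = unresolved_symbols(lines, defined={"STACK_A_32"})
```

In this example missing contains only STACK_MASK2, the symbol at which the assembler error points.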