Commit f12fb43

Author: Fonic
Upload version 3.3
Changelog for v3.3 release:
- fixed regression regarding deduplication of consecutive data lines (added in v3.2) messing up disassembly split into separate files (i.e. reconstructed source files) (fixes issue #16)
- prevent very long lines when deduplicating consecutive data lines by truncating hex output/display + appending '..'
- added support for regions with multiple access sizes when generating/outputting possible hints for code objects
- extended pretty printer (modules/module_pretty_print.py) to produce hex dumps of bytes and other bytes-like objects
- extended file writer (module_miscellaneous.py) to create folders for destination path if missing
- fixed regex strings in 're.match' and 're.search' calls producing 'SyntaxWarning's with Python 3.12+ due to invalid escape sequences (https://stackoverflow.com/a/52335971/1976617)
- applied various minor changes (console output, code formatting, comments, etc.)
1 parent 0b930a0 commit f12fb43

14 files changed, +481 -220 lines changed

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,13 @@
1+
## Changelog for v3.3 release
2+
3+
- fixed regression regarding *deduplication of consecutive data lines* (added in v3.2) messing up disassembly split into separate files (i.e. reconstructed source files) (fixes issue #16)
4+
- prevent very long lines when deduplicating consecutive data lines by truncating hex output/display + appending '..'
5+
- added support for regions with multiple access sizes when generating/outputting possible hints for code objects
6+
- extended pretty printer (`modules/module_pretty_print.py`) to produce hex dumps of bytes and other bytes-like objects
7+
- extended file writer (`module_miscellaneous.py`) to create folders for destination path if missing
8+
- fixed regex strings in `re.match` and `re.search` calls producing `SyntaxWarning`s with Python 3.12+ due to invalid escape sequences (https://stackoverflow.com/a/52335971/1976617)
9+
- applied various minor changes (console output, code formatting, comments, etc.)
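
The regex entry in the list above refers to backslash sequences such as `\s` or `\d` inside ordinary string literals, which Python 3.12+ reports as `SyntaxWarning`. The usual remedy is a raw string literal. A minimal sketch follows; the pattern and the `parse_offset` helper are hypothetical and not taken from wcdatool's source:

```python
import re

# Hypothetical pattern for illustration only (not from wcdatool's source).
# Written as a plain literal, "\s" would be an invalid escape sequence and
# trigger a SyntaxWarning on Python 3.12+; the r"..." prefix avoids that.
OFFSET_PATTERN = re.compile(r"^\s*offset\s*=\s*([0-9A-Fa-f]+)H")


def parse_offset(line):
    """Return the hexadecimal offset found at the start of 'line', or None."""
    match = OFFSET_PATTERN.match(line)
    return int(match.group(1), 16) if match else None
```

With a raw string, the backslashes reach the `re` module unchanged, so the pattern itself does not need to be rewritten.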
10+
111
## Changelog for v3.2 release
212

313
- added algorithm to *deduplicate consecutive data lines* in formatted disassembly (*greatly* reduces disassembly size for data objects)
@@ -38,3 +48,7 @@
3848
- initial release
3949
- monolithic (everything in one single source file)
4050
- originally named 'wcdctool' (*Watcom Decompilation Tool*)
51+
52+
##
53+
54+
_Last updated: 08/31/24_
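
The two deduplication entries in the v3.3 changelog (collapsing consecutive identical data lines, and truncating long hex output with a trailing `..`) can be illustrated with a small sketch. This is one plausible reading of the changelog, not wcdatool's actual algorithm; the function name, the `max_hex_chars` parameter and the `(repeated Nx)` output format are assumptions:

```python
from itertools import groupby


def dedupe_data_lines(values, max_hex_chars=16):
    """Collapse runs of identical consecutive byte strings into single lines.

    Hypothetical helper: the output format and 'max_hex_chars' are assumptions.
    """
    lines = []
    for value, run in groupby(values):
        count = sum(1 for _ in run)          # length of the run of equal items
        hex_text = value.hex()
        if len(hex_text) > max_hex_chars:
            # Truncate overly long hex display and mark the cut with '..'
            hex_text = hex_text[:max_hex_chars] + ".."
        suffix = f" (repeated {count}x)" if count > 1 else ""
        lines.append(hex_text + suffix)
    return lines
```

Only consecutive duplicates are merged, so the order of the remaining lines still matches the order of the underlying data.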

Executables/HARVEST.EXE

1.12 MB
Binary file not shown.

Hints/MK1.EXE.txt

Lines changed: 32 additions & 2 deletions
@@ -5,7 +5,7 @@
55
# ----------------------------------------------------------------------------
66
# -
77
# Created by Fonic <https://github.com/fonic> -
8-
# Date: 06/20/19 - 04/06/22 -
8+
# Date: 06/20/19 - 08/31/24 -
99
# -
1010
# ----------------------------------------------------------------------------
1111

@@ -361,4 +361,34 @@ Object 2:
361361
#3) offset = 0000D902H, length = 00000096H, type = data, mode = bytes, comment = Default highscore table; structure: 3 chars initials, 1 byte wins, 1 byte wins, 0, 4 bytes score (?)
362362
3) offset = 0000D902H, length = 00000096H, type = data, mode = struct:chars[3]:bytes[3]:dword, comment = Default highscore table (3 chars initials, 1 byte wins, 1 byte wins, 0, 4 bytes score) (?)
363363
4) offset = 0000E7A4H, length = 00000138H, type = data, mode = strings, comment = List of strings for keyboard keys
364-
5) start = 00024B40H, end = 00024BA0H, type = data, mode = comment, comment = Emulation of registers A0-A14 / B0-B14 of original arcade machine
364+
5) start = 00024B40H, end = 00024BA0H, type = data, mode = comment, comment = Emulation of registers A0-A14 / B0-B14 of original arcade machine's architecture (TMS34010)
365+
366+
367+
Global Info (section 1)
368+
==============================================================================
369+
370+
#
371+
# MK1.EXE seems to contain significantly fewer named labels in its debug info
372+
# (likely limited to exported symbols only, i.e. code blocks that are called
373+
# from other modules and thus need to be exported) compared to MK2.EXE. As a
374+
# result, there are functions like 'WHO_IS_ALONE' which run a lot longer than
375+
# they should and actually span over other, unnamed functions (for which there
376+
# are no named symbols available in debug info).
377+
#
378+
# Note that this is just a stub for demonstration, there are likely many more
379+
# missing symbols + long functions that require splitting. Since MK1 and MK2
380+
# are quite similar (MK2 seems to have used and expanded on MK1's code base),
381+
# a thorough comparison might help reveal further missing symbols.
382+
#
383+
# For details, refer to:
384+
# https://github.com/fonic/wcdatool/issues/15#issuecomment-2308410432
385+
#
386+
387+
Name: GAME_FINISHED
388+
address = 0001:0000ca2e
389+
module index = 12
390+
kind: (code)
391+
Name: GAME_OVER
392+
address = 0001:0000cb9e
393+
module index = 12
394+
kind: (code)
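
As a worked example for the highscore table hint in object 2 (`mode = struct:chars[3]:bytes[3]:dword`, `length = 00000096H`): one entry occupies 3 + 3 + 4 = 10 bytes, so 0x96 = 150 bytes hold 15 entries. The sketch below decodes such a region with Python's `struct` module. The little-endian byte order is an assumption (typical for x86 DOS executables), and `parse_highscores` is a hypothetical helper, not part of wcdatool:

```python
import struct

# Assumed layout per 'struct:chars[3]:bytes[3]:dword': 3 chars, 3 single
# bytes, 1 dword. '<' selects little-endian without padding (assumption).
ENTRY = struct.Struct("<3s3BI")


def parse_highscores(blob):
    """Split 'blob' into (initials, (byte0, byte1, byte2), score) tuples."""
    entries = []
    for initials, b0, b1, b2, score in ENTRY.iter_unpack(blob):
        text = initials.decode("ascii", errors="replace")
        entries.append((text, (b0, b1, b2), score))
    return entries
```

`iter_unpack` requires the buffer length to be a multiple of the entry size, which holds for the 150-byte region named in the hint.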

Hints/MK2.EXE.txt

Lines changed: 6 additions & 5 deletions
@@ -5,7 +5,7 @@
55
# ----------------------------------------------------------------------------
66
# -
77
# Created by Fonic <https://github.com/fonic> -
8-
# Date: 06/20/19 - 07/25/23 -
8+
# Date: 06/20/19 - 08/31/24 -
99
# -
1010
# ----------------------------------------------------------------------------
1111

@@ -101,11 +101,12 @@ Object 2:
101101
208) start = 0003419EH, end = 000341B6H, type = data, mode = string, comment = String
102102
209) start = 000341B6H, end = 000341C6H, type = data, mode = string, comment = String
103103
210) start = 000341C6H, end = 000341D1H, type = data, mode = string, comment = String
104+
211) start = 000922ECH, end = 0009234CH, type = data, mode = comment, comment = Emulation of registers A0-A14 / B0-B14 of original arcade machine's architecture (TMS34010)
104105
# Code in data object
105106
# -> just a stub, there are many more
106107
# -> may be removed once this has been automated using tracing disassembler
107-
211) start = 0001D97DH, end = 0001D991H, type = code, mode = default, comment = Code (?)
108-
212) start = 0001D991H, end = 0001D9A5H, type = code, mode = default, comment = Code (?)
109-
213) start = 0001D9A5H, end = 0001D9DBH, type = code, mode = default, comment = Code (?)
110-
214) start = 0001D9DBH, end = 0001DA54H, type = code, mode = default, comment = Code (?)
108+
300) start = 0001D97DH, end = 0001D991H, type = code, mode = default, comment = Code (?)
109+
301) start = 0001D991H, end = 0001D9A5H, type = code, mode = default, comment = Code (?)
110+
302) start = 0001D9A5H, end = 0001D9DBH, type = code, mode = default, comment = Code (?)
111+
303) start = 0001D9DBH, end = 0001DA54H, type = code, mode = default, comment = Code (?)
111112
#...
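
The hint lines above follow a regular `key = value` layout, so they are easy to inspect with a short script. The following sketch parses the `start`/`end` variant and derives the region length. It is not wcdatool's own parser; the regular expression and the `parse_hint` helper are assumptions based only on the lines shown here, and treating `end` as exclusive is inferred from consecutive hints sharing boundaries:

```python
import re

# Assumed line layout, derived from the hint lines shown in this diff.
# The comment is matched last because it may itself contain commas.
HINT_RE = re.compile(
    r"^(?P<num>\d+)\)\s+start = (?P<start>[0-9A-Fa-f]+)H, "
    r"end = (?P<end>[0-9A-Fa-f]+)H, type = (?P<type>\w+), "
    r"mode = (?P<mode>[^,]+), comment = (?P<comment>.*)$"
)


def parse_hint(line):
    """Return a dict describing one hint line, or None if it does not match."""
    match = HINT_RE.match(line.strip())
    if match is None:
        return None                      # e.g. comment lines such as '#...'
    start = int(match["start"], 16)
    end = int(match["end"], 16)
    return {
        "num": int(match["num"]),
        "start": start,
        "end": end,
        "length": end - start,           # assumes 'end' is exclusive
        "type": match["type"],
        "mode": match["mode"],
        "comment": match["comment"],
    }
```

Applied to hint 211 (`start = 000922ECH`, `end = 0009234CH`), the derived length is 0x60 bytes, the same size as the corresponding register emulation region in `Hints/MK1.EXE.txt`.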

README.md

Lines changed: 11 additions & 10 deletions
@@ -24,13 +24,13 @@ Thus, I began writing my own tool. What originally started out as *mkdecomptool*
2424

2525
Note that while wcdatool performs the tasks it is designed for quite well, it is not intended to compete with or replace high-end tools like *IDA Pro* or *Ghidra*.
2626

27-
## Current state and future development
27+
## Current state / future development
2828

29-
Wcdatool is *work in progress*. You can tell from looking at the source code - there's tons of TODO, TESTING, FIXME, etc. flying around. Also, it is relatively slow as performance has not been the main focus ([Cython](https://cython.org/) might be utilized in the future to increase performance).
29+
Wcdatool works quite well in its current state - you'll get a well-readable, reasonably structured disassembly output (*objdump* format, *Intel* syntax). Check out issues [#9](https://github.com/fonic/wcdatool/issues/9) and [#11](https://github.com/fonic/wcdatool/issues/11) for games other than *Mortal Kombat* that wcdatool worked nicely for thus far. **Please note that wcdatool works best when used on executables that contain debug symbols.** If you come across other *unstripped* *Watcom*-based DOS applications that may be used for further testing and development, please let me know.
3030

31-
Nevertheless, it works quite well in its current state - you'll get a well-readable, reasonably structured disassembly output (*objdump* format, *Intel* syntax). Check out issues [#9](https://github.com/fonic/wcdatool/issues/9) and [#11](https://github.com/fonic/wcdatool/issues/11) for games other than *Mortal Kombat* that wcdatool worked nicely for thus far. Please note that wcdatool works best when used on executables that contain debug symbols. If you come across other *unstripped* *Watcom*-based DOS applications that may be used for further testing and development, please let me know.
31+
**However, the current approach has reached its EOL.** There is no point in advancing it any further (aside from fixing bugs), as there are limits inherent to the fundamental design that cannot be overcome easily. Thus, the next major goal is to cleanly *rewrite the disassembler module* and transition from *static code disassembly* to *execution flow tracing* (e.g. the *Mortal Kombat 2* executable contains code within its data object, which is neither discovered nor analyzed with the current approach). Also, instead of treating objects separately, a *linear unified address space* containing all object data shall be implemented. This will allow *fixups to be applied on a binary level*, which should simplify dealing with references that cross object boundaries and with placeholders (stubs) that are replaced via fixups at run time.
3232

33-
The *next major goal* is to cleanly rewrite the disassembler module and transition from *static code disassembly* to *execution flow tracing* (e.g. *Mortal Kombat 2* executable contains code within its data object, which is neither discovered nor processed with the current approach).
33+
Last but not least, wcdatool in its current state is relatively slow, as performance has not been the main focus during development. [Cython](https://cython.org/) might be utilized in the future to increase performance.
3434

3535
## Output sample
3636

@@ -97,17 +97,18 @@ There are multiple ways to use *wcdatool*, but the following instructions should
9797

9898
7. Have a look at the results in `wcdatool/Output`:
9999
- File `<name-of-executable>_zzz_log.txt` contains *log messages* (same as console output, but without coloring/formatting)
100-
- Files `<name-of-executable>_disasm_object_x_disassembly_plain.asm` contain *plain disassembly*
101-
- Files `<name-of-executable>_disasm_object_x_disassembly_formatted.asm` contain *formatted disassembly*
102-
- Folder `<name-of-executable>_modules` contains *formatted disassembly split into separate files* (this attempts to reconstruct the application's original source files if corresponding debug information is available)
100+
- Files `<name-of-executable>_disasm_object_x_disassembly_plain.asm` contain *plain disassembly* (unmodified *objdump* output, useful for reference)
101+
- Files `<name-of-executable>_disasm_object_x_disassembly_formatted.asm` contain *formatted disassembly* (this is arguably the most interesting/useful output)
102+
- Files `<name-of-executable>_disasm_object_x_disassembly_formatted_deduplicated.asm` contain *formatted deduplicated disassembly* (same as above, but with data portions being compressed for increased readability where applicable)
103+
- Folder `<name-of-executable>_modules` contains *formatted disassembly split into separate files* (same as above, additionally attempts to reconstruct an application's original source files if corresponding debug information is available)
103104

104105
**NOTE:** if you are new to assembler/assembly language, check out this [x86 Assembly Guide](https://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
105106

106107
8. Refine the output by analyzing the disassembly, updating the object hints and re-running *wcdatool* (i.e. loop steps 5-8):
107-
- Identify and add hints for regions in code objects that are actually data (look for `; misplaced item` comments, `(bad)` assembly instructions and labels with `; access size` comments)
108+
- Identify and add hints for regions in code objects that are actually data (look for `; misplaced item` comments, `(bad)` assembly instructions and labels with trailing `; access size` comments)
108109
- Identify and add hints for regions in data objects that are actually code (look for `call`/`jmp` instructions in code objects with fixup targets pointing to data objects)
109110
- Check section `Possible object hints` of *wcdatool*'s console output / log file for suggestions (not guaranteed to be correct, but likely a good starting point)
110-
- *The ultimate goal here is to eliminate all (or at least most) warnings issued by wcdatool*. Each warning points out a region of the disassembly that currently seems flawed and therefore requires further attention/investigation. Note that there is a *cascading effect* at work (e.g. a region of data that is falsely interpreted as code may produce bogus branches, leading to further warnings); thus, warnings should be tackled one (or a few) at a time, from first to last, with *wcdatool* re-runs in between
111+
- *The ultimate goal is to eliminate all (or at least most) warnings issued by wcdatool*. Each warning points out a region of the disassembly that currently seems flawed and therefore requires further attention/investigation. Note that there is a *cascading effect* at work (e.g. a region of data that is falsely interpreted as code may produce bogus branches, leading to further warnings); thus, warnings should be tackled one (or a few) at a time, from first to last, with *wcdatool* re-runs in between
111112

112113
**NOTE:** this is by far the most time-consuming part, but *crucial* to achieve good and clean results (!)
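
To support the refinement loop in step 8, the markers mentioned there (`; misplaced item`, `(bad)`, `; access size`) can be located with a few lines of Python. This is a hedged sketch, not a feature of wcdatool; `find_marker_lines` is a hypothetical helper and the plain substring matching is an assumption about how the markers appear in the output:

```python
# Marker strings as named in step 8 of the README
MARKERS = ("; misplaced item", "(bad)", "; access size")


def find_marker_lines(lines):
    """Yield (line_number, marker, text) for each line containing a marker."""
    for number, text in enumerate(lines, start=1):
        for marker in MARKERS:
            if marker in text:
                yield number, marker, text.rstrip("\n")
                break                    # report each line only once
```

An open file object can be passed directly as `lines`, for instance one of the `*_disassembly_formatted.asm` files in `wcdatool/Output`. Working through the reported lines from first to last matches the order recommended above.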
113114

@@ -153,4 +154,4 @@ If you want to get in touch with me, give feedback, ask questions or simply need
153154

154155
##
155156

156-
_Last updated: 08/12/23_
157+
_Last updated: 08/31/24_
