+LuaJIT is also fully ABI-compatible with Lua 5.1 at the linker/dynamic
+loader level. This means you can compile a C module against the
+standard Lua headers and load the same shared library from either Lua
+or LuaJIT.
+
+
+
bit.* — Bitwise Operations
+
+LuaJIT supports all bitwise operations as defined by
+» Lua BitOp:
+
+This module is a LuaJIT built-in — you don't need to download or
+install Lua BitOp. The Lua BitOp site has full documentation for all
+» Lua BitOp API functions.
+
+
+Please make sure to require the module before using any of
+its functions:
+
+
+local bit = require("bit")
+
+
+An already installed Lua BitOp module is ignored by LuaJIT.
+This way you can use bit operations from both Lua and LuaJIT on a
+shared installation.
+
+
+
jit.* — JIT compiler control
+
+The functions in this built-in module control the behavior
+of the JIT compiler engine.
+
+
+
jit.on()
+jit.off()
+
+Turns the whole JIT compiler on (default) or off.
+
+
+These functions are typically used with the command line options
+-j on or -j off.
+
+
+
jit.flush()
+
+Flushes the whole cache of compiled code.
+
+
+
jit.flush(tr)
+
+Flushes the code for the specified root trace and all of its
+side traces from the cache.
+
+jit.on enables JIT compilation for a Lua function (this is
+the default).
+
+
+jit.off disables JIT compilation for a Lua function and
+flushes any already compiled code from the code cache.
+
+
+jit.flush flushes the code, but doesn't affect the
+enable/disable status.
+
+
+The current function, i.e. the Lua function calling this library
+function, can also be specified by passing true as the first
+argument.
+
+
+If the second argument is true, JIT compilation is also
+enabled, disabled or flushed recursively for all subfunctions of a
+function. With false only the subfunctions are affected.
+
+
+The jit.on and jit.off functions only set a flag
+which is checked when the function is about to be compiled. They do
+not trigger immediate compilation.
+
+
+Typical usage is jit.off(true, true) in the main chunk
+of a module to turn off JIT compilation for the whole module for
+debugging purposes.
+
+
+
jit.version
+
+Contains the LuaJIT version string.
+
+
+
jit.version_num
+
+Contains the version number of the LuaJIT core. Version xx.yy.zz
+is represented by the decimal number xxyyzz.
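
The encoding can be unpacked with integer division and remainder. The following C sketch is only an illustration: the type and function names are hypothetical and not part of LuaJIT, and it assumes each component fits into two decimal digits.

```c
#include <assert.h>

/* Hypothetical helper type, not part of the LuaJIT API. */
typedef struct {
  int major;  /* xx */
  int minor;  /* yy */
  int patch;  /* zz */
} version_parts;

/* Split a number of the form xxyyzz into its three components. */
static version_parts version_split(int num)
{
  version_parts v;
  assert(num >= 0);
  v.major = num / 10000;
  v.minor = (num / 100) % 100;
  v.patch = num % 100;
  return v;
}

/* Inverse operation: pack xx.yy.zz into the decimal number xxyyzz. */
static int version_pack(int major, int minor, int patch)
{
  return major * 10000 + minor * 100 + patch;
}
```

Under this scheme version 2.0.0 corresponds to 20000 and version 1.1.5 to 10105.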
+
+
+
jit.arch
+
+Contains the target architecture name (CPU and optional ABI).
+
+
+
jit.opt.* — JIT compiler optimization control
+
+This module provides the backend for the -O command line
+option.
+
+
+You can also use it programmatically, e.g.:
+
+
+jit.opt.start(2) -- same as -O2
+jit.opt.start("-dce")
+jit.opt.start("hotloop=10", "hotexit=2")
+
+
+Unlike in LuaJIT 1.x, the module is built-in and
+optimization is turned on by default!
+It's no longer necessary to run require("jit.opt").start(),
+which was one of the ways to enable optimization.
+
+
+
jit.util.* — JIT compiler introspection
+
+This module holds functions to introspect the bytecode, generated
+traces, the IR and the generated machine code. The functionality
+provided by this module is still in flux and therefore undocumented.
+
+
+The debug modules -jbc, -jv and -jdump make
+extensive use of these functions. Please check out their source code,
+if you want to know more.
+
+This is a list of changes between the released versions of LuaJIT.
+The current development version is LuaJIT 2.0.0-beta1.
+The current stable version is LuaJIT 1.1.5.
+
Remove a (sometimes) wrong assertion in luaJIT_findpc().
+
DynASM now allows labels for displacements and .aword.
+
Fix some compiler warnings for DynASM glue (internal API change).
+
Correct naming for SSSE3 (temporarily known as SSE4) in DynASM and x86 disassembler.
+
The loadable debug modules now handle redirection to stdout
+(e.g. -j trace=-).
+
+
+
LuaJIT 1.1.2 — 2006-06-24
+
+
Fix MSVC inline assembly: use only local variables with
+lua_number2int().
+
Fix "attempt to call a thread value" bug on Mac OS X:
+make values of consts used as lightuserdata keys unique
+to avoid joining by the compiler/linker.
The C stack is kept 16 byte aligned (faster).
+Mandatory for Mac OS X on Intel, too.
+
Faster calling conventions for internal C helper functions.
+
Better instruction scheduling for function prologue, OP_CALL and
+OP_RETURN.
+
+
+
Miscellaneous optimizations:
+
+
Faster loads of FP constants. Remove narrow-to-wide store-to-load
+forwarding stalls.
+
Use (scalar) SSE2 ops (if the CPU supports it) to speed up slot moves
+and FP to integer conversions.
+
Optimized the two-argument form of OP_CONCAT (a..b).
+
Inlined OP_MOD (a%b).
+With better accuracy than the C variant, too.
+
Inlined OP_POW (a^b). Unroll x^k or
+use k^x = 2^(log2(k)*x) or call pow().
+
+
+
Changes in the optimizer:
+
+
Improved hinting for table keys derived from table values
+(t1[t2[x]]).
+
Lookup hinting now works with arbitrary object types and
+supports index chains, too.
+
Generate type hints for arithmetic and comparison operators,
+OP_LEN, OP_CONCAT and OP_FORPREP.
+
Remove several hint definitions in favour of a generic COMBINE hint.
+
Complete rewrite of jit.opt_inline module
+(ex jit.opt_lib).
+
+
+
Use adaptive deoptimization:
+
+
If runtime verification of a contract fails, the affected
+instruction is recompiled and patched on-the-fly.
+Regular programs will trigger deoptimization only occasionally.
+
This avoids generating code for uncommon fallback cases
+most of the time. Generated code is up to 30% smaller compared to
+LuaJIT 1.0.3.
+
Deoptimization is used for many opcodes and contracts:
+
+
OP_CALL, OP_TAILCALL: type mismatch for callable.
+
Inlined calls: closure mismatch, parameter number and type mismatches.
+
OP_GETTABLE, OP_SETTABLE: table or key type and range mismatches.
+
All arithmetic and comparison operators, OP_LEN, OP_CONCAT,
+OP_FORPREP: operand type and range mismatches.
+
+
Complete redesign of the debug and traceback info
+(bytecode ↔ mcode) to support deoptimization.
+Much more flexible and needs only 50% of the space.
+
The modules jit.trace, jit.dumphints and
+jit.dump handle deoptimization.
+
+
+
Inlined many popular library functions
+(for commonly used arguments only):
+
+
Most math.* functions (the 18 most used ones)
+[2x-10x faster].
+
string.len, string.sub and string.char
+[2x-10x faster].
+
table.insert, table.remove and table.getn
+[3x-5x faster].
+
coroutine.yield and coroutine.resume
+[3x-5x faster].
+
pairs, ipairs and the corresponding iterators
+[8x-15x faster].
+
+
+
Changes in the core and loadable modules and the stand-alone executable:
+
+
Added jit.version, jit.version_num
+and jit.arch.
+
Reorganized some internal API functions (jit.util.*mcode*).
+
The -j dump output now shows JSUB names, too.
+
New x86 disassembler module written in pure Lua. No dependency
+on ndisasm anymore. Flexible API, very compact (500 lines)
+and complete (x87, MMX, SSE, SSE2, SSE3, SSSE3, privileged instructions).
+
luajit -v prints the LuaJIT version and copyright
+on a separate line.
+
+
+
Added SSE, SSE2, SSE3 and SSSE3 support to DynASM.
+
Miscellaneous doc changes. Added a section about
+embedding LuaJIT.
The community-managed » Lua Wiki
+has information about diverse topics.
+
The primary source of information for the latest developments surrounding
+Lua is the » Lua mailing list.
+You can check out the » mailing
+list archive or
+» subscribe
+to the list (you need to be subscribed before posting).
+This is also the place where announcements and discussions about LuaJIT
+take place.
+
+
+
+
+
Q: Where can I learn more about the compiler technology used by LuaJIT?
+
+I'm planning to write more documentation about the internals of LuaJIT.
+In the meantime, please use the following Google Scholar searches
+to find relevant papers:
+Search for: » Trace Compiler
+Search for: » JIT Compiler
+Search for: » Dynamic Language Optimizations
+Search for: » SSA Form
+Search for: » Linear Scan Register Allocation
+And, you know, reading the source is of course the only way to enlightenment. :-)
+
+
+
+
+
Q: Why do I get this error: "attempt to index global 'arg' (a nil value)"?
+Q: My vararg functions fail after switching to LuaJIT!
+
LuaJIT is compatible with the Lua 5.1 language standard. It doesn't
+support the implicit arg parameter for old-style vararg
+functions from Lua 5.0. Please convert your code to the
+» Lua 5.1
+vararg syntax.
+
+
+
+
Q: Sometimes Ctrl-C fails to stop my Lua program. Why?
+
The interrupt signal handler sets a Lua debug hook. But this is
+currently ignored by compiled code (this will eventually be fixed). If
+your program is running in a tight loop and never falls back to the
+interpreter, the debug hook never runs and can't throw the
+"interrupted!" error. In the meantime you have to press Ctrl-C
+twice to stop your program. That's similar to when it's stuck
+running inside a C function under the Lua interpreter.
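
The mechanism can be modelled in a few lines of C. This is a simplified sketch and not the actual handler: a plain flag stands in for the debug hook, the handler restores the default signal action so that a second Ctrl-C terminates the process, and all names are hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Stand-in for the debug hook request: set by the handler, polled elsewhere. */
static volatile sig_atomic_t interrupt_requested = 0;

static void on_interrupt(int sig)
{
  signal(sig, SIG_DFL);     /* A second Ctrl-C gets the default action. */
  interrupt_requested = 1;  /* Only has an effect once somebody polls it. */
}

/* Install the handler for SIGINT and clear any pending request. */
static void install_interrupt_handler(void)
{
  interrupt_requested = 0;
  signal(SIGINT, on_interrupt);
}

/* Run up to max_steps steps, polling the flag before each step.
** Returns the number of steps executed before the request was seen. */
static long run_polling(long max_steps)
{
  long i;
  assert(max_steps >= 0);
  for (i = 0; i < max_steps; i++) {
    if (interrupt_requested) break;
  }
  return i;
}
```

A loop that polls the flag stops after the first interrupt. A loop that never polls it, like compiled code that never returns to the interpreter, keeps running until the second interrupt triggers the default action.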
+
+
+
+
Q: Why doesn't my favorite power-patch for Lua apply against LuaJIT?
+
Because it's a completely redesigned VM and has very little code
+in common with Lua anymore. Also, if the patch introduces changes to
+the Lua semantics, this would need to be reflected everywhere in the
+VM, from the interpreter up to all stages of the compiler. Please
+use only standard Lua language constructs. For many common needs you
+can use source transformations or use wrapper or proxy functions.
+The compiler will happily optimize away such indirections.
+
+
+
+
Q: Lua runs everywhere. Why doesn't LuaJIT support my CPU?
+
Because it's a compiler — it needs to generate native
+machine code. This means the code generator must be ported to each
+architecture. And the fast interpreter is written in assembler and
+must be ported, too. This is quite an undertaking. Currently only
+x86 CPUs are supported. x64 support is in the works. Other
+architectures will follow with sufficient demand and/or
+sponsoring.
+
+
+
+
Q: When will feature X be added? When will the next version be released?
+
When it's ready.
+C'mon, it's open source — I'm doing it on my own time and you're
+getting it for free. You can either contribute a patch or sponsor
+the development of certain features, if they are important to you.
+
+LuaJIT is only distributed as a source package. This page explains
+how to build and install LuaJIT with different operating systems
+and C compilers.
+
+
+For the impatient (on POSIX systems):
+
+
+make && sudo make install
+
+
+LuaJIT currently builds out of the box on all popular x86 systems
+(Linux, Windows, OSX etc.). It builds and runs fine as a 32 bit
+application under x64-based systems, too.
+
+
+
Configuring LuaJIT
+
+The standard configuration should work fine for most installations.
+Usually there is no need to tweak the settings, except when you want to
+install to a non-standard path. The following three files hold all
+user-configurable settings:
+
+
+
src/luaconf.h sets some configuration variables, in
+particular the default paths for loading modules.
+
Makefile has settings for installing LuaJIT (POSIX
+only).
+
src/Makefile has settings for compiling LuaJIT under POSIX,
+MinGW and Cygwin.
+
src/msvcbuild.bat has settings for compiling LuaJIT with
+MSVC.
+
+
+Please read the instructions given in these files, before changing
+any settings.
+
+
+
POSIX Systems (Linux, OSX, *BSD etc.)
+
Prerequisites
+
+Depending on your distribution, you may need to install a package for
+GCC (GCC 3.4 or later required), the development headers and/or a
+complete SDK.
+
+
+E.g. on a current Debian/Ubuntu, install libc6-dev
+with the package manager. Currently LuaJIT only builds as a 32 bit
+application, so you actually need to install libc6-dev-i386
+when building on an x64 OS.
+
+
+Download the current source package (pick the .tar.gz), if you haven't
+already done so. Move it to a directory of your choice, open a
+terminal window and change to this directory. Now unpack the archive
+and change to the newly created directory:
+
+The supplied Makefiles try to auto-detect the settings needed for your
+operating system and your compiler. They need to be run with GNU Make,
+which is probably the default on your system, anyway. Simply run:
+
+
+make
+
+
Installing LuaJIT
+
+The top-level Makefile installs LuaJIT by default under
+/usr/local, i.e. the executable ends up in
+/usr/local/bin and so on. You need to have root privileges
+to write to this path. So, assuming sudo is installed on your system,
+run the following command and enter your sudo password:
+
+
+sudo make install
+
+
+Otherwise specify the directory prefix as an absolute path, e.g.:
+
+
+sudo make install PREFIX=/opt/lj2
+
+
+But note that the installation prefix and the prefix for the module paths
+(configured in src/luaconf.h) must match.
+
+
+Note: to avoid overwriting a previous version, the beta test releases
+only install the LuaJIT executable under the versioned name (i.e.
+luajit-2.0.0-beta1). You probably want to create a symlink
+for convenience, with a command like this:
+
+Either install one of the open source SDKs
+(» MinGW or
+» Cygwin) which come with modified
+versions of GCC plus the required development headers.
+
+
+Or install Microsoft's Visual C++ (MSVC) — the freely downloadable
+» Express Edition
+works just fine.
+
+
+Next, download the source package and unpack it using an archive manager
+(e.g. the Windows Explorer) to a directory of your choice.
+
+
Building with MSVC
+
+Open a "Visual Studio .NET Command Prompt" and cd to the
+directory where you've unpacked the sources. Then run this command:
+
+
+cd src
+msvcbuild
+
+
+Then follow the installation instructions below.
+
+
Building with MinGW or Cygwin
+
+Open a command prompt window and make sure the MinGW or Cygwin programs
+are in your path. Then cd to the directory where
+you've unpacked the sources and run this command for MinGW:
+
+
+cd src
+mingw32-make
+
+
+Or this command for Cygwin:
+
+
+cd src
+make
+
+
+Then follow the installation instructions below.
+
+
Installing LuaJIT
+
+Copy luajit.exe and lua51.dll
+to a newly created directory (any location is ok). Add lua
+and lua\jit directories below it and copy all Lua files
+from the lib directory of the distribution to the latter directory.
+
+
+There are no hardcoded
+absolute path names — all modules are loaded relative to the
+directory where luajit.exe is installed
+(see src/luaconf.h).
+
+* Lua is a powerful, dynamic and light-weight programming language
+designed for extending applications. Lua is also frequently used as a
+general-purpose, stand-alone language. More information about
+Lua can be found at: » http://www.lua.org/
+
+
Compatibility
+
+LuaJIT implements the full set of language features defined by Lua 5.1.
+The virtual machine (VM) is API- and ABI-compatible with the
+standard Lua interpreter and can be deployed as a drop-in replacement.
+
+
+LuaJIT offers more performance, at the expense of portability. It
+currently runs on all popular operating systems based on x86 CPUs
+(Linux, Windows, OSX etc.). It will be ported to x64 CPUs and other
+platforms in the future, based on user demand and sponsoring.
+
+
+
Overview
+
+LuaJIT has been successfully used as a scripting middleware in
+games, 3D modellers, numerical simulations, trading platforms and many
+other specialty applications. It combines high flexibility with high
+performance and an unmatched low memory footprint: less than
+120K for the VM plus less than 80K for the JIT compiler.
+
+
+LuaJIT has been in continuous development since 2005. It's widely
+considered to be one of the fastest dynamic language
+implementations. It has outperformed other dynamic languages on many
+cross-language benchmarks since its first release — often by a
+substantial margin. Only now, in 2009, other dynamic language VMs are
+starting to catch up with the performance of LuaJIT 1.x …
+
+
+2009 also marks the first release of the long-awaited LuaJIT 2.0.
+The whole VM has been rewritten from the ground up and relentlessly
+optimized for performance. It combines a high-speed interpreter,
+written in assembler, with a state-of-the-art JIT compiler.
+
+
+An innovative trace compiler is integrated with advanced,
+SSA-based optimizations and a highly tuned code generation backend. This
+allows a substantial reduction of the overhead associated with dynamic
+language features. It's destined to break into the performance range
+traditionally reserved for offline, static language compilers.
+
+
+
More ...
+
+Click on the LuaJIT sub-topics in the navigation bar to learn more
+about LuaJIT.
+
+
+Click on the Logo in the upper left corner to visit
+the LuaJIT project page on the web. All other links to online
+resources are marked with a '»'.
+
+LuaJIT has only a single stand-alone executable, called luajit on
+POSIX systems or luajit.exe on Windows. It can be used to run simple
+Lua statements or whole Lua applications from the command line. It has an
+interactive mode, too.
+
+
+Note: the beta test releases only install under the versioned name on
+POSIX systems (to avoid overwriting a previous version). You either need
+to type luajit-2.0.0-beta1 to start it or create a symlink
+with a command like this:
+
+Unlike previous versions, optimization is turned on by default in
+LuaJIT 2.0! It's no longer necessary to use luajit -O.
+
+
+
Command Line Options
+
+The luajit stand-alone executable is just a slightly modified
+version of the regular lua stand-alone executable.
+It supports the same basic options, too. luajit -h
+prints a short list of the available options. Please have a look at the
+» Lua manual
+for details.
+
+
+Two additional options control the behavior of LuaJIT:
+
+
+
-j cmd[=arg[,arg...]]
+
+This option performs a LuaJIT control command or activates one of the
+loadable extension modules. The command is first looked up in the
+jit.* library. If no matching function is found, a module
+named jit.<cmd> is loaded and the start()
+function of the module is called with the specified arguments (if
+any). The space between -j and cmd is optional.
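
The syntax can be read as a command name, optionally followed by an equals sign and a comma-separated argument list. The C sketch below splits such a string in place. It only illustrates the grammar and is not taken from the luajit source; the names jcmd, jcmd_parse and JCMD_MAXARGS are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define JCMD_MAXARGS 8

/* Hypothetical result type: pointers into the caller's (modified) buffer. */
typedef struct {
  char *cmd;
  char *args[JCMD_MAXARGS];
  int nargs;
} jcmd;

/* Split "cmd[=arg[,arg...]]" in place. Returns 0 on success, -1 if the
** command name is empty or there are too many arguments. */
static int jcmd_parse(char *s, jcmd *out)
{
  char *p;
  assert(s != NULL && out != NULL);
  out->cmd = s;
  out->nargs = 0;
  p = strchr(s, '=');
  if (p != NULL) {
    *p++ = '\0';  /* Terminate the command name. */
    for (;;) {
      char *comma;
      if (out->nargs == JCMD_MAXARGS) return -1;
      out->args[out->nargs++] = p;
      comma = strchr(p, ',');
      if (comma == NULL) break;
      *comma = '\0';  /* Terminate this argument. */
      p = comma + 1;
    }
  }
  return *out->cmd != '\0' ? 0 : -1;
}
```

The caller would then look up the command name first and fall back to loading a module of that name, passing the collected arguments along.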
+
+
+Here are the available LuaJIT control commands:
+
+
+
-jon — Turns the JIT compiler on (default).
+
-joff — Turns the JIT compiler off (only use the interpreter).
+
-jflush — Flushes the whole cache of compiled code.
+
-jv — Shows verbose information about the progress of the JIT compiler.
+
-jdump — Dumps the code and structures used in various compiler stages.
+
+
+The -jv and -jdump commands are extension modules
+written in Lua. They are mainly used for debugging the JIT compiler
+itself. For a description of their options and output format, please
+read the comment block at the start of their source.
+They can be found in the lib directory of the source
+distribution or installed under the jit directory. By default
+this is /usr/local/share/luajit-2.0.0-beta1/jit on POSIX
+systems.
+
+
+
-O[level]
+-O[+]flag   -O-flag
+-Oparam=value
+
+This option allows fine-tuned control of the optimizations used by
+the JIT compiler. This is mainly intended for debugging LuaJIT itself.
+Please note that the JIT compiler is extremely fast (we are talking
+about the microsecond to millisecond range). Disabling optimizations
+doesn't have any visible impact on its overhead, but usually generates
+code that runs slower.
+
+
+The first form sets an optimization level — this enables a
+specific mix of optimization flags. -O0 turns off all
+optimizations and higher numbers enable more optimizations. Omitting
+the level (i.e. just -O) sets the default optimization level,
+which is -O3 in the current version.
+
+
+The second form adds or removes individual optimization flags.
+The third form sets a parameter for the VM or the JIT compiler
+to a specific value.
+
+
+You can either use this option multiple times (like -Ocse
+-O-dce -Ohotloop=10) or separate several settings with a comma
+(like -O+cse,-dce,hotloop=10). The settings are applied from
+left to right and later settings override earlier ones. You can freely
+mix the three forms, but note that setting an optimization level
+overrides all earlier flags.
+
+
+Here are the available flags and at what optimization levels they
+are enabled:
+
+
+
+
+Flag     -O1  -O2  -O3  Description
+fold      •    •    •   Constant Folding, Simplifications and Reassociation
+cse       •    •    •   Common-Subexpression Elimination
+dce       •    •    •   Dead-Code Elimination
+narrow         •    •   Narrowing of numbers to integers
+loop           •    •   Loop Optimizations (code hoisting)
+fwd                 •   Load Forwarding (L2L) and Store Forwarding (S2L)
+dse                 •   Dead-Store Elimination
+fuse                •   Fusion of operands into instructions
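
The rules for combining settings can be sketched as a small state update in C. Everything below is illustrative: the flag bit values, the names and the parsing are assumptions made for this example and not LuaJIT's implementation. Only levels 0 to 3 are accepted, and parameter settings such as hotloop=10 are recognized but ignored to keep the sketch short.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag bits; the values are assumptions of this sketch. */
enum {
  OPT_FOLD = 1 << 0, OPT_CSE = 1 << 1, OPT_DCE = 1 << 2, OPT_NARROW = 1 << 3,
  OPT_LOOP = 1 << 4, OPT_FWD = 1 << 5, OPT_DSE = 1 << 6, OPT_FUSE = 1 << 7
};

/* Flag sets per optimization level, following the table of flags. */
#define OPT_LEVEL1 (OPT_FOLD | OPT_CSE | OPT_DCE)
#define OPT_LEVEL2 (OPT_LEVEL1 | OPT_NARROW | OPT_LOOP)
#define OPT_LEVEL3 (OPT_LEVEL2 | OPT_FWD | OPT_DSE | OPT_FUSE)

static const struct { const char *name; unsigned int bit; } opt_flags[] = {
  { "fold", OPT_FOLD }, { "cse", OPT_CSE }, { "dce", OPT_DCE },
  { "narrow", OPT_NARROW }, { "loop", OPT_LOOP }, { "fwd", OPT_FWD },
  { "dse", OPT_DSE }, { "fuse", OPT_FUSE }
};

/* Apply one setting (without the leading -O) to *flags.
** Returns 0 on success, -1 for an unknown setting. */
static int opt_apply(unsigned int *flags, const char *s)
{
  static const unsigned int levels[] = {
    0, OPT_LEVEL1, OPT_LEVEL2, OPT_LEVEL3
  };
  size_t i;
  int clear = 0;
  assert(flags != NULL && s != NULL);
  if (*s == '\0') { *flags = OPT_LEVEL3; return 0; }  /* Plain -O. */
  if (*s >= '0' && *s <= '3' && s[1] == '\0') {
    *flags = levels[*s - '0'];  /* A level replaces all earlier flags. */
    return 0;
  }
  if (strchr(s, '=') != NULL) return 0;  /* param=value: ignored here. */
  if (*s == '+') s++;
  else if (*s == '-') { clear = 1; s++; }
  for (i = 0; i < sizeof(opt_flags) / sizeof(opt_flags[0]); i++) {
    if (strcmp(s, opt_flags[i].name) == 0) {
      if (clear) *flags &= ~opt_flags[i].bit;
      else *flags |= opt_flags[i].bit;
      return 0;
    }
  }
  return -1;
}

/* Apply a comma-separated list of settings from left to right. */
static int opt_apply_list(unsigned int *flags, const char *list)
{
  char item[32];
  assert(list != NULL);
  for (;;) {
    size_t n = strcspn(list, ",");
    if (n >= sizeof(item)) return -1;
    memcpy(item, list, n);
    item[n] = '\0';
    if (opt_apply(flags, item) != 0) return -1;
    if (list[n] == '\0') return 0;
    list += n + 1;
  }
}
```

With this sketch, the list "-dce,2" ends up with the plain level 2 flag set, because the level overrides the earlier flag, while "2,-dce" ends up with level 2 minus dce.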
+
+
+Here are the parameters and their default settings:
+
+
+
+
+Parameter    Default  Description
+maxtrace        1000  Max. number of traces in the cache
+maxrecord       2000  Max. number of recorded IR instructions
+maxirconst       500  Max. number of IR constants of a trace
+maxside          100  Max. number of side traces of a root trace
+maxsnap          100  Max. number of snapshots for a trace
+hotloop           57  Number of iterations to detect a hot loop
+hotexit           10  Number of taken exits to start a side trace
+tryside            4  Number of attempts to compile a side trace
+instunroll         4  Max. unroll factor for instable loops
+loopunroll         7  Max. unroll factor for loop ops in side traces
+callunroll         3  Max. unroll factor for pseudo-recursive calls
+sizemcode         32  Size of each machine code area in KBytes (Windows: 64K)
+maxmcode         512  Max. total size of all machine code areas in KBytes
+The LuaJIT 1.x series represents
+the current stable branch. As of
+this writing there have been no open bugs for about a year. So, if
+you need a rock-solid VM, you are encouraged to fetch the latest
+release of LuaJIT 1.x from the » Download
+page.
+
+
+LuaJIT 2.0 is the currently active
+development branch.
+It has Beta Test status and is still undergoing
+substantial changes. It's expected to quickly mature within the next
+months. You should definitely start to evaluate it for new projects
+right now. But deploying it in production environments is not yet
+recommended.
+
+
+
Current Status
+
+This is a list of the things you should know about the LuaJIT 2.0 beta test:
+
+
+
+The JIT compiler can only generate code for CPUs with SSE2 at the
+moment. I.e. you need at least a P4, Core 2/i5/i7 or K8/K10 to use it. I
+plan to fix this during the beta phase and add support for emitting x87
+instructions to the backend.
+
+
+Obviously there will be many bugs in a VM which has been
+rewritten from the ground up. Please report your findings together with
+the circumstances needed to reproduce the bug. If possible, reduce the
+problem to a simple test case.
+There is no formal bug tracker at the moment. The best place for
+discussion is the
+» Lua mailing list. Of course
+you may also send your bug report directly to me, especially when it
+contains lengthy debug output. Please check the
+Contact page for details.
+
+
+The VM is complete in the sense that it should run all Lua code
+just fine. It's considered a serious bug if the VM crashes or produces
+unexpected results — please report it. There are only very few
+known incompatibilities with standard Lua:
+
+
+The Lua debug API is missing a couple of features (call/return
+hooks) and shows slightly different behavior (no per-coroutine hooks).
+
+
+Most other issues you're likely to find (e.g. with the existing test
+suites) are differences in the implementation-defined behavior.
+These either have a good reason (like early tail call resolving which
+may cause differences in error reporting), are arbitrary design choices
+or are due to quirks in the VM. The latter cases may get fixed if a
+demonstrable need is shown.
+
+
+
+
+The JIT compiler is not complete (yet) and falls back to the
+interpreter in some cases. All of this works transparently, so unless
+you use -jv, you'll probably never notice (the interpreter is quite
+fast, too). Here are the known issues:
+
+
+Many known issues cause a NYI (not yet implemented) trace abort
+message. E.g. for calls to vararg functions or many string library
+functions. Reporting these is only mildly useful, except if you have good
+example code that shows the problem. Obviously, reports accompanied with
+a patch to fix the issue are more than welcome. But please check back
+with me, before writing major improvements, to avoid duplication of
+effort.
+
+
+Recursion is not traced yet. Often no trace will be generated at
+all or some unroll limit will catch it and abort the trace.
+
+
+The trace compiler currently does not back off specialization for
+function call dispatch. It should really fall back to specializing on
+the prototype, not the closure identity. This can lead to the so-called
+"trace explosion" problem with closure-heavy programming. The
+trace linking heuristics prevent this, but in the worst case this
+means the code always falls back to the interpreter.
+
+
+Trace management needs more tuning: better blacklisting of aborted
+traces, less drastic countermeasures against trace explosion and better
+heuristics in general.
+
+
+Some checks are missing in the JIT-compiled code for obscure situations
+with open upvalues aliasing one of the SSA slots later on (or
+vice versa). Bonus points, if you can find a real world test case for
+this.
+
+
+
+
+
+
Roadmap
+
+Rather than stating exact release dates (I'm well known for making
+spectacularly wrong guesses), this roadmap lists the general project
+plan, sorted by priority, as well as ideas for the future:
+
+
+
+The main goal right now is to stabilize LuaJIT 2.0 and get it out of
+beta test. Correctness has priority over completeness. This
+implies the first stable release will certainly NOT compile every
+library function call and will fall back to the interpreter from time
+to time. This is perfectly ok, since it still executes all Lua code,
+just not at the highest possible speed.
+
+
+The next step is to get it to compile more library functions and handle
+more cases where the compiler currently bails out. This doesn't mean it
+will compile every corner case. It's much more important that it
+performs well in a majority of use cases. Every compiler has to make
+these trade-offs — completeness just cannot be the
+overriding goal for a low-footprint, low-overhead JIT compiler.
+
+
+More optimizations will be added in parallel to the last step on
+an as-needed basis. Array-bounds-check (ABC) removal, sinking of stores
+to aggregates and sinking of allocations are high on the list. Faster
+handling of NEWREF and better alias analysis are desirable, too. More
+complex optimizations with less pay-off, such as value-range-propagation
+(VRP) will have to wait.
+
+
+LuaJIT 2.0 has been designed with portability in mind.
+Nonetheless, it compiles to native code and needs to be adapted to each
+architecture. Porting the compiler backend is probably the easier task,
+but a key element of its design is the fast interpreter, written in
+machine-specific assembler.
+The code base and the internal structures are already prepared for
+easier porting to 64 bit architectures. The most likely next target is a
+port to x64, but this will have to wait until the x86 port
+stabilizes. Other ports will follow — companies which are
+interested in sponsoring a port to a particular architecture, please
+contact me.
+
+
+There are some planned structural improvements to the compiler,
+like compressed snapshot maps or generic handling of calls to helper
+methods. These are of lesser importance, unless other developments
+elevate their priority.
+
+
+Documentation about the internals of LuaJIT is still sorely
+missing. Although the source code is included and is IMHO well
+commented, many basic design decisions are in need of an explanation.
+The rather un-traditional compiler architecture and the many highly
+optimized data structures are a barrier for outside participation in
+the development. Alas, as I've repeatedly stated, I'm better at
+writing code than papers and I'm not in need of any academic merits.
+Someday I will find the time for it. :-)
+
+
+Producing good code for unbiased branches is a key problem for trace
+compilers. This is the main cause for "trace explosion".
+Hyperblock scheduling promises to solve this nicely at the
+price of a major redesign of the compiler. This would also pave the
+way for emitting predicated instructions, which is a prerequisite
+for efficient vectorization.
+
+
+Currently Lua is missing a standard library for access to structured
+binary data and arrays/buffers holding low-level data types.
+Allowing calls to arbitrary C functions (FFI) would obviate the
+need to write manual bindings. A variety of extension modules is floating
+around, with different scope and capabilities. Alas, none of them has been
+designed with a JIT compiler in mind.
+
+
+
diff --git a/dynasm/dasm_proto.h b/dynasm/dasm_proto.h
new file mode 100644
index 0000000000..94d9a9e28e
--- /dev/null
+++ b/dynasm/dasm_proto.h
@@ -0,0 +1,69 @@
+/*
+** DynASM encoding engine prototypes.
+** Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+** Released under the MIT/X license. See dynasm.lua for full copyright notice.
+*/
+
+#ifndef _DASM_PROTO_H
+#define _DASM_PROTO_H
+
+#include <stddef.h>
+#include <stdarg.h>
+
+#define DASM_IDENT "DynASM 1.2.1"
+#define DASM_VERSION 10201 /* 1.2.1 */
+
+#ifndef Dst_DECL
+#define Dst_DECL dasm_State *Dst
+#endif
+
+#ifndef Dst_GET
+#define Dst_GET (Dst)
+#endif
+
+#ifndef DASM_FDEF
+#define DASM_FDEF extern
+#endif
+
+
+/* Internal DynASM encoder state. */
+typedef struct dasm_State dasm_State;
+
+/* Action list type. */
+typedef const unsigned char *dasm_ActList;
+
+
+/* Initialize and free DynASM state. */
+DASM_FDEF void dasm_init(Dst_DECL, int maxsection);
+DASM_FDEF void dasm_free(Dst_DECL);
+
+/* Setup global array. Must be called before dasm_setup(). */
+DASM_FDEF void dasm_setupglobal(Dst_DECL, void **gl, unsigned int maxgl);
+
+/* Grow PC label array. Can be called after dasm_setup(), too. */
+DASM_FDEF void dasm_growpc(Dst_DECL, unsigned int maxpc);
+
+/* Setup encoder. */
+DASM_FDEF void dasm_setup(Dst_DECL, dasm_ActList actionlist);
+
+/* Feed encoder with actions. Calls are generated by pre-processor. */
+DASM_FDEF void dasm_put(Dst_DECL, int start, ...);
+
+/* Link sections and return the resulting size. */
+DASM_FDEF int dasm_link(Dst_DECL, size_t *szp);
+
+/* Encode sections into buffer. */
+DASM_FDEF int dasm_encode(Dst_DECL, void *buffer);
+
+/* Get PC label offset. */
+DASM_FDEF int dasm_getpclabel(Dst_DECL, unsigned int pc);
+
+#ifdef DASM_CHECKS
+/* Optional sanity checker to call between isolated encoding steps. */
+DASM_FDEF int dasm_checkstep(Dst_DECL, int secmatch);
+#else
+#define dasm_checkstep(a, b) 0
+#endif
+
+
+#endif /* _DASM_PROTO_H */
diff --git a/dynasm/dasm_x86.h b/dynasm/dasm_x86.h
new file mode 100644
index 0000000000..dab33e5ae4
--- /dev/null
+++ b/dynasm/dasm_x86.h
@@ -0,0 +1,467 @@
+/*
+** DynASM x86 encoding engine.
+** Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+** Released under the MIT/X license. See dynasm.lua for full copyright notice.
+*/
+
+#include <stddef.h>
+#include <stdarg.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define DASM_ARCH "x86"
+
+#ifndef DASM_EXTERN
+#define DASM_EXTERN(a,b,c,d) 0
+#endif
+
+/* Action definitions. DASM_STOP must be 255. */
+enum {
+ DASM_DISP = 233,
+ DASM_IMM_S, DASM_IMM_B, DASM_IMM_W, DASM_IMM_D, DASM_IMM_WB, DASM_IMM_DB,
+ DASM_VREG, DASM_SPACE, DASM_SETLABEL, DASM_REL_A, DASM_REL_LG, DASM_REL_PC,
+ DASM_IMM_LG, DASM_IMM_PC, DASM_LABEL_LG, DASM_LABEL_PC, DASM_ALIGN,
+ DASM_EXTERN, DASM_ESC, DASM_MARK, DASM_SECTION, DASM_STOP
+};
+
+/* Maximum number of section buffer positions for a single dasm_put() call. */
+#define DASM_MAXSECPOS 25
+
+/* DynASM encoder status codes. Action list offset or number are or'ed in. */
+#define DASM_S_OK 0x00000000
+#define DASM_S_NOMEM 0x01000000
+#define DASM_S_PHASE 0x02000000
+#define DASM_S_MATCH_SEC 0x03000000
+#define DASM_S_RANGE_I 0x11000000
+#define DASM_S_RANGE_SEC 0x12000000
+#define DASM_S_RANGE_LG 0x13000000
+#define DASM_S_RANGE_PC 0x14000000
+#define DASM_S_RANGE_VREG 0x15000000
+#define DASM_S_UNDEF_L 0x21000000
+#define DASM_S_UNDEF_PC 0x22000000
+
+/* Macros to convert positions (8 bit section + 24 bit index). */
+#define DASM_POS2IDX(pos) ((pos)&0x00ffffff)
+#define DASM_POS2BIAS(pos) ((pos)&0xff000000)
+#define DASM_SEC2POS(sec) ((sec)<<24)
+#define DASM_POS2SEC(pos) ((pos)>>24)
+#define DASM_POS2PTR(D, pos) (D->sections[DASM_POS2SEC(pos)].rbuf + (pos))
+
+/* Per-section structure. */
+typedef struct dasm_Section {
+ int *rbuf; /* Biased buffer pointer (negative section bias). */
+ int *buf; /* True buffer pointer. */
+ size_t bsize; /* Buffer size in bytes. */
+ int pos; /* Biased buffer position. */
+ int epos; /* End of biased buffer position - max single put. */
+ int ofs; /* Byte offset into section. */
+} dasm_Section;
+
+/* Core structure holding the DynASM encoding state. */
+struct dasm_State {
+ size_t psize; /* Allocated size of this structure. */
+ dasm_ActList actionlist; /* Current actionlist pointer. */
+ int *lglabels; /* Local/global chain/pos ptrs. */
+ size_t lgsize;
+ int *pclabels; /* PC label chains/pos ptrs. */
+ size_t pcsize;
+ void **globals; /* Array of globals (bias -10). */
+ dasm_Section *section; /* Pointer to active section. */
+ size_t codesize; /* Total size of all code sections. */
+ int maxsection; /* 0 <= sectionidx < maxsection. */
+ int status; /* Status code. */
+ dasm_Section sections[1]; /* All sections. Alloc-extended. */
+};
+
+/* The size of the core structure depends on the max. number of sections. */
+#define DASM_PSZ(ms) (sizeof(dasm_State)+(ms-1)*sizeof(dasm_Section))
+
+
+/* Initialize DynASM state. */
+void dasm_init(Dst_DECL, int maxsection)
+{
+ dasm_State *D;
+ size_t psz = 0;
+ int i;
+ Dst_REF = NULL;
+ DASM_M_GROW(Dst, struct dasm_State, Dst_REF, psz, DASM_PSZ(maxsection));
+ D = Dst_REF;
+ D->psize = psz;
+ D->lglabels = NULL;
+ D->lgsize = 0;
+ D->pclabels = NULL;
+ D->pcsize = 0;
+ D->globals = NULL;
+ D->maxsection = maxsection;
+ for (i = 0; i < maxsection; i++) {
+ D->sections[i].buf = NULL; /* Need this for pass3. */
+ D->sections[i].rbuf = D->sections[i].buf - DASM_SEC2POS(i);
+ D->sections[i].bsize = 0;
+ D->sections[i].epos = 0; /* Wrong, but is recalculated after resize. */
+ }
+}
+
+/* Free DynASM state. */
+void dasm_free(Dst_DECL)
+{
+ dasm_State *D = Dst_REF;
+ int i;
+ for (i = 0; i < D->maxsection; i++)
+ if (D->sections[i].buf)
+ DASM_M_FREE(Dst, D->sections[i].buf, D->sections[i].bsize);
+ if (D->pclabels) DASM_M_FREE(Dst, D->pclabels, D->pcsize);
+ if (D->lglabels) DASM_M_FREE(Dst, D->lglabels, D->lgsize);
+ DASM_M_FREE(Dst, D, D->psize);
+}
+
+/* Setup global label array. Must be called before dasm_setup(). */
+void dasm_setupglobal(Dst_DECL, void **gl, unsigned int maxgl)
+{
+ dasm_State *D = Dst_REF;
+ D->globals = gl - 10; /* Negative bias to compensate for locals. */
+ DASM_M_GROW(Dst, int, D->lglabels, D->lgsize, (10+maxgl)*sizeof(int));
+}
+
+/* Grow PC label array. Can be called after dasm_setup(), too. */
+void dasm_growpc(Dst_DECL, unsigned int maxpc)
+{
+ dasm_State *D = Dst_REF;
+ size_t osz = D->pcsize;
+ DASM_M_GROW(Dst, int, D->pclabels, D->pcsize, maxpc*sizeof(int));
+ memset((void *)(((unsigned char *)D->pclabels)+osz), 0, D->pcsize-osz);
+}
+
+/* Setup encoder. */
+void dasm_setup(Dst_DECL, dasm_ActList actionlist)
+{
+ dasm_State *D = Dst_REF;
+ int i;
+ D->actionlist = actionlist;
+ D->status = DASM_S_OK;
+ D->section = &D->sections[0];
+ memset((void *)D->lglabels, 0, D->lgsize);
+ if (D->pclabels) memset((void *)D->pclabels, 0, D->pcsize);
+ for (i = 0; i < D->maxsection; i++) {
+ D->sections[i].pos = DASM_SEC2POS(i);
+ D->sections[i].ofs = 0;
+ }
+}
+
+
+#ifdef DASM_CHECKS
+#define CK(x, st) \
+ do { if (!(x)) { \
+ D->status = DASM_S_##st|(p-D->actionlist-1); return; } } while (0)
+#define CKPL(kind, st) \
+ do { if ((size_t)((char *)pl-(char *)D->kind##labels) >= D->kind##size) { \
+ D->status = DASM_S_RANGE_##st|(p-D->actionlist-1); return; } } while (0)
+#else
+#define CK(x, st) ((void)0)
+#define CKPL(kind, st) ((void)0)
+#endif
+
+/* Pass 1: Store actions and args, link branches/labels, estimate offsets. */
+void dasm_put(Dst_DECL, int start, ...)
+{
+ va_list ap;
+ dasm_State *D = Dst_REF;
+ dasm_ActList p = D->actionlist + start;
+ dasm_Section *sec = D->section;
+ int pos = sec->pos, ofs = sec->ofs, mrm = 4;
+ int *b;
+
+ if (pos >= sec->epos) {
+ DASM_M_GROW(Dst, int, sec->buf, sec->bsize,
+ sec->bsize + 2*DASM_MAXSECPOS*sizeof(int));
+ sec->rbuf = sec->buf - DASM_POS2BIAS(pos);
+ sec->epos = (int)sec->bsize/sizeof(int) - DASM_MAXSECPOS+DASM_POS2BIAS(pos);
+ }
+
+ b = sec->rbuf;
+ b[pos++] = start;
+
+ va_start(ap, start);
+ while (1) {
+ int action = *p++;
+ if (action < DASM_DISP) {
+ ofs++;
+ } else if (action <= DASM_REL_A) {
+ int n = va_arg(ap, int);
+ b[pos++] = n;
+ switch (action) {
+ case DASM_DISP:
+ if (n == 0) { if ((mrm&7) == 4) mrm = p[-2]; if ((mrm&7) != 5) break; }
+ case DASM_IMM_DB: if (((n+128)&-256) == 0) goto ob;
+ case DASM_REL_A: /* Assumes ptrdiff_t is int. !x64 */
+ case DASM_IMM_D: ofs += 4; break;
+ case DASM_IMM_S: CK(((n+128)&-256) == 0, RANGE_I); goto ob;
+ case DASM_IMM_B: CK((n&-256) == 0, RANGE_I); ob: ofs++; break;
+ case DASM_IMM_WB: if (((n+128)&-256) == 0) goto ob;
+ case DASM_IMM_W: CK((n&-65536) == 0, RANGE_I); ofs += 2; break;
+ case DASM_SPACE: p++; ofs += n; break;
+ case DASM_SETLABEL: b[pos-2] = -0x40000000; break; /* Neg. label ofs. */
+ case DASM_VREG: CK((n&-8) == 0 && (n != 4 || (*p&1) == 0), RANGE_VREG);
+ if (*p++ == 1 && *p == DASM_DISP) mrm = n; continue;
+ }
+ mrm = 4;
+ } else {
+ int *pl, n;
+ switch (action) {
+ case DASM_REL_LG:
+ case DASM_IMM_LG:
+ n = *p++; pl = D->lglabels + n;
+ if (n <= 246) { CKPL(lg, LG); goto putrel; } /* Bkwd rel or global. */
+ pl -= 246; n = *pl;
+ if (n < 0) n = 0; /* Start new chain for fwd rel if label exists. */
+ goto linkrel;
+ case DASM_REL_PC:
+ case DASM_IMM_PC: pl = D->pclabels + va_arg(ap, int); CKPL(pc, PC);
+ putrel:
+ n = *pl;
+ if (n < 0) { /* Label exists. Get label pos and store it. */
+ b[pos] = -n;
+ } else {
+ linkrel:
+ b[pos] = n; /* Else link to rel chain, anchored at label. */
+ *pl = pos;
+ }
+ pos++;
+ ofs += 4; /* Maximum offset needed. */
+ if (action == DASM_REL_LG || action == DASM_REL_PC)
+ b[pos++] = ofs; /* Store pass1 offset estimate. */
+ break;
+ case DASM_LABEL_LG: pl = D->lglabels + *p++; CKPL(lg, LG); goto putlabel;
+ case DASM_LABEL_PC: pl = D->pclabels + va_arg(ap, int); CKPL(pc, PC);
+ putlabel:
+ n = *pl; /* n > 0: Collapse rel chain and replace with label pos. */
+ while (n > 0) { int *pb = DASM_POS2PTR(D, n); n = *pb; *pb = pos; }
+ *pl = -pos; /* Label exists now. */
+ b[pos++] = ofs; /* Store pass1 offset estimate. */
+ break;
+ case DASM_ALIGN:
+ ofs += *p++; /* Maximum alignment needed (arg is 2**n-1). */
+ b[pos++] = ofs; /* Store pass1 offset estimate. */
+ break;
+ case DASM_EXTERN: p += 2; ofs += 4; break;
+ case DASM_ESC: p++; ofs++; break;
+ case DASM_MARK: mrm = p[-2]; break;
+ case DASM_SECTION:
+ n = *p; CK(n < D->maxsection, RANGE_SEC); D->section = &D->sections[n];
+ case DASM_STOP: goto stop;
+ }
+ }
+ }
+stop:
+ va_end(ap);
+ sec->pos = pos;
+ sec->ofs = ofs;
+}
+#undef CK
+
+/* Pass 2: Link sections, shrink branches/aligns, fix label offsets. */
+int dasm_link(Dst_DECL, size_t *szp)
+{
+ dasm_State *D = Dst_REF;
+ int secnum;
+ int ofs = 0;
+
+#ifdef DASM_CHECKS
+ *szp = 0;
+ if (D->status != DASM_S_OK) return D->status;
+ {
+ int pc;
+ for (pc = 0; pc*sizeof(int) < D->pcsize; pc++)
+ if (D->pclabels[pc] > 0) return DASM_S_UNDEF_PC|pc;
+ }
+#endif
+
+ { /* Handle globals not defined in this translation unit. */
+ int idx;
+ for (idx = 10; idx*sizeof(int) < D->lgsize; idx++) {
+ int n = D->lglabels[idx];
+ /* Undefined label: Collapse rel chain and replace with marker (< 0). */
+ while (n > 0) { int *pb = DASM_POS2PTR(D, n); n = *pb; *pb = -idx; }
+ }
+ }
+
+ /* Combine all code sections. No support for data sections (yet). */
+ for (secnum = 0; secnum < D->maxsection; secnum++) {
+ dasm_Section *sec = D->sections + secnum;
+ int *b = sec->rbuf;
+ int pos = DASM_SEC2POS(secnum);
+ int lastpos = sec->pos;
+
+ while (pos != lastpos) {
+ dasm_ActList p = D->actionlist + b[pos++];
+ while (1) {
+ int op, action = *p++;
+ switch (action) {
+ case DASM_REL_LG: p++; op = p[-3]; goto rel_pc;
+ case DASM_REL_PC: op = p[-2]; rel_pc: {
+ int shrink = op == 0xe9 ? 3 : ((op&0xf0) == 0x80 ? 4 : 0);
+ if (shrink) { /* Shrinkable branch opcode? */
+ int lofs, lpos = b[pos];
+ if (lpos < 0) goto noshrink; /* Ext global? */
+ lofs = *DASM_POS2PTR(D, lpos);
+ if (lpos > pos) { /* Fwd label: add cumulative section offsets. */
+ int i;
+ for (i = secnum; i < DASM_POS2SEC(lpos); i++)
+ lofs += D->sections[i].ofs;
+ } else {
+ lofs -= ofs; /* Bkwd label: unfix offset. */
+ }
+ lofs -= b[pos+1]; /* Short branch ok? */
+ if (lofs >= -128-shrink && lofs <= 127) ofs -= shrink; /* Yes. */
+ else { noshrink: shrink = 0; } /* No, cannot shrink op. */
+ }
+ b[pos+1] = shrink;
+ pos += 2;
+ break;
+ }
+ case DASM_SPACE: case DASM_IMM_LG: case DASM_VREG: p++;
+ case DASM_DISP: case DASM_IMM_S: case DASM_IMM_B: case DASM_IMM_W:
+ case DASM_IMM_D: case DASM_IMM_WB: case DASM_IMM_DB:
+ case DASM_SETLABEL: case DASM_REL_A: case DASM_IMM_PC: pos++; break;
+ case DASM_LABEL_LG: p++;
+ case DASM_LABEL_PC: b[pos++] += ofs; break; /* Fix label offset. */
+ case DASM_ALIGN: ofs -= (b[pos++]+ofs)&*p++; break; /* Adjust ofs. */
+ case DASM_EXTERN: p += 2; break;
+ case DASM_ESC: p++; break;
+ case DASM_MARK: break;
+ case DASM_SECTION: case DASM_STOP: goto stop;
+ }
+ }
+ stop: (void)0;
+ }
+ ofs += sec->ofs; /* Next section starts right after current section. */
+ }
+
+ D->codesize = ofs; /* Total size of all code sections */
+ *szp = ofs;
+ return DASM_S_OK;
+}
+
+#define dasmb(x) *cp++ = (unsigned char)(x)
+#ifndef DASM_ALIGNED_WRITES
+#define dasmw(x) \
+ do { *((unsigned short *)cp) = (unsigned short)(x); cp+=2; } while (0)
+#define dasmd(x) \
+ do { *((unsigned int *)cp) = (unsigned int)(x); cp+=4; } while (0)
+#else
+#define dasmw(x) do { dasmb(x); dasmb((x)>>8); } while (0)
+#define dasmd(x) do { dasmw(x); dasmw((x)>>16); } while (0)
+#endif
+
+/* Pass 3: Encode sections. */
+int dasm_encode(Dst_DECL, void *buffer)
+{
+ dasm_State *D = Dst_REF;
+ unsigned char *base = (unsigned char *)buffer;
+ unsigned char *cp = base;
+ int secnum;
+
+ /* Encode all code sections. No support for data sections (yet). */
+ for (secnum = 0; secnum < D->maxsection; secnum++) {
+ dasm_Section *sec = D->sections + secnum;
+ int *b = sec->buf;
+ int *endb = sec->rbuf + sec->pos;
+
+ while (b != endb) {
+ dasm_ActList p = D->actionlist + *b++;
+ unsigned char *mark = NULL;
+ while (1) {
+ int action = *p++;
+ int n = (action >= DASM_DISP && action <= DASM_ALIGN) ? *b++ : 0;
+ switch (action) {
+ case DASM_DISP: if (!mark) mark = cp; {
+ unsigned char *mm = mark;
+ if (*p != DASM_IMM_DB && *p != DASM_IMM_WB) mark = NULL;
+ if (n == 0) { int mrm = mm[-1]&7; if (mrm == 4) mrm = mm[0]&7;
+ if (mrm != 5) { mm[-1] -= 0x80; break; } }
+ if (((n+128) & -256) != 0) goto wd; else mm[-1] -= 0x40;
+ }
+ case DASM_IMM_S: case DASM_IMM_B: wb: dasmb(n); break;
+ case DASM_IMM_DB: if (((n+128)&-256) == 0) {
+ db: if (!mark) mark = cp; mark[-2] += 2; mark = NULL; goto wb;
+ } else mark = NULL;
+ case DASM_IMM_D: wd: dasmd(n); break;
+ case DASM_IMM_WB: if (((n+128)&-256) == 0) goto db; else mark = NULL;
+ case DASM_IMM_W: dasmw(n); break;
+ case DASM_VREG: { int t = *p++; if (t >= 2) n<<=3; cp[-1] |= n; break; }
+ case DASM_REL_LG: p++; if (n >= 0) goto rel_pc;
+ b++; n = (int)(ptrdiff_t)D->globals[-n];
+ case DASM_REL_A: rel_a: n -= (int)(ptrdiff_t)(cp+4); goto wd; /* !x64 */
+ case DASM_REL_PC: rel_pc: {
+ int shrink = *b++;
+ int *pb = DASM_POS2PTR(D, n); if (*pb < 0) { n = pb[1]; goto rel_a; }
+ n = *pb - ((int)(cp-base) + 4-shrink);
+ if (shrink == 0) goto wd;
+ if (shrink == 4) { cp--; cp[-1] = *cp-0x10; } else cp[-1] = 0xeb;
+ goto wb;
+ }
+ case DASM_IMM_LG:
+ p++; if (n < 0) { n = (int)(ptrdiff_t)D->globals[-n]; goto wd; }
+ case DASM_IMM_PC: {
+ int *pb = DASM_POS2PTR(D, n);
+ n = *pb < 0 ? pb[1] : (*pb + (int)(ptrdiff_t)base);
+ goto wd;
+ }
+ case DASM_LABEL_LG: {
+ int idx = *p++;
+ if (idx >= 10)
+ D->globals[idx] = (void *)(base + (*p == DASM_SETLABEL ? *b : n));
+ break;
+ }
+ case DASM_LABEL_PC: case DASM_SETLABEL: break;
+ case DASM_SPACE: { int fill = *p++; while (n--) *cp++ = fill; break; }
+ case DASM_ALIGN:
+ n = *p++;
+ while (((cp-base) & n)) *cp++ = 0x90; /* nop */
+ break;
+ case DASM_EXTERN: n = DASM_EXTERN(Dst, cp, p[1], *p); p += 2; goto wd;
+ case DASM_MARK: mark = cp; break;
+ case DASM_ESC: action = *p++;
+ default: *cp++ = action; break;
+ case DASM_SECTION: case DASM_STOP: goto stop;
+ }
+ }
+ stop: (void)0;
+ }
+ }
+
+ if (base + D->codesize != cp) /* Check for phase errors. */
+ return DASM_S_PHASE;
+ return DASM_S_OK;
+}
+
+/* Get PC label offset. */
+int dasm_getpclabel(Dst_DECL, unsigned int pc)
+{
+ dasm_State *D = Dst_REF;
+ if (pc*sizeof(int) < D->pcsize) {
+ int pos = D->pclabels[pc];
+ if (pos < 0) return *DASM_POS2PTR(D, -pos);
+ if (pos > 0) return -1; /* Undefined. */
+ }
+ return -2; /* Unused or out of range. */
+}
+
+#ifdef DASM_CHECKS
+/* Optional sanity checker to call between isolated encoding steps. */
+int dasm_checkstep(Dst_DECL, int secmatch)
+{
+ dasm_State *D = Dst_REF;
+ if (D->status == DASM_S_OK) {
+ int i;
+ for (i = 1; i <= 9; i++) {
+ if (D->lglabels[i] > 0) { D->status = DASM_S_UNDEF_L|i; break; }
+ D->lglabels[i] = 0;
+ }
+ }
+ if (D->status == DASM_S_OK && secmatch >= 0 &&
+ D->section != &D->sections[secmatch])
+ D->status = DASM_S_MATCH_SEC|(D->section-D->sections);
+ return D->status;
+}
+#endif
+
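
Note on two idioms used throughout dasm_x86.h: buffer positions pack an 8 bit section number above a 24 bit index, and the expression ((n+128)&-256) == 0 tests whether n fits in a signed byte. The sketch below copies the position macros from the header and wraps both idioms in small functions so they can be checked in isolation. The helpers make_pos and fits_signed_byte are hypothetical names introduced here for illustration only, and the sketch assumes small section numbers so that the shift stays within a positive int.

```c
#include <assert.h>

/* Copied from dasm_x86.h: 8 bit section + 24 bit index. */
#define DASM_POS2IDX(pos)   ((pos)&0x00ffffff)
#define DASM_SEC2POS(sec)   ((sec)<<24)
#define DASM_POS2SEC(pos)   ((pos)>>24)

/* Hypothetical helper: combine a section number and an index
** into one biased position, as dasm_setup() does for index 0. */
static int make_pos(int sec, int idx)
{
  return DASM_SEC2POS(sec) + idx;
}

/* Hypothetical helper: the range test used for DASM_IMM_S,
** DASM_IMM_DB, DASM_IMM_WB and DASM_DISP. Adding 128 maps
** -128..127 onto 0..255; masking with -256 clears the low
** 8 bits, so the result is zero only inside that range. */
static int fits_signed_byte(int n)
{
  return ((n + 128) & -256) == 0;
}
```

For example, make_pos(2, 5) yields 0x02000005, from which DASM_POS2SEC recovers 2 and DASM_POS2IDX recovers 5. fits_signed_byte accepts -128 and 127 and rejects -129 and 128, which is why pass 1 adds one byte for such operands and four bytes otherwise.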
diff --git a/dynasm/dasm_x86.lua b/dynasm/dasm_x86.lua
new file mode 100644
index 0000000000..8221080677
--- /dev/null
+++ b/dynasm/dasm_x86.lua
@@ -0,0 +1,1799 @@
+------------------------------------------------------------------------------
+-- DynASM x86 module.
+--
+-- Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+-- See dynasm.lua for full copyright notice.
+------------------------------------------------------------------------------
+
+-- Module information:
+local _info = {
+ arch = "x86",
+ description = "DynASM x86 (i386) module",
+ version = "1.2.1",
+ vernum = 10201,
+ release = "2009-04-16",
+ author = "Mike Pall",
+ license = "MIT",
+}
+
+-- Exported glue functions for the arch-specific module.
+local _M = { _info = _info }
+
+-- Cache library functions.
+local type, tonumber, pairs, ipairs = type, tonumber, pairs, ipairs
+local assert, unpack = assert, unpack
+local _s = string
+local sub, format, byte, char = _s.sub, _s.format, _s.byte, _s.char
+local find, match, gmatch, gsub = _s.find, _s.match, _s.gmatch, _s.gsub
+local concat, sort = table.concat, table.sort
+local char, unpack = string.char, unpack
+
+-- Inherited tables and callbacks.
+local g_opt, g_arch
+local wline, werror, wfatal, wwarn
+
+-- Action name list.
+-- CHECK: Keep this in sync with the C code!
+local action_names = {
+ -- int arg, 1 buffer pos:
+ "DISP", "IMM_S", "IMM_B", "IMM_W", "IMM_D", "IMM_WB", "IMM_DB",
+ -- action arg (1 byte), int arg, 1 buffer pos (reg/num):
+ "VREG", "SPACE",
+ -- ptrdiff_t arg, 1 buffer pos (address): !x64
+ "SETLABEL", "REL_A",
+ -- action arg (1 byte) or int arg, 2 buffer pos (link, offset):
+ "REL_LG", "REL_PC",
+ -- action arg (1 byte) or int arg, 1 buffer pos (link):
+ "IMM_LG", "IMM_PC",
+ -- action arg (1 byte) or int arg, 1 buffer pos (offset):
+ "LABEL_LG", "LABEL_PC",
+ -- action arg (1 byte), 1 buffer pos (offset):
+ "ALIGN",
+ -- action args (2 bytes), no buffer pos.
+ "EXTERN",
+ -- action arg (1 byte), no buffer pos.
+ "ESC",
+ -- no action arg, no buffer pos.
+ "MARK",
+ -- action arg (1 byte), no buffer pos, terminal action:
+ "SECTION",
+ -- no args, no buffer pos, terminal action:
+ "STOP"
+}
+
+-- Maximum number of section buffer positions for dasm_put().
+-- CHECK: Keep this in sync with the C code!
+local maxsecpos = 25 -- Keep this low, to avoid excessively long C lines.
+
+-- Action name -> action number (dynamically generated below).
+local map_action = {}
+-- First action number. Everything below does not need to be escaped.
+local actfirst = 256-#action_names
+
+-- Action list buffer and string (only used to remove dupes).
+local actlist = {}
+local actstr = ""
+
+-- Argument list for next dasm_put(). Start with offset 0 into action list.
+local actargs = { 0 }
+
+-- Current number of section buffer positions for dasm_put().
+local secpos = 1
+
+------------------------------------------------------------------------------
+
+-- Compute action numbers for action names.
+for n,name in ipairs(action_names) do
+ local num = actfirst + n - 1
+ map_action[name] = num
+end
+
+-- Dump action names and numbers.
+local function dumpactions(out)
+ out:write("DynASM encoding engine action codes:\n")
+ for n,name in ipairs(action_names) do
+ local num = map_action[name]
+ out:write(format(" %-10s %02X %d\n", name, num, num))
+ end
+ out:write("\n")
+end
+
+-- Write action list buffer as a huge static C array.
+local function writeactions(out, name)
+ local nn = #actlist
+ local last = actlist[nn] or 255
+ actlist[nn] = nil -- Remove last byte.
+ if nn == 0 then nn = 1 end
+ out:write("static const unsigned char ", name, "[", nn, "] = {\n")
+ local s = " "
+ for n,b in ipairs(actlist) do
+ s = s..b..","
+ if #s >= 75 then
+ assert(out:write(s, "\n"))
+ s = " "
+ end
+ end
+ out:write(s, last, "\n};\n\n") -- Add last byte back.
+end
+
+------------------------------------------------------------------------------
+
+-- Add byte to action list.
+local function wputxb(n)
+ assert(n >= 0 and n <= 255 and n % 1 == 0, "byte out of range")
+ actlist[#actlist+1] = n
+end
+
+-- Add action to list with optional arg. Advance buffer pos, too.
+local function waction(action, a, num)
+ wputxb(assert(map_action[action], "bad action name `"..action.."'"))
+ if a then actargs[#actargs+1] = a end
+ if a or num then secpos = secpos + (num or 1) end
+end
+
+-- Add call to embedded DynASM C code.
+local function wcall(func, args)
+ wline(format("dasm_%s(Dst, %s);", func, concat(args, ", ")), true)
+end
+
+-- Delete duplicate action list chunks. A tad slow, but so what.
+local function dedupechunk(offset)
+ local al, as = actlist, actstr
+ local chunk = char(unpack(al, offset+1, #al))
+ local orig = find(as, chunk, 1, true)
+ if orig then
+ actargs[1] = orig-1 -- Replace with original offset.
+ for i=offset+1,#al do al[i] = nil end -- Kill dupe.
+ else
+ actstr = as..chunk
+ end
+end
+
+-- Flush action list (intervening C code or buffer pos overflow).
+local function wflush(term)
+ local offset = actargs[1]
+ if #actlist == offset then return end -- Nothing to flush.
+ if not term then waction("STOP") end -- Terminate action list.
+ dedupechunk(offset)
+ wcall("put", actargs) -- Add call to dasm_put().
+ actargs = { #actlist } -- Actionlist offset is 1st arg to next dasm_put().
+ secpos = 1 -- The actionlist offset occupies a buffer position, too.
+end
+
+-- Put escaped byte.
+local function wputb(n)
+ if n >= actfirst then waction("ESC") end -- Need to escape byte.
+ wputxb(n)
+end
+
+------------------------------------------------------------------------------
+
+-- Global label name -> global label number. With auto assignment on 1st use.
+local next_global = 10
+local map_global = setmetatable({}, { __index = function(t, name)
+ if not match(name, "^[%a_][%w_]*$") then werror("bad global label") end
+ local n = next_global
+ if n > 246 then werror("too many global labels") end
+ next_global = n + 1
+ t[name] = n
+ return n
+end})
+
+-- Dump global labels.
+local function dumpglobals(out, lvl)
+ local t = {}
+ for name, n in pairs(map_global) do t[n] = name end
+ out:write("Global labels:\n")
+ for i=10,next_global-1 do
+ out:write(format(" %s\n", t[i]))
+ end
+ out:write("\n")
+end
+
+-- Write global label enum.
+local function writeglobals(out, prefix)
+ local t = {}
+ for name, n in pairs(map_global) do t[n] = name end
+ out:write("enum {\n")
+ for i=10,next_global-1 do
+ out:write(" ", prefix, t[i], ",\n")
+ end
+ out:write(" ", prefix, "_MAX\n};\n")
+end
+
+-- Write global label names.
+local function writeglobalnames(out, name)
+ local t = {}
+ for name, n in pairs(map_global) do t[n] = name end
+ out:write("static const char *const ", name, "[] = {\n")
+ for i=10,next_global-1 do
+ out:write(" \"", t[i], "\",\n")
+ end
+ out:write(" (const char *)0\n};\n")
+end
+
+------------------------------------------------------------------------------
+
+-- Extern label name -> extern label number. With auto assignment on 1st use.
+local next_extern = -1
+local map_extern = setmetatable({}, { __index = function(t, name)
+ -- No restrictions on the name for now.
+ local n = next_extern
+ if n < -256 then werror("too many extern labels") end
+ next_extern = n - 1
+ t[name] = n
+ return n
+end})
+
+-- Dump extern labels.
+local function dumpexterns(out, lvl)
+ local t = {}
+ for name, n in pairs(map_extern) do t[-n] = name end
+ out:write("Extern labels:\n")
+ for i=1,-next_extern-1 do
+ out:write(format(" %s\n", t[i]))
+ end
+ out:write("\n")
+end
+
+-- Write extern label names.
+local function writeexternnames(out, name)
+ local t = {}
+ for name, n in pairs(map_extern) do t[-n] = name end
+ out:write("static const char *const ", name, "[] = {\n")
+ for i=1,-next_extern-1 do
+ out:write(" \"", t[i], "\",\n")
+ end
+ out:write(" (const char *)0\n};\n")
+end
+
+------------------------------------------------------------------------------
+
+-- Arch-specific maps.
+local map_archdef = {} -- Ext. register name -> int. name.
+local map_reg_rev = {} -- Int. register name -> ext. name.
+local map_reg_num = {} -- Int. register name -> register number.
+local map_reg_opsize = {} -- Int. register name -> operand size.
+local map_reg_valid_base = {} -- Int. register name -> valid base register?
+local map_reg_valid_index = {} -- Int. register name -> valid index register?
+local reg_list = {} -- Canonical list of int. register names.
+
+local map_type = {} -- Type name -> { ctype, reg }
+local ctypenum = 0 -- Type number (for _PTx macros).
+
+local addrsize = "d" -- Size for address operands. !x64
+
+-- Helper function to fill register maps.
+local function mkrmap(sz, cl, names)
+ local cname = format("@%s", sz)
+ reg_list[#reg_list+1] = cname
+ map_archdef[cl] = cname
+ map_reg_rev[cname] = cl
+ map_reg_num[cname] = -1
+ map_reg_opsize[cname] = sz
+ if sz == addrsize then
+ map_reg_valid_base[cname] = true
+ map_reg_valid_index[cname] = true
+ end
+ for n,name in ipairs(names) do
+ local iname = format("@%s%x", sz, n-1)
+ reg_list[#reg_list+1] = iname
+ map_archdef[name] = iname
+ map_reg_rev[iname] = name
+ map_reg_num[iname] = n-1
+ map_reg_opsize[iname] = sz
+ if sz == addrsize then
+ map_reg_valid_base[iname] = true
+ map_reg_valid_index[iname] = true
+ end
+ end
+ reg_list[#reg_list+1] = ""
+end
+
+-- Integer registers (dword, word and byte sized).
+mkrmap("d", "Rd", {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"})
+map_reg_valid_index[map_archdef.esp] = false
+mkrmap("w", "Rw", {"ax", "cx", "dx", "bx", "sp", "bp", "si", "di"})
+mkrmap("b", "Rb", {"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"})
+map_archdef["Ra"] = "@"..addrsize
+
+-- FP registers (internally tword sized, but use "f" as operand size).
+mkrmap("f", "Rf", {"st0", "st1", "st2", "st3", "st4", "st5", "st6", "st7"})
+
+-- SSE registers (oword sized, but qword and dword accessible).
+mkrmap("o", "xmm", {"xmm0","xmm1","xmm2","xmm3","xmm4","xmm5","xmm6","xmm7"})
+
+-- Operand size prefixes to codes.
+local map_opsize = {
+ byte = "b", word = "w", dword = "d", qword = "q", oword = "o", tword = "t",
+ aword = addrsize,
+}
+
+-- Operand size code to number.
+local map_opsizenum = {
+ b = 1, w = 2, d = 4, q = 8, o = 16, t = 10,
+}
+
+-- Operand size code to name.
+local map_opsizename = {
+ b = "byte", w = "word", d = "dword", q = "qword", o = "oword", t = "tword",
+ f = "fpword",
+}
+
+-- Valid index register scale factors.
+local map_xsc = {
+ ["1"] = 0, ["2"] = 1, ["4"] = 2, ["8"] = 3,
+}
+
+-- Condition codes.
+local map_cc = {
+ o = 0, no = 1, b = 2, nb = 3, e = 4, ne = 5, be = 6, nbe = 7,
+ s = 8, ns = 9, p = 10, np = 11, l = 12, nl = 13, le = 14, nle = 15,
+ c = 2, nae = 2, nc = 3, ae = 3, z = 4, nz = 5, na = 6, a = 7,
+ pe = 10, po = 11, nge = 12, ge = 13, ng = 14, g = 15,
+}
+
+
+-- Reverse defines for registers.
+function _M.revdef(s)
+ return gsub(s, "@%w+", map_reg_rev)
+end
+
+-- Dump register names and numbers
+local function dumpregs(out)
+ out:write("Register names, sizes and internal numbers:\n")
+ for _,reg in ipairs(reg_list) do
+ if reg == "" then
+ out:write("\n")
+ else
+ local name = map_reg_rev[reg]
+ local num = map_reg_num[reg]
+ local opsize = map_opsizename[map_reg_opsize[reg]]
+ out:write(format(" %-5s %-8s %s\n", name, opsize,
+ num < 0 and "(variable)" or num))
+ end
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Put action for label arg (IMM_LG, IMM_PC, REL_LG, REL_PC).
+local function wputlabel(aprefix, imm, num)
+ if type(imm) == "number" then
+ if imm < 0 then
+ waction("EXTERN")
+ wputxb(aprefix == "IMM_" and 0 or 1)
+ imm = -imm-1
+ else
+ waction(aprefix.."LG", nil, num);
+ end
+ wputxb(imm)
+ else
+ waction(aprefix.."PC", imm, num)
+ end
+end
+
+-- Put signed byte or arg.
+local function wputsbarg(n)
+ if type(n) == "number" then
+ if n < -128 or n > 127 then
+ werror("signed immediate byte out of range")
+ end
+ if n < 0 then n = n + 256 end
+ wputb(n)
+ else waction("IMM_S", n) end
+end
+
+-- Put unsigned byte or arg.
+local function wputbarg(n)
+ if type(n) == "number" then
+ if n < 0 or n > 255 then
+ werror("unsigned immediate byte out of range")
+ end
+ wputb(n)
+ else waction("IMM_B", n) end
+end
+
+-- Put unsigned word or arg.
+local function wputwarg(n)
+ if type(n) == "number" then
+ if n < 0 or n > 65535 then
+ werror("unsigned immediate word out of range")
+ end
+ local r = n%256; n = (n-r)/256; wputb(r); wputb(n);
+ else waction("IMM_W", n) end
+end
+
+-- Put signed or unsigned dword or arg.
+local function wputdarg(n)
+ local tn = type(n)
+ if tn == "number" then
+ if n < 0 then n = n + 4294967296 end
+ local r = n%256; n = (n-r)/256; wputb(r);
+ r = n%256; n = (n-r)/256; wputb(r);
+ r = n%256; n = (n-r)/256; wputb(r); wputb(n);
+ elseif tn == "table" then
+ wputlabel("IMM_", n[1], 1)
+ else
+ waction("IMM_D", n)
+ end
+end
+
+-- Put operand-size dependent number or arg (defaults to dword).
+local function wputszarg(sz, n)
+ if not sz or sz == "d" then wputdarg(n)
+ elseif sz == "w" then wputwarg(n)
+ elseif sz == "b" then wputbarg(n)
+ elseif sz == "s" then wputsbarg(n)
+ else werror("bad operand size") end
+end
+
+-- Put multi-byte opcode with operand-size dependent modifications.
+local function wputop(sz, op)
+ local r
+ if sz == "w" then wputb(102) end
+ -- Needs >32 bit numbers, but only for crc32 eax, word [ebx]
+ if op >= 4294967296 then r = op%4294967296 wputb((op-r)/4294967296) op = r end
+ if op >= 16777216 then r = op % 16777216 wputb((op-r) / 16777216) op = r end
+ if op >= 65536 then r = op % 65536 wputb((op-r) / 65536) op = r end
+ if op >= 256 then r = op % 256 wputb((op-r) / 256) op = r end
+ if sz == "b" then op = op - 1 end
+ wputb(op)
+end
+
+-- Put ModRM or SIB formatted byte.
+local function wputmodrm(m, s, rm, vs, vrm)
+ assert(m < 4 and s < 8 and rm < 8, "bad modrm operands")
+ wputb(64*m + 8*s + rm)
+end
+
+-- Put ModRM/SIB plus optional displacement.
+local function wputmrmsib(t, imark, s, vsreg)
+ local vreg, vxreg
+ local reg, xreg = t.reg, t.xreg
+ if reg and reg < 0 then reg = 0; vreg = t.vreg end
+ if xreg and xreg < 0 then xreg = 0; vxreg = t.vxreg end
+ if s < 0 then s = 0 end
+
+ -- Register mode.
+ if sub(t.mode, 1, 1) == "r" then
+ wputmodrm(3, s, reg)
+ if vsreg then waction("VREG", vsreg); wputxb(2) end
+ if vreg then waction("VREG", vreg); wputxb(0) end
+ return
+ end
+
+ local disp = t.disp
+ local tdisp = type(disp)
+ -- No base register?
+ if not reg then
+ if xreg then
+ -- Indexed mode with index register only.
+ -- [xreg*xsc+disp] -> (0, s, esp) (xsc, xreg, ebp)
+ wputmodrm(0, s, 4)
+ if imark then waction("MARK") end
+ if vsreg then waction("VREG", vsreg); wputxb(2) end
+ wputmodrm(t.xsc, xreg, 5)
+ if vxreg then waction("VREG", vxreg); wputxb(3) end
+ else
+ -- Pure displacement.
+ wputmodrm(0, s, 5) -- [disp] -> (0, s, ebp)
+ if imark then waction("MARK") end
+ if vsreg then waction("VREG", vsreg); wputxb(2) end
+ end
+ wputdarg(disp)
+ return
+ end
+
+ local m
+ if tdisp == "number" then -- Check displacement size at assembly time.
+ if disp == 0 and reg ~= 5 then -- [ebp] -> [ebp+0] (in SIB, too)
+ if not vreg then m = 0 end -- Force DISP to allow [Rd(5)] -> [ebp+0]
+ elseif disp >= -128 and disp <= 127 then m = 1
+ else m = 2 end
+ elseif tdisp == "table" then
+ m = 2
+ end
+
+ -- Index register present or esp as base register: need SIB encoding.
+ if xreg or reg == 4 then
+ wputmodrm(m or 2, s, 4) -- ModRM.
+ if m == nil or imark then waction("MARK") end
+ if vsreg then waction("VREG", vsreg); wputxb(2) end
+ wputmodrm(t.xsc or 0, xreg or 4, reg) -- SIB.
+ if vxreg then waction("VREG", vxreg); wputxb(3) end
+ if vreg then waction("VREG", vreg); wputxb(1) end
+ else
+ wputmodrm(m or 2, s, reg) -- ModRM.
+ if (imark and (m == 1 or m == 2)) or
+ (m == nil and (vsreg or vreg)) then waction("MARK") end
+ if vsreg then waction("VREG", vsreg); wputxb(2) end
+ if vreg then waction("VREG", vreg); wputxb(1) end
+ end
+
+ -- Put displacement.
+ if m == 1 then wputsbarg(disp)
+ elseif m == 2 then wputdarg(disp)
+ elseif m == nil then waction("DISP", disp) end
+end
+
+------------------------------------------------------------------------------
+
+-- Return human-readable operand mode string.
+local function opmodestr(op, args)
+ local m = {}
+ for i=1,#args do
+ local a = args[i]
+ m[#m+1] = sub(a.mode, 1, 1)..(a.opsize or "?")
+ end
+ return op.." "..concat(m, ",")
+end
+
+-- Convert number to valid integer or nil.
+local function toint(expr)
+ local n = tonumber(expr)
+ if n then
+ if n % 1 ~= 0 or n < -2147483648 or n > 4294967295 then
+ werror("bad integer number `"..expr.."'")
+ end
+ return n
+ end
+end
+
+-- Parse immediate expression.
+local function immexpr(expr)
+ -- &expr (pointer)
+ if sub(expr, 1, 1) == "&" then
+ return "iPJ", format("(ptrdiff_t)(%s)", sub(expr,2))
+ end
+
+ local prefix = sub(expr, 1, 2)
+ -- =>expr (pc label reference)
+ if prefix == "=>" then
+ return "iJ", sub(expr, 3)
+ end
+ -- ->name (global label reference)
+ if prefix == "->" then
+ return "iJ", map_global[sub(expr, 3)]
+ end
+
+ -- [<>][1-9] (local label reference)
+ local dir, lnum = match(expr, "^([<>])([1-9])$")
+ if dir then -- Fwd: 247-255, Bkwd: 1-9.
+ return "iJ", lnum + (dir == ">" and 246 or 0)
+ end
+
+ local extname = match(expr, "^extern%s+(%S+)$")
+ if extname then
+ return "iJ", map_extern[extname]
+ end
+
+ -- expr (interpreted as immediate)
+ return "iI", expr
+end
+
+-- Parse displacement expression: +-num, +-expr, +-opsize*num
+local function dispexpr(expr)
+ local disp = expr == "" and 0 or toint(expr)
+ if disp then return disp end
+ local c, dispt = match(expr, "^([+-])%s*(.+)$")
+ if c == "+" then
+ expr = dispt
+ elseif not c then
+ werror("bad displacement expression `"..expr.."'")
+ end
+ local opsize, tailops = match(dispt, "^(%w+)%s*%*%s*(.+)$")
+ local ops, imm = map_opsize[opsize], toint(tailops)
+ if ops and imm then
+ if c == "-" then imm = -imm end
+ return imm*map_opsizenum[ops]
+ end
+ local mode, iexpr = immexpr(dispt)
+ if mode == "iJ" then
+ if c == "-" then werror("cannot invert label reference") end
+ return { iexpr }
+ end
+ return expr -- Need to return original signed expression.
+end
+
+-- Parse register or type expression.
+local function rtexpr(expr)
+ if not expr then return end
+ local tname, ovreg = match(expr, "^([%w_]+):(@[%w_]+)$")
+ local tp = map_type[tname or expr]
+ if tp then
+ local reg = ovreg or tp.reg
+ local rnum = map_reg_num[reg]
+ if not rnum then
+ werror("type `"..(tname or expr).."' needs a register override")
+ end
+ if not map_reg_valid_base[reg] then
+ werror("bad base register override `"..(map_reg_rev[reg] or reg).."'")
+ end
+ return reg, rnum, tp
+ end
+ return expr, map_reg_num[expr]
+end
+
+-- Parse operand and return { mode, opsize, reg, xreg, xsc, disp, imm }.
+local function parseoperand(param)
+ local t = {}
+
+ local expr = param
+ local opsize, tailops = match(param, "^(%w+)%s*(.+)$")
+ if opsize then
+ t.opsize = map_opsize[opsize]
+ if t.opsize then expr = tailops end
+ end
+
+ local br = match(expr, "^%[%s*(.-)%s*%]$")
+ repeat
+ if br then
+ t.mode = "xm"
+
+ -- [disp]
+ t.disp = toint(br)
+ if t.disp then
+ t.mode = "xmO"
+ break
+ end
+
+ -- [reg...]
+ local tp
+ local reg, tailr = match(br, "^([@%w_:]+)%s*(.*)$")
+ reg, t.reg, tp = rtexpr(reg)
+ if not t.reg then
+ -- [expr]
+ t.mode = "xmO"
+ t.disp = dispexpr("+"..br)
+ break
+ end
+
+ if t.reg == -1 then
+ t.vreg, tailr = match(tailr, "^(%b())(.*)$")
+ if not t.vreg then werror("bad variable register expression") end
+ end
+
+ -- [xreg*xsc] or [xreg*xsc+-disp] or [xreg*xsc+-expr]
+ local xsc, tailsc = match(tailr, "^%*%s*([1248])%s*(.*)$")
+ if xsc then
+ if not map_reg_valid_index[reg] then
+ werror("bad index register `"..map_reg_rev[reg].."'")
+ end
+ t.xsc = map_xsc[xsc]
+ t.xreg = t.reg
+ t.vxreg = t.vreg
+ t.reg = nil
+ t.vreg = nil
+ t.disp = dispexpr(tailsc)
+ break
+ end
+ if not map_reg_valid_base[reg] then
+ werror("bad base register `"..map_reg_rev[reg].."'")
+ end
+
+ -- [reg] or [reg+-disp]
+ t.disp = toint(tailr) or (tailr == "" and 0)
+ if t.disp then break end
+
+ -- [reg+xreg...]
+ local xreg, tailx = match(tailr, "^+%s*([@%w_:]+)%s*(.*)$")
+ xreg, t.xreg, tp = rtexpr(xreg)
+ if not t.xreg then
+ -- [reg+-expr]
+ t.disp = dispexpr(tailr)
+ break
+ end
+ if not map_reg_valid_index[xreg] then
+ werror("bad index register `"..map_reg_rev[xreg].."'")
+ end
+
+ if t.xreg == -1 then
+ t.vxreg, tailx = match(tailx, "^(%b())(.*)$")
+ if not t.vxreg then werror("bad variable register expression") end
+ end
+
+ -- [reg+xreg*xsc...]
+ local xsc, tailsc = match(tailx, "^%*%s*([1248])%s*(.*)$")
+ if xsc then
+ t.xsc = map_xsc[xsc]
+ tailx = tailsc
+ end
+
+ -- [...] or [...+-disp] or [...+-expr]
+ t.disp = dispexpr(tailx)
+ else
+ -- imm or opsize*imm
+ local imm = toint(expr)
+ if not imm and sub(expr, 1, 1) == "*" and t.opsize then
+ imm = toint(sub(expr, 2))
+ if imm then
+ imm = imm * map_opsizenum[t.opsize]
+ t.opsize = nil
+ end
+ end
+ if imm then
+ if t.opsize then werror("bad operand size override") end
+ local m = "i"
+ if imm == 1 then m = m.."1" end
+ if imm >= 4294967168 and imm <= 4294967295 then imm = imm-4294967296 end
+ if imm >= -128 and imm <= 127 then m = m.."S" end
+ t.imm = imm
+ t.mode = m
+ break
+ end
+
+ local tp
+ local reg, tailr = match(expr, "^([@%w_:]+)%s*(.*)$")
+ reg, t.reg, tp = rtexpr(reg)
+ if t.reg then
+ if t.reg == -1 then
+ t.vreg, tailr = match(tailr, "^(%b())(.*)$")
+ if not t.vreg then werror("bad variable register expression") end
+ end
+ -- reg
+ if tailr == "" then
+ if t.opsize then werror("bad operand size override") end
+ t.opsize = map_reg_opsize[reg]
+ if t.opsize == "f" then
+ t.mode = t.reg == 0 and "fF" or "f"
+ else
+ if reg == "@w4" then wwarn("bad idea, try again with `esp'") end
+ t.mode = t.reg == 0 and "rmR" or (reg == "@b1" and "rmC" or "rm")
+ end
+ break
+ end
+
+ -- type[idx], type[idx].field, type->field -> [reg+offset_expr]
+ if not tp then werror("bad operand `"..param.."'") end
+ t.mode = "xm"
+ t.disp = format(tp.ctypefmt, tailr)
+ else
+ t.mode, t.imm = immexpr(expr)
+ if sub(t.mode, -1) == "J" then
+ if t.opsize and t.opsize ~= addrsize then
+ werror("bad operand size override")
+ end
+ t.opsize = addrsize
+ end
+ end
+ end
+ until true
+ return t
+end
+
+------------------------------------------------------------------------------
+-- x86 Template String Description
+-- ===============================
+--
+-- Each template string is a list of [match:]pattern pairs,
+-- separated by "|". The first match wins. No match means a
+-- bad or unsupported combination of operand modes or sizes.
+--
+-- The match part and the ":" are omitted if the operation has
+-- no operands. Otherwise the first N characters are matched
+-- against the mode strings of each of the N operands.
+--
+-- The mode string for each operand type is (see parseoperand()):
+-- Integer register: "rm", +"R" for eax, ax, al, +"C" for cl
+-- FP register: "f", +"F" for st0
+-- Index operand: "xm", +"O" for [disp] (pure offset)
+-- Immediate: "i", +"S" for signed 8 bit, +"1" for 1,
+-- +"I" for arg, +"P" for pointer
+-- Any: +"J" for valid jump targets
+--
+-- So a match character "m" (mixed) matches both an integer register
+-- and an index operand (to be encoded with the ModRM/SIB scheme).
+-- But "r" matches only a register and "x" only an index operand
+-- (e.g. for FP memory access operations).
+--
+-- The operand size match string starts right after the mode match
+-- characters and ends before the ":". "dwb" is assumed, if empty.
+-- The effective data size of the operation is matched against this list.
+--
+-- If only the regular "b", "w", "d", "q", "t" operand sizes are
+-- present, then all operands must be the same size. Unspecified sizes
+-- are ignored, but at least one operand must have a size or the pattern
+-- won't match (use the "byte", "word", "dword", "qword", "tword"
+-- operand size overrides. E.g.: mov dword [eax], 1).
+--
+-- If the list has a "1" or "2" prefix, the operand size is taken
+-- from the respective operand and any other operand sizes are ignored.
+-- If the list contains only ".", all operand sizes are ignored.
+-- If the list has a "/" prefix, the concatenated (mixed) operand sizes
+-- are compared to the match.
+--
+-- E.g. "rrdw" matches for either two dword registers or two word
+-- registers. "Fx2dq" matches an st0 operand plus an index operand
+-- pointing to a dword (float) or qword (double).
+--
+-- Every character after the ":" is part of the pattern string:
+-- Hex chars are accumulated to form the opcode (left to right).
+-- "n" disables the standard opcode mods
+-- (otherwise: -1 for "b", o16 prefix for "w")
+-- "r"/"R" adds the reg. number from the 1st/2nd operand to the opcode.
+-- "m"/"M" generates ModRM/SIB from the 1st/2nd operand.
+-- The spare 3 bits are either filled with the last hex digit or
+-- the result from a previous "r"/"R". The opcode is restored.
+--
+-- All of the following characters force a flush of the opcode:
+-- "o"/"O" stores a pure 32 bit disp (offset) from the 1st/2nd operand.
+-- "S" stores a signed 8 bit immediate from the last operand.
+-- "U" stores an unsigned 8 bit immediate from the last operand.
+-- "W" stores an unsigned 16 bit immediate from the last operand.
+-- "i" stores an operand sized immediate from the last operand.
+-- "I" ditto, but generates an action code to optionally modify
+-- the opcode (+2) for a signed 8 bit immediate.
+-- "J" generates one of the REL action codes from the last operand.
+--
+------------------------------------------------------------------------------
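+
+-- Worked example (editorial sketch, not part of the original source):
+-- splitting a one-operand template into mode match, size match and
+-- pattern, in the same way the description above reads it. The template
+-- is a copy of the one used for `inc'; the "/" separators in the result
+-- are only for display.
+do
+  local template, nparams = "rdw:40r|m:FF0m", 1
+  local parts = {}
+  for tm in string.gmatch(template, "[^|]+") do
+    -- The size match starts right after the nparams mode characters.
+    local szm, pat = string.match(tm, "^(.-):(.*)$", nparams+1)
+    parts[#parts+1] = string.sub(tm, 1, nparams).."/"..szm.."/"..pat
+  end
+  assert(parts[1] == "r/dw/40r")  -- Register, dword or word, opcode 40+r.
+  assert(parts[2] == "m//FF0m")   -- Empty size match defaults to "dwb".
+end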
+
+-- Template strings for x86 instructions. Ordered by first opcode byte.
+-- Unimplemented opcodes (deliberate omissions) are marked with *.
+local map_op = {
+ -- 00-05: add...
+ -- 06: *push es
+ -- 07: *pop es
+ -- 08-0D: or...
+ -- 0E: *push cs
+ -- 0F: two byte opcode prefix
+ -- 10-15: adc...
+ -- 16: *push ss
+ -- 17: *pop ss
+ -- 18-1D: sbb...
+ -- 1E: *push ds
+ -- 1F: *pop ds
+ -- 20-25: and...
+ es_0 = "26",
+ -- 27: *daa
+ -- 28-2D: sub...
+ cs_0 = "2E",
+ -- 2F: *das
+ -- 30-35: xor...
+ ss_0 = "36",
+ -- 37: *aaa
+ -- 38-3D: cmp...
+ ds_0 = "3E",
+ -- 3F: *aas
+ inc_1 = "rdw:40r|m:FF0m",
+ dec_1 = "rdw:48r|m:FF1m",
+ push_1 = "rdw:50r|mdw:FF6m|S.:6AS|ib:n6Ai|i.:68i",
+ pop_1 = "rdw:58r|mdw:8F0m",
+ -- 60: *pusha, *pushad, *pushaw
+ -- 61: *popa, *popad, *popaw
+ -- 62: *bound rdw,x
+ -- 63: *arpl mw,rw
+ fs_0 = "64",
+ gs_0 = "65",
+ o16_0 = "66",
+ a16_0 = "67",
+ -- 68: push idw
+ -- 69: imul rdw,mdw,idw
+ -- 6A: push ib
+ -- 6B: imul rdw,mdw,S
+ -- 6C: *insb
+ -- 6D: *insd, *insw
+ -- 6E: *outsb
+ -- 6F: *outsd, *outsw
+ -- 70-7F: jcc lb
+ -- 80: add... mb,i
+ -- 81: add... mdw,i
+ -- 82: *undefined
+ -- 83: add... mdw,S
+ test_2 = "mr:85Rm|rm:85rM|Ri:A9ri|mi:F70mi",
+ -- 86: xchg rb,mb
+ -- 87: xchg rdw,mdw
+ -- 88: mov mb,r
+ -- 89: mov mdw,r
+ -- 8A: mov r,mb
+ -- 8B: mov r,mdw
+ -- 8C: *mov mdw,seg
+ lea_2 = "rxd:8DrM",
+ -- 8E: *mov seg,mdw
+ -- 8F: pop mdw
+ nop_0 = "90",
+ xchg_2 = "Rrdw:90R|rRdw:90r|rm:87rM|mr:87Rm",
+ cbw_0 = "6698",
+ cwde_0 = "98",
+ cwd_0 = "6699",
+ cdq_0 = "99",
+ -- 9A: *call iw:idw
+ wait_0 = "9B",
+ fwait_0 = "9B",
+ pushf_0 = "9C",
+ pushfw_0 = "669C",
+ pushfd_0 = "9C",
+ popf_0 = "9D",
+ popfw_0 = "669D",
+ popfd_0 = "9D",
+ sahf_0 = "9E",
+ lahf_0 = "9F",
+ mov_2 = "OR:A3o|RO:A1O|mr:89Rm|rm:8BrM|rib:nB0ri|ridw:B8ri|mi:C70mi",
+ movsb_0 = "A4",
+ movsw_0 = "66A5",
+ movsd_0 = "A5",
+ cmpsb_0 = "A6",
+ cmpsw_0 = "66A7",
+ cmpsd_0 = "A7",
+ -- A8: test Rb,i
+ -- A9: test Rdw,i
+ stosb_0 = "AA",
+ stosw_0 = "66AB",
+ stosd_0 = "AB",
+ lodsb_0 = "AC",
+ lodsw_0 = "66AD",
+ lodsd_0 = "AD",
+ scasb_0 = "AE",
+ scasw_0 = "66AF",
+ scasd_0 = "AF",
+ -- B0-B7: mov rb,i
+ -- B8-BF: mov rdw,i
+ -- C0: rol... mb,i
+ -- C1: rol... mdw,i
+ ret_1 = "i.:nC2W",
+ ret_0 = "C3",
+ -- C4: *les rdw,mq
+ -- C5: *lds rdw,mq
+ -- C6: mov mb,i
+ -- C7: mov mdw,i
+ -- C8: *enter iw,ib
+ leave_0 = "C9",
+ -- CA: *retf iw
+ -- CB: *retf
+ int3_0 = "CC",
+ int_1 = "i.:nCDU",
+ into_0 = "CE",
+ -- CF: *iret
+ -- D0: rol... mb,1
+ -- D1: rol... mdw,1
+ -- D2: rol... mb,cl
+ -- D3: rol... mdw,cl
+ -- D4: *aam ib
+ -- D5: *aad ib
+ -- D6: *salc
+ -- D7: *xlat
+ -- D8-DF: floating point ops
+ -- E0: *loopne
+ -- E1: *loope
+ -- E2: *loop
+ -- E3: *jcxz, *jecxz
+ -- E4: *in Rb,ib
+ -- E5: *in Rdw,ib
+ -- E6: *out ib,Rb
+ -- E7: *out ib,Rdw
+ call_1 = "md:FF2m|J.:E8J",
+ jmp_1 = "md:FF4m|J.:E9J", -- short: EB
+ -- EA: *jmp iw:idw
+ -- EB: jmp ib
+ -- EC: *in Rb,dx
+ -- ED: *in Rdw,dx
+ -- EE: *out dx,Rb
+ -- EF: *out dx,Rdw
+ -- F0: *lock
+ int1_0 = "F1",
+ repne_0 = "F2",
+ repnz_0 = "F2",
+ rep_0 = "F3",
+ repe_0 = "F3",
+ repz_0 = "F3",
+ -- F4: *hlt
+ cmc_0 = "F5",
+ -- F6: test... mb,i; div... mb
+ -- F7: test... mdw,i; div... mdw
+ clc_0 = "F8",
+ stc_0 = "F9",
+ -- FA: *cli
+ cld_0 = "FC",
+ std_0 = "FD",
+ -- FE: inc... mb
+ -- FF: inc... mdw
+
+ -- misc ops
+ not_1 = "m:F72m",
+ neg_1 = "m:F73m",
+ mul_1 = "m:F74m",
+ imul_1 = "m:F75m",
+ div_1 = "m:F76m",
+ idiv_1 = "m:F77m",
+
+ imul_2 = "rmdw:0FAFrM|rIdw:69rmI|rSdw:6BrmS|ridw:69rmi",
+ imul_3 = "rmIdw:69rMI|rmSdw:6BrMS|rmidw:69rMi",
+
+ movzx_2 = "rm/db:0FB6rM|rm/wb:0FB6rM|rm/dw:0FB7rM",
+ movsx_2 = "rm/db:0FBErM|rm/wb:0FBErM|rm/dw:0FBFrM",
+
+ bswap_1 = "rd:0FC8r",
+ bsf_2 = "rmdw:0FBCrM",
+ bsr_2 = "rmdw:0FBDrM",
+ bt_2 = "mrdw:0FA3Rm|midw:0FBA4mU",
+ btc_2 = "mrdw:0FBBRm|midw:0FBA7mU",
+ btr_2 = "mrdw:0FB3Rm|midw:0FBA6mU",
+ bts_2 = "mrdw:0FABRm|midw:0FBA5mU",
+
+ rdtsc_0 = "0F31", -- P1+
+ cpuid_0 = "0FA2", -- P1+
+
+ -- floating point ops
+ fst_1 = "ff:DDD0r|xd:D92m|xq:DD2m",
+ fstp_1 = "ff:DDD8r|xd:D93m|xq:DD3m|xt:DB7m",
+ fld_1 = "ff:D9C0r|xd:D90m|xq:DD0m|xt:DB5m",
+
+ fpop_0 = "DDD8", -- Alias for fstp st0.
+
+ fist_1 = "xw:nDF2m|xd:DB2m",
+ fistp_1 = "xw:nDF3m|xd:DB3m|xq:DF7m",
+ fild_1 = "xw:nDF0m|xd:DB0m|xq:DF5m",
+
+ fxch_0 = "D9C9",
+ fxch_1 = "ff:D9C8r",
+ fxch_2 = "fFf:D9C8r|Fff:D9C8R",
+
+ fucom_1 = "ff:DDE0r",
+ fucom_2 = "Fff:DDE0R",
+ fucomp_1 = "ff:DDE8r",
+ fucomp_2 = "Fff:DDE8R",
+ fucomi_1 = "ff:DBE8r", -- P6+
+ fucomi_2 = "Fff:DBE8R", -- P6+
+ fucomip_1 = "ff:DFE8r", -- P6+
+ fucomip_2 = "Fff:DFE8R", -- P6+
+ fcomi_1 = "ff:DBF0r", -- P6+
+ fcomi_2 = "Fff:DBF0R", -- P6+
+ fcomip_1 = "ff:DFF0r", -- P6+
+ fcomip_2 = "Fff:DFF0R", -- P6+
+ fucompp_0 = "DAE9",
+ fcompp_0 = "DED9",
+
+ fldcw_1 = "xw:nD95m",
+ fstcw_1 = "xw:n9BD97m",
+ fnstcw_1 = "xw:nD97m",
+ fstsw_1 = "Rw:n9BDFE0|xw:n9BDD7m",
+ fnstsw_1 = "Rw:nDFE0|xw:nDD7m",
+ fclex_0 = "9BDBE2",
+ fnclex_0 = "DBE2",
+
+ fnop_0 = "D9D0",
+ -- D9D1-D9DF: unassigned
+
+ fchs_0 = "D9E0",
+ fabs_0 = "D9E1",
+ -- D9E2: unassigned
+ -- D9E3: unassigned
+ ftst_0 = "D9E4",
+ fxam_0 = "D9E5",
+ -- D9E6: unassigned
+ -- D9E7: unassigned
+ fld1_0 = "D9E8",
+ fldl2t_0 = "D9E9",
+ fldl2e_0 = "D9EA",
+ fldpi_0 = "D9EB",
+ fldlg2_0 = "D9EC",
+ fldln2_0 = "D9ED",
+ fldz_0 = "D9EE",
+ -- D9EF: unassigned
+
+ f2xm1_0 = "D9F0",
+ fyl2x_0 = "D9F1",
+ fptan_0 = "D9F2",
+ fpatan_0 = "D9F3",
+ fxtract_0 = "D9F4",
+ fprem1_0 = "D9F5",
+ fdecstp_0 = "D9F6",
+ fincstp_0 = "D9F7",
+ fprem_0 = "D9F8",
+ fyl2xp1_0 = "D9F9",
+ fsqrt_0 = "D9FA",
+ fsincos_0 = "D9FB",
+ frndint_0 = "D9FC",
+ fscale_0 = "D9FD",
+ fsin_0 = "D9FE",
+ fcos_0 = "D9FF",
+
+ -- SSE, SSE2
+ andnpd_2 = "rmo:660F55rM",
+ andnps_2 = "rmo:0F55rM",
+ andpd_2 = "rmo:660F54rM",
+ andps_2 = "rmo:0F54rM",
+ clflush_1 = "x.:0FAE7m",
+ cmppd_3 = "rmio:660FC2rMU",
+ cmpps_3 = "rmio:0FC2rMU",
+ cmpsd_3 = "rmio:F20FC2rMU",
+ cmpss_3 = "rmio:F30FC2rMU",
+ comisd_2 = "rmo:660F2FrM",
+ comiss_2 = "rmo:0F2FrM",
+ cvtdq2pd_2 = "rro:F30FE6rM|rx/oq:",
+ cvtdq2ps_2 = "rmo:0F5BrM",
+ cvtpd2dq_2 = "rmo:F20FE6rM",
+ cvtpd2ps_2 = "rmo:660F5ArM",
+ cvtpi2pd_2 = "rx/oq:660F2ArM",
+ cvtpi2ps_2 = "rx/oq:0F2ArM",
+ cvtps2dq_2 = "rmo:660F5BrM",
+ cvtps2pd_2 = "rro:0F5ArM|rx/oq:",
+ cvtsd2si_2 = "rr/do:F20F2DrM|rx/dq:",
+ cvtsd2ss_2 = "rro:F20F5ArM|rx/oq:",
+ cvtsi2sd_2 = "rm/od:F20F2ArM",
+ cvtsi2ss_2 = "rm/od:F30F2ArM",
+ cvtss2sd_2 = "rro:F30F5ArM|rx/od:",
+ cvtss2si_2 = "rr/do:F30F2DrM|rx/dd:",
+ cvttpd2dq_2 = "rmo:660FE6rM",
+ cvttps2dq_2 = "rmo:F30F5BrM",
+ cvttsd2si_2 = "rr/do:F20F2CrM|rx/dq:",
+ cvttss2si_2 = "rr/do:F30F2CrM|rx/dd:",
+ ldmxcsr_1 = "xd:0FAE2m",
+ lfence_0 = "0FAEE8",
+ maskmovdqu_2 = "rro:660FF7rM",
+ mfence_0 = "0FAEF0",
+ movapd_2 = "rmo:660F28rM|mro:660F29Rm",
+ movaps_2 = "rmo:0F28rM|mro:0F29Rm",
+ movd_2 = "rm/od:660F6ErM|mr/do:660F7ERm",
+ movdqa_2 = "rmo:660F6FrM|mro:660F7FRm",
+ movdqu_2 = "rmo:F30F6FrM|mro:F30F7FRm",
+ movhlps_2 = "rro:0F12rM",
+ movhpd_2 = "rx/oq:660F16rM|xr/qo:660F17Rm",
+ movhps_2 = "rx/oq:0F16rM|xr/qo:0F17Rm",
+ movlhps_2 = "rro:0F16rM",
+ movlpd_2 = "rx/oq:660F12rM|xr/qo:660F13Rm",
+ movlps_2 = "rx/oq:0F12rM|xr/qo:0F13Rm",
+ movmskpd_2 = "rr/do:660F50rM",
+ movmskps_2 = "rr/do:0F50rM",
+ movntdq_2 = "xro:660FE7Rm",
+ movnti_2 = "xrd:0FC3Rm",
+ movntpd_2 = "xro:660F2BRm",
+ movntps_2 = "xro:0F2BRm",
+ movq_2 = "rro:F30F7ErM|rx/oq:|xr/qo:660FD6Rm",
+ movsd_2 = "rro:F20F10rM|rx/oq:|xr/qo:F20F11Rm",
+ movss_2 = "rro:F30F10rM|rx/od:|xr/do:F30F11Rm",
+ movupd_2 = "rmo:660F10rM|mro:660F11Rm",
+ movups_2 = "rmo:0F10rM|mro:0F11Rm",
+ orpd_2 = "rmo:660F56rM",
+ orps_2 = "rmo:0F56rM",
+ packssdw_2 = "rmo:660F6BrM",
+ packsswb_2 = "rmo:660F63rM",
+ packuswb_2 = "rmo:660F67rM",
+ paddb_2 = "rmo:660FFCrM",
+ paddd_2 = "rmo:660FFErM",
+ paddq_2 = "rmo:660FD4rM",
+ paddsb_2 = "rmo:660FECrM",
+ paddsw_2 = "rmo:660FEDrM",
+ paddusb_2 = "rmo:660FDCrM",
+ paddusw_2 = "rmo:660FDDrM",
+ paddw_2 = "rmo:660FFDrM",
+ pand_2 = "rmo:660FDBrM",
+ pandn_2 = "rmo:660FDFrM",
+ pause_0 = "F390",
+ pavgb_2 = "rmo:660FE0rM",
+ pavgw_2 = "rmo:660FE3rM",
+ pcmpeqb_2 = "rmo:660F74rM",
+ pcmpeqd_2 = "rmo:660F76rM",
+ pcmpeqw_2 = "rmo:660F75rM",
+ pcmpgtb_2 = "rmo:660F64rM",
+ pcmpgtd_2 = "rmo:660F66rM",
+ pcmpgtw_2 = "rmo:660F65rM",
+ pextrw_3 = "rri/do:660FC5rMU|xri/wo:660F3A15nRmU", -- Mem op: SSE4.1 only.
+ pinsrw_3 = "rri/od:660FC4rMU|rxi/ow:",
+ pmaddwd_2 = "rmo:660FF5rM",
+ pmaxsw_2 = "rmo:660FEErM",
+ pmaxub_2 = "rmo:660FDErM",
+ pminsw_2 = "rmo:660FEArM",
+ pminub_2 = "rmo:660FDArM",
+ pmovmskb_2 = "rr/do:660FD7rM",
+ pmulhuw_2 = "rmo:660FE4rM",
+ pmulhw_2 = "rmo:660FE5rM",
+ pmullw_2 = "rmo:660FD5rM",
+ pmuludq_2 = "rmo:660FF4rM",
+ por_2 = "rmo:660FEBrM",
+ prefetchnta_1 = "xb:n0F180m",
+ prefetcht0_1 = "xb:n0F181m",
+ prefetcht1_1 = "xb:n0F182m",
+ prefetcht2_1 = "xb:n0F183m",
+ psadbw_2 = "rmo:660FF6rM",
+ pshufd_3 = "rmio:660F70rMU",
+ pshufhw_3 = "rmio:F30F70rMU",
+ pshuflw_3 = "rmio:F20F70rMU",
+ pslld_2 = "rmo:660FF2rM|rio:660F726mU",
+ pslldq_2 = "rio:660F737mU",
+ psllq_2 = "rmo:660FF3rM|rio:660F736mU",
+ psllw_2 = "rmo:660FF1rM|rio:660F716mU",
+ psrad_2 = "rmo:660FE2rM|rio:660F724mU",
+ psraw_2 = "rmo:660FE1rM|rio:660F714mU",
+ psrld_2 = "rmo:660FD2rM|rio:660F722mU",
+ psrldq_2 = "rio:660F733mU",
+ psrlq_2 = "rmo:660FD3rM|rio:660F732mU",
+ psrlw_2 = "rmo:660FD1rM|rio:660F712mU",
+ psubb_2 = "rmo:660FF8rM",
+ psubd_2 = "rmo:660FFArM",
+ psubq_2 = "rmo:660FFBrM",
+ psubsb_2 = "rmo:660FE8rM",
+ psubsw_2 = "rmo:660FE9rM",
+ psubusb_2 = "rmo:660FD8rM",
+ psubusw_2 = "rmo:660FD9rM",
+ psubw_2 = "rmo:660FF9rM",
+ punpckhbw_2 = "rmo:660F68rM",
+ punpckhdq_2 = "rmo:660F6ArM",
+ punpckhqdq_2 = "rmo:660F6DrM",
+ punpckhwd_2 = "rmo:660F69rM",
+ punpcklbw_2 = "rmo:660F60rM",
+ punpckldq_2 = "rmo:660F62rM",
+ punpcklqdq_2 = "rmo:660F6CrM",
+ punpcklwd_2 = "rmo:660F61rM",
+ pxor_2 = "rmo:660FEFrM",
+ rcpps_2 = "rmo:0F53rM",
+ rcpss_2 = "rmo:F30F53rM",
+ rsqrtps_2 = "rmo:0F52rM",
+ rsqrtss_2 = "rmo:F30F52rM",
+ sfence_0 = "0FAEF8",
+ shufpd_3 = "rmio:660FC6rMU",
+ shufps_3 = "rmio:0FC6rMU",
+ stmxcsr_1 = "xd:0FAE3m",
+ ucomisd_2 = "rmo:660F2ErM",
+ ucomiss_2 = "rmo:0F2ErM",
+ unpckhpd_2 = "rmo:660F15rM",
+ unpckhps_2 = "rmo:0F15rM",
+ unpcklpd_2 = "rmo:660F14rM",
+ unpcklps_2 = "rmo:0F14rM",
+ xorpd_2 = "rmo:660F57rM",
+ xorps_2 = "rmo:0F57rM",
+
+ -- SSE3 ops
+ fisttp_1 = "xw:nDF1m|xd:DB1m|xq:DD1m",
+ addsubpd_2 = "rmo:660FD0rM",
+ addsubps_2 = "rmo:F20FD0rM",
+ haddpd_2 = "rmo:660F7CrM",
+ haddps_2 = "rmo:F20F7CrM",
+ hsubpd_2 = "rmo:660F7DrM",
+ hsubps_2 = "rmo:F20F7DrM",
+ lddqu_2 = "rxo:F20FF0rM",
+ movddup_2 = "rmo:F20F12rM",
+ movshdup_2 = "rmo:F30F16rM",
+ movsldup_2 = "rmo:F30F12rM",
+
+ -- SSSE3 ops
+ pabsb_2 = "rmo:660F381CrM",
+ pabsd_2 = "rmo:660F381ErM",
+ pabsw_2 = "rmo:660F381DrM",
+ palignr_3 = "rmio:660F3A0FrMU",
+ phaddd_2 = "rmo:660F3802rM",
+ phaddsw_2 = "rmo:660F3803rM",
+ phaddw_2 = "rmo:660F3801rM",
+ phsubd_2 = "rmo:660F3806rM",
+ phsubsw_2 = "rmo:660F3807rM",
+ phsubw_2 = "rmo:660F3805rM",
+ pmaddubsw_2 = "rmo:660F3804rM",
+ pmulhrsw_2 = "rmo:660F380BrM",
+ pshufb_2 = "rmo:660F3800rM",
+ psignb_2 = "rmo:660F3808rM",
+ psignd_2 = "rmo:660F380ArM",
+ psignw_2 = "rmo:660F3809rM",
+
+ -- SSE4.1 ops
+ blendpd_3 = "rmio:660F3A0DrMU",
+ blendps_3 = "rmio:660F3A0CrMU",
+ blendvpd_3 = "rmRo:660F3815rM",
+ blendvps_3 = "rmRo:660F3814rM",
+ dppd_3 = "rmio:660F3A41rMU",
+ dpps_3 = "rmio:660F3A40rMU",
+ extractps_3 = "mri/do:660F3A17RmU",
+ insertps_3 = "rrio:660F3A21rMU|rxi/od:",
+ movntdqa_2 = "rmo:660F382ArM",
+ mpsadbw_3 = "rmio:660F3A42rMU",
+ packusdw_2 = "rmo:660F382BrM",
+ pblendvb_3 = "rmRo:660F3810rM",
+ pblendw_3 = "rmio:660F3A0ErMU",
+ pcmpeqq_2 = "rmo:660F3829rM",
+ pextrb_3 = "rri/do:660F3A14nRmU|xri/bo:",
+ pextrd_3 = "mri/do:660F3A16RmU",
+ -- x64: pextrq
+ -- pextrw is SSE2, mem operand is SSE4.1 only
+ phminposuw_2 = "rmo:660F3841rM",
+ pinsrb_3 = "rri/od:660F3A20nrMU|rxi/ob:",
+ pinsrd_3 = "rmi/od:660F3A22rMU",
+ -- x64: pinsrq
+ pmaxsb_2 = "rmo:660F383CrM",
+ pmaxsd_2 = "rmo:660F383DrM",
+ pmaxud_2 = "rmo:660F383FrM",
+ pmaxuw_2 = "rmo:660F383ErM",
+ pminsb_2 = "rmo:660F3838rM",
+ pminsd_2 = "rmo:660F3839rM",
+ pminud_2 = "rmo:660F383BrM",
+ pminuw_2 = "rmo:660F383ArM",
+ pmovsxbd_2 = "rro:660F3821rM|rx/od:",
+ pmovsxbq_2 = "rro:660F3822rM|rx/ow:",
+ pmovsxbw_2 = "rro:660F3820rM|rx/oq:",
+ pmovsxdq_2 = "rro:660F3825rM|rx/oq:",
+ pmovsxwd_2 = "rro:660F3823rM|rx/oq:",
+ pmovsxwq_2 = "rro:660F3824rM|rx/od:",
+ pmovzxbd_2 = "rro:660F3831rM|rx/od:",
+ pmovzxbq_2 = "rro:660F3832rM|rx/ow:",
+ pmovzxbw_2 = "rro:660F3830rM|rx/oq:",
+ pmovzxdq_2 = "rro:660F3835rM|rx/oq:",
+ pmovzxwd_2 = "rro:660F3833rM|rx/oq:",
+ pmovzxwq_2 = "rro:660F3834rM|rx/od:",
+ pmuldq_2 = "rmo:660F3828rM",
+ pmulld_2 = "rmo:660F3840rM",
+ ptest_2 = "rmo:660F3817rM",
+ roundpd_3 = "rmio:660F3A09rMU",
+ roundps_3 = "rmio:660F3A08rMU",
+ roundsd_3 = "rrio:660F3A0BrMU|rxi/oq:",
+ roundss_3 = "rrio:660F3A0ArMU|rxi/od:",
+
+ -- SSE4.2 ops
+ crc32_2 = "rmd:F20F38F1rM|rm/dw:66F20F38F1rM|rm/db:F20F38F0nrM",
+ pcmpestri_3 = "rmio:660F3A61rMU",
+ pcmpestrm_3 = "rmio:660F3A60rMU",
+ pcmpgtq_2 = "rmo:660F3837rM",
+ pcmpistri_3 = "rmio:660F3A63rMU",
+ pcmpistrm_3 = "rmio:660F3A62rMU",
+ popcnt_2 = "rmdw:F30FB8rM",
+
+ -- SSE4a
+ extrq_2 = "rro:660F79rM",
+ extrq_3 = "riio:660F780mUU",
+ insertq_2 = "rro:F20F79rM",
+ insertq_4 = "rriio:F20F78rMUU",
+ lzcnt_2 = "rmdw:F30FBDrM",
+ movntsd_2 = "xr/qo:F20F2BRm",
+ movntss_2 = "xr/do:F30F2BRm",
+ -- popcnt is also in SSE4.2
+}
+
+------------------------------------------------------------------------------
+
+-- Arithmetic ops.
+for name,n in pairs{ add = 0, ["or"] = 1, adc = 2, sbb = 3,
+ ["and"] = 4, sub = 5, xor = 6, cmp = 7 } do
+ local n8 = n * 8
+ map_op[name.."_2"] = format(
+ "mr:%02XRm|rm:%02XrM|mI1dw:81%XmI|mS1dw:83%XmS|Ri1dwb:%02Xri|mi1dwb:81%Xmi",
+ 1+n8, 3+n8, n, n, 5+n8, n)
+end
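+
+-- Worked example (editorial sketch, not part of the original source):
+-- the templates the loop above generates for n = 0 (add) and n = 7 (cmp).
+-- The opcode bytes are n*8 + 1, 3 and 5; the 81/83 forms carry n as the
+-- ModRM spare field.
+do
+  assert(map_op.add_2 ==
+    "mr:01Rm|rm:03rM|mI1dw:810mI|mS1dw:830mS|Ri1dwb:05ri|mi1dwb:810mi")
+  assert(map_op.cmp_2 ==
+    "mr:39Rm|rm:3BrM|mI1dw:817mI|mS1dw:837mS|Ri1dwb:3Dri|mi1dwb:817mi")
+end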
+
+-- Shift ops.
+for name,n in pairs{ rol = 0, ror = 1, rcl = 2, rcr = 3,
+ shl = 4, shr = 5, sar = 7, sal = 4 } do
+ map_op[name.."_2"] = format("m1:D1%Xm|mC1dwb:D3%Xm|mi:C1%XmU", n, n, n)
+end
+
+-- Conditional ops.
+for cc,n in pairs(map_cc) do
+ map_op["j"..cc.."_1"] = format("J.:0F8%XJ", n) -- short: 7%X
+ map_op["set"..cc.."_1"] = format("mb:n0F9%X2m", n)
+ map_op["cmov"..cc.."_2"] = format("rmdw:0F4%XrM", n) -- P6+
+end
+
+-- FP arithmetic ops.
+for name,n in pairs{ add = 0, mul = 1, com = 2, comp = 3,
+ sub = 4, subr = 5, div = 6, divr = 7 } do
+ local nc = 192 + n * 8
+ local nr = nc + (n < 4 and 0 or (n % 2 == 0 and 8 or -8))
+ local fn = "f"..name
+ map_op[fn.."_1"] = format("ff:D8%02Xr|xd:D8%Xm|xq:DC%Xm", nc, n, n)
+ if n == 2 or n == 3 then
+ map_op[fn.."_2"] = format("Fff:D8%02XR|Fx2d:D8%XM|Fx2q:DC%XM", nc, n, n)
+ else
+ map_op[fn.."_2"] = format("Fff:D8%02XR|fFf:DC%02Xr|Fx2d:D8%XM|Fx2q:DC%XM", nc, nr, n, n)
+ map_op[fn.."p_1"] = format("ff:DE%02Xr", nr)
+ map_op[fn.."p_2"] = format("fFf:DE%02Xr", nr)
+ end
+ map_op["fi"..name.."_1"] = format("xd:DA%Xm|xw:nDE%Xm", n, n)
+end
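+
+-- Worked example (editorial sketch, not part of the original source):
+-- generated FP templates. For sub/subr/div/divr the reversed register
+-- form uses nr, which swaps the opcode pairs (E0 <-> E8, F0 <-> F8).
+-- The compare ops (n = 2, 3) get neither a reversed form nor a "p"
+-- variant from this loop.
+do
+  assert(map_op.fadd_1 == "ff:D8C0r|xd:D80m|xq:DC0m")
+  assert(map_op.fsub_2 == "Fff:D8E0R|fFf:DCE8r|Fx2d:D84M|Fx2q:DC4M")
+  assert(map_op.fsubp_1 == "ff:DEE8r")
+  assert(map_op.fcomp_2 == "Fff:D8D8R|Fx2d:D83M|Fx2q:DC3M")
+  assert(map_op.fiadd_1 == "xd:DA0m|xw:nDE0m")
+end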
+
+-- FP conditional moves.
+for cc,n in pairs{ b=0, e=1, be=2, u=3, nb=4, ne=5, nbe=6, nu=7 } do
+ local n4 = n % 4
+ local nc = 56000 + n4 * 8 + (n-n4) * 64
+ map_op["fcmov"..cc.."_1"] = format("ff:%04Xr", nc) -- P6+
+ map_op["fcmov"..cc.."_2"] = format("Fff:%04XR", nc) -- P6+
+end
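+
+-- Worked example (editorial sketch, not part of the original source):
+-- 56000 is 0xDAC0. The low two bits of n select the register group in
+-- steps of 8, and the negated conditions (n >= 4) move from DA to DB.
+do
+  assert(map_op.fcmovb_1 == "ff:DAC0r")    -- n = 0: 0xDAC0.
+  assert(map_op.fcmovnu_2 == "Fff:DBD8R")  -- n = 7: 0xDAC0 + 0x18 + 0x100.
+end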
+
+-- SSE FP arithmetic ops.
+for name,n in pairs{ sqrt = 1, add = 8, mul = 9,
+ sub = 12, min = 13, div = 14, max = 15 } do
+ map_op[name.."ps_2"] = format("rmo:0F5%XrM", n)
+ map_op[name.."ss_2"] = format("rro:F30F5%XrM|rx/od:", n)
+ map_op[name.."pd_2"] = format("rmo:660F5%XrM", n)
+ map_op[name.."sd_2"] = format("rro:F20F5%XrM|rx/oq:", n)
+end
+
+------------------------------------------------------------------------------
+
+-- Process pattern string.
+local function dopattern(pat, args, sz, op)
+ local digit, addin
+ local opcode = 0
+ local szov = sz
+ local narg = 1
+
+ -- Limit number of section buffer positions used by a single dasm_put().
+ -- A single opcode needs a maximum of 2 positions. !x64
+ if secpos+2 > maxsecpos then wflush() end
+
+ -- Process each character.
+ for c in gmatch(pat.."|", ".") do
+ if match(c, "%x") then -- Hex digit.
+ digit = byte(c) - 48
+ if digit > 48 then digit = digit - 39
+ elseif digit > 16 then digit = digit - 7 end
+ opcode = opcode*16 + digit
+ addin = nil
+ elseif c == "n" then -- Disable operand size mods for opcode.
+ szov = nil
+ elseif c == "r" then -- Merge 1st operand regno. into opcode.
+ addin = args[1]; opcode = opcode + addin.reg
+ if narg < 2 then narg = 2 end
+ elseif c == "R" then -- Merge 2nd operand regno. into opcode.
+ addin = args[2]; opcode = opcode + addin.reg
+ narg = 3
+ elseif c == "m" or c == "M" then -- Encode ModRM/SIB.
+ local s
+ if addin then
+ s = addin.reg
+ opcode = opcode - s -- Undo regno opcode merge.
+ else
+ s = opcode % 16 -- Undo last digit.
+ opcode = (opcode - s) / 16
+ end
+ wputop(szov, opcode); opcode = nil
+ local imark = (sub(pat, -1) == "I") -- Force a mark (ugly).
+ -- Put ModRM/SIB with regno/last digit as spare.
+ local nn = c == "m" and 1 or 2
+ wputmrmsib(args[nn], imark, s, addin and addin.vreg)
+ if narg <= nn then narg = nn + 1 end
+ addin = nil
+ else
+ if opcode then -- Flush opcode.
+ if addin and addin.reg == -1 then
+ wputop(szov, opcode + 1)
+ waction("VREG", addin.vreg); wputxb(0)
+ else
+ wputop(szov, opcode)
+ end
+ opcode = nil
+ end
+ if c == "|" then break end
+ if c == "o" then -- Offset (pure 32 bit displacement).
+ wputdarg(args[1].disp); if narg < 2 then narg = 2 end
+ elseif c == "O" then
+ wputdarg(args[2].disp); narg = 3
+ else
+ -- Anything else is an immediate operand.
+ local a = args[narg]
+ narg = narg + 1
+ local mode, imm = a.mode, a.imm
+ if mode == "iJ" and not match("iIJ", c) then
+ werror("bad operand size for label")
+ end
+ if c == "S" then
+ wputsbarg(imm)
+ elseif c == "U" then
+ wputbarg(imm)
+ elseif c == "W" then
+ wputwarg(imm)
+ elseif c == "i" or c == "I" then
+ if mode == "iJ" then
+ wputlabel("IMM_", imm, 1)
+ elseif mode == "iI" and c == "I" then
+ waction(sz == "w" and "IMM_WB" or "IMM_DB", imm)
+ else
+ wputszarg(sz, imm)
+ end
+ elseif c == "J" then
+ if mode == "iPJ" then
+ waction("REL_A", imm) -- !x64 (secpos)
+ else
+ wputlabel("REL_", imm, 2)
+ end
+ else
+ werror("bad char `"..c.."' in pattern `"..pat.."' for `"..op.."'")
+ end
+ end
+ end
+ end
+end
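+
+-- Editorial sketch (not part of the original source): the hex digit
+-- decoding used in dopattern(), isolated into a hypothetical helper
+-- named hexval. Subtracting 48 maps "0".."9" to 0..9, "A".."F" to
+-- 17..22 and "a".."f" to 49..54; the two corrections bring both letter
+-- ranges down to 10..15.
+do
+  local function hexval(c)
+    local digit = string.byte(c) - 48
+    if digit > 48 then digit = digit - 39
+    elseif digit > 16 then digit = digit - 7 end
+    return digit
+  end
+  assert(hexval("0") == 0 and hexval("9") == 9)
+  assert(hexval("A") == 10 and hexval("F") == 15)
+  assert(hexval("a") == 10 and hexval("f") == 15)
+end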
+
+------------------------------------------------------------------------------
+
+-- Mapping of operand modes to short names. Suppress output with '#'.
+local map_modename = {
+ r = "reg", R = "eax", C = "cl", x = "mem", m = "mrm", i = "imm",
+ f = "stx", F = "st0", J = "lbl", ["1"] = "1",
+ I = "#", S = "#", O = "#",
+}
+
+-- Return a table/string showing all possible operand modes.
+local function templatehelp(template, nparams)
+ if nparams == 0 then return "" end
+ local t = {}
+ for tm in gmatch(template, "[^%|]+") do
+ local s = map_modename[sub(tm, 1, 1)]
+ s = s..gsub(sub(tm, 2, nparams), ".", function(c)
+ return ", "..map_modename[c]
+ end)
+ if not match(s, "#") then t[#t+1] = s end
+ end
+ return t
+end
+
+-- Match operand modes against mode match part of template.
+local function matchtm(tm, args)
+ for i=1,#args do
+ if not match(args[i].mode, sub(tm, i, i)) then return end
+ end
+ return true
+end
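+
+-- Usage sketch (editorial illustration, not part of the original source):
+-- each template mode character must occur in the mode string of the
+-- corresponding operand. The operand tables are hand-written stand-ins
+-- for what parseoperand() would return.
+do
+  local eax, mem, imm8 = { mode = "rmR" }, { mode = "xm" }, { mode = "iS" }
+  assert(matchtm("rm", { eax, mem }) == true)  -- Register, memory operand.
+  assert(matchtm("Ri", { eax, imm8 }) == true) -- eax plus any immediate.
+  assert(matchtm("rr", { eax, mem }) == nil)   -- "xm" is not a register.
+end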
+
+-- Handle opcodes defined with template strings.
+map_op[".template__"] = function(params, template, nparams)
+ if not params then return templatehelp(template, nparams) end
+ local args = {}
+
+ -- Zero-operand opcodes have no match part.
+ if #params == 0 then
+ dopattern(template, args, "d", params.op)
+ return
+ end
+
+ -- Determine common operand size (coerce undefined size) or flag as mixed.
+ local sz, szmix
+ for i,p in ipairs(params) do
+ args[i] = parseoperand(p)
+ local nsz = args[i].opsize
+ if nsz then
+ if sz and sz ~= nsz then szmix = true else sz = nsz end
+ end
+ end
+
+ -- Try all match:pattern pairs (separated by '|').
+ local gotmatch, lastpat
+ for tm in gmatch(template, "[^%|]+") do
+ -- Split off size match (starts after mode match) and pattern string.
+ local szm, pat = match(tm, "^(.-):(.*)$", #args+1)
+ if pat == "" then pat = lastpat else lastpat = pat end
+ if matchtm(tm, args) then
+ local prefix = sub(szm, 1, 1)
+ if prefix == "/" then -- Match both operand sizes.
+ if args[1].opsize == sub(szm, 2, 2) and
+ args[2].opsize == sub(szm, 3, 3) then
+ dopattern(pat, args, sz, params.op) -- Process pattern string.
+ return
+ end
+ else -- Match common operand size.
+ local szp = sz
+ if szm == "" then szm = "dwb" end -- Default size match.
+ if prefix == "1" then szp = args[1].opsize; szmix = nil
+ elseif prefix == "2" then szp = args[2].opsize; szmix = nil end
+ if not szmix and (prefix == "." or match(szm, szp or "#")) then
+ dopattern(pat, args, szp, params.op) -- Process pattern string.
+ return
+ end
+ end
+ gotmatch = true
+ end
+ end
+
+ local msg = "bad operand mode"
+ if gotmatch then
+ if szmix then
+ msg = "mixed operand size"
+ else
+ msg = sz and "bad operand size" or "missing operand size"
+ end
+ end
+
+ werror(msg.." in `"..opmodestr(params.op, args).."'")
+end
+
+------------------------------------------------------------------------------
+
+-- Pseudo-opcodes for data storage.
+local function op_data(params)
+ if not params then return "imm..." end
+ local sz = sub(params.op, 2, 2)
+ if sz == "a" then sz = addrsize end
+ for _,p in ipairs(params) do
+ local a = parseoperand(p)
+ if sub(a.mode, 1, 1) ~= "i" or (a.opsize and a.opsize ~= sz) then
+ werror("bad mode or size in `"..p.."'")
+ end
+ if a.mode == "iJ" then
+ wputlabel("IMM_", a.imm, 1)
+ else
+ wputszarg(sz, a.imm)
+ end
+ end
+end
+
+map_op[".byte_*"] = op_data
+map_op[".sbyte_*"] = op_data
+map_op[".word_*"] = op_data
+map_op[".dword_*"] = op_data
+map_op[".aword_*"] = op_data
+
+------------------------------------------------------------------------------
+
+-- Pseudo-opcode to mark the position where the action list is to be emitted.
+map_op[".actionlist_1"] = function(params)
+ if not params then return "cvar" end
+ local name = params[1] -- No syntax check. You get to keep the pieces.
+ wline(function(out) writeactions(out, name) end)
+end
+
+-- Pseudo-opcode to mark the position where the global enum is to be emitted.
+map_op[".globals_1"] = function(params)
+ if not params then return "prefix" end
+ local prefix = params[1] -- No syntax check. You get to keep the pieces.
+ wline(function(out) writeglobals(out, prefix) end)
+end
+
+-- Pseudo-opcode to mark the position where the global names are to be emitted.
+map_op[".globalnames_1"] = function(params)
+ if not params then return "cvar" end
+ local name = params[1] -- No syntax check. You get to keep the pieces.
+ wline(function(out) writeglobalnames(out, name) end)
+end
+
+-- Pseudo-opcode to mark the position where the extern names are to be emitted.
+map_op[".externnames_1"] = function(params)
+ if not params then return "cvar" end
+ local name = params[1] -- No syntax check. You get to keep the pieces.
+ wline(function(out) writeexternnames(out, name) end)
+end
+
+------------------------------------------------------------------------------
+
+-- Label pseudo-opcode (converted from trailing colon form).
+map_op[".label_2"] = function(params)
+ if not params then return "[1-9] | ->global | =>pcexpr [, addr]" end
+ local a = parseoperand(params[1])
+ local mode, imm = a.mode, a.imm
+ if type(imm) == "number" and (mode == "iJ" or (imm >= 1 and imm <= 9)) then
+ -- Local label (1: ... 9:) or global label (->global:).
+ waction("LABEL_LG", nil, 1)
+ wputxb(imm)
+ elseif mode == "iJ" then
+ -- PC label (=>pcexpr:).
+ waction("LABEL_PC", imm)
+ else
+ werror("bad label definition")
+ end
+ -- SETLABEL must immediately follow LABEL_LG/LABEL_PC.
+ local addr = params[2]
+ if addr then
+ local a = parseoperand(params[2])
+ if a.mode == "iPJ" then
+ waction("SETLABEL", a.imm) -- !x64 (secpos)
+ else
+ werror("bad label assignment")
+ end
+ end
+end
+map_op[".label_1"] = map_op[".label_2"]
+
+------------------------------------------------------------------------------
+
+-- Alignment pseudo-opcode.
+map_op[".align_1"] = function(params)
+ if not params then return "numpow2" end
+ local align = tonumber(params[1]) or map_opsizenum[map_opsize[params[1]]]
+ if align then
+ local x = align
+ -- Must be a power of 2 in the range (2 ... 256).
+ for i=1,8 do
+ x = x / 2
+ if x == 1 then
+ waction("ALIGN", nil, 1)
+ wputxb(align-1) -- Action byte is 2**n-1.
+ return
+ end
+ end
+ end
+ werror("bad alignment")
+end
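+
+-- Editorial sketch (not part of the original source): the power-of-2
+-- test from .align above, isolated into a hypothetical helper named
+-- align_action_byte. Halving at most 8 times must hit exactly 1, which
+-- accepts 2, 4, ..., 256 and rejects everything else.
+do
+  local function align_action_byte(align)
+    local x = align
+    for i=1,8 do
+      x = x / 2
+      if x == 1 then return align-1 end  -- Action byte is 2**n-1.
+    end
+  end
+  assert(align_action_byte(16) == 15)
+  assert(align_action_byte(256) == 255)
+  assert(align_action_byte(1) == nil)   -- Below the accepted range.
+  assert(align_action_byte(24) == nil)  -- Not a power of 2.
+end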
+
+-- Spacing pseudo-opcode.
+map_op[".space_2"] = function(params)
+ if not params then return "num [, filler]" end
+ waction("SPACE", params[1])
+ local fill = params[2]
+ if fill then
+ fill = tonumber(fill)
+ if not fill or fill < 0 or fill > 255 then werror("bad filler") end
+ end
+ wputxb(fill or 0)
+end
+map_op[".space_1"] = map_op[".space_2"]
+
+------------------------------------------------------------------------------
+
+-- Pseudo-opcode for (primitive) type definitions (map to C types).
+map_op[".type_3"] = function(params, nparams)
+ if not params then
+ return nparams == 2 and "name, ctype" or "name, ctype, reg"
+ end
+ local name, ctype, reg = params[1], params[2], params[3]
+ if not match(name, "^[%a_][%w_]*$") then
+ werror("bad type name `"..name.."'")
+ end
+ local tp = map_type[name]
+ if tp then
+ werror("duplicate type `"..name.."'")
+ end
+ if reg and not map_reg_valid_base[reg] then
+ werror("bad base register `"..(map_reg_rev[reg] or reg).."'")
+ end
+ -- Add #type to defines. A bit unclean to put it in map_archdef.
+ map_archdef["#"..name] = "sizeof("..ctype..")"
+ -- Add new type and emit shortcut define.
+ local num = ctypenum + 1
+ map_type[name] = {
+ ctype = ctype,
+ ctypefmt = format("Dt%X(%%s)", num),
+ reg = reg,
+ }
+ wline(format("#define Dt%X(_V) (int)(ptrdiff_t)&(((%s *)0)_V)", num, ctype))
+ ctypenum = num
+end
+map_op[".type_2"] = map_op[".type_3"]
+
+-- Dump type definitions.
+local function dumptypes(out, lvl)
+ local t = {}
+ for name in pairs(map_type) do t[#t+1] = name end
+ sort(t)
+ out:write("Type definitions:\n")
+ for _,name in ipairs(t) do
+ local tp = map_type[name]
+ local reg = tp.reg and map_reg_rev[tp.reg] or ""
+ out:write(format(" %-20s %-20s %s\n", name, tp.ctype, reg))
+ end
+ out:write("\n")
+end
+
+------------------------------------------------------------------------------
+
+-- Set the current section.
+function _M.section(num)
+ waction("SECTION")
+ wputxb(num)
+ wflush(true) -- SECTION is a terminal action.
+end
+
+------------------------------------------------------------------------------
+
+-- Dump architecture description.
+function _M.dumparch(out)
+ out:write(format("DynASM %s version %s, released %s\n\n",
+ _info.arch, _info.version, _info.release))
+ dumpregs(out)
+ dumpactions(out)
+end
+
+-- Dump all user defined elements.
+function _M.dumpdef(out, lvl)
+ dumptypes(out, lvl)
+ dumpglobals(out, lvl)
+ dumpexterns(out, lvl)
+end
+
+------------------------------------------------------------------------------
+
+-- Pass callbacks from/to the DynASM core.
+function _M.passcb(wl, we, wf, ww)
+ wline, werror, wfatal, wwarn = wl, we, wf, ww
+ return wflush
+end
+
+-- Setup the arch-specific module.
+function _M.setup(arch, opt)
+ g_arch, g_opt = arch, opt
+end
+
+-- Merge the core maps and the arch-specific maps.
+function _M.mergemaps(map_coreop, map_def)
+ setmetatable(map_op, { __index = map_coreop })
+ setmetatable(map_def, { __index = map_archdef })
+ return map_op, map_def
+end
+
+return _M
+
+------------------------------------------------------------------------------
+
diff --git a/dynasm/dynasm.lua b/dynasm/dynasm.lua
new file mode 100644
index 0000000000..20ff9cf5a7
--- /dev/null
+++ b/dynasm/dynasm.lua
@@ -0,0 +1,1070 @@
+------------------------------------------------------------------------------
+-- DynASM. A dynamic assembler for code generation engines.
+-- Originally designed and implemented for LuaJIT.
+--
+-- Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+-- See below for full copyright notice.
+------------------------------------------------------------------------------
+
+-- Application information.
+local _info = {
+ name = "DynASM",
+ description = "A dynamic assembler for code generation engines",
+ version = "1.2.1",
+ vernum = 10201,
+ release = "2009-04-16",
+ author = "Mike Pall",
+ url = "http://luajit.org/dynasm.html",
+ license = "MIT",
+ copyright = [[
+Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+[ MIT license: http://www.opensource.org/licenses/mit-license.php ]
+]],
+}
+
+-- Cache library functions.
+local type, pairs, ipairs = type, pairs, ipairs
+local pcall, error, assert = pcall, error, assert
+local _s = string
+local sub, match, gmatch, gsub = _s.sub, _s.match, _s.gmatch, _s.gsub
+local format, rep, upper = _s.format, _s.rep, _s.upper
+local _t = table
+local insert, remove, concat, sort = _t.insert, _t.remove, _t.concat, _t.sort
+local exit = os.exit
+local io = io
+local stdin, stdout, stderr = io.stdin, io.stdout, io.stderr
+
+------------------------------------------------------------------------------
+
+-- Program options.
+local g_opt = {}
+
+-- Global state for current file.
+local g_fname, g_curline, g_indent, g_lineno, g_synclineno, g_arch
+local g_errcount = 0
+
+-- Write buffer for output file.
+local g_wbuffer, g_capbuffer
+
+------------------------------------------------------------------------------
+
+-- Write an output line (or callback function) to the buffer.
+local function wline(line, needindent)
+ local buf = g_capbuffer or g_wbuffer
+ buf[#buf+1] = needindent and g_indent..line or line
+ g_synclineno = g_synclineno + 1
+end
+
+-- Write assembler line as a comment, if requested.
+local function wcomment(aline)
+ if g_opt.comment then
+ wline(g_opt.comment..aline..g_opt.endcomment, true)
+ end
+end
+
+-- Resync CPP line numbers.
+local function wsync()
+ if g_synclineno ~= g_lineno and g_opt.cpp then
+ wline("# "..g_lineno..' "'..g_fname..'"')
+ g_synclineno = g_lineno
+ end
+end
+
+-- Dummy action flush function. Replaced with arch-specific function later.
+local function wflush(term)
+end
+
+-- Dump all buffered output lines.
+local function wdumplines(out, buf)
+ for _,line in ipairs(buf) do
+ if type(line) == "string" then
+ assert(out:write(line, "\n"))
+ else
+ -- Special callback to dynamically insert lines after end of processing.
+ line(out)
+ end
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Emit an error. Processing continues with next statement.
+local function werror(msg)
+ error(format("%s:%s: error: %s:\n%s", g_fname, g_lineno, msg, g_curline), 0)
+end
+
+-- Emit a fatal error. Processing stops.
+local function wfatal(msg)
+ g_errcount = "fatal"
+ werror(msg)
+end
+
+-- Print a warning. Processing continues.
+local function wwarn(msg)
+ stderr:write(format("%s:%s: warning: %s:\n%s\n",
+ g_fname, g_lineno, msg, g_curline))
+end
+
+-- Print caught error message. But suppress excessive errors.
+local function wprinterr(...)
+ if type(g_errcount) == "number" then
+ -- Regular error.
+ g_errcount = g_errcount + 1
+ if g_errcount < 21 then -- Seems to be a reasonable limit.
+ stderr:write(...)
+ elseif g_errcount == 21 then
+ stderr:write(g_fname,
+ ":*: warning: too many errors (suppressed further messages).\n")
+ end
+ else
+ -- Fatal error.
+ stderr:write(...)
+ return true -- Stop processing.
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Map holding all option handlers.
+local opt_map = {}
+local opt_current
+
+-- Print error and exit with error status.
+local function opterror(...)
+ stderr:write("dynasm.lua: ERROR: ", ...)
+ stderr:write("\n")
+ exit(1)
+end
+
+-- Get option parameter.
+local function optparam(args)
+ local argn = args.argn
+ local p = args[argn]
+ if not p then
+ opterror("missing parameter for option `", opt_current, "'.")
+ end
+ args.argn = argn + 1
+ return p
+end
+
+------------------------------------------------------------------------------
+
+-- Core pseudo-opcodes.
+local map_coreop = {}
+-- Dummy opcode map. Replaced by arch-specific map.
+local map_op = {}
+
+-- Forward declarations.
+local dostmt
+local readfile
+
+------------------------------------------------------------------------------
+
+-- Map for defines (initially empty, chains to arch-specific map).
+local map_def = {}
+
+-- Pseudo-opcode to define a substitution.
+map_coreop[".define_2"] = function(params, nparams)
+ if not params then return nparams == 1 and "name" or "name, subst" end
+ local name, def = params[1], params[2] or "1"
+ if not match(name, "^[%a_][%w_]*$") then werror("bad or duplicate define") end
+ map_def[name] = def
+end
+map_coreop[".define_1"] = map_coreop[".define_2"]
+
+-- Define a substitution on the command line.
+function opt_map.D(args)
+ local namesubst = optparam(args)
+ local name, subst = match(namesubst, "^([%a_][%w_]*)=(.*)$")
+ if name then
+ map_def[name] = subst
+ elseif match(namesubst, "^[%a_][%w_]*$") then
+ map_def[namesubst] = "1"
+ else
+ opterror("bad define")
+ end
+end
+
+-- Undefine a substitution on the command line.
+function opt_map.U(args)
+ local name = optparam(args)
+ if match(name, "^[%a_][%w_]*$") then
+ map_def[name] = nil
+ else
+ opterror("bad define")
+ end
+end
+
+-- Helper for definesubst.
+local gotsubst
+
+local function definesubst_one(word)
+ local subst = map_def[word]
+ if subst then gotsubst = word; return subst else return word end
+end
+
+-- Iteratively substitute defines.
+local function definesubst(stmt)
+ -- Limit number of iterations.
+ for i=1,100 do
+ gotsubst = false
+ stmt = gsub(stmt, "#?[%w_]+", definesubst_one)
+ if not gotsubst then break end
+ end
+ if gotsubst then wfatal("recursive define involving `"..gotsubst.."'") end
+ return stmt
+end
+
+-- Dump all defines.
+local function dumpdefines(out, lvl)
+ local t = {}
+ for name in pairs(map_def) do
+ t[#t+1] = name
+ end
+ sort(t)
+ out:write("Defines:\n")
+ for _,name in ipairs(t) do
+ local subst = map_def[name]
+ if g_arch then subst = g_arch.revdef(subst) end
+ out:write(format(" %-20s %s\n", name, subst))
+ end
+ out:write("\n")
+end
+
+------------------------------------------------------------------------------
+
+-- Support variables for conditional assembly.
+local condlevel = 0
+local condstack = {}
+
+-- Evaluate condition with a Lua expression. Substitutions already performed.
+local function cond_eval(cond)
+ local func, err = loadstring("return "..cond)
+ if func then
+ setfenv(func, {}) -- No globals. All unknown identifiers evaluate to nil.
+ local ok, res = pcall(func)
+ if ok then
+ if res == 0 then return false end -- Oh well.
+ return not not res
+ end
+ err = res
+ end
+ wfatal("bad condition: "..err)
+end
+
+-- Skip statements until next conditional pseudo-opcode at the same level.
+local function stmtskip()
+ local dostmt_save = dostmt
+ local lvl = 0
+ dostmt = function(stmt)
+ local op = match(stmt, "^%s*(%S+)")
+ if op == ".if" then
+ lvl = lvl + 1
+ elseif lvl ~= 0 then
+ if op == ".endif" then lvl = lvl - 1 end
+ elseif op == ".elif" or op == ".else" or op == ".endif" then
+ dostmt = dostmt_save
+ dostmt(stmt)
+ end
+ end
+end
+
+-- Pseudo-opcodes for conditional assembly.
+map_coreop[".if_1"] = function(params)
+ if not params then return "condition" end
+ local lvl = condlevel + 1
+ local res = cond_eval(params[1])
+ condlevel = lvl
+ condstack[lvl] = res
+ if not res then stmtskip() end
+end
+
+map_coreop[".elif_1"] = function(params)
+ if not params then return "condition" end
+ if condlevel == 0 then wfatal(".elif without .if") end
+ local lvl = condlevel
+ local res = condstack[lvl]
+ if res then
+ if res == "else" then wfatal(".elif after .else") end
+ else
+ res = cond_eval(params[1])
+ if res then
+ condstack[lvl] = res
+ return
+ end
+ end
+ stmtskip()
+end
+
+map_coreop[".else_0"] = function(params)
+ if condlevel == 0 then wfatal(".else without .if") end
+ local lvl = condlevel
+ local res = condstack[lvl]
+ condstack[lvl] = "else"
+ if res then
+ if res == "else" then wfatal(".else after .else") end
+ stmtskip()
+ end
+end
+
+map_coreop[".endif_0"] = function(params)
+ local lvl = condlevel
+ if lvl == 0 then wfatal(".endif without .if") end
+ condlevel = lvl - 1
+end
+
+-- Check for unfinished conditionals.
+local function checkconds()
+ if g_errcount ~= "fatal" and condlevel ~= 0 then
+ wprinterr(g_fname, ":*: error: unbalanced conditional\n")
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Search for a file in the given path and open it for reading.
+local function pathopen(path, name)
+ local dirsep = match(package.path, "\\") and "\\" or "/"
+ for _,p in ipairs(path) do
+ local fullname = p == "" and name or p..dirsep..name
+ local fin = io.open(fullname, "r")
+ if fin then
+ g_fname = fullname
+ return fin
+ end
+ end
+end
+
+-- Include a file.
+map_coreop[".include_1"] = function(params)
+ if not params then return "filename" end
+ local name = params[1]
+ -- Save state. Ugly, I know, but upvalues are fast.
+ local gf, gl, gcl, gi = g_fname, g_lineno, g_curline, g_indent
+ -- Read the included file.
+ local fatal = readfile(pathopen(g_opt.include, name) or
+ wfatal("include file `"..name.."' not found"))
+ -- Restore state.
+ g_synclineno = -1
+ g_fname, g_lineno, g_curline, g_indent = gf, gl, gcl, gi
+ if fatal then wfatal("in include file") end
+end
+
+-- Make .include initially available, too.
+map_op[".include_1"] = map_coreop[".include_1"]
+
+------------------------------------------------------------------------------
+
+-- Support variables for macros.
+local mac_capture, mac_lineno, mac_name
+local mac_active = {}
+local mac_list = {}
+
+-- Pseudo-opcode to define a macro.
+map_coreop[".macro_*"] = function(mparams)
+ if not mparams then return "name [, params...]" end
+ -- Split off and validate macro name.
+ local name = remove(mparams, 1)
+ if not name then werror("missing macro name") end
+ if not (match(name, "^[%a_][%w_%.]*$") or match(name, "^%.[%w_%.]+$")) then
+ wfatal("bad macro name `"..name.."'")
+ end
+ -- Validate macro parameter names.
+ local mdup = {}
+ for _,mp in ipairs(mparams) do
+ if not match(mp, "^[%a_][%w_]*$") then
+ wfatal("bad macro parameter name `"..mp.."'")
+ end
+ if mdup[mp] then wfatal("duplicate macro parameter name `"..mp.."'") end
+ mdup[mp] = true
+ end
+ -- Check for duplicate or recursive macro definitions.
+ local opname = name.."_"..#mparams
+ if map_op[opname] or map_op[name.."_*"] then
+ wfatal("duplicate macro `"..name.."' ("..#mparams.." parameters)")
+ end
+ if mac_capture then wfatal("recursive macro definition") end
+
+ -- Enable statement capture.
+ local lines = {}
+ mac_lineno = g_lineno
+ mac_name = name
+ mac_capture = function(stmt) -- Statement capture function.
+ -- Stop macro definition with .endmacro pseudo-opcode.
+ if not match(stmt, "^%s*%.endmacro%s*$") then
+ lines[#lines+1] = stmt
+ return
+ end
+ mac_capture = nil
+ mac_lineno = nil
+ mac_name = nil
+ mac_list[#mac_list+1] = opname
+ -- Add macro-op definition.
+ map_op[opname] = function(params)
+ if not params then return mparams, lines end
+ -- Protect against recursive macro invocation.
+ if mac_active[opname] then wfatal("recursive macro invocation") end
+ mac_active[opname] = true
+ -- Setup substitution map.
+ local subst = {}
+ for i,mp in ipairs(mparams) do subst[mp] = params[i] end
+ local mcom
+ if g_opt.maccomment and g_opt.comment then
+ mcom = " MACRO "..name.." ("..#mparams..")"
+ wcomment("{"..mcom)
+ end
+ -- Loop through all captured statements.
+ for _,stmt in ipairs(lines) do
+ -- Substitute macro parameters.
+ local st = gsub(stmt, "[%w_]+", subst)
+ st = definesubst(st)
+ st = gsub(st, "%s*%.%.%s*", "") -- Token paste a..b.
+ if mcom and sub(st, 1, 1) ~= "|" then wcomment(st) end
+ -- Emit statement. Use a protected call for better diagnostics.
+ local ok, err = pcall(dostmt, st)
+ if not ok then
+ -- Add the captured statement to the error.
+ wprinterr(err, "\n", g_indent, "| ", stmt,
+ "\t[MACRO ", name, " (", #mparams, ")]\n")
+ end
+ end
+ if mcom then wcomment("}"..mcom) end
+ mac_active[opname] = nil
+ end
+ end
+end
+
+-- An .endmacro pseudo-opcode outside of a macro definition is an error.
+map_coreop[".endmacro_0"] = function(params)
+ wfatal(".endmacro without .macro")
+end
+
+-- Dump all macros and their contents (with -PP only).
+local function dumpmacros(out, lvl)
+ sort(mac_list)
+ out:write("Macros:\n")
+ for _,opname in ipairs(mac_list) do
+ local name = match(opname, "^(.+)_%d+$") -- Strip "_N" (N may be 2 digits).
+ local params, lines = map_op[opname]()
+ out:write(format(" %-20s %s\n", name, concat(params, ", ")))
+ if lvl > 1 then
+ for _,line in ipairs(lines) do
+ out:write(" |", line, "\n")
+ end
+ out:write("\n")
+ end
+ end
+ out:write("\n")
+end
+
+-- Check for unfinished macro definitions.
+local function checkmacros()
+ if mac_capture then
+ wprinterr(g_fname, ":", mac_lineno,
+ ": error: unfinished .macro `", mac_name ,"'\n")
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Support variables for captures.
+local cap_lineno, cap_name
+local cap_buffers = {}
+local cap_used = {}
+
+-- Start a capture.
+map_coreop[".capture_1"] = function(params)
+ if not params then return "name" end
+ wflush()
+ local name = params[1]
+ if not match(name, "^[%a_][%w_]*$") then
+ wfatal("bad capture name `"..name.."'")
+ end
+ if cap_name then
+ wfatal("already capturing to `"..cap_name.."' since line "..cap_lineno)
+ end
+ cap_name = name
+ cap_lineno = g_lineno
+ -- Create or continue a capture buffer and start the output line capture.
+ local buf = cap_buffers[name]
+ if not buf then buf = {}; cap_buffers[name] = buf end
+ g_capbuffer = buf
+ g_synclineno = 0
+end
+
+-- Stop a capture.
+map_coreop[".endcapture_0"] = function(params)
+ wflush()
+ if not cap_name then wfatal(".endcapture without a valid .capture") end
+ cap_name = nil
+ cap_lineno = nil
+ g_capbuffer = nil
+ g_synclineno = 0
+end
+
+-- Dump a capture buffer.
+map_coreop[".dumpcapture_1"] = function(params)
+ if not params then return "name" end
+ wflush()
+ local name = params[1]
+ if not match(name, "^[%a_][%w_]*$") then
+ wfatal("bad capture name `"..name.."'")
+ end
+ cap_used[name] = true
+ wline(function(out)
+ local buf = cap_buffers[name]
+ if buf then wdumplines(out, buf) end
+ end)
+ g_synclineno = 0
+end
+
+-- Dump all captures and their buffers (with -PP only).
+local function dumpcaptures(out, lvl)
+ out:write("Captures:\n")
+ for name,buf in pairs(cap_buffers) do
+ out:write(format(" %-20s %4s)\n", name, "("..#buf))
+ if lvl > 1 then
+ local bar = rep("=", 76)
+ out:write(" ", bar, "\n")
+ for _,line in ipairs(buf) do
+ out:write(" ", line, "\n")
+ end
+ out:write(" ", bar, "\n\n")
+ end
+ end
+ out:write("\n")
+end
+
+-- Check for unfinished or unused captures.
+local function checkcaptures()
+ if cap_name then
+ wprinterr(g_fname, ":", cap_lineno,
+ ": error: unfinished .capture `", cap_name,"'\n")
+ return
+ end
+ for name in pairs(cap_buffers) do
+ if not cap_used[name] then
+ wprinterr(g_fname, ":*: error: missing .dumpcapture ", name ,"\n")
+ end
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Section names.
+local map_sections = {}
+
+-- Pseudo-opcode to define code sections.
+-- TODO: Data sections, BSS sections. Needs extra C code and API.
+map_coreop[".section_*"] = function(params)
+ if not params then return "name..." end
+ if #map_sections > 0 then werror("duplicate section definition") end
+ wflush()
+ for sn,name in ipairs(params) do
+ local opname = "."..name.."_0"
+ if not match(name, "^[%a][%w_]*$") or
+ map_op[opname] or map_op["."..name.."_*"] then
+ werror("bad section name `"..name.."'")
+ end
+ map_sections[#map_sections+1] = name
+ wline(format("#define DASM_SECTION_%s\t%d", upper(name), sn-1))
+ map_op[opname] = function(params) g_arch.section(sn-1) end
+ end
+ wline(format("#define DASM_MAXSECTION\t\t%d", #map_sections))
+end
+
+-- Dump all sections.
+local function dumpsections(out, lvl)
+ out:write("Sections:\n")
+ for _,name in ipairs(map_sections) do
+ out:write(format(" %s\n", name))
+ end
+ out:write("\n")
+end
+
+------------------------------------------------------------------------------
+
+-- Load architecture-specific module.
+local function loadarch(arch)
+ if not match(arch, "^[%w_]+$") then return "bad arch name" end
+ local ok, m_arch = pcall(require, "dasm_"..arch)
+ if not ok then return "cannot load module: "..m_arch end
+ g_arch = m_arch
+ wflush = m_arch.passcb(wline, werror, wfatal, wwarn)
+ m_arch.setup(arch, g_opt)
+ map_op, map_def = m_arch.mergemaps(map_coreop, map_def)
+end
+
+-- Dump architecture description.
+function opt_map.dumparch(args)
+ local name = optparam(args)
+ if not g_arch then
+ local err = loadarch(name)
+ if err then opterror(err) end
+ end
+
+ local t = {}
+ for name in pairs(map_coreop) do t[#t+1] = name end
+ for name in pairs(map_op) do t[#t+1] = name end
+ sort(t)
+
+ local out = stdout
+ local _arch = g_arch._info
+ out:write(format("%s version %s, released %s, %s\n",
+ _info.name, _info.version, _info.release, _info.url))
+ g_arch.dumparch(out)
+
+ local pseudo = true
+ out:write("Pseudo-Opcodes:\n")
+ for _,sname in ipairs(t) do
+ local name, nparam = match(sname, "^(.+)_([0-9%*])$")
+ if name then
+ if pseudo and sub(name, 1, 1) ~= "." then
+ out:write("\nOpcodes:\n")
+ pseudo = false
+ end
+ local f = map_op[sname]
+ local s
+ if nparam ~= "*" then nparam = nparam + 0 end
+ if nparam == 0 then
+ s = ""
+ elseif type(f) == "string" then
+ s = map_op[".template__"](nil, f, nparam)
+ else
+ s = f(nil, nparam)
+ end
+ if type(s) == "table" then
+ for _,s2 in ipairs(s) do
+ out:write(format(" %-12s %s\n", name, s2))
+ end
+ else
+ out:write(format(" %-12s %s\n", name, s))
+ end
+ end
+ end
+ out:write("\n")
+ exit(0)
+end
+
+-- Pseudo-opcode to set the architecture.
+-- Only initially available (map_op is replaced when called).
+map_op[".arch_1"] = function(params)
+ if not params then return "name" end
+ local err = loadarch(params[1])
+ if err then wfatal(err) end
+end
+
+-- Dummy .arch pseudo-opcode to improve the error report.
+map_coreop[".arch_1"] = function(params)
+ if not params then return "name" end
+ wfatal("duplicate .arch statement")
+end
+
+------------------------------------------------------------------------------
+
+-- Dummy pseudo-opcode. Don't confuse '.nop' with 'nop'.
+map_coreop[".nop_*"] = function(params)
+ if not params then return "[ignored...]" end
+end
+
+-- Pseudo-opcodes to raise errors.
+map_coreop[".error_1"] = function(params)
+ if not params then return "message" end
+ werror(params[1])
+end
+
+map_coreop[".fatal_1"] = function(params)
+ if not params then return "message" end
+ wfatal(params[1])
+end
+
+-- Dump all user defined elements.
+local function dumpdef(out)
+ local lvl = g_opt.dumpdef
+ if lvl == 0 then return end
+ dumpsections(out, lvl)
+ dumpdefines(out, lvl)
+ if g_arch then g_arch.dumpdef(out, lvl) end
+ dumpmacros(out, lvl)
+ dumpcaptures(out, lvl)
+end
+
+------------------------------------------------------------------------------
+
+-- Helper for splitstmt.
+local splitlvl
+
+local function splitstmt_one(c)
+ if c == "(" then
+ splitlvl = ")"..splitlvl
+ elseif c == "[" then
+ splitlvl = "]"..splitlvl
+ elseif c == ")" or c == "]" then
+ if sub(splitlvl, 1, 1) ~= c then werror("unbalanced () or []") end
+ splitlvl = sub(splitlvl, 2)
+ elseif splitlvl == "" then
+ return " \0 "
+ end
+ return c
+end
+
+-- Split statement into (pseudo-)opcode and params.
+local function splitstmt(stmt)
+ -- Convert label with trailing-colon into .label statement.
+ local label = match(stmt, "^%s*(.+):%s*$")
+ if label then return ".label", {label} end
+
+ -- Split at commas, but obey parentheses and brackets.
+ splitlvl = ""
+ stmt = gsub(stmt, "[,%(%)%[%]]", splitstmt_one)
+ if splitlvl ~= "" then werror("unbalanced () or []") end
+
+ -- Split off opcode.
+ local op, other = match(stmt, "^%s*([^%s%z]+)%s*(.*)$")
+ if not op then werror("bad statement syntax") end
+
+ -- Split parameters.
+ local params = {}
+ for p in gmatch(other, "%s*(%Z+)%z?") do
+ params[#params+1] = gsub(p, "%s+$", "")
+ end
+ if #params > 16 then werror("too many parameters") end
+
+ params.op = op
+ return op, params
+end
+
+-- Process a single statement.
+dostmt = function(stmt)
+ -- Ignore empty statements.
+ if match(stmt, "^%s*$") then return end
+
+ -- Capture macro defs before substitution.
+ if mac_capture then return mac_capture(stmt) end
+ stmt = definesubst(stmt)
+
+ -- Emit C code without parsing the line.
+ if sub(stmt, 1, 1) == "|" then
+ local tail = sub(stmt, 2)
+ wflush()
+ if sub(tail, 1, 2) == "//" then wcomment(tail) else wline(tail, true) end
+ return
+ end
+
+ -- Split into (pseudo-)opcode and params.
+ local op, params = splitstmt(stmt)
+
+ -- Get opcode handler (matching # of parameters or generic handler).
+ local f = map_op[op.."_"..#params] or map_op[op.."_*"]
+ if not f then
+ if not g_arch then wfatal("first statement must be .arch") end
+ -- Improve error report.
+ for i=0,16 do
+ if map_op[op.."_"..i] then
+ werror("wrong number of parameters for `"..op.."'")
+ end
+ end
+ werror("unknown statement `"..op.."'")
+ end
+
+ -- Call opcode handler or special handler for template strings.
+ if type(f) == "string" then
+ map_op[".template__"](params, f)
+ else
+ f(params)
+ end
+end
+
+-- Process a single line.
+local function doline(line)
+ if g_opt.flushline then wflush() end
+
+ -- Assembler line?
+ local indent, aline = match(line, "^(%s*)%|(.*)$")
+ if not aline then
+ -- No, plain C code line, need to flush first.
+ wflush()
+ wsync()
+ wline(line, false)
+ return
+ end
+
+ g_indent = indent -- Remember current line indentation.
+
+ -- Emit C code (even from macros). Avoids echo and line parsing.
+ if sub(aline, 1, 1) == "|" then
+ if not mac_capture then
+ wsync()
+ elseif g_opt.comment then
+ wsync()
+ wcomment(aline)
+ end
+ dostmt(aline)
+ return
+ end
+
+ -- Echo assembler line as a comment.
+ if g_opt.comment then
+ wsync()
+ wcomment(aline)
+ end
+
+ -- Strip assembler comments.
+ aline = gsub(aline, "//.*$", "")
+
+ -- Split line into statements at semicolons.
+ if match(aline, ";") then
+ for stmt in gmatch(aline, "[^;]+") do dostmt(stmt) end
+ else
+ dostmt(aline)
+ end
+end
+
+------------------------------------------------------------------------------
+
+-- Write DynASM header.
+local function dasmhead(out)
+ out:write(format([[
+/*
+** This file has been pre-processed with DynASM.
+** %s
+** DynASM version %s, DynASM %s version %s
+** DO NOT EDIT! The original file is in "%s".
+*/
+
+#if DASM_VERSION != %d
+#error "Version mismatch between DynASM and included encoding engine"
+#endif
+
+]], _info.url,
+ _info.version, g_arch._info.arch, g_arch._info.version,
+ g_fname, _info.vernum))
+end
+
+-- Read input file.
+readfile = function(fin)
+ g_indent = ""
+ g_lineno = 0
+ g_synclineno = -1
+
+ -- Process all lines.
+ for line in fin:lines() do
+ g_lineno = g_lineno + 1
+ g_curline = line
+ local ok, err = pcall(doline, line)
+ if not ok and wprinterr(err, "\n") then return true end
+ end
+ wflush()
+
+ -- Close input file.
+ assert(fin == stdin or fin:close())
+end
+
+-- Write output file.
+local function writefile(outfile)
+ local fout
+
+ -- Open output file.
+ if outfile == nil or outfile == "-" then
+ fout = stdout
+ else
+ fout = assert(io.open(outfile, "w"))
+ end
+
+ -- Write all buffered lines.
+ wdumplines(fout, g_wbuffer)
+
+ -- Close output file.
+ assert(fout == stdout or fout:close())
+
+ -- Optionally dump definitions.
+ dumpdef(fout == stdout and stderr or stdout)
+end
+
+-- Translate an input file to an output file.
+local function translate(infile, outfile)
+ g_wbuffer = {}
+ g_indent = ""
+ g_lineno = 0
+ g_synclineno = -1
+
+ -- Put header.
+ wline(dasmhead)
+
+ -- Read input file.
+ local fin
+ if infile == "-" then
+ g_fname = "(stdin)"
+ fin = stdin
+ else
+ g_fname = infile
+ fin = assert(io.open(infile, "r"))
+ end
+ readfile(fin)
+
+ -- Check for errors.
+ if not g_arch then
+ wprinterr(g_fname, ":*: error: missing .arch directive\n")
+ end
+ checkconds()
+ checkmacros()
+ checkcaptures()
+
+ if g_errcount ~= 0 then
+ stderr:write(g_fname, ":*: info: ", g_errcount, " error",
+ (type(g_errcount) == "number" and g_errcount > 1) and "s" or "",
+ " in input file -- no output file generated.\n")
+ dumpdef(stderr)
+ exit(1)
+ end
+
+ -- Write output file.
+ writefile(outfile)
+end
+
+------------------------------------------------------------------------------
+
+-- Print help text.
+function opt_map.help()
+ stdout:write("DynASM -- ", _info.description, ".\n")
+ stdout:write("DynASM ", _info.version, " ", _info.release, " ", _info.url, "\n")
+ stdout:write[[
+
+Usage: dynasm [OPTION]... INFILE.dasc|-
+
+ -h, --help Display this help text.
+ -V, --version Display version and copyright information.
+
+ -o, --outfile FILE Output file name (default is stdout).
+ -I, --include DIR Add directory to the include search path.
+
+ -c, --ccomment Use /* */ comments for assembler lines.
+ -C, --cppcomment Use // comments for assembler lines (default).
+ -N, --nocomment Suppress assembler lines in output.
+ -M, --maccomment Show macro expansions as comments (default off).
+
+ -L, --nolineno Suppress CPP line number information in output.
+ -F, --flushline Flush action list for every line.
+
+ -D NAME[=SUBST] Define a substitution.
+ -U NAME Undefine a substitution.
+
+ -P, --dumpdef Dump defines, macros, etc. Repeat for more output.
+ -A, --dumparch ARCH Load architecture ARCH and dump description.
+]]
+ exit(0)
+end
+
+-- Print version information.
+function opt_map.version()
+ stdout:write(format("%s version %s, released %s\n%s\n\n%s",
+ _info.name, _info.version, _info.release, _info.url, _info.copyright))
+ exit(0)
+end
+
+-- Misc. options.
+function opt_map.outfile(args) g_opt.outfile = optparam(args) end
+function opt_map.include(args) insert(g_opt.include, 1, optparam(args)) end
+function opt_map.ccomment() g_opt.comment = "/*|"; g_opt.endcomment = " */" end
+function opt_map.cppcomment() g_opt.comment = "//|"; g_opt.endcomment = "" end
+function opt_map.nocomment() g_opt.comment = false end
+function opt_map.maccomment() g_opt.maccomment = true end
+function opt_map.nolineno() g_opt.cpp = false end
+function opt_map.flushline() g_opt.flushline = true end
+function opt_map.dumpdef() g_opt.dumpdef = g_opt.dumpdef + 1 end
+
+------------------------------------------------------------------------------
+
+-- Short aliases for long options.
+local opt_alias = {
+ h = "help", ["?"] = "help", V = "version",
+ o = "outfile", I = "include",
+ c = "ccomment", C = "cppcomment", N = "nocomment", M = "maccomment",
+ L = "nolineno", F = "flushline",
+ P = "dumpdef", A = "dumparch",
+}
+
+-- Parse single option.
+local function parseopt(opt, args)
+ opt_current = #opt == 1 and "-"..opt or "--"..opt
+ local f = opt_map[opt] or opt_map[opt_alias[opt]]
+ if not f then
+ opterror("unrecognized option `", opt_current, "'. Try `--help'.\n")
+ end
+ f(args)
+end
+
+-- Parse arguments.
+local function parseargs(args)
+ -- Default options.
+ g_opt.comment = "//|"
+ g_opt.endcomment = ""
+ g_opt.cpp = true
+ g_opt.dumpdef = 0
+ g_opt.include = { "" }
+
+ -- Process all option arguments.
+ args.argn = 1
+ repeat
+ local a = args[args.argn]
+ if not a then break end
+ local lopt, opt = match(a, "^%-(%-?)(.+)")
+ if not opt then break end
+ args.argn = args.argn + 1
+ if lopt == "" then
+ -- Loop through short options.
+ for o in gmatch(opt, ".") do parseopt(o, args) end
+ else
+ -- Long option.
+ parseopt(opt, args)
+ end
+ until false
+
+ -- Check for proper number of arguments.
+ local nargs = #args - args.argn + 1
+ if nargs ~= 1 then
+ if nargs == 0 then
+ if g_opt.dumpdef > 0 then return dumpdef(stdout) end
+ end
+ opt_map.help()
+ end
+
+ -- Translate a single input file to a single output file
+ -- TODO: Handle multiple files?
+ translate(args[args.argn], g_opt.outfile)
+end
+
+------------------------------------------------------------------------------
+
+-- Add the directory dynasm.lua resides in to the Lua module search path.
+local arg = arg
+if arg and arg[0] then
+ local prefix = match(arg[0], "^(.*[/\\])")
+ if prefix then package.path = prefix.."?.lua;"..package.path end
+end
+
+-- Start DynASM.
+parseargs{...}
+
+------------------------------------------------------------------------------
+
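Review note: the option loop in parseargs above treats a single-dash argument as a cluster of one-letter options and a double-dash argument as one long option, resolving short names through opt_alias. The following is a minimal standalone sketch of that dispatch, not the DynASM code itself. The `seen` table and the three handlers are hypothetical, and the outfile handler reads its parameter directly because the real optparam helper is not shown in this hunk.

```lua
-- Hypothetical option handlers, for illustration only.
local seen = {}
local opt_map = {
  flushline = function() seen.flushline = true end,
  nolineno  = function() seen.nolineno = true end,
  -- Assumption: consume the next argument as the option's parameter.
  outfile   = function(args)
    seen.outfile = args[args.argn]
    args.argn = args.argn + 1
  end,
}
local opt_alias = { F = "flushline", L = "nolineno", o = "outfile" }

-- Look up a handler by long name first, then through the alias table.
local function parseopt(opt, args)
  local f = opt_map[opt] or opt_map[opt_alias[opt]]
  if not f then error("unrecognized option: " .. opt) end
  f(args)
end

-- Consume leading options; return the first non-option argument.
local function parseargs(args)
  args.argn = 1
  while true do
    local a = args[args.argn]
    if not a then break end
    local lopt, opt = string.match(a, "^%-(%-?)(.+)")
    if not opt then break end
    args.argn = args.argn + 1
    if lopt == "" then
      -- Single dash: every character is its own short option.
      for o in string.gmatch(opt, ".") do parseopt(o, args) end
    else
      -- Double dash: the rest is one long option name.
      parseopt(opt, args)
    end
  end
  return args[args.argn]
end

local input = parseargs{ "-FL", "--outfile", "out.h", "in.dasc" }
```

Here `-FL` sets both flags, `--outfile` consumes `out.h`, and the loop stops at `in.dasc` because it does not start with a dash.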
diff --git a/etc/strict.lua b/etc/strict.lua
new file mode 100644
index 0000000000..604619dd2e
--- /dev/null
+++ b/etc/strict.lua
@@ -0,0 +1,41 @@
+--
+-- strict.lua
+-- checks uses of undeclared global variables
+-- All global variables must be 'declared' through a regular assignment
+-- (even assigning nil will do) in a main chunk before being used
+-- anywhere or assigned to inside a function.
+--
+
+local getinfo, error, rawset, rawget = debug.getinfo, error, rawset, rawget
+
+local mt = getmetatable(_G)
+if mt == nil then
+ mt = {}
+ setmetatable(_G, mt)
+end
+
+mt.__declared = {}
+
+local function what ()
+ local d = getinfo(3, "S")
+ return d and d.what or "C"
+end
+
+mt.__newindex = function (t, n, v)
+ if not mt.__declared[n] then
+ local w = what()
+ if w ~= "main" and w ~= "C" then
+ error("assign to undeclared variable '"..n.."'", 2)
+ end
+ mt.__declared[n] = true
+ end
+ rawset(t, n, v)
+end
+
+mt.__index = function (t, n)
+ if not mt.__declared[n] and what() ~= "C" then
+ error("variable '"..n.."' is not declared", 2)
+ end
+ return rawget(t, n)
+end
+
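Review note: strict.lua relies on the fact that __newindex and __index only fire for keys that are absent from the table, so a separate "declared" set is needed to remember names that were assigned nil. The sketch below applies the same idea to an ordinary table rather than _G. It is a simplified illustration under that assumption: `make_strict` and `env` are hypothetical names, and the debug.getinfo check that distinguishes main chunks and C callers is left out.

```lua
-- Simplified sketch of the strict.lua technique on a plain table.
-- Unlike the real module, any assignment declares the name.
local function make_strict(t)
  local declared = {}
  return setmetatable(t, {
    __newindex = function(tbl, name, value)
      declared[name] = true
      rawset(tbl, name, value)
    end,
    __index = function(_, name)
      if not declared[name] then
        error("variable '" .. tostring(name) .. "' is not declared", 2)
      end
      return nil
    end,
  })
end

local env = make_strict({})
env.counter = nil   -- assigning nil is enough to declare the name

-- Reading a declared name succeeds even though its value is nil.
local ok_declared = pcall(function() return env.counter end)
-- Reading a name that was never assigned raises an error.
local ok_missing, err = pcall(function() return env.missing end)
```

Keeping the declared set apart from the table itself is what lets a nil assignment count as a declaration.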
diff --git a/lib/.gitignore b/lib/.gitignore
new file mode 100644
index 0000000000..500e2855af
--- /dev/null
+++ b/lib/.gitignore
@@ -0,0 +1 @@
+vmdef.lua
diff --git a/lib/bc.lua b/lib/bc.lua
new file mode 100644
index 0000000000..532f24933a
--- /dev/null
+++ b/lib/bc.lua
@@ -0,0 +1,182 @@
+----------------------------------------------------------------------------
+-- LuaJIT bytecode listing module.
+--
+-- Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+-- Released under the MIT/X license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+--
+-- This module lists the bytecode of a Lua function. If it's loaded by -jbc
+-- it hooks into the parser and lists all functions of a chunk as they
+-- are parsed.
+--
+-- Example usage:
+--
+-- luajit -jbc -e 'local x=0; for i=1,1e6 do x=x+i end; print(x)'
+-- luajit -jbc=- foo.lua
+-- luajit -jbc=foo.list foo.lua
+--
+-- Default output is to stderr. To redirect the output to a file, pass a
+-- filename as an argument (use '-' for stdout) or set the environment
+-- variable LUAJIT_LISTFILE. The file is overwritten every time the module
+-- is started.
+--
+-- This module can also be used programmatically:
+--
+-- local bc = require("jit.bc")
+--
+-- local function foo() print("hello") end
+--
+-- bc.dump(foo) --> -- BYTECODE -- [...]
+-- print(bc.line(foo, 2)) --> 0002 KSTR 1 1 ; "hello"
+--
+-- local out = {
+-- -- Do something with each line:
+-- write = function(t, ...) io.write(...) end,
+-- close = function(t) end,
+-- flush = function(t) end,
+-- }
+-- bc.dump(foo, out)
+--
+------------------------------------------------------------------------------
+
+-- Cache some library functions and objects.
+local jit = require("jit")
+assert(jit.version_num == 20000, "LuaJIT core/library version mismatch")
+local jutil = require("jit.util")
+local vmdef = require("jit.vmdef")
+local bit = require("bit")
+local sub, gsub, format = string.sub, string.gsub, string.format
+local byte, band, shr = string.byte, bit.band, bit.rshift
+local funcinfo, funcbc, funck = jutil.funcinfo, jutil.funcbc, jutil.funck
+local funcuvname = jutil.funcuvname
+local bcnames = vmdef.bcnames
+local stdout, stderr = io.stdout, io.stderr
+
+------------------------------------------------------------------------------
+
+local function ctlsub(c)
+ if c == "\n" then return "\\n"
+ elseif c == "\r" then return "\\r"
+ elseif c == "\t" then return "\\t"
+ else
+ return format("\\%03d", byte(c))
+ end
+end
+
+-- Return one bytecode line.
+local function bcline(func, pc, prefix)
+ local ins, m = funcbc(func, pc)
+ if not ins then return end
+ local ma, mb, mc = band(m, 7), band(m, 15*8), band(m, 15*128)
+ local a = band(shr(ins, 8), 0xff)
+ local oidx = 6*band(ins, 0xff)
+ local s = format("%04d %s %-6s %3s ",
+ pc, prefix or " ", sub(bcnames, oidx+1, oidx+6), ma == 0 and "" or a)
+ local d = shr(ins, 16)
+ if mc == 13*128 then -- BCMjump
+ if ma == 0 then
+ return format("%s=> %04d\n", sub(s, 1, -3), pc+d-0x7fff)
+ end
+ return format("%s=> %04d\n", s, pc+d-0x7fff)
+ end
+ if mb ~= 0 then d = band(d, 0xff) end
+ local kc
+ if mc == 10*128 then -- BCMstr
+ kc = funck(func, -d-1)
+ kc = format(#kc > 40 and '"%.40s"~' or '"%s"', gsub(kc, "%c", ctlsub))
+ elseif mc == 9*128 then -- BCMnum
+ kc = funck(func, d)
+ elseif mc == 12*128 then -- BCMfunc
+ local fi = funcinfo(funck(func, -d-1))
+ if fi.ffid then
+ kc = vmdef.ffnames[fi.ffid]
+ else
+ kc = fi.loc
+ end
+ elseif mc == 5*128 then -- BCMuv
+ kc = funcuvname(func, d)
+ end
+ if ma == 5 then -- BCMuv
+ local ka = funcuvname(func, a)
+ if kc then kc = ka.." ; "..kc else kc = ka end
+ end
+ if mb ~= 0 then
+ local b = shr(ins, 24)
+ if kc then return format("%s%3d %3d ; %s\n", s, b, d, kc) end
+ return format("%s%3d %3d\n", s, b, d)
+ end
+ if kc then return format("%s%3d ; %s\n", s, d, kc) end
+ if mc == 7*128 and d > 32767 then d = d - 65536 end -- BCMlits
+ return format("%s%3d\n", s, d)
+end
+
+-- Collect branch targets of a function.
+local function bctargets(func)
+ local target = {}
+ for pc=1,1000000000 do
+ local ins, m = funcbc(func, pc)
+ if not ins then break end
+ if band(m, 15*128) == 13*128 then target[pc+shr(ins, 16)-0x7fff] = true end
+ end
+ return target
+end
+
+-- Dump bytecode instructions of a function.
+local function bcdump(func, out)
+ if not out then out = stdout end
+ local fi = funcinfo(func)
+ out:write(format("-- BYTECODE -- %s-%d\n", fi.loc, fi.lastlinedefined))
+ local target = bctargets(func)
+ for pc=1,1000000000 do
+ local s = bcline(func, pc, target[pc] and "=>")
+ if not s then break end
+ out:write(s)
+ end
+ out:write("\n")
+ out:flush()
+end
+
+------------------------------------------------------------------------------
+
+-- Active flag and output file handle.
+local active, out
+
+-- List handler.
+local function h_list(func)
+ return bcdump(func, out)
+end
+
+-- Detach list handler.
+local function bclistoff()
+ if active then
+ active = false
+ jit.attach(h_list)
+ if out and out ~= stdout and out ~= stderr then out:close() end
+ out = nil
+ end
+end
+
+-- Open the output file and attach list handler.
+local function bcliston(outfile)
+ if active then bclistoff() end
+ if not outfile then outfile = os.getenv("LUAJIT_LISTFILE") end
+ if outfile then
+ out = outfile == "-" and stdout or assert(io.open(outfile, "w"))
+ else
+ out = stderr
+ end
+ jit.attach(h_list, "bc")
+ active = true
+end
+
+-- Public module functions.
+module(...)
+
+line = bcline
+dump = bcdump
+targets = bctargets
+
+on = bcliston
+off = bclistoff
+start = bcliston -- For -j command line option.
+
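Review note: bcline above unpacks each 32 bit instruction word into an opcode in the low byte, an A operand in the next byte, and either a 16 bit D operand or separate B and C bytes in the upper half. Jump targets are biased by 0x7fff. The sketch below reproduces that unpacking with plain arithmetic so it runs without the bit library. The `decode` and `jump_target` helpers and the sample instruction word are hypothetical and are not taken from a real bytecode dump.

```lua
local floor = math.floor

-- Split an instruction word the way bcline does:
-- bits 0-7 opcode, 8-15 A, 16-31 D (or 16-23 C and 24-31 B).
local function decode(ins)
  local op = ins % 256
  local a = floor(ins / 256) % 256
  local d = floor(ins / 65536)
  local b = floor(d / 256)
  local c = d % 256
  return op, a, b, c, d
end

-- Jump operands are stored with a bias of 0x7fff relative to pc.
local function jump_target(pc, d)
  return pc + d - 0x7fff
end

-- Made-up word: opcode 0x12, A = 3, C = 5, B = 7.
local ins = 0x12 + 0x03 * 256 + 0x05 * 65536 + 0x07 * 16777216
local op, a, b, c, d = decode(ins)
local target = jump_target(10, 0x8000)
```

For this word, D is 0x0705, and a D value of 0x8000 at pc 10 resolves to pc 11.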
diff --git a/lib/dis_x64.lua b/lib/dis_x64.lua
new file mode 100644
index 0000000000..da3d63f8ba
--- /dev/null
+++ b/lib/dis_x64.lua
@@ -0,0 +1,19 @@
+----------------------------------------------------------------------------
+-- LuaJIT x64 disassembler wrapper module.
+--
+-- Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+-- Released under the MIT/X license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+-- This module just exports the 64 bit functions from the combined
+-- x86/x64 disassembler module. All the interesting stuff is there.
+------------------------------------------------------------------------------
+
+local require = require
+
+module(...)
+
+local dis_x86 = require(_PACKAGE.."dis_x86")
+
+create = dis_x86.create64
+disass = dis_x86.disass64
+
diff --git a/lib/dis_x86.lua b/lib/dis_x86.lua
new file mode 100644
index 0000000000..8f127bee92
--- /dev/null
+++ b/lib/dis_x86.lua
@@ -0,0 +1,824 @@
+----------------------------------------------------------------------------
+-- LuaJIT x86/x64 disassembler module.
+--
+-- Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+-- Released under the MIT/X license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+-- This is a helper module used by the LuaJIT machine code dumper module.
+--
+-- Sending small code snippets to an external disassembler and mixing the
+-- output with our own stuff was too fragile. So I had to bite the bullet
+-- and write yet another x86 disassembler. Oh well ...
+--
+-- The output format is very similar to what ndisasm generates. But it has
+-- been developed independently by looking at the opcode tables from the
+-- Intel and AMD manuals. The supported instruction set is quite extensive
+-- and reflects what a current generation Intel or AMD CPU implements in
+-- 32 bit and 64 bit mode. Yes, this includes MMX, SSE, SSE2, SSE3, SSSE3,
+-- SSE4.1, SSE4.2, SSE4a and even privileged and hypervisor (VMX/SVM)
+-- instructions.
+--
+-- Notes:
+-- * The (useless) a16 prefix, 3DNow and pre-586 opcodes are unsupported.
+-- * No attempt at optimization has been made -- it's fast enough for my needs.
+-- * The public API may change when more architectures are added.
+------------------------------------------------------------------------------
+
+local type = type
+local sub, byte, format = string.sub, string.byte, string.format
+local match, gmatch, gsub = string.match, string.gmatch, string.gsub
+local lower, rep = string.lower, string.rep
+
+-- Map for 1st opcode byte in 32 bit mode. Ugly? Well ... read on.
+local map_opc1_32 = {
+--0x
+[0]="addBmr","addVmr","addBrm","addVrm","addBai","addVai","push es","pop es",
+"orBmr","orVmr","orBrm","orVrm","orBai","orVai","push cs","opc2*",
+--1x
+"adcBmr","adcVmr","adcBrm","adcVrm","adcBai","adcVai","push ss","pop ss",
+"sbbBmr","sbbVmr","sbbBrm","sbbVrm","sbbBai","sbbVai","push ds","pop ds",
+--2x
+"andBmr","andVmr","andBrm","andVrm","andBai","andVai","es:seg","daa",
+"subBmr","subVmr","subBrm","subVrm","subBai","subVai","cs:seg","das",
+--3x
+"xorBmr","xorVmr","xorBrm","xorVrm","xorBai","xorVai","ss:seg","aaa",
+"cmpBmr","cmpVmr","cmpBrm","cmpVrm","cmpBai","cmpVai","ds:seg","aas",
+--4x
+"incVR","incVR","incVR","incVR","incVR","incVR","incVR","incVR",
+"decVR","decVR","decVR","decVR","decVR","decVR","decVR","decVR",
+--5x
+"pushUR","pushUR","pushUR","pushUR","pushUR","pushUR","pushUR","pushUR",
+"popUR","popUR","popUR","popUR","popUR","popUR","popUR","popUR",
+--6x
+"sz*pushaw,pusha","sz*popaw,popa","boundVrm","arplWmr",
+"fs:seg","gs:seg","o16:","a16",
+"pushUi","imulVrmi","pushBs","imulVrms",
+"insb","insVS","outsb","outsVS",
+--7x
+"joBj","jnoBj","jbBj","jnbBj","jzBj","jnzBj","jbeBj","jaBj",
+"jsBj","jnsBj","jpeBj","jpoBj","jlBj","jgeBj","jleBj","jgBj",
+--8x
+"arith!Bmi","arith!Vmi","arith!Bmi","arith!Vms",
+"testBmr","testVmr","xchgBrm","xchgVrm",
+"movBmr","movVmr","movBrm","movVrm",
+"movVmg","leaVrm","movWgm","popUm",
+--9x
+"nop*xchgVaR|pause|xchgWaR|repne nop","xchgVaR","xchgVaR","xchgVaR",
+"xchgVaR","xchgVaR","xchgVaR","xchgVaR",
+"sz*cbw,cwde,cdqe","sz*cwd,cdq,cqo","call farViw","wait",
+"sz*pushfw,pushf","sz*popfw,popf","sahf","lahf",
+--Ax
+"movBao","movVao","movBoa","movVoa",
+"movsb","movsVS","cmpsb","cmpsVS",
+"testBai","testVai","stosb","stosVS",
+"lodsb","lodsVS","scasb","scasVS",
+--Bx
+"movBRi","movBRi","movBRi","movBRi","movBRi","movBRi","movBRi","movBRi",
+"movVRI","movVRI","movVRI","movVRI","movVRI","movVRI","movVRI","movVRI",
+--Cx
+"shift!Bmu","shift!Vmu","retBw","ret","$lesVrm","$ldsVrm","movBmi","movVmi",
+"enterBwu","leave","retfBw","retf","int3","intBu","into","iretVS",
+--Dx
+"shift!Bm1","shift!Vm1","shift!Bmc","shift!Vmc","aamBu","aadBu","salc","xlatb",
+"fp*0","fp*1","fp*2","fp*3","fp*4","fp*5","fp*6","fp*7",
+--Ex
+"loopneBj","loopeBj","loopBj","sz*jcxzBj,jecxzBj,jrcxzBj",
+"inBau","inVau","outBua","outVua",
+"callVj","jmpVj","jmp farViw","jmpBj","inBad","inVad","outBda","outVda",
+--Fx
+"lock:","int1","repne:rep","rep:","hlt","cmc","testb!Bm","testv!Vm",
+"clc","stc","cli","sti","cld","std","incb!Bm","incd!Vm",
+}
+assert(#map_opc1_32 == 255)
+
+-- Map for 1st opcode byte in 64 bit mode (overrides only).
+local map_opc1_64 = setmetatable({
+ [0x06]=false, [0x07]=false, [0x0e]=false,
+ [0x16]=false, [0x17]=false, [0x1e]=false, [0x1f]=false,
+ [0x27]=false, [0x2f]=false, [0x37]=false, [0x3f]=false,
+ [0x60]=false, [0x61]=false, [0x62]=false, [0x63]="movsxdVrDmt", [0x67]="a32:",
+ [0x40]="rex*", [0x41]="rex*b", [0x42]="rex*x", [0x43]="rex*xb",
+ [0x44]="rex*r", [0x45]="rex*rb", [0x46]="rex*rx", [0x47]="rex*rxb",
+ [0x48]="rex*w", [0x49]="rex*wb", [0x4a]="rex*wx", [0x4b]="rex*wxb",
+ [0x4c]="rex*wr", [0x4d]="rex*wrb", [0x4e]="rex*wrx", [0x4f]="rex*wrxb",
+ [0x82]=false, [0x9a]=false, [0xc4]=false, [0xc5]=false, [0xce]=false,
+ [0xd4]=false, [0xd5]=false, [0xd6]=false, [0xea]=false,
+}, { __index = map_opc1_32 })
+
+-- Map for 2nd opcode byte (0F xx). True CISC hell. Hey, I told you.
+-- Prefix dependent MMX/SSE opcodes: (none)|rep|o16|repne, -|F3|66|F2
+local map_opc2 = {
+--0x
+[0]="sldt!Dmp","sgdt!Ump","larVrm","lslVrm",nil,"syscall","clts","sysret",
+"invd","wbinvd",nil,"ud1",nil,"$prefetch!Bm","femms","3dnowMrmu",
+--1x
+"movupsXrm|movssXrm|movupdXrm|movsdXrm",
+"movupsXmr|movssXmr|movupdXmr|movsdXmr",
+"movhlpsXrm$movlpsXrm|movsldupXrm|movlpdXrm|movddupXrm",
+"movlpsXmr||movlpdXmr",
+"unpcklpsXrm||unpcklpdXrm",
+"unpckhpsXrm||unpckhpdXrm",
+"movlhpsXrm$movhpsXrm|movshdupXrm|movhpdXrm",
+"movhpsXmr||movhpdXmr",
+"$prefetcht!Bm","hintnopVm","hintnopVm","hintnopVm",
+"hintnopVm","hintnopVm","hintnopVm","hintnopVm",
+--2x
+"movUmx$","movUmy$","movUxm$","movUym$","movUmz$",nil,"movUzm$",nil,
+"movapsXrm||movapdXrm",
+"movapsXmr||movapdXmr",
+"cvtpi2psXrMm|cvtsi2ssXrVm|cvtpi2pdXrMm|cvtsi2sdXrVm",
+"movntpsXmr|movntssXmr|movntpdXmr|movntsdXmr",
+"cvttps2piMrXm|cvttss2siVrXm|cvttpd2piMrXm|cvttsd2siVrXm",
+"cvtps2piMrXm|cvtss2siVrXm|cvtpd2piMrXm|cvtsd2siVrXm",
+"ucomissXrm||ucomisdXrm",
+"comissXrm||comisdXrm",
+--3x
+"wrmsr","rdtsc","rdmsr","rdpmc","sysenter","sysexit",nil,"getsec",
+"opc3*38",nil,"opc3*3a",nil,nil,nil,nil,nil,
+--4x
+"cmovoVrm","cmovnoVrm","cmovbVrm","cmovnbVrm",
+"cmovzVrm","cmovnzVrm","cmovbeVrm","cmovaVrm",
+"cmovsVrm","cmovnsVrm","cmovpeVrm","cmovpoVrm",
+"cmovlVrm","cmovgeVrm","cmovleVrm","cmovgVrm",
+--5x
+"movmskpsVrXm$||movmskpdVrXm$","sqrtpsXrm|sqrtssXrm|sqrtpdXrm|sqrtsdXrm",
+"rsqrtpsXrm|rsqrtssXrm","rcppsXrm|rcpssXrm",
+"andpsXrm||andpdXrm","andnpsXrm||andnpdXrm",
+"orpsXrm||orpdXrm","xorpsXrm||xorpdXrm",
+"addpsXrm|addssXrm|addpdXrm|addsdXrm","mulpsXrm|mulssXrm|mulpdXrm|mulsdXrm",
+"cvtps2pdXrm|cvtss2sdXrm|cvtpd2psXrm|cvtsd2ssXrm",
+"cvtdq2psXrm|cvttps2dqXrm|cvtps2dqXrm",
+"subpsXrm|subssXrm|subpdXrm|subsdXrm","minpsXrm|minssXrm|minpdXrm|minsdXrm",
+"divpsXrm|divssXrm|divpdXrm|divsdXrm","maxpsXrm|maxssXrm|maxpdXrm|maxsdXrm",
+--6x
+"punpcklbwPrm","punpcklwdPrm","punpckldqPrm","packsswbPrm",
+"pcmpgtbPrm","pcmpgtwPrm","pcmpgtdPrm","packuswbPrm",
+"punpckhbwPrm","punpckhwdPrm","punpckhdqPrm","packssdwPrm",
+"||punpcklqdqXrm","||punpckhqdqXrm",
+"movPrVSm","movqMrm|movdquXrm|movdqaXrm",
+--7x
+"pshufwMrmu|pshufhwXrmu|pshufdXrmu|pshuflwXrmu","pshiftw!Pmu",
+"pshiftd!Pmu","pshiftq!Mmu||pshiftdq!Xmu",
+"pcmpeqbPrm","pcmpeqwPrm","pcmpeqdPrm","emms|",
+"vmreadUmr||extrqXmuu$|insertqXrmuu$","vmwriteUrm||extrqXrm$|insertqXrm$",
+nil,nil,
+"||haddpdXrm|haddpsXrm","||hsubpdXrm|hsubpsXrm",
+"movVSmMr|movqXrm|movVSmXr","movqMmr|movdquXmr|movdqaXmr",
+--8x
+"joVj","jnoVj","jbVj","jnbVj","jzVj","jnzVj","jbeVj","jaVj",
+"jsVj","jnsVj","jpeVj","jpoVj","jlVj","jgeVj","jleVj","jgVj",
+--9x
+"setoBm","setnoBm","setbBm","setnbBm","setzBm","setnzBm","setbeBm","setaBm",
+"setsBm","setnsBm","setpeBm","setpoBm","setlBm","setgeBm","setleBm","setgBm",
+--Ax
+"push fs","pop fs","cpuid","btVmr","shldVmru","shldVmrc",nil,nil,
+"push gs","pop gs","rsm","btsVmr","shrdVmru","shrdVmrc","fxsave!Dmp","imulVrm",
+--Bx
+"cmpxchgBmr","cmpxchgVmr","$lssVrm","btrVmr",
+"$lfsVrm","$lgsVrm","movzxVrBmt","movzxVrWmt",
+"|popcntVrm","ud2Dp","bt!Vmu","btcVmr",
+"bsfVrm","bsrVrm|lzcntVrm|bsrWrm","movsxVrBmt","movsxVrWmt",
+--Cx
+"xaddBmr","xaddVmr",
+"cmppsXrmu|cmpssXrmu|cmppdXrmu|cmpsdXrmu","$movntiVmr|",
+"pinsrwPrWmu","pextrwDrPmu",
+"shufpsXrmu||shufpdXrmu","$cmpxchg!Qmp",
+"bswapVR","bswapVR","bswapVR","bswapVR","bswapVR","bswapVR","bswapVR","bswapVR",
+--Dx
+"||addsubpdXrm|addsubpsXrm","psrlwPrm","psrldPrm","psrlqPrm",
+"paddqPrm","pmullwPrm",
+"|movq2dqXrMm|movqXmr|movdq2qMrXm$","pmovmskbVrMm||pmovmskbVrXm",
+"psubusbPrm","psubuswPrm","pminubPrm","pandPrm",
+"paddusbPrm","padduswPrm","pmaxubPrm","pandnPrm",
+--Ex
+"pavgbPrm","psrawPrm","psradPrm","pavgwPrm",
+"pmulhuwPrm","pmulhwPrm",
+"|cvtdq2pdXrm|cvttpd2dqXrm|cvtpd2dqXrm","$movntqMmr||$movntdqXmr",
+"psubsbPrm","psubswPrm","pminswPrm","porPrm",
+"paddsbPrm","paddswPrm","pmaxswPrm","pxorPrm",
+--Fx
+"|||lddquXrm","psllwPrm","pslldPrm","psllqPrm",
+"pmuludqPrm","pmaddwdPrm","psadbwPrm","maskmovqMrm||maskmovdquXrm$",
+"psubbPrm","psubwPrm","psubdPrm","psubqPrm",
+"paddbPrm","paddwPrm","padddPrm","ud",
+}
+assert(map_opc2[255] == "ud")
+
+-- Map for three-byte opcodes. Can't wait for their next invention.
+local map_opc3 = {
+["38"] = { -- [66] 0f 38 xx
+--0x
+[0]="pshufbPrm","phaddwPrm","phadddPrm","phaddswPrm",
+"pmaddubswPrm","phsubwPrm","phsubdPrm","phsubswPrm",
+"psignbPrm","psignwPrm","psigndPrm","pmulhrswPrm",
+nil,nil,nil,nil,
+--1x
+"||pblendvbXrma",nil,nil,nil,
+"||blendvpsXrma","||blendvpdXrma",nil,"||ptestXrm",
+nil,nil,nil,nil,
+"pabsbPrm","pabswPrm","pabsdPrm",nil,
+--2x
+"||pmovsxbwXrm","||pmovsxbdXrm","||pmovsxbqXrm","||pmovsxwdXrm",
+"||pmovsxwqXrm","||pmovsxdqXrm",nil,nil,
+"||pmuldqXrm","||pcmpeqqXrm","||$movntdqaXrm","||packusdwXrm",
+nil,nil,nil,nil,
+--3x
+"||pmovzxbwXrm","||pmovzxbdXrm","||pmovzxbqXrm","||pmovzxwdXrm",
+"||pmovzxwqXrm","||pmovzxdqXrm",nil,"||pcmpgtqXrm",
+"||pminsbXrm","||pminsdXrm","||pminuwXrm","||pminudXrm",
+"||pmaxsbXrm","||pmaxsdXrm","||pmaxuwXrm","||pmaxudXrm",
+--4x
+"||pmulddXrm","||phminposuwXrm",
+--Fx
+[0xf0] = "|||crc32TrBmt",[0xf1] = "|||crc32TrVmt",
+},
+
+["3a"] = { -- [66] 0f 3a xx
+--0x
+[0x00]=nil,nil,nil,nil,nil,nil,nil,nil,
+"||roundpsXrmu","||roundpdXrmu","||roundssXrmu","||roundsdXrmu",
+"||blendpsXrmu","||blendpdXrmu","||pblendwXrmu","palignrPrmu",
+--1x
+nil,nil,nil,nil,
+"||pextrbVmXru","||pextrwVmXru","||pextrVmSXru","||extractpsVmXru",
+nil,nil,nil,nil,nil,nil,nil,nil,
+--2x
+"||pinsrbXrVmu","||insertpsXrmu","||pinsrXrVmuS",nil,
+--4x
+[0x40] = "||dppsXrmu",
+[0x41] = "||dppdXrmu",
+[0x42] = "||mpsadbwXrmu",
+--6x
+[0x60] = "||pcmpestrmXrmu",[0x61] = "||pcmpestriXrmu",
+[0x62] = "||pcmpistrmXrmu",[0x63] = "||pcmpistriXrmu",
+},
+}
+
+-- Map for VMX/SVM opcodes 0F 01 C0-FF (sgdt group with register operands).
+local map_opcvm = {
+[0xc1]="vmcall",[0xc2]="vmlaunch",[0xc3]="vmresume",[0xc4]="vmxoff",
+[0xc8]="monitor",[0xc9]="mwait",
+[0xd8]="vmrun",[0xd9]="vmmcall",[0xda]="vmload",[0xdb]="vmsave",
+[0xdc]="stgi",[0xdd]="clgi",[0xde]="skinit",[0xdf]="invlpga",
+[0xf8]="swapgs",[0xf9]="rdtscp",
+}
+
+-- Map for FP opcodes. And you thought stack machines are simple?
+local map_opcfp = {
+-- D8-DF 00-BF: opcodes with a memory operand.
+-- D8
+[0]="faddFm","fmulFm","fcomFm","fcompFm","fsubFm","fsubrFm","fdivFm","fdivrFm",
+"fldFm",nil,"fstFm","fstpFm","fldenvVm","fldcwWm","fnstenvVm","fnstcwWm",
+-- DA
+"fiaddDm","fimulDm","ficomDm","ficompDm",
+"fisubDm","fisubrDm","fidivDm","fidivrDm",
+-- DB
+"fildDm","fisttpDm","fistDm","fistpDm",nil,"fld twordFmp",nil,"fstp twordFmp",
+-- DC
+"faddGm","fmulGm","fcomGm","fcompGm","fsubGm","fsubrGm","fdivGm","fdivrGm",
+-- DD
+"fldGm","fisttpQm","fstGm","fstpGm","frstorDmp",nil,"fnsaveDmp","fnstswWm",
+-- DE
+"fiaddWm","fimulWm","ficomWm","ficompWm",
+"fisubWm","fisubrWm","fidivWm","fidivrWm",
+-- DF
+"fildWm","fisttpWm","fistWm","fistpWm",
+"fbld twordFmp","fildQm","fbstp twordFmp","fistpQm",
+-- xx C0-FF: opcodes with a pseudo-register operand.
+-- D8
+"faddFf","fmulFf","fcomFf","fcompFf","fsubFf","fsubrFf","fdivFf","fdivrFf",
+-- D9
+"fldFf","fxchFf",{"fnop"},nil,
+{"fchs","fabs",nil,nil,"ftst","fxam"},
+{"fld1","fldl2t","fldl2e","fldpi","fldlg2","fldln2","fldz"},
+{"f2xm1","fyl2x","fptan","fpatan","fxtract","fprem1","fdecstp","fincstp"},
+{"fprem","fyl2xp1","fsqrt","fsincos","frndint","fscale","fsin","fcos"},
+-- DA
+"fcmovbFf","fcmoveFf","fcmovbeFf","fcmovuFf",nil,{nil,"fucompp"},nil,nil,
+-- DB
+"fcmovnbFf","fcmovneFf","fcmovnbeFf","fcmovnuFf",
+{nil,nil,"fnclex","fninit"},"fucomiFf","fcomiFf",nil,
+-- DC
+"fadd toFf","fmul toFf",nil,nil,
+"fsub toFf","fsubr toFf","fdivr toFf","fdiv toFf",
+-- DD
+"ffreeFf",nil,"fstFf","fstpFf","fucomFf","fucompFf",nil,nil,
+-- DE
+"faddpFf","fmulpFf",nil,{nil,"fcompp"},
+"fsubrpFf","fsubpFf","fdivrpFf","fdivpFf",
+-- DF
+nil,nil,nil,nil,{"fnstsw ax"},"fucomipFf","fcomipFf",nil,
+}
+assert(map_opcfp[126] == "fcomipFf")
+
+-- Map for opcode groups. The subkey is sp from the ModRM byte.
+local map_opcgroup = {
+ arith = { "add", "or", "adc", "sbb", "and", "sub", "xor", "cmp" },
+ shift = { "rol", "ror", "rcl", "rcr", "shl", "shr", "sal", "sar" },
+ testb = { "testBmi", "testBmi", "not", "neg", "mul", "imul", "div", "idiv" },
+ testv = { "testVmi", "testVmi", "not", "neg", "mul", "imul", "div", "idiv" },
+ incb = { "inc", "dec" },
+ incd = { "inc", "dec", "callDmp", "$call farDmp",
+ "jmpDmp", "$jmp farDmp", "pushUm" },
+ sldt = { "sldt", "str", "lldt", "ltr", "verr", "verw" },
+ sgdt = { "vm*$sgdt", "vm*$sidt", "$lgdt", "vm*$lidt",
+ "smsw", nil, "lmsw", "vm*$invlpg" },
+ bt = { nil, nil, nil, nil, "bt", "bts", "btr", "btc" },
+ cmpxchg = { nil, "sz*,cmpxchg8bQmp,cmpxchg16bXmp", nil, nil,
+ nil, nil, "vmptrld|vmxon|vmclear", "vmptrst" },
+ pshiftw = { nil, nil, "psrlw", nil, "psraw", nil, "psllw" },
+ pshiftd = { nil, nil, "psrld", nil, "psrad", nil, "pslld" },
+ pshiftq = { nil, nil, "psrlq", nil, nil, nil, "psllq" },
+ pshiftdq = { nil, nil, "psrlq", "psrldq", nil, nil, "psllq", "pslldq" },
+ fxsave = { "$fxsave", "$fxrstor", "$ldmxcsr", "$stmxcsr",
+ nil, "lfenceDp$", "mfenceDp$", "sfenceDp$clflush" },
+ prefetch = { "prefetch", "prefetchw" },
+ prefetcht = { "prefetchnta", "prefetcht0", "prefetcht1", "prefetcht2" },
+}
+
+------------------------------------------------------------------------------
+
+-- Maps for register names.
+local map_regs = {
+ B = { "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh",
+ "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b" },
+ B64 = { "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil",
+ "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b" },
+ W = { "ax", "cx", "dx", "bx", "sp", "bp", "si", "di",
+ "r8w", "r9w", "r10w", "r11w", "r12w", "r13w", "r14w", "r15w" },
+ D = { "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
+ "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d" },
+ Q = { "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
+ "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15" },
+ M = { "mm0", "mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7",
+ "mm0", "mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm7" }, -- No x64 ext!
+ X = { "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
+ "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15" },
+}
+local map_segregs = { "es", "cs", "ss", "ds", "fs", "gs", "segr6", "segr7" }
+
+-- Maps for size names.
+local map_sz2n = {
+ B = 1, W = 2, D = 4, Q = 8, M = 8, X = 16,
+}
+local map_sz2prefix = {
+ B = "byte", W = "word", D = "dword",
+ Q = "qword",
+ M = "qword", X = "xword",
+ F = "dword", G = "qword", -- No need for sizes/register names for these two.
+}
+
+------------------------------------------------------------------------------
+
+-- Output a nicely formatted line with an opcode and operands.
+local function putop(ctx, text, operands)
+ local code, pos, hex = ctx.code, ctx.pos, ""
+ local hmax = ctx.hexdump
+ if hmax > 0 then
+ for i=ctx.start,pos-1 do
+ hex = hex..format("%02X", byte(code, i, i))
+ end
+ if #hex > hmax then hex = sub(hex, 1, hmax)..". "
+ else hex = hex..rep(" ", hmax-#hex+2) end
+ end
+ if operands then text = text.." "..operands end
+ if ctx.o16 then text = "o16 "..text; ctx.o16 = false end
+ if ctx.a32 then text = "a32 "..text; ctx.a32 = false end
+ if ctx.rep then text = ctx.rep.." "..text; ctx.rep = false end
+ if ctx.rex then
+ local t = (ctx.rexw and "w" or "")..(ctx.rexr and "r" or "")..
+ (ctx.rexx and "x" or "")..(ctx.rexb and "b" or "")
+ if t ~= "" then text = "rex."..t.." "..text end
+ ctx.rexw = false; ctx.rexr = false; ctx.rexx = false; ctx.rexb = false
+ ctx.rex = false
+ end
+ if ctx.seg then
+ local text2, n = gsub(text, "%[", "["..ctx.seg..":")
+ if n == 0 then text = ctx.seg.." "..text else text = text2 end
+ ctx.seg = false
+ end
+ if ctx.lock then text = "lock "..text; ctx.lock = false end
+ local imm = ctx.imm
+ if imm then
+ local sym = ctx.symtab[imm]
+ if sym then text = text.."\t->"..sym end
+ end
+ ctx.out(format("%08x %s%s\n", ctx.addr+ctx.start, hex, text))
+ ctx.mrm = false
+ ctx.start = pos
+ ctx.imm = nil
+end
+
+-- Clear all prefix flags.
+local function clearprefixes(ctx)
+ ctx.o16 = false; ctx.seg = false; ctx.lock = false; ctx.rep = false
+ ctx.rexw = false; ctx.rexr = false; ctx.rexx = false; ctx.rexb = false
+ ctx.rex = false; ctx.a32 = false
+end
+
+-- Fallback for incomplete opcodes at the end.
+local function incomplete(ctx)
+ ctx.pos = ctx.stop+1
+ clearprefixes(ctx)
+ return putop(ctx, "(incomplete)")
+end
+
+-- Fallback for unknown opcodes.
+local function unknown(ctx)
+ clearprefixes(ctx)
+ return putop(ctx, "(unknown)")
+end
+
+-- Return an immediate of the specified size.
+local function getimm(ctx, pos, n)
+ if pos+n-1 > ctx.stop then return incomplete(ctx) end
+ local code = ctx.code
+ if n == 1 then
+ local b1 = byte(code, pos, pos)
+ return b1
+ elseif n == 2 then
+ local b1, b2 = byte(code, pos, pos+1)
+ return b1+b2*256
+ else
+ local b1, b2, b3, b4 = byte(code, pos, pos+3)
+ local imm = b1+b2*256+b3*65536+b4*16777216
+ ctx.imm = imm
+ return imm
+ end
+end
+
+-- Process pattern string and generate the operands.
+local function putpat(ctx, name, pat)
+ local operands, regs, sz, mode, sp, rm, sc, rx, sdisp
+ local code, pos, stop = ctx.code, ctx.pos, ctx.stop
+
+ -- Chars used: 1DFGIMPQRSTUVWXacdfgijmoprstuwxyz
+ for p in gmatch(pat, ".") do
+ local x = nil
+ if p == "V" or p == "U" then
+ if ctx.rexw then sz = "Q"; ctx.rexw = false
+ elseif ctx.o16 then sz = "W"; ctx.o16 = false
+ elseif p == "U" and ctx.x64 then sz = "Q"
+ else sz = "D" end
+ regs = map_regs[sz]
+ elseif p == "T" then
+ if ctx.rexw then sz = "Q"; ctx.rexw = false else sz = "D" end
+ regs = map_regs[sz]
+ elseif p == "B" then
+ sz = "B"
+ regs = ctx.rex and map_regs.B64 or map_regs.B
+ elseif match(p, "[WDQMXFG]") then
+ sz = p
+ regs = map_regs[sz]
+ elseif p == "P" then
+ sz = ctx.o16 and "X" or "M"; ctx.o16 = false
+ regs = map_regs[sz]
+ elseif p == "S" then
+ name = name..lower(sz)
+ elseif p == "s" then
+ local imm = getimm(ctx, pos, 1); if not imm then return end
+ x = imm <= 127 and format("+0x%02x", imm)
+ or format("-0x%02x", 256-imm)
+ pos = pos+1
+ elseif p == "u" then
+ local imm = getimm(ctx, pos, 1); if not imm then return end
+ x = format("0x%02x", imm)
+ pos = pos+1
+ elseif p == "w" then
+ local imm = getimm(ctx, pos, 2); if not imm then return end
+ x = format("0x%x", imm)
+ pos = pos+2
+ elseif p == "o" then -- [offset]
+ if ctx.x64 then
+ local imm1 = getimm(ctx, pos, 4); if not imm1 then return end
+ local imm2 = getimm(ctx, pos+4, 4); if not imm2 then return end
+ x = format("[0x%08x%08x]", imm2, imm1)
+ pos = pos+8
+ else
+ local imm = getimm(ctx, pos, 4); if not imm then return end
+ x = format("[0x%08x]", imm)
+ pos = pos+4
+ end
+ elseif p == "i" or p == "I" then
+ local n = map_sz2n[sz]
+ if n == 8 and ctx.x64 and p == "I" then
+ local imm1 = getimm(ctx, pos, 4); if not imm1 then return end
+ local imm2 = getimm(ctx, pos+4, 4); if not imm2 then return end
+ x = format("0x%08x%08x", imm2, imm1)
+ else
+ if n == 8 then n = 4 end
+ local imm = getimm(ctx, pos, n); if not imm then return end
+ if sz == "Q" and (imm < 0 or imm > 0x7fffffff) then
+ imm = (0xffffffff+1)-imm
+ x = format(imm > 65535 and "-0x%08x" or "-0x%x", imm)
+ else
+ x = format(imm > 65535 and "0x%08x" or "0x%x", imm)
+ end
+ end
+ pos = pos+n
+ elseif p == "j" then
+ local n = map_sz2n[sz]
+ if n == 8 then n = 4 end
+ local imm = getimm(ctx, pos, n); if not imm then return end
+ if sz == "B" and imm > 127 then imm = imm-256
+ elseif imm > 2147483647 then imm = imm-4294967296 end
+ pos = pos+n
+ imm = imm + pos + ctx.addr
+ if imm > 4294967295 and not ctx.x64 then imm = imm-4294967296 end
+ ctx.imm = imm
+ if sz == "W" then
+ x = format("word 0x%04x", imm%65536)
+ elseif ctx.x64 then
+ local lo = imm % 0x1000000
+ x = format("0x%02x%06x", (imm-lo) / 0x1000000, lo)
+ else
+ x = format("0x%08x", imm)
+ end
+ elseif p == "R" then
+ local r = byte(code, pos-1, pos-1)%8
+ if ctx.rexb then r = r + 8; ctx.rexb = false end
+ x = regs[r+1]
+ elseif p == "a" then x = regs[1]
+ elseif p == "c" then x = "cl"
+ elseif p == "d" then x = "dx"
+ elseif p == "1" then x = "1"
+ else
+ if not mode then
+ mode = ctx.mrm
+ if not mode then
+ if pos > stop then return incomplete(ctx) end
+ mode = byte(code, pos, pos)
+ pos = pos+1
+ end
+ rm = mode%8; mode = (mode-rm)/8
+ sp = mode%8; mode = (mode-sp)/8
+ sdisp = ""
+ if mode < 3 then
+ if rm == 4 then
+ if pos > stop then return incomplete(ctx) end
+ sc = byte(code, pos, pos)
+ pos = pos+1
+ rm = sc%8; sc = (sc-rm)/8
+ rx = sc%8; sc = (sc-rx)/8
+ if ctx.rexx then rx = rx + 8; ctx.rexx = false end
+ if rx == 4 then rx = nil end
+ end
+ if mode > 0 or rm == 5 then
+ local dsz = mode
+ if dsz ~= 1 then dsz = 4 end
+ local disp = getimm(ctx, pos, dsz); if not disp then return end
+ if mode == 0 then rm = nil end
+ if rm or rx or (not sc and ctx.x64 and not ctx.a32) then
+ if dsz == 1 and disp > 127 then
+ sdisp = format("-0x%x", 256-disp)
+ elseif disp >= 0 and disp <= 0x7fffffff then
+ sdisp = format("+0x%x", disp)
+ else
+ sdisp = format("-0x%x", (0xffffffff+1)-disp)
+ end
+ else
+ sdisp = format(ctx.x64 and not ctx.a32 and
+ not (disp >= 0 and disp <= 0x7fffffff)
+ and "0xffffffff%08x" or "0x%08x", disp)
+ end
+ pos = pos+dsz
+ end
+ end
+ if rm and ctx.rexb then rm = rm + 8; ctx.rexb = false end
+ if ctx.rexr then sp = sp + 8; ctx.rexr = false end
+ end
+ if p == "m" then
+ if mode == 3 then x = regs[rm+1]
+ else
+ local aregs = ctx.a32 and map_regs.D or ctx.aregs
+ local srm, srx = "", ""
+ if rm then srm = aregs[rm+1]
+ elseif not sc and ctx.x64 and not ctx.a32 then srm = "rip" end
+ ctx.a32 = false
+ if rx then
+ if rm then srm = srm.."+" end
+ srx = aregs[rx+1]
+ if sc > 0 then srx = srx.."*"..(2^sc) end
+ end
+ x = format("[%s%s%s]", srm, srx, sdisp)
+ end
+ if mode < 3 and
+ (not match(pat, "[aRrgp]") or match(pat, "t")) then -- Yuck.
+ x = map_sz2prefix[sz].." "..x
+ end
+ elseif p == "r" then x = regs[sp+1]
+ elseif p == "g" then x = map_segregs[sp+1]
+ elseif p == "p" then -- Suppress prefix.
+ elseif p == "f" then x = "st"..rm
+ elseif p == "x" then
+ if sp == 0 and ctx.lock and not ctx.x64 then
+ x = "CR8"; ctx.lock = false
+ else
+ x = "CR"..sp
+ end
+ elseif p == "y" then x = "DR"..sp
+ elseif p == "z" then x = "TR"..sp
+ elseif p == "t" then
+ else
+ error("bad pattern `"..pat.."'")
+ end
+ end
+ if x then operands = operands and operands..", "..x or x end
+ end
+ ctx.pos = pos
+ return putop(ctx, name, operands)
+end
+
+-- Forward declaration.
+local map_act
+
+-- Fetch and cache MRM byte.
+local function getmrm(ctx)
+ local mrm = ctx.mrm
+ if not mrm then
+ local pos = ctx.pos
+ if pos > ctx.stop then return nil end
+ mrm = byte(ctx.code, pos, pos)
+ ctx.pos = pos+1
+ ctx.mrm = mrm
+ end
+ return mrm
+end
+
+-- Dispatch to handler depending on pattern.
+local function dispatch(ctx, opat, patgrp)
+ if not opat then return unknown(ctx) end
+ if match(opat, "%|") then -- MMX/SSE variants depending on prefix.
+ local p
+ if ctx.rep then
+ p = ctx.rep=="rep" and "%|([^%|]*)" or "%|[^%|]*%|[^%|]*%|([^%|]*)"
+ ctx.rep = false
+ elseif ctx.o16 then p = "%|[^%|]*%|([^%|]*)"; ctx.o16 = false
+ else p = "^[^%|]*" end
+ opat = match(opat, p)
+ if not opat then return unknown(ctx) end
+-- ctx.rep = false; ctx.o16 = false
+ --XXX fails for 66 f2 0f 38 f1 06 crc32 eax,WORD PTR [esi]
+ --XXX remove in branches?
+ end
+ if match(opat, "%$") then -- reg$mem variants.
+ local mrm = getmrm(ctx); if not mrm then return incomplete(ctx) end
+ opat = match(opat, mrm >= 192 and "^[^%$]*" or "%$(.*)")
+ if opat == "" then return unknown(ctx) end
+ end
+ if opat == "" then return unknown(ctx) end
+ local name, pat = match(opat, "^([a-z0-9 ]*)(.*)")
+ if pat == "" and patgrp then pat = patgrp end
+ return map_act[sub(pat, 1, 1)](ctx, name, pat)
+end
+
+-- Get a pattern from an opcode map and dispatch to handler.
+local function dispatchmap(ctx, opcmap)
+ local pos = ctx.pos
+ local opat = opcmap[byte(ctx.code, pos, pos)]
+ pos = pos + 1
+ ctx.pos = pos
+ return dispatch(ctx, opat)
+end
+
+-- Map for action codes. The key is the first char after the name.
+map_act = {
+ -- Simple opcodes without operands.
+ [""] = function(ctx, name, pat)
+ return putop(ctx, name)
+ end,
+
+ -- Operand size chars fall right through.
+ B = putpat, W = putpat, D = putpat, Q = putpat,
+ V = putpat, U = putpat, T = putpat,
+ M = putpat, X = putpat, P = putpat,
+ F = putpat, G = putpat,
+
+ -- Collect prefixes.
+ [":"] = function(ctx, name, pat)
+ ctx[pat == ":" and name or sub(pat, 2)] = name
+ if ctx.pos - ctx.start > 5 then return unknown(ctx) end -- Limit #prefixes.
+ end,
+
+ -- Chain to special handler specified by name.
+ ["*"] = function(ctx, name, pat)
+ return map_act[name](ctx, name, sub(pat, 2))
+ end,
+
+ -- Use named subtable for opcode group.
+ ["!"] = function(ctx, name, pat)
+ local mrm = getmrm(ctx); if not mrm then return incomplete(ctx) end
+ return dispatch(ctx, map_opcgroup[name][((mrm-(mrm%8))/8)%8+1], sub(pat, 2))
+ end,
+
+ -- o16,o32[,o64] variants.
+ sz = function(ctx, name, pat)
+ if ctx.o16 then ctx.o16 = false
+ else
+ pat = match(pat, ",(.*)")
+ if ctx.rexw then
+ local p = match(pat, ",(.*)")
+ if p then pat = p; ctx.rexw = false end
+ end
+ end
+ pat = match(pat, "^[^,]*")
+ return dispatch(ctx, pat)
+ end,
+
+ -- Two-byte opcode dispatch.
+ opc2 = function(ctx, name, pat)
+ return dispatchmap(ctx, map_opc2)
+ end,
+
+ -- Three-byte opcode dispatch.
+ opc3 = function(ctx, name, pat)
+ return dispatchmap(ctx, map_opc3[pat])
+ end,
+
+ -- VMX/SVM dispatch.
+ vm = function(ctx, name, pat)
+ return dispatch(ctx, map_opcvm[ctx.mrm])
+ end,
+
+ -- Floating point opcode dispatch.
+ fp = function(ctx, name, pat)
+ local mrm = getmrm(ctx); if not mrm then return incomplete(ctx) end
+ local rm = mrm%8
+ local idx = pat*8 + ((mrm-rm)/8)%8
+ if mrm >= 192 then idx = idx + 64 end
+ local opat = map_opcfp[idx]
+ if type(opat) == "table" then opat = opat[rm+1] end
+ return dispatch(ctx, opat)
+ end,
+
+ -- REX prefix.
+ rex = function(ctx, name, pat)
+ if ctx.rex then return unknown(ctx) end -- Only 1 REX prefix allowed.
+ for p in gmatch(pat, ".") do ctx["rex"..p] = true end
+ ctx.rex = true
+ end,
+
+ -- Special case for nop with REX prefix.
+ nop = function(ctx, name, pat)
+ return dispatch(ctx, ctx.rex and pat or "nop")
+ end,
+}
+
+------------------------------------------------------------------------------
+
+-- Disassemble a block of code.
+local function disass_block(ctx, ofs, len)
+ if not ofs then ofs = 0 end
+ local stop = len and ofs+len or #ctx.code
+ ofs = ofs + 1
+ ctx.start = ofs
+ ctx.pos = ofs
+ ctx.stop = stop
+ ctx.imm = nil
+ ctx.mrm = false
+ clearprefixes(ctx)
+ while ctx.pos <= stop do dispatchmap(ctx, ctx.map1) end
+ if ctx.pos ~= ctx.start then incomplete(ctx) end
+end
+
+-- Extended API: create a disassembler context. Then call ctx:disass(ofs, len).
+local function create_(code, addr, out)
+ local ctx = {}
+ ctx.code = code
+ ctx.addr = (addr or 0) - 1
+ ctx.out = out or io.write
+ ctx.symtab = {}
+ ctx.disass = disass_block
+ ctx.hexdump = 16
+ ctx.x64 = false
+ ctx.map1 = map_opc1_32
+ ctx.aregs = map_regs.D
+ return ctx
+end
+
+local function create64_(code, addr, out)
+ local ctx = create_(code, addr, out)
+ ctx.x64 = true
+ ctx.map1 = map_opc1_64
+ ctx.aregs = map_regs.Q
+ return ctx
+end
+
+-- Simple API: disassemble code (a string) at address and output via out.
+local function disass_(code, addr, out)
+ create_(code, addr, out):disass()
+end
+
+local function disass64_(code, addr, out)
+ create64_(code, addr, out):disass()
+end
+
+
+-- Public module functions.
+module(...)
+
+create = create_
+create64 = create64_
+disass = disass_
+disass64 = disass64_
+
diff --git a/lib/dump.lua b/lib/dump.lua
new file mode 100644
index 0000000000..9fde87c1e9
--- /dev/null
+++ b/lib/dump.lua
@@ -0,0 +1,567 @@
+----------------------------------------------------------------------------
+-- LuaJIT compiler dump module.
+--
+-- Copyright (C) 2005-2009 Mike Pall. All rights reserved.
+-- Released under the MIT/X license. See Copyright Notice in luajit.h
+----------------------------------------------------------------------------
+--
+-- This module can be used to debug the JIT compiler itself. It dumps the
+-- code representations and structures used in various compiler stages.
+--
+-- Example usage:
+--
+-- luajit -jdump -e "local x=0; for i=1,1e6 do x=x+i end; print(x)"
+-- luajit -jdump=im -e "for i=1,1000 do for j=1,1000 do end end" | less -R
+-- luajit -jdump=is myapp.lua | less -R
+-- luajit -jdump=-b myapp.lua
+-- luajit -jdump=+aH,myapp.html myapp.lua
+-- luajit -jdump=ixT,myapp.dump myapp.lua
+--
+-- The first argument specifies the dump mode. The second argument gives
+-- the output file name. Default output is to stdout, unless the environment
+-- variable LUAJIT_DUMPFILE is set. The file is overwritten every time the
+-- module is started.
+--
+-- Different features can be turned on or off with the dump mode. If the
+-- mode starts with a '+', the following features are added to the default
+-- set of features; a '-' removes them. Otherwise the features are replaced.
+--
+-- The following dump features are available (* marks the default):
+--
+--  * t  Print a line for each started, ended or aborted trace (see also -jv).
+--  * b  Dump the traced bytecode.
+--  * i  Dump the IR (intermediate representation).
+--    r  Augment the IR with register/stack slots.
+--    s  Dump the snapshot map.
+--  * m  Dump the generated machine code.
+--    x  Print each taken trace exit.
+--    X  Print each taken trace exit and the contents of all registers.
+--    a  Apply the selected IR/snapshot/machine code dumps to aborted traces, too.
+--
+-- The output format can be set with the following characters:
+--
+--    T  Plain text output.
+--    A  ANSI-colored text output.
+--    H  Colorized HTML + CSS output.
+--
+-- The default output format is plain text. It's set to ANSI-colored text
+-- if the COLORTERM variable is set. Note: this is independent of any output
+-- redirection, which is actually considered a feature.
+--
+-- You probably want to use less -R to enjoy viewing ANSI-colored text from
+-- a pipe or a file. Add this to your ~/.bashrc: export LESS="-R"
+--
+------------------------------------------------------------------------------
+
+-- Cache some library functions and objects.
+local jit = require("jit")
+assert(jit.version_num == 20000, "LuaJIT core/library version mismatch")
+local jutil = require("jit.util")
+local vmdef = require("jit.vmdef")
+local funcinfo, funcbc = jutil.funcinfo, jutil.funcbc
+local traceinfo, traceir, tracek = jutil.traceinfo, jutil.traceir, jutil.tracek
+local tracemc, traceexitstub = jutil.tracemc, jutil.traceexitstub
+local tracesnap = jutil.tracesnap
+local bit = require("bit")
+local band, shl, shr = bit.band, bit.lshift, bit.rshift
+local sub, gsub, format = string.sub, string.gsub, string.format
+local byte, char, rep = string.byte, string.char, string.rep
+local type, tostring = type, tostring
+local stdout, stderr = io.stdout, io.stderr
+
+-- Load other modules on-demand.
+local bcline, discreate
+
+-- Active flag, output file handle and dump mode.
+local active, out, dumpmode
+
+------------------------------------------------------------------------------
+
+local symtab = {}
+local nexitsym = 0
+
+-- Fill symbol table with trace exit addresses.
+local function fillsymtab(nexit)
+ local t = symtab
+ if nexit > nexitsym then
+ for i=nexitsym,nexit-1 do t[traceexitstub(i)] = tostring(i) end
+ nexitsym = nexit
+ end
+ return t
+end
+
+local function dumpwrite(s)
+ out:write(s)
+end
+
+-- Disassemble machine code.
+local function dump_mcode(tr)
+ local info = traceinfo(tr)
+ if not info then return end
+ local mcode, addr, loop = tracemc(tr)
+ if not mcode then return end
+ if not discreate then
+ discreate = require("jit.dis_"..jit.arch).create
+ end
+ out:write("---- TRACE ", tr, " mcode ", #mcode, "\n")
+ local ctx = discreate(mcode, addr, dumpwrite)
+ ctx.hexdump = 0
+ ctx.symtab = fillsymtab(info.nexit)
+ if loop ~= 0 then
+ symtab[addr+loop] = "LOOP"
+ ctx:disass(0, loop)
+ out:write("->LOOP:\n")
+ ctx:disass(loop, #mcode-loop)
+ symtab[addr+loop] = nil
+ else
+ ctx:disass(0, #mcode)
+ end
+end
+
+------------------------------------------------------------------------------
+
+local irtype_text = {
+ [0] = "nil",
+ "fal",
+ "tru",
+ "lud",
+ "str",
+ "ptr",
+ "thr",
+ "pro",
+ "fun",
+ "t09",
+ "tab",
+ "udt",
+ "num",
+ "int",
+ "i8 ",
+ "u8 ",
+ "i16",
+ "u16",
+}
+
+local colortype_ansi = {
+ [0] = "%s",
+ "%s",
+ "%s",
+ "%s",
+ "\027[32m%s\027[m",
+ "%s",
+ "\027[1m%s\027[m",
+ "%s",
+ "\027[1m%s\027[m",
+ "%s",
+ "\027[31m%s\027[m",
+ "\027[36m%s\027[m",
+ "\027[34m%s\027[m",
+ "\027[35m%s\027[m",
+ "\027[35m%s\027[m",
+ "\027[35m%s\027[m",
+ "\027[35m%s\027[m",
+ "\027[35m%s\027[m",
+}
+
+local function colorize_text(s, t)
+ return s
+end
+
+local function colorize_ansi(s, t)
+ return format(colortype_ansi[t], s)
+end
+
+local irtype_ansi = setmetatable({},
+ { __index = function(tab, t)
+ local s = colorize_ansi(irtype_text[t], t); tab[t] = s; return s; end })
+
+local html_escape = { ["<"] = "&lt;", [">"] = "&gt;", ["&"] = "&amp;", }
+
+local function colorize_html(s, t)
+ s = gsub(s, "[<>&]", html_escape)
+ return format('<span class="irt_%s">%s</span>', irtype_text[t], s)
+end
+
+local irtype_html = setmetatable({},
+ { __index = function(tab, t)
+ local s = colorize_html(irtype_text[t], t); tab[t] = s; return s; end })
+
+local header_html = [[
+
+]]
+
+local colorize, irtype
+
+-- Lookup table to convert some literals into names.
+local litname = {
+ ["SLOAD "] = { [0] = "", "I", "R", "RI", "P", "PI", "PR", "PRI", },
+ ["XLOAD "] = { [0] = "", "unaligned", },
+ ["TOINT "] = { [0] = "check", "index", "", },
+ ["FLOAD "] = vmdef.irfield,
+ ["FREF "] = vmdef.irfield,
+ ["FPMATH"] = vmdef.irfpm,
+}
+
+local function ctlsub(c)
+ if c == "\n" then return "\\n"
+ elseif c == "\r" then return "\\r"
+ elseif c == "\t" then return "\\t"
+ else return format("\\%03d", byte(c))
+ end
+end
+
+local function formatk(tr, idx)
+ local k, t, slot = tracek(tr, idx)
+ local tn = type(k)
+ local s
+ if tn == "number" then
+ if k == 2^52+2^51 then
+ s = "bias"
+ else
+ s = format("%+.14g", k)
+ end
+ elseif tn == "string" then
+ s = format(#k > 20 and '"%.20s"~' or '"%s"', gsub(k, "%c", ctlsub))
+ elseif tn == "function" then
+ local fi = funcinfo(k)
+ if fi.ffid then
+ s = vmdef.ffnames[fi.ffid]
+ else
+ s = fi.loc
+ end
+ elseif tn == "table" then
+ s = format("{%p}", k)
+ elseif tn == "userdata" then
+ if t == 11 then
+ s = format("userdata:%p", k)
+ else
+ s = format("[%p]", k)
+ if s == "[0x00000000]" then s = "NULL" end
+ end
+ else
+ s = tostring(k) -- For primitives.
+ end
+ s = colorize(format("%-4s", s), t)
+ if slot then
+ s = format("%s @%d", s, slot)
+ end
+ return s
+end
+
+local function printsnap(tr, snap)
+ for i=1,#snap do
+ local ref = snap[i]
+ if not ref then
+ out:write("---- ")
+ elseif ref < 0 then
+ out:write(formatk(tr, ref), " ")
+ else
+ local m, ot, op1, op2 = traceir(tr, ref)
+ local t = band(ot, 15)
+ local sep = " "
+ if t == 8 then
+ local oidx = 6*shr(ot, 8)
+ local op = sub(vmdef.irnames, oidx+1, oidx+6)
+ if op == "FRAME " then
+ sep = "|"
+ end
+ end
+ out:write(colorize(format("%04d", ref), t), sep)
+ end
+ end
+ out:write("]\n")
+end
+
+-- Dump snapshots (not interleaved with IR).
+local function dump_snap(tr)
+ out:write("---- TRACE ", tr, " snapshots\n")
+ for i=0,1000000000 do
+ local snap = tracesnap(tr, i)
+ if not snap then break end
+ out:write(format("#%-3d %04d [ ", i, snap[0]))
+ printsnap(tr, snap)
+ end
+end
+
+-- NYI: should really get the register map from the disassembler.
+local reg_map = {
+ [0] = "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
+ "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
+}
+
+-- Return a register name or stack slot for a rid/sp location.
+local function ridsp_name(ridsp)
+ local rid = band(ridsp, 0xff)
+ if ridsp > 255 then return format("[%x]", shr(ridsp, 8)*4) end
+ if rid < 128 then return reg_map[rid] end
+ return ""
+end
+
+-- Dump IR and interleaved snapshots.
+local function dump_ir(tr, dumpsnap, dumpreg)
+ local info = traceinfo(tr)
+ if not info then return end
+ local nins = info.nins
+ out:write("---- TRACE ", tr, " IR\n")
+ local irnames = vmdef.irnames
+ local snapref = 65536
+ local snap, snapno
+ if dumpsnap then
+ snap = tracesnap(tr, 0)
+ snapref = snap[0]
+ snapno = 0
+ end
+ for ins=1,nins do
+ if ins >= snapref then
+ if dumpreg then
+ out:write(format(".... SNAP #%-3d [ ", snapno))
+ else
+ out:write(format(".... SNAP #%-3d [ ", snapno))
+ end
+ printsnap(tr, snap)
+ snapno = snapno + 1
+ snap = tracesnap(tr, snapno)
+ snapref = snap and snap[0] or 65536
+ end
+ local m, ot, op1, op2, ridsp = traceir(tr, ins)
+ local oidx, t = 6*shr(ot, 8), band(ot, 31)
+ local op = sub(irnames, oidx+1, oidx+6)
+ if op == "LOOP " then
+ if dumpreg then
+ out:write(format("%04d ------------ LOOP ------------\n", ins))
+ else
+ out:write(format("%04d ------ LOOP ------------\n", ins))
+ end
+ elseif op ~= "NOP " and (dumpreg or op ~= "RENAME") then
+ if dumpreg then
+ out:write(format("%04d %-5s ", ins, ridsp_name(ridsp)))
+ else
+ out:write(format("%04d ", ins))
+ end
+ out:write(format("%s%s %s %s ",
+ band(ot, 64) == 0 and " " or ">",
+ band(ot, 128) == 0 and " " or "+",
+ irtype[t], op))
+ local m1 = band(m, 3)
+ if m1 ~= 3 then -- op1 != IRMnone
+ if op1 < 0 then
+ out:write(formatk(tr, op1))
+ else
+ out:write(format(m1 == 0 and "%04d" or "#%-3d", op1))
+ end
+ local m2 = band(m, 3*4)
+ if m2 ~= 3*4 then -- op2 != IRMnone
+ if m2 == 1*4 then -- op2 == IRMlit
+ local litn = litname[op]
+ if litn and litn[op2] then
+ out:write(" ", litn[op2])
+ else
+ out:write(format(" #%-3d", op2))
+ end
+ elseif op2 < 0 then
+ out:write(" ", formatk(tr, op2))
+ else
+ out:write(format(" %04d", op2))
+ end
+ end
+ end
+ out:write("\n")
+ end
+ end
+ if snap then
+ if dumpreg then
+ out:write(format(".... SNAP #%-3d [ ", snapno))
+ else
+ out:write(format(".... SNAP #%-3d [ ", snapno))
+ end
+ printsnap(tr, snap)
+ end
+end
+
+------------------------------------------------------------------------------
+
+local recprefix = ""
+local recdepth = 0
+
+-- Format trace error message.
+local function fmterr(err, info)
+ if type(err) == "number" then
+ if type(info) == "function" then
+ local fi = funcinfo(info)
+ if fi.ffid then
+ info = vmdef.ffnames[fi.ffid]
+ else
+ info = fi.loc
+ end
+ end
+ err = format(vmdef.traceerr[err], info)
+ end
+ return err
+end
+
+-- Dump trace states.
+local function dump_trace(what, tr, func, pc, otr, oex)
+ if what == "stop" or (what == "abort" and dumpmode.a) then
+ if dumpmode.i then dump_ir(tr, dumpmode.s, dumpmode.r and what == "stop")
+ elseif dumpmode.s then dump_snap(tr) end
+ if dumpmode.m then dump_mcode(tr) end
+ end
+ if what == "start" then
+ if dumpmode.H then out:write('<pre class="ljdump">\n') end
+ out:write("---- TRACE ", tr, " ", what)
+ if otr then out:write(" ", otr, "/", oex) end
+ local fi = funcinfo(func, pc)
+ out:write(" ", fi.loc, "\n")
+ recprefix = ""
+ recdepth = 0
+ elseif what == "stop" or what == "abort" then
+ out:write("---- TRACE ", tr, " ", what)
+ recprefix = nil
+ if what == "abort" then
+ local fi = funcinfo(func, pc)
+ out:write(" ", fi.loc, " -- ", fmterr(otr, oex), "\n")
+ else
+ local link = traceinfo(tr).link
+ if link == tr then
+ link = "loop"
+ elseif link == 0 then
+ link = "interpreter"
+ end
+ out:write(" -> ", link, "\n")
+ end
+ if dumpmode.H then out:write("