-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Add script to split module based on source paths #25278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This adds a script, `tools/empath-split.py`, which is a wrapper for Binaryen's `wasm-split`. `wasm-split` has `--multi-split` mode, which takes a manifest file that lists the name of functions per module. (Example: https://github.com/WebAssembly/binaryen/blob/main/test/lit/wasm-split/multi-split.wast.manifest) But listing all functions belonging to each module is a tedious process. `empath-split` takes a wasm file and a text file that has a list of paths, which can be either directories or functions, and using the source map information, generates a manifest file, and runs `wasm-split`. This makes a small drive-by fix for `emsymbolizer`. Currently when it takes a 0 address, it returns the location info associated with offsets[-1], which is the largest offset. This fixes it, and adds an optional `lower_bound` argument to `find_offset` so that when we want to get a source info entry, we don't go below the current function start offset.
| return None | ||
| # If lower bound is given, return the offset only if the offset is equal to | ||
| # or greather than the lower bound | ||
| if lower_bound: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that there's only one caller of this (and of lookup) and we don't anticipate any different use cases, maybe we should just simplify this by requiring lower_bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place is here:
emscripten/tools/emsymbolizer.py
Line 226 in eb6fcad
| return sm.lookup(address) |
What do we give for
lower_bound? It doesn't have the current function offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, ok.
Maybe for the original "just symbolize a random address" emsymbolizer use case we can eventually do better than we are now (e.g. give some kind of warning if we end up finding a location that corresponds to a different function from the given address, because odds are good it's not what the user actually wanted). But that doesn't have to be for this PR.
| assert module.read_string() == 'sourceMappingURL' | ||
| # TODO: support stripping/replacing a prefix from the URL | ||
| URL = module.read_string() | ||
| URL = module.get_sourceMappingURL() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add to this PR if things are working for you, but last time I tried to actually use emsymbolizer, I had to add something like
if not os.path.isfile(URL):
URL = os.path.join(os.path.dirname(module.filename), URL)
probably because I was using relative paths everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't change anything for emsymbolizer (I just moved sourceMappingURL-getting code from emsymbolizer.py to webassembly.py) and there was no os.path.join(os.path.dirname, ...) in emsymbolizer.py. Where am I supposed to add it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right I had to add it locally (I added it right here in emsymbolizer because that's where the code was until now). Again, this was just an FYI. Maybe I'll just try to reproduce the behavior and add a proper test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, emsymbolizer has worked for me with no change so far.. Yeah please let me know if you find the condition in which it becomes a problem.
|
Are we sure |
I'm all for different suggestions. What do you prefer? I started with And yeah, I think we can change the tool name even after landing because this is currently an experimental tool so that a few partners can try out and I don't intend to broadcast it to all users just yet. |
Co-authored-by: Derek Schuff <[email protected]>
|
@aheejin it just occurred to me that this functionality (in whatever form it ultimately gets integrated into emcc) should probably allow having multiple path specifications per module, rather than just one module per file or directory. I'd say let's figure out the JS glue and the integration into emcc first though. |
|
Done in #25577. |
This adds a script,
tools/empath-split.py, which is a wrapper for Binaryen'swasm-split.wasm-splithas--multi-splitmode, which takes a manifest file that lists the name of functions per module. (Example:https://github.com/WebAssembly/binaryen/blob/main/test/lit/wasm-split/multi-split.wast.manifest)
But listing all functions belonging to each module is a tedious process.
empath-splittakes a wasm file and a text file that has a list of paths, which can be either directories or functions, and using the source map information, generates a manifest file, and runswasm-split.This adds a small drive-by fix for
emsymbolizer. Currently when it takes a address 0, it returns the location info associated withoffsets[-1], which is the largest offset. This fixes it, and adds an optionallower_boundargument tofind_offsetso that when we want to get a source info entry, we don't go below the current function start offset.