Add script to split module based on source paths #25278

aheejin · 2025-09-15T17:08:20Z

This adds a script, `tools/empath-split.py`, a wrapper for Binaryen's `wasm-split`. `wasm-split` has a `--multi-split` mode, which takes a manifest file that lists the names of the functions in each module. (Example: https://github.com/WebAssembly/binaryen/blob/main/test/lit/wasm-split/multi-split.wast.manifest)

But listing all the functions that belong to each module by hand is tedious. `empath-split` takes a wasm file and a text file containing a list of paths, each of which can be either a directory or a file, and uses the source map information to generate a manifest file and then run `wasm-split`.
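As a rough illustration of the grouping step described above, here is a minimal sketch. It is not the code in `tools/empath-split.py`. The function name `build_manifest`, the `func_to_source` dictionary (standing in for what the real tool derives from the source map), and the use of each path as the module name are all assumptions made for this example. The output layout (module name, then its functions, with a blank line between modules) is assumed from the linked example manifest.

```python
import posixpath
from collections import defaultdict


def build_manifest(func_to_source, paths):
    """Group function names under the first listed path that contains their source.

    func_to_source: hypothetical dict of function name -> source file path.
    paths: list of directory or file paths, one module per path.
    """
    norm_paths = [posixpath.normpath(p) for p in paths]
    modules = defaultdict(list)
    for func, source in sorted(func_to_source.items()):
        src = posixpath.normpath(source)
        for path in norm_paths:
            # A path matches the file itself or any file below that directory.
            if src == path or src.startswith(path + '/'):
                modules[path].append(func)
                break
    # One block per module: the module name followed by its functions.
    # Paths that matched no function are skipped.
    blocks = ['\n'.join([p] + modules[p]) for p in norm_paths if modules[p]]
    return '\n\n'.join(blocks) + '\n'
```

Functions whose source file matches none of the listed paths (`main` in a call such as `build_manifest({'foo': 'src/a/foo.c', 'main': 'src/main.c'}, ['src/a'])`) are simply left out of the manifest in this sketch.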

This also makes a small drive-by fix for `emsymbolizer`. Currently, when it is given address 0, it returns the location info associated with `offsets[-1]`, which is the largest offset. This PR fixes that and adds an optional `lower_bound` argument to `find_offset`, so that when we look up a source info entry we don't go below the start offset of the current function.
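To make the `offsets[-1]` problem concrete, here is a small sketch of a bisect-based lookup with a lower bound. It is only an illustration under assumptions and not the actual `emsymbolizer` code: the signature, the `None` return value, and the sorted `offsets` list are chosen for the example.

```python
import bisect


def find_offset(offsets, address, lower_bound=None):
    """Return the largest entry of sorted `offsets` that is <= address, else None."""
    idx = bisect.bisect_right(offsets, address)
    if idx == 0:
        # Every recorded offset is above the address. Indexing offsets[idx - 1]
        # here would wrap around to offsets[-1], the largest offset.
        return None
    offset = offsets[idx - 1]
    # Reject entries that start before the lower bound, for example the
    # start offset of the function that contains the address.
    if lower_bound is not None and offset < lower_bound:
        return None
    return offset
```

With `offsets = [10, 20, 30]`, `find_offset(offsets, 0)` returns `None` instead of wrapping around to `30`, and `find_offset(offsets, 25, lower_bound=22)` returns `None` because the nearest entry, `20`, lies below the bound.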

dschuff · 2025-09-15T20:16:29Z

tools/emsymbolizer.py

+      return None
+    # If lower bound is given, return the offset only if the offset is equal to
+    # or greater than the lower bound
+    if lower_bound:


Given that there's only one caller of this (and of lookup) and we don't anticipate any different use cases, maybe we should just simplify this by requiring lower_bound.

Another place is here:

emscripten/tools/emsymbolizer.py

Line 226 in eb6fcad

return sm.lookup(address)

What do we give for lower_bound? It doesn't have the current function offset.

Oh yeah, ok.
Maybe for the original "just symbolize a random address" emsymbolizer use case we can eventually do better than we are now (e.g. give some kind of warning if we end up finding a location that corresponds to a different function from the given address, because odds are good it's not what the user actually wanted). But that doesn't have to be for this PR.

dschuff · 2025-09-15T20:20:39Z

tools/emsymbolizer.py

-    assert module.read_string() == 'sourceMappingURL'
    # TODO: support stripping/replacing a prefix from the URL
-    URL = module.read_string()
+    URL = module.get_sourceMappingURL()


no need to add to this PR if things are working for you, but last time I tried to actually use emsymbolizer, I had to add something like

if not os.path.isfile(URL):
  URL = os.path.join(os.path.dirname(module.filename), URL)

probably because I was using relative paths everywhere.

This doesn't change anything for emsymbolizer (I just moved sourceMappingURL-getting code from emsymbolizer.py to webassembly.py) and there was no os.path.join(os.path.dirname, ...) in emsymbolizer.py. Where am I supposed to add it?

Right, I had to add it locally (I added it right here in emsymbolizer because that's where the code was until now). Again, this was just an FYI. Maybe I'll just try to reproduce the behavior and add a proper test.

Hmm, emsymbolizer has worked for me with no change so far. Yeah, please let me know if you find the conditions under which it becomes a problem.

sbc100 · 2025-09-15T21:42:37Z

Are we sure empath-split is the best name for this tool? Are we free to change the name after this test lands?

aheejin · 2025-09-15T22:53:32Z

> Are we sure empath-split is the best name for this tool? Are we free to change the name after this test lands?

I'm all for different suggestions. What do you prefer? I started with path-split, and then noticed that all scripts meant to be used by outside users had the prefix em, hence empath-split, but I don't particularly like the name.

And yeah, I think we can change the tool name even after landing, because this is currently an experimental tool for a few partners to try out and I don't intend to broadcast it to all users just yet.

dschuff · 2025-10-16T00:24:40Z

@aheejin it just occurred to me that this functionality (in whatever form it ultimately gets integrated into emcc) should probably allow having multiple path specifications per module, rather than just one module per file or directory. I'd say let's figure out the JS glue and the integration into emcc first though.

aheejin · 2025-10-17T22:07:53Z

Done in #25577.

aheejin requested review from dschuff and tlively September 15, 2025 17:08

aheejin added 4 commits September 15, 2025 17:09

Revert accidental change

ea300b2

comments

355fc60

Fix regex so that (import .. (func ..)) is not included

e6d39d6

comment typo

1246d2e

dschuff reviewed Sep 15, 2025

aheejin and others added 10 commits September 15, 2025 16:30

Update tools/empath-split.py

9cb04b7

Co-authored-by: Derek Schuff

Address comments

4440ebf

ruff fixes

829e697

Merge branch 'main' into path_split

a475a24

More ruff fix

757253d

Add generated empath-split

272885c

Merge branch 'main' into path_split

7a99839

Maybe I shouldn't add this after all

e47d6a4

Add the scripts to .gitignore

d20c87b

fix

e36b0f5

sbc100 reviewed Sep 16, 2025

aheejin added 4 commits September 17, 2025 01:03

Address comments

3743aa7

Remove try-finally

f5f2468

Ruff fix

be1de65

Merge branch 'main' into path_split

60c39d2

dschuff approved these changes Sep 18, 2025

aheejin and others added 4 commits September 18, 2025 12:08

Update tools/empath-split.py

a727aa1

Co-authored-by: Derek Schuff

Merge branch 'main' into path_split

0bce350

Merge branch 'main' into path_split

7957deb

Merge branch 'main' into path_split

29b8a17

aheejin merged commit d801296 into emscripten-core:main Sep 22, 2025
32 checks passed

aheejin deleted the path_split branch September 22, 2025 22:17
