Creating a self-contained MicroHs binary with embedded libraries #389
Replies: 10 comments 14 replies
-
|
Amazing hack! 👍 Your use case is a bit unusual. You want to build custom Until then, I think you should publish your work on hackage. I'm not sure exactly how, though. I think you could make something different by embedding entire |
-
|
Thank you for your positive feedback. I agree, it was an unusual use-case. Indeed, if I had been developing it in isolation, I probably wouldn't have wanted to go in that direction. But also the nice thing about MicroHs is that it is not so complex that it would dissuade you up front from trying this "hack".
Thanks for the advice I will look into the Also fyi, as an addendum to my post above: I subsequently managed to compress the embedded Haskell modules, application modules, ffi_wrapper, and static library using zstd. Prior to compilation, a one-off step compresses these files and embeds them in the C header; for runtime decompression, I created a VFS variant with zstd decompression support. This requires adding a 900k single-file zstd decompressor ( The results are promising: binary size reduced from 3.2 MB to 1.3 MB, with the embedded files compressed 5.2x (2.5 MB to 367 KB including 112 KB dictionary). |
-
|
I converted the issue to a discussion. |
-
|
@augustss wrote
Do you mean macOS .pkg files here? |
-
|
I'm not able to build a .pkg for some reason. I tried the following script, run from the root of the
#!/usr/bin/env bash
THIRDPARTY=`pwd`/thirdparty
MHS_MIDI=`pwd`/projects/mhs-midi
MHSDIR=${THIRDPARTY}/MicroHs ${THIRDPARTY}/MicroHs/bin/mhs \
-a ${MHS_MIDI}/lib \
-P music.pkg \
${MHS_MIDI}/lib/Music.hs
Note that .. and I get this error:
~/projects/midi-langs/thirdparty/MicroHs/bin/mhs: uncaught exception: error: no location: undefined export: main |
-
|
Thanks for your advice. I managed to get a package compiled after some time. There was a tricky bug that prevented one of the files from being compiled into the package, which turned out to be caused by a name collision of an inner function
thirdparty/MicroHs/bin/mhs: uncaught exception: error: "projects/mhs-midi/lib/MidiPerform.hs": line 402, col 54:
found: pattern
expected: { . LQIdent ( UQIdent [ literal _primitive @ (# \ case let if QualDo do mdo QSymOper ` :: ∷ where ; eof
After figuring this out, the compilation worked well with this script:
#!/usr/bin/env bash
# Note -P <entry> must be of form <name>-<semver>
PKG_VER=0.1.0
PKG_NAME=music
THIRDPARTY=`pwd`/thirdparty
MHS_MIDI=`pwd`/projects/mhs-midi
MHSDIR=${THIRDPARTY}/MicroHs ${THIRDPARTY}/MicroHs/bin/mhs -P${PKG_NAME}-${PKG_VER} -i${MHS_MIDI}/lib \
Async Midi MidiPerform Music MusicPerform \
-o ${PKG_NAME}.pkg
I think embedding the |
-
Embedding Precompiled .pkg Files in a Standalone MicroHs Binary
This document provides some details about the development of The goal was to improve on the earlier demonstration of embedding the base Haskell source files, app source files, MicroHs runtime, and statically compiled FFI dependencies.
Summary
The standalone mhs-midi binaries embed Haskell source files using a Virtual Filesystem (VFS), but this requires parsing and compiling ~274 modules on first run, resulting in a ~20 second cold-start delay. This document provides details about the development of
Key Results:
Implementation Challenges Solved:
When to Use:
Background: The Cold Start Problem
The standalone mhs-midi binaries embed all necessary Haskell source files (~274 .hs files from Subsequent runs load from
MicroHs Package System
MicroHs supports precompiled packages via the
PKG_VER=0.1.0
PKG_NAME=music
THIRDPARTY=`pwd`/thirdparty
MHS_MIDI=`pwd`/projects/mhs-midi
MHSDIR=${THIRDPARTY}/MicroHs ${THIRDPARTY}/MicroHs/bin/mhs \
-P${PKG_NAME}-${PKG_VER} \
-i${MHS_MIDI}/lib \
-o ${PKG_NAME}-${PKG_VER}.pkg \
Async Midi MidiPerform Music MusicPerform
This creates a To pre-compile the
cd path/to/MicroHs
make install
Then it becomes possible to load packages at runtime with
~/.mcabal/bin/mhs -pbase-0.15.2.0
# OR
~/.mcabal/bin/mhs -pbase
The key point is that in the default installation of the When using
MHSDIR=./thirdparty/MicroHs ./thirdparty/MicroHs/bin/mhs \
-a"$HOME/.mcabal/mhs-0.15.2.0" -pbase
The
The Local Dev Binary Problem
However, there's a critical limitation: the local dev binary still recompiles all modules from source, even when packages are preloaded:
Installed binary (works correctly):
~/.mcabal/bin/mhs -pbase
Loading package /Users/sa/.mcabal/mhs-0.15.2.0/packages/base-0.15.2.0.pkg
Type ':quit' to quit, ':help' for help
>
Local dev binary (recompiles everything despite -p flag):
MHSDIR=./thirdparty/MicroHs ./thirdparty/MicroHs/bin/mhs -a"$HOME/.mcabal/mhs-0.15.2.0" -pbase
Welcome to interactive MicroHs, version 0.15.2.0
importing done Data.Bool_Type, 3ms
importing done Data.Ordering_Type, 2ms
importing done Primitives, 148ms
... (continues loading all modules from source)
When This limitation is why the VFS embedding approach was necessary: it is the only way to achieve fast startup with a fully self-contained binary that doesn't depend on external paths.
Implementation Strategy
The existing VFS intercepts The solution was to embed packages AND intercept all file operations so MicroHs has no access to external source files, forcing it to use the preloaded packages.
Challenge 1: Package Discovery via Directory Operations
Initial testing with embedded
Package not found: base
Debugging revealed that MicroHs discovers packages by scanning the
-- From MicroHs/src/MicroHs/Packages.hs
findPkgs :: FilePath -> IO [FilePath]
findPkgs path = do
  let pkgdir = path </> "packages"
  fs <- getDirectoryContents pkgdir
  return [pkgdir </> f | f <- fs, ".pkg" `isSuffixOf` f]
This calls
Solution: Virtual Directory Listing
Extended
typedef struct {
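    int index;       /* next entry in embedded_packages to return */
    int is_virtual;  /* set to 1 for handles created by mhs_opendir */
} VirtualDir;

void* mhs_opendir(const char* path) {
    if (is_virtual_packages_dir(path)) {
        VirtualDir* vd = malloc(sizeof(VirtualDir));
        if (vd == NULL) return NULL;
        vd->index = 0;
        vd->is_virtual = 1;
        return vd;
    }
    return opendir_orig(path);
}

struct dirent* mhs_readdir(void* dirp) {
    /* Simplified: this reads is_virtual through a pointer that may be a real
       DIR*, so a complete version needs a safe way to tell the two apart. */
    VirtualDir* vd = dirp;
    if (vd->is_virtual) {
        /* Return embedded package names from the table; a NULL path ends it. */
        if (embedded_packages[vd->index].path) {
            static struct dirent de;
            /* Bounded copy so a long filename cannot overflow d_name. */
            snprintf(de.d_name, sizeof(de.d_name), "%s",
                     embedded_packages[vd->index++].filename);
            return &de;
        }
        return NULL;
    }
    return readdir_orig(dirp);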
}
Packages were now discovered. But loading still failed.
Challenge 2: Module-to-Package Mapping
Module not found: Prelude
MicroHs doesn't scan package contents to find modules. Instead, it uses
~/.mcabal/mhs-0.15.2.0/
Prelude.txt # Contains: "base-0.15.2.0.pkg"
Data/List.txt # Contains: "base-0.15.2.0.pkg"
Control/Monad.txt # Contains: "base-0.15.2.0.pkg"
...
When searching for module
Solution: Embed .txt Mapping Files
Updated
from pathlib import Path

def collect_txt_files(base_dir: Path) -> list[tuple[str, Path]]:
    """Collect all .txt module mapping files."""
    txt_files = []
    for txt_path in base_dir.rglob("*.txt"):
        rel_path = txt_path.relative_to(base_dir)
        # Check the relative path, so a "packages" directory above base_dir
        # does not cause every file to be skipped.
        if "packages" in rel_path.parts:
            continue  # Skip .txt inside package directories
        txt_files.append((str(rel_path), txt_path))
    return sorted(txt_files)
For custom modules (Midi, Music, etc.), synthetic
def generate_music_txt_files(pkg_name: str, modules: list[str]):
    return [(f"{module}.txt", pkg_name.encode()) for module in modules]
The VFS serves these as small strings:
static const char txt_Prelude_txt[] = "base-0.15.2.0.pkg";
static const char txt_Data_List_txt[] = "base-0.15.2.0.pkg";
// ...
Challenge 3: Compilation Support
With packages loading correctly, the REPL started in ~1 second. But the compilation test failed:
mhs-midi-pkg -o /tmp/test /tmp/Test.hs
# Error: Cannot find runtime files for cc
MicroHs compiles Haskell to C, then invokes
Solution: Hybrid Embedding
The solution embeds both precompiled packages AND runtime source files:
def collect_runtime_files(base_dir: Path) -> list[tuple[str, Path]]:
    """Collect runtime source files from packages/mhs-X.Y.Z/data/src/runtime/"""
    runtime_files = []
    packages_dir = base_dir / "packages"
    for mhs_dir in packages_dir.iterdir():
        if mhs_dir.name.startswith("mhs-"):
            runtime_dir = mhs_dir / "data" / "src" / "runtime"
            if runtime_dir.exists():
                for file_path in runtime_dir.rglob("*"):
                    if file_path.is_file():
                        rel_path = file_path.relative_to(runtime_dir)
                        vfs_path = f"src/runtime/{rel_path}"
                        runtime_files.append((vfs_path, file_path))
    return sorted(runtime_files)
Static libraries are also embedded for linking:
# CMake handles this automatically, but the equivalent command is:
./mhs-embed output.h --pkg-mode \
--pkg packages/base-0.15.2.0.pkg=build/mcabal/mhs-0.15.2.0/packages/base-0.15.2.0.pkg \
--pkg packages/music-0.1.0.pkg=build/music-0.1.0.pkg \
--txt-dir build/mcabal/mhs-0.15.2.0 \
--music-modules music-0.1.0.pkg:Async,Midi,MidiPerform,Music,MusicPerform \
--lib build/libmidi_ffi.a \
--lib build/liblibremidi.a \
--header src/midi_ffi.h
During compilation, the VFS extracts these to a temporary directory for
Final Embedded Content (MHS_USE_PKG)
Performance Comparison
Startup Time
Key insight:
Binary Size
Feature Matrix
Error Messages and Debugging
Error messages and debugging information are functionally identical across all build modes:
The only difference is in errors originating from Prelude/library code: # Source embedding mode (VFS path):
"/mhs-embedded/lib/Data/List.hs",389:11: head: empty list
# PKG mode (original compile-time path):
"/Users/.../thirdparty/MicroHs/lib/Data/List.hs",389:11: head: empty list
Line and column numbers are preserved in both cases. There is no degradation in debugging capability when using precompiled packages.
Build Instructions
# Build all variants
make mhs-midi-all
# Or build specific variants:
make mhs-midi-src # Source embedding (uncompressed)
make mhs-midi-src-zstd # Source embedding (compressed, smallest)
make mhs-midi-pkg # Package embedding (fast startup)
make mhs-midi-pkg-zstd # Package + compression (recommended)
# Or using cmake directly:
cmake --build build --target mhs-midi-all
cmake --build build --target mhs-midi-pkg-zstd
Variant Summary:
Prerequisites for MHS_USE_PKG
The package mode requires MicroHs to be built (but not installed):
cd thirdparty/MicroHs
make
The CMake build handles everything else automatically:
This keeps the source tree clean and avoids modifying the user's home directory.
Conclusion
Embedding precompiled
The result is a ~20x improvement in cold-start time (20s to 1s) while maintaining full functionality including the ability to compile Haskell programs to standalone executables. For distribution to end users, |
-
|
@augustss I posted a full description of steps to embed .pkg files in mhs-midi above. This is currently working and tested on linux and macOS. Windows compiles the non-embedded case but not the embedded ones because it doesn't support |
-
|
To help make this method more general for other MicroHs users, I have created a project, mhs-embed, which creates MicroHs standalones outside the context of my So far it can create source-based standalones only. I will add |
-
|
@augustss fyi, I have added to mhs-embed the
make example-pkg-zstd Build example-pkg-zstd standalone binary
make example-pkg Build example-pkg standalone binary
make example-src-zstd Build example-src-zstd standalone binary
make example-src Build example-src standalone binary
make example Build example REPL binary only |
-
This document describes how to create a self-contained MicroHs-based application that embeds all MicroHaskell libraries and can compile programs to standalone executables without any external file dependencies.
Background
mhs-midi is a MIDI-oriented music programming environment built on MicroHs. It is one of several language implementations that compile into convenient, self-contained executables. In its initial form, mhs-midi required extensive manual configuration, including specifying multiple directories such as MHSDIR, which points to the original MicroHs installation. To simplify this setup, a Python wrapper script was introduced. This script handled environment configuration and served as a REPL and compiler frontend for running and compiling MIDI-focused Haskell programs, while also integrating with the C/C++ libremidi library via FFI.
Despite these improvements, mhs-midi remained an outlier: relocating it without friction was still difficult. Ideally, it should be distributable as a single executable that users can download and run immediately, without installing MicroHs or configuring MHSDIR. To explore this possibility, an experiment was designed with the following objectives:
The Approach: Virtual Filesystem with fmemopen
MicroHs reads library files through the mhs_fopen FFI function in eval.c. The proposed approach intercepts this function to serve embedded files from memory.
Step 1: Embed Files as C Arrays
A script converts all .hs files to a C header. We provide both Python and C implementations:
This generates a header like:
C Implementation (for MicroHs Integration)
A pure C implementation (scripts/embed_libs.c) is also provided, suitable for integration into MicroHs itself:
Usage:
Output:
The C implementation has no dependencies beyond libc and is portable to any POSIX system.
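Step 1 as a whole can be sketched in Python along the following lines. This is a minimal illustration, not the project's actual embed_libs.py: the names collect_hs_files, c_identifier, render_header, EmbeddedFile and embedded_files are all hypothetical, and the sketch assumes ASCII file paths that do not collide once non-alphanumeric characters are replaced by underscores. Files are read as raw bytes and emitted as decimal byte values, which sidesteps any string-escaping questions.

```python
from pathlib import Path

VFS_ROOT = "/mhs-embedded/lib"  # virtual prefix used in this write-up


def collect_hs_files(lib_dir: Path) -> dict[str, bytes]:
    """Read every .hs file under lib_dir as raw bytes, keyed by relative path."""
    return {
        p.relative_to(lib_dir).as_posix(): p.read_bytes()
        for p in sorted(lib_dir.rglob("*.hs"))
    }


def c_identifier(rel_path: str) -> str:
    """Map 'Data/List.hs' to a C identifier such as 'embedded_Data_List_hs'."""
    return "embedded_" + "".join(c if c.isalnum() else "_" for c in rel_path)


def render_header(files: dict[str, bytes]) -> str:
    """Render one byte array per file plus a NULL-terminated lookup table."""
    lines = ["#include <stddef.h>", ""]
    table = []
    for rel_path in sorted(files):
        data = files[rel_path]
        name = c_identifier(rel_path)
        # An empty initializer list is not portable C, so pad empty files
        # with a single 0 byte; the recorded size stays 0.
        body = ", ".join(str(b) for b in data) or "0"
        lines.append(f"static const unsigned char {name}[] = {{ {body} }};")
        table.append(f'    {{ "{VFS_ROOT}/{rel_path}", {name}, {len(data)} }},')
    lines += [
        "",
        "typedef struct { const char* path; const unsigned char* data; size_t size; } EmbeddedFile;",
        "static const EmbeddedFile embedded_files[] = {",
        *table,
        "    { NULL, NULL, 0 }",
        "};",
    ]
    return "\n".join(lines) + "\n"
```

A caller would write `render_header(collect_hs_files(Path("lib")))` to a header file once, before compiling the C sources that include it.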
Step 2: Virtual Filesystem Using fmemopen
The VFS serves embedded files as FILE* streams:
Step 3: Intercept MicroHs FFI
As MicroHs calls mhs_fopen for all file operations, rename the original and provide an override:
Step 4: Set MHSDIR to Virtual Root
At startup, point MHSDIR to the virtual filesystem:
MicroHs now constructs paths like /mhs-embedded/lib/Prelude.hs, which the VFS intercepts and serves from memory.
Challenge 1: UTF-8 Encoding Bug
Symptom: After loading ~163 of 185 modules, we got ERR: getb_utf8.
Root Cause: The embedding script had two bugs:
The file Data/Bifunctor.hs contains the Unicode character ≡ (U+2261). Python's len() on a string counts code points, not bytes, so the recorded length was too short. And C octal escapes only cover the values 0-377 octal (0-255 decimal), that is, a single byte, so a code point above that range cannot be written as one escape.
Solution: Work with bytes, not strings:
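The difference between the two views of the same text can be shown with a short Python sketch. It is an illustration of the bytes-based approach rather than the actual fix in the embedding script; c_string_literal is a hypothetical helper name. Each byte is written as a three-digit octal escape, so every escape stays within 000-377 and cannot absorb a digit that follows it.

```python
def c_string_literal(data: bytes) -> str:
    """Encode raw bytes as a C string literal using 3-digit octal escapes."""
    return '"' + "".join(f"\\{b:03o}" for b in data) + '"'


source = "a \u2261 b"            # contains U+2261, as Data/Bifunctor.hs does
encoded = source.encode("utf-8")  # the bytes MicroHs will actually read

print(len(source))    # 5 code points
print(len(encoded))   # 7 bytes: U+2261 takes three bytes in UTF-8
print(c_string_literal("\u2261".encode("utf-8")))  # "\342\211\241"
```

Recording `len(encoded)` rather than `len(source)` as the embedded size keeps the length consistent with the data that is emitted.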
Challenge 2: Compilation to Executable
When users compile with -o program (not -o program.c), MicroHs invokes cc, which needs real files on disk. The VFS only works within the MicroHs process.
Solution: Detect compilation mode and extract to temp directory:
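The control flow of this solution can be sketched in Python as follows. The real implementation lives in C inside the standalone binary; needs_extraction and extract_to_temp are hypothetical names, and the sketch assumes that the output is given either as `-o FILE` or as `-oFILE`. After extraction, MHSDIR would be pointed at the returned directory so that cc finds real files.

```python
import tempfile
from pathlib import Path


def needs_extraction(argv: list[str]) -> bool:
    """True when the -o target is an executable rather than a .c file."""
    for i, arg in enumerate(argv):
        if arg == "-o" and i + 1 < len(argv):
            return not argv[i + 1].endswith(".c")
        if arg.startswith("-o") and len(arg) > 2:
            return not arg.endswith(".c")
    return False  # no -o at all: REPL or run mode, the VFS is enough


def extract_to_temp(files: dict[str, bytes]) -> Path:
    """Write every embedded file below a fresh temporary directory."""
    root = Path(tempfile.mkdtemp(prefix="mhs-embedded-"))
    for rel_path, data in files.items():
        dest = root / rel_path
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)
    return root
```

The caller is responsible for removing the temporary directory once compilation has finished.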
Challenge 3: Linking MIDI Libraries
For our MIDI project, compiled executables need to link against libremidi and other libraries. We embed the static libraries (.a files) and inject linker flags:
Generalizing for Other Projects
This approach turned out to be quite useful, and can be adapted for any MicroHs-based application:
1. Minimal Standalone (REPL/Run only)
If you only need REPL and -r (run) modes, you just need:
embed_libs.py or embed_libs.c - embed your .hs libraries
vfs.c - the fmemopen-based VFS
mhs_ffi_override.c - intercept mhs_fopen
patch_eval_vfs.py - rename original mhs_fopen
This gives you a ~1MB overhead for the MicroHs standard library.
2. Full Standalone (with Compilation)
To support -o executable:
src/runtime/*.c and src/runtime/*.h
vfs_extract_to_temp() function
-o without .c suffix and extract
3. With Additional Libraries
If your application has C dependencies:
.a files) as binary content
-optl flags when compiling to executable
Potential MicroHs Enhancement
A potential enhancement to MicroHs itself could be a compile-time option to embed libraries:
This would generate a single C file with embedded libraries, eliminating the need for external VFS machinery. The generated REPL would be truly standalone.
The C implementation, embed_libs.c (~500 lines, no dependencies), could be integrated directly into MicroHs to provide this functionality. It handles:
.hs and .hs-boot files
--runtime)
--lib)
--header)
Results
The standalone binary:
-C caching)
Files
Acknowledgments
Thanks to Lennart Augustsson for creating MicroHs, which makes this kind of embedding possible through its use of combinators, clean FFI design and single-file C output.
The exploratory work to develop the standalone mhs-midi implementation was greatly accelerated by the use of claude-code.
References