diff --git a/README.md b/README.md
index 180c100e..cb218280 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,7 @@
 
 [![build](https://github.com/keichi/binary-parser/workflows/build/badge.svg)](https://github.com/keichi/binary-parser/actions?query=workflow%3Abuild)
 [![npm](https://img.shields.io/npm/v/binary-parser)](https://www.npmjs.com/package/binary-parser)
+[![status](https://joss.theoj.org/papers/ec35c0e3ccc750a5cdab9771e5a6bf21/status.svg)](https://joss.theoj.org/papers/ec35c0e3ccc750a5cdab9771e5a6bf21)
 
 Binary-parser is a parser builder for JavaScript that enables you to write
 efficient binary parsers in a simple and declarative manner.
diff --git a/paper/benchmark.pdf b/paper/benchmark.pdf
new file mode 100644
index 00000000..dbc4dde6
Binary files /dev/null and b/paper/benchmark.pdf differ
diff --git a/paper/paper.bib b/paper/paper.bib
new file mode 100644
index 00000000..02da5499
--- /dev/null
+++ b/paper/paper.bib
@@ -0,0 +1,100 @@
+@misc{djiparsetxt,
+  author = {Christian Velez},
+  title = {Decrypts and parse DJI logs in node},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  year = {2020},
+  url = {https://github.com/chrisvm/node-djiparsetxt}
+}
+
+@misc{libsbp,
+  author = {{Swift Navigation}},
+  title = {Swift Binary Protocol client libraries},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  year = {2021},
+  url = {https://github.com/swift-nav/libsbp}
+}
+
+@misc{nimrod,
+  author = {Starbeamrainbowlabs},
+  title = {Data downloader for the 1km NIMROD rainfall radar data},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  year = {2021},
+  url = {https://github.com/sbrl/nimrod-data-downloader}
+}
+
+@misc{flexradio,
+  author = {Stephen Houser},
+  title = {NodeRed nodes for working with FlexRadio 6xxx series software defined radios},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  year = {2021},
+  url = {https://github.com/stephenhouser/node-red-contrib-flexradio}
+}
+
+@misc{linky,
+  author = {Zehir},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  year = {2021},
+  url = {https://github.com/Zehir/eesmart-d2l}
+}
+
+@misc{maxcul,
+  author = {Florian Beek},
+  title = {A pimatic Plugin to control MAX! Heating devices over a Busware CUL stick},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  year = {2020},
+  url = {https://github.com/fbeek/pimatic-maxcul}
+}
+
+@misc{kaitai,
+  author = {{Kaitai team}},
+  title = {Kaitai Struct: declarative language to generate binary data parsers},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  year = {2021},
+  url = {https://github.com/kaitai-io/kaitai_struct}
+}
+
+@inproceedings{nail,
+  author={Bangert, Julian and Zeldovich, Nickolai},
+  booktitle={2014 IEEE Security and Privacy Workshops},
+  title={Nail: A Practical Interface Generator for Data Formats},
+  year={2014},
+  pages={158-166},
+  doi={10.1109/SPW.2014.31}
+}
+
+@inproceedings{nom,
+  author={Couprie, Geoffroy},
+  booktitle={2015 IEEE Security and Privacy Workshops},
+  title={Nom, A Byte oriented, streaming, Zero copy, Parser Combinators Library in Rust},
+  year={2015},
+  pages={142-148},
+  doi={10.1109/SPW.2015.31}
+}
+
+@inproceedings{parsifal,
+  author={Levillain, Olivier},
+  booktitle={2014 IEEE Security and Privacy Workshops},
+  title={Parsifal: A Pragmatic Solution to the Binary Parsing Problems},
+  year={2014},
+  pages={191-197},
+  doi={10.1109/SPW.2014.35}
+}
+
+@article{monadic,
+  title={Monadic parsing in Haskell},
+  volume={8},
+  doi={10.1017/S0956796898003050},
+  number={4},
+  journal={Journal of Functional Programming},
+  publisher={Cambridge University Press},
+  author={Hutton, Graham and Meijer, Erik},
+  year={1998},
+  pages={437–444}
+}
diff --git a/paper/paper.md b/paper/paper.md
new file mode 100644
index 00000000..13aeabfe
--- /dev/null
+++ b/paper/paper.md
@@ -0,0 +1,102 @@
+---
+title: 'Binary-parser: A declarative and efficient parser generator for binary data'
+tags:
+  - JavaScript
+  - TypeScript
+  - binary
+  - parser
+authors:
+  - name: Keichi Takahashi
+    orcid: 0000-0002-1607-5694
+    affiliation: 1
+affiliations:
+  - name: Nara Institute of Science and Technology
+    index: 1
+date: 27 September 2021
+bibliography: paper.bib
+---
+
+# Summary
+
+This paper presents `binary-parser`, a JavaScript/TypeScript library that
+allows users to write high-performance binary parsers and facilitates the
+rapid prototyping of research software that works with binary files and
+network protocols. `Binary-parser`'s declarative API is designed such that
+expressing complex binary structures is straightforward. In addition
+to high productivity, `binary-parser` uses meta-programming to
+dynamically generate parser code and achieve parsing performance equivalent
+to that of a hand-written parser. `Binary-parser` is used by over 700 GitHub
+repositories and 120 npm packages as of September 2021.
+
+# Statement of need
+
+Parsing binary data is a ubiquitous task in developing research software. Many
+scientific instruments and software tools use proprietary file formats and
+network protocols, while open-source libraries to work with them are often
+unavailable or limited. In such situations, the programmer has no choice but
+to write a binary parser. However, writing a binary parser by hand is
+error-prone and tedious because the programmer faces challenges such as
+understanding the specification of the binary format, correctly managing the
+byte/bit offsets during parsing, and constructing complex data structures as
+outputs.
+
+`Binary-parser` significantly reduces the programmer's effort by automatically
+generating efficient parser code from a declarative description of the binary
+format supplied by the user. The generated parser code is converted to a
+JavaScript function and executed for efficient parsing. To accommodate the diverse
+needs of different users, `binary-parser` exposes various options to ensure
+flexibility and provide opportunities for customization.
+
+A large number of software packages have been developed using `binary-parser`,
+which demonstrates its usefulness and practicality. Some examples include
+libraries and applications to work with rainfall radars [@nimrod],
+software-defined radio [@flexradio], GNSS receivers [@libsbp], smart meters
+[@linky], drones [@djiparsetxt], and thermostats [@maxcul].
+
+# Design
+
+`Binary-parser`'s design is characterized by the following three key features:
+
+1. **Fast**: `Binary-parser` takes advantage of meta-programming to generate
+   JavaScript source code at runtime from the user's description of the
+   target binary format. The generated source code is then passed to the
+   `Function` constructor to dynamically create a function that performs
+   parsing. This design enables `binary-parser` to achieve parsing
+   performance comparable to that of a hand-written parser.
+2. **Declarative**: As opposed to parser combinator libraries [@monadic; @nom],
+   `binary-parser` allows the user to express the target binary format in a
+   declarative manner, similar to a human-readable network protocol or file
+   format specification. The user can combine _primitive_ parsers (integers,
+   floating point numbers, bit fields, strings and bytes) using _composite_
+   parsers (arrays, choices, nests and pointers) to express a wide variety of
+   binary formats.
+3. **Flexible**: Unlike binary parser generators that use an external Domain
+   Specific Language (DSL) [@kaitai; @nail], `binary-parser` uses an internal
+   DSL implemented on top of JavaScript. This design allows the user to
+   specify most parsing options as return values of user-defined JavaScript
+   functions that are invoked at runtime. For example, the offset and length
+   of a field can be computed from another field that has been parsed already.
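The code-generation idea described under **Fast** can be illustrated with a minimal sketch. It is not `binary-parser`'s actual implementation: the field description format and the helper names `generateSource` and `compile` are hypothetical, chosen only to show how parser source can be assembled as a string and compiled with the `Function` constructor. It assumes a Node.js runtime, where `Buffer` is a global.

```javascript
// Hypothetical field description: [name, Buffer read method, size in bytes].
// This mimics the idea only; it is not binary-parser's internal format.
const fields = [
  ["x", "readInt16LE", 2],
  ["y", "readInt16LE", 2],
  ["z", "readInt16LE", 2],
];

// Build the source of a parser body as a string, one statement per field.
function generateSource(fields) {
  let offset = 0;
  const lines = ["const result = {};"];
  for (const [name, method, size] of fields) {
    lines.push(`result[${JSON.stringify(name)}] = buffer.${method}(${offset});`);
    offset += size;
  }
  lines.push("return result;");
  return lines.join("\n");
}

// Compile the generated source once; the offsets are baked in as constants.
function compile(fields) {
  return new Function("buffer", generateSource(fields));
}

const parsePoint = compile(fields);
const buf = Buffer.from([0x01, 0x00, 0x02, 0x00, 0xff, 0xff]);
console.log(parsePoint(buf)); // { x: 1, y: 2, z: -1 }
```

Because the description is walked only once, at compile time, the resulting function contains straight-line reads with constant offsets, which is plausibly why a generated parser can approach hand-written performance. The sketch interpolates the description into source code, so it should only be fed trusted descriptions.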
+
+# Performance evaluation
+
+To evaluate the parsing performance of `binary-parser`, we implemented a small
+parser using `binary-parser` (v2.0.1) and three major JavaScript binary parser
+libraries: `binparse` (v1.2.1), `structron` (v0.4.3) and `destruct.js` (v0.2.9).
+We also implemented the same parser using Node.js's Buffer API as a baseline.
+The binary data to be parsed was an array of 1,000 coordinates (each expressed
+as three 16-bit integers) preceded by the number of coordinates (a 32-bit
+integer). The benchmarks were executed on a MacBook Air (Apple M1 CPU, 2020).
+The JavaScript runtime was Node.js (v16.9.1).
+
+![Performance comparison of binary-parser, binparse, structron, destruct.js and a hand-written parser.\label{fig:benchmark}](benchmark.pdf){ width=80% }
+
+\autoref{fig:benchmark} shows the measurement results.
+`Binary-parser` outperforms the alternative libraries by a factor of
+7.5$\times$ to 180$\times$. The plot also shows that `binary-parser`
+achieves performance equal to that of a hand-written parser.
+
+# Acknowledgments
+
+This work was partly supported by JSPS KAKENHI Grant Number JP20K19808.
+
+# References