Deduplicate VM constants #712

triallax · 2025-04-07T19:24:41Z

(except when defining derived unit constants, as initially identical but temporary values)

Don't see any real performance changes from this (if anything it's probably a minor slowdown due to the added dupe check in add_constant). However, it does reduce the number of constants generated from the prelude from ~730 to ~320, a reduction of 56%. This in theory allows for larger Numbat programs as it's now harder to hit the ~65k constants ceiling. Worth it? I don't know :)

(except when defining derived unit constants, as initially identical but temporary values) Don't see any real performance changes from this (if anything it's probably a minor slowdown due to the added dupe check in `add_constant`). However, it does reduce the number of constants generated from the prelude from ~730 to ~320, a reduction of 56%. This in theory allows for larger Numbat programs as it's now harder to hit the ~65k constants ceiling. Worth it? I don't know :)

Goju-Ryu · 2025-04-08T16:43:29Z

This looks great.
I’m wondering if using an IndexSet would be possible? If it is I think it could simplify the code a bit and possibly address the (potential) performance impact you mentioned.
I’m not familiar enough with it to know its limitations, so this is mostly just my musings of curiosity.

triallax · 2025-04-08T18:22:26Z

i've already thought of that, unfortunately sets would not work here due to the dummy temporary constants added for derived base units, which are identical until processed at vm runtime; with a set these dummies would be incorrectly duplicated, at least as the code is now

triallax · 2025-04-08T18:26:44Z

the performance impact is nonexistent with my simple testing but i suppose algorithmically speaking the performance wil worsen much faster the more constants you have, so perhaps we can consider somehow eliminating the need for dummies here, it would also solve the pre-existing todo

Goju-Ryu · 2025-04-08T21:16:09Z

Aah, that's too bad. Thank you for the explanation though. I’m glad the impact is negligible, at currently realistic scale. With a linear time complexity, I think the trade off is worth it even at larger scales just due to the current constraint on the number of constants. It would be interesting to test the effects at scales large enough to approach the maximum number of constants without the duplication check.

triallax · 2025-04-08T21:42:08Z

to be clear i meant my previous testing (basically only the prelude)

however now i'm curious to get some results on more pathological cases so i did these rudimentary benchmarks:

$ seq 1 10000 | awk '{ print "let _" $0 " = " $0 }' > f
$ hyperfine " ./src/numbat/numbat-old  f"
Benchmark 1:  ./src/numbat/numbat-old  f
  Time (mean ± σ):      3.149 s ±  0.029 s    [User: 3.128 s, System: 0.018 s]
  Range (min … max):    3.100 s …  3.209 s    10 runs
 
$ hyperfine " ./src/numbat/numbat-better  f"
Benchmark 1:  ./src/numbat/numbat-better  f
  Time (mean ± σ):      3.219 s ±  0.029 s    [User: 3.200 s, System: 0.015 s]
  Range (min … max):    3.182 s …  3.267 s    10 runs
$ # same test but strings
$ seq 1 10000 | awk '{ print "let _" $0 " = \"foo" $0 "\"" }' > f
$ hyperfine " ./src/numbat/numbat-old  f" "./src/numbat/numbat-better f"
Benchmark 1:  ./src/numbat/numbat-old  f
  Time (mean ± σ):      2.008 s ±  0.027 s    [User: 1.991 s, System: 0.015 s]
  Range (min … max):    1.976 s …  2.066 s    10 runs
 
Benchmark 2: ./src/numbat/numbat-better f
  Time (mean ± σ):      2.332 s ±  0.033 s    [User: 2.314 s, System: 0.017 s]
  Range (min … max):    2.296 s …  2.401 s    10 runs
 
Summary
   ./src/numbat/numbat-old  f ran
    1.16 ± 0.02 times faster than ./src/numbat/numbat-better f

so there's a slowdown for sure (more noticeable on the strings version unsurprisingly), but it's not very massive and on a pathological case, but we might be able to do some work here still. still, i'm not sure if this pull request is worth it in the first place, i'm not sure what practical benefits it actually brings

also, i'm surprised the strings version is faster in the first place! i wonder why that is

triallax · 2025-04-08T21:48:02Z

also, tangential side note: piping the awk output directly into numbat is slower by an order of magnitude than reading it from a file:

$ time seq 1 10000 | awk '{ print "let _" $0 " = " $0 }' | numbat

________________________________________________________
Executed in   21.75 secs    fish           external
   usr time   21.66 secs    1.48 millis   21.66 secs
   sys time    0.09 secs    0.00 millis    0.09 secs

definitely worth a look

Goju-Ryu · 2025-04-09T09:37:27Z

That is a pretty minor impact at that scale, especially considering that the tests specifically avoid anything but the worst case scenario.
In a real life scenario, if we assume that the rate of occurrence of individual constant values are similar to the prelude (~40% being numbers 0-9), we would see even lower impact due to finding these values in the beginning of the list instead of having to traverse the whole thing on every constant.

also, i'm surprised the strings version is faster in the first place! i wonder why that is

That one surprised me too. I wonder if choosing representation and handed of units (or the lack there of) is having an impact here?

triallax · 2025-04-09T21:10:50Z

That one surprised me too. I wonder if choosing representation and handed of units (or the lack there of) is having an impact here?

my guess is possibly relating to number parsing, but i'm not sure if that can account for a 1 second difference
i'll take a closer look once i have the time

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deduplicate VM constants #712

Deduplicate VM constants #712

triallax commented Apr 7, 2025

Goju-Ryu commented Apr 8, 2025

triallax commented Apr 8, 2025

triallax commented Apr 8, 2025 •

edited

Loading

Goju-Ryu commented Apr 8, 2025

triallax commented Apr 8, 2025 •

edited

Loading

triallax commented Apr 8, 2025

Goju-Ryu commented Apr 9, 2025

triallax commented Apr 9, 2025

Deduplicate VM constants #712

Are you sure you want to change the base?

Deduplicate VM constants #712

Conversation

triallax commented Apr 7, 2025

Goju-Ryu commented Apr 8, 2025

triallax commented Apr 8, 2025

triallax commented Apr 8, 2025 • edited Loading

Goju-Ryu commented Apr 8, 2025

triallax commented Apr 8, 2025 • edited Loading

triallax commented Apr 8, 2025

Goju-Ryu commented Apr 9, 2025

triallax commented Apr 9, 2025

triallax commented Apr 8, 2025 •

edited

Loading

triallax commented Apr 8, 2025 •

edited

Loading