Optimize TextBuffer.contentsAsDouble() #346

CodingFabian · 2017-01-12T23:45:40Z

Currently, TextBuffer.contentsAsDouble() first creates a String, which is then passed to Double.parseDouble().

I know floating point parsing is painful but if you are looking for optimization opportunities, this is one.
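For readers unfamiliar with the pattern, here is a minimal sketch of the approach described above. It is not Jackson's actual code: the class and method names are hypothetical, and the buffer is assumed to be a single contiguous char[].

```java
final class ContentsAsDoubleSketch {
    // Hypothetical stand-in for the behavior under discussion:
    // copy the relevant chars into a temporary String, then let the JDK parse it.
    static double parseViaString(char[] buf, int offset, int len) {
        String text = new String(buf, offset, len); // allocates a short-lived String per number
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        char[] buf = "[12.5,-3.25e2]".toCharArray();
        System.out.println(parseViaString(buf, 1, 4)); // prints 12.5
        System.out.println(parseViaString(buf, 6, 7)); // prints -325.0
    }
}
```

Every number read this way produces one temporary String, which is the garbage this issue is about.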

cowtowncoder · 2017-01-13T04:28:40Z

I'd be happy to merge a PR if you have something that can be shown to materially improve performance.

My understanding, for what it is worth, is that decoding a base-10 textual representation into a binary floating-point number is often an order of magnitude or more slower than the cost of the memory allocation (and the resulting GC).
This may be why the JDK does not expose parse methods that accept char[] as input.
And whereas decoding ints and longs manually is simple, floating-point arithmetic is REALLY complicated. If you are interested, there is one really good (even if old) paper describing the algorithm that Java implements. I was only able to find the one covering the opposite side, printing:

https://pdfs.semanticscholar.org/9039/1d5a4b445b27cf8ab68ca10c0d37b69b39fc.pdf

but that gives you an idea of the complexity. Or check out the JDK code. It's... complicated.
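To illustrate the "ints are simple" half of that contrast, here is a rough sketch of decoding an int straight from a char[] slice with no intermediate String. The class and method names are made up for this example, and it assumes at most 9 digits so that overflow cannot occur; it is not how Jackson implements it.

```java
final class IntFromChars {
    // Parses an optionally negative decimal integer directly from buf[offset .. offset+len).
    // Assumption: the slice holds 1 to 9 digits, so the result always fits in an int.
    static int parseInt(char[] buf, int offset, int len) {
        int i = offset;
        int end = offset + len;
        boolean negative = len > 0 && buf[i] == '-';
        if (negative) {
            i++; // skip the sign
        }
        int digits = end - i;
        if (digits < 1 || digits > 9) {
            throw new NumberFormatException("expected 1 to 9 digits, got " + digits);
        }
        int value = 0;
        for (; i < end; i++) {
            int d = buf[i] - '0';
            if (d < 0 || d > 9) {
                throw new NumberFormatException("not a digit: " + buf[i]);
            }
            value = value * 10 + d; // exact: integers have no rounding to worry about
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        System.out.println(parseInt("x-1234y".toCharArray(), 1, 5)); // prints -1234
    }
}
```

Each step here is exact integer arithmetic. A correct double decoder cannot work this way, because it has to pick the nearest representable binary value for every decimal input, which is where the complexity comes from.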

CodingFabian · 2017-01-13T09:46:17Z

I know :-) I didn't suggest it was easy. I just wanted to document that in this case the String creation is significant in terms of garbage, which matters if you are short on memory.
I did look at the JDK implementation, which made me wonder why they did not expose a char[] version. They use String.charAt(), which incurs a bounds check every time it is invoked.

cowtowncoder · 2017-01-13T22:06:47Z

My point, however, is that the only case where this would seem to matter is when trying to absolutely minimize memory allocations. I agree that the JDK could (and ideally should) have provided an entry point taking char[].

aaime · 2021-12-28T08:11:21Z

Looking into the bottlenecks of parsing GeoJSON with Jackson, I've also found TextBuffer.contentsAsDouble() in the crosshairs (it accounts for 30 to 50% of the overall time, depending on the test). In my case the cost was not the allocation of strings, though, but the eventual call to Double.parseDouble.

There is a library that can parse doubles significantly faster than Java's own built-in parser. It looks a bit experimental, so I hoped to just configure/subclass/manipulate Jackson's bits... I got close enough, but a final modifier ruined my parade:

    private static JsonFactory factory = new JsonFactory() {
        @Override
        protected IOContext _createContext(Object srcRef, boolean resourceManaged) {
            return new IOContext(this._getBufferRecycler(), srcRef, resourceManaged) {
                @Override
                public TextBuffer constructTextBuffer() {
                    return new TextBuffer(this._bufferRecycler) { // bummer... TextBuffer is final, so this does not compile
                        @Override
                        public double contentsAsDouble() throws NumberFormatException {
                            // eventually call FastDoubleParser here
                            return super.contentsAsDouble();
                        }
                    };
                }
            };
        }
    };

Pity: parsing complex GeoJSON polygons requires reading a lot of coordinates, so optimizing that step would help quite a bit.

cowtowncoder · 2021-12-30T21:39:59Z

PRs would definitely be welcome.

I am open to changes and could change that method in TextBuffer, but it isn't really a designed extension point and, maybe more importantly, it would take a while to get a new version into wide enough usage (ideally it ought to be in 2.14.0 etc).

There are other ways to go about overriding the behavior (via JsonParser), but they are probably no less work (or otherwise better). Ideally, I guess, floating-point decoding would be pluggable, and I'd be happy to consider PRs that refactor this aspect. That would unfortunately have to go in 2.14 as well, so it is more of a longer-term solution no matter what (unless the relevant code is simply embedded to be called by TextBuffer, which is probably the most direct way to get there).
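As a rough idea of what "pluggable" floating-point decoding could look like, here is a minimal sketch. Everything in it is hypothetical: DoubleDecoder, Buffer and JDK_DEFAULT are names invented for this example, not existing or planned Jackson API.

```java
final class PluggableDoubleSketch {
    // Hypothetical strategy interface: turns a char[] slice into a double.
    interface DoubleDecoder {
        double decode(char[] buf, int offset, int len) throws NumberFormatException;
    }

    // Default strategy: same behavior as today, via a temporary String and the JDK parser.
    static final DoubleDecoder JDK_DEFAULT =
            (buf, offset, len) -> Double.parseDouble(new String(buf, offset, len));

    // Hypothetical, heavily simplified stand-in for a text buffer holding one token.
    static final class Buffer {
        private final char[] buf;
        private final int offset;
        private final int len;
        private final DoubleDecoder decoder;

        Buffer(char[] buf, int offset, int len, DoubleDecoder decoder) {
            this.buf = buf;
            this.offset = offset;
            this.len = len;
            this.decoder = decoder;
        }

        // Delegates to whichever decoder was configured instead of hard-coding the JDK call.
        double contentsAsDouble() {
            return decoder.decode(buf, offset, len);
        }
    }

    public static void main(String[] args) {
        char[] chars = "1.5".toCharArray();
        Buffer b = new Buffer(chars, 0, chars.length, JDK_DEFAULT);
        System.out.println(b.contentsAsDouble()); // prints 1.5
    }
}
```

With a seam like this, a faster third-party parser could be supplied as another DoubleDecoder without subclassing the buffer, at the cost of threading the decoder through wherever buffers are constructed.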

cowtowncoder · 2022-07-30T03:43:35Z

I think #577 actually covers this; closing.

cowtowncoder added the performance label (Issue related to performance problems or enhancements) Aug 23, 2020

aaime mentioned this issue Dec 28, 2021 in geotools/geotools#3723, "Make GeoJSON store usable in all GeoServer modules" (merged)

cowtowncoder closed this as completed Jul 30, 2022
