Skip to content

Optimize TextBuffer.contentsAsDouble() #346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
CodingFabian opened this issue Jan 12, 2017 · 6 comments
Closed

Optimize TextBuffer.contentsAsDouble() #346

CodingFabian opened this issue Jan 12, 2017 · 6 comments
Labels
performance Issue related to performance problems or enhancements

Comments

@CodingFabian
Copy link
Contributor

Currently TextBuffer contentsAsDouble will first create a string that is passed to Double.parseDouble().

I know floating point parsing is painful but if you are looking for optimization opportunities, this is one.
screen shot 2017-01-13 at 00 45 25

@cowtowncoder
Copy link
Member

I'd be happy to merge PR if you have something that can be shown to materially improve performance.

My understanding, for what that is worth, is that actual decoding from 10-based textual representation into binary floating point number is often an order of magnitude or more slower than cost of memory allocation (and resulting) GC.
This may be why JDK does not expose parse method(s) that would accept char[] as input.
And whereas decoding ints and longs manually is simple, floating-point arithmetics is REALLY complicated. If you are interested, there is one really good (even if old) paper that describes algorithm that Java implements -- I was only able to find the opposite side of printing:

https://pdfs.semanticscholar.org/9039/1d5a4b445b27cf8ab68ca10c0d37b69b39fc.pdf

but that gives you an idea of complexity. Or checkout JDK code. It's.... complicated.

@CodingFabian
Copy link
Contributor Author

I know :-) I didn't suggest it was easy. Just wanted to document that in this case the String creation is significant in terms of garbage. Which matters if you are short on memory.
I did look at the implementation of the JDK, which made me wonder why they did not expose a char[] version. They use String.charAt() which incours bounds checks every single time they invoke it.

@cowtowncoder
Copy link
Member

My point however is that the only case where this would seem to matter is if trying to absolutely minimize any memory allocations. I agree in that JDK could (and ideally should) have provided entry point from char[].

@cowtowncoder cowtowncoder added the performance Issue related to performance problems or enhancements label Aug 23, 2020
@aaime
Copy link

aaime commented Dec 28, 2021

Looking into the bottlenecks of parsing GeoJSON with Jackson I've also found TextBuffer.contentsAsDouble() in the crosshair (it's using 30 to 50% of the overall time, depending on the test). In my case it was not the allocation of strings though, but eventually calling Double.parseDouble.

There is a library that can parse doubles significantly faster than Java own built-in. It looks a bit experimental so I hoped to just configure/subclass/manipulate Jackson's bits... got close enough, but a final ruined my parade:

    private static JsonFactory factory = new JsonFactory() {
        @Override
        protected IOContext _createContext(Object srcRef, boolean resourceManaged) {
            return new IOContext(this._getBufferRecycler(), srcRef, resourceManaged) {
                @Override
                public TextBuffer constructTextBuffer() {
                    return new TextBuffer(this._bufferRecycler) { // bummer... final
                        @Override
                        public double contentsAsDouble() throws NumberFormatException {
                            // eventually call FastDoubleParser here
                            return super.contentsAsDouble();
                        }
                    };
                }
            };
        }
    };

Pity, parsing complex GeoJSON polygons requires reading a lot of coordinates, optimizing that step would help quite a bit.

@cowtowncoder
Copy link
Member

cowtowncoder commented Dec 30, 2021

PRs would definitely be welcome.

I am open to changes and could change that method in TextBuffer but it isn't really a designed extension point and maybe more importantly it'd take a while to get new version in wide enough usage (ideally ought to be in 2.14.0 etc).

There are other ways to go about overriding behavior (via JsonParser) but they are probably no less work (or better otherwise). Ideally I guess floating-point decoding would be potentially pluggable, and I'd be happy to consider PRs that refactor this aspect. It would have to go in 2.14 as well unfortunately, so more of a longer term solution no matter what (unless actually just embedding relevant code to be called by TextBuffer which probably is the straightest way to get there).

@cowtowncoder
Copy link
Member

I think #577 actually covers this; closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issue related to performance problems or enhancements
Projects
None yet
Development

No branches or pull requests

3 participants