Tile format for histograms #229
I see some problems. Imagine we have a histogram like this for a tile:

```
{
  bins: [0, 1, 2, 3, 4, 5],
  values: [3, 4, 4, 4, 1],
  range: [0.0, 10.0]
}
```

(so 5 buckets, range 0 -> 10) and this one for another tile:

```
{
  bins: [0, 1, 2, 3, 4, 5],
  values: [3, 4, 4, 4, 1],
  range: [0.0, 2.0]
}
```

As you see, the ranges are different. What would the resulting histogram be?

```
{
  bins: [0, 1, 2, 3, 4, 5],
  values: [3 + 16, 4, 4, 4, 1],
  range: [0.0, 10.0]
}
```

We can calculate it for the first bucket because the bucket size in the first histogram is 2.0 (10.0/5) and the second histogram's whole range fits inside one bucket. It would also work if the second histogram's range were a multiple of the bin size of the first histogram. In general everything will work if the bin size for each histogram is:
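A hedged sketch of that first-bucket example (`merge_into` is a hypothetical helper, not from this thread): fold one histogram into another by mapping each source bucket's centre into a destination bucket. This only gives correct results when the bin edges actually line up, which is exactly the constraint under discussion.

```javascript
// Hypothetical helper: fold histogram b into histogram a by bucket centres.
// Only correct when b's bin edges align with a's buckets (the constraint above).
function merge_into(a, b) {
  var a_size = (a.range[1] - a.range[0]) / a.values.length; // a's bucket width
  var b_size = (b.range[1] - b.range[0]) / b.values.length; // b's bucket width
  for (var i = 0; i < b.values.length; i++) {
    var center = b.range[0] + (i + 0.5) * b_size;         // centre of b's bucket i
    var idx = Math.floor((center - a.range[0]) / a_size); // target bucket in a
    a.values[idx] += b.values[i];
  }
  return a;
}

// The example above: the whole [0.0, 2.0] histogram lands in a's first bucket
var a = { bins: [0, 1, 2, 3, 4, 5], values: [3, 4, 4, 4, 1], range: [0.0, 10.0] };
var b = { bins: [0, 1, 2, 3, 4, 5], values: [3, 4, 4, 4, 1], range: [0.0, 2.0] };
merge_into(a, b); // a.values becomes [19, 4, 4, 4, 1], i.e. [3 + 16, 4, 4, 4, 1]
```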
This is harder server side than client side; on the client side you just need to do:
I 100% agree. I didn't get round to implementing it in the code, but the plan I had was to pass the max and min of the column through to width_bucket(requests,0,min,max). That would keep the bin sizes consistent. I like your idea of having each bin be an integer multiple of a given amount for the aggregation. The bins / range format looks fine as well. Let's go with that rather than the float boundaries for the bins.
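One hedged sketch of the "integer multiple of a given amount" idea (`snapped_bin_size` is a hypothetical helper, not from this thread): snap each histogram's bin size up to a multiple of an agreed base unit, so bin edges from different tiles can line up during aggregation.

```javascript
// Hypothetical helper: round a tile's bin size up to an integer multiple
// of an agreed base unit so bins from different tiles can align.
function snapped_bin_size(min, max, n_bins, base) {
  var raw = (max - min) / n_bins;      // "natural" bin size for this tile
  return Math.ceil(raw / base) * base; // snap up to a multiple of base
}

snapped_bin_size(0, 10, 5, 0.5); // 2.0 - already a multiple of 0.5
snapped_bin_size(0, 2, 5, 0.5);  // 0.5 - snapped up from 0.4
```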
I worked on this; this is my proposal:

```
{
  "bins": { "0": 10, "9": 1, ....},
  "bounds": [1.1, 3.2],
  "zoom": 10,
  "x": 4
}
```

It sounds weird, but let me explain each field a little:

The histograms are built in the same way tiles are built: we have the concept of zoom. Zoom 0 has 64 bins, zoom 1 -> 128, zoom 2 -> 256; the `zoom` field says what the zoom level for the bins is.
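In code, the bin count described above doubles from a base of 64 at each zoom level (a sketch of that convention):

```javascript
// Bins per zoom level: zoom 0 -> 64, zoom 1 -> 128, zoom 2 -> 256, ...
function bins_at_zoom(zoom) {
  return 64 << zoom;
}
```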
The interesting part: how to merge two histograms.
```javascript
function transform_histogram_to(hist, zoom) {
  var new_hist = {
    bins: {},
    bounds: hist.bounds,
    zoom: zoom,
    x: 0
  };
  // zooming out: zoom is expected to be <= hist.zoom
  var zoom_diff = hist.zoom - zoom;
  new_hist.x = hist.x >> zoom_diff;
  // aggregate: shift each bin index down to the coarser zoom level
  for (var k in hist.bins) {
    var idx = ((hist.x + +k) >> zoom_diff) - new_hist.x;
    new_hist.bins[idx] = (new_hist.bins[idx] || 0) + hist.bins[k];
  }
  return new_hist;
}
```
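The core trick in `transform_histogram_to` is that dropping `d` zoom levels maps a global bin index `g` to `g >> d`, so 2^d adjacent fine bins collapse into one coarse bin. A standalone sketch of just that index arithmetic (the example values are made up):

```javascript
// Zooming out by d levels: global fine-bin index g maps to coarse index g >> d,
// so 2^d adjacent fine bins collapse into one coarse bin.
var d = 2;                     // drop two zoom levels: 4 fine bins -> 1 coarse bin
var x = 100;                   // offset (first global bin index) of the fine histogram
var coarse_x = x >> d;         // offset of the coarse histogram: 25
var fine = { 0: 3, 1: 4, 5: 2, 9: 1 }; // sparse bins, keys relative to x
var coarse = {};
for (var k in fine) {
  var idx = ((x + +k) >> d) - coarse_x; // relative index in the coarse histogram
  coarse[idx] = (coarse[idx] || 0) + fine[k];
}
// coarse is { 0: 7, 1: 2, 2: 1 }: global bins 100, 101 -> 25; 105 -> 26; 109 -> 27
```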
```javascript
function merge_histogram(h0, h1) {
  // zoom is expected to be the same for both histograms
  var new_hist = {
    bins: {},
    bounds: [Math.min(h0.bounds[0], h1.bounds[0]),
             Math.max(h0.bounds[1], h1.bounds[1])],
    zoom: h0.zoom,
    x: Math.min(h0.x, h1.x)
  };
  var zoom_levels = 0;
  // check the range isn't too big
  var bucket_range = Math.max(h0.x, h1.x) - new_hist.x;
  if (bucket_range > 64) {
    // zoom out until the merged bins fit into 64 buckets
    zoom_levels = Math.ceil(Math.log(bucket_range / 64) / Math.log(2));
    new_hist.zoom -= zoom_levels;
  }
  new_hist.x = new_hist.x >> zoom_levels;
  var add = function (h) {
    for (var k in h.bins) {
      var idx = ((h.x + +k) >> zoom_levels) - new_hist.x;
      new_hist.bins[idx] = (new_hist.bins[idx] || 0) + h.bins[k];
    }
  };
  add(h0);
  add(h1);
  return new_hist;
}
```

Probably the code does not work and can be improved; it's only to illustrate the idea. Another possibility: instead of sending the
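The zoom-out test in `merge_histogram` reduces to: if the merged histograms would span more than 64 buckets, drop `ceil(log2(range / 64))` zoom levels. As a standalone sketch:

```javascript
// How many zoom levels to drop so `bucket_range` buckets fit into 64.
function zoom_out_levels(bucket_range) {
  if (bucket_range <= 64) return 0;
  return Math.ceil(Math.log(bucket_range / 64) / Math.log(2));
}

zoom_out_levels(64);  // 0 - already fits
zoom_out_levels(65);  // 1 - halving once leaves at most 33 buckets
zoom_out_levels(256); // 2 - two halvings: 256 -> 128 -> 64
```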
This looks good. I think it will work in the front end just fine.
Currently I am using SQL to do the requests for histogram tiles. The code can be seen here: It builds up a series of arrays of the histogram data and bins and then concatenates these together in the response. The SQL it generates looks like
Where cat_id is for a category request and requests is for a histogram of a numerical variable. This is an example of the response from the query:
https://gist.github.com/stuartlynn/76e41e9c07ab01d92ee2
It returns a row with two arrays for each variable: one for the bins and the other for the values. In a JSON response, though, we probably want to have these formatted as something like:
Where the mappings relate the category values to their labels.
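If the server keeps returning parallel bins/values arrays, converting them into the sparse `{ bin_index: count }` object from the proposal above is straightforward on either side; a sketch (`zip_bins` is a hypothetical helper, not from the linked code):

```javascript
// Hypothetical helper: zip the parallel `bins` and `values` arrays from the
// SQL response into the sparse { bin_index: count } object proposed above.
function zip_bins(bins, values) {
  var out = {};
  for (var i = 0; i < bins.length; i++) {
    out[bins[i]] = values[i];
  }
  return out;
}

zip_bins([0, 9], [10, 1]); // { "0": 10, "9": 1 }
```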
@javisantana