Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

groupBy method should allow more control over method option #202

Open
adambom opened this issue Jun 13, 2013 · 2 comments
Open

groupBy method should allow more control over method option #202

adambom opened this issue Jun 13, 2013 · 2 comments

Comments

@adambom
Copy link

adambom commented Jun 13, 2013

I want to be able provide a function for each of the columns I specify. Let me provide an example.

The following code doesn't work because it just does sum on the year column.

ds.where(function (row) {
    return row.year === 2010;
}).groupBy('race', ['population', 'year']).columns(['year', 'race', 'population']).toJSON();

I end up with a view that looks like this:

[ { _id: 39155,
    race: 'White',
    population: 77036790.6869098,
    year: 643200 },
  { _id: 39157,
    race: 'Black',
    population: 11941573.990469573,
    year: 643200 },
  { _id: 39159,
    race: 'Asian',
    population: 5038865.1221144805,
    year: 643200 },
  { _id: 39161,
    race: 'Hispanic',
    population: 13029068.243787775,
    year: 643200 },
  { _id: 39163,
    race: 'Other',
    population: 6686024.6175516695,
    year: 643200 } ]

Notice how the value for year is not 2010, which is what I want.

I'd like to propose an update to the method option in groupBy that allows me to pass an object with key being the column I'm aggregating by, and value being a function that determines the aggregation method. If none is specified, you get sum. I could rewrite my example above like so:

ds.where(function (row) {
    return row.year === 2010;
}).groupBy('race', ['population', 'year'], {
    method: {
        year: _.identity
        population: _.sum // optional, but showing here to be explicit
    }
})
.columns(['year', 'race', 'population']).toJSON();

This would return:

[ { _id: 39155,
    race: 'White',
    population: 77036790.6869098,
    year: 2010 },
  { _id: 39157,
    race: 'Black',
    population: 11941573.990469573,
    year: 2010 },
  { _id: 39159,
    race: 'Asian',
    population: 5038865.1221144805,
    year: 2010 },
  { _id: 39161,
    race: 'Hispanic',
    population: 13029068.243787775,
    year: 2010 },
  { _id: 39163,
    race: 'Other',
    population: 6686024.6175516695,
    year: 2010 } ]

Whatcha think? I'm open to other approaches too. What I'm trying to emulate is this SQL

SELECT year, race, sum(population) 
FROM mytable 
WHERE year=2010 
GROUP BY race;
@iros
Copy link
Member

iros commented Jun 19, 2013

Could you give me a sample of your data? I'm not really sure why the year
is getting converted to something strange. Is it being added up?

-- Irene

On Thu, Jun 13, 2013 at 4:57 PM, Adam Savitzky [email protected]:

I want to be able provide a function for each of the columns I specify.
Let me provide an example.

The following code doesn't work because it just does sum on the year
column.

ds.where(function (row) {
return row.year === 2010;}).groupBy('race', ['population', 'year']).columns(['year', 'race', 'population']).toJSON();

I end up with a view that looks like this:

[ { _id: 39155,
race: 'White',
population: 77036790.6869098,
year: 643200 },
{ _id: 39157,
race: 'Black',
population: 11941573.990469573,
year: 643200 },
{ _id: 39159,
race: 'Asian',
population: 5038865.1221144805,
year: 643200 },
{ _id: 39161,
race: 'Hispanic',
population: 13029068.243787775,
year: 643200 },
{ _id: 39163,
race: 'Other',
population: 6686024.6175516695,
year: 643200 } ]

Notice how the value for year is not 2010, which is what I want.

I'd like to propose an update to the method option in groupBy that allows
me to pass an object with key being the column I'm aggregating by, and
value being a function that determines the aggregation method. If none is
specified, you get sum. I could rewrite my example above like so:

ds.where(function (row) {
return row.year === 2010;}).groupBy('race', ['population', 'year'], {
method: {
year: _.identity
population: _.sum // optional, but showing here to be explicit
}}).columns(['year', 'race', 'population']).toJSON();

This would return:

[ { _id: 39155,
race: 'White',
population: 77036790.6869098,
year: 2010 },
{ _id: 39157,
race: 'Black',
population: 11941573.990469573,
year: 2010 },
{ _id: 39159,
race: 'Asian',
population: 5038865.1221144805,
year: 2010 },
{ _id: 39161,
race: 'Hispanic',
population: 13029068.243787775,
year: 2010 },
{ _id: 39163,
race: 'Other',
population: 6686024.6175516695,
year: 2010 } ]

Whatcha think? I'm open to other approaches too. What I'm trying to
emulate is this SQL

SELECT year, race, sum(population) FROM mytable WHERE year=2010 GROUP BY race;


Reply to this email directly or view it on GitHubhttps://github.com//issues/202
.

@adambom
Copy link
Author

adambom commented Jul 5, 2013

Yes, the year is being added up. It would be nice if there was a way to specify aggregators for all the columns, so this didin't happen.

The data looks like this:

year | race     | region | population
-------------------------------------
2010 | hispanic | south  | 100
2010 | hispanic | east   | 100
2010 | hispanic | west   | 100
2011 | hispanic | east   | 200
2011 | hispanic | south  | 200
2011 | hispanic | west   | 200
2012 | hispanic | west   | 300
2012 | hispanic | east   | 300
2012 | hispanic | south  | 300
2010 | white    | south  | 200
2010 | white    | east   | 200
2010 | white    | west   | 200
2011 | white    | east   | 300
2011 | white    | south  | 300
2011 | white    | west   | 300
2012 | white    | west   | 400
2012 | white    | east   | 400
2012 | white    | south  | 400
2010 | black    | south  | 150
2010 | black    | east   | 150
2010 | black    | west   | 150
2011 | black    | east   | 160
2011 | black    | south  | 160
2011 | black    | west   | 160
2012 | black    | west   | 170
2012 | black    | east   | 170
2012 | black    | south  | 170

What I'd like to get is data that looks like this:

year | race     | sum(population)
-----------------------------
2010 | hispanic | 300
2010 | white    | 600
2010 | black    | 450

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants