Implementation of distribution_t data structure #40

Open
wants to merge 67 commits into
base: project-sshmidt

Conversation

@Lana243 Lana243 commented Jul 27, 2020

Add distribution.h and distribution.c files that contain the implementation of the distribution_t data structure using the segment-tree approach.
You can read about the functionality of the distribution_t and the segment-tree approach in the design document: https://docs.google.com/document/d/1ccsg5ffUfqt9-mBDGTymRn8X-9Wk1CuGYeMlRxmxiok/edit?usp=sharing

Owner

@octo octo left a comment

Looks great Svetlana!

Owner

@octo octo left a comment

Great work Svetlana! There is an uninitialized pointer in distribution_clone, all the rest is just nit-picks ;)

};
for (size_t i = 1; i < num_buckets; i++) {
  bucket_array[i].bucket_counter = 0;
  bucket_array[i].minimum = bucket_array[i - 1].maximum;
Owner

At your option: You could use a compound literal here, too:

bucket_array[i] = (bucket_t) {
  .minimum = bucket_array[i - 1].maximum,
  .maximum = (i == num_buckets-1) ? INFINITY : bucket_array[i - 1].maximum * factor,
};
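
To illustrate the suggestion, here is a minimal sketch. It assumes a hypothetical bucket_t layout (the real struct in distribution.c may differ) and wraps the loop in a helper named fill_exponential, which is invented for this example. One detail to note: members omitted from a compound literal with designated initializers are zero-initialized, so the explicit bucket_counter = 0 assignment is no longer needed.

```c
#include <math.h>   /* INFINITY, isinf */
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint64_t */

/* Hypothetical layout; the real bucket_t in distribution.c may differ. */
typedef struct {
  uint64_t bucket_counter;
  double minimum;
  double maximum;
} bucket_t;

/* Fills buckets 1..num_buckets-1 so that each bucket starts where the
 * previous one ends. bucket_array[0] must be initialized by the caller. */
static void fill_exponential(bucket_t *bucket_array, size_t num_buckets,
                             double factor) {
  for (size_t i = 1; i < num_buckets; i++) {
    bucket_array[i] = (bucket_t){
        /* bucket_counter is omitted on purpose: omitted members become 0. */
        .minimum = bucket_array[i - 1].maximum,
        .maximum = (i == num_buckets - 1)
                       ? INFINITY
                       : bucket_array[i - 1].maximum * factor,
    };
  }
}
```

Because the whole element is assigned at once, any stale value in bucket_counter is overwritten as well.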

Author

I implemented this approach and realised that my constructor functions are now very similar. The main difference is how they calculate the maximum boundaries of the buckets. Would it be better to compute the maximum boundaries first and then call a shared function that fills in the other fields? Or is this amount of code duplication okay?
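
One possible shape for the shared function described above, as a sketch only. The name buckets_from_upper_bounds and the bucket_t layout are hypothetical, and the sketch assumes that the first bucket starts at 0 and that the last bucket always ends at INFINITY, so the last entry of upper_bounds is ignored. Each constructor would then only compute its own boundaries and delegate the rest.

```c
#include <math.h>   /* INFINITY, isinf */
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint64_t */
#include <stdlib.h> /* calloc, free */

/* Hypothetical layout; the real bucket_t in distribution.c may differ. */
typedef struct {
  uint64_t bucket_counter;
  double minimum;
  double maximum;
} bucket_t;

/* Hypothetical shared helper: builds the bucket array from precomputed
 * upper boundaries. Returns NULL on invalid input or allocation failure.
 * The caller owns the result and releases it with free(). */
static bucket_t *buckets_from_upper_bounds(const double *upper_bounds,
                                           size_t num_buckets) {
  if (upper_bounds == NULL || num_buckets == 0)
    return NULL;

  bucket_t *buckets = calloc(num_buckets, sizeof(*buckets));
  if (buckets == NULL)
    return NULL;

  for (size_t i = 0; i < num_buckets; i++) {
    buckets[i] = (bucket_t){
        /* Assumption: the first bucket starts at 0. */
        .minimum = (i == 0) ? 0.0 : upper_bounds[i - 1],
        /* Assumption: the last bucket is unbounded. */
        .maximum = (i == num_buckets - 1) ? INFINITY : upper_bounds[i],
    };
  }
  return buckets;
}
```

With a helper like this, the linear, exponential, and custom constructors would differ only in how they fill upper_bounds.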

@bkjg bkjg left a comment

Great job, Svetlana! 😄

}

int main() {
  FILE *fout = fopen("benchmark.csv", "w");

fopen can fail, so I'd recommend checking whether fout is NULL and returning if it is.

For consideration:
There is also another way to get the output into a file. If you only used printf, running ./distribution_benchmark > benchmark.csv should produce the same file.
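
Both options can be sketched together. The helper names open_csv and write_csv_header are hypothetical and are not part of the PR. Passing NULL as the path selects stdout, which corresponds to the shell-redirection variant.

```c
#include <stdio.h> /* FILE, fopen, fprintf, perror, stdout */

/* Hypothetical helper: opens the CSV file for writing, or returns stdout
 * when path is NULL so the caller can rely on shell redirection instead.
 * Returns NULL (after reporting the reason on stderr) if fopen fails. */
static FILE *open_csv(const char *path) {
  if (path == NULL)
    return stdout;

  FILE *fout = fopen(path, "w");
  if (fout == NULL)
    perror("fopen");
  return fout;
}

/* Hypothetical helper: writes the header row used by the benchmark.
 * Returns 0 on success and -1 if the write failed. */
static int write_csv_header(FILE *out, unsigned long mixed_iterations) {
  int written = fprintf(out,
                        "Number of buckets,Average for update,"
                        "Average for percentile,"
                        "Total for %lu mixed iterations\n",
                        mixed_iterations);
  return (written < 0) ? -1 : 0;
}
```

In main, a NULL result from open_csv would lead to an early return with a non-zero exit status, and a stream other than stdout would be closed with fclose at the end.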

I agree with Barbara. For example, I wrote my CSV file by printing with printf and using this line in the bash script:
./distribution_benchmark $i >> results.csv

  FILE *fout = fopen("benchmark.csv", "w");
  fprintf(fout, "Number of buckets,Average for update,Average for percentile,Total for %lu mixed iterations\n", MIXED);
  for (size_t num_buckets = 20; num_buckets <= 4000; num_buckets += 20) {
    distribution_t *dist = build(num_buckets);

I'd recommend checking whether dist is NULL and handling that case if it is.

static double *linear_upper_bounds(size_t num, double size) {
  double *linear_upper_bounds = calloc(num, sizeof(*linear_upper_bounds));
  for (size_t i = 0; i + 1 < num; i++)
    linear_upper_bounds[i] = (i + 1) * size;

This could fail because calloc may return NULL, so linear_upper_bounds could be NULL when the loop writes to it. I'd recommend checking whether linear_upper_bounds is NULL and handling that case.

Same in the other upper bounds functions.
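
A sketch of the check for the linear case, under the assumption that returning NULL to the caller is an acceptable way to report the failure. The local variable is renamed to bounds here so that it no longer shadows the function name. The last element is left at the zero written by calloc, matching the loop bound in the snippet above.

```c
#include <stddef.h> /* size_t */
#include <stdlib.h> /* calloc, free */

/* Returns an array of num upper bounds where element i is (i + 1) * size,
 * except for the last element, which stays 0 as written by calloc.
 * Returns NULL if the allocation fails. Note that calloc(0, ...) may also
 * return NULL, depending on the implementation.
 * The caller releases the result with free(). */
static double *linear_upper_bounds(size_t num, double size) {
  double *bounds = calloc(num, sizeof(*bounds));
  if (bounds == NULL)
    return NULL; /* Allocation failed: nothing to fill. */

  for (size_t i = 0; i + 1 < num; i++)
    bounds[i] = (double)(i + 1) * size;
  return bounds;
}
```

Callers then need the matching check: if the function returns NULL, they skip building the distribution instead of passing the pointer on.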

@emargalit emargalit left a comment

Your segment-tree algorithm is really interesting, Svetlana. Great work!

4 participants