whitespace in challenge input files causes unnecessarily large file size #30

Open
mlangiu opened this issue Apr 1, 2021 · 1 comment

Comments

mlangiu commented Apr 1, 2021

I guess this comes a bit late, but I just realized that the format of the input files is somewhat problematic.
The additional whitespace is nice for readability in small example files, but it is of no real use in larger files, which aren't read by humans anyway.
The issue with the whitespace is that it causes unnecessarily large file sizes and consequently also longer than necessary read-in times!
Both file size and read-in times can be reduced significantly (70-80%) by using input files from which all insignificant whitespace (indentation, line breaks, and the spaces after separators) has been removed.
The following script provides one way to accomplish this:

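# compactify.py
# Rewrite a JSON file without insignificant whitespace.
# Usage: python compactify.py path/to/input.json
import json
import os
import sys

file_path = sys.argv[1]
path, filename = os.path.split(file_path)
name, extension = os.path.splitext(filename)

# Parse the original (human-readable) file.
with open(file_path, 'r') as f:
    data = json.load(f)

# Write the result next to the original as <name>_compact<extension>,
# using separators without any trailing spaces.
with open(os.path.join(path, name + '_compact' + extension), 'w') as f:
    json.dump(data, f, separators=(',', ':'))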

e.g. with

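# Use the glob directly instead of parsing `ls`, and quote "$f"
# so that paths containing spaces are handled correctly.
for f in A_set/*
do
  python compactify.py "$f"
done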
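
For a rough sense of how much the whitespace contributes, the following sketch compares the indented and the compact serialization of the same data in memory. It is only an illustration: the helper `whitespace_savings` and the sample data are made up here, and the indent of 4 is an assumption about how the challenge files are formatted, so the actual savings depend on the real files.

```python
import json


def whitespace_savings(data, indent=4):
    """Return (pretty_size, compact_size, fraction_saved), sizes in bytes."""
    # Human-readable form: indentation, line breaks, spaces after separators.
    pretty = json.dumps(data, indent=indent).encode('utf-8')
    # Compact form: same separators as in compactify.py.
    compact = json.dumps(data, separators=(',', ':')).encode('utf-8')
    return len(pretty), len(compact), 1 - len(compact) / len(pretty)


# Hypothetical sample data, NOT taken from the challenge instances.
sample = {"Resources": [{"id": i, "values": list(range(10))} for i in range(100)]}

pretty_size, compact_size, saved = whitespace_savings(sample)
print(f"{pretty_size} -> {compact_size} bytes ({saved:.0%} saved)")
```

Deeply nested files with many short values tend to benefit most, since each value then carries its own line break and indentation.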

I feel that it would be in everyone's interest if you ran the evaluation with the reduced-size input files.

Kind regards

Marco

Member

klorel commented Apr 2, 2021

Hi,
This is a very nice suggestion, thanks!

For people using a standard JSON parser there will be no problem. But we don't know whether that is the case for everybody; we'll see during the semi-final evaluation.
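
The point about standard parsers can be checked with a small sketch: JSON allows, but does not require, whitespace between tokens, so a conforming parser should produce the same data from both forms. The helper `same_content` and the sample document below are hypothetical and only serve as an illustration; a hand-written reader that relies on line breaks or indentation would not be covered by this check.

```python
import json


def same_content(original_text, compact_text):
    """Return True if both JSON texts parse to equal Python objects."""
    return json.loads(original_text) == json.loads(compact_text)


# Hypothetical sample document, NOT a real challenge instance.
document = {"Interventions": {"I1": {"tmax": "1", "Delta": [1, 1, 2]}}}

pretty = json.dumps(document, indent=4)
compact = json.dumps(document, separators=(',', ':'))

# The compact text contains no spaces or line breaks in this sample,
# yet it parses to the same data as the indented text.
print(same_content(pretty, compact))
```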
