-
-
Notifications
You must be signed in to change notification settings - Fork 594
Support 'number-columns-repeated' attribute of ODS cells #616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
src/tablib/formats/_ods.py
Outdated
end = row_vals.index("") | ||
except ValueError: | ||
end = len(row_vals) | ||
dset.headers = row_vals[:end] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried here that a column with an empty header cell will cut off data. I suppose having an empty header in the middle of headers is not well tested currently, probably something to improve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I revert this, then test_ods_import_set_ragged
will fail because the first row of 'ragged.ods' contains 16,380 trailing empty cells. If this constitutes a valid header row, then the assertion in the test would need to be modified to accommodate it. Should I do this instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use some heuristic to guess the end of the headers, like 5 successive empty headers would mean the header line is over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other formats (csv, xlsx, html) the header row is just accepted as it is, so I am now hesitant about changing this convention just for ods. The 'ragged.ods' file seems like a very extreme case, that is very unlikely to occur naturally. I'm more in favour of reverting my change and fixing the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@claudep I've decided to follow the convention used in other formats, of accepting the header row as-is. Would you let me know if this ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that in my experience, most .ods files resulting from an xlsx import (a common use case) will almost always have >16000 rows and also a big number of repeated (empty) rows at the end. This will result in tablib in big data structures mostly filled with empty strings. I would really try to avoid that, even if one could consider this as an ods import bug.
be85a28
to
5d74fb4
Compare
Repeat a cell's value if the 'number-columns-repeated' attribute is set.