@istride commented Mar 15, 2025

Repeat a cell's value if the 'number-columns-repeated' attribute is set.
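
As a rough illustration of what the description means, here is a minimal sketch that expands repeated cells in one ODS row using only the standard library. It is not the code from this PR: the helper `expand_row` and the sample row are hypothetical, and only the `table:number-columns-repeated` attribute and the OpenDocument namespaces come from the format itself.

```python
import xml.etree.ElementTree as ET

# OpenDocument namespaces for table and text elements.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def expand_row(row):
    """Return the row's cell values, repeating cells that carry a repeat count.

    Hypothetical helper for illustration only.
    """
    values = []
    for cell in row.findall(f"{{{TABLE_NS}}}table-cell"):
        # Join the text of all paragraphs in the cell; an empty cell yields "".
        paragraphs = cell.findall(f"{{{TEXT_NS}}}p")
        value = "\n".join("".join(p.itertext()) for p in paragraphs)
        # A missing attribute means the cell occurs exactly once.
        repeat = int(cell.get(f"{{{TABLE_NS}}}number-columns-repeated", "1"))
        values.extend([value] * repeat)
    return values


SAMPLE_ROW = f"""
<table:table-row xmlns:table="{TABLE_NS}" xmlns:text="{TEXT_NS}">
  <table:table-cell><text:p>a</text:p></table:table-cell>
  <table:table-cell table:number-columns-repeated="3"><text:p>x</text:p></table:table-cell>
  <table:table-cell table:number-columns-repeated="2"/>
</table:table-row>
""".strip()

row_vals = expand_row(ET.fromstring(SAMPLE_ROW))
print(row_vals)  # ['a', 'x', 'x', 'x', '', '']
```

Note that the trailing empty cell with a repeat count of 2 becomes two empty strings, which is exactly how a very large repeat count on a trailing empty cell turns into thousands of empty values.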

try:
    end = row_vals.index("")
except ValueError:
    end = len(row_vals)
dset.headers = row_vals[:end]

Contributor

I'm a bit worried that an empty header cell in the middle of the row will cut off the columns after it. I suppose an empty header in the middle of the headers is not well tested currently, which is probably something to improve.
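
To make the concern concrete, the sketch below compares the slicing logic under review with one possible alternative that only strips trailing empty cells. Both function names and the sample header row are hypothetical; the alternative is shown for comparison and is not something proposed in this thread.

```python
def cut_at_first_empty(row_vals):
    """Mirror the reviewed logic: stop at the first empty cell."""
    try:
        end = row_vals.index("")
    except ValueError:
        end = len(row_vals)
    return row_vals[:end]


def strip_trailing_empty(row_vals):
    """Alternative (assumption): drop only the empty cells at the end of the row."""
    end = len(row_vals)
    while end > 0 and row_vals[end - 1] == "":
        end -= 1
    return row_vals[:end]


# A header row with one empty header in the middle and two trailing empty cells.
header_row = ["id", "", "name", "", ""]

print(cut_at_first_empty(header_row))    # ['id']
print(strip_trailing_empty(header_row))  # ['id', '', 'name']
```

With the first function, the "name" column is lost because of the empty header before it; the second keeps it while still discarding the trailing padding.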

Contributor Author

If I revert this, then test_ods_import_set_ragged will fail because the first row of 'ragged.ods' contains 16,380 trailing empty cells. If this constitutes a valid header row, then the assertion in the test would need to be modified to accommodate it. Should I do this instead?

Contributor

Should we use some heuristic to guess the end of the headers, like 5 successive empty headers would mean the header line is over?
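
One way such a heuristic could look is sketched below. The function name `guess_header_end` and the `max_gap` parameter are hypothetical, and the default of 5 is simply the number floated in the comment above, not a tested value.

```python
def guess_header_end(row_vals, max_gap=5):
    """Return the index just past the last header cell.

    Scanning stops once `max_gap` successive empty cells are seen; shorter
    runs of empty cells are kept as (empty) headers.
    """
    empty_run = 0
    end = 0
    for i, value in enumerate(row_vals):
        if value == "":
            empty_run += 1
            if empty_run >= max_gap:
                break  # assume the header line is over
        else:
            empty_run = 0
            end = i + 1  # remember the position after the last non-empty cell
    return end


row_vals = ["a", "", "b"] + [""] * 10 + ["stray"]
headers = row_vals[:guess_header_end(row_vals)]
print(headers)  # ['a', '', 'b']
```

The trade-off is visible in the example: the empty header between "a" and "b" survives, but a non-empty cell that follows a long gap ("stray") is silently ignored.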

Contributor Author

For other formats (csv, xlsx, html) the header row is just accepted as it is, so I am now hesitant about changing this convention just for ods. The 'ragged.ods' file seems like a very extreme case that is unlikely to occur naturally. I'm more in favour of reverting my change and fixing the test.

Contributor Author

@claudep I've decided to follow the convention used in other formats, of accepting the header row as-is. Would you let me know if this is ok?

Contributor

The problem is that, in my experience, .ods files resulting from an xlsx import (a common use case) will almost always have >16000 rows and also a large number of repeated (empty) rows at the end. This will leave tablib with big data structures mostly filled with empty strings. I would really try to avoid that, even if one could consider this an ods import bug.
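
As a small sketch of the kind of cleanup this comment argues for, the helper below drops rows at the end of a sheet that contain only empty strings. The name `drop_trailing_empty_rows` is hypothetical and this is not tablib code; a real importer would more likely avoid materializing the repeated rows in the first place.

```python
def drop_trailing_empty_rows(rows):
    """Return `rows` without the all-empty rows at the end.

    Empty rows in the middle of the data are kept, since they may be meaningful.
    """
    end = len(rows)
    while end > 0 and all(value == "" for value in rows[end - 1]):
        end -= 1
    return rows[:end]


rows = [
    ["a", "b"],
    ["", ""],   # empty row inside the data: kept
    ["c", ""],
    ["", ""],   # trailing padding: dropped
    ["", ""],
]
trimmed = drop_trailing_empty_rows(rows)
print(trimmed)  # [['a', 'b'], ['', ''], ['c', '']]
```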

@istride force-pushed the ods-support-number-cols-repeated branch from be85a28 to 5d74fb4 on July 15, 2025 16:56
@istride requested a review from claudep on July 16, 2025 09:11