Improve data file format #66
-
I agree
tab-separated-values seems like a good choice to me. It would allow the user to use commas and/or spaces in the
Yes, this is lost information that should be fixed.
this would be great!
I agree being consistent with other standards is preferable, but I don't think it is a deal breaker if it is difficult to change. Maybe we continue to represent time as integers in the pyControl framework/clock, and convert to float seconds when the data is being written to the log?
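A minimal sketch of that write-time conversion, assuming the framework keeps non-negative integer millisecond timestamps internally. `format_time_s` is a hypothetical helper name used only for illustration, not an existing pyControl function.

```python
def format_time_s(time_ms: int) -> str:
    """Format a non-negative integer millisecond timestamp as seconds for the log."""
    # divmod keeps the arithmetic in integers, so no float rounding can creep in.
    seconds, millis = divmod(time_ms, 1000)
    return f"{seconds}.{millis:03d}"


# The clock stays integer internally; conversion happens only when a row is written.
for t in (0, 5, 1234, 60000):
    print(t, "->", format_time_s(t))
```

Formatting from integers rather than dividing by 1000.0 means the written value always has exactly three decimal places.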
Yes, I think making variables their own type would work. So the possible types would be:
I think this is a workable solution. We currently have a "postprocessor.py" function automatically run at the end of experiment sessions and have been happy with it. Even if it fails, the temporary files will still be there and recoverable. We could open up this function to users as well. We use it to automatically transfer the local data files to a network drive at the end of a session.
I also propose that we embed the contents of the task file and hardware definition (#52) as "info" entries at the start of the file. The additional size isn't a problem (we typically have MB-sized data logs with KB-sized task files and hardware definitions), and I think it is better not to spread session information across multiple files if it can be prevented. I worry about the current solution of hashed task files, which multiple sessions point to and depend on for information, getting misplaced or deleted. It would still be useful to provide unique hashes as part of the "info" entry, so it is quick and simple to check whether two sessions are using the same task.
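A rough sketch of what embedding a task file plus its hash as "info" entries might look like. The row names (`task_name`, `task_hash`, `task_file`), the choice of SHA-256, and the helper `task_info_rows` are all assumptions made for illustration; the column order (type, name, time, value) follows the layout proposed in the original post.

```python
import hashlib


def task_info_rows(task_name: str, task_source: str) -> list[str]:
    """Build tab-separated 'info' rows holding a task file's name, hash and contents."""
    # Hash the source so two sessions can be compared without diffing whole files.
    task_hash = hashlib.sha256(task_source.encode("utf-8")).hexdigest()
    # The time column is left empty; repr() escapes tabs and newlines in the source
    # so the embedded file stays on a single row.
    return [
        f"info\ttask_name\t\t{task_name}",
        f"info\ttask_hash\t\t{task_hash}",
        f"info\ttask_file\t\t{task_source!r}",
    ]


example_source = "def run():\n\tpass\n"
for row in task_info_rows("reversal_learning", example_source):
    print(row)
```

Comparing the `task_hash` values of two data files is then enough to tell whether the sessions ran the same task.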
-
Thanks for your thoughts on this.
I think this is a good idea.
One option would be to have a 'variable' type, with the 'name' column used to indicate 'get', 'set', or 'print' (respectively for getting/setting using the variables dialog and using the print_variables function in a task), with the variable name(s) and value(s) as JSON.
Another option would be to have 'variable_get', 'variable_set', and 'variable_print' types, with the 'name' and 'value' fields used for the variable name and value for the get/set types, and JSON used for the print type.
I'm not sure which I prefer. The first option gives a consistent way of parsing all the variable lines, but the second option is cleaner for the set/get lines.
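To make the trade-off concrete, here is a sketch of how rows might look under each option. The function names, the example variable `reward_prob`, and the exact column order (type, name, time, value) are assumptions for illustration only.

```python
import json


def variable_row_option1(operation: str, time_ms: int, variables: dict) -> str:
    """Option 1: a single 'variable' type; name is 'get', 'set' or 'print'; value is JSON."""
    return f"variable\t{operation}\t{time_ms}\t{json.dumps(variables)}"


def variable_row_option2(operation: str, time_ms: int, variables: dict) -> str:
    """Option 2: separate types; get/set use name and value directly, print uses JSON."""
    if operation == "print":
        return f"variable_print\t\t{time_ms}\t{json.dumps(variables)}"
    # get/set rows describe exactly one variable; unpacking fails loudly otherwise.
    (name, value), = variables.items()
    return f"variable_{operation}\t{name}\t{time_ms}\t{value}"


print(variable_row_option1("set", 1500, {"reward_prob": 0.5}))
print(variable_row_option2("set", 1500, {"reward_prob": 0.5}))
print(variable_row_option2("print", 2000, {"reward_prob": 0.5, "n_trials": 12}))
```

With option 1 every variable row is parsed the same way (decode the JSON in the value column), while option 2 lets set/get rows be read without any JSON decoding.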
I think this is a workable solution. I think making it possible for users to call a custom postprocessor function could be useful, though it requires a bit of thought whether this is done at the session or experiment level.
I'm not sure about this, for two reasons. First, it would make the tsv file complicated to parse: the task files might have tabs in them (though we could convert these to spaces), and they would certainly contain end-of-line characters, so it would be hard to guarantee there was no risk of these confusing the tsv parser. Second, it would hurt the human readability of the files if opened in a text editor. As we typically sync/transfer entire experiment folders when we move pyControl data, we have not had issues with task files getting separated from data files, but I can see the potential issue (which presumably also exists for analog data files). One option would be to create a folder for each session and put all files related to the session in there (including the task file and hardware definition). I think this could be useful but should be optional. What do you think?
-
I was imagining storing the file contents as a raw string, so I don't think it would be a problem. Something like the following:

    # Read the whole task file as one string.
    with open('reversal_learning.py') as task_reader:
        contents = task_reader.read()
    # repr() escapes tabs and newlines, so the entry stays on one tsv row.
    data_output = f"info\ttask file\t\t{repr(contents)}"
    with open('data.tsv', 'w') as data_writer:
        data_writer.write(data_output)

There would be harm to human readability, though, in that it would cause the task file and hardware definition "info" entries to be extra long.
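A small round-trip check of this idea, assuming the reader recovers the text with `ast.literal_eval` (the read side is not specified in the thread, so that part is an assumption). The task source below is a made-up stand-in containing both a real tab and real newlines.

```python
import ast

# Hypothetical task source with a real newline, a real tab, and a literal backslash-t.
task_source = "def run():\n\tprint('tab\\tand newline')\n"

# Writing: repr() turns the source into a one-line Python string literal.
row = f"info\ttask file\t\t{task_source!r}"
assert "\n" not in row  # the row occupies a single physical line

# Reading: split on tabs (only the three column separators remain),
# then evaluate the literal to get the original text back.
row_type, name, time, value = row.split("\t")
recovered = ast.literal_eval(value)
assert recovered == task_source
print(row)
```

Because `repr` escapes the tab and newline characters, the only real tabs left in the row are the column separators, which is what keeps a tsv parser from being confused.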
-
I've started working on the new data file functionality in the new_data_file_format branch. So far there is a new GUI dropdown menu in the
-
I think there is room for improvement in the way data generated by pyControl experiments is saved. There are several issues with the current data formats (.txt for states, events, and user prints, and .pca for analog data):
Issues with the .txt files
- .txt files are difficult for a human to read because they use integer codes for states and events rather than state/event names. I think the increased size of data files would be worthwhile for human readability, particularly as the files are small.
- [...] .tsv) then loading the data in languages/programs other than Python (where we provide import code) would be simplified.
- The print_variables function currently generates a print line that is not differentiated from standard print lines and probably should be.
Issues with the .pca analog data files
- [...] .npy) would be preferable for ease of loading data.
General issues
Proposed changes
The current data formats should remain supported, but we could offer the following new data format in addition and make it the default.
- Replace the .txt file with a .tsv file, with structure based on that of the dataframe generated by the session_dataframe function in the data_import module, i.e. a table with the following columns:
  - type: Whether the row contains session 'info', a 'state' entry, an 'event', a 'print' line, or a 'warning'.
  - name: The name of the state, event, or session information in the row.
  - time: The time the row occurred, in ms since the session start.
  - value: The contents of 'info', 'print', and 'warning' rows.
We would need to think about the best way to handle variables. Probably there should be a 'variable' type, but how this should handle setting and getting individual variables, and printing multiple variables, needs a bit of thought.
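As a sketch of how the proposed four-column table could be written and read back with only the Python standard library: the row contents (`subject_id`, `init_trial`, `poke_in`) are invented for illustration, and including a header row is an assumption rather than something settled in the proposal.

```python
import csv
import io

COLUMNS = ["type", "name", "time", "value"]

# Hypothetical session rows; times are in ms since session start.
rows = [
    {"type": "info", "name": "subject_id", "time": "", "value": "m001"},
    {"type": "state", "name": "init_trial", "time": 0, "value": ""},
    {"type": "event", "name": "poke_in", "time": 152, "value": ""},
    {"type": "print", "name": "", "time": 160, "value": "reward delivered, trial 1"},
]

buffer = io.StringIO()  # stands in for an open data file
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

# Reading back: every field comes back as a string keyed by column name.
buffer.seek(0)
loaded = list(csv.DictReader(buffer, delimiter="\t"))
events = [r["name"] for r in loaded if r["type"] == "event"]
print(events)
```

Since tab is the delimiter, commas and spaces in print lines need no quoting, and the same file can be opened by any tool that reads tab-separated values.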
@alustig3 what are your thoughts on this?