
Clunky formatting when reading picks.csv with pandas or csv #9


Closed
lennijusten opened this issue Jul 9, 2020 · 1 comment


@lennijusten
Contributor

lennijusten commented Jul 9, 2020

The picks.csv output from PhaseNet uses clunky formatting: the itp, tp_prob, its, and ts_prob columns each need several string manipulations before their values can be used.

I will show an example of reading the csv with pandas, although reading it with the csv package runs into the same formatting issues. I will also share the function I wrote to correctly format the entries.

Pandas

import pandas as pd
df = pd.read_csv('output/picks.csv')

The result is a dataframe in which the itp, tp_prob, its, and ts_prob columns hold strings rather than lists.

print(df['itp'][0])
>>>  '[   1 6620 8114]'

print(df['ts_prob'][0])
>>>  '[ 0.11291095  0.31720835  0.06021817]'

The values are not uniformly separated either: they are padded with a varying number of spaces, so splitting on a fixed delimiter such as str.split(' ') leaves empty strings in the result (only the no-argument str.split(), which collapses runs of whitespace, copes with this). Ideally, the csv would contain a uniform, comma-separated list of values. Another solution would be to also save a pickle file to the output directory that contains the lists in object form.
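As a quick illustration of the separator problem, here is a minimal sketch using the two example strings shown above. Splitting on a single space leaves empty strings wherever the padding is wider than one character, while str.split() with no argument collapses runs of whitespace. The helper name parse_bracketed is hypothetical and not part of PhaseNet.

```python
def parse_bracketed(text, cast):
    """Parse a string such as '[   1 6620 8114]' into a list of numbers.

    Hypothetical helper for illustration only; not part of PhaseNet.
    """
    # Remove the surrounding brackets, then split on any run of whitespace.
    return [cast(token) for token in text.strip('[]').split()]


itp = '[   1 6620 8114]'
ts_prob = '[ 0.11291095  0.31720835  0.06021817]'

# A fixed single-space delimiter yields empty strings for the extra padding.
naive_tokens = itp.strip('[]').split(' ')

picks = parse_bracketed(itp, int)
probs = parse_bracketed(ts_prob, float)
print(naive_tokens)
print(picks)
print(probs)
```

This only handles the whitespace-padded layout shown in this issue; a missing cell that pandas reads as NaN would still need separate handling.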

To fix the formatting with the current picks.csv, I made the following function:

import shlex
import pandas as pd

df = pd.read_csv('output/picks.csv')

def pickConverter(df):
    """Convert the bracketed strings in the pick columns into Python lists."""
    # Index columns hold integers, probability columns hold floats.
    for cols, cast in ((['itp', 'its'], int), (['tp_prob', 'ts_prob'], float)):
        for col in cols:
            entries = []
            for value in df[col]:
                try:
                    # Drop the brackets, split on whitespace, cast each token.
                    entries.append([cast(v) for v in shlex.split(value.strip('[]'))])
                except AttributeError:
                    # Empty cells are read as NaN (a float), which has no .strip().
                    entries.append([])
            df[col] = entries
    return df
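An alternative worth considering is to do the conversion while reading, via the converters argument of pd.read_csv. The following is only a sketch under assumptions: the sample rows are made up, the fname column is assumed for illustration, and to_ints / to_floats are hypothetical helpers. With a converter in place, an empty cell arrives as an empty string and therefore becomes an empty list.

```python
import io

import pandas as pd


def to_ints(cell):
    """Hypothetical converter: '[   1 6620 8114]' -> [1, 6620, 8114]."""
    return [int(tok) for tok in cell.strip('[]').split()]


def to_floats(cell):
    """Hypothetical converter: '[ 0.5  0.25]' -> [0.5, 0.25]."""
    return [float(tok) for tok in cell.strip('[]').split()]


# Made-up sample in the whitespace-padded layout described in this issue.
# The 'fname' column and all values except the first itp entry are assumptions.
sample = io.StringIO(
    "fname,itp,tp_prob,its,ts_prob\n"
    "a.npz,[   1 6620 8114],[ 0.5  0.25  0.125],[ 300],[ 0.75]\n"
    "b.npz,,,,\n"
)

df = pd.read_csv(
    sample,
    converters={
        'itp': to_ints,
        'its': to_ints,
        'tp_prob': to_floats,
        'ts_prob': to_floats,
    },
)
print(df['itp'][0])
print(df['its'][1])
```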
lennijusten added a commit to lennijusten/PhaseNet that referenced this issue Jul 18, 2020
I described this issue under AI4EPS#9, and here is the PR with the fix. The fix is a two-part solution.

First, I open the picks.csv file as fclog with the csv library and write the header row. I then open picks.csv in append mode and write the results to it batch by batch. The results are converted from arrays to lists (this removes the extra white space) and each row is written as a list of results instead of a single formatted string. This also fixes the new-line issue I had opened and resolved earlier.
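The two steps described above might look roughly like the sketch below. It is not the code from the commit: the function names, the fname column, and the sample values are assumptions for illustration, and plain tuples stand in for the arrays so that the example runs with the standard library alone.

```python
import ast
import csv
import os
import tempfile

# 'fname' is an assumed column name; the other four come from this issue.
HEADER = ['fname', 'itp', 'tp_prob', 'its', 'ts_prob']


def to_list(values):
    """Return a plain list; NumPy arrays expose tolist(), other sequences are copied."""
    return values.tolist() if hasattr(values, 'tolist') else list(values)


def write_header(path):
    # Step 1: create the file and write the header row once.
    with open(path, 'w', newline='') as fclog:
        csv.writer(fclog).writerow(HEADER)


def append_batch(path, batch):
    # Step 2: reopen in append mode and write one row per result in the batch.
    with open(path, 'a', newline='') as fclog:
        writer = csv.writer(fclog)
        for fname, itp, tp_prob, its, ts_prob in batch:
            writer.writerow(
                [fname, to_list(itp), to_list(tp_prob), to_list(its), to_list(ts_prob)]
            )


path = os.path.join(tempfile.mkdtemp(), 'picks.csv')
write_header(path)
append_batch(path, [('a.npz', (1, 6620, 8114), (0.5, 0.25, 0.125), (300,), (0.75,))])
append_batch(path, [('b.npz', (), (), (), ())])

# Each list cell is stored as a quoted, comma-separated literal such as "[1, 6620, 8114]".
with open(path, newline='') as f:
    rows = list(csv.DictReader(f))
itp_first = ast.literal_eval(rows[0]['itp'])
print(itp_first)
```

Because the csv writer stringifies each list with uniform ", " separators, the cells can be read back with ast.literal_eval instead of custom string handling.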

I have tried and tested the method and it works as hoped. 

Best,
Lenni
@wayneweiqiang
Collaborator

Thanks for the suggestion! I have updated the csv format to remove the extra spaces. I have also added a pickle output that can be loaded directly for post-processing. #11
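For readers unfamiliar with the pickle route, here is a minimal sketch of the round trip. The file name picks.pkl, the record layout, and the values are assumptions for illustration; the actual structure written by PhaseNet may differ.

```python
import os
import pickle
import tempfile

# Made-up records; the real pickle written by PhaseNet may be laid out differently.
picks = [
    {'fname': 'a.npz', 'itp': [1, 6620, 8114], 'tp_prob': [0.5, 0.25, 0.125]},
    {'fname': 'b.npz', 'itp': [], 'tp_prob': []},
]

path = os.path.join(tempfile.mkdtemp(), 'picks.pkl')  # hypothetical file name

# Save the objects as-is; no string formatting is involved.
with open(path, 'wb') as f:
    pickle.dump(picks, f)

# Loading returns real lists of ints and floats, ready for post-processing.
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded[0]['itp'])
```

As with any pickle file, it should only be loaded from a source you trust.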
