Clunky formatting when reading picks.csv with pandas or csv #9

lennijusten · 2020-07-09T17:37:18Z

The picks.csv output from PhaseNet has clunky formatting: the user must perform several string manipulations to turn the itp, tp_prob, its, and ts_prob columns into usable values.

I will show an example of reading the CSV with pandas, although reading it with the csv package runs into the same formatting issues. I will also share the function I wrote to format the entries correctly.

Pandas

import pandas as pd
df = pd.read_csv('output/picks.csv')

The result is a dataframe containing strings in the itp, tp_prob, its, ts_prob columns.

print(df['itp'][0])
>>>  '[   1 6620 8114]'

print(df['ts_prob'][0])
>>>  '[ 0.11291095  0.31720835  0.06021817]'
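The uneven padding matches how NumPy prints an integer array with str(): every element is right-aligned to a common width. That PhaseNet writes str(array) into the CSV is an inference from the output above, not something this thread confirms. A minimal sketch of the difference, using the itp values shown:

```python
import numpy as np

# Pick indices copied from the output above (illustrative values, not read from a file).
itp = np.array([1, 6620, 8114])

padded = str(itp)            # NumPy pads each element to a common width
uniform = str(itp.tolist())  # Python list repr: comma-separated, single spaces

print(padded)   # [   1 6620 8114]
print(uniform)  # [1, 6620, 8114]
```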

The values are not uniformly separated either, so splitting on a single delimiter (for example str.split(' ')) does not convert the string into a clean list. Ideally, the CSV would contain a uniform, comma-separated list of values. Another solution would be to also save a pickle file to the output directory that contains the lists in object form.
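If the list columns were written as comma-separated lists, they could be parsed while loading. A minimal sketch, assuming a hypothetical picks.csv in which each list is a quoted field such as "[1, 6620, 8114]" (CSV writers quote fields that contain commas); the fname column and the values are illustrative. It uses the converters argument of pandas.read_csv with json.loads:

```python
import io
import json

import pandas as pd

# Hypothetical picks.csv in the proposed format (fname column and values are illustrative).
sample = io.StringIO(
    'fname,itp,tp_prob\n'
    'trace_0.npz,"[1, 6620, 8114]","[0.11, 0.32, 0.06]"\n'
)

# json.loads turns each "[...]" string into a Python list as the file is read.
df = pd.read_csv(sample, converters={'itp': json.loads, 'tp_prob': json.loads})

print(df.loc[0, 'itp'])      # [1, 6620, 8114]
print(df.loc[0, 'tp_prob'])  # [0.11, 0.32, 0.06]
```

An empty pick written as "[]" parses to an empty list, so no special case is needed for it in this format.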

To fix the formatting with the current picks.csv, I made the following function:

import shlex
import pandas as pd

df = pd.read_csv('output/picks.csv')

def pickConverter(df):
    # Pick indices: '[   1 6620 8114]' -> [1, 6620, 8114]
    for col in ['itp', 'its']:
        pick_entry_list = []
        for x in range(len(df)):
            try:
                pick_entry_list.append(list(map(int, shlex.split(df[col][x].strip('[]')))))
            except AttributeError:
                # Empty cells are read as NaN (a float), which has no .strip()
                pick_entry_list.append([])
        df[col] = pick_entry_list

    # Pick probabilities: '[ 0.11291095  0.31720835]' -> [0.11291095, 0.31720835]
    for col in ['tp_prob', 'ts_prob']:
        prob_entry_list = []
        for x in range(len(df)):
            try:
                prob_entry_list.append(list(map(float, shlex.split(df[col][x].strip('[]')))))
            except AttributeError:
                # Same NaN handling as above
                prob_entry_list.append([])
        df[col] = prob_entry_list
    return df

df = pickConverter(df)
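A shorter variant of the same conversion is sketched below. It relies on the fact that str.split() with no argument splits on runs of whitespace, so the uneven padding does not need shlex. The helper names parse_array_str and convert_picks, the fname column, and the sample rows are hypothetical; the behaviour is meant to mirror pickConverter, including empty lists for empty cells.

```python
import io

import pandas as pd


def parse_array_str(value, cast):
    """Hypothetical helper: '[   1 6620 8114]' -> [1, 6620, 8114].

    Empty cells are read as NaN (a float), so anything that is not a
    string becomes an empty list, like the AttributeError branch above.
    """
    if not isinstance(value, str):
        return []
    # split() with no argument collapses any run of spaces or newlines.
    return [cast(v) for v in value.strip('[]').split()]


def convert_picks(df):
    """Hypothetical alternative to pickConverter using Series.map."""
    for col in ['itp', 'its']:
        df[col] = df[col].map(lambda v: parse_array_str(v, int))
    for col in ['tp_prob', 'ts_prob']:
        df[col] = df[col].map(lambda v: parse_array_str(v, float))
    return df


# Illustrative rows in the current padded format; the second row has no picks.
sample = io.StringIO(
    'fname,itp,tp_prob,its,ts_prob\n'
    'trace_0.npz,[   1 6620 8114],[ 0.11291095  0.31720835  0.06021817],[7000],[ 0.5]\n'
    'trace_1.npz,,,,\n'
)
df = convert_picks(pd.read_csv(sample))
print(df.loc[0, 'itp'])  # [1, 6620, 8114]
print(df.loc[1, 'itp'])  # []
```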


I described this issue under AI4EPS#9, and here is the PR with the fix. The fix has two parts. First, I open picks.csv as fclog with the csv library and write the header row. I then reopen picks.csv in append mode and write the results batch by batch. The results are converted from arrays to lists (which removes the extra whitespace) and written to each row as a list of values instead of a single preformatted string. This also fixes the newline issue I opened and resolved earlier. I have tested the method and it works as intended. Best, Lenni
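The two-part approach described above can be sketched with the standard csv module. This is not the code from the PR: the column names, the function names init_picks_csv and append_batch, and the temporary output path are assumptions for illustration. csv.writer stringifies each list with str(), producing "[1, 6620, 8114]", and quotes the field because it contains commas.

```python
import csv
import os
import tempfile

import numpy as np

# Hypothetical column layout; the real PhaseNet output may differ.
HEADER = ['fname', 'itp', 'tp_prob', 'its', 'ts_prob']


def init_picks_csv(path):
    """Part one: create the file and write the header row once."""
    with open(path, 'w', newline='') as fclog:
        csv.writer(fclog).writerow(HEADER)


def append_batch(path, fnames, picks):
    """Part two: append one batch of results.

    Each entry of picks is an (itp, tp_prob, its, ts_prob) tuple of NumPy
    arrays; .tolist() turns each array into a plain Python list.
    """
    with open(path, 'a', newline='') as fclog:
        writer = csv.writer(fclog)
        for fname, arrays in zip(fnames, picks):
            writer.writerow([fname] + [a.tolist() for a in arrays])


# Usage with one illustrative batch written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'picks.csv')
init_picks_csv(path)
append_batch(
    path,
    ['trace_0.npz'],
    [(np.array([1, 6620, 8114]), np.array([0.11, 0.32, 0.06]),
      np.array([7000]), np.array([0.5]))],
)
```

Opening the file with newline='' follows the csv module's guidance and avoids blank lines between rows on Windows.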

wayneweiqiang · 2020-07-28T04:05:24Z

Thanks for the suggestion! I have updated the CSV format to remove the extra spaces. I have also added a pickle output that can be loaded directly for post-processing. #11
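The layout of the pickle file is not described in this thread. Assuming it stores a pandas DataFrame, a round trip with to_pickle and read_pickle keeps list values as real Python lists, so no string parsing is needed. The DataFrame contents and the file name picks.pkl are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical picks table holding Python lists rather than strings.
picks = pd.DataFrame({
    'fname': ['trace_0.npz'],
    'itp': [[1, 6620, 8114]],
    'tp_prob': [[0.11, 0.32, 0.06]],
})

# Hypothetical file name; PhaseNet's actual pickle output may be named or structured differently.
path = os.path.join(tempfile.mkdtemp(), 'picks.pkl')
picks.to_pickle(path)

loaded = pd.read_pickle(path)
print(type(loaded.loc[0, 'itp']))  # <class 'list'>
```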

lennijusten mentioned this issue Jul 18, 2020

Cleaned picks.csv formatting with the csv library #11

Open

wayneweiqiang closed this as completed Jul 28, 2020
