Messy string to a table


I was trying to avoid having to manually turning this messy string (downloaded CSV and then opened with Notepad…please do tell me if there is an easier way to do this as well) into a table. Above is the result from Transact SQL query that I ran.

Screen Link:

Above is what I’m trying to get to… but this is the set of string that I’m working with:

Id,PostTypeId,CreationDate,Score,ViewCount,Tags,AnswerCount,FavoriteCount
“52508”,“1”,“2019-05-24 02:59:13”,“1”,“33”,"",“1”,""
“52511”,“1”,“2019-05-24 06:46:23”,"-1",“36”,"",“1”,""
“52512”,“1”,“2019-05-24 07:03:04”,“1”,“69”,"",“1”,""
“52513”,“1”,“2019-05-24 07:07:12”,“2”,“262”,"",“1”,“2”
“52515”,“1”,“2019-05-24 08:18:25”,“1”,“64”,"",“3”,""

My Code:

import pandas as pd
table_str = 'Id,PostTypeId,CreationDate,Score,ViewCount,Tags,AnswerCount,FavoriteCount'
bosy_str = '"52508","1","2019-05-24 02:59:13","1","33","<training><bias>","1",""
"52511","1","2019-05-24 06:46:23","-1","36","<r>","1",""
"52512","1","2019-05-24 07:03:04","1","69","<machine-learning><data><error-handling>","1",""
"52513","1","2019-05-24 07:07:12","2","262","<keras>","1","2"
"52515","1","2019-05-24 08:18:25","1","64","<machine-learning><clustering><pca><dimensionality-reduction>","3",""'

def clean_table(string):
    string = string.replace(",", " | ")
    string = string.replace('"', " | ")
    string = string.replace('","', ' | ')
    return string

body = clean_table(body_str)
print(body)

What I expected to happen:
I was hoping that my code would return something that I can continue to work on and eventually get a string that I can then put into a markdown cell and turn into a table. I was successful in turning the header, but I can’t get past the ‘<’ .

What actually happened:

  File "<ipython-input-16-699f5b936388>", line 3
    bosy_str = '"52508","1","2019-05-24 02:59:13","1","33","<training><bias>","1",""
                                                                                    ^
SyntaxError: EOL while scanning string literal

Thanks in advance, and happy learning!

Hi @jadelynwatling

Well if you have the file and pandas then you can use pd.read_csv() to read it and save it into a DataFrame that is the same thing than saving it into a table

table = pd.read_csv('dir/file.csv')

Where dir is the directory in wich the csv, also pd.read_csv. Check Pandas documentation for more info about it.

You can print table to see all the data or you can use the method df.head(), where df is the name of the variable in wich you saved the data ie:
table.head() should print the first 10 rows of the table. Header and 9 rows of data

Then you can use vector operations to clean and modify the data, even you can apply the function that you wrote using df[col].apply(clean_table), where col is the column of the DataFrame that you want to modify

If you have questions about it check again the Data cleaning courses or if you haven’t reach it yet, then keep going with the basics and then go back to this problem

Hope my help was useful, good luck!