Data Scraping - YouTube channel analysis

The goal of my project is to help a friend by analysing basic data from his YouTube channel, Player1 Games. Before going straight to the analysis, however, I have to do some data scraping to download a single JSON file. I followed a tutorial on data scraping found here.

The issue: for some reason the for loop in cell 10 does not recognize videoId in video_Id = v['id']['videoId']. It works fine with other keys, but not with videoId.

I have uploaded the Jupyter notebook file (Python).

I appreciate any help! Thank you in advance.
Renato

YouTube Channel - Player1.ipynb (66.1 KB)


Hi @boemer00,

I copied your JSON file into JSON Editor Online. And when trying to convert it into tree view, I got this error:

Failed to switch to “tree” mode because of Invalid JSON:

Parse error on line 1:
{'kind': 'youtube#sea
-^
Expecting 'STRING', '}', got 'undefined'

After a quick Google search, it turns out that JSON requires double quotes around strings. I see you have mixed single and double quotes in the last cell. I would suggest starting by replacing the single quotes around the keys with double quotes.
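For what it's worth, single quotes usually mean the text is the printed form of a Python dict rather than real JSON. Here is a minimal sketch of the difference, assuming the data is already a dict in the notebook. The `response` dict and its values are made up for illustration and are not taken from your file:

```python
import json

# Hypothetical, trimmed-down stand-in for the API response (not the real data).
response = {"kind": "example#response", "items": []}

as_repr = str(response)         # Python repr: single quotes, not valid JSON
as_json = json.dumps(response)  # real JSON: double quotes

# The repr text is rejected by a JSON parser, much like the online editor did.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

# The json.dumps output parses back to the original dict.
round_trip = json.loads(as_json)
```

So if the goal is to save a JSON file, writing it with `json.dump` instead of printing or copying the dict should avoid the quote problem altogether.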

Hope this helps!


Update:

I just noticed there’s a ‘fix JSON’ button in JSON Editor Online. Here’s a screenshot of the tree view, where you can see that not all items have "videoId" under "id"; some have "playlistId" instead.
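You can also check this from the notebook itself. A rough sketch, assuming the results are a list of dicts shaped like the ones in the tree view. The `items` list, the IDs, and the "kind" values below are placeholders I made up to mirror that shape, so compare them against your own data:

```python
from collections import Counter

# Hypothetical sample shaped like the search results (IDs are placeholders).
items = [
    {"id": {"kind": "youtube#video", "videoId": "VIDEO_1"}},
    {"id": {"kind": "youtube#playlist", "playlistId": "PLAYLIST_1"}},
    {"id": {"kind": "youtube#video", "videoId": "VIDEO_2"}},
]

# Positions of the items that would make v['id']['videoId'] raise a KeyError.
missing_video_id = [i for i, v in enumerate(items) if "videoId" not in v["id"]]

# How many results there are of each kind.
kind_counts = Counter(v["id"]["kind"] for v in items)
```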

Hi @veratsien

Thanks for your feedback. I didn’t know about JSON Editor Online; it is going to be really helpful. You are also correct: not every item has a videoId.

I thought of either replacing playlistId with videoId or trying to grab both keys in the for loop. Which one would you recommend? (I don’t know how to do the latter.)

Once again, thank you so much!


@boemer00 Glad to be of help. :grinning:

I think it depends on what your goals are in the project, and whether the id kind matters. Personally, I don’t recommend changing the JSON file. There are a lot of ways to grab both keys, in the for loop or not. Off the top of my head, you can grab everything in "id" and sort out the id "kind" and "videoId" or "playlistId" later in a data frame, or use an if-else statement to check the ["id"]["kind"] value first, so you know whether to retrieve "videoId" or "playlistId".
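The if-else idea could look something like the sketch below. It assumes "kind" is "youtube#video" for videos and "youtube#playlist" for playlists, so please check those against the values in your own file. The `extract_id` helper and the sample `items` are hypothetical names and data I made up for illustration:

```python
def extract_id(item):
    """Return (kind, id_value) for one search result item."""
    id_block = item["id"]
    kind = id_block.get("kind")
    if kind == "youtube#video":
        return kind, id_block["videoId"]
    elif kind == "youtube#playlist":
        return kind, id_block["playlistId"]
    # Any other kind: keep the kind, but there is no id we know how to read.
    return kind, None


# Hypothetical sample shaped like the search results (IDs are placeholders).
items = [
    {"id": {"kind": "youtube#video", "videoId": "VIDEO_1"}},
    {"id": {"kind": "youtube#playlist", "playlistId": "PLAYLIST_1"}},
    {"id": {"kind": "youtube#something_else"}},
]

rows = [extract_id(v) for v in items]
video_ids = [value for kind, value in rows if kind == "youtube#video"]
```

Keeping the kind next to each id means you can still filter for videos only later on, without touching the JSON file.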
