This guide shows you how to scrape YouTube video views with Python, whether for a single video or a number of videos. This is fairly easy to do thanks to the Python file we’ll be using, but a few parts may require slightly more knowledge.
We are using an existing Python file created by Python Engineer. I’d highly recommend his YouTube tutorial, which covers this file in greater detail. All credit to him for creating this Python file and making all our lives much easier! Link to the YouTube playlist.
The Channel and Video IDs can be found in the YouTube URL.
In my example, I’ll be using a data frame loaded from an Excel file with three columns: a title, a Video ID and a Channel ID. For demonstration purposes, I have labelled the columns “Title”, “Video ID” and “Channel ID”.
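If you have a list of URLs rather than IDs, a small helper can pull the IDs out for you. This is only a sketch using Python's standard library: the function names are my own, and it assumes the common URL shapes (a "v" query parameter on watch links, the path on youtu.be short links, and /channel/<id> for channels). Handle-style channel URLs (/@name) do not contain the Channel ID, so they are not covered here.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    # Standard watch links carry the ID in the "v" query parameter
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None


def extract_channel_id(url):
    """Return the channel ID from a /channel/<id> URL, else None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    return None


# Placeholder IDs, not real videos or channels
print(extract_video_id("https://www.youtube.com/watch?v=VIDEO_ID_HERE"))
print(extract_channel_id("https://www.youtube.com/channel/CHANNEL_ID_HERE"))
```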
File Name: youtube_video_channel_ID.xlsx
Column 1: “Title”
Column 2: “Video ID”
Column 3: “Channel ID”
Example:
import pandas as pd
from yt_stats import YTstats
from tqdm import tqdm
Add your API key to the API_KEY variable.
Example:
API_KEY = '#####Your API KEY#####'
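If you plan to share your script, you may prefer not to type the key into the file. One option is to read it from an environment variable instead. The sketch below assumes a variable called YOUTUBE_API_KEY, which is a name I have made up for illustration, and the helper function is hypothetical too.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Read the API key from an environment variable (name is an assumption)."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage, once the variable is set in your shell:
# API_KEY = load_api_key()
```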
Read “youtube_video_channel_ID.xlsx” into the “df” object. Then we’ll create an empty list, to which we’ll append the view data later on.
Example:
df = pd.read_excel("youtube_video_channel_ID.xlsx")
Views = []
The following script uses the Channel ID and Video ID to locate each video, then retrieves the video’s “statistics” through the YouTube Data API v3 and takes the “viewCount” value from them.
It’s worth noting that, depending on your data frame headings, you may need to alter the column references in the script.
Example:
for x, y in tqdm(zip(df["Video ID"], df["Channel ID"])):
    channel_id = y
    video_id = x
    part = 'statistics'
    yt = YTstats(API_KEY, channel_id)
    a = yt._get_single_video_data(video_id, part)
    Views.append(a['viewCount'])
The data will now be in the Views list. Using pandas, we can add it to our existing data frame, and you can then output the result to a CSV, Excel or JSON file.
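To give a feel for what is being requested behind the scenes, here is a rough sketch of a “statistics” lookup against the API's videos endpoint. This is not the code from the yt_stats file: the function names and the sample response are made up for illustration, and nothing is actually sent over the network.

```python
import json
from urllib.parse import urlencode

# Videos endpoint of the YouTube Data API v3
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"


def build_stats_url(video_id, api_key):
    """Build the request URL for one video's statistics."""
    query = urlencode({"part": "statistics", "id": video_id, "key": api_key})
    return f"{VIDEOS_ENDPOINT}?{query}"


def extract_view_count(response_text):
    """Pull viewCount out of a JSON response, or None if it is missing."""
    payload = json.loads(response_text)
    items = payload.get("items", [])
    if not items:
        return None  # unknown or private video: no items returned
    return items[0].get("statistics", {}).get("viewCount")


# Made-up sample shaped like an API response (counts arrive as strings)
sample = '{"items": [{"id": "VIDEO_ID", "statistics": {"viewCount": "1234"}}]}'
print(build_stats_url("VIDEO_ID", "API_KEY"))
print(extract_view_count(sample))  # 1234
```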
Example:
df["views"] = Views
df.to_excel("output.xlsx")
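One thing to be aware of is that the API returns the counts as text, so the new column may not sort or sum as numbers in Excel. A minimal sketch of cleaning the list before adding it to the data frame is shown here; the helper name and the sample values are my own.

```python
def to_int_or_none(value):
    """Convert a view count string to int; return None if it is missing or invalid."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Made-up sample of what the list might hold after the loop
raw_views = ["1234", "56", None, ""]
clean_views = [to_int_or_none(v) for v in raw_views]
print(clean_views)  # [1234, 56, None, None]
```

You could then assign the cleaned list to the “views” column in place of the raw one.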
Once saved to Excel, you should see your original columns with a new “views” column added.
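The YouTube Data API v3 enforces a daily quota. At the time of writing, Google’s documentation lists a default of 10,000 units per day, and a video statistics lookup costs 1 unit. If you are planning a large-scale project, you’ll need to request a higher quota or look at paid options.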
More information on the Youtube Data API V3 can be found here: https://developers.google.com/youtube/v3/determine_quota_cost
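If quota becomes a concern, one way to stretch it is to ask for several videos in one request. To my understanding, the videos endpoint accepts a comma-separated list of IDs (up to 50 per call), so it is worth checking the documentation before relying on this. The sketch below only shows the grouping step; the helper name and the sample IDs are made up, and the yt_stats file used in this guide looks up one video at a time.

```python
def chunked(ids, size=50):
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Made-up IDs for illustration
video_ids = [f"id{n}" for n in range(120)]

# Each batch becomes one comma-separated "id" parameter value
batches = [",".join(batch) for batch in chunked(video_ids)]
print(len(batches))  # 3
```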
import pandas as pd
from yt_stats import YTstats
from tqdm import tqdm

API_KEY = '#####API KEY#####'

df = pd.read_excel("youtube_video_channel_ID.xlsx")
Views = []

for x, y in tqdm(zip(df["Video ID"], df["Channel ID"])):
    channel_id = y
    video_id = x
    part = 'statistics'
    yt = YTstats(API_KEY, channel_id)
    a = yt._get_single_video_data(video_id, part)
    Views.append(a['viewCount'])

df["views"] = Views
df.to_excel("output.xlsx")