Getting Views From Youtube Videos
This guide shows you how to retrieve video view counts with Python, whether for a single video or a list of videos. The Python file we’ll be using makes this fairly easy, but a few parts may require slightly more knowledge.
We are using an existing Python file created by Python Engineer. I’d highly recommend the Youtube tutorial which covers this file in greater detail. All credit to him for creating this Python file and making all our lives much easier! Link to the Youtube Playlist.
Overview of the process
- You will need an API key from https://console.cloud.google.com/
- You will also need the Video ID and Channel ID for each video, ideally in a list or Data Frame so that we can iterate over the values
- Installation of pandas
- The following Python file from Python Engineer: https://github.com/python-engineer/youtube-analyzer
Prerequisites and set-up used
- Python Version: 3.9.4
- IDE: Visual Studio Code
- Libraries: Pandas, TQDM & yt_stats
- Google Cloud: Youtube Data API V3
- OS: Big Sur – 11.3.1
How to Get Video & Channel IDs
Both the Channel ID and the Video ID can be read from the YouTube URL.
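- The channel URL is https://www.youtube.com/user/marksandspencertv, with the string “marksandspencertv” used as the Channel ID in this guide. Note that the segment after /user/ is a legacy username; channel IDs assigned by YouTube start with “UC” and appear in URLs of the form https://www.youtube.com/channel/ followed by the ID.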
- The URL for M&S’s 2017 Christmas TV ad is https://www.youtube.com/watch?v=KfaSxIkLslE, with “KfaSxIkLslE” (the value after “v=”) being the Video ID.
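If you have a list of URLs rather than IDs, the extraction described above can be sketched with the standard library. This is a minimal sketch, not part of the yt_stats file: the function names extract_video_id and extract_channel_segment are hypothetical, and it assumes URLs shaped like the two examples above (a "watch?v=" video URL and a "/user/" or "/channel/" channel URL).

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the value of the "v" query parameter of a watch URL, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


def extract_channel_segment(url: str) -> Optional[str]:
    """Return the segment after /user/ or /channel/ in a channel URL, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] in ("user", "channel"):
        return parts[1]
    return None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=KfaSxIkLslE"))
    print(extract_channel_segment("https://www.youtube.com/user/marksandspencertv"))
```

Other URL shapes (shortened links, embedded players) are not handled here and would return None.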
Step 1: Adding your Channel & Video ID to a Dataframe
In my example, I’ll be using a data frame read from an Excel file with three columns: the first is the video title, the second the Video ID and the third the Channel ID. For demonstration purposes, I have labelled the columns “Title”, “Video ID” and “Channel ID”.
File Name: youtube_video_channel_ID.xlsx
Column 1: “Title”
Column 2: “Video ID”
Column 3: “Channel ID”
Step 2: Install & Import Files
- Save the yt_stats.py file in the directory of your script
- Import Pandas, yt_stats and tqdm
Example:
import pandas as pd
from yt_stats import YTstats
from tqdm import tqdm
Step 3: Add your API Key
Assign your API key to the API_KEY variable.
Example:
API_KEY = '#####Your API KEY#####'
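A hard-coded key is easy to leak if the script is shared or committed to a repository. As an alternative, here is a minimal sketch that reads the key from an environment variable. The helper load_api_key and the variable name YOUTUBE_API_KEY are assumptions made for illustration; neither comes from yt_stats or from Google.

```python
import os


def load_api_key(env_var: str = "YOUTUBE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical choice; use whatever name you export.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


if __name__ == "__main__":
    API_KEY = load_api_key()
```

On macOS you could set the variable in the terminal with export YOUTUBE_API_KEY='your key' before running the script.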
Step 4: Read your Excel/CSV file and create an empty list
Read “youtube_video_channel_ID.xlsx” into the “df” variable. Then create an empty list to which we’ll append the view data later on.
Example:
df = pd.read_excel("youtube_video_channel_ID.xlsx")
Views = []
Step 5: Create our script which will retrieve the view data.
The following script loops over each Video ID and Channel ID pair, requests the video’s “statistics” from the YouTube Data API v3, and takes the “viewCount” value from the result.
Depending on your Data Frame’s column headings, you may need to change the column names referenced in the script.
Example:
for x, y in tqdm(zip(df["Video ID"], df["Channel ID"])):
    channel_id = y
    video_id = x
    part = 'statistics'
    yt = YTstats(API_KEY, channel_id)
    a = yt._get_single_video_data(video_id, part)
    Views.append(a['viewCount'])
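The loop above will stop with a KeyError if the statistics for a video come back without a “viewCount” entry, which may happen for private or deleted videos or when view counts are hidden. The API also returns the count as a string rather than a number. A minimal sketch of a defensive conversion follows; the helper name safe_view_count is hypothetical and not part of yt_stats, and it assumes the statistics arrive as a dictionary (possibly empty).

```python
from typing import Any, Mapping, Optional


def safe_view_count(stats: Optional[Mapping[str, Any]]) -> Optional[int]:
    """Return the view count as an int, or None if it is missing or not numeric."""
    if not stats:
        return None
    raw = stats.get("viewCount")
    if raw is None:
        return None
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


if __name__ == "__main__":
    print(safe_view_count({"viewCount": "12345", "likeCount": "67"}))
    print(safe_view_count({}))
```

Inside the loop you could then write Views.append(safe_view_count(a)) so that one unavailable video leaves an empty cell instead of aborting the whole run.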
Step 6: Retrieving our data
The data will now be in the “Views” list. Using Pandas, we can add it to our existing data frame as a new column and then write the result to a CSV, Excel or JSON file.
Example:
df["views"] = Views
df.to_excel("output.xlsx")
Output & notes
Once saved to Excel, the output file should contain your original columns plus a new “views” column.
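The YouTube Data API v3 enforces a daily quota, measured in units rather than raw queries: Google’s documentation lists a default of 10,000 units per day, with each videos.list request costing 1 unit. If you are planning a large-scale project, you’ll need to request a quota increase from Google.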
More information on the Youtube Data API V3 can be found here: https://developers.google.com/youtube/v3/determine_quota_cost
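To judge whether a project fits within the quota, a rough estimate can help. The sketch below assumes the cost listed in Google’s quota documentation of 1 unit per videos.list request, and the documented limit of 50 comma-separated IDs per videos.list request. The script in this guide sends one request per video; batching is shown only for comparison and is not something yt_stats does in the code above. The function names are hypothetical.

```python
import math
from typing import Iterator, List, Sequence

MAX_IDS_PER_REQUEST = 50  # videos.list accepts up to 50 comma-separated IDs


def chunk_ids(video_ids: Sequence[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield successive groups of at most `size` video IDs."""
    for start in range(0, len(video_ids), size):
        yield list(video_ids[start:start + size])


def estimate_quota_units(n_videos: int, ids_per_request: int = 1, cost_per_request: int = 1) -> int:
    """Estimate the quota units needed to fetch statistics for n_videos."""
    return math.ceil(n_videos / ids_per_request) * cost_per_request


if __name__ == "__main__":
    print(estimate_quota_units(120))                       # one request per video
    print(estimate_quota_units(120, MAX_IDS_PER_REQUEST))  # batched requests
```

With one request per video, 120 videos cost an estimated 120 units; batched 50 at a time, the same videos would need 3 requests.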
Full Code:
import pandas as pd
from yt_stats import YTstats
from tqdm import tqdm

API_KEY = '#####API KEY#####'

df = pd.read_excel("youtube_video_channel_ID.xlsx")
Views = []

for x, y in tqdm(zip(df["Video ID"], df["Channel ID"])):
    channel_id = y
    video_id = x
    part = 'statistics'
    yt = YTstats(API_KEY, channel_id)
    a = yt._get_single_video_data(video_id, part)
    Views.append(a['viewCount'])

df["views"] = Views
df.to_excel("output.xlsx")