How to use PRAW to crawl Reddit for subreddit post data

What is PRAW?

PRAW stands for “Python Reddit API Wrapper” and is an easy and fun way to start collecting data from Reddit. The official documentation can be found here: https://praw.readthedocs.io/en/latest/index.html.

Getting started, what you’ll need:

  • A Reddit account is required to access Reddit’s API
  • Basic knowledge of Python 3.6+ 
  • Client ID & Client Secret
    • If you don’t already have a client ID and client secret, follow Reddit’s First Steps Guide to create them.

Set-up & API Registration

Assuming you have a Reddit account already

Step 1:

Visit: https://www.reddit.com/prefs/apps 

Step 2:

At the bottom, you will see “Create App” or similar, depending on whether you already have applications in your account.

Step 3:

Note down your Client ID & Client Secret from your new app’s details page – you’ll need both in the next steps.

Step 4:

Install PRAW in Python. PRAW supports Python 3.6+. The recommended way to install PRAW is via pip:

pip install praw
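If you want to confirm the install worked before going any further, one quick check (a minimal sketch, assuming a standard Python 3 setup where `python` points at your interpreter) is to import the module and print its version from the command line:

python -c "import praw; print(praw.__version__)"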




Obtaining a Submission instance from a Subreddit instance

You need an instance of the Reddit class to do anything with PRAW. In this section we’ll create a “read-only” instance, which, in basic terms, lets us look at submissions in a subreddit just as if we were browsing the site.

Creating our Read-only Reddit Instance

Create a new Python file and enter your Client ID and Client Secret. The user agent can be any short string that describes your script.

import praw

reddit = praw.Reddit(
    client_id="my client id",
    client_secret="my client secret",
    user_agent="my user agent",
)

To test that your instance is working, use:

print(reddit.read_only) # Output: True

Getting data from our chosen subreddit

Choose a subreddit that you want to get submission data for. For my example I’ll use r/pics – where everyone on LinkedIn and Twitter finds their “original” content.

A quick, simple operation – print the submission titles for the top 10 hottest posts right now. In the same Python file from above, add:

for submission in reddit.subreddit("pics").hot(limit=10):
    print(submission.title)

You should now see the top 10 post titles printed.

Using other Submission Attributes

With PRAW we’re able to extract a lot more than just post titles. Below are the attributes I typically use, along with what they return.

  • author – Provides an instance of Redditor.
  • num_comments – The number of comments on the submission.
  • score – The number of upvotes for the submission.
  • title – The title of the submission.
  • url – The URL the submission links to, or the permalink if a self-post.
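As a quick illustration, here’s a minimal sketch (reusing the read-only reddit instance and the r/pics example from earlier) that prints several of these attributes for each of the ten hottest posts:

for submission in reddit.subreddit("pics").hot(limit=10):
    # each attribute below is listed above
    print(submission.title)
    print(submission.author)        # Redditor instance
    print(submission.score)         # number of upvotes
    print(submission.num_comments)  # number of comments
    print(submission.url)           # link URL, or permalink for self-posts
    print("---")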

How to find all Submission attributes in PRAW

Since attributes are dynamic, there is no guarantee that the attributes seen in my example (or any other example) will always be present, nor will any list ever be 100% accurate. The best way to see all available attributes at any given time is to use the following:

import pprint

# assume you have a Reddit instance bound to variable `reddit`
submission = reddit.submission(id="39zje0")
print(submission.title)  # to make it non-lazy
pprint.pprint(vars(submission))

Final thoughts

Hopefully this has been an easy introduction to PRAW and using the Submission instance. While there is plenty more you can do from here, such as adding this all into a dataframe and using NLP to uncover sentiment and trends, we’ll leave that for another post. 

If you have any questions, contact us at hello@honchosearch.com or find me on LinkedIn.

Subscribe to our email list to receive blog posts and other Python how-tos directly to your inbox.
