4. Data access: Social media API
Learning goals
By the end of this tutorial, you will be able to:
- Explain how APIs enable researchers to collect data from social media platforms
- Retrieve Reddit data using RedditExtractoR, including posts, threads, and comments
- Authenticate and collect data from the YouTube API using the tuber package
- Filter and organize retrieved data by keyword, channel, and time frame
- Save and manage collected data for later cleaning and analysis
- Automate repetitive tasks in R using for() loops and the apply() family of functions
1. Reddit Data Collection
1.1 Using RedditExtractoR (Wrapper for the Reddit API)
The RedditExtractoR package allows collection of posts, comments, and metadata from Reddit.
CRAN documentation:
https://cran.r-project.org/web/packages/RedditExtractoR/RedditExtractoR.pdf
Important limitation:
Most queries return at most about 1,000 posts per subreddit or search, so the package works best for recent or popular content rather than full historical archives.
Step 1: Find Relevant Subreddits
The find_subreddits() function takes a keyword and returns a data frame with subreddit names, descriptions, and subscriber counts.
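A minimal sketch of this step, assuming RedditExtractoR version 3.0 or later is installed and an internet connection is available. The keyword "climate" is only an illustration, and the column names are those returned by recent package versions:

```r
# install.packages("RedditExtractoR")  # run once if the package is missing
library(RedditExtractoR)

# Search subreddit names and descriptions for a keyword (illustrative keyword)
subreddits <- find_subreddits("climate")

# Sort so the largest communities appear first
subreddits <- subreddits[order(subreddits$subscribers, decreasing = TRUE), ]

# Show the name, title, and subscriber count of the top results
head(subreddits[, c("subreddit", "title", "subscribers")])
```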
Step 2: Retrieve Thread URLs
Search by keyword:
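A minimal sketch, assuming the find_thread_urls() function from RedditExtractoR 3.0 or later. The keyword and the sort_by and period values are illustrative choices, not the tutorial's own:

```r
library(RedditExtractoR)

# Find threads that mention an illustrative keyword, posted in the last month
thread_urls <- find_thread_urls(
  keywords = "climate change",
  sort_by  = "top",    # other options include "new", "relevance", "comments"
  period   = "month"   # "hour", "day", "week", "month", "year", or "all"
)

# Each row is one thread; the url column feeds the content-extraction step
head(thread_urls[, c("date_utc", "title", "subreddit", "comments", "url")])
```

To restrict the search to one community, the same function also accepts a subreddit argument.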
If you want to use multiple keywords (“a” OR “b” OR “c”):
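One way to express an OR search is to run one query per keyword and merge the results, dropping threads matched by more than one keyword. The sketch below assumes RedditExtractoR's find_thread_urls(); the helper name search_any_keyword() and its search_fun argument are hypothetical, introduced here so the merging logic can be tried without a network connection:

```r
# Run one search per keyword and combine the results ("a" OR "b" OR "c").
# search_fun defaults to RedditExtractoR::find_thread_urls; R evaluates default
# arguments lazily, so the package is only needed when the default is used.
search_any_keyword <- function(keywords,
                               search_fun = RedditExtractoR::find_thread_urls,
                               ...) {
  # One query per keyword; extra arguments (sort_by, period, ...) are passed on
  results <- lapply(keywords, function(k) search_fun(keywords = k, ...))

  # Drop searches that did not return a data frame (for example, no matches)
  results <- Filter(is.data.frame, results)
  if (length(results) == 0) return(data.frame())

  # Stack the results and keep each thread URL only once
  combined <- do.call(rbind, results)
  combined[!duplicated(combined$url), , drop = FALSE]
}
```

Calling search_any_keyword(c("a", "b", "c"), sort_by = "new", period = "month") would then query Reddit once per keyword and return a single de-duplicated data frame.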
Step 3: Extract Post and Comment Content
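A minimal sketch, assuming get_thread_content() from RedditExtractoR 3.0 or later, which takes a vector of thread URLs and returns a list with a threads data frame and a comments data frame. The keyword and the choice of five threads are illustrative:

```r
library(RedditExtractoR)

# Thread URLs for an illustrative keyword (the output of Step 2)
thread_urls <- find_thread_urls(keywords = "climate change",
                                sort_by = "top", period = "week")

# Download only the first few threads; each URL is a separate request
content <- get_thread_content(head(thread_urls$url, 5))

posts    <- content$threads   # one row per post (title, text, score, ...)
comments <- content$comments  # one row per comment, linked to its post by url

nrow(posts)
nrow(comments)
```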
Step 4: Retrieve Data from a Specific User
Filtering by Date
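Thread results include a date_utc column, so one approach is to convert it to the Date class and subset the rows. The sketch uses a small made-up data frame in place of real find_thread_urls() output, and the date range is arbitrary; it is worth checking the column type of real results with str() first:

```r
# Made-up stand-in for find_thread_urls() output (illustration only)
threads <- data.frame(
  date_utc = c("2024-01-15", "2024-02-20", "2024-03-05"),
  title    = c("Example post A", "Example post B", "Example post C"),
  stringsAsFactors = FALSE
)

# Convert the text dates so they can be compared
threads$date <- as.Date(threads$date_utc)

# Keep only threads posted inside the chosen window (inclusive)
start_date <- as.Date("2024-02-01")
end_date   <- as.Date("2024-02-29")
filtered   <- threads[threads$date >= start_date & threads$date <= end_date, ]

filtered$title
# [1] "Example post B"
```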
1.2 Using the Reddit API Directly
For more control, use Reddit’s official API.
You must create an app and authenticate using:
- client ID
- client secret
- username and password
Documentation:
https://support.reddithelp.com/hc/en-us/articles/16160319875092-Reddit-Data-API-Wiki
Rate limit: about 60 requests per minute for personal use at the time of writing; the Data API wiki linked above states the current limit.
Steps:
- Go to http://reddit.com/prefs/apps
- Click “Create App”
- Select type: Script
- Redirect URI: http://localhost:1410
- Save client ID and client secret
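Once the app exists, a token can be requested from R. The sketch below uses the httr package and Reddit's password grant for script apps, which is one possible approach and does not use the redirect URI; the tutorial's own method may differ. The function name, the environment variable names, the user agent string, and the r/rstats example request are all assumptions for illustration:

```r
library(httr)

# Request an OAuth2 access token for a "script" app (password grant).
# Credentials are read from environment variables so they stay out of the script.
get_reddit_token <- function(client_id     = Sys.getenv("REDDIT_CLIENT_ID"),
                             client_secret = Sys.getenv("REDDIT_CLIENT_SECRET"),
                             username      = Sys.getenv("REDDIT_USERNAME"),
                             password      = Sys.getenv("REDDIT_PASSWORD"),
                             agent         = "r-tutorial-script/0.1") {
  response <- POST(
    "https://www.reddit.com/api/v1/access_token",
    authenticate(client_id, client_secret),  # basic auth with the app credentials
    user_agent(agent),                        # Reddit expects a descriptive user agent
    body   = list(grant_type = "password",
                  username = username, password = password),
    encode = "form"
  )
  stop_for_status(response)                   # fail loudly on a bad response
  content(response)$access_token
}

# Only attempt the request when credentials have been set
if (nzchar(Sys.getenv("REDDIT_CLIENT_ID"))) {
  token <- get_reddit_token()

  # Authenticated requests go to oauth.reddit.com with a bearer token
  res <- GET("https://oauth.reddit.com/r/rstats/hot",
             add_headers(Authorization = paste("bearer", token)),
             user_agent("r-tutorial-script/0.1"),
             query = list(limit = 5))
  titles <- sapply(content(res)$data$children, function(x) x$data$title)
  print(titles)
}
```

Keeping credentials in environment variables (for example in a .Renviron file) avoids saving the client secret and password in a shared script.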
2. YouTube API
Requirements:
- Google account
- API credentials
- Daily quota: 10,000 units by default; a single search request costs 100 units, so plan queries carefully
Install and Authenticate
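A minimal sketch of installation and authentication, assuming the tuber package and its yt_oauth() function. The environment variable names are hypothetical; the client ID and secret come from the OAuth credentials created in Google Cloud Console:

```r
# install.packages("tuber")  # run once if the package is missing
library(tuber)

# Read the OAuth client ID and secret from environment variables
client_id     <- Sys.getenv("YOUTUBE_CLIENT_ID")
client_secret <- Sys.getenv("YOUTUBE_CLIENT_SECRET")

# yt_oauth() opens a browser window for sign-in, so run it only in an
# interactive session and only when credentials are available
if (interactive() && nzchar(client_id)) {
  yt_oauth(app_id = client_id, app_secret = client_secret)
}
```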
After authentication, return to R when prompted.
Retrieve Channel Information
To find a channel ID:
YouTube → Channel → About → Share → Copy channel ID
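With a channel ID in hand, channel metadata can be requested. A minimal sketch, assuming tuber's get_channel_stats() and an already authenticated session; the ID shown is the Google for Developers channel that appears in YouTube's API documentation examples and should be replaced with the channel under study:

```r
library(tuber)

# Assumes yt_oauth() has already been run in this session.
# Example channel ID; replace with the ID copied from the channel page.
channel_id <- "UC_x5XG1OV2P6uZZ5FSM9Ttw"

channel <- get_channel_stats(channel_id = channel_id)

# The result is a nested list: snippet holds descriptive fields,
# statistics holds the counts
channel$snippet$title
channel$statistics$subscriberCount
channel$statistics$videoCount
```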
Search Videos by Keyword and Date
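A minimal sketch, assuming tuber's yt_search() and an already authenticated session. The YouTube API expects dates as RFC 3339 timestamps, so the sketch includes a small helper, to_rfc3339(), which is a hypothetical name introduced here. The search term and date range are illustrative:

```r
library(tuber)

# Assumes yt_oauth() has already been run in this session.

# Convert a date such as "2024-01-01" to the "2024-01-01T00:00:00Z" form
to_rfc3339 <- function(date) format(as.Date(date), "%Y-%m-%dT00:00:00Z")

videos <- yt_search(
  term             = "climate change",
  type             = "video",
  max_results      = 50,
  published_after  = to_rfc3339("2024-01-01"),
  published_before = to_rfc3339("2024-02-01"),
  get_all          = FALSE   # first page only, to conserve daily quota
)

head(videos[, c("video_id", "publishedAt", "title", "channelTitle")])
```

Adding a channel_id argument to the same call limits the results to a single channel.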
Retrieve Video Details and Comments
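A minimal sketch, assuming tuber's get_video_details(), get_stats(), and get_all_comments() and an already authenticated session. The video is taken from an illustrative search rather than a fixed ID, and the comment request will fail for videos with comments disabled:

```r
library(tuber)

# Assumes yt_oauth() has already been run in this session.

# Take a video ID from a small search so the example does not rely on a fixed video
videos   <- yt_search(term = "climate change", type = "video",
                      max_results = 5, get_all = FALSE)
video_id <- as.character(videos$video_id[1])

details  <- get_video_details(video_id = video_id)  # title, description, tags, ...
stats    <- get_stats(video_id = video_id)          # view, like, and comment counts
comments <- get_all_comments(video_id = video_id)   # data frame, one row per comment

stats$viewCount
head(comments[, c("authorDisplayName", "textOriginal", "likeCount", "publishedAt")])
```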
Optional table display:
Retrieve Captions
3. Loops in R
A for() loop repeats code for each element in a sequence.
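A loop consistent with the output shown below might look like this (a reconstruction; the variable name i is an assumption):

```r
# Repeat the same print statement for the numbers 1 through 5
for (i in 1:5) {
  print(paste("Student", i))
}
```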
[1] "Student 1"
[1] "Student 2"
[1] "Student 3"
[1] "Student 4"
[1] "Student 5"
Example with text:
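A loop can also step through a character vector. The version below is a reconstruction consistent with the output that follows; the variable names are assumptions:

```r
# Loop over the elements of a character vector rather than over numbers
seasons <- c("Spring", "Summer", "Fall", "Winter")

for (season in seasons) {
  print(paste("I love", season))
}
```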
[1] "I love Spring"
[1] "I love Summer"
[1] "I love Fall"
[1] "I love Winter"
lapply() Function
lapply() applies a function to each element of a vector or list and returns the results as a list.
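A call consistent with the first listing below (values 2, 3, 4) adds 1 to each of the numbers 1 to 3; the exact original call is an assumption:

```r
# Add 1 to every element; the result is a list with one entry per input
numbers <- 1:3
plus_one <- lapply(numbers, function(x) x + 1)
plus_one
```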
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 4
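lapply() also works with built-in functions. The listing that follows is consistent with applying toupper() to a short character vector (a reconstruction; the input words are inferred from that output):

```r
# Convert each word to upper case; no anonymous function is needed
weather <- c("snow", "rain", "sun")
shouted <- lapply(weather, toupper)
shouted
```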
[[1]]
[1] "SNOW"
[[2]]
[1] "RAIN"
[[3]]
[1] "SUN"
Looping Over Reddit Threads
Using a for loop:
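A minimal sketch of such a loop, assuming RedditExtractoR's get_thread_content(). The helper name collect_threads(), its fetch_fun and pause arguments, and the two-second pause are hypothetical choices made for illustration. Errors on individual threads are caught so one failed request does not stop the whole run:

```r
# Fetch each thread in turn and store the results in a list named by URL.
# fetch_fun defaults to RedditExtractoR::get_thread_content; because default
# arguments are evaluated lazily, the package is only needed when it is used.
collect_threads <- function(urls,
                            fetch_fun = RedditExtractoR::get_thread_content,
                            pause = 2) {
  results <- list()
  for (i in seq_along(urls)) {
    message("Fetching thread ", i, " of ", length(urls))

    # Return NULL instead of stopping if a single request fails
    thread <- tryCatch(
      fetch_fun(urls[i]),
      error = function(e) {
        message("  skipped: ", conditionMessage(e))
        NULL
      }
    )

    # Store only successful downloads
    if (!is.null(thread)) results[[urls[i]]] <- thread

    Sys.sleep(pause)  # pause between requests to respect rate limits
  }
  results
}
```

After a run such as out <- collect_threads(thread_urls$url), the comments from every thread can be stacked with do.call(rbind, lapply(out, function(x) x$comments)).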
Bonus: Notifications for Long Scripts
This plays a sound when code finishes running.
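A minimal sketch, assuming the beepr package and its beep() function; the tutorial text does not name the package, so this is one common option rather than the original code. The Sys.sleep() call stands in for a long-running task:

```r
# install.packages("beepr")  # run once if the package is missing
library(beepr)

# Stand-in for a long-running task such as a large data collection loop
Sys.sleep(3)

beep()  # plays the default notification sound
```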
Summary
This module introduced practical techniques for collecting digital data directly from online platforms. Because APIs and platform policies change frequently, always verify current documentation before starting a project.
Social Media Data Collection
Disclaimer
Access to social media APIs can change at any time depending on platform policies, and specific datasets may not remain available. Researchers should understand how APIs function so they do not rely on a single platform and can adapt to new data sources. Always review current platform policies and documentation before collecting data.
This module demonstrates how to use social media APIs to retrieve data in R. APIs allow researchers to communicate with platforms and request structured data. Note that platforms regulate the type and volume of accessible data, and users must follow each platform’s terms of service. This form of data collection requires basic coding skills developed earlier in the semester.