This project provides modular tools for fetching and analyzing Google Analytics 4 (GA4) data.
Note: Replace `yourpage.com` with your actual website domain in all examples and code.
The project has been organized into the following modules:
- `ga4_client.py`: Handles authentication and GA4 client initialization
- `ga4_data_fetcher.py`: Provides functions to fetch user data from GA4
- `user_classification.py`: Classifies user counts into milestone categories
- `batch_processor.py`: Contains functions for batch processing URLs from CSV files
- `visualization.py`: Provides tools for creating charts and visualizations from GA4 data
- `ga4_fetcher.py`: Main script for batch processing multiple URLs from a CSV file
- `ga4_fetcher_uptodate.py`: Specialized script for analyzing URLs from publication date to a fixed end date
- `single_url_analysis.py`: Example script for analyzing a single URL
- `date_range_analytics.py`: Script for analyzing multiple URLs for a specific date range
- `url_trend_analysis.py`: Script for analyzing a URL's performance over multiple time periods
- Python 3.6+
- Google Analytics 4 property
- Service account with GA4 access
- Clone or download this repository
- Install the required dependencies:
```
pip install -r requirements.txt
```
If you only need basic functionality without visualization:
```
pip install pandas google-analytics-data
```
The input CSV file for batch processing should contain at least the following columns:
- `url`: The full URL of the page (e.g., "https://www.yourpage.com/article-path")
- `date_published`: The publication date of the content in YYYY-MM-DD format (e.g., "2023-07-15")
Example:
```
url,date_published
https://www.yourpage.com/article1,2023-06-01
https://www.yourpage.com/article2,2023-06-15
https://www.yourpage.com/article3,2023-07-01
```
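Before running a batch job it is worth sanity-checking the input file. A minimal sketch using pandas (the file name `your_input.csv` is just an example; the required columns are the two documented above):

```python
import pandas as pd

# Load the input CSV and confirm it has the required columns
df = pd.read_csv("your_input.csv")  # example file name
missing = {"url", "date_published"} - set(df.columns)
if missing:
    raise ValueError(f"Input CSV is missing columns: {missing}")

# Confirm dates parse as YYYY-MM-DD (raises on the first malformed row)
pd.to_datetime(df["date_published"], format="%Y-%m-%d")
print(f"{len(df)} URLs ready for processing")
```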
The single_url_analysis.py script provides a quick way to check traffic for a single URL over a specific number of days up to the current date. This is useful for:
- Quick traffic checks for individual pages
- Testing your GA4 setup and authentication
- Ad-hoc reporting needs
```
python single_url_analysis.py --url "https://www.yourpage.com/example-page" --days 30
```
Additional options:
```
python single_url_analysis.py \
  --url "https://www.yourpage.com/example-page" \
  --days 90 \
  --credentials "path/to/your-credentials.json" \
  --property-id "your-property-id"
```
The main ga4_fetcher.py script processes multiple URLs from a CSV file and analyzes their performance over different time periods after their publication date. This is ideal for:
- Analyzing how content performs at different stages of its lifecycle
- Comparing performance of content at similar ages
- Bulk processing of many URLs
- Creating comprehensive performance reports
```
python ga4_fetcher.py --days 30 60 90 --input-file your_input.csv
```
The output will be saved to a file named `ga4_your_input.csv` automatically.
Additional options:
```
python ga4_fetcher.py \
  --days 30 90 180 360 \
  --input-file your_input.csv \
  --credentials "path/to/your-credentials.json" \
  --property-id "your-property-id" \
  --sleep-time 5
```
The `--sleep-time` parameter controls how many seconds to wait between API requests to avoid rate limiting.
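Under the hood, this kind of throttling usually amounts to a pause between successive requests. A generic sketch of the pattern, not the project's actual loop (`fetch_users` is a placeholder standing in for `get_users_for_url`):

```python
import time

def fetch_users(url):
    # Placeholder standing in for the project's get_users_for_url call
    return 0

def fetch_all(urls, sleep_time=5):
    """Fetch user counts for each URL, pausing between calls to respect rate limits."""
    results = {}
    for url in urls:
        results[url] = fetch_users(url)
        time.sleep(sleep_time)  # plays the same role as the --sleep-time flag
    return results

print(fetch_all(["https://www.yourpage.com/article1"], sleep_time=1))
```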
The ga4_fetcher_uptodate.py script is a specialized variant of the main fetcher that retrieves analytics data for URLs from their publication date up to a specific end date provided by the user. This is useful for:
- Creating point-in-time snapshots of content performance
- Generating historical reports showing how content performed up to a specific date
- Comparing how different pieces of content performed at a fixed calendar point
- Retrospective analysis of content performance
Unlike the main fetcher which analyzes fixed periods after publication (e.g., first 30 days), this script analyzes from publication date to a single fixed end date (the same for all URLs).
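To make the difference concrete, here is a small sketch of how the two kinds of date range could be derived for one input row (variable names are illustrative, not the project's internals):

```python
from datetime import date, timedelta

date_published = date(2023, 6, 1)

# ga4_fetcher.py style: a fixed window after publication (e.g., first 30 days)
window_start = date_published
window_end = date_published + timedelta(days=30)

# ga4_fetcher_uptodate.py style: publication date up to one shared end date
end_date = date(2023, 12, 31)  # the --date argument, identical for every URL

print(window_start, window_end)  # 2023-06-01 2023-07-01
print(date_published, end_date)  # 2023-06-01 2023-12-31
```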
```
python ga4_fetcher_uptodate.py --date 2023-12-31 --input_file your_input.csv
```
The output will be saved to a file named `ga4_uptodate_your_input.csv` automatically.
Additional options:
```
python ga4_fetcher_uptodate.py \
  --date 2023-12-31 \
  --input_file your_input.csv \
  --credentials "path/to/your-credentials.json" \
  --property_id "your-property-id" \
  --sleep 5
```
The `--sleep` parameter controls how many seconds to wait between API requests to avoid rate limiting.
The date_range_analytics.py script allows you to analyze multiple URLs for the same fixed date range. This is useful for:
- Comparing the performance of different pages during the same time period
- Analyzing seasonal traffic patterns across multiple pages
- Measuring the impact of marketing campaigns on different content
- Generating reports for specific reporting periods (monthly, quarterly, etc.)
Unlike ga4_fetcher.py which analyzes URLs from their publication date, this script applies the exact same date range to all URLs.
```
python date_range_analytics.py --start-date 2024-01-01 --end-date 2024-01-31 \
  --urls "https://www.yourpage.com/url1" "https://www.yourpage.com/url2"
```
The script will output a CSV file with:
- URL
- Start date
- End date
- User count
- Category classification
- Detailed category classification
The url_trend_analysis.py script helps you understand how traffic to a specific URL grows over different time periods from its publication date. This is useful for:
- Tracking the traffic growth pattern of content
- Understanding how quickly content reaches different user milestones
- Visualizing long-term performance trajectory
- Identifying which time periods show significant growth
The script can generate both tabular data and a visualization graph showing the growth curve.
```
python url_trend_analysis.py --url "https://www.yourpage.com/example-page" \
  --start-date 2024-01-01 --periods 7 30 90 180
```
The script will output:
- A CSV file with data for each time period
- A PNG graph visualization (if matplotlib is installed)
- A summary in the terminal showing total users and growth patterns
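As an illustration of what "growth patterns" can mean here, the sketch below derives period-over-period growth from cumulative user counts (the numbers and column names are invented for the example):

```python
import pandas as pd

# Hypothetical trend data: cumulative users at each period since publication
df = pd.DataFrame({"days": [7, 30, 90, 180], "users": [800, 3200, 9100, 15400]})

# Users gained within each period (difference of the cumulative counts)
df["new_users"] = df["users"].diff().fillna(df["users"]).astype(int)

# Average users per day within each period
df["per_day"] = df["new_users"] / df["days"].diff().fillna(df["days"])
print(df)
```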
If you want to incorporate these functions in your own scripts, you can import the modules directly. This allows you to build custom analysis tools or integrate GA4 data into other applications.
```python
from ga4_client import initialize_analytics_client
from ga4_data_fetcher import get_users_for_url
from user_classification import classify_users

# Initialize client
client = initialize_analytics_client("path/to/credentials.json")

# Get user data for your URL
url = "https://www.yourpage.com/your-article"  # Replace with your actual URL
users = get_users_for_url(client, "your-property-id", url, "2023-01-01", "2023-01-31")

# Classify users
category = classify_users(users)
print(f"User category: {category}")
```

```python
import pandas as pd

from ga4_client import initialize_analytics_client
from ga4_data_fetcher import get_users_for_url

# Initialize client
client = initialize_analytics_client("path/to/credentials.json")
property_id = "your-property-id"

# Define URLs and date range
urls = [
    "https://www.yourpage.com/article1",
    "https://www.yourpage.com/article2",
    "https://www.yourpage.com/article3",
]
start_date = "2023-01-01"
end_date = "2023-01-31"

# Collect data
results = []
for url in urls:
    users = get_users_for_url(client, property_id, url, start_date, end_date)
    results.append({"url": url, "users": users})

# Create DataFrame and analyze
df = pd.DataFrame(results)
print(f"Total users: {df['users'].sum()}")
print(f"Average users per URL: {df['users'].mean():.2f}")
print(f"Best performing URL: {df.loc[df['users'].idxmax()]['url']}")
```

The visualization module can be used as both a command-line tool and an imported library.
After collecting data with any of the GA4 fetcher scripts, you can directly visualize the results:
```
# Basic usage - automatically detects data type and creates appropriate chart
python visualization.py ga4_your_data.csv

# Specify output file
python visualization.py ga4_your_data.csv --output my_chart.png

# Choose chart type
python visualization.py ga4_your_data.csv --type bar

# Create interactive HTML visualization (requires plotly)
python visualization.py ga4_your_data.csv --interactive
```
The tool automatically detects the type of GA4 data in your CSV file (URL trend analysis, batch processing, date range analysis, etc.) and creates an appropriate visualization.
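The detection logic itself isn't documented here, but sniffing the CSV's column names is a natural way to implement it. A purely hypothetical sketch of the idea (the column names for the trend and date-range outputs are assumptions):

```python
import pandas as pd

def detect_data_type(csv_path):
    """Guess which fetcher produced a CSV by inspecting its columns (illustrative only)."""
    columns = set(pd.read_csv(csv_path, nrows=0).columns)
    if any(col.startswith("users_") for col in columns):
        return "batch"       # ga4_fetcher.py output (users_7_days, users_30_days, ...)
    if {"days", "users"} <= columns:
        return "url_trend"   # assumed url_trend_analysis.py output
    if {"start_date", "end_date"} <= columns:
        return "date_range"  # assumed date_range_analytics.py output
    return "unknown"
```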
You can also import the visualization functions in your own Python scripts:
```python
from visualization import create_trend_chart, create_bar_chart, visualize_from_csv
import pandas as pd

# Directly visualize a CSV file
visualize_from_csv('ga4_your_data.csv', output_file='chart.png')

# Or work with DataFrame data manually
df = pd.DataFrame({
    'days': [7, 30, 90, 180, 360],
    'users': [1200, 5400, 12500, 28000, 45000]
})

# Create a trend line chart
create_trend_chart(
    df=df,
    x_column='days',
    y_column='users',
    title='User Growth Over Time',
    subtitle='https://www.yourpage.com/article',
    x_label='Days Since Publication',
    y_label='Total Users',
    output_file='trend_chart.png'
)

# Create a bar chart for comparison
create_bar_chart(
    df=df,
    x_column='days',
    y_column='users',
    title='User Growth by Time Period',
    output_file='bar_chart.png'
)
```

For interactive charts (requires plotly):
```python
from visualization import save_interactive_html

# Create an interactive HTML chart
save_interactive_html(
    df=df,
    x_column='days',
    y_column='users',
    title='Interactive User Growth Chart',
    output_file='interactive_chart.html'
)
```
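If you're curious what such a function might look like internally, below is a minimal sketch built on plotly.express. This is an assumption about the approach, not the project's actual implementation:

```python
import pandas as pd
import plotly.express as px

def save_interactive_html_sketch(df, x_column, y_column, title, output_file):
    """Render a line chart and save it as a standalone interactive HTML file."""
    fig = px.line(df, x=x_column, y=y_column, title=title, markers=True)
    fig.write_html(output_file)  # self-contained HTML, opens in any browser

df = pd.DataFrame({'days': [7, 30, 90], 'users': [1200, 5400, 12500]})
save_interactive_html_sketch(df, 'days', 'users', 'User Growth', 'chart.html')
```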
- Collect data using the appropriate GA4 fetcher script:

  ```
  python ga4_fetcher.py --days 30 90 180 --input-file your_urls.csv
  ```

- Visualize the results:

  ```
  python visualization.py ga4_your_urls.csv
  ```
This separation allows you to:
- Collect data once and create multiple visualizations
- Share CSV data files with colleagues who can visualize them without API access
- Batch process data collection overnight and review visualizations later
This section provides a step-by-step example workflow using ga4_fetcher.py from data preparation to visualization and analysis.
Create a CSV file with your URLs and publication dates. For example, save the following as content_analysis.csv:
```
url,date_published
https://www.yourpage.com/report-2024,2024-01-15
https://www.yourpage.com/analysis-2024,2024-02-01
https://www.yourpage.com/news-update-2024,2024-03-10
https://www.yourpage.com/feature-story,2024-04-22
https://www.yourpage.com/special-report,2024-05-05
```
Run the ga4_fetcher.py script to collect analytics data for different time periods after publication:
```
python ga4_fetcher.py --days 7 30 90 --input-file content_analysis.csv --property-id "your-property-id" --credentials "your-credentials.json"
```
This command:
- Analyzes each URL for three time periods: 7 days, 30 days, and 90 days after publication
- Uses your GA4 property ID and service account credentials
- Processes all URLs in the input file
The script will:
- Read your input CSV file
- Connect to the GA4 API using your credentials
- For each URL, calculate the date ranges based on publication date (see the sketch after this list)
- Fetch user counts for each time period
- Classify results into performance categories
- Save the output to `ga4_content_analysis.csv`
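The date-range step is plain date arithmetic. A minimal sketch of how the windows for `--days 7 30 90` could be derived (function and variable names are illustrative, not the project's internals):

```python
from datetime import datetime, timedelta

def ranges_after_publication(date_published, day_counts):
    """Build (start, end) date strings for each analysis window after publication."""
    start = datetime.strptime(date_published, "%Y-%m-%d").date()
    return {
        days: (start.isoformat(), (start + timedelta(days=days)).isoformat())
        for days in day_counts
    }

print(ranges_after_publication("2024-01-15", [7, 30, 90]))
# {7: ('2024-01-15', '2024-01-22'), 30: ('2024-01-15', '2024-02-14'), 90: ('2024-01-15', '2024-04-14')}
```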
The output file ga4_content_analysis.csv will contain the following columns:
- `url`: The original URL
- `date_published`: The content publication date
- `users_7_days`: Number of users in the first 7 days
- `users_30_days`: Number of users in the first 30 days
- `users_90_days`: Number of users in the first 90 days
- `milestone_7_days`: Performance category for the 7-day period
- `milestone_30_days`: Performance category for the 30-day period
- `milestone_90_days`: Performance category for the 90-day period
- `detailed_milestone_7_days`: Detailed performance category for the 7-day period
- `detailed_milestone_30_days`: Detailed performance category for the 30-day period
- `detailed_milestone_90_days`: Detailed performance category for the 90-day period
Example output:
```
url,date_published,users_7_days,users_30_days,users_90_days,milestone_7_days,milestone_30_days,milestone_90_days,detailed_milestone_7_days,detailed_milestone_30_days,detailed_milestone_90_days
https://www.yourpage.com/report-2024,2024-01-15,1250,4300,8900,Medium,Medium,Medium,1k-5k,1k-10k,5k-10k
https://www.yourpage.com/analysis-2024,2024-02-01,750,2100,5600,Low,Low,Medium,500-1k,1k-5k,5k-10k
https://www.yourpage.com/news-update-2024,2024-03-10,4500,15000,32000,High,High,High,1k-5k,10k-50k,10k-50k
...
```
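The milestone columns come from `user_classification.py`. Its exact thresholds aren't documented in this README, so the following is only a hypothetical sketch of the bucketing pattern such a classifier typically uses (all thresholds are invented):

```python
def classify_users_sketch(users):
    """Map a user count to a coarse and a detailed bucket (thresholds are invented)."""
    if users < 1000:
        coarse = "Low"
    elif users < 10000:
        coarse = "Medium"
    else:
        coarse = "High"
    for upper, label in [(500, "0-500"), (1000, "500-1k"), (5000, "1k-5k"),
                         (10000, "5k-10k"), (50000, "10k-50k")]:
        if users < upper:
            return coarse, label
    return coarse, "50k+"

print(classify_users_sketch(1250))  # ('Medium', '1k-5k')
```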
Create visualizations from the output data using the visualization.py module:
```
# Create a default visualization based on detected data type
python visualization.py ga4_content_analysis.csv

# Create a bar chart showing 30-day performance
python visualization.py ga4_content_analysis.csv --type bar

# Create an interactive HTML visualization (requires plotly)
python visualization.py ga4_content_analysis.csv --interactive
```
For deeper analysis, you can import the data into your own Python scripts:
```python
import pandas as pd
from visualization import create_comparison_chart

# Load the data
df = pd.read_csv('ga4_content_analysis.csv')

# Filter to most successful content
top_performers = df.sort_values(by='users_90_days', ascending=False).head(3)

# Create separate DataFrames for comparison
df_list = []
labels = []
for _, row in top_performers.iterrows():
    url_short = row['url'].split('/')[-1]
    # Create a DataFrame with time periods and user counts
    data = {
        'days': [7, 30, 90],
        'users': [row['users_7_days'], row['users_30_days'], row['users_90_days']]
    }
    df_list.append(pd.DataFrame(data))
    labels.append(url_short)

# Create a comparison chart
create_comparison_chart(
    df_list=df_list,
    labels=labels,
    x_column='days',
    y_column='users',
    title='Top Content Performance Over Time',
    output_file='top_content_comparison.png'
)
```

For ongoing content monitoring, set up a scheduled task to run the analysis regularly:
```
# Create a bash script named analyze_content.sh
echo '#!/bin/bash
python ga4_fetcher.py --days 7 30 90 --input-file content_analysis.csv
python visualization.py ga4_content_analysis.csv --output weekly_report.png
' > analyze_content.sh

# Make it executable
chmod +x analyze_content.sh

# Add to crontab to run weekly (adjust path as needed)
# crontab -e
# 0 7 * * 1 /path/to/analyze_content.sh
```
This workflow enables you to consistently monitor content performance across different time periods, identify trends, and optimize your content strategy based on data-driven insights.
To use this toolkit, you need to set up authentication with Google Analytics 4:
- Go to Google Cloud Console
- Create a new project or select an existing one
- Make note of your project ID
- In your Google Cloud project, navigate to "APIs & Services" > "Library"
- Search for "Google Analytics Data API"
- Click on the API and select "Enable"
- In your Google Cloud project, navigate to "IAM & Admin" > "Service Accounts"
- Click "Create Service Account"
- Enter a name and description for your service account
- Click "Create and Continue"
- (Optional) Grant the service account a role in your project (not required for GA4 access)
- Click "Done"
- In the service accounts list, click on your newly created service account
- Go to the "Keys" tab
- Click "Add Key" > "Create new key"
- Select JSON format
- Click "Create" - this will download a JSON key file
- Keep this key file secure - it will be used to authenticate with GA4
- Log in to your Google Analytics 4 account
- Navigate to Admin > Property > Property Access Management
- Click the "+" button to add a user
- Enter the email address of your service account (it looks like `your-service-account@your-project.iam.gserviceaccount.com`)
- Assign "Viewer" or "Analyst" role (Viewer is sufficient for read-only access)
- Click "Add"
- In Google Analytics 4, go to Admin > Property Settings
- Find your Property ID (it's a number like "315823153")
- Place your downloaded JSON key file in your project directory
- Update the `credentials_path` parameter in your scripts to point to this file
- Update the `property_id` parameter with your GA4 Property ID
Now you're ready to use the GA4 Data Fetcher toolkit!
Note: For information about the GA4 Data API dimensions, metrics, and filter specifications used in this project, refer to the official Google Analytics 4 Data API Schema documentation.
If you encounter authentication errors:
- Check your credentials file: Ensure the JSON key file exists and is properly formatted
- Verify service account permissions: Make sure the service account has been added to your GA4 property with proper permissions
- Check property ID: Verify that you're using the correct GA4 property ID
- API enablement: Ensure the Google Analytics Data API is enabled in your Google Cloud project
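A quick way to isolate authentication problems from everything else is a minimal smoke test against the API. A sketch using the google-analytics-data client directly (substitute your own key file path and numeric property ID):

```python
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import DateRange, Metric, RunReportRequest

# If this runs without error, credentials, permissions, property ID,
# and API enablement are all working
client = BetaAnalyticsDataClient.from_service_account_json("path/to/credentials.json")
request = RunReportRequest(
    property="properties/your-property-id",  # numeric GA4 property ID
    metrics=[Metric(name="totalUsers")],
    date_ranges=[DateRange(start_date="7daysAgo", end_date="today")],
)
response = client.run_report(request)
print("OK, totalUsers:", response.rows[0].metric_values[0].value if response.rows else 0)
```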
If your queries return zero users or no data:
- Verify URL format: Ensure URLs include the full path with "https://" prefix
- Check domain in ga4_data_fetcher.py: Update the domain check in `create_url_regex_pattern()` to match your site (see the sketch after this list)
- Date range issues: Ensure the date range is valid and within the time your GA4 property has been collecting data
- Property configuration: Verify that your GA4 property is correctly collecting data for the URLs you're querying
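For context on the domain check: GA4 reports page paths without the scheme and host, so a helper like `create_url_regex_pattern()` presumably strips the domain and builds a pagePath filter. A hypothetical sketch of that idea, not the project's actual code:

```python
import re
from urllib.parse import urlparse

def create_url_regex_pattern_sketch(url, domain="www.yourpage.com"):
    """Turn a full URL into a regex matching its GA4 pagePath (illustrative only)."""
    parsed = urlparse(url)
    if parsed.netloc != domain:  # the domain check to update for your own site
        raise ValueError(f"Unexpected domain: {parsed.netloc}")
    path = parsed.path or "/"
    # Match the exact path, with or without a trailing slash
    return f"^{re.escape(path.rstrip('/'))}/?$"

print(create_url_regex_pattern_sketch("https://www.yourpage.com/article1"))
# ^/article1/?$
```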
If you encounter rate limit errors:
- Increase sleep time: Use the `--sleep-time` parameter to increase the wait between API requests
- Reduce batch size: Process fewer URLs at once
- Check quotas: Review your Google Cloud project quotas for the Analytics Data API
If you have issues installing dependencies:
- Upgrade pip: `pip install --upgrade pip`
- Check Python version: Ensure you're using Python 3.6+
- Virtual environment: Consider using a virtual environment for clean installation
- Dependencies: If only installing core packages: `pip install google-analytics-data pandas`
- Google Analytics 4 Data API Schema - Official documentation for the GA4 Data API, including dimensions, metrics, and filter specifications used in this project.