In this sample, you will see how to use the Cloud Natural Language API connector and the for-in iteration syntax to analyze the sentiment of the top Reddit posts in a subreddit.
Reddit has an API to get the top posts in a subreddit. Here's an example that gets the top 4 posts of the month in the googlecloud subreddit:
curl -X GET "https://www.reddit.com/r/googlecloud/top.json?t=month&count=4&limit=4"
You will use this API to fetch top posts in a subreddit.
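The request URL is assembled from the subreddit name and the number of posts. The following is a minimal shell sketch of that assembly, assuming the same query parameters that the workflow later uses; the variable names are illustrative only. Quoting the URL matters because an unquoted `&` is treated by the shell as a control operator rather than as part of the query string.

```shell
# Illustrative values; in the workflow these arrive as runtime arguments.
subreddit="googlecloud"
count=4

# Build the URL with the same query parameters the workflow uses.
url="https://www.reddit.com/r/${subreddit}/top.json?t=month&count=${count}&limit=${count}"
echo "$url"

# With network access, the posts could then be fetched with:
#   curl -X GET "$url"
```

The double quotes around `$url` keep the whole query string in a single argument.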
The Natural Language API uses machine learning to reveal the structure and meaning of text. It offers methods such as sentiment analysis, entity analysis, and syntactic analysis.
In this example, you will use sentiment analysis. Sentiment analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral.
You can see a sample sentiment analysis response here.
You will use the score of documentSentiment to identify the sentiment of each post. The score ranges between -1.0 (negative) and 1.0 (positive) and corresponds to the overall emotional leaning of the text.
You will also calculate an average score to capture the average sentiment of all processed posts.
First, enable the required services:
gcloud services enable \
  workflows.googleapis.com \
  language.googleapis.com
Create a workflow.yaml to define the workflow.
In the init step, read the subreddit name and the number of posts to read as runtime arguments, and initialize totalScore to keep track of the total sentiment:
main:
  params: [args]
  steps:
    - init:
        assign:
          - subreddit: ${args.subreddit}
          - count: ${args.count}
          - totalScore: 0
In the second step, fetch top posts using the Reddit API:
    - getTopPosts:
        call: http.get
        args:
          url: ${"https://www.reddit.com/r/" + subreddit + "/top.json?t=month&count=" + count + "&limit=" + count}
        result: topPostsResult
Next, analyze each post in a for-in loop using the Language API connector:
    - processPosts:
        for:
          value: post
          in: ${topPostsResult.body.data.children}
          steps:
            - analyzeSentiment:
                call: googleapis.language.v1.documents.analyzeSentiment
                args:
                  body:
                    document:
                      content: ${post.data.title + " " + post.data.selftext}
                      type: "PLAIN_TEXT"
                result: sentimentResult
            - updateTotalScore:
                assign:
                  - currentScore: ${sentimentResult.documentSentiment.score}
                  - totalScore: ${totalScore + currentScore}
            - logPost:
                call: sys.log
                args:
                  text: ${post.data.title + " " + post.data.selftext + ":" + string(currentScore) + " " + string(totalScore)}
Finally, calculate the average sentiment score and return the result:
    - assignResult:
        assign:
          - numberOfPosts: ${len(topPostsResult.body.data.children)}
          - avgSentiment: ${totalScore / numberOfPosts}
    - logResult:
        call: sys.log
        args:
          text: ${"Total score:" + string(totalScore) + " for n:" + string(numberOfPosts) + " posts with average sentiment:" + string(avgSentiment)}
    - returnResult:
        return: ${avgSentiment}
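The loop and averaging steps above can be sanity-checked locally. The following is a rough shell sketch of the same arithmetic, not part of the workflow itself. The scores are made-up values standing in for sentimentResult.documentSentiment.score, and the empty-list guard is an addition that the workflow does not include.

```shell
# Made-up per-post scores (assumption), one per analyzed post.
scores="0.3 -0.8 0.1 -0.6"

total=0
n=0
for score in $scores; do
  # The shell has no floating-point arithmetic, so delegate the addition to awk.
  total=$(awk -v t="$total" -v s="$score" 'BEGIN { printf "%.2f", t + s }')
  n=$((n + 1))
  echo "score=$score running total=$total"
done

# Guard against dividing by zero when no posts were processed.
if [ "$n" -gt 0 ]; then
  avg=$(awk -v t="$total" -v n="$n" 'BEGIN { printf "%.2f", t / n }')
else
  avg="n/a"
fi
echo "Total score: $total for n: $n posts with average sentiment: $avg"
```

The running total plays the role of totalScore, and the final division corresponds to the assignResult step.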
You can see the full workflow.yaml.
Deploy the workflow:
gcloud workflows deploy reddit-sentiment \
  --source=workflow.yaml
Execute the workflow:
gcloud workflows execute reddit-sentiment \
  --data='{"subreddit":"googlecloud","count":"4"}'
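When the subreddit and count come from shell variables, the JSON payload can be built before calling gcloud. The following is a minimal sketch with illustrative variable names. It keeps count as a JSON string, as in the command above, on the assumption that the workflow concatenates it into the URL as text. It does not escape special characters, so it only suits simple values.

```shell
# Illustrative values (assumptions).
subreddit="googlecloud"
count=4

# Build the runtime arguments; count is emitted as a JSON string, not a number.
data=$(printf '{"subreddit":"%s","count":"%s"}' "$subreddit" "$count")
echo "$data"

# With gcloud installed and the workflow deployed, the payload could be passed as:
#   gcloud workflows execute reddit-sentiment --data="$data"
```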
After a couple of seconds, you should see the average sentiment under result:
gcloud workflows executions describe bcf52313-4ce9-4c4f-9b5e-2f461223923f --workflow reddit-sentiment --location us-central1
argument: '{"count":"4","subreddit":"googlecloud"}'
endTime: '2021-05-27T13:46:52.402307280Z'
name: projects/1011272509317/locations/us-central1/workflows/reddit-sentiment/executions/bcf52313-4ce9-4c4f-9b5e-2f461223923f
result: '-0.25'
startTime: '2021-05-27T13:46:50.202583444Z'
state: SUCCEEDED
workflowRevisionId: 000001-49b
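To use the result in a script, the relevant fields can be pulled out of the describe output. The following is a small sketch that parses a few lines copied from the output above with awk. It assumes the simple `key: value` layout shown here and is not a general YAML parser.

```shell
# Lines copied from the describe output shown above.
describe_output="result: '-0.25'
startTime: '2021-05-27T13:46:50.202583444Z'
state: SUCCEEDED"

# Select the line by its key, then strip the surrounding single quotes.
result=$(printf '%s\n' "$describe_output" | awk -F': ' '$1 == "result" { print $2 }' | tr -d "'")
state=$(printf '%s\n' "$describe_output" | awk -F': ' '$1 == "state" { print $2 }')
echo "state=$state result=$result"

# gcloud's global --format flag may be a simpler option, for example:
#   gcloud workflows executions describe EXECUTION_ID --workflow reddit-sentiment --format='value(result)'
```

Here EXECUTION_ID is a placeholder for the ID of your own execution.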