feat(RubricEvaluation): implement rubric auto-grading #7941
Merged
3 commits
d428f24 refactor(RubricPanel): remove redundant useState for category grades (ncduy0303)
795baff feat(rubric_auto_grading): implement AI-assisted rubric auto-grading … (ncduy0303)
ac24c46 feat(locales): update phrasing for "automatic programming feedback" i… (ncduy0303)
35 changes: 35 additions & 0 deletions
app/services/course/assessment/answer/prompts/rubric_auto_grading_output_format.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
{ | ||
"_type": "json_schema", | ||
"type": "object", | ||
"properties": { | ||
"category_grades": { | ||
"type": "array", | ||
"items": { | ||
"type": "object", | ||
"properties": { | ||
"category_id": { | ||
"type": "number", | ||
"description": "The ID of the rubric category" | ||
}, | ||
"criterion_id": { | ||
"type": "number", | ||
"description": "The ID of the criterion within the rubric category, must be one of the listed criteria for the rubric category" | ||
}, | ||
"explanation": { | ||
"type": "string", | ||
"description": "An explanation for why the criterion was selected" | ||
} | ||
}, | ||
"required": ["category_id", "criterion_id", "explanation"], | ||
"additionalProperties": false | ||
}, | ||
"description": "A list of criteria selected for each rubric category with explanations" | ||
}, | ||
"overall_feedback": { | ||
"type": "string", | ||
"description": "General feedback about the student's response, provided in HTML format and focused on how the student can improve according to the rubric" | ||
} | ||
}, | ||
"required": ["category_grades", "overall_feedback"], | ||
"additionalProperties": false | ||
} |
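
The schema above marks every key as required and forbids extra properties. As a rough illustration of what that means for a parsed response, here is a minimal sketch in plain Ruby. The helper `valid_grading_response?` and the sample payload are hypothetical and not part of this PR; the PR itself relies on Langchain's structured output parser rather than a hand-written check.

```ruby
require 'json'

# Hypothetical helper, not part of the PR: mirrors the "required" and
# "additionalProperties: false" rules of the schema above by hand.
REQUIRED_TOP_LEVEL = %w[category_grades overall_feedback].freeze
REQUIRED_GRADE_KEYS = %w[category_id criterion_id explanation].freeze

def valid_grading_response?(response)
  return false unless response.is_a?(Hash)
  # Exactly the required keys: nothing missing, nothing extra.
  return false unless response.keys.sort == REQUIRED_TOP_LEVEL.sort
  return false unless response['overall_feedback'].is_a?(String)

  grades = response['category_grades']
  return false unless grades.is_a?(Array)

  grades.all? do |grade|
    grade.is_a?(Hash) &&
      grade.keys.sort == REQUIRED_GRADE_KEYS.sort &&
      grade['category_id'].is_a?(Numeric) &&
      grade['criterion_id'].is_a?(Numeric) &&
      grade['explanation'].is_a?(String)
  end
end

# Illustrative payload only; the IDs and text are made up.
sample = JSON.parse(<<~JSON)
  {
    "category_grades": [
      { "category_id": 1, "criterion_id": 3, "explanation": "Clear thesis." }
    ],
    "overall_feedback": "<p>Add supporting evidence.</p>"
  }
JSON

puts valid_grading_response?(sample)
```

A check like this only covers shape. Whether `criterion_id` really belongs to the given category still has to be verified against the question's rubric.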
5 changes: 5 additions & 0 deletions
app/services/course/assessment/answer/prompts/rubric_auto_grading_system_prompt.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{ | ||
"_type": "prompt", | ||
"input_variables": ["format_instructions"], | ||
"template": "You are an expert grading assistant for educational assessments.\nYour task is to grade a student's response to a rubric-based question.\nYou will be provided with:\n1. The question prompt\n2. The rubric categories and criteria\n3. The student's response\n\nYou must analyze how well the student's response meets each rubric category's criteria\nand provide feedback accordingly.\n\nSpecial Note on Moderation:\nIf a rubric category is labeled as \"moderation\" and does not contain any grading criteria, do not attempt to assign a criterion. Instead, return `criterion_id: 0` and provide a neutral explanation such as \"No moderation criteria were assessed.\" This category is reserved for manual adjustment by educators.\n\nThe `overall_feedback` field **must be written in HTML** to support rich text rendering, and it **must emphasize how the student can improve their response** according to the rubric criteria.\n\n{format_instructions}" | ||
} |
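
The moderation note in the system prompt asks the model to return `criterion_id: 0` for categories that have no criteria. A small sketch of how a consumer might tell those placeholder entries apart from real selections follows. The helper `split_moderation_entries` and the sample data are hypothetical, and the sketch assumes that no real criterion ever has ID 0.

```ruby
# Hypothetical helper, not part of the PR: separates placeholder entries
# (criterion_id 0, as the system prompt requests for "moderation")
# from entries that name a real criterion.
# Assumption: real criterion IDs are never 0.
def split_moderation_entries(category_grades)
  placeholders, graded = category_grades.partition do |entry|
    entry['criterion_id'].to_i.zero?
  end
  { graded: graded, placeholders: placeholders }
end

# Illustrative data only.
sample_grades = [
  { 'category_id' => 1, 'criterion_id' => 11,
    'explanation' => 'Argument is well supported.' },
  { 'category_id' => 3, 'criterion_id' => 0,
    'explanation' => 'No moderation criteria were assessed.' }
]

result = split_moderation_entries(sample_grades)
puts "graded: #{result[:graded].length}, placeholders: #{result[:placeholders].length}"
```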
5 changes: 5 additions & 0 deletions
app/services/course/assessment/answer/prompts/rubric_auto_grading_user_prompt.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{ | ||
"_type": "prompt", | ||
"input_variables": ["question_title", "question_description", "rubric_categories", "answer_text"], | ||
"template": "QUESTION:\n{question_title}\n{question_description}\n\nRUBRIC CATEGORIES:\n{rubric_categories}\n\nSTUDENT RESPONSE:\n{answer_text}" | ||
} |
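
To make the `{placeholder}` convention in the user prompt concrete, the following sketch fills the same template with plain Ruby. It is a stand-in for illustration only: `fill_template` is a hypothetical helper, the sample values are invented, and the PR itself formats the prompt through `Langchain::Prompt`.

```ruby
# Same template text as the "template" field above.
USER_TEMPLATE = "QUESTION:\n{question_title}\n{question_description}\n\n" \
                "RUBRIC CATEGORIES:\n{rubric_categories}\n\n" \
                "STUDENT RESPONSE:\n{answer_text}"

# Hypothetical helper, not part of the PR: replaces each {name} with the
# matching value and raises KeyError if a variable is missing.
# gsub scans only the template, so braces inside a substituted value
# (for example in the student's answer) are left untouched.
def fill_template(template, values)
  template.gsub(/\{(\w+)\}/) { values.fetch(Regexp.last_match(1).to_sym) }
end

# Illustrative values only.
prompt = fill_template(
  USER_TEMPLATE,
  question_title: 'Essay 1',
  question_description: 'Discuss the causes of inflation.',
  rubric_categories: "Category ID: 1\nName: Clarity",
  answer_text: 'Prices rise when {demand} exceeds supply.'
)

puts prompt
```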
162 changes: 162 additions & 0 deletions
app/services/course/assessment/answer/rubric_auto_grading_service.rb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,162 @@ | ||
# frozen_string_literal: true | ||
class Course::Assessment::Answer::RubricAutoGradingService < | ||
Course::Assessment::Answer::AutoGradingService | ||
def evaluate(answer) | ||
answer.correct, grade, messages, feedback = evaluate_answer(answer.actable) | ||
answer.auto_grading.result = { messages: messages } | ||
create_ai_generated_draft_post(answer, feedback) | ||
grade | ||
end | ||
|
||
private | ||
|
||
# Grades the given answer. | ||
# | ||
# @param [Course::Assessment::Answer::RubricBasedResponse] answer The answer to be graded. | ||
# @return [Array<(Boolean, Integer, Object, String)>] The correct status, grade, messages to be | ||
# assigned to the grading, and feedback for the draft post. | ||
def evaluate_answer(answer) | ||
question = answer.question.actable | ||
llm_service = Course::Assessment::Answer::RubricLlmService.new | ||
llm_response = llm_service.evaluate(question, answer) | ||
process_llm_grading_response(question, answer, llm_response) | ||
end | ||
|
||
# Processes the LLM response into grades and feedback, and updates the answer. | ||
# @param [Course::Assessment::Question] question The question to be graded. | ||
# @param [Course::Assessment::Answer::RubricBasedResponse] answer The answer to update. | ||
# @param [Hash] llm_response The parsed LLM response containing grading information | ||
# @return [Array<(Boolean, Integer, Object, String)>] The correct status, grade, messages, and overall feedback. | ||
def process_llm_grading_response(question, answer, llm_response) | ||
category_grades = process_category_grades(question, llm_response) | ||
|
||
# For rubric-based questions, update the answer's selections and grade to database | ||
update_answer_selections(answer, category_grades) | ||
grade = update_answer_grade(answer, category_grades) | ||
|
||
# Currently no support for correctness in rubric-based questions | ||
[true, grade, ['success'], llm_response['overall_feedback']] | ||
end | ||
|
||
# Processes category grades from LLM response into a structured format | ||
# @param [Course::Assessment::Question] question The question to be graded. | ||
# @param [Hash] llm_response The parsed LLM response with category grades | ||
# @return [Array<Hash>] Array of processed category grades. | ||
def process_category_grades(question, llm_response) | ||
category_lookup = question.categories.index_by(&:id) | ||
llm_response['category_grades'].filter_map do |category_grade| | ||
category = category_lookup[category_grade['category_id']] | ||
next unless category | ||
|
||
# Skip 'moderation' category as it does not have criterions | ||
criterion = category.criterions.find { |c| c.id == category_grade['criterion_id'] } | ||
next unless criterion | ||
|
||
{ | ||
category_id: category_grade['category_id'], | ||
criterion_id: criterion.id, | ||
grade: criterion.grade, | ||
explanation: category_grade['explanation'] | ||
} | ||
end | ||
end | ||
|
||
# Updates the answer's selections based on the graded categories. | ||
# | ||
# @param [Course::Assessment::Answer::RubricBasedResponse] answer The answer to update. | ||
# @param [Array<Hash>] category_grades The processed category grades. | ||
# @return [void] | ||
def update_answer_selections(answer, category_grades) | ||
if answer.selections.empty? | ||
answer.create_category_grade_instances | ||
answer.reload | ||
end | ||
selection_lookup = answer.selections.index_by(&:category_id) | ||
params = { | ||
selections_attributes: category_grades.map do |grade_info| | ||
selection = selection_lookup[grade_info[:category_id]] | ||
next unless selection | ||
|
||
{ | ||
id: selection.id, | ||
criterion_id: grade_info[:criterion_id], | ||
grade: grade_info[:grade], | ||
explanation: grade_info[:explanation] | ||
} | ||
end.compact | ||
} | ||
answer.assign_params(params) | ||
end | ||
|
||
# Updates the answer's total grade based on the graded categories. | ||
# @param [Course::Assessment::Answer::RubricBasedResponse] answer The answer to update. | ||
# @param [Array<Hash>] category_grades The processed category grades. | ||
# @return [Integer] The new total grade for the answer. | ||
def update_answer_grade(answer, category_grades) | ||
grade_lookup = category_grades.to_h { |info| [info[:category_id], info[:grade]] } | ||
total_grade = answer.selections.sum do |selection| | ||
grade_lookup[selection.category_id] || selection.criterion&.grade || selection.grade || 0 | ||
end | ||
answer.grade = total_grade | ||
total_grade | ||
end | ||
|
||
# Builds a draft post with AI-generated feedback | ||
# @param [Course::Assessment::SubmissionQuestion] submission_question The submission question | ||
# @param [Course::Assessment::Answer] answer The answer | ||
# @param [String] feedback The feedback text | ||
# @return [Course::Discussion::Post] The built post | ||
def build_draft_post(submission_question, answer, feedback) | ||
submission_question.posts.build( | ||
creator: User.system, | ||
updater: User.system, | ||
text: feedback, | ||
is_ai_generated: true, | ||
workflow_state: 'draft', | ||
title: answer.submission.assessment.title | ||
) | ||
end | ||
|
||
# Saves the draft post and updates the submission question | ||
# @param [Course::Assessment::SubmissionQuestion] submission_question The submission question | ||
# @param [Course::Assessment::Answer] answer The answer to associate with the post | ||
# @param [Course::Discussion::Post] post The post to save | ||
# @return [void] | ||
def save_draft_post(submission_question, answer, post) | ||
submission_question.class.transaction do | ||
if submission_question.posts.length > 1 | ||
post.parent = submission_question.posts.ordered_topologically.flatten.select(&:id).last | ||
end | ||
post.save! | ||
submission_question.save! | ||
create_topic_subscription(post.topic, answer) | ||
post.topic.mark_as_pending | ||
end | ||
end | ||
|
||
# Creates a subscription for the discussion topic of the answer post | ||
# @param [Course::Discussion::Topic] discussion_topic The discussion topic to subscribe to | ||
# @param [Course::Assessment::Answer] answer The answer to create the subscription for | ||
# @return [void] | ||
def create_topic_subscription(discussion_topic, answer) | ||
# Ensure the student who wrote the answer and all group managers | ||
# get notified when someone comments on the answer | ||
discussion_topic.ensure_subscribed_by(answer.submission.creator) | ||
answer_course_user = answer.submission.course_user | ||
answer_course_user.my_managers.each do |manager| | ||
discussion_topic.ensure_subscribed_by(manager.user) | ||
end | ||
end | ||
|
||
# Creates AI-generated draft feedback post for the answer | ||
# @param [Course::Assessment::Answer] answer The answer to create the post for | ||
# @param [String] feedback The feedback text to include in the post | ||
# @return [void] | ||
def create_ai_generated_draft_post(answer, feedback) | ||
submission_question = answer.submission.submission_questions.find_by(question_id: answer.question_id) | ||
return unless submission_question | ||
|
||
post = build_draft_post(submission_question, answer, feedback) | ||
save_draft_post(submission_question, answer, post) | ||
end | ||
end |
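
The filtering in `process_category_grades` is easy to miss when reading the diff: an entry is kept only if both its category and its criterion exist on the question. The sketch below reproduces that rule with plain structs so it can run outside Rails. `Criterion`, `Category`, `resolve_category_grades`, and all sample IDs are hypothetical stand-ins for the ActiveRecord models used in the PR.

```ruby
# Plain-Ruby stand-ins for the rubric models (hypothetical, for illustration).
Criterion = Struct.new(:id, :grade)
Category = Struct.new(:id, :criterions)

# Keeps an LLM entry only when its category exists and its criterion
# belongs to that category, mirroring the two `next unless` guards.
def resolve_category_grades(categories, llm_grades)
  lookup = categories.to_h { |category| [category.id, category] }
  llm_grades.filter_map do |entry|
    category = lookup[entry['category_id']]
    next unless category

    criterion = category.criterions.find { |c| c.id == entry['criterion_id'] }
    next unless criterion

    { category_id: category.id, criterion_id: criterion.id,
      grade: criterion.grade, explanation: entry['explanation'] }
  end
end

categories = [
  Category.new(1, [Criterion.new(10, 0), Criterion.new(11, 2)]),
  Category.new(2, [Criterion.new(20, 1), Criterion.new(21, 3)]),
  Category.new(3, []) # a "moderation" category with no criteria
]

llm_grades = [
  { 'category_id' => 1, 'criterion_id' => 11, 'explanation' => 'Good.' },
  { 'category_id' => 2, 'criterion_id' => 99, 'explanation' => 'Unknown criterion.' },
  { 'category_id' => 3, 'criterion_id' => 0, 'explanation' => 'No moderation criteria were assessed.' },
  { 'category_id' => 7, 'criterion_id' => 10, 'explanation' => 'Unknown category.' }
]

resolved = resolve_category_grades(categories, llm_grades)
total = resolved.sum { |grade_info| grade_info[:grade] }
puts "kept #{resolved.length} of #{llm_grades.length} entries, total grade #{total}"
```

Unlike `update_answer_grade` in the PR, this sketch sums only the resolved entries. The service falls back to the grade already stored on a selection when the LLM response has no usable entry for that category.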
72 changes: 72 additions & 0 deletions
app/services/course/assessment/answer/rubric_llm_service.rb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
# frozen_string_literal: true | ||
class Course::Assessment::Answer::RubricLlmService | ||
@output_parser = Langchain::OutputParsers::StructuredOutputParser.from_json_schema( | ||
JSON.parse( | ||
File.read('app/services/course/assessment/answer/prompts/rubric_auto_grading_output_format.json') | ||
) | ||
) | ||
@system_prompt = Langchain::Prompt.load_from_path( | ||
file_path: 'app/services/course/assessment/answer/prompts/rubric_auto_grading_system_prompt.json' | ||
).format(format_instructions: @output_parser.get_format_instructions) | ||
@user_prompt = Langchain::Prompt.load_from_path( | ||
file_path: 'app/services/course/assessment/answer/prompts/rubric_auto_grading_user_prompt.json' | ||
) | ||
|
||
class << self | ||
attr_reader :system_prompt, :user_prompt, :output_parser | ||
end | ||
|
||
# Calls the LLM service to evaluate the answer. | ||
# | ||
# @param [Course::Assessment::Question::RubricBasedResponse] question The question to be graded. | ||
# @param [Course::Assessment::Answer::RubricBasedResponse] answer The student's answer. | ||
# @return [Hash] The LLM's evaluation response. | ||
def evaluate(question, answer) | ||
formatted_user_prompt = self.class.user_prompt.format( | ||
question_title: question.title, | ||
question_description: question.description, | ||
rubric_categories: format_rubric_categories(question), | ||
answer_text: answer.answer_text | ||
) | ||
messages = [ | ||
{ role: 'system', content: self.class.system_prompt }, | ||
{ role: 'user', content: formatted_user_prompt } | ||
] | ||
response = LANGCHAIN_OPENAI.chat( | ||
messages: messages, | ||
response_format: { type: 'json_object' } | ||
).completion | ||
parse_llm_response(response) | ||
end | ||
|
||
# Formats rubric categories for inclusion in the LLM prompt | ||
# @param [Course::Assessment::Question::RubricBasedResponse] question The question containing rubric categories | ||
# @return [String] Formatted string representation of rubric categories and criteria | ||
def format_rubric_categories(question) | ||
question.categories.map do |category| | ||
criterions = category.criterions.map do |criterion| | ||
"- [Grade: #{criterion.grade}, Criterion ID: #{criterion.id}]: #{criterion.explanation}" | ||
end | ||
<<~CATEGORY | ||
Category ID: #{category.id} | ||
Name: #{category.name} | ||
Criteria: | ||
#{criterions.join("\n")} | ||
CATEGORY | ||
end.join("\n\n") | ||
end | ||
|
||
# Parses the LLM response, falling back to an output-fixing parser if the first parse fails | ||
# @param [String] response The raw LLM response to parse | ||
# @return [Hash] The parsed response as a structured hash | ||
def parse_llm_response(response) | ||
self.class.output_parser.parse(response) | ||
rescue Langchain::OutputParsers::OutputParserException | ||
fix_parser = Langchain::OutputParsers::OutputFixingParser.from_llm( | ||
llm: LANGCHAIN_OPENAI, | ||
parser: self.class.output_parser | ||
) | ||
fix_parser.parse(response) | ||
end | ||
end |
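
`parse_llm_response` follows a parse-then-repair pattern: try the strict parser first, and only on failure hand the raw text to a second parser. The sketch below shows the same control flow using only the standard library. It deliberately swaps in a different repair step: the PR asks the LLM to fix its own output via `OutputFixingParser`, while this stand-in simply extracts the outermost `{...}` span. `parse_with_fallback` and the sample string are hypothetical.

```ruby
require 'json'

# Hypothetical helper, not part of the PR: strict parse first, then a
# local repair attempt. The repair here is a simple text extraction,
# not the LLM-based fix used by the service above.
def parse_with_fallback(raw)
  JSON.parse(raw)
rescue JSON::ParserError
  # Greedy match from the first "{" to the last "}" across newlines.
  candidate = raw[/\{.*\}/m]
  raise ArgumentError, 'no JSON object found in response' unless candidate

  # May still raise JSON::ParserError if the extracted span is invalid.
  JSON.parse(candidate)
end

# Illustrative response with chatter around the JSON payload.
raw_response = 'Sure! {"category_grades": [], "overall_feedback": "<p>ok</p>"} Hope this helps.'

parsed = parse_with_fallback(raw_response)
puts parsed['overall_feedback']
```

As in the service, a failure of the repair step is not rescued and propagates to the caller.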