wow-agent day03: Implementing an Intelligent Grading Agent with OpenAI

Reference: DataWhale wow agent day03
Reuse the environment configuration and large model settings from day02.

Define Functions

Extract the JSON Part from Large Model Output

import re

Function Explanation:
Function to extract and clean JSON content from text

def extract_json_content(text):
    text = text.replace("\n","")
    pattern = r"```json(.*?)```"
    matches = re.findall(pattern, text, re.DOTALL)
    if matches:
        return matches[0].strip()
    return text

Parameters:
    text (str): Input text string, usually containing JSON code blocks
    
Returns:
    str: Extracted and cleaned JSON string
    
Function Description:
1. text.replace("\n","") - Removes all newline characters from the text
2. pattern = r"```json(.*?)```" - Defines a regular expression pattern that matches the content between ```json and ```
3. re.findall() - Uses the regular expression to find all matches; re.DOTALL allows . to match newline characters, although step 1 has already removed them, so the flag acts only as a safeguard here
4. matches[0].strip() - Takes the first match and removes leading and trailing whitespace
5. If no match is found, returns the text as is (with its newlines already removed)
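
The behaviour described above can be checked with a minimal sketch. The function is repeated so the snippet runs on its own, and the `reply` string is an invented example of a model response, not real output.

```python
import json
import re


def extract_json_content(text):
    # Same logic as the function above, repeated so this snippet is self-contained.
    text = text.replace("\n", "")
    pattern = r"```json(.*?)```"
    matches = re.findall(pattern, text, re.DOTALL)
    if matches:
        return matches[0].strip()
    return text


# Hypothetical model reply: prose wrapped around a fenced JSON block.
reply = 'Here is the result:\n```json\n{\n  "llmgetscore": 8,\n  "llmcomments": "ok"\n}\n```\nDone.'

extracted = extract_json_content(reply)
parsed = json.loads(extracted)
print(parsed["llmgetscore"])

# Without a fenced block, the text comes back with only its newlines removed.
plain = extract_json_content('{"a": 1}\n')
print(plain)
```

Only the first fenced block is used, so a reply containing several JSON blocks would silently drop the rest.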

Parse JSON String into Python Object

The parser was modified slightly, as follows:

import json

class JsonOutputParser:
    def parse(self, result):
        # First, try to parse the raw output directly
        try:
            return json.loads(result)
        except json.JSONDecodeError:
            pass

        # Next, try the content extracted from a json-fenced code block
        cleaned_result = extract_json_content(result)
        try:
            return json.loads(cleaned_result)
        except json.JSONDecodeError:
            pass

        # Finally, try to fix common JSON errors
        try:
            # Replace single quotes with double quotes
            # (note: this also rewrites apostrophes inside string values)
            fixed_result = cleaned_result.replace("'", '"')
            # Remove trailing commas before a closing brace or bracket
            fixed_result = re.sub(r',\s*([}\]])', r'\1', fixed_result)
            return json.loads(fixed_result)
        except json.JSONDecodeError as e:
            raise ValueError(f"Unable to parse JSON output. Original output: {result}\nError message: {e}") from e

Function Explanation:

Parameters:
    result (str): Text containing JSON generated by the LLM

Returns:
    dict: Parsed JSON object

Optimization Description:
1. Tries several JSON extraction methods in turn to improve robustness
2. Repairs common JSON errors (single quotes, trailing commas) before giving up
3. Raises ValueError when every attempt fails, so that the caller (grade_answer) can retry the generation
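
The fallback order can be seen in isolation with a small sketch. `parse_with_fallbacks` is a hypothetical standalone helper, not the class above. It assumes a simplified repair step (quote swapping and trailing-comma removal only), and the sample strings are invented.

```python
import json
import re


def parse_with_fallbacks(raw):
    """Hypothetical helper: strict parse, then fenced-block extraction, then light repairs."""
    candidates = [raw]

    # Stage 2: take the body of a json-fenced block, if one exists.
    match = re.search(r"```json(.*?)```", raw, re.DOTALL)
    if match:
        candidates.append(match.group(1).strip())

    # Stage 3: repair the most recent candidate (single quotes, trailing commas).
    repaired = candidates[-1].replace("'", '"')
    repaired = re.sub(r",\s*([}\]])", r"\1", repaired)
    candidates.append(repaired)

    last_error = None
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError as exc:
            last_error = exc
    raise ValueError(f"Unable to parse JSON output: {last_error}")


strict = parse_with_fallbacks('{"llmgetscore": 9}')
fenced = parse_with_fallbacks('Result:\n```json\n{"llmgetscore": 7}\n```')
sloppy = parse_with_fallbacks("{'llmgetscore': 5,}")
print(strict, fenced, sloppy)
```

The repair stage runs last because swapping quotes also rewrites apostrophes inside string values, which can corrupt output that would otherwise have parsed.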

Define GradingOpenAI

class GradingOpenAI:
    def __init__(self):
        self.model = "glm-4-flash"
        self.output_parser = JsonOutputParser()
        self.template = """You are an expert in grading the Chinese patent agent exam,
skilled at generating scores and comments in Chinese based on the given questions and answers,
and outputting in a specific format.
Your task is to generate scores and comments in Chinese based on the answers provided by the candidates, and return them in JSON format.
The grading standard should be somewhat lenient; as long as the candidate conveys the basic meaning, they should receive points.
If the answer has numerical annotations, it means that if the candidate answers this knowledge point, they will receive a certain number of points for this question.
The generated comments in Chinese need to be correctly parsed by the json.loads() function.
The entire generated comment in Chinese needs to be wrapped in English double quotes, and inside the wrapped string, please use Chinese double quotes.
The comments in Chinese should not contain newline characters, escape characters, etc.

Output format is JSON:
{{
  "llmgetscore": 0,
  "llmcomments": "Chinese comments"
}}

Compare the student's answer with the correct answer,
and give a score out of 10 and comments in Chinese. 
Question: {ques_title} 
Answer: {answer} 
Student's reply: {reply}"""

    def create_prompt(self, ques_title, answer, reply):
        return self.template.format(
            ques_title=ques_title,
            answer=answer,
            reply=reply
        )

    def grade_answer(self, ques_title, answer, reply):
        success = False
        while not success:
            # Workaround: the JSON parsing above does not always succeed,
            # so parse the model output first and, if parsing fails, ask the model to generate again.
            # Note: this loop has no attempt limit and repeats until a parse succeeds.
            try:
                response = client.chat.completions.create(
                    model=self.model,
                    messages=[
                        {"role": "system", "content": "You are a professional exam grading expert."},
                        {"role": "user", "content": self.create_prompt(ques_title, answer, reply)}
                    ],
                    temperature=0.7
                )

                result = self.output_parser.parse(response.choices[0].message.content)
                success = True
            except Exception as e:
                print(f"Error occurred: {e}")
                continue

        return result['llmgetscore'], result['llmcomments']

    def run(self, input_data):
        output = []
        for item in input_data:
            score, comment = self.grade_answer(
                item['ques_title'], 
                item['answer'], 
                item['reply']
            )
            item['llmgetscore'] = score
            item['llmcomments'] = comment
            output.append(item)
        return output

# `client` is assumed to be the client object created in the day02 setup
grading_openai = GradingOpenAI()
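
The `while not success` loop in `grade_answer` retries indefinitely, and because it catches every `Exception`, a persistent failure such as a bad API key would also loop forever. One possible alternative, sketched here under assumptions, is to cap the number of attempts. `grade_with_retries`, `max_attempts`, and the `generate` callable are hypothetical names; `generate` stands in for the `client.chat.completions.create` call so the snippet runs without network access.

```python
import json


def grade_with_retries(generate, max_attempts=3):
    """Call generate() until its output parses as JSON with the expected keys.

    generate is a hypothetical zero-argument callable returning the model's text.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = json.loads(generate())
            return result["llmgetscore"], result["llmcomments"]
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            last_error = exc
            print(f"Attempt {attempt} failed: {exc!r}")
    raise RuntimeError(f"No valid grading after {max_attempts} attempts") from last_error


# Fake model: the first reply is malformed, the second is valid.
replies = iter(['not json', '{"llmgetscore": 8, "llmcomments": "ok"}'])
score, comment = grade_with_retries(lambda: next(replies))
print(score, comment)
```

Catching only parsing-related exceptions lets unrelated errors surface immediately instead of being retried.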

Demonstration

Input

# Example input data
input_data = [
 {'ques_title': 'Please explain the meanings of common technical features, distinguishing technical features, additional technical features, and necessary technical features',
  'answer': 'Common technical features: technical features shared with the closest prior art (2.5 points); Distinguishing technical features: technical features that distinguish from the closest prior art (2.5 points); Additional technical features: technical features that further limit the cited technical features, additional technical features (2.5 points); Necessary technical features: technical features that are indispensable for solving its technical problems (2.5 points).',
  'fullscore': 10,
  'reply': 'Common technical features: technical features that are the same as the compared technical solution\nDistinguishing technical features: technical features that are different from the compared technical solution\nAdditional technical features: technical features that further limit the cited technical features\nNecessary technical features: technical features that are indispensable for solving technical problems'},
 {'ques_title': 'Please explain the preamble, feature part, citation part, and limiting part',
  'answer': 'Preamble: in independent claims, the subject + technical features shared with the closest prior art, before the features are (2.5 points); Feature part: in independent claims, technical features that distinguish from the closest prior art, after the features are (2.5 points); Citation part: the claim numbers and subjects cited from the claims (2.5 points); Limiting part: additional technical features from the claims (2.5 points).',
  'fullscore': 10,
  'reply': 'Preamble: technical features that are the same as the prior art in independent claims\nFeature part: technical features that distinguish from the prior art in independent claims\nCitation part: the part that cites other claims from dependent claims\nLimiting part: technical features that further limit the cited claims'}]

Run the agent

graded_data = grading_openai.run(input_data)
print(graded_data)

Results
[{'ques_title': 'Please explain the meanings of common technical features, distinguishing technical features, additional technical features, and necessary technical features', 'answer': 'Common technical features: technical features shared with the closest prior art (2.5 points); Distinguishing technical features: technical features that distinguish from the closest prior art (2.5 points); Additional technical features: technical features that further limit the cited technical features, additional technical features (2.5 points); Necessary technical features: technical features that are indispensable for solving its technical problems (2.5 points).', 'fullscore': 10, 'reply': 'Common technical features: technical features that are the same as the compared technical solution\nDistinguishing technical features: technical features that are different from the compared technical solution\nAdditional technical features: technical features that further limit the cited technical features\nNecessary technical features: technical features that are indispensable for solving technical problems', 'llmgetscore': 10, 'llmcomments': "The candidate's explanation of common technical features, distinguishing technical features, additional technical features, and necessary technical features is basically correct, and can accurately express the meanings of these concepts, so full marks are given."}, {'ques_title': 'Please explain the preamble, feature part, citation part, and limiting part', 'answer': 'Preamble: in independent claims, the subject + technical features shared with the closest prior art, before the features are (2.5 points); Feature part: in independent claims, technical features that distinguish from the closest prior art, after the features are (2.5 points); Citation part: the claim numbers and subjects cited from the claims (2.5 points); Limiting part: additional technical features from the claims (2.5 points).', 'fullscore': 10, 'reply': 'Preamble: technical features that are the same as the prior art in independent claims\nFeature part: technical features that distinguish from the prior art in independent claims\nCitation part: the part that cites other claims from dependent claims\nLimiting part: technical features that further limit the cited claims', 'llmgetscore': 8, 'llmcomments': 'The student’s answer is basically correct, the explanations of the preamble and feature part are consistent with the standard answer, and the understanding of the citation part and limiting part is also correct, but the specific order and position of the technical features in the standard answer are not fully expressed, so 8 points are given.'}]
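
Because `run` writes the score back into each item, the graded list can be summarized directly. A minimal sketch follows, using trimmed stand-ins for the two graded items above (only the `fullscore` and `llmgetscore` fields are kept); `summarize` is a hypothetical helper, not part of the original agent.

```python
def summarize(graded_items):
    """Hypothetical helper: total the awarded and the available points."""
    awarded = sum(item["llmgetscore"] for item in graded_items)
    available = sum(item["fullscore"] for item in graded_items)
    return awarded, available


# Trimmed stand-ins for the graded items shown in the results above.
graded_data = [
    {"fullscore": 10, "llmgetscore": 10},
    {"fullscore": 10, "llmgetscore": 8},
]
awarded, available = summarize(graded_data)
print(f"{awarded}/{available}")
```

Note that the prompt asks for a score out of 10 and never passes `fullscore` to the model, so the comparison is only meaningful when every question is worth 10 points, as in this data.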