Summarization
Tests how well each model distills information.
Summarize the following article in 3 bullet points: […article text…]
Creative Writing
Tests creativity, coherence, and tone.
Write a short story about a child who befriends a robot on Mars, in the style of a bedtime fairy tale.
Information Q&A
Tests factual accuracy and explanatory clarity.
What are the main causes of climate change, and how do they impact ocean levels?
Code Generation
Tests coding ability and correctness.
Write a Python function that takes a list of numbers and returns the list in sorted order without using the built-in sort functions.
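When grading responses to this prompt, it can help to have a reference answer on hand. The following is a minimal sketch of one acceptable solution, using merge sort; the function name merge_sort is an illustrative choice, and a simpler algorithm such as insertion sort would be an equally valid answer.

```python
def merge_sort(numbers):
    """Return a new sorted list without calling sorted() or list.sort()."""
    # A list of zero or one items is already sorted.
    if len(numbers) <= 1:
        return list(numbers)

    # Sort each half recursively.
    mid = len(numbers) // 2
    left = merge_sort(numbers[:mid])
    right = merge_sort(numbers[mid:])

    # Merge the two sorted halves, always taking the smaller head element.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1

    # One half is exhausted; append whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

A model's answer can be compared against this on a few inputs, including an empty list, duplicates, and negative numbers.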
Code Debugging
Tests ability to reason about and fix code.
Here is a snippet of code and the error it produces. How can I fix it? […code snippet and error message…]
Customer Support Email
Tests tone control, empathy, and professionalism.
You are a customer service agent. Respond to this customer complaint in a polite tone: "I bought your product and it broke in two days. I'm very upset."
Translation
Tests multilingual capabilities and preservation of meaning/tone.
Translate this English paragraph into French and Chinese: […English paragraph…]
Idea Brainstorming
Tests creativity and practicality of suggestions.
I'm launching a new coffee shop. Give me 5 creative marketing ideas to attract college students.
Explanation (Tutoring)
Tests simplification skills and clarity.
Explain the concept of blockchain to a 12-year-old in a few sentences.
Roleplay/Conversation
Tests conversational ability, persuasiveness, and persona maintenance.
Act as a personal fitness coach. I haven't exercised in months; encourage me with a motivational plan in a friendly tone.
Structured Output
Tests ability to follow specific output format instructions.
Analyze this customer feedback and categorize the issues. Format your response as a JSON object with the following structure: {"positive_points": ["point1", "point2"], "negative_points": ["point1", "point2"], "suggestions": ["suggestion1", "suggestion2"]}
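Format compliance for this prompt can be checked mechanically. The sketch below assumes the model's reply is available as a plain string containing only the JSON object; the function name is_valid_feedback_json is hypothetical, and the strictness (exactly three keys, each a list of strings) is one possible reading of the requested structure.

```python
import json

# Keys required by the prompt's requested structure.
EXPECTED_KEYS = ("positive_points", "negative_points", "suggestions")


def is_valid_feedback_json(response_text):
    """Return True if the text parses as JSON and matches the requested shape."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        return False

    # The top level must be an object with exactly the expected keys.
    if not isinstance(data, dict) or set(data) != set(EXPECTED_KEYS):
        return False

    # Every value must be a list whose items are all strings.
    return all(
        isinstance(data[key], list)
        and all(isinstance(item, str) for item in data[key])
        for key in EXPECTED_KEYS
    )
```

A reply that wraps the JSON in extra prose or a code fence would fail this check, which is usually the behavior worth measuring here.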
Step-by-Step Reasoning
Tests logical reasoning and problem-solving capabilities.
Solve this math problem step by step, explaining your reasoning at each stage: A store is offering a 25% discount on an item that originally costs $120. If there is also an 8% sales tax applied after the discount, what is the final price?
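For reference when grading, the expected answer is $97.20: the discount brings $120 down to $90, and 8% tax on $90 adds $7.20. The short sketch below reproduces that arithmetic; the function name final_price is illustrative, and Decimal is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal


def final_price(original, discount_rate, tax_rate):
    """Apply the discount first, then the sales tax, and round to cents."""
    original = Decimal(original)
    discounted = original * (1 - Decimal(discount_rate))  # price after discount
    total = discounted * (1 + Decimal(tax_rate))          # tax on the discounted price
    return total.quantize(Decimal("0.01"))


print(final_price("120", "0.25", "0.08"))  # prints 97.20
```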
System Prompt Engineering
Tests ability to follow system-level instructions and constraints.
You are an expert programming tutor specializing in Python. Your responses should: 1. Explain concepts clearly with simple examples, 2. Identify and correct errors in student code, 3. Follow educational best practices by guiding rather than solving, 4. Include explanatory comments in all code examples, 5. Reference Python 3.12 standards. Now help me understand how to implement a binary search algorithm.
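To judge whether a response to this prompt is technically correct, a reference implementation is useful. The following is a minimal sketch of an iterative binary search over a sorted list; the name binary_search and the convention of returning -1 when the target is absent are assumptions made for illustration, not part of the prompt.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is not present."""
    low, high = 0, len(sorted_items) - 1

    # Narrow the search window until it is empty.
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half

    return -1
```

Because the system instructions ask the tutor to guide rather than solve, a strong response may withhold a complete listing like this one; the sketch is for checking correctness, not for judging style.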
Context-Aware Response
Tests ability to incorporate provided context into responses.
Context: I'm a high school physics teacher preparing materials for students who struggle with mathematical concepts. Many of my students have math anxiety but are interested in practical applications. Request: Create an explanation of Newton's Second Law (F=ma) that uses minimal mathematical notation while still conveying the core concept accurately.
Multimodal Reasoning
Tests ability to reason about and describe visual content.
Look at this image of a data visualization chart and explain what trends it shows. What conclusions can be drawn from this data? What might be missing or misleading about this presentation?
Chain of Thought
Tests ability to break down complex problems into logical steps.
Let's think through this problem step by step: A train leaves Station A at 3:00 PM traveling toward Station B at 60 mph. Another train leaves Station B at 4:30 PM traveling at 75 mph toward Station A. If the stations are 300 miles apart, at what time will the trains meet?
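A reference answer for grading, assuming the first train is heading toward Station B: by 4:30 PM it has covered 90 miles, leaving a 210-mile gap that closes at 135 mph, which takes 14/9 hours (1 h 33 min 20 s). The trains therefore meet at about 6:03 PM. The sketch below reproduces this calculation; the function name meeting_time and the calendar date are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def meeting_time(distance_miles, start_a, speed_a, start_b, speed_b):
    """Return when two trains heading toward each other meet.

    Assumes train A departs first and has not yet covered the full
    distance when train B departs.
    """
    # Distance train A covers alone before train B departs.
    head_start_hours = (start_b - start_a).total_seconds() / 3600
    remaining = distance_miles - speed_a * head_start_hours

    # After B departs, the gap closes at the sum of both speeds.
    hours_to_meet = remaining / (speed_a + speed_b)
    return start_b + timedelta(hours=hours_to_meet)


# The date is arbitrary; only the times of day matter.
depart_a = datetime(2024, 1, 1, 15, 0)   # 3:00 PM
depart_b = datetime(2024, 1, 1, 16, 30)  # 4:30 PM
print(meeting_time(300, depart_a, 60, depart_b, 75).strftime("%I:%M %p"))
```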
Iterative Refinement
Tests ability to improve outputs based on feedback.
Write a short product description for a new smartphone. After I review it, I'll provide feedback, and I want you to refine the description based on my comments.
Ethical Reasoning
Tests ability to navigate complex ethical scenarios with nuance.
Consider this ethical dilemma in AI development: A healthcare algorithm must allocate limited medical resources. What ethical frameworks should guide its design? Present multiple perspectives and discuss the tradeoffs involved.