Summarization
Tests how well each model distills information.
Summarize the following article in 3 bullet points: […article text…]
Creative Writing
Tests creativity, coherence, and tone.
Write a short story about a child who befriends a robot on Mars, in the style of a bedtime fairy tale.
Information Q&A
Tests factual accuracy and explanatory clarity.
What are the main causes of climate change, and how do they impact ocean levels?
Code Generation
Tests coding ability and correctness.
Write a Python function that takes a list of numbers and returns the list sorted, without using the built-in sorted() or list.sort().
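One way to score responses to this prompt is to run the returned function against random inputs. The sketch below is only an illustration: `passes_sort_check` is a hypothetical helper name, and `insertion_sort` stands in for whatever function a model returns.

```python
import random


def insertion_sort(numbers):
    """Stand-in for a model's answer: sorts without sorted() or list.sort()."""
    result = []
    for value in numbers:
        index = len(result)
        # Walk left past every element that is larger than the new value.
        while index > 0 and result[index - 1] > value:
            index -= 1
        result.insert(index, value)
    return result


def passes_sort_check(candidate, trials=100, seed=0):
    """Compare candidate(data) with Python's sorted() on random integer lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        expected = sorted(data)  # computed first, in case the candidate mutates data
        if candidate(list(data)) != expected:
            return False
    return True


print(passes_sort_check(insertion_sort))
```

The built-in `sorted()` appears only in the checker as the reference answer, so it does not conflict with the restriction placed on the model.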
Code Debugging
Tests ability to reason about and fix code.
Here is a snippet of code and the error it produces. How can I fix it? […code snippet and error message…]
Customer Support Email
Tests tone control, empathy, and professionalism.
You are a customer service agent. Respond to this customer complaint in a polite tone: "I bought your product and it broke in two days. I'm very upset."
Translation
Tests multilingual capabilities and preservation of meaning/tone.
Translate this English paragraph into French and Chinese: […English paragraph…]
Idea Brainstorming
Tests creativity and practicality of suggestions.
I'm launching a new coffee shop. Give me 5 creative marketing ideas to attract college students.
Explanation (Tutoring)
Tests simplification skills and clarity.
Explain the concept of blockchain to a 12-year-old in a few sentences.
Roleplay/Conversation
Tests conversational ability, persuasiveness, and persona maintenance.
Act as a personal fitness coach. I haven't exercised in months; encourage me with a motivational plan in a friendly tone.
Structured Output
Tests ability to follow specific output format instructions.
Analyze this customer feedback and categorize the issues. Format your response as a JSON object with the following structure: {"positive_points": ["point1", "point2"], "negative_points": ["point1", "point2"], "suggestions": ["suggestion1", "suggestion2"]}
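Responses to this prompt can be checked mechanically. The following is a minimal sketch, assuming the model is expected to return the JSON object alone, with no surrounding prose; `is_valid_feedback_json` is a hypothetical helper name.

```python
import json

# Keys taken from the example prompt; each must map to a list of strings.
EXPECTED_KEYS = ("positive_points", "negative_points", "suggestions")


def is_valid_feedback_json(raw: str) -> bool:
    """Return True if raw parses as JSON and has exactly the requested structure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(EXPECTED_KEYS):
        return False
    return all(
        isinstance(data[key], list)
        and all(isinstance(item, str) for item in data[key])
        for key in EXPECTED_KEYS
    )
```

A response that wraps the JSON in a sentence such as "Here is the analysis:" fails this check, which is often the behavior this test is meant to catch.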
Step-by-Step Reasoning
Tests logical reasoning and problem-solving capabilities.
Solve this math problem step by step, explaining your reasoning at each stage: A store is offering a 25% discount on an item that originally costs $120. If there is also an 8% sales tax applied after the discount, what is the final price?
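To grade answers to this prompt it helps to have the reference result at hand. The sketch below works it out with `Decimal` to avoid floating-point noise; the variable names are illustrative.

```python
from decimal import Decimal

original_price = Decimal("120")
discount_rate = Decimal("0.25")
tax_rate = Decimal("0.08")

# Step 1: apply the 25% discount to the original price.
discounted_price = original_price * (1 - discount_rate)

# Step 2: apply the 8% sales tax to the discounted price.
final_price = (discounted_price * (1 + tax_rate)).quantize(Decimal("0.01"))

print(discounted_price)  # 90.00
print(final_price)       # 97.20
```

A correct response should therefore reach $90 after the discount and $97.20 after tax.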
System Prompt Engineering
Tests ability to follow system-level instructions and constraints.
You are an expert programming tutor specializing in Python. Your responses should: 1. Explain concepts clearly with simple examples, 2. Identify and correct errors in student code, 3. Follow educational best practices by guiding rather than solving, 4. Include explanatory comments in all code examples, 5. Reference Python 3.12 standards. Now help me understand how to implement a binary search algorithm.
Context-Aware Response
Tests ability to incorporate provided context into responses.
Context: I'm a high school physics teacher preparing materials for students who struggle with mathematical concepts. Many of my students have math anxiety but are interested in practical applications. Request: Create an explanation of Newton's Second Law (F=ma) that uses minimal mathematical notation while still conveying the core concept accurately.
Multimodal Reasoning
Tests ability to reason about and describe visual content.
Look at this image of a data visualization chart and explain what trends it shows. What conclusions can be drawn from this data? What might be missing or misleading about this presentation?
Chain of Thought
Tests ability to break down complex problems into logical steps.
Let's think through this problem step by step: A train leaves Station A at 3:00 PM traveling at 60 mph toward Station B. Another train leaves Station B at 4:30 PM traveling at 75 mph toward Station A. If the stations are 300 miles apart, at what time will the trains meet?
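A reference answer for this prompt can be computed as follows. This is a sketch under one stated assumption: the train from Station A is heading toward Station B on the same line. The calendar date in the code is an arbitrary placeholder.

```python
from datetime import datetime, timedelta


def meeting_time():
    distance_miles = 300
    speed_a, speed_b = 60, 75                  # mph
    depart_a = datetime(2000, 1, 1, 15, 0)     # 3:00 PM, placeholder date
    depart_b = datetime(2000, 1, 1, 16, 30)    # 4:30 PM, placeholder date

    # Train A travels alone until train B departs.
    head_start_hours = (depart_b - depart_a).total_seconds() / 3600
    remaining_gap = distance_miles - speed_a * head_start_hours

    # After 4:30 PM the gap closes at the sum of both speeds.
    seconds_to_meet = round(remaining_gap * 3600 / (speed_a + speed_b))
    return depart_b + timedelta(seconds=seconds_to_meet)


print(meeting_time().strftime("%H:%M:%S"))  # 18:03:20
```

Under that assumption the trains meet at about 6:03 PM: train A covers 90 miles before 4:30 PM, and the remaining 210 miles close at 135 mph.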
Iterative Refinement
Tests ability to improve outputs based on feedback.
Write a short product description for a new smartphone. After I review it, I'll provide feedback, and I want you to refine the description based on my comments.
Ethical Reasoning
Tests ability to navigate complex ethical scenarios with nuance.
Consider this ethical dilemma in AI development: A healthcare algorithm must allocate limited medical resources. What ethical frameworks should guide its design? Present multiple perspectives and discuss the tradeoffs involved.
Zero-Shot Prompting
Getting results without providing examples, relying on the model's pre-training.
Classify the sentiment of this text as positive, negative, or neutral: 'The product exceeded my expectations in every way.'
Few-Shot Prompting
Providing a few examples to guide the model's response pattern.
Convert these sentences to past tense:
Example: 'I walk to school' → 'I walked to school'
Example: 'She eats lunch' → 'She ate lunch'
Now convert: 'They play soccer'
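A few-shot prompt like the one above is usually assembled from a list of example pairs. A minimal sketch, where `build_few_shot_prompt` is a hypothetical helper rather than part of any library:

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction]
    for source, target in examples:
        lines.append(f"Example: '{source}' → '{target}'")
    lines.append(f"Now convert: '{query}'")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Convert these sentences to past tense:",
    [("I walk to school", "I walked to school"), ("She eats lunch", "She ate lunch")],
    "They play soccer",
)
print(prompt)
```

Keeping the examples in a list makes it easy to vary how many are shown and compare the results.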
Role Prompting
Assigning a specific role or persona to shape the model's responses.
You are a senior software architect with 15 years of experience in distributed systems. Review this API design and provide recommendations for scalability and maintainability.
Constraints & Guardrails
Setting clear boundaries on what the model should and shouldn't do.
Explain quantum computing to a beginner. Constraints: Use only simple analogies, avoid mathematical formulas, keep it under 150 words, and don't use jargon without explaining it first.
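Of the constraints in this prompt, the word limit is the easiest to verify automatically. The sketch below counts whitespace-separated words, which is an assumption about what counts as a word; `within_word_limit` is a hypothetical helper name.

```python
def within_word_limit(response: str, max_words: int = 150) -> bool:
    """Return True if the response has fewer than max_words whitespace-separated words."""
    return len(response.split()) < max_words
```

The other constraints (simple analogies, no unexplained jargon) generally need a human or a second model to judge.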
Meta-Prompting
Having the model help design better prompts or reflect on its own process.
I want to ask an AI to help me plan a wedding. What information should I include in my prompt to get the most helpful response?
Prompt Chaining
Breaking complex tasks into sequential prompts where each builds on the previous.
Step 1: Identify the main themes in this article. Step 2: For each theme, find supporting evidence. Step 3: Create an outline for a response article addressing these themes.
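In code, chaining means feeding each step's output into the next prompt. The sketch below assumes a caller-supplied `call_model` function that takes a prompt string and returns the model's text; that function and `run_chain` are hypothetical, not a real API.

```python
from typing import Callable


def run_chain(article: str, call_model: Callable[[str], str]) -> str:
    """Run the three steps in order, passing earlier outputs into later prompts."""
    # Step 1: themes come from the article alone.
    themes = call_model(
        f"Identify the main themes in this article.\n\nArticle:\n{article}"
    )
    # Step 2: evidence depends on the themes found in step 1.
    evidence = call_model(
        "For each theme, find supporting evidence in the article.\n\n"
        f"Themes:\n{themes}\n\nArticle:\n{article}"
    )
    # Step 3: the outline builds on both earlier outputs.
    return call_model(
        "Create an outline for a response article addressing these themes.\n\n"
        f"Themes:\n{themes}\n\nEvidence:\n{evidence}"
    )
```

Because each step is a separate call, intermediate outputs can be inspected or corrected before the next step runs.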
Tree of Thoughts
Exploring multiple reasoning paths simultaneously before selecting the best one.
Consider three different approaches to solve this optimization problem. For each approach, outline the steps, identify potential issues, and estimate success probability. Then recommend the best approach.
Reflection & Self-Critique
Having the model review and improve its own outputs.
First, write a product description. Then, critique your description for clarity, persuasiveness, and completeness. Finally, write an improved version based on your critique.
Retrieval-Augmented Generation (RAG)
Retrieving relevant documents and providing them as context for the model to reference.
Context: [Insert relevant documentation here]. Based on this context, answer: How do we implement authentication in our system?
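The prompt above can be assembled by a retrieval step followed by a template. The sketch below uses a naive word-overlap score as a stand-in for a real retriever (which would more likely use embeddings or a search index); `retrieve` and `build_rag_prompt` are hypothetical helpers.

```python
def retrieve(question, documents, top_k=2):
    """Rank documents by how many lowercase words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_prompt(question, documents, top_k=2):
    """Insert the top-ranked documents into the context slot of the prompt."""
    context = "\n\n".join(retrieve(question, documents, top_k))
    return f"Context: {context}\n\nBased on this context, answer: {question}"
```

The template mirrors the example prompt, so only the retrieval function needs to change when moving to a stronger retriever.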
Output Format Control
Specifying exact output format, often using JSON, XML, or markdown.
Extract key information and format as JSON with these fields: {"name": string, "date": ISO-8601, "priority": "high"|"medium"|"low", "tags": string[]}
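Whether a response honors this format can be checked field by field. A minimal sketch, assuming the response is the JSON object alone; `validate_extraction` is a hypothetical helper, and `datetime.fromisoformat` accepts only a subset of ISO-8601 on Python versions before 3.11.

```python
import json
from datetime import datetime

ALLOWED_PRIORITIES = {"high", "medium", "low"}


def validate_extraction(raw: str) -> list[str]:
    """Return a list of problems; an empty list means every field matches."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    if not isinstance(data.get("name"), str):
        problems.append("name must be a string")
    try:
        datetime.fromisoformat(data.get("date"))
    except (TypeError, ValueError):
        problems.append("date must be an ISO-8601 string")
    priority = data.get("priority")
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        problems.append("priority must be high, medium, or low")
    tags = data.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags must be a list of strings")
    return problems
```

Returning a list of problems rather than a boolean makes it possible to feed the errors back to the model and ask for a corrected response.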
Temperature & Creativity Control
Using prompt language to guide creativity vs. consistency.
For creative output: 'Brainstorm 10 wild and unconventional ideas for...' For consistent output: 'Provide the standard, industry-accepted method for...'
Negative Prompting
Explicitly stating what NOT to include in the response.
Explain blockchain technology. Do NOT use technical jargon, do NOT assume prior knowledge of cryptography, and do NOT include code examples.