
exercise-designer

Generates Python practice exercises using evidence-based learning strategies like retrieval practice and spaced repetition. Follows an 'evals-first' design process, mapping exercises to learning objectives and success criteria. Provides structured templates for exercises and rubrics, supporting varied exercise types from fill-in-the-blank to AI-collaborative tasks.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 159
Hot score: 96
Updated: March 20, 2026
Overall rating: A (8.3)
Composite score: 6.7
Best-practice grade: B (73.6)

Install command

npx @skill-hub/cli install panaversity-agentfactory-exercise-designer

Tags: education, exercise-generation, python, cognitive-science, curriculum-design

Repository

panaversity/agentfactory

Skill path: .claude/skills/exercise-designer


Open repository

Best for

Primary workflow: Design Product.

Technical facets: Data / AI, Designer.

Target audience: Educators and curriculum designers creating programming exercises, particularly for Python courses at beginner to intermediate levels.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: panaversity.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install exercise-designer into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/panaversity/agentfactory before adding exercise-designer to shared team environments
  • Use exercise-designer for AI/ML workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: exercise-designer
description: Designs deliberate practice exercises applying evidence-based learning strategies like retrieval practice, spaced repetition, and interleaving. Activate when educators need varied exercise types (fill-in-blank, debug-this, build-from-scratch, extend-code, AI-collaborative) targeting learning objectives with appropriate difficulty progression. Creates exercise sets that apply cognitive science principles to maximize retention and skill development. Use when designing practice activities for Python concepts, creating homework assignments, generating problem sets, or evaluating exercise quality.
---

## Purpose

The exercise-designer skill helps educators create varied, evidence-based practice exercises that target specific learning objectives and apply proven strategies from cognitive science. This skill designs exercises with appropriate difficulty progression, spaced repetition opportunities, and clear assessment criteria.

**Constitution v4.0.1 Alignment**: This skill implements evals-first exercise design—defining success criteria BEFORE creating exercises, integrating Section IIb (AI Three Roles Framework) co-learning exercise types, and aligning with Section IIa (4-Layer Method) for layer-appropriate exercises.

## When to Activate

Use this skill when:
- Educators need practice exercises for Python concepts
- Designing homework assignments or problem sets
- Creating varied exercise types beyond simple coding problems
- Applying evidence-based learning strategies (retrieval practice, spaced repetition)
- Establishing difficulty progression across exercise sequences
- Generating test cases and rubrics for exercises
- Evaluating existing exercises for pedagogical effectiveness

## Inputs

Required:
- **Learning objectives**: What learners should be able to do
- **Concept/topic**: Python concept to practice (e.g., "loops", "dictionaries")

Optional:
- **Target audience**: beginner | intermediate | advanced
- **Number of exercises**: How many to generate
- **Exercise types**: Preferred types (fill-in, debug, build-from-scratch, etc.)
- **Time constraints**: Total time available for exercises
- **Prior concepts**: Previously learned concepts for spaced repetition

## Evals-First Exercise Design (Constitution v3.1.2)

**CRITICAL WORKFLOW**:
1. **Evals First**: Review success criteria from chapter spec BEFORE designing exercises
2. **Objectives Second**: Ensure exercises target learning objectives that lead to evals
3. **Exercises Third**: Design practice activities that prepare students for eval success
4. **Validation Fourth**: Verify exercises measure progress toward defined success criteria

**Template**:
```markdown
### Exercise Design (Evals-First)

**Source**: Chapter spec at `specs/part-X/chapter-Y/spec.md`

**Success Evals from Spec**:
1. 75%+ write valid specification (measured by final exercise)
2. 80%+ identify vague requirements (measured by quiz)

**Learning Objectives** (from spec):
- LO-001: Write clear specifications
- LO-002: Identify ambiguous requirements

**Exercise Design to Achieve Objectives → Evals**:
- Ex-1: Fill-in incomplete spec (LO-001, starter difficulty)
- Ex-2: Debug vague spec (LO-002, core difficulty)
- Ex-3: Write complete spec from scratch (LO-001, stretch difficulty) → Tests Eval #1
- Ex-4: Evaluate spec clarity (LO-002, stretch difficulty) → Tests Eval #2
```

**Do NOT** create exercises without:
- ✅ Reference to approved spec with success evals
- ✅ Explicit mapping: Exercise → Objective → Eval
- ✅ Verification that exercises prepare for eval success

## Process

### Step 1: Clarify Learning Objectives and Evals

Understand what learners should achieve:
- Specific skills to demonstrate
- Depth of understanding required (recall vs. application vs. creation)
- Connection to Bloom's taxonomy levels
- **Success evals from chapter spec** (what defines mastery?)

### Step 2: Load Exercise Type Reference

Read exercise type patterns for variety:

```bash
Read reference/exercise-types.md
```

Available types:
- **Fill-in-the-blank**: Focus on specific concepts with scaffolding
- **Debug-this**: Develop error recognition skills
- **Build-from-scratch**: Test independent problem-solving
- **Extend-the-code**: Practice incremental development
- **Trace-execution**: Test mental execution model
- **Explain-code**: Promote deeper understanding
- **Refactor**: Teach code quality and Pythonic patterns
- **Parsons problems**: Focus on logic flow
- **AI-collaborative** (NEW): Practice working WITH AI as co-learning partner

### AI-Collaborative Exercise Types (Section IIb, Constitution v4.0.1)

**CRITICAL**: AI-native exercises must teach students to work WITH AI in bidirectional co-learning partnership (per Section IIb forcing functions), not just independently.

**AI-Collaborative Exercise Categories**:

**1. Spec-to-Code with AI (AI as Student)**:
```markdown
### Exercise: User Authentication

**Task**: Write a specification that produces working OAuth implementation on first try.

**Instructions**:
1. Write detailed specification for OAuth authentication
2. Provide spec to AI
3. Evaluate AI's generated code
4. Identify gaps in your spec if code doesn't match intent

**Assessment**:
- Spec clarity (5 pts): Unambiguous requirements
- Completeness (5 pts): All edge cases specified
- AI output quality (5 pts): Code matches spec without clarification
- Reflection (5 pts): What you learned about spec-writing from AI's response
```

**2. Convergence Iteration (AI as Co-Worker)**:
```markdown
### Exercise: Optimize Database Query

**Task**: Iterate with AI to improve query performance.

**Instructions**:
1. Start with provided slow query
2. Ask AI for improvement suggestions
3. Evaluate AI's suggestions (don't blindly accept)
4. Implement chosen approach
5. Document what YOU decided vs. what AI suggested

**Assessment**:
- Iteration quality (5 pts): Clear back-and-forth refinement
- Decision-making (5 pts): Strategic choices explained
- Convergence (5 pts): Better solution than either party alone
- Validation (5 pts): Verified AI's suggestions work correctly
```

**3. Pattern Learning from AI (AI as Teacher)**:
```markdown
### Exercise: Discover Pythonic Patterns

**Task**: Learn a new pattern from AI suggestion.

**Instructions**:
1. Implement solution using your current approach
2. Ask AI: "How would you improve this for Pythonicity?"
3. Analyze AI's suggestion
4. Explain what pattern AI taught you and why it's better
5. Apply pattern to 2 new problems

**Assessment**:
- Understanding (5 pts): Clearly explains AI's suggested pattern
- Application (5 pts): Successfully applies to new contexts
- Evaluation (5 pts): Identifies when pattern is/isn't appropriate
- Reflection (5 pts): What you learned that you didn't know before
```

**4. AI Output Validation (Critical Skill)**:
```markdown
### Exercise: Verify AI-Generated Code

**Task**: Validate AI-generated authentication code for security.

**Instructions**:
1. Review provided AI-generated code
2. Identify security vulnerabilities
3. Write test cases that expose issues
4. Propose fixes
5. Document validation checklist you used

**Assessment**:
- Vulnerability detection (5 pts): Found critical issues
- Test coverage (5 pts): Tests expose problems
- Fix quality (5 pts): Secure improvements
- Validation process (5 pts): Systematic approach documented
```

**5. Spec Refinement from AI Feedback (Bidirectional Learning)**:
```markdown
### Exercise: Iterative Spec Improvement

**Task**: Refine specification based on AI clarifying questions.

**Instructions**:
1. Write initial specification
2. AI asks clarifying questions (or you simulate what AI might ask)
3. Improve spec to answer questions proactively
4. Compare initial vs. final spec quality

**Assessment**:
- Initial spec (2 pts): Baseline quality
- Question anticipation (3 pts): Identified ambiguities
- Refinement quality (3 pts): Clearer final spec
- Learning (2 pts): Documented what makes specs clear
```

**Exercise Balance for AI-Native Content**:
- 50-60%: Traditional independent exercises
- 30-40%: AI-collaborative exercises (Three Roles)
- 10-20%: Validation/verification exercises

### Step 3: Load Evidence-Based Strategies

Read cognitive science strategies to apply:

```bash
Read reference/evidence-based-strategies.md
```

Key strategies:
- **Retrieval Practice**: Recall from memory strengthens learning
- **Spaced Repetition**: Distribute practice over time
- **Interleaving**: Mix exercise types and concepts
- **Elaboration**: Ask "why" and "how" questions
- **Desirable Difficulty**: Appropriate challenge level

### Step 4: Design Exercise Variety

Create 3-5 exercises using multiple types:

**Mix Exercise Types** (avoid 5 identical exercises):
```
Exercise 1: Fill-in-blank (quick warm-up)
Exercise 2: Debug-this (error recognition)
Exercise 3: Build-from-scratch (application)
Exercise 4: Explain-code (elaboration)
Exercise 5: Extend-code (integration)
```

**Apply Interleaving**: Mix new and prior concepts:
- 60% current concept
- 30% recent concepts (last 1-2 lessons)
- 10% older concepts (3+ lessons ago)

### Step 5: Establish Difficulty Progression

Load difficulty progression guide:

```bash
Read reference/difficulty-progression.md
```

Sequence exercises from easier to harder:
- **Easy**: High scaffolding, clear structure
- **Medium**: Moderate scaffolding, specification-based
- **Hard**: Minimal scaffolding, open-ended

**Bloom's Progression**:
1. Remember/Understand (trace execution, explain)
2. Apply (fill-in-blank, standard problems)
3. Analyze (debug-this, compare approaches)
4. Evaluate/Create (build-from-scratch, refactor)

### Step 6: Incorporate Spaced Repetition

Load spaced repetition patterns:

```bash
Read reference/spaced-repetition.md
```

Include prior concepts for review:
- Identify concepts from previous lessons
- Design exercises combining old + new concepts
- Tag exercises with concepts practiced (for tracking)

**Example**:
```
Lesson 5 (Current: Loops)
Exercise 1: Loop basics (new)
Exercise 2: Loops + lists (review Lesson 2)
Exercise 3: Loops + conditionals (review Lesson 3)
Exercise 4: Loops + functions (review Lesson 4)
```

### Step 7: Create Test Cases

Generate comprehensive test cases:

```bash
Read templates/exercise-template.yml
```

Include:
- **Normal cases**: Typical usage (60%)
- **Edge cases**: Empty inputs, boundaries, special cases (30%)
- **Error cases**: Invalid inputs, exceptions (10%)

Validate test coverage using script:

```bash
python .claude/skills/exercise-designer/scripts/generate-test-cases.py exercise.yml
```

The script will:
- Analyze existing test case coverage
- Suggest missing test types
- Provide concept-specific recommendations
- Check for normal/edge/error case balance
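As an illustration only (this is not the repository's `generate-test-cases.py`, and the names below are hypothetical), a balance check against the 60/30/10 normal/edge/error split might look like:

```python
from collections import Counter

# Target share of each test-case kind, per the guideline above.
TARGET_MIX = {"normal": 0.6, "edge": 0.3, "error": 0.1}

def check_balance(case_kinds, tolerance=0.15):
    """Return the kinds whose share deviates from the target mix by more than tolerance."""
    counts = Counter(case_kinds)
    total = len(case_kinds)
    issues = []
    for kind, target in TARGET_MIX.items():
        share = counts.get(kind, 0) / total
        if abs(share - target) > tolerance:
            issues.append(kind)
    return issues
```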

### Step 8: Define Assessment Rubric

Load rubric template:

```bash
Read templates/rubric-template.yml
```

Create rubric with criteria:
- **Correctness** (40%): Produces correct output
- **Code Quality** (30%): Readable, well-structured
- **Efficiency** (20%): Appropriate approach
- **Error Handling** (10%): Edge case consideration

Each criterion has levels: excellent, adequate, developing, insufficient

### Step 9: Add Progressive Hints

Provide 3 levels of hints:
- **Level 1** (gentle): Direction without giving answer
- **Level 2** (moderate): More specific guidance
- **Level 3** (explicit): Nearly complete solution

**Example**:
```
Exercise: Write function to find duplicates in a list

Hint 1: "Consider using a set to track items you've seen"
Hint 2: "Iterate through list, add items to set, check if item already in set"
Hint 3: "Use: seen = set(); for item in list: if item in seen..."
```
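Following the hints above to their conclusion, one possible reference solution is:

```python
def find_duplicates(items):
    """Return each value that appears more than once, in first-repeat order."""
    seen = set()
    duplicates = []
    for item in items:
        if item in seen and item not in duplicates:
            duplicates.append(item)  # first time we see this value repeat
        seen.add(item)
    return duplicates
```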

### Step 10: Validate and Refine

Check exercise quality:
- [ ] Clear learning objective stated
- [ ] Appropriate difficulty for target audience
- [ ] Complete instructions (learner knows what to do)
- [ ] Test cases provided (normal + edge + error)
- [ ] At least 2 evidence-based strategies applied
- [ ] Exercise is achievable in estimated time
- [ ] Rubric or evaluation criteria included

## Output Format

Provide exercise set as structured markdown or YAML:

```markdown
# Exercise Set: [Topic]

**Learning Objectives**:
- [Objective 1]
- [Objective 2]

**Estimated Time**: [X minutes total]
**Evidence-Based Strategies**: [List strategies applied]

---

## Exercise 1: [Title]

**Type**: [fill-in-blank | debug-this | etc.]
**Difficulty**: [easy | medium | hard]
**Time**: [X minutes]
**Strategies**: [retrieval-practice, etc.]

### Instructions

[Clear description of what to do]

### Starter Code (if applicable)

```python
[Code here]
```

### Test Cases

1. **Input**: `[example]`
   **Expected**: `[output]`
   **Tests**: Normal case

2. **Input**: `[]`
   **Expected**: `[output]`
   **Tests**: Edge case - empty input

### Hints

**Hint 1**: [Gentle guidance]
**Hint 2**: [More specific]
**Hint 3**: [Explicit approach]

### Rubric

- **Correctness** (4 pts): Passes all test cases
- **Code Quality** (3 pts): Readable with good naming
- **Efficiency** (2 pts): Reasonable approach
- **Error Handling** (1 pt): Handles edge cases

---

[Repeat for exercises 2-5]

---

## Spaced Repetition Notes

This exercise set practices:
- **New**: [Current concept]
- **Review**: [Concepts from prior lessons]

---

## Answer Key

[Solutions for all exercises with explanations]
```

## Examples

### Example 1: Design Exercises for List Methods

**Input**: "Create 5 exercises for practicing list methods (append, remove, extend) for beginners"

**Process**:
1. Identify learning objectives: Use list methods correctly, understand when to use each
2. Choose variety: fill-in-blank, debug-this, build-from-scratch, explain-code, trace-execution
3. Progress difficulty: easy → medium → medium → hard → medium
4. Apply strategies: retrieval practice (no references), interleaving (mix method types)
5. Add test cases and rubrics
6. Include hints

**Output**: 5-exercise set with variety, progression, test cases, and strategies applied

---

### Example 2: Review Existing Exercise Set

**Input**: "Evaluate these 10 loop exercises for pedagogical effectiveness"

**Process**:
1. Check variety: "All 10 are build-from-scratch—needs more variety"
2. Check progression: "Difficulty jumps too quickly from exercise 2 to 3"
3. Check strategies: "No spaced repetition—all exercises only use loops, no prior concepts"
4. Check test cases: "Only 3 exercises have test cases, edge cases missing"
5. Provide specific recommendations

**Output**: Detailed assessment with actionable improvements

---

### Example 3: Design Exercises with Spaced Repetition

**Input**: "Create exercises for dictionaries (Lesson 4) that review lists (Lesson 2) and conditionals (Lesson 3)"

**Process**:
1. Primary concept: Dictionary methods and operations
2. Secondary concepts: Lists, conditionals (for review)
3. Design exercises combining concepts:
   - Exercise 1: Dictionary basics (new)
   - Exercise 2: Dictionary + conditionals (review)
   - Exercise 3: Dictionary + lists (review)
   - Exercise 4: All three combined
4. Tag for tracking: primary=dictionaries, secondary=[lists, conditionals]

**Output**: Exercise set with explicit spaced repetition

## Common Patterns

### Pattern 1: Concept Introduction Set

```
Exercise 1: Fill-in-blank (very easy, high scaffolding)
Exercise 2: Trace-execution (understand behavior)
Exercise 3: Build-from-scratch (simple application)
Exercise 4: Debug-this (recognize errors)
Exercise 5: Extend-code (integrate with prior knowledge)
```

### Pattern 2: Mixed Review Set

```
Exercise 1: Current concept only (60%)
Exercise 2: Current + recent concept (30%)
Exercise 3: Current concept only (60%)
Exercise 4: Current + old concept (10%)
Exercise 5: Current + recent + old (integration)
```

### Pattern 3: Progressive Challenge Set

```
Exercise 1: Guided (70% code provided)
Exercise 2: Structured (50% code provided)
Exercise 3: Specification (clear requirements)
Exercise 4: Open-ended (minimal guidance)
Exercise 5: Extension (build on Exercise 3)
```

## Validation Checklist

Before finalizing exercise set:
- [ ] 3-5 exercises (not too few, not overwhelming)
- [ ] Multiple exercise types (not all identical)
- [ ] Clear difficulty progression (easier → harder)
- [ ] At least 2 evidence-based strategies explicitly applied
- [ ] Test cases for each exercise (normal + edge + error)
- [ ] Rubric or assessment criteria provided
- [ ] Spaced repetition if applicable (reviews prior concepts)
- [ ] Instructions are clear and complete
- [ ] Exercises are achievable in estimated time
- [ ] Each exercise has clear learning objective

## Acceptance Checks

- [ ] Difficulty bands present: starter (easy), core (medium), stretch (hard)
- [ ] Hints provided at three levels (gentle, moderate, explicit)
- [ ] Rubric attached with criteria and points; maps to objectives

### Difficulty bands example
```
Starter: Warm‑up fill‑in (L2‑Understand)
Core: Implement function from spec (L3‑Apply)
Stretch: Refactor for performance (L4‑Analyze/L5‑Evaluate)
```

## References

Supporting documentation (loaded as needed):
- `reference/exercise-types.md` - Fill-in, debug, build-from-scratch, etc.
- `reference/evidence-based-strategies.md` - Retrieval, spacing, interleaving, elaboration
- `reference/difficulty-progression.md` - Scaffolding, Bloom's levels, PRIME framework
- `reference/spaced-repetition.md` - Spiral curriculum, mixed sets, optimal intervals

## Error Handling

If validation fails:
1. Report specific issues (e.g., "All exercises are same type", "No test cases provided")
2. Suggest remediation (e.g., "Add debug-this and trace-execution exercises")
3. Halt and require user intervention (hard failure mode)

Examples must meet quality standards: varied types, appropriate difficulty, clear objectives, comprehensive test cases.


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### reference/exercise-types.md

```markdown
# Exercise Types for Programming Education

## Overview

Varied exercise types engage different cognitive processes and learning mechanisms. This guide categorizes effective exercise patterns for Python education with pedagogical rationale.

## Core Exercise Types

### 1. Fill-in-the-Blank

**Description**: Provide partial code with strategic gaps for learners to complete.

**Pedagogical Value**: Focuses attention on key concepts while reducing cognitive load from boilerplate code.

**Example**:
```python
# Complete this function to return the sum of even numbers in a list
def sum_even_numbers(numbers):
    total = 0
    for num in numbers:
        if _______________:  # Fill in the condition
            total += num
    return total
```
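For the answer key, the blank is completed with a modulo check for evenness:

```python
def sum_even_numbers(numbers):
    total = 0
    for num in numbers:
        if num % 2 == 0:  # completed blank: even numbers have remainder 0
            total += num
    return total
```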

**When to Use**:
- Introducing new syntax patterns
- After worked examples, before independent practice
- Testing specific concept understanding

**Difficulty Levels**:
- **Easy**: One blank, clear context
- **Medium**: Multiple blanks, requires understanding relationships
- **Hard**: Strategic blanks that require planning

---

### 2. Debug-This

**Description**: Provide intentionally flawed code for learners to identify and fix errors.

**Pedagogical Value**: Develops debugging skills, error recognition, and understanding of common mistakes.

**Example**:
```python
# This code should calculate factorial but has bugs
def factorial(n):
    result = 0  # BUG 1: wrong initial value for a running product
    for i in range(n):  # BUG 2: iterates 0..n-1 instead of 1..n
        result *= i
    return result

print(factorial(5))  # Should print 120
```
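For the answer key, one corrected version fixes both bugs:

```python
def factorial(n):
    result = 1  # fix for BUG 1: a product must start at 1, not 0
    for i in range(1, n + 1):  # fix for BUG 2: multiply by 1..n inclusive
        result *= i
    return result
```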

**Common Bug Categories**:
- **Syntax errors**: Missing colons, incorrect indentation
- **Logic errors**: Wrong operators, incorrect conditions
- **Off-by-one**: range(n) vs range(1, n+1)
- **Type errors**: String/int confusion
- **Scope errors**: Variable accessibility

**When to Use**:
- Reinforcing common pitfalls
- After concept introduction
- Testing deep understanding

---

### 3. Build-From-Scratch

**Description**: Specification-based exercises where learners write complete solutions.

**Pedagogical Value**: Tests full understanding, integration of multiple concepts, problem-solving.

**Example**:
```
Write a function is_palindrome(text) that returns True if
the text reads the same forwards and backwards (ignore case
and spaces).

Examples:
  is_palindrome("racecar") → True
  is_palindrome("hello") → False
  is_palindrome("A man a plan a canal Panama") → True
```
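One sample solution satisfying the specification above:

```python
def is_palindrome(text):
    """True if text reads the same forwards and backwards, ignoring case and spaces."""
    cleaned = text.replace(" ", "").lower()
    return cleaned == cleaned[::-1]
```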

**Difficulty Levels**:
- **Easy**: Clear requirements, one concept
- **Medium**: Multiple steps, 2-3 concepts
- **Hard**: Complex logic, edge cases, multiple approaches

---

### 4. Extend-the-Code

**Description**: Provide working code and ask learners to add functionality.

**Pedagogical Value**: Builds on existing code, practices incremental development, code reading.

**Example**:
```python
# This function validates email format
def is_valid_email(email):
    return '@' in email and '.' in email

# TASK: Extend this to also check:
# 1. Email doesn't start or end with '@'
# 2. There's text after the last '.'
# 3. No spaces in the email
```
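One possible extension covering all three requested checks:

```python
def is_valid_email(email):
    if " " in email:                                  # task 3: no spaces
        return False
    if email.startswith("@") or email.endswith("@"):  # task 1: '@' not at edges
        return False
    if "@" not in email or "." not in email:          # original checks
        return False
    return email.rsplit(".", 1)[1] != ""              # task 2: text after last '.'
```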

**When to Use**:
- Practicing code modification
- Building on previous work
- Real-world skill (extending existing code)

---

### 5. Code-Completion (Parsons Problems)

**Description**: Provide code lines in scrambled order; learners arrange correctly.

**Pedagogical Value**: Focuses on logic flow without typing, reduces syntax burden.

**Example**:
```
Arrange these lines to create a function that finds the maximum value in a list:

A. def find_max(numbers):
B.     if num > max_value:
C.         max_value = num
D.     for num in numbers:
E.     max_value = numbers[0]
F.     return max_value

Correct order: _______________
```
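For the answer key, one correct ordering is A, E, D, B, C, F, which assembles to:

```python
def find_max(numbers):          # A
    max_value = numbers[0]      # E
    for num in numbers:         # D
        if num > max_value:     # B
            max_value = num     # C
    return max_value            # F
```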

**Variations**:
- **Fixed Parsons**: All lines correct, just scrambled
- **Distractor Parsons**: Include extra incorrect lines

---

### 6. Trace-the-Execution

**Description**: Ask learners to predict program output or trace variable values.

**Pedagogical Value**: Develops mental execution model, tests understanding of flow.

**Example**:
```python
x = 5
y = 10

for i in range(3):
    x += i
    y -= 1

print(x)  # What will this print?
print(y)  # What will this print?
```
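For the answer key, tracing the loop step by step:

```python
x = 5
y = 10

for i in range(3):
    x += i   # adds 0, then 1, then 2
    y -= 1   # subtracts 1 three times

print(x)  # 8  (5 + 0 + 1 + 2)
print(y)  # 7  (10 - 3)
```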

**When to Use**:
- Testing control flow understanding
- Loop comprehension
- Variable scope and mutation

---

### 7. Explain-This-Code

**Description**: Provide code and ask for explanation in plain language.

**Pedagogical Value**: Promotes deeper understanding, tests conceptual grasp beyond mechanics.

**Example**:
```python
result = [x**2 for x in range(10) if x % 2 == 0]

# Explain what this code does in your own words:
# 1. What does x**2 mean?
# 2. What does range(10) produce?
# 3. What does 'if x % 2 == 0' filter?
# 4. What will 'result' contain?
```
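An equivalent expanded loop makes the answers to these questions concrete:

```python
result = []
for x in range(10):            # x takes values 0 through 9
    if x % 2 == 0:             # keep only even x
        result.append(x ** 2)  # square each kept value

# Equivalent to the one-line comprehension; result is [0, 4, 16, 36, 64].
```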

---

### 8. Refactor-for-Quality

**Description**: Provide working but poorly-written code; ask for improvements.

**Pedagogical Value**: Teaches code quality, Pythonic patterns, best practices.

**Example**:
```python
# Refactor this code to be more Pythonic and readable

def func(x):
    y = []
    for i in range(len(x)):
        if x[i] % 2 == 0:
            y.append(x[i])
    return y

# Improvements to make:
# - Better function/variable names
# - Use enumerate or direct iteration
# - Consider list comprehension
# - Add docstring
```
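One refactored version applying the suggested improvements:

```python
def filter_even(numbers):
    """Return the even numbers from the input list, preserving order."""
    return [number for number in numbers if number % 2 == 0]
```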

---

### 9. Test-Driven Exercise

**Description**: Provide test cases; learners write code to pass tests.

**Pedagogical Value**: Introduces TDD, clarifies requirements through tests.

**Example**:
```python
# Write a function that passes these tests (pytest style):
import pytest

def test_greet():
    assert greet("Alice") == "Hello, Alice"
    assert greet("Bob", formal=True) == "Good day, Bob"
    with pytest.raises(ValueError):
        greet("")

# Your implementation:
def greet(name, formal=False):
    # Your code here
    pass
```
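One implementation that satisfies those tests:

```python
def greet(name, formal=False):
    if not name:
        raise ValueError("name must be non-empty")
    greeting = "Good day" if formal else "Hello"
    return f"{greeting}, {name}"
```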

---

### 10. Multiple-Choice Code Questions

**Description**: Present code scenario with multiple possible answers.

**Pedagogical Value**: Quick assessment, tests concept recognition.

**Example**:
```
What will this code print?

x = [1, 2, 3]
y = x
y.append(4)
print(len(x))

A) 3
B) 4
C) Error
D) [1, 2, 3, 4]
```
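The correct answer is B, because assignment copies the reference, not the list:

```python
x = [1, 2, 3]
y = x          # y binds to the same list object; no copy is made
y.append(4)    # mutates the one shared list
assert x is y  # both names refer to the same object
print(len(x))  # 4
```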

## Exercise Design Principles

### Principle 1: Clear Learning Target
Every exercise should target specific learning objective(s).

### Principle 2: Appropriate Difficulty
Match difficulty to learner level and learning stage.

### Principle 3: Immediate Feedback
Provide or enable quick verification of correctness.

### Principle 4: Scaffolded Support
Offer hints, partial solutions, or progressive disclosure.

### Principle 5: Realistic Context
Use meaningful scenarios when possible.

## Mixing Exercise Types

**Recommended Sequence**:
1. **Fill-in-the-blank** (after initial instruction)
2. **Trace-the-execution** (test understanding)
3. **Debug-this** (reinforce common errors)
4. **Build-from-scratch** (apply independently)

**Avoid**: Using only one type repeatedly; variety sustains engagement and exercises different skills.

## Exercise Timing

- **Fill-in-blank**: 2-5 minutes
- **Debug-this**: 5-10 minutes
- **Build-from-scratch**: 10-30 minutes
- **Extend-code**: 5-15 minutes
- **Trace-execution**: 2-5 minutes

## Further Reading

- "How Learning Works" (Ambrose et al.) - Exercise design principles
- "The Programmer's Brain" (Hermans) - Cognitive aspects of code exercises
- Parsons Problems research (Parsons & Haden, 2006)

```

### reference/evidence-based-strategies.md

```markdown
# Evidence-Based Learning Strategies

## Overview

Research in cognitive science and educational psychology has identified specific strategies that significantly improve learning retention and transfer. This guide explains how to apply these strategies to programming exercise design.

## Core Strategies

### 1. Retrieval Practice

**Definition**: The act of recalling information from memory strengthens learning more than re-reading or reviewing.

**Research Basis**: "Testing effect" (Roediger & Karpicke, 2006) shows retrieval practice produces better long-term retention than repeated study.

**Application to Python Exercises**:
- Ask learners to write code from memory (not copy-paste)
- Use "close the book" coding challenges
- Frequent low-stakes quizzes
- Code-from-specification exercises

**Example Exercise**:
```
Without looking at references, write a function that removes duplicates from a list while preserving order.

After attempting, compare your solution with documented approaches.
```
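One documented approach to compare against after the retrieval attempt:

```python
def remove_duplicates(items):
    """Remove duplicates while preserving first-occurrence order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```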

**Implementation Tips**:
- Space out retrieval attempts (see Spaced Repetition)
- Start with easier retrievals, progress to harder
- Provide feedback after retrieval attempts
- Use varied retrieval contexts

---

### 2. Spaced Repetition

**Definition**: Distributing practice over time produces better retention than massed practice (cramming).

**Research Basis**: Spacing effect (Ebbinghaus, 1885; Cepeda et al., 2006) shows distributed learning outperforms blocked practice.

**Application to Python Exercises**:
- Revisit concepts across multiple lessons
- Include "spiral review" exercises mixing old and new content
- Progressive interval testing (1 day, 3 days, 1 week, 1 month)

**Example Pattern**:
```
Lesson 1: Introduce list methods (.append, .extend, .remove)
  Exercise: Use these methods

Lesson 3: Introduce dictionaries
  Exercise: Use dictionaries AND revisit list methods

Lesson 7: Introduce classes
  Exercise: Create class that uses lists internally

This spaces out list method practice over multiple sessions.
```

**Optimal Spacing**:
- **Initial**: Practice immediately after learning
- **Short interval**: 1-2 days later
- **Medium interval**: 1 week later
- **Long interval**: 1 month later
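A minimal sketch of the progressive intervals mentioned above (1 day, 3 days, 1 week, roughly 1 month); the function and constant names are illustrative, not part of this skill:

```python
from datetime import date, timedelta

# Progressive review intervals, per the spacing guideline above.
REVIEW_INTERVALS = [timedelta(days=1), timedelta(days=3),
                    timedelta(weeks=1), timedelta(days=30)]

def review_dates(learned_on):
    """Dates on which a concept learned on `learned_on` should be revisited."""
    return [learned_on + interval for interval in REVIEW_INTERVALS]
```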

**Implementation in Exercise Sets**:
```
Exercise Set for Lesson 5:
- 60% new content (current lesson)
- 30% recent content (lessons 3-4)
- 10% older content (lessons 1-2)
```

---

### 3. Interleaving

**Definition**: Mixing different types of problems during practice rather than blocking by type.

**Research Basis**: Interleaved practice improves discrimination between problem types and enhances transfer (Rohrer & Taylor, 2007).

**Application to Python Exercises**:
- Mix exercise types (fill-in, debug, build-from-scratch) in one set
- Combine different concepts in single exercise session
- Avoid long blocks of identical problem types

**Example**:
```
BLOCKED (Less Effective):
- 10 exercises on list comprehensions
- 10 exercises on dictionary methods
- 10 exercises on string methods

INTERLEAVED (More Effective):
- Exercise 1: List comprehension
- Exercise 2: Dictionary method
- Exercise 3: String method
- Exercise 4: List comprehension
- Exercise 5: Dictionary method
[Continues mixing...]
```

**When NOT to Interleave**:
- Initial introduction of brand-new concept (allow some blocked practice first)
- Very early beginners (may cause confusion)

---

### 4. Elaboration

**Definition**: Explaining concepts in your own words and connecting new information to existing knowledge.

**Research Basis**: Elaborative interrogation (Pressley et al., 1987) improves comprehension and retention.

**Application to Python Exercises**:
- Ask "why" and "how" questions, not just "what"
- Require explanations alongside code
- Compare/contrast exercises
- Connection to prior knowledge

**Example Exercises**:
```python
# Standard exercise:
# Write a function to find the maximum value in a list.

# Elaboration-enhanced exercise:
# Write a function to find the maximum value in a list.
# Then explain:
# 1. Why did you initialize max_value the way you did?
# 2. How would your approach change for an empty list?
# 3. Compare your solution to using the built-in max() function.
# 4. When might your custom function be preferable?
```

**Self-Explanation Prompts**:
- "Explain how this code works in your own words"
- "Why does this approach work?"
- "How is X different from Y?"
- "When would you use this pattern?"

---

### 5. Concrete Examples

**Definition**: Using specific, concrete instances before introducing abstract concepts.

**Research Basis**: Concrete-to-abstract progression (Goldstone & Son, 2005) aids conceptual understanding.

**Application to Python Exercises**:
- Start with tangible, relatable scenarios
- Use specific numbers/data before variables
- Provide examples before definitions

**Example Progression**:
```python
# Step 1: Concrete (specific numbers)
total = 10 + 20 + 30
average = total / 3

# Step 2: Concrete with specific data
prices = [10, 20, 30]
total = sum(prices)
average = total / len(prices)

# Step 3: Abstract pattern (any data)
def calculate_average(numbers):
    return sum(numbers) / len(numbers)
```

---

### 6. Dual Coding

**Definition**: Presenting the same information in more than one form — a verbal explanation alongside a visual or code-based representation.

**Research Basis**: Dual coding theory (Paivio, 1971) shows that combining verbal and visual representations aids memory.

**Application to Programming Exercises**:
- Provide code examples alongside explanations
- Use concrete examples in exercises
- Ask learners to trace data structure states
- Include explanatory comments in code

**Example**:
```python
# Exercise: Trace this code and draw the list state after each step

# Start: [1, 2, 3]
numbers = [1, 2, 3]

# After append: [1, 2, 3, 4]
numbers.append(4)

# After remove: [1, 3, 4]
numbers.remove(2)

# Draw or write the list state at each point
```

---

### 7. Desirable Difficulty

**Definition**: Deliberately introducing challenges that feel difficult but remain surmountable.

**Research Basis**: Bjork's desirable difficulties framework (Bjork, 1994) shows that some struggle during learning enhances long-term retention.

**Application to Python Exercises**:
- Don't make exercises too easy (rote copying)
- Add constraints that require thinking
- Remove some scaffolding progressively
- Include exercises slightly beyond current comfort zone

**Example**:
```
Easy (less desirable difficulty):
Write a function that takes two parameters and returns their sum.

Better (appropriate difficulty):
Write a function calculate_total(items, tax_rate=0.08) that:
- Calculates subtotal of items list
- Applies tax rate
- Returns total rounded to 2 decimal places
- Handles empty items list appropriately
```
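A possible solution to the "appropriate difficulty" version. The spec leaves empty-list behavior open; returning `0.0` is one reasonable interpretation:

```python
def calculate_total(items, tax_rate=0.08):
    """Sum item prices, apply tax, and round to 2 decimal places."""
    if not items:
        return 0.0  # assumption: an empty items list totals zero
    subtotal = sum(items)
    return round(subtotal * (1 + tax_rate), 2)

print(calculate_total([19.99, 5.50]))           # → 27.53
print(calculate_total([10, 20], tax_rate=0.1))  # → 33.0
```
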

**Caution**: Difficulty must be "desirable" (challenging but achievable), not frustrating.

---

## Combining Strategies

**Optimal Exercise Design** uses multiple strategies:

```
EXERCISE: Build a Contact Manager (Week 3)

[Retrieval Practice]: Write from memory without referencing notes
[Spaced Repetition]: This revisits lists (Week 1) and dictionaries (Week 2)
[Interleaving]: Combines multiple data structure concepts
[Elaboration]: Requires explaining design decisions
[Desirable Difficulty]: Slightly beyond current comfort zone

Requirements:
- Store contacts as dictionaries in a list
- Functions: add_contact, find_contact, list_all_contacts
- Explain why you chose this data structure over alternatives
```
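One possible reference solution for this exercise. The flat-list data model and function names follow the requirements above; input validation is omitted for brevity:

```python
# A contact is a dictionary; the phone book is a list of those dictionaries.
contacts = []

def add_contact(name, phone):
    contacts.append({"name": name, "phone": phone})

def find_contact(name):
    for contact in contacts:
        if contact["name"] == name:
            return contact
    return None  # not found

def list_all_contacts():
    return [c["name"] for c in contacts]

add_contact("Ada", "555-0100")
add_contact("Grace", "555-0101")
print(find_contact("Ada"))   # → {'name': 'Ada', 'phone': '555-0100'}
print(list_all_contacts())   # → ['Ada', 'Grace']

# Design note (the elaboration answer): a list of dicts keeps insertion order
# and allows duplicate names; a dict keyed by name would give O(1) lookup
# but silently overwrite duplicates.
```
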

## Evidence-Based Exercise Sequence

**Lesson Flow**:
1. **Learn**: Introduction with concrete examples
2. **Practice**: Immediate retrieval (fill-in-blank)
3. **Apply**: Build-from-scratch exercise (desirable difficulty)
4. **Explain**: Elaboration questions about approach
5. **Review**: Spaced repetition of previous concepts
6. **Interleave**: Mix of old and new concepts

## Measurement: Is It Working?

Track these indicators:
- **Retention**: Can learners recall concepts days/weeks later?
- **Transfer**: Can learners apply concepts to new situations?
- **Confidence**: Do learners feel more capable over time?
- **Performance**: Are error rates decreasing?

## Common Mistakes

**Mistake 1**: Only using recognition (multiple choice) without retrieval (write from memory)
**Mistake 2**: Massed practice (10 identical exercises in a row)
**Mistake 3**: Too much scaffolding (no desirable difficulty)
**Mistake 4**: No spaced repetition (never revisiting old concepts)
**Mistake 5**: Only abstract explanations (no concrete examples)

## Further Reading

- "Make It Stick" (Brown, Roediger, McDaniel) - Practical application of learning research
- "How Learning Works" (Ambrose et al.) - Seven research-based principles
- "The Learning Scientists" blog - Evidence-based study strategies
- Roediger, H. L., & Karpicke, J. D. (2006). "Test-enhanced learning"
- Bjork, R. A. (1994). "Memory and metamemory considerations in the training of human beings"

```

### reference/difficulty-progression.md

```markdown
# Difficulty Progression in Programming Exercises

## Overview

Effective exercise sequences gradually increase difficulty, building learner confidence and competence. This guide provides frameworks for designing progressive exercise difficulty.

## Difficulty Dimensions

Exercises can increase difficulty along multiple dimensions:

1. **Cognitive Complexity**: Number of concepts/steps required
2. **Code Length**: Lines of code to write
3. **Abstraction Level**: Concrete → abstract thinking
4. **Scaffolding**: Amount of support provided
5. **Problem Clarity**: Specification clarity
6. **Edge Cases**: Number of special cases to handle

## Bloom's Taxonomy Progression

Map exercises to cognitive levels:

**Remember** (Lowest):
- Recall syntax, recognize patterns
- Example: "What does list.append() do?"

**Understand**:
- Explain concepts, predict output
- Example: "Trace this code and predict output"

**Apply**:
- Use concepts in standard situations
- Example: "Write a function using list comprehension"

**Analyze**:
- Debug code, compare approaches
- Example: "Find and fix three bugs in this code"

**Evaluate**:
- Assess quality, choose best approach
- Example: "Which of these three solutions is most efficient and why?"

**Create** (Highest):
- Design new solutions, integrate multiple concepts
- Example: "Build a contact manager with add/search/delete functions"

## Progressive Example Sequence

### Level 1: Guided (Highest Support)

```python
# Complete this function (one blank)
def double_number(n):
    return n __ 2  # Fill in the operator
```

### Level 2: Structured (Medium Support)

```python
# Complete this function (multiple blanks, clear structure provided)
def find_max(numbers):
    max_value = numbers[0]
    for num in ________:
        if _________:
            max_value = ________
    return max_value
```

### Level 3: Specification (Low Support)

```
Write a function find_max(numbers) that returns the largest number in a list.
Handle empty lists by returning None.
```

### Level 4: Open-Ended (Minimal Support)

```
Create a function to process a list of student scores. It should:
- Calculate average
- Find highest and lowest scores
- Determine letter grades (A: 90+, B: 80-89, C: 70-79, D: 60-69, F: <60)
- Return a summary dictionary
```
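A sample solution at this level, for calibration. The prompt leaves empty-list handling unspecified, so it is omitted here:

```python
def grade(score):
    # Letter-grade bands from the specification above
    if score >= 90: return "A"
    if score >= 80: return "B"
    if score >= 70: return "C"
    if score >= 60: return "D"
    return "F"

def summarize_scores(scores):
    return {
        "average": sum(scores) / len(scores),
        "highest": max(scores),
        "lowest": min(scores),
        "grades": [grade(s) for s in scores],
    }

print(summarize_scores([95, 72, 88, 60]))
# → {'average': 78.75, 'highest': 95, 'lowest': 60, 'grades': ['A', 'C', 'B', 'D']}
```
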

## PRIME Framework

**P**rerequisites: What must be known first
**R**elevance: Clear purpose and application
**I**nstruction: Explicit about what to do
**M**anageable: Achievable with current skills
**E**ngaging: Interesting and meaningful

## Difficulty Scaling Patterns

### Pattern 1: Concept Accumulation

```
Exercise 1: Use if statement
Exercise 2: Use if/else
Exercise 3: Use if/elif/else
Exercise 4: Nested conditionals
Exercise 5: Complex boolean expressions
```

### Pattern 2: Scaffolding Fade

```
Exercise 1: Complete code (70% provided)
Exercise 2: Complete code (50% provided)
Exercise 3: Complete code (30% provided)
Exercise 4: Specification only (0% provided)
```

### Pattern 3: Constraint Addition

```
Exercise 1: Sort a list (use built-in sorted())
Exercise 2: Sort without built-in functions
Exercise 3: Sort with custom comparison
Exercise 4: Implement sorting algorithm from scratch
```
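For calibration, an Exercise 4-level answer might implement insertion sort (one of several acceptable algorithms):

```python
def insertion_sort(items):
    """Sort from scratch without built-in sorting (Exercise 4 style)."""
    result = list(items)  # copy so the input is not mutated
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements right, then drop key into place
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result

print(insertion_sort([3, 1, 2]))  # → [1, 2, 3]
```
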

### Pattern 4: Edge Case Complexity

```
Exercise 1: Handle normal input only
Exercise 2: Handle empty input
Exercise 3: Handle invalid types
Exercise 4: Handle all edge cases with clear error messages
```

## Recommended Difficulty Distribution

**Beginner Module** (10 exercises):
- 40% Easy (Levels 1-2)
- 40% Medium (Level 3)
- 20% Challenging (Level 4)

**Intermediate Module**:
- 20% Easy (review)
- 50% Medium
- 30% Challenging

**Advanced Module**:
- 10% Easy (review)
- 30% Medium
- 60% Challenging

## Time-Based Progression

**5-minute exercises**: Fill-in-blank, trace execution
**10-minute exercises**: Debug-this, small functions
**20-minute exercises**: Multi-function programs
**45-minute exercises**: Mini-projects with multiple requirements

## Warning Signs

**Too Easy**:
- Learners complete instantly without thinking
- No errors or struggle
- Boredom reported

**Too Hard**:
- High frustration, giving up
- Unable to start (no entry point)
- Requires concepts not yet taught

**Adjustment**: Move difficulty up or down based on completion time and success rate (target 70-80% success).
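The adjustment rule can be sketched as a simple threshold check (function name and return strings are illustrative):

```python
def difficulty_signal(success_rate, target=(0.70, 0.80)):
    """Map an observed success rate to an adjustment suggestion."""
    low, high = target
    if success_rate > high:
        return "increase difficulty"  # too easy
    if success_rate < low:
        return "decrease difficulty"  # too hard
    return "keep as is"

print(difficulty_signal(0.95))  # → increase difficulty
print(difficulty_signal(0.75))  # → keep as is
```
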

## Further Reading

- "Flow" (Csikszentmihalyi) - Optimal challenge balance
- "How to Design Programs" - Systematic exercise progression

```

### reference/spaced-repetition.md

```markdown
# Spaced Repetition for Programming Skills

## Overview

Spaced repetition is a learning technique that involves reviewing material at increasing intervals. For programming, this means revisiting concepts across multiple lessons and exercises rather than practicing intensively once and never returning.

## The Forgetting Curve

Without review, we forget (approximate figures from forgetting-curve research):
- **20 minutes later**: 40% forgotten
- **1 day later**: 60% forgotten
- **1 week later**: 75% forgotten
- **1 month later**: 80%+ forgotten

Spaced repetition combats this by reviewing just before we'd forget.

## Optimal Review Intervals

**First review**: 1 day after initial learning
**Second review**: 3 days after first review
**Third review**: 1 week after second review
**Fourth review**: 2 weeks after third review
**Fifth review**: 1 month after fourth review
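The schedule above can be turned into concrete calendar dates. A minimal sketch (the gap values encode the intervals listed above):

```python
from datetime import date, timedelta

# Gaps between successive reviews, in days: 1 day, 3 days, 1 week, 2 weeks, 1 month
REVIEW_GAPS = [1, 3, 7, 14, 30]

def review_dates(learned_on):
    dates, current = [], learned_on
    for gap in REVIEW_GAPS:
        current += timedelta(days=gap)
        dates.append(current)
    return dates

for d in review_dates(date(2026, 3, 1)):
    print(d)
# → 2026-03-02, 2026-03-05, 2026-03-12, 2026-03-26, 2026-04-25
```
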

## Application to Python Exercises

### Strategy 1: Spiral Curriculum

Revisit concepts across lessons:

```
Week 1: Introduce lists (.append, .remove, indexing)
Week 2: New topic (dictionaries) + list review exercises
Week 3: New topic (functions) + use lists in examples
Week 4: New topic (classes) + classes with list attributes
Week 6: Project using lists, dictionaries, functions, classes together
```

### Strategy 2: Mixed Exercise Sets

Each exercise set includes:
- **60%** new concept practice
- **30%** recent concepts (last 1-2 lessons)
- **10%** older concepts (3+ lessons ago)

**Example Exercise Set (Lesson 5):**
```
Exercise 1: Current concept (loops)
Exercise 2: Current concept (loops)
Exercise 3: Recent concept (conditionals from Lesson 4)
Exercise 4: Current concept (loops)
Exercise 5: Old concept (variables from Lesson 1)
Exercise 6: Current concept (loops)
Exercise 7: Recent concept (lists from Lesson 3)
```
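Assembling such a set can be automated. A sketch of a 60/30/10 sampler (function name and pool arguments are illustrative):

```python
import random

def build_exercise_set(current, recent, older, size=10, mix=(0.6, 0.3, 0.1)):
    """Sample exercises per the 60/30/10 mix, then shuffle so review
    items are interleaved with new ones rather than appended at the end."""
    n_current = round(size * mix[0])
    n_recent = round(size * mix[1])
    n_older = size - n_current - n_recent  # remainder goes to old concepts
    chosen = (random.sample(current, n_current)
              + random.sample(recent, n_recent)
              + random.sample(older, n_older))
    random.shuffle(chosen)
    return chosen
```

With pools of tagged exercise IDs, `build_exercise_set(loops_pool, recent_pool, old_pool)` yields a 10-item set containing 6 current, 3 recent, and 1 older exercise in shuffled order.
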

### Strategy 3: Progressive Complexity

Each time you revisit a concept, increase difficulty:

**Week 1** (Introduction):
```python
# Basic list usage
fruits = ["apple", "banana"]
fruits.append("cherry")
print(fruits)
```

**Week 2** (Review + Slightly More Complex):
```python
# Use lists in conditional logic
scores = [85, 92, 78, 88, 95]
if 90 in scores:
    print("At least one A grade!")
```

**Week 3** (Review + Integration with New Concept):
```python
# Use lists with functions
def calculate_average(numbers):
    return sum(numbers) / len(numbers)

grades = [85, 92, 78, 88, 95]
avg = calculate_average(grades)
```

**Week 5** (Review + Advanced Usage):
```python
# List comprehensions (new) using prior list knowledge
numbers = [1, 2, 3, 4, 5]
even_squares = [n**2 for n in numbers if n % 2 == 0]
```

### Strategy 4: Cumulative Projects

**Mini-Project Each Week** that requires previous concepts:

- **Week 2 Project**: Use Week 1 + Week 2 concepts
- **Week 3 Project**: Use Week 1 + Week 2 + Week 3 concepts
- **Week 4 Project**: Use all concepts learned so far

## Implementing in Exercise Design

### Tagging Exercises

Tag each exercise with concepts it practices:

```yaml
exercise_id: "EX-305"
primary_concept: "loops"  # New this lesson
secondary_concepts:
  - "lists"  # From Lesson 2
  - "conditionals"  # From Lesson 3
difficulty: "medium"
last_practiced:
  lists: "2 lessons ago"
  conditionals: "1 lesson ago"
```

### Spacing Calculator

Determine when to revisit:

```
Concept: List methods
- Introduced: Lesson 1
- First review: Lesson 2 (1 lesson gap)
- Second review: Lesson 4 (2 lesson gap)
- Third review: Lesson 7 (3 lesson gap)
- Fourth review: Lesson 11 (4 lesson gap)
```
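The expanding-gap pattern above (gaps of 1, 2, 3, 4 lessons) is easy to compute. A minimal sketch:

```python
def review_lessons(introduced_at, n_reviews=4):
    """Schedule reviews with expanding gaps of 1, 2, 3, ... lessons."""
    lessons, current = [], introduced_at
    for gap in range(1, n_reviews + 1):
        current += gap
        lessons.append(current)
    return lessons

print(review_lessons(1))  # → [2, 4, 7, 11]  (matches the schedule above)
```
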

## Practical Patterns

### Pattern 1: "Throwback Thursday"

Every 4th lesson, dedicate time to reviewing concepts from 3-4 weeks ago.

### Pattern 2: Progressive Problem Revisit

Same problem, different complexity:

**Week 1**: Write function to sum a list
**Week 3**: Extend to handle empty lists and non-numeric values
**Week 6**: Optimize for very large lists, add type hints and tests
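The Week 3 revision might look like this; skipping invalid values is one of several reasonable policies (raising an error is another):

```python
def sum_scores(values):
    """Week 3 version: skip non-numeric entries, return 0.0 for empty input."""
    total = 0.0
    for v in values:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            total += v
    return total

print(sum_scores([10, "n/a", 2.5, None]))  # → 12.5
print(sum_scores([]))                      # → 0.0
```
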

### Pattern 3: Concept Combination Matrix

Later exercises combine previously learned concepts:

```
Lesson 5 combines:
- Loops (Lesson 2) + Dictionaries (Lesson 4)
- Functions (Lesson 3) + Lists (Lesson 1)
```

## Balancing New vs Review

**Early Course** (Weeks 1-4):
- 70% new concept practice
- 30% review

**Mid Course** (Weeks 5-8):
- 60% new concept practice
- 40% review

**Late Course** (Weeks 9-12):
- 50% new concepts
- 50% review and integration

## Benefits of Spaced Repetition

1. **Reduced cramming**: Knowledge built incrementally
2. **Better retention**: Long-term memory formation
3. **Transfer**: Ability to apply concepts in new contexts
4. **Confidence**: Repeated success builds competence

## Further Reading

- Cepeda et al. (2006) "Distributed practice in verbal recall tasks"
- Rohrer & Taylor (2007) "The shuffling of mathematics problems improves learning"
- "Make It Stick" (Brown, Roediger, McDaniel) - Chapter on spacing

```

### templates/exercise-template.yml

```yaml
exercise_id: ""  # Unique identifier (e.g., "EX-101")
title: ""  # Clear, descriptive title
type: ""  # fill-in-blank | debug-this | build-from-scratch | extend-code | trace-execution | explain-code | refactor | test-driven | parsons | multiple-choice

learning_objectives:
  - ""  # What learner will be able to do after completing this

primary_concept: ""  # Main concept being practiced
secondary_concepts:
  - ""  # Additional concepts involved (for spaced repetition tracking)

difficulty: ""  # easy | medium | hard
estimated_time_minutes: 0
target_audience: ""  # beginner | intermediate | advanced

evidence_based_strategies:
  - ""  # Which strategies this exercise employs: retrieval-practice | spaced-repetition | interleaving | elaboration | desirable-difficulty

prerequisites:
  - ""  # Concepts learner must understand before attempting

instructions: |
  # Clear instructions for the learner
  # What they should do
  # Any constraints or requirements

starter_code: |
  # For fill-in-blank, extend-code, or debug-this exercises
  # Provide initial code here
  # Use ___ or BLANK for fill-in exercises

solution: |
  # Complete correct solution
  # Include comments explaining approach

test_cases:
  - input: ""
    expected_output: ""
    description: ""  # What this test verifies

  - input: ""
    expected_output: ""
    description: ""

hints:
  - level: 1  # Progressive hints: 1 (gentle) to 3 (explicit)
    text: ""

  - level: 2
    text: ""

  - level: 3
    text: ""

rubric:
  criteria:
    - name: "Correctness"
      points: 0
      description: ""

    - name: "Code Quality"
      points: 0
      description: ""

    - name: "Efficiency"
      points: 0
      description: ""

common_mistakes:
  - mistake: ""  # Common error learners make
    explanation: ""  # Why it's wrong
    hint: ""  # How to fix

extensions:
  - description: ""  # How to extend exercise for more practice
    difficulty: ""  # easy | medium | hard

related_exercises:
  - ""  # IDs of related exercises for spaced repetition

```

### templates/rubric-template.yml

```yaml
rubric_name: ""  # E.g., "List Methods Exercise Rubric"
total_points: 0
pass_threshold: 0  # Minimum points to pass

criteria:
  - criterion: "Correctness"
    weight: 40  # Percentage of total grade
    levels:
      excellent:
        points: 4
        description: "Solution produces correct output for all test cases including edge cases"
      adequate:
        points: 3
        description: "Solution works for standard cases but misses some edge cases"
      developing:
        points: 2
        description: "Solution works for simple cases but fails on complex inputs"
      insufficient:
        points: 1
        description: "Solution has significant errors or doesn't run"

  - criterion: "Code Quality"
    weight: 30
    levels:
      excellent:
        points: 4
        description: "Clean, readable code with meaningful names, proper formatting, and comments where needed"
      adequate:
        points: 3
        description: "Generally readable with minor style issues or missing comments"
      developing:
        points: 2
        description: "Code works but has poor naming, inconsistent style, or is hard to follow"
      insufficient:
        points: 1
        description: "Code is difficult to read or understand"

  - criterion: "Efficiency"
    weight: 20
    levels:
      excellent:
        points: 4
        description: "Uses optimal approach and appropriate data structures"
      adequate:
        points: 3
        description: "Reasonable approach with minor inefficiencies"
      developing:
        points: 2
        description: "Works but uses inefficient approach (e.g., nested loops where not needed)"
      insufficient:
        points: 1
        description: "Significantly inefficient approach"

  - criterion: "Error Handling"
    weight: 10
    levels:
      excellent:
        points: 4
        description: "Handles all edge cases and invalid inputs gracefully"
      adequate:
        points: 3
        description: "Handles most edge cases"
      developing:
        points: 2
        description: "Handles some edge cases"
      insufficient:
        points: 1
        description: "No error handling or edge case consideration"

feedback_guidelines:
  - "Be specific about what was done well"
  - "Identify specific improvements with examples"
  - "Suggest next steps for continued learning"
  - "Maintain encouraging, growth-mindset tone"

```
