SkillHub ClubAnalyze Data & AIFull StackData / AI

policyengine-uk-data-skill

UK survey data enhancement - FRS with WAS imputation patterns

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars

Hot score

Updated

March 20, 2026

Overall rating

C4.1

Composite score

4.1

Best-practice grade

B77.6

Install command

npx @skill-hub/cli install policyengine-policyengine-claude-policyengine-uk-data-skill

Repository

PolicyEngine/policyengine-claude

Skill path: skills/data-science/policyengine-uk-data-skill

UK survey data enhancement - FRS with WAS imputation patterns

Open repository

Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: PolicyEngine.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install policyengine-uk-data-skill into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/PolicyEngine/policyengine-claude before adding policyengine-uk-data-skill to shared team environments
Use policyengine-uk-data-skill for development workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: policyengine-uk-data
description: UK survey data enhancement - FRS with WAS imputation patterns
---

# PolicyEngine UK Data

PolicyEngine UK Data provides enhanced Family Resources Survey (FRS) datasets with imputed variables from the Wealth and Assets Survey (WAS).

## For Users

### What is policyengine-uk-data?

PolicyEngine UK uses the Family Resources Survey (FRS) as its primary microdata source. The FRS contains household demographics, income, and benefits but lacks detailed wealth information. The Wealth and Assets Survey (WAS) provides comprehensive wealth data but has a smaller sample. This package imputes wealth variables from WAS to FRS.

**Key datasets:**
- **FRS (Family Resources Survey):** Main UK household survey with ~20,000 households
- **WAS (Wealth and Assets Survey):** Detailed wealth survey with ~20,000 households
- **Enhanced FRS:** FRS with imputed wealth variables from WAS

## For Analysts

### Repository

**Location:** PolicyEngine/policyengine-uk-data

**Clone:**
```bash
git clone https://github.com/PolicyEngine/policyengine-uk-data
cd policyengine-uk-data
```

### Structure

```
policyengine_uk_data/
├── datasets/          # Dataset definitions
│   └── frs/          # FRS enhancement
│       ├── raw_frs.py           # Raw FRS loader
│       ├── calibration.py       # Weight calibration
│       └── imputations/         # Variable imputation
│           ├── wealth.py        # WAS wealth imputation
│           ├── student_loans.py # Student loan balances
│           └── ...
└── storage/          # Data storage utilities
```

### Installation

**From PyPI:**
```bash
pip install policyengine-uk-data
```

**Development:**
```bash
pip install -e .
```

## For Contributors

### Imputation Pattern

The standard pattern for adding WAS-to-FRS imputations:

**1. Identify the variables:**
- Source: WAS variables (complete wealth data)
- Target: FRS (needs these variables)
- Common variables: Demographics that exist in both surveys

**2. Follow the `wealth.py` pattern:**

```python
# In policyengine_uk_data/datasets/frs/imputations/my_variable.py

from policyengine_uk_data.datasets.frs.imputations.imputation_utils import (
    impute_from_was
)

def add_my_variable(frs, was):
    """
    Impute my_variable from WAS to FRS.

    Args:
        frs: Enhanced FRS DataFrame
        was: WAS DataFrame with target variable

    Returns:
        Enhanced FRS with imputed variable
    """
    return impute_from_was(
        donor=was,
        recipient=frs,
        target_variable='my_variable',
        common_variables=[
            'age',
            'region',
            'employment_status',
            # Add relevant predictors
        ],
        method='quantile_forest'  # Or other microimpute method
    )
```

**3. Update the RENAMES dictionary:**

If the variable has different names in WAS vs FRS:

```python
# In the relevant module
RENAMES = {
    "was_variable_name": "standardized_name",
    "frs_variable_name": "standardized_name",
}
```

**4. Add to the pipeline:**

Register the imputation in the FRS enhancement pipeline so it runs automatically.

### Example: Student Loan Imputation

The recent PR #252 added student loan balance imputation:

```python
# policyengine_uk_data/datasets/frs/imputations/student_loans.py

def add_student_loan_balance(frs, was):
    """
    Impute student loan balances from WAS to FRS.

    WAS contains:
    - total_loans: All loan balances
    - total_loans_exc_slc: Loans excluding student loans

    Derived variable:
    - student_loan_balance = total_loans - total_loans_exc_slc
    """
    return impute_from_was(
        donor=was,
        recipient=frs,
        target_variable='student_loan_balance',
        common_variables=[
            'age',
            'highest_qualification',
            'region',
            'employment_status',
            'income'
        ],
        method='quantile_forest'
    )
```

### Common Variables for WAS-FRS Imputation

**Demographics (always available):**
- age
- sex
- region (UK region codes)

**Economic status:**
- employment_status
- income (or income bands)
- hours_worked

**Household:**
- household_size
- num_children
- tenure_type (own/rent)

**Education:**
- highest_qualification
- currently_studying

### Testing

**Run tests:**
```bash
make test

# Or pytest directly
pytest policyengine_uk_data/tests/ -v
```

**Test structure:**
```bash
# Check if imputation was added
pytest policyengine_uk_data/tests/test_imputations.py::test_student_loan_imputation
```

### Validation

After adding an imputation, validate:

**1. Distribution check:**
```python
# Compare imputed FRS distribution to WAS source
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(was['my_variable'], bins=50)
ax1.set_title('WAS (source)')
ax2.hist(frs_imputed['my_variable'], bins=50)
ax2.set_title('FRS (imputed)')
```

**2. Aggregate totals:**
```python
# Check population-weighted totals match administrative data
weighted_total = (frs_imputed['my_variable'] * frs_imputed['weight']).sum()
print(f"Imputed total: {weighted_total:,.0f}")
# Compare to known UK aggregate
```

**3. Conditional relationships:**
```python
# Verify relationships are preserved
# E.g., student loan balance by age and qualification
frs_imputed.groupby(['age_band', 'qualification'])['student_loan_balance'].mean()
```

## Common Patterns

### Pattern 1: Simple Variable Imputation

```python
# Most common: direct variable imputation
def add_variable(frs, was):
    return impute_from_was(
        donor=was,
        recipient=frs,
        target_variable='my_var',
        common_variables=['age', 'income', 'region']
    )
```

### Pattern 2: Derived Variable Imputation

```python
# When WAS has components but not the exact variable
def add_derived_variable(frs, was):
    # First derive the variable in WAS
    was['net_wealth'] = was['total_assets'] - was['total_debts']

    # Then impute
    return impute_from_was(
        donor=was,
        recipient=frs,
        target_variable='net_wealth',
        common_variables=['age', 'income', 'region']
    )
```

### Pattern 3: Multiple Related Variables

```python
# Impute several related variables together
def add_wealth_components(frs, was):
    variables = [
        'property_wealth',
        'financial_wealth',
        'pension_wealth',
        'debt'
    ]

    for var in variables:
        frs = impute_from_was(
            donor=was,
            recipient=frs,
            target_variable=var,
            common_variables=['age', 'income', 'region']
        )

    return frs
```

## Integration with PolicyEngine UK

**Usage flow:**
```
1. Load raw FRS
   ↓
2. Add WAS imputations (wealth, student loans, etc.)
   ↓
3. Calibrate weights to administrative benchmarks
   ↓
4. Validate against known UK totals
   ↓
5. Package for policyengine-uk
   ↓
6. Use for UK policy simulations
```

**In policyengine-uk:**
```python
from policyengine_uk import Microsimulation

# Uses enhanced FRS under the hood
sim = Microsimulation()
sim.calculate('student_loan_repayment', period='2024')
# Uses imputed student_loan_balance variable
```

## Related Skills

- **microimpute-skill** - ML imputation methods (underlying technique)
- **policyengine-uk-skill** - UK policy model (uses this data)
- **microcalibrate-skill** - Weight calibration (next step after imputation)
- **microdf-skill** - Working with survey microdata

## Resources

**Repository:** https://github.com/PolicyEngine/policyengine-uk-data
**Dependencies:** policyengine-uk, policyengine-core, microdf, microimpute
**Data sources:**
- FRS: https://www.gov.uk/government/collections/family-resources-survey
- WAS: https://www.ons.gov.uk/surveys/informationforhouseholdsandindividuals/householdandindividualsurveys/wealthandassetssurvey