
instrument-data-to-allotrope

Convert laboratory instrument output files (PDF, CSV, Excel, TXT) to Allotrope Simple Model (ASM) JSON format or flattened 2D CSV. Use this skill when scientists need to standardize instrument data for LIMS systems, data lakes, or downstream analysis. Supports auto-detection of instrument types. Outputs include full ASM JSON, flattened CSV for easy import, and exportable Python code for data engineers. Common triggers include converting instrument files, standardizing lab data, preparing data for upload to LIMS/ELN systems, or generating parser code for production pipelines.

Packaged view

This page reorganizes the original catalog entry to lead with fit, installability, and workflow context. The original raw source appears below.

Stars
9,958
Hot score
99
Updated
March 20, 2026
Overall rating
4.5
Composite score
4.5
Best-practice grade
64.8

Install command

npx @skill-hub/cli install anthropics-knowledge-work-plugins-instrument-data-to-allotrope

Repository

anthropics/knowledge-work-plugins

Skill path: bio-research/skills/instrument-data-to-allotrope



Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: anthropics.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install instrument-data-to-allotrope into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/anthropics/knowledge-work-plugins before adding instrument-data-to-allotrope to shared team environments
  • Use instrument-data-to-allotrope for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: instrument-data-to-allotrope
description: Convert laboratory instrument output files (PDF, CSV, Excel, TXT) to Allotrope Simple Model (ASM) JSON format or flattened 2D CSV. Use this skill when scientists need to standardize instrument data for LIMS systems, data lakes, or downstream analysis. Supports auto-detection of instrument types. Outputs include full ASM JSON, flattened CSV for easy import, and exportable Python code for data engineers. Common triggers include converting instrument files, standardizing lab data, preparing data for upload to LIMS/ELN systems, or generating parser code for production pipelines.
---

# Instrument Data to Allotrope Converter

Convert instrument files into standardized Allotrope Simple Model (ASM) format for LIMS upload, data lakes, or handoff to data engineering teams.

> **Note: This is an Example Skill**
>
> This skill demonstrates how skills can support your data engineering tasks—automating schema transformations, parsing instrument outputs, and generating production-ready code.
>
> **To customize for your organization:**
> - Modify the `references/` files to include your company's specific schemas or ontology mappings
> - Use an MCP server to connect to systems that define your schemas (e.g., your LIMS, data catalog, or schema registry)
> - Extend the `scripts/` to handle proprietary instrument formats or internal data standards
>
> This pattern can be adapted for any data transformation workflow where you need to convert between formats or validate against organizational standards.

## Workflow Overview

1. **Detect instrument type** from file contents (auto-detect or user-specified)
2. **Parse file** using allotropy library (native) or flexible fallback parser
3. **Generate outputs**:
   - ASM JSON (full semantic structure)
   - Flattened CSV (2D tabular format)
   - Python parser code (for data engineer handoff)
4. **Deliver** files with summary and usage instructions

> **When Uncertain:** If you're unsure how to map a field to ASM (e.g., is this raw data or calculated? device setting or environmental condition?), ask the user for clarification. Refer to `references/field_classification_guide.md` for guidance, but when ambiguity remains, confirm with the user rather than guessing.

## Quick Start

```bash
# Install requirements first
pip install allotropy pandas openpyxl pdfplumber --break-system-packages
```

```python
# Core conversion
from allotropy.parser_factory import Vendor
from allotropy.to_allotrope import allotrope_from_file

# Convert with allotropy
asm = allotrope_from_file("instrument_data.csv", Vendor.BECKMAN_VI_CELL_BLU)
```

## Output Format Selection

**ASM JSON (default)** - Full semantic structure with ontology URIs
- Best for: LIMS systems expecting ASM, data lakes, long-term archival
- Validates against Allotrope schemas

**Flattened CSV** - 2D tabular representation
- Best for: Quick analysis, Excel users, systems without JSON support
- Each measurement becomes one row with metadata repeated

**Both** - Generate both formats for maximum flexibility
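
As a minimal sketch of the flattening idea (field names here are illustrative, not the exact ASM keys — see `scripts/flatten_asm.py` for the real implementation):

```python
import csv

# Illustrative ASM-like structure (keys simplified for the sketch)
asm = {
    "device system document": {"equipment serial number": "SN-001"},
    "measurement document": [
        {"sample identifier": "A1", "value": 1.2, "unit": "ng/uL"},
        {"sample identifier": "A2", "value": 0.9, "unit": "ng/uL"},
    ],
}

# Each measurement becomes one row, with shared metadata repeated
metadata = asm["device system document"]
rows = [{**metadata, **m} for m in asm["measurement document"]]

with open("flat.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```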

## Calculated Data Handling

**IMPORTANT:** Separate raw measurements from calculated/derived values.

- **Raw data** → `measurement-document` (direct instrument readings)
- **Calculated data** → `calculated-data-aggregate-document` (derived values)

Calculated values MUST include traceability via `data-source-aggregate-document`:

```json
"calculated-data-aggregate-document": {
  "calculated-data-document": [{
    "calculated-data-identifier": "SAMPLE_B1_DIN_001",
    "calculated-data-name": "DNA integrity number",
    "calculated-result": {"value": 9.5, "unit": "(unitless)"},
    "data-source-aggregate-document": {
      "data-source-document": [{
        "data-source-identifier": "SAMPLE_B1_MEASUREMENT",
        "data-source-feature": "electrophoresis trace"
      }]
    }
  }]
}
```

**Common calculated fields by instrument type:**
| Instrument | Calculated Fields |
|------------|-------------------|
| Cell counter | Viability %, cell density dilution-adjusted values |
| Spectrophotometer | Concentration (from absorbance), 260/280 ratio |
| Plate reader | Concentrations from standard curve, %CV |
| Electrophoresis | DIN/RIN, region concentrations, average sizes |
| qPCR | Relative quantities, fold change |

See `references/field_classification_guide.md` for detailed guidance on raw vs. calculated classification.
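
The traceability wrapper can be sketched as a small helper (a hypothetical convenience function, not part of the skill's scripts; the identifier scheme is illustrative):

```python
def calculated_entry(name, value, unit, source_id, feature):
    """Build a calculated-data-document entry with the required
    data-source traceability (identifier scheme is illustrative)."""
    return {
        "calculated-data-identifier": f"{source_id}_{name.upper().replace(' ', '_')}",
        "calculated-data-name": name,
        "calculated-result": {"value": value, "unit": unit},
        "data-source-aggregate-document": {
            "data-source-document": [
                {
                    "data-source-identifier": source_id,
                    "data-source-feature": feature,
                }
            ]
        },
    }

din = calculated_entry(
    "DNA integrity number", 9.5, "(unitless)",
    "SAMPLE_B1_MEASUREMENT", "electrophoresis trace",
)
```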

## Validation

Always validate ASM output before delivering to the user:

```bash
python scripts/validate_asm.py output.json
python scripts/validate_asm.py output.json --reference known_good.json  # Compare to reference
python scripts/validate_asm.py output.json --strict  # Treat warnings as errors
```

**Validation Rules:**
- Based on Allotrope ASM specification (December 2024)
- Last updated: 2026-01-07
- Source: https://gitlab.com/allotrope-public/asm

**Soft Validation Approach:**
Unknown techniques, units, or sample roles generate **warnings** (not errors) to allow for forward compatibility. If Allotrope adds new values after December 2024, the validator won't block them—it will flag them for manual verification. Use `--strict` mode to treat warnings as errors if you need stricter validation.

**What it checks:**
- Correct technique selection (e.g., multi-analyte profiling vs plate reader)
- Field naming conventions (space-separated, not hyphenated)
- Calculated data has traceability (`data-source-aggregate-document`)
- Unique identifiers exist for measurements and calculated values
- Required metadata present
- Valid units and sample roles (with soft validation for unknown values)

## Supported Instruments

See `references/supported_instruments.md` for complete list. Key instruments:

| Category | Instruments |
|----------|-------------|
| Cell Counting | Vi-CELL BLU, Vi-CELL XR, NucleoCounter |
| Spectrophotometry | NanoDrop One/Eight/8000, Lunatic |
| Plate Readers | SoftMax Pro, EnVision, Gen5, CLARIOstar |
| ELISA | SoftMax Pro, BMG MARS, MSD Workbench |
| qPCR | QuantStudio, Bio-Rad CFX |
| Chromatography | Empower, Chromeleon |

## Detection & Parsing Strategy

### Tier 1: Native allotropy parsing (PREFERRED)
**Always try allotropy first.** Check available vendors directly:

```python
from allotropy.parser_factory import Vendor

# List all supported vendors
for v in Vendor:
    print(f"{v.name}")

# Common vendors:
# AGILENT_TAPESTATION_ANALYSIS  (for TapeStation XML)
# BECKMAN_VI_CELL_BLU
# THERMO_FISHER_NANODROP_EIGHT
# MOLDEV_SOFTMAX_PRO
# APPBIO_QUANTSTUDIO
# ... many more
```

**When the user provides a file, check if allotropy supports it before falling back to manual parsing.** The `scripts/convert_to_asm.py` auto-detection only covers a subset of allotropy vendors.

### Tier 2: Flexible fallback parsing
**Only use if allotropy doesn't support the instrument.** This fallback:
- Does NOT generate `calculated-data-aggregate-document`
- Does NOT include full traceability
- Produces simplified ASM structure

Use flexible parser with:
- Column name fuzzy matching
- Unit extraction from headers
- Metadata extraction from file structure
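
The first two heuristics can be sketched with the standard library (function and field names here are mine, not the actual `scripts/convert_to_asm.py` API):

```python
import difflib
import re

# Illustrative canonical field list; the real parser maps to ASM fields
CANONICAL_FIELDS = ["sample identifier", "viability", "cell density", "concentration"]

def match_column(header):
    """Fuzzy-match a raw column header to a canonical field name."""
    name = re.sub(r"\(.*?\)", "", header).strip().lower()
    hits = difflib.get_close_matches(name, CANONICAL_FIELDS, n=1, cutoff=0.6)
    return hits[0] if hits else None

def extract_unit(header):
    """Pull a parenthesized unit out of a column header, if present."""
    m = re.search(r"\(([^)]+)\)", header)
    return m.group(1) if m else None
```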

### Tier 3: PDF extraction
For PDF-only files, extract tables using pdfplumber, then apply Tier 2 parsing.
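
A sketch of that pipeline, assuming pdfplumber's `extract_tables()` output format (a list of tables, each a list of rows with the header first); the helper names are illustrative:

```python
def tables_to_rows(tables):
    """Convert extracted tables (first row = header) into dict rows
    that the Tier 2 flexible parser can consume."""
    rows = []
    for table in tables:
        header, *body = table
        for line in body:
            rows.append(dict(zip(header, line)))
    return rows

def extract_pdf_tables(path):
    # Import deferred so tables_to_rows stays usable without pdfplumber
    import pdfplumber
    with pdfplumber.open(path) as pdf:
        tables = []
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables
```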

## Pre-Parsing Checklist

Before writing a custom parser, ALWAYS:

1. **Check if allotropy supports it** - Use native parser if available
2. **Find a reference ASM file** - Check `references/examples/` or ask user
3. **Review instrument-specific guide** - Check `references/instrument_guides/`
4. **Validate against reference** - Run `validate_asm.py --reference <file>`

## Common Mistakes to Avoid

| Mistake | Correct Approach |
|---------|------------------|
| Manifest as object | Use URL string |
| Lowercase detection types | Use "Absorbance" not "absorbance" |
| "emission wavelength setting" | Use "detector wavelength setting" for emission |
| All measurements in one document | Group by well/sample location |
| Missing procedure metadata | Extract ALL device settings per measurement |

## Code Export for Data Engineers

Generate standalone Python scripts that scientists can hand off:

```bash
# Export parser code
python scripts/export_parser.py --input "data.csv" --vendor "VI_CELL_BLU" --output "parser_script.py"
```

The exported script:
- Depends only on pandas and allotropy
- Includes inline documentation
- Can run in Jupyter notebooks
- Is production-ready for data pipelines

## File Structure

```
instrument-data-to-allotrope/
├── SKILL.md                          # This file
├── scripts/
│   ├── convert_to_asm.py            # Main conversion script
│   ├── flatten_asm.py               # ASM → 2D CSV conversion
│   ├── export_parser.py             # Generate standalone parser code
│   └── validate_asm.py              # Validate ASM output quality
└── references/
    ├── supported_instruments.md     # Full instrument list with Vendor enums
    ├── asm_schema_overview.md       # ASM structure reference
    ├── field_classification_guide.md # Where to put different field types
    └── flattening_guide.md          # How flattening works
```

## Usage Examples

### Example 1: Vi-CELL BLU file
```
User: "Convert this cell counting data to Allotrope format"
[uploads viCell_Results.xlsx]

Claude:
1. Detects Vi-CELL BLU (95% confidence)
2. Converts using allotropy native parser
3. Outputs:
   - viCell_Results_asm.json (full ASM)
   - viCell_Results_flat.csv (2D format)
   - viCell_parser.py (exportable code)
```

### Example 2: Request for code handoff
```
User: "I need to give our data engineer code to parse NanoDrop files"

Claude:
1. Generates self-contained Python script
2. Includes sample input/output
3. Documents all assumptions
4. Provides Jupyter notebook version
```

### Example 3: LIMS-ready flattened output
```
User: "Convert this ELISA data to a CSV I can upload to our LIMS"

Claude:
1. Parses plate reader data
2. Generates flattened CSV with columns:
   - sample_identifier, well_position, measurement_value, measurement_unit
   - instrument_serial_number, analysis_datetime, assay_type
3. Validates against common LIMS import requirements
```

## Implementation Notes

### Installing allotropy
```bash
pip install allotropy --break-system-packages
```

### Handling parse failures
If allotropy native parsing fails:
1. Log the error for debugging
2. Fall back to flexible parser
3. Report reduced metadata completeness to user
4. Suggest exporting different format from instrument
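
That chain can be sketched as follows (the parsers are passed in as callables to keep the sketch library-agnostic; names are illustrative):

```python
import logging

logger = logging.getLogger("asm-convert")

def convert_with_fallback(path, vendor, native_parse, flexible_parse):
    """Try the native allotropy parser first; on failure, log the error
    and fall back, reporting which tier produced the output."""
    try:
        return native_parse(path, vendor), "native"
    except Exception as exc:
        # Step 1: log the error for debugging
        logger.warning("Native parse failed for %s: %s", path, exc)
        # Step 2: fall back; the caller should then warn the user about
        # reduced metadata completeness (steps 3-4 above)
        return flexible_parse(path), "fallback"
```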

### ASM Schema Validation
Validate output against Allotrope schemas when available:
```python
import jsonschema  # pip install jsonschema

# Schema URLs are listed in references/asm_schema_overview.md
jsonschema.validate(instance=asm_dict, schema=schema_dict)
```


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/validate_asm.py

```python
#!/usr/bin/env python3
"""
ASM Output Validation Script

Validates ASM JSON output against common issues:
- Wrong technique selection
- Hyphenated field names (should be space-separated)
- Missing statistics documents
- Incorrect units
- Missing required fields
- Missing calculated data traceability
- Improperly flattened nested documents (sample document, device control, etc.)

Validation Rules:
    Based on: Allotrope ASM specification (December 2024)
    Last Updated: 2026-01-07
    Source: https://gitlab.com/allotrope-public/asm/-/tree/main/json-schemas/adm

Note: Unknown techniques/units generate WARNINGS (not errors) to allow for new
additions to the Allotrope specification. This prevents blocking valid data
when the Allotrope foundation adds new techniques or units.

Usage:
    python validate_asm.py output.json
    python validate_asm.py output.json --reference reference.json
    python validate_asm.py output.json --strict
"""

import json
import re
import sys
import argparse
from typing import Dict, List, Tuple, Any, Optional

# Validation metadata
ASM_SPEC_VERSION = "2024-12"
VALIDATION_RULES_DATE = "2026-01-07"
SCHEMA_SOURCE = "https://gitlab.com/allotrope-public/asm"


# All valid ASM techniques from https://gitlab.com/allotrope-public/asm/-/tree/main/json-schemas/adm
VALID_TECHNIQUES = [
    "absorbance",
    "automated-reactors",
    "balance",
    "bga",
    "binding-affinity",
    "bulk-density",
    "cell-counting",
    "cell-culture-analyzer",
    "chromatography",
    "code-reader",
    "conductance",
    "conductivity",
    "disintegration",
    "dsc",
    "dvs",
    "electronic-lab-notebook",
    "electronic-spectrometry",
    "electrophoresis",
    "flow-cytometry",
    "fluorescence",
    "foam-height",
    "foam-qualification",
    "fplc",
    "ftir",
    "gas-chromatography",
    "gc-ms",
    "gloss",
    "hot-tack",
    "impedance",
    "lc-ms",
    "light-obscuration",
    "liquid-chromatography",
    "liquid-handler",  # Added for liquid handler support
    "loss-on-drying",
    "luminescence",
    "mass-spectrometry",
    "metabolite-analyzer",
    "multi-analyte-profiling",
    "nephelometry",
    "nmr",
    "optical-imaging",
    "optical-microscopy",
    "osmolality",
    "oven-kf",
    "pcr",
    "ph",
    "plate-reader",
    "pressure-monitoring",
    "psd",
    "pumping",
    "raman",
    "rheometry",
    "sem",
    "solution-analyzer",
    "specific-rotation",
    "spectrophotometry",
    "stirring",
    "surface-area-analysis",
    "tablet-hardness",
    "temperature-monitoring",
    "tensile-test",
    "thermogravimetric-analysis",
    "titration",
    "ultraviolet-absorbance",
    "x-ray-powder-diffraction",
]

# Instrument keywords that indicate specific techniques
TECHNIQUE_INDICATORS = {
    "multi-analyte-profiling": [
        "bead",
        "luminex",
        "bio-plex",
        "bioplex",
        "multiplex",
        "plex",
        "msd",
        "region",
    ],
    "electrophoresis": [
        "tapestation",
        "bioanalyzer",
        "labchip",
        "fragment",
        "din",
        "rin",
        "gel",
        "capillary",
    ],
    "spectrophotometry": ["nanodrop", "lunatic", "a260", "a280", "wavelength"],
    "cell-counting": [
        "viability",
        "viable cell",
        "cell count",
        "vi-cell",
        "vicell",
        "nucleocounter",
        "cell density",
    ],
    "pcr": [
        "ct",
        "quantstudio",
        "cfx",
        "amplification",
        "melt curve",
        "qpcr",
        "cycle threshold",
    ],
    "plate-reader": [
        "microplate",
        "96-well",
        "384-well",
        "plate reader",
        "envision",
        "spectramax",
    ],
    "liquid-chromatography": [
        "hplc",
        "uplc",
        "retention time",
        "chromatogram",
        "empower",
        "chromeleon",
    ],
    "flow-cytometry": ["facs", "flow cytometry", "scatter", "gating", "cytometer"],
    "mass-spectrometry": ["m/z", "mass spec", "ms/ms", "lcms", "maldi"],
    "fluorescence": ["fluorescence", "excitation", "emission", "fluorimeter"],
    "luminescence": ["luminescence", "bioluminescence", "chemiluminescence"],
    "absorbance": ["absorbance", "optical density", "od600"],
    "ph": ["ph meter", "ph measurement"],
    "osmolality": ["osmolality", "osmometer"],
    "conductivity": ["conductivity", "conductance"],
    "balance": ["balance", "weight", "mass measurement"],
    "nmr": ["nmr", "nuclear magnetic resonance"],
    "ftir": ["ftir", "infrared", "ir spectrum"],
    "raman": ["raman", "raman spectroscopy"],
    "liquid-handler": [
        "biomek",
        "liquid handler",
        "aspirate",
        "dispense",
        "transfer volume",
        "liquid handling",
    ],
}

# Fields that should typically be in calculated-data-document, not measurement-document
SHOULD_BE_CALCULATED = [
    "dna integrity number",
    "rna integrity number",
    "din",
    "rin",
    "viability",
    "260/280",
    "a260/a280",
    "concentration",  # When derived from standard curve
    "percent of total",
    "average size",
    "molarity",  # When calculated from concentration
    "relative quantity",
    "fold change",
    "coefficient of variation",
]

# =============================================================================
# NESTED DOCUMENT STRUCTURE DEFINITIONS
# =============================================================================

# Fields that MUST be inside 'sample document' (space or hyphen separated)
SAMPLE_DOCUMENT_FIELDS = {
    # Core sample identification
    "sample identifier",
    "sample-identifier",
    "written name",
    "written-name",
    "batch identifier",
    "batch-identifier",
    "sample role type",
    "sample-role-type",
    "description",
    # Location fields (should be in sample document for most techniques)
    "location identifier",
    "location-identifier",
    "well location identifier",
    "well-location-identifier",
    "well plate identifier",
    "well-plate-identifier",
    # Liquid handler specific - source/destination pairs
    "source location identifier",
    "source-location-identifier",
    "destination location identifier",
    "destination-location-identifier",
    "source well plate identifier",
    "source-well-plate-identifier",
    "destination well plate identifier",
    "destination-well-plate-identifier",
    "source well location identifier",
    "source-well-location-identifier",
    "destination well location identifier",
    "destination-well-location-identifier",
}

# Fields that MUST be inside 'device control aggregate document' -> 'device control document'
DEVICE_CONTROL_FIELDS = {
    # General device control
    "device type",
    "device-type",
    "detector wavelength setting",
    "detector-wavelength-setting",
    "compartment temperature",
    "compartment-temperature",
    "sample volume setting",
    "sample-volume-setting",
    "flow rate",
    "flow-rate",
    "exposure duration setting",
    "exposure-duration-setting",
    "detector gain setting",
    "detector-gain-setting",
    "illumination setting",
    "illumination-setting",
    # Liquid handler specific
    "liquid handling technique",
    "liquid-handling-technique",
    "source liquid handling technique",
    "source-liquid-handling-technique",
    "destination liquid handling technique",
    "destination-liquid-handling-technique",
}

# Fields that should be in 'custom information document' (vendor-specific)
CUSTOM_INFO_FIELDS = {
    # Liquid handler specific
    "probe",
    "pod",
    "source labware name",
    "source-labware-name",
    "destination labware name",
    "destination-labware-name",
    "deck position",
    "deck-position",
}

# Fields that commonly get incorrectly flattened (superset for general checking)
COMMONLY_FLATTENED_FIELDS = {
    # Sample-related (often incorrectly put directly on measurement)
    "sample identifier",
    "sample-identifier",
    "sample barcode",
    "sample-barcode",
    "well index",
    "well-index",
    "location identifier",
    "location-identifier",
    # Device control related (often incorrectly put directly on measurement)
    "probe identifier",
    "probe-identifier",
    "device identifier",  # When it should be in device control doc, not measurement
    "device-identifier",
    "technique",  # Should be "liquid handling technique" in device control
    "transfer type",  # Should be structured differently
    "transfer-type",
}

# Standard ASM units
VALID_UNITS = {
    "fluorescence": ["RFU", "MFI", "(unitless)"],
    "counts": ["#"],
    "volume": ["μL", "mL", "L", "µL"],
    "concentration": [
        "ng/μL",
        "ng/mL",
        "pg/mL",
        "mg/mL",
        "μg/mL",
        "M",
        "mM",
        "μM",
        "nM",
    ],
    "temperature": ["degC"],
    "unitless": ["(unitless)", "%"],
    "molecular_weight": ["bp", "Da", "kDa"],
    "time": ["s", "min", "h"],
}

# Standard sample role types
VALID_SAMPLE_ROLES = [
    "standard sample role",
    "blank role",
    "control sample role",
    "unknown sample role",
    "reference sample role",
    "calibration sample role",
]

# Standard statistic datum roles
VALID_STATISTIC_ROLES = [
    "median role",
    "arithmetic mean role",
    "coefficient of variation role",
    "standard deviation role",
    "standard error role",
    "trimmed arithmetic mean role",
    "trimmed standard deviation role",
    "minimum value role",
    "maximum value role",
]


class ValidationResult:
    """Container for validation results."""

    def __init__(self):
        self.errors: List[str] = []
        self.warnings: List[str] = []
        self.info: List[str] = []
        self.metrics: Dict[str, Any] = {}

    def add_error(self, msg: str):
        self.errors.append(f"ERROR: {msg}")

    def add_warning(self, msg: str):
        self.warnings.append(f"WARNING: {msg}")

    def add_info(self, msg: str):
        self.info.append(f"INFO: {msg}")

    def is_valid(self) -> bool:
        return len(self.errors) == 0

    def print_report(self):
        print("\n" + "=" * 60)
        print("ASM VALIDATION REPORT")
        print("=" * 60)

        # Print metrics
        if self.metrics:
            print("\nMetrics:")
            for key, value in self.metrics.items():
                print(f"   {key}: {value}")

        # Print info
        if self.info:
            print("\n" + "\n".join(self.info))

        # Print warnings
        if self.warnings:
            print("\n" + "\n".join(self.warnings))

        # Print errors
        if self.errors:
            print("\n" + "\n".join(self.errors))

        # Summary
        print("\n" + "-" * 60)
        if self.is_valid():
            if self.warnings:
                print(f"PASSED with {len(self.warnings)} warning(s)")
            else:
                print("PASSED - No issues found")
        else:
            print(
                f"FAILED - {len(self.errors)} error(s), {len(self.warnings)} warning(s)"
            )
        print("=" * 60 + "\n")


def validate_manifest(asm: Dict, result: ValidationResult):
    """Check for valid manifest."""
    if "$asm.manifest" not in asm:
        result.add_error("Missing $asm.manifest")
        return

    manifest = asm["$asm.manifest"]
    if isinstance(manifest, str):
        if "allotrope.org" in manifest:
            result.add_info(f"Manifest: {manifest}")
        else:
            result.add_warning(f"Non-standard manifest URL: {manifest}")
    elif isinstance(manifest, dict):
        if "vocabulary" in manifest or "contexts" in manifest:
            result.add_info("Manifest: Object format with vocabulary/contexts")
        else:
            result.add_warning("Manifest object missing vocabulary or contexts")


def detect_technique(asm: Dict) -> Tuple[str, float]:
    """Detect technique from ASM structure."""
    # Check for technique in top-level keys
    for key in asm.keys():
        if key == "$asm.manifest":
            continue
        # Extract technique name from aggregate document key
        # Handle both "liquid handler aggregate document" and "liquid-handler-aggregate-document"
        key_normalized = key.lower().replace("-", " ")
        if "aggregate document" in key_normalized:
            technique = key_normalized.replace(" aggregate document", "").strip()
            return technique, 100.0

    return "unknown", 0.0


def validate_technique(asm: Dict, result: ValidationResult, content_str: str):
    """Validate technique selection."""
    technique, confidence = detect_technique(asm)
    result.metrics["technique"] = technique
    result.metrics["technique_confidence"] = confidence

    if technique == "unknown":
        result.add_warning("Could not detect technique from ASM structure")
        return

    result.add_info(f"Detected technique: {technique}")

    # Check if technique is in known list (soft validation)
    technique_normalized = technique.replace(" ", "-")
    if technique_normalized not in VALID_TECHNIQUES:
        result.add_warning(
            f"Unknown technique '{technique}' not in known list (as of {VALIDATION_RULES_DATE}). "
            f"This may be a new Allotrope addition. Verify at: {SCHEMA_SOURCE}"
        )

    # Check if technique seems appropriate for content
    content_lower = content_str.lower()
    suggested_technique = None

    for tech, keywords in TECHNIQUE_INDICATORS.items():
        matches = sum(1 for kw in keywords if kw in content_lower)
        if matches >= 2:  # Multiple keyword matches
            if tech != technique.replace(" ", "-"):
                suggested_technique = tech
                break

    if suggested_technique:
        result.add_warning(
            f"Content suggests '{suggested_technique}' but ASM uses '{technique}' - "
            "verify correct technique selection"
        )


def validate_naming_conventions(content_str: str, result: ValidationResult):
    """Check for proper space-separated naming (not hyphens)."""
    # Find all keys that look like ASM field names
    # Hyphenated keys in ASM are typically wrong (should be space-separated)
    hyphenated_keys = re.findall(r'"([a-z]+-[a-z]+-?[a-z]*-?[a-z]*)":', content_str)

    # Filter to likely ASM fields (not URLs, not manifest)
    asm_hyphenated = []
    for key in hyphenated_keys:
        if "http" in key or "manifest" in key:
            continue
        # Known hyphenated keys that are OK
        if key in ["data-source-identifier", "data-source-feature"]:
            continue
        asm_hyphenated.append(key)

    if asm_hyphenated:
        unique = list(set(asm_hyphenated))[:10]
        result.add_warning(
            f"Found hyphenated field names (ASM uses spaces): {unique}"
            + (" ... and more" if len(set(asm_hyphenated)) > 10 else "")
        )
        result.add_info("Tip: Use 'sample identifier' not 'sample-identifier'")


def count_measurements(content_str: str) -> int:
    """Count measurement documents in ASM."""
    # Count occurrences of measurement document patterns
    count = len(re.findall(r'"measurement identifier":', content_str))
    if count == 0:
        count = len(re.findall(r'"measurement-identifier":', content_str))
    return count


def validate_measurements(content_str: str, result: ValidationResult):
    """Validate measurement documents."""
    count = count_measurements(content_str)
    result.metrics["measurement_count"] = count

    if count == 0:
        result.add_warning("No measurement documents found")
    else:
        result.add_info(f"Measurement count: {count}")


def validate_sample_roles(content_str: str, result: ValidationResult):
    """Check for valid sample roles."""
    roles = re.findall(r'"sample.role.type":\s*"([^"]+)"', content_str)
    if not roles:
        roles = re.findall(r'"sample role type":\s*"([^"]+)"', content_str)

    if roles:
        unknown_roles = [r for r in set(roles) if r not in VALID_SAMPLE_ROLES]
        if unknown_roles:
            result.add_warning(
                f"Unknown sample roles not in known list (as of {VALIDATION_RULES_DATE}): {unknown_roles}. "
                f"These may be valid Allotrope roles added after spec version {ASM_SPEC_VERSION}. "
                f"Verify at: {SCHEMA_SOURCE}"
            )


def validate_statistics(asm: Dict, content_str: str, result: ValidationResult):
    """Check for statistics documents where expected."""
    technique, _ = detect_technique(asm)

    has_stats = (
        "statistics aggregate document" in content_str.lower()
        or "statistics-aggregate-document" in content_str
    )

    result.metrics["has_statistics"] = has_stats

    # Statistics are required for multi-analyte profiling
    if "multi analyte" in technique or "multiplex" in content_str.lower():
        if not has_stats:
            result.add_warning(
                "No statistics aggregate document found - bead-based assays should include "
                "median, mean, CV, std dev per analyte"
            )
        else:
            result.add_info("Statistics document: Present")


def validate_units(content_str: str, result: ValidationResult):
    """Check for valid units."""
    # Find all unit values
    units = re.findall(r'"unit":\s*"([^"]+)"', content_str)

    # Check for common case-sensitivity issues
    case_issues = []
    case_issue_units = set()
    for unit in set(units):
        if unit.lower() in ["rfu", "mfi"] and unit not in ["RFU", "MFI"]:
            case_issues.append(f"{unit} (should be uppercase)")
            case_issue_units.add(unit)
        elif unit in ["ul", "uL", "µl"] and unit != "μL":
            case_issues.append(f"{unit} (should be μL)")
            case_issue_units.add(unit)

    if case_issues:
        result.add_warning(f"Non-standard unit capitalization: {case_issues}")

    # Soft validation: check against known units list
    all_known_units = set()
    for unit_list in VALID_UNITS.values():
        all_known_units.update(unit_list)

    unknown_units = []
    for unit in set(units):
        # Skip units already reported as case issues above
        if unit not in all_known_units and unit not in case_issue_units:
            unknown_units.append(unit)

    if unknown_units:
        result.add_warning(
            f"Unknown units not in known list (as of {VALIDATION_RULES_DATE}): {unknown_units}. "
            f"These may be valid Allotrope units added after spec version {ASM_SPEC_VERSION}. "
            f"Verify at: {SCHEMA_SOURCE}"
        )


def validate_metadata(content_str: str, result: ValidationResult):
    """Check for required metadata fields."""
    required_fields = [
        ("device system document", "equipment serial number"),
        ("data system document", "software name"),
        ("data system document", "software version"),
    ]

    missing = []
    for _, field in required_fields:
        if (
            field not in content_str.lower()
            and field.replace(" ", "-") not in content_str
        ):
            missing.append(field)

    if missing:
        result.add_warning(f"Missing recommended metadata: {missing}")


def validate_calculated_data(content_str: str, result: ValidationResult):
    """Check calculated data has proper traceability."""
    content_lower = content_str.lower()

    has_calculated = (
        "calculated data document" in content_lower
        or "calculated-data-document" in content_str
    )
    has_data_source = (
        "data source aggregate document" in content_lower
        or "data-source-aggregate-document" in content_str
    )

    result.metrics["has_calculated_data"] = has_calculated
    result.metrics["has_data_source_traceability"] = has_data_source

    if has_calculated:
        result.add_info("Calculated data document: Present")
        if not has_data_source:
            result.add_error(
                "Calculated data found without data-source-aggregate-document - "
                "traceability is required for audit/regulatory compliance"
            )
        else:
            result.add_info("Data source traceability: Present")

    # Check for calculated fields that might be incorrectly placed in measurement-document
    misplaced = []
    for field in SHOULD_BE_CALCULATED:
        # Check if field appears in measurement document context but not in calculated data
        field_pattern = field.replace("/", ".")
        if field_pattern in content_lower:
            # If we have the field but no calculated-data-document, it's misplaced
            if not has_calculated:
                misplaced.append(field)

    if misplaced:
        result.add_warning(
            f"Fields that should likely be in calculated-data-document: {misplaced[:5]}"
            + (f" ... and {len(misplaced)-5} more" if len(misplaced) > 5 else "")
        )


def validate_unique_identifiers(content_str: str, result: ValidationResult):
    """Validate that entities have unique identifiers for traceability."""
    # Count identifier occurrences; keys may be space- or kebab-cased
    def count_ids(label: str) -> int:
        count = len(re.findall(rf'"{label}":\s*"[^"]+"', content_str))
        if count == 0:
            kebab = label.replace(" ", "-")
            count = len(re.findall(rf'"{kebab}":\s*"[^"]+"', content_str))
        return count

    measurement_ids = count_ids("measurement identifier")
    calculated_ids = count_ids("calculated data identifier")
    data_source_ids = count_ids("data source identifier")

    result.metrics["measurement_identifiers"] = measurement_ids
    result.metrics["calculated_data_identifiers"] = calculated_ids
    result.metrics["data_source_identifiers"] = data_source_ids

    if measurement_ids == 0:
        result.add_warning(
            "No measurement identifiers found - required for traceability"
        )

    # If we have calculated data but no data source identifiers, that's a problem
    if calculated_ids > 0 and data_source_ids == 0:
        result.add_error(
            f"Found {calculated_ids} calculated data entries but no data source identifiers - "
            "each calculated value should reference its source"
        )


# =============================================================================
# NEW: NESTED DOCUMENT STRUCTURE VALIDATION
# =============================================================================


def validate_nested_document_structure(
    asm: Dict, content_str: str, result: ValidationResult
):
    """
    Validate that fields are properly nested in their correct documents.

    This checks for common mistakes like:
    - Sample fields flattened directly onto measurement instead of in 'sample document'
    - Device control fields flattened instead of in 'device control aggregate document'
    - Custom/vendor fields not wrapped in 'custom information document'
    """
    content_lower = content_str.lower()

    # Check if proper nested documents exist
    has_sample_doc = (
        '"sample document"' in content_lower or '"sample-document"' in content_str
    )
    has_device_control_doc = (
        '"device control aggregate document"' in content_lower
        or '"device-control-aggregate-document"' in content_str
    )
    has_custom_info_doc = (
        '"custom information document"' in content_lower
        or '"custom-information-document"' in content_str
    )

    result.metrics["has_sample_document"] = has_sample_doc
    result.metrics["has_device_control_document"] = has_device_control_doc
    result.metrics["has_custom_information_document"] = has_custom_info_doc

    # Parse ASM to check field locations
    def find_flattened_fields_in_measurements(obj, path=""):
        """Recursively find fields that appear directly on measurement documents."""
        issues = {"sample": [], "device_control": [], "custom": []}

        if isinstance(obj, dict):
            # Check if we're inside a measurement document
            in_measurement = (
                "measurement document" in path.lower() or "measurement-document" in path
            )
            in_sample_doc = (
                "sample document" in path.lower() or "sample-document" in path
            )
            in_device_control = (
                "device control" in path.lower() or "device-control" in path
            )
            in_custom_info = (
                "custom information" in path.lower() or "custom-information" in path
            )

            for key, value in obj.items():
                key_normalized = key.lower().replace("-", " ")
                new_path = f"{path}.{key}"

                # Check if this key should be nested but isn't
                if in_measurement and not in_sample_doc:
                    if key_normalized in [
                        f.lower().replace("-", " ") for f in SAMPLE_DOCUMENT_FIELDS
                    ]:
                        issues["sample"].append(key)

                if in_measurement and not in_device_control:
                    if key_normalized in [
                        f.lower().replace("-", " ") for f in DEVICE_CONTROL_FIELDS
                    ]:
                        issues["device_control"].append(key)

                if in_measurement and not in_custom_info:
                    if key_normalized in [
                        f.lower().replace("-", " ") for f in CUSTOM_INFO_FIELDS
                    ]:
                        issues["custom"].append(key)

                # Recurse
                child_issues = find_flattened_fields_in_measurements(value, new_path)
                for k in issues:
                    issues[k].extend(child_issues[k])

        elif isinstance(obj, list):
            for i, item in enumerate(obj):
                child_issues = find_flattened_fields_in_measurements(
                    item, f"{path}[{i}]"
                )
                for k in issues:
                    issues[k].extend(child_issues[k])

        return issues

    issues = find_flattened_fields_in_measurements(asm)
    flattened_sample_fields = list(set(issues["sample"]))
    flattened_device_control_fields = list(set(issues["device_control"]))
    flattened_custom_fields = list(set(issues["custom"]))

    # Report issues
    if flattened_sample_fields:
        result.add_error(
            f"Fields that should be nested in 'sample document' are flattened on measurement: "
            f"{flattened_sample_fields[:5]}"
            + (
                f" ... and {len(flattened_sample_fields)-5} more"
                if len(flattened_sample_fields) > 5
                else ""
            )
        )
        result.add_info(
            "Tip: Wrap sample fields in a 'sample document' object inside each measurement"
        )

    if flattened_device_control_fields:
        result.add_error(
            f"Fields that should be nested in 'device control aggregate document' are flattened: "
            f"{flattened_device_control_fields[:5]}"
            + (
                f" ... and {len(flattened_device_control_fields)-5} more"
                if len(flattened_device_control_fields) > 5
                else ""
            )
        )
        result.add_info(
            "Tip: Wrap device control fields in 'device control aggregate document' → 'device control document'"
        )

    if flattened_custom_fields:
        result.add_warning(
            f"Vendor-specific fields that should be in 'custom information document': "
            f"{flattened_custom_fields[:5]}"
            + (
                f" ... and {len(flattened_custom_fields)-5} more"
                if len(flattened_custom_fields) > 5
                else ""
            )
        )


def validate_liquid_handler_structure(
    asm: Dict, content_str: str, result: ValidationResult
):
    """
    Specific validation for liquid handler ASM documents.

    Checks for:
    - Proper transfer pairing (aspirate + dispense = 1 measurement)
    - Source/destination field pairs
    - Aspiration volume + transfer volume instead of single volume
    """
    technique, _ = detect_technique(asm)

    # Only run for liquid handler techniques
    if "liquid" not in technique.lower() and "handler" not in technique.lower():
        # Also check content for liquid handler indicators
        content_lower = content_str.lower()
        if not any(
            kw in content_lower
            for kw in ["aspirate", "dispense", "liquid handler", "biomek"]
        ):
            return

    result.add_info("Liquid handler specific validation...")

    content_lower = content_str.lower()

    # Check for proper volume field structure
    has_aspiration_volume = (
        "aspiration volume" in content_lower or "aspiration-volume" in content_str
    )
    has_transfer_volume = (
        "transfer volume" in content_lower or "transfer-volume" in content_str
    )
    has_single_volume = (
        '"volume"' in content_str
        and not has_aspiration_volume
        and not has_transfer_volume
    )

    if has_single_volume:
        result.add_warning(
            "Liquid handler ASM uses single 'volume' field - "
            "consider using 'aspiration volume' and 'transfer volume' for full transfer semantics"
        )

    if has_aspiration_volume and has_transfer_volume:
        result.add_info("Volume fields: Proper aspiration/transfer volume structure")

    # Check for source/destination pairing
    has_source_dest = (
        "source location" in content_lower or "source-location" in content_str
    ) and (
        "destination location" in content_lower or "destination-location" in content_str
    )

    has_separate_transfer_type = (
        "transfer type" in content_lower or "transfer-type" in content_str
    )

    if has_separate_transfer_type and not has_source_dest:
        result.add_warning(
            "Found 'transfer type' field (Aspirate/Dispense as separate records) - "
            "proper ASM pairs source→destination in single measurement with 'source location identifier' "
            "and 'destination location identifier'"
        )
        result.add_info(
            "Tip: Pair aspirate+dispense operations by probe number into single transfer measurements"
        )

    if has_source_dest:
        result.add_info("Source/destination: Proper paired transfer structure")

    # Check for labware name fields in custom information document
    has_labware_names = (
        "source labware name" in content_lower
        or "destination labware name" in content_lower
    )

    if has_labware_names:
        result.add_info(
            "Labware names: Present (should be in custom information document)"
        )


def compare_to_reference(
    asm: Dict,
    reference: Dict,
    content_str: str,
    ref_content: str,
    result: ValidationResult,
):
    """Compare generated ASM to reference ASM."""
    result.add_info("Comparing to reference ASM...")

    # Compare techniques
    gen_tech, _ = detect_technique(asm)
    ref_tech, _ = detect_technique(reference)

    if gen_tech.replace("-", " ") != ref_tech.replace("-", " "):
        result.add_error(
            f"Technique mismatch: generated '{gen_tech}' vs reference '{ref_tech}'"
        )

    # Compare measurement counts
    gen_count = count_measurements(content_str)
    ref_count = count_measurements(ref_content)

    result.metrics["reference_measurement_count"] = ref_count

    if gen_count != ref_count:
        diff = ref_count - gen_count
        if diff > 0:
            result.add_error(
                f"Missing {diff} measurements: generated {gen_count} vs reference {ref_count}"
            )
        else:
            result.add_warning(
                f"Extra {-diff} measurements: generated {gen_count} vs reference {ref_count}"
            )

    # Compare sample roles (the '.' wildcard matches both space- and kebab-cased keys)
    gen_roles = set(re.findall(r'"sample.role.type":\s*"([^"]+)"', content_str))
    ref_roles = set(re.findall(r'"sample.role.type":\s*"([^"]+)"', ref_content))

    missing_roles = ref_roles - gen_roles
    if missing_roles:
        result.add_warning(f"Missing sample roles from reference: {missing_roles}")

    # Compare nested document presence
    ref_has_sample_doc = '"sample document"' in ref_content.lower()
    gen_has_sample_doc = (
        '"sample document"' in content_str.lower() or '"sample-document"' in content_str
    )

    if ref_has_sample_doc and not gen_has_sample_doc:
        result.add_error(
            "Reference has 'sample document' but generated ASM does not - fields may be incorrectly flattened"
        )

    ref_has_device_control = (
        '"device control aggregate document"' in ref_content.lower()
    )
    gen_has_device_control = (
        '"device control aggregate document"' in content_str.lower()
        or '"device-control-aggregate-document"' in content_str
    )

    if ref_has_device_control and not gen_has_device_control:
        result.add_error(
            "Reference has 'device control aggregate document' but generated ASM does not"
        )

    ref_has_custom_info = '"custom information document"' in ref_content.lower()
    gen_has_custom_info = (
        '"custom information document"' in content_str.lower()
        or '"custom-information-document"' in content_str
    )

    if ref_has_custom_info and not gen_has_custom_info:
        result.add_warning(
            "Reference has 'custom information document' for vendor fields but generated ASM does not"
        )


def validate_asm(
    filepath: str, reference_path: Optional[str] = None, strict: bool = False
) -> ValidationResult:
    """
    Validate ASM JSON file.

    Args:
        filepath: Path to ASM JSON file
        reference_path: Optional path to reference ASM for comparison
        strict: If True, treat warnings as errors

    Returns:
        ValidationResult with errors, warnings, and metrics
    """
    result = ValidationResult()

    # Load ASM file
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            content_str = f.read()
            asm = json.loads(content_str)
    except json.JSONDecodeError as e:
        result.add_error(f"Invalid JSON: {e}")
        return result
    except FileNotFoundError:
        result.add_error(f"File not found: {filepath}")
        return result

    result.add_info(f"Validating: {filepath}")

    # Run validations
    validate_manifest(asm, result)
    validate_technique(asm, result, content_str)
    validate_naming_conventions(content_str, result)
    validate_measurements(content_str, result)
    validate_sample_roles(content_str, result)
    validate_statistics(asm, content_str, result)
    validate_units(content_str, result)
    validate_metadata(content_str, result)
    validate_calculated_data(content_str, result)
    validate_unique_identifiers(content_str, result)

    # NEW: Nested document structure validation
    validate_nested_document_structure(asm, content_str, result)
    validate_liquid_handler_structure(asm, content_str, result)

    # Compare to reference if provided
    if reference_path:
        try:
            with open(reference_path, "r", encoding="utf-8") as f:
                ref_content = f.read()
                reference = json.loads(ref_content)
            compare_to_reference(asm, reference, content_str, ref_content, result)
        except Exception as e:
            result.add_warning(f"Could not load reference file: {e}")

    # In strict mode, convert warnings to errors
    if strict:
        result.errors.extend([w.replace("WARNING", "ERROR") for w in result.warnings])
        result.warnings = []

    return result


def main():
    parser = argparse.ArgumentParser(description="Validate ASM JSON output")
    parser.add_argument("input", help="ASM JSON file to validate")
    parser.add_argument("--reference", "-r", help="Reference ASM file for comparison")
    parser.add_argument(
        "--strict", "-s", action="store_true", help="Treat warnings as errors"
    )
    parser.add_argument("--quiet", "-q", action="store_true", help="Only show errors")

    args = parser.parse_args()

    result = validate_asm(args.input, args.reference, args.strict)

    if args.quiet:
        if result.errors:
            for error in result.errors:
                print(error)
            sys.exit(1)
        sys.exit(0)

    result.print_report()
    sys.exit(0 if result.is_valid() else 1)


if __name__ == "__main__":
    main()

```
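The validator above never parses key paths; it probes the raw JSON text with regexes, trying the space-cased Allotrope key first and falling back to the kebab-cased variant. A minimal illustration of that dual-pattern probe, on a hypothetical ASM fragment (not a real instrument export):

```python
import re

# Toy ASM fragment using kebab-case keys (illustrative data only).
content_str = '''
{
  "measurement-document": [
    {"measurement-identifier": "m-001"},
    {"measurement-identifier": "m-002"}
  ]
}
'''

# Try the space-cased key first, then fall back to kebab-case,
# mirroring the counting pattern in validate_unique_identifiers.
ids = re.findall(r'"measurement identifier":\s*"[^"]+"', content_str)
if not ids:
    ids = re.findall(r'"measurement-identifier":\s*"[^"]+"', content_str)

print(f"Found {len(ids)} measurement identifiers")
```

The same technique drives most checks in the script: it is fast and schema-agnostic, at the cost of occasionally matching keys outside their intended document context.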

### scripts/export_parser.py

```python
#!/usr/bin/env python3
"""
Export Parser Code

Generates standalone Python scripts that can be handed off to data engineers
or run in Jupyter notebooks. The exported code is self-contained and
production-ready.

Usage:
    python export_parser.py --vendor BECKMAN_VI_CELL_BLU --output vicell_parser.py
    python export_parser.py --vendor THERMO_FISHER_NANODROP_EIGHT --format notebook --output nanodrop_parser.ipynb
"""

import sys
from pathlib import Path
from datetime import datetime
from typing import Optional


# Template for standalone Python script
SCRIPT_TEMPLATE = '''#!/usr/bin/env python3
"""
{instrument_name} to Allotrope Simple Model (ASM) Parser

Auto-generated by Claude instrument-data-to-allotrope skill
Generated: {timestamp}
Vendor: {vendor}

This script converts {instrument_name} output files to Allotrope Simple Model (ASM)
JSON format for LIMS import, data lakes, or downstream analysis.

Requirements:
    pip install allotropy pandas openpyxl

Usage:
    python {script_name} input_file.csv --output output_asm.json
    python {script_name} input_file.csv --flatten  # Also generate CSV

Input file format:
    {file_format_description}
"""

import json
import sys
import argparse
from pathlib import Path
from typing import Dict, Any, Optional

try:
    from allotropy.parser_factory import Vendor
    from allotropy.to_allotrope import allotrope_from_file
    ALLOTROPY_AVAILABLE = True
except ImportError:
    ALLOTROPY_AVAILABLE = False
    print("Warning: allotropy not installed. Install with: pip install allotropy")

try:
    import pandas as pd
    PANDAS_AVAILABLE = True
except ImportError:
    PANDAS_AVAILABLE = False


def convert_to_asm(filepath: str) -> Optional[Dict[str, Any]]:
    """
    Convert {instrument_name} file to ASM format.
    
    Args:
        filepath: Path to input file
        
    Returns:
        ASM dictionary or None if conversion fails
    """
    if not ALLOTROPY_AVAILABLE:
        raise ImportError("allotropy library required. Install with: pip install allotropy")
    
    try:
        asm = allotrope_from_file(filepath, Vendor.{vendor})
        return asm
    except Exception as e:
        print(f"Conversion error: {{e}}")
        return None


def flatten_asm(asm: Dict[str, Any]) -> list:
    """
    Flatten ASM to list of row dictionaries for CSV export.
    
    Args:
        asm: ASM dictionary
        
    Returns:
        List of flattened row dictionaries
    """
    technique = "{technique}"
    rows = []
    
    agg_key = f"{{technique}}-aggregate-document"
    agg_doc = asm.get(agg_key, {{}})
    
    # Extract device info
    device = agg_doc.get("device-system-document", {{}})
    device_info = {{
        "instrument_serial_number": device.get("device-identifier"),
        "instrument_model": device.get("model-number"),
    }}
    
    doc_key = f"{{technique}}-document"
    for doc in agg_doc.get(doc_key, []):
        meas_agg = doc.get("measurement-aggregate-document", {{}})
        
        common = {{
            "analyst": meas_agg.get("analyst"),
            "measurement_time": meas_agg.get("measurement-time"),
            **device_info
        }}
        
        for meas in meas_agg.get("measurement-document", []):
            row = {{**common}}
            for key, value in meas.items():
                clean_key = key.replace("-", "_")
                if isinstance(value, dict) and "value" in value:
                    row[clean_key] = value["value"]
                    if "unit" in value:
                        row[f"{{clean_key}}_unit"] = value["unit"]
                else:
                    row[clean_key] = value
            rows.append(row)
    
    return rows


def main():
    parser = argparse.ArgumentParser(description="Convert {instrument_name} to ASM")
    parser.add_argument("input", help="Input file path")
    parser.add_argument("--output", "-o", help="Output JSON path")
    parser.add_argument("--flatten", action="store_true", help="Also generate CSV")
    
    args = parser.parse_args()
    
    input_path = Path(args.input)
    if not input_path.exists():
        print(f"Error: File not found: {{args.input}}")
        return 1
    
    # Convert to ASM
    print(f"Converting {{args.input}}...")
    asm = convert_to_asm(str(input_path))
    
    if asm is None:
        print("Conversion failed")
        return 1
    
    # Write ASM JSON
    output_path = args.output or str(input_path.with_suffix('.asm.json'))
    with open(output_path, 'w') as f:
        json.dump(asm, f, indent=2, default=str)
    print(f"ASM written to: {{output_path}}")
    
    # Optionally flatten
    if args.flatten and PANDAS_AVAILABLE:
        rows = flatten_asm(asm)
        df = pd.DataFrame(rows)
        flat_path = str(input_path.with_suffix('.flat.csv'))
        df.to_csv(flat_path, index=False)
        print(f"CSV written to: {{flat_path}}")
    
    return 0


if __name__ == "__main__":
    sys.exit(main())
'''


# Template for Jupyter notebook
NOTEBOOK_TEMPLATE = """{{
 "cells": [
  {{
   "cell_type": "markdown",
   "metadata": {{}},
   "source": [
    "# {instrument_name} to Allotrope Simple Model (ASM) Parser\\n",
    "\\n",
    "Auto-generated by Claude instrument-data-to-allotrope skill\\n",
    "Generated: {timestamp}\\n",
    "Vendor: {vendor}\\n",
    "\\n",
    "This notebook converts {instrument_name} output files to Allotrope Simple Model (ASM) JSON format."
   ]
  }},
  {{
   "cell_type": "code",
   "execution_count": null,
   "metadata": {{}},
   "source": [
    "# Install requirements (uncomment if needed)\\n",
    "# !pip install allotropy pandas openpyxl"
   ]
  }},
  {{
   "cell_type": "code",
   "execution_count": null,
   "metadata": {{}},
   "source": [
    "import json\\n",
    "from pathlib import Path\\n",
    "import pandas as pd\\n",
    "\\n",
    "from allotropy.parser_factory import Vendor\\n",
    "from allotropy.to_allotrope import allotrope_from_file"
   ]
  }},
  {{
   "cell_type": "markdown",
   "metadata": {{}},
   "source": [
    "## Configuration\\n",
    "\\n",
    "Set your input file path here:"
   ]
  }},
  {{
   "cell_type": "code",
   "execution_count": null,
   "metadata": {{}},
   "source": [
    "# Configure input/output paths\\n",
    "INPUT_FILE = \\"your_data_file.csv\\"  # <-- Change this\\n",
    "OUTPUT_ASM = \\"output_asm.json\\"\\n",
    "OUTPUT_CSV = \\"output_flat.csv\\""
   ]
  }},
  {{
   "cell_type": "markdown",
   "metadata": {{}},
   "source": [
    "## Convert to ASM"
   ]
  }},
  {{
   "cell_type": "code",
   "execution_count": null,
   "metadata": {{}},
   "source": [
    "# Convert file to ASM\\n",
    "asm = allotrope_from_file(INPUT_FILE, Vendor.{vendor})\\n",
    "\\n",
    "# Save ASM JSON\\n",
    "with open(OUTPUT_ASM, 'w') as f:\\n",
    "    json.dump(asm, f, indent=2, default=str)\\n",
    "\\n",
    "print(f\\"ASM saved to: {{OUTPUT_ASM}}\\")"
   ]
  }},
  {{
   "cell_type": "markdown",
   "metadata": {{}},
   "source": [
    "## Preview ASM Structure"
   ]
  }},
  {{
   "cell_type": "code",
   "execution_count": null,
   "metadata": {{}},
   "source": [
    "# Show ASM structure\\n",
    "print(json.dumps(asm, indent=2, default=str)[:2000])"
   ]
  }},
  {{
   "cell_type": "markdown",
   "metadata": {{}},
   "source": [
    "## Flatten to CSV"
   ]
  }},
  {{
   "cell_type": "code",
   "execution_count": null,
   "metadata": {{}},
   "source": [
    "def flatten_asm(asm, technique=\\"{technique}\\"):\\n",
    "    rows = []\\n",
    "    agg_key = f\\"{{technique}}-aggregate-document\\"\\n",
    "    agg_doc = asm.get(agg_key, {{}})\\n",
    "    \\n",
    "    device = agg_doc.get(\\"device-system-document\\", {{}})\\n",
    "    device_info = {{\\n",
    "        \\"instrument_serial_number\\": device.get(\\"device-identifier\\"),\\n",
    "        \\"instrument_model\\": device.get(\\"model-number\\"),\\n",
    "    }}\\n",
    "    \\n",
    "    doc_key = f\\"{{technique}}-document\\"\\n",
    "    for doc in agg_doc.get(doc_key, []):\\n",
    "        meas_agg = doc.get(\\"measurement-aggregate-document\\", {{}})\\n",
    "        common = {{\\n",
    "            \\"analyst\\": meas_agg.get(\\"analyst\\"),\\n",
    "            \\"measurement_time\\": meas_agg.get(\\"measurement-time\\"),\\n",
    "            **device_info\\n",
    "        }}\\n",
    "        \\n",
    "        for meas in meas_agg.get(\\"measurement-document\\", []):\\n",
    "            row = {{**common}}\\n",
    "            for key, value in meas.items():\\n",
    "                clean_key = key.replace(\\"-\\", \\"_\\")\\n",
    "                if isinstance(value, dict) and \\"value\\" in value:\\n",
    "                    row[clean_key] = value[\\"value\\"]\\n",
    "                    if \\"unit\\" in value:\\n",
    "                        row[f\\"{{clean_key}}_unit\\"] = value[\\"unit\\"]\\n",
    "                else:\\n",
    "                    row[clean_key] = value\\n",
    "            rows.append(row)\\n",
    "    return rows\\n",
    "\\n",
    "# Flatten and save\\n",
    "rows = flatten_asm(asm)\\n",
    "df = pd.DataFrame(rows)\\n",
    "df.to_csv(OUTPUT_CSV, index=False)\\n",
    "print(f\\"CSV saved to: {{OUTPUT_CSV}}\\")"
   ]
  }},
  {{
   "cell_type": "code",
   "execution_count": null,
   "metadata": {{}},
   "source": [
    "# Preview flattened data\\n",
    "df.head()"
   ]
  }}
 ],
 "metadata": {{
  "kernelspec": {{
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  }},
  "language_info": {{
   "name": "python",
   "version": "3.10.0"
  }}
 }},
 "nbformat": 4,
 "nbformat_minor": 4
}}"""


# Instrument metadata for templates
INSTRUMENT_INFO = {
    "BECKMAN_VI_CELL_BLU": {
        "name": "Beckman Coulter Vi-CELL BLU",
        "technique": "cell-counting",
        "file_format": "CSV export from Vi-CELL BLU software with columns: Sample ID, Viable cells, Viability, Total cells, etc.",
    },
    "BECKMAN_VI_CELL_XR": {
        "name": "Beckman Coulter Vi-CELL XR",
        "technique": "cell-counting",
        "file_format": "TXT or XLS/XLSX export from Vi-CELL XR with sample and measurement data",
    },
    "THERMO_FISHER_NANODROP_EIGHT": {
        "name": "Thermo Fisher NanoDrop Eight",
        "technique": "spectrophotometry",
        "file_format": "TSV or TXT export with Sample Name, Nucleic Acid Conc., A260, A280, 260/280 ratio",
    },
    "THERMO_FISHER_NANODROP_ONE": {
        "name": "Thermo Fisher NanoDrop One",
        "technique": "spectrophotometry",
        "file_format": "CSV or XLSX export with spectrophotometry measurements",
    },
    "MOLDEV_SOFTMAX_PRO": {
        "name": "Molecular Devices SoftMax Pro",
        "technique": "plate-reader",
        "file_format": "TXT export from SoftMax Pro with plate reader data",
    },
    "BMG_MARS": {
        "name": "BMG MARS (CLARIOstar)",
        "technique": "plate-reader",
        "file_format": "CSV or TXT export from BMG MARS with Well, Content, Conc., Mean, SD, CV columns",
    },
    "AGILENT_GEN5": {
        "name": "Agilent Gen5 (BioTek)",
        "technique": "plate-reader",
        "file_format": "XLSX export from Gen5 software",
    },
    "APPBIO_QUANTSTUDIO": {
        "name": "Applied Biosystems QuantStudio",
        "technique": "pcr",
        "file_format": "XLSX export with qPCR data including Well, Sample Name, Target Name, CT values",
    },
}


def generate_script(vendor: str, output_path: str) -> None:
    """Generate standalone Python script for given vendor."""
    info = INSTRUMENT_INFO.get(
        vendor,
        {
            "name": vendor.replace("_", " ").title(),
            "technique": "generic",
            "file_format": "Instrument output file",
        },
    )

    script = SCRIPT_TEMPLATE.format(
        instrument_name=info["name"],
        timestamp=datetime.now().isoformat(),
        vendor=vendor,
        script_name=Path(output_path).name,
        file_format_description=info["file_format"],
        technique=info["technique"],
    )

    with open(output_path, "w") as f:
        f.write(script)


def generate_notebook(vendor: str, output_path: str) -> None:
    """Generate Jupyter notebook for given vendor."""
    info = INSTRUMENT_INFO.get(
        vendor,
        {
            "name": vendor.replace("_", " ").title(),
            "technique": "generic",
            "file_format": "Instrument output file",
        },
    )

    notebook = NOTEBOOK_TEMPLATE.format(
        instrument_name=info["name"],
        timestamp=datetime.now().isoformat(),
        vendor=vendor,
        technique=info["technique"],
    )

    with open(output_path, "w") as f:
        f.write(notebook)


def main():
    import argparse

    parser = argparse.ArgumentParser(
        description="Export parser code for data engineers"
    )
    parser.add_argument(
        "--vendor", help="Vendor enum name (e.g., BECKMAN_VI_CELL_BLU)"
    )
    parser.add_argument("--output", "-o", help="Output file path")
    parser.add_argument(
        "--format",
        choices=["script", "notebook"],
        default="script",
        help="Output format (default: script)",
    )
    parser.add_argument(
        "--list-vendors", action="store_true", help="List supported vendors"
    )

    args = parser.parse_args()

    if args.list_vendors:
        print("Supported vendors:")
        for vendor in INSTRUMENT_INFO.keys():
            print(f"  {vendor}")
        return 0

    if not args.vendor or not args.output:
        parser.error("--vendor and --output are required when not using --list-vendors")

    vendor = args.vendor.upper()

    if args.format == "notebook":
        generate_notebook(vendor, args.output)
    else:
        generate_script(vendor, args.output)

    print(f"Parser code exported to: {args.output}")
    return 0


if __name__ == "__main__":
    sys.exit(main())

```
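Both templates above are rendered with `str.format`, so every literal brace in the generated code must be doubled (`{{` / `}}`) while single-brace fields like `{vendor}` are substituted. A small sketch of that escaping mechanic (the template string here is illustrative, not taken from the script):

```python
# str.format treats {{ and }} as literal braces, so template code such as
# `asm.get(agg_key, {{}})` survives rendering as `asm.get(agg_key, {})`,
# while single-brace placeholders like {vendor} are filled in.
template = (
    "asm = allotrope_from_file(path, Vendor.{vendor})\n"
    "agg = asm.get(key, {{}})"
)
rendered = template.format(vendor="BECKMAN_VI_CELL_BLU")
print(rendered)
```

This is why the notebook template doubles every brace of its JSON skeleton: a single unescaped `{` anywhere in either template raises `KeyError` or `IndexError` at render time.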

### references/asm_schema_overview.md

```markdown
# ASM Schema Overview

The Allotrope Simple Model (ASM) is a JSON-based standard for representing laboratory instrument data with semantic consistency.

## Core Concepts

### Structure
ASM uses a hierarchical document structure:
- **Manifest** - Links to ontologies and schemas
- **Data** - The actual measurement data organized by technique

### Key Components

```json
{
  "$asm.manifest": {
    "vocabulary": ["http://purl.allotrope.org/voc/afo/REC/2023/09/"],
    "contexts": ["http://purl.allotrope.org/json-ld/afo-context-REC-2023-09.jsonld"]
  },
  "<technique>-aggregate-document": {
    "device-system-document": { ... },
    "<technique>-document": [
      {
        "measurement-aggregate-document": {
          "measurement-document": [ ... ]
        }
      }
    ]
  }
}
```
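
The nesting above can be traversed generically to reach individual measurements. A minimal sketch — the helper name `iter_measurements` is illustrative, not part of any library:

```python
def iter_measurements(node):
    """Recursively yield every entry of any 'measurement-document' list."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "measurement-document" and isinstance(value, list):
                for doc in value:
                    yield doc
            yield from iter_measurements(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_measurements(item)

# Toy ASM fragment following the structure shown above
asm = {
    "cell-counting-aggregate-document": {
        "cell-counting-document": [{
            "measurement-aggregate-document": {
                "measurement-document": [
                    {"viability": {"value": 95.2, "unit": "%"}},
                    {"total-cell-count": {"value": 2.5e6, "unit": "cell"}},
                ]
            }
        }]
    }
}

print(len(list(iter_measurements(asm))))  # 2
```

This works regardless of technique because every ASM schema uses the same `measurement-document` key for leaf measurements.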

## Required Metadata Documents

### data system document
Every ASM output MUST include this document with:
- `ASM file identifier`: Output filename
- `data system instance identifier`: System ID or "N/A"
- `file name`: Source input filename
- `UNC path`: Path to source file
- `ASM converter name`: Parser identifier (e.g., "allotropy_beckman_coulter_biomek")
- `ASM converter version`: Version string
- `software name`: Instrument software that generated the source file

### device system document
Every ASM output MUST include this document with:
- `equipment serial number`: Main instrument serial
- `product manufacturer`: Vendor name
- `device document`: Array of sub-components (probes, pods, etc.)
  - `device type`: Standardized type (e.g., "liquid handler probe head")
  - `device identifier`: Logical name (e.g., "Pod1", not serial number)
  - `equipment serial number`: Component serial
  - `product manufacturer`: Component vendor
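
Taken together, the two required documents can be assembled with a small helper. A minimal sketch — the serial number, manufacturer, and converter name are placeholder values, and `build_metadata_documents` is not part of any library:

```python
from pathlib import Path

def build_metadata_documents(source_file, output_file):
    """Build the two required ASM metadata documents (example values)."""
    data_system = {
        "ASM file identifier": Path(output_file).name,
        "data system instance identifier": "N/A",
        "file name": Path(source_file).name,
        "UNC path": str(Path(source_file).resolve()),
        "ASM converter name": "example_converter",  # hypothetical identifier
        "ASM converter version": "0.1.0",
        "software name": "Vi-CELL BLU",
    }
    device_system = {
        "equipment serial number": "VCB-12345",  # placeholder serial
        "product manufacturer": "Beckman Coulter",
        "device document": [],
    }
    return data_system, device_system

ds, dev = build_metadata_documents("run01.csv", "run01.asm.json")
```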

## Available ASM Techniques

The official ASM repository includes more than 60 technique schemas:

```
absorbance, automated-reactors, balance, bga, binding-affinity, bulk-density,
cell-counting, cell-culture-analyzer, chromatography, code-reader, conductance,
conductivity, disintegration, dsc, dvs, electronic-lab-notebook,
electronic-spectrometry, electrophoresis, flow-cytometry, fluorescence,
foam-height, foam-qualification, fplc, ftir, gas-chromatography, gc-ms, gloss,
hot-tack, impedance, lc-ms, light-obscuration, liquid-chromatography,
loss-on-drying, luminescence, mass-spectrometry, metabolite-analyzer,
multi-analyte-profiling, nephelometry, nmr, optical-imaging, optical-microscopy,
osmolality, oven-kf, pcr, ph, plate-reader, pressure-monitoring, psd, pumping,
raman, rheometry, sem, solution-analyzer, specific-rotation, spectrophotometry,
stirring, surface-area-analysis, tablet-hardness, temperature-monitoring,
tensile-test, thermogravimetric-analysis, titration, ultraviolet-absorbance,
x-ray-powder-diffraction
```

See: https://gitlab.com/allotrope-public/asm/-/tree/main/json-schemas/adm

## Common ASM Schemas by Technique

Below are details for frequently used techniques:

### Cell Counting
Schema: `cell-counting/REC/2024/09/cell-counting.schema.json`

Key fields:
- `viable-cell-density` (cells/mL)
- `viability` (percentage)
- `total-cell-count`
- `dead-cell-count`
- `cell-diameter-distribution-datum`

### Spectrophotometry (UV-Vis)
Schema: `spectrophotometry/REC/2024/06/spectrophotometry.schema.json`

Key fields:
- `absorbance` (dimensionless)
- `wavelength` (nm)
- `transmittance` (percentage)
- `pathlength` (cm)
- `concentration` with units

### Plate Reader
Schema: `plate-reader/REC/2024/06/plate-reader.schema.json`

Key fields:
- `absorbance`
- `fluorescence`
- `luminescence`
- `well-location` (A1-H12)
- `plate-identifier`

### qPCR
Schema: `pcr/REC/2024/06/pcr.schema.json`

Key fields:
- `cycle-threshold-result`
- `amplification-efficiency`
- `melt-curve-datum`
- `target-DNA-description`

### Chromatography
Schema: `liquid-chromatography/REC/2023/09/liquid-chromatography.schema.json`

Key fields:
- `retention-time` (minutes)
- `peak-area`
- `peak-height`
- `peak-width`
- `chromatogram-data-cube`

## Data Patterns

### Value Datum
Simple value with unit:
```json
{
  "value": 1.5,
  "unit": "mL"
}
```

### Aggregate Datum
Collection of related values:
```json
{
  "measurement-aggregate-document": {
    "measurement-document": [
      { "viable-cell-density": {"value": 2.5e6, "unit": "(cell/mL)"} },
      { "viability": {"value": 95.2, "unit": "%"} }
    ]
  }
}
```

### Data Cube
Multi-dimensional array data:
```json
{
  "cube-structure": {
    "dimensions": [{"@componentDatatype": "double", "concept": "elapsed time"}],
    "measures": [{"@componentDatatype": "double", "concept": "absorbance"}]
  },
  "data": {
    "dimensions": [[0, 1, 2, 3, 4]],
    "measures": [[0.1, 0.2, 0.3, 0.4, 0.5]]
  }
}
```
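
A data cube flattens naturally into row tuples by zipping the dimension and measure arrays. A minimal sketch, assuming one value list per dimension and per measure (`cube_to_rows` is illustrative, not a library function):

```python
def cube_to_rows(cube):
    """Flatten an ASM data cube into (dimension..., measure...) row tuples."""
    dims = cube["data"]["dimensions"]
    measures = cube["data"]["measures"]
    return list(zip(*dims, *measures))

cube = {
    "cube-structure": {
        "dimensions": [{"@componentDatatype": "double", "concept": "elapsed time"}],
        "measures": [{"@componentDatatype": "double", "concept": "absorbance"}],
    },
    "data": {
        "dimensions": [[0, 1, 2, 3, 4]],
        "measures": [[0.1, 0.2, 0.3, 0.4, 0.5]],
    },
}

rows = cube_to_rows(cube)
print(rows[0])  # (0, 0.1)
```

The resulting rows drop straight into a CSV writer or DataFrame constructor for the flattened 2D output.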

## Validation

Validate ASM output against official schemas:

```python
import json
import jsonschema
from urllib.request import urlopen

# Load ASM output
with open("output.json") as f:
    asm = json.load(f)

# Get schema URL from manifest
schema_url = asm.get("$asm.manifest", {}).get("$ref")

# Simplified validation - full validation requires resolving nested $ref references
if schema_url:
    with urlopen(schema_url) as resp:
        schema = json.load(resp)
    jsonschema.validate(instance=asm, schema=schema)
```

## Schema Repository

Official schemas: https://gitlab.com/allotrope-public/asm/-/tree/main/json-schemas/adm

Schema structure:
```
json-schemas/adm/
├── cell-counting/
│   └── REC/2024/09/
│       └── cell-counting.schema.json
├── spectrophotometry/
│   └── REC/2024/06/
│       └── spectrophotometry.schema.json
├── plate-reader/
│   └── REC/2024/06/
│       └── plate-reader.schema.json
└── ...
```

## Common Issues

### Missing Fields
Not all instrument exports contain all ASM fields. Report completeness:
```python
def extract_all_fields(node):
    """Recursively collect every key in the nested ASM document."""
    if isinstance(node, dict):
        return set(node) | {f for v in node.values() for f in extract_all_fields(v)}
    if isinstance(node, list):
        return {f for item in node for f in extract_all_fields(item)}
    return set()

def report_completeness(asm, expected_fields):
    found = extract_all_fields(asm)
    return len(found & expected_fields) / len(expected_fields) * 100
```

### Unit Variations
Instruments may use different unit formats. The allotropy library normalizes these:
- "cells/mL" → "(cell/mL)"
- "%" → "%"
- "nm" → "nm"

### Date Formats
ASM uses ISO 8601: `2024-01-15T10:30:00Z`

```

### references/field_classification_guide.md

```markdown
# Field Classification Guide

This guide helps classify instrument data fields into the correct ASM document locations. Use this when mapping raw instrument output to Allotrope Simple Model structure.

## ASM Document Hierarchy

```
<technique>-aggregate-document
├── device-system-document          # Instrument hardware info
├── data-system-document            # Software/conversion info
├── <technique>-document[]          # Per-run/sequence data
│   ├── analyst                     # Who performed the analysis
│   ├── measurement-aggregate-document
│   │   ├── measurement-time
│   │   ├── measurement-document[]  # Individual measurements
│   │   │   ├── sample-document
│   │   │   ├── device-control-aggregate-document
│   │   │   └── [measurement fields]
│   │   └── [aggregate-level metadata]
│   ├── processed-data-aggregate-document
│   │   └── processed-data-document[]
│   │       ├── data-processing-document
│   │       └── [processed results]
│   └── calculated-data-aggregate-document
│       └── calculated-data-document[]
```

## Field Classification Categories

### 1. Device/Instrument Information → `device-system-document`

Hardware and firmware details about the physical instrument.

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Instrument name | `model-number` | "Vi-CELL BLU", "NanoDrop One" |
| Serial number | `equipment-serial-number` | "VCB-12345", "SN001234" |
| Manufacturer | `product-manufacturer` | "Beckman Coulter", "Thermo Fisher" |
| Firmware version | `firmware-version` | "v2.1.3" |
| Device ID | `device-identifier` | "Instrument_01" |
| Brand | `brand-name` | "Beckman Coulter" |

**Rule:** If the value describes the physical instrument and doesn't change between runs, it goes in `device-system-document`.

---

### 2. Software/Data System Information → `data-system-document`

Information about software used for acquisition, analysis, or conversion.

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Software name | `software-name` | "Chromeleon", "Gen5" |
| Software version | `software-version` | "7.3.2" |
| File name | `file-name` | "experiment_001.xlsx" |
| File path | `file-identifier` | "/data/runs/2024-01-15/" |
| Converter name | `ASM-converter-name` | "allotropy v0.1.55" |

**Rule:** If the value describes software, file metadata, or data provenance, it goes in `data-system-document`.

---

### 3. Sample Information → `sample-document`

Metadata about the biological/chemical sample being analyzed.

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Sample ID | `sample-identifier` | "Sample_A", "LIMS-001234" |
| Sample name | `written-name` | "CHO Cell Culture Day 5" |
| Sample type/role | `sample-role-type` | "unknown sample role", "control sample role" |
| Batch ID | `batch-identifier` | "Batch-2024-001" |
| Description | `description` | "Protein expression sample" |
| Well position | `location-identifier` | "A1", "B3" |

**Rule:** If the value identifies or describes what was measured (not how), it goes in `sample-document`.

---

### 4. Device Control Settings → `device-control-aggregate-document`

Instrument settings and parameters used during measurement.

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Injection volume | `sample-volume-setting` | 10 µL |
| Wavelength | `detector-wavelength-setting` | 254 nm |
| Temperature | `compartment-temperature` | 37°C |
| Flow rate | `flow-rate` | 1.0 mL/min |
| Exposure time | `exposure-duration-setting` | 500 ms |
| Detector gain | `detector-gain-setting` | 1.5 |
| Illumination | `illumination-setting` | 80% |

**Rule:** If the value is a configurable instrument parameter that affects measurement, it goes in `device-control-aggregate-document`.

---

### 5. Environmental Conditions → `device-control-document` or technique-specific

Ambient or controlled environmental parameters during measurement.

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Ambient temperature | `ambient-temperature` | 22.5°C |
| Humidity | `ambient-relative-humidity` | 45% |
| Column temperature | `compartment-temperature` | 30°C |
| Sample temperature | `sample-temperature` | 4°C |
| Electrophoresis temp | (technique-specific) | 26.4°C |

**Rule:** Environmental conditions that affect measurement quality go with device control or in technique-specific locations.

---

### 6. Raw Measurement Data → `measurement-document`

Direct instrument readings - the "ground truth" data.

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Absorbance | `absorbance` | 0.523 AU |
| Fluorescence | `fluorescence` | 12500 RFU |
| Cell count | `total-cell-count` | 2.5e6 cells |
| Peak area | `peak-area` | 1234.5 mAU·min |
| Retention time | `retention-time` | 5.67 min |
| Ct value | `cycle-threshold-result` | 24.5 |
| Concentration (measured) | `mass-concentration` | 1.5 mg/mL |

**Rule:** If the value is a direct instrument reading that wasn't computed from other values in this analysis, it goes in `measurement-document`.

---

### 7. Calculated/Derived Data → `calculated-data-aggregate-document`

Values computed from raw measurements.

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Viability % | `calculated-result` | 95.2% |
| Concentration (from std curve) | `calculated-result` | 125 ng/µL |
| Ratio (260/280) | `calculated-result` | 1.89 |
| Relative quantity | `calculated-result` | 2.5x |
| % Recovery | `calculated-result` | 98.7% |
| CV% | `calculated-result` | 2.3% |

**Calculated data document structure:**
```json
{
  "calculated-data-name": "viability",
  "calculated-result": {"value": 95.2, "unit": "%"},
  "calculation-description": "viable cells / total cells * 100"
}
```

**Rule:** If the value was computed from other measurements in this analysis, it goes in `calculated-data-aggregate-document`. Include `calculation-description` when possible.

---

### 8. Processed/Analyzed Data → `processed-data-aggregate-document`

Results from data processing algorithms (peak integration, cell classification, etc.).

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Peak list | `peak-list` | Integrated peak results |
| Cell size distribution | `cell-diameter-distribution` | Histogram data |
| Baseline-corrected data | (in processed-data-document) | Corrected spectra |
| Fitted curve | (in processed-data-document) | Standard curve fit |

**Associated `data-processing-document`:**
```json
{
  "cell-type-processing-method": "trypan blue exclusion",
  "cell-density-dilution-factor": {"value": 2, "unit": "(unitless)"},
  "minimum-cell-diameter-setting": {"value": 5, "unit": "µm"},
  "maximum-cell-diameter-setting": {"value": 50, "unit": "µm"}
}
```

**Rule:** If the value results from an algorithm or processing method applied to raw data, it goes in `processed-data-aggregate-document` with its processing parameters in `data-processing-document`.

---

### 9. Timing/Timestamps → Various locations

| Timestamp Type | Location | ASM Field |
|----------------|----------|-----------|
| Measurement time | `measurement-document` | `measurement-time` |
| Run start time | `analysis-sequence-document` | `analysis-sequence-start-time` |
| Run end time | `analysis-sequence-document` | `analysis-sequence-end-time` |
| Data export time | `data-system-document` | (custom) |

**Rule:** Use ISO 8601 format: `2024-01-15T10:30:00Z`

---

### 10. Analyst/Operator Information → `<technique>-document`

| Field Type | ASM Field | Examples |
|------------|-----------|----------|
| Operator name | `analyst` | "jsmith" |
| Reviewer | (custom or extension) | "Pending" |

**Rule:** Analyst goes at the technique-document level, not in individual measurements.

---

## Decision Tree

```
Is this field about...

THE INSTRUMENT ITSELF?
├── Hardware specs → device-system-document
└── Software/files → data-system-document

THE SAMPLE?
└── Sample ID, name, type, batch → sample-document

INSTRUMENT SETTINGS?
└── Configurable parameters → device-control-aggregate-document

ENVIRONMENTAL CONDITIONS?
└── Temp, humidity, etc. → device-control-document

A DIRECT READING?
└── Raw instrument output → measurement-document

A COMPUTED VALUE?
├── From other measurements → calculated-data-document
└── From processing algorithm → processed-data-document

TIMING?
├── When measured → measurement-document.measurement-time
└── When run started/ended → analysis-sequence-document

WHO DID IT?
└── Operator/analyst → <technique>-document.analyst
```
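
The tree above can be approximated with a keyword router for first-pass classification. This is a toy sketch — the rule table and `classify_field` are illustrative, and production mapping should rely on the per-instrument tables below rather than substring matching:

```python
# Ordered keyword rules mirroring the decision tree; first match wins
RULES = [
    (("serial", "firmware", "model"), "device-system-document"),
    (("software", "file name", "converter"), "data-system-document"),
    (("sample", "batch", "well"), "sample-document"),
    (("setting", "flow rate", "wavelength"), "device-control-aggregate-document"),
]

def classify_field(name):
    """Route a raw field name to a candidate ASM document location."""
    lowered = name.lower()
    for keywords, target in RULES:
        if any(k in lowered for k in keywords):
            return target
    return "measurement-document"  # default: treat as a direct reading

print(classify_field("Equipment Serial Number"))  # device-system-document
print(classify_field("Absorbance"))               # measurement-document
```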

## Common Instrument-to-ASM Mappings

> **Note:** These mappings are derived from the [Benchling allotropy library](https://github.com/Benchling-Open-Source/allotropy/tree/main/src/allotropy/parsers). For authoritative mappings, consult the parser source code for your specific instrument.

### Cell Counter (Vi-CELL BLU)
*Source: `allotropy/parsers/beckman_vi_cell_blu/vi_cell_blu_structure.py`*

| Instrument Field | ASM Field |
|-----------------|-----------|
| Sample ID | `sample_identifier` |
| Analysis date/time | `measurement_time` |
| Analysis by | `analyst` |
| Viability (%) | `viability` |
| Viable (x10^6) cells/mL | `viable_cell_density` |
| Total (x10^6) cells/mL | `total_cell_density` |
| Cell count | `total_cell_count` |
| Viable cells | `viable_cell_count` |
| Average diameter (μm) | `average_total_cell_diameter` |
| Average viable diameter (μm) | `average_live_cell_diameter` |
| Average circularity | `average_total_cell_circularity` |
| Cell type | `cell_type_processing_method` (data-processing) |
| Dilution | `cell_density_dilution_factor` (data-processing) |
| Min/Max Diameter | `minimum/maximum_cell_diameter_setting` (data-processing) |

### Spectrophotometer (NanoDrop)
| Instrument Field | ASM Field |
|-----------------|-----------|
| Sample Name | `sample_identifier` |
| A260, A280 | `absorbance` (with wavelength) |
| Concentration | `mass_concentration` |
| 260/280 ratio | `a260_a280_ratio` |
| Pathlength | `pathlength` |

### Plate Reader
| Instrument Field | ASM Field |
|-----------------|-----------|
| Well | `location_identifier` |
| Sample Type | `sample_role_type` |
| Absorbance/OD | `absorbance` |
| Fluorescence | `fluorescence` |
| Plate ID | `container_identifier` |

### Chromatography (HPLC)
| Instrument Field | ASM Field |
|-----------------|-----------|
| Sample ID | `sample_identifier` |
| Injection Volume | `injection_volume` |
| Retention Time | `retention_time` |
| Peak Area | `peak_area` |
| Peak Height | `peak_height` |
| Column Temp | `column_oven_temperature` |
| Flow Rate | `flow_rate` |

## Unit Handling

Only use units explicitly present in source data. If a value has no unit specified:
- Use `(unitless)` as the unit value
- Do NOT infer units based on domain knowledge

## Calculated Data Traceability

When creating calculated values, always link them to their source data using `data-source-aggregate-document`:

```json
{
    "calculated-data-name": "DIN",
    "calculated-result": {"value": 5.8, "unit": "(unitless)"},
    "calculated-data-identifier": "TEST_ID_147",
    "data-source-aggregate-document": {
        "data-source-document": [{
            "data-source-identifier": "TEST_ID_145",
            "data-source-feature": "sample"
        }]
    }
}
```

This declares: "DIN 5.8 was calculated from the sample at `TEST_ID_145`."

**Why this matters:**
- **Audits**: Prove a value came from specific raw data
- **Debugging**: Trace unexpected results back to their source
- **Reprocessing**: Know which inputs to re-analyze if algorithms change

**Assign unique IDs to:**
- Measurements, peaks, regions, and calculated values
- Use a consistent naming pattern (e.g., `INSTRUMENT_TYPE_TEST_ID_N`)

This enables bidirectional traversal: trace from calculated → raw, or raw → all derived values.
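
A small builder keeps the traceability wiring consistent. A minimal sketch — `make_calculated_datum` is a hypothetical helper, not part of the allotropy library:

```python
def make_calculated_datum(name, value, unit, calc_id, source_ids):
    """Build a calculated-data document linked back to its source measurements."""
    return {
        "calculated-data-name": name,
        "calculated-result": {"value": value, "unit": unit},
        "calculated-data-identifier": calc_id,
        "data-source-aggregate-document": {
            "data-source-document": [
                {"data-source-identifier": sid, "data-source-feature": "sample"}
                for sid in source_ids
            ]
        },
    }

datum = make_calculated_datum("DIN", 5.8, "(unitless)", "TEST_ID_147", ["TEST_ID_145"])
```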

---

## Nested Document Structure (Critical)

A common mistake is "flattening" fields directly onto measurement documents when they should be wrapped in nested structures. This breaks schema compliance and loses semantic context.

### Why Nesting Matters

ASM uses nested documents for semantic grouping:

| Document | Purpose | Contains |
|----------|---------|----------|
| `sample document` | What was measured | Sample ID, locations, plate identifiers |
| `device control aggregate document` | How instrument operated | Settings, parameters, techniques |
| `custom information document` | Vendor-specific fields | Non-standard fields that don't map to ASM |

### Sample Document Fields

These fields MUST be inside `sample document`, not flattened on measurement:

```json
// ❌ WRONG - Fields flattened on measurement
{
  "measurement identifier": "TEST_001",
  "sample identifier": "Sample_A",
  "location identifier": "A1",
  "absorbance": {"value": 0.5, "unit": "(unitless)"}
}

// ✅ CORRECT - Fields nested in sample document
{
  "measurement identifier": "TEST_001",
  "sample document": {
    "sample identifier": "Sample_A",
    "location identifier": "A1",
    "well plate identifier": "96WP001"
  },
  "absorbance": {"value": 0.5, "unit": "(unitless)"}
}
```

**Fields belonging in sample document:**
- `sample identifier` - Sample ID/name
- `written name` - Descriptive sample name
- `batch identifier` - Batch/lot number
- `sample role type` - Standard, blank, control, unknown
- `location identifier` - Well position (A1, B3, etc.)
- `well plate identifier` - Plate barcode
- `description` - Sample description

### Device Control Document Fields

Instrument settings MUST be inside `device control aggregate document`:

```json
// ❌ WRONG - Device settings flattened
{
  "measurement identifier": "TEST_001",
  "device identifier": "Pod1",
  "technique": "Custom",
  "volume": {"value": 26, "unit": "μL"}
}

// ✅ CORRECT - Settings nested in device control
{
  "measurement identifier": "TEST_001",
  "device control aggregate document": {
    "device control document": [{
      "device type": "liquid handler",
      "device identifier": "Pod1"
    }]
  },
  "aspiration volume": {"value": 26, "unit": "μL"}
}
```

**Fields belonging in device control:**
- `device type` - Type of device
- `device identifier` - Device ID
- `detector wavelength setting` - Wavelength for detection
- `compartment temperature` - Temperature setting
- `sample volume setting` - Volume setting
- `flow rate` - Flow rate setting

### Custom Information Document

Vendor-specific fields that don't map to standard ASM terms go in `custom information document`:

```json
"device control document": [{
  "device type": "liquid handler",
  "custom information document": {
    "probe": "2",
    "pod": "Pod1",
    "source labware name": "Inducer",
    "destination labware name": "GRP1"
  }
}]
```

### Liquid Handler: Transfer Pairing

For liquid handlers, a measurement represents a complete transfer (aspirate + dispense), not separate operations:

```json
// ❌ WRONG - Separate records for aspirate and dispense
[
  {"measurement identifier": "OP_001", "transfer type": "Aspirate", "volume": {"value": 26, "unit": "μL"}},
  {"measurement identifier": "OP_002", "transfer type": "Dispense", "volume": {"value": 26, "unit": "μL"}}
]

// ✅ CORRECT - Single record with source and destination
{
  "measurement identifier": "TRANSFER_001",
  "sample document": {
    "source well location identifier": "1",
    "destination well location identifier": "2",
    "source well plate identifier": "96WP001",
    "destination well plate identifier": "96WP002"
  },
  "aspiration volume": {"value": 26, "unit": "μL"},
  "transfer volume": {"value": 26, "unit": "μL"}
}
```

**Pairing logic:**
1. Match aspirate and dispense operations by probe number
2. Create one measurement per matched pair
3. Use `source_*` fields for aspirate location
4. Use `destination_*` fields for dispense location
5. Include both `aspiration volume` and `transfer volume`
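
The pairing steps above can be sketched as a single pass over the operation log. The field names on the input records (`probe`, `transfer type`, `well`, `volume`) are assumptions about the raw export, not a fixed schema:

```python
def pair_transfers(operations):
    """Pair aspirate/dispense operations by probe into single transfer records."""
    aspirates = {}  # probe number -> pending aspirate operation
    transfers = []
    for op in operations:
        probe = op["probe"]
        if op["transfer type"] == "Aspirate":
            aspirates[probe] = op
        elif op["transfer type"] == "Dispense" and probe in aspirates:
            asp = aspirates.pop(probe)
            transfers.append({
                "measurement identifier": f"TRANSFER_{len(transfers) + 1:03d}",
                "sample document": {
                    "source well location identifier": asp["well"],
                    "destination well location identifier": op["well"],
                },
                "aspiration volume": asp["volume"],
                "transfer volume": op["volume"],
            })
    return transfers

ops = [
    {"probe": "1", "transfer type": "Aspirate", "well": "1",
     "volume": {"value": 26, "unit": "μL"}},
    {"probe": "1", "transfer type": "Dispense", "well": "2",
     "volume": {"value": 26, "unit": "μL"}},
]
paired = pair_transfers(ops)  # one TRANSFER_001 record
```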

### Quick Reference: Nesting Decision

```
Is this field about...

THE SAMPLE BEING MEASURED?
├── Sample ID, name, batch → sample document
├── Well position → sample document.location identifier
├── Plate barcode → sample document.well plate identifier
└── Source/destination locations → sample document (with prefixes)

INSTRUMENT SETTINGS?
├── Standard settings → device control aggregate document
└── Vendor-specific → custom information document

A MEASUREMENT VALUE?
└── Direct on measurement document (e.g., absorbance, volume)

TRANSFER OPERATION TYPE?
└── DON'T use "transfer type" - pair into single measurement
    with source/destination fields instead
```

### Validation

Use `validate_asm.py` to check for nesting issues:
```bash
python scripts/validate_asm.py output.json --reference known_good.json
```

The validator checks for:
- Fields incorrectly flattened on measurements
- Missing `sample document` wrapper
- Missing `device control aggregate document` wrapper
- Missing `custom information document` for vendor fields
- Liquid handler: separate transfer types instead of paired records

## Sources

- [Allotrope Simple Model Introduction](https://www.allotrope.org/introduction-to-allotrope-simple-model)
- [Benchling allotropy library](https://github.com/Benchling-Open-Source/allotropy)
- [Allotrope Foundation ASM Overview](https://www.allotrope.org/asm)

```

### references/supported_instruments.md

```markdown
# Supported Instruments

## What Can This Skill Convert?

**Any instrument data that maps to an Allotrope schema can be converted.** The skill uses a tiered parsing approach:

1. **Native allotropy parsers** (listed below) - Highest fidelity, validated against vendor-specific formats
2. **Flexible fallback parser** - Handles any tabular data (CSV, Excel, TXT) by mapping columns to ASM fields
3. **PDF extraction** - Extracts tables from PDFs, then applies flexible parsing

If your instrument isn't listed below, the skill can still convert it as long as your data contains recognizable measurement fields (sample IDs, values, units, timestamps, etc.) that map to an ASM technique schema.
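
The tiered strategy reduces to a try-then-fallback flow. In this sketch the parser callables are stand-ins supplied by the caller — in the real skill they would wrap allotropy and a tabular reader:

```python
def convert_tiered(filepath, parse_native, parse_tabular):
    """Try the native parser first; fall back to generic tabular parsing."""
    try:
        return parse_native(filepath), "native"
    except Exception:
        return parse_tabular(filepath), "fallback"

# Stand-in parsers for illustration
def failing_native(path):
    raise ValueError("no native parser for this format")

def generic_tabular(path):
    return {"rows": [], "source": path}

asm, tier = convert_tiered("unknown_export.csv", failing_native, generic_tabular)
print(tier)  # fallback
```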

---

## Instruments with Native Allotropy Parsers

The following instruments have optimized parsers in the allotropy library with their Vendor enum values.

## Cell Counting

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| Beckman Coulter Vi-CELL BLU | `BECKMAN_VI_CELL_BLU` | .csv |
| Beckman Coulter Vi-CELL XR | `BECKMAN_VI_CELL_XR` | .txt, .xls, .xlsx |
| ChemoMetec NucleoView NC-200 | `CHEMOMETEC_NUCLEOVIEW` | .xlsx |
| ChemoMetec NC-View | `CHEMOMETEC_NC_VIEW` | .xlsx |
| Revvity Matrix | `REVVITY_MATRIX` | .csv |

## Spectrophotometry (UV-Vis)

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| Thermo Fisher NanoDrop One | `THERMO_FISHER_NANODROP_ONE` | .csv, .xlsx |
| Thermo Fisher NanoDrop Eight | `THERMO_FISHER_NANODROP_EIGHT` | .tsv, .txt |
| Thermo Fisher NanoDrop 8000 | `THERMO_FISHER_NANODROP_8000` | .csv |
| Unchained Labs Lunatic | `UNCHAINED_LABS_LUNATIC` | .csv, .xlsx |
| Thermo Fisher Genesys 30 | `THERMO_FISHER_GENESYS30` | .csv |

## Plate Readers (Multi-mode, Absorbance, Fluorescence)

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| Molecular Devices SoftMax Pro | `MOLDEV_SOFTMAX_PRO` | .txt |
| PerkinElmer EnVision | `PERKIN_ELMER_ENVISION` | .csv |
| Agilent Gen5 (BioTek) | `AGILENT_GEN5` | .xlsx |
| Agilent Gen5 Image | `AGILENT_GEN5_IMAGE` | .xlsx |
| BMG MARS (CLARIOstar) | `BMG_MARS` | .csv, .txt |
| BMG LabTech Smart Control | `BMG_LABTECH_SMART_CONTROL` | .csv |
| Thermo SkanIt | `THERMO_SKANIT` | .xlsx |
| Revvity Kaleido | `REVVITY_KALEIDO` | .csv |
| Tecan Magellan | `TECAN_MAGELLAN` | .xlsx |

## ELISA / Immunoassay

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| Molecular Devices SoftMax Pro | `MOLDEV_SOFTMAX_PRO` | .txt |
| MSD Discovery Workbench | `MSD_WORKBENCH` | .txt |
| MSD Methodical Mind | `METHODICAL_MIND` | .xlsx |
| BMG MARS | `BMG_MARS` | .csv, .txt |

## qPCR / PCR

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| Applied Biosystems QuantStudio | `APPBIO_QUANTSTUDIO` | .xlsx |
| Applied Biosystems QuantStudio Design & Analysis | `APPBIO_QUANTSTUDIO_DESIGNANALYSIS` | .xlsx, .csv |
| Bio-Rad CFX Maestro | `BIORAD_CFX_MAESTRO` | .csv, .xlsx |
| Roche LightCycler | `ROCHE_LIGHTCYCLER` | .txt |

## Chromatography (HPLC, LC)

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| Waters Empower | `WATERS_EMPOWER` | .xml |
| Thermo Fisher Chromeleon | `THERMO_FISHER_CHROMELEON` | .xml |
| Agilent ChemStation | `AGILENT_CHEMSTATION` | .csv |

## Electrophoresis

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| Agilent TapeStation | `AGILENT_TAPESTATION` | .csv |
| PerkinElmer LabChip | `PERKIN_ELMER_LABCHIP` | .csv |

## Flow Cytometry

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| BD Biosciences FACSDiva | `BD_BIOSCIENCES_FACSDIVA` | .xml |
| FlowJo | `FLOWJO` | .wsp |

## Solution Analysis

| Instrument | Vendor Enum | File Types |
|------------|-------------|------------|
| Roche Cedex BioHT | `ROCHE_CEDEX_BIOHT` | .xlsx |
| Beckman Coulter Biomek | `BECKMAN_COULTER_BIOMEK` | .csv |

## Auto-Detection Patterns

The skill attempts to identify instrument type from file contents using these patterns:

### Vi-CELL BLU
- Column headers: "Sample ID", "Viable cells (x10^6 cells/mL)", "Viability (%)"
- File structure: CSV with specific column order

### Vi-CELL XR
- Column headers: "Sample", "Total cells/ml", "Viable cells/ml"
- Multiple export formats supported

### NanoDrop
- Column headers: "Sample Name", "Nucleic Acid Conc.", "A260", "A280"
- 260/280 and 260/230 ratio columns

### Plate Readers (General)
- Well identifiers (A1-H12 pattern)
- "Plate", "Well", "Sample" columns
- Block-based structure with metadata headers

### ELISA
- Standard curve data with concentrations
- OD/absorbance readings
- Sample/blank/standard classification

## Using Vendor Enums

```python
from allotropy.parser_factory import Vendor
from allotropy.to_allotrope import allotrope_from_file

# List all supported vendors
for v in Vendor:
    print(f"{v.name}: {v.value}")

# Convert file
asm = allotrope_from_file("data.csv", Vendor.BECKMAN_VI_CELL_BLU)
```

## Checking Supported Status

```python
from allotropy.parser_factory import get_parser

# Check if a vendor/file combo is supported
try:
    parser = get_parser(Vendor.BECKMAN_VI_CELL_BLU)
    print("Supported!")
except Exception as e:
    print(f"Not supported: {e}")
```

```

### scripts/convert_to_asm.py

```python
#!/usr/bin/env python3
"""
Instrument Data to ASM Converter

Converts laboratory instrument output files to Allotrope Simple Model (ASM) JSON format.
Supports auto-detection of instrument types and fallback parsing for unsupported formats.

Usage:
    python convert_to_asm.py <input_file> [--vendor VENDOR] [--output OUTPUT]
"""

import json
import sys
import re
import hashlib
import importlib.metadata
from pathlib import Path
from typing import Optional, Tuple, Dict, Any
from datetime import datetime


# Lazy imports to avoid errors if not installed
def get_allotropy():
    try:
        from allotropy.parser_factory import Vendor
        from allotropy.to_allotrope import allotrope_from_file, allotrope_from_io

        return Vendor, allotrope_from_file, allotrope_from_io
    except ImportError:
        return None, None, None


def get_pandas():
    try:
        import pandas as pd

        return pd
    except ImportError:
        return None


# Detection patterns for instrument identification
DETECTION_PATTERNS = {
    "BECKMAN_VI_CELL_BLU": {
        "columns": [
            "Sample ID",
            "Viable cells",
            "Viability",
            "Total cells",
            "Average diameter",
        ],
        "keywords": ["Vi-CELL BLU", "Beckman Coulter"],
        "file_patterns": [r".*\.csv$"],
        "confidence_boost": 20,
    },
    "BECKMAN_VI_CELL_XR": {
        "columns": ["Sample", "Total cells/ml", "Viable cells/ml", "Viability (%)"],
        "keywords": ["Vi-CELL XR", "Cell Viability Analyzer"],
        "file_patterns": [r".*\.(txt|xls|xlsx)$"],
        "confidence_boost": 20,
    },
    "THERMO_FISHER_NANODROP_EIGHT": {
        "columns": ["Sample Name", "Nucleic Acid Conc.", "A260", "A280", "260/280"],
        "keywords": ["NanoDrop Eight", "NanoDrop 8"],
        "file_patterns": [r".*\.(tsv|txt)$"],
        "confidence_boost": 15,
    },
    "THERMO_FISHER_NANODROP_ONE": {
        "columns": ["Sample Name", "Nucleic Acid(ng/uL)", "A260", "A280"],
        "keywords": ["NanoDrop One", "NanoDrop"],
        "file_patterns": [r".*\.(csv|xlsx)$"],
        "confidence_boost": 15,
    },
    "MOLDEV_SOFTMAX_PRO": {
        "columns": ["Well", "Sample", "Values", "Mean", "SD"],
        "keywords": ["SoftMax Pro", "SpectraMax", "Molecular Devices"],
        "file_patterns": [r".*\.txt$"],
        "confidence_boost": 15,
    },
    "BMG_MARS": {
        "columns": ["Well", "Content", "Conc.", "Mean", "SD", "CV"],
        "keywords": ["BMG LABTECH", "MARS", "CLARIOstar", "PHERAstar"],
        "file_patterns": [r".*\.(csv|txt)$"],
        "confidence_boost": 15,
    },
    "AGILENT_GEN5": {
        "columns": ["Well", "Read", "Time", "Temperature"],
        "keywords": ["Gen5", "BioTek", "Synergy"],
        "file_patterns": [r".*\.xlsx$"],
        "confidence_boost": 15,
    },
    "APPBIO_QUANTSTUDIO": {
        "columns": ["Well", "Sample Name", "Target Name", "CT", "Ct Mean"],
        "keywords": ["QuantStudio", "Applied Biosystems", "qPCR"],
        "file_patterns": [r".*\.xlsx$"],
        "confidence_boost": 15,
    },
}


def detect_instrument_type(
    filepath: str, file_content: Optional[str] = None
) -> Tuple[str, float]:
    """
    Auto-detect instrument type from file contents.

    Returns:
        Tuple of (vendor_name, confidence_score)
        confidence_score is 0-100
    """
    path = Path(filepath)
    filename = path.name.lower()
    extension = path.suffix.lower()

    # Read file content if not provided
    if file_content is None:
        try:
            if extension in [".xlsx", ".xls"]:
                pd = get_pandas()
                if pd:
                    df = pd.read_excel(filepath, nrows=50)
                    file_content = df.to_string() + "\n" + "\n".join(map(str, df.columns))
                else:
                    file_content = ""
            else:
                with open(filepath, "r", encoding="utf-8", errors="ignore") as f:
                    file_content = f.read(10000)  # First 10KB
        except Exception as e:
            print(f"Warning: Could not read file for detection: {e}")
            file_content = ""

    content_lower = file_content.lower()
    scores = {}

    for vendor, patterns in DETECTION_PATTERNS.items():
        score = 0

        # Check file extension patterns
        for pattern in patterns.get("file_patterns", []):
            if re.match(pattern, filename, re.IGNORECASE):
                score += 10
                break

        # Check column headers
        columns_found = 0
        for col in patterns.get("columns", []):
            if col.lower() in content_lower:
                columns_found += 1
        if columns_found > 0:
            score += min(50, columns_found * 15)

        # Check keywords
        for keyword in patterns.get("keywords", []):
            if keyword.lower() in content_lower:
                score += patterns.get("confidence_boost", 10)

        scores[vendor] = min(100, score)

    # Return best match
    if scores:
        best = max(scores.items(), key=lambda x: x[1])
        return best[0], best[1]

    return "UNKNOWN", 0


def convert_with_allotropy(filepath: str, vendor_name: str) -> Optional[Dict[str, Any]]:
    """
    Convert file using allotropy library.

    Returns:
        ASM dictionary or None if conversion fails
    """
    Vendor, allotrope_from_file, _ = get_allotropy()

    if Vendor is None:
        print(
            "Warning: allotropy not installed. Run: pip install allotropy --break-system-packages"
        )
        return None

    try:
        vendor = getattr(Vendor, vendor_name, None)
        if vendor is None:
            print(f"Warning: Vendor {vendor_name} not found in allotropy")
            return None

        asm = allotrope_from_file(filepath, vendor)
        return asm
    except Exception as e:
        print(f"Allotropy conversion failed: {e}")
        return None


def get_deterministic_timestamp(filepath: str) -> str:
    """
    Get deterministic timestamp for file.
    Uses file modification time for reproducibility.

    Returns:
        ISO format timestamp string
    """
    try:
        path = Path(filepath)
        mtime = path.stat().st_mtime
        return datetime.fromtimestamp(mtime).isoformat()
    except Exception:
        return "TIMESTAMP_NOT_AVAILABLE"


def calculate_file_hash(filepath: str) -> str:
    """Calculate SHA256 hash of file for provenance tracking."""
    try:
        with open(filepath, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    except Exception:
        return "HASH_NOT_AVAILABLE"


def get_library_version(library: str) -> str:
    """Get version of installed library."""
    try:
        return importlib.metadata.version(library)
    except Exception:
        return "VERSION_NOT_AVAILABLE"


def add_provenance_metadata(
    asm: Dict[str, Any],
    filepath: str,
    vendor: str,
    confidence: float,
    used_fallback: bool,
    warnings: Optional[list] = None,
) -> Dict[str, Any]:
    """
    Add provenance metadata to ASM for reproducibility and audit trail.

    This metadata enables:
    - Reproducing conversions months later
    - Determining which version generated data
    - Auditing data lineage for regulatory compliance
    """
    pd = get_pandas()

    asm["$conversion_metadata"] = {
        "skill_version": "1.0.0",
        "allotropy_version": get_library_version("allotropy"),
        "pandas_version": pd.__version__ if pd else "NOT_INSTALLED",
        "conversion_timestamp_utc": datetime.utcnow().isoformat(),
        "input_file_sha256": calculate_file_hash(filepath),
        "input_file_size_bytes": Path(filepath).stat().st_size,
        "input_file_name": Path(filepath).name,
        "parser_used": "fallback" if used_fallback else "allotropy",
        "detection_confidence": confidence,
        "vendor_detected": vendor,
        "warnings": warnings or [],
    }

    return asm


def flexible_parse(filepath: str, detected_type: str) -> Optional[Dict[str, Any]]:
    """
    Flexible fallback parser when allotropy fails.
    Creates ASM-like structure from parsed data.

    **WARNING:** This parser creates simplified ASM that:
    - Does NOT distinguish raw vs. calculated data
    - LACKS instrument control parameters (temperature, wavelengths, etc.)
    - MAY NOT be compatible with regulatory requirements (GxP)
    - Should be used for exploratory analysis only, not production LIMS import
    """
    pd = get_pandas()
    if pd is None:
        print("Warning: pandas not installed for flexible parsing")
        return None

    path = Path(filepath)
    extension = path.suffix.lower()

    try:
        # Read file based on extension
        if extension in [".xlsx", ".xls"]:
            df = pd.read_excel(filepath, engine="openpyxl")
        elif extension == ".tsv":
            df = pd.read_csv(filepath, sep="\t")
        elif extension == ".csv":
            df = pd.read_csv(filepath)
        else:
            df = pd.read_csv(filepath, sep=None, engine="python")

        # Build ASM-like structure
        asm = build_flexible_asm(df, detected_type, filepath)
        return asm

    except Exception as e:
        print(f"Flexible parsing failed: {e}")
        return None


def build_flexible_asm(df, detected_type: str, filepath: str) -> Dict[str, Any]:
    """
    Build ASM-like JSON structure from parsed DataFrame.
    """
    timestamp = get_deterministic_timestamp(filepath)

    # Determine technique from detected type
    technique = "generic"
    if "VI_CELL" in detected_type:
        technique = "cell-counting"
    elif "NANODROP" in detected_type:
        technique = "spectrophotometry"
    elif detected_type in ["MOLDEV_SOFTMAX_PRO", "BMG_MARS", "AGILENT_GEN5"]:
        technique = "plate-reader"
    elif "QUANTSTUDIO" in detected_type:
        technique = "pcr"

    # Build base structure
    asm = {
        "$asm.manifest": {
            "vocabulary": ["http://purl.allotrope.org/voc/afo/REC/2023/09/"],
            "contexts": [
                "http://purl.allotrope.org/json-ld/afo-context-REC-2023-09.jsonld"
            ],
        },
        f"{technique}-aggregate-document": {
            "device-system-document": {
                "device-identifier": "FLEXIBLE_PARSER",
                "product-manufacturer": (
                    detected_type.split("_")[0] if "_" in detected_type else "Unknown"
                ),
            },
            f"{technique}-document": [
                {
                    "measurement-aggregate-document": {
                        "measurement-time": timestamp,
                        "measurement-document": [],
                    }
                }
            ],
        },
    }

    # Add measurements from DataFrame
    measurements = asm[f"{technique}-aggregate-document"][f"{technique}-document"][0][
        "measurement-aggregate-document"
    ]["measurement-document"]

    for _, row in df.iterrows():
        meas = {}
        for col in df.columns:
            value = row[col]
            if pd.notna(value):
                # Clean column name
                clean_col = str(col).lower().replace(" ", "-").replace("_", "-")
                clean_col = re.sub(r"[^a-z0-9-]", "", clean_col)

                # Handle numeric values
                if isinstance(value, (int, float)):
                    meas[clean_col] = {"value": value, "unit": "(unitless)"}
                else:
                    meas[clean_col] = str(value)

        if meas:
            measurements.append(meas)

    return asm
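
# Illustrative output shape (an assumption for documentation, not emitted by
# any code above): a one-row DataFrame with columns "Sample" and
# "Viability (%)" from a Vi-CELL file would produce roughly:
#
#   "cell-counting-aggregate-document": {
#       ...,
#       "cell-counting-document": [{
#           "measurement-aggregate-document": {
#               "measurement-time": "<file mtime, ISO format>",
#               "measurement-document": [
#                   {"sample": "S1",
#                    "viability-": {"value": 95.2, "unit": "(unitless)"}}
#               ]
#           }
#       }]
#   }
#
# Note that "Viability (%)" becomes the key "viability-": spaces are replaced
# with hyphens and the regex strips characters outside [a-z0-9-].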


def main():
    """Main entry point."""
    import argparse

    parser = argparse.ArgumentParser(
        description="Convert instrument data to ASM format"
    )
    parser.add_argument("input", help="Input file path")
    parser.add_argument(
        "--vendor", help="Vendor enum name (auto-detected if not provided)"
    )
    parser.add_argument(
        "--output", "-o", help="Output file path (default: <input>.asm.json)"
    )
    parser.add_argument(
        "--flatten", action="store_true", help="Also generate flattened CSV"
    )
    parser.add_argument(
        "--allow-fallback",
        action="store_true",
        help="Allow fallback to simplified parser (reduced metadata)",
    )
    parser.add_argument(
        "--skip-validation",
        action="store_true",
        help="Skip automatic validation (not recommended)",
    )
    parser.add_argument(
        "--force",
        action="store_true",
        help="Force conversion even with low confidence detection",
    )

    args = parser.parse_args()

    input_path = Path(args.input)
    if not input_path.exists():
        print(f"Error: File not found: {args.input}")
        sys.exit(1)

    warnings = []

    # Detect or use provided vendor
    if args.vendor:
        vendor = args.vendor.upper()
        confidence = 100
        print(f"Using specified vendor: {vendor}")
    else:
        vendor, confidence = detect_instrument_type(str(input_path))
        print(f"Detected instrument: {vendor} (confidence: {confidence}%)")

        # Enforce confidence thresholds
        if confidence < 30:
            print(
                f"ERROR: Detection confidence too low ({confidence}%). Cannot proceed."
            )
            print("Please specify --vendor explicitly.")
            sys.exit(1)
        elif confidence < 60:
            warning_msg = f"WARNING: Low confidence detection ({confidence}%)."
            print(warning_msg)
            warnings.append(warning_msg)
            if not args.force:
                print("Use --force to proceed anyway (not recommended).")
                sys.exit(1)

    # Try allotropy first
    asm = convert_with_allotropy(str(input_path), vendor)
    used_fallback = False

    # Fall back to flexible parser
    if asm is None:
        print("\n" + "=" * 60)
        print("ALLOTROPY PARSING FAILED - USING REDUCED METADATA PARSER")
        print("=" * 60)
        print("Output will lack:")
        print("  - Calculated data traceability")
        print("  - Device control settings")
        print("  - Data processing metadata")
        print("\nNot suitable for:")
        print("  - Regulatory submissions")
        print("  - LIMS import with validation")
        print("=" * 60 + "\n")

        if not args.allow_fallback:
            print(
                "ERROR: Allotropy parsing failed. Use --allow-fallback to continue with"
            )
            print("simplified parser, but note that output will lack required metadata")
            print("for GxP compliance.")
            sys.exit(1)

        asm = flexible_parse(str(input_path), vendor)
        used_fallback = True
        warnings.append("Used fallback parser - reduced metadata")

    if asm is None:
        print("Error: Could not convert file")
        sys.exit(1)

    # Add provenance metadata
    asm = add_provenance_metadata(
        asm, str(input_path), vendor, confidence, used_fallback, warnings
    )

    # Determine output path
    if args.output:
        output_path = Path(args.output)
    else:
        output_path = input_path.with_suffix(".asm.json")

    # Write to temporary file first
    temp_path = output_path.with_suffix(".tmp")

    try:
        with open(temp_path, "w") as f:
            json.dump(asm, f, indent=2, default=str)

        # Validate unless skipped
        if not args.skip_validation:
            print("Running validation...")
            try:
                from validate_asm import validate_asm

                result = validate_asm(str(temp_path))

                if not result.is_valid():
                    print("\n" + "=" * 60)
                    print("VALIDATION FAILED")
                    print("=" * 60)
                    for error in result.errors:
                        print(f"ERROR: {error}")
                    for warning in result.warnings:
                        print(f"WARNING: {warning}")
                    print("=" * 60)

                    # Remove temp file
                    temp_path.unlink()
                    print("\nValidation failed. Output file not created.")
                    sys.exit(1)
                else:
                    if result.warnings:
                        print("\nValidation warnings:")
                        for warning in result.warnings:
                            print(f"  WARNING: {warning}")
                    print("Validation passed.")
            except ImportError:
                print(
                    "Warning: validate_asm.py not found. Skipping validation. "
                    "Consider adding validation script."
                )

        # Move temp file to final location
        temp_path.replace(output_path)
        print(f"ASM output written to: {output_path}")

    except Exception:
        # Clean up temp file on error, then re-raise preserving the traceback
        if temp_path.exists():
            temp_path.unlink()
        raise

    # Optionally flatten
    if args.flatten:
        try:
            from flatten_asm import flatten_asm_to_csv
        except ImportError:
            print("Warning: flatten_asm.py not found. Skipping flattened CSV.")
        else:
            flat_path = input_path.with_suffix(".flat.csv")
            flatten_asm_to_csv(asm, str(flat_path))
            print(f"Flattened CSV written to: {flat_path}")


if __name__ == "__main__":
    main()

```
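
The detection heuristic in `detect_instrument_type` can be exercised in isolation. The sketch below mirrors its scoring rules (+10 for a file-pattern match, 15 points per recognized column header capped at 50, plus the vendor's `confidence_boost` for each keyword hit) using the NanoDrop One entry. It is a standalone illustration, not part of the skill itself:

```python
import re

# Simplified pattern entry mirroring the THERMO_FISHER_NANODROP_ONE rules.
PATTERN = {
    "columns": ["Sample Name", "Nucleic Acid(ng/uL)", "A260", "A280"],
    "keywords": ["NanoDrop One", "NanoDrop"],
    "file_patterns": [r".*\.(csv|xlsx)$"],
    "confidence_boost": 15,
}

def score(filename: str, content: str) -> int:
    content_lower = content.lower()
    s = 0
    # Extension check: +10 for a matching file pattern.
    if any(re.match(p, filename, re.IGNORECASE) for p in PATTERN["file_patterns"]):
        s += 10
    # Column headers: 15 points each, capped at 50.
    found = sum(1 for c in PATTERN["columns"] if c.lower() in content_lower)
    if found:
        s += min(50, found * 15)
    # Keywords: each substring hit adds the vendor's confidence boost.
    for kw in PATTERN["keywords"]:
        if kw.lower() in content_lower:
            s += PATTERN["confidence_boost"]
    return min(100, s)

# .csv extension (+10), two headers found (+30), and "NanoDrop One"
# matches both keywords as substrings (+15 each): 70 total.
print(score("readings.csv", "NanoDrop One\nSample Name,A260\n"))  # → 70
```

Note that overlapping keywords such as "NanoDrop One" and "NanoDrop" both match the same text, so a single header line can earn the boost twice; this is why the final score is clamped to 100.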
