compositional-acset-comparison
Compositional algorithm and data analysis via algebraic databases
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install plurigrid-asi-compositional-acset-comparison
Repository
Skill path: skills/compositional-acset-comparison
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: plurigrid.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install compositional-acset-comparison into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/plurigrid/asi before adding compositional-acset-comparison to shared team environments
- Use compositional-acset-comparison for development workflows
Original source / Raw SKILL.md
---
name: compositional-acset-comparison
description: Compositional algorithm and data analysis via algebraic databases
version: 1.0.0
---
# Compositional ACSet Comparison Skill
**Trit**: 0 (ERGODIC - Coordinator)
**Color**: #26D826 (Green)
**Domain**: Compositional algorithm/data analysis via algebraic databases
## Homoiconic Insight
In self-hosted Lisps, the boundary between data structures and algorithms dissolves:
- Code is data, data is code (homoiconicity)
- Evaluation time is phase-scoped (RED/BLUE/GREEN gadgets)
- Entanglement avoided by leaving phases open until explicitly closed
- Compositional structure preserved across algorithm ↔ data boundary
## Overview
Compare data structures along properties such as density/sparsity, dynamic/static behavior, and versioning strategy by modeling each system as an ACSet (attributed C-set) schema. The comparison dimensions are explored deterministically using Gay.jl-aided superrandom walks.
## Canonical Triads
```
schema-validation (-1) ⊗ compositional-acset-comparison (0) ⊗ gay-mcp (+1) = 0 ✓ [Property Analysis]
three-match (-1) ⊗ compositional-acset-comparison (0) ⊗ koopman-generator (+1) = 0 ✓ [Dynamic Traversal]
temporal-coalgebra (-1) ⊗ compositional-acset-comparison (0) ⊗ oapply-colimit (+1) = 0 ✓ [Versioning]
polyglot-spi (-1) ⊗ compositional-acset-comparison (0) ⊗ gay-mcp (+1) = 0 ✓ [Homoiconic Interop]
```
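The conservation rule behind these triads can be checked mechanically. The following is a minimal sketch in plain Julia, assuming the trit values shown in the list above; the names `trits`, `triads`, and `is_balanced` are illustrative and are not part of the skill's files.

```julia
# Trit assignments copied from the triads listed above.
trits = Dict(
    "schema-validation" => -1,
    "three-match" => -1,
    "temporal-coalgebra" => -1,
    "polyglot-spi" => -1,
    "compositional-acset-comparison" => 0,
    "gay-mcp" => 1,
    "koopman-generator" => 1,
    "oapply-colimit" => 1,
)

triads = [
    ("schema-validation", "compositional-acset-comparison", "gay-mcp"),
    ("three-match", "compositional-acset-comparison", "koopman-generator"),
    ("temporal-coalgebra", "compositional-acset-comparison", "oapply-colimit"),
    ("polyglot-spi", "compositional-acset-comparison", "gay-mcp"),
]

# A triad is balanced when its trits sum to 0 in GF(3), i.e. modulo 3.
is_balanced(triad) = mod(sum(trits[name] for name in triad), 3) == 0

all(is_balanced, triads)  # true
```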
## Golden Thread Walk Dimensions
Each step advances the hue by the golden angle (φ-angle, ≈137.508°), which keeps the twelve dimension colors well separated around the color wheel:
| Step | Dimension | Hex Color | Hue |
|------|-----------|-----------|-----|
| 1 | Storage Hierarchy | #EE2B2B | 0° |
| 2 | Density/Sparsity | #2BEE64 | 137.51° |
| 3 | Dynamic/Static | #9D2BEE | 275.02° |
| 4 | Versioning Strategy | #EED52B | 52.52° |
| 5 | Traversal Patterns | #2BCDEE | 190.03° |
| 6 | Index Structures | #EE2B94 | 327.54° |
| 7 | Compression | #5BEE2B | 105.05° |
| 8 | Query Model | #332BEE | 242.55° |
| 9 | Embedding Support | #EE6C2B | 20.06° |
| 10 | Interoperability | #2BEEA5 | 157.57° |
| 11 | Concurrency | #DE2BEE | 295.08° |
| 12 | Memory Model | #C5EE2B | 72.59° |
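The Hue column can be reproduced from the golden angle alone. This is a small sketch in plain Julia, not the Gay.jl implementation; `golden_angle` and `hue` are illustrative names, and the hex colors are not derived here because the saturation and lightness used for them are not stated.

```julia
# Golden angle in degrees: 360 * (1 - 1/φ), about 137.508°.
golden_angle = 360 * (1 - 1 / ((1 + sqrt(5)) / 2))

# Hue for the i-th step (1-based), wrapped into [0, 360).
hue(i) = mod((i - 1) * golden_angle, 360)

# Rounded to two decimals, as in the table.
hues = [round(hue(i); digits = 2) for i in 1:12]
```

Under this rule, step 4 wraps past 360° and lands at about 52.52°, matching the Versioning Strategy row.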
## Comparison Matrix: DuckDB vs LanceDB
### Dimension 1: Storage Hierarchy (#EE2B2B)
```
DuckDB                          LanceDB
──────                          ───────
Table                           Database
└─RowGroup (122K rows)          └─Table
  └─Column                        └─Manifest (version)
    └─Segment                       └─Fragment
      └─Block                         └─Column
                                        └─VectorColumn
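**ACSet Morphism Depth** (objects on the listed chain; the leaf nodes `Block` and `VectorColumn` from the diagram are not counted):
- DuckDB: 4 levels (Table→RowGroup→Column→Segment)
- LanceDB: 5 levels (Database→Table→Manifest→Fragment→Column)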
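The depth figures above can be recomputed from a parent map. This is a sketch in plain Julia that encodes only the chains listed in the two bullets; `duckdb_parent`, `lancedb_parent`, `chain_length`, and `depth` are hypothetical names, not functions from ACSets.jl or the skill's files.

```julia
# Each pair maps an object to its parent in the storage hierarchy.
duckdb_parent = Dict(:RowGroup => :Table, :Column => :RowGroup, :Segment => :Column)
lancedb_parent = Dict(:Table => :Database, :Manifest => :Table,
                      :Fragment => :Manifest, :Column => :Fragment)

# Number of objects on the path from `obj` up to the root.
function chain_length(parent, obj)
    n = 1
    while haskey(parent, obj)
        obj = parent[obj]
        n += 1
    end
    return n
end

# Depth of a hierarchy: the longest such path over all non-root objects.
depth(parent) = maximum(chain_length(parent, obj) for obj in keys(parent))

depth(duckdb_parent)   # 4
depth(lancedb_parent)  # 5
```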
### Dimension 2: Density/Sparsity (#2BEE64)
| Property | DuckDB | LanceDB |
|----------|--------|---------|
| **Default** | Dense columnar | Dense Arrow arrays |
| **Sparse Support** | Via NULL bitmask | Via Arrow validity bitmask |
| **Vector Sparsity** | N/A | Sparse via IVF partitioning |
| **Storage Efficiency** | ALP, ZSTD compression | Lance columnar format |
| **ACSet Rep** | `DenseFinColumn` | `DenseFinColumn` with `VectorColumn` extension |
**Density Formula**:
```julia
density(acset, obj) = nparts(acset, obj) / theoretical_max(acset, obj)
# DuckDB Segment: ~2048 rows per vector batch
# LanceDB Fragment: variable, optimized for vector search
```
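The formula leaves `theoretical_max` undefined. One way to read it, as a sketch in plain Julia, is to pass the capacity explicitly; the 2048 figure below is the batch size quoted in the comment above, while the row count of 1536 and the names `sparsity` and `valid_fraction` are made up for illustration.

```julia
# Fraction of slots in use; `capacity` stands in for `theoretical_max`.
function density(nrows::Integer, capacity::Integer)
    capacity > 0 || throw(ArgumentError("capacity must be positive"))
    return nrows / capacity
end

# Sparsity is the complement of density.
sparsity(nrows, capacity) = 1 - density(nrows, capacity)

# Density of a nullable column, read from a validity bitmask (true = present).
valid_fraction(mask::AbstractVector{Bool}) = count(mask) / length(mask)

density(1536, 2048)                        # 0.75
sparsity(1536, 2048)                       # 0.25
valid_fraction([true, false, true, true])  # 0.75
```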
### Dimension 3: Dynamic/Static (#9D2BEE)
| Property | DuckDB | LanceDB |
|----------|--------|---------|
| **Schema Evolution** | ALTER TABLE | Manifest versioning |
| **Row Updates** | In-place (TRANSIENT→PERSISTENT) | Append + compaction |
| **Index Updates** | Dynamic ART | Rebuild IVF partitions |
| **ACSet Mutation** | `set_subpart!`, `rem_part!` | Append-only, version chains |
**State Machine**:
```
DuckDB Segment: TRANSIENT ⟷ PERSISTENT (bidirectional)
LanceDB Manifest: V1 → V2 → V3 → ... (append-only chain)
```
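The two update models can be contrasted in a few lines. This is an illustrative sketch in plain Julia that encodes only the transitions drawn above; `segment_transitions`, `can_transition`, and `append_version` are hypothetical names and do not correspond to DuckDB or Lance APIs.

```julia
# Allowed segment transitions, read off the state-machine summary above.
segment_transitions = Set([(:TRANSIENT, :PERSISTENT), (:PERSISTENT, :TRANSIENT)])
can_transition(from, to) = (from, to) in segment_transitions

# Append-only chain: a new version is always the successor of the latest one,
# and earlier versions are never modified.
append_version(chain::Vector{Int}) = vcat(chain, isempty(chain) ? 1 : last(chain) + 1)

can_transition(:PERSISTENT, :TRANSIENT)  # true
append_version([1, 2])                   # [1, 2, 3]
```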
### Dimension 4: Versioning Strategy (#EED52B) ⭐ Lance SDK 1.0.0
**Critical Update (December 15, 2025)**: Lance SDK adopts SemVer 1.0.0
| Component | Versioning | Strategy |
|-----------|------------|----------|
| **Lance SDK** | SemVer 1.0.0 | MAJOR.MINOR.PATCH |
| **Lance File Format** | 2.1 | Binary compatibility, independent |
| **Lance Table Format** | Feature flags | Full backward compat, no linear versions |
| **Lance Namespace Spec** | Per-operation | Iceberg REST Catalog style |
**Key Insight**: Breaking SDK changes will NOT invalidate existing Lance data.
```julia
using Catlab  # provides @present and FreeSchema

# ACSet representation of versioning strategies
@present SchVersioning(FreeSchema) begin
    SDKVersion::Ob     # SemVer (1.0.0)
    FileFormat::Ob     # Binary compat (2.1)
    TableFormat::Ob    # Feature flags
    NamespaceSpec::Ob  # Per-operation

    # Morphisms: SDK ≠ Format
    sdk_file::Hom(SDKVersion, FileFormat)      # Many-to-one
    file_table::Hom(FileFormat, TableFormat)   # Independent
    table_ns::Hom(TableFormat, NamespaceSpec)  # Independent
end
```
**DuckDB Versioning**:
- Time travel via `AT (VERSION => n)` on versioned catalogs such as DuckLake; plain DuckDB tables keep no version history
- Extension versioning separate from core
### Dimension 5: Traversal Patterns (#2BCDEE)
| Pattern | DuckDB | LanceDB |
|---------|--------|---------|
| **Sequential Scan** | RowGroup→Column→Segment | Fragment→Column |
| **Index Scan** | ART navigation | IVF partition probe |
| **Vector Search** | N/A (extension) | Centroid→Partition→Rows |
| **Time Travel** | `AT (VERSION => n)` (versioned catalogs only) | `checkout(version)` |
**ACSet Incident Queries**:
```julia
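# DuckDB: find all segments in a column
segments = incident(duckdb_acset, col_id, :column)

# LanceDB: find all centroids for an index by following
# partition_index, then centroid_partition, backwards
partitions = incident(lancedb_acset, idx_id, :partition_index)
centroids = reduce(vcat,
    [incident(lancedb_acset, p, :centroid_partition) for p in partitions];
    init = Int[])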
```
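An incident query is a preimage lookup: given a morphism and a target part, return every part that maps to it. The sketch below shows the idea in plain Julia with morphisms stored as integer vectors. The data is made up for illustration, and `preimage` is a hypothetical stand-in for `incident`, not the ACSets.jl implementation.

```julia
# A morphism stored as a vector: segment_column[s] is the column of segment s.
segment_column = [1, 1, 2, 2, 2, 3]

# Preimage of `target` under the morphism: all parts that map to it.
preimage(hom::Vector{Int}, target::Int) = findall(==(target), hom)

preimage(segment_column, 2)  # [3, 4, 5]

# Two-hop query: partitions of an index, then centroids of those partitions.
partition_index = [1, 1, 2]         # partition -> index
centroid_partition = [1, 2, 2, 3]   # centroid  -> partition
centroids_of_index_1 = reduce(vcat,
    [preimage(centroid_partition, p) for p in preimage(partition_index, 1)];
    init = Int[])                   # [1, 2, 3]
```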
### Dimension 6: Index Structures (#EE2B94)
| Index Type | DuckDB | LanceDB |
|------------|--------|---------|
| **Primary** | None (heap) | None (Lance format) |
| **Secondary** | ART (Adaptive Radix Tree) | Scalar indexes |
| **Vector** | Extension (vss) | IVF_PQ, IVF_HNSW_SQ, IVF_HNSW_PQ |
| **Full-Text** | Extension (fts) | Native FTS index |
**ACSet Index Representation**:
```julia
# LanceDB vector index hierarchy (diagram, not executable)
# VectorIndex → Partition → Centroid
#      ↓ index_column
# VectorColumn → Column
```
### Dimension 7: Compression (#5BEE2B)
| Algorithm | DuckDB | LanceDB |
|-----------|--------|---------|
| **Numeric** | ALP (Adaptive Lossless floating-Point) | Arrow encoding |
| **String** | Dictionary, FSST | Dictionary |
| **General** | ZSTD, LZ4 | ZSTD |
| **Vector** | N/A | PQ (Product Quantization) |
### Dimension 8: Query Model (#332BEE)
| Aspect | DuckDB | LanceDB |
|--------|--------|---------|
| **Language** | SQL | Python/Rust API + SQL filter |
| **Optimization** | Volcano/push-based | Vector-first + filter |
| **Execution** | Vectorized (2048 batch) | Arrow RecordBatch |
| **Parallelism** | Morsel-driven | Partition-parallel |
### Dimension 9: Embedding Support (#EE6C2B)
| Feature | DuckDB | LanceDB |
|---------|--------|---------|
| **Native** | No | Yes (FixedSizeList<Float>) |
| **Generation** | UDF/Extension | EmbeddingFunction registry |
| **Storage** | ARRAY type | VectorColumn |
| **Search** | Extension (vss) | Native (IVF, HNSW) |
### Dimension 10: Interoperability (#2BEEA5)
| Format | DuckDB | LanceDB |
|--------|--------|---------|
| **Arrow** | Full support | Native (Lance = Arrow extension) |
| **Parquet** | Read/Write | Read (convert to Lance) |
| **CSV/JSON** | Read/Write | Via Arrow |
| **ACSets** | Via Tables.jl | Via Arrow → Tables.jl |
**Cross-Language (from ACSets Intertypes)**:
```julia
# Generate interoperable types
generate_module(DuckDBACSet, [PydanticTarget, JacksonTarget])
generate_module(LanceDBACSet, [PydanticTarget, JacksonTarget])
```
### Dimension 11: Concurrency (#DE2BEE)
| Aspect | DuckDB | LanceDB |
|--------|--------|---------|
| **Model** | MVCC | Optimistic (manifest-based) |
| **Writers** | Single (or WAL) | Single (append) |
| **Readers** | Unlimited concurrent | Unlimited concurrent |
| **Isolation** | Snapshot | Version snapshot |
### Dimension 12: Memory Model (#C5EE2B)
| Aspect | DuckDB | LanceDB |
|--------|--------|---------|
| **Buffer Pool** | BufferManager | Memory-mapped Arrow |
| **Eviction** | LRU | OS page cache |
| **Allocation** | Unified allocator | Arrow allocator |
| **Out-of-Core** | Automatic spill | Lazy loading |
## Interleaved 3-Stream Comparison
Using GF(3) conservation for balanced parallel analysis:
```
Stream 1 (Blue, -1): Validation/Constraints
#31945E → #B3DA86 → #8810F2 → #2F5194 → #2452AA → #245FB4
Stream 2 (Green, 0): Coordination/Transport
#6D59D2 → #9E2981 → #72E24F → #31C5B4 → #C04DDD → #1C8EEE
Stream 3 (Red, +1): Generation/Composition
#E22FA7 → #E812C8 → #6F68E6 → #25D840 → #DA387F → #A82358
```
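A round-robin interleaving of the three streams keeps every consecutive block of three balanced. The sketch below, in plain Julia, uses the color sequences listed above; the variable names are illustrative and the colors are copied as data, not regenerated from Gay.jl.

```julia
blue  = ["#31945E", "#B3DA86", "#8810F2", "#2F5194", "#2452AA", "#245FB4"]  # trit -1
green = ["#6D59D2", "#9E2981", "#72E24F", "#31C5B4", "#C04DDD", "#1C8EEE"]  # trit  0
red   = ["#E22FA7", "#E812C8", "#6F68E6", "#25D840", "#DA387F", "#A82358"]  # trit +1
stream_trits = (-1, 0, 1)

# Take one color from each stream per step, tagging it with its trit.
interleaved = [(trit, color) for step in zip(blue, green, red)
                             for (trit, color) in zip(stream_trits, step)]

# Every block of three consecutive entries sums to 0, so it is balanced in GF(3).
balanced = all(sum(first, interleaved[i:i+2]) == 0 for i in 1:3:length(interleaved))
```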
## Crystal Family Analogy
Data structures map to crystal symmetry:
| Crystal Family | Symmetry | DuckDB Analog | LanceDB Analog |
|----------------|----------|---------------|----------------|
| Cubic (#9E94DD) | Order 48 | RowGroup uniformity | Fragment uniformity |
| Hexagonal (#65F475) | Order 24 | Column types | Vector dimensions |
| Tetragonal (#E764F1) | Order 16 | Segment blocking | Partition structure |
| Orthorhombic (#2ADC56) | Order 8 | Type system | Index types |
| Monoclinic (#CD7B61) | Order 4 | Compression | Quantization |
| Triclinic (#E4338F) | Order 2 | Raw storage | Raw Arrow |
## Hierarchical Control Palette
Powers PCT cascade for harmonious comparison:
```
Level 5 (Program): "Compare DuckDB vs LanceDB"
↓ sets reference for
Level 4 (Transition): Dimension sequence [30° steps]
↓ sets reference for
Level 3 (Configuration): Property relationships
↓ sets reference for
Level 2 (Sensation): Individual metrics
↓ sets reference for
Level 1 (Intensity): Numeric values
```
Colors: #B322C0 → #D5268C → #DC3946 → #DF884A → #E0D551 → #A3E04E
## XY Model Phenomenology
At τ=0.5 (ordered phase, τ < τ_c=0.893):
- Smooth field, defects bound in pairs
- High valence, disentangled
- Antivortex at (4,3): #C33567
**Interpretation**: Both DuckDB and LanceDB are in "ordered phase" - mature, production-ready systems with well-defined structures.
## Usage
```julia
using ACSets, Catlab
# Load both schemas
include("DuckDBACSet.jl")
include("LanceDBACSet.jl")
# Compare morphism structures
compare_schemas(SchDuckDB, SchLanceDB)
# Analyze density
density_analysis = map([SchDuckDB, SchLanceDB]) do sch
    Dict(ob => sparsity_metric(sch, ob) for ob in obs(sch))
end
# Traverse with Gay.jl colors
for (i, dimension) in enumerate(DIMENSIONS)
    color = gay_color_at(1000000, i)
    analyze_dimension(dimension, color)
end
```
## Skill Files
| File | Purpose | Gay.jl Seed |
|------|---------|-------------|
| `DuckDBACSet.jl` | Schema for DuckDB storage layer | 1000000 |
| `LanceDBACSet.jl` | Schema for LanceDB vector store | 1000000 |
| `IrreversibleMorphisms.jl` | Analysis of lossy morphisms | 2000000 |
| `SideBySideComparison.jl` | Visual comparison tables | 3000000 |
| `ComparisonUtils.jl` | 12-dimension comparison utilities | 1000000 |
| `GhristCoverage.jl` | Persistent homology coverage analysis | 4000000 |
| `ColoringFunctor.jl` | Schema coloring + GF(3) verification | 4000000 |
| `GeometricMorphism.jl` | Presheaf topos translation analysis | 4000000 |
## Ghrist Persistent Homology Integration
Based on de Silva & Ghrist "Coverage in Sensor Networks via Persistent Homology":
**AM Radio Coverage Analogy**:
- Radio stations = Schema objects (Table, Column, etc.)
- Coverage radius = Morphism composability range
- Signal overlap = Translatable concepts between schemas
- Dead zones = Irreversible information loss
**Betti Numbers for Schemas**:
- β₀: Connected components (isolated subsystems)
- β₁: Coverage holes (information flow gaps)
- β₂: Enclosed voids (unreachable regions)
**Persistent Holes (never die)**:
- 🔴 `parent_manifest`: Temporal irreversibility (version chain)
- 🔴 `source_column`: Semantic irreversibility (embedding loss)
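For a schema viewed as an undirected graph (objects as vertices, morphisms as edges), β₀ and β₁ follow from a component count and the relation β₁ = E − V + β₀; β₂ is zero for any graph. The sketch below is plain Julia, not `GhristCoverage.jl` and not a persistent-homology computation. It assumes the LanceDB chain Database→Table→Manifest→Fragment→Column plus a `parent_manifest` self-loop on Manifest, which is one reading of the version chain named above.

```julia
# Betti numbers (β₀, β₁) of an undirected multigraph with vertices 1:nv.
function betti(nv::Int, edges::Vector{Tuple{Int,Int}})
    parent = collect(1:nv)          # union-find forest
    function find(x)
        while parent[x] != x
            x = parent[x]
        end
        return x
    end
    for (a, b) in edges
        ra, rb = find(a), find(b)
        ra != rb && (parent[ra] = rb)
    end
    b0 = count(i -> find(i) == i, 1:nv)   # connected components
    b1 = length(edges) - nv + b0          # independent cycles
    return (b0, b1)
end

# 1 = Database, 2 = Table, 3 = Manifest, 4 = Fragment, 5 = Column
chain = [(2, 1), (3, 2), (4, 3), (5, 4)]
betti(5, chain)                      # (1, 0): connected, no holes
betti(5, vcat(chain, [(3, 3)]))      # (1, 1): the self-loop adds one cycle
```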
## Geometric Morphism Analysis
For presheaf topoi PSh(SchDuckDB) and PSh(SchLanceDB):
**Essential Image** (lossless translation):
- Table ↔ Table ✓
- Column ↔ Column ✓
**Partial Coverage** (lossy translation):
- RowGroup ~ Fragment
- VectorColumn → Column (loses vector semantics)
**Dead Zones** (no translation):
- Segment → ??? (DuckDB-only)
- Manifest ← ??? (LanceDB-only)
- VectorIndex ← ??? (LanceDB-only)
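The three coverage classes above can be recovered with set operations once the lossy pairings are written down. This sketch is plain Julia over object names only; it is not a geometric-morphism computation and not the contents of `GeometricMorphism.jl`. The object sets and pairings are taken from the lists in this section.

```julia
duckdb_objs  = Set([:Table, :RowGroup, :Column, :Segment])
lancedb_objs = Set([:Table, :Manifest, :Fragment, :Column, :VectorColumn, :VectorIndex])

# Lossless: objects present under the same name on both sides.
lossless = intersect(duckdb_objs, lancedb_objs)

# Lossy pairings as (DuckDB object, LanceDB object), from the list above.
lossy = [(:RowGroup, :Fragment), (:Column, :VectorColumn)]

# Dead zones: whatever is neither shared nor paired.
duckdb_only  = setdiff(duckdb_objs, lossless, first.(lossy))
lancedb_only = setdiff(lancedb_objs, lossless, last.(lossy))
```

With these inputs, `duckdb_only` contains only `:Segment` and `lancedb_only` contains `:Manifest` and `:VectorIndex`, matching the dead zones listed above.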
## References
- [de Silva & Ghrist, Coverage in Sensor Networks via Persistent Homology](https://www2.math.upenn.edu/~ghrist/preprints/persistent.pdf)
- [Lance SDK 1.0.0 Announcement](https://lancedb.github.io/lancedb/blog/announcing-lance-sdk-1.0.0/) (December 15, 2025)
- [DuckDB Architecture](https://duckdb.org/internals/overview)
- [ACSets.jl Documentation](https://algebraicjulia.github.io/ACSets.jl/)
- [StructuredDecompositions.jl](https://github.com/AlgebraicJulia/StructuredDecompositions.jl)
- [Gay.jl Deterministic Colors](https://github.com/bmorphism/Gay.jl)
## Scientific Skill Interleaving
This skill connects to the K-Dense-AI/claude-scientific-skills ecosystem:
### Annotated Data
- **anndata** [○] via bicomodule
- Hub for annotated matrices
### Bibliography References
- `general`: 734 citations in bib.duckdb
## Cat# Integration
This skill maps to **Cat# = Comod(P)** as a bicomodule in the equipment structure:
```
Trit: 0 (ERGODIC)
Home: Prof
Poly Op: ⊗
Kan Role: Adj
Color: #26D826
```
### GF(3) Naturality
The skill participates in triads satisfying:
```
(-1) + (0) + (+1) ≡ 0 (mod 3)
```
This ensures compositional coherence in the Cat# equipment structure.