transcript-analyzer
This skill analyzes meeting transcripts to extract decisions, action items, opinions, questions, and terminology using Cerebras AI (llama-3.3-70b). Use this skill when the user asks to analyze a transcript, extract action items from meetings, find decisions in conversations, build glossaries from discussions, or summarize key points from recorded meetings.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install glebis-claude-skills-transcript-analyzer
Repository
Skill path: transcript-analyzer
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: glebis.
This is a mirrored public skill entry; review the repository before installing it into production workflows.
What it helps with
- Install transcript-analyzer into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/glebis/claude-skills before adding transcript-analyzer to shared team environments
- Use transcript-analyzer for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: transcript-analyzer
description: This skill analyzes meeting transcripts to extract decisions, action items, opinions, questions, and terminology using Cerebras AI (llama-3.3-70b). Use this skill when the user asks to analyze a transcript, extract action items from meetings, find decisions in conversations, build glossaries from discussions, or summarize key points from recorded meetings.
---
# Transcript Analyzer
## Overview
Analyze meeting transcripts using AI to automatically extract and categorize:
- **Decisions** - Explicit agreements or choices made
- **Action Items** - Tasks assigned to people
- **Opinions** - Viewpoints expressed but not agreed upon
- **Questions** - Unresolved questions raised
- **Terms** - Domain-specific terminology for glossary
## Prerequisites
Before first use, install dependencies:
```bash
cd ~/.claude/skills/transcript-analyzer/scripts && npm install
```
## Usage
To analyze a transcript:
```bash
cd ~/.claude/skills/transcript-analyzer/scripts && npm run cli -- <transcript-file> -o <output.md> [options]
```
### Options
| Option | Description |
|--------|-------------|
| `<file>` | Transcript file to analyze (first positional arg) |
| `-o, --output <path>` | Write markdown to file instead of stdout |
| `--include-transcript` | Include full transcript in output [default: off] |
| `--no-extractions` | Exclude extractions section |
| `--no-glossary` | Exclude glossary section |
| `--glossary <path>` | Custom glossary JSON path |
| `--skip-glossary` | Don't preload glossary terms |
| `--max-terms <num>` | Limit glossary suggestions |
| `--chunk-size <num>` | Override chunk size (default: 3000) |
## Examples
### Basic Analysis
```bash
cd ~/.claude/skills/transcript-analyzer/scripts && npm run cli -- /path/to/meeting.md -o /path/to/analysis.md
```
### Include Original Transcript
```bash
cd ~/.claude/skills/transcript-analyzer/scripts && npm run cli -- /path/to/meeting.md -o /path/to/analysis.md --include-transcript
```
### Extractions Only (No Glossary)
```bash
cd ~/.claude/skills/transcript-analyzer/scripts && npm run cli -- /path/to/meeting.md -o /path/to/analysis.md --no-glossary
```
### Analyze Specific Section
To analyze only part of a transcript, extract the section first:
```bash
sed -n '50,100p' /path/to/meeting.md > /tmp/section.md
cd ~/.claude/skills/transcript-analyzer/scripts && npm run cli -- /tmp/section.md -o /path/to/section-analysis.md
```
## Output Format
The tool generates markdown with:
1. **YAML Frontmatter** - Processing metadata:
- chunks processed
- extractions count by type
- new terms discovered
- model used (llama-3.3-70b via Cerebras)
- token usage (input/output/total)
2. **Extractions** - Categorized findings with confidence scores:
- Each extraction includes speaker (if identified), source snippet, and related terms
3. **Glossary** - Approved terms from existing glossary + suggested new terms with definitions
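For instance (with illustrative numbers), a two-chunk live run produces frontmatter shaped like this, following `buildMetadataSection` in `scripts/src/lib/markdown.ts`:

```yaml
---
chunks: 2
extractions: 14
new_terms: 5
mode: live
model: llama-3.3-70b
input_tokens: 6200
output_tokens: 1800
total_tokens: 8000
extractions_by_type:
  decision: 3
  action: 6
  opinion: 2
  question: 3
---
```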
## Configuration
The skill uses the Cerebras API, with the key stored in `scripts/.env`:
```
CEREBRAS_API_KEY=<your-key>
```
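If `CEREBRAS_API_KEY` is not set, `extract-service.ts` silently falls back to the built-in mock extractor. You can also force mock mode regardless of the key, which is useful for testing the pipeline offline:

```bash
cd ~/.claude/skills/transcript-analyzer/scripts && MOCK_TRANSCRIPT_PROCESSOR=true npm run cli -- /path/to/meeting.md -o /tmp/analysis.md
```

The resulting document will carry `mode: mock` in its frontmatter instead of `mode: live`.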
## Scripts
- `scripts/cli.ts` - Main CLI entry point
- `scripts/src/lib/extract-service.ts` - AI processing logic using Cerebras
- `scripts/src/lib/markdown.ts` - Markdown output generation
- `scripts/src/lib/term-utils.ts` - Term deduplication utilities
- `scripts/src/lib/mockExtractor.ts` - Mock mode for testing
- `scripts/src/types/index.ts` - TypeScript type definitions
- `scripts/data/glossary.json` - Default glossary storage
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/cli.ts
```typescript
#!/usr/bin/env node
import path from 'path';
import { promises as fs } from 'fs';
import { processTranscript } from '@/lib/extract-service';
import {
buildFullDocument
} from '@/lib/markdown';
import { Extraction, GlossaryTerm } from '@/types';
interface CLIOptions {
file?: string;
glossaryPath?: string;
skipGlossary?: boolean;
includeTranscript?: boolean;
noExtractions?: boolean;
noGlossary?: boolean;
output?: string;
maxTerms?: number;
chunkSize?: number;
}
async function main() {
const options = parseArgs(process.argv.slice(2));
if (!options.file) {
printHelp();
process.exit(1);
}
console.log('Reading transcript...');
const transcriptPath = path.resolve(options.file);
const transcript = await fs.readFile(transcriptPath, 'utf-8');
console.log(` Loaded ${transcript.length} characters`);
console.log('Loading glossary...');
const glossary = await loadGlossary(options);
console.log(` Found ${glossary.length} existing terms`);
console.log('Processing transcript with AI...');
const result = await processTranscript({
transcript,
glossary,
chunkSize: options.chunkSize,
maxTerms: options.maxTerms
});
console.log(`Processing complete`);
console.log(` Chunks: ${result.stats.chunks}`);
console.log(` Extractions: ${result.stats.totalExtractions}`);
console.log(` New terms: ${result.stats.newTerms}`);
if (result.stats.inputTokens && result.stats.outputTokens) {
console.log(` Tokens: ${result.stats.inputTokens + result.stats.outputTokens} total`);
}
console.log('Building markdown document...');
const markdown = buildFullDocument({
transcript,
extractions: result.extractions,
glossaryTerms: glossary,
suggestedTerms: result.suggestedTerms,
stats: result.stats,
includeTranscript: options.includeTranscript,
includeExtractions: !options.noExtractions,
includeGlossary: !options.noGlossary
});
if (options.output) {
const outputPath = path.resolve(options.output);
await fs.writeFile(outputPath, markdown, 'utf-8');
console.log(`Exported to ${outputPath}`);
} else {
console.log(markdown);
}
}
function parseArgs(argv: string[]): CLIOptions {
const options: CLIOptions = {};
const args = [...argv];
while (args.length > 0) {
const arg = args.shift()!;
if (!arg.startsWith('-') && !options.file) {
options.file = arg;
continue;
}
switch (arg) {
case '--file':
case '-f':
options.file = args.shift();
break;
case '--glossary':
options.glossaryPath = args.shift();
break;
case '--skip-glossary':
options.skipGlossary = true;
break;
case '--include-transcript':
options.includeTranscript = true;
break;
case '--no-extractions':
options.noExtractions = true;
break;
case '--no-glossary':
options.noGlossary = true;
break;
case '--output':
case '-o':
options.output = args.shift();
break;
case '--max-terms':
options.maxTerms = Number(args.shift());
break;
case '--chunk-size':
options.chunkSize = Number(args.shift());
break;
case '--help':
case '-h':
printHelp();
process.exit(0);
default:
if (!options.file) {
options.file = arg;
}
break;
}
}
return options;
}
async function loadGlossary(options: CLIOptions): Promise<GlossaryTerm[]> {
if (options.skipGlossary) return [];
const defaultPath = path.join(process.cwd(), 'data', 'glossary.json');
const target = options.glossaryPath
? path.resolve(options.glossaryPath)
: defaultPath;
try {
const data = await fs.readFile(target, 'utf-8');
return JSON.parse(data);
} catch {
return [];
}
}
function printHelp() {
console.log(`Usage: npm run cli -- <transcript.txt> -o <output.md> [options]
Outputs a Markdown document with metadata, extractions, and glossary.
Options:
<file> Transcript file to analyze (first positional arg)
-o, --output <path> Write markdown to a file instead of stdout
--glossary <path> Path to glossary JSON (defaults to data/glossary.json)
--skip-glossary Do not preload glossary terms
--include-transcript Include full transcript in output [default: off]
--no-extractions Exclude extractions section from output
--no-glossary Exclude glossary section from output
--max-terms <num> Maximum glossary suggestions
--chunk-size <num> Override chunk size for processing
-h, --help Show this help message
Examples:
npm run cli -- transcript.txt -o output.md
npm run cli -- transcript.txt -o output.md --include-transcript
npm run cli -- transcript.txt --no-glossary -o output.md
Output includes:
- YAML frontmatter with processing metadata (chunks, tokens, model)
- Full transcript [default: off, use --include-transcript]
- Extractions (decisions, actions, opinions, questions, terms) [default: on]
- Glossary terms (approved + suggested) [default: on]
`);
}
main().catch((error) => {
console.error(error);
process.exit(1);
});
```
### scripts/src/lib/extract-service.ts
```typescript
import OpenAI from 'openai';
import { Extraction, ExtractionType, GlossaryTerm } from '@/types';
import { mockProcessTranscript } from './mockExtractor';
import { areTermsEquivalent } from './term-utils';
const cerebrasApiKey = process.env.CEREBRAS_API_KEY;
const cerebras = cerebrasApiKey ? new OpenAI({
apiKey: cerebrasApiKey,
baseURL: 'https://api.cerebras.ai/v1'
}) : null;
const forceMock = process.env.MOCK_TRANSCRIPT_PROCESSOR === 'true';
export interface ProcessTranscriptOptions {
transcript: string;
glossary?: GlossaryTerm[];
chunkSize?: number;
maxTerms?: number;
}
export async function processTranscript({
transcript,
glossary = [],
chunkSize = 3000,
maxTerms
}: ProcessTranscriptOptions) {
if (!transcript) {
throw new Error('Transcript required');
}
if (!cerebras || forceMock) {
const mockResult = mockProcessTranscript(transcript, glossary, maxTerms);
return {
...mockResult,
stats: {
chunks: 1,
totalExtractions: mockResult.extractions.length,
byType: countByType(mockResult.extractions),
newTerms: mockResult.suggestedTerms.length,
mode: 'mock' as const
}
};
}
const MODEL = 'llama-3.3-70b';
const chunks = chunkTranscript(transcript, chunkSize);
const allExtractions: Extraction[] = [];
const suggestedTerms: GlossaryTerm[] = [];
let totalInputTokens = 0;
let totalOutputTokens = 0;
console.log(` Split into ${chunks.length} chunk${chunks.length > 1 ? 's' : ''}`);
for (let i = 0; i < chunks.length; i++) {
const chunk = chunks[i];
console.log(` Processing chunk ${i + 1}/${chunks.length}...`);
const extractionResponse = await cerebras.chat.completions.create({
model: MODEL,
max_tokens: 4096,
messages: [{
role: 'user',
content: buildExtractionPrompt(glossary, chunk, i + 1, chunks.length)
}]
});
totalInputTokens += extractionResponse.usage?.prompt_tokens || 0;
totalOutputTokens += extractionResponse.usage?.completion_tokens || 0;
const extractionText = extractionResponse.choices[0]?.message?.content || '';
const chunkExtractions = parseExtractions(extractionText, i);
allExtractions.push(...chunkExtractions);
console.log(` Found ${chunkExtractions.length} extractions`);
const termResponse = await cerebras.chat.completions.create({
model: MODEL,
max_tokens: 2048,
messages: [{
role: 'user',
content: buildTermPrompt(chunk)
}]
});
totalInputTokens += termResponse.usage?.prompt_tokens || 0;
totalOutputTokens += termResponse.usage?.completion_tokens || 0;
const termText = termResponse.choices[0]?.message?.content || '';
const chunkTerms = parseTerms(termText, i);
suggestedTerms.push(...chunkTerms);
console.log(` Found ${chunkTerms.length} terms`);
}
const mergedExtractions = deduplicateExtractions(allExtractions);
const mergedTerms = limitTerms(deduplicateTerms(suggestedTerms, glossary), maxTerms);
return {
extractions: mergedExtractions,
suggestedTerms: mergedTerms,
stats: {
chunks: chunks.length,
totalExtractions: mergedExtractions.length,
byType: countByType(mergedExtractions),
newTerms: mergedTerms.length,
mode: 'live' as const,
model: MODEL,
inputTokens: totalInputTokens,
outputTokens: totalOutputTokens
}
};
}
function buildExtractionPrompt(
glossary: GlossaryTerm[],
chunk: string,
index: number,
total: number
) {
return `You are analyzing a meeting transcript. Extract the following with confidence scores (0-100):
1. DECISIONS - explicit agreements or choices made (type: "decision")
2. ACTION ITEMS - tasks assigned to people (type: "action")
3. OPINIONS - viewpoints expressed but not agreed upon (type: "opinion")
4. QUESTIONS - unresolved questions raised (type: "question")
5. TERMS - domain-specific terminology that should be in a glossary (type: "term")
For each extraction, provide:
- type: one of decision/action/opinion/question/term
- content: the extracted content
- confidence: 0-100 score (how certain you are this is correctly classified)
- speaker: who said it (if identifiable)
- sourceSnippet: exact quote from transcript (max 200 chars)
- relatedTerms: any domain terms used
GLOSSARY CONTEXT (use these definitions, suggest new terms not in this list):
${formatGlossary(glossary)}
Return JSON array of extractions. Be conservative - only extract clear items with confidence > 60.
TRANSCRIPT CHUNK ${index}/${total}:
${chunk}`;
}
function buildTermPrompt(chunk: string) {
return `Analyze this transcript chunk for domain-specific terminology that should be in a glossary.
For each term found:
- term: the term or phrase
- definition: inferred definition from context
- aliases: alternative ways it's referred to
Only include terms that are:
1. Domain-specific (not common words)
2. Used multiple times OR defined in context
3. Would help someone new understand the discussion
Return JSON array of terms.
TRANSCRIPT CHUNK:
${chunk}`;
}
function chunkTranscript(text: string, maxChars: number): string[] {
const chunks: string[] = [];
const lines = text.split('\n');
let currentChunk = '';
for (const line of lines) {
if (currentChunk.length + line.length > maxChars && currentChunk.length > 0) {
chunks.push(currentChunk.trim());
currentChunk = line;
} else {
currentChunk += '\n' + line;
}
}
if (currentChunk.trim()) {
chunks.push(currentChunk.trim());
}
return chunks;
}
function formatGlossary(terms: GlossaryTerm[]): string {
if (terms.length === 0) return 'No existing glossary terms.';
return terms.map(t => `- ${t.term}: ${t.definition}`).join('\n');
}
function parseExtractions(text: string, chunkIndex: number): Extraction[] {
try {
const jsonMatch = text.match(/\[[\s\S]*\]/);
if (!jsonMatch) return [];
const parsed = JSON.parse(jsonMatch[0]);
return parsed.map((item: Record<string, unknown>, idx: number) => ({
id: `ext-${chunkIndex}-${idx}`,
type: (item.type as ExtractionType) || 'opinion',
content: String(item.content || ''),
confidence: Number(item.confidence) || 50,
speaker: item.speaker ? String(item.speaker) : undefined,
sourceSnippet: String(item.sourceSnippet || item.source_snippet || ''),
relatedTerms: Array.isArray(item.relatedTerms) ? item.relatedTerms : []
}));
} catch {
return [];
}
}
function parseTerms(text: string, chunkIndex: number): GlossaryTerm[] {
try {
const jsonMatch = text.match(/\[[\s\S]*\]/);
if (!jsonMatch) return [];
const parsed = JSON.parse(jsonMatch[0]);
return parsed.map((item: Record<string, unknown>, idx: number) => ({
id: `term-${chunkIndex}-${idx}`,
term: String(item.term || ''),
definition: String(item.definition || ''),
aliases: Array.isArray(item.aliases) ? item.aliases : [],
firstMentioned: '',
frequency: 1,
approved: false
}));
} catch {
return [];
}
}
function deduplicateExtractions(extractions: Extraction[]): Extraction[] {
const seen = new Map<string, Extraction>();
for (const ext of extractions) {
const key = `${ext.type}-${ext.content.toLowerCase().slice(0, 50)}`;
const existing = seen.get(key);
if (!existing || ext.confidence > existing.confidence) {
seen.set(key, ext);
}
}
return Array.from(seen.values()).sort((a, b) => b.confidence - a.confidence);
}
function deduplicateTerms(terms: GlossaryTerm[], existingGlossary: GlossaryTerm[]): GlossaryTerm[] {
const seenLabels: string[] = [];
existingGlossary.forEach(term => {
seenLabels.push(term.term);
term.aliases?.forEach(alias => seenLabels.push(alias));
});
const deduped: GlossaryTerm[] = [];
for (const term of terms) {
if (!term.term?.trim()) continue;
if (seenLabels.some(label => areTermsEquivalent(label, term.term))) {
continue;
}
term.frequency = term.frequency ?? 1;
const existing = deduped.find(t => areTermsEquivalent(t.term, term.term));
if (existing) {
existing.frequency += term.frequency;
continue;
}
deduped.push(term);
seenLabels.push(term.term);
term.aliases?.forEach(alias => seenLabels.push(alias));
}
return deduped.sort((a, b) => b.frequency - a.frequency);
}
function countByType(extractions: Extraction[]): Record<string, number> {
const counts: Record<string, number> = {};
for (const ext of extractions) {
counts[ext.type] = (counts[ext.type] || 0) + 1;
}
return counts;
}
function limitTerms(terms: GlossaryTerm[], maxTerms?: number) {
if (typeof maxTerms !== 'number' || maxTerms <= 0) {
return terms;
}
return terms.slice(0, maxTerms);
}
```
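Since chunk boundaries affect what the model sees, it helps to know exactly how `chunkTranscript` splits text: it accumulates whole lines and never cuts mid-line. Here is the same logic copied out of `extract-service.ts` as a standalone snippet for experimentation:

```typescript
// Same logic as chunkTranscript in extract-service.ts: accumulate whole
// lines until adding the next one would exceed maxChars, then start a
// new chunk.
function chunkTranscript(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  const lines = text.split('\n');
  let currentChunk = '';
  for (const line of lines) {
    if (currentChunk.length + line.length > maxChars && currentChunk.length > 0) {
      chunks.push(currentChunk.trim());
      currentChunk = line;
    } else {
      currentChunk += '\n' + line;
    }
  }
  if (currentChunk.trim()) {
    chunks.push(currentChunk.trim());
  }
  return chunks;
}

// Three 3-character lines with an 8-character budget: the first two lines
// share a chunk, the third overflows into its own.
console.log(chunkTranscript('aaa\nbbb\nccc', 8)); // [ 'aaa\nbbb', 'ccc' ]
```

Note that a single line longer than `maxChars` still becomes its own oversized chunk, since splits only happen between lines.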
### scripts/src/lib/markdown.ts
```typescript
import { Extraction, GlossaryTerm, ProcessingStats } from '@/types';
export function buildExtractionsMarkdown(extractions: Extraction[]): string {
if (extractions.length === 0) {
return '# Extractions\n\nNo extractions available.';
}
const grouped = extractions.reduce<Record<string, Extraction[]>>((acc, extraction) => {
const key = extraction.type;
acc[key] = acc[key] || [];
acc[key].push(extraction);
return acc;
}, {});
const sections = Object.entries(grouped)
.sort(([a], [b]) => a.localeCompare(b))
.map(([type, items]) => {
const header = `## ${capitalize(type)}`;
const body = items
.map((item, idx) => {
const speaker = item.speaker ? `**Speaker:** ${item.speaker}\n` : '';
const terms = item.relatedTerms && item.relatedTerms.length > 0
? `**Related Terms:** ${item.relatedTerms.join(', ')}\n`
: '';
const source = item.sourceSnippet ? `> ${item.sourceSnippet}` : '';
return `### ${idx + 1}. ${item.content}\n${speaker}${terms}**Confidence:** ${item.confidence}%\n${source}`;
})
.join('\n\n');
return `${header}\n\n${body}`;
});
return ['# Extractions', ...sections].join('\n\n');
}
export function buildGlossaryMarkdown(
approved: GlossaryTerm[],
suggested: GlossaryTerm[] = []
): string {
const approvedSection = approved.length > 0
? approved
.map((term, idx) => formatTerm(idx + 1, term))
.join('\n\n')
: 'No approved terms available.';
const suggestedSection = suggested.length > 0
? suggested
.map((term, idx) => formatTerm(idx + 1, term))
.join('\n\n')
: 'No suggested terms available.';
return [
'# Glossary',
'## Approved Terms',
approvedSection,
'## Suggested Terms',
suggestedSection
].join('\n\n');
}
export function buildCombinedMarkdown(
extractions: Extraction[],
approved: GlossaryTerm[],
suggested: GlossaryTerm[] = []
): string {
const parts = [
buildExtractionsMarkdown(extractions),
buildGlossaryMarkdown(approved, suggested)
];
return parts.join('\n\n---\n\n');
}
export interface FullDocumentOptions {
transcript: string;
extractions: Extraction[];
glossaryTerms: GlossaryTerm[];
suggestedTerms: GlossaryTerm[];
stats: ProcessingStats;
includeTranscript?: boolean;
includeExtractions?: boolean;
includeGlossary?: boolean;
}
export function buildFullDocument(options: FullDocumentOptions): string {
const {
transcript,
extractions,
glossaryTerms,
suggestedTerms,
stats,
includeTranscript = false,
includeExtractions = true,
includeGlossary = true
} = options;
const parts: string[] = [];
// Metadata section
parts.push(buildMetadataSection(stats));
// Optional transcript section
if (includeTranscript) {
parts.push('# Transcript\n\n' + transcript);
}
// Optional extractions section
if (includeExtractions && extractions.length > 0) {
parts.push(buildExtractionsMarkdown(extractions));
}
// Optional glossary section
if (includeGlossary && (glossaryTerms.length > 0 || suggestedTerms.length > 0)) {
parts.push(buildGlossaryMarkdown(glossaryTerms, suggestedTerms));
}
return parts.join('\n\n---\n\n');
}
function buildMetadataSection(stats: ProcessingStats): string {
const lines = [
'---',
`chunks: ${stats.chunks}`,
`extractions: ${stats.totalExtractions}`,
`new_terms: ${stats.newTerms}`,
`mode: ${stats.mode}`
];
if (stats.model) {
lines.push(`model: ${stats.model}`);
}
if (stats.inputTokens !== undefined) {
lines.push(`input_tokens: ${stats.inputTokens}`);
}
if (stats.outputTokens !== undefined) {
lines.push(`output_tokens: ${stats.outputTokens}`);
}
if (stats.inputTokens && stats.outputTokens) {
lines.push(`total_tokens: ${stats.inputTokens + stats.outputTokens}`);
}
const typeBreakdown = Object.entries(stats.byType)
.map(([type, count]) => ` ${type}: ${count}`)
.join('\n');
if (typeBreakdown) {
lines.push(`extractions_by_type:\n${typeBreakdown}`);
}
lines.push('---');
return lines.join('\n');
}
function formatTerm(index: number, term: GlossaryTerm) {
const aliases = term.aliases && term.aliases.length > 0
? `**Aliases:** ${term.aliases.join(', ')}\n`
: '';
return `### ${index}. ${term.term}\n${term.definition}\n\n${aliases}**First Mentioned:** ${term.firstMentioned || 'n/a'}\n**Frequency:** ${term.frequency}`;
}
function capitalize(value: string) {
return value.charAt(0).toUpperCase() + value.slice(1);
}
```
### scripts/src/lib/term-utils.ts
```typescript
export function normalizeTermKey(value: string): string {
return value
.toLowerCase()
.replace(/[^a-z0-9]/g, '')
.trim();
}
export function areTermsEquivalent(a: string, b: string): boolean {
const normA = normalizeTermKey(a);
const normB = normalizeTermKey(b);
if (!normA || !normB) return false;
if (normA === normB) return true;
return levenshtein(normA, normB) <= 2;
}
function levenshtein(a: string, b: string): number {
const aLen = a.length;
const bLen = b.length;
if (aLen === 0) return bLen;
if (bLen === 0) return aLen;
const prev = new Array(bLen + 1).fill(0);
const curr = new Array(bLen + 1).fill(0);
for (let j = 0; j <= bLen; j++) {
prev[j] = j;
}
for (let i = 1; i <= aLen; i++) {
curr[0] = i;
for (let j = 1; j <= bLen; j++) {
const cost = a[i - 1] === b[j - 1] ? 0 : 1;
curr[j] = Math.min(
prev[j] + 1,
curr[j - 1] + 1,
prev[j - 1] + cost
);
}
for (let j = 0; j <= bLen; j++) {
prev[j] = curr[j];
}
}
return curr[bLen];
}
```
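The fuzzy matching used by both the live and mock pipelines is easy to probe in isolation. The snippet below copies the three functions from `term-utils.ts` (lightly reformatted): normalization strips case and punctuation, and the Levenshtein threshold of 2 lets close variants such as plurals match.

```typescript
function normalizeTermKey(value: string): string {
  // Strips everything outside [a-z0-9]. Note that non-Latin terms
  // (e.g. the Cyrillic entries in the sample glossary) normalize to ''
  // and therefore never fuzzy-match anything.
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]/g, '')
    .trim();
}

function areTermsEquivalent(a: string, b: string): boolean {
  const normA = normalizeTermKey(a);
  const normB = normalizeTermKey(b);
  if (!normA || !normB) return false;
  if (normA === normB) return true;
  // An edit distance of 2 tolerates plurals and small typos.
  return levenshtein(normA, normB) <= 2;
}

// Standard two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  const aLen = a.length;
  const bLen = b.length;
  if (aLen === 0) return bLen;
  if (bLen === 0) return aLen;
  const prev = new Array(bLen + 1).fill(0);
  const curr = new Array(bLen + 1).fill(0);
  for (let j = 0; j <= bLen; j++) prev[j] = j;
  for (let i = 1; i <= aLen; i++) {
    curr[0] = i;
    for (let j = 1; j <= bLen; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    for (let j = 0; j <= bLen; j++) prev[j] = curr[j];
  }
  return curr[bLen];
}

console.log(areTermsEquivalent('MCP-Server', 'MCP servers'));          // true
console.log(areTermsEquivalent('Oldowan tools', 'Oldowan tradition')); // false
```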
### scripts/src/lib/mockExtractor.ts
```typescript
import { Extraction, ExtractionType, GlossaryTerm } from '@/types';
import { areTermsEquivalent, normalizeTermKey } from './term-utils';
export function mockProcessTranscript(
transcript: string,
glossary: GlossaryTerm[],
maxTerms?: number
) {
const sentences = splitTranscript(transcript);
const extracted: Extraction[] = [];
sentences.forEach((sentence, idx) => {
const parsed = classifySentence(sentence, glossary, idx);
if (parsed) {
extracted.push(parsed);
}
});
const suggestedTerms = buildGlossarySuggestions(transcript, glossary, maxTerms);
return {
extractions: extracted,
suggestedTerms
};
}
function splitTranscript(transcript: string): string[] {
return transcript
.split(/\n+|(?<=[.!?])\s+/)
.map(part => part.trim())
.filter(Boolean);
}
function classifySentence(
sentence: string,
glossary: GlossaryTerm[],
index: number
): Extraction | null {
let working = sentence;
let speaker: string | undefined;
const speakerMatch = working.match(/^([A-Za-z][A-Za-z\s]{1,30}):\s*(.+)$/);
if (speakerMatch) {
speaker = speakerMatch[1].trim();
working = speakerMatch[2].trim();
}
const lowered = working.toLowerCase();
let type: ExtractionType | null = null;
let confidence = 65;
if (/[?？]$/.test(working)) {
type = 'question';
confidence = 90;
} else if (/\b(decision|decide|agreed|agreement|approve)\b/.test(lowered)) {
type = 'decision';
confidence = 80;
} else if (/\b(action item|follow up|will|need to|let's|assign|task)\b/.test(lowered)) {
type = 'action';
confidence = 78;
} else if (/\b(i think|i believe|feels like|i guess|should)\b/.test(lowered)) {
type = 'opinion';
confidence = 70;
}
if (!type) {
return null;
}
const related = glossary
.filter(term => containsTerm(lowered, term.term))
.map(term => term.term);
return {
id: `mock-ext-${index}`,
type,
content: working,
confidence,
speaker,
sourceSnippet: working.slice(0, 200),
relatedTerms: related
};
}
function containsTerm(content: string, term: string) {
return content.includes(term.toLowerCase());
}
export function buildGlossarySuggestions(
transcript: string,
glossary: GlossaryTerm[],
maxTerms?: number
): GlossaryTerm[] {
const existingLabels: string[] = [];
glossary.forEach(term => {
existingLabels.push(term.term);
term.aliases?.forEach(alias => existingLabels.push(alias));
});
const tokens = transcript.match(/\b[A-Z][A-Za-z0-9]+(?:\s+[A-Z][A-Za-z0-9]+)?\b/g) || [];
const aggregates: { label: string; count: number }[] = [];
tokens.forEach(token => {
const normalized = token.trim();
if (normalized.length < 3) return;
if (!normalizeTermKey(normalized)) return;
if (existingLabels.some(label => areTermsEquivalent(label, normalized))) {
return;
}
const entry = aggregates.find(item => areTermsEquivalent(item.label, normalized));
if (entry) {
entry.count += 1;
} else {
aggregates.push({ label: normalized, count: 1 });
}
});
const sorted = aggregates.sort((a, b) => b.count - a.count);
const limit = typeof maxTerms === 'number' && maxTerms > 0 ? maxTerms : 5;
return sorted.slice(0, limit).map((info, idx) => ({
id: `mock-term-${idx}`,
term: info.label,
definition: deriveDefinition(info.label, transcript),
aliases: [],
firstMentioned: 'mock',
frequency: info.count,
approved: false
}));
}
function deriveDefinition(term: string, transcript: string): string {
const pattern = new RegExp(`[^.!?]*${escapeRegExp(term)}[^.!?]*[.!?]`, 'i');
const match = transcript.match(pattern);
if (match) {
return match[0].trim();
}
return `Term mentioned in transcript context as "${term}".`;
}
function escapeRegExp(value: string) {
return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
```
### scripts/src/types/index.ts
```typescript
export type ExtractionType = 'decision' | 'action' | 'opinion' | 'question' | 'term';
export interface Extraction {
id: string;
type: ExtractionType;
content: string;
confidence: number; // 0-100
speaker?: string;
sourceSnippet: string;
sourceTimestamp?: string;
relatedTerms?: string[];
}
export interface GlossaryTerm {
id: string;
term: string;
definition: string;
aliases: string[];
firstMentioned: string; // meeting ID
frequency: number;
approved: boolean;
}
export interface TranscriptMeeting {
id: string;
title: string;
date: string;
filePath: string;
extractions: Extraction[];
suggestedTerms: GlossaryTerm[];
}
export interface ProcessingResult {
meeting: TranscriptMeeting;
glossaryUpdates: GlossaryTerm[];
}
export interface ProcessingStats {
chunks: number;
totalExtractions: number;
byType: Record<string, number>;
newTerms: number;
mode: 'live' | 'mock';
inputTokens?: number;
outputTokens?: number;
model?: string;
}
```
### scripts/data/glossary.json
```json
[
{
"id": "term-1764268209078",
"term": "Oldowan tradition",
"definition": "The earliest known stone tool technology, dating from 2.6+ million years ago, characterized by simple stone tools that required planning and foresight to manufacture",
"aliases": [
"Oldowan tools"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268210888",
"term": "Acheulean tradition",
"definition": "A stone tool technology spanning 1.7-0.1 million years ago, characterized by symmetrical hand axes that demanded greater motor control and cognitive sophistication than earlier tools",
"aliases": [
"Acheulean hand axes"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268212895",
"term": "embodied cognition",
"definition": "The concept that thinking occurs through physical interaction with the world, particularly relevant to tool-making where cognitive development happens through hands-on manipulation and creation",
"aliases": [
"embodied knowledge"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268214559",
"term": "manual dexterity",
"definition": "Fine motor control and hand-brain coordination that was both required for and developed by tool-making, creating evolutionary pressure for increasingly sophisticated motor capabilities",
"aliases": [
"fine motor control"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268218126",
"term": "executive function",
"definition": "Higher-level cognitive processes including planning, foresight, and problem-solving that were enhanced through tool-making activities and drove brain development",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268256469",
"term": "Homo faber",
"definition": "Hannah Arendt's philosophical concept defining humans by their fundamental ability to shape their environment and fate through making tools and objects, positioning tool-making as core human identity rather than mere capability",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268257592",
"term": "cognitive-physical feedback loop",
"definition": "The evolutionary cycle where tool-making drove brain development through enhanced executive function and motor skills, which in turn enabled more sophisticated tool creation, creating selective pressure for larger brains",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268683191",
"term": "MCP",
"definition": "Model Context Protocol - a standard for connecting AI models to external data sources and services, allowing models to access content from various applications and databases",
"aliases": [
"MCP-ΡΠ΅ΡΠ²Π΅ΡΡ",
"MCP-ΡΠ΅ΡΠ²Π΅Ρ",
"MCP-ΡΠΊΠΈ"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268684581",
"term": "ΡΠΊΠΈΠ»Π»Ρ",
"definition": "ΠΠΎΠ΄ΡΠ»ΡΠ½ΡΠ΅ Π²ΠΎΠ·ΠΌΠΎΠΆΠ½ΠΎΡΡΠΈ, ΡΠ°ΡΡΠΈΡΡΡΡΠΈΠ΅ ΡΡΠ½ΠΊΡΠΈΠΎΠ½Π°Π»ΡΠ½ΠΎΡΡΡ Claude - ΠΏΠ°ΠΏΠΊΠΈ Ρ ΡΠ°ΠΉΠ»Π°ΠΌΠΈ, ΡΠΎΠ΄Π΅ΡΠΆΠ°ΡΠΈΠ΅ ΠΏΡΠΎΠΌΠΏΡΡ ΠΈ ΠΊΠΎΠ΄ Π΄Π»Ρ Π²ΡΠΏΠΎΠ»Π½Π΅Π½ΠΈΡ ΡΠΏΠ΅ΡΠΈΡΠΈΡΠ΅ΡΠΊΠΈΡ
Π·Π°Π΄Π°Ρ",
"aliases": [
"agent skills",
"Claude skills",
"ΡΠΊΠΈΠ»Π»"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268685606",
"term": "ΠΠ»ΠΎΠ΄",
"definition": "AI-Π°ΡΡΠΈΡΡΠ΅Π½Ρ ΠΎΡ Anthropic, ΠΈΡΠΏΠΎΠ»ΡΠ·ΡΠ΅ΠΌΡΠΉ Π΄Π»Ρ ΠΏΡΠΎΠ³ΡΠ°ΠΌΠΌΠΈΡΠΎΠ²Π°Π½ΠΈΡ ΠΈ Π΄ΡΡΠ³ΠΈΡ
Π·Π°Π΄Π°Ρ",
"aliases": [
"Claude",
"ΠΠ»ΠΎΠ΄ ΠΠΎΠ΄",
"Claude Code"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268686333",
"term": "ΠΊΠΎΠ½ΡΠ΅ΠΊΡΡ",
"definition": "ΠΠ±ΡΠ°Ρ ΠΏΠ°ΠΌΡΡΡ ΡΠ΅ΡΡΠΈΠΈ Ρ ΠΌΠΎΠ΄Π΅Π»ΡΡ, Π² ΠΊΠΎΡΠΎΡΡΡ ΠΏΠΎΠΏΠ°Π΄Π°Π΅Ρ ΠΈΡΡΠΎΡΠΈΡ Π²Π·Π°ΠΈΠΌΠΎΠ΄Π΅ΠΉΡΡΠ²ΠΈΠΉ ΠΈ ΠΊΠΎΡΠΎΡΠ°Ρ Π²Π»ΠΈΡΠ΅Ρ Π½Π° ΠΏΡΠΎΠΈΠ·Π²ΠΎΠ΄ΠΈΡΠ΅Π»ΡΠ½ΠΎΡΡΡ",
"aliases": [
"ΠΎΠ±ΡΠΈΠΉ ΠΊΠΎΠ½ΡΠ΅ΠΊΡΡ"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268687364",
"term": "ΡΠΊΠΈΠ»Π»",
"definition": "A self-contained package containing all actions and materials needed to teach Claude to perform a specific task, similar to a skill for a person. Can include text files, scripts, templates, and other resources.",
"aliases": [
"skill"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268689054",
"term": "software 3.0",
"definition": "AI-driven programming approach where interactions are managed through prompts and AI agents rather than traditional code",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268690067",
"term": "Obsidian",
"definition": "A note-taking application that stores files locally, which can be integrated with AI assistants through MCP",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268690993",
"term": "Firecrawl",
"definition": "A tool or service used for web scraping and content extraction, mentioned in context of a skill that can generate scientific templates",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268692040",
"term": "LLM",
"definition": "Large Language Model - AI systems that can process and generate text, mentioned as an alternative to Claude for certain tasks",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268695398",
"term": "Claude",
"definition": "An AI assistant that can write code and scripts, as referenced in the context of writing programming scripts",
"aliases": [
"him"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268697671",
"term": "skill (ΡΠΊΠΈΠ»Π»)",
"definition": "A coding automation feature in Claude that can execute complex multi-step tasks, often converting from software 1.0 utilities and scripts to software 3.0 AI-driven tools",
"aliases": [
"skills",
"ΡΠΊΠΈΠ»Π»Ρ"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764268705680",
"term": "Claude Code",
"definition": "A command-line interface tool for interacting with Claude AI, distinct from Claude Desktop which is a GUI application",
"aliases": [
"ΠΊΠ»ΠΎΠ΄-ΠΊΠΎΠ΄",
"CloudCode",
"ΠΠ»ΠΎΠ΄ ΠΠΎΠ΄"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764269937176",
"term": "ΡΡΠ°ΡΡΠ°ΠΏ",
"definition": "A startup company or early-stage business venture, often technology-focused",
"aliases": [
"ΡΡΡΠ°ΡΠ°ΠΏ"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764269939004",
"term": "ΠΏΡΠ΅Π΄ΠΏΡΠΈΠ½ΠΈΠΌΠ°ΡΠ΅Π»ΡΡΡΠ²ΠΎ",
"definition": "Entrepreneurship - the activity of starting and running business ventures",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764269940881",
"term": "ΡΠ΅Π½Π½ΠΎΡΡΠΈ ΠΎΡΠΈΠ΅Π½ΡΠΈΡΠΎΠ²Π°Π½Π½ΡΡ
ΠΏΡΠ΅Π΄ΠΏΡΠΈΠ½ΠΈΠΌΠ°ΡΠ΅Π»Π΅ΠΉ",
"definition": "Values-oriented entrepreneurs who focus on meaningful impact rather than just profit",
"aliases": [
"ΠΏΡΠΎ ΡΠ΅Π»ΠΎΠ²Π΅ΠΊΠ° ΠΈ ΡΡΠ΅Π½Π½ΠΎΡΡΠ½ΡΠ΅ ΠΈΡΡΠΎΡΠΈΠΈ"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764269944073",
"term": "ΡΡΠ°ΡΡΠ°ΠΏ",
"definition": "A new business venture or entrepreneurial project that requires focus on money and revenue generation to succeed",
"aliases": [
"startup"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764269945756",
"term": "self-concordance",
"definition": "A psychological term referring to the degree of alignment between one's actions and personal values",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764269948304",
"term": "self-perception",
"definition": "One's own perception or understanding of themselves and their characteristics",
"aliases": [],
"firstMentioned": "",
"frequency": 1,
"approved": true
},
{
"id": "term-1764269950037",
"term": "ΡΠ΅Π»Π΅ΠΏΠΎΠ»Π°Π³Π°Π½ΠΈΠ΅",
"definition": "A field of study in psychology focused on goal-setting, with several major schools of thought including one from Edwin Locke",
"aliases": [
"goal-setting"
],
"firstMentioned": "",
"frequency": 1,
"approved": true
}
]
```