
zhipu-tts

Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chinese text synthesis with multiple voice personas, speed control, and output formats.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 3,070.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C (4.6).

Composite score: 4.6.

Best-practice grade: A (92.0).

Install command

npx @skill-hub/cli install openclaw-skills-zhipu-tts

Repository

openclaw/skills

Skill path: skills/franklu0819-lang/zhipu-tts


Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install zhipu-tts into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding zhipu-tts to shared team environments
  • Use zhipu-tts for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: zhipu-tts
description: Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Use when you need to convert text to audio files with various voice options. Supports Chinese text synthesis with multiple voice personas, speed control, and output formats.
metadata:
  {
    "openclaw":
      {
        "requires": { "bins": ["jq"], "env": ["ZHIPU_API_KEY"] },
      },
  }
---

# Zhipu AI Text-to-Speech

Convert Chinese text to natural-sounding speech using Zhipu AI's GLM-TTS model.

## Setup

**1. Get your API Key:**
Get a key from [Zhipu AI Console](https://bigmodel.cn/usercenter/proj-mgmt/apikeys)

**2. Set it in your environment:**
```bash
export ZHIPU_API_KEY="your-key-here"
```
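Before the first run, it can help to confirm both prerequisites listed in the skill metadata: `jq` on the `PATH` and `ZHIPU_API_KEY` in the environment. The sketch below is one way to do that; `check_prereqs` is a hypothetical helper name and is not part of the skill.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper (not part of the skill): report missing prerequisites.
# Returns 0 when everything is present, 1 otherwise.
check_prereqs() {
    local missing=0
    # jq is used by the script to build the JSON request body.
    if ! command -v jq >/dev/null 2>&1; then
        echo "Missing dependency: jq" >&2
        missing=1
    fi
    # The API key is read from the environment, never from a file.
    if [ -z "${ZHIPU_API_KEY:-}" ]; then
        echo "Missing environment variable: ZHIPU_API_KEY" >&2
        missing=1
    fi
    return "$missing"
}

# Usage: check_prereqs && bash scripts/text_to_speech.sh "你好"
```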

## Available Voices

### System Voices (Pre-built)

- **tongtong** (彤彤) - Default voice, balanced tone
- **chuichui** (锤锤) - Male voice, deeper tone
- **xiaochen** (小陈) - Young professional voice
- **jam** - 动动动物圈 Jam voice
- **kazi** - 动动动物圈 Kazi voice
- **douji** - 动动动物圈 Douji voice
- **luodo** - 动动动物圈 Luodo voice

## Usage

### Basic Text-to-Speech

Convert text to speech with default settings (tongtong voice, normal speed, WAV format):

```bash
bash scripts/text_to_speech.sh "你好,今天天气怎么样"
```

### Advanced Options

Specify voice, speed, format, and output filename:

```bash
bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav
```

**Parameters:**
- `text` (required): Chinese text to convert (max 1024 characters)
- `voice` (optional): tongtong (default), chuichui, xiaochen, jam, kazi, douji, luodo
- `speed` (optional): Speech speed from 0.5 to 2.0 (default: 1.0)
- `output_format` (optional): wav (default), pcm
- `output_file` (optional): Output filename (default: output.{format})
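
A minimal sketch of checking these parameters before calling the script, assuming the documented values and ranges are the ones to enforce. `validate_tts_args` is a hypothetical helper, not part of the skill; the bundled script itself only checks that the API key is set and the text is non-empty. Because the script passes `speed` to `jq --argjson`, a non-numeric speed would otherwise fail at the payload-building step.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper (not part of the skill): validate arguments locally.
# Argument order mirrors the script: text, voice, speed, output_format.
validate_tts_args() {
    local text=$1 voice=${2:-tongtong} speed=${3:-1.0} format=${4:-wav}

    [ -n "$text" ] || { echo "text is required" >&2; return 1; }
    # ${#text} counts characters only in a UTF-8 locale (bytes in the C locale).
    (( ${#text} <= 1024 )) || { echo "text exceeds 1024 characters" >&2; return 1; }

    case "$voice" in
        tongtong|chuichui|xiaochen|jam|kazi|douji|luodo) ;;
        *) echo "unknown voice: $voice" >&2; return 1 ;;
    esac

    # Bash has no floating-point comparison, so check the shape, then use awk.
    [[ $speed =~ ^[0-9]+(\.[0-9]+)?$ ]] || { echo "speed must be a number" >&2; return 1; }
    awk -v s="$speed" 'BEGIN { exit !(s >= 0.5 && s <= 2.0) }' \
        || { echo "speed must be between 0.5 and 2.0" >&2; return 1; }

    case "$format" in
        wav|pcm) ;;
        *) echo "unknown format: $format" >&2; return 1 ;;
    esac
}

# Usage: validate_tts_args "$text" xiaochen 1.2 wav && bash scripts/text_to_speech.sh "$text" xiaochen 1.2 wav
```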

## Voice Selection Guide

**Choose tongtong (default) for:**
- General purpose narration
- Professional presentations
- Balanced tone requirements

**Choose chuichui for:**
- Male voice needed
- Deeper, authoritative tone
- Documentary or formal content

**Choose xiaochen for:**
- Young, energetic tone
- Modern, casual content
- Friendly assistant vibe

**Choose jam/kazi/douji/luodo for:**
- Entertainment content
- Character voices
- Creative projects

## Speed Control

**Recommended speeds:**
- **0.8-1.0**: Clear, professional narration
- **1.0-1.2**: Natural conversational pace (default: 1.0)
- **1.2-1.5**: Energetic, upbeat delivery
- **1.5-2.0**: Fast-paced summaries (may reduce clarity)

## Output Formats

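**WAV (recommended):**
- Standard audio container
- Widely compatible
- Carries a header describing the sample rate and channel layout, so players can open it directly

**PCM:**
- Raw audio samples with no header
- Roughly the same size as uncompressed WAV, since it omits only the small header
- Requires conversion, or explicit format parameters, for playback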
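
Headerless PCM can usually be made playable by prepending a standard 44-byte WAV header. The sketch below does this in plain Bash. It assumes the PCM output is 16-bit little-endian mono at 24000 Hz; this document only states the sample rate, so the bit depth and channel count are assumptions to verify against your own output. `le_bytes` and `pcm_to_wav` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Print an integer as COUNT little-endian bytes, using octal escapes.
le_bytes() {
    local value=$1 count=$2 i
    for ((i = 0; i < count; i++)); do
        printf "\\$(printf '%03o' $(( (value >> (8 * i)) & 255 )))"
    done
}

# HYPOTHETICAL helper: wrap raw PCM in a canonical 44-byte WAV header.
# ASSUMED defaults: 24000 Hz, 1 channel, 16 bits per sample.
pcm_to_wav() {
    local pcm=$1 wav=$2
    local rate=${3:-24000} channels=${4:-1} bits=${5:-16}
    local data_size=$(( $(wc -c < "$pcm") ))
    local block_align=$(( channels * bits / 8 ))
    {
        printf 'RIFF'; le_bytes $((36 + data_size)) 4   # RIFF chunk size
        printf 'WAVEfmt '; le_bytes 16 4                # fmt chunk is 16 bytes
        le_bytes 1 2; le_bytes "$channels" 2            # format 1 = linear PCM
        le_bytes "$rate" 4; le_bytes $((rate * block_align)) 4
        le_bytes "$block_align" 2; le_bytes "$bits" 2
        printf 'data'; le_bytes "$data_size" 4
        cat "$pcm"                                      # samples, unchanged
    } > "$wav"
}

# Usage: pcm_to_wav output.pcm output.wav
```

If the converted file plays as noise or at the wrong pitch, the assumed bit depth, channel count, or sample rate is the first thing to check.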

## Examples

Create a professional greeting:

```bash
bash scripts/text_to_speech.sh "您好,感谢致电智能客服,请按1选择中文服务" tongtong 1.0 wav greeting.wav
```

Generate an energetic announcement:

```bash
bash scripts/text_to_speech.sh "热烈欢迎各位嘉宾参加今天的活动!" xiaochen 1.3 wav announcement.wav
```

Create a calm narration:

```bash
bash scripts/text_to_speech.sh "在这个宁静的夜晚,让我们一起欣赏美丽的星空" chuichui 0.9 wav narration.wav
```

## Character Limits

- Maximum input: **1024 characters** per request
- For longer texts, split into multiple segments
- Combine audio files post-generation
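
One simple way to split long text, sketched under the assumption that fixed-size chunks are acceptable: `split_text` (a hypothetical helper, not part of the skill) fills a `SEGMENTS` array with pieces of at most 1024 characters. It cuts at fixed offsets, so a chunk may end mid-sentence; splitting at punctuation would likely sound more natural.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper: split text into chunks of at most LIMIT characters.
# Results are stored in the global array SEGMENTS.
# Note: Bash counts characters (not bytes) only in a UTF-8 locale.
split_text() {
    local text=$1 limit=${2:-1024}
    local offset=0
    SEGMENTS=()
    while (( offset < ${#text} )); do
        SEGMENTS+=("${text:offset:limit}")
        offset=$((offset + limit))
    done
}

# Usage:
#   split_text "$long_text"
#   for segment in "${SEGMENTS[@]}"; do echo "${#segment} characters"; done
```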

## Audio Quality Tips

**Best practices:**
- Use punctuation for natural pauses (commas, periods)
- Break long sentences into shorter segments
- Use appropriate line breaks for paragraph pauses
- Test speed settings for your specific content

**Sample rate:** Generated audio uses 24000 Hz sampling rate for optimal quality.
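
The sample rate also gives a quick way to estimate clip length from file size. The sketch below assumes headerless 16-bit mono PCM (2 bytes per sample, 1 channel); only the 24000 Hz rate comes from this document, and `pcm_duration_ms` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper: estimate the duration of a raw PCM file in milliseconds.
# duration_ms = bytes * 1000 / (rate * bytes_per_sample * channels)
# ASSUMED defaults: 24000 Hz, 2 bytes per sample, 1 channel.
pcm_duration_ms() {
    local file=$1 rate=${2:-24000} bytes_per_sample=${3:-2} channels=${4:-1}
    local size=$(( $(wc -c < "$file") ))
    echo $(( size * 1000 / (rate * bytes_per_sample * channels) ))
}

# Usage: pcm_duration_ms output.pcm
```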

## Troubleshooting

**Text Length Issues:**
- Split texts longer than 1024 characters
- Process segments separately
- Combine using audio editing tools
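
The per-segment step can be sketched as a loop over the bundled script. `synthesize_segments` and the `part_NNN.wav` naming are hypothetical; `TTS_CMD` is an array so that the command can be replaced with a stub when testing without network access. Combining the results is left to an audio tool, because each WAV file has its own header and plain concatenation generally does not produce a valid file.

```bash
#!/usr/bin/env bash
# Command used to synthesize one segment; defaults to the skill's script.
TTS_CMD=(bash scripts/text_to_speech.sh)

# HYPOTHETICAL helper: synthesize each segment to part_001.wav, part_002.wav, ...
# Usage: synthesize_segments VOICE SPEED SEGMENT...
# Prints one output filename per line, in order; stops at the first failure.
synthesize_segments() {
    local voice=$1 speed=$2
    shift 2
    local index=0 segment out
    for segment in "$@"; do
        index=$((index + 1))
        out=$(printf 'part_%03d.wav' "$index")
        # The script echoes the filename on stdout; discard it and print our own.
        "${TTS_CMD[@]}" "$segment" "$voice" "$speed" wav "$out" >/dev/null || return 1
        echo "$out"
    done
}

# Usage: synthesize_segments tongtong 1.0 "第一段文字" "第二段文字"
```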

**Audio Quality Issues:**
- Check text encoding (use UTF-8)
- Verify punctuation placement
- Adjust speed settings
- Try different voices

**File Playback Issues:**
- Ensure format compatibility with your player
- WAV format works on most systems
- PCM may require conversion

## API Notes

- Responses are returned as audio files
- Watermarking enabled by default (can be disabled in account settings)
- No strict rate limiting documented
- Audio generation typically completes in 1-3 seconds


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/text_to_speech.sh

```bash
#!/bin/bash
# Zhipu AI Text-to-Speech Script
# Usage: ./text_to_speech.sh "text" [voice] [speed] [output_format] [output_file]

set -e

# Configuration
API_ENDPOINT="https://open.bigmodel.cn/api/paas/v4/audio/speech"

# Get API key from environment
if [ -z "$ZHIPU_API_KEY" ]; then
    echo "Error: ZHIPU_API_KEY environment variable is not set" >&2
    echo "" >&2
    echo "To fix:" >&2
    echo "1. Get a key from https://bigmodel.cn/usercenter/proj-mgmt/apikeys" >&2
    echo "2. Run: export ZHIPU_API_KEY=\"your-key\"" >&2
    exit 1
fi

# Parse arguments
TEXT="$1"
VOICE="${2:-tongtong}"
SPEED="${3:-1.0}"
OUTPUT_FORMAT="${4:-wav}"
OUTPUT_FILE="${5:-output.${OUTPUT_FORMAT}}"

# Validate text
if [ -z "$TEXT" ]; then
    echo "Usage: $0 \"text\" [voice] [speed] [output_format] [output_file]" >&2
    echo "" >&2
    echo "Examples:" >&2
    echo "  $0 \"你好,今天天气怎么样\"" >&2
    echo "  $0 \"欢迎使用智能语音服务\" xiaochen 1.2 wav greeting.wav" >&2
    echo "" >&2
    echo "Voices: tongtong (default), chuichui, xiaochen, jam, kazi, douji, luodo" >&2
    echo "Speed: 0.5-2.0 (default: 1.0)" >&2
    echo "Format: wav (default), pcm" >&2
    exit 1
fi

# Build request payload
PAYLOAD=$(jq -n \
    --arg model "glm-tts" \
    --arg input "$TEXT" \
    --arg voice "$VOICE" \
    --argjson speed "$SPEED" \
    --arg response_format "$OUTPUT_FORMAT" \
    '{
        model: $model,
        input: $input,
        voice: $voice,
        speed: $speed,
        response_format: $response_format
    }')

# Make API request
echo "Converting text to speech..." >&2
echo "Voice: $VOICE, Speed: $SPEED, Format: $OUTPUT_FORMAT" >&2
echo "" >&2

# -S keeps transport errors visible even though -s silences progress output
RESPONSE=$(curl -sS -X POST "$API_ENDPOINT" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $ZHIPU_API_KEY" \
    -d "$PAYLOAD" \
    --output "$OUTPUT_FILE" \
    -w "%{http_code}")

# Check for errors
if [ "$RESPONSE" != "200" ]; then
    echo "Error: HTTP $RESPONSE" >&2
    if [ -f "$OUTPUT_FILE" ]; then
        cat "$OUTPUT_FILE" >&2
        rm "$OUTPUT_FILE"
    fi
    exit 1
fi

echo "Audio saved to: $OUTPUT_FILE" >&2
echo "Text: $TEXT" >&2

# Get file info
if command -v file &> /dev/null; then
    FILE_INFO=$(file "$OUTPUT_FILE")
    echo "File info: $FILE_INFO" >&2
fi

if command -v ls &> /dev/null; then
    FILE_SIZE=$(ls -lh "$OUTPUT_FILE" | awk '{print $5}')
    echo "File size: $FILE_SIZE" >&2
fi

echo "$OUTPUT_FILE"

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### README.md

```markdown
# Zhipu AI TTS Skill

Text-to-speech conversion using Zhipu AI (BigModel) GLM-TTS model. Convert Chinese text to natural-sounding speech with multiple voice options.

## Features

- 🎙️ **Multiple Voices**: 7 different voice personas (tongtong, chuichui, xiaochen, jam, kazi, douji, luodo)
- ⚡ **Speed Control**: Adjustable speech speed from 0.5x to 2.0x
- 🎵 **Multiple Formats**: WAV and PCM output formats
- 🇨🇳 **Chinese Language**: Optimized for Mandarin Chinese synthesis
- 📝 **Long Text Support**: Up to 1024 characters per request
- 🔊 **High Quality**: 24000 Hz sampling rate for optimal audio quality

## Requirements

- `jq` - JSON processor
- `ZHIPU_API_KEY` environment variable

## Quick Start

```bash
# Install dependencies (if needed)
sudo apt-get install jq

# Set your API key
export ZHIPU_API_KEY="your-key-here"

# Convert text to speech (default settings)
bash scripts/text_to_speech.sh "你好,今天天气怎么样"

# With custom voice and speed
bash scripts/text_to_speech.sh "欢迎使用智能语音服务" xiaochen 1.2 wav greeting.wav
```

## Available Voices

- **tongtong** (彤彤) - Default balanced tone
- **chuichui** (锤锤) - Male voice, deeper tone
- **xiaochen** (小陈) - Young professional voice
- **jam** - 动动动物圈 Jam voice
- **kazi** - 动动动物圈 Kazi voice
- **douji** - 动动动物圈 Douji voice
- **luodo** - 动动动物圈 Luodo voice

## Use Cases

- 📚 Audiobook creation
- 🎮 Game character voices
- 📢 Announcement systems
- 🤖 Virtual assistants
- 🎬 Video dubbing
- 📻 Radio content generation

## Parameters

- `text` (required): Chinese text to convert (max 1024 characters)
- `voice` (optional): Voice persona (default: tongtong)
- `speed` (optional): Speech speed 0.5-2.0 (default: 1.0)
- `output_format` (optional): wav or pcm (default: wav)
- `output_file` (optional): Output filename (default: output.{format})

## Examples

```bash
# Professional greeting
bash scripts/text_to_speech.sh "您好,感谢致电智能客服" tongtong 1.0 wav greeting.wav

# Energetic announcement
bash scripts/text_to_speech.sh "热烈欢迎各位嘉宾!" xiaochen 1.3 wav announcement.wav

# Calm narration
bash scripts/text_to_speech.sh "在这个宁静的夜晚" chuichui 0.9 wav narration.wav
```

## Author

franklu0819-lang

## License

MIT

```

### _meta.json

```json
{
  "owner": "franklu0819-lang",
  "slug": "zhipu-tts",
  "displayName": "Zhipu AI TTS",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1771684745050,
    "commit": "https://github.com/openclaw/skills/commit/5cfba21d911fd636f0708cdda4c192ac4f144e43"
  },
  "history": []
}

```
