tts-skill

MiniMax TTS API: text-to-speech, voice cloning, and voice design

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 323.

Updated: March 20, 2026.

Overall rating: C (5.0).

Composite score: 5.0.

Best-practice grade: C (60.3).

Install command

npx @skill-hub/cli install notedit-happy-skills-tts-skill

Repository

notedit/happy-skills

Skill path: skills/utils/tts-skill

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack, Backend.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: notedit.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install tts-skill into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/notedit/happy-skills before adding tts-skill to shared team environments
Use tts-skill for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: tts-skill
description: MiniMax TTS API - text-to-speech, voice cloning, and voice design
metadata:
  tags: minimax, tts, voice, audio, speech
---

# MiniMax TTS Skill

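This skill wraps the MiniMax TTS API and supports text-to-speech, voice cloning, and voice design.

## Quick start

### 1. Configure the environment

Make sure the environment variable is set: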
```bash
export MINIMAX_API_KEY="your-api-key"
```

See [setup.md](rules/setup.md) for detailed configuration instructions.

### 2. Use the Python module

```python
import sys
import os

# Resolve the skill directory path
skill_dir = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(skill_dir, "assets"))

from minimax_tts import text_to_audio, list_voices, voice_clone, voice_design, play_audio
```

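## Feature overview

| Feature | Function | Description |
|------|------|------|
| Text-to-speech | `text_to_audio()` | Convert text into a speech audio file |
| List voices | `list_voices()` | Get the list of available voices |
| Voice cloning | `voice_clone()` | Clone a voice from an audio file |
| Voice design | `voice_design()` | Generate a voice from a text description |
| Play audio | `play_audio()` | Play an audio file |

## Detailed documentation

- [Environment setup](rules/setup.md) - API key and dependency installation
- [Text-to-speech](rules/text-to-audio.md) - TTS features in detail
- [Voice list](rules/list-voices.md) - available voices and filtering
- [Voice cloning](rules/voice-clone.md) - clone a custom voice
- [Voice design](rules/voice-design.md) - generate a voice from a description

## Quick examples

### Text-to-speech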
```python
text_to_audio(
    text="你好，欢迎使用 MiniMax TTS 服务！",
    voice_id="female-shaonv",
    output_path="./hello.mp3"
)
```
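
The module's `text_to_audio()` reports API and network failures through the returned dictionary rather than by raising, so a call like the one above is usually followed by a success check. A minimal sketch, assuming only the `success`, `file_path`, `error`, and `error_code` keys that this skill documents; `describe_result` is a hypothetical helper name, not part of the skill:

```python
def describe_result(result: dict) -> str:
    """Turn a text_to_audio()-style result dict into a one-line status message.

    Hypothetical helper: it relies only on the "success", "file_path",
    "error" and "error_code" keys described in this skill's documentation.
    """
    if result.get("success"):
        return f"saved to {result.get('file_path')}"
    code = result.get("error_code")
    suffix = f" (code {code})" if code is not None else ""
    return f"failed: {result.get('error', 'unknown error')}{suffix}"


# Hand-written dicts stand in for real results, so no API call is made here
print(describe_result({"success": True, "file_path": "./hello.mp3"}))
print(describe_result({"success": False, "error": "timeout"}))
```

In real use the argument would be the value returned by `text_to_audio(...)`.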

### List available voices
```python
voices = list_voices(voice_type="system")
for voice in voices:
    print(f"{voice['voice_id']}: {voice['name']}")
```

### Voice cloning
```python
voice_clone(
    voice_id="my-custom-voice",
    audio_file="./sample.mp3",
    voice_name="我的声音"
)
```

### Voice design
```python
voice_design(
    prompt="一个温柔的年轻女性声音，带有轻微的南方口音",
    preview_text="你好，这是我的声音"
)
```

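## Supported models

| Model | Description |
|------|------|
| speech-02-hd | HD version, best audio quality |
| speech-02-turbo | Turbo version, low latency |
| speech-01-hd | Legacy HD |
| speech-01-turbo | Legacy turbo |
| speech-2.6-hd | Version 2.6 HD |
| speech-2.6-turbo | Version 2.6 turbo |

## Common voice IDs

### System preset voices
- `female-shaonv` - young girl voice
- `female-yujie` - mature, confident female voice
- `female-chengshu` - mature female voice
- `male-qingnian` - young adult male voice
- `male-chengshu` - mature male voice

Use `list_voices()` to look up more voices.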


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### rules/setup.md

```markdown
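# Environment setup

## Get a MiniMax API key

1. Go to the [MiniMax Open Platform](https://platform.minimaxi.com/)
2. Sign up or log in
3. Create an application in the console
4. Obtain the API key

## Set environment variables

### macOS / Linux

Add the following to `~/.zshrc` or `~/.bashrc`: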

```bash
# MiniMax TTS configuration (required)
export MINIMAX_API_KEY="your-api-key-here"

# Optional settings
export MINIMAX_API_HOST="https://api.minimax.io"  # API host
export MINIMAX_OUTPUT_DIR="$HOME/Downloads/minimax"   # default output directory ("~" is not expanded inside double quotes)
```

Then reload the shell configuration:
```bash
source ~/.zshrc  # or: source ~/.bashrc
```

### Verify the configuration

```bash
echo $MINIMAX_API_KEY
```

## Install dependencies

```bash
pip install requests
```

## Test the connection

```python
import sys
import os

# Add the assets directory to the import path
skill_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, os.path.join(skill_dir, "assets"))

from minimax_tts import list_voices

# Test the API connection
# Note: list_voices() falls back to a built-in voice list when the request fails
voices = list_voices(voice_type="system")
print(f"Received {len(voices)} system voices")
```

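## Troubleshooting

### API key not set
```
ValueError: 请设置环境变量 MINIMAX_API_KEY
```
The message means "please set the environment variable MINIMAX_API_KEY". Fix: make sure the environment variable is set correctly, then reload the shell configuration.

### Network connection failure
Check the network connection and that the API host is correct.

### Insufficient permissions
Confirm that the API key has the required permissions.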

```
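
The variables described in rules/setup.md can also be resolved and checked in one place before any request is made. The sketch below is an illustration under stated assumptions, not the module's own code: `resolve_minimax_env` is a hypothetical helper, it takes the environment as a plain mapping so it can be tried without touching `os.environ`, and it expands a leading `~` in Python because the shell leaves `~` untouched inside double quotes.

```python
import os


def resolve_minimax_env(environ: dict) -> dict:
    """Resolve MiniMax settings from a mapping of environment variables.

    Hypothetical helper mirroring the variables in rules/setup.md.
    """
    api_key = environ.get("MINIMAX_API_KEY", "").strip()
    if not api_key:
        raise ValueError("MINIMAX_API_KEY is not set")
    # Drop a trailing slash so paths can be appended with a single "/"
    host = environ.get("MINIMAX_API_HOST", "https://api.minimax.io").rstrip("/")
    # expanduser turns a leading "~" into the home directory
    output_dir = os.path.expanduser(environ.get("MINIMAX_OUTPUT_DIR", os.getcwd()))
    return {"api_key": api_key, "api_host": host, "output_dir": output_dir}


settings = resolve_minimax_env({
    "MINIMAX_API_KEY": "your-api-key",
    "MINIMAX_OUTPUT_DIR": "~/Downloads/minimax",
})
print(settings["api_host"], settings["output_dir"])
```

Passing `dict(os.environ)` instead of the literal mapping would check the real environment.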

### rules/text-to-audio.md

```markdown
# Text to Audio

## Function signature

```python
def text_to_audio(
    text: str,
    voice_id: str = "female-shaonv",
    output_path: str = None,
    model: str = "speech-02-hd",
    speed: float = 1.0,
    vol: float = 1.0,
    pitch: int = 0,
    emotion: str = "happy",
    format: str = "mp3",
    sample_rate: int = 32000,
    bitrate: int = 128000
) -> dict
```

## Parameters

| Parameter | Type | Default | Description |
|------|------|--------|------|
| text | str | required | Text to convert, up to 10000 characters |
| voice_id | str | "female-shaonv" | Voice ID |
| output_path | str | None | Output file path; when omitted, a timestamped file is written to the default output directory |
| model | str | "speech-02-hd" | Model version |
| speed | float | 1.0 | Speech rate [0.5, 2.0] |
| vol | float | 1.0 | Volume [0.1, 10.0] |
| pitch | int | 0 | Pitch [-12, 12] |
| emotion | str | "happy" | Emotion style |
| format | str | "mp3" | Output format |
| sample_rate | int | 32000 | Sample rate (Hz) |
| bitrate | int | 128000 | Bitrate (bps) |

## Available emotions

- `happy` - joyful
- `sad` - sorrowful
- `angry` - angry
- `fearful` - frightened
- `disgusted` - disgusted
- `surprised` - surprised
- `calm` - calm
- `fluent` - fluent
- `whisper` - whispering

## Output formats

- `mp3` - MP3 format
- `wav` - WAV format
- `pcm` - PCM format
- `flac` - FLAC format

## Usage examples

### Basic usage

```python
from minimax_tts import text_to_audio

result = text_to_audio(
    text="你好，世界！",
    output_path="./hello.mp3"
)
print(f"Audio saved to: {result['file_path']}")
```

### Choose a voice and emotion

```python
result = text_to_audio(
    text="今天天气真好啊！",
    voice_id="female-yujie",
    emotion="happy",
    speed=1.1,
    output_path="./weather.mp3"
)
```

### Use a cloned voice

```python
result = text_to_audio(
    text="这是用我的声音生成的",
    voice_id="my-cloned-voice",  # voice ID created with voice_clone
    output_path="./my_voice.mp3"
)
```

### Batch generation

```python
texts = [
    "第一段文字",
    "第二段文字",
    "第三段文字"
]

for i, text in enumerate(texts):
    text_to_audio(
        text=text,
        output_path=f"./output_{i+1}.mp3"
    )
```

## SSML support

MiniMax TTS supports SSML markup for fine-grained control over speech:

```python
ssml_text = """
<speak>
    你好<break time="500ms"/>
    <phoneme alphabet="pinyin" ph="chong2qing4">重庆</phoneme>欢迎你
    <prosody rate="slow" pitch="high">慢速高音调</prosody>
</speak>
"""

text_to_audio(text=ssml_text, output_path="./ssml_demo.mp3")
```

### Common SSML tags

| Tag | Description | Example |
|------|------|------|
| `<break>` | Pause | `<break time="500ms"/>` |
| `<phoneme>` | Pinyin annotation | `<phoneme ph="hao3">好</phoneme>` |
| `<prosody>` | Prosody control | `<prosody rate="fast">快速</prosody>` |
| `<say-as>` | Reading mode | `<say-as interpret-as="digits">123</say-as>` |

## Return value

```python
{
    "success": True,
    "file_path": "/path/to/output.mp3",
    "duration": 2.5,  # audio duration (seconds)
    "trace_id": "xxx"  # request trace ID
}
```

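## Error handling

```python
try:
    result = text_to_audio(text="测试", output_path="./test.mp3")
    if result["success"]:
        print(f"Success: {result['file_path']}")
    else:
        # API and network errors are reported in the result, not raised
        print(f"Failed: {result.get('error')}")
except Exception as e:
    # For example, ValueError when MINIMAX_API_KEY is not set
    print(f"Error: {e}")
```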

```
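
The parameter table in rules/text-to-audio.md lists value ranges for `speed`, `vol`, and `pitch`, plus fixed sets of emotions and output formats. The sketch below checks those values locally before a request is sent. It is an assumption-laden illustration: `validate_tts_params` is a hypothetical helper, and the limits are copied from the documentation above rather than confirmed against the API.

```python
ALLOWED_EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted",
                    "surprised", "calm", "fluent", "whisper"}
ALLOWED_FORMATS = {"mp3", "wav", "pcm", "flac"}
MAX_TEXT_CHARS = 10000  # limit quoted in the parameter table


def validate_tts_params(text, speed=1.0, vol=1.0, pitch=0,
                        emotion="happy", format="mp3"):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    if not text:
        problems.append("text is empty")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters, limit is {MAX_TEXT_CHARS}")
    if not 0.5 <= speed <= 2.0:
        problems.append("speed must be within [0.5, 2.0]")
    if not 0.1 <= vol <= 10.0:
        problems.append("vol must be within [0.1, 10.0]")
    if not -12 <= pitch <= 12:
        problems.append("pitch must be within [-12, 12]")
    if emotion not in ALLOWED_EMOTIONS:
        problems.append(f"unknown emotion: {emotion}")
    if format not in ALLOWED_FORMATS:
        problems.append(f"unknown format: {format}")
    return problems


print(validate_tts_params("你好，世界！", speed=3.0, emotion="bored"))
```

Returning a list instead of raising on the first problem lets a caller report every out-of-range value at once.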

### rules/list-voices.md

```markdown
# List Voices

## Function signature

```python
def list_voices(voice_type: str = "all") -> list
```

## Parameters

| Parameter | Type | Default | Description |
|------|------|--------|------|
| voice_type | str | "all" | Voice type filter |

### voice_type values

- `all` - all voices
- `system` - system preset voices
- `cloned` - voices cloned by the user
- `designed` - voices designed by the user

## Usage examples

### Get all voices

```python
from minimax_tts import list_voices

voices = list_voices()
for voice in voices:
    print(f"{voice['voice_id']}: {voice['name']}")
```

### Get system voices only

```python
system_voices = list_voices(voice_type="system")
print(f"{len(system_voices)} system voices in total")
```

### Get the user's cloned voices

```python
cloned_voices = list_voices(voice_type="cloned")
for voice in cloned_voices:
    print(f"Cloned voice: {voice['voice_id']} - {voice['name']}")
```

## Return value format

```python
[
    {
        "voice_id": "female-shaonv",
        "name": "少女音",
        "type": "system",
        "language": "zh",
        "description": "清新活泼的少女声音",
        "sample_url": "https://..."  # preview audio link
    },
    {
        "voice_id": "my-cloned-voice",
        "name": "我的声音",
        "type": "cloned",
        "language": "zh",
        "created_at": "2024-01-01T00:00:00Z"
    }
]
```

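## System preset voices

### Female voices

| voice_id | Name | Characteristics |
|----------|------|------|
| female-shaonv | 少女音 (young girl) | Fresh and lively |
| female-yujie | 御姐音 (mature lady) | Mature and intellectual |
| female-chengshu | 成熟女声 (mature female) | Steady and poised |
| female-tianmei | 甜美音 (sweet) | Gentle and sweet |
| female-qingxin | 清新音 (fresh) | Natural and fresh |

### Male voices

| voice_id | Name | Characteristics |
|----------|------|------|
| male-qingnian | 青年男声 (young male) | Energetic |
| male-chengshu | 成熟男声 (mature male) | Calm and dignified |
| male-磁性 | 磁性男声 (magnetic male) | Deep and magnetic |

### Special voices

| voice_id | Name | Characteristics |
|----------|------|------|
| narrator | 旁白音 (narrator) | Suited to narration |
| news | 新闻播音 (news anchor) | Standard broadcast style |

## Filtering and searching

### Filter by language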

```python
voices = list_voices()
chinese_voices = [v for v in voices if v.get("language") == "zh"]
english_voices = [v for v in voices if v.get("language") == "en"]
```

### Search by name

```python
voices = list_voices()
female_voices = [v for v in voices if "女" in v.get("name", "")]
```

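## Notes

1. System voice IDs are fixed and do not change
2. IDs for cloned and designed voices are specified at creation time
3. Cache the voice list to avoid calling the API too often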

```
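
The notes in rules/list-voices.md recommend caching the voice list rather than calling the API repeatedly. One way to do that is a small time-based cache around the fetch function. This is a sketch under assumptions: `make_cached_fetcher` is a hypothetical helper, the 60-second lifetime is arbitrary, and a stand-in function replaces `list_voices` so the example runs without an API key.

```python
import time


def make_cached_fetcher(fetch, ttl_seconds=300.0, clock=time.monotonic):
    """Wrap a zero-argument fetch function with a simple time-based cache."""
    state = {"value": None, "expires_at": None}

    def cached():
        now = clock()
        # Refetch on first use or once the cached value has expired
        if state["expires_at"] is None or now >= state["expires_at"]:
            state["value"] = fetch()
            state["expires_at"] = now + ttl_seconds
        return state["value"]

    return cached


# Stand-in for list_voices so the sketch runs without an API key
calls = []


def fake_list_voices():
    calls.append(1)
    return [{"voice_id": "female-shaonv", "name": "少女音", "type": "system"}]


get_voices = make_cached_fetcher(fake_list_voices, ttl_seconds=60)
get_voices()
get_voices()
print(f"fetch calls: {len(calls)}")
```

With the real module, `make_cached_fetcher(list_voices)` would wrap the actual call; the `clock` parameter exists only to make expiry easy to test.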

### rules/voice-clone.md

```markdown
# Voice Clone

## Function signature

```python
def voice_clone(
    voice_id: str,
    audio_file: str,
    voice_name: str = None,
    voice_description: str = None,
    demo_text: str = None
) -> dict
```

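## Parameters

| Parameter | Type | Default | Description |
|------|------|--------|------|
| voice_id | str | required | Custom voice ID (unique identifier) |
| audio_file | str | required | Path to the audio file |
| voice_name | str | None | Voice name (optional) |
| voice_description | str | None | Voice description (optional) |
| demo_text | str | None | Preview text (optional) |

## Audio requirements

### Supported formats
- MP3, WAV, M4A, FLAC

### Best practices
- **Duration**: 10-60 seconds
- **Quality**: clear, without noise
- **Content**: natural speech covering a range of intonation
- **Environment**: recorded in a quiet place

## Usage examples

### Basic cloning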

```python
from minimax_tts import voice_clone

result = voice_clone(
    voice_id="my-voice-001",
    audio_file="./my_recording.mp3"
)

if result["success"]:
    print(f"Voice cloned successfully: {result['voice_id']}")
```

### Full parameters

```python
result = voice_clone(
    voice_id="custom-narrator",
    audio_file="./narrator_sample.mp3",
    voice_name="专业旁白",
    voice_description="深沉有力的男声旁白",
    demo_text="这是一段试听文本"
)
```

### Use the cloned voice

```python
from minimax_tts import text_to_audio

# The voice can be used right after cloning
text_to_audio(
    text="这是用我克隆的声音生成的",
    voice_id="my-voice-001",
    output_path="./cloned_output.mp3"
)
```

## Return value

### Success

```python
{
    "success": True,
    "voice_id": "my-voice-001",
    "voice_name": "我的声音",
    "status": "ready",  # ready, processing, failed
    "created_at": "2024-01-01T00:00:00Z"
}
```

### Failure

```python
{
    "success": False,
    "error": "音频质量不符合要求",
    "error_code": "AUDIO_QUALITY_LOW"
}
```

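## Recording tips

### Recommended equipment
- A professional microphone or a phone recorder
- Avoid Bluetooth headsets

### Recording environment
- A quiet indoor space
- Avoid echo and noise
- Turn off air conditioners, fans, and similar appliances

### What to record
- Speak naturally and fluently
- Include varied intonation and emotion
- Avoid long pauses

### Sample script

```
Hello everyone, I'm [name]. The weather is really nice today.
I'm glad to share this content with you.
Let's see what happens next.
What a wonderful surprise!
All right, thank you for listening.
```

## Managing cloned voices

### List all cloned voices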

```python
from minimax_tts import list_voices

cloned = list_voices(voice_type="cloned")
for voice in cloned:
    print(f"{voice['voice_id']}: {voice['name']}")
```

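### Delete a cloned voice

```python
# Deletion through the API is not supported yet; use the console instead
```

## Troubleshooting

### Cloning fails
- Check the audio quality and duration
- Make sure only one person is speaking in the audio
- Try re-recording a clearer sample

### Cloned voice sounds poor
- Use a longer sample (30 seconds or more)
- Make sure the sample contains rich variation in intonation
- Record in a cleaner environment

## Notes

1. voice_id must be unique; reusing an ID overwrites the existing voice
2. Cloning may take a few seconds to process
3. Cloned voices are for personal use only
4. Make sure you have the right to use the audio you provide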

```
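
The audio requirements in rules/voice-clone.md (supported formats and a 10-60 second duration) can be checked locally before a sample is uploaded. The sketch below is one possible pre-flight check, not part of the skill: `check_clone_sample` and `wav_duration_seconds` are hypothetical helpers, and only WAV duration is measured because the standard library cannot read the length of MP3, M4A, or FLAC files.

```python
import os
import wave

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}


def wav_duration_seconds(path):
    """Duration of an uncompressed WAV file, read with the standard library."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def check_clone_sample(path, min_seconds=10.0, max_seconds=60.0):
    """Return a list of warnings for a clone sample; empty means it looks fine."""
    warnings = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        warnings.append(f"unsupported extension: {ext or '(none)'}")
        return warnings
    if not os.path.exists(path):
        warnings.append(f"file not found: {path}")
        return warnings
    # Only WAV duration can be read without third-party libraries
    if ext == ".wav":
        seconds = wav_duration_seconds(path)
        if not min_seconds <= seconds <= max_seconds:
            warnings.append(
                f"duration {seconds:.1f}s is outside {min_seconds:.0f}-{max_seconds:.0f}s"
            )
    return warnings


print(check_clone_sample("./sample.txt"))
```

A caller could run this before `voice_clone(...)` and skip the upload when the list is not empty.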

### rules/voice-design.md

```markdown
# Voice Design

## Function signature

```python
def voice_design(
    prompt: str,
    preview_text: str,
    voice_id: str = None,
    voice_name: str = None
) -> dict
```

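## Parameters

| Parameter | Type | Default | Description |
|------|------|--------|------|
| prompt | str | required | Voice description (Chinese or English) |
| preview_text | str | required | Text for the audio preview |
| voice_id | str | None | Voice ID to save under |
| voice_name | str | None | Voice name |

## Usage examples

### Generate a preview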

```python
from minimax_tts import voice_design

result = voice_design(
    prompt="一个温柔的年轻女性声音，带有轻微的南方口音，语速适中",
    preview_text="你好，欢迎来到我们的节目"
)

if result["success"]:
    # Play the preview
    from minimax_tts import play_audio
    play_audio(result["preview_audio"])
```

### Save the designed voice

```python
result = voice_design(
    prompt="深沉有磁性的中年男声，像电台主播",
    preview_text="晚上好，欢迎收听今晚的节目",
    voice_id="radio-host",
    voice_name="电台主播"
)

if result["success"]:
    print(f"Voice saved: {result['voice_id']}")
```

## Prompt writing tips

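### Description dimensions

1. **Gender and age**
   - Young woman, middle-aged man, elderly person

2. **Timbre**
   - Gentle, husky, bright, deep, magnetic

3. **Speaking style**
   - Lively, steady, professional, warm

4. **Accent**
   - Standard Mandarin, southern accent, northern accent

5. **Emotional tone**
   - Cheerful, calm, serious, enthusiastic

### Good prompt examples

#### News broadcast
```
A professional news anchor's voice, male, middle-aged,
full and powerful, with clear articulation,
a moderate pace, and an air of authority
```

#### Audiobook narration
```
A gentle female voice, young,
bright and pleasant, expressive,
suited to storytelling and able to convey rich emotion
```

#### Children's programming
```
A lively, cute young female voice,
sweet and crisp, full of energy,
with a playful tone suited to children's content
```

#### Business voice-over
```
A mature, steady male voice, middle-aged,
deep and magnetic, professional and trustworthy,
suited to business settings and product introductions
```

### Descriptions to avoid

- Too short: "female voice" ❌
- Contradictory: "gentle and angry" ❌
- Irrelevant information: "a person wearing red clothes" ❌

## Return value

### Success

```python
{
    "success": True,
    "voice_id": "radio-host",  # present if voice_id was provided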
    "preview_audio": "/tmp/preview_xxx.mp3",
    "voice_features": {
        "gender": "male",
        "age": "middle",
        "style": "professional"
    }
}
```

### Failure

```python
{
    "success": False,
    "error": "描述不够清晰",
    "suggestion": "请提供更详细的声音特征描述"
}
```

## Workflow

### 1. Iterate on the design

```python
# First attempt
result1 = voice_design(
    prompt="温柔的女声",
    preview_text="测试文本"
)
play_audio(result1["preview_audio"])

# Adjust based on how the preview sounds
result2 = voice_design(
    prompt="更加温柔甜美的年轻女声，带有轻微的撒娇感",
    preview_text="测试文本"
)
play_audio(result2["preview_audio"])

# Save once the result is satisfactory
result3 = voice_design(
    prompt="更加温柔甜美的年轻女声，带有轻微的撒娇感",
    preview_text="测试文本",
    voice_id="sweet-girl",
    voice_name="甜美女声"
)
```

### 2. Use the designed voice

```python
from minimax_tts import text_to_audio

text_to_audio(
    text="这是用设计的声音生成的",
    voice_id="sweet-girl",
    output_path="./designed_output.mp3"
)
```

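## Comparison with voice cloning

| Aspect | Voice design | Voice cloning |
|------|----------|----------|
| Input | Text description | Audio file |
| Similarity | Matches the description | Closely matches the sample |
| Flexibility | High; any voice can be designed | Limited by the sample |
| Use case | Creating new voices | Reproducing a specific voice |

## Notes

1. Designed voices are AI-generated and do not represent real people
2. Each design run may produce slightly different results
3. Complex descriptions may need several rounds of adjustment
4. Preview before saving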

```
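
The description dimensions in rules/voice-design.md (gender and age, timbre, speaking style, accent, emotional tone) can be assembled into a prompt programmatically, which also guards against the one-word prompts that the file warns about. A minimal sketch, assuming English prompts are acceptable as the parameter table states; `build_voice_prompt` is a hypothetical helper and `min_chars` is an arbitrary threshold chosen for this example, not an API limit.

```python
def build_voice_prompt(gender_age, timbre, style, accent=None, emotion=None,
                       min_chars=10):
    """Join the description dimensions into one prompt string.

    Hypothetical helper; min_chars only exists to catch one-word prompts.
    """
    parts = [gender_age, timbre, style, accent, emotion]
    # Skip dimensions that were left out or contain only whitespace
    prompt = ", ".join(p.strip() for p in parts if p and p.strip())
    if len(prompt) < min_chars:
        raise ValueError(f"prompt too short to be useful: {prompt!r}")
    return prompt


print(build_voice_prompt(
    gender_age="middle-aged male",
    timbre="deep and resonant",
    style="professional, like a radio host",
    accent="standard Mandarin",
))
```

The returned string would be passed as the `prompt` argument of `voice_design(...)`.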



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### assets/minimax_tts.py

```python
#!/usr/bin/env python3
"""
MiniMax TTS API Python Module

A complete wrapper for text-to-speech, voice cloning, voice design, and related features.

Usage:
    import sys
    import os

    # Option 1: add the assets directory directly
    sys.path.insert(0, "/path/to/skills/tts-skill/assets")
    from minimax_tts import text_to_audio, list_voices, voice_clone, voice_design, play_audio

    # Option 2: relative path
    skill_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    sys.path.insert(0, os.path.join(skill_dir, "assets"))
    from minimax_tts import text_to_audio, list_voices, voice_clone, voice_design, play_audio

Environment variables:
    MINIMAX_API_KEY: API key (required)
    MINIMAX_API_HOST: API host (optional, defaults to https://api.minimax.io)
    MINIMAX_OUTPUT_DIR: default output directory (optional)
"""

import os
import json
import base64
import subprocess
import platform
from pathlib import Path
from typing import Optional, Dict, List, Any
from datetime import datetime

try:
    import requests
except ImportError:
    raise ImportError("请安装 requests: pip install requests")


# ============================================================
# Configuration
# ============================================================

def get_config() -> Dict[str, str]:
    """Read configuration from environment variables."""
    api_key = os.environ.get("MINIMAX_API_KEY")
    if not api_key:
        raise ValueError(
            "请设置环境变量 MINIMAX_API_KEY\n"
            "在 ~/.zshrc 或 ~/.bashrc 中添加:\n"
            "export MINIMAX_API_KEY=\"your-api-key\""
        )

    return {
        "api_key": api_key,
        "api_host": os.environ.get("MINIMAX_API_HOST", "https://api.minimax.io"),
        # Expand "~" so a value such as "~/Downloads/minimax" resolves to the home directory
        "output_dir": os.path.expanduser(os.environ.get("MINIMAX_OUTPUT_DIR", os.getcwd()))
    }


def ensure_output_dir(path: str = None) -> str:
    """Make sure the output directory exists."""
    if path:
        dir_path = Path(path).parent
    else:
        config = get_config()
        dir_path = Path(config["output_dir"])

    dir_path.mkdir(parents=True, exist_ok=True)
    return str(dir_path)


# ============================================================
# Text-to-speech
# ============================================================

def text_to_audio(
    text: str,
    voice_id: str = "female-shaonv",
    output_path: str = None,
    model: str = "speech-02-hd",
    speed: float = 1.0,
    vol: float = 1.0,
    pitch: int = 0,
    emotion: str = "happy",
    format: str = "mp3",
    sample_rate: int = 32000,
    bitrate: int = 128000
) -> Dict[str, Any]:
    """
    Convert text into a speech audio file.

    Args:
        text: text to convert, up to 10000 characters
        voice_id: voice ID
        output_path: output file path (defaults to a timestamped file in the output directory)
        model: model version (speech-02-hd, speech-02-turbo, etc.)
        speed: speech rate [0.5, 2.0]
        vol: volume [0.1, 10.0]
        pitch: pitch [-12, 12]
        emotion: emotion (happy, sad, angry, fearful, disgusted, surprised, calm, fluent, whisper)
        format: output format (mp3, wav, pcm, flac)
        sample_rate: sample rate
        bitrate: bitrate

    Returns:
        dict: contains success, file_path, duration, trace_id, and related fields
    """
    config = get_config()

    # Resolve the output path
    if output_path is None:
        ensure_output_dir()
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_path = os.path.join(config["output_dir"], f"tts_{timestamp}.{format}")
    else:
        output_path = os.path.expanduser(output_path)
        ensure_output_dir(output_path)

    # Build the request
    url = f"{config['api_host']}/v1/t2a_v2"

    headers = {
        "Authorization": f"Bearer {config['api_key']}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": model,
        "text": text,
        "stream": False,
        "voice_setting": {
            "voice_id": voice_id,
            "speed": speed,
            "vol": vol,
            "pitch": pitch,
            "emotion": emotion
        },
        "audio_setting": {
            "format": format,
            "sample_rate": sample_rate,
            "bitrate": bitrate
        }
    }

    try:
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        response.raise_for_status()

        result = response.json()

        # Treat a missing or null "data" field as a failed response
        data = result.get("data") or {}
        if data.get("audio"):
            # Decode and save the audio (the API returns it hex-encoded)
            audio_data = bytes.fromhex(data["audio"])
            with open(output_path, "wb") as f:
                f.write(audio_data)

            return {
                "success": True,
                "file_path": output_path,
                "duration": result.get("data", {}).get("duration"),
                "trace_id": result.get("trace_id"),
                "extra_info": result.get("extra_info")
            }
        else:
            return {
                "success": False,
                "error": result.get("base_resp", {}).get("status_msg", "未知错误"),
                "error_code": result.get("base_resp", {}).get("status_code")
            }

    except requests.exceptions.RequestException as e:
        return {
            "success": False,
            "error": str(e)
        }


# ============================================================
# Voice list
# ============================================================

def list_voices(voice_type: str = "all") -> List[Dict[str, Any]]:
    """
    List the available voices.

    Args:
        voice_type: voice type filter (all, system, cloned, designed)

    Returns:
        list: list of voices
    """
    config = get_config()

    url = f"{config['api_host']}/v1/voice/list"

    headers = {
        "Authorization": f"Bearer {config['api_key']}",
        "Content-Type": "application/json"
    }

    try:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        result = response.json()
        voices = result.get("data", {}).get("voices", [])

        # If the API returned no data, fall back to the default system voices
        if not voices:
            voices = get_default_system_voices()

        # Filter by type
        if voice_type == "all":
            return voices
        elif voice_type == "system":
            return [v for v in voices if v.get("type") == "system"]
        elif voice_type == "cloned":
            return [v for v in voices if v.get("type") == "cloned"]
        elif voice_type == "designed":
            return [v for v in voices if v.get("type") == "designed"]
        else:
            return voices

    except requests.exceptions.RequestException:
        # If the API call fails, fall back to the default voice list
        return get_default_system_voices() if voice_type in ["all", "system"] else []


def get_default_system_voices() -> List[Dict[str, Any]]:
    """Return the default list of system voices."""
    return [
        {"voice_id": "female-shaonv", "name": "少女音", "type": "system", "language": "zh"},
        {"voice_id": "female-yujie", "name": "御姐音", "type": "system", "language": "zh"},
        {"voice_id": "female-chengshu", "name": "成熟女声", "type": "system", "language": "zh"},
        {"voice_id": "female-tianmei", "name": "甜美音", "type": "system", "language": "zh"},
        {"voice_id": "male-qingnian", "name": "青年男声", "type": "system", "language": "zh"},
        {"voice_id": "male-chengshu", "name": "成熟男声", "type": "system", "language": "zh"},
        {"voice_id": "presenter_male", "name": "男性主持", "type": "system", "language": "zh"},
        {"voice_id": "presenter_female", "name": "女性主持", "type": "system", "language": "zh"},
        {"voice_id": "audiobook_male_1", "name": "有声书男声1", "type": "system", "language": "zh"},
        {"voice_id": "audiobook_female_1", "name": "有声书女声1", "type": "system", "language": "zh"},
    ]


# ============================================================
# Voice cloning
# ============================================================

def voice_clone(
    voice_id: str,
    audio_file: str,
    voice_name: str = None,
    voice_description: str = None,
    demo_text: str = None
) -> Dict[str, Any]:
    """
    Clone a voice.

    Args:
        voice_id: custom voice ID (unique identifier)
        audio_file: path to the audio file
        voice_name: voice name
        voice_description: voice description
        demo_text: preview text

    Returns:
        dict: contains success, voice_id, status, and related fields
    """
    config = get_config()

    audio_path = os.path.expanduser(audio_file)
    if not os.path.exists(audio_path):
        return {
            "success": False,
            "error": f"音频文件不存在: {audio_path}"
        }

    url = f"{config['api_host']}/v1/voice/clone"

    headers = {
        "Authorization": f"Bearer {config['api_key']}"
    }

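    # Prepare the file and form data
    import mimetypes  # local import keeps this change self-contained
    # Guess the MIME type from the extension (WAV, M4A, and FLAC are also accepted)
    mime_type = mimetypes.guess_type(audio_path)[0] or "audio/mpeg"
    with open(audio_path, "rb") as f:
        files = {
            "file": (os.path.basename(audio_path), f, mime_type)
        }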

        data = {
            "voice_id": voice_id
        }

        if voice_name:
            data["voice_name"] = voice_name
        if voice_description:
            data["voice_description"] = voice_description
        if demo_text:
            data["demo_text"] = demo_text

        try:
            response = requests.post(
                url,
                headers=headers,
                files=files,
                data=data,
                timeout=120
            )
            response.raise_for_status()

            result = response.json()

            if result.get("base_resp", {}).get("status_code") == 0:
                return {
                    "success": True,
                    "voice_id": voice_id,
                    "voice_name": voice_name,
                    "status": "ready",
                    "created_at": datetime.now().isoformat()
                }
            else:
                return {
                    "success": False,
                    "error": result.get("base_resp", {}).get("status_msg", "克隆失败"),
                    "error_code": result.get("base_resp", {}).get("status_code")
                }

        except requests.exceptions.RequestException as e:
            return {
                "success": False,
                "error": str(e)
            }


# ============================================================
# Voice design
# ============================================================

def voice_design(
    prompt: str,
    preview_text: str,
    voice_id: str = None,
    voice_name: str = None
) -> Dict[str, Any]:
    """
    Design a voice from a description.

    Args:
        prompt: voice description
        preview_text: text for the audio preview
        voice_id: voice ID to save under (optional)
        voice_name: voice name (optional)

    Returns:
        dict: contains success, preview_audio, voice_id, and related fields
    """
    config = get_config()

    url = f"{config['api_host']}/v1/voice/design"

    headers = {
        "Authorization": f"Bearer {config['api_key']}",
        "Content-Type": "application/json"
    }

    payload = {
        "prompt": prompt,
        "preview_text": preview_text
    }

    if voice_id:
        payload["voice_id"] = voice_id
    if voice_name:
        payload["voice_name"] = voice_name

    try:
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        response.raise_for_status()

        result = response.json()

        if "data" in result:
            # Save the preview audio
            preview_audio = None
            if "audio" in result["data"]:
                ensure_output_dir()
                timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
                preview_path = os.path.join(
                    config["output_dir"],
                    f"voice_design_preview_{timestamp}.mp3"
                )

                audio_data = bytes.fromhex(result["data"]["audio"])
                with open(preview_path, "wb") as f:
                    f.write(audio_data)
                preview_audio = preview_path

            return {
                "success": True,
                "voice_id": voice_id,
                "preview_audio": preview_audio,
                "voice_features": result.get("data", {}).get("voice_features", {})
            }
        else:
            return {
                "success": False,
                "error": result.get("base_resp", {}).get("status_msg", "设计失败"),
                "suggestion": "请提供更详细的声音特征描述"
            }

    except requests.exceptions.RequestException as e:
        return {
            "success": False,
            "error": str(e)
        }


# ============================================================
# Play audio
# ============================================================

def play_audio(file_path: str) -> Dict[str, Any]:
    """
    Play an audio file.

    Args:
        file_path: path to the audio file

    Returns:
        dict: contains success and, on failure, an error message
    """
    file_path = os.path.expanduser(file_path)

    if not os.path.exists(file_path):
        return {
            "success": False,
            "error": f"文件不存在: {file_path}"
        }

    system = platform.system()

    try:
        if system == "Darwin":  # macOS
            subprocess.run(["afplay", file_path], check=True)
        elif system == "Windows":
            os.startfile(file_path)
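        else:  # Linux
            # Try several players in turn; skip any that is missing or fails on this file
            players = ["aplay", "paplay", "mpv", "ffplay"]
            for player in players:
                try:
                    subprocess.run([player, file_path], check=True)
                    break
                except (FileNotFoundError, subprocess.CalledProcessError):
                    continue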
            else:
                return {
                    "success": False,
                    "error": "未找到可用的音频播放器，请安装 aplay, paplay, mpv 或 ffplay"
                }

        return {"success": True, "file_path": file_path}

    except subprocess.CalledProcessError as e:
        return {
            "success": False,
            "error": f"播放失败: {e}"
        }
    except Exception as e:
        return {
            "success": False,
            "error": str(e)
        }


# ============================================================
# Convenience functions
# ============================================================

def quick_tts(text: str, voice: str = "female-shaonv") -> str:
    """
    Quick TTS that returns the path of the generated file.

    Args:
        text: text to convert
        voice: voice ID

    Returns:
        str: path of the generated audio file
    """
    result = text_to_audio(text=text, voice_id=voice)
    if result["success"]:
        return result["file_path"]
    else:
        raise Exception(result.get("error", "TTS 失败"))


def speak(text: str, voice: str = "female-shaonv") -> None:
    """
    Read text aloud (generate, then play).

    Args:
        text: text to read aloud
        voice: voice ID
    """
    file_path = quick_tts(text, voice)
    play_audio(file_path)


# ============================================================
# Main entry point (for testing)
# ============================================================

if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print("MiniMax TTS 模块")
        print("\n可用函数:")
        print("  - text_to_audio(text, voice_id, output_path, ...)")
        print("  - list_voices(voice_type)")
        print("  - voice_clone(voice_id, audio_file, ...)")
        print("  - voice_design(prompt, preview_text, ...)")
        print("  - play_audio(file_path)")
        print("  - quick_tts(text, voice)")
        print("  - speak(text, voice)")
        print("\n测试: python minimax_tts.py test")
        sys.exit(0)

    if sys.argv[1] == "test":
        print("测试 MiniMax TTS...")

        # Test loading the configuration
        try:
            config = get_config()
            print(f"✓ API Key 已配置")
            print(f"✓ API Host: {config['api_host']}")
        except ValueError as e:
            print(f"✗ 配置错误: {e}")
            sys.exit(1)

        # Test listing voices
        print("\n测试列出声音...")
        voices = list_voices(voice_type="system")
        print(f"✓ 获取到 {len(voices)} 个系统声音")
        for v in voices[:3]:
            print(f"  - {v['voice_id']}: {v['name']}")

        print("\n测试完成!")

```
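
The `text_to_audio` docstring notes a limit of 10000 characters for `text`, but the module does not split longer input itself. A caller could break long text into chunks first and synthesize each chunk separately. The sketch below is one simple approach under stated assumptions: `split_text` is a hypothetical helper, it splits after Chinese or Western sentence-ending punctuation, and it will also split after the dot in a number such as 3.14.

```python
import re


def split_text(text, limit=10000):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Keep each sentence together with its closing punctuation
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk when this sentence would overflow the current one
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


for i, chunk in enumerate(split_text("第一句。第二句！第三句？", limit=8), start=1):
    print(i, chunk)
```

Each chunk could then be passed to `text_to_audio(text=chunk, output_path=...)` with a numbered file name, in the same way as the batch example in rules/text-to-audio.md.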