llama-cpp
Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
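To ground those claims, here is a minimal local-inference sketch using the llama-cpp-python bindings. The model path, quantization level, and parameter values are illustrative assumptions, not part of this skill entry.

```python
# Minimal sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python)
# and a GGUF model file has already been downloaded. Path and settings are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # 4-bit GGUF quant (hypothetical file)
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers to Metal/ROCm/Vulkan if available; 0 forces pure CPU
)

out = llm("Q: Why quantize a model to 4 bits? A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```

Lower-bit quants (e.g. Q4_K_M) trade a small accuracy loss for roughly half the memory of 8-bit, which is what makes 7B-13B models practical on laptops and edge devices.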
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install nousresearch-hermes-agent-llama-cpp
Repository
Skill path: skills/mlops/inference/llama-cpp
Best for
Primary workflow: Run DevOps.
Technical facets: Full Stack, DevOps.
Target audience: Development teams looking for install-ready agent workflows.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: NousResearch.
This is a mirrored public skill entry; review the repository before installing it into production workflows.
What it helps with
- Install llama-cpp into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/NousResearch/hermes-agent before adding llama-cpp to shared team environments
- Use llama-cpp for local development workflows (a chat-style sketch follows this list)
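As a hedged illustration of the last point, the sketch below shows how an agent workflow might call a local quantized model through llama-cpp-python's chat API. The model file name and message contents are assumptions for the example.

```python
# Sketch of a local chat completion against a quantized GGUF model,
# the kind of call an agent workflow might make (model path is illustrative).
from llama_cpp import Llama

llm = Llama(model_path="./models/hermes-2-pro.Q5_K_M.gguf", n_ctx=8192)

reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Explain what n_gpu_layers controls."},
    ],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```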
Catalog stats
Favorites: 0.
Sub-skills: 0.
Aggregator: No.