rl-reward
Build RL reward signals using the OpenJudge framework. Covers choosing between pointwise and pairwise reward strategies based on RL algorithm, task type, and cost; aggregating multi-dimensional pointwise scores into a scalar reward; pairwise tournament reward for GRPO on subjective tasks (net win rate across group rollouts); generating preference pairs for DPO/RLAIF; and normalizing scores for training stability. Use when building reward models, scoring rollouts for GRPO/REINFORCE, generating preference data for DPO, or doing Best-of-N selection.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install agentscope-ai-openjudge-rl-reward
Repository
Skill path: skills/rl-reward
Build RL reward signals using the OpenJudge framework. Covers choosing between pointwise and pairwise reward strategies based on RL algorithm, task type, and cost; aggregating multi-dimensional pointwise scores into a scalar reward; pairwise tournament reward for GRPO on subjective tasks (net win rate across group rollouts); generating preference pairs for DPO/RLAIF; and normalizing scores for training stability. Use when building reward models, scoring rollouts for GRPO/REINFORCE, generating preference data for DPO, or doing Best-of-N selection.
Open repositoryBest for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI.
Target audience: Development teams looking for install-ready agent workflows..
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: agentscope-ai.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install rl-reward into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/agentscope-ai/OpenJudge before adding rl-reward to shared team environments
- Use rl-reward for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.