optimizing-r
R performance profiling, benchmarking, and optimization strategies. Use this skill when code is running slowly, comparing alternative implementations, deciding between dplyr/data.table/base R, or implementing parallel processing. Covers profvis and bench usage, performance workflow, parallel processing with in_parallel(), data backend selection, modern purrr patterns (list_rbind, walk), and common performance anti-patterns to avoid.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install jeremy-allen-claude-skills-optimizing-r
Repository
Skill path: optimizing-r
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Backend, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: jeremy-allen.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install optimizing-r into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/jeremy-allen/claude-skills before adding optimizing-r to shared team environments
- Use optimizing-r for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: optimizing-r
description: |
R performance profiling, benchmarking, and optimization strategies. Use this skill when code is running slowly, comparing alternative implementations, deciding between dplyr/data.table/base R, or implementing parallel processing. Covers profvis and bench usage, performance workflow, parallel processing with in_parallel(), data backend selection, modern purrr patterns (list_rbind, walk), and common performance anti-patterns to avoid.
---
# Optimizing R
This skill covers profiling, benchmarking, parallelization, and performance best practices for R.
## Core Principle
**Profile before optimizing** - Use profvis and bench to identify real bottlenecks. Write readable code first, optimize only when necessary.
## Profiling Tools Decision Matrix
| Tool | Use When | Don't Use When | What It Shows |
|------|----------|----------------|---------------|
| **`profvis`** | Complex code, unknown bottlenecks | Simple functions, known issues | Time per line, call stack |
| **`bench::mark()`** | Comparing alternatives | Single approach | Relative performance, memory |
| **`system.time()`** | Quick checks | Detailed analysis | Total runtime only |
| **`Rprof()`** | Base R only environments | When profvis available | Raw profiling data |
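To illustrate the last two rows of the matrix, here is a hedged base-R sketch: `system.time()` for a quick total-runtime check, and `Rprof()` for base-R-only environments. The workload is synthetic and purely illustrative.

```r
# Quick check: system.time() reports total runtime only
st <- system.time({
  m <- matrix(rnorm(1e6), ncol = 100)
  colsds <- apply(m, 2, sd)
})
st["elapsed"]

# Raw profiling with Rprof() when profvis isn't available
# (assumes your R build was compiled with profiling support)
tmp <- tempfile(fileext = ".out")
Rprof(tmp)
for (i in 1:20) sorted <- sort(rnorm(1e5))  # synthetic workload to sample
Rprof(NULL)
# summaryRprof() errors if the run was too fast to collect any samples
prof <- tryCatch(summaryRprof(tmp)$by.self, error = function(e) NULL)
head(prof)
```

`summaryRprof(tmp)$by.self` ranks time spent in each function itself, which is usually the first place to look for a bottleneck.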
## Performance Workflow
1. **Profile first** - Find the actual bottlenecks
2. **Focus on the slowest parts** - 80/20 rule
3. **Benchmark alternatives** - For hot spots only
4. **Consider tool trade-offs** - Based on bottleneck type
See [profiling-workflow.md](references/profiling-workflow.md) for the complete workflow.
## When Each Tool Helps vs Hurts
### Parallel Processing (`in_parallel()`)
**Helps when:**
- CPU-intensive computations
- Embarrassingly parallel problems
- Large datasets with independent operations
- I/O bound operations (file reading, API calls)
**Hurts when:**
- Simple, fast operations (overhead > benefit)
- Memory-intensive operations (may cause thrashing)
- Operations requiring shared state
- Small datasets
See [parallel-examples.md](references/parallel-examples.md) for decision points.
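The rule of thumb behind the lists above: parallelism pays off only when per-task work comfortably exceeds dispatch overhead. A base-R sketch of that check (the ~2 ms overhead figure is an assumption to measure against, not a constant of mirai):

```r
# Estimate per-call cost of the function you want to parallelize
f <- function(x) sqrt(x)  # illustrative cheap task
per_call <- system.time(for (i in 1:1000) f(i))["elapsed"] / 1000

overhead <- 0.002  # assumed seconds of dispatch overhead per parallel task

decision <- if (per_call > 10 * overhead) {
  "likely worth parallelizing"
} else {
  "keep it sequential"
}
decision
```

For a task this cheap the verdict is "keep it sequential"; rerun the estimate with your real workload before reaching for `in_parallel()`.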
### Data Backend Selection
| Backend | Use When |
|---------|----------|
| **data.table** | Very large datasets (>1GB), complex grouping, maximum performance critical |
| **dplyr** | Readability priority, complex joins/window functions, moderate data (<100MB) |
| **base R** | No dependencies allowed, simple operations, teaching/learning |
See [backend-selection.md](references/backend-selection.md) for guidance.
## Profiling Best Practices
1. **Profile realistic data sizes** - Not toy examples
2. **Profile multiple runs** - For stability
3. **Check memory usage too** - Not just time
4. **Profile realistic usage patterns** - Not isolated calls
See [profiling-best-practices.md](references/profiling-best-practices.md) for examples.
## Performance Anti-Patterns to Avoid
- **Don't optimize without measuring** - Profile first
- **Don't over-engineer** - Complex optimizations for 1% gains
- **Don't assume** - "for loops are always slow" is a myth
- **Don't ignore readability costs** - Readable code with targeted optimizations
See [performance-anti-patterns.md](references/performance-anti-patterns.md) for examples.
## Modern purrr Patterns
### Data Frame Binding (purrr 1.0+)
| Superseded | Modern Replacement |
|------------|-------------------|
| `map_dfr(x, f)` | `map(x, f) \|> list_rbind()` |
| `map_dfc(x, f)` | `map(x, f) \|> list_cbind()` |
| `map2_dfr(x, y, f)` | `map2(x, y, f) \|> list_rbind()` |
### Side Effects with `walk()`
Use `walk()` and `walk2()` for side effects (file writing, plotting).
### Parallel Processing (purrr 1.1.0+)
Use `in_parallel()` with mirai for scaling across cores.
See [purrr-patterns.md](references/purrr-patterns.md) for all patterns.
## Backend Tools for Performance
When speed is critical, consider:
- **vctrs** - Type-stable vector operations
- **rlang** - Metaprogramming
- **data.table** - Large data operations
Profile to identify whether these tools will help your specific bottleneck.
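When bench isn't available, a minimal base-R timing harness can still answer whether a candidate backend or rewrite helps. `time_it`, `slow`, and `fast` below are illustrative names, not part of any package.

```r
# Median elapsed time over several runs; pass a zero-argument function so the
# work is re-executed on every repetition (a bare expression would only run once)
time_it <- function(f, reps = 5) {
  median(vapply(seq_len(reps),
                function(i) system.time(f())["elapsed"],
                numeric(1)))
}

x <- runif(1e4)
slow <- function() { out <- c(); for (v in x) out <- c(out, v^2); out }  # grows a vector
fast <- function() x^2                                                   # vectorized

c(slow = time_it(slow), fast = time_it(fast))
```

Comparing medians rather than single runs smooths out garbage collection and scheduler noise, which is the same reason `bench::mark()` runs each expression many times.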
source: Sarah Johnson's gist https://gist.github.com/sj-io/3828d64d0969f2a0f05297e59e6c15ad
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/profiling-workflow.md
````markdown
# Step-by-Step Performance Workflow
### 1. Profile first - find the actual bottlenecks
```r
library(profvis)
profvis({
  # Your slow code here
})
```
### 2. Focus on the slowest parts (80/20 rule)
Don't optimize until you know where time is spent.
### 3. Benchmark alternatives for hot spots
```r
library(bench)
bench::mark(
  current = current_approach(data),
  vectorized = vectorized_approach(data),
  parallel = map(data, in_parallel(func)) # requires mirai daemons to be running
)
```
### 4. Consider tool trade-offs based on bottleneck type
````
### references/parallel-examples.md
````markdown
# Parallel Processing Decision Points
## Helps when
- CPU-intensive computations
- Embarrassingly parallel problems
- Large datasets with independent operations
- I/O bound operations (file reading, API calls)
## Hurts when
- Simple, fast operations (overhead > benefit)
- Memory-intensive operations (may cause thrashing)
- Operations requiring shared state
- Small datasets
## Example decision point
```r
expensive_func <- function(x) Sys.sleep(0.1) # 100ms per call
fast_func <- function(x) x^2 # microseconds per call
```
### Good for parallel
```r
library(mirai)
daemons(4)
results <- map(1:100, in_parallel(expensive_func)) # ~10s -> ~2.5s on 4 cores
daemons(0)
```
### Bad for parallel (overhead > benefit)
```r
map(1:100, in_parallel(fast_func)) # 100us -> 50ms (500x slower!)
```
## Use parallel processing with mirai (purrr 1.1.0+)
```r
library(mirai)
daemons(4)
results <- large_datasets |>
  map(in_parallel(expensive_computation))
daemons(0)
```
````
### references/backend-selection.md
````markdown
# Data Backend Selection Guide
## Use data.table when
- Very large datasets (>1GB)
- Complex grouping operations
- Reference semantics desired
- Maximum performance critical
```r
library(data.table)
dt <- as.data.table(large_data)
dt[, .(mean_val = mean(value)), by = group]
```
## Use dplyr when
- Readability and maintainability priority
- Complex joins and window functions
- Team familiarity with tidyverse
- Moderate sized data (<100MB)
```r
library(dplyr)
data |>
  group_by(group) |>
  summarise(mean_val = mean(value))
```
## Use base R when
- No dependencies allowed
- Simple operations
- Teaching/learning contexts
```r
aggregate(value ~ group, data, mean)
```
````
### references/profiling-best-practices.md
````markdown
# Profiling Best Practices
### 1. Profile realistic data sizes
```r
library(profvis)
profvis({
  # Use actual data size, not toy examples
  real_data |> your_analysis()
})
```
### 2. Profile multiple runs for stability
```r
library(bench)
bench::mark(
  your_function(data),
  min_iterations = 10, # Multiple runs
  max_iterations = 100
)
```
### 3. Check memory usage too
```r
bench::mark(
  approach1 = method1(data),
  approach2 = method2(data),
  check = FALSE, # If outputs differ slightly
  filter_gc = FALSE # Include GC time
)
```
### 4. Profile with realistic usage patterns
Not just isolated function calls.
````
### references/performance-anti-patterns.md
````markdown
# Performance Anti-Patterns to Avoid
## Don't optimize without measuring
- Bad: "This looks slow" -> immediately rewrite
- Good: Profile first, optimize bottlenecks
## Don't over-engineer for performance
- Bad: Complex optimizations for 1% gains
- Good: Focus on algorithmic improvements
## Don't assume - measure
- Bad: "for loops are always slow in R"
- Good: Benchmark your specific use case
## Don't ignore readability costs
- Bad: Unreadable code for minor speedups
- Good: Readable code with targeted optimizations
## Growing objects in loops - AVOID
### Bad - Growing objects in loops
```r
result <- c()
for (i in 1:n) {
  result <- c(result, compute(i)) # Slow! Copies the whole vector every iteration
}
```
### Good - Pre-allocate
```r
result <- vector("list", n)
for (i in 1:n) {
  result[[i]] <- compute(i)
}
```
### Better - Use purrr
```r
result <- map(1:n, compute)
```
````
### references/purrr-patterns.md
````markdown
# Modern purrr Patterns (purrr 1.0+)
### Modern data frame row binding
```r
models <- data_splits |>
  map(\(split) train_model(split)) |>
  list_rbind() # Replaces map_dfr()
```
### Column binding
```r
summaries <- data_list |>
  map(\(df) get_summary_stats(df)) |>
  list_cbind() # Replaces map_dfc()
```
### Superseded functions migration
```r
# map_dfr(x, f) -> map(x, f) |> list_rbind()
# map_dfc(x, f) -> map(x, f) |> list_cbind()
# map2_dfr(x, y, f) -> map2(x, y, f) |> list_rbind()
# pmap_dfr(list, f) -> pmap(list, f) |> list_rbind()
# imap_dfr(x, f) -> imap(x, f) |> list_rbind()
```
### Side effects with walk()
```r
# walk2() is called for its side effects and returns its first input
# invisibly, so there is no need to assign the result
walk2(data_list, plot_names, \(df, name) {
  p <- ggplot(df, aes(x, y)) + geom_point()
  ggsave(name, p)
})
```
### For side effects - use walk instead of for loops
```r
walk(x, write_file)
walk2(data, paths, write_csv)
```
## Parallel processing (purrr 1.1.0+)
```r
library(mirai)
daemons(4)
results <- large_datasets |>
  map(in_parallel(expensive_computation))
daemons(0)
```
````