optimizing-r
R performance profiling, benchmarking, and optimization strategies. Use this skill when code is running slowly, comparing alternative implementations, deciding between dplyr/data.table/base R, or implementing parallel processing. Covers profvis and bench usage, performance workflow, parallel processing with in_parallel(), data backend selection, modern purrr patterns (list_rbind, walk), and common performance anti-patterns to avoid.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install jeremy-allen-claude-skills-optimizing-r
Repository
Skill path: optimizing-r
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Backend, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: jeremy-allen.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install optimizing-r into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/jeremy-allen/claude-skills before adding optimizing-r to shared team environments
- Use optimizing-r for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: optimizing-r
description: |
R performance profiling, benchmarking, and optimization strategies. Use this skill when code is running slowly, comparing alternative implementations, deciding between dplyr/data.table/base R, or implementing parallel processing. Covers profvis and bench usage, performance workflow, parallel processing with in_parallel(), data backend selection, modern purrr patterns (list_rbind, walk), and common performance anti-patterns to avoid.
---
# Optimizing R
This skill covers profiling, benchmarking, parallelization, and performance best practices for R.
## Core Principle
**Profile before optimizing** - Use profvis and bench to identify real bottlenecks. Write readable code first, optimize only when necessary.
## Profiling Tools Decision Matrix
| Tool | Use When | Don't Use When | What It Shows |
|------|----------|----------------|---------------|
| **`profvis`** | Complex code, unknown bottlenecks | Simple functions, known issues | Time per line, call stack |
| **`bench::mark()`** | Comparing alternatives | Single approach | Relative performance, memory |
| **`system.time()`** | Quick checks | Detailed analysis | Total runtime only |
| **`Rprof()`** | Base R only environments | When profvis available | Raw profiling data |
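To illustrate the last two rows of the matrix, here is a hedged base-R sketch: `system.time()` for a quick total-runtime check, and `Rprof()` for base-R-only environments. The workload is synthetic and purely illustrative.

```r
# Quick check: system.time() reports total runtime only
st <- system.time({
  m <- matrix(rnorm(1e6), ncol = 100)
  colsds <- apply(m, 2, sd)
})
st["elapsed"]

# Raw profiling with Rprof() when profvis isn't available
# (assumes your R build was compiled with profiling support)
tmp <- tempfile(fileext = ".out")
Rprof(tmp)
for (i in 1:20) sorted <- sort(rnorm(1e5))  # synthetic workload to sample
Rprof(NULL)
# summaryRprof() errors if the run was too fast to collect any samples
prof <- tryCatch(summaryRprof(tmp)$by.self, error = function(e) NULL)
head(prof)
```

`summaryRprof(tmp)$by.self` ranks time spent in each function itself, which is usually the first place to look for a bottleneck.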
## Performance Workflow
1. **Profile first** - Find the actual bottlenecks
2. **Focus on the slowest parts** - 80/20 rule
3. **Benchmark alternatives** - For hot spots only
4. **Consider tool trade-offs** - Based on bottleneck type
See [profiling-workflow.md](references/profiling-workflow.md) for the complete workflow.
## When Each Tool Helps vs Hurts
### Parallel Processing (`in_parallel()`)
**Helps when:**
- CPU-intensive computations
- Embarrassingly parallel problems
- Large datasets with independent operations
- I/O bound operations (file reading, API calls)
**Hurts when:**
- Simple, fast operations (overhead > benefit)
- Memory-intensive operations (may cause thrashing)
- Operations requiring shared state
- Small datasets
See [parallel-examples.md](references/parallel-examples.md) for decision points.
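The rule of thumb behind the lists above: parallelism pays off only when per-task work comfortably exceeds dispatch overhead. A base-R sketch of that check (the ~2 ms overhead figure is an assumption to measure against, not a constant of mirai):

```r
# Estimate per-call cost of the function you want to parallelize
f <- function(x) sqrt(x)  # illustrative cheap task
per_call <- system.time(for (i in 1:1000) f(i))["elapsed"] / 1000

overhead <- 0.002  # assumed seconds of dispatch overhead per parallel task

decision <- if (per_call > 10 * overhead) {
  "likely worth parallelizing"
} else {
  "keep it sequential"
}
decision
```

For a task this cheap the verdict is "keep it sequential"; rerun the estimate with your real workload before reaching for `in_parallel()`.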
### Data Backend Selection
| Backend | Use When |
|---------|----------|
| **data.table** | Very large datasets (>1GB), complex grouping, maximum performance critical |
| **dplyr** | Readability priority, complex joins/window functions, moderate data (<100MB) |
| **base R** | No dependencies allowed, simple operations, teaching/learning |
See [backend-selection.md](references/backend-selection.md) for guidance.
## Profiling Best Practices
1. **Profile realistic data sizes** - Not toy examples
2. **Profile multiple runs** - For stability
3. **Check memory usage too** - Not just time
4. **Profile realistic usage patterns** - Not isolated calls
See [profiling-best-practices.md](references/profiling-best-practices.md) for examples.
## Performance Anti-Patterns to Avoid
- **Don't optimize without measuring** - Profile first
- **Don't over-engineer** - Complex optimizations for 1% gains
- **Don't assume** - "for loops are always slow" is a myth
- **Don't ignore readability costs** - Readable code with targeted optimizations
See [performance-anti-patterns.md](references/performance-anti-patterns.md) for examples.
## Modern purrr Patterns
### Data Frame Binding (purrr 1.0+)
| Superseded | Modern Replacement |
|------------|-------------------|
| `map_dfr(x, f)` | `map(x, f) \|> list_rbind()` |
| `map_dfc(x, f)` | `map(x, f) \|> list_cbind()` |
| `map2_dfr(x, y, f)` | `map2(x, y, f) \|> list_rbind()` |
### Side Effects with `walk()`
Use `walk()` and `walk2()` for side effects (file writing, plotting).
### Parallel Processing (purrr 1.1.0+)
Use `in_parallel()` with mirai for scaling across cores.
See [purrr-patterns.md](references/purrr-patterns.md) for all patterns.
## Backend Tools for Performance
When speed is critical, consider:
- **vctrs** - Type-stable vector operations
- **rlang** - Metaprogramming
- **data.table** - Large data operations
Profile to identify whether these tools will help your specific bottleneck.
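When bench isn't available, a minimal base-R timing harness can still answer whether a candidate backend or rewrite helps. `time_it`, `slow`, and `fast` below are illustrative names, not part of any package.

```r
# Median elapsed time over several runs; pass a zero-argument function so the
# work is re-executed on every repetition (a bare expression would only run once)
time_it <- function(f, reps = 5) {
  median(vapply(seq_len(reps),
                function(i) system.time(f())["elapsed"],
                numeric(1)))
}

x <- runif(1e4)
slow <- function() { out <- c(); for (v in x) out <- c(out, v^2); out }  # grows a vector
fast <- function() x^2                                                   # vectorized

c(slow = time_it(slow), fast = time_it(fast))
```

Comparing medians rather than single runs smooths out garbage collection and scheduler noise, which is the same reason `bench::mark()` runs each expression many times.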
source: Sarah Johnson's gist https://gist.github.com/sj-io/3828d64d0969f2a0f05297e59e6c15ad
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/profiling-workflow.md
````markdown
# Step-by-Step Performance Workflow
### 1. Profile first - find the actual bottlenecks
```r
library(profvis)
profvis({
  # Your slow code here
})
```
### 2. Focus on the slowest parts (80/20 rule)
Don't optimize until you know where time is spent.
### 3. Benchmark alternatives for hot spots
```r
library(bench)
bench::mark(
  current = current_approach(data),
  vectorized = vectorized_approach(data),
  parallel = map(data, in_parallel(func)) # requires mirai daemons to be running
)
```
### 4. Consider tool trade-offs based on bottleneck type
````
### references/parallel-examples.md
````markdown
# Parallel Processing Decision Points
## Helps when
- CPU-intensive computations
- Embarrassingly parallel problems
- Large datasets with independent operations
- I/O bound operations (file reading, API calls)
## Hurts when
- Simple, fast operations (overhead > benefit)
- Memory-intensive operations (may cause thrashing)
- Operations requiring shared state
- Small datasets
## Example decision point
```r
expensive_func <- function(x) Sys.sleep(0.1) # 100ms per call
fast_func <- function(x) x^2 # microseconds per call
```
### Good for parallel
```r
library(mirai)
daemons(4)
results <- map(1:100, in_parallel(expensive_func)) # ~10s -> ~2.5s on 4 cores
daemons(0)
```
### Bad for parallel (overhead > benefit)
```r
map(1:100, in_parallel(fast_func)) # 100us -> 50ms (500x slower!)
```
## Use parallel processing with mirai (purrr 1.1.0+)
```r
library(mirai)
daemons(4)
results <- large_datasets |>
  map(in_parallel(expensive_computation))
daemons(0)
```
````
### references/backend-selection.md
````markdown
# Data Backend Selection Guide
## Use data.table when
- Very large datasets (>1GB)
- Complex grouping operations
- Reference semantics desired
- Maximum performance critical
```r
library(data.table)
dt <- as.data.table(large_data)
dt[, .(mean_val = mean(value)), by = group]
```
## Use dplyr when
- Readability and maintainability priority
- Complex joins and window functions
- Team familiarity with tidyverse
- Moderate sized data (<100MB)
```r
library(dplyr)
data |>
  group_by(group) |>
  summarise(mean_val = mean(value))
```
## Use base R when
- No dependencies allowed
- Simple operations
- Teaching/learning contexts
```r
aggregate(value ~ group, data, mean)
```
````
### references/profiling-best-practices.md
````markdown
# Profiling Best Practices
### 1. Profile realistic data sizes
```r
library(profvis)
profvis({
  # Use actual data size, not toy examples
  real_data |> your_analysis()
})
```
### 2. Profile multiple runs for stability
```r
library(bench)
bench::mark(
  your_function(data),
  min_iterations = 10, # Multiple runs
  max_iterations = 100
)
```
### 3. Check memory usage too
```r
bench::mark(
  approach1 = method1(data),
  approach2 = method2(data),
  check = FALSE, # If outputs differ slightly
  filter_gc = FALSE # Include GC time
)
```
### 4. Profile with realistic usage patterns
Not just isolated function calls.
````
### references/performance-anti-patterns.md
````markdown
# Performance Anti-Patterns to Avoid
## Don't optimize without measuring
- Bad: "This looks slow" -> immediately rewrite
- Good: Profile first, optimize bottlenecks
## Don't over-engineer for performance
- Bad: Complex optimizations for 1% gains
- Good: Focus on algorithmic improvements
## Don't assume - measure
- Bad: "for loops are always slow in R"
- Good: Benchmark your specific use case
## Don't ignore readability costs
- Bad: Unreadable code for minor speedups
- Good: Readable code with targeted optimizations
## Growing objects in loops - AVOID
### Bad - Growing objects in loops
```r
result <- c()
for (i in 1:n) {
  result <- c(result, compute(i)) # Slow! Copies the whole vector every iteration
}
```
### Good - Pre-allocate
```r
result <- vector("list", n)
for (i in 1:n) {
  result[[i]] <- compute(i)
}
```
### Better - Use purrr
```r
result <- map(1:n, compute)
```
````
### references/purrr-patterns.md
````markdown
# Modern purrr Patterns (purrr 1.0+)
### Modern data frame row binding
```r
models <- data_splits |>
  map(\(split) train_model(split)) |>
  list_rbind() # Replaces map_dfr()
```
### Column binding
```r
summaries <- data_list |>
  map(\(df) get_summary_stats(df)) |>
  list_cbind() # Replaces map_dfc()
```
### Superseded functions migration
```r
# map_dfr(x, f) -> map(x, f) |> list_rbind()
# map_dfc(x, f) -> map(x, f) |> list_cbind()
# map2_dfr(x, y, f) -> map2(x, y, f) |> list_rbind()
# pmap_dfr(list, f) -> pmap(list, f) |> list_rbind()
# imap_dfr(x, f) -> imap(x, f) |> list_rbind()
```
### Side effects with walk()
```r
# walk2() is called for its side effects and returns its first input
# invisibly, so there is no need to assign the result
walk2(data_list, plot_names, \(df, name) {
  p <- ggplot(df, aes(x, y)) + geom_point()
  ggsave(name, p)
})
```
### For side effects - use walk instead of for loops
```r
walk(x, write_file)
walk2(data, paths, write_csv)
```
## Parallel processing (purrr 1.1.0+)
```r
library(mirai)
daemons(4)
results <- large_datasets |>
  map(in_parallel(expensive_computation))
daemons(0)
```
````