
pytorch-core

Core PyTorch fundamentals including tensor operations, autograd, nn.Module architecture, and training loop orchestration. Covers optimizations like pin_memory and lazy module initialization. (pytorch, tensor, autograd, nn.Module, optimizer, training loop, state_dict, pin_memory, lazylinear, requires_grad)

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars

766

Updated

March 20, 2026

Overall rating

C (4.8)

Composite score

4.8

Best-practice grade

B (81.2)

Install command

npx @skill-hub/cli install benchflow-ai-skillsbench-pytorch-core

Repository

benchflow-ai/SkillsBench

Skill path: registry/terminal_bench_2.0/full_batch_reviewed/terminal_bench_2_0_torch-tensor-parallelism/environment/skills/pytorch-core

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: benchflow-ai.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install pytorch-core into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/benchflow-ai/SkillsBench before adding pytorch-core to shared team environments
Use pytorch-core for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: pytorch-core
description: Core PyTorch fundamentals including tensor operations, autograd, nn.Module architecture, and training loop orchestration. Covers optimizations like pin_memory and lazy module initialization. (pytorch, tensor, autograd, nn.Module, optimizer, training loop, state_dict, pin_memory, lazylinear, requires_grad)
---

## Overview

Core PyTorch provides the fundamental building blocks for deep learning: tensor computation with strong GPU acceleration and an automatic differentiation (autograd) system. It follows a "define-by-run" approach in which the computation graph is recorded as the code executes and models are ordinary Python objects.
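
As a minimal sketch of what "define-by-run" means in practice, the snippet below lets ordinary Python control flow decide which operations autograd records. The variable names and the branch condition are illustrative assumptions, not part of the original skill.

```python
import torch

# A scalar leaf tensor that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

# Ordinary Python control flow decides which graph is recorded on this run.
if x.item() > 0:
    y = x ** 2
else:
    y = -x

y.backward()                 # populate x.grad with dy/dx
grad_value = x.grad.item()   # dy/dx = 2 * x, evaluated at x = 3.0
```

Because the graph is rebuilt on every forward pass, a different input could take the other branch and produce a different gradient without any change to the model definition.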

## When to Use

Use PyTorch Core when you need granular control over model architecture, custom training loops, or specific hardware optimizations like pinned memory for data transfers.

## Decision Tree

1. Do you know the input dimensions of your data?
   - YES: Use standard layers (e.g., `nn.Linear`).
   - NO: Use Lazy modules (e.g., `nn.LazyLinear`) to defer initialization.
2. Is your bottleneck data transfer to the GPU?
   - YES: Enable `pin_memory=True` in your `DataLoader`.
   - NO: Standard data loading suffices.
3. Are you fine-tuning a model?
   - YES: Set `requires_grad=False` for frozen parameters.
   - NO: Keep `requires_grad=True` for full training.
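
The fine-tuning branch of the tree can be sketched as follows. The two-layer `nn.Sequential` is a hypothetical stand-in for a pretrained backbone plus a new head; the layer sizes and learning rate are assumptions chosen only to keep the example small.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained network: a "backbone" plus a new "head".
model = nn.Sequential(
    nn.Linear(16, 8),   # backbone (pretend these weights are pretrained)
    nn.ReLU(),
    nn.Linear(8, 2),    # head to be fine-tuned
)

# Freeze the backbone so autograd does not compute gradients for it.
for param in model[0].parameters():
    param.requires_grad = False

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

loss = model(torch.randn(4, 16)).sum()
loss.backward()

frozen_has_grad = model[0].weight.grad is not None   # frozen layer gets no gradient
head_has_grad = model[2].weight.grad is not None     # head still receives gradients
```

Filtering the parameter list is optional, since the optimizer skips parameters without gradients, but it makes the set of trained weights explicit.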

## Workflows

1. **Standard Training Iteration**
   1. Load a batch of data from the `DataLoader`.
   2. Zero the gradients using `optimizer.zero_grad()`.
   3. Perform a forward pass through the `nn.Module`.
   4. Compute the loss using a criterion (e.g., `nn.CrossEntropyLoss`).
   5. Execute a backward pass with `loss.backward()` to compute gradients.
   6. Update model parameters using `optimizer.step()`.
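
A minimal sketch of the standard training iteration, with each line of the loop numbered to match the steps. The synthetic dataset, the single `nn.Linear` model, the `train_one_epoch` helper name, and the hyperparameters are all assumptions for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data (an assumption): 64 samples, 10 features, 3 classes.
features = torch.randn(64, 10)
labels = torch.randint(0, 3, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def train_one_epoch(model, loader, criterion, optimizer):
    """Run one pass over the loader and return the mean batch loss."""
    model.train()
    total = 0.0
    for inputs, targets in loader:          # 1. load a batch
        optimizer.zero_grad()               # 2. clear stale gradients
        outputs = model(inputs)             # 3. forward pass
        loss = criterion(outputs, targets)  # 4. compute the loss
        loss.backward()                     # 5. backward pass
        optimizer.step()                    # 6. update parameters
        total += loss.item()
    return total / len(loader)


epoch_losses = [train_one_epoch(model, loader, criterion, optimizer) for _ in range(5)]
```

Gradients accumulate across calls to `loss.backward()`, which is why step 2 resets them before every batch.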

2. **Model Persistence and Checkpointing**
   1. Capture the state of the model and optimizer using `.state_dict()`.
   2. Save the dictionaries to a file using `torch.save()`.
   3. Restore the model by instantiating the class and calling `.load_state_dict()` with the dictionary returned by `torch.load()`.
   4. Call `.eval()` before inference so that Dropout is disabled and BatchNorm uses its running statistics.
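
The checkpointing steps might look like the sketch below. The dictionary keys (`model_state`, `optimizer_state`), the temporary file path, and the tiny `nn.Linear` model are assumptions for illustration, not names taken from the skill's scripts.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Steps 1-2: bundle both state dicts into one file (hypothetical path and keys).
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save(
    {"model_state": model.state_dict(), "optimizer_state": optimizer.state_dict()},
    checkpoint_path,
)

# Step 3: rebuild the same architecture, then load the saved tensors into it.
restored = nn.Linear(4, 2)
restored_optimizer = torch.optim.SGD(restored.parameters(), lr=0.01, momentum=0.9)
checkpoint = torch.load(checkpoint_path, map_location="cpu")
restored.load_state_dict(checkpoint["model_state"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state"])

# Step 4: switch to evaluation mode and skip gradient tracking for inference.
restored.eval()
with torch.no_grad():
    sample = torch.randn(1, 4)
    outputs_match = torch.equal(model(sample), restored(sample))
```

Saving the optimizer state alongside the model matters when training will resume, because optimizers such as SGD with momentum or Adam keep per-parameter buffers.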

3. **Deferred Architecture Initialization**
   1. Define the model using Lazy modules (e.g., `nn.LazyLinear`).
   2. Move the model to the desired device before the first forward pass.
   3. Run a dummy input or the first real batch through the model.
   4. PyTorch automatically infers and sets the weight shapes based on the input.
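
The deferred-initialization workflow can be sketched as below. The 1x28x28 input shape and the layer widths are hypothetical; the device selection falls back to CPU so the snippet runs without a GPU.

```python
import torch
from torch import nn

# Step 1: in_features is omitted; only the output sizes are declared.
model = nn.Sequential(
    nn.Flatten(),
    nn.LazyLinear(32),
    nn.ReLU(),
    nn.LazyLinear(10),
)

# Step 2: choose a device (CPU fallback so the sketch runs anywhere).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Step 3: a dummy batch with a hypothetical input shape of 1x28x28.
dummy = torch.zeros(8, 1, 28, 28, device=device)
with torch.no_grad():
    output = model(dummy)

# Step 4: the first lazy layer has inferred in_features = 1 * 28 * 28 = 784.
first_layer_shape = tuple(model[1].weight.shape)
output_shape = tuple(output.shape)
```

It is generally safest to construct the optimizer after this dry run, once the parameters have concrete shapes.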

## Non-Obvious Insights

- **Lazy Initialization**: Using `LazyLinear` or `LazyConv2d` simplifies architecture definitions where input dimensions are unknown, preventing manual shape calculation errors.
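- **Data Transfer Optimization**: Passing `pin_memory=True` to a `DataLoader` places batches in page-locked host memory, which speeds up CPU-to-GPU copies and allows them to run asynchronously when combined with `.to(device, non_blocking=True)`.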
- **Dynamic Gradient Control**: The `requires_grad` attribute can be toggled on-the-fly to freeze parameters during fine-tuning or transfer learning without re-instantiating the model.
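
A minimal sketch of the pinned-memory pattern, assuming a small synthetic dataset. Pinning is enabled only when a CUDA device is present, so the same code also runs on a CPU-only machine; the `use_pinned` flag is an illustrative name.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_pinned = device.type == "cuda"   # pinning only helps when copying to a GPU

dataset = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
loader = DataLoader(dataset, batch_size=8, pin_memory=use_pinned)

batches_seen = 0
for inputs, targets in loader:
    # non_blocking=True lets copies from pinned memory overlap with GPU work;
    # on a CPU-only machine the call simply returns the same tensor.
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    batches_seen += 1
```

Whether this yields a measurable speedup depends on batch size and on how much of the step time is spent on host-to-device transfer, so it is worth profiling before and after.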

## Evidence

- "Most machine learning workflows involve working with data, creating models, optimizing model parameters, and saving the trained models." (https://pytorch.org/tutorials/beginner/basics/intro.html)
- "Lazy modules like LazyLinear allow for deferred initialization of input dimensions until the first forward pass." (https://pytorch.org/docs/stable/nn.html)

## Scripts

- `scripts/pytorch-core_tool.py`: Provides a standard training loop skeleton and lazy initialization examples.
- `scripts/pytorch-core_tool.js`: Node.js wrapper for invoking PyTorch training scripts.

## Dependencies

- torch
- torchvision (optional for datasets)
- numpy

## References

- [PyTorch Core Reference](references/README.md)