PyTorch Memory

Creator: Seonglae Cho
Created: 2025 Feb 10 11:56
Edited: 2025 Feb 10 12:16
`torch.cuda.empty_cache()` releases unused cached memory held by the caching allocator back to the GPU; memory still referenced by live tensors is not freed. The statistics below are the fields reported by `torch.cuda.memory_summary()`:
 
  • CUDA OOMs: Count of out-of-memory errors on the GPU.
  • cudaMalloc retries: Number of times memory allocation was retried before succeeding.
  • Allocated memory:
    • Cur Usage: Currently allocated bytes by tensors.
    • Peak Usage: Maximum allocated bytes recorded.
    • Tot Alloc / Tot Freed: Total bytes allocated/freed over time.
    • from large/small pool: Breakdown for large vs. small block allocations.
  • Active memory: Memory currently held by active tensors (similar stats as Allocated memory).
  • Requested memory: Bytes the client code actually asked for; allocated memory may be slightly larger due to rounding and alignment overhead.
  • GPU reserved memory: Total GPU memory reserved by PyTorch’s caching allocator (includes both used and cached/free blocks).
  • Non-releasable memory: Memory that, due to fragmentation or caching, cannot be returned to the GPU immediately.
  • Allocations: Count of all allocation events (total and split into large/small pool allocations).
  • Active allocs: Count of currently active allocation events.
  • GPU reserved segments: Number of contiguous memory chunks reserved from the GPU (split into large and small segments).
  • Non-releasable allocs: Count of allocations that remain in memory and cannot be freed immediately.
  • Oversize allocations / Oversize GPU segments: Allocations (and their corresponding memory chunks) that were too large to fit in the regular caching pools and thus were allocated separately.
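Several of these counters are also exposed as individual query functions. A minimal sketch that reads the most common ones (`cuda_memory_stats` is a hypothetical helper name):

```python
import torch

def cuda_memory_stats(device=0):
    """Collect a few of the counters above for one CUDA device."""
    if not torch.cuda.is_available():
        return None
    return {
        "allocated": torch.cuda.memory_allocated(device),           # Cur Usage (allocated)
        "peak_allocated": torch.cuda.max_memory_allocated(device),  # Peak Usage
        "reserved": torch.cuda.memory_reserved(device),             # GPU reserved memory
    }

stats = cuda_memory_stats()
print(stats if stats is not None else "CUDA not available")
```

Reserved memory is always at least as large as allocated memory, since the caching allocator holds freed blocks for reuse.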
 
 

Memory snapshot

  • Active Memory Timeline
  • Allocator State History
  • Active Cached Segment Timeline
  • Allocator Settings
[Image: Allocator State History]

Memory snapshot with profiler

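A minimal sketch of memory profiling with `torch.profiler`; it runs CPU-only here so it works anywhere, and on a GPU you would add `ProfilerActivity.CUDA` (recent releases can also export a memory timeline from the profile object):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a small matmul and record per-op memory usage.
with profile(activities=[ProfilerActivity.CPU],
             profile_memory=True, record_shapes=True) as prof:
    x = torch.randn(256, 256)
    y = x @ x

print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```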
 
 

Exported memory snapshot visualizer (hosted)
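An exported snapshot can be inspected by dragging the pickle file onto the hosted viewer at https://pytorch.org/memory_viz, or rendered to standalone HTML locally with the bundled viewer module. A sketch (assumes a `snapshot.pickle` was dumped earlier; falls back to an import check when none exists):

```shell
if [ -f snapshot.pickle ]; then
  # Render the snapshot's allocation trace to a self-contained HTML page.
  python -m torch.cuda._memory_viz trace_plot snapshot.pickle -o trace.html
else
  # No snapshot on disk; just verify the viewer module is present.
  python -c "import torch.cuda._memory_viz"
fi
```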

 
 

Recommendations
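A common starting point when the counters above show fragmentation (high non-releasable memory, cudaMalloc retries) is tuning the caching allocator via the `PYTORCH_CUDA_ALLOC_CONF` environment variable; the two options below are real allocator settings, though the values shown are illustrative:

```shell
# expandable_segments:True - lets reserved segments grow in place, reducing fragmentation
# max_split_size_mb:128    - caps block splitting, limiting non-releasable slivers
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True,max_split_size_mb:128"
python -c "import torch"  # the setting is read when the CUDA context initializes
```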