Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Qualifiers

The language uses a system of annotqualifiers) to explicitly define the memory hierarchy and movement of data across different hardware components. This provides a complete mental model for data management, from CPU RAM to GPU SRAM.

@host

  • Hardware: CPU RAM
  • Typical Usage: Dataset loading and heavy preprocessing tasks (Pandas-style). This is standard, pageable system memory.

@pinned

  • Hardware: CPU RAM (Page-locked / Pinned)
  • Typical Usage: Transit buffers used to stream data to GPUs at high bandwidth (e.g., 32GB/s+). Since it cannot be paged out to disk, it allows for faster DMA transfers to the device.

@global

  • Hardware: GPU VRAM
  • Typical Usage: Storage of model weights, biases, and gradients on a single GPU board. This is the main high-capacity memory on the device.

@sharded

  • Hardware: Cluster VRAM (Multi-GPU)
  • Typical Usage: Distribution of giant models (e.g., Llama 405B) across 8+ GPUs. It abstracts the partitioned memory across a cluster of devices.

@shared

  • Hardware: GPU SRAM (Shared Memory)
  • Typical Usage: Ultra high-speed computation inside a kernel (e.g., Tiling). This is the small, fast, on-chip memory shared by threads within the same block.