Qualifiers

The language uses a system of annotqualifiers) to explicitly define the memory hierarchy and movement of data across different hardware components. This provides a complete mental model for data management, from CPU RAM to GPU SRAM.

`@host`

Hardware: CPU RAM
Typical Usage: Dataset loading and heavy preprocessing tasks (Pandas-style). This is standard, pageable system memory.

`@pinned`

Hardware: CPU RAM (Page-locked / Pinned)
Typical Usage: Transit buffers used to stream data to GPUs at high bandwidth (e.g., 32GB/s+). Since it cannot be paged out to disk, it allows for faster DMA transfers to the device.

`@global`

Hardware: GPU VRAM
Typical Usage: Storage of model weights, biases, and gradients on a single GPU board. This is the main high-capacity memory on the device.

`@sharded`

Hardware: Cluster VRAM (Multi-GPU)
Typical Usage: Distribution of giant models (e.g., Llama 405B) across 8+ GPUs. It abstracts the partitioned memory across a cluster of devices.

`@shared`

Hardware: GPU SRAM (Shared Memory)
Typical Usage: Ultra high-speed computation inside a kernel (e.g., Tiling). This is the small, fast, on-chip memory shared by threads within the same block.