GPU

Created: 2020 Jan 19 5:28

Edited: 2025 Jul 5 14:5

Graphics processing unit

The memory hierarchy runs from slow global memory (VRAM) through fast Shared Memory (on-chip SRAM) to ultra-fast registers, and compute throughput far exceeds memory bandwidth. A kernel is limited by memory bandwidth (memory-bound), by arithmetic throughput (compute-bound), or by host overhead from launching many small kernels. Arithmetic Intensity (AI) is the FLOP/byte ratio of a kernel: it must reach approximately 13 or higher for the kernel to move from memory-bound to compute-bound. That threshold is the hardware's peak FLOP/s divided by its peak memory bandwidth, so the exact value depends on the GPU.
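To make the memory-bound versus compute-bound threshold concrete, here is a minimal Python sketch of the calculation. The hardware figures (19.5 TFLOP/s peak compute, 1.5 TB/s peak bandwidth) are assumed illustrative values chosen because they give a ratio of 13; the function names are hypothetical, and the matmul estimate assumes every matrix is moved through global memory exactly once.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from global memory."""
    return flops / bytes_moved


def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_flops / peak_bandwidth


def classify(ai: float, peak_flops: float, peak_bandwidth: float) -> str:
    """Label a kernel by comparing its intensity with the ridge point."""
    if ai >= ridge_point(peak_flops, peak_bandwidth):
        return "compute-bound"
    return "memory-bound"


# Assumed, illustrative hardware numbers (not taken from the notes above).
PEAK_FLOPS = 19.5e12      # FLOP/s
PEAK_BANDWIDTH = 1.5e12   # bytes/s

# Elementwise float32 add: 1 FLOP per element, 12 bytes (two reads, one write).
add_ai = arithmetic_intensity(1, 12)

# Square float32 matmul with ideal reuse: 2*n^3 FLOPs, three n*n matrices moved once.
n = 4096
matmul_ai = arithmetic_intensity(2 * n**3, 3 * n * n * 4)

if __name__ == "__main__":
    print("ridge point:", ridge_point(PEAK_FLOPS, PEAK_BANDWIDTH))
    print("add:", add_ai, classify(add_ai, PEAK_FLOPS, PEAK_BANDWIDTH))
    print("matmul:", matmul_ai, classify(matmul_ai, PEAK_FLOPS, PEAK_BANDWIDTH))
```

Under these assumptions the elementwise add sits far below the ridge point, while a large matmul sits far above it, which is why the two kinds of kernel call for different optimizations.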

Optimization strategies for staying out of these bounds include Operation Fusion, which reduces memory traffic by processing intermediate results in a single kernel without writing them back to global memory, and Tiling, which maximizes data reuse by loading large tiles into Shared Memory so that each value fetched from global memory is used many times.
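A rough way to see why fusion and tiling help is to count global memory traffic. The Python sketch below is only a back-of-the-envelope model under stated assumptions: float32 data, an elementwise chain where every unfused operation reads and writes the full array, and a square matmul whose size is a multiple of the tile size. Both function names are hypothetical.

```python
def elementwise_chain_bytes(num_elements: int, num_ops: int, fused: bool,
                            bytes_per_element: int = 4) -> int:
    """Global memory traffic for a chain of elementwise ops on one array.

    Unfused: every op reads the array and writes a full intermediate result.
    Fused: the array is read once and the final result is written once.
    """
    passes = 1 if fused else num_ops
    return passes * 2 * num_elements * bytes_per_element


def matmul_global_loads(n: int, tile: int) -> int:
    """Element loads from global memory for an n x n matmul.

    With tile == 1 (no tiling) each output element loads a full row and a
    full column: 2*n^3 loads in total. With T x T tiles held in shared
    memory, each loaded value is reused T times, giving 2*n^3 / T loads.
    """
    if n % tile != 0:
        raise ValueError("n must be a multiple of tile in this simplified model")
    return 2 * n**3 // tile


if __name__ == "__main__":
    print(elementwise_chain_bytes(1_000_000, 3, fused=False))
    print(elementwise_chain_bytes(1_000_000, 3, fused=True))
    print(matmul_global_loads(1024, 1), matmul_global_loads(1024, 32))
```

In this model, fusing a chain of k operations cuts traffic by a factor of k, and a tile of width T cuts matmul loads by a factor of T.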

Coalesced Loading: using global memory efficiently by having each warp read contiguous 128-byte blocks in one go.
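The effect of the access pattern can be sketched by counting how many 128-byte segments one warp touches. This Python model assumes 32 threads per warp, 4-byte elements, and that lane i reads element `base_element + i * stride_elements`; the helper name and parameters are hypothetical.

```python
WARP_SIZE = 32        # threads per warp (assumed)
SEGMENT_BYTES = 128   # size of one contiguous block


def segments_touched(stride_elements: int, base_element: int = 0,
                     bytes_per_element: int = 4) -> int:
    """Number of distinct 128-byte segments read by one warp."""
    segments = {
        ((base_element + lane * stride_elements) * bytes_per_element) // SEGMENT_BYTES
        for lane in range(WARP_SIZE)
    }
    return len(segments)


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print("stride", stride, "->", segments_touched(stride), "segments")
    print("misaligned ->", segments_touched(1, base_element=1), "segments")
```

A stride of 1 starting on a segment boundary needs a single segment, while a stride of 32 elements makes every lane hit its own segment, so most of each fetched block is wasted.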

Bank Conflict Avoidance: preventing Shared Memory bank conflicts by transposing the B tiles (the tiles of the second matmul operand) on the fly as they are stored in Shared Memory.
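The following Python sketch models one simplified case of this idea. It assumes 32 banks of 4-byte words, a 32 x 32 float tile stored row-major, and a warp in which lane i needs the logical element B[i][col]; the function names are hypothetical and the access pattern is an illustration, not necessarily the exact one used in the referenced material.

```python
from collections import Counter

NUM_BANKS = 32   # shared memory banks (assumed)
WARP_SIZE = 32   # threads per warp (assumed)
TILE = 32        # tile is TILE x TILE 4-byte words


def conflict_degree(word_indices: list[int]) -> int:
    """Largest number of distinct words that fall in the same bank.

    1 means conflict-free; k means the access is serialized k ways.
    Threads reading the same word are served together, hence the set().
    """
    per_bank = Counter(index % NUM_BANKS for index in set(word_indices))
    return max(per_bank.values())


def column_read(col: int, transposed_store: bool) -> list[int]:
    """Word indices touched when lane i needs logical element B[i][col]."""
    if transposed_store:
        # Tile was stored transposed: B[i][col] lives at row col, column i.
        return [col * TILE + lane for lane in range(WARP_SIZE)]
    # Tile stored as-is: B[i][col] lives at row i, column col.
    return [lane * TILE + col for lane in range(WARP_SIZE)]


if __name__ == "__main__":
    print("as stored: ", conflict_degree(column_read(5, transposed_store=False)))
    print("transposed:", conflict_degree(column_read(5, transposed_store=True)))
```

In this model the untransposed column read lands every lane in the same bank, while the transposed layout spreads the 32 lanes over 32 different banks.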

GPU Notion

Streaming Multiprocessor

GPU Usages

https://x.com/RajaXg/status/1812721241985610147

Basic facts about GPUs

Making sure I don’t forget what I read.

https://damek.github.io/random/basic-facts-about-gpus/

To understand GPU implementation with ISA: tiny-gpu (adam-maj • Updated 2025 Jul 5 13:7)

What Every Developer Should Know About GPU Computing

A primer on GPU architecture and computing

https://codeconfessions.substack.com/p/gpu-computing

Vector graphics on GPU.

Despite vector graphics being used in every computer with a screen connected to it, rendering of vector shapes and text is still mostly a task for the CPU. This is wrong and needs to be changed. Here I describe a general approach to rasterization and how we can ask for help from GPU when rendering vector paths.

https://gasiulis.name/vector-graphics-on-gpu

Nvidia Unveils Big Accelerator Memory: Solid-State Storage for GPUs

Microsoft's DirectStorage application programming interface (API) promises to improve the efficiency of GPU-to-SSD data transfers for games in a Windows environment, but Nvidia and its partners have found a way to make GPUs seamlessly work with SSDs without a proprietary API.

https://www.tomshardware.com/news/nvidia-unveils-big-accelerator-memory-solid-state-storage-for-gpus
