Native FP8/LogFMT quantization with high-precision accumulation registers is needed. Adaptive Routing, Virtual Output Queuing (VOQ), and end-to-end lossless flow control are required. Hardware error detection beyond ECC (Error-Correcting Code) is also needed, and hardware-level acquire/release consistency and ordering guarantees improve memory-semantic communication by removing fence overhead.
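The accumulation-precision point is easy to see in a few lines of PyTorch. The sketch below is only an illustration (it assumes PyTorch >= 2.1 for the torch.float8_e4m3fn dtype, and the sizes are made up for the demo): it fake-quantizes inputs through FP8 and shows how a narrow FP16 accumulator drifts relative to FP32 accumulation over a long reduction.

```python
# Minimal sketch, not a vendor implementation: why FP8 inputs still want a
# high-precision accumulator. We fake-quantize FP32 values through FP8 (E4M3)
# and compare a step-by-step FP16 accumulator against FP32 accumulation.
# Assumes PyTorch >= 2.1 (for torch.float8_e4m3fn); numbers are illustrative.
import torch

def fake_quant_fp8(x: torch.Tensor) -> torch.Tensor:
    """Round-trip through FP8 E4M3 to emulate FP8 storage of weights/activations."""
    return x.to(torch.float8_e4m3fn).to(torch.float32)

torch.manual_seed(0)
K = 4096                                # long reduction dim, typical of LLM matmuls
a = fake_quant_fp8(torch.randn(K))
b = fake_quant_fp8(torch.randn(K))

ref = torch.dot(a, b)                   # FP32-accumulated reference

# Sequential FP16 accumulation: every partial sum is rounded back to FP16,
# mimicking a narrow accumulation register.
acc16 = torch.tensor(0.0, dtype=torch.float16)
for p in a.half() * b.half():
    acc16 = acc16 + p

print(f"FP32-accumulated dot product: {ref.item():.4f}")
print(f"FP16-accumulated dot product: {acc16.item():.4f}")
print(f"error from narrow accumulation: {abs(acc16.float() - ref).item():.4f}")
# As K grows, the FP16 accumulator drifts, which is why native FP8 hardware
# pairs low-precision multiplies with wider accumulation registers.
```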
AI Accelerator Companies
AI Accelerators
Geekbench AI
New Geekbench AI benchmark can test the performance of CPUs, GPUs, and NPUs
Performance test comes out of beta as NPUs become standard equipment in PCs.
https://arstechnica.com/gadgets/2024/08/geekbench-ml-becomes-geekbench-ai-a-cross-platform-performance-test-for-npus-and-more/

How We’ll Reach a 1 Trillion Transistor GPU
Advances in semiconductors are feeding the AI boom
https://spectrum.ieee.org/trillion-transistor-gpu
The chip is the core part.
Nvidia On the Mountaintop
Nvidia has gone from the valley to the mountain-top in less than a year, thanks to ChatGPT and the frenzy it inspired; whether or not there is a cliff depends on developing new kinds of demand that…
https://stratechery.com/2023/nvidia-on-the-mountaintop/

The Hardware Lottery
How hardware and software determine what research ideas succeed and fail.
https://hardwarelottery.github.io/
Korean researchers power-shame Nvidia with new neural AI chip — claim 625 times less power draw, 41 times smaller
Researchers claim the Samsung-fabbed chip is the first ultra-low-power LLM processor.
https://www.tomshardware.com/tech-industry/artificial-intelligence/korean-researchers-power-shame-nvidia-with-new-neural-ai-chip-claim-625-times-less-power-41-times-smaller

