Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server | NVIDIA Technical Blog
Learn about FasterTransformer, one of the fastest libraries for distributed inference of transformer models of any size, and the benefits of using it.
https://developer.nvidia.com/blog/accelerated-inference-for-large-transformer-models-using-nvidia-fastertransformer-and-nvidia-triton-inference-server/