FlashDecoding++

Paper page - FlashDecoding++: Faster Large Language Model Inference on GPUs
https://huggingface.co/papers/2311.01282

PyTorch blog
https://pytorch.org/blog/flash-decoding/