Ray Serve: Scalable and Programmable Serving — Ray 2.54.0
Ray Serve is a scalable model serving library for building online inference APIs.
Serve is framework-agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to Scikit-Learn models, to arbitrary Python business logic. It has several features and performance optimizations for serving Large Language Models such as response streaming, dynamic request batching, multi-node/multi-GPU serving, etc.
https://docs.ray.io/en/latest/serve/index.html

Seonglae Cho