Ray Serve

Creator

Seonglae Cho

Created

2026 Mar 24 14:13

Editor

Seonglae Cho

Edited

2026 Mar 24 14:13

Refs

Ray Serve: Scalable and Programmable Serving — Ray 2.54.0

Ray Serve is a scalable model serving library for building online inference APIs. Serve is framework-agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to Scikit-Learn models, to arbitrary Python business logic. It also includes features and performance optimizations for serving Large Language Models, such as response streaming, dynamic request batching, and multi-node/multi-GPU serving.

https://docs.ray.io/en/latest/serve/index.html
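
Dynamic request batching, mentioned above, means collecting individual requests that arrive close together and handing them to the model as one batch. The block below is a minimal standard-library sketch of that idea, not Ray Serve's implementation. The names `DynamicBatcher`, `max_batch_size`, `wait_timeout_s`, `double_all`, and `demo` are hypothetical and chosen only for illustration. In Ray Serve itself, batching is exposed through the `serve.batch` decorator.

```python
import asyncio


class DynamicBatcher:
    """Hypothetical helper: groups single requests into batches."""

    def __init__(self, handler, max_batch_size=4, wait_timeout_s=0.05):
        self._handler = handler              # async fn: list of inputs -> list of outputs
        self._max_batch_size = max_batch_size
        self._wait_timeout_s = wait_timeout_s
        self._queue = None                   # created lazily inside the running loop
        self._worker = None

    async def submit(self, item):
        """Enqueue one request and wait for its individual result."""
        if self._queue is None:
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run())
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((item, future))
        return await future

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request is waiting.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._wait_timeout_s
            # Keep filling the batch until it is full or the deadline passes.
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(
                        await asyncio.wait_for(self._queue.get(), remaining)
                    )
                except asyncio.TimeoutError:
                    break
            items = [item for item, _ in batch]
            try:
                results = await self._handler(items)
            except Exception as exc:
                # A failed batch fails every request that was part of it.
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)

    async def close(self):
        """Stop the background worker task."""
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass
            self._worker = None


async def demo():
    batch_sizes = []

    async def double_all(items):
        # Stand-in for a model call that is cheaper per item when batched.
        batch_sizes.append(len(items))
        return [2 * x for x in items]

    batcher = DynamicBatcher(double_all, max_batch_size=4, wait_timeout_s=0.05)
    results = await asyncio.gather(*(batcher.submit(i) for i in range(6)))
    await batcher.close()
    return results, batch_sizes


if __name__ == "__main__":
    results, batch_sizes = asyncio.run(demo())
    print(results, batch_sizes)
```

Each caller still sees a single-request interface through `submit`, while the handler only ever receives lists. The two knobs trade latency for throughput: a larger batch size or a longer wait gives fuller batches, at the cost of the first request in each batch waiting longer.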

Backlinks

In-Flight Batching
