AI Inference Tool


Creator
Seonglae Cho
Created
2023 Oct 26 17:15
Editor
Seonglae Cho
Edited
2025 Mar 7 0:6
Refs
Model Training Tool
Prompt Engineering Tool
LLM Development
Local LLM

AI Optimization, Inference Optimization

Model Inference Tools
Triton Inference
vLLM
Exo
TensorRT
SGLang
Deepsparse
OpenVINO
Sparsify
PowerInfer
Flexflow
Transformer Engine
Faster Transformer
TensorIR
xFormers
Torchchat
Exo Inference
AirLLM
Lingua
 
 

AI Server

Model Inference Servers
TGI
Dynamo
TEI
ONNX Server
TorchServe
KServe
Truss
BentoML
Nvidia NIM
Lepton AI
 
 
 
AI Performance Libraries
FlashInfer
GGML
Flashlight
FastAI
 
 
 
https://huggingface.co/settings/local-apps
 
 
 
LLM inference cost is going down fast
Welcome to LLMflation - LLM inference cost is going down fast ⬇️ | Andreessen Horowitz
For LLMs of equivalent performance, inference cost is decreasing by about 10x every year. What cost $60 per million tokens in 2021 costs $0.06 per million tokens today.
https://a16z.com/llmflation-llm-inference-cost
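The quoted figures can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming "today" means roughly three years after 2021 (the elapsed time is an assumption, not stated in the quote); the function names are illustrative and not taken from the article.

```python
def annual_decline_factor(start_price: float, end_price: float, years: float) -> float:
    """Return the constant yearly factor by which price must fall
    to go from start_price to end_price over the given number of years."""
    if start_price <= 0 or end_price <= 0 or years <= 0:
        raise ValueError("prices and years must be positive")
    return (start_price / end_price) ** (1.0 / years)


def price_after(start_price: float, factor: float, years: float) -> float:
    """Project the price after `years` of falling by `factor` each year."""
    return start_price / (factor ** years)


if __name__ == "__main__":
    # Figures from the quote: $60 per million tokens in 2021, $0.06 "today".
    # ASSUMPTION: "today" is about 3 years after 2021.
    factor = annual_decline_factor(60.0, 0.06, 3)
    print(f"implied yearly decline factor: {factor:.2f}x")
    for year in range(4):
        price = price_after(60.0, factor, year)
        print(f"year {year}: ${price:.4f} per million tokens")
```

Under that three-year assumption, a 1000x total drop corresponds to a factor of about 10x per year, which matches the headline claim. A different elapsed time would imply a different yearly factor.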
Inference cost with model size
Observations About LLM Inference Pricing — LessWrong
This work was done as part of the MIRI Technical Governance Team. It reflects my views and may not reflect those of the organization. …
https://www.lesswrong.com/posts/mRKd4ArA5fYhd2BPb/observations-about-llm-inference-pricing
 
 

Copyright Seonglae Cho