Consuming Text Generation Inference