Consuming Text Generation Inference