Models for fine-tuning in Nebius Token Factory

Nebius Token Factory supports fine-tuning on multiple open-weight model families.
This page lists:

Which base models you can fine-tune
Which context lengths they support
Which fine-tuning types are available (LoRA vs full fine-tuning)

Deployment note
Not all models that can be fine-tuned can be deployed as serverless endpoints in Nebius Token Factory.

For serving options, see Deploy custom model and the list of available deployment models.

Model List

For each model listed below, Nebius Token Factory supports the following context_length values: 8192, 16384, 32768, 65536, and 131072.

Unless you override it via the context_length hyperparameter, the default context length for fine-tuning is 8192 tokens. See the hyperparameter section for model-specific details regarding context_length.
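As an illustration of the default-and-override behavior described above, the following minimal sketch validates a requested context length on the client side before a job is submitted. The helper name `resolve_context_length` and the shape of the `hyperparameters` dictionary are hypothetical and not part of the Nebius Token Factory API; only the list of values and the 8192-token default come from this page.

```python
from typing import Optional

# Values taken from the list above; the default is 8192 tokens.
SUPPORTED_CONTEXT_LENGTHS = (8192, 16384, 32768, 65536, 131072)
DEFAULT_CONTEXT_LENGTH = 8192


def resolve_context_length(requested: Optional[int] = None) -> int:
    """Return the context length to use for a fine-tuning job.

    Hypothetical client-side helper: falls back to the documented default
    when no override is given, and rejects values not listed on this page.
    """
    if requested is None:
        return DEFAULT_CONTEXT_LENGTH
    if requested not in SUPPORTED_CONTEXT_LENGTHS:
        raise ValueError(
            f"context_length={requested} is not supported; "
            f"choose one of {SUPPORTED_CONTEXT_LENGTHS}"
        )
    return requested


# Assumed payload shape, for illustration only.
hyperparameters = {"context_length": resolve_context_length(32768)}
```

Checking the value locally like this is only a convenience; the service's own validation and the per-model details in the hyperparameter section remain authoritative.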

OpenAI / Unsloth GPT-OSS

These models are OpenAI GPT-OSS weights (bf16) packaged by Unsloth.
They are Apache 2.0–licensed and suitable for both research and commercial use (subject to the license). To convert the weights to MXFP4, follow the instructions here.

Name	Training type	Model card / license
unsloth/gpt-oss-20b-BF16 (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0
unsloth/gpt-oss-120b-BF16 (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0

To merge MoE LoRA adapter weights for deployment on Dedicated endpoints, follow the guide here.

Qwen

Nebius Token Factory supports dense, MoE, and coder variants across the Qwen3 and Qwen2.5 families.
All Qwen models below use the Apache 2.0 license (see each model card for details).

Qwen3 MoE

Name	Training type	Model card / license
Qwen/Qwen3-235B-A22B (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0
Qwen/Qwen3-235B-A22B-Instruct-2507 (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0
Qwen/Qwen3-235B-A22B-Thinking-2507 (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0
Qwen/Qwen3-30B-A3B (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0
Qwen/Qwen3-30B-A3B-Instruct-2507 (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0
Qwen/Qwen3-30B-A3B-Thinking-2507 (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0

Qwen3 coder

Name	Training type	Model card / license
Qwen/Qwen3-Coder-30B-A3B-Instruct (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0
Qwen/Qwen3-Coder-480B-A35B-Instruct (Model card)	LoRA and Full Parameter fine-tuning	Apache 2.0
