Reference architectures

Tier: Free, Premium, Ultimate
Offering: GitLab Self-Managed

The GitLab reference architectures are validated, production-ready environment designs for deploying GitLab at scale. Each architecture provides detailed specifications that you can use or adapt based on your requirements.

Before you start

First, consider whether GitLab Self-Managed is the right choice for you and your requirements.

Running any application in production is complex, and the same applies to GitLab. While we aim to make this as smooth as possible, the general complexities that come with your chosen design remain. Typically, you have to manage all aspects such as hardware, operating systems, networking, storage, security, GitLab itself, and more. This includes both the initial setup of the environment and the longer-term maintenance.

You must have a working knowledge of running and maintaining applications in production if you decide to go down this route. If you aren’t in this position, our Professional Services team offers implementation services. If you want a more managed solution long term, you can explore our other offerings such as GitLab SaaS or GitLab Dedicated.

If you are considering using the GitLab Self-Managed approach, we encourage you to read through this page in full, specifically the following sections:

Deciding which architecture to start with

The reference architectures are designed to strike a balance between three important factors: performance, resilience, and cost. They are designed to make it easier to set up GitLab at scale. However, it can still be a challenge to know which one meets your requirements and where to start accordingly.

As a general guide, the more performant or resilient you want your environment to be, the more complex it is.

This section explains the things to consider when picking a reference architecture.

Expected load (RPS or user count)

The right architecture size depends primarily on your environment’s expected peak load. The most objective measure of this load is through peak Requests per Second (RPS) coming into the environment.

Each architecture is designed to handle specific RPS targets for different types of requests (API, Web, Git). These details are described in the “Testing Methodology” section on each page.

For comprehensive RPS analysis and data-driven sizing decisions, see reference architecture sizing, which provides:

Detailed PromQL queries for extracting peak and sustained RPS metrics.
Workload pattern analysis to identify component-specific adjustments.
Assessment methodology for monorepos, network usage, and growth planning.

For quick RPS estimation, some potential options include:

Prometheus queries, such as:

sum(irate(gitlab_transaction_duration_seconds_count{controller!~'HealthController|MetricsController'}[1m])) by (controller, action)

get-rps script from GitLab support.
Other monitoring solutions.
Load balancer statistics.
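
If load balancer statistics are only available as raw access logs, peak RPS can be approximated by counting requests per one-second bucket. The following is a minimal sketch, assuming the request timestamps have already been extracted as Unix epoch seconds. The function name `peak_rps` and the sample values are hypothetical and not part of any GitLab tooling.

```python
from collections import Counter
from typing import Iterable


def peak_rps(timestamps: Iterable[float]) -> int:
    """Return the highest number of requests seen in any one-second bucket."""
    # Truncate each timestamp to its whole second and count requests per second.
    per_second = Counter(int(ts) for ts in timestamps)
    # An empty log yields a peak of 0 instead of raising an error.
    return max(per_second.values(), default=0)


# Hypothetical sample: three requests in second 100, one in second 101.
sample = [100.1, 100.5, 100.9, 101.2]
print(peak_rps(sample))
```

One-second buckets can overstate short spikes, so compare the result against a sustained figure over a longer window before sizing against it.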

If you can’t determine your RPS, we provide an alternative sizing method based on equivalent User Count by Load Category. This count is mapped to typical RPS values, considering both manual and automated usage.

Available reference architectures

The following reference architectures are available as recommended starting points for your environment.

The architectures are named in terms of peak load, based on user count or requests per second (RPS). RPS is calculated based on average real data.

Each architecture is designed to be scalable and elastic. You can adjust it upwards or downwards based on your workload. For example, you might scale up for known heavy scenarios such as large monorepos or notable additional workloads.

For details about what each reference architecture is tested against, see the “Testing Methodology” section of each page.

Initial sizing guide

To determine which architecture to pick for the expected load, see the following initial sizing guide tables.

Before you select an initial architecture, review this section thoroughly. Consider other factors such as High Availability (HA) or use of large monorepos because they may impact the choice beyond just RPS or user count.

GitLab package (Omnibus)

The following is the list of Linux package based reference architectures:

Size	Users	API RPS	Web RPS	Git (Pull) RPS	Git (Push) RPS
X Small	1,000	20	2	2	1
Small	2,000	40	4	4	1
Medium	3,000	60	6	6	1
Large	5,000	100	10	10	2
X Large	10,000	200	20	20	4
2X Large	25,000	500	50	50	10
3X Large	50,000	1000	100	100	20
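
As an illustration of how to read the table above, the following sketch picks the smallest size whose targets cover every measured peak RPS figure. It is only a starting point under stated assumptions: the `Size` type and `smallest_fit` function are hypothetical helpers, and the lookup ignores factors such as HA, monorepos, and additional workloads that are covered later on this page.

```python
from typing import NamedTuple, Optional


class Size(NamedTuple):
    name: str
    users: int
    api_rps: int
    web_rps: int
    git_pull_rps: int
    git_push_rps: int


# Values copied from the Linux package sizing table, ordered smallest first.
SIZES = [
    Size("X Small", 1_000, 20, 2, 2, 1),
    Size("Small", 2_000, 40, 4, 4, 1),
    Size("Medium", 3_000, 60, 6, 6, 1),
    Size("Large", 5_000, 100, 10, 10, 2),
    Size("X Large", 10_000, 200, 20, 20, 4),
    Size("2X Large", 25_000, 500, 50, 50, 10),
    Size("3X Large", 50_000, 1000, 100, 100, 20),
]


def smallest_fit(api: float = 0, web: float = 0,
                 git_pull: float = 0, git_push: float = 0) -> Optional[Size]:
    """Return the smallest size that covers all four peak RPS figures."""
    for size in SIZES:
        if (api <= size.api_rps and web <= size.web_rps
                and git_pull <= size.git_pull_rps
                and git_push <= size.git_push_rps):
            return size
    # The load exceeds the largest listed architecture.
    return None


print(smallest_fit(api=55, web=5, git_pull=5, git_push=1).name)
```

Because every request type must fit, a single heavy category (for example, Git pushes from automation) can move the result up a size even when API and Web traffic are low.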

Cloud native hybrid

The following is a list of Cloud Native Hybrid reference architectures, where select recommended components can be run in Kubernetes:

Size	Users	API RPS	Web RPS	Git (Pull) RPS	Git (Push) RPS
Small	2,000	40	4	4	1
Medium	3,000	60	6	6	1
Large	5,000	100	10	10	2
X Large	10,000	200	20	20	4
2X Large	25,000	500	50	50	10
3X Large	50,000	1000	100	100	20

If in doubt, start large, monitor, and then scale down

If you’re uncertain about the required environment size, consider starting with a larger size, monitoring it, and then scaling down accordingly if the metrics support your situation.

Starting large and then scaling down is a prudent approach when:

You can’t determine RPS
The environment load could be atypically higher than expected
You have large monorepos or notable additional workloads

For example, if you have 3,000 users but also know that there’s automation at play that would significantly increase the concurrent load, then you could start with a 100 RPS / 5k User class environment, monitor it, and if the metrics support it, scale down all components at once, or one by one.

Standalone (non-HA)

For environments serving 2,000 or fewer users, we generally recommend a standalone approach: a non-HA environment deployed on a single node or multiple nodes. With this approach, you can employ strategies such as automated backups for recovery. These strategies provide a good level of recovery time objective (RTO) or recovery point objective (RPO) while avoiding the complexities that come with HA.

With standalone setups, especially single node environments, various options are available for installation and management. The options include the ability to deploy directly by using select cloud provider marketplaces that reduce the complexity a little further.

High Availability (HA)

High Availability ensures every component in the GitLab setup can handle failures through various mechanisms. However, achieving this is complex, and the environments required can be sizable.

For environments serving 3,000 or more users, we generally recommend using an HA strategy. At this level, an outage affects more users and has a bigger impact. For this reason, all the architectures in this range have HA built in by design.

Do you need High Availability (HA)?

As mentioned previously, achieving HA comes at a cost. The environment requirements are sizable because each component must be multiplied, which adds both infrastructure and maintenance costs.

For many of our customers with fewer than 3,000 users, we’ve found that a backup strategy is sufficient and even preferable. While recovery is slower, the architecture is much smaller and maintenance costs are lower as a result.

As a general guideline, employ HA only in the following scenarios:

When you have 3,000 or more users.
When GitLab being down would critically impact your workflow.

Scaled-down High Availability (HA) approach

If you still need HA for fewer users, you can achieve it with an adjusted 3K architecture.

Zero-downtime upgrades

Zero-downtime upgrades are available for standard environments with HA (Cloud Native Hybrid is not supported). This allows for an environment to stay up during an upgrade. However, this process is more complex as a result and has some limitations as detailed in the documentation.

When going through this process, it’s worth noting that there may still be brief moments of downtime when the HA mechanisms take effect.

In most cases, the downtime required for doing an upgrade shouldn’t be substantial. Use this approach only if it’s a key requirement for you.

Cloud Native Hybrid (Kubernetes HA)

As an additional layer of HA resilience, you can deploy select components in Kubernetes, known as a Cloud Native Hybrid reference architecture. For stability reasons, stateful components such as Gitaly cannot be deployed in Kubernetes.

Cloud Native Hybrid is an alternative and more advanced setup compared to a standard reference architecture. Running services in Kubernetes is complex. Use this setup only if you have strong working knowledge and experience in Kubernetes.

GitLab Geo (Cross Regional Distribution / Disaster Recovery)

With GitLab Geo, you can achieve distributed environments in different regions with a full Disaster Recovery (DR) setup in place. GitLab Geo requires at least two separate environments:

One primary site.
One or more secondary sites that serve as replicas.

If the primary site becomes unavailable, you can fail over to one of the secondary sites.

Use this advanced and complex setup only if DR is a key requirement for your environment. You must also make additional decisions on how each site is configured. For example, if each secondary site would be the same architecture as the primary or if each site is configured for HA.

Large monorepos / Additional workloads

Large monorepos or significant additional workloads can affect the performance of the environment notably. Some adjustments may be required depending on the context.

For comprehensive analysis of these factors, see reference architecture sizing, which provides:

Detailed assessment methodology for monorepo impacts on infrastructure.
Component-specific scaling recommendations for different workload patterns.
Network bandwidth analysis for heavy data transfer scenarios.

If this situation applies to you, reach out to your GitLab representative or our Support team for further guidance.

Cloud provider services

For all the previously described strategies, you can run select GitLab components on equivalent cloud provider services such as the PostgreSQL database or Redis.

For more information, see the recommended cloud providers and services.

Decision Tree

Read the preceding guidance in full before you refer to the following decision tree.

%%{init: { "fontFamily": "GitLab Sans" }}%%
graph TD
    accTitle: Decision tree for reference architecture selection
    accDescr: Key considerations for selecting architecture including expected load, HA requirements, and additional workload factors.

   L0A(<b>What Reference Architecture should I use?</b>)
   L1A(<b>What is your expected load?</b>)

   L2A("60 RPS / 3,000 users or more?")
   L2B("40 RPS / 2,000 users or less?")

   L3A("Do you need HA?<br>(or zero-downtime upgrades)")
   L3B[Do you have experience with<br/>and want additional resilience<br/>with select components in Kubernetes?]

   L4A><b>Recommendation</b><br><br>60 RPS / 3,000 user architecture with HA<br>and supported reductions]
   L4B><b>Recommendation</b><br><br>Architecture closest to expected load with HA]
   L4C><b>Recommendation</b><br><br>Cloud Native Hybrid architecture<br>closest to expected load]
   L4D>"<b>Recommendation</b><br><br>Standalone 20 RPS / 1,000 user or 40 RPS / 2,000 user<br/>architecture with Backups"]

   L0A --> L1A
   L1A --> L2A
   L1A --> L2B
   L2A -->|Yes| L3B
   L3B -->|Yes| L4C
   L3B -->|No| L4B

   L2B --> L3A
   L3A -->|Yes| L4A
   L3A -->|No| L4D
   L5A("Do you need cross regional distribution<br> or disaster recovery?") --> |Yes| L6A><b>Additional Recommendation</b><br><br> GitLab Geo]
   L4A ~~~ L5A
   L4B ~~~ L5A
   L4C ~~~ L5A
   L4D ~~~ L5A

   L5B("Do you have Large Monorepos or expect<br> to have substantial additional workloads?") --> |Yes| L6B><b>Additional Recommendations</b><br><br>Start large, monitor and scale down<br><br> Contact GitLab representative or Support]
   L4A ~~~ L5B
   L4B ~~~ L5B
   L4C ~~~ L5B
   L4D ~~~ L5B
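
The branches of the decision tree can also be written out as a small function, which some readers may find easier to follow than the diagram. This is a sketch only: the function `recommend` and its parameters are hypothetical, and it assumes that any user count below 3,000 takes the "2,000 users or less" branch, in line with the HA guidance earlier on this page.

```python
from typing import List


def recommend(users: int, needs_ha: bool = False, wants_kubernetes: bool = False,
              needs_geo: bool = False, heavy_workloads: bool = False) -> List[str]:
    """Return the recommendations reached by walking the decision tree."""
    recommendations = []
    if users >= 3000:
        # 60 RPS / 3,000 users or more: HA is built in; choose the deployment style.
        if wants_kubernetes:
            recommendations.append(
                "Cloud Native Hybrid architecture closest to expected load")
        else:
            recommendations.append("Architecture closest to expected load with HA")
    elif needs_ha:
        # Smaller environment that still needs HA or zero-downtime upgrades.
        recommendations.append(
            "60 RPS / 3,000 user architecture with HA and supported reductions")
    else:
        recommendations.append(
            "Standalone 20 RPS / 1,000 user or 40 RPS / 2,000 user "
            "architecture with Backups")
    # The two follow-up questions apply to every branch.
    if needs_geo:
        recommendations.append("GitLab Geo")
    if heavy_workloads:
        recommendations.append("Start large, monitor and scale down")
        recommendations.append("Contact GitLab representative or Support")
    return recommendations


print(recommend(users=1500, needs_ha=True, needs_geo=True))
```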

Requirements

Before implementing a reference architecture, see the following requirements and guidance.

Supported machine types

The architectures are designed to be flexible in terms of machine type selection while ensuring consistent performance. While we provide specific machine type examples in each reference architecture, these are not intended to be prescriptive defaults.

You can use any machine types that meet or exceed the specified requirements for each component, such as:

Newer generation machine types (like GCP n2 series or AWS m6 series)
Different architectures like ARM-based instances (such as AWS Graviton)
Alternative machine type families that better match your specific workload characteristics (such as higher network bandwidth)

This guidance is also applicable for any Cloud Provider services such as AWS RDS.

Any “burstable” instance types are not recommended due to inconsistent performance.

For details about what machine types we test against and how, refer to validation and test results.

Supported disk types

Most standard disk types are expected to work for GitLab. However, be aware of the following specific call-outs:

Gitaly has certain disk requirements for Gitaly storages.
We don’t recommend the use of any disk types that are “burstable” due to inconsistent performance.

Other disk types are expected to work with GitLab. Choose based on your requirements such as durability or cost.

Supported infrastructure

GitLab should run on most infrastructures such as reputable cloud providers (AWS, GCP, Azure) and their services, or self-managed (ESXi) that meet both:

The specifications detailed in each architecture.
Any requirements in this section.

However, this does not guarantee compatibility with every potential permutation.

See Recommended cloud providers and services for more information.

Networking (High Availability)

Below are the network requirements for running GitLab in a High Availability fashion.

Network latency

Network latency should be as low as possible to allow for synchronous replication across the GitLab application, such as database replication. Generally this should be lower than 5 ms.

Availability zones (Cloud Providers)

Deploying across availability zones is supported and generally recommended for additional resilience. You should use an odd number of zones to align with GitLab application requirements, as some components use an odd number of nodes for quorum voting.
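
The preference for an odd number follows from simple majority arithmetic. The sketch below assumes a strict-majority quorum rule, which is the common model for consensus-based systems. The exact voting rules of each GitLab component may differ, and `tolerated_failures` is a hypothetical helper.

```python
def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while a strict majority is still available."""
    majority = nodes // 2 + 1
    return nodes - majority


for n in (3, 4, 5):
    print(n, "nodes tolerate", tolerated_failures(n), "failure(s)")
```

Under this assumption, a fourth node or zone adds cost without tolerating any more failures than three do, whereas a fifth raises the tolerance to two.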

Data centers (Self Hosted)

Deploying across multiple self-hosted data centers is possible but requires careful consideration. It requires latency between centers that is low enough for synchronous replication, robust redundant network links to prevent split-brain scenarios, all centers located in the same geographic region, and deployment across an odd number of centers for proper quorum voting (as with availability zones).

It may not be possible for GitLab Support to assist with infrastructure-related issues stemming from multi-data center deployments. Choosing to deploy across centers is generally at your own risk.

It is not supported to deploy a single GitLab environment across different regions. Data centers should be in the same region.

Large Monorepos

The architectures were tested with repositories of varying sizes that follow best practices.

However, large monorepos (several gigabytes or more) can significantly impact the performance of Git and in turn the environment itself. Their presence and how they are used can put a significant strain on the entire system from Gitaly to the underlying infrastructure.

The performance implications are largely software in nature. Additional hardware resources lead to diminishing returns.

If this applies to you, we strongly recommend you follow the linked documentation and reach out to your GitLab representative or our Support team for further guidance.

Large monorepos come with notable cost. If you have such a repository, follow this guidance to ensure good performance and to keep costs in check:

Optimize the large monorepo. Using features such as LFS to not store binaries, and other approaches for reducing repository size, can dramatically improve performance and reduce costs.
Depending on the monorepo, increased environment specifications may be required to compensate. Gitaly might require additional resources along with Praefect, GitLab Rails, and Load Balancers. This depends on the monorepo itself and its usage.
When the monorepo is significantly large (20 gigabytes or more), further strategies may be required, such as even higher specifications or, in some cases, a separate Gitaly backend for the monorepo alone.
Network and disk bandwidth are another potential consideration with large monorepos. In very heavy cases, bandwidth saturation is possible if there’s a high number of concurrent clones (such as with CI). Reduce full clones wherever possible in this scenario. Otherwise, additional environment specifications may be required to increase bandwidth. This differs based on cloud providers.
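
To get a rough sense of when clone traffic could saturate bandwidth, a back-of-the-envelope estimate can help. The sketch below is illustrative only: `clone_egress_gbps` is a hypothetical helper, the input figures are made up, and it assumes every clone transfers the full repository size with no compression, caching, or shallow cloning.

```python
def clone_egress_gbps(repo_size_gb: float, concurrent_clones: int,
                      window_seconds: float) -> float:
    """Average egress in gigabits per second needed to serve all clones in the window."""
    # Convert gigabytes to gigabits (x8) and multiply by the number of clones.
    total_gigabits = repo_size_gb * 8 * concurrent_clones
    return total_gigabits / window_seconds


# Hypothetical case: 50 CI jobs each fully cloning a 20 GB monorepo within 10 minutes.
print(round(clone_egress_gbps(20, 50, 600), 2))
```

Compare the result against the network bandwidth actually available to the Gitaly and load balancer nodes on your provider. Real transfer sizes are usually smaller than the on-disk size, so treat the figure as an upper-bound estimate.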

Additional workloads

These architectures have been designed and tested for standard GitLab setups based on real data.

However, additional workloads can multiply the impact of operations by triggering follow-up actions. You might have to adjust the suggested specifications to compensate if you use:

Security software on the nodes.
Hundreds of concurrent CI jobs for large repositories.
Custom scripts that run at high frequency.
Integrations in many large projects.
Server hooks.
System hooks.

Generally, you should have robust monitoring in place to measure the impact of any additional workloads and to inform any changes that are needed. Reach out to your GitLab representative or our Support team for further guidance.

Load Balancers

The architectures make use of up to two load balancers depending on the class:

External load balancer - Serves traffic to any external facing components, primarily Rails.
Internal load balancer - Serves traffic to select internal components that are deployed in an HA fashion such as Praefect or PgBouncer.

The specifics of which load balancer to use and its exact configuration are beyond the scope of GitLab documentation. The most common options are to set up load balancers on machine nodes or to use a service such as one offered by cloud providers. If deploying a Cloud Native Hybrid environment, the charts can handle the external load balancer setup by using Kubernetes Ingress.

Each architecture class includes a recommended base machine size for load balancers deployed directly on machines. However, these sizes may need adjustment based on factors such as the chosen load balancer and expected workload. Note that machines can have varying network bandwidth, which should also be taken into consideration.

The following sections provide additional guidance for load balancers.

Balancing algorithm

To ensure equal spread of calls to the nodes and good performance, use a least-connection-based load balancing algorithm or equivalent wherever possible.

We don’t recommend the use of round-robin algorithms as they are known to not spread connections equally in practice.
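
To illustrate the idea behind least-connection balancing, the sketch below routes a new connection to the backend that currently has the fewest active connections. It is a simplified model and not how any particular load balancer is implemented. The function `pick_least_connections` and the node names are hypothetical.

```python
from typing import Dict


def pick_least_connections(active: Dict[str, int]) -> str:
    """Return the backend with the fewest active connections (ties broken by name)."""
    # Sorting the names first makes tie-breaking deterministic.
    return min(sorted(active), key=lambda name: active[name])


# Hypothetical snapshot of active connections per Rails node.
active = {"rails-1": 12, "rails-2": 3, "rails-3": 7}
chosen = pick_least_connections(active)
active[chosen] += 1  # The new connection is now counted against that node.
print(chosen, active)
```

A round-robin balancer would instead send the next connection to whichever node is next in rotation, regardless of the 12 long-lived connections already open on `rails-1` in this example.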

Network Bandwidth

The total network bandwidth available to a load balancer when deployed on a machine can vary notably across cloud providers. Some cloud providers, like AWS, may operate on a burst system with credits to determine the bandwidth at any time.

The required network bandwidth for your load balancers depends on factors such as data shape and workload. The recommended base sizes for each architecture class have been selected based on real data. However, in some scenarios such as consistent clones of large monorepos, heavy usage of