# Tasks
Sentry's task platform is designed to scale horizontally to enable high-throughput processing. The task platform is composed of a few components:
```mermaid
flowchart
    Sentry -- produce task activation --> k[(Kafka)]
    k -- consume messages --> Taskbroker
    Worker -- grpc GetTask --> Taskbroker
    Worker -- execute task --> Worker
    Worker -- grpc SetTaskStatus --> Taskbroker
```
Brokers and workers are paired together to create 'processing pools' for tasks. Brokers and workers can be scaled horizontally to increase parallelism.
By default, self-hosted installs come with a single broker and worker replica. You can increase processing capacity by raising the concurrency of the single worker (via the worker's `--concurrency` option), or by adding additional worker and broker replicas. Going above 24 worker replicas per broker is not recommended, as broker performance can degrade with higher worker counts.
If your deployment requires additional processing capacity, you can add additional broker replicas and use CLI options to inform the workers of the broker addresses:
```shell
sentry run taskworker --rpc-host-list=sentry-broker-default-0:50051,sentry-broker-default-1:50051
```
Workers use client-side load balancing to distribute load across the brokers they have been assigned to.
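Conceptually, client-side load balancing can be modeled as a round-robin rotation over the configured host list. The sketch below is illustrative only, not the actual taskworker implementation:

```python
from itertools import cycle


class BrokerPool:
    """Round-robin broker selection over a comma-separated host list,
    mirroring the --rpc-host-list format. Illustrative sketch only."""

    def __init__(self, host_list: str) -> None:
        self.hosts = host_list.split(",")
        self._rotation = cycle(self.hosts)

    def next_broker(self) -> str:
        # Each call hands back the next broker, wrapping around at the end.
        return next(self._rotation)


pool = BrokerPool("sentry-broker-default-0:50051,sentry-broker-default-1:50051")
print(pool.next_broker())  # sentry-broker-default-0:50051
print(pool.next_broker())  # sentry-broker-default-1:50051
print(pool.next_broker())  # sentry-broker-default-0:50051 (wraps around)
```

Because each worker rotates through its own copy of the host list, load spreads roughly evenly across brokers without any central coordinator.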
If you have a single taskbroker and only want to scale taskworker replicas, you can increase the replica count (up to the recommended 24) by modifying your docker-compose.override.yml file:
```yaml
services:
  taskworker:
    deploy:
      replicas: 5
```
This will spawn 5 replicas of the taskworker service. Bear in mind that while this increases processing capacity, you will also need to monitor and scale your system resources (CPU and RAM) accordingly.
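After editing the override file, recreating the service applies the new replica count (this assumes the Compose v2 CLI; adjust to `docker-compose` if you use the v1 binary):

```shell
# Recreate the taskworker service so Compose reconciles it to 5 replicas.
docker compose up -d taskworker
```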
If you have reached the maximum recommended number of taskworker replicas, you can scale out your taskbroker replicas. First, increase the number of partitions on the `taskworker` topic in Kafka; the partition count should be evenly divisible by the number of taskbroker replicas you plan to scale to, so that partitions distribute evenly across brokers. Then, because Docker Compose has limited support for scaling stateful services, you will need to scale your taskbroker replicas manually by adding more container declarations to your docker-compose.override.yml file:
```yaml
services:
  taskbroker-beta:
    restart: "unless-stopped"
    image: "$TASKBROKER_IMAGE"
    environment:
      TASKBROKER_KAFKA_CLUSTER: "kafka:9092"
      TASKBROKER_KAFKA_DEADLETTER_CLUSTER: "kafka:9092"
      TASKBROKER_DB_PATH: "/opt/sqlite/taskbroker-activations-beta.sqlite"
    volumes:
      - sentry-taskbroker:/opt/sqlite
    depends_on:
      - kafka
  taskbroker-charlie:
    restart: "unless-stopped"
    image: "$TASKBROKER_IMAGE"
    environment:
      TASKBROKER_KAFKA_CLUSTER: "kafka:9092"
      TASKBROKER_KAFKA_DEADLETTER_CLUSTER: "kafka:9092"
      TASKBROKER_DB_PATH: "/opt/sqlite/taskbroker-activations-charlie.sqlite"
    volumes:
      - sentry-taskbroker:/opt/sqlite
    depends_on:
      - kafka
```
Note that each taskbroker replica needs its own SQLite database, to prevent contention and locking issues between replicas.
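For the partition increase mentioned above, a hedged example using the Kafka CLI follows. The exact binary name, container name, and bootstrap address depend on your Kafka image and Compose setup (Apache Kafka images ship the tool as `kafka-topics.sh`):

```shell
# Raise the taskworker topic to 3 partitions, one per broker replica
# (the original broker plus the beta and charlie replicas above).
docker compose exec kafka kafka-topics \
  --bootstrap-server localhost:9092 \
  --alter --topic taskworker --partitions 3
```

Note that Kafka only allows increasing a topic's partition count, never decreasing it.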
Finally, you need to modify the taskworker command so that `--rpc-host-list` points to the new brokers:
```yaml
  taskworker:
    <<: *sentry_defaults
    command: run taskworker --concurrency=4 --rpc-host-list=taskbroker:50051,taskbroker-beta:50051,taskbroker-charlie:50051 --health-check-file-path=/tmp/health.txt
    healthcheck:
      <<: *file_healthcheck_defaults
```
If you are running on Kubernetes and deploy taskbroker as a StatefulSet, you can use `--rpc-host` and `--num-brokers` instead, since StatefulSet pod host names follow a predictable pattern. If your brokers use a different host name pattern, use `--rpc-host-list`.
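As a sketch, assuming a three-replica StatefulSet and headless Service both named `taskbroker` (names hypothetical, not part of a standard chart), the worker container could be configured as:

```yaml
# Hypothetical Kubernetes container args; assumes a StatefulSet named
# "taskbroker" with 3 replicas reachable via a headless Service.
args:
  - run
  - taskworker
  - --rpc-host=taskbroker
  - --num-brokers=3
```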
In higher-throughput installations, you may also want to isolate task workloads from each other to ensure timely processing of lower-volume tasks. For example, you could isolate ingestion-related tasks from other work:
```mermaid
flowchart
    Sentry -- produce tasks --> k[(Kafka)]
    k -- topic-taskworker-ingest --> tb-i[Taskbroker ingest]
    k -- topic-taskworker --> tb-d[Taskbroker default]
    tb-i --> w-i[ingest Worker]
    tb-d --> w-d[default Worker]
```
To achieve this work separation we need to make a few changes:
- Provision any additional topics. Topic names need to come from one of the predefined topics in `src/sentry/conf/types/kafka_definition.py`.
- Deploy the additional broker replicas. You can use the `TASKBROKER_KAFKA_TOPIC` environment variable to define the topic a taskbroker consumes from.
- Deploy additional workers that use the new brokers in their `rpc-host-list` CLI flag.
- Find the list of namespaces you want to shift to the new topic. The list of task namespaces can be found in the `sentry.taskworker.namespaces` module.
- Update the task routing option, defining the namespace -> topic mappings. e.g.

```yaml
# in sentry/config.yml
taskworker.route.overrides:
  "ingest.errors": "taskworker-ingest"
  "ingest.transactions": "taskworker-ingest"
```
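Conceptually, routing resolves a task namespace to a topic via the overrides map and falls back to the default topic otherwise. The following is a minimal sketch of that lookup, not the actual logic in Sentry's taskworker code, and the non-ingest namespace below is hypothetical:

```python
# Sketch of namespace -> topic resolution with route overrides.
# "taskworker" is the default topic; overrides mirror the config above.
DEFAULT_TOPIC = "taskworker"

ROUTE_OVERRIDES = {
    "ingest.errors": "taskworker-ingest",
    "ingest.transactions": "taskworker-ingest",
}


def topic_for_namespace(namespace: str) -> str:
    # Namespaces without an override stay on the default topic.
    return ROUTE_OVERRIDES.get(namespace, DEFAULT_TOPIC)


print(topic_for_namespace("ingest.errors"))  # taskworker-ingest
print(topic_for_namespace("some.other.namespace"))  # taskworker
```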