
Export failures when sending telemetry data from the EDOT Collector

Serverless EDOT Collector

During high traffic or load testing scenarios, the EDOT Collector might fail to export telemetry data (traces, metrics, or logs) to Elasticsearch. This typically happens when the internal queue for outgoing data fills up faster than it can be drained, resulting in timeouts and dropped data.

You might see one or more of the following messages in the EDOT Collector logs:

  • bulk indexer flush error: failed to execute the request: context deadline exceeded
  • Exporting failed. Rejecting data. sending queue is full

These errors indicate the Collector is overwhelmed and unable to export data fast enough, leading to queue overflows and data loss.

This issue typically occurs when the sending_queue configuration or the Elasticsearch cluster scaling is misaligned with the incoming telemetry volume.

Important

Stack GA 9.0.0 The sending queue is turned off by default. Verify that enabled: true is explicitly set — otherwise any queue configuration will be ignored.
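
For example, a minimal sketch of where this setting lives (other exporter settings omitted):

exporters:
  elasticsearch:
    # ...endpoint, authentication, and other exporter settings...
    sending_queue:
      enabled: true  # required; without it, the remaining sending_queue settings are ignored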

Common contributing factors include:

  • An underscaled Elasticsearch cluster is the most frequent cause of persistent export failures. If Elasticsearch cannot index data fast enough, the Collector’s queue fills up.
  • Stack GA 9.0.0 sending_queue.block_on_overflow is turned off (defaults to false), which can lead to data drops.
  • Sending queue is enabled but num_consumers is too low to keep up with the incoming data volume.
  • Sending queue size (queue_size) is too small for the traffic load.
  • Batching is disabled (both the exporter's internal batching and sending queue batching), which increases per-request overhead; see the sketch after the note below.
  • EDOT Collector resources (CPU, memory) are insufficient for the traffic volume.
Note

Increasing the timeout value (for example from 30s to 90s) doesn't help if the queue itself or Elasticsearch throughput is the bottleneck.
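
If batching was turned off, re-enabling it reduces per-request overhead. The sketch below assumes the Elasticsearch exporter's batcher setting is available in your Collector version; the related flush and size options vary between versions, so check the exporter documentation for the exact field names:

exporters:
  elasticsearch:
    batcher:
      enabled: true  # group documents into larger bulk requests before export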

The resolution approach depends on your Elastic Stack version and Collector configuration.

Stack GA 9.0.0

Enable the sending queue and block on overflow to prevent data drops. Set the following under the elasticsearch exporter configuration:

sending_queue:
  enabled: true            # must be set explicitly in 9.0.0
  queue_size: 1000         # tune to your traffic volume
  num_consumers: 10        # parallel workers draining the queue
  block_on_overflow: true  # apply backpressure instead of dropping data

Stack Planned

The Elasticsearch exporter provides default sending_queue parameters (including block_on_overflow: true), but these can, and often should, be tuned for specific workloads.
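
For example, the defaults could be overridden under the elasticsearch exporter. The values below are illustrative starting points, not recommendations:

exporters:
  elasticsearch:
    sending_queue:
      block_on_overflow: true  # already the default here; shown for clarity
      num_consumers: 20        # illustrative: more parallel flushes toward Elasticsearch
      queue_size: 5000         # illustrative: more buffering to absorb short ingest slowdowns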

The following steps can help identify and resolve export bottlenecks:

  1. Check the Collector's internal metrics

    If internal telemetry is enabled (a sample configuration follows these steps), review these metrics:

    • otelcol.elasticsearch.bulk_requests.latency — high tail latency suggests Elasticsearch is the bottleneck. Check Elasticsearch cluster metrics and scale if necessary.

    • otelcol.elasticsearch.bulk_requests.count and otelcol.elasticsearch.flushed.bytes — these help assess whether the Collector is sending too many requests or requests that are too large. Tune sending_queue.num_consumers or the batching configuration to balance throughput.

    • otelcol_exporter_queue_size and otelcol_exporter_queue_capacity — if the queue runs near capacity, but Elasticsearch is healthy, increase the queue size or number of consumers.

    • otelcol_enqueue_failed_spans, otelcol_enqueue_failed_metric_points, otelcol_enqueue_failed_log_records — persistent enqueue failures indicate undersized queues or slow consumers.

    For a complete list of available metrics, refer to the upstream OpenTelemetry metadata files for the Elasticsearch exporter and exporter helper.

  2. Scale the Collector's resources

    • Ensure sufficient CPU and memory for the EDOT Collector.
    • Scale vertically (more resources) or horizontally (more replicas) as needed.
  3. Optimize Elasticsearch performance

    Address indexing delays, rejected bulk requests, or shard imbalances that limit ingestion throughput.
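
If internal telemetry isn't already enabled, the following is a minimal sketch for exposing the Collector's own metrics. The readers syntax assumes a recent Collector version; older versions use service::telemetry::metrics::address instead:

service:
  telemetry:
    metrics:
      level: detailed  # emit the most detailed set of internal metrics
      readers:
        - pull:
            exporter:
              prometheus:
                host: 0.0.0.0  # expose a Prometheus scrape endpoint for the Collector's own metrics
                port: 8888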

Tip

Stack Planned Focus tuning efforts on Elasticsearch performance, Collector resource allocation, and queue sizing, guided by the internal telemetry metrics above.