Export failures when sending telemetry data from the EDOT Collector
Applies to: Serverless, EDOT Collector
During high traffic or load testing scenarios, the EDOT Collector might fail to export telemetry data (traces, metrics, or logs) to Elasticsearch. This typically happens when the internal queue for outgoing data fills up faster than it can be drained, resulting in timeouts and dropped data.
You might see one or more of the following messages in the EDOT Collector logs:
bulk indexer flush error: failed to execute the request: context deadline exceeded
Exporting failed. Rejecting data. sending queue is full
These errors indicate the Collector is overwhelmed and unable to export data fast enough, leading to queue overflows and data loss.
This issue typically occurs when the sending_queue configuration or the Elasticsearch cluster scaling is misaligned with the incoming telemetry volume.
Stack: The sending queue is turned off by default. Verify that `enabled: true` is explicitly set; otherwise, any queue configuration is ignored.
Common contributing factors include:
- An underscaled Elasticsearch cluster is the most frequent cause of persistent export failures. If Elasticsearch cannot index data fast enough, the Collector's queue fills up.
- Stack: `sending_queue.block_on_overflow` is turned off (it defaults to `false`), which can lead to data drops.
- The sending queue is enabled, but `num_consumers` is too low to keep up with the incoming data volume.
- The sending queue size (`queue_size`) is too small for the traffic load.
- Both internal and sending queue batching are disabled, increasing processing overhead (see the batching sketch after this list).
- EDOT Collector resources (CPU, memory) are insufficient for the traffic volume.
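If batching is disabled end to end, one common remedy is the Collector's `batch` processor. The following is a minimal sketch rather than a drop-in configuration: it assumes an `otlp` receiver and the `elasticsearch` exporter are already defined, and the values are illustrative starting points. Newer Collector versions can also batch directly in the exporter's sending queue.

```yaml
processors:
  batch:
    # Flush when this many items have accumulated...
    send_batch_size: 8192
    # ...or when this much time has passed since the first item arrived.
    timeout: 5s

service:
  pipelines:
    traces:
      receivers: [otlp]           # assumed to be configured elsewhere
      processors: [batch]
      exporters: [elasticsearch]  # assumed to be configured elsewhere
```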
Increasing the timeout value (for example from 30s to 90s) doesn't help if the queue itself or Elasticsearch throughput is the bottleneck.
The resolution approach depends on your Elastic Stack version and Collector configuration.
Stack: Enable the sending queue and set `block_on_overflow: true` to prevent data drops:
sending_queue:
  enabled: true
  queue_size: 1000
  num_consumers: 10
  block_on_overflow: true
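For orientation, this is a minimal sketch of where the snippet above sits in a full Collector configuration. The endpoint, credentials, and the `otlp` receiver are placeholders and assumptions; the available exporter options depend on your EDOT Collector version.

```yaml
exporters:
  elasticsearch:
    # Placeholder endpoint and credentials; replace with your deployment's values.
    endpoints: ["https://elasticsearch.example.com:9200"]
    api_key: "${env:ELASTIC_API_KEY}"
    sending_queue:
      enabled: true
      queue_size: 1000
      num_consumers: 10
      block_on_overflow: true

service:
  pipelines:
    traces:
      receivers: [otlp]           # assumed to be configured elsewhere
      exporters: [elasticsearch]
```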
Stack: The Elasticsearch exporter provides default `sending_queue` parameters (including `block_on_overflow: true`), but these can, and often should, be tuned for specific workloads.
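As an illustration only, a high-volume workload might start from values larger than the defaults and then adjust based on the queue metrics covered in the steps below; the numbers here are assumptions, not recommendations.

```yaml
sending_queue:
  enabled: true
  block_on_overflow: true
  # Illustrative starting points for a high-volume workload; tune them using the
  # queue size/capacity and enqueue-failure metrics described below.
  queue_size: 5000
  num_consumers: 20
```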
The following steps can help identify and resolve export bottlenecks:
- Check the Collector's internal metrics

  If internal telemetry is enabled, review these metrics (a sketch for enabling internal telemetry follows these steps):

  - `otelcol.elasticsearch.bulk_requests.latency`: high tail latency suggests Elasticsearch is the bottleneck. Check Elasticsearch cluster metrics and scale if necessary.
  - `otelcol.elasticsearch.bulk_requests.count` and `otelcol.elasticsearch.flushed.bytes`: these help assess whether the Collector is sending too many or too large requests. Tune `sending_queue.num_consumers` or the batching configuration to balance throughput.
  - `otelcol_exporter_queue_size` and `otelcol_exporter_queue_capacity`: if the queue runs near capacity but Elasticsearch is healthy, increase the queue size or the number of consumers.
  - `otelcol_enqueue_failed_spans`, `otelcol_enqueue_failed_metric_points`, and `otelcol_enqueue_failed_log_records`: persistent enqueue failures indicate undersized queues or slow consumers.

  For a complete list of available metrics, refer to the upstream OpenTelemetry metadata files for the Elasticsearch exporter and exporter helper.
- Scale the Collector's resources

  - Ensure sufficient CPU and memory for the EDOT Collector.
  - Scale vertically (more resources) or horizontally (more replicas) as needed.
- Optimize Elasticsearch performance

  Address indexing delays, rejected bulk requests, or shard imbalances that limit ingestion throughput.
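The metrics in the first step are only emitted if the Collector's internal telemetry is enabled. The following is a minimal sketch for recent Collector versions that expose internal metrics through a Prometheus pull endpoint; the metric level, host, and port are assumptions to adjust for your environment, and older versions configure the endpoint with `service::telemetry::metrics::address` instead.

```yaml
service:
  telemetry:
    metrics:
      # "detailed" exposes the most complete set of internal metrics,
      # including the exporter queue and Elasticsearch bulk request metrics.
      level: detailed
      readers:
        - pull:
            exporter:
              prometheus:
                host: "0.0.0.0"
                port: 8888
```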
Stack: Focus tuning efforts on Elasticsearch performance, Collector resource allocation, and queue sizing, informed by the internal telemetry metrics above.