Doris vs Elasticsearch: A Comparison and Practical Cost Case Study

Apache Doris excels in analytics, SQL support, and cost efficiency, while Elasticsearch leads in text search but has higher storage costs and complexity.

By Darren Xu · Mar. 31, 25 · Analysis

In real-time big data analytics and log search, enterprises often have to choose between Elasticsearch and Apache Doris. Elasticsearch is well known for its powerful full-text search and flexible aggregation capabilities.

On the other hand, Apache Doris, with its distributed MPP architecture, columnar storage, and continuously evolving inverted indexing mechanism, shines in complex aggregations and data analysis. 

This article delves into a comparison of the two solutions from multiple angles, such as architecture design, data ingestion, query optimization, storage management, functional capabilities, operational complexity, and community activity. Special attention is given to cost. Finally, we will present a real-world case study from the Tencent Music Content Library to illustrate the substantial benefits of replacing Elasticsearch with Doris.

1. Architecture Comparison

Doris

Distributed MPP Architecture and Decoupled Frontend/Backend

Doris splits work between frontend and backend nodes. The frontend parses and optimizes SQL, manages metadata, and schedules queries, while the backend focuses on data storage and computation. This design makes it easy to scale, isolate faults, and perform efficient parallel processing.

Doris' architecture
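
The frontend/backend split described above can be pictured as a scatter-gather pattern. The following is a toy sketch, not Doris code: the `Backend` class, the `frontend_sum` function, and the in-memory partitions are hypothetical stand-ins that only illustrate how a coordinator fans a query out to workers and merges their partial results.

```python
from concurrent.futures import ThreadPoolExecutor


class Backend:
    """Hypothetical stand-in for a backend node holding one data partition."""

    def __init__(self, rows):
        self.rows = rows

    def partial_sum(self, min_value):
        # Each backend filters and aggregates only its own partition.
        return sum(r for r in self.rows if r >= min_value)


def frontend_sum(backends, min_value):
    """Hypothetical coordinator: scatter the work, then merge partial results."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        partials = list(pool.map(lambda b: b.partial_sum(min_value), backends))
    return sum(partials)


# Three partitions, as if the table were spread across three backend nodes.
backends = [Backend([1, 5, 9]), Backend([2, 6, 10]), Backend([3, 7, 11])]
total = frontend_sum(backends, min_value=5)
print(total)
```

Because each partition is aggregated independently, adding backends spreads both the data and the computation, which is the property the MPP design relies on.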

Elasticsearch

Shard and Replica-Based Distributed Architecture

Elasticsearch uses inverted indexes along with sharding and replication to distribute data and ensure high availability. It is great at handling unstructured data for full-text search. However, its document-oriented architecture offers only limited join support and can struggle with complex aggregations.

Elasticsearch's architecture
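
To make shard-based distribution concrete, here is a minimal sketch of hash routing: a document's routing key is hashed and mapped onto one of the primary shards. This is a simplification under stated assumptions: `pick_shard` is a hypothetical helper, and `zlib.crc32` is only a stand-in hash, since Elasticsearch uses its own hash function and routing formula internally.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a primary shard number.

    Assumption: zlib.crc32 is a stand-in hash for illustration only;
    it is not the hash Elasticsearch actually uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


doc_ids = ["song-1", "song-2", "song-3", "song-4"]
placement = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}
print(placement)
```

Since the result depends on the number of primary shards, changing that number would send existing documents to different shards. That is one reason the primary shard count is normally chosen when an index is created.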

Comparison Insight

Both systems support distributed clusters, but Doris has a more distinct separation of responsibilities for data computation and complex queries. Elasticsearch, by contrast, places more emphasis on text retrieval.

2. Data Ingestion

Doris

Unified Batch and Streaming Data Ingestion

Doris supports batch loading, streaming synchronization, and real-time writes. It is also MySQL-protocol compatible, which enables seamless integration with existing systems.
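
As one illustration of the ingestion paths above, Doris offers an HTTP-based Stream Load interface. The sketch below only builds the request description and does not send anything. The host, database, table, credentials, and label are hypothetical, and 8030 is assumed to be the frontend HTTP port; check the values and header options against your own deployment and the Doris documentation.

```python
import base64


def build_stream_load_request(fe_host, db, table, user, password, label):
    """Describe a Stream Load HTTP PUT without sending it.

    All argument values are hypothetical; port 8030 is an assumed default.
    """
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,               # identifies this load job
        "format": "json",             # payload is JSON rather than CSV
        "read_json_by_line": "true",  # one JSON object per line
    }
    return url, headers


url, headers = build_stream_load_request(
    "fe.example.internal", "music", "songs", "loader", "secret", "load-001"
)
print(url)
```

The request body would carry the data itself, for example newline-delimited JSON rows, sent with any HTTP client that supports PUT.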

Elasticsearch

Real-Time Indexing via RESTful API

Data in Elasticsearch is usually ingested through HTTP interfaces, often with the help of tools like Logstash and Beats. However, building and maintaining indexes at scale can cause significant overhead.
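
For comparison, here is a minimal sketch of the payload format expected by the Elasticsearch `_bulk` endpoint: newline-delimited JSON in which each document is preceded by an action line. The index name `logs` and the sample documents are hypothetical, and the function only builds the body; it does not send it.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: index this document into the given index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "logs",
    [{"level": "error", "msg": "disk full"}, {"level": "info", "msg": "started"}],
)
print(body)
```

Tools such as Logstash and Beats produce requests of this shape on the user's behalf, which is why they are commonly placed in front of the cluster.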

Comparison Insight

Doris is optimized for OLAP workloads, combining high-throughput ingestion with complex analytical computation. Elasticsearch, on the other hand, emphasizes real-time search and index building.

3. Query Optimization

Doris

Standard SQL and Built-In Optimizer

Doris is highly compatible with MySQL syntax and uses a cost-based optimizer to plan parallel query execution automatically. This is especially effective for multi-table joins, aggregations, and complex analyses.

Enhanced With Inverted Indexes

Since version 2.0, Doris has supported inverted indexes and full-text search, which further reduces response times for queries that filter on text.
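
The idea behind an inverted index can be shown in a few lines. This is a conceptual toy, not the Doris or Elasticsearch implementation: it maps each term to the set of row IDs that contain it, so a keyword query becomes a set intersection instead of a scan over every row. The function names and sample rows are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase term to the set of row IDs that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for term in text.lower().split():
            index[term].add(row_id)
    return index


def match_all(index, query):
    """Return the IDs of rows that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


rows = ["Disk full on node A", "Node B started", "Disk check passed on node B"]
index = build_inverted_index(rows)
hits = match_all(index, "disk node")
print(sorted(hits))
```

Real engines add tokenization rules, compression, and relevance scoring on top of this basic structure.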

Elasticsearch

Dedicated JSON-Based DSL

Elasticsearch uses a JSON-based Domain-Specific Language (DSL) that is excellent for keyword matching and text search. But it may be less flexible for complex joins and multi-dimensional aggregations.

Comparison Insight

Doris’s SQL-friendly approach and integrated optimizer allow it to respond more rapidly in complex query scenarios. Elasticsearch, meanwhile, remains strong in text search.
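
To illustrate the difference in query style, the sketch below expresses one analytical question, the average latency per service for failed requests, in both forms. The table and index name `logs` and the fields `service`, `status`, and `latency_ms` are hypothetical, and the DSL version assumes `service` is mapped as a keyword field.

```python
import json

# SQL form, as it might be sent to Doris over the MySQL protocol.
doris_sql = """
SELECT service, AVG(latency_ms) AS avg_latency
FROM logs
WHERE status = 500
GROUP BY service
ORDER BY avg_latency DESC
LIMIT 10
"""

# Equivalent Elasticsearch request body: a filter plus a terms
# aggregation with an average sub-aggregation.
es_dsl = {
    "size": 0,  # return only aggregation results, no individual hits
    "query": {"term": {"status": 500}},
    "aggs": {
        "by_service": {
            "terms": {
                "field": "service",
                "size": 10,
                "order": {"avg_latency": "desc"},
            },
            "aggs": {"avg_latency": {"avg": {"field": "latency_ms"}}},
        }
    },
}

print(doris_sql.strip())
print(json.dumps(es_dsl, indent=2))
```

Both forms describe the same grouping, but adding a join to a second table is a one-clause change in SQL, whereas the DSL has no direct equivalent for a general join.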

4. Storage Management

Doris

Columnar Storage and High Compression

Data in Doris is stored by column and supports efficient compression algorithms like ZSTD. It often achieves compression ratios of 5 to 10 times, significantly reducing storage costs.

Real-Time Update Support

Doris supports updates and deletions, making it suitable for scenarios where data changes in real time and must be queried interactively.

Elasticsearch

Inverted Index and Redundant Storage

In addition to storing the raw data, Elasticsearch maintains inverted indexes and other auxiliary data (such as forward indexes and column-store copies, known as doc values), which increases storage space requirements.

Comparison Insight

Doris offers better storage efficiency, especially for large-scale data. Elasticsearch’s redundant design helps with retrieval but comes at the cost of higher storage usage.

5. Functional Capabilities

Doris

Standard SQL Interface and Extensibility

Doris is highly compatible with MySQL and supports JDBC/ODBC for integration with various BI tools. It allows near-instant schema changes (adding, dropping, or modifying fields and indexes), and its Variant data type can auto-expand JSON fields.

Advanced Query and Analytics Features

It provides rich aggregation queries, pre-aggregation, multi-table joins, subqueries, window functions, logical and materialized views, and SQL UDFs. It also supports external data lake tables (e.g., Hive, Iceberg, Hudi, Paimon).

Diverse Index Support

Besides text inverted and BKD (multi-dimensional numerical) indexes, it supports sparse primary key indexes, BloomFilter (skipping) indexes, and Ngram BloomFilter indexes to meet complex query needs.

Elasticsearch

Dedicated DSL and Dynamic Mapping

Elasticsearch uses a JSON-based DSL for queries and supports dynamic mapping, which automatically adds new JSON fields to the index mapping. However, the type of an existing field cannot be changed in place; changing it generally requires creating a new index and reindexing the data.
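
Because an existing field's type cannot simply be altered, the usual workaround is to create a new index with the desired mapping and copy the data across with the reindex API. The sketch below only builds the two request descriptions; the index names, the field name, and the `plan_field_type_change` helper are hypothetical.

```python
def plan_field_type_change(old_index, new_index, field, new_type):
    """Return the (method, path, body) requests for a reindex-based type change."""
    create_body = {"mappings": {"properties": {field: {"type": new_type}}}}
    reindex_body = {"source": {"index": old_index}, "dest": {"index": new_index}}
    return [
        ("PUT", f"/{new_index}", create_body),  # new index with the new mapping
        ("POST", "/_reindex", reindex_body),    # copy documents across
    ]


steps = plan_field_type_change("songs_v1", "songs_v2", "duration", "long")
for method, path, body in steps:
    print(method, path, body)
```

On a large index, the copy step rewrites every document, which is part of the schema-change cost the comparison refers to.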

Focus on Full-Text Search

Its powerful text-inverted indexes and BKD numerical indexes offer excellent keyword search and simple aggregation performance, but it may lack advanced SQL analytics capabilities.

Comparison Insight

Doris’s functional design is open and flexible, covering everything from standard SQL and dynamic schema changes to advanced aggregation and analytics. It is a comprehensive solution for complex data analysis. Elasticsearch, in contrast, is more specialized in text retrieval and basic aggregations.

6. Operational Complexity

Doris

Simplified Operations Through Decoupled Design

Doris has integrated monitoring and logging systems, and it supports automatic scaling and fault recovery, which reduces overall operational complexity.

Unified Platform Management

It combines data ingestion, computation, and storage in one system, minimizing the need to coordinate across multiple platforms.

Elasticsearch

Challenging Cluster Tuning

Elasticsearch requires significant expertise for sharding, replica configuration, index mapping, and cross-node data balancing, which increases operational overhead.
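
Part of that tuning effort comes from the way shard and replica settings multiply. The sketch below builds an index settings body and counts the shard copies the cluster must place and balance. The helper names and the numbers are hypothetical examples, not recommendations.

```python
def index_settings(primary_shards, replicas):
    """Build an index-creation body with explicit shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards, replicas):
    """Each primary shard is stored once plus once per replica."""
    return primary_shards * (1 + replicas)


settings = index_settings(primary_shards=5, replicas=1)
copies = total_shard_copies(primary_shards=5, replicas=1)
print(settings, copies)
```

With daily indexes, this count grows by the same amount every day, so retention policy and shard sizing have to be planned together.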

Multi-System Coordination Issues

It often needs to be maintained together with other systems (e.g., separate data warehouses), further complicating management.

Comparison Insight

Doris’s more centralized and integrated approach significantly reduces operational costs. Elasticsearch’s tuning and multi-system management require higher expertise and more resources.

7. Community Activity

Doris

Rapidly Growing Open-Source Community

Although relatively new, Doris’s community has been growing rapidly, with an increasing number of case studies, documentation, and plugins.

Elasticsearch

Mature and International Community

Elasticsearch has a large global developer ecosystem and extensive third-party support. However, some advanced features and commercial support require paid licenses.

Comparison Insight

Both communities have their own strengths. Elasticsearch’s community is well established and global, while Doris’s community is expanding quickly, driven by growing enterprise adoption.

8. Cost Comparison

When choosing a big data platform, cost includes not only hardware investment but also storage, operations, and development effort.

Storage Costs

Doris’s columnar storage and high compression typically reduce storage space requirements by 60% to 80%, significantly lowering hardware and cloud storage expenses. Elasticsearch’s need to maintain multiple index structures usually results in higher storage consumption.
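
Compression ratios and percentage reductions are two views of the same quantity, and converting between them helps when comparing figures. The sketch below shows the arithmetic. Note that the figures quoted in this article may use different baselines (raw data versus an Elasticsearch deployment), so they are not necessarily directly comparable.

```python
def space_saving(compression_ratio: float) -> float:
    """Fraction of space saved for a given compression ratio (e.g. 5 means 5x)."""
    return 1.0 - 1.0 / compression_ratio


def ratio_for_saving(saving: float) -> float:
    """Compression ratio needed to achieve a given fractional saving."""
    return 1.0 / (1.0 - saving)


for ratio in (2.5, 5, 10):
    print(f"{ratio}x compression -> {space_saving(ratio):.0%} less space")
```

A 5x ratio corresponds to an 80% reduction and a 10x ratio to a 90% reduction, while a 60% reduction corresponds to a ratio of 2.5x.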

Computational Resource Consumption

Doris’s backend execution engine is implemented in C++ with support for vectorized execution, leading to lower CPU and memory usage. On equivalent hardware, Doris can reportedly achieve 4x faster write speeds and reduce query latency by over 50%, thereby reducing the need for high-end servers.

Operational and Development Costs

Doris’s SQL-friendly interface and unified management simplify development and operations. Elasticsearch’s DSL and complex tuning add labor and expertise costs.

Overall Cost Reduction

Overall, replacing Elasticsearch with Doris can reduce total hardware, storage, and maintenance costs by over 50%.

9. Practical Case Study: Tencent Music Content Library

Background

Tencent Music Content Library stores and analyzes billions of music-related records, including song details, artist information, album data, and label metadata. As the business grew, the original Elasticsearch-based search and analytics architecture started to show problems. 

There were high index maintenance costs, long query response times, and insufficient support for complex aggregations. To reduce resource usage and enhance real-time analytics capabilities, Tencent Music integrated Apache Doris in key scenarios to replace parts of the Elasticsearch components.

Outcomes

Resource and Cost Advantages

After adopting Doris, the number of required servers decreased significantly. Overall, CPU and memory usage were reduced by about 50%, and storage space requirements were cut by nearly 70% compared to Elasticsearch.
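
To see how the separate reductions above combine into one overall figure, the sketch below blends a compute saving and a storage saving by their share of the original bill. The default cuts of 50% and 70% are the figures reported above; the baseline costs of 100 and 50 units are hypothetical inputs, not Tencent Music data.

```python
def blended_reduction(compute_cost, storage_cost, compute_cut=0.5, storage_cut=0.7):
    """Overall fractional cost reduction given per-category reductions."""
    before = compute_cost + storage_cost
    after = compute_cost * (1 - compute_cut) + storage_cost * (1 - storage_cut)
    return 1 - after / before


# Hypothetical baseline: 100 units of compute spend, 50 units of storage spend.
example = blended_reduction(compute_cost=100.0, storage_cost=50.0)
print(f"{example:.1%}")
```

The blended figure is a weighted average of the two reductions, so with these inputs it always falls between 50% and 70%, moving toward 70% as storage makes up a larger share of the bill.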

Enhanced Query and Write Performance

For real-time log and content data ingestion, Doris achieved a 3 to 5 times increase in write speed. Complex aggregation query response times were reduced from several seconds to sub-second levels, effectively meeting the needs for real-time monitoring and rapid analysis.

Improved Operational Efficiency

The unified Doris platform combined data ingestion, computation, and storage processes, greatly simplifying the overall system architecture and reducing the complexity of maintaining data consistency across multiple systems.

Case Study Conclusion

Tencent Music Content Library’s experience shows that adopting Apache Doris can markedly improve query performance and data processing capability while also reducing hardware and storage costs. This provides a smoother and more efficient digital music experience for global users.

10. Summary and Recommendations

Summary

From an Architectural and Technical Perspective

Apache Doris uses a distributed MPP architecture and columnar storage to optimize complex queries and large-scale data analysis. It is especially good at high-throughput data ingestion and real-time querying.

Cost Efficiency

Doris can reduce hardware, storage, and maintenance costs by over 50% compared to Elasticsearch.

Real-World Impact

As shown in the Tencent Music Content Library case, replacing Elasticsearch components with Doris has solved performance bottlenecks in high-data-volume environments, improved system stability, and reduced overall costs.

Recommendation

For enterprises that need to balance complex aggregation queries, real-time data updates, and low storage costs, and that want to simplify operations with a unified platform, Apache Doris is a very attractive solution. We recommend testing Doris in a controlled environment, evaluating its performance and cost benefits, and then gradually expanding it across the business.

