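Elasticsearch is widely used for storing and analyzing machine data thanks to its solid distributed architecture and full-text search engine, but as data volumes grow, its aggregation performance can no longer keep up with business needs. ClickHouse, a high-performance column-oriented OLAP database management system, is a promising way to address this pain point.
This article is a simple comparison test of aggregation performance between ClickHouse and Elasticsearch. It focuses on query response time and leaves resource usage out of scope for now.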
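The article does not say how the response times were collected. As a minimal sketch, assuming each query can be wrapped in a Python callable, wall-clock timing over a few repeats might look like the following. The names `time_query_ms`, `summarize`, and `fake_query` are hypothetical and not part of the original test.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query_ms(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run run_query `repeats` times and return each wall-clock time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # e.g. send one SQL statement or one _search request
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings_ms: List[float]) -> Dict[str, float]:
    """Report min, median and max so a single slow or cached run stands out."""
    return {
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with a real client call.
    fake_query = lambda: sum(range(100_000))
    print(summarize(time_query_ms(fake_query, repeats=3)))
```

Reporting the median next to the minimum and maximum makes it easier to tell whether a figure reflects a cold or a warm cache, which matters when comparing two systems on the same hardware.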
Component | Version | CPU | Memory |
---|---|---|---|
ClickHouse | 20.11.4.13 | 4 cores | 8 GB |
Elasticsearch | 7.9.0 | 4 cores | 8 GB |
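The test uses the official ClickHouse test data set: 67 GB in total, about 600 million rows.
In ClickHouse, the LO_ORDERDATE field is the partition key, and LO_ORDERDATE, LO_ORDERKEY is the sorting key.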

    # ClickHouse
    SELECT LO_SHIPMODE, COUNT()
    FROM lineorder
    GROUP BY LO_SHIPMODE
    ORDER BY COUNT() DESC
    LIMIT 10

    # Elasticsearch
    GET lineorder/_search
    {
      "aggs": {
        "1": {
          "terms": {
            "field": "LO_SHIPMODE.keyword",
            "order": { "_count": "desc" },
            "size": 10
          }
        }
      },
      "size": 0
    }


    # ClickHouse
    SELECT toYear(LO_ORDERDATE), COUNT()
    FROM lineorder
    GROUP BY toYear(LO_ORDERDATE)
    FORMAT PrettyCompactMonoBlock

    # Elasticsearch
    GET lineorder/_search
    {
      "aggs": {
        "2": {
          "date_histogram": {
            "field": "LO_ORDERDATE",
            "calendar_interval": "1y",
            "format": "yyyy-MM-dd"
          }
        }
      },
      "size": 0
    }


    # ClickHouse
    SELECT LO_ORDERDATE, LO_ORDERKEY, LO_SHIPMODE, LO_ORDERPRIORITY, LO_COMMITDATE
    FROM lineorder
    WHERE LO_ORDERDATE >= '1992-01-01' AND LO_ORDERDATE < '1993-01-01'
    ORDER BY LO_ORDERDATE
    LIMIT 500

    # Elasticsearch
    GET lineorder/_search
    {
      "size": 500,
      "sort": [
        { "timestamp": { "order": "desc", "unmapped_type": "boolean" } }
      ],
      "query": {
        "bool": {
          "must": [],
          "filter": [
            { "match_all": {} },
            { "match_all": {} },
            {
              "range": {
                "LO_ORDERDATE": {
                  "gte": "1992-01-01",
                  "lte": "1993-01-01",
                  "format": "strict_date_optional_time"
                }
              }
            }
          ],
          "should": [],
          "must_not": []
        }
      }
    }


    # ClickHouse
    SELECT toYear(LO_ORDERDATE), LO_SHIPMODE, COUNT()
    FROM lineorder
    GROUP BY toYear(LO_ORDERDATE), LO_SHIPMODE
    ORDER BY toYear(LO_ORDERDATE)
    FORMAT PrettyCompactMonoBlock

    # Elasticsearch
    GET lineorder/_search
    {
      "aggs": {
        "3": {
          "terms": {
            "field": "LO_SHIPMODE.keyword",
            "order": { "_count": "desc" },
            "size": 10
          },
          "aggs": {
            "2": {
              "date_histogram": {
                "field": "LO_ORDERDATE",
                "calendar_interval": "1y",
                "time_zone": "Asia/Shanghai",
                "min_doc_count": 1
              }
            }
          }
        }
      },
      "size": 0
    }


    # ClickHouse
    SELECT toYear(LO_ORDERDATE), LO_SHIPMODE, COUNT()
    FROM lineorder
    GROUP BY toYear(LO_ORDERDATE), LO_SHIPMODE
    ORDER BY toYear(LO_ORDERDATE)
    FORMAT PrettyCompactMonoBlock

    # Elasticsearch
    GET lineorder/_search
    {
      "aggs": {
        "3": {
          "terms": {
            "field": "LO_SHIPMODE.keyword",
            "order": { "_count": "desc" },
            "size": 10
          },
          "aggs": {
            "2": {
              "date_histogram": {
                "field": "LO_ORDERDATE",
                "calendar_interval": "1y",
                "time_zone": "Asia/Shanghai",
                "min_doc_count": 1
              }
            }
          }
        }
      },
      "size": 0
    }


    # ClickHouse
    SELECT LO_SHIPMODE, COUNT(LO_SHIPMODE), LO_ORDERPRIORITY, COUNT(LO_ORDERPRIORITY)
    FROM lineorder
    GROUP BY LO_SHIPMODE, LO_ORDERPRIORITY
    ORDER BY COUNT(LO_SHIPMODE), COUNT(LO_ORDERPRIORITY)
    LIMIT 5 BY LO_SHIPMODE, LO_ORDERPRIORITY

    # Elasticsearch
    GET lineorder/_search
    {
      "aggs": {
        "2": {
          "terms": {
            "field": "LO_SHIPMODE.keyword",
            "order": { "_count": "desc" },
            "size": 5
          },
          "aggs": {
            "3": {
              "terms": {
                "field": "LO_ORDERPRIORITY.keyword",
                "order": { "_count": "desc" },
                "size": 5
              }
            }
          }
        }
      },
      "size": 0
    }

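Aggregation scenario | ClickHouse (ms) | Elasticsearch (ms) | Speedup |
---|---|---|---|
Multi-field aggregation over time | 5506 | 15599 | nearly 3x |
Count of multiple fields by year (data table) | 381 | 6267 | more than 16x |
Top 10 values of a field (pie chart) | 4048 | 7317 | nearly 2x |
Count of a field by year (time trend chart) | 901 | 23257 | more than 25x |
Nested aggregation (non-time fields) | 6937 | 15767 | more than 2x |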
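The speedup column can be checked directly from the two timing columns. The short sketch below recomputes it as Elasticsearch time divided by ClickHouse time; the timings are copied from the table, while the `speedup` helper and the `RESULTS` name are illustrative.

```python
# (scenario, ClickHouse ms, Elasticsearch ms), copied from the results table
RESULTS = [
    ("Multi-field aggregation over time", 5506, 15599),
    ("Count of multiple fields by year (data table)", 381, 6267),
    ("Top 10 values of a field (pie chart)", 4048, 7317),
    ("Count of a field by year (time trend chart)", 901, 23257),
    ("Nested aggregation (non-time fields)", 6937, 15767),
]


def speedup(ck_ms: float, es_ms: float) -> float:
    """How many times faster ClickHouse was than Elasticsearch."""
    return es_ms / ck_ms


for name, ck_ms, es_ms in RESULTS:
    print(f"{name}: {speedup(ck_ms, es_ms):.1f}x")
# Ratios come out at about 2.8x, 16.4x, 1.8x, 25.8x and 2.3x respectively.
```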
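On the same data volume, ClickHouse outperformed Elasticsearch in every aggregation scenario tested, by factors ranging from just under 2 to more than 25. When the aggregation is based on the sorting key, performance is better still, several times that of Elasticsearch.
In addition, ClickHouse's SummingMergeTree and AggregatingMergeTree table engines aggregate data automatically in the background, so in some scenarios its aggregation performance can be even better.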
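To illustrate the idea behind background aggregation, here is a toy model in Python of what a SummingMergeTree merge does conceptually: rows that share the same sorting key are collapsed into one row whose numeric columns are summed. This is a simplified sketch, not ClickHouse code. The key `(year, ship_mode)`, the function `merge_summing`, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str, int]  # (order year, ship mode, row count)


def merge_summing(parts: Iterable[Iterable[Row]]) -> Dict[Tuple[int, str], int]:
    """Collapse rows with the same (year, ship_mode) key by summing counts."""
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for part in parts:
        for year, ship_mode, count in part:
            totals[(year, ship_mode)] += count
    return dict(totals)


# Two data parts written by separate inserts (made-up sample values).
part_a = [(1992, "AIR", 2), (1992, "RAIL", 1)]
part_b = [(1992, "AIR", 3), (1993, "AIR", 4)]

merged = merge_summing([part_a, part_b])
print(merged)
# {(1992, 'AIR'): 5, (1992, 'RAIL'): 1, (1993, 'AIR'): 4}
```

Because real merges run in the background at a time the engine chooses, a query against such a table generally still applies `sum()` with `GROUP BY`; the benefit is that it scans far fewer rows once parts have been merged.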
Original article: https://my.oschina.net/u/4899528/blog/4880571