EXPLAIN shows the generated execution plan: it intercepts compilation before step 5 (physical-plan execution), skips the remaining steps, and returns a human-readable plan. Reading the plan helps you spot problems in the SQL and find points worth optimizing.
```sql
EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query
```
All tests below are based on Hive on Spark.
```sql
EXPLAIN
SELECT city, count(1) numbers
FROM measurement_lite
WHERE city_hash = 938117018 AND measurement > 30
GROUP BY city;
```
Query result:
The output has two main parts: STAGE DEPENDENCIES and STAGE PLANS.
STAGE DEPENDENCIES describes the dependencies between stages. Stage-1 is the root stage and Stage-0 depends on Stage-1, so Stage-0 runs only after Stage-1 completes. Note that the number in Stage-N does not indicate execution order.
```bash
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
```
STAGE PLANS shows the execution details of each stage.
Stage-1:
```
Map 1
    Map Operator Tree:
        TableScan                              # table scan
          alias: measurement_lite              # alias of the table being scanned
          Statistics: Num rows: 2421668 Data size: 494020272 Basic stats: COMPLETE Column stats: NONE
            # basic data profile: row count, data size, completeness of basic/column stats
          Filter Operator                      # row filtering
            predicate: (measurement > 30) (type: boolean)   # filter condition
            Statistics: Num rows: 807222 Data size: 164673288 Basic stats: COMPLETE Column stats: NONE
              # data profile after filtering
            Select Operator                    # column pruning: keep only the columns needed later (city here)
              expressions: city (type: string) # column and its type
              outputColumnNames: city          # output column name
              Statistics: Num rows: 807222 Data size: 164673288 Basic stats: COMPLETE Column stats: NONE
                # data profile after column pruning
              Group By Operator                # map-side aggregation (pre-aggregation / combiner)
                aggregations: count()          # aggregation function
                keys: city (type: string)      # group-by keys
                mode: hash
                outputColumnNames: _col0, _col1   # output column names
                Statistics: Num rows: 807222 Data size: 164673288 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator         # map output (key-value pairs) sent to the reducers
                  key expressions: _col0 (type: string)   # key expression; may be computed from several columns
                  sort order: +                # sort direction per key column; "+-" would mean col 1 asc, col 2 desc
                  Map-reduce partition columns: _col0 (type: string)
                    # columns used to partition map output across reducers; DISTRIBUTE BY controls this in HiveQL
                  Statistics: Num rows: 807222 Data size: 164673288 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: bigint) # value columns
    Execution mode: vectorized                 # vectorized execution is enabled
```
A normal query processes one row at a time; with vectorized query execution, rows are processed in batches of 1024, which simplifies the system's low-level operations and improves data-processing performance.
Preconditions for using Hive's vectorized execution mode:
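A minimal configuration sketch for turning vectorization on (property names as commonly documented; verify them against your Hive version, and note that only supported columnar formats such as ORC actually vectorize):

```sql
-- Enable vectorized execution on the map side and, on newer versions, the reduce side
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;

-- Check which operators vectorize for a given query
EXPLAIN VECTORIZATION DETAIL
SELECT city, count(1) FROM measurement_lite GROUP BY city;
```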
```
Reducer 2
    Execution mode: vectorized           # vectorized execution
    Reduce Operator Tree:
      Group By Operator                  # group-by aggregation
        aggregations: count(VALUE._col0) # aggregation logic
        keys: KEY._col0 (type: string)   # column the aggregation acts on
        mode: mergepartial               # aggregation mode: hash = hashed partials; mergepartial = merge partial results; final = final aggregation
        outputColumnNames: _col0, _col1  # aggregation output columns
        Statistics: Num rows: 403611 Data size: 82336644 Basic stats: COMPLETE Column stats: NONE
          # note: map-side pre-aggregation reduced the data volume reaching the reducer
        File Output Operator             # file output
          compressed: false              # whether the output is compressed
          Statistics: Num rows: 403611 Data size: 82336644 Basic stats: COMPLETE Column stats: NONE
          table:
              input format: org.apache.hadoop.mapred.SequenceFileInputFormat            # input file format
              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat  # output file format
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                 # (de)serialization used to read the table data
```
Stage-0:
```
Stage: Stage-0          # stage Stage-0
  Fetch Operator        # client-side fetch
    limit: -1           # number of rows to fetch; -1 means all
    Processor Tree:     # processor tree
      ListSink          # returns rows to the client
```
Summary of the HiveSQL execution process:
Predicate pushdown applies filter conditions as early as possible to the data being queried, reducing the amount of data that has to be processed and thus improving query efficiency.
Whether a predicate can be pushed down depends on the predicate expression and on the table's underlying storage format (non-columnar formats are not supported).
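At the plan level, pushdown is also governed by optimizer switches; a configuration sketch (these are the commonly documented defaults in recent Hive releases, worth verifying for your version):

```sql
set hive.optimize.ppd = true;           -- logical predicate pushdown in the optimizer
set hive.optimize.ppd.storage = true;   -- push predicates down to the storage handler
set hive.optimize.index.filter = true;  -- push filters into ORC/Parquet readers (SARGs)
```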
Expressions that support pushdown:
- `=`, `>`, `<`, `>=`, `<=`, `<>`, `!=`
- `IN` / `NOT IN`
- `IS NULL` / `IS NOT NULL`
- `BETWEEN ... AND ...`
- `LIKE` (only in some cases, depending on the storage format)
- `AND`, `OR` (split into multiple sub-predicates)

Expressions that do not support pushdown:
- `RAND()`, `UUID()`, `CURRENT_TIMESTAMP()`
- `SUBSTR(a, 1, 2) = 'ab'` (unless handled by column-pruning optimization)
- `WHERE id IN (SELECT ... FROM ...)`

Storage formats that support predicate pushdown: ORC and Parquet (the test tables here use ORC).
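As an illustration of the lists above, a predicate wrapped in a function usually blocks pushdown, but an equivalent rewrite on the bare column can restore it (a hypothetical rewrite; the table and column are from the test setup below):

```sql
-- Not pushable: the function call hides the column from the storage reader
SELECT * FROM mydb.city_info WHERE SUBSTR(city_name, 1, 2) = 'ab';

-- Pushable (format permitting): an equivalent prefix LIKE on the bare column
SELECT * FROM mydb.city_info WHERE city_name LIKE 'ab%';
```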
Test table setup:
```sql
-- SHOW CREATE TABLE mydb.measurement_lite
CREATE TABLE `mydb.measurement_lite`(
  `city` string,
  `measurement` decimal(10,2))
PARTITIONED BY (
  `city_hash` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hadoop102:8020/user/hive/warehouse/mydb.db/measurement_lite'
TBLPROPERTIES (
  'orc.compress'='ZLIB',
  'transient_lastDdlTime'='1756196960');

-- SHOW CREATE TABLE mydb.city_info
CREATE TABLE `mydb.city_info`(
  `city_id` int COMMENT 'city ID',
  `city_name` string COMMENT 'city name',
  `country` string COMMENT 'country',
  `province` string COMMENT 'province/state',
  `population` bigint COMMENT 'population',
  `longitude` decimal(10,6) COMMENT 'longitude',
  `latitude` decimal(10,6) COMMENT 'latitude',
  `post_code` string COMMENT 'postal code',
  `area_code` string COMMENT 'area code',
  `created_at` timestamp COMMENT 'created time')
PARTITIONED BY (
  `city_hash` int COMMENT '?????')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'field.delim'=',',
  'serialization.format'=',')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hadoop102:8020/user/hive/warehouse/mydb.db/city_info'
TBLPROPERTIES (
  'bucketing_version'='2',
  'transient_lastDdlTime'='1756197746');
```
Both tables have the same five partitions: 990497647, 981918035, 951221382, 938117018, 90761778. In `a LEFT JOIN b`, a is called the base (driving) table and b the secondary table.
```sql
-- The following two statements have different semantics
-- Variant 1
EXPLAIN DEPENDENCY
SELECT a.measurement
FROM mydb.measurement_lite a
LEFT OUTER JOIN city_info b
  ON a.city_hash = b.city_hash
 AND a.city_hash BETWEEN 938117018 AND 990497647;

-- Variant 2
EXPLAIN DEPENDENCY
SELECT a.measurement
FROM mydb.measurement_lite a
LEFT OUTER JOIN city_info b
  ON a.city_hash = b.city_hash
WHERE a.city_hash BETWEEN 938117018 AND 990497647;
```
Result of variant 1: neither table's data was pruned, i.e. the condition did not qualify for predicate pushdown.
```json
{
  "input_tables": [
    { "tablename": "mydb@measurement_lite", "tabletype": "MANAGED_TABLE" },
    { "tablename": "mydb@city_info", "tabletype": "MANAGED_TABLE" }
  ],
  "input_partitions": [
    { "partitionName": "mydb@measurement_lite@city_hash=90761778" },
    { "partitionName": "mydb@measurement_lite@city_hash=938117018" },
    { "partitionName": "mydb@measurement_lite@city_hash=951221382" },
    { "partitionName": "mydb@measurement_lite@city_hash=981918035" },
    { "partitionName": "mydb@measurement_lite@city_hash=990497647" },
    { "partitionName": "mydb@city_info@city_hash=90761778" },
    { "partitionName": "mydb@city_info@city_hash=938117018" },
    { "partitionName": "mydb@city_info@city_hash=951221382" },
    { "partitionName": "mydb@city_info@city_hash=981918035" },
    { "partitionName": "mydb@city_info@city_hash=990497647" }
  ]
}
```
Result of variant 2: the data of both tables was pruned.
```json
{
  "input_tables": [
    { "tablename": "mydb@measurement_lite", "tabletype": "MANAGED_TABLE" },
    { "tablename": "mydb@city_info", "tabletype": "MANAGED_TABLE" }
  ],
  "input_partitions": [
    { "partitionName": "mydb@measurement_lite@city_hash=938117018" },
    { "partitionName": "mydb@measurement_lite@city_hash=951221382" },
    { "partitionName": "mydb@measurement_lite@city_hash=981918035" },
    { "partitionName": "mydb@measurement_lite@city_hash=990497647" },
    { "partitionName": "mydb@city_info@city_hash=938117018" },
    { "partitionName": "mydb@city_info@city_hash=951221382" },
    { "partitionName": "mydb@city_info@city_hash=981918035" },
    { "partitionName": "mydb@city_info@city_hash=990497647" }
  ]
}
```
```sql
-- The same right-table filter, placed in ON (variant a) vs WHERE (variant b);
-- as the outputs below show, for LEFT OUTER JOIN these behave differently
-- Variant a
EXPLAIN DEPENDENCY
SELECT a.measurement
FROM mydb.measurement_lite a
LEFT OUTER JOIN city_info b
  ON a.city_hash = b.city_hash
 AND b.city_hash BETWEEN 938117018 AND 990497647;

-- Variant b
EXPLAIN DEPENDENCY
SELECT a.measurement
FROM mydb.measurement_lite a
LEFT OUTER JOIN city_info b
  ON a.city_hash = b.city_hash
WHERE b.city_hash BETWEEN 938117018 AND 990497647;
```
Output of variant a: the base table measurement_lite scans all partitions, while the secondary table city_info is pruned; predicate pushdown took effect on the secondary table.
```json
{
  "input_tables": [
    { "tablename": "mydb@measurement_lite", "tabletype": "MANAGED_TABLE" },
    { "tablename": "mydb@city_info", "tabletype": "MANAGED_TABLE" }
  ],
  "input_partitions": [
    { "partitionName": "mydb@measurement_lite@city_hash=90761778" },
    { "partitionName": "mydb@measurement_lite@city_hash=938117018" },
    { "partitionName": "mydb@measurement_lite@city_hash=951221382" },
    { "partitionName": "mydb@measurement_lite@city_hash=981918035" },
    { "partitionName": "mydb@measurement_lite@city_hash=990497647" },
    { "partitionName": "mydb@city_info@city_hash=938117018" },
    { "partitionName": "mydb@city_info@city_hash=951221382" },
    { "partitionName": "mydb@city_info@city_hash=981918035" },
    { "partitionName": "mydb@city_info@city_hash=990497647" }
  ]
}
```
Output of variant b: neither the base table nor the secondary table was pruned.
```json
{
  "input_tables": [
    { "tablename": "mydb@measurement_lite", "tabletype": "MANAGED_TABLE" },
    { "tablename": "mydb@city_info", "tabletype": "MANAGED_TABLE" }
  ],
  "input_partitions": [
    { "partitionName": "mydb@measurement_lite@city_hash=90761778" },
    { "partitionName": "mydb@measurement_lite@city_hash=938117018" },
    { "partitionName": "mydb@measurement_lite@city_hash=951221382" },
    { "partitionName": "mydb@measurement_lite@city_hash=981918035" },
    { "partitionName": "mydb@measurement_lite@city_hash=990497647" },
    { "partitionName": "mydb@city_info@city_hash=90761778" },
    { "partitionName": "mydb@city_info@city_hash=938117018" },
    { "partitionName": "mydb@city_info@city_hash=951221382" },
    { "partitionName": "mydb@city_info@city_hash=981918035" },
    { "partitionName": "mydb@city_info@city_hash=990497647" }
  ]
}
```
In summary, for a LEFT OUTER JOIN with a non-equi filter on the right table: the left table is always fully scanned; when the filter is in the ON clause the right table is pruned, but when it is in the WHERE clause the right table is also fully scanned.
Testing the other join types leads to the following conclusions:
For INNER JOIN and FULL OUTER JOIN, writing the filter in ON or in WHERE gives the same result: with INNER JOIN the predicate is pushed down in both cases, while with FULL OUTER JOIN it is never pushed down regardless of placement. Only LEFT/RIGHT OUTER JOIN needs case-by-case analysis.
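When a right-table filter must prune data in a LEFT OUTER JOIN regardless of ON/WHERE placement, one workaround is to pre-filter the secondary table in a subquery (a sketch against the same test tables):

```sql
EXPLAIN DEPENDENCY
SELECT a.measurement
FROM mydb.measurement_lite a
LEFT OUTER JOIN (
    -- the filter is applied before the join, so partition pruning can kick in
    SELECT city_hash
    FROM mydb.city_info
    WHERE city_hash BETWEEN 938117018 AND 990497647
) b ON a.city_hash = b.city_hash;
```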
```sql
EXPLAIN
SELECT t1.province, COUNT(1) numbers
FROM city_info t1
LEFT JOIN mydb.measurement_lite t2
  ON t1.city_hash = t2.city_hash
 AND t2.measurement > 30
GROUP BY t1.province;
```
Because the measurement_lite table is fairly small, about 2.3 GB, I want to use a map join. First, the current execution plan:
The rough flow: measurement_lite and city_info are each read using PARTITION-LEVEL SORT, meaning every map task first sorts, per output partition, the data it is about to emit; rows arriving at a given reducer are therefore already ordered by the key (city_hash). Predicate pushdown takes effect when reading measurement_lite, applying the filter t2.measurement > 30. Next comes the first shuffle on city_hash: Reducer 2 performs the LEFT OUTER JOIN along with a simple partial count() aggregation (Group By Operator); Reducer 3 then performs the final aggregation by province; Stage-0 fetches the result. This whole process is a sort-merge join and takes about 4 s.
To convert this to a map join:
```sql
set hive.auto.convert.join = true;
set hive.mapjoin.smalltable.filesize = 900000000;
-- Key: enable the modern MapJoin implementation, avoiding a separate conditional task
set hive.auto.convert.join.noconditionaltask = true;
-- Key: works together with noconditionaltask; size threshold for the check
set hive.auto.convert.join.noconditionaltask.size = 900000000;

EXPLAIN
SELECT /*+ MAPJOIN(t2) */ t1.province, COUNT(1) numbers
FROM city_info t1
LEFT JOIN mydb.measurement_lite t2
  ON t1.city_hash = t2.city_hash
 AND t2.measurement > 30
GROUP BY t1.province;
```
Plan reading: Stage-2 is the small-table preparation stage. It builds an in-memory hash table keyed on the join key (city_hash); this hash table is then distributed (broadcast) to every node that will process the big-table data.
In Stage-1, Map 1 carries out the map join itself, and Reducer 2 performs the final aggregation.
Stage-0 fetches and displays the result.
The whole query now takes about 2 s, so the execution is measurably faster.
Conditions for map join to take effect:
measurement_lite is about 2.3 GB in total; after predicate pushdown roughly 790 MB remains, which fits under the configured threshold. Characteristics of map-join execution: one stage reads the small table, builds the hash table, writes it to HDFS, and registers it in Spark's distributed cache; the following stage then reads the large table and performs the map join against the cached hash table.
Partitioning reduces the range of data scanned at query time and is materialized as directories on HDFS; bucketing aims to make the underlying data files more evenly sized, which helps avoid data skew to some extent.
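Conceptually, each row's bucket file is chosen by hashing the clustering column modulo the bucket count. A sketch of inspecting that assignment with Hive's built-in `hash` and `pmod` functions (the actual bucketing hash depends on the column type and Hive version, so treat this as illustrative):

```sql
-- Which of the 32 buckets would each row land in?
SELECT city_hash,
       pmod(hash(city_hash), 32) AS bucket_idx
FROM mydb.city_info
LIMIT 10;
```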
The bucketed tables needed:
```sql
CREATE TABLE city_info_bucketed (
    city_id     INT            COMMENT 'city ID',
    city_name   STRING         COMMENT 'city name',
    country     STRING         COMMENT 'country',
    province    STRING         COMMENT 'province/state',
    population  BIGINT         COMMENT 'population',
    longitude   DECIMAL(10, 6) COMMENT 'longitude',
    latitude    DECIMAL(10, 6) COMMENT 'latitude',
    post_code   STRING         COMMENT 'postal code',
    area_code   STRING         COMMENT 'area code',
    created_at  TIMESTAMP      COMMENT 'created time',
    city_hash   INT            COMMENT 'city hash'
)
CLUSTERED BY (city_hash) INTO 32 BUCKETS
STORED AS ORC;

CREATE TABLE measurement_lite_bucketed (
    `city`        string        COMMENT 'city name',
    `measurement` decimal(10,2) COMMENT 'temperature measurement',
    `city_hash`   int           COMMENT 'city hash'
)
CLUSTERED BY (city_hash) INTO 32 BUCKETS
STORED AS ORC;
```
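When loading these tables, older Hive releases also require bucketing to be enforced at write time; since Hive 2.x it is always on. A hedged loading sketch:

```sql
set hive.enforce.bucketing = true;  -- required on Hive 1.x; a no-op on 2.x and later

INSERT OVERWRITE TABLE measurement_lite_bucketed
SELECT city, measurement, city_hash
FROM mydb.measurement_lite;
```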
```sql
set hive.auto.convert.join = true;
set hive.mapjoin.smalltable.filesize = 900000000;
-- Key: enable the modern MapJoin implementation, avoiding a separate conditional task
set hive.auto.convert.join.noconditionaltask = true;
-- Key: works together with noconditionaltask; size threshold for the check
set hive.auto.convert.join.noconditionaltask.size = 900000000;
set hive.optimize.bucketmapjoin = true;

EXPLAIN
SELECT t1.province, COUNT(1) AS numbers
FROM city_info_bucketed t1
LEFT JOIN measurement_lite_bucketed t2
  ON t1.city_hash = t2.city_hash
 AND t2.measurement > 30
GROUP BY t1.province;
```
Execution plan:
As the plan shows, this is still just an ordinary map join.
For bucket map join to take effect, in addition to the map join conditions you also need:
- Bucketing on the join keys
- Sorting within buckets by the join keys
- Bucket counts that are integer multiples of each other
Create the corresponding bucketed tables, sorted by the join key:
```sql
CREATE TABLE city_info_smb (
    city_id    INT,
    city_name  STRING,
    country    STRING,
    province   STRING,
    population BIGINT,
    longitude  DECIMAL(10, 6),
    latitude   DECIMAL(10, 6),
    post_code  STRING,
    area_code  STRING,
    created_at TIMESTAMP,
    city_hash  INT
)
-- Golden rules: 1. bucket by the join key; 2. sort within buckets by the join key
CLUSTERED BY (city_hash) SORTED BY (city_hash) INTO 32 BUCKETS
STORED AS ORC;

-- SMB version of measurement_lite
CREATE TABLE measurement_lite_smb (
    city        STRING,
    measurement DECIMAL(10,2),
    city_hash   INT
)
-- Golden rules: 1. bucket by the join key; 2. sort within buckets by the join key; 3. equal bucket counts
CLUSTERED BY (city_hash) SORTED BY (city_hash) INTO 32 BUCKETS
STORED AS ORC;

EXPLAIN
SELECT t1.province, COUNT(1) AS numbers
-- join on the bucketing key
FROM city_info_smb t1
LEFT JOIN measurement_lite_smb t2
  ON t1.city_hash = t2.city_hash
 AND t2.measurement > 30
GROUP BY t1.province;
```
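For the plan to actually show an SMB Map Join, the relevant optimizer switches must be on as well; a configuration sketch (property names as commonly documented, worth verifying for your Hive version):

```sql
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join = true;
-- align input splits with buckets
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
```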
The execution plan reveals the characteristics of a Sorted Merge Bucket Map Join:
On top of a map join, it skips building and broadcasting the small-table hash table: from the moment data is read (TableScan) to the moment it is joined (SMB Map Join Operator), there is no Reduce Output Operator at all, which shows the join completes without any network shuffle.
The three join strategies seen so far:
- Sorted Merge Bucket (SMB) Join
- Map Join (Broadcast Join)
- Common Join (Shuffle Join / Sort-Merge Join)
EXPLAIN AUTHORIZATION shows authorization information: without actually executing the query, it tells you whether the current user has sufficient privileges to run it and lists all the required inputs and outputs.
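The output below came from a query of roughly this shape (reconstructed for illustration; the exact predicate is an assumption):

```sql
EXPLAIN AUTHORIZATION
SELECT city, count(1)
FROM measurements_partitions
WHERE city_hash IN (-1022324546, -1034724598, -1041684677, -1072080229)
GROUP BY city;
```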
```
INPUTS:
  mydb@measurements_partitions
  mydb@measurements_partitions@city_hash=-1022324546
  mydb@measurements_partitions@city_hash=-1034724598
  mydb@measurements_partitions@city_hash=-1041684677
  mydb@measurements_partitions@city_hash=-1072080229
OUTPUTS:
  hdfs://hadoop102:8020/tmp/hive/atguigu/f9e3d1db-18e4-4f9b-bae0-b249b3524fe2/hive_2025-08-28_10-57-52_088_1123441659099374210-2/-mr-10001
CURRENT_USER: atguigu
OPERATION: QUERY
```
EXPLAIN ANALYZE actually executes the query and then returns a detailed execution plan annotated with real runtime statistics.
```sql
EXPLAIN ANALYZE
SELECT city, count(1)
FROM measurements_partitions t
WHERE city_hash IN (-1414109858, -1512161185)
GROUP BY t.city;
```
Author: hedeoer
Copyright notice: unless otherwise stated, all posts on this blog are licensed under the BY-NC-SA license. Please credit the source when reposting!