HBase: How Wide and Narrow Tables Affect Region Splits

Published: 2020-06-24 22:24:03 | Views: 3687 | Author: yyj0531 | Category: Relational Databases

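HBase's hbase.hregion.max.filesize property sets the threshold at which a region is split. In the version used here it defaults to 268435456 bytes (256 MB). When the store file of any column family grows beyond this value, the region is split into two regions.
An HBase table can have a very large number of columns, so there are two ways to design a schema: a wide table (one row with many columns) or a narrow table (many rows with few columns).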
Take a table that stores users' emails as an example.
With a wide design, all of a user's emails are stored in a single row:
userid1 email1 email2 email3 ... ... ... ... ... emailn
userid2 email1 email2 email3 ... ... ... ... ... emailn
useridn
With a narrow design, the row key is composed of the user ID and the email ID:
userid1_emailid1  email1
userid1_emailid2  email2
userid1_emailid3  email3
userid1_emailidn  emailn
userid2_emailid1  email1
userid2_emailid2  email2
userid2_emailid3  email3
userid2_emailidn  emailn
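The difference between the two layouts can be sketched in a few lines of Java. This is only an illustration that uses plain strings instead of the HBase client API; the class name EmailTableLayouts and both helper methods are hypothetical. It shows that the wide layout produces one distinct row key per user, while the narrow layout produces one distinct row key per email.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of HBase: it only models which row keys
// each layout would generate for one user's emails.
public class EmailTableLayouts {

    // Wide layout: every email of a user lands in the same row,
    // so the only row key is the user ID itself.
    static Set<String> wideRowKeys(String userId, List<String> emailIds) {
        Set<String> rowKeys = new LinkedHashSet<>();
        for (String emailId : emailIds) {
            rowKeys.add(userId); // emailId would become a column qualifier
        }
        return rowKeys;
    }

    // Narrow layout: one row per email, row key = userId + "_" + emailId.
    static Set<String> narrowRowKeys(String userId, List<String> emailIds) {
        Set<String> rowKeys = new LinkedHashSet<>();
        for (String emailId : emailIds) {
            rowKeys.add(userId + "_" + emailId);
        }
        return rowKeys;
    }

    public static void main(String[] args) {
        List<String> emailIds = Arrays.asList("emailid1", "emailid2", "emailid3");
        System.out.println(wideRowKeys("userid1", emailIds));
        // prints [userid1]
        System.out.println(narrowRowKeys("userid1", emailIds));
        // prints [userid1_emailid1, userid1_emailid2, userid1_emailid3]
    }
}
```

The number of distinct row keys matters because a split boundary can only fall between two different row keys.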
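These two designs affect region splitting differently. While reading the HFileOutputFormat code today, I noticed that the RecordWriter it creates places a restriction on region splitting:

A split is made only when the row key changes. As long as the row key stays the same, no split happens, even if the size has already exceeded hbase.hregion.max.filesize.
RecordWriter code: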

public void write(ImmutableBytesWritable row, KeyValue kv)
    throws IOException {
  long length = kv.getLength();
  byte [] family = kv.getFamily();
  WriterLength wl = this.writers.get(family);
  if (wl == null || ((length + wl.written) >= maxsize) &&
      Bytes.compareTo(this.previousRow, 0, this.previousRow.length,
        kv.getBuffer(), kv.getRowOffset(), kv.getRowLength()) != 0) {
    // Get a new writer.
    Path basedir = new Path(outputdir, Bytes.toString(family));
    if (wl == null) {
      wl = new WriterLength();
      this.writers.put(family, wl);
      if (this.writers.size() > 1) throw new IOException("One family only");
      // If wl == null, first file in family.  Ensure family dir exits.
      if (!fs.exists(basedir)) fs.mkdirs(basedir);
    }
    wl.writer = getNewWriter(wl.writer, basedir);
    LOG.info("Writer=" + wl.writer.getPath() +
      ((wl.written == 0)? "": ", wrote=" + wl.written));
    wl.written = 0;
  }
  kv.updateLatestStamp(this.now);
  wl.writer.append(kv);
  wl.written += length;
  // Copy the row so we know when a row transition.
  this.previousRow = kv.getRow();
}

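The condition in the if statement (highlighted in red and bold in the original post) shows that a split is made only when the size written so far reaches hbase.hregion.max.filesize and the current row differs from the previously written row. Strictly speaking, this code starts a new output file rather than splitting a region directly, but the constraint is the same: a single row is never cut in two. It follows that:
1. In a wide table, a single row larger than hbase.hregion.max.filesize is not split.
2. When many versions are written under the same row key, no split happens even if their total size exceeds hbase.hregion.max.filesize.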
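The branch condition can be replayed in isolation with a small, self-contained Java sketch. It is not the HBase implementation: the class RollOnRowBoundary, its simulate helper, and the fixed cell length of 8 bytes are assumptions made for illustration, and the sketch only counts how many output files the condition would open.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of the rollover condition in write(); it tracks
// only the bytes written and the previous row key.
public class RollOnRowBoundary {
    private final long maxSize;      // stands in for hbase.hregion.max.filesize
    private long written = 0;        // bytes written to the current file
    private byte[] previousRow = new byte[0];
    private int filesOpened = 0;

    RollOnRowBoundary(long maxSize) {
        this.maxSize = maxSize;
    }

    void write(String row, int length) {
        byte[] rowBytes = row.getBytes(StandardCharsets.UTF_8);
        boolean firstWrite = filesOpened == 0;            // like wl == null
        boolean overLimit = (length + written) >= maxSize;
        boolean rowChanged = !Arrays.equals(previousRow, rowBytes);
        if (firstWrite || (overLimit && rowChanged)) {
            filesOpened++;   // like getNewWriter(...)
            written = 0;
        }
        written += length;
        previousRow = rowBytes; // remember the row to detect a transition
    }

    // Writes one cell of cellLength bytes per entry in rows and
    // returns the number of files that were opened.
    static int simulate(long maxSize, String[] rows, int cellLength) {
        RollOnRowBoundary writer = new RollOnRowBoundary(maxSize);
        for (String row : rows) {
            writer.write(row, cellLength);
        }
        return writer.filesOpened;
    }

    public static void main(String[] args) {
        String[] sameRow = new String[10];
        Arrays.fill(sameRow, "row1");
        String[] distinctRows = new String[10];
        for (int i = 0; i < 10; i++) {
            distinctRows[i] = "row" + i;
        }
        System.out.println(simulate(10, sameRow, 8));      // prints 1
        System.out.println(simulate(10, distinctRows, 8)); // prints 10
    }
}
```

With ten cells under one row key the limit is exceeded many times over, yet only one file is ever opened. With ten different row keys, each write past the limit starts a new file.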

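Let's verify this.
To see the effect as early as possible, two configuration parameters in hbase-site.xml are set to very small values (5 bytes and 10 bytes):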

<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>5</value>
  <description>
  Memstore will be flushed to disk if size of the memstore
  exceeds this number of bytes.  Value is checked by a thread that runs
  every hbase.server.thread.wakefrequency.
  </description>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10</value>
  <description>
  Maximum HStoreFile size. If any one of a column families' HStoreFiles has
  grown to exceed this value, the hosting HRegion is split in two.
  Default: 256M.
  </description>
</property>

Create the test tables t1 and t2:

hbase(main):076:0* create 't1','f1'
0 row(s) in 1.6460 seconds

hbase(main):077:0> create 't2','f1'
0 row(s) in 1.1790 seconds

Check the system table .META.:

hbase(main):081:0* scan '.META.'
ROW                                                  COLUMN+CELL
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad. column=info:regioninfo, timestamp=1314720667384, value=REGION => {NAME => 't1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDKEY => '', ENCODED => d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad. column=info:server, timestamp=1314720667941, value=yinjie:60020
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad. column=info:serverstartcode, timestamp=1314720667941, value=1314716290123
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71. column=info:regioninfo, timestamp=1314720672241, value=REGION => {NAME => 't2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDKEY => '', ENCODED => 16bb3d2563eab3b4e25477c64e007e71, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71. column=info:server, timestamp=1314720672346, value=yinjie:60020
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71. column=info:serverstartcode, timestamp=1314720672346, value=1314716290123
2 row(s) in 0.0230 seconds

At this point t1 and t2 each have exactly one region.
First, insert 10 records into t1, all with the same row key:

hbase(main):086:0* for i in 0..9 do\
hbase(main):087:1* put 't1','row1',"f1:c#{i}","swallow#{i}"\
hbase(main):088:1* end
0 row(s) in 0.0180 seconds
0 row(s) in 0.0070 seconds
0 row(s) in 0.0420 seconds
0 row(s) in 0.0620 seconds
0 row(s) in 0.0120 seconds
0 row(s) in 0.0770 seconds
0 row(s) in 0.0150 seconds
0 row(s) in 0.1290 seconds
0 row(s) in 10.0740 seconds
0 row(s) in 0.1230 seconds
=> 0..9
hbase(main):089:0>

Check the records in t1:

hbase(main):089:0> scan 't1'
ROW    COLUMN+CELL
 row1  column=f1:c0, timestamp=1314720946495, value=swallow0
 row1  column=f1:c1, timestamp=1314720946507, value=swallow1
 row1  column=f1:c2, timestamp=1314720946903, value=swallow2
 row1  column=f1:c3, timestamp=1314720946939, value=swallow3
 row1  column=f1:c4, timestamp=1314720946976, value=swallow4
 row1  column=f1:c5, timestamp=1314720947055, value=swallow5
 row1  column=f1:c6, timestamp=1314720947070, value=swallow6
 row1  column=f1:c7, timestamp=1314720947198, value=swallow7
 row1  column=f1:c8, timestamp=1314720957272, value=swallow8
 row1  column=f1:c9, timestamp=1314720957392, value=swallow9
1 row(s) in 0.0300 seconds

Check .META. again:

hbase(main):090:0> scan '.META.'
ROW                                                  COLUMN+CELL
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad. column=info:regioninfo, timestamp=1314720667384, value=REGION => {NAME => 't1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDKEY => '', ENCODED => d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad. column=info:server, timestamp=1314720667941, value=yinjie:60020
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad. column=info:serverstartcode, timestamp=1314720667941, value=1314716290123
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71. column=info:regioninfo, timestamp=1314720672241, value=REGION => {NAME => 't2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDKEY => '', ENCODED => 16bb3d2563eab3b4e25477c64e007e71, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71. column=info:server, timestamp=1314720672346, value=yinjie:60020
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71. column=info:serverstartcode, timestamp=1314720672346, value=1314716290123
2 row(s) in 0.0210 seconds

As shown, t1 still has only one region.

Next, insert 10 similar records into t2, this time each with a different row key:

hbase(main):091:0> for i in 0..9 do\
hbase(main):092:1* put 't2',"row#{i}","f1:c#{i}","swallow#{i}"\
hbase(main):093:1* end
0 row(s) in 0.1140 seconds
0 row(s) in 0.0080 seconds
0 row(s) in 0.0410 seconds
0 row(s) in 0.0820 seconds
0 row(s) in 0.0210 seconds
0 row(s) in 0.0410 seconds
0 row(s) in 0.0200 seconds
0 row(s) in 0.1210 seconds
0 row(s) in 0.0140 seconds
0 row(s) in 0.0360 seconds
=> 0..9

Check the records in t2:

hbase(main):097:0* scan 't2'
ROW    COLUMN+CELL
 row0  column=f1:c0, timestamp=1314721110769, value=swallow0
 row1  column=f1:c1, timestamp=1314721110787, value=swallow1
 row2  column=f1:c2, timestamp=1314721110830, value=swallow2
 row3  column=f1:c3, timestamp=1314721110916, value=swallow3
 row4  column=f1:c4, timestamp=1314721110932, value=swallow4
 row5  column=f1:c5, timestamp=1314721110971, value=swallow5
 row6  column=f1:c6, timestamp=1314721110989, value=swallow6
 row7  column=f1:c7, timestamp=1314721111121, value=swallow7
 row8  column=f1:c8, timestamp=1314721111130, value=swallow8
 row9  column=f1:c9, timestamp=1314721111172, value=swallow9
10 row(s) in 1.0450 seconds

Check .META. once more:

hbase(main):102:0> scan '.META.'
ROW                                                      COLUMN+CELL
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.     column=info:regioninfo, timestamp=1314720667384, value=REGION => {NAME => 't1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.', STARTKEY => '', ENDKEY => '', ENCODED => d8acd6bc659ac8326b88850d645a90ad, TABLE => {{NAME => 't1', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.     column=info:server, timestamp=1314720667941, value=yinjie:60020
 t1,,1314720667274.d8acd6bc659ac8326b88850d645a90ad.     column=info:serverstartcode, timestamp=1314720667941, value=1314716290123
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.     column=info:regioninfo, timestamp=1314721112130, value=REGION => {NAME => 't2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.', STARTKEY => '', ENDKEY => '', ENCODED => 16bb3d2563eab3b4e25477c64e007e71, OFFLINE => true, SPLIT => true, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.     column=info:server, timestamp=1314720672346, value=yinjie:60020
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.     column=info:serverstartcode, timestamp=1314720672346, value=1314716290123
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.     column=info:splitA, timestamp=1314721112130, value=REGION => {NAME => 't2,,1314721111490.71df02214242923574b71fe5e2a19360.', STARTKEY => '', ENDKEY => 'row0', ENCODED => 71df02214242923574b71fe5e2a19360, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t2,,1314720672168.16bb3d2563eab3b4e25477c64e007e71.     column=info:splitB, timestamp=1314721112130, value=REGION => {NAME => 't2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca.', STARTKEY => 'row0', ENDKEY => '', ENCODED => 915ee8d4a32c59a4ec3960e335b061ca, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t2,,1314721111490.71df02214242923574b71fe5e2a19360.     column=info:regioninfo, timestamp=1314721112267, value=REGION => {NAME => 't2,,1314721111490.71df02214242923574b71fe5e2a19360.', STARTKEY => '', ENDKEY => 'row0', ENCODED => 71df02214242923574b71fe5e2a19360, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t2,,1314721111490.71df02214242923574b71fe5e2a19360.     column=info:server, timestamp=1314721112267, value=yinjie:60020
 t2,,1314721111490.71df02214242923574b71fe5e2a19360.     column=info:serverstartcode, timestamp=1314721112267, value=1314716290123
 t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca. column=info:regioninfo, timestamp=1314721112627, value=REGION => {NAME => 't2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca.', STARTKEY => 'row0', ENDKEY => '', ENCODED => 915ee8d4a32c59a4ec3960e335b061ca, TABLE => {{NAME => 't2', FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca. column=info:server, timestamp=1314721112627, value=yinjie:60020
 t2,row0,1314721111490.915ee8d4a32c59a4ec3960e335b061ca. column=info:serverstartcode, timestamp=1314721112627, value=1314716290123
4 row(s) in 0.0380 seconds

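As shown, the region of t2 has been split: the original region is marked OFFLINE and SPLIT, and two daughter regions (splitA and splitB) now appear in .META.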
