This article explains what the mdread function does in PostgreSQL.
Within PostgreSQL's storage management layer, mdread is the function that performs reads for the magnetic disk (md) storage manager.
smgrsw
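The f_smgr struct of function pointers defines the API between smgr.c and each individual storage manager module.
md is short for magnetic disk.
Besides md, PostgreSQL formerly supported two other storage managers, one for the Sony WORM optical disk jukebox and one for persistent main memory,
but both were later dropped, leaving only magnetic disk.
The name "magnetic disk" is itself misleading: md works with any kind of device for which the operating system provides a standard file system.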
/*
* This struct of function pointers defines the API between smgr.c and
* any individual storage manager module. Note that smgr subfunctions are
* generally expected to report problems via elog(ERROR). An exception is
* that smgr_unlink should use elog(WARNING), rather than erroring out,
* because we normally unlink relations during post-commit/abort cleanup,
* and so it's too late to raise an error. Also, various conditions that
* would normally be errors should be allowed during bootstrap and/or WAL
* recovery --- see comments in md.c for details.
*/
typedef struct f_smgr
{
    void (*smgr_init) (void);       /* may be NULL */
    void (*smgr_shutdown) (void);   /* may be NULL */
    void (*smgr_close) (SMgrRelation reln, ForkNumber forknum);
    void (*smgr_create) (SMgrRelation reln, ForkNumber forknum,
                         bool isRedo);
    bool (*smgr_exists) (SMgrRelation reln, ForkNumber forknum);
    void (*smgr_unlink) (RelFileNodeBackend rnode, ForkNumber forknum,
                         bool isRedo);
    void (*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
                         BlockNumber blocknum, char *buffer, bool skipFsync);
    void (*smgr_prefetch) (SMgrRelation reln, ForkNumber forknum,
                           BlockNumber blocknum);
    void (*smgr_read) (SMgrRelation reln, ForkNumber forknum,
                       BlockNumber blocknum, char *buffer);
    void (*smgr_write) (SMgrRelation reln, ForkNumber forknum,
                        BlockNumber blocknum, char *buffer, bool skipFsync);
    void (*smgr_writeback) (SMgrRelation reln, ForkNumber forknum,
                            BlockNumber blocknum, BlockNumber nblocks);
    BlockNumber (*smgr_nblocks) (SMgrRelation reln, ForkNumber forknum);
    void (*smgr_truncate) (SMgrRelation reln, ForkNumber forknum,
                           BlockNumber nblocks);
    void (*smgr_immedsync) (SMgrRelation reln, ForkNumber forknum);
    void (*smgr_pre_ckpt) (void);   /* may be NULL */
    void (*smgr_sync) (void);       /* may be NULL */
    void (*smgr_post_ckpt) (void);  /* may be NULL */
} f_smgr;
/*
 * md is short for magnetic disk. It is the only storage manager left in
 * the switch table, and despite its name it works with any device for
 * which the operating system provides a standard file system.
 */
static const f_smgr smgrsw[] = {
    /* magnetic disk */
    {
        .smgr_init = mdinit,
        .smgr_shutdown = NULL,
        .smgr_close = mdclose,
        .smgr_create = mdcreate,
        .smgr_exists = mdexists,
        .smgr_unlink = mdunlink,
        .smgr_extend = mdextend,
        .smgr_prefetch = mdprefetch,
        .smgr_read = mdread,
        .smgr_write = mdwrite,
        .smgr_writeback = mdwriteback,
        .smgr_nblocks = mdnblocks,
        .smgr_truncate = mdtruncate,
        .smgr_immedsync = mdimmedsync,
        .smgr_pre_ckpt = mdpreckpt,
        .smgr_sync = mdsync,
        .smgr_post_ckpt = mdpostckpt
    }
};
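To make the role of the switch table concrete, here is a minimal, self-contained sketch of dispatching through a table of function pointers. It is not PostgreSQL code: every name prefixed with toy_ is hypothetical, the struct keeps only two of the callbacks, and the "read" simply fills a 16-byte buffer. It only illustrates how a generic entry point can forward to whichever manager the table selects, and how callbacks marked "may be NULL" need a check before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the real PostgreSQL type. */
typedef unsigned int BlockNumber;

/* Cut-down analogue of f_smgr: one optional and one mandatory callback. */
typedef struct toy_smgr
{
    void (*smgr_init) (void);       /* may be NULL */
    void (*smgr_read) (BlockNumber blocknum, char *buffer);
} toy_smgr;

int toy_init_calls = 0;

void
toy_mdinit(void)
{
    toy_init_calls++;
}

/* Toy "read": fills 16 bytes with a value derived from the block number. */
void
toy_mdread(BlockNumber blocknum, char *buffer)
{
    memset(buffer, (int) (blocknum & 0xFF), 16);
}

/* One entry per storage manager; only one exists here. */
static const toy_smgr toy_smgrsw[] = {
    {.smgr_init = toy_mdinit, .smgr_read = toy_mdread}
};

static const int toy_nsmgr = (int) (sizeof(toy_smgrsw) / sizeof(toy_smgrsw[0]));

/* Run every manager's optional init hook, skipping NULL entries. */
void
toy_smgrinit(void)
{
    for (int i = 0; i < toy_nsmgr; i++)
    {
        if (toy_smgrsw[i].smgr_init != NULL)
            toy_smgrsw[i].smgr_init();
    }
}

/* Generic entry point: forwards to the manager selected by "which". */
void
toy_smgrread(int which, BlockNumber blocknum, char *buffer)
{
    assert(which >= 0 && which < toy_nsmgr);
    toy_smgrsw[which].smgr_read(blocknum, buffer);
}
```

Calling toy_smgrinit() once and then toy_smgrread(0, 50, buf) ends up in toy_mdread, in the same way that the gdb backtrace later in this article shows smgrread calling mdread.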
MdfdVec
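The magnetic disk storage manager keeps track of open file descriptors in its own descriptor pool.
This makes it easier to support relations that are larger than the operating system's file size limit (often 2 GB).
To do so, a relation is split into multiple "segment" files, each shorter than the OS file size limit.
The segment size is set by the RELSEG_SIZE configuration constant in pg_config.h.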
/*
* The magnetic disk storage manager keeps track of open file
* descriptors in its own descriptor pool. This is done to make it
* easier to support relations that are larger than the operating
* system's file size limit (often 2GBytes). In order to do that,
* we break relations up into "segment" files that are each shorter than
* the OS file size limit. The segment size is set by the RELSEG_SIZE
* configuration constant in pg_config.h.
*
* On disk, a relation must consist of consecutively numbered segment
* files in the pattern
* -- Zero or more full segments of exactly RELSEG_SIZE blocks each
* -- Exactly one partial segment of size 0 <= size < RELSEG_SIZE blocks
* -- Optionally, any number of inactive segments of size 0 blocks.
* The full and partial segments are collectively the "active" segments.
* Inactive segments are those that once contained data but are currently
* not needed because of an mdtruncate() operation. The reason for leaving
* them present at size zero, rather than unlinking them, is that other
* backends and/or the checkpointer might be holding open file references to
* such segments. If the relation expands again after mdtruncate(), such
* that a deactivated segment becomes active again, it is important that
* such file references still be valid --- else data might get written
* out to an unlinked old copy of a segment file that will eventually
* disappear.
*
* File descriptors are stored in the per-fork md_seg_fds arrays inside
* SMgrRelation. The length of these arrays is stored in md_num_open_segs.
* Note that a fork's md_num_open_segs having a specific value does not
* necessarily mean the relation doesn't have additional segments; we may
* just not have opened the next segment yet. (We could not have "all
* segments are in the array" as an invariant anyway, since another backend
* could extend the relation while we aren't looking.) We do not have
* entries for inactive segments, however; as soon as we find a partial
* segment, we assume that any subsequent segments are inactive.
*
* The entire MdfdVec array is palloc'd in the MdCxt memory context.
*/
typedef struct _MdfdVec
{
    File mdfd_vfd;              /* fd number in fd.c's pool */
    BlockNumber mdfd_segno;     /* segment number, from 0 */
} MdfdVec;
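The segment layout described in the comment above can be worked through with a little arithmetic. The sketch below is illustrative only: the toy_ names and the example relation path are hypothetical, and TOY_RELSEG_SIZE is set to 131072 blocks, the value printed in the gdb session later in this article. It assumes the usual on-disk naming in which segment 0 uses the bare relation file name and later segments append ".N".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned int BlockNumber;

/* Assumed segment size in blocks (matches the gdb output in this article). */
#define TOY_RELSEG_SIZE 131072

typedef struct toy_layout
{
    BlockNumber full_segments;  /* segments of exactly TOY_RELSEG_SIZE blocks */
    BlockNumber partial_blocks; /* 0 <= size < TOY_RELSEG_SIZE */
} toy_layout;

/* Split a fork of nblocks blocks into full segments plus one partial one. */
toy_layout
toy_segment_layout(BlockNumber nblocks)
{
    toy_layout layout;

    layout.full_segments = nblocks / TOY_RELSEG_SIZE;
    layout.partial_blocks = nblocks % TOY_RELSEG_SIZE;
    assert(layout.partial_blocks < TOY_RELSEG_SIZE);
    return layout;
}

/* Segment number (counted from 0) that holds the given block. */
BlockNumber
toy_segno_for_block(BlockNumber blocknum)
{
    return blocknum / TOY_RELSEG_SIZE;
}

/* Build the file name of a segment from a (hypothetical) relation path. */
void
toy_segment_path(const char *relpath, BlockNumber segno,
                 char *out, size_t outlen)
{
    if (segno == 0)
        snprintf(out, outlen, "%s", relpath);
    else
        snprintf(out, outlen, "%s.%u", relpath, segno);
}
```

For example, a fork of 300000 blocks has two full segments and a partial segment of 37856 blocks, and block 131072 is the first block of segment 1.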
mdread(): reads the specified block from a relation.
The source is fairly simple; the actual read is delegated to FileRead.
/*
 * mdread() -- Read the specified block from a relation.
 */
void
mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
       char *buffer)
{
    off_t seekpos;  // byte offset of the block within its segment file
    int nbytes;     // number of bytes actually read
    MdfdVec *v;     // descriptor of the segment that holds the block

    TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
                                        reln->smgr_rnode.node.spcNode,
                                        reln->smgr_rnode.node.dbNode,
                                        reln->smgr_rnode.node.relNode,
                                        reln->smgr_rnode.backend);

    // look up the segment that contains the block
    v = _mdfd_getseg(reln, forknum, blocknum, false,
                     EXTENSION_FAIL | EXTENSION_CREATE_RECOVERY);

    // compute the block's byte offset within that segment
    seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));

    // sanity check: the offset must fall inside a single segment
    Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);

    // read one block into buffer; the return value is the number of bytes read
    nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, seekpos, WAIT_EVENT_DATA_FILE_READ);

    // tracing probe
    TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
                                       reln->smgr_rnode.node.spcNode,
                                       reln->smgr_rnode.node.dbNode,
                                       reln->smgr_rnode.node.relNode,
                                       reln->smgr_rnode.backend,
                                       nbytes,
                                       BLCKSZ);

    if (nbytes != BLCKSZ)
    {
        // fewer bytes than a full block were read
        if (nbytes < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read block %u in file \"%s\": %m",
                            blocknum, FilePathName(v->mdfd_vfd))));

        /*
         * Short read: we are at or past EOF, or we read a partial block at
         * EOF. Normally this is an error; upper levels should never try to
         * read a nonexistent block. However, if zero_damaged_pages is ON or
         * we are InRecovery, we should instead return zeroes without
         * complaining. This allows, for example, the case of trying to
         * update a block that was later truncated away.
         */
        if (zero_damaged_pages || InRecovery)
            MemSet(buffer, 0, BLCKSZ);
        else
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                            blocknum, FilePathName(v->mdfd_vfd),
                            nbytes, BLCKSZ)));
    }
}
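The three outcomes that mdread distinguishes (a full block, an I/O error, and a short read that is either zero-filled or reported) can be imitated outside PostgreSQL with a plain POSIX pread call. The following is only a sketch under stated assumptions: the toy_ names and the result enum are hypothetical, errors are returned as codes rather than raised with ereport, the file is treated as a single segment, and the zero_on_short_read flag stands in for the zero_damaged_pages || InRecovery condition.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict compiler modes */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ 8192

typedef enum
{
    TOY_READ_OK,        /* a full block was read */
    TOY_READ_ZEROED,    /* short read, buffer zero-filled instead of failing */
    TOY_READ_IO_ERROR,  /* open() or pread() reported an error */
    TOY_READ_SHORT      /* short read treated as an error */
} toy_read_result;

/* Read block "blocknum" from fd, mimicking mdread's short-read policy. */
toy_read_result
toy_read_block(int fd, unsigned int blocknum, char *buffer,
               int zero_on_short_read)
{
    off_t seekpos = (off_t) TOY_BLCKSZ * blocknum;
    ssize_t nbytes;

    assert(buffer != NULL);
    nbytes = pread(fd, buffer, TOY_BLCKSZ, seekpos);

    if (nbytes == TOY_BLCKSZ)
        return TOY_READ_OK;
    if (nbytes < 0)
        return TOY_READ_IO_ERROR;

    /* At or past EOF, or only part of a block was available. */
    if (zero_on_short_read)
    {
        memset(buffer, 0, TOY_BLCKSZ);
        return TOY_READ_ZEROED;
    }
    return TOY_READ_SHORT;
}

/* Convenience wrapper: open a path read-only, read one block, close. */
toy_read_result
toy_read_block_path(const char *path, unsigned int blocknum, char *buffer,
                    int zero_on_short_read)
{
    toy_read_result result;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return TOY_READ_IO_ERROR;
    result = toy_read_block(fd, blocknum, buffer, zero_on_short_read);
    close(fd);
    return result;
}
```

On Linux, reading from /dev/zero is a convenient way to exercise the full-block path, and reading from /dev/null (which is always at EOF) exercises both short-read branches.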
Test script
11:15:11 (xdb@[local]:5432)testdb=# insert into t1(id) select generate_series(100,500);
Start gdb and set a breakpoint on mdread
Inspect the call stack
(gdb) b mdread
Breakpoint 3 at 0x8b669b: file md.c, line 738.
(gdb) c
Continuing.
Breakpoint 3, mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
738 TRACE_POSTGRESQL_SMGR_MD_READ_START(forknum, blocknum,
(gdb) bt
#0 mdread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "") at md.c:738
#1 0x00000000008b92d5 in smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "")
at smgr.c:628
#2 0x00000000008793f9 in ReadBuffer_common (smgr=0x2d09be0, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=50,
mode=RBM_NORMAL, strategy=0x0, hit=0x7ffd5fb2948b) at bufmgr.c:890
#3 0x0000000000878cd4 in ReadBufferExtended (reln=0x7f3836e1e788, forkNum=MAIN_FORKNUM, blockNum=50, mode=RBM_NORMAL,
strategy=0x0) at bufmgr.c:664
#4 0x0000000000878bb1 in ReadBuffer (reln=0x7f3836e1e788, blockNum=50) at bufmgr.c:596
#5 0x00000000004eeb96 in ReadBufferBI (relation=0x7f3836e1e788, targetBlock=50, bistate=0x0) at hio.c:87
#6 0x00000000004ef387 in RelationGetBufferForTuple (relation=0x7f3836e1e788, len=32, otherBuffer=0, options=0,
bistate=0x0, vmbuffer=0x7ffd5fb295ec, vmbuffer_other=0x0) at hio.c:415
#7 0x00000000004df1f8 in heap_insert (relation=0x7f3836e1e788, tup=0x2ca6770, cid=0, options=0, bistate=0x0)
at heapam.c:2468
#8 0x0000000000709dda in ExecInsert (mtstate=0x2ca4c40, slot=0x2ca3418, planSlot=0x2ca3418, estate=0x2ca48d8,
canSetTag=true) at nodeModifyTable.c:529
#9 0x000000000070c475 in ExecModifyTable (pstate=0x2ca4c40) at nodeModifyTable.c:2159
#10 0x00000000006e05cb in ExecProcNodeFirst (node=0x2ca4c40) at execProcnode.c:445
#11 0x00000000006d552e in ExecProcNode (node=0x2ca4c40) at ../../../src/include/executor/executor.h:247
#12 0x00000000006d7d66 in ExecutePlan (estate=0x2ca48d8, planstate=0x2ca4c40, use_parallel_mode=false,
operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x2d41a30,
execute_once=true) at execMain.c:1723
#13 0x00000000006d5af8 in standard_ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:364
#14 0x00000000006d5920 in ExecutorRun (queryDesc=0x2ca24b8, direction=ForwardScanDirection, count=0, execute_once=true)
at execMain.c:307
#15 0x00000000008c1092 in ProcessQuery (plan=0x2d418b8,
sourceText=0x2c7eec8 "insert into t1(id) select generate_series(100,500);", params=0x0, queryEnv=0x0, dest=0x2d41a30,
completionTag=0x7ffd5fb29b80 "") at pquery.c:161
#16 0x00000000008c29a1 in PortalRunMulti (portal=0x2ce4488, isTopLevel=true, setHoldSnapshot=false, dest=0x2d41a30,
altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:1286
#17 0x00000000008c1f7a in PortalRun (portal=0x2ce4488, count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x2d41a30, altdest=0x2d41a30, completionTag=0x7ffd5fb29b80 "") at pquery.c:799
#18 0x00000000008bbf16 in exec_simple_query (query_string=0x2c7eec8 "insert into t1(id) select generate_series(100,500);")
at postgres.c:1145
#19 0x00000000008c01a1 in PostgresMain (argc=1, argv=0x2ca8af8, dbname=0x2ca8960 "testdb", username=0x2c7bba8 "xdb")
at postgres.c:4182
#20 0x000000000081e07c in BackendRun (port=0x2ca0940) at postmaster.c:4361
#21 0x000000000081d7ef in BackendStartup (port=0x2ca0940) at postmaster.c:4033
#22 0x0000000000819be9 in ServerLoop () at postmaster.c:1706
#23 0x000000000081949f in PostmasterMain (argc=1, argv=0x2c79b60) at postmaster.c:1379
#24 0x0000000000742941 in main (argc=1, argv=0x2c79b60) at main.c:228
(gdb)
Compute the read offset
(gdb) n
744 v = _mdfd_getseg(reln, forknum, blocknum, false,
(gdb)
747 seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
(gdb) p *v
$1 = {mdfd_vfd = 26, mdfd_segno = 0}
(gdb) p BLCKSZ
$2 = 8192
(gdb) p blocknum
$3 = 50
(gdb) p RELSEG_SIZE
$4 = 131072
(gdb) n
749 Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
(gdb) p seekpos
$5 = 409600
(gdb)
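The value printed for seekpos follows directly from the formula in mdread: with BLCKSZ = 8192, RELSEG_SIZE = 131072 and blocknum = 50, the offset is 8192 * (50 % 131072) = 409600. A small stand-alone sketch of the same arithmetic is shown here; the toy_ names are hypothetical and the two constants are hard-coded to the values printed in this session rather than taken from pg_config.h.

```c
#include <assert.h>
#include <sys/types.h>

typedef unsigned int BlockNumber;

/* Values printed by gdb in this session; real builds may differ. */
#define TOY_BLCKSZ 8192
#define TOY_RELSEG_SIZE 131072

/* Byte offset of a block inside its segment file, as computed in mdread. */
off_t
toy_seekpos(BlockNumber blocknum)
{
    off_t seekpos;

    seekpos = (off_t) TOY_BLCKSZ * (blocknum % ((BlockNumber) TOY_RELSEG_SIZE));

    /* Same sanity check as in mdread: stay within one segment. */
    assert(seekpos < (off_t) TOY_BLCKSZ * TOY_RELSEG_SIZE);
    return seekpos;
}
```

toy_seekpos(50) gives 409600, matching the gdb output, while block 131072 is the first block of the next segment and therefore maps back to offset 0.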
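Perform the read
Note that the debugged build positions the file with FileSeek and then calls FileRead without an offset argument, whereas the listing above passes seekpos to FileRead directly, so the two evidently come from different source versions.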
(gdb) n
751 if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
(gdb)
757 nbytes = FileRead(v->mdfd_vfd, buffer, BLCKSZ, WAIT_EVENT_DATA_FILE_READ);
(gdb)
759 TRACE_POSTGRESQL_SMGR_MD_READ_DONE(forknum, blocknum,
(gdb) p nbytes
$6 = 8192
(gdb) p *buffer
$7 = 1 '\001'
(gdb) n
767 if (nbytes != BLCKSZ)
(gdb)
792 }
(gdb)
smgrread (reln=0x2d09be0, forknum=MAIN_FORKNUM, blocknum=50, buffer=0x7f3823369c00 "\001") at smgr.c:629
629 }
(gdb)
Original link: http://blog.itpub.net/6906/viewspace-2637879/