本节介绍了checkpoint中用于控制checkpoint刷盘频率的函数:IsCheckpointOnSchedule.
宏定义
checkpoints request flag bits
checkpoints request flag bits,检查点请求标记位定义.
/*
* OR-able request flag bits for checkpoints. The "cause" bits are used only
* for logging purposes. Note: the flags must be defined so that it's
* sensible to OR together request flags arising from different requestors.
*/
/* These directly affect the behavior of CreateCheckPoint and subsidiaries */
#define CHECKPOINT_IS_SHUTDOWN 0x0001 /* Checkpoint is for shutdown */
#define CHECKPOINT_END_OF_RECOVERY 0x0002 /* Like shutdown checkpoint, but
* issued at end of WAL recovery */
#define CHECKPOINT_IMMEDIATE 0x0004 /* Do it without delays */
#define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */
#define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those
* belonging to unlogged tables */
/* These are important to RequestCheckpoint */
#define CHECKPOINT_WAIT 0x0020 /* Wait for completion */
#define CHECKPOINT_REQUESTED 0x0040 /* Checkpoint request has been made */
/* These indicate the cause of a checkpoint request */
#define CHECKPOINT_CAUSE_XLOG 0x0080 /* XLOG consumption */
#define CHECKPOINT_CAUSE_TIME 0x0100 /* Elapsed time */
WRITES_PER_ABSORB
/* interval for calling AbsorbSyncRequests in CheckpointWriteDelay */
//调用AbsorbSyncRequests的间隔,默认值为1000
#define WRITES_PER_ABSORB 1000
IsCheckpointOnSchedule
该函数判断是否在完成checkpoint的调度中,如返回T则可以休息,否则返回F则需要干活.
/*
* Calculate CheckPointSegments based on max_wal_size_mb and
* checkpoint_completion_target.
* 计算CheckPointSegments
*/
static void
CalculateCheckpointSegments(void)
{
double target;
/*-------
* Calculate the distance at which to trigger a checkpoint, to avoid
* exceeding max_wal_size_mb. This is based on two assumptions:
*
* a) we keep WAL for only one checkpoint cycle (prior to PG11 we kept
* WAL for two checkpoint cycles to allow us to recover from the
* secondary checkpoint if the first checkpoint failed, though we
* only did this on the master anyway, not on standby. Keeping just
* one checkpoint simplifies processing and reduces disk space in
* many smaller databases.)
* b) during checkpoint, we consume checkpoint_completion_target *
* number of segments consumed between checkpoints.
*-------
*/
//#define ConvertToXSegs(x,segsize) (x / ((segsize) / (1024 * 1024)))
target = (double) ConvertToXSegs(max_wal_size_mb, wal_segment_size) /
(1.0 + CheckPointCompletionTarget);
/* round down */
CheckPointSegments = (int) target;
if (CheckPointSegments < 1)
CheckPointSegments = 1;
}
/*
* IsCheckpointOnSchedule -- are we on schedule to finish this checkpoint
* (or restartpoint) in time?
* IsCheckpointOnSchedule -- 是否在完成checkpoint的调度中
*
* Compares the current progress against the time/segments elapsed since last
* checkpoint, and returns true if the progress we've made this far is greater
* than the elapsed time/segments.
* 当前的进度与消逝的time/xlog segments进行比较,如果进度要早,那么返回T(进入休息状态)
*/
static bool
IsCheckpointOnSchedule(double progress)
{
XLogRecPtr recptr;
struct timeval now;
double elapsed_xlogs,
elapsed_time;
Assert(ckpt_active);
/* Scale progress according to checkpoint_completion_target. */
//实际进度调整为progress*checkpoint_completion_target
progress *= CheckPointCompletionTarget;
/*
* Check against the cached value first. Only do the more expensive
* calculations once we reach the target previously calculated. Since
* neither time or WAL insert pointer moves backwards, a freshly
* calculated value can only be greater than or equal to the cached value.
* 如果进度小于缓存值,返回F,需加快进度了!
*/
if (progress < ckpt_cached_elapsed)
return false;
/*
* Check progress against WAL segments written and CheckPointSegments.
* 进度 vs WAL
*
* We compare the current WAL insert location against the location
* computed before calling CreateCheckPoint. The code in XLogInsert that
* actually triggers a checkpoint when CheckPointSegments is exceeded
* compares against RedoRecptr, so this is not completely accurate.
* However, it's good enough for our purposes, we're only calculating an
* estimate anyway.
*
* During recovery, we compare last replayed WAL record's location with
* the location computed before calling CreateRestartPoint. That maintains
* the same pacing as we have during checkpoints in normal operation, but
* we might exceed max_wal_size by a fair amount. That's because there can
* be a large gap between a checkpoint's redo-pointer and the checkpoint
* record itself, and we only start the restartpoint after we've seen the
* checkpoint record. (The gap is typically up to CheckPointSegments *
* checkpoint_completion_target where checkpoint_completion_target is the
* value that was in effect when the WAL was generated).
*/
if (RecoveryInProgress())
recptr = GetXLogReplayRecPtr(NULL);
else
recptr = GetInsertRecPtr();
elapsed_xlogs = (((double) (recptr - ckpt_start_recptr)) /
wal_segment_size) / CheckPointSegments;
if (progress < elapsed_xlogs)
{
//进度小于产生xlogs的速度,需干活
ckpt_cached_elapsed = elapsed_xlogs;
return false;
}
/*
* Check progress against time elapsed and checkpoint_timeout.
* 比较时间
*/
gettimeofday(&now, NULL);
elapsed_time = ((double) ((pg_time_t) now.tv_sec - ckpt_start_time) +
now.tv_usec / 1000000.0) / CheckPointTimeout;
if (progress < elapsed_time)
{
//进度慢于消逝的时间,需干活
ckpt_cached_elapsed = elapsed_time;
return false;
}
/* It looks like we're on schedule. */
//处于调度中,可以休息
return true;
}
N/A
PG Source Code
PgSQL · 特性分析 · 谈谈checkpoint的调度
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。