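This article explains PostgreSQL's tuplesort_performsort function. It first introduces the data structures involved (TupleTableSlot, SortState and the sort instrumentation structs), then lists the source of tuplesort_performsort, and finally traces a sort in gdb.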
TupleTableSlot
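The executor stores tuples in a "tuple table", which is a list of independent TupleTableSlots. The definition below is the PostgreSQL 11 one, which is the version used in the gdb session later in this article: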
/*----------
 * The executor stores tuples in a "tuple table" which is a List of
 * independent TupleTableSlots. There are several cases we need to handle:
 * 1. physical tuple in a disk buffer page
 * 2. physical tuple constructed in palloc'ed memory
 * 3. "minimal" physical tuple constructed in palloc'ed memory
 * 4. "virtual" tuple consisting of Datum/isnull arrays
 *
 * The first two cases are similar in that they both deal with "materialized"
 * tuples, but resource management is different. For a tuple in a disk page
 * we need to hold a pin on the buffer until the TupleTableSlot's reference
 * to the tuple is dropped; while for a palloc'd tuple we usually want the
 * tuple pfree'd when the TupleTableSlot's reference is dropped.
 *
 * A "minimal" tuple is handled similarly to a palloc'd regular tuple.
 * At present, minimal tuples never are stored in buffers, so there is no
 * parallel to case 1. Note that a minimal tuple has no "system columns".
 * (Actually, it could have an OID, but we have no need to access the OID.)
 *
 * A "virtual" tuple is an optimization used to minimize physical data
 * copying in a nest of plan nodes. Any pass-by-reference Datums in the
 * tuple point to storage that is not directly associated with the
 * TupleTableSlot; generally they will point to part of a tuple stored in
 * a lower plan node's output TupleTableSlot, or to a function result
 * constructed in a plan node's per-tuple econtext. It is the responsibility
 * of the generating plan node to be sure these resources are not released
 * for as long as the virtual tuple needs to be valid. We only use virtual
 * tuples in the result slots of plan nodes --- tuples to be copied anywhere
 * else need to be "materialized" into physical tuples. Note also that a
 * virtual tuple does not have any "system columns".
 *
 * It is also possible for a TupleTableSlot to hold both physical and minimal
 * copies of a tuple. This is done when the slot is requested to provide
 * the format other than the one it currently holds. (Originally we attempted
 * to handle such requests by replacing one format with the other, but that
 * had the fatal defect of invalidating any pass-by-reference Datums pointing
 * into the existing slot contents.) Both copies must contain identical data
 * payloads when this is the case.
 *
 * The Datum/isnull arrays of a TupleTableSlot serve double duty. When the
 * slot contains a virtual tuple, they are the authoritative data. When the
 * slot contains a physical tuple, the arrays contain data extracted from
 * the tuple. (In this state, any pass-by-reference Datums point into
 * the physical tuple.) The extracted information is built "lazily",
 * ie, only as needed. This serves to avoid repeated extraction of data
 * from the physical tuple.
 *
 * A TupleTableSlot can also be "empty", holding no valid data. This is
 * the only valid state for a freshly-created slot that has not yet had a
 * tuple descriptor assigned to it. In this state, tts_isempty must be
 * true, tts_shouldFree false, tts_tuple NULL, tts_buffer InvalidBuffer,
 * and tts_nvalid zero.
 *
 * The tupleDescriptor is simply referenced, not copied, by the TupleTableSlot
 * code. The caller of ExecSetSlotDescriptor() is responsible for providing
 * a descriptor that will live as long as the slot does. (Typically, both
 * slots and descriptors are in per-query memory and are freed by memory
 * context deallocation at query end; so it's not worth providing any extra
 * mechanism to do more. However, the slot will increment the tupdesc
 * reference count if a reference-counted tupdesc is supplied.)
 *
 * When tts_shouldFree is true, the physical tuple is "owned" by the slot
 * and should be freed when the slot's reference to the tuple is dropped.
 *
 * If tts_buffer is not InvalidBuffer, then the slot is holding a pin
 * on the indicated buffer page; drop the pin when we release the
 * slot's reference to that buffer. (tts_shouldFree should always be
 * false in such a case, since presumably tts_tuple is pointing at the
 * buffer page.)
 *
 * tts_nvalid indicates the number of valid columns in the tts_values/isnull
 * arrays. When the slot is holding a "virtual" tuple this must be equal
 * to the descriptor's natts. When the slot is holding a physical tuple
 * this is equal to the number of columns we have extracted (we always
 * extract columns from left to right, so there are no holes).
 *
 * tts_values/tts_isnull are allocated when a descriptor is assigned to the
 * slot; they are of length equal to the descriptor's natts.
 *
 * tts_mintuple must always be NULL if the slot does not hold a "minimal"
 * tuple. When it does, tts_mintuple points to the actual MinimalTupleData
 * object (the thing to be pfree'd if tts_shouldFreeMin is true). If the slot
 * has only a minimal and not also a regular physical tuple, then tts_tuple
 * points at tts_minhdr and the fields of that struct are set correctly
 * for access to the minimal tuple; in particular, tts_minhdr.t_data points
 * MINIMAL_TUPLE_OFFSET bytes before tts_mintuple. This allows column
 * extraction to treat the case identically to regular physical tuples.
 *
 * tts_slow/tts_off are saved state for slot_deform_tuple, and should not
 * be touched by any other code.
 *----------
 */
typedef struct TupleTableSlot
{
    NodeTag     type;               //Node tag
    bool        tts_isempty;        /* true = slot is empty */
    bool        tts_shouldFree;     /* should pfree tts_tuple? */
    bool        tts_shouldFreeMin;  /* should pfree tts_mintuple? */
#define FIELDNO_TUPLETABLESLOT_SLOW 4
    bool        tts_slow;           /* saved state for slot_deform_tuple */
#define FIELDNO_TUPLETABLESLOT_TUPLE 5
    HeapTuple   tts_tuple;          /* physical tuple, or NULL if virtual */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 6
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
    MemoryContext tts_mcxt;         /* slot itself is in this context */
    Buffer      tts_buffer;         /* tuple's buffer, or InvalidBuffer */
#define FIELDNO_TUPLETABLESLOT_NVALID 9
    int         tts_nvalid;         /* # of valid values in tts_values */
#define FIELDNO_TUPLETABLESLOT_VALUES 10
    Datum      *tts_values;         /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 11
    bool       *tts_isnull;         /* current per-attribute isnull flags */
    MinimalTuple tts_mintuple;      /* minimal tuple, or NULL if none */
    HeapTupleData tts_minhdr;       /* workspace for minimal-tuple-only case */
#define FIELDNO_TUPLETABLESLOT_OFF 14
    uint32      tts_off;            /* saved state for slot_deform_tuple */
    bool        tts_fixedTupleDescriptor;   /* descriptor can't be changed */
} TupleTableSlot;
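The comment above says that for a physical tuple the tts_values/tts_isnull arrays are filled "lazily", always left to right, with tts_nvalid counting the valid prefix. The following is a minimal sketch of that rule under stated assumptions: ToySlot, toy_store_tuple, toy_getsomeattrs and toy_getattr are hypothetical stand-ins (not PostgreSQL APIs), and "decoding" a column is simulated by copying it from a pre-parsed array. In PostgreSQL 11 the real work is done by slot_getsomeattrs()/slot_deform_tuple() on a HeapTuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NATTS 4

/* Hypothetical stand-in for a slot holding a "physical" tuple. */
typedef struct ToySlot
{
    const int64_t *source;              /* plays the role of the physical tuple */
    const bool    *source_isnull;
    int64_t        values[TOY_NATTS];   /* like tts_values */
    bool           isnull[TOY_NATTS];   /* like tts_isnull */
    int            nvalid;              /* like tts_nvalid */
    int            decode_calls;        /* counts simulated column decodes */
} ToySlot;

/* Store a new physical tuple: nothing has been extracted yet. */
static void toy_store_tuple(ToySlot *slot, const int64_t *src, const bool *src_isnull)
{
    slot->source = src;
    slot->source_isnull = src_isnull;
    slot->nvalid = 0;
    slot->decode_calls = 0;
}

/* Ensure at least the first natts columns are extracted. Extraction resumes
 * where the previous call stopped and goes left to right, so
 * values[0 .. nvalid-1] never has holes. */
static void toy_getsomeattrs(ToySlot *slot, int natts)
{
    assert(natts >= 0 && natts <= TOY_NATTS);
    while (slot->nvalid < natts)
    {
        int i = slot->nvalid;
        slot->values[i] = slot->source[i];
        slot->isnull[i] = slot->source_isnull[i];
        slot->decode_calls++;
        slot->nvalid++;
    }
}

/* Fetch one column by 1-based attribute number, extracting only if needed. */
static int64_t toy_getattr(ToySlot *slot, int attnum, bool *isnull)
{
    assert(attnum >= 1 && attnum <= TOY_NATTS);
    if (attnum > slot->nvalid)
        toy_getsomeattrs(slot, attnum);
    *isnull = slot->isnull[attnum - 1];
    return slot->values[attnum - 1];
}
```

Fetching attribute 2 of a fresh tuple decodes columns 1 and 2 and leaves nvalid at 2; a later fetch of attribute 1 costs nothing, which is exactly the repeated-extraction saving the comment describes.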
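In PostgreSQL 12 the struct was reworked into a small base type, and the behavior specific to each kind of slot moved into the TupleTableSlotOps callback table:
/* base tuple table slot type */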
typedef struct TupleTableSlot
{
    NodeTag     type;               //Node tag
#define FIELDNO_TUPLETABLESLOT_FLAGS 1
    uint16      tts_flags;          /* Boolean states */
#define FIELDNO_TUPLETABLESLOT_NVALID 2
    AttrNumber  tts_nvalid;         /* # of valid values in tts_values */
    const TupleTableSlotOps *const tts_ops; /* implementation of slot */
#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4
    TupleDesc   tts_tupleDescriptor;    /* slot's tuple descriptor */
#define FIELDNO_TUPLETABLESLOT_VALUES 5
    Datum      *tts_values;         /* current per-attribute values */
#define FIELDNO_TUPLETABLESLOT_ISNULL 6
    bool       *tts_isnull;         /* current per-attribute isnull flags */
    MemoryContext tts_mcxt;         /* slot itself is in this context */
} TupleTableSlot;
/* routines for a TupleTableSlot implementation */
    /* Minimum size of the slot */
    size_t      base_slot_size;
    /* Initialization. */
    void        (*init) (TupleTableSlot *slot);
    /* Destruction. */
    void        (*release) (TupleTableSlot *slot);
    /*
     * Clear the contents of the slot. Only the contents are expected to be
     * cleared and not the tuple descriptor. Typically an implementation of
     * this callback should free the memory allocated for the tuple contained
     * in the slot.
     */
    void        (*clear) (TupleTableSlot *slot);
    /*
     * Fill up first natts entries of tts_values and tts_isnull arrays with
     * values from the tuple contained in the slot. The function may be called
     * with natts more than the number of attributes available in the tuple,
     * in which case it should set tts_nvalid to the number of returned
     * columns.
     */
    void        (*getsomeattrs) (TupleTableSlot *slot, int natts);
    /*
     * Returns value of the given system attribute as a datum and sets isnull
     * to false, if it's not NULL. Throws an error if the slot type does not
     * support system attributes.
     */
    Datum       (*getsysattr) (TupleTableSlot *slot, int attnum, bool *isnull);
    /*
     * Make the contents of the slot solely depend on the slot, and not on
     * underlying resources (like another memory context, buffers, etc).
     */
    void        (*materialize) (TupleTableSlot *slot);
    /*
     * Copy the contents of the source slot into the destination slot's own
     * context. Invoked using callback of the destination slot.
     */
    void        (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);
    /*
     * Return a heap tuple "owned" by the slot. It is slot's responsibility to
     * free the memory consumed by the heap tuple. If the slot can not "own" a
     * heap tuple, it should not implement this callback and should set it as
     * NULL.
     */
    HeapTuple   (*get_heap_tuple) (TupleTableSlot *slot);
    /*
     * Return a minimal tuple "owned" by the slot. It is slot's responsibility
     * to free the memory consumed by the minimal tuple. If the slot can not
     * "own" a minimal tuple, it should not implement this callback and should
     * set it as NULL.
     */
    MinimalTuple (*get_minimal_tuple) (TupleTableSlot *slot);
    /*
     * Return a copy of heap tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not be "owned" by
     * the slot i.e. the caller has to take responsibilty to free memory
     * consumed by the slot.
     */
    HeapTuple   (*copy_heap_tuple) (TupleTableSlot *slot);
    /*
     * Return a copy of minimal tuple representing the contents of the slot. The
     * copy needs to be palloc'd in the current memory context. The slot
     * itself is expected to remain unaffected. It is *not* expected to have
     * meaningful "system columns" in the copy. The copy is not be "owned" by
     * the slot i.e. the caller has to take responsibilty to free memory
     * consumed by the slot.
     */
    MinimalTuple (*copy_minimal_tuple) (TupleTableSlot *slot);
};
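TupleTableSlotOps is a hand-rolled virtual-method table: each kind of slot fills in the callbacks, generic code calls through slot->tts_ops, and optional callbacks such as get_heap_tuple are left NULL by slot types that cannot own such a tuple. A minimal sketch of that dispatch pattern, using hypothetical ToySlot/ToySlotOps types with only two callbacks and a string standing in for a HeapTuple:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct ToySlot ToySlot;

/* Hypothetical ops table, modeled loosely on TupleTableSlotOps. */
typedef struct ToySlotOps
{
    const char *name;                               /* for debugging only */
    void        (*clear) (ToySlot *slot);           /* mandatory */
    const char *(*get_heap_tuple) (ToySlot *slot);  /* optional, may be NULL */
} ToySlotOps;

struct ToySlot
{
    const ToySlotOps *ops;      /* like tts_ops: chosen when the slot is created */
    bool        empty;
    const char *heap_tuple;     /* stands in for a HeapTuple owned by the slot */
};

static void toy_generic_clear(ToySlot *slot)
{
    slot->empty = true;
    slot->heap_tuple = NULL;
}

static const char *toy_heap_get_heap_tuple(ToySlot *slot)
{
    return slot->heap_tuple;
}

/* A "virtual" slot never owns a heap tuple, so it leaves the callback NULL. */
static const ToySlotOps ToyVirtualOps = {"virtual", toy_generic_clear, NULL};
static const ToySlotOps ToyHeapOps = {"heap", toy_generic_clear, toy_heap_get_heap_tuple};

/* Generic entry points dispatch through the ops table and never test the
 * concrete slot type. */
static void toy_clear_slot(ToySlot *slot)
{
    slot->ops->clear(slot);
}

/* Returns NULL when the slot type cannot hand out a heap tuple it owns;
 * a real caller would then have to fall back to making a copy. */
static const char *toy_fetch_heap_tuple(ToySlot *slot)
{
    if (slot->ops->get_heap_tuple == NULL)
        return NULL;
    return slot->ops->get_heap_tuple(slot);
}
```

Because tts_ops is a const pointer fixed at creation, adding a new slot type means writing one more ops table rather than touching every caller.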
tupleDesc (PostgreSQL 12 form; the PostgreSQL 11 server traced below also has a tdhasoid field, visible in the gdb output):
typedef struct tupleDesc
{
    int         natts;          /* number of attributes in the tuple */
    Oid         tdtypeid;       /* composite type ID for tuple type */
    int32       tdtypmod;       /* typmod for tuple type */
    int         tdrefcount;     /* reference count, or -1 if not counting */
    TupleConstr *constr;        /* constraints, or NULL if none */
    /* attrs[N] is the description of Attribute Number N+1 */
    FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER];
} *TupleDesc;
SortState
Run-time state information for the Sort plan node:
/* ----------------
 *   SortState information
 * ----------------
 */
typedef struct SortState
{
    //base "class"
    ScanState   ss;             /* its first field is NodeTag */
    bool        randomAccess;   /* need random access to sort output? */
    bool        bounded;        /* is the result set bounded? */
    int64       bound;          /* if bounded, how many tuples are needed */
    bool        sort_Done;      /* sort completed yet? */
    bool        bounded_Done;   /* value of bounded we did the sort with */
    int64       bound_Done;     /* value of bound we did the sort with */
    void       *tuplesortstate; /* private state of tuplesort.c */
    bool        am_worker;      /* are we a worker? */
    SharedSortInfo *shared_info;    /* one entry per worker */
} SortState;
/* ----------------
 *   Shared memory container for per-worker sort information
 * ----------------
 */
typedef struct SharedSortInfo
{
    //number of workers
    int         num_workers;
    //sort statistics (instrumentation), one entry per worker
    TuplesortInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
} SharedSortInfo;
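Because SharedSortInfo ends in a flexible array member with one entry per worker, the amount of shared memory to reserve depends on the worker count: the fixed header up to sinstrument plus num_workers entries. A minimal sketch of that computation with simplified local copies of the structs shown in this article; the helper names are hypothetical, calloc stands in for shared memory, and PostgreSQL itself performs such size arithmetic with overflow-checked helpers (mul_size/add_size). Since TuplesortInstrumentation holds no pointers, the whole block stays valid when it is placed in memory mapped at different addresses in different processes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified local copies of the PostgreSQL definitions. */
typedef enum { SORT_TYPE_STILL_IN_PROGRESS = 0, SORT_TYPE_TOP_N_HEAPSORT,
               SORT_TYPE_QUICKSORT, SORT_TYPE_EXTERNAL_SORT,
               SORT_TYPE_EXTERNAL_MERGE } TuplesortMethod;
typedef enum { SORT_SPACE_TYPE_DISK, SORT_SPACE_TYPE_MEMORY } TuplesortSpaceType;

typedef struct TuplesortInstrumentation
{
    TuplesortMethod    sortMethod;
    TuplesortSpaceType spaceType;
    long               spaceUsed;   /* kB */
} TuplesortInstrumentation;

typedef struct SharedSortInfo
{
    int num_workers;
    TuplesortInstrumentation sinstrument[];   /* one entry per worker */
} SharedSortInfo;

/* Bytes needed for a SharedSortInfo with room for num_workers entries. */
static size_t shared_sort_info_size(int num_workers)
{
    assert(num_workers >= 0);
    return offsetof(SharedSortInfo, sinstrument)
        + (size_t) num_workers * sizeof(TuplesortInstrumentation);
}

/* Allocate zeroed storage (calloc stands in for shared memory); zeroing
 * leaves every entry as SORT_TYPE_STILL_IN_PROGRESS. */
static SharedSortInfo *shared_sort_info_create(int num_workers)
{
    SharedSortInfo *info = calloc(1, shared_sort_info_size(num_workers));
    if (info != NULL)
        info->num_workers = num_workers;
    return info;
}
```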
TuplesortInstrumentation
Data structures used to report sort statistics (for example to EXPLAIN ANALYZE):
/*
 * Data structures for reporting sort statistics.  Note that
 * TuplesortInstrumentation can't contain any pointers because we
 * sometimes put it in shared memory.
 */
typedef enum
{
    SORT_TYPE_STILL_IN_PROGRESS = 0,    //still sorting
    SORT_TYPE_TOP_N_HEAPSORT,           //top-N heap sort (bounded sort)
    SORT_TYPE_QUICKSORT,                //in-memory quicksort
    SORT_TYPE_EXTERNAL_SORT,            //external (on-disk) sort
    SORT_TYPE_EXTERNAL_MERGE            //external sort whose final merge is done on the fly
} TuplesortMethod;                      //sort method
typedef enum
{
    SORT_SPACE_TYPE_DISK,               //spilled to disk
    SORT_SPACE_TYPE_MEMORY              //fit in memory
} TuplesortSpaceType;
typedef struct TuplesortInstrumentation
{
    TuplesortMethod sortMethod;         /* sort algorithm used */
    TuplesortSpaceType spaceType;       /* type of space spaceUsed represents */
    long        spaceUsed;              /* space consumption, in kB */
} TuplesortInstrumentation;
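EXPLAIN ANALYZE turns this instrumentation into lines such as "Sort Method: quicksort  Memory: ...kB" or "Sort Method: external merge  Disk: ...kB". A minimal sketch of how a finished sort's final state maps to a method and its display name, modeled on tuplesort_get_stats() and tuplesort_method_name() in tuplesort.c as understood from the PostgreSQL 11 source; ToySortStatus is a hypothetical subset of the internal TupSortStatus values, and the strings are worth checking against your own server's EXPLAIN output.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { SORT_TYPE_STILL_IN_PROGRESS = 0, SORT_TYPE_TOP_N_HEAPSORT,
               SORT_TYPE_QUICKSORT, SORT_TYPE_EXTERNAL_SORT,
               SORT_TYPE_EXTERNAL_MERGE } TuplesortMethod;

/* Hypothetical subset of tuplesort.c's internal status values. */
typedef enum { TOY_SORTEDINMEM, TOY_SORTEDONTAPE, TOY_FINALMERGE, TOY_OTHER } ToySortStatus;

/* Which method a sort reports, given the state it ended in. */
static TuplesortMethod toy_sort_method(ToySortStatus status, bool bound_used)
{
    switch (status)
    {
        case TOY_SORTEDINMEM:
            return bound_used ? SORT_TYPE_TOP_N_HEAPSORT : SORT_TYPE_QUICKSORT;
        case TOY_SORTEDONTAPE:
            return SORT_TYPE_EXTERNAL_SORT;
        case TOY_FINALMERGE:
            return SORT_TYPE_EXTERNAL_MERGE;
        default:
            return SORT_TYPE_STILL_IN_PROGRESS;
    }
}

/* Text shown after "Sort Method:" in EXPLAIN ANALYZE. */
static const char *toy_method_name(TuplesortMethod m)
{
    switch (m)
    {
        case SORT_TYPE_STILL_IN_PROGRESS: return "still in progress";
        case SORT_TYPE_TOP_N_HEAPSORT:    return "top-N heapsort";
        case SORT_TYPE_QUICKSORT:         return "quicksort";
        case SORT_TYPE_EXTERNAL_SORT:     return "external sort";
        case SORT_TYPE_EXTERNAL_MERGE:    return "external merge";
    }
    return "unknown";
}
```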
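tuplesort_performsort is where the sort is actually carried out once all input tuples have been supplied (PostgreSQL 11 source):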
/*
 * All tuples have been provided; finish the sort.
 */
void
tuplesort_performsort(Tuplesortstate *state)
{
    MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
#ifdef TRACE_SORT
    if (trace_sort)
        elog(LOG, "performsort of worker %d starting: %s",
             state->worker, pg_rusage_show(&state->ru_start));
#endif
    //dispatch on the current sort state
    switch (state->status)
    {
        case TSS_INITIAL:
            /*
             * We were able to accumulate all the tuples within the allowed
             * amount of memory, or leader to take over worker tapes
             */
            if (SERIAL(state))
            {
                /* Just qsort 'em and we're done */
                tuplesort_sort_memtuples(state);
                state->status = TSS_SORTEDINMEM;
            }
            else if (WORKER(state))
            {
                /*
                 * Parallel workers must still dump out tuples to tape.  No
                 * merge is required to produce single output run, though.
                 */
                inittapes(state, false);
                dumptuples(state, true);
                worker_nomergeruns(state);
                state->status = TSS_SORTEDONTAPE;
            }
            else
            {
                /*
                 * Leader will take over worker tapes and merge worker runs.
                 * Note that mergeruns sets the correct state->status.
                 */
                leader_takeover_tapes(state);
                mergeruns(state);
            }
            state->current = 0;
            state->eof_reached = false;
            state->markpos_block = 0L;
            state->markpos_offset = 0;
            state->markpos_eof = false;
            break;
        case TSS_BOUNDED:   //bounded (top-N) heap sort
            /*
             * We were able to accumulate all the tuples required for output
             * in memory, using a heap to eliminate excess tuples.  Now we
             * have to transform the heap to a properly-sorted array.
             */
            sort_bounded_heap(state);
            state->current = 0;
            state->eof_reached = false;
            state->markpos_offset = 0;
            state->markpos_eof = false;
            state->status = TSS_SORTEDINMEM;
            break;
        case TSS_BUILDRUNS:
            /*
             * Finish tape-based sort.  First, flush all tuples remaining in
             * memory out to tape; then merge until we have a single remaining
             * run (or, if !randomAccess and !WORKER(), one run per tape).
             * Note that mergeruns sets the correct state->status.
             */
            //write all tuples still in memory out to tape
            dumptuples(state, true);
            //merge the runs
            mergeruns(state);
            state->eof_reached = false;
            state->markpos_block = 0L;
            state->markpos_offset = 0;
            state->markpos_eof = false;
            break;
        default:
            elog(ERROR, "invalid tuplesort state");
            break;
    }
#ifdef TRACE_SORT
    if (trace_sort)
    {
        if (state->status == TSS_FINALMERGE)
            elog(LOG, "performsort of worker %d done (except %d-way final merge): %s",
                 state->worker, state->activeTapes,
                 pg_rusage_show(&state->ru_start));
        else
            elog(LOG, "performsort of worker %d done: %s",
                 state->worker, pg_rusage_show(&state->ru_start));
    }
#endif
    MemoryContextSwitchTo(oldcontext);
}
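The TSS_BOUNDED branch serves queries such as ORDER BY ... LIMIT n: while tuples arrive, only the best n are kept in a heap, and sort_bounded_heap() then turns the heap into a sorted array. A minimal sketch of the same idea on plain ints (ascending order, keep the n smallest) using a max-heap, so the worst survivor sits at the root and is the one evicted; the function names are hypothetical, and the real code works on SortTuple entries with the configured comparator.

```c
#include <assert.h>
#include <stddef.h>

/* Restore the max-heap property below position i (heap of size n). */
static void sift_down(int *heap, size_t n, size_t i)
{
    for (;;)
    {
        size_t largest = i;
        size_t l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && heap[l] > heap[largest]) largest = l;
        if (r < n && heap[r] > heap[largest]) largest = r;
        if (largest == i)
            return;
        int tmp = heap[i]; heap[i] = heap[largest]; heap[largest] = tmp;
        i = largest;
    }
}

/* Move the element at position i up until its parent is not smaller. */
static void sift_up(int *heap, size_t i)
{
    while (i > 0)
    {
        size_t parent = (i - 1) / 2;
        if (heap[parent] >= heap[i])
            return;
        int tmp = heap[i]; heap[i] = heap[parent]; heap[parent] = tmp;
        i = parent;
    }
}

/* Feed one value into a heap that keeps at most `bound` values.
 * Returns the new heap size. */
static size_t bounded_put(int *heap, size_t count, size_t bound, int value)
{
    if (count < bound)
    {
        heap[count] = value;
        sift_up(heap, count);
        return count + 1;
    }
    if (bound > 0 && value < heap[0])
    {
        heap[0] = value;            /* evict the current worst survivor */
        sift_down(heap, count, 0);
    }
    return count;
}

/* In the spirit of sort_bounded_heap(): repeatedly move the maximum to the
 * end, leaving the array sorted in ascending order. */
static void sort_bounded(int *heap, size_t count)
{
    while (count > 1)
    {
        int tmp = heap[0]; heap[0] = heap[count - 1]; heap[count - 1] = tmp;
        count--;
        sift_down(heap, count, 0);
    }
}
```

Feeding 42, 7, 19, 3, 88, 7, 1, 56 with a bound of 3 leaves {1, 3, 7} after sort_bounded, and memory use never exceeds three entries regardless of input size.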
Test query
select * from t_sort order by c1,c2;
Tracing with gdb
(gdb) b tuplesort_begin_heap
Breakpoint 1 at 0xa6ffa1: file tuplesort.c, line 812.
(gdb) b tuplesort_puttupleslot
Breakpoint 2 at 0xa7119d: file tuplesort.c, line 1436.
(gdb) b tuplesort_performsort
Breakpoint 3 at 0xa71f45: file tuplesort.c, line 1792.
(gdb) c
Continuing.
Breakpoint 1, tuplesort_begin_heap (tupDesc=0x208fa40, nkeys=2, attNums=0x2081858, sortOperators=0x2081878,
sortCollations=0x2081898, nullsFirstFlags=0x20818b8, workMem=4096, coordinate=0x0, randomAccess=false)
at tuplesort.c:812
812 Tuplesortstate *state = tuplesort_begin_common(workMem, coordinate,
(gdb)
tuplesort_begin_heap
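Input parameters: the sort input has 7 attributes, there are 2 sort keys (c1, c2), the first of which is attribute 2 of the input, and workMem is 4096 kB.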
(gdb) p *tupDesc
$1 = {natts = 7, tdtypeid = 2249, tdtypmod = -1, tdhasoid = false, tdrefcount = -1, constr = 0x0, attrs = 0x208fa60}
(gdb) p *tupDesc->attrs
$2 = {attrelid = 0, attname = {data = '\000' <repeats 63 times>}, atttypid = 1043, attstattarget = -1, attlen = -1,
attnum = 1, attndims = 0, attcacheoff = -1, atttypmod = 24, attbyval = false, attstorage = 120 'x', attalign = 105 'i',
attnotnull = false, atthasdef = false, atthasmissing = false, attidentity = 0 '\000', attisdropped = false,
attislocal = true, attinhcount = 0, attcollation = 100}
(gdb) p *attNums
$3 = 2
(gdb) p *sortOperators
$4 = 97
(gdb) p *sortCollations
$5 = 0
(gdb) p nullsFirstFlags
$6 = (_Bool *) 0x20818b8
(gdb) p *nullsFirstFlags
$7 = false
(gdb)
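The sort state created by tuplesort_begin_common, with status = TSS_INITIAL (allowedMem = 4194304 bytes, i.e. the 4096 kB workMem):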
(gdb) p *state
$8 = {status = TSS_INITIAL, nKeys = 0, randomAccess = false, bounded = false, boundUsed = false, bound = 0, tuples = true,
availMem = 4169704, allowedMem = 4194304, maxTapes = 0, tapeRange = 0, sortcontext = 0x2093290, tuplecontext = 0x20992c0,
tapeset = 0x0, comparetup = 0x0, copytup = 0x0, writetup = 0x0, readtup = 0x0, memtuples = 0x209b310, memtupcount = 0,
memtupsize = 1024, growmemtuples = true, slabAllocatorUsed = false, slabMemoryBegin = 0x0, slabMemoryEnd = 0x0,
slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, currentRun = 0, mergeactive = 0x0, Level = 0,
destTape = 0, tp_fib = 0x0, tp_runs = 0x0, tp_dummy = 0x0, tp_tapenum = 0x0, activeTapes = 0, result_tape = -1,
current = 0, eof_reached = false, markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0,
nParticipants = -1, tupDesc = 0x0, sortKeys = 0x0, onlyKey = 0x0, abbrevNext = 0, indexInfo = 0x0, estate = 0x0,
heapRel = 0x0, indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0, datumType = 0,
datumTypeLen = 0, ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {
tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {
ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {
ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0},
{ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0,
__ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
Fill in the sort state (number of keys, heap-tuple callbacks, tuple descriptor):
(gdb) n
819 AssertArg(nkeys > 0);
(gdb)
822 if (trace_sort)
(gdb)
828 state->nKeys = nkeys;
(gdb)
830 TRACE_POSTGRESQL_SORT_START(HEAP_SORT,
(gdb)
837 state->comparetup = comparetup_heap;
(gdb)
838 state->copytup = copytup_heap;
(gdb)
839 state->writetup = writetup_heap;
(gdb)
840 state->readtup = readtup_heap;
(gdb)
842 state->tupDesc = tupDesc; /* assume we need not copy tupDesc */
(gdb)
843 state->abbrevNext = 10;
(gdb)
846 state->sortKeys = (SortSupport) palloc0(nkeys * sizeof(SortSupportData));
(gdb)
848 for (i = 0; i < nkeys; i++)
(gdb) p *state
$9 = {status = TSS_INITIAL, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0, tuples = true,
availMem = 4169704, allowedMem = 4194304, maxTapes = 0, tapeRange = 0, sortcontext = 0x2093290, tuplecontext = 0x20992c0,
tapeset = 0x0, comparetup = 0xa7525b <comparetup_heap>, copytup = 0xa76247 <copytup_heap>,
writetup = 0xa76de1 <writetup_heap>, readtup = 0xa76ec6 <readtup_heap>, memtuples = 0x209b310, memtupcount = 0,
memtupsize = 1024, growmemtuples = true, slabAllocatorUsed = false, slabMemoryBegin = 0x0, slabMemoryEnd = 0x0,
slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, currentRun = 0, mergeactive = 0x0, Level = 0,
destTape = 0, tp_fib = 0x0, tp_runs = 0x0, tp_dummy = 0x0, tp_tapenum = 0x0, activeTapes = 0, result_tape = -1,
current = 0, eof_reached = false, markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0,
nParticipants = -1, tupDesc = 0x208fa40, sortKeys = 0x20937c0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0,
estate = 0x0, heapRel = 0x0, indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0,
datumType = 0, datumTypeLen = 0, ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0},
ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {
ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {
ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0},
{ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0,
__ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb)
Prepare the SortSupport data for each sort column (c1 and c2); the array was allocated with palloc0 at line 846:
(gdb) n
850 SortSupport sortKey = state->sortKeys + i;
(gdb)
852 AssertArg(attNums[i] != 0);
(gdb) p *state->sortKeys
$10 = {ssup_cxt = 0x0, ssup_collation = 0, ssup_reverse = false, ssup_nulls_first = false, ssup_attno = 0,
ssup_extra = 0x0, comparator = 0x0, abbreviate = false, abbrev_converter = 0x0, abbrev_abort = 0x0,
abbrev_full_comparator = 0x0}
(gdb) n
853 AssertArg(sortOperators[i] != 0);
(gdb)
855 sortKey->ssup_cxt = CurrentMemoryContext;
(gdb)
856 sortKey->ssup_collation = sortCollations[i];
(gdb)
857 sortKey->ssup_nulls_first = nullsFirstFlags[i];
(gdb)
858 sortKey->ssup_attno = attNums[i];
(gdb)
860 sortKey->abbreviate = (i == 0);
(gdb)
862 PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
(gdb)
848 for (i = 0; i < nkeys; i++)
(gdb)
850 SortSupport sortKey = state->sortKeys + i;
(gdb)
852 AssertArg(attNums[i] != 0);
(gdb)
853 AssertArg(sortOperators[i] != 0);
(gdb)
855 sortKey->ssup_cxt = CurrentMemoryContext;
(gdb)
856 sortKey->ssup_collation = sortCollations[i];
(gdb)
857 sortKey->ssup_nulls_first = nullsFirstFlags[i];
(gdb)
858 sortKey->ssup_attno = attNums[i];
(gdb)
860 sortKey->abbreviate = (i == 0);
(gdb)
862 PrepareSortSupportFromOrderingOp(sortOperators[i], sortKey);
(gdb)
848 for (i = 0; i < nkeys; i++)
(gdb)
Initialization is complete; return state:
(gdb)
871 if (nkeys == 1 && !state->sortKeys->abbrev_converter)
(gdb) n
874 MemoryContextSwitchTo(oldcontext);
(gdb)
876 return state;
(gdb) p *state
$11 = {status = TSS_INITIAL, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0, tuples = true,
availMem = 4169704, allowedMem = 4194304, maxTapes = 0, tapeRange = 0, sortcontext = 0x2093290, tuplecontext = 0x20992c0,
tapeset = 0x0, comparetup = 0xa7525b <comparetup_heap>, copytup = 0xa76247 <copytup_heap>,
writetup = 0xa76de1 <writetup_heap>, readtup = 0xa76ec6 <readtup_heap>, memtuples = 0x209b310, memtupcount = 0,
memtupsize = 1024, growmemtuples = true, slabAllocatorUsed = false, slabMemoryBegin = 0x0, slabMemoryEnd = 0x0,
slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0, currentRun = 0, mergeactive = 0x0, Level = 0,
destTape = 0, tp_fib = 0x0, tp_runs = 0x0, tp_dummy = 0x0, tp_tapenum = 0x0, activeTapes = 0, result_tape = -1,
current = 0, eof_reached = false, markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0,
nParticipants = -1, tupDesc = 0x208fa40, sortKeys = 0x20937c0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0,
estate = 0x0, heapRel = 0x0, indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0,
datumType = 0, datumTypeLen = 0, ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0},
ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {
ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {
ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0},
{ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0,
__ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb)
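tuplesort_begin_heap has now built one SortSupportData per ORDER BY column (ssup_attno from attNums, ssup_nulls_first from nullsFirstFlags, abbreviation only for the first key). When tuples are compared, comparetup_heap walks these keys in order and consults c2 only when c1 ties; for each key, NULL placement follows ssup_nulls_first and ssup_reverse flips only the non-NULL comparison, as ApplySortComparator() does. A minimal sketch on integer rows; ToySortKey, ToyRow and the compare functions are hypothetical and ignore abbreviated keys and collations.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_COLS 8

/* Hypothetical per-key settings, loosely mirroring SortSupportData. */
typedef struct ToySortKey
{
    int  attno;        /* 1-based column number, like ssup_attno */
    bool reverse;      /* DESC, like ssup_reverse */
    bool nulls_first;  /* like ssup_nulls_first */
} ToySortKey;

typedef struct ToyRow
{
    int  values[TOY_MAX_COLS];
    bool isnull[TOY_MAX_COLS];
} ToyRow;

/* Compare one column: NULL placement depends only on nulls_first,
 * while reverse flips the result for non-NULL values only. */
static int toy_compare_key(const ToySortKey *key, const ToyRow *a, const ToyRow *b)
{
    int  i = key->attno - 1;
    bool n1 = a->isnull[i], n2 = b->isnull[i];
    int  cmp;

    if (n1 && n2)
        return 0;
    if (n1)
        return key->nulls_first ? -1 : 1;
    if (n2)
        return key->nulls_first ? 1 : -1;
    cmp = (a->values[i] > b->values[i]) - (a->values[i] < b->values[i]);
    return key->reverse ? -cmp : cmp;
}

/* ORDER BY keys[0], keys[1], ...: later keys only break ties. */
static int toy_compare_rows(const ToySortKey *keys, int nkeys,
                            const ToyRow *a, const ToyRow *b)
{
    for (int k = 0; k < nkeys; k++)
    {
        int cmp = toy_compare_key(&keys[k], a, b);
        if (cmp != 0)
            return cmp;
    }
    return 0;
}
```

With the defaults seen in the trace (nullsFirstFlags false, ascending), a NULL in c1 sorts after every non-NULL value.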
tuplesort_puttupleslot
It is called in a loop (in ExecSort):
for (;;)
{
    //fetch the next tuple from the outer plan
    slot = ExecProcNode(outerNode);
    if (TupIsNull(slot))
        break;  //stop once the outer plan is exhausted
    //hand the tuple to the sorter
    tuplesort_puttupleslot(tuplesortstate, slot);
}
Taking one slot as an example:
(gdb) c
Continuing.
Breakpoint 2, tuplesort_puttupleslot (state=0x20933a8, slot=0x208f8c8) at tuplesort.c:1436
1436 MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
Input parameters: state is the one previously returned by tuplesort_begin_heap, and slot holds a tuple returned by the outer node.
(gdb) p *slot
$12 = {type = T_TupleTableSlot, tts_isempty = false, tts_shouldFree = false, tts_shouldFreeMin = false, tts_slow = false,
tts_tuple = 0x2090678, tts_tupleDescriptor = 0x7f061a300380, tts_mcxt = 0x208f270, tts_buffer = 103, tts_nvalid = 0,
tts_values = 0x208f928, tts_isnull = 0x208f960, tts_mintuple = 0x0, tts_minhdr = {t_len = 0, t_self = {ip_blkid = {
bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, t_data = 0x0}, tts_off = 0, tts_fixedTupleDescriptor = true}
(gdb)
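Tuple data in the slot (tts_nvalid = 0, so no columns have been extracted into tts_values yet; the data exists only in the physical tuple):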
(gdb) p *slot->tts_values
$13 = 0
(gdb) p *slot->tts_tuple
$14 = {t_len = 56, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, t_tableOid = 286759, t_data = 0x7f05ee0c4648}
(gdb) p *slot->tts_tuple->t_data
$15 = {t_choice = {t_heap = {t_xmin = 839, t_xmax = 0, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 839,
datum_typmod = 0, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, t_infomask2 = 7,
t_infomask = 2306, t_hoff = 24 '\030', t_bits = 0x7f05ee0c465f ""}
(gdb) p *slot->tts_tuple->t_data->t_bits
$16 = 0 '\000'
(gdb) x/16ux *slot->tts_tuple->t_data->t_bits
0x0: Cannot access memory at address 0x0
(gdb) x/16ux slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f: 0x5a470b00 0x00003130 0x00000100 0x00000100
0x7f05ee0c466f: 0x00000100 0x00000100 0x00000100 0x00000100
0x7f05ee0c467f: 0x00000000 0x8f282800 0x000000da 0x40023800
0x7f05ee0c468f: 0x04200002 0x00000020 0x709fc800 0x709f9000
(gdb) x/16bx slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f: 0x00 0x0b 0x47 0x5a 0x30 0x31 0x00 0x00
0x7f05ee0c4667: 0x00 0x01 0x00 0x00 0x00 0x01 0x00 0x00
(gdb) x/16bc slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f: 0 '\000' 11 '\v' 71 'G' 90 'Z' 48 '0' 49 '1' 0 '\000' 0 '\000'
0x7f05ee0c4667: 0 '\000' 1 '\001' 0 '\000' 0 '\000' 0 '\000' 1 '\001' 0 '\000' 0 '\000'
(gdb) p *slot->tts_tupleDescriptor
$17 = {natts = 7, tdtypeid = 286761, tdtypmod = -1, tdhasoid = false, tdrefcount = 2, constr = 0x0, attrs = 0x7f061a3003a0}
(gdb) p *slot
$18 = {type = T_TupleTableSlot, tts_isempty = false, tts_shouldFree = false, tts_shouldFreeMin = false, tts_slow = false,
tts_tuple = 0x2090678, tts_tupleDescriptor = 0x7f061a300380, tts_mcxt = 0x208f270, tts_buffer = 103, tts_nvalid = 0,
tts_values = 0x208f928, tts_isnull = 0x208f960, tts_mintuple = 0x0, tts_minhdr = {t_len = 0, t_self = {ip_blkid = {
bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, t_data = 0x0}, tts_off = 0, tts_fixedTupleDescriptor = true}
(gdb) p *slot->tts_values[0]
Cannot access memory at address 0x0
(gdb) p slot->tts_values[0]
$19 = 0
(gdb) x/32bc slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f: 0 '\000' 11 '\v' 71 'G' 90 'Z' 48 '0' 49 '1' 0 '\000' 0 '\000'
0x7f05ee0c4667: 0 '\000' 1 '\001' 0 '\000' 0 '\000' 0 '\000' 1 '\001' 0 '\000' 0 '\000'
0x7f05ee0c466f: 0 '\000' 1 '\001' 0 '\000' 0 '\000' 0 '\000' 1 '\001' 0 '\000' 0 '\000'
0x7f05ee0c4677: 0 '\000' 1 '\001' 0 '\000' 0 '\000' 0 '\000' 1 '\001' 0 '\000' 0 '\000'
(gdb) x/32bx slot->tts_tuple->t_data->t_bits
0x7f05ee0c465f: 0x00 0x0b 0x47 0x5a 0x30 0x31 0x00 0x00
0x7f05ee0c4667: 0x00 0x01 0x00 0x00 0x00 0x01 0x00 0x00
0x7f05ee0c466f: 0x00 0x01 0x00 0x00 0x00 0x01 0x00 0x00
0x7f05ee0c4677: 0x00 0x01 0x00 0x00 0x00 0x01 0x00 0x00
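The dumps above can be decoded by hand. t_hoff = 24, so the column data starts at t_data + 24 = 0x7f05ee0c4660, one byte after t_bits (offset 23). Since t_infomask = 2306 = 0x0902 does not include HEAP_HASNULL (0x0001), the tuple has no null bitmap and the byte at t_bits is just header padding; t_infomask2 = 7 gives the attribute count. From 0x7f05ee0c4660 on, 0x0b is a 1-byte varlena header (low bit set on a little-endian machine) whose value shifted right by one, 5, is the total length including the header, so the first column is the string "GZ01". Three padding bytes follow to reach a 4-byte boundary, then six 4-byte integers, each 1. That accounts for 24 + 5 + 3 + 6 × 4 = 56 bytes, matching t_len. The sketch below performs the same decoding; it assumes an x86-64 (little-endian) server and that the remaining six columns are int4, which the layout and t_len are consistent with but the trace does not show directly. The helper names are hypothetical; inside PostgreSQL this work is done by slot_deform_tuple()/heap_deform_tuple() with the VARSIZE_ANY family of macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* User-data bytes of the traced tuple, i.e. t_data + t_hoff (24) up to
 * t_data + t_len (56), copied from the x/16ux and x/32bx dumps above. */
static const unsigned char sample_row[32] = {
    0x0b, 'G', 'Z', '0', '1', 0x00, 0x00, 0x00,   /* varchar + padding */
    0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
};

/* Little-endian 4-byte integer, read byte by byte to avoid alignment issues. */
static int32_t read_int4_le(const unsigned char *p)
{
    return (int32_t) ((uint32_t) p[0] | (uint32_t) p[1] << 8 |
                      (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24);
}

/* Decode a varlena with a 1-byte header (little-endian layout): low bit 1,
 * upper 7 bits = total length including the header. Copies the payload
 * into buf (NUL-terminated) and returns the total length, or 0 if the
 * header is a 4-byte header, a TOAST pointer (0x01), or buf is too small. */
static size_t decode_short_varlena(const unsigned char *p, char *buf, size_t bufsize)
{
    if ((p[0] & 0x01) == 0 || p[0] == 0x01)
        return 0;
    size_t total = p[0] >> 1;
    size_t payload = total - 1;
    if (payload + 1 > bufsize)
        return 0;
    memcpy(buf, p + 1, payload);
    buf[payload] = '\0';
    return total;
}

/* Round an offset up to a multiple of 4 (int4 alignment). */
static size_t align4(size_t off)
{
    return (off + 3) & ~(size_t) 3;
}
```

The decoded second column (the first int4, value 1) also matches state->memtuples[0].datum1 = 1 printed further down, since the first sort key is attribute 2.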
Copy the tuple and put it into state->memtuples:
(gdb) n
1443 COPYTUP(state, &stup, (void *) slot);
(gdb)
1445 puttuple_common(state, &stup);
(gdb) step
puttuple_common (state=0x20933a8, tuple=0x7ffe890e0b00) at tuplesort.c:1639
1639 Assert(!LEADER(state));
(gdb) n
1641 switch (state->status)
(gdb) p state->status
$20 = TSS_INITIAL
(gdb) n
1652 if (state->memtupcount >= state->memtupsize - 1)
(gdb) p state->memtupcount
$21 = 0
(gdb) p state->memtupsize - 1
$22 = 1023
(gdb) n
1657 state->memtuples[state->memtupcount++] = *tuple;
(gdb)
1671 if (state->bounded &&
(gdb) p state->bounded
$23 = false
(gdb) n
1688 if (state->memtupcount < state->memtupsize && !LACKMEM(state))
(gdb)
1689 return;
(gdb)
1743 }
(gdb)
tuplesort_puttupleslot (state=0x20933a8, slot=0x208f8c8) at tuplesort.c:1447
1447 MemoryContextSwitchTo(oldcontext);
(gdb)
1448 }
(gdb)
(gdb) p state->memtuples[0]
$25 = {tuple = 0x20993d8, datum1 = 1, isnull1 = false, tupindex = 0}
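The path taken above is the TSS_INITIAL case of puttuple_common(): before storing, it tries to enlarge memtuples while one free slot remains (line 1652); after storing, it checks whether to switch to a bounded heap (line 1671) and whether everything still fits (line 1688), and otherwise switches to tape-based sorting. A minimal sketch of just those decisions as pure functions, following the conditions as read from the PostgreSQL 11 source; the names are hypothetical, and the lackmem parameter stands in for the LACKMEM() macro (available memory has gone negative). In the run traced here the tape outcome eventually happened: by the time tuplesort_performsort was called, memtupsize had grown from 1024 to 37448 and status was TSS_BUILDRUNS.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { KEEP_IN_MEMORY, SWITCH_TO_BOUNDED_HEAP, SWITCH_TO_TAPE } PutAction;

/* Pre-check before storing: try to grow the array while one free slot
 * remains, so a failed enlargement still leaves room for this tuple. */
static bool toy_should_try_grow(int memtupcount, int memtupsize)
{
    return memtupcount >= memtupsize - 1;
}

/* After the tuple has been appended to memtuples in TSS_INITIAL,
 * decide what happens next. */
static PutAction toy_after_put(int memtupcount, int memtupsize, bool lackmem,
                               bool bounded, int bound)
{
    /* more than twice the bound, or over the bound while short of memory */
    if (bounded && (memtupcount > bound * 2 || (memtupcount > bound && lackmem)))
        return SWITCH_TO_BOUNDED_HEAP;
    /* still have array slots and memory: stay in memory */
    if (memtupcount < memtupsize && !lackmem)
        return KEEP_IN_MEMORY;
    /* otherwise: inittapes() + dumptuples(), i.e. TSS_BUILDRUNS */
    return SWITCH_TO_TAPE;
}
```

With the values from this trace (memtupcount 1 after the put, memtupsize 1024, bounded false, enough memory) the result is KEEP_IN_MEMORY, matching the early return at line 1689.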
tuplesort_performsort
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y 0x0000000000a6ffa1 in tuplesort_begin_heap at tuplesort.c:812
breakpoint already hit 1 time
2 breakpoint keep y 0x0000000000a7119d in tuplesort_puttupleslot at tuplesort.c:1436
breakpoint already hit 1 time
3 breakpoint keep y 0x0000000000a71f45 in tuplesort_performsort at tuplesort.c:1792
(gdb) del 2
(gdb) c
Continuing.
Breakpoint 3, tuplesort_performsort (state=0x20933a8) at tuplesort.c:1792
1792 MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
(gdb)
Input parameters:
(gdb) p *state
$27 = {status = TSS_BUILDRUNS, nKeys = 2, randomAccess = false, bounded = false, boundUsed = false, bound = 0,
tuples = true, availMem = 824360, allowedMem = 4194304, maxTapes = 16, tapeRange = 15, sortcontext = 0x2093290,
tuplecontext = 0x20992c0, tapeset = 0x2093a00, comparetup = 0xa7525b <comparetup_heap>,
copytup = 0xa76247 <copytup_heap>, writetup = 0xa76de1 <writetup_heap>, readtup = 0xa76ec6 <readtup_heap>,
memtuples = 0x2611570, memtupcount = 26592, memtupsize = 37448, growmemtuples = false, slabAllocatorUsed = false,
slabMemoryBegin = 0x0, slabMemoryEnd = 0x0, slabFreeHead = 0x0, read_buffer_size = 0, lastReturnedTuple = 0x0,
currentRun = 2, mergeactive = 0x2093878, Level = 1, destTape = 2, tp_fib = 0x20938a0, tp_runs = 0x20938f8,
tp_dummy = 0x2093950, tp_tapenum = 0x20939a8, activeTapes = 0, result_tape = -1, current = 0, eof_reached = false,
markpos_block = 0, markpos_offset = 0, markpos_eof = false, worker = -1, shared = 0x0, nParticipants = -1,
tupDesc = 0x208fa40, sortKeys = 0x20937c0, onlyKey = 0x0, abbrevNext = 10, indexInfo = 0x0, estate = 0x0, heapRel = 0x0,
indexRel = 0x0, enforceUnique = false, high_mask = 0, low_mask = 0, max_buckets = 0, datumType = 0, datumTypeLen = 0,
ru_start = {tv = {tv_sec = 0, tv_usec = 0}, ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0,
tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0,
__ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {ru_majflt = 0,
__ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {
ru_oublock = 0, __ru_oublock_word = 0}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0,
__ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 0, __ru_nvcsw_word = 0}, {
ru_nivcsw = 0, __ru_nivcsw_word = 0}}}}
(gdb) p state->memtupsize
$28 = 37448
(gdb)
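state->status has already switched to TSS_BUILDRUNS: the input did not fit in workMem, so puttuple_common moved to tape-based sorting and began writing sorted runs (currentRun = 2) while tuples were still arriving.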
(gdb) n
1795 if (trace_sort)
(gdb)
1800 switch (state->status)
(gdb) p state->status
$29 = TSS_BUILDRUNS
(gdb)
Flush all remaining tuples to tape, then merge the runs:
(gdb) n
1864 dumptuples(state, true);
(gdb)
1865 mergeruns(state);
(gdb)
1866 state->eof_reached = false;
(gdb)
1867 state->markpos_block = 0L;
(gdb)
1868 state->markpos_offset = 0;
(gdb)
1869 state->markpos_eof = false;
(gdb)
1870 break;
(gdb)
1878 if (trace_sort)
(gdb)
1890 MemoryContextSwitchTo(oldcontext);
(gdb)
1891 }
(gdb)
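In the TSS_BUILDRUNS branch, dumptuples() writes the tuples still in memory out as a final sorted run, and mergeruns() then merges the runs. PostgreSQL 11 does this with polyphase merging over logical tapes and a binary heap to pick the next tuple; the sketch below shows only the core idea, a k-way merge of sorted int arrays that repeatedly takes the smallest head element. toy_merge_runs and the array-based "runs" are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Merge k sorted runs into out[]; runs[i] has lens[i] elements and pos[]
 * is caller-provided scratch space for k read positions. Returns the number
 * of elements written. A linear scan over the k heads costs O(n*k); a heap,
 * as used by tuplesort.c, brings this down to O(n log k). */
static size_t toy_merge_runs(const int *const *runs, const size_t *lens, size_t k,
                             size_t *pos, int *out)
{
    size_t n = 0;
    for (size_t i = 0; i < k; i++)
        pos[i] = 0;
    for (;;)
    {
        size_t best = k;    /* k means "every run is exhausted" */
        for (size_t i = 0; i < k; i++)
        {
            if (pos[i] < lens[i] &&
                (best == k || runs[i][pos[i]] < runs[best][pos[best]]))
                best = i;
        }
        if (best == k)
            return n;
        out[n++] = runs[best][pos[best]++];
    }
}
```

Merging {1, 4, 9}, {2, 3, 10, 11} and {0, 5} yields 0, 1, 2, 3, 4, 5, 9, 10, 11; each run is read strictly sequentially, which is what makes this pattern suitable for tape-like storage.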
That concludes this walk through tuplesort_performsort; tracing a sort of your own in gdb is the best way to consolidate it.
Original article: http://blog.itpub.net/6906/viewspace-2645213/