How to Use the struct Module in Python

Published: 2021-07-02 15:18:40  Source: 亿速云  Author: Leah  Category: Big Data

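Python's struct module converts between Python values and packed binary data, and it often trips up people who have not used it before. This article walks through its main functions and the pitfalls around byte order and alignment.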

The main functions of the struct module are:

pack(fmt, v1, v2, ...)
unpack(fmt, string)
pack_into(fmt, buffer, offset, v1, v2, ...)
unpack_from(fmt, buffer, offset=0)

Each is covered below.

pack() and unpack()

pack()

First, the official description:

pack(fmt, v1, v2, ...):

Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.

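In other words, it converts the values v1, v2, ... into a byte string according to fmt (the format). The wording above comes from the Python 2 documentation; in Python 3 the result is a bytes object.

For example: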

>>> import struct
>>>
>>> v1 = 1
>>> v2 = b'abc'      # Python 3: 's' fields take bytes, not str
>>> packed = struct.pack('i3s', v1, v2)
>>> packed
b'\x01\x00\x00\x00abc'

Here fmt is 'i3s'. What does that mean? i stands for a C int (an integer) and s for a char[] (a byte string). In the example above, abc is a string of length 3, hence 3s.
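The number in front of s is a field length in bytes, while for other codes it is a repeat count. A minimal sketch of that difference, assuming Python 3 (the variable names are illustrative only):

```python
import struct

# For 's', the number is the field length in bytes, not a repeat count.
padded = struct.pack('5s', b'ab')          # shorter input is padded with NUL bytes
truncated = struct.pack('2s', b'abcdef')   # longer input is truncated to fit

# For other codes, the number is a repeat count: '3i' means three ints.
three_ints = struct.pack('<3i', 1, 2, 3)

print(padded)           # b'ab\x00\x00\x00'
print(truncated)        # b'ab'
print(len(three_ints))  # 12
```

Because truncation is silent, it may be worth checking the length of a value before packing it into a fixed-width s field.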

The complete list of format characters is given in a table:

[Figure: fmt.png, table of format characters]
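As a rough stand-in for part of that table, the sizes of a few common format characters can be queried directly. This sketch assumes standard sizes (selected here with the '<' prefix), which are the same on every platform; the selection of codes is only a sample:

```python
import struct

# Standard sizes (selected by a '<' or '>' prefix) do not depend on the platform.
codes = ['c', 'b', 'h', 'i', 'q', 'f', 'd']
standard_sizes = {code: struct.calcsize('<' + code) for code in codes}

print(standard_sizes)
# {'c': 1, 'b': 1, 'h': 2, 'i': 4, 'q': 8, 'f': 4, 'd': 8}
```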

unpack()

Again, the official documentation first:

unpack(fmt, string)

Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).

Put simply, it parses string according to fmt. Note that the result is always a tuple.

For example:

>>> packed = b'\x01\x00\x00\x00abc'
>>> v1, v2 = struct.unpack('i3s', packed)
>>> v1
1
>>> v2
b'abc'

This recovers the original v1 and v2.

Note what happens when there is only one return value:
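The length requirement quoted from the documentation can be checked with a short sketch. It assumes Python 3 and simply records the message of the struct.error that unpack() raises when the data is one byte too long:

```python
import struct

fmt = 'i3s'
packed = struct.pack(fmt, 1, b'abc')
sizes_match = len(packed) == struct.calcsize(fmt)

try:
    struct.unpack(fmt, packed + b'\x00')   # one byte too many
    error_message = None
except struct.error as exc:
    error_message = str(exc)

print(sizes_match)     # True
print(error_message)   # the exact wording depends on the Python version
```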

>>> a = 2
>>> a_pack = struct.pack('i', a)
>>> a_unpack = struct.unpack('i', a_pack)   # a_unpack is a tuple here
>>> a_unpack
(2,)
>>> a_unpack, = struct.unpack('i', a_pack)  # the trailing comma unpacks it to an int
>>> a_unpack
2

Byte Order, Size, and Alignment

A brief aside on byte order, size, and alignment.

byte order

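The options are summarized in a table:

[Figure: order.png, table of byte-order, size, and alignment prefixes]

If fmt starts with '<', the bytes are laid out little-endian; with '>', they are laid out big-endian. The default is '@', which uses the machine's native byte order, size, and alignment.

For example: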

>>> a = 2
>>> a_pack = struct.pack('i', a)     # default (native); varies by machine, little-endian on mine
>>> a_pack
b'\x02\x00\x00\x00'
>>>
>>> a_pack2 = struct.pack('>i', a)   # '>' means big-endian
>>> a_pack2
b'\x00\x00\x00\x02'
>>>
>>> a_pack3 = struct.pack('<i', a)   # '<' means little-endian
>>> a_pack3
b'\x02\x00\x00\x00'

Once you pack with an explicit '<' or '>' instead of the default, you have to be careful to unpack with the same prefix:

>>> a = 2
>>> a_pack2 = struct.pack('>i', a)               # big-endian
>>> a_pack2
b'\x00\x00\x00\x02'
>>> a_unpack, = struct.unpack('<i', a_pack2)     # little-endian: wrong order
>>> a_unpack
33554432
>>> a_unpack2, = struct.unpack('>i', a_pack2)    # big-endian: matches pack
>>> a_unpack2
2

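As shown above, if pack and unpack disagree on byte order, mixing little-endian and big-endian, the data comes out corrupted: the bytes 00 00 00 02 read as little-endian are 0x02000000, i.e. 33554432.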
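One way to avoid such mismatches is to always spell out the byte order on both sides. The sketch below is only an illustration of that habit; it also uses '!' (network order, which is big-endian) and sys.byteorder to show which explicit prefix the native default corresponds to:

```python
import struct
import sys

value = 2
native = struct.pack('i', value)
little = struct.pack('<i', value)
big = struct.pack('>i', value)
network = struct.pack('!i', value)   # '!' is network order, i.e. big-endian

print(sys.byteorder)                 # 'little' or 'big', depending on the machine
print(native == (little if sys.byteorder == 'little' else big))  # True
print(network == big)                # True

# A round trip works as long as both sides use the same prefix.
(decoded,) = struct.unpack('>i', big)
print(decoded)                       # 2
```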

size and alignment

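The struct module stores data much the way a C struct does, so data alignment comes into play. Roughly speaking, on a typical 32-bit machine (4 GB address space) data is aligned on 4-byte boundaries: the CPU reads 4 bytes at a time and places them in the cache.

For example: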

struct A {
  char c1;
  int a;
  char c2;
};

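How much memory does struct A occupy? Intuition says 1 + 4 + 1 = 6 bytes, but in general it is actually 12 bytes. After the char c1 takes one byte, the int a cannot go right behind it because of 4-byte alignment; instead 3 padding bytes are implicitly added after c1 and a is placed in the next 4-byte row. The char c2 then goes in the row below a, and 3 more padding bytes follow it so that the total size is a multiple of 4.
Now look at this one: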

struct A {
  char c1;
  char c2;
  int a;
};

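How much memory does struct A occupy in this case? The answer is 8 bytes. The reasoning is the same: c1 goes first, leaving 3 bytes in its row. The next variable, c2, is only 1 byte, so it sits right after c1. Two bytes remain, which is not enough for an int, so they are filled with padding and a starts a new row.

Why do it this way? Isn't that a waste of memory? In a sense it is, but it makes the CPU more efficient.
Picture this scenario: a 1-byte char c already sits at the start of a row, and next comes the int a, which needs 4 bytes. Without alignment, the first 3 bytes of a would go right after c and the last byte would spill into the next row.
To read a, the CPU would then need two reads: one for 3 bytes and one for 1 byte. That is twice as slow. If a starts on a new row, a single 4-byte read is enough.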
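The two layouts can also be inspected from Python with ctypes, which follows the platform's C layout rules. This is a sketch under the assumption of a typical platform where int is 4 bytes with 4-byte alignment; the class names A and B are illustrative stand-ins for the two C structs above:

```python
import ctypes

class A(ctypes.Structure):   # char, int, char
    _fields_ = [('c1', ctypes.c_char), ('a', ctypes.c_int), ('c2', ctypes.c_char)]

class B(ctypes.Structure):   # char, char, int
    _fields_ = [('c1', ctypes.c_char), ('c2', ctypes.c_char), ('a', ctypes.c_int)]

size_a = ctypes.sizeof(A)
size_b = ctypes.sizeof(B)

# On a typical platform this prints "12 4" and "8 4":
# in both layouts the int starts at offset 4, after the padding.
print(size_a, A.a.offset)
print(size_b, B.a.offset)
```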

calcsize()

With that background, it is easy to see what this function does.

The documentation says:

struct.calcsize(fmt)

Return the size of the struct (and hence of the string) corresponding to the given format.

Put simply, it computes how many bytes a struct with the given fmt occupies.

For example:

>>> struct.calcsize('ci')
8
>>> struct.calcsize('ic')
5

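According to the format table above, c is a char (1 byte) and i is an int (4 bytes), which explains the first result; the reasoning is the same as before. As for 'ic' returning 5: unlike a C compiler, the struct module only inserts padding between successive fields and never adds trailing padding after the last one.

The examples above all include padding. What happens without it?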
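If the trailing padding of the C layout is wanted, the struct documentation describes ending the format with a zero-count code of the type to align to. A minimal sketch, assuming native mode on a platform with 4-byte ints:

```python
import struct

size_ci = struct.calcsize('@ci')            # padding is inserted before the int
size_ic = struct.calcsize('@ic')            # nothing follows the char, so no padding
size_ic_padded = struct.calcsize('@ic0i')   # zero-count 'i' aligns the end to an int boundary

# Typically prints: 8 5 8
print(size_ci, size_ic, size_ic_padded)
```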

>>> struct.calcsize('<ci')
5
>>> struct.calcsize('@ci')
8

If fmt begins with '<', '>', '=' or '!', no padding is added. With the default, or with '@', padding is added.
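The padding is visible in the packed bytes themselves. The following sketch packs the same two values with and without native alignment; the slice that extracts the pad bytes is computed from calcsize() rather than hard-coded, and the variable names are illustrative:

```python
import struct

native = struct.pack('@ci', b'a', 1)
packed_tight = struct.pack('<ci', b'a', 1)

# Bytes between the char and the int in the native layout.
pad = native[1:struct.calcsize('@ci') - struct.calcsize('@i')]

print(len(native), pad)                  # typically: 8 b'\x00\x00\x00'
print(len(packed_tight), packed_tight)   # 5 b'a\x01\x00\x00\x00'
```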

pack_into() and unpack_from()

Before going into the details, a few helper functions as a warm-up.

binascii module

This module converts between binary data and ASCII-encoded representations. A few of its functions are introduced below.

binascii.b2a_hex(data)
binascii.hexlify(data)

Return the hexadecimal representation of the binary data. Every byte of data is converted into the corresponding 2-digit hex representation. The resulting string is therefore twice as long as the length of data.

Put simply, it represents binary data as hexadecimal digits.

For example:

>>> import binascii
>>> s = b'abc'       # Python 3: the input must be bytes
>>> binascii.b2a_hex(s)
b'616263'
>>> binascii.hexlify(s)
b'616263'

binascii.a2b_hex(hexstr)
binascii.unhexlify(hexstr)

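Return the binary data represented by the hexadecimal string hexstr. This function is the inverse of b2a_hex(). hexstr must contain an even number of hexadecimal digits (which can be upper or lower case), otherwise a TypeError is raised.

Put simply, this is the inverse of the functions above: it converts a hexadecimal string back into binary data. (The quote is from the Python 2 documentation; in Python 3 the exception raised is binascii.Error.)

For example: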

>>> binascii.a2b_hex('616263')
b'abc'
>>> binascii.unhexlify('616263')
b'abc'
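In Python 3 the same conversions are also available as methods on bytes, which may be more convenient when a str is wanted. A small sketch comparing the two and showing the odd-length error case (variable names are illustrative):

```python
import binascii

data = b'abc'
hex_text = data.hex()                  # str: '616263'
restored = bytes.fromhex(hex_text)     # b'abc'
same_digits = binascii.hexlify(data) == hex_text.encode('ascii')

try:
    binascii.unhexlify('61626')        # odd number of hex digits
    odd_length_error = None
except binascii.Error as exc:
    odd_length_error = exc

print(hex_text, restored, same_digits)  # 616263 b'abc' True
print(type(odd_length_error).__name__)  # Error
```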

pack_into() and unpack_from()

The documentation says:

struct.pack_into(fmt, buffer, offset, v1, v2, ...)

Pack the values v1, v2, ... according to the given format, write the packed bytes into the writable buffer starting at offset. Note that the offset is a required argument.

Put simply, it packs the values v1, v2, ... according to fmt and writes the result into the given buffer. The offset says where in the buffer to start writing.

struct.unpack_from(fmt, buffer[, offset=0])

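Unpack the buffer according to the given format. The result is a tuple even if it contains exactly one item. The buffer must contain at least the amount of data required by the format (len(buffer[offset:]) must be at least calcsize(fmt)).

Put simply, it reads from the given buffer and parses the data according to fmt. The offset says where in the buffer to start reading.

What do these two functions offer over pack and unpack? The difference is the buffer argument, a region of memory that we manage ourselves. pack() returns a newly allocated object holding the packed bytes every time it is called, so when values arrive in batches, each call allocates fresh memory and the pieces still have to be joined afterwards, which costs time. With pack_into() we allocate one buffer of exactly the size we need up front and write each batch into it at the chosen offset, so no memory is wasted and nothing is copied twice.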

For example:

import binascii
import ctypes
import struct

vals1 = (1, b'hello', 1.2)
vals2 = (b'world', 2)
s1 = struct.Struct('I5sf')
s2 = struct.Struct('5sI')
print('s1 format:', s1.format)
print('s2 format:', s2.format)

# Allocate one buffer large enough for both structs
b_buffer = ctypes.create_string_buffer(s1.size + s2.size)
print('Before pack:', binascii.hexlify(b_buffer))
s1.pack_into(b_buffer, 0, *vals1)
s2.pack_into(b_buffer, s1.size, *vals2)   # write right after s1
print('After pack:', binascii.hexlify(b_buffer))
print('vals1 is:', s1.unpack_from(b_buffer, 0))
print('vals2 is:', s2.unpack_from(b_buffer, s1.size))

Output (Python 3 on a little-endian machine):

s1 format: I5sf
s2 format: 5sI
Before pack: b'00000000000000000000000000000000000000000000000000000000'
After pack: b'0100000068656c6c6f0000009a99993f776f726c6400000002000000'
vals1 is: (1, b'hello', 1.2000000476837158)
vals2 is: (b'world', 2)
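The same idea extends to many records of one layout. The sketch below is an illustration rather than part of the original example: it uses a plain bytearray as the writable buffer instead of ctypes, a hypothetical record format '<I5s', and Struct.iter_unpack() to read everything back:

```python
import struct

record = struct.Struct('<I5s')   # hypothetical record: unsigned id + 5-byte name
rows = [(1, b'hello'), (2, b'world'), (3, b'struc')]

# One preallocated buffer for all records; each is written at its own offset.
buffer = bytearray(record.size * len(rows))
for index, row in enumerate(rows):
    record.pack_into(buffer, index * record.size, *row)

decoded = list(record.iter_unpack(buffer))

print(record.size)       # 9
print(decoded == rows)   # True
```

With the '<' prefix there is no padding, so each record occupies exactly 9 bytes and the offsets are simple multiples of record.size.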
