Converting between integers, strings and byte strings in Python

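[TOC]

Python's data conversions are very flexible, so this post records how to use them.

### Overview

| | From number | From string | From bytes |
| --- | --- | --- | --- |
| To number | [Base conversion](https://edonymu.com/2017/06/18/python%E6%95%B4%E6%95%B0%E3%80%81%E5%AD%97%E7%AC%A6%E4%B8%B2%E3%80%81%E5%AD%97%E8%8A%82%E4%B8%B2%E7%9B%B8%E4%BA%92%E8%BD%AC%E6%8D%A2/#进制转换) | [String to integer](https://edonymu.com/2017/06/18/python%E6%95%B4%E6%95%B0%E3%80%81%E5%AD%97%E7%AC%A6%E4%B8%B2%E3%80%81%E5%AD%97%E8%8A%82%E4%B8%B2%E7%9B%B8%E4%BA%92%E8%BD%AC%E6%8D%A2/#字符to整数) | [Bytes to integer](https://edonymu.com/2017/06/18/python%E6%95%B4%E6%95%B0%E3%80%81%E5%AD%97%E7%AC%A6%E4%B8%B2%E3%80%81%E5%AD%97%E8%8A%82%E4%B8%B2%E7%9B%B8%E4%BA%92%E8%BD%AC%E6%8D%A2/#字节串to整数) |
| To string | `str()` | [String encoding/decoding](https://edonymu.com/2017/06/18/python%E6%95%B4%E6%95%B0%E3%80%81%E5%AD%97%E7%AC%A6%E4%B8%B2%E3%80%81%E5%AD%97%E8%8A%82%E4%B8%B2%E7%9B%B8%E4%BA%92%E8%BD%AC%E6%8D%A2/#字节串to字符串) | `decode('hex')` (Python 2 only) |
| To bytes | [Integer to bytes](https://edonymu.com/2017/06/18/python%E6%95%B4%E6%95%B0%E3%80%81%E5%AD%97%E7%AC%A6%E4%B8%B2%E3%80%81%E5%AD%97%E8%8A%82%E4%B8%B2%E7%9B%B8%E4%BA%92%E8%BD%AC%E6%8D%A2/#整数to字节串) | [String to bytes](https://edonymu.com/2017/06/18/python%E6%95%B4%E6%95%B0%E3%80%81%E5%AD%97%E7%AC%A6%E4%B8%B2%E3%80%81%E5%AD%97%E8%8A%82%E4%B8%B2%E7%9B%B8%E4%BA%92%E8%BD%AC%E6%8D%A2/#字符串to字节串) | n/a |

| Function | Purpose | Mnemonic | Notes |
| --- | --- | --- | --- |
| `chr` | Integer to the corresponding character (ASCII for 0-127) | `chr` looks like `char`, so it produces a char | 0-255 in Python 2; Python 3 accepts any code point from 0 to 0x10FFFF |
| `ord` | Single character to its code point (its ASCII code for ASCII characters) | The last letter, `d`, stands for "digit" | |

### Base conversion

* Decimal to hexadecimal:

~~~
hex(16)  # ==> '0x10'
~~~

* Hexadecimal to decimal: `int(STRING, BASE)` parses `STRING` as a number written in base `BASE` and returns a decimal `int`. The first argument must be a string.

~~~
int('0x10', 16)  # ==> 16
~~~

Likewise, `oct()` and `bin()` convert an integer to an octal or binary string.

* Hex string to binary string:

~~~
hex_str = '00fe'
# Prepend '1' so leading zero nibbles survive, then strip the '0b1'
bin(int('1' + hex_str, 16))[3:]  # keeps leading zeros ==> '0000000011111110'
bin(int(hex_str, 16))[2:]        # drops leading zeros ==> '11111110'
~~~

* Binary string to hex string:

~~~
bin_str = '0b0111000011001100'
hex(int(bin_str, 2))  # ==> '0x70cc'
~~~

### String to integer

* Decimal string:

~~~
int('10')  # ==> 10
~~~

* Hexadecimal string:

~~~
int('10', 16)    # ==> 16
# or
int('0x10', 16)  # ==> 16
~~~

### Bytes to integer

Use the `struct` module, which is common in network-packet code and is compatible with C data structures. The format characters it supports are listed below. Sizes are the standard sizes used with the `=`, `<`, `>` and `!` prefixes; native (`@`) sizes can differ.

| Format | C type | Python type | Size (bytes) | Notes |
| --- | --- | --- | --- | --- |
| `x` | pad byte | no value | 1 | |
| `c` | `char` | bytes of length 1 | 1 | |
| `b` | `signed char` | integer | 1 | |
| `B` | `unsigned char` | integer | 1 | |
| `?` | `_Bool` | bool | 1 | |
| `h` | `short` | integer | 2 | |
| `H` | `unsigned short` | integer | 2 | |
| `i` | `int` | integer | 4 | |
| `I` | `unsigned int` | integer | 4 | |
| `l` | `long` | integer | 4 | |
| `L` | `unsigned long` | integer | 4 | |
| `q` | `long long` | integer | 8 | Not limited to 64-bit machines: always available in standard modes |
| `Q` | `unsigned long long` | integer | 8 | Same as `q` |
| `f` | `float` | float | 4 | |
| `d` | `double` | float | 8 | |
| `s` | `char[]` | bytes | 1 per byte | The count sets the length, e.g. `4s` |
| `p` | `char[]` | bytes | 1 per byte | Pascal string: the first byte stores the length |
| `P` | `void *` | integer | pointer size (machine-dependent) | Pointer; native mode (`@`) only |

Byte order, size and alignment are selected by the first character of the format string. If it is omitted, `@` is assumed:

| Character | Byte order | Size | Alignment |
| --- | --- | --- | --- |
| `@` | native | native | native |
| `=` | native | standard | none |
| `<` | little-endian | standard | none |
| `>` | big-endian | standard | none |
| `!` | network (= big-endian) | standard | none |

* Unpack as short integers:

~~~
import struct
struct.unpack('<hh', b'\x01\x00\x00\x00')  # ==> (1, 0)
~~~

* Unpack as a long integer:

~~~
struct.unpack('<L', b'\x01\x00\x00\x00')  # ==> (1,)
~~~

### Integer to bytes

* Two bytes per integer:

~~~
import struct
struct.pack('<HH', 1, 2)  # ==> b'\x01\x00\x02\x00'
~~~

* Four bytes per integer:

~~~
struct.pack('<LL', 1, 2)  # ==> b'\x01\x00\x00\x00\x02\x00\x00\x00'
~~~

### Integer to string

* Use `str()` directly:

~~~
str(100)  # ==> '100'
~~~

### String to bytes

* Differences between `bytes`, `str` and `unicode`

Python 3 has two types for sequences of characters: `bytes` and `str`. A `bytes` instance holds raw 8-bit values; a `str` instance holds Unicode characters.

Python 2 also has two such types, `str` and `unicode`. Unlike Python 3, a Python 2 `str` instance holds raw 8-bit values, while a `unicode` instance holds Unicode characters.

There are many ways to represent Unicode characters as binary data (raw 8-bit values), and the most common encoding is UTF-8. Neither Python 3 `str` instances nor Python 2 `unicode` instances are tied to any particular binary encoding. To turn Unicode characters into binary data you must call `encode`; to turn binary data into Unicode characters you must call `decode`.

When writing Python programs, do the encoding and decoding at the outermost boundary of the program's interfaces. The core of the program should work with the Unicode character type (`str` in Python 3, `unicode` in Python 2) and make no assumptions about character encodings. This lets the program accept many input encodings (such as Latin-1, Shift JIS and Big5) while keeping its output in a single encoding (ideally UTF-8).

Because the character types differ, Python code commonly faces two situations:

* The developer needs raw 8-bit values that hold characters encoded as UTF-8 (or another encoding).
* The developer needs to work with Unicode characters that have no particular encoding.

* *Difference between decode and encode*

![python-str](https://edonymu.files.wordpress.com/2017/06/python-str.png?w=660)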
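The encode/decode rule above can be sketched as a pair of small helpers that sit at the program's edges. This is a minimal sketch that assumes UTF-8 is the external encoding. `to_str` and `to_bytes` are hypothetical names, not library functions. The last line shows why bytes carry no encoding of their own: the same bytes decode to different text under a different codec.

~~~python
# Hypothetical helpers (not part of any library): decode/encode only at the
# program's edges, assuming UTF-8 as the external encoding.
def to_str(bytes_or_str):
    """Return a str, decoding bytes input as UTF-8."""
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str.decode('utf-8')
    return bytes_or_str


def to_bytes(bytes_or_str):
    """Return bytes, encoding str input as UTF-8."""
    if isinstance(bytes_or_str, str):
        return bytes_or_str.encode('utf-8')
    return bytes_or_str


text = 'café'                 # 4 Unicode characters
raw = to_bytes(text)          # b'caf\xc3\xa9': 'é' takes 2 bytes in UTF-8
print(len(text), len(raw))    # 4 5
print(to_str(raw) == text)    # True: decoding the encoded text round-trips

# The same bytes decoded with another codec give different text.
print(raw.decode('latin-1'))  # cafÃ©
~~~

Inputs that are already the right type pass through unchanged. This lets the core of a program call the helpers without first checking what it was given.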
* Encode a string to bytes:

~~~
'12abc'.encode('ascii')  # ==> b'12abc'
~~~

* From a list of integers or character codes:

~~~
bytes([1, 2, ord('1'), ord('2')])  # ==> b'\x01\x0212'
~~~

* From a hex string:

~~~
bytes.fromhex('010210')  # ==> b'\x01\x02\x10'
~~~

* From a string of `\x`-escaped characters (every code point must be below 256):

~~~
bytes(map(ord, '\x01\x02\x31\x32'))  # ==> b'\x01\x0212'
~~~

* From a list of hex literals:

~~~
bytes([0x01, 0x02, 0x31, 0x32])  # ==> b'\x01\x0212'
~~~

### Bytes to string

* Decode bytes to a string:

~~~
b'\x31\x32\x61\x62'.decode('ascii')  # ==> '12ab'
~~~

* Hex escapes mixed with printable ASCII:

~~~
print(str(b'\x01\x0212')[2:-1])  # ==> \x01\x0212
~~~

* Hex representation, exactly two characters per byte:

~~~
import binascii
binascii.b2a_hex(b'\x01\x0212').decode('ascii')  # ==> '01023132'
~~~

* List of per-byte hex values:

~~~
[hex(x) for x in b'\x01\x0212']  # ==> ['0x1', '0x2', '0x31', '0x32']
~~~

Question: when should a string literal be prefixed with `r`, `b` or `u`? The official documentation covers this. In Python 2, a `b` prefix has no effect, so `b'...'` is the same as a plain `'...'`. An `r` prefix, by contrast, makes a raw string in which backslashes are not escapes.

The Python 2.x documentation:

> A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.

In other words, Python 2 ignores a `b` in front of a string literal. It exists only for Python 3 compatibility, so that Python 3 stores the literal as the `bytes` type (values 0-255).

The Python 3.3 documentation states:

> Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

That is, bytes literals always carry a `b` prefix and may only contain ASCII characters. Byte values of 128 and above must be written as escapes. Below is an answer from Stack Overflow that I think explains this well.

In Python 2.x:

> Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:
>
> * unicode = u'…' literals = sequence of Unicode characters = 3.x str
> * str = '…' literals = sequences of confounded bytes/characters
>   * Usually text, encoded in some unspecified encoding.
>   * But also used to represent binary data like struct.pack output.
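The Python 2 `str` described above also held `struct.pack` output. The following short Python 3 sketch shows how the `r`, `b` and `u` prefixes behave there. It assumes Python 3.3 or later, which accepts both `u''` and `rb''` literals.

~~~python
import struct

# In Python 3, '' and u'' literals are both str (u is accepted again since 3.3).
print(type('abc'), type(u'abc'))   # <class 'str'> <class 'str'>

# b'' literals are bytes: iterating yields integers in the range 0-255.
data = b'ab\x01'
print(type(data), list(data))      # <class 'bytes'> [97, 98, 1]

# r turns off backslash escapes and can be combined with b.
print(len('\n'), len(r'\n'))       # 1 2
print(rb'\n' == b'\\n')            # True

# struct.pack output, a str in Python 2, is bytes in Python 3.
print(type(struct.pack('<H', 1)))  # <class 'bytes'>

# Python 3 refuses to mix the two types implicitly.
try:
    'a' + b'b'
except TypeError as exc:
    print('TypeError:', exc)
~~~

The exact wording of the `TypeError` message differs between Python versions, so it is not shown above.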
Python 3.x makes a clear distinction between the types:

> * str = '…' literals = a sequence of Unicode characters (UTF-16 or UTF-32, depending on how Python was compiled)
> * bytes = b'…' literals = a sequence of octets (integers between 0 and 255)

[Reference](https://lixingcong.github.io/2016/03/06/convert-data-in-python/)