2017-03-21

Python Data Types in Detail

Last updated: 2018-01-05

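License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

First, thanks to Liao Xuefeng for his Python tutorial; this article draws on both that tutorial and the official Python documentation.

Only the common data types are listed below. For the full set, see the official documentation.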

Boolean Types

True
False
Values that convert to False include None, 0, 0.0, 0j, '', (), [], {}, and so on.
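As a quick check of the list above, the following sketch passes each value to `bool()`. The variable names `falsy_values` and `results` are illustrative only.

```python
# Each of these built-in values is treated as false in a boolean context.
falsy_values = [None, 0, 0.0, 0j, '', (), [], {}]
results = [bool(v) for v in falsy_values]
print(results)

# Non-empty containers and non-zero numbers are truthy, even when they
# hold falsy elements or look "empty" to a human reader.
truthy_checks = (bool([0]), bool(' '), bool(-1))
print(truthy_checks)   # (True, True, True)
```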

Numeric Types

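int (integer)
float (floating-point number)
complex (complex number)

0b11101   # binary literal
0o123     # octal literal

# Convert an integer to a binary string
bin(5)    => '0b101'

# Get the quotient and the remainder
divmod(x,y)      => (x//y, x%y)
divmod(5.2, 2)   => (2.0, 1.2000000000000002)

# Round to n decimal places. Exact ties go to the even digit, and because
# floats are stored in binary, a value that looks like a tie often is not one
round(x[, n])
round(1.235, 2)  => 1.24
round(2.675, 2)  => 2.67 (2.675 is stored as slightly less than 2.675)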
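When schoolbook "round half up" behavior is actually required, one option is the standard-library `decimal` module. The sketch below is a minimal illustration, assuming the input is available as a string so that no binary rounding error is introduced; the variable names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round() works on the binary float, so 2.675 rounds down.
float_result = round(2.675, 2)

# A Decimal built from a string keeps the exact decimal digits, and
# ROUND_HALF_UP rounds ties away from zero.
decimal_result = Decimal('2.675').quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)

print(float_result)    # 2.67
print(decimal_result)  # 2.68
```

Note that `Decimal(2.675)` (built from a float rather than a string) would inherit the float's binary error, so the string form matters here.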

Sequence Types

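list, tuple, range

###### Operations common to all sequences (concatenation and repetition do not apply to range)

nums = [1,2,3,1,2,1,1,2]

# Count how many times an element occurs
nums.count(1) => 4

# Get the length of a sequence
len(nums)     => 8

# Concatenate two sequences of the same type
(12,'ad') + (True,'ha') => (12, 'ad', True, 'ha')
[1,2] + ['aa','bb']     => [1, 2, 'aa', 'bb']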

# Repeat a sequence: s * n is equivalent to adding s to itself n times
# Note the second example: the items are not copied, so every slot refers to the same object
(1,'a') * 3 => (1, 'a', 1, 'a', 1, 'a')
lists = [[]] * 3
print(lists)                 => [[], [], []]
print(lists[0] is lists[1])  => True
lists[0].append('ha')
print(lists)                 => [['ha'], ['ha'], ['ha']]
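To get independent inner lists instead of repeated references, a common approach is a list comprehension. This is a small sketch; the names `shared` and `independent` are illustrative.

```python
# [[]] * 3 copies the reference to one inner list three times.
shared = [[]] * 3
shared[0].append('ha')

# A list comprehension evaluates [] once per iteration, so each row is new.
independent = [[] for _ in range(3)]
independent[0].append('ha')

print(shared)       # [['ha'], ['ha'], ['ha']]
print(independent)  # [['ha'], [], []]
```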

list: an ordered, mutable sequence, similar to an array in other languages

# Common list operations
l = [1,2,3,'a','bb','ccc','ddd','4']

# Replace the element at a given index
l[0] = 'first'
l => ['first', 2, 3, 'a', 'bb', 'ccc', 'ddd', '4']

# Replace the elements in the index range [1,4)
l[1:4] = ['二','哈']
l => ['first', '二', '哈', 'bb', 'ccc', 'ddd', '4']

# Delete the element at index 4
del l[4]
l => ['first', '二', '哈', 'bb', 'ddd', '4']

# Delete the elements in the index range [2,4)
del l[2:4]
l => ['first', '二', 'ddd', '4']

# Append an element to the end of the list
l.append('last')
l => ['first', '二', 'ddd', '4', 'last']

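# Append all elements of another list to the end of this list
l1 = [1,'a']
l2 = ['as',True]
l1.extend(l2)
l1 => [1, 'a', 'as', True]

# Remove all elements from the list (same effect as del l[:])
l.clear()
l => []

# Reassign
l = ['first', '二', 'ddd', '4', 'last']

# Return a shallow copy of l (same effect as l[:])
l.copy()

# Insert element x at the given position; later elements shift right
# l.insert(index, x)
l.insert(1,'second')
l => ['first', 'second', '二', 'ddd', '4', 'last']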

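# Remove and return the element at the given index (the last one by default)
# l.pop([index])

# Reverse the elements of the list in place
l.reverse()
l => ['last', '4', 'ddd', '二', 'second', 'first']

# Turn a list into (index, element) pairs
l = ['aaa',123,True]
l = list(enumerate(l))
l => [(0, 'aaa'), (1, 123), (2, True)]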
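The `pop()` and `copy()` methods appear above only as signatures, so here is a short sketch of how they behave. The list contents and variable names are made up for illustration.

```python
items = ['first', 'second', 'ddd', '4', 'last']

last = items.pop()      # removes and returns the final element
second = items.pop(1)   # removes and returns the element at index 1
print(last, second, items)

# copy() returns a new list object, so changing the copy leaves the
# original alone. It is a shallow copy: nested objects are still shared.
duplicate = items.copy()
duplicate.append('extra')
print(items)      # ['first', 'ddd', '4']
print(duplicate)  # ['first', 'ddd', '4', 'extra']
```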

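tuple: an ordered sequence that cannot be changed once it is created

In some contexts the parentheses around a tuple can be omitted, and assignment statements unpack tuples automatically.

x, y = 1, 2   # equivalent to x = 1 and y = 2
x, y = y, x   # swaps the values of x and y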
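The sketch below puts the two tuple properties side by side: packing and unpacking without parentheses, and immutability. The name `point` is hypothetical.

```python
point = 3, 4            # parentheses omitted; this is still a tuple
print(type(point))

# Unpacking assigns each element to a name in order.
x, y = point
x, y = y, x             # the right side builds (y, x) first, then unpacks it
print(x, y)             # 4 3

# Tuples do not support item assignment.
error_message = ''
try:
    point[0] = 10
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```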

range: calling range(start, stop, step) produces an immutable sequence of integers from start up to, but not including, stop

list(range(5))       => [0, 1, 2, 3, 4]
list(range(2,6))     => [2, 3, 4, 5]
list(range(2,18,3))  => [2, 5, 8, 11, 14, 17]
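A range does not have to be converted to a list before use. As a rough illustration, it supports `len()`, indexing, and membership tests directly, and it accepts a negative step:

```python
r = range(2, 18, 3)

print(len(r))     # 6
print(r[-1])      # 17
print(11 in r)    # True
print(12 in r)    # False

# A negative step counts down; stop is still excluded.
countdown = list(range(5, 0, -1))
print(countdown)  # [5, 4, 3, 2, 1]
```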

Text Sequence Type

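# Multi-line string
print('''line1
line2
line3
''')

# Raw string: escape sequences are not interpreted (prefix the string with r)
a = 'hello\nworld'
b = r'hello\nworld'
a        => 'hello\nworld'
b        => 'hello\\nworld'
print(a)   # prints hello and world on two separate lines
print(b)   # prints hello\nworld on one line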

# Repeat a string
'hola!' * 3                => 'hola!hola!hola!'

# Concatenate strings
'hello' + ' world'         => 'hello world'

# Join strings with a separator: str.join(iterable)
l = ['abc','def','123']
s = '###'
s.join(l)                  => 'abc###def###123'

# Count how many times a character occurs
'hello world'.count('l')   => 3

# Strip whitespace from both ends of a string
'  what is it  '.strip()   => 'what is it'

# Check whether a substring occurs in a string
'你hao' in 'hello,你hao'   => True

# Split a string into a list
text = '1.2.asd'
text.split('.')  => ['1', '2', 'asd']
list(text)       => ['1', '.', '2', '.', 'a', 's', 'd']

# Split on line breaks into a list
s = 'line 1\nline 2\nline 3'
s.splitlines()    => ['line 1', 'line 2', 'line 3']

# Check whether a string starts with a given prefix or ends with a given suffix
s = 'es6-destructuring-assignment.md'
s.startswith('es6')   => True
s.endswith('.md')     => True

# Replace substrings: str.replace(old, new[, count]) (count limits the number of replacements; all occurrences are replaced by default)
s = 'line 1\nline 2\nline 3'
s.replace('\n','@@@') => 'line 1@@@line 2@@@line 3'

# Case conversion
s = 'Hello World'
s.lower()     => 'hello world'
s.upper()     => 'HELLO WORLD'
s.swapcase()  => 'hELLO wORLD'

# Capitalization
s = 'python Is great, is it? i always think so'
      # First character uppercase, the rest lowercase
s.capitalize() => 'Python is great, is it? i always think so'
      # First letter of every word uppercase
s.title()      => 'Python Is Great, Is It? I Always Think So'

Binary Sequence Types

bytes: an immutable sequence of bytes, similar to str

# Define a bytes object
a = b'asdbf'

# Convert a string to bytes
s = 'hello 波比'
bytes(s, encoding='utf-8') => b'hello \xe6\xb3\xa2\xe6\xaf\x94'
b = s.encode()             => b'hello \xe6\xb3\xa2\xe6\xaf\x94'

# Convert bytes to a string
b.decode()                 => 'hello 波比'

Because bytes is a sequence much like a string, many of its methods mirror those of str.

b'py' in b'i love python'  => True
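The sketch below highlights a few places where bytes differs from str: length is counted in bytes rather than characters, indexing returns an integer, and decoding needs a codec that matches the data. Variable names are illustrative.

```python
s = 'hello 波比'
data = s.encode('utf-8')

# str counts characters, bytes counts bytes: each Chinese character
# here takes three bytes in UTF-8.
print(len(s))       # 8
print(len(data))    # 12

# Indexing bytes gives an int; slicing gives bytes.
print(data[0])      # 104
print(data[:5])     # b'hello'

# Decoding with a codec that cannot represent the data raises an error.
decode_failed = False
try:
    data.decode('ascii')
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```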

bytearray: also a sequence of bytes, like bytes, but mutable. It provides the list-style methods append() and extend().

s = 'hello 哈哈'
barr = bytearray(s,encoding='utf-8')
barr => bytearray(b'hello \xe5\x93\x88\xe5\x93\x88')

barr.append(5)
barr => bytearray(b'hello \xe5\x93\x88\xe5\x93\x88\x05')
barr.append(299)  => ValueError: byte must be in range(0, 256)

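The main difference between bytes and bytearray is mutability.

An example of where this matters: suppose a client receives a large payload that the server sends in pieces, so the client reads it in a loop. With bytes, each new piece means building a new object and rebinding the variable that holds the data received so far. With bytearray, the new piece is simply appended to the end of the existing buffer, which avoids the cost of creating a new object on every iteration.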
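The receive loop described above might look roughly like this. It is only a sketch: `receive_chunks()` is a hypothetical stand-in for repeated socket reads, not a real networking API, and actual performance depends on the interpreter and data size.

```python
def receive_chunks():
    """Hypothetical stand-in for data arriving from a server in pieces."""
    yield b'hello '
    yield b'from '
    yield b'server'

# bytes: every += builds a new object holding all data received so far.
received_bytes = b''
for chunk in receive_chunks():
    received_bytes += chunk

# bytearray: extend() appends to the same buffer in place.
buffer = bytearray()
for chunk in receive_chunks():
    buffer.extend(chunk)

print(received_bytes)   # b'hello from server'
print(bytes(buffer))    # b'hello from server'
```

Once all data has arrived, `bytes(buffer)` converts the accumulated buffer back to an immutable object if one is needed.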

Reference: The difference between bytearray and bytes in Python

memoryview: (left blank for now)

Set Types

set: an unordered, mutable collection of unique elements; it behaves like a dict that stores only keys

A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

# A set
s = {'a',12,'halo'}

# Add or remove elements (the display order of a set is arbitrary)
s.add('德玛')
s => {'a', 12, '德玛', 'halo'}
s.remove('halo')
s => {'a', 12, '德玛'}
s.clear()
s => set()

frozenset: like set, but immutable; it cannot be changed after it is created
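The quoted documentation mentions intersection, union, difference, and symmetric difference, so here is a brief sketch of those operators, plus one case where frozenset is needed. The sets `a`, `b` and the `labels` dict are made-up examples.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b              # elements in either set
intersection = a & b       # elements in both sets
difference = a - b         # elements in a but not in b
symmetric = a ^ b          # elements in exactly one of the sets
print(union, intersection, difference, symmetric)

# Removing duplicates from a sequence (the original order is not preserved).
unique = set([1, 2, 3, 1, 2, 1, 1, 2])
print(unique)

# A frozenset is hashable, so it can be a dict key; a plain set cannot.
labels = {frozenset({'a', 'b'}): 'pair'}
print(labels[frozenset({'b', 'a'})])   # pair
```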

Mapping Types

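dict: stores key-value pairs, similar to an object in JavaScript, except that string keys must be quoted. Keys can be any hashable object.

d = {'name':'percy','age':22,}

for key in d:                # iterate over keys
    print(key)
for value in d.values():     # iterate over values
    print(value)
for key,value in d.items():  # iterate over key-value pairs
    print(key, value)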

m1 = {"name":"森rcy","age":22,"skills":[1,2,3]}

# Count the entries in the dict, that is, the number of keys
len(m1)  => 3

# Represent the dict as a string (Python 3.7+ keeps insertion order)
str(m1)  => "{'name': '森rcy', 'age': 22, 'skills': [1, 2, 3]}"

# Remove the key-value pair for a given key
m1.pop('name')
m1       => {'age': 22, 'skills': [1, 2, 3]}

# Insert a new key-value pair; if the key already exists, its value is not overwritten
m1.setdefault('who')
m1       => {'age': 22, 'skills': [1, 2, 3], 'who': None}
m1.setdefault('who', 'pppercy')
m1       => {'age': 22, 'skills': [1, 2, 3], 'who': None}
m1.pop('who')
m1.setdefault('who', 'pppercy')
m1       => {'age': 22, 'skills': [1, 2, 3], 'who': 'pppercy'}

# dict1.update(dict2)
# Merge the contents of dict2 into dict1; values for keys that already exist are overwritten
m2 = {'age':1000,'skills':['html','js'],'new':'newvalue'}
m1.update(m2)
m1       => {'age': 1000, 'skills': ['html', 'js'], 'who': 'pppercy', 'new': 'newvalue'}

# Remove all entries from the dict
m1.clear()
m1       => {}
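A typical use of `setdefault()` is grouping values under a key without checking first whether the key exists. The sketch below groups words by first letter; the word list is invented, and `get()` is shown alongside as the read-only counterpart.

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# setdefault() returns the existing list for a key, or stores and
# returns the given default when the key is missing.
groups = {}
for word in words:
    groups.setdefault(word[0], []).append(word)
print(groups)

# get() reads a key without inserting anything.
print(groups.get('z'))         # None
print(groups.get('z', []))     # []
print('z' in groups)           # False
```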

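Iterator Types

Objects that can be used directly in a for loop are called iterables (Iterable):
- collection types such as list, tuple, set, dict, and str
- iterators (for example, generators)

# Check whether an object is iterable
from collections.abc import Iterable
isinstance([],Iterable)     => True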

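A generator can be created in two ways:
- with a generator expression
- with a generator function (a function whose body contains the yield keyword is no longer an ordinary function; calling it returns a generator)

# Generator expression
g1 = (x*x for x in range(5,10))

# Generator function
def func():
    yield '111'
    yield '222'
    yield '哈喽，Word'
g2 = func()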
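Creating a generator does not run any of its code; values are produced only when something consumes it. The following sketch uses a hypothetical `countdown()` generator function to show `next()`, exhaustion, and consumption by a built-in.

```python
def countdown(n):
    """Generator function: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first = next(gen)        # runs the body up to the first yield
rest = list(gen)         # consumes whatever is left
print(first, rest)       # 3 [2, 1]

# A generator can be consumed only once.
leftover = list(gen)
print(leftover)          # []

squares = (x * x for x in range(5, 10))
total = sum(squares)
print(total)             # 255
```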

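An object that can be passed to next() to return one value after another is called an iterator (Iterator).
- A generator is one kind of iterator, whereas list, set, and similar types are not iterators.

# Check whether an object is an iterator
from collections.abc import Iterator
l = ['aaa',123,'sd']
isinstance(l, Iterator)   => False
g = (x for x in l)
isinstance(g, Iterator)   => True

An Iterable such as a list, dict, or str can be turned into an Iterator with the iter() function.

from collections.abc import Iterator
l = ['aaa',123,'sd']
isinstance(l, Iterator)        => False
isinstance(iter(l), Iterator)  => True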

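Why are list, dict, str, and similar types not Iterators?

Because a Python Iterator represents a stream of data. An Iterator can be passed to next() to return the next item again and again until no data is left, at which point it raises StopIteration. The stream can be thought of as an ordered sequence whose length is not known in advance; the only way to get the next item is to call next() on demand. Iterator evaluation is therefore lazy: a value is computed only when it is requested. An Iterator can even represent an infinite stream, such as all the natural numbers, which a list could never store.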
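The "all natural numbers" case can be sketched with `itertools.count`, which yields integers without end, and `itertools.islice`, which takes a finite prefix of any iterator. The variable names are illustrative.

```python
from itertools import count, islice

naturals = count(0)                      # 0, 1, 2, ... without end
first_five = list(islice(naturals, 5))   # take only the first five values
print(first_five)                        # [0, 1, 2, 3, 4]

# The iterator remembers its position; the next value continues from 5.
following = next(naturals)
print(following)                         # 5
```

Calling `list(naturals)` directly would never finish, which is exactly why such a stream has to be an iterator rather than a list.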

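Any object that can be used in a for loop is an Iterable.

Any object that can be passed to next() is an Iterator; it represents a lazily computed sequence.

Under the hood, Python's for loop calls iter() once and then calls next() repeatedly.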

for x in [1, 2, 3, 4, 5]:
    pass

###### The code above is equivalent to the code below ######

# First obtain an Iterator object:
it = iter([1, 2, 3, 4, 5])
# Loop:
while True:
    try:
        # Get the next value:
        x = next(it)
    except StopIteration:
        # Leave the loop when StopIteration is raised
        break
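The same protocol can be implemented by hand. The class below is a minimal sketch of a custom iterator, assuming only the two special methods the for loop relies on, `__iter__()` and `__next__()`; the class name `Countdown` is hypothetical.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal the end of the stream once the counter reaches zero.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

collected = []
for n in Countdown(3):
    collected.append(n)
print(collected)   # [3, 2, 1]
```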

Type Conversion

Python has built-in functions for converting between data types.

bool('hola')        => True
int('12')           => 12
int('12.11')        => ValueError
float('12.11')      => 12.11
str(True)           => 'True'
list((1,2))         => [1, 2]
list({1,'asd','s'}) => ['s', 1, 'asd'] (set order is arbitrary)
tuple({1,2,3})      => (1, 2, 3)
set([1, 2])         => {1, 2}
dict([1,2])         => TypeError
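For the two failing conversions above, the sketch below shows one workable alternative for each and then confirms which exception each original call raises. All values are made up for illustration.

```python
# int() rejects a string with a decimal point, so go through float() first.
# Note that int() truncates toward zero rather than rounding.
whole = int(float('12.11'))
negative = int(float('-12.99'))
print(whole, negative)          # 12 -12

# dict() needs an iterable of key-value pairs.
pairs = dict([(1, 2), ('a', 'b')])
zipped = dict(zip(['x', 'y'], [10, 20]))
print(pairs)    # {1: 2, 'a': 'b'}
print(zipped)   # {'x': 10, 'y': 20}

# Catching the errors from the failing calls.
errors = []
for func, arg in [(int, '12.11'), (dict, [1, 2])]:
    try:
        func(arg)
    except (ValueError, TypeError) as exc:
        errors.append(type(exc).__name__)
print(errors)   # ['ValueError', 'TypeError']
```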

For more built-in functions, see the official documentation.

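Type Checking

The built-in type() function shows an object's type, but it is awkward for type checks: it returns a type object that has to be compared against a class, and the comparison matches only the exact class.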

type(12)         => <class 'int'>
type(type(12))   => <class 'type'>

For that reason, isinstance() is the recommended way to check types.

isinstance(3.14, int)              => False
isinstance(3.14, (str, float))     => True
isinstance(3.14, (str,float,int))  => True
isinstance([1,2], list)            => True
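The practical difference between the two shows up with inheritance. In this sketch, `Animal` and `Dog` are hypothetical classes used only to make the point.

```python
class Animal:
    pass

class Dog(Animal):
    pass

d = Dog()
exact_match = type(d) is Animal        # type() reports the exact class only
via_isinstance = isinstance(d, Animal) # isinstance() follows inheritance
print(exact_match, via_isinstance)     # False True

# bool is a subclass of int, which can be surprising.
print(isinstance(True, int))   # True
print(type(True) is int)       # False
```

If booleans must be excluded from an integer check, comparing with `type(x) is int` is one way to do it.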