[TOC]
# Selecting specific columns
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
print(csv[['Age', 'Fare']])
~~~
Output
~~~
Age Fare
0 22.0 7.2500
1 38.0 71.2833
2 26.0 7.9250
3 35.0 53.1000
4 35.0 8.0500
5 NaN 8.4583
6 54.0 51.8625
~~~
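The double brackets above matter: a single label returns a Series, while a list of labels returns a DataFrame. A minimal sketch, assuming a small hand-made frame (the `df` below is a hypothetical stand-in, not read from `titanic.csv`):

~~~python
import pandas as pd

# Hypothetical stand-in for a few Titanic rows; not read from titanic.csv.
df = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0],
    'Fare': [7.25, 71.2833, 7.925],
    'Sex': ['male', 'female', 'female'],
})

one_column = df['Age']             # single label -> Series
one_column_df = df[['Age']]        # list with one label -> DataFrame
two_columns = df[['Age', 'Fare']]  # list with two labels -> DataFrame

print(type(one_column).__name__)
print(type(one_column_df).__name__)
print(two_columns.shape)
~~~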
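# Locating data
**loc locates data by label**
**iloc locates data by integer position**
## Locating data with iloc
**Use iloc when you know the integer position of the rows or columns**
**Get a specific row**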
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
# Get the first row (position 0)
print(csv.iloc[0])
~~~
Output
~~~
PassengerId 1
Survived 0
Pclass 3
Name Braund, Mr. Owen Harris
Sex male
Age 22
SibSp 1
Parch 0
Ticket A/5 21171
Fare 7.25
Cabin NaN
Embarked S
Name: 0, dtype: object
~~~
Get rows 0 through 4 (the end of an iloc slice is excluded)
~~~
# Get the first five rows (positions 0-4)
print(csv.iloc[0:5])
~~~
**Get rows 0 through 4 and columns 1 and 2**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
# Rows at positions 0-4, columns at positions 1-2 (slice ends are excluded)
print(csv.iloc[0:5, 1:3])
~~~
Output
~~~
Survived Pclass
0 0 3
1 1 1
2 1 3
3 1 1
4 0 3
~~~
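Besides slices, iloc also accepts lists of positions and negative positions. The sketch below illustrates this on a small hand-made frame (the data is a hypothetical stand-in, not read from `titanic.csv`):

~~~python
import pandas as pd

# Hypothetical stand-in data; not read from titanic.csv.
df = pd.DataFrame({
    'Survived': [0, 1, 1, 1, 0, 0],
    'Pclass': [3, 1, 3, 1, 3, 3],
    'Age': [22.0, 38.0, 26.0, 35.0, 35.0, None],
})

first_five = df.iloc[0:5]          # positions 0, 1, 2, 3, 4 (position 5 is excluded)
picked = df.iloc[[0, 2], [0, 1]]   # rows 0 and 2, columns 0 and 1
last_row = df.iloc[-1]             # negative positions count from the end

print(len(first_five))
print(picked)
print(last_row['Pclass'])
~~~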
## Locating data with loc
Use loc to locate data by label
**Locate data by name**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# Locate a row by its exact name label
print(index.loc['Braund, Mr. Owen Harris'])
~~~
Output
~~~
PassengerId 1
Survived 0
Pclass 3
Sex male
Age 22
SibSp 1
Parch 0
Ticket A/5 21171
Fare 7.25
Cabin NaN
Embarked S
Name: Braund, Mr. Owen Harris, dtype: object
~~~
**Get a specific column for a specific name**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# Locate a single value by row label and column label
print(index.loc['Braund, Mr. Owen Harris', 'Fare'])
~~~
Output
`7.25`
**Print the rows from one name through another**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# Slice by label from one name to another (both ends are included)
print(index.loc['Braund, Mr. Owen Harris':'Heikkinen, Miss. Laina',:])
~~~
Output
![](https://box.kancloud.cn/85521b296ed5a6d071c2ea3e82887d47_1404x308.png)
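Unlike iloc, a label slice with loc includes its end label. A minimal sketch comparing the two, assuming shortened, hypothetical name labels rather than the real Titanic names:

~~~python
import pandas as pd

# Shortened, hypothetical labels; not the full names from titanic.csv.
df = pd.DataFrame(
    {'Fare': [7.25, 71.2833, 7.925, 53.1]},
    index=['Braund', 'Cumings', 'Heikkinen', 'Futrelle'],
)

by_label = df.loc['Braund':'Heikkinen']  # includes the 'Heikkinen' row
by_position = df.iloc[0:2]               # excludes position 2

print(len(by_label))
print(len(by_position))
~~~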
**Modify the fare**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# Set this passenger's fare to 1000
index.loc['Braund, Mr. Owen Harris','Fare'] = 1000
print(index.loc['Braund, Mr. Owen Harris'])
~~~
Output
~~~
PassengerId 1
Survived 0
Pclass 3
Sex male
Age 22
SibSp 1
Parch 0
Ticket A/5 21171
Fare 1000
Cabin NaN
Embarked S
Name: Braund, Mr. Owen Harris, dtype: object
~~~
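The same loc assignment can also take a boolean condition in place of a single label, which updates every matching row at once. A small sketch under assumed data (the labels and fares below are hypothetical stand-ins):

~~~python
import pandas as pd

# Hypothetical stand-in data; not read from titanic.csv.
df = pd.DataFrame(
    {'Fare': [7.25, 71.2833, 7.925, 53.1]},
    index=['Braund', 'Cumings', 'Heikkinen', 'Futrelle'],
)

# Cap every fare above 40 at exactly 40 in a single assignment.
df.loc[df['Fare'] > 40, 'Fare'] = 40.0

print(df['Fare'].tolist())
~~~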
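**The ix indexer was deprecated and has since been removed from pandas; use loc or iloc instead**
**Compute the mean age of the passengers whose sex is male**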
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
print(index.loc[index['Sex'] == 'male', 'Age'].mean())
~~~
Output
~~~
30.7266445916
~~~
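The Age column contains missing values, and `mean()` skips them by default. The sketch below shows the effect on a tiny hand-made frame (hypothetical data, not the Titanic file):

~~~python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with one missing age.
df = pd.DataFrame({
    'Sex': ['male', 'female', 'male', 'male'],
    'Age': [22.0, 38.0, np.nan, 54.0],
})

male_ages = df.loc[df['Sex'] == 'male', 'Age']
mean_skipping_nan = male_ages.mean()              # (22 + 54) / 2
mean_with_nan = male_ages.mean(skipna=False)      # NaN propagates

print(mean_skipping_nan)
print(mean_with_nan)
~~~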
# Boolean indexing
**The comparison returns True wherever the fare is greater than 40**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
print(index['Fare'] > 40)
~~~
Output
~~~
Name
Braund, Mr. Owen Harris False
Cumings, Mrs. John Bradley (Florence Briggs Thayer) True
Heikkinen, Miss. Laina False
Futrelle, Mrs. Jacques Heath (Lily May Peel) True
~~~
**The boolean result above can itself be used as an index**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
print(index[index['Fare'] > 40])
~~~
Output
![](https://box.kancloud.cn/4a7ce19e65de58e8e858e30571d1bf05_1378x518.png)
If we only want the first 5 matching rows, we can write
~~~
print(index[index['Fare'] > 40][:5])
~~~
**Get the first 5 rows whose sex is male**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
print(index[index['Sex'] == 'male'][:5])
~~~
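Several conditions can be combined with `&` (and), `|` (or) and `~` (not). A minimal sketch on hypothetical data; note that each comparison needs its own parentheses:

~~~python
import pandas as pd

# Hypothetical stand-in data; not read from titanic.csv.
df = pd.DataFrame({
    'Sex': ['male', 'female', 'male', 'female'],
    'Fare': [7.25, 71.2833, 51.8625, 7.925],
})

# Parentheses are required because & and | bind tighter than comparisons.
rich_men = df[(df['Sex'] == 'male') & (df['Fare'] > 40)]
male_or_rich = df[(df['Sex'] == 'male') | (df['Fare'] > 40)]
not_male = df[~(df['Sex'] == 'male')]

print(rich_men)
print(male_or_rich)
print(not_male)
~~~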
**Count the passengers older than 70 (summing a boolean Series counts its True values)**
~~~
import pandas as pd
csv = pd.read_csv('./titanic.csv')
print((csv['Age'] > 70).sum())
~~~
Output
5
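Summing the boolean Series gives a count, which is not the same as adding up the ages themselves. The following sketch contrasts the two on a short hypothetical Series:

~~~python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value.
ages = pd.Series([22.0, 71.0, np.nan, 80.0, 35.0])

older = ages > 70                # NaN compares as False
count_older = older.sum()        # True counts as 1, so this is the number of matches
sum_of_ages = ages[older].sum()  # adds the matching ages themselves

print(count_older)
print(sum_of_ages)
~~~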
# Reversed index
~~~
import numpy as np
import pandas as pd
s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
print(s)
~~~
Output
~~~
4 0
3 1
2 2
1 3
0 4
dtype: int64
~~~
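With a reversed integer index, labels and positions no longer coincide, so loc and iloc return different rows for the same number. A short sketch using the same Series:

~~~python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

by_label = s.loc[0]      # label 0 belongs to the last row
by_position = s.iloc[0]  # position 0 is the first row

print(by_label)
print(by_position)
~~~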
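# Checking membership with isin
~~~
import numpy as np
import pandas as pd
s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
isin = s.isin([1, 3, 4])
print(isin)
# Use the boolean result to pull out the matching values
isi = s[s.isin([1, 3, 4])]
print(isi)
~~~
Output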
~~~
4 False
3 True
2 False
1 True
0 True
dtype: bool
3 1
1 3
0 4
dtype: int64
~~~
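To select the values that are not in the list, negate the mask with `~`. A minimal sketch using the same Series:

~~~python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')

# Keep only the values that are NOT 1, 3 or 4.
excluded = s[~s.isin([1, 3, 4])]

print(excluded)
~~~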
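# Checking membership on a MultiIndex
~~~
import numpy as np
import pandas as pd
s = pd.Series(np.arange(6), index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))
print(s)
# (2, 'b') is not in the index, so only (1, 'a') matches
rel = s.iloc[s.index.isin([(1, 'a'), (2, 'b')])]
print(rel)
~~~
Output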
~~~
0 a 0
b 1
c 2
1 a 3
b 4
c 5
dtype: int64
1 a 3
dtype: int64
~~~
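`MultiIndex.isin` also accepts a `level` argument, which tests a single level instead of whole tuples. A sketch on the same Series, assuming we want the rows whose second-level label is 'a' or 'c':

~~~python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(6), index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))

# Test only the second level (level=1) of the index.
second_level = s[s.index.isin(['a', 'c'], level=1)]

print(second_level)
~~~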
# The where operation
~~~
import numpy as np
import pandas as pd
s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
print(s[s > 2])
~~~
Output
~~~
1 3
0 4
dtype: int64
~~~
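~~~
import numpy as np
import pandas as pd
# Create 8 consecutive days starting from 2017-11-24
dates = pd.date_range('20171124', periods=8)
# Build random data indexed by the dates, with columns A-D
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
# where keeps the values where the condition holds and replaces the rest.
# The second argument is optional (default NaN); passing -df negates every value that is not < 0.
rel = df.where(df < 0, -df)
print(rel)
~~~
Output (the data is random, so the exact values will differ)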
~~~
A B C D
2017-11-24 -1.036210 -0.274139 -0.259427 -2.142512
2017-11-25 -1.444938 -0.845857 -0.428317 -0.233512
2017-11-26 -0.471317 -0.762691 -0.840787 -1.285823
2017-11-27 -0.186347 -1.267221 -0.846105 -1.181464
2017-11-28 -0.712680 -0.550270 -0.502814 -0.978257
2017-11-29 -1.994509 -0.833734 -0.572442 -0.191683
2017-11-30 -1.166124 -0.812152 -0.027302 -0.453136
2017-12-01 -0.487233 -0.138206 -0.195269 -0.543170
~~~
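Because the frame above is random, the behaviour is easier to check on fixed values. The sketch below uses a small hypothetical Series and also shows `mask`, which replaces values where the condition is True (the inverse of `where`):

~~~python
import pandas as pd

s = pd.Series([-2, -1, 0, 1, 2])

kept = s.where(s < 0)         # values failing the condition become NaN
negated = s.where(s < 0, -s)  # values failing the condition are replaced by -s
masked = s.mask(s < 0, 0)     # mask replaces where the condition is True

print(kept)
print(negated)
print(masked)
~~~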
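# The select operation
~~~
import numpy as np
import pandas as pd
# Create 8 consecutive days starting from 2017-11-24
dates = pd.date_range('20171124', periods=8)
# Build random data indexed by the dates, with columns A-D
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
# DataFrame.select was removed in pandas 1.0; apply the criterion to the column labels instead
select = df.loc[:, df.columns.map(lambda x: x == 'A')]
print(select)
~~~
Output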
~~~
A B C D
2017-11-24 0.785305 0.690050 0.004393 -0.001702
2017-11-25 2.191914 -0.167305 -0.055147 -0.098674
2017-11-26 -1.835203 -0.264115 -0.160448 0.523237
2017-11-27 -0.172933 -0.832750 -1.822103 -0.937862
2017-11-28 -1.641046 1.522968 0.821158 0.411712
2017-11-29 -0.083643 0.343699 0.206024 0.823357
2017-11-30 -0.009275 -2.321868 -0.039904 0.124485
2017-12-01 0.047416 1.887354 -1.192498 -0.142680
A
2017-11-24 0.785305
2017-11-25 2.191914
2017-11-26 -1.835203
2017-11-27 -0.172933
2017-11-28 -1.641046
2017-11-29 -0.083643
2017-11-30 -0.009275
2017-12-01 0.047416
~~~
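Selecting columns by a criterion on their labels can also be written with `DataFrame.filter` or with a callable passed to loc. A sketch of these alternatives, assuming a small frame of zeros in place of the random data:

~~~python
import numpy as np
import pandas as pd

# Zeros instead of random values so the result is reproducible.
df = pd.DataFrame(np.zeros((2, 4)), columns=['A', 'B', 'C', 'D'])

by_items = df.filter(items=['A'])      # exact column labels
by_regex = df.filter(regex='^[AB]$')   # labels matching a regular expression
by_callable = df.loc[:, lambda d: d.columns.isin(['A', 'D'])]  # boolean mask from a callable

print(list(by_items.columns))
print(list(by_regex.columns))
print(list(by_callable.columns))
~~~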
# The query operation
~~~
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 3), columns=list('abc'))
print(df)
# Keep only the rows where a < b and b < c
query = df.query('(a<b) & (b<c)')
print(query)
~~~
Output
~~~
a b c
0 0.978398 0.518068 0.305819
1 0.361797 0.941426 0.336406
2 0.195656 0.013157 0.684899
3 0.361125 0.235523 0.407919
4 0.187292 0.279097 0.666062
5 0.669675 0.389468 0.768248
6 0.047814 0.095575 0.505208
7 0.344353 0.502843 0.706559
8 0.858862 0.781222 0.385343
9 0.201276 0.275238 0.319084
a b c
4 0.187292 0.279097 0.666062
6 0.047814 0.095575 0.505208
7 0.344353 0.502843 0.706559
9 0.201276 0.275238 0.319084
~~~
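A query string is equivalent to a boolean mask, and it can refer to a local Python variable by prefixing the name with `@`. A minimal sketch on fixed, hypothetical values (`threshold` is an illustrative name):

~~~python
import pandas as pd

# Fixed values instead of random data so the result is reproducible.
df = pd.DataFrame({
    'a': [0.1, 0.5, 0.9],
    'b': [0.2, 0.4, 0.95],
    'c': [0.3, 0.3, 1.0],
})
threshold = 0.15  # hypothetical cutoff, used only for illustration

ordered = df.query('(a < b) & (b < c)')
same = df[(df['a'] < df['b']) & (df['b'] < df['c'])]  # equivalent boolean mask
above = df.query('a > @threshold')                    # @ refers to the local variable

print(ordered)
print(above)
~~~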