[TOC]

# Selecting specific columns

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
# Pass a list of column names to get just those columns
print(csv[['Age', 'Fare']])
~~~

Output (first rows)

~~~
    Age     Fare
0  22.0   7.2500
1  38.0  71.2833
2  26.0   7.9250
3  35.0  53.1000
4  35.0   8.0500
5   NaN   8.4583
6  54.0  51.8625
~~~

# Locating data

**`loc` locates data by label.**

**`iloc` locates data by integer position.**

## Locating data with iloc

**Use `iloc` when you want to select by position.**

**Get a specific row**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
# Get the first row (position 0)
print(csv.iloc[0])
~~~

Output

~~~
PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                                 22
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object
~~~

Get rows 0 through 4 (the end of a positional slice is exclusive)

~~~
# Rows at positions 0, 1, 2, 3 and 4
print(csv.iloc[0:5])
~~~

**Get rows 0 through 4 and columns 1 through 2**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
# Rows at positions 0-4, columns at positions 1-2 (Survived, Pclass)
print(csv.iloc[0:5, 1:3])
~~~

Output

~~~
   Survived  Pclass
0         0       3
1         1       1
2         1       3
3         1       1
4         0       3
~~~

## Locating data with loc

Use `loc` to locate data by label.

**Locate a row by name**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
# Use the Name column as the row index
index = csv.set_index('Name')
# Look up the row whose index label is this name
print(index.loc['Braund, Mr. Owen Harris'])
~~~

Output

~~~
PassengerId            1
Survived               0
Pclass                 3
Sex                 male
Age                   22
SibSp                  1
Parch                  0
Ticket         A/5 21171
Fare                7.25
Cabin                NaN
Embarked               S
Name: Braund, Mr. Owen Harris, dtype: object
~~~

**Get one column for a given name**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# Row label first, then column label
print(index.loc['Braund, Mr. Owen Harris', 'Fare'])
~~~

Output `7.25`

**Get every row from one name through another**

Unlike positional slices, label slices include both endpoints.

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# All rows from the first label through the second label (inclusive), all columns
print(index.loc['Braund, Mr. Owen Harris':'Heikkinen, Miss. Laina', :])
~~~

Output

![](https://box.kancloud.cn/85521b296ed5a6d071c2ea3e82887d47_1404x308.png)

**Modify the fare**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# Set this passenger's fare to 1000
index.loc['Braund, Mr. Owen Harris', 'Fare'] = 1000
print(index.loc['Braund, Mr. Owen Harris'])
~~~

Output

~~~
PassengerId            1
Survived               0
Pclass                 3
Sex                 male
Age                   22
SibSp                  1
Parch                  0
Ticket         A/5 21171
Fare                1000
Cabin                NaN
Embarked               S
Name: Braund, Mr. Owen Harris, dtype: object
~~~

**The `ix` indexer is deprecated** (it was removed in pandas 1.0). Use `loc` or `iloc` instead.

**Average age of male passengers**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# Boolean mask selects the rows, 'Age' selects the column
print(index.loc[index['Sex'] == 'male', 'Age'].mean())
~~~

Output

~~~
30.7266445916
~~~

# Boolean indexing

**Comparing `Fare > 40` returns True for every row whose fare exceeds 40**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
print(index['Fare'] > 40)
~~~

Output (first rows)

~~~
Name
Braund, Mr. Owen Harris                                False
Cumings, Mrs. John Bradley (Florence Briggs Thayer)     True
Heikkinen, Miss. Laina                                 False
Futrelle, Mrs. Jacques Heath (Lily May Peel)            True
~~~

**The boolean Series above can itself be used as an index**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
# Keep only the rows where the mask is True
print(index[index['Fare'] > 40])
~~~

Output

![](https://box.kancloud.cn/4a7ce19e65de58e8e858e30571d1bf05_1378x518.png)

To keep only the first 5 matches:

~~~
print(index[index['Fare'] > 40][:5])
~~~

**First 5 male passengers**

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
index = csv.set_index('Name')
print(index[index['Sex'] == 'male'][:5])
~~~

**Count the passengers older than 70**

Summing a boolean Series counts its True values.

~~~
import pandas as pd

csv = pd.read_csv('./titanic.csv')
print((csv['Age'] > 70).sum())
~~~

Output `5`

# Reversed index

~~~
import numpy as np
import pandas as pd

# Values 0-4 with the index labels in reverse order (4, 3, 2, 1, 0)
s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
print(s)
~~~

Output

~~~
4    0
3    1
2    2
1    3
0    4
dtype: int64
~~~

# Checking membership with isin

~~~
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
# True where the value is one of 1, 3 or 4
isin = s.isin([1, 3, 4])
print(isin)
# Use the mask to pull out the matching values
isi = s[s.isin([1, 3, 4])]
print(isi)
~~~

Output

~~~
4    False
3     True
2    False
1     True
0     True
dtype: bool
3    1
1    3
0    4
dtype: int64
~~~

# Checking membership on a MultiIndex

`Index.isin` accepts tuples when the index is a MultiIndex. Here `(2, 'b')` does not exist in the index, so only `(1, 'a')` matches.

~~~
import numpy as np
import pandas as pd

s = pd.Series(np.arange(6), index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))
print(s)
# Boolean array over the index entries, applied positionally
rel = s.iloc[s.index.isin([(1, 'a'), (2, 'b')])]
print(rel)
~~~

Output

~~~
0  a    0
   b    1
   c    2
1  a    3
   b    4
   c    5
dtype: int64
1  a    3
dtype: int64
~~~

# The where operation

~~~
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
# Keep only the values greater than 2
print(s[s > 2])
~~~

Output

~~~
1    3
0    4
dtype: int64
~~~

~~~
import numpy as np
import pandas as pd

# 8 consecutive days starting from 2017-11-24
dates = pd.date_range('20171124', periods=8)
# Random data: the dates are the index, A-D are the columns
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
# The second argument is optional: if omitted, entries failing the condition become NaN.
# Passing -df replaces every entry that is not < 0 with its negation, so all values end up non-positive.
rel = df.where(df < 0, -df)
print(rel)
~~~

Output (random data, so your values will differ)

~~~
                   A         B         C         D
2017-11-24 -1.036210 -0.274139 -0.259427 -2.142512
2017-11-25 -1.444938 -0.845857 -0.428317 -0.233512
2017-11-26 -0.471317 -0.762691 -0.840787 -1.285823
2017-11-27 -0.186347 -1.267221 -0.846105 -1.181464
2017-11-28 -0.712680 -0.550270 -0.502814 -0.978257
2017-11-29 -1.994509 -0.833734 -0.572442 -0.191683
2017-11-30 -1.166124 -0.812152 -0.027302 -0.453136
2017-12-01 -0.487233 -0.138206 -0.195269 -0.543170
~~~

# The select operation

`DataFrame.select` was deprecated in pandas 0.21 and removed in 1.0. A boolean mask over the column labels passed to `loc` selects the same columns.

~~~
import numpy as np
import pandas as pd

# 8 consecutive days starting from 2017-11-24
dates = pd.date_range('20171124', periods=8)
# Random data: the dates are the index, A-D are the columns
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
# Keep every row, and only the columns whose label equals 'A'
select = df.loc[:, df.columns == 'A']
print(select)
~~~

Output (random data, so your values will differ)

~~~
                   A         B         C         D
2017-11-24  0.785305  0.690050  0.004393 -0.001702
2017-11-25  2.191914 -0.167305 -0.055147 -0.098674
2017-11-26 -1.835203 -0.264115 -0.160448  0.523237
2017-11-27 -0.172933 -0.832750 -1.822103 -0.937862
2017-11-28 -1.641046  1.522968  0.821158  0.411712
2017-11-29 -0.083643  0.343699  0.206024  0.823357
2017-11-30 -0.009275 -2.321868 -0.039904  0.124485
2017-12-01  0.047416  1.887354 -1.192498 -0.142680
                   A
2017-11-24  0.785305
2017-11-25  2.191914
2017-11-26 -1.835203
2017-11-27 -0.172933
2017-11-28 -1.641046
2017-11-29 -0.083643
2017-11-30 -0.009275
2017-12-01  0.047416
~~~

# The query operation

~~~
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 3), columns=list('abc'))
print(df)
# Column names can be used directly inside the expression string
query = df.query('(a<b) & (b<c)')
print(query)
~~~

Output (random data, so your values will differ)

~~~
          a         b         c
0  0.978398  0.518068  0.305819
1  0.361797  0.941426  0.336406
2  0.195656  0.013157  0.684899
3  0.361125  0.235523  0.407919
4  0.187292  0.279097  0.666062
5  0.669675  0.389468  0.768248
6  0.047814  0.095575  0.505208
7  0.344353  0.502843  0.706559
8  0.858862  0.781222  0.385343
9  0.201276  0.275238  0.319084
          a         b         c
4  0.187292  0.279097  0.666062
6  0.047814  0.095575  0.505208
7  0.344353  0.502843  0.706559
9  0.201276  0.275238  0.319084
~~~
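A `query` expression is a string form of boolean indexing. The sketch below writes the same filter both ways and checks that they agree. It uses a small hand-picked DataFrame instead of random data (the values are illustrative assumptions, not from the tutorial), so the result is repeatable.

```python
import pandas as pd

# Small deterministic frame; values are chosen only for illustration
df = pd.DataFrame({
    'a': [0.1, 0.5, 0.2, 0.9],
    'b': [0.4, 0.3, 0.6, 0.95],
    'c': [0.8, 0.7, 0.5, 0.99],
})

# query() evaluates the expression string against the column names
via_query = df.query('(a < b) & (b < c)')

# The same filter written as an explicit boolean mask
mask = (df['a'] < df['b']) & (df['b'] < df['c'])
via_mask = df[mask]

print(via_query)
print(via_query.equals(via_mask))
```

Only rows 0 and 3 satisfy both comparisons, and the two results are identical. `query` is mainly a readability choice: the mask form needs the repeated `df[...]` references, while the string form refers to columns by bare name.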