[TOC]
# 创建数据
~~~
import pandas as pd
example = pd.DataFrame({'Month': ["January", "January", "January", "January",
"February", "February", "February", "February",
"March", "March", "March", "March"],
'Category': ["Transportation", "Grocery", "Household", "Entertainment",
"Transportation", "Grocery", "Household", "Entertainment",
"Transportation", "Grocery", "Household", "Entertainment"],
'Amount': [74., 235., 175., 100., 115., 240., 225., 125., 90., 260., 200., 120.]})
print(example)
~~~
输出
~~~
Amount Category Month
0 74.0 Transportation January
1 235.0 Grocery January
2 175.0 Household January
3 100.0 Entertainment January
4 115.0 Transportation February
5 240.0 Grocery February
6 225.0 Household February
7 125.0 Entertainment February
8 90.0 Transportation March
9 260.0 Grocery March
10 200.0 Household March
11 120.0 Entertainment March
~~~
# 指定索引和列统计
索引index按照columns统计values
~~~
example_pivot = example.pivot(index='Category', columns='Month', values='Amount')
print(example_pivot)
~~~
# 按照不同维度求和
~~~
example_pivot = example.sum(axis=0)
print(example_pivot)
~~~
输出
~~~
Amount 1959
Category TransportationGroceryHouseholdEntertainmentTra...
Month JanuaryJanuaryJanuaryJanuaryFebruaryFebruaryFe...
dtype: object
~~~
# 求平均
csv结构
![](https://box.kancloud.cn/fd0c3fb8beac1e1740d1610710c9cdca_1770x612.png)
~~~
df = pd.read_csv('titanic.csv')
table = df.pivot_table(index='Sex', columns='Pclass', values='Fare')
print(table)
~~~
输出
~~~
Pclass 1 2 3
Sex
female 106.125798 21.970121 16.118810
male 67.226127 19.741782 12.661633
~~~
# 求最大
~~~
df = pd.read_csv('titanic.csv')
table = df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='max')
print(table)
~~~
输出
~~~
Pclass 1 2 3
Sex
female 512.3292 65.0 69.55
male 512.3292 73.5 69.55
~~~
# 统计
不同index在不同的columns里面他的value是怎么样(默认是求平均),也可以设置展示的是和或者max
~~~
df = pd.read_csv('titanic.csv')
# table = df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='count')
table = df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='count')
print(table)
~~~
输出
~~~
Pclass 1 2 3
Sex
female 94 76 144
male 122 108 347
~~~
这个结果也是下面这个
查看某个键在某个列上的分布情况
~~~
df = pd.read_csv('titanic.csv')
crosstab = pd.crosstab(index=df['Sex'], columns=df['Pclass'])
print(crosstab)
~~~
**查看获救的情况**
查看不同的船仓,男性和女性获救的情况
~~~
df = pd.read_csv('titanic.csv')
table = df.pivot_table(index='Pclass', columns='Sex', values='Survived', aggfunc='mean')
print(table)
~~~
输出
~~~
Sex female male
Pclass
1 0.968085 0.368852
2 0.921053 0.157407
3 0.500000 0.135447
~~~
**查看未成年人获救情况**
~~~
df = pd.read_csv('titanic.csv')
df['Underaged'] = df['Age'] <= 18
table = df.pivot_table(index='Underaged', columns='Sex', values='Survived', aggfunc='mean')
print(table)
~~~
输出
~~~
Sex female male
Underaged
False 0.760163 0.167984
True 0.676471 0.338028
~~~