Section 7: NumPy Scientific Computing · soton_Data Analysis

[TOC]

*****

# 1. NumPy Scientific Computing

NumPy is the core library for scientific computing in Python. Python packages that involve scientific computing, including pandas and sklearn, are built as wrappers on top of NumPy. NumPy is very powerful; together with SciPy it forms the foundation of numerical computing in Python, and its function interface is large and rich.

![](https://img.kancloud.cn/48/8d/488dce7371d5b8568e349ffce9cd2aa6_751x184.png)

* Defining and using arrays
* Indexing and selecting array elements
* Computing with arrays

![](https://img.kancloud.cn/2a/28/2a2878ca85979ff60d234cf69c36533d_1115x401.png)

## Arrays

An array **stores a sequence of values of the same type** and is indexed by non-negative integers. The number of dimensions is the array's rank. We can create an array from a Python list and access its elements with square brackets.

```
import numpy as np

a = np.array([1, 3, 4, 6, 10])
```

![](https://img.kancloud.cn/0f/25/0f259caef625390256c04b086abd869f_325x73.png)

*****

```
# The array has five elements
print(a.size)
# It is one-dimensional
print(a.shape)
# Get an element by index
print(a[2])
```

![](https://img.kancloud.cn/69/22/692201567bd8072a05c5098c213a8e75_67x71.png)

*****

**Creating higher-dimensional arrays**

For background on multidimensional arrays, see Section 4.1.4 of 《利用Python进行数据分析》 (*Python for Data Analysis*).

![](https://img.kancloud.cn/64/a0/64a0b7a1ffe714ff18e9d9e68d9ab8d6_1137x306.png)

```
# A two-dimensional array
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(b.shape)
```

![](https://img.kancloud.cn/06/6a/066a6fef016c6f2881b3566dd6dc2253_89x33.png)

![](https://img.kancloud.cn/be/a1/bea100ef384d4eba3710097fd57f6416_382x116.png)

*****

**`axis=0` is the row axis (its index selects a row); `axis=1` is the column axis (its index selects a column).**

*****

## Creating Arrays

NumPy provides built-in functions for creating certain special arrays; we only need to pass the size of the array to create.

![](https://img.kancloud.cn/83/7a/837a98f7c67db826ece92c4df5663c8d_1253x244.png)

![](https://img.kancloud.cn/7b/98/7b98184cabac8dc1a4d71ef1a8d0f3a9_213x342.png)

*****

```
# An array with the same shape as b, filled with zeros
np.zeros_like(b)
```

*****

Generate the array shown in the figure below:

![](https://img.kancloud.cn/26/76/26766c05d6d17414be0f4e9dad53f717_246x138.png)

*****

## Common Array Attributes and Methods

* Statistics
* Sorting
* Finding the index of the largest or smallest value
* Conditional lookup
* shape

*****

Create an array with 3 rows and 4 columns:

![](https://img.kancloud.cn/58/73/587329746a9ed5ff8108926a3bc75eb4_566x265.png)

![](https://img.kancloud.cn/2c/16/2c167854a79d909b94031bbb4e5977c4_167x194.png)

![](https://img.kancloud.cn/d9/e4/d9e499f871f9f3eea11e7ee749f4d528_587x305.png)

![](https://img.kancloud.cn/74/99/7499ba41da2903c0ec4c2f0c13daab30_376x141.png)

*****

![](https://img.kancloud.cn/02/94/02946e253f2e4a64361c384c54712387_590x150.png)

*****
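The operations listed at the top of this section (statistics, sorting, index lookup by size, conditional lookup, shape) appear only as screenshots, so here is a minimal text sketch of each. The array `m` and the helper array `unsorted` are assumptions chosen for illustration; the screenshots may use different data.

```python
import numpy as np

# Hypothetical 3x4 array holding 0..11 (not the data from the screenshots)
m = np.arange(12).reshape(3, 4)

# shape: (rows, columns)
print(m.shape)            # (3, 4)

# Statistics over the whole array and along one axis
print(m.sum())            # 66
print(m.mean())           # 5.5
print(m.max(axis=0))      # largest value in each column

# Index of the largest element in the flattened array
print(m.argmax())         # 11

# Sorting: np.sort returns a sorted copy, np.argsort the ordering indices
unsorted = np.array([3, 1, 2])
print(np.sort(unsorted))
print(np.argsort(unsorted))

# Conditional lookup: row and column positions where the condition holds
rows, cols = np.where(m > 9)
print(rows, cols)
```

Passing `axis=0` collapses the rows and yields one result per column; `axis=1` would yield one result per row.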
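![](https://img.kancloud.cn/7b/be/7bbed4997b6ea7f5b70d73047f016c69_645x109.png)

*****

**Aggregation**

![](https://img.kancloud.cn/2e/79/2e79f688513e3e611beb05e4f3336656_1320x585.png)

![](https://img.kancloud.cn/6c/5a/6c5a3dbce91b7a5e6f4a550e424de079_667x344.png)

*****

![](https://img.kancloud.cn/d8/15/d815dfd180e441ddf3d013061514923c_281x92.png)

*****

![](https://img.kancloud.cn/95/94/959464927c38149bcadd3e28fd07f164_304x102.png)

*****

## Changing the Shape

An array's shape is determined by its axes and the number of elements along each axis. It is usually represented as a tuple of integers, where each integer is the number of elements in the corresponding dimension. The shape change we meet most often is the transpose, which is commonly needed when computing a dot product (`dot`).

![](https://img.kancloud.cn/32/9e/329e494f8a95502c911592f267a8014d_812x315.png)

Another common case arises in machine learning, where we need to reshape an array to fit what the model expects.

![](https://img.kancloud.cn/70/ce/70cee7ac5192b2e794b362beaaaef57a_1309x467.png)

![](https://img.kancloud.cn/1b/4b/1b4b2a0fb5e8f1710404eafb902b0f6d_396x303.png)

*****

![](https://img.kancloud.cn/45/80/45802b79a0a80120b5491aaa67362eef_1306x491.png)

`ravel()` and `reshape()` return a new array object and leave the shape of the original array unchanged. Where possible the result is a view that shares the original's data, so writing to it can still modify the original's elements.

![](https://img.kancloud.cn/d9/de/d9de5ee260262b52e5092b71ee76c35e_736x135.png)

*****

## Random Numbers

NumPy can generate random numbers according to given rules. Random numbers come up often later on, in probability theory and data mining.

*****

Generate a 3×4 array of random numbers in the range [0, 1):

![](https://img.kancloud.cn/86/e4/86e4e0a68d53f7be63fffd42a395890f_558x113.png)

*****

```
# Generate 10 random numbers in [0, 1)
np.random.rand(10)
```

*****

Generate a random number no greater than 10:

![](https://img.kancloud.cn/cf/b6/cfb69edfe4f2c2471c9d01dc67592c9c_290x99.png)

*****

Generate two 5×2 arrays with values in the range [0.0, 1.0):

![](https://img.kancloud.cn/74/b8/74b883d0cfd59f01e04a0e5997e9e10a_385x306.png)

*****

![](https://img.kancloud.cn/14/2e/142e5dc5748d99523100c20a71e17573_416x150.png)

*****

Generate a 3×4 array in which every element is drawn at random from the list passed as the first argument:

![](https://img.kancloud.cn/0a/d3/0ad3cf6f9d683dbaf1a99b21fe13e5c5_373x125.png)

*****

## Array Indexing

**Slicing** works much as it does for a list, but an array can have several dimensions, so we need to specify the selection for each dimension.

![](https://img.kancloud.cn/f9/91/f991395f9aa3c41c2893709865e89f3c_254x349.png)

![](https://img.kancloud.cn/f8/f2/f8f2c55d24b576d2107db7c54ef264d1_237x332.png)

*****

Inside `[]` we select with slices, and the first slice applies along the row index. `[1:3]` selects from the row at position 1 up to the row at position 3. Positional slices are half-open (start included, stop excluded), so the selection ends with the row at position 2.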
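As a minimal sketch of the half-open rule just described, the snippet below slices a small array. The 4×3 array `a` is an assumption for illustration and is not necessarily the array in the screenshots.

```python
import numpy as np

# Hypothetical 4x3 array:
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]
a = np.arange(12).reshape(4, 3)

# Rows at positions 1 and 2; the stop position 3 is excluded
middle = a[1:3]
print(middle)

# One slice per dimension: rows 1..2, then columns 0..1
block = a[1:3, 0:2]
print(block)
```

Each slice follows the same start-inclusive, stop-exclusive convention, so `a[1:3, 0:2]` has shape `(2, 2)`.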
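![](https://img.kancloud.cn/84/3e/843efc844412db5d8074e7fccb4c3d57_217x315.png)

*****

`[0:1]` selects from the row at position 0 up to, but not including, position 1, so only row 0 is selected. The `0` after the comma selects the first column.

![](https://img.kancloud.cn/17/72/177211d54ba20878f734ffbaa623e1a5_274x349.png)

*****

![](https://img.kancloud.cn/0d/ed/0ded401ed224cd1f65c221065d3c742d_770x170.png)

*****

![](https://img.kancloud.cn/1a/bd/1abd95f2566f39567d04e99f4870cfac_314x238.png)

**Integer indexing**

```
# Row positions [1, 2] are paired with column positions [0, 1],
# so the elements at (1, 0) and (2, 1) are selected
a[[1, 2], [0, 1]]
```

![](https://img.kancloud.cn/15/4d/154dba5c03419f73efc235a731cfca81_268x74.png)

![](https://img.kancloud.cn/0a/af/0aafd7432c8ff86c3e86d800815bed10_198x38.png)

*****

**Boolean indexing**

```
# Select all elements of the array that are greater than 4
a > 4       # boolean mask with the same shape as a
a[a > 4]    # the elements where the mask is True
```

![](https://img.kancloud.cn/15/4d/154dba5c03419f73efc235a731cfca81_268x74.png)

![](https://img.kancloud.cn/09/b6/09b6cdf82b221c5ecd878d45bb0e6394_398x30.png)

*****

**Indexing illustrated**

`mask` is a list of booleans, one per row from position 0 to the last row, stating whether that row is selected.

![](https://img.kancloud.cn/3f/d1/3fd1864e1ac44507897b76efcb9e272a_779x338.png)

*****

`a[2::2, ::2]`: `2::2` selects the row at position 2 and every row reached by repeatedly adding 2 from there. `::2` selects the column at position 0 and every column reached by repeatedly adding 2 from 0.

![](https://img.kancloud.cn/33/84/33844505d52578b367ff9028b8a77303_798x459.png)

*****

## Mathematical Operations

This part requires some knowledge of linear algebra.

![](https://img.kancloud.cn/05/4b/054b6af5f22a0fc0ff873f6a6fb8662f_1017x210.png)

![](https://img.kancloud.cn/bd/4c/bd4c4cc759d49ab3aa2def205adb1108_1146x263.png)

*****

![](https://img.kancloud.cn/38/da/38da10b7151f8a48d971b2271ad839cd_593x411.png)

*****

![](https://img.kancloud.cn/7e/02/7e02533dc1e17c5fc30f9173f6f05277_628x268.png)

![](https://img.kancloud.cn/cd/e1/cde12bc5940109cee51c1b7dc6402c3d_580x138.png)

*****

![](https://img.kancloud.cn/dd/dd/dddd35e525393e292bdf86c4a2b565ad_179x214.png)

*****

![](https://img.kancloud.cn/31/84/3184ce6ff6a2e687f799bfff83a99563_1329x714.png)

*****

![](https://img.kancloud.cn/fc/3f/fc3fed0e53e48b6abf3712c772d556a1_621x318.png)

*****

## Practical Applications

### Machine Learning

When we reach machine learning later in the course, we will meet a formula. In statistical learning it is known as least squares; in the linear regression model of machine learning it is called the sum of squared errors. `Prediction` holds the values predicted by the linear regression model. We take the squared differences between the predictions and the observed values and add them up: the larger the result, the worse the model predicts.

![](https://img.kancloud.cn/2d/e4/2de4b2510e4181862463fe5427adad4a_1330x383.png)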
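The exact formula appears only in the screenshot, so the following is a sketch under an assumption: that the error is the plain sum of squared differences between predictions and observations. The arrays `prediction` and `observed` hold made-up values, not data from the original.

```python
import numpy as np

# Hypothetical model predictions and observed values (illustrative only)
prediction = np.array([2.5, 0.0, 2.0, 8.0])
observed = np.array([3.0, -0.5, 2.0, 7.0])

# Element-wise difference, squared, then summed over all samples
squared_errors = (prediction - observed) ** 2
sse = np.sum(squared_errors)
print(sse)  # 1.5

# Dividing by the number of samples gives the mean squared error
mse = sse / prediction.size
print(mse)  # 0.375
```

No loop is needed because the subtraction and squaring are applied element by element across the whole array.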
![](https://img.kancloud.cn/23/09/2309ed9c4ba87d3538cbd2351851650f_1299x640.png)

![](https://img.kancloud.cn/28/7b/287b442387258a537e84d7b3b912aefd_1352x760.png)

![](https://img.kancloud.cn/e9/c5/e9c592238dffdecafa36f673d2362384_1346x634.png)

### **Text**

We can split a sentence into its individual characters.

![](https://img.kancloud.cn/92/b8/92b8365c2f41e6d77cf78e2fdec193a5_1345x643.png)
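One way to sketch the character split in code, assuming a sample sentence of our own (the screenshot's text is not reproduced here):

```python
import numpy as np

# Hypothetical sentence, chosen only for illustration
sentence = "have a good day"

# list() breaks the string into single characters; np.array stores them
# as a one-dimensional array of length-1 strings
chars = np.array(list(sentence))

print(chars.shape)
print(chars[:4])
```

Once the characters are in an array, the indexing and boolean-mask techniques from earlier in this section apply to them as well.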