MySQL专题四：InnoDB存储引擎 · JavaCook

## MySQL专题四：InnoDB存储引擎 [TOC] > InnoDB是Mysql默认的存储引擎 > CREATE TABLE t (i INT) ENGINE = InnoDB; ### 4.1.磁盘物理存储结构 #### 4.1.1. [Record物理结构](https://dev.mysql.com/doc/internals/en/innodb-record-structure.html) | Name | Size | | --- | --- | | Field Start Offsets | (F\*1) or (F\*2) bytes | | Extra Bytes | 6 bytes | | Field Contents | depends on content | 1. 字段开始偏移量（Field Start Offset）记录中每一个字段相对于原点（第一个Field Contents开始的位置）的偏移量的集合取反后的列表，每个Offset大小为1个或2个字节 2. 额外的字节（Extra Bytes ）最重要的是**1byte_offs_flag**，标志偏移量的Offset大小是1个或2个字节，1 代表**1-byte offsets**，0代表**2-byte offsets** | Name | Size | Description | | --- | --- | --- | | info_bits: | - |- | | () | 1 bit |unused or unknown | | () | 1 bit |unused or unknown | | deleted_flag | 1 bit |1 if record is deleted | | min_rec_flag | 1 bit |1 if record is predefined minimum record | | n_owned | 4 bits |number of records owned by this record | | heap_no | 13 bits |record's order number in heap of index page | | n_fields | 10 bits |number of fields in this record, 1 to 1023 | | **1byte_offs_flag** | 1 bit |1 if each Field Start Offsets is 1 byte long (this item is also called the "short" flag) | | next 16 bits | 16 bits |pointer to next record in page | | **TOTAL** | 48 bits |- | 3. 字段内容（Field Contents ）用户自定义表后，DBMS会在表中额外增加三个系统字段： - row ID - transaction ID - rollback pointer 例，向T表中插入一条记录： ``` CREATE TABLE T (FIELD1 VARCHAR(3), FIELD2 VARCHAR(3), FIELD3 VARCHAR(3)) ; ``` ``` INSERT INTO T VALUES ('PP', 'PP', 'PP'); ``` ``` ha_write_row19 17 15 13 0C 06 Field Start Offsets /* First Row */ 00 00 78 0D 02 BF Extra Bytes 00 00 00 00 04 21 System Column #1 00 00 00 00 09 2A System Column #2 80 00 00 00 2D 00 84 System Column #3 50 50 Field1 'PP' 50 50 Field2 'PP' 50 50 Field3 'PP' 16 15 14 13 0C 06 Field Start Offsets /* Second Row */ 00 00 80 0D 02 E1 Extra Bytes 00 00 00 00 04 22 System Column #1 00 00 00 00 09 2B 80 System Column #2 00 00 00 2D 00 84 System Column #3 51 Field1 'Q' 51 Field2 'Q' 51 Field3 'Q' 94 94 14 13 0C 06 Field Start Offsets /* Third Row */ 00 00 88 0D 00 74 Extra Bytes 00 00 00 00 04 23 System Column #1 00 00 00 00 09 2C System Column #2 80 00 00 00 2D 00 84 System Column #3 52 Field1 'R' ``` #### 4.1.2. [Page物理结构](https://dev.mysql.com/doc/internals/en/innodb-page-structure.html) Page用于存储记录，每个Page的大小固定为16KB，结构如下： * Fil Header * Page Header * Infimum + Supremum Records * User Records * Free Space * Page Directory * Fil Trailer 1. Fil Header **FIL_PAGE_PREV**和 **FIL_PAGE_NEXT** ：B+Tree数据结构中指针，指向Previous Page和Next Page | Name | Size | Remarks | | --- | --- | --- | | FIL_PAGE_SPACE | 4 |4 ID of the space the page is in | | FIL_PAGE_OFFSET | 4 |ordinal page number from start of space | | **FIL_PAGE_PREV** | 4 |offset of previous page in key order | | **FIL_PAGE_NEXT** | 4 |offset of next page in key order | | FIL_PAGE_LSN | 8|log serial number of page's latest log record | | FIL_PAGE_TYPE | 2 |current defined types are: FIL_PAGE_INDEX, FIL_PAGE_UNDO_LOG, FIL_PAGE_INODE, FIL_PAGE_IBUF_FREE_LIST | | FIL_PAGE_FILE_FLUSH_LSN | 8 |"the file has been flushed to disk at least up to this lsn" (log serial number), valid only on the first page of the file | | FIL_PAGE_ARCH_LOG_NO | 4 |the latest archived log file number at the time that FIL_PAGE_FILE_FLUSH_LSN was written (in the log) | #### 4.1.3. B+Tree树结构 InnoDB是以每一个Page为节点的B-Tree结构的存储引擎。在[B-树](http://www.btechsmartclass.com/data_structures/b-trees.html)的结构上，InnoDB由于 **FIL_PAGE_PREV**和 **FIL_PAGE_NEXT**指针的存在，可以从一个叶节点出发访问另一个叶节点，而不必每次回到根节点，这就是为什么InnoDB应该被称为B+树。 ### 4.2. 内存缓存存储模式 **BufferPool** 缓冲池是主内存中的一个区域，InnoDB在访问表和索引数据时将其缓存。缓冲池允许直接从内存中处理经常使用的数据加快了处理速度。 BufferPool被实现作为Page的列表，使用了LRU算法进行管理，很少使用的数据使用LRU算法会变淘汰。 > 对前面2种存储做一个总结： ![](https://img.kancloud.cn/20/d4/20d4f528c6e0f726eda771956e0750fa_700x538.png) ### 4.3. 其它内置存储引擎 ``` mysql> show engines; +--------------------+---------+----------------------------------------------------------------+--------------+------+------------+ | Engine | Support | Comment | Transactions | XA | Savepoints | +--------------------+---------+----------------------------------------------------------------+--------------+------+------------+ | MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO | | MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO | | CSV | YES | CSV storage engine | NO | NO | NO | | FEDERATED | NO | Federated MySQL storage engine | NULL | NULL | NULL | | PERFORMANCE_SCHEMA | YES | Performance Schema | NO | NO | NO | | MyISAM | YES | MyISAM storage engine | NO | NO | NO | | InnoDB | DEFAULT | Supports transactions, row-level locking, and foreign keys | YES | YES | YES | | BLACKHOLE | YES | /dev/null storage engine (anything you write to it disappears) | NO | NO | NO | | ARCHIVE | YES | Archive storage engine | NO | NO | NO | +--------------------+---------+----------------------------------------------------------------+--------------+------+------------+ ``` ### 4.4. InnoDB的事务与锁 #### 4.4.1. 事务（[Transactions](https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/glossary.html#glos_transaction)）： **Transactions**是可以被提交（`commit`）和回滚（`rollback`）作业的原子单元，假设一个事务对数据库做了很多操作，要么当事务提交的时候操作数据库成功，要么当事务回滚的时候数据库不发生任何改变。在Mysql数据库存储引擎中只有InnnoDB实现了事务，具有**ACID**属性，包括原子性（`atomicity`）、一致性（`consistency`）、隔离性（`isolation`）和持久性（`durability`）。 #### 4.4.2. 行级锁（[row-level locking](https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/glossary.html#glos_row_lock)）： **rw-lock**等级为行的锁，多个事务可以并发地修改同一张表（table），但是当修改同一行（row），同一时间只能有一个事务可以修改，另一个必须等前面的事务完成或者释放行级锁(row locks)才能进行修改操作。 rw-lock包括三种类型的锁：(`shared、exclusive`) - 共享锁（`s-locks`）：可以对公共资源的读取访问 - 独占锁（`x-locks`）：可以对公共资源的写访问，但是不允许其它线程不一致地读取 - 共享独占锁（`sx-locks`）：可以对公共资源的写访问，也允许其它线程不一致地读取 | | S | X | SX | | --- | --- | --- | --- | | S | 兼容 | 兼容 | 冲突 | | X | 兼容 | 冲突 | 冲突 | | SX | 冲突 | 冲突 | 冲突 |