# Heap Sort
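Heap sort is usually implemented on top of a [**binary heap**](http://algorithm.yuanbin.me/zh-cn/basics_data_structure/heap.html). Taking a max-heap as the example, the sort consists of two alternating steps. The first step removes the root of the max-heap, which is the largest value currently in the heap. Because a node has been taken away, the second step restores the heap property among the remaining elements. The root is then removed again, and the loop continues until every node has been taken, at which point the array is sorted. That is the basic idea, although the implementation relies on a few small tricks.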
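The two steps can be sketched with the heap primitives from the C++ standard library before writing them by hand. This is only an illustration of the idea: the function name `heap_sort_with_std` is made up for this example, while `std::make_heap` and `std::pop_heap` are the standard `<algorithm>` functions.

```cpp
#include <algorithm>
#include <vector>

// Returns a sorted copy of nums (ascending), using a max-heap.
std::vector<int> heap_sort_with_std(std::vector<int> nums) {
    // Arrange the whole range as a max-heap.
    std::make_heap(nums.begin(), nums.end());
    // pop_heap moves the root (the current maximum) to the last position of
    // [begin, end) and restores the heap on the remaining prefix.
    for (auto end = nums.end(); end != nums.begin(); --end) {
        std::pop_heap(nums.begin(), end);
    }
    return nums;
}
```

Each pass shrinks the heap by one element and grows the sorted suffix by one, which is the loop described above.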
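### Heap Operations
Taking a max-heap as the example, the common heap operations are:
1. Max-heapify (Max_Heapify): adjust a node downward so that no child is ever larger than its parent.
2. Build max-heap (Build_Max_Heap): rearrange all the data in the array into a max-heap.
3. Heap sort (HeapSort): remove the root node, which holds the first element, and apply max-heapify again; repeat until the heap is empty.
Operation 1 is the building block used by operations 2 and 3.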
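The invariant that these operations maintain can be written down as a small check. The sketch below assumes the usual array layout in which the children of index `i` sit at `2 * i + 1` and `2 * i + 2`; the helper name `is_max_heap` is hypothetical and exists only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Returns true when no child is larger than its parent (max-heap order).
bool is_max_heap(const std::vector<int>& nums) {
    const std::size_t n = nums.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t l = 2 * i + 1;  // left child index
        const std::size_t r = 2 * i + 2;  // right child index
        if (l < n && nums[l] > nums[i]) return false;
        if (r < n && nums[r] > nums[i]) return false;
    }
    return true;
}
```

A check like this is handy in tests: it should hold after building the heap and after every heapify call on the shrinking prefix.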
![Heapsort-example](https://box.kancloud.cn/2015-10-24_562b1f3474fbe.gif)
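A heap can be built top-down or bottom-up. The analysis here starts with the bottom-up approach. Think of the nodes in the second half of the array as the bottom layer of the heap: each one is a single node, so it trivially satisfies the definition of a binary heap. Starting from the middle of the array, we then move toward the front and build the heap step by step, making sure after every step that all nodes behind the current position already form valid heaps. By the time we reach the first node, the whole array is a binary heap. The C++ section below implements this approach as a class.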
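Heap sort is especially useful when memory is tight, for example on embedded devices and phones, because it sorts in place. On modern systems with large caches, however, it cannot use the cache effectively: array elements are rarely compared with their neighbors, so cache misses are far more likely than in algorithms that compare adjacent elements. Heaps become important again when processing massive data sets, because in a dynamic workload that mixes insertions with removals of the maximum element they guarantee logarithmic running time per operation (for example, TopM).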
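As an illustration of that mixed insert-and-remove workload, the sketch below keeps only the `m` largest values of a stream. It reads "TopM" as the classic top-M selection problem, which is an assumption, and it uses a min-heap so that the value to discard is always at the root; `top_m` is a hypothetical name, while `std::priority_queue` is the standard container adaptor.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Returns the m largest values of stream in ascending order.
// The heap never holds more than m + 1 elements, so each push or pop
// costs O(log m) regardless of how long the stream is.
std::vector<int> top_m(const std::vector<int>& stream, std::size_t m) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : stream) {
        heap.push(x);
        if (heap.size() > m) heap.pop();  // discard the smallest kept value
    }
    std::vector<int> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

Only `m` values stay in memory at any time, which is why this pattern suits inputs too large to sort as a whole.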
### C++
~~~
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

class HeapSort {
    // get the parent node index
    int parent(int i) {
        return (i - 1) / 2;
    }
    // get the left child node index
    int left(int i) {
        return 2 * i + 1;
    }
    // get the right child node index
    int right(int i) {
        return 2 * i + 2;
    }
    // build max heap bottom-up over nums[0, heap_size)
    void build_max_heapify(vector<int> &nums, int heap_size) {
        for (int i = heap_size / 2; i >= 0; --i) {
            max_heapify(nums, i, heap_size);
        }
        print_heap(nums, heap_size);
    }
    // build min heap bottom-up over nums[0, heap_size)
    void build_min_heapify(vector<int> &nums, int heap_size) {
        for (int i = heap_size / 2; i >= 0; --i) {
            min_heapify(nums, i, heap_size);
        }
        print_heap(nums, heap_size);
    }
    // sift nums[k] down until the subtree rooted at k is a max-heap
    void max_heapify(vector<int> &nums, int k, int len) {
        while (k < len) {
            int max_index = k;
            // compare with the left child
            int l = left(k);
            if (l < len && nums[l] > nums[max_index]) {
                max_index = l;
            }
            // compare with the right child
            int r = right(k);
            if (r < len && nums[r] > nums[max_index]) {
                max_index = r;
            }
            // subtree rooted at k is a max-heap already
            if (k == max_index) {
                break;
            }
            // keep the largest value at the root of this subtree
            swap(nums[k], nums[max_index]);
            // continue sifting down from the child that was swapped
            k = max_index;
        }
    }
    // sift nums[k] down until the subtree rooted at k is a min-heap
    void min_heapify(vector<int> &nums, int k, int len) {
        while (k < len) {
            int min_index = k;
            // compare with the left child
            int l = left(k);
            if (l < len && nums[l] < nums[min_index]) {
                min_index = l;
            }
            // compare with the right child
            int r = right(k);
            if (r < len && nums[r] < nums[min_index]) {
                min_index = r;
            }
            // subtree rooted at k is a min-heap already
            if (k == min_index) {
                break;
            }
            // keep the smallest value at the root of this subtree
            swap(nums[k], nums[min_index]);
            // continue sifting down from the child that was swapped
            k = min_index;
        }
    }

public:
    // heap sort, ascending order
    void heap_sort(vector<int> &nums) {
        int len = nums.size();
        // init heap structure
        build_max_heapify(nums, len);
        // repeatedly move the maximum behind the shrinking heap
        for (int i = len - 1; i > 0; --i) {
            // put the largest number at the end of the unsorted range
            swap(nums[0], nums[i]);
            // only the root is out of place, so sift it down within [0, i)
            // instead of rebuilding the whole heap
            max_heapify(nums, 0, i);
            print_heap(nums, i);
        }
        print_heap(nums, len);
    }
    // print heap between [0, heap_size - 1]
    void print_heap(const vector<int> &nums, int heap_size) {
        for (int i = 0; i < heap_size; ++i) {
            cout << nums[i] << ", ";
        }
        cout << endl;
    }
};

int main() {
    vector<int> nums = {19, 1, 10, 14, 16, 4, 7, 9, 3, 2, 8, 5, 11};
    HeapSort sort;
    sort.print_heap(nums, nums.size());
    sort.heap_sort(nums);
    return 0;
}
~~~
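### Complexity Analysis
The code shows where heap sort spends its time: building the binary heap and restoring it after each removal of the root.
Both the max-heap and the min-heap above are built bottom-up. Building the heap costs $O(2N)$, which is linear, and restoring the heap throughout the sorting phase costs $O(2N \log N)$, so the total time complexity is $O(N \log N)$.
Consider the build phase first. Drawing the tree (with 8 nodes, for example) shows that in the worst case every step has to adjust nodes that already formed heaps. That means roughly half of the nodes move down at most one level, a quarter at most two levels, an eighth at most three levels, and so on. The total is the sum of an arithmetic-geometric series; the links in the Reference section give the full derivation.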
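The series can be summed directly. As a sketch under the counting above, where about $N / 2^k$ nodes move down at most $k$ levels:

$$
\sum_{k=1}^{\lfloor \log_2 N \rfloor} k \cdot \frac{N}{2^k} \;\le\; N \sum_{k=1}^{\infty} \frac{k}{2^k} \;=\; N \cdot \frac{1/2}{(1 - 1/2)^2} \;=\; 2N
$$

The closed form uses the identity $\sum_{k \ge 1} k x^k = \frac{x}{(1 - x)^2}$ for $|x| < 1$ with $x = 1/2$. The bound is linear in $N$, which is where the $O(2N)$ figure for the build phase comes from.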
### Reference
- [Heapsort - Wikipedia (Chinese)](http://zh.wikipedia.org/wiki/%E5%A0%86%E6%8E%92%E5%BA%8F)
- [Priority Queues](http://algs4.cs.princeton.edu/24pq/) - Robert Sedgewick's excellent treatment, with a detailed explanation of heap operations.
- [Summary and Implementation of Classic Sorting Algorithms | Jark's Blog](http://wuchong.me/blog/2014/02/09/algorithm-sort-summary/) - explains heap sort very well.
- *Algorithms* - Robert Sedgewick
- [Where does the O(n) time complexity of heap construction in heap sort come from? (Zhihu)](http://www.zhihu.com/question/20729324)
- [《大话数据结构》, Chapter 9 Sorting, 9.7 Heap Sort (Part 1) - 伍迷 - cnblogs](http://www.cnblogs.com/cj723/archive/2011/04/21/2024261.html)
- [《大话数据结构》, Chapter 9 Sorting, 9.7 Heap Sort (Part 2) - 伍迷 - cnblogs](http://www.cnblogs.com/cj723/archive/2011/04/22/2024269.html)