# Hash Function
### Source
- lintcode: [(128) Hash Function](http://www.lintcode.com/en/problem/hash-function/)
~~~
In data structure Hash, hash function is used to convert a string(or any other type)
into an integer smaller than hash size and bigger or equal to zero.
The objective of designing a hash function is to "hash" the key as unreasonable as possible.
A good hash function can avoid collision as less as possible.
A widely used hash function algorithm is using a magic number 33,
consider any string as a 33 based big integer like follow:
hashcode("abcd") = (ascii(a) * 333 + ascii(b) * 332 + ascii(c) *33 + ascii(d)) % HASH_SIZE
= (97* 333 + 98 * 332 + 99 * 33 +100) % HASH_SIZE
= 3595978 % HASH_SIZE
here HASH_SIZE is the capacity of the hash table
(you can assume a hash table is like an array with index 0 ~ HASH_SIZE-1).
Given a string as a key and the size of hash table, return the hash value of this key.f
Have you met this question in a real interview? Yes
Example
For key="abcd" and size=100, return 78
Clarification
For this problem,
you are not necessary to design your own hash algorithm or
consider any collision issue,
you just need to implement the algorithm as described.
~~~
### 题解1
基本实现题,大多数人看到题目的直觉是按照定义来递推不就得了嘛,但其实这里面大有玄机,因为在字符串较长时使用 long 型来计算33的幂会溢出!所以这道题的关键在于如何处理**大整数溢出**。对于整数求模,`(a * b) % m = a % m * b % m` 这个基本公式务必牢记。根据这个公式我们可以大大降低时间复杂度和规避溢出。
### Java
~~~
class Solution {
/**
* @param key: A String you should hash
* @param HASH_SIZE: An integer
* @return an integer
*/
public int hashCode(char[] key,int HASH_SIZE) {
if (key == null || key.length == 0) return -1;
long hashSum = 0;
for (int i = 0; i < key.length; i++) {
hashSum += key[i] * modPow(33, key.length - i - 1, HASH_SIZE);
hashSum %= HASH_SIZE;
}
return (int)hashSum;
}
private long modPow(int base, int n, int mod) {
if (n == 0) {
return 1;
} else if (n == 1) {
return base % mod;
} else if (n % 2 == 0) {
long temp = modPow(base, n / 2, mod);
return (temp % mod) * (temp % mod) % mod;
} else {
return (base % mod) * modPow(base, n - 1, mod) % mod;
}
}
}
~~~
### 源码分析
题解1属于较为直观的解法,只不过在计算33的幂时使用了私有方法`modPow`, 这个方法使用了对数级别复杂度的算法,可防止 [TLE](# "Time Limit Exceeded 的简称。你的程序在 OJ 上的运行时间太长了,超过了对应题目的时间限制。") 的产生。注意两个 int 型数据在相乘时可能会溢出,故对中间结果的存储需要使用 long.
### 复杂度分析
遍历加求`modPow`,时间复杂度 O(nlogn)O(n \log n)O(nlogn), 空间复杂度 O(1)O(1)O(1). 当然也可以使用哈希表的方法将幂求模的结果保存起来,这样一来空间复杂度就是 O(n)O(n)O(n), 不过时间复杂度为 O(n)O(n)O(n).
### 题解2 - 巧用求模公式
从题解1中我们可以看到其时间复杂度还是比较高的,作为基本库来使用是比较低效的。我们从范例`hashcode("abc")`为例进行说明。
hashcode(abc)=(a×332+b×33+c)%M=(33(33×a+b)+c)%M=(33(33(33×0+a)+b)+c)%M\begin{array}{cl}hashcode(abc) & = & (a \times 33^{2} + b \times 33 + c)\% M\\ & = & (33(33\times a+b)+c)\% M\\ & = & (33(33(33\times0+a)+b)+c)\% M\end{array}hashcode(abc)===(a×332+b×33+c)%M(33(33×a+b)+c)%M(33(33(33×0+a)+b)+c)%M
再根据 (a×b)%M=(a%M)×(b%M)(a \times b) \% M = (a \% M) \times (b \% M)(a×b)%M=(a%M)×(b%M)
从中可以看出使用迭代的方法较容易实现。
### Java
~~~
class Solution {
/**
* @param key: A String you should hash
* @param HASH_SIZE: An integer
* @return an integer
*/
public int hashCode(char[] key,int HASH_SIZE) {
if (key == null || key.length == 0) return -1;
long hashSum = 0;
for (int i = 0; i < key.length; i++) {
hashSum = 33 * hashSum + key[i];
hashSum %= HASH_SIZE;
}
return (int)hashSum;
}
}
~~~
### 源码分析
精华在`hashSum = 33 * hashSum + key[i];`
### 复杂度分析
时间复杂度 O(n)O(n)O(n), 空间复杂度 O(1)O(1)O(1).
- Preface
- Part I - Basics
- Basics Data Structure
- String
- Linked List
- Binary Tree
- Huffman Compression
- Queue
- Heap
- Stack
- Set
- Map
- Graph
- Basics Sorting
- Bubble Sort
- Selection Sort
- Insertion Sort
- Merge Sort
- Quick Sort
- Heap Sort
- Bucket Sort
- Counting Sort
- Radix Sort
- Basics Algorithm
- Divide and Conquer
- Binary Search
- Math
- Greatest Common Divisor
- Prime
- Knapsack
- Probability
- Shuffle
- Basics Misc
- Bit Manipulation
- Part II - Coding
- String
- strStr
- Two Strings Are Anagrams
- Compare Strings
- Anagrams
- Longest Common Substring
- Rotate String
- Reverse Words in a String
- Valid Palindrome
- Longest Palindromic Substring
- Space Replacement
- Wildcard Matching
- Length of Last Word
- Count and Say
- Integer Array
- Remove Element
- Zero Sum Subarray
- Subarray Sum K
- Subarray Sum Closest
- Recover Rotated Sorted Array
- Product of Array Exclude Itself
- Partition Array
- First Missing Positive
- 2 Sum
- 3 Sum
- 3 Sum Closest
- Remove Duplicates from Sorted Array
- Remove Duplicates from Sorted Array II
- Merge Sorted Array
- Merge Sorted Array II
- Median
- Partition Array by Odd and Even
- Kth Largest Element
- Binary Search
- Binary Search
- Search Insert Position
- Search for a Range
- First Bad Version
- Search a 2D Matrix
- Search a 2D Matrix II
- Find Peak Element
- Search in Rotated Sorted Array
- Search in Rotated Sorted Array II
- Find Minimum in Rotated Sorted Array
- Find Minimum in Rotated Sorted Array II
- Median of two Sorted Arrays
- Sqrt x
- Wood Cut
- Math and Bit Manipulation
- Single Number
- Single Number II
- Single Number III
- O1 Check Power of 2
- Convert Integer A to Integer B
- Factorial Trailing Zeroes
- Unique Binary Search Trees
- Update Bits
- Fast Power
- Hash Function
- Count 1 in Binary
- Fibonacci
- A plus B Problem
- Print Numbers by Recursion
- Majority Number
- Majority Number II
- Majority Number III
- Digit Counts
- Ugly Number
- Plus One
- Linked List
- Remove Duplicates from Sorted List
- Remove Duplicates from Sorted List II
- Remove Duplicates from Unsorted List
- Partition List
- Two Lists Sum
- Two Lists Sum Advanced
- Remove Nth Node From End of List
- Linked List Cycle
- Linked List Cycle II
- Reverse Linked List
- Reverse Linked List II
- Merge Two Sorted Lists
- Merge k Sorted Lists
- Reorder List
- Copy List with Random Pointer
- Sort List
- Insertion Sort List
- Check if a singly linked list is palindrome
- Delete Node in the Middle of Singly Linked List
- Rotate List
- Swap Nodes in Pairs
- Remove Linked List Elements
- Binary Tree
- Binary Tree Preorder Traversal
- Binary Tree Inorder Traversal
- Binary Tree Postorder Traversal
- Binary Tree Level Order Traversal
- Binary Tree Level Order Traversal II
- Maximum Depth of Binary Tree
- Balanced Binary Tree
- Binary Tree Maximum Path Sum
- Lowest Common Ancestor
- Invert Binary Tree
- Diameter of a Binary Tree
- Construct Binary Tree from Preorder and Inorder Traversal
- Construct Binary Tree from Inorder and Postorder Traversal
- Subtree
- Binary Tree Zigzag Level Order Traversal
- Binary Tree Serialization
- Binary Search Tree
- Insert Node in a Binary Search Tree
- Validate Binary Search Tree
- Search Range in Binary Search Tree
- Convert Sorted Array to Binary Search Tree
- Convert Sorted List to Binary Search Tree
- Binary Search Tree Iterator
- Exhaustive Search
- Subsets
- Unique Subsets
- Permutations
- Unique Permutations
- Next Permutation
- Previous Permuation
- Unique Binary Search Trees II
- Permutation Index
- Permutation Index II
- Permutation Sequence
- Palindrome Partitioning
- Combinations
- Combination Sum
- Combination Sum II
- Minimum Depth of Binary Tree
- Word Search
- Dynamic Programming
- Triangle
- Backpack
- Backpack II
- Minimum Path Sum
- Unique Paths
- Unique Paths II
- Climbing Stairs
- Jump Game
- Word Break
- Longest Increasing Subsequence
- Palindrome Partitioning II
- Longest Common Subsequence
- Edit Distance
- Jump Game II
- Best Time to Buy and Sell Stock
- Best Time to Buy and Sell Stock II
- Best Time to Buy and Sell Stock III
- Best Time to Buy and Sell Stock IV
- Distinct Subsequences
- Interleaving String
- Maximum Subarray
- Maximum Subarray II
- Longest Increasing Continuous subsequence
- Longest Increasing Continuous subsequence II
- Graph
- Find the Connected Component in the Undirected Graph
- Route Between Two Nodes in Graph
- Topological Sorting
- Word Ladder
- Bipartial Graph Part I
- Data Structure
- Implement Queue by Two Stacks
- Min Stack
- Sliding Window Maximum
- Longest Words
- Heapify
- Problem Misc
- Nuts and Bolts Problem
- String to Integer
- Insert Interval
- Merge Intervals
- Minimum Subarray
- Matrix Zigzag Traversal
- Valid Sudoku
- Add Binary
- Reverse Integer
- Gray Code
- Find the Missing Number
- Minimum Window Substring
- Continuous Subarray Sum
- Continuous Subarray Sum II
- Longest Consecutive Sequence
- Part III - Contest
- Google APAC
- APAC 2015 Round B
- Problem A. Password Attacker
- Microsoft
- Microsoft 2015 April
- Problem A. Magic Box
- Problem B. Professor Q's Software
- Problem C. Islands Travel
- Problem D. Recruitment
- Microsoft 2015 April 2
- Problem A. Lucky Substrings
- Problem B. Numeric Keypad
- Problem C. Spring Outing
- Microsoft 2015 September 2
- Problem A. Farthest Point
- Appendix I Interview and Resume
- Interview
- Resume