合规国际互联网加速 OSASE为企业客户提供高速稳定SD-WAN国际加速解决方案。 广告
# Hash Function ### Source - lintcode: [(128) Hash Function](http://www.lintcode.com/en/problem/hash-function/) ~~~ In data structure Hash, hash function is used to convert a string(or any other type) into an integer smaller than hash size and bigger or equal to zero. The objective of designing a hash function is to "hash" the key as unreasonable as possible. A good hash function can avoid collision as less as possible. A widely used hash function algorithm is using a magic number 33, consider any string as a 33 based big integer like follow: hashcode("abcd") = (ascii(a) * 333 + ascii(b) * 332 + ascii(c) *33 + ascii(d)) % HASH_SIZE = (97* 333 + 98 * 332 + 99 * 33 +100) % HASH_SIZE = 3595978 % HASH_SIZE here HASH_SIZE is the capacity of the hash table (you can assume a hash table is like an array with index 0 ~ HASH_SIZE-1). Given a string as a key and the size of hash table, return the hash value of this key.f Have you met this question in a real interview? Yes Example For key="abcd" and size=100, return 78 Clarification For this problem, you are not necessary to design your own hash algorithm or consider any collision issue, you just need to implement the algorithm as described. ~~~ ### 题解1 基本实现题,大多数人看到题目的直觉是按照定义来递推不就得了嘛,但其实这里面大有玄机,因为在字符串较长时使用 long 型来计算33的幂会溢出!所以这道题的关键在于如何处理**大整数溢出**。对于整数求模,`(a * b) % m = a % m * b % m` 这个基本公式务必牢记。根据这个公式我们可以大大降低时间复杂度和规避溢出。 ### Java ~~~ class Solution { /** * @param key: A String you should hash * @param HASH_SIZE: An integer * @return an integer */ public int hashCode(char[] key,int HASH_SIZE) { if (key == null || key.length == 0) return -1; long hashSum = 0; for (int i = 0; i < key.length; i++) { hashSum += key[i] * modPow(33, key.length - i - 1, HASH_SIZE); hashSum %= HASH_SIZE; } return (int)hashSum; } private long modPow(int base, int n, int mod) { if (n == 0) { return 1; } else if (n == 1) { return base % mod; } else if (n % 2 == 0) { long temp = modPow(base, n / 2, mod); return (temp % mod) * (temp % mod) % mod; } else { return (base % mod) * modPow(base, n - 1, mod) % mod; } } } ~~~ ### 源码分析 题解1属于较为直观的解法,只不过在计算33的幂时使用了私有方法`modPow`, 这个方法使用了对数级别复杂度的算法,可防止 [TLE](# "Time Limit Exceeded 的简称。你的程序在 OJ 上的运行时间太长了,超过了对应题目的时间限制。") 的产生。注意两个 int 型数据在相乘时可能会溢出,故对中间结果的存储需要使用 long. ### 复杂度分析 遍历加求`modPow`,时间复杂度 O(nlogn)O(n \log n)O(nlogn), 空间复杂度 O(1)O(1)O(1). 当然也可以使用哈希表的方法将幂求模的结果保存起来,这样一来空间复杂度就是 O(n)O(n)O(n), 不过时间复杂度为 O(n)O(n)O(n). ### 题解2 - 巧用求模公式 从题解1中我们可以看到其时间复杂度还是比较高的,作为基本库来使用是比较低效的。我们从范例`hashcode("abc")`为例进行说明。 hashcode(abc)=(a×332+b×33+c)%M=(33(33×a+b)+c)%M=(33(33(33×0+a)+b)+c)%M\begin{array}{cl}hashcode(abc) & = & (a \times 33^{2} + b \times 33 + c)\% M\\ & = & (33(33\times a+b)+c)\% M\\ & = & (33(33(33\times0+a)+b)+c)\% M\end{array}hashcode(abc)===(a×332+b×33+c)%M(33(33×a+b)+c)%M(33(33(33×0+a)+b)+c)%M 再根据 (a×b)%M=(a%M)×(b%M)(a \times b) \% M = (a \% M) \times (b \% M)(a×b)%M=(a%M)×(b%M) 从中可以看出使用迭代的方法较容易实现。 ### Java ~~~ class Solution { /** * @param key: A String you should hash * @param HASH_SIZE: An integer * @return an integer */ public int hashCode(char[] key,int HASH_SIZE) { if (key == null || key.length == 0) return -1; long hashSum = 0; for (int i = 0; i < key.length; i++) { hashSum = 33 * hashSum + key[i]; hashSum %= HASH_SIZE; } return (int)hashSum; } } ~~~ ### 源码分析 精华在`hashSum = 33 * hashSum + key[i];` ### 复杂度分析 时间复杂度 O(n)O(n)O(n), 空间复杂度 O(1)O(1)O(1).