字符串 · 把《编程珠玑》读薄

## 字符串程序1：循环输入并将每个单词插入集合S(忽略重复单词)，然后排序输出。 ~~~ int main(void) { set<string> s; set<string>::iterator j; string t; while(cin >> t) s.insert(t); for(j=s.begin(); j!=s.end(); ++j) cout<<*j<<endl; return 0; } ~~~ 程序2：单词计数 ~~~ int main(void) { map<string, int> m; map<string, int>::iterator j; string t; while(cin >> t) m[t]++; for(j=m.begin(); j!=m.end(); ++j) cout<<j->first<<" "<<j->second<<endl; return 0; } ~~~ 程序3：建立自己的哈希表(散列表)，以下是一种实现： ~~~ class Hash { public: Hash(): seed_(131), size_(0) { memset(head_, 0, sizeof(head_)); } void Insert(const char* str) { unsigned int id = hash(str); char *dst = (char*)node_[size_].word; while(*dst++ = *str++); node_[size_].next = head_[id]; head_[id] = &node_[size_]; ++size_; } bool Find(const char* str) { unsigned int id = hash(str); for(Node* p=head_[id]; p; p=p->next) { char* dst = (char*)p->word; int i = 0; for(; *(str+i) && *(str+i)==*(dst+i); ++i); if(!*(str+i) && !*(dst+i)) return true; } return false; } private: unsigned int hash(const char* str) {// BKDR Hash Function unsigned int hash = 0; while(*str) { hash = hash * seed_ + (*str++); } return (hash & 0x7FFFFFFF) % kHashSize; } private: unsigned int seed_; unsigned int size_; static const int kWordSize = 26 + 1; static const int kNodeSize = 20000; static const int kHashSize = 10001; struct Node { char word[kWordSize]; Node *next; }; Node node_[kNodeSize]; Node* head_[kHashSize]; }; ~~~ **后缀数组** 假设我们有以下字符串及一个char*数组： ~~~ char c[20] = "hawstein"; char* pc[20]; ~~~ 我们让指针pc[i]指向字符串的第i个字符，即： ~~~ for(int i=0; i<8; ++i) pc[i] = &c[i]; ~~~ 这时候我们输出pc[i]，会得到字符串"hawstein"的所有后缀： ~~~ hawstein awstein wstein stein tein ein in n ~~~ 然后，我们对数组pc进行排序，将所有后缀按字典序排序： ~~~ sort(pc, pc+8, cmp); ~~~ 其中，比较函数cmp如下： ~~~ inline bool cmp(char* p, char*q) { return strcmp(p, q) < 0; } ~~~ 这时，我们再输出pc[i]，会得到排序后的结果： ~~~ awstein ein hawstein in n stein tein wstein ~~~ 我们把数组pc称为“后缀数组”。这里需要注意，数组pc 中存储的是指向每个后缀首字符的地址。我们也可以存储每个后缀首字符在原数组中的下标，效果是一样的。本章中用后缀数组解决了一个小问题：可重叠最长重复子串。比如对于字符串"banana"，其后缀数组为： ~~~ a ana anana banana na nana ~~~ 扫描一次数组，比较相邻子串，找到相邻子串的最长公共前缀即可。本例为"ana"，其中一个a是重叠的。后缀数组是处理字符串的有力工具，常见的两种实现方法是：倍增算法和DC3算法。推荐阅读以下材料来学习后缀数组： > > 许智磊，《后缀数组》 > > > > 罗穗骞，《后缀数组——处理字符串的有力工具》