# Word Break
- tags: [[DP_Sequence](# "Single-sequence dynamic programming: f[i] usually describes the first i positions/numbers/letters, and f[n-1] holds the final result.")]
### Source
- leetcode: [Word Break | LeetCode OJ](https://leetcode.com/problems/word-break/)
- lintcode: [(107) Word Break](http://www.lintcode.com/en/problem/word-break/)
~~~
Given a string s and a dictionary of words dict, determine if s can be
segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "leetcode",
dict = ["leet", "code"].
Return true because "leetcode" can be segmented as "leet code".
~~~
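### Solution
This is a single-sequence ([DP_Sequence](# "Single-sequence dynamic programming: f[i] usually describes the first i positions/numbers/letters, and f[n-1] holds the final result.")) DP problem. The four elements of single-sequence dynamic programming give roughly the following:
1. State: `f[i]` indicates whether the first `i` characters can be segmented into words from the dictionary.
1. Function: `f[i] = or{f[j], j < i, letters in [j+1, i] can be found in dict}`. That is, `f[i]` is true as soon as some index `j` smaller than `i` has `f[j]` true and the characters from position `j+1` through `i` form a word in the dictionary; otherwise it is false. The recurrence can be implemented top-down or bottom-up.
1. Initialization: `f[0] = true`. The array has length `s.length + 1`, which simplifies boundary handling.
1. Answer: `f[s.length]`

Words are usually not very long, so when `s` is long the bottom-up approach is more efficient: each `f[i]` only needs to examine split points within the maximum word length of `i`.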
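The recurrence can also be evaluated top-down with memoization. The following is a minimal sketch of that alternative, not the approach used in the solutions on this page; the function name `word_break_top_down` and the helper `f` are hypothetical names chosen for illustration.

```python
from functools import lru_cache


def word_break_top_down(s, word_dict):
    """Return True if s can be segmented into words from word_dict."""
    words = frozenset(word_dict)
    if not words:
        # With an empty dictionary only the empty string can be segmented.
        return not s
    max_word_len = max(len(w) for w in words)

    @lru_cache(maxsize=None)
    def f(i):
        # f(i): can the first i characters of s be segmented?
        if i == 0:
            return True
        # Try split points j from i - 1 downward, skipping any j for which
        # the last word s[j:i] would be longer than every dictionary word.
        for j in range(i - 1, max(i - max_word_len, 0) - 1, -1):
            if s[j:i] in words and f(j):
                return True
        return False

    return f(len(s))


print(word_break_top_down("leetcode", {"leet", "code"}))
```

Because `f` recurses once per split point, a very long `s` may hit Python's recursion limit, which is one practical reason to prefer the bottom-up form.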
### Python
~~~
class Solution:
    # @param s, a string
    # @param wordDict, a set<string>
    # @return a boolean
    def wordBreak(self, s, wordDict):
        if not s:
            return True
        if not wordDict:
            return False

        max_word_len = max(len(w) for w in wordDict)
        # can_break[k] is True when the first k characters can be segmented
        can_break = [True]
        for i in range(len(s)):
            can_break.append(False)
            for j in range(i, -1, -1):
                # stop once s[j:i + 1] is longer than any dictionary word
                if i - j + 1 > max_word_len:
                    break
                if can_break[j] and s[j:i + 1] in wordDict:
                    can_break[i + 1] = True
                    break
        return can_break[-1]
~~~
### C++
~~~
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        if (s.empty()) return true;
        if (wordDict.empty()) return false;

        // get the max word length of wordDict
        int max_word_len = 0;
        for (unordered_set<string>::iterator it = wordDict.begin();
             it != wordDict.end(); ++it) {
            // size() returns size_t; cast so both arguments of max are int
            max_word_len = max(max_word_len, static_cast<int>(it->size()));
        }

        vector<bool> can_break(s.size() + 1, false);
        can_break[0] = true;
        for (int i = 1; i <= static_cast<int>(s.size()); ++i) {
            for (int j = i - 1; j >= 0; --j) {
                // stop once s[j..i) is longer than any dictionary word
                if (i - j > max_word_len) break;
                if (can_break[j] &&
                    wordDict.find(s.substr(j, i - j)) != wordDict.end()) {
                    can_break[i] = true;
                    break;
                }
            }
        }
        return can_break[s.size()];
    }
};
~~~
### Java
~~~
import java.util.Set;

public class Solution {
    public boolean wordBreak(String s, Set<String> wordDict) {
        if (s == null || s.length() == 0) return true;
        if (wordDict == null || wordDict.isEmpty()) return false;

        // get the max word length of wordDict
        int max_word_len = 0;
        for (String word : wordDict) {
            max_word_len = Math.max(max_word_len, word.length());
        }

        boolean[] can_break = new boolean[s.length() + 1];
        can_break[0] = true;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = i - 1; j >= 0; j--) {
                // stop once s[j..i) is longer than any dictionary word
                if (i - j > max_word_len) break;
                String word = s.substring(j, i);
                if (can_break[j] && wordDict.contains(word)) {
                    can_break[i] = true;
                    break;
                }
            }
        }
        return can_break[s.length()];
    }
}
~~~
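### Source Code Analysis
In a dynamic language such as Python there is no need to allocate an array of fixed size up front, so the list grows as the loop runs and the index `i` is one less than in the C++ and Java versions. All three versions solve the state transition bottom-up, and first traverse the dictionary once to find the maximum word length, which is used later to cut the inner loop short.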
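To make the index offset concrete, here is a small sketch that returns the whole table instead of only its last entry. The function name `break_table` is hypothetical; the loop body mirrors the Python solution, where `can_break[i + 1]` plays the role of `can_break[i]` in the C++ and Java versions.

```python
def break_table(s, word_dict):
    """Return the bottom-up table: entry k covers the first k characters of s."""
    can_break = [True]  # the empty prefix is always segmentable
    if not s or not word_dict:
        return can_break + [False] * len(s)
    max_word_len = max(len(w) for w in word_dict)
    for i in range(len(s)):
        can_break.append(False)
        for j in range(i, -1, -1):
            # s[j:i + 1] has length i - j + 1
            if i - j + 1 > max_word_len:
                break
            if can_break[j] and s[j:i + 1] in word_dict:
                can_break[i + 1] = True
                break
    return can_break


table = break_table("leetcode", {"leet", "code"})
print(table)
# [True, False, False, False, True, False, False, False, True]
```

The entries at positions 4 and 8 are true because "leet" ends after 4 characters and "code" ends after 8.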
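### Complexity Analysis
1. Finding the maximum word length in the dictionary takes at most $$O(L_D \cdot L_w)$$ time, the dictionary size times the maximum word length (only $$O(L_D)$$ when each word's length is available in constant time, as in the three versions above).
1. Looking up a word in the dictionary takes $$O(1)$$ time (hash table), not counting the cost of building and hashing the substring.
1. There are two nested for loops, and the inner loop exits once the interval exceeds the maximum word length, so in the worst case the two loops take $$O(n L_w)$$ time, where $$n$$ is the length of `s`.
1. The total time complexity is therefore approximately $$O(n L_w)$$.
1. A boolean array of nearly the same length as the string and a temporary word `word` are used, so the space complexity is approximately $$O(n)$$.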
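The bound on the nested loops can be checked empirically. The sketch below reruns the bottom-up DP and counts inner-loop iterations that pass the length check; `count_inner_iterations` is a hypothetical helper written for this illustration, and the test string is an arbitrary choice.

```python
def count_inner_iterations(s, word_dict):
    """Run the bottom-up DP and count inner-loop iterations past the length check."""
    if not s or not word_dict:
        return 0
    max_word_len = max(len(w) for w in word_dict)
    can_break = [True] + [False] * len(s)
    iterations = 0
    for i in range(1, len(s) + 1):
        for j in range(i - 1, -1, -1):
            if i - j > max_word_len:
                break
            iterations += 1
            if can_break[j] and s[j:i] in word_dict:
                can_break[i] = True
                break
    return iterations


s = "a" * 1000 + "b"
words = {"a", "aa", "aaa"}
max_len = max(len(w) for w in words)
# Each of the n outer iterations runs the inner loop at most max_len times.
assert count_inner_iterations(s, words) <= len(s) * max_len
```

The count stays within $$n L_w$$ for any input because the length check caps the inner loop at $$L_w$$ iterations per position; the early `break` on a match usually keeps it well below that.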