Java 正则表达式教程 · BeginnersBook 中文系列教程

# Java 正则表达式教程 > 原文： [https://beginnersbook.com/2014/08/java-regex-tutorial/](https://beginnersbook.com/2014/08/java-regex-tutorial/) **正则表达式**用于定义可用于搜索，操作和编辑文本的字符串模式。这些表达式也称为 **Regex** （正则表达式的简写形式）。 #### 让我们举个例子来更好地理解它：在下面的示例中，正则表达式`.*book.*`用于搜索文本中字符串“book”的出现。 ```java import java.util.regex.*; class RegexExample1{ public static void main(String args[]){ String content = "This is Chaitanya " + "from Beginnersbook.com."; String pattern = ".*book.*"; boolean isMatch = Pattern.matches(pattern, content); System.out.println("The text contains 'book'? " + isMatch); } } ``` 输出： ```java The text contains 'book'? true ``` 在本教程中，我们将学习如何定义模式以及如何使用它们。`java.util.regex` API（我们在处理 Regex 时需要导入的包）有两个主要类： 1）`java.util.regex.Pattern` - 用于定义模式 2）`java.util.regex.Matcher` - 用于使用模式对文本执行匹配操作 ## `java.util.regex.Pattern`类： #### 1）`Pattern.matches()` 我们已经在上面的例子中看到过这种方法的用法，我们在给定的文本中搜索了字符串`"book"`。这是使用 Regex 在文本中搜索`String`的最简单和最简单的方法之一。 ```java String content = "This is a tutorial Website!"; String patternString = ".*tutorial.*"; boolean isMatch = Pattern.matches(patternString, content); System.out.println("The text contains 'tutorial'? " + isMatch); ``` 如您所见，我们使用`Pattern`类的`matches()`方法来搜索给定文本中的模式。模式`.*tutorial.*`在字符串`"tutorial"`的开头和结尾允许零个或多个字符（表达式`.*`用于零个或多个字符）。 **限制**：通过这种方式，我们可以搜索文本中单个出现的模式。要匹配多次出现，您应该使用`Pattern.compile()`方法（在下一节中讨论）。 #### 2）`Pattern.compile()` 在上面的例子中，我们在文本中搜索了一个字符串`"tutorial"`，这是一个区分大小写的搜索，但是如果你想进行大小写不敏感搜索或者想要多次搜索，那么你可能需要在用文本搜索之前，先使用`Pattern.compile()`。这就是这种方法可以用于这种情况的方法。 ```java String content = "This is a tutorial Website!"; String patternString = ".*tuToRiAl."; Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE); ``` 这里我们使用了一个标志`Pattern.CASE_INSENSITIVE`来进行不区分大小写的搜索，还有其他几个标志可以用于不同的目的。要阅读有关此类标志的更多信息，[请参阅此文档](https://docs.oracle.com/javase/tutorial/essential/regex/pattern.html)。 **现在**：我们已经获得了一个`Pattern`实例，但是如何匹配呢？为此，我们需要一个`Matcher`实例，我们可以使用`Pattern.matcher()`方法。让我们讨论一下。 #### 3）`Pattern.matcher()`方法在上一节中，我们学习了如何使用`compile()`方法获取`Pattern`实例。在这里，我们将学习如何使用`matcher()`方法从`Pattern`实例获取`Matcher`实例。 ```java String content = "This is a tutorial Website!"; String patternString = ".*tuToRiAl.*"; Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE); Matcher matcher = pattern.matcher(content); boolean isMatched = matcher.matches(); System.out.println("Is it a Match?" + isMatched); ``` 输出： ```java Is it a Match?true ``` #### 4）`Pattern.split()` 要根据分隔符将文本拆分为多个字符串（这里使用**正则表达式**指定分隔符），我们可以使用`Pattern.split()`方法。这是如何做到的。 ```java import java.util.regex.*; class RegexExample2{ public static void main(String args[]){ String text = "ThisIsChaitanya.ItISMyWebsite"; // Pattern for delimiter String patternString = "is"; Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE); String[] myStrings = pattern.split(text); for(String temp: myStrings){ System.out.println(temp); } System.out.println("Number of split strings: "+myStrings.length); }} ``` 输出： ```java Th Chaitanya.It MyWebsite Number of split strings: 4 ``` 第二个拆分字符串在输出中为`null`。 ## `java.util.regex.Matcher`类我们已经讨论过上面的`Matcher`类了。让我们回忆一下： #### 创建`Matcher`实例 ```java String content = "Some text"; String patternString = ".*somestring.*"; Pattern pattern = Pattern.compile(patternString); Matcher matcher = pattern.matcher(content); ``` #### 主要方法 `matches()`：它在创建 Matcher 实例时将正则表达式与传递给`Pattern.matcher()`方法的整个文本进行匹配。 ```java ... Matcher matcher = pattern.matcher(content); boolean isMatch = matcher.matches(); ``` `lookingAt()`：类似于`matches()`方法，只是它匹配正则表达式只对着文本的开头，而`matches()`搜索整个文本。 `find()`：搜索文本中正则表达式的出现次数。主要用于搜索多次出现的情况。 `start()`和`end()`：这两种方法通常与`find()`方法一起使用。它们用于获取使用`find()`方法找到的匹配项的开始和结束索引。 #### 让我们举个例子使用`Matcher`方法来找出多次出现： ```java package beginnersbook.com; import java.util.regex.*; class RegexExampleMatcher{ public static void main(String args[]){ String content = "ZZZ AA PP AA QQQ AAA ZZ"; String string = "AA"; Pattern pattern = Pattern.compile(string); Matcher matcher = pattern.matcher(content); while(matcher.find()) { System.out.println("Found at: "+ matcher.start() + " - " + matcher.end()); } } } ``` 输出： ```java Found at: 4 - 6 Found at: 10 - 12 Found at: 17 - 19 ``` 现在我们熟悉`Pattern`和`Matcher`类以及将正则表达式与文本匹配的过程。让我们看一下我们定义正则表达式的各种选项： #### 1）字符串字面值假设您只想在文本中搜索特定字符串，例如“abc”然后我们可以简单地编写这样的代码：这里的文本和正则表达式都是相同的。 `Pattern.matches("abc", "abc")` #### 2）字符类字符类将输入文本中的单个字符与字符类中的多个允许字符进行匹配。例如`[Cc]haitanya`将匹配所有出现的字符串`"chaitanya"`，带有小写或大写`C`。更多例子： `Pattern.matches("[pqr]", "abcd");`它会给出错误，因为文中没有`p`，`q`或`r` `Pattern.matches("[pqr]", "r");`当找到`r`时返回`true` `Pattern.matches("[pqr]", "pq");`返回`false`，因为其中任何一个都可以在文本中不是两个。以下是各种字符类构造的完整列表： `[abc]`：如果文本中包含其中一个（`a`，`b`或`c`）且只有一次，它将与文本匹配。 `[^abc]`：除`a`，`b`或`c`以外的任何单个字符（`^`表示否定） `[a-zA-Z]`：`a`到`z`，或`A`到`Z`，包括（范围） `[a-d[m-p]]`：`a`到`d`，或`m`到`p`：`[a-dm-p]`（联合） `[a-z&&[def]]`：它们中的任何一个（`d`，`e`或`f`） `[a-z&&[^bc]]`：`a`到`z`，除了`b`和`c`：`[ad-z]`（减法） `[a-z&&[^m-p]]`：`a`到`z`，除了`m`到`p`：`[a-lq-z]`（减法） #### 预定义字符类 - 元字符这些就像在编写正则表达式时可以使用的短代码。 ```java Construct Description . -> Any character (may or may not match line terminators) \d -> A digit: [0-9] \D -> A non-digit: [^0-9] \s -> A whitespace character: [ \t\n\x0B\f\r] \S -> A non-whitespace character: [^\s] \w -> A word character: [a-zA-Z_0-9] \W -> A non-word character: [^\w] ``` 对于例如 `Pattern.matches("\\d", "1");`将返回`true` `Pattern.matches("\\D", "z");`返回`true` `Pattern.matches(".p", "qp");`返回`true`，点（`.`）代表任何字符 #### 边界匹配器 ```java ^ Matches the beginning of a line. $ Matches then end of a line. \b Matches a word boundary. \B Matches a non-word boundary. \A Matches the beginning of the input text. \G Matches the end of the previous match \Z Matches the end of the input text except the final terminator if any. \z Matches the end of the input text. ``` 例如 `Pattern.matches("^Hello$", "Hello")`：返回`true`，以`Hello`开始和结束 `Pattern.matches("^Hello$", "Namaste! Hello")`：返回`false`，不以`Hello`开始 `Pattern.matches("^Hello$", "Hello Namaste!")`：返回`false`，不以`Hello`结束 #### 量词 ```java Greedy Reluctant Possessive Matches X? X?? X?+ Matches X once, or not at all (0 or 1 time). X* X*? X*+ Matches X zero or more times. X+ X+? X++ Matches X one or more times. X{n} X{n}? X{n}+ Matches X exactly n times. X{n,} X{n,}? X{n,}+ Matches X at least n times. X{n, m) X{n, m)? X{n, m)+ Matches X at least n time, but at most m times. ``` #### 几个例子 ```java import java.util.regex.*; class RegexExample{ public static void main(String args[]){ // It would return true if string matches exactly "tom" System.out.println( Pattern.matches("tom", "Tom")); //False /* returns true if the string matches exactly * "tom" or "Tom" */ System.out.println( Pattern.matches("[Tt]om", "Tom")); //True System.out.println( Pattern.matches("[Tt]om", "Tom")); //True /* Returns true if the string matches exactly "tim" * or "Tim" or "jin" or "Jin" */ System.out.println( Pattern.matches("[tT]im|[jJ]in", "Tim"));//True System.out.println( Pattern.matches("[tT]im|[jJ]in", "jin"));//True /* returns true if the string contains "abc" at * any place */ System.out.println( Pattern.matches(".*abc.*", "deabcpq"));//True /* returns true if the string does not have a * number at the beginning */ System.out.println( Pattern.matches("^[^\\d].*", "123abc")); //False System.out.println( Pattern.matches("^[^\\d].*", "abc123")); //True // returns true if the string contains of three letters System.out.println( Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "aPz"));//True System.out.println( Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "aAA"));//True System.out.println( Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "apZx"));//False // returns true if the string contains 0 or more non-digits System.out.println( Pattern.matches("\\D*", "abcde")); //True System.out.println( Pattern.matches("\\D*", "abcde123")); //False /* Boundary Matchers example * ^ denotes start of the line * $ denotes end of the line */ System.out.println( Pattern.matches("^This$", "This is Chaitanya")); //False System.out.println( Pattern.matches("^This$", "This")); //True System.out.println( Pattern.matches("^This$", "Is This Chaitanya")); //False } } ``` #### 参考 [正则表达式 - Javadoc](https://docs.oracle.com/javase/tutorial/essential/regex/)