Kotlin 正则表达式 · ZetCode 中文系列教程

# Kotlin 正则表达式 > 原文： [http://zetcode.com/kotlin/regularexpressions/](http://zetcode.com/kotlin/regularexpressions/) Kotlin 正则表达式教程展示了如何在 Kotlin 中使用正则表达式。正则表达式用于文本搜索和更高级的文本操作。正则表达式是内置工具，如`grep`，`sed`，文本编辑器（如 vi，emacs）以及包括 Kotlin，JavaScript，Perl 和 Python 在内的编程语言。 ## Kotlin 正则表达式在 Kotlin 中，我们使用`Regex`构建正则表达式。 ```kt Regex("book") "book".toRegex() Regex.fromLiteral("book") ``` 模式是一个正则表达式，用于定义我们正在搜索或操纵的文本。它由文本文字和元字符组成。元字符是控制正则表达式计算方式的特殊字符。例如，使用`\s`，我们搜索空白。特殊字符必须双转义，否则我们可以使用 Kotlin 原始字符串。创建模式后，可以使用其中一个函数将模式应用于文本字符串。函数包括`matches()`，`containsMatchIn()`，`find()`，`findall()`，`replace()`和`split()`。下表显示了一些常用的正则表达式： | 正则表达式 | 含义 | | --- | --- | | `.` | 匹配任何单个字符。 | | `?` | 一次匹配或根本不匹配前面的元素。 | | `+` | 与前面的元素匹配一次或多次。 | | `*` | 与前面的元素匹配零次或多次。 | | `^` | 匹配字符串中的起始位置。 | | `$` | 匹配字符串中的结束位置。 | | <code>|</code> | 备用运算符。 | | `[abc]` | 匹配`a`或`b`或`c`。 | | `[a-c]` | 范围; 匹配`a`或`b`或`c`。 | | `[^abc]` | 否定，匹配除`a`或`b`或`c`之外的所有内容。 | | `\s` | 匹配空白字符。 | | `\w` | 匹配单词字符；等同于`[a-zA-Z_0-9]` | ## Kotlin `containsMatchIn`方法如果正则表达式与整个输入字符串匹配，则`matches()`方法返回`true`。 `containsMatchIn()`方法指示正则表达式是否可以在指定的输入中找到至少一个匹配项。 `KotlinRegexSimple.kt` ```kt package com.zetcode fun main(args : Array<String>) { val words = listOf("book", "bookworm", "Bible", "bookish","cookbook", "bookstore", "pocketbook") val pattern = "book".toRegex() println("*********************") println("containsMatchIn function") words.forEach { word -> if (pattern.containsMatchIn(word)) { println("$word matches") } } println("*********************") println("matches function") words.forEach { word -> if (pattern.matches(word)) { println("$word matches") } } } ``` 在示例中，我们使用`matches()`和`containsMatchIn()`方法。我们有一个单词表。模式将使用这两种方法在每个单词中寻找一个`"book"`字符串。 ```kt val pattern = "book".toRegex() ``` 使用`toRegex()`方法创建正则表达式模式。正则表达式由四个普通字符组成。 ```kt words.forEach { word -> if (pattern.containsMatchIn(word)) { println("$word matches") } } ``` 我们遍历列表，并对每个单词应用`containsMatchIn()`。 ```kt words.forEach { word -> if (pattern.matches(word)) { println("$word matches") } } ``` 我们再次遍历该列表，并对每个单词应用`matches()`。 ```kt ********************* containsMatchIn function book matches bookworm matches bookish matches cookbook matches bookstore matches pocketbook matches ********************* matches function book matches ``` 对于`containsMatchIn()`方法，如果`"book"`单词在单词中某处，则模式匹配；对于`matches()`，输入字符串必须完全匹配模式。 ## Kotlin `find()方法 `find()`方法返回输入中正则表达式的第一个匹配项，从指定的起始索引开始。起始索引默认为 0。 `KotlinRegexFind.kt` ```kt package com.zetcode fun main(args : Array<String>) { val text = "I saw a fox in the wood. The fox had red fur." val pattern = "fox".toRegex() val found = pattern.find(text) val m = found?.value val idx = found?.range println("$m found at indexes: $idx") val found2 = pattern.find(text, 11) val m2 = found2?.value val idx2 = found2?.range println("$m2 found at indexes: $idx2") } ``` 在示例中，我们找到了`"fox"`项匹配项的索引。 ```kt val found = pattern.find(text) val m = found?.value val idx = found?.range ``` 我们找到`"fox"`一词的第一个匹配项。我们得到它的值和索引。 ```kt val found2 = pattern.find(text, 11) val m2 = found2?.value val idx2 = found2?.range ``` 在第二种情况下，我们从索引 11 开始搜索，从而找到下一项。 ```kt fox found at indexes: 8..10 fox found at indexes: 29..31 ``` 这是输出。 ## Kotlin `findAll`方法 `findAll()`方法返回输入字符串中所有出现的正则表达式的序列。 `KotlinFindAll.kt` ```kt package com.zetcode fun main(args : Array<String>) { val text = "I saw a fox in the wood. The fox had red fur." val pattern = "fox".toRegex() val found = pattern.findAll(text) found.forEach { f -> val m = f.value val idx = f.range println("$m found at indexes: $idx") } } ``` 在示例中，我们使用`findAll()`查找所有出现的'fox'术语。 ## Kotlin `split()`方法 `split()`方法将输入字符串拆分为正则表达式的匹配项。 `KotlinRegexSplitting.js` ```kt package com.zetcode fun main(args: Array<String>) { val text = "I saw a fox in the wood. The fox had red fur." val pattern = "\\W+".toRegex() val words = pattern.split(text).filter { it.isNotBlank() } println(words) } ``` 在示例中，我们找出`"fox"`项的出现次数。 ```kt val pattern = "\\W+".toRegex() ``` 该模式包含`\W`命名的字符类，代表非单词字符。与`+`量词结合使用时，该模式会查找非单词字符，例如空格，逗号或点，这些字符通常用于分隔文本中的单词。请注意，字符类是两次转义的。 ```kt val words = pattern.split(text).filter { it.isNotBlank() } ``` 使用`split()`方法，我们将输入字符串分成单词列表。此外，我们删除了空白的结尾词，该词是由于我们的文本以非单词字符结尾而创建的。 ```kt [I, saw, a, fox, in, the, wood, The, fox, had, red, fur] ``` 这是输出。 ## 不区分大小写的匹配为了启用不区分大小写的搜索，我们将`RegexOption.IGNORE_CASE`传递给`toRegex()`方法。 `KotlinRegexCaseInsensitive.kt` ```kt package com.zetcode fun main(args: Array<String>) { val words = listOf("dog", "Dog", "DOG", "Doggy") val pattern = "dog".toRegex(RegexOption.IGNORE_CASE) words.forEach { word -> if (pattern.matches(word)) { println("$word matches") } } } ``` 在示例中，无论大小写如何，我们都将模式应用于单词。 ```kt val pattern = "dog".toRegex(RegexOption.IGNORE_CASE) ``` 我们使用`RegexOption.IGNORE_CASE`忽略输入字符串的大小写。 ```kt dog matches Dog matches DOG matches ``` 这是输出。 ## 点元字符点（。）元字符代表文本中的任何单个字符。 `KotlinRegexDotMeta.kt` ```kt package com.zetcode fun main(args : Array<String>) { val words = listOf("seven", "even", "prevent", "revenge", "maven", "eleven", "amen", "event") val pattern = "..even".toRegex() words.forEach { word -> if (pattern.containsMatchIn(word)) { println("$word matches") } } } ``` 在示例中，列表中有八个单词。我们在每个单词上应用一个包含两个点元字符的模式。 ```kt prevent matches eleven matches ``` 有两个与模式匹配的单词。 ## 问号元字符问号（？）元字符是与上一个元素零或一次匹配的量词。 `KotlinRegexQMarkMeta.kt` ```kt package com.zetcode fun main(args : Array<String>) { val words = listOf("seven", "even", "prevent", "revenge", "maven", "eleven", "amen", "event") val pattern = ".?even".toRegex() words.forEach { word -> if (pattern.matches(word)) { println("$word matches") } } } ``` 在示例中，我们在点字符后添加问号。这意味着在模式中我们可以有一个任意字符，也可以在那里没有任何字符。 ```kt seven matches even matches ``` 这是输出。 ## `{n，m}`量词 `{n，m}`量词至少匹配前一个表达式的`n`个，最多匹配`m`个。 `KotlinRegexMnQuantifier.kt` ```kt package com.zetcode fun main(args: Array<String>) { val words = listOf("pen", "book", "cool", "pencil", "forest", "car", "list", "rest", "ask", "point", "eyes") val pattern = "\\w{3,4}".toRegex() words.forEach { word -> if (pattern.matches(word)) { println("$word matches") } else { println("$word does not match") } } } ``` 在示例中，我们搜索具有三个或四个字符的单词。 ```kt val pattern = "\\w{3,4}".toRegex() ``` 在模式中，我们将一个单词字符重复三到四次。请注意，数字之间不能有空格。 ```kt pen matches book matches cool matches pencil does not match forest does not match car matches list matches rest matches ask matches point does not match eyes matches ``` 这是输出。 ## 锚点锚点匹配给定文本内字符的位置。当使用^锚时，匹配必须发生在字符串的开头，而当使用$锚时，匹配必须发生在字符串的结尾。 `KotlinRegexAnchors.kt` ```kt package com.zetcode fun main(args : Array<String>) { val sentences = listOf("I am looking for Jane.", "Jane was walking along the river.", "Kate and Jane are close friends.") val pattern = "^Jane".toRegex() sentences.forEach { sentence -> if (pattern.containsMatchIn(sentence)) { println("$sentence") } } } ``` 在示例中，我们有三个句子。搜索模式为`^Jane`。该模式检查`"Jane"`字符串是否位于文本的开头。 `Jane\.`将在句子结尾处查找`"Jane"`。 ## 交替交替运算符| 创建具有多种选择的正则表达式。 `KotlinRegexAlternations.kt` ```kt package com.zetcode fun main(args: Array<String>) { val words = listOf("Jane", "Thomas", "Robert", "Lucy", "Beky", "John", "Peter", "Andy") val pattern = "Jane|Beky|Robert".toRegex() words.forEach { word -> if (pattern.matches(word)) { println("$word") } } } ``` 列表中有八个名称。 ```kt val pattern = "Jane|Beky|Robert".toRegex() ``` 此正则表达式查找`Jane`，`"Beky"`或`"`Robert"`字符串。 ## 子模式子模式是模式中的模式。子模式使用`()`字符创建。 `KotlinRegexSubpatterns.kt` ```kt package com.zetcode fun main(args: Array<String>) { val words = listOf("book", "bookshelf", "bookworm", "bookcase", "bookish", "bookkeeper", "booklet", "bookmark") val pattern = "book(worm|mark|keeper)?".toRegex() words.forEach { word -> if (pattern.matches(word)) { println("$word matches") } else { println("$word does not match") } } } ``` 该示例创建一个子模式。 ```kt val pattern = "book(worm|mark|keeper)?".toRegex() ``` 正则表达式使用子模式。它与书呆子，书签，簿记员和书本单词匹配。 ```kt book matches bookshelf does not match bookworm matches bookcase does not match bookish does not match bookkeeper matches booklet does not match bookmark matches ``` 这是输出。 ## 字符类字符类定义了一组字符，任何字符都可以出现在输入字符串中以使匹配成功。 `KotlinRegexChClass.kt` ```kt package com.zetcode fun main(args: Array<String>) { val words = listOf("a gray bird", "grey hair", "great look") val pattern = "gr[ea]y".toRegex() words.forEach { word -> if (pattern.containsMatchIn(word)) { println("$word") } } } ``` 在该示例中，我们使用字符类同时包含灰色和灰色单词。 ```kt val pattern = "gr[ea]y".toRegex() ``` `[ea]`类允许在模式中使用'e'或'a'字符。 ## 命名字符类有一些预定义的字符类。 `\s`与空白字符`[\t\n\t\f\v]`匹配，`\d`与数字`[0-9]`匹配，`\w`与单词字符`[a-zA-Z0-9_]`匹配。 `KotlinRegexNamedClass.kt` ```kt package com.zetcode fun main(args: Array<String>) { val text = "We met in 2013\. She must be now about 27 years old." val pattern = "\\d+".toRegex() val found = pattern.findAll(text) found.forEach { f -> val m = f.value println("$m") } } ``` 在示例中，我们在文本中搜索数字。 ```kt val pattern = "\\d+".toRegex() ``` `\d+`模式在文本中查找任意数量的数字集。 ```kt val found = pattern.findAll(text) ``` 用`findAll()`查找所有匹配项。 ```kt 2013 27 ``` 这是输出。 ## 捕获组捕获组是一种将多个字符视为一个单元的方法。通过将字符放置在一组圆括号内来创建它们。例如，`(book)`是包含`'b', 'o', 'o', 'k'`字符的单个组。捕获组技术使我们能够找出字符串中与正则表达式模式匹配的那些部分。 `KotlinRegexCapturingGroups.kt` ```kt package com.zetcode fun main(args: Array<String>) { val content = """<p>The <code>Pattern</code> is a compiled representation of a regular expression.</p>""" val pattern = "(<\\/?[a-z]*>)".toRegex() val found = pattern.findAll(content) found.forEach { f -> val m = f.value println("$m") } } ``` 该代码示例通过捕获一组字符来打印提供的字符串中的所有 HTML 标签。 ```kt val found = pattern.findAll(content) ``` 为了找到所有标签，我们使用`findAll()`方法。 ```kt <p> <code> </code> </p> ``` 我们找到了四个 HTML 标签。 ## Kotlin 正则表达式电子邮件示例在以下示例中，我们创建一个用于检查电子邮件地址的正则表达式模式。 `KotlinRegexEmails.kt` ```kt package com.zetcode fun main(args: Array<String>) { val emails = listOf("luke@gmail.com", "andy@yahoocom", "34234sdfa#2345", "f344@gmail.com", "dandy!@yahoo.com") val pattern = "[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\\.[a-zA-Z.]{2,18}".toRegex() emails.forEach { email -> if (pattern.matches(email)) { println("$email matches") } else { println("$email does not match") } } } ``` 本示例提供了一种可能的解决方案。 ```kt val pattern = "[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\\.[a-zA-Z.]{2,18}".toRegex() ``` 电子邮件分为五个部分。第一部分是本地部分。通常，它是公司名称，个人名称或昵称。 `[a-zA-Z0-9._-]+`列出了我们可以在本地部分使用的所有可能的字符。它们可以使用一次或多次。第二部分由文字`@`字符组成。第三部分是领域部分。通常是电子邮件提供商的域名，例如 yahoo 或 gmail。 `[a-zA-Z0-9-]+`是一个字符类，提供可在域名中使用的所有字符。 `+`量词允许使用这些字符中的一个或多个。第四部分是点字符。它前面是双转义字符（\\）以获取文字点。最后一部分是顶级域名：`[a-zA-Z.]{2,18}`。顶级域可以包含 2 到 18 个字符，例如`sk, net, info, travel, cleaning, travelinsurance`最大长度可以为 63 个字符，但是今天大多数域都少于 18 个字符。还有一个点字符。这是因为某些顶级域包含两个部分：例如`co.uk`。 ```kt luke@gmail.com matches andy@yahoocom does not match 34234sdfa#2345 does not match f344@gmail.com matches dandy!@yahoo.com does not match ``` 这是输出。在本章中，我们介绍了 Kotlin 中的正则表达式。您可能也对以下相关教程感兴趣： [Kotlin 范围教程](/kotlin/ranges/)和 [Kotlin 设置教程](/kotlin/sets/)。