# package utf8

`import "unicode/utf8"`

Package utf8 implements functions and constants to support text encoded in UTF-8, including functions to translate between runes and UTF-8 byte sequences.

## Index

* [Constants](#pkg-constants)
* [func ValidRune(r rune) bool](#ValidRune)
* [func RuneLen(r rune) int](#RuneLen)
* [func RuneStart(b byte) bool](#RuneStart)
* [func FullRune(p []byte) bool](#FullRune)
* [func FullRuneInString(s string) bool](#FullRuneInString)
* [func Valid(p []byte) bool](#Valid)
* [func ValidString(s string) bool](#ValidString)
* [func RuneCount(p []byte) int](#RuneCount)
* [func RuneCountInString(s string) int](#RuneCountInString)
* [func EncodeRune(p []byte, r rune) int](#EncodeRune)
* [func DecodeRune(p []byte) (r rune, size int)](#DecodeRune)
* [func DecodeRuneInString(s string) (r rune, size int)](#DecodeRuneInString)
* [func DecodeLastRune(p []byte) (r rune, size int)](#DecodeLastRune)
* [func DecodeLastRuneInString(s string) (r rune, size int)](#DecodeLastRuneInString)

### Examples

* [DecodeLastRune](#example-DecodeLastRune)
* [DecodeLastRuneInString](#example-DecodeLastRuneInString)
* [DecodeRune](#example-DecodeRune)
* [DecodeRuneInString](#example-DecodeRuneInString)
* [EncodeRune](#example-EncodeRune)
* [FullRune](#example-FullRune)
* [FullRuneInString](#example-FullRuneInString)
* [RuneCount](#example-RuneCount)
* [RuneCountInString](#example-RuneCountInString)
* [RuneLen](#example-RuneLen)
* [RuneStart](#example-RuneStart)
* [Valid](#example-Valid)
* [ValidRune](#example-ValidRune)
* [ValidString](#example-ValidString)

## Constants

```
const (
	RuneError = '\uFFFD'     // the "error" rune, or "Unicode replacement character"
	RuneSelf  = 0x80         // characters below RuneSelf are represented as themselves in a single byte
	MaxRune   = '\U0010FFFF' // maximum valid Unicode code point
	UTFMax    = 4            // maximum number of bytes in a UTF-8 encoded Unicode character
)
```

Basic constants of the encoding.

## func [ValidRune](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L425 "View Source")

```
func ValidRune(r rune) bool
```

ValidRune reports whether r can be legally encoded as UTF-8.

Example

```
valid := 'a'
invalid := rune(0xfffffff)
fmt.Println(utf8.ValidRune(valid))
fmt.Println(utf8.ValidRune(invalid))
```
Output:

```
true
false
```

## func [RuneLen](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L310 "View Source")

```
func RuneLen(r rune) int
```

RuneLen returns the number of bytes required to encode r. It returns -1 if r is not a valid value to encode in UTF-8.

Example

```
fmt.Println(utf8.RuneLen('a'))
fmt.Println(utf8.RuneLen('界'))
```

Output:

```
1
3
```

## func [RuneStart](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L384 "View Source")

```
func RuneStart(b byte) bool
```

RuneStart reports whether the byte b could be the first byte of an encoded rune. Second and subsequent bytes always have their top two bits set to 10.

Example

```
buf := []byte("a界")
fmt.Println(utf8.RuneStart(buf[0]))
fmt.Println(utf8.RuneStart(buf[1]))
fmt.Println(utf8.RuneStart(buf[2]))
```

Output:

```
true
true
false
```

## func [FullRune](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L203 "View Source")

```
func FullRune(p []byte) bool
```

FullRune reports whether the bytes in p begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune, since it will convert as a width-1 error rune.

Example

```
buf := []byte{228, 184, 150} // 世
fmt.Println(utf8.FullRune(buf))
fmt.Println(utf8.FullRune(buf[:2]))
```

Output:

```
true
false
```

## func [FullRuneInString](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L209 "View Source")

```
func FullRuneInString(s string) bool
```

FullRuneInString is like FullRune but its input is a string.

Example

```
str := "世"
fmt.Println(utf8.FullRuneInString(str))
fmt.Println(utf8.FullRuneInString(str[:2]))
```

Output:

```
true
false
```

## func [RuneCount](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L359 "View Source")

```
func RuneCount(p []byte) int
```

RuneCount returns the number of runes in p. Erroneous and short encodings are treated as single runes of width 1 byte.

Example

```
buf := []byte("Hello, 世界")
fmt.Println("bytes =", len(buf))
fmt.Println("runes =", utf8.RuneCount(buf))
```

Output:

```
bytes = 13
runes = 9
```

## func [RuneCountInString](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L374 "View Source")

```
func RuneCountInString(s string) (n int)
```

RuneCountInString is like RuneCount but its input is a string.

Example

```
str := "Hello, 世界"
fmt.Println("bytes =", len(str))
fmt.Println("runes =", utf8.RuneCountInString(str))
```

Output:

```
bytes = 13
runes = 9
```

## func [Valid](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L387 "View Source")

```
func Valid(p []byte) bool
```

Valid reports whether p consists entirely of complete, valid UTF-8 encoded runes.

Example

```
valid := []byte("Hello, 世界")
invalid := []byte{0xff, 0xfe, 0xfd}
fmt.Println(utf8.Valid(valid))
fmt.Println(utf8.Valid(invalid))
```

Output:

```
true
false
```

## func [ValidString](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L407 "View Source")

```
func ValidString(s string) bool
```

ValidString reports whether s consists entirely of complete, valid UTF-8 encoded runes.

Example

```
valid := "Hello, 世界"
invalid := string([]byte{0xff, 0xfe, 0xfd})
fmt.Println(utf8.ValidString(valid))
fmt.Println(utf8.ValidString(invalid))
```

Output:

```
true
false
```

## func [EncodeRune](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L330 "View Source")

```
func EncodeRune(p []byte, r rune) int
```

EncodeRune writes the UTF-8 encoding of r into p (which must be large enough) and returns the number of bytes written.

Example

```
r := '世'
buf := make([]byte, 3)
n := utf8.EncodeRune(buf, r)
fmt.Println(buf)
fmt.Println(n)
```

Output:

```
[228 184 150]
3
```

## func [DecodeRune](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L219 "View Source")

```
func DecodeRune(p []byte) (r rune, size int)
```

DecodeRune unpacks the first UTF-8 encoding in p and returns the rune and its width in bytes. If p is empty it returns (RuneError, 0). Otherwise, if the encoding is invalid, it returns (RuneError, 1). Neither result can occur for correct, non-empty UTF-8.

An encoding is invalid if it is incorrect UTF-8, encodes a rune that is out of range, or is not the shortest possible UTF-8 encoding for the value. No other validation is performed.

Example

```
b := []byte("Hello, 世界")
for len(b) > 0 {
	r, size := utf8.DecodeRune(b)
	fmt.Printf("%c %v\n", r, size)
	b = b[size:]
}
```

Output:

```
H 1
e 1
l 1
l 1
o 1
, 1
  1
世 3
界 3
```

## func [DecodeRuneInString](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L229 "View Source")

```
func DecodeRuneInString(s string) (r rune, size int)
```

DecodeRuneInString is like DecodeRune but its input is a string.

Example

```
str := "Hello, 世界"
for len(str) > 0 {
	r, size := utf8.DecodeRuneInString(str)
	fmt.Printf("%c %v\n", r, size)
	str = str[size:]
}
```

Output:

```
H 1
e 1
l 1
l 1
o 1
, 1
  1
世 3
界 3
```

## func [DecodeLastRune](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L239 "View Source")

```
func DecodeLastRune(p []byte) (r rune, size int)
```

DecodeLastRune unpacks the last UTF-8 encoding in p and returns the rune and its width in bytes. Empty and invalid input are reported as in DecodeRune: (RuneError, 0) and (RuneError, 1) respectively.

Example

```
b := []byte("Hello, 世界")
for len(b) > 0 {
	r, size := utf8.DecodeLastRune(b)
	fmt.Printf("%c %v\n", r, size)
	b = b[:len(b)-size]
}
```

Output:

```
界 3
世 3
  1
, 1
o 1
l 1
l 1
e 1
H 1
```

## func [DecodeLastRuneInString](https://github.com/golang/go/blob/master/src/unicode/utf8/utf8.go#L276 "View Source")

```
func DecodeLastRuneInString(s string) (r rune, size int)
```

DecodeLastRuneInString is like DecodeLastRune but its input is a string.

Example

```
str := "Hello, 世界"
for len(str) > 0 {
	r, size := utf8.DecodeLastRuneInString(str)
	fmt.Printf("%c %v\n", r, size)
	str = str[:len(str)-size]
}
```

Output:

```
界 3
世 3
  1
, 1
o 1
l 1
l 1
e 1
H 1
```