企业🤖AI智能体构建引擎,智能编排和调试,一键部署,支持私有化部署方案 广告
●▲● **`array split(string str, string pat)`** -- 按照 pat 字符串分割 str,返回分割后的字符串数组 ```sql hive> select split('abtcdtef','t') from test; ["ab","cd","ef"] ``` ●▲● **`string concat(string A, string B, ...)`** -- 拼接字符串,返回字符串AB ```sql select concat('abc', 'def', 'ghf'); +------------+--+ | _c0 | +------------+--+ | abcdefghf | +------------+--+ select concat(id, name) from test; +------------+--+ | _c0 | +------------+--+ | 1zhangsan | | 2lisi | +------------+--+ ``` ●▲● **`string concat_ws(string SEP, string A, string B, string ...)`** - -- 拼接字符串,返回字符串AB -- SEP: A与B的分割符 ```sql select concat_ws(',', "abc", "def", "gh"); +-------------+--+ | _c0 | +-------------+--+ | abc,def,gh | +-------------+--+ ``` ●▲● **`string substr(string A, int start)`** **`string substr(string A, int start, int len)`** **`string substring(string A, int start)`** **`string substring(string A, int start, int len)`** -- 截取字符串A ```sql select substr('abcde', 3); +------+--+ | _c0 | +------+--+ | cde | +------+--+ select substr('abcde', -1); +------+--+ | _c0 | +------+--+ | e | +------+--+ ``` ●▲● **`int instr(string str, string substr)`** -- 返回substr在str首次出现的位置 ```sql select instr("abcdef", "de"); +------+--+ | _c0 | +------+--+ | 4 | +------+--+ ``` ●▲● **int lenth(string A)`** -- 返回字符串A的长度 ```sql select length("abc"); +------+--+ | _c0 | +------+--+ | 3 | +------+--+ ``` ●▲● **`int locate(string substr, string str[, int pos])`** -- 返回substr在str中首次出现的位置,从pos开始查找 ```sql select locate('a', 'abcda'); +------+--+ | _c0 | +------+--+ | 1 | +------+--+ select locate('a', 'abcda', 2); +------+--+ | _c0 | +------+--+ | 5 | +------+--+ ``` ●▲● **`string upper(string A)`** **`string ucase(string A)`** -- 都是将字符串A转换为大写 ```sql select upper('abcdE'); +--------+--+ | _c0 | +--------+--+ | ABCDE | +--------+--+ select ucase('abcdE'); +--------+--+ | _c0 | +--------+--+ | ABCDE | +--------+--+ ``` <br/> ●▲● **`string lower(string A)`** **`string lcase(string A)`** -- 都是将字符串转换为小写 ```sql select lower('ABCDe'); +--------+--+ | _c0 | +--------+--+ | abcde | +--------+--+ select lcase('ABcD5'); +--------+--+ | _c0 | +--------+--+ | abcd5 | +--------+--+ ``` <br/> ●▲● **`string trim(string A)`** 去除字符串A两边的空格; `string ltrim(string A)` 去除字符串A左边空格; `string rtrim(string A)`去除字符串A右边空格; ```sql select trim(' ab c '); +-------+--+ | _c0 | +-------+--+ | ab c | +-------+--+ ``` ●▲● **`string regexp_replace(string A, string regx, string C)`** -- 将A中符合java正则表达式 regx 的部分用C替换 ```sql select regexp_replace('foobar', 'oo|ar', ''); +------+--+ | _c0 | +------+--+ | fb | +------+--+ ``` ●▲● **`string regexp_extract(string subject, string pattern, int index)`** -- 将subject按照pattern正则规则拆分,返回拆分后index位置的字符 ```sql select regexp_extract('foothebar','foo(.*?)(bar)', 2); +------+--+ | _c0 | +------+--+ | bar | +------+--+ ``` 注意:在有些情况下要使用转义字符,下面的等号要用双竖线转义,这是 java 正则表达式的规则。 ```sql select data_field, regexp_extract(data_field,'.*?bgStart\\=([^&]+)',1) as aaa, regexp_extract(data_field,'.*?contentLoaded_headStart\\=([^&]+)',1)as bbb, regexp_extract(data_field,'.*?AppLoad2Req\\=([^&]+)',1) asccc from pt_nginx_loginlog_st where pt = '2012-03-26' limit 2; ``` ●▲●`int ascii(string str)`查看str字符串中第一个字符的ascii值 ```sql select ascii('ab') as ascii; +--------+--+ | ascii | +--------+--+ | 97 | +--------+--+ select ascii(name) as ascii from test; +--------+--+ | ascii | +--------+--+ | 122 | | 108 | +--------+--+ ``` ●▲●`string base64(binary bin)`返回二进制bin的base64编码格式 ```sql select base64(binary('test')); +-----------+--+ | _c0 | +-----------+--+ | dGVzdA== | +-----------+--+ select base64(binary(name)) from test; +---------------+--+ | _c0 | +---------------+--+ | emhhbmdzYW4= | | bGlzaQ== | +---------------+--+ ``` ●▲●`string concat_ws(string SEP, array<string>)` -- 将数组以字符串返回 -- SEP 字符串元素分割符 ```sql select concat_ws('|', array('a', 'b', 'cd')); +---------+--+ | _c0 | +---------+--+ | a|b|cd | +---------+--+ -- 列转行 select user_id, concat_ws(",", collect_list(order_id)) as order_value from col_lie group by user_id limit 10; ``` ●▲●`string format_number(number x, int d)` -- 将x保留d位小数,四舍五入 ```sql select format_number(5.23456, 3); +--------+--+ | _c0 | +--------+--+ | 5.235 | +--------+--+ ``` <br/> ●▲●`map<string, string> str_to_map(text[, delimiter1, delimiter2])` -- 将字符串text按照给定的分割符转换为map结构 ```sql select str_to_map('key1:value1,key2:value2'); +------------------------------------+--+ | _c0 | +------------------------------------+--+ | {"key1":"value1","key2":"value2"} | +------------------------------------+--+ select str_to_map('k1=v1,k2=v2', ',', '='); +------------------------+--+ | _c0 | +------------------------+--+ | {"k1":"v1","k2":"v2"} | +------------------------+--+ ``` ●▲●`string printf(string format, obj... args)` -- 将obj进行format格式化 ```sql select printf("%08X", 123); +-----------+--+ | _c0 | +-----------+--+ | 0000007B | +-----------+--+ ``` ●▲●`binary unbase64(string base64str)` -- 将base64str解码为二进制 ```sql select base64(binary("hive")); +-----------+--+ | _c0 | +-----------+--+ | aGl2ZQ== | +-----------+--+ select unbase64('aGl2ZQ=='); +-------+--+ | _c0 | +-------+--+ | hive | +-------+--+ ``` <br/> ●▲● `string parse_url(string url, string partToExtract[, string keyToExtract])` -- 返回URL中指定的部分 -- partToExtract取值有`HOST`, `PATH`, `QUERY`,`REF`,`PROTOCOL`, `AUTHORITY`, `FILE`,`USERINFO`. ```sql select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST'); +---------------+--+ | _c0 | +---------------+--+ | facebook.com | +---------------+--+ select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY','k1'); +------+--+ | _c0 | +------+--+ | v1 | +------+--+ ``` ●▲●`string get_json_object(string json_str, string path)` -- 返回json_str中与path匹配的内容,如果没有匹配返回NULL ```sql select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" } ','$.owner'); +------+--+ | _c0 | +------+--+ | amy | +------+--+ ``` ●▲●`string space(int n)`返回长度为 n 的字符串 ```sql hive> select space(10) from test; hive> select length(space(10)) from test; 10 ``` ●▲●`string repeat(string str, int n)`返回重复 n 次后的 str 字符串 ```sql hive> select repeat('abc',5) from test; abcabcabcabcabc ``` ●▲●`string lpad(string str, int len, string pad)` -- 将 str 进行用 pad 进行左补足到 len 位 ```sql hive> select lpad('abc',10,'td') from test; tdtdtdtabc ``` 注意:与 GP,ORACLE 不同,pad 不能默认 <br/> ●▲●`string rpad(string str, int len, string pad)` -- 将 str 进行用 pad 进行右补足到 len 位 ``` hive> select rpad('abc',10,'td') from test; abctdtdtdt ``` ●▲●`int find_in_set(string str, string strList)` -- 返回 str 在 strlist 第一次出现的位置,strlist 是用逗号分割的字符串。如果没有找到该 str 字符,则返回 0 ```sql hive> select find_in_set('ab','ef,ab,de') from test; 2 hive> select find_in_set('at','ef,ab,de') from test; 0 ``` ●▲●`array<array<string>> sentences(string str, string lang, string locale)` -- 返回输入 str 分词后的单词数组 ```sql hive> select sentences('hello word!hello hive,hi hive,hellohive') from test; OK [["hello","word"],["hello","hive","hi","hive","hello","hive"]] ``` ●▲●`array<struct<string, double>> ngrams(array<array<string>>, int N, intK, int pf)` -- 与 sentences()函数一起使用,分词后,统计分词结果中一起出现频次最高的 TOP-K 结果 ```sql hive> SELECT ngrams(sentences('hello word!hellohive,hi hive,hello hive'),2,2) FROM test; [{"ngram":["hello","hive"],"estfrequency":2.0},{"ngram":["hive","hello"],"estfreq uency":1.0}] ``` 该查询中,统计的是两个词在一起出现频次最高的 TOP-2; 结果中,hello 与 hive 同时出现 2 次 ●▲● ``` array<struct<string,double>> context_ngrams(array<array<string>>,array<string>, int K, int pf) ``` -- 与 sentences()函数一起使用,分词后,统计分词结果中与数组中指定的单词一起出现(包括顺序)频次最高的 TOP-K 结果 ```sql hive> SELECT context_ngrams(sentences('helloword!hello hive,hi hive,hello hive') ,array('hello',null),3) FROM test; [{"ngram":["hive"],"estfrequency":2.0},{"ngram":["word"],"estfrequency":1.0}] ``` 该查询中,统计的是与’hello’一起出现,并且在 hello 后面的频次最高的TOP-3; 结果中,hello 与 hive 同时出现 2 次,hello 与 word 同时出现 1 次。 ```sql hive> SELECT context_ngrams(sentences('hello word!hello hive,hi hive,hello hive') ,array(null,'hive'),3) FROM test; [{"ngram":["hello"],"estfrequency":2.0},{"ngram":["hi"],"estfrequency":1.0}] ``` 该查询中,统计的是与’hive’一起出现,并且在 hive 之前的频次最高的 TOP-3