🔥码云GVP开源项目 12k star Uniapp+ElementUI 功能强大 支持多语言、二开方便! 广告
**1. `order by`特点** (1)因为只使用一个reducer执行,所以<mark>全局有序</mark>; (2)当输入规模较大时,需要较长的计算时间,应提前做好数据过滤; (3)支持使用`case when`,或表达式; (4)支持按位置编号排序,设置`set hive.groupby.orderby.position.alias=true;`,默认就是`true`; **2. 基本语法** ```sql -- order by 放在select语句的末尾 select * from table_name ... order by col_name, col_name, ... [asc(升序,默认)|desc(降序)] ``` **3. `order by`使用示例** ```sql -- 查询员工,按sal升序排序 select * from emp order by sal; +------------+------------+------------+----------+---------------+----------+-----------+-------------+--+ | emp.empno | emp.ename | emp.job | emp.mgr | emp.hiredate | emp.sal | emp.comm | emp.deptno | +------------+------------+------------+----------+---------------+----------+-----------+-------------+--+ | 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800.0 | NULL | 20 | | 7900 | JAMES | CLERK | 7698 | 1981-12-3 | 950.0 | NULL | 30 | | 7876 | ADAMS | CLERK | 7788 | 1987-5-23 | 1100.0 | NULL | 20 | | 7521 | WARD | SALESMAN | 7698 | 1981-2-22 | 1250.0 | 500.0 | 30 | -- 查询员工,按sal降序排序 select * from emp order by sal desc; +------------+------------+------------+----------+---------------+----------+-----------+-------------+--+ | emp.empno | emp.ename | emp.job | emp.mgr | emp.hiredate | emp.sal | emp.comm | emp.deptno | +------------+------------+------------+----------+---------------+----------+-----------+-------------+--+ | 7839 | KING | PRESIDENT | NULL | 1981-11-17 | 5000.0 | NULL | 10 | | 7902 | FORD | ANALYST | 7566 | 1981-12-3 | 3000.0 | NULL | 20 | | 7788 | SCOTT | ANALYST | 7566 | 1987-4-19 | 3000.0 | NULL | 20 | | 7566 | JONES | MANAGER | 7839 | 1981-4-2 | 2975.0 | NULL | 20 | -- 使用列的别名来做排序 select ename, sal*2 twosal from emp order by twosal; +---------+----------+--+ | ename | twosal | +---------+----------+--+ | SMITH | 1600.0 | | JAMES | 1900.0 | | ADAMS | 2200.0 | | WARD | 2500.0 | -- 多列排序 -- 先按照deptno排序,再按照sal排序 select ename, deptno, sal from emp order by deptno, sal; +---------+---------+---------+--+ | ename | deptno | sal | +---------+---------+---------+--+ | MILLER | 10 | 1300.0 | | CLARK | 10 | 2450.0 | | KING | 10 | 5000.0 | | SMITH | 20 | 800.0 | | ADAMS | 20 | 1100.0 | | JONES | 20 | 2975.0 | | SCOTT | 20 | 3000.0 | | FORD | 20 | 3000.0 | -- 按照位置编号进行排序 -- 按照第3列进行降序排序 select * from left_tbl order by 3 desc; +--------------+----------------+---------------+--+ | left_tbl.id | left_tbl.name | left_tbl.age | +--------------+----------------+---------------+--+ | 3 | 王五 | 25 | | 2 | 李四 | 22 | | 1 | 张三 | 20 | +--------------+----------------+---------------+--+ -- 使用case when,很少使用,暂时不知道什么效果 select * from left_tbl order by case when id>2 then 1 else 0 end; ``` **4. `hive.mapred.mode`属性**: ```sql -- 如果为nonstrict,默认,limit是可选的 0: jdbc:hive2://hadoop101:10000> set hive.mapred.mode=nostrict; select * from emp order by sal; -- 如果为strict,limit是必选的 0: jdbc:hive2://hadoop101:10000> set hive.mapred.mode=strict; select * from emp order by sal limit 5; ``` 这是因为在`order by`状态下所有的数据会在一台服务器进行reduce操作,即只有一台reduce在工作,如果在数据量大的情况下会出现无法输出的情况,如果进行`limit n`,则有`n * map number`条记录传入到reduce端,即使只有一个reduce也能够处理过来。