💎一站式轻松地调用各大LLM模型接口,支持GPT4、智谱、星火、月之暗面及文生图 广告
<div id="article_content" class="article_content clearfix"> <link rel="stylesheet" href="https://csdnimg.cn/release/blogv2/dist/mdeditor/css/editerView/kdoc_html_views-1a98987dfd.css"> <link rel="stylesheet" href="https://csdnimg.cn/release/blogv2/dist/mdeditor/css/editerView/ck_htmledit_views-25cebea3f9.css"> <div id="content_views" class="htmledit_views"> <h1><a name="t0"></a>1、ClicHouse语法优化规则</h1> <p></p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="https://so.csdn.net/so/search?q=ClickHouse&amp;spm=1001.2101.3001.7020" target="_blank" class="hl hl-1" data-report-click="{&quot;spm&quot;:&quot;1001.2101.3001.7020&quot;,&quot;dest&quot;:&quot;https://so.csdn.net/so/search?q=ClickHouse&amp;spm=1001.2101.3001.7020&quot;,&quot;extra&quot;:&quot;{\&quot;searchword\&quot;:\&quot;ClickHouse\&quot;}&quot;}" data-tit="ClickHouse" data-pretit="clickhouse">ClickHouse</a>的SQL优化规则是基于RBO(Rule Based Optimization),下面是一些优化规则。</p> <h2><a name="t1"></a>1.1、COUNT优化</h2> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;在调用count函数时,如果使用的是count()或者count(*),且没有where条件,则会直接使用system.tables的total_rows,例如:</p> <p style="text-align:center;"><img alt="" height="507" src="https://img-blog.csdnimg.cn/95c125b1508c47899eac0609d6e278df.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_18,color_FFFFFF,t_70,g_se,x_16" width="576"></p> <p></p> <p></p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;注意:Optimized trivial count,这是对count的优化。</p> <h2><a name="t2"></a>1.2、消除子查询重复字段</h2> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;语句子查询中有两个重复的id字段,会被去重。</p> <h2><a name="t3"></a>1.3、谓词下推</h2> <p>&nbsp; &nbsp; &nbsp; &nbsp; 当group by有having子句,但是没有with cube、with rollup或者with totals修饰的时候,having会下推到where提前过滤。</p> <pre data-index="0"><code class="language-sql hljs">EXPLAIN SYNTAX <span class="hljs-keyword">SELECT</span> UserID <span class="hljs-keyword">FROM</span> hits_v1 <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> UserID <span class="hljs-keyword">HAVING</span> UserID<span class="hljs-operator">=</span><span class="hljs-string">'123456789'</span>;</code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p style="text-align:center;"><img alt="" height="281" src="https://img-blog.csdnimg.cn/94db83a9631c4386baf1a76e9f58ff09.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_20,color_FFFFFF,t_70,g_se,x_16" width="673"></p> <h2><a name="t4"></a>1.4、聚合计算外推</h2> <p>&nbsp; &nbsp; &nbsp; &nbsp; 聚合函数内的计算,会外推,如下:</p> <pre data-index="1"><code class="language-sql hljs">EXPLAIN SYNTAX <span class="hljs-keyword">SELECT</span> <span class="hljs-built_in">sum</span>(UserID <span class="hljs-operator">*</span> <span class="hljs-number">2</span>) <span class="hljs-keyword">FROM</span> visits_v1;</code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p style="text-align:center;"><img alt="" height="223" src="https://img-blog.csdnimg.cn/655b2e0198994b2e911e70efb7b44753.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_14,color_FFFFFF,t_70,g_se,x_16" width="463"></p> <h2><a name="t5"></a>1.5、聚合函数消除</h2> <p>&nbsp; &nbsp; &nbsp; &nbsp; 如果对聚合键,也就是group by key使用min、max等聚合函数,则会将函数消除。</p> <pre data-index="2"><code class="language-sql hljs">EXPLAIN SYNTAX <span class="hljs-keyword">SELECT</span> <span class="hljs-built_in">sum</span>(UserID <span class="hljs-operator">*</span> <span class="hljs-number">2</span>),<span class="hljs-built_in">max</span>(VisitID),<span class="hljs-built_in">max</span>(UserID) <span class="hljs-keyword">FROM</span> visits_v1 <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> UserID;</code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p><img alt="" height="340" src="https://img-blog.csdnimg.cn/027a2031177141d599fed92891122536.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_20,color_FFFFFF,t_70,g_se,x_16" width="745"></p> <h2><a name="t6"></a>1.6、删除重复的order by key、删除重复的limit by key</h2> <p>&nbsp; &nbsp; &nbsp; &nbsp; 当order by、limit后有跟相同的key,SYNTAX会提示可进行优化。</p> <h2><a name="t7"></a>1.7、三元运算优化</h2> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set optimize_if_chain_to_multiif = 1;</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;或者直接使用如下SQL:</p> <pre data-index="3"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">-- 查看语法优化</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">EXPLAIN SYNTAX <span class="hljs-keyword">SELECT</span> number <span class="hljs-operator">=</span><span class="hljs-number">1</span>? <span class="hljs-string">'hello'</span>:(number<span class="hljs-operator">=</span><span class="hljs-number">2</span>?<span class="hljs-string">'world'</span>:<span class="hljs-string">'abcdef'</span>) <span class="hljs-keyword">from</span> numbers(<span class="hljs-number">10</span>)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">settings optimize_if_chain_to_multiif <span class="hljs-operator">=</span> <span class="hljs-number">1</span>;</div></div></li></ol></code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <h1><a name="t8"></a>2、查询优化</h1> <h2><a name="t9"></a>2.1、单表查询</h2> <h3><a name="t10"></a>2.1.1、Prewhere替代where</h3> <p>&nbsp; &nbsp; &nbsp; &nbsp; Prewhere和where语句的作用相同,用来过滤数据。不同之处在于prewhere只支持*MergeTree族系列引擎的表,首先会读取指定的列数据,来判断数据过滤,等待数据过滤之后再读取select声明的列字段来补全其余属性。</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; 当查询列明显多于筛选列时使用PreWhere可十倍提升查询性能,Prewhere会自动优化执行过滤阶段的数据读取方式,降低io操作。参数optimize_move_to_prewhere=1为开启状态,可以设置为0对其关闭。</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; 在某些场合下,prewhere语句比where语句处理的数据量更少性能更高。</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; 默认情况,我们肯定不会关闭where自动优化成prewhere,但是某些场景即使开启优化,也不会自动转换成prewhere,需要手动指定prewhere:</p> <ul><li>使用常量表达式</li><li>使用默认值为alias类型的字段</li><li>包含了arrayJOIN,globalIn,globalNotIn或者IndexHint的查询</li><li>select查询的列字段和where的谓词相同</li><li>使用了主键字段</li></ul> <h3><a name="t11"></a>2.1.2、数据采样</h3> <p>&nbsp; &nbsp; &nbsp; &nbsp; 通过采样运算可极大提升数据分析的性能</p> <pre data-index="4"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">select</span> Title,<span class="hljs-built_in">count</span>(<span class="hljs-operator">*</span>) <span class="hljs-keyword">As</span> PageViews </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">FROM</span> hits_v1</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">SAMPLE <span class="hljs-number">01.</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">WHERE</span> CounterID <span class="hljs-operator">=</span> <span class="hljs-number">57</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> Title</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> PageViews <span class="hljs-keyword">DESC</span> LIMIT <span class="hljs-number">1000</span></div></div></li></ol></code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;采样修饰符只有在MergeTree engine表中才有效,且在创建表时需要制定采样策略。</p> <h3><a name="t12"></a>2.1.3、列裁剪与分区裁剪</h3> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;数据量太大应避免使用select * 操作,查询的性能会与查询的字段大小和数量成线性关系,字段越少,消耗的io资源越少,性能就会越高。</p> <ul><li>列裁剪表示只选取所需要的列。</li><li>分区裁剪就是只读取需要的分区,在过滤条件中指定。</li></ul> <h3><a name="t13"></a>2.1.4、order by 结合where、limit使用</h3> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;千万以上数据集进行order by 查询时需要配合where条件和limit语句一起使用。</p> <h3><a name="t14"></a>2.1.5、避免构建虚拟列</h3> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;如非必须,不要在结果集上构建虚拟列,虚拟列非常消耗资源浪费性能,可以考虑在前端处理,或者在表中构造实际字段进行额外存储。</p> <h3><a name="t15"></a>2.1.6、uniqueCombined替代distinct</h3> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;性能可提升10倍以上,uniqCombined底层采用类似HyperLogLog算法实现,能接收2%左右的数据误差,可直接使用这种去重方式提升查询性能。Count(distinct)会使用uniqExact精确去重。</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;不建议在千万级不同数据上执行distinct去重查询,改为近似去重uniqCombined。</p> <h2><a name="t16"></a>2.2、 多表关联</h2> <h3><a name="t17"></a>2.2.1、用in代替join</h3> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;当多表联查时,查询的数据仅从其中一张表出事,可以考虑用in操作而不是join</p> <h3><a name="t18"></a>2.2.2、大小表join</h3> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;多表join时要满足小表在右的原则,右表关联时被加载到内存中与左表进行比较,ClickHouse中无论是Left join、Right Join还是Inner join永远都是拿着右表中的每一条记录到左表中查找该记录是否存在,所以右表必须是小表。</p> <h3><a name="t19"></a>2.2.3、注意谓词下推(版本差异)</h3> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ClikHouses在join查询时不会主动发起谓词下推的操作,需要每个子查询提前完成过滤操作,需要注意的是,是否执行谓词下推,对性能影响差别很大(新版本中已经不存在此问题,但是需要注意谓词的位置不用依然有性能的差异)。</p> <h3><a name="t20"></a>2.2.4、分布式表使用GLOBAL</h3> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;两张分布式表的in和join之前必须加上GLOBAL关键字,右表只会在接收查询请求的那个节点查询一次,并将其分发到其它节点上,如果不加GLOBAL关键字的话,每个节点都会单独发起一次对右表的查询,而右表又是分布式表,就导致右表一共会被查询次(N是该分布式表的分片数量),这就是查询放大,会带来很大开销。</p> <h3><a name="t21"></a>2.2.5、使用字典表</h3> <p>&nbsp; &nbsp; &nbsp; &nbsp; 将一些需要关联分析的业务创建成字典表进行join操作,前提是字典表不宜太大,因为字典表会常驻内存。</p> <h3><a name="t22"></a>2.2.6、提前过滤</h3> <p>&nbsp; &nbsp; &nbsp; &nbsp; 通过增加逻辑过滤可以减少数据扫描,达到提供执行速度及降低内存消息的目的。</p> <h1><a name="t23"></a>3、数据一致性</h1> <p>&nbsp; &nbsp; &nbsp; &nbsp; ReplacingMergeeTree:数据的去重只会在数据合并期间进行。合并会在后台一个不确定的时间进行,因此你无法预先作出计划。有一些数据可能仍未被处理。集暖你可以调用optimize语句发起计划外的合并,但请不要依靠它,因为OPTIMIZE语句会引发对护具的大量读写。</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; 因此,ReplacingMergeTree适用于在后台清除重复的数据以节省空间,但是它不保证没有重复的数据出现。</p> <h2><a name="t24"></a>3.1、案例</h2> <p>&nbsp; &nbsp; &nbsp; &nbsp; 1、创建表</p> <pre data-index="5"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> test_a(</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> user_id UInt64,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> score String,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> deleted UInt8 <span class="hljs-keyword">DEFAULT</span> <span class="hljs-number">0</span>,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> create_time DateTime <span class="hljs-keyword">DEFAULT</span> toDateTime(<span class="hljs-number">0</span>)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">)ENGINE <span class="hljs-operator">=</span> ReplacingMergeTree(create_time)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> user_id;</div></div></li></ol></code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p>&nbsp; &nbsp; &nbsp; &nbsp; 其中:</p> <ul><li>user_id是数据去重更新的标识;</li><li>create_time是版本号字段,每组数据中create_time最大的一行表示最新的数据;</li><li>delete是自定义的一个标记位,比如0代表为删除,1代表删除数据。</li></ul> <p>&nbsp; &nbsp; &nbsp; &nbsp; 2、写入与修改数据</p> <pre data-index="6"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">TABLE</span> test_a(user_id,score)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">WITH</span>(</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> <span class="hljs-keyword">SELECT</span> [<span class="hljs-string">'A'</span>,<span class="hljs-string">'B'</span>,<span class="hljs-string">'C'</span>,<span class="hljs-string">'D'</span>,<span class="hljs-string">'E'</span>,<span class="hljs-string">'F'</span>,<span class="hljs-string">'G'</span>]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">)<span class="hljs-keyword">AS</span> dict</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">SELECT</span> number <span class="hljs-keyword">AS</span> user_id,dict[number<span class="hljs-operator">%</span><span class="hljs-number">7</span><span class="hljs-operator">+</span><span class="hljs-number">1</span>] <span class="hljs-keyword">FROM</span> numbers(<span class="hljs-number">10000</span>);</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">TABLE</span> test_a(user_id,score,create_time)</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="9"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">WITH</span>(</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="10"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> <span class="hljs-keyword">SELECT</span> [<span class="hljs-string">'AA'</span>,<span class="hljs-string">'BB'</span>,<span class="hljs-string">'CC'</span>,<span class="hljs-string">'DD'</span>,<span class="hljs-string">'EE'</span>,<span class="hljs-string">'FF'</span>,<span class="hljs-string">'GG'</span>]</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="11"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">)<span class="hljs-keyword">AS</span> dict</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="12"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">SELECT</span> number <span class="hljs-keyword">AS</span> user_id,dict[number<span class="hljs-operator">%</span><span class="hljs-number">7</span><span class="hljs-operator">+</span><span class="hljs-number">1</span>],now() <span class="hljs-keyword">as</span> create_time <span class="hljs-keyword">FROM</span> numbers(<span class="hljs-number">5000</span>);</div></div></li></ol></code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p>&nbsp; &nbsp; &nbsp; &nbsp; 3、此时尚未触发分区合并,所以还未去重。</p> <h2><a name="t25"></a>3.2、手动OPTIMIZE</h2> <p>&nbsp; &nbsp; &nbsp; &nbsp; 在写入数据后,立刻执行OPTIMIZE强制触发新写入分区的合并动作</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; 语法:OPTIMIZE TABLE [db.]name [ON CLUSTER cluster]&nbsp; [PARTITION partition | PARTITION ID 'partition_id'] [FINAL] [DEDUPLICATE [BY expression]]</p> <pre data-index="7"><code class="language-sql hljs">OPTIMIZE <span class="hljs-keyword">TABLE</span> test_a <span class="hljs-keyword">FINAL</span>;</code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <h2><a name="t26"></a>3.3、通过Group by 去重</h2> <p>&nbsp; &nbsp; &nbsp; &nbsp; 1、执行去重的查询</p> <pre data-index="8"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">SELECT</span> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> user_id,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> argMax(score,create_time) <span class="hljs-keyword">AS</span> score,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> argMax(deleted,create_time) <span class="hljs-keyword">AS</span> deleted,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> <span class="hljs-built_in">max</span>(create_time) <span class="hljs-keyword">AS</span> ctime</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">FROM</span> test_a</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> user_id</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">HAVING</span> deleted <span class="hljs-operator">=</span> <span class="hljs-number">0</span>;</div></div></li></ol></code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p>&nbsp; &nbsp; &nbsp; &nbsp; 函数说明:</p> <ul><li>argMax(field1,field2):按照field2的最大值取field1的值。</li></ul> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;当我们更新数据时,会写入一行新的数据,例如上面语句中,通过查询最大的create_time得到修改后的score字段值。</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; 2、创建视图</p> <pre data-index="9"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">VIEW</span> view_test_a <span class="hljs-keyword">AS</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">SELECT</span> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> user_id,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> argMax(score,create_time) <span class="hljs-keyword">AS</span> score,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> argMax(deleted,create_time) <span class="hljs-keyword">AS</span> deleted,</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> <span class="hljs-built_in">max</span>(create_time) <span class="hljs-keyword">AS</span> ctime</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">FROM</span> test_a</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> user_id</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="9"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">HAVING</span> deleted <span class="hljs-operator">=</span> <span class="hljs-number">0</span>;</div></div></li></ol></code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p>&nbsp; &nbsp; &nbsp; &nbsp; 3、删除数据测试</p> <pre data-index="10"><code class="language-sql hljs"><ol class="hljs-ln" style="width:100%"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">-- 原有1W条数据</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">select</span> <span class="hljs-built_in">count</span>(<span class="hljs-operator">*</span>) <span class="hljs-keyword">from</span> test_a;</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">-- 插入并查询,原表有10001条数据,视图表有10000条数据</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">TABLE</span> test_a(user_id,score,create_time) <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">0</span>,<span class="hljs-string">'AAAA'</span>,now());</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"> </div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-comment">-- 插入并查询,原表有10002条数据,视图表有9999条数据</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> <span class="hljs-keyword">TABLE</span> test_a(user_id,score,deleted,create_time) <span class="hljs-keyword">VALUES</span> (<span class="hljs-number">0</span>,<span class="hljs-string">'AAAA'</span>,<span class="hljs-number">1</span>,now());</div></div></li></ol></code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p style="text-align:center;"><img alt="" height="547" src="https://img-blog.csdnimg.cn/cf2faafcd5824ac4bc01b87a8f207433.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAWXVhbl9DU0RG,size_20,color_FFFFFF,t_70,g_se,x_16" width="829"></p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;这行数据并没有真正的被删除,而是被过滤掉了。在一些合适场景下,可以结合表级别的TTL最终将物理数据删除。</p> <h2><a name="t27"></a>3.4、通过FINAL查询</h2> <p>&nbsp; &nbsp; &nbsp; &nbsp; 在查询语句后增加FINAL修饰符,这样在查询的过程中将会执行Merge的特殊逻辑(例如数据去重,预聚合等)。</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; 但是这种方法在早期版本基本没有人使用,因为在增加FINAL之后,我们的查询将会变成一个单线程的执行过程,查询速度非常慢。</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; <span style="color:#fe2c24;"><strong>在v20.5.2.7-stable版本中,FINAL查询是支持多线程执行</strong>,</span><span style="color:#0d0016;">并且可以通过</span><span style="color:#fe2c24;">max_final_threads参数</span><span style="color:#0d0016;">控制单个查询的线程数。但是目前读取part部分的动作依然是串行的。</span></p> <p><span style="color:#0d0016;">&nbsp; &nbsp; &nbsp; &nbsp; FIANL查询最终的性能和很多因素相关,列字段的大小、分区的数量等等都会影响到最终的查询时间,所以还要结合实际场景取舍。</span></p> <p><span style="color:#0d0016;">&nbsp; &nbsp; &nbsp; &nbsp; 参考链接:https://github.com/ClickHouse/ClickHouse/pull/10463</span></p> <h1><a name="t28"></a><span style="color:#0d0016;">4、物化视图</span></h1> <p><span style="color:#0d0016;">&nbsp; &nbsp; &nbsp; &nbsp; ClickHouse的物化视图是一种查询结果的持久化,它确实是给我们带来了查询效率的提升。用户查起来跟表没有quiet,它就会一张表,它也像是一张时刻在预计算的表,创建的过程它是用了一个特殊引擎,加上后来as select,就是create一个table as select的写法。</span></p> <p><span style="color:#0d0016;">&nbsp; &nbsp; &nbsp; &nbsp; “查询结果集”的范围很宽泛,可以是基础表中部分数据的一份简单拷贝,也可以使多表Join之后产生的结果或其子集,或者原始数据的聚合指标等等。所以,物化视图不会随着基础表的变化而变化,所以它也称为快照(snapshot)。</span></p> <h2><a name="t29"></a><span style="color:#0d0016;">4.1、概述</span></h2> <h3><a name="t30"></a><span style="color:#0d0016;">4.1.1、物化视图与普通视图的区别</span></h3> <p><span style="color:#0d0016;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color:#fe2c24;">&nbsp;普通视图不保存数据,保存的仅仅是查询语句,</span>查询的时候还是从原表读取数据,可以将普通视图理解为是个子查询。<span style="color:#fe2c24;">物化视图是把查询的结果根据相应的引擎存入到了磁盘或内存中,</span>对数据重新进行了组织,你可以理解物化视图是完全的一张新表。</p> <h3><a name="t31"></a>4.1.2、优缺点</h3> <p>&nbsp; &nbsp; &nbsp; &nbsp; 优点:查询速度快,要是把物化视图这些规则全部写好, 它比元数据查询快乐很多,总的行数少了,因为都预计算好了。</p> <p>&nbsp; &nbsp; &nbsp; &nbsp; 缺点:它的本质是一个流式数据的使用场景,是累加式的技术,所以要用历史数据做去重,去核这样的分析,在物化视图里面是不太好用的。在某些场景的使用也是有限的。而且如果一战表加了很多物化视图,在写这张表的时候,就会消耗很多机器的资源,比如护具带宽占满、存储一下子增加了很多。</p> <h3><a name="t32"></a>4.1.3、基本语法</h3> <p>&nbsp; &nbsp; &nbsp; &nbsp; 也是create语法,会创建一个隐藏的目标表来保存视图数据。也可以TO表名,保存到一张显示的表。没有加TO表名,表名默认就会.inner.物化视图名。</p> <pre data-index="11"><code class="language-sql hljs"><span class="hljs-keyword">CREATE</span> [MATERIALIZED] <span class="hljs-keyword">VIEW</span> [IF <span class="hljs-keyword">NOT</span> <span class="hljs-keyword">EXISTS</span>] [db.]table_name [<span class="hljs-keyword">TO</span>[db.]name] [ENGINE <span class="hljs-operator">=</span> engine] [POPULATE] <span class="hljs-keyword">AS</span> <span class="hljs-keyword">SELECT</span> ...</code><div class="hljs-button {2}" data-title="复制" onclick="hljs.copyCode(event)"></div></pre> <p>&nbsp; &nbsp; &nbsp; &nbsp; 1、创建物化视图的限制</p> <ol><li>必须指定物化视图的engine用于数据存储</li><li>TO [db].[table]语法的使用,不得使用POPULATE。</li><li>查询语句(select)可以包含下面的子句:DISTINCT,GROUP BY,ORDER BY,LIMIT...</li><li>物化视图的alter操作有些限制,操作起来不大方便。</li><li>如物化视图的定义使用了 TO [db.]name子语句,则可以将目标表的视图卸载DETACH再装载ATTACH。</li></ol> <p>&nbsp; &nbsp; &nbsp; &nbsp; 2、物化视图的数据更新</p> <ol><li>物化视图创建好之后,若源表被写入新数据则物化视图也会同步更新。</li><li>POPULATE关键字决定了物化视图的更新策略: <ol><li>若有POPULATE则在创建视图的过程会将源表已经存在的数据一并导入,类似于create table ... as</li><li>若无POPULATE则物化视图在创建之后没有数据,只会在创建只有同步之后写入源表的数据。</li><li>clickhouse官方并不推荐使用POPULATE,因为在创建物化视图的过程中同时写入的数据不能被插入物化视图。</li></ol></li><li>物化视图不支持同步删除,若源表的数据不存在(删除了)则物化视图的数据仍然保留。</li><li>物化视图是一种特殊的数据表,可以用show table查看。</li></ol> <h2><a name="t33"></a></h2> <p></p> <p></p> <p></p> </div><div data-report-view="{&quot;mod&quot;:&quot;1585297308_001&quot;,&quot;spm&quot;:&quot;1001.2101.3001.6548&quot;,&quot;dest&quot;:&quot;https://blog.csdn.net/Yuan_CSDF/article/details/122293843&quot;,&quot;extend1&quot;:&quot;pc&quot;,&quot;ab&quot;:&quot;new&quot;}"><div></div></div> </div>