数据库中如何搜索时空行为数据
这篇文章主要讲解了“数据库中如何搜索时空行为数据”,文中的讲解内容简单清晰,易于学习与理解,下面请大家跟着小编的思路慢慢深入,一起来研究和学习“数据库中如何搜索时空行为数据”吧!
数据结构
时空行为数据包含三个属性:时间、空间和对象。
非结构化索引:
createtabletest(idint8,crt_timetimestamp,--Timeposgeometry,--Locationobjjsonb--Objectdescription);
除了应用于JSON,结构化数据还可以用于对象描述。例如:
createtabletest(idint8,crt_timetimestamp,--Timeposgeometry,--Locationc1int,--Somepropertyexamplesc2int,c3text,c4float8,c5int,c6date,c7text,c8int,c9int,c10int);
时空行为数据的SQL查询实例
select*fromtestwherepos<->?<?andcrt_timebetween?and?and((c1=?andc2between?and?)orc10=?)...;
优化方法
考虑运用以下知识:
时间序列BRIN索引
crt_time字段是一个时间序列字段,表示生成数据的时间。在PostgreSQL堆存储中,存储和该字段的值具有很强的线性相关性。
因此,BRIN索引很合适。
使用BRIN索引来代替分区表进行TPC-H测试。大范围搜索的性能甚至优于使用分区表时的功能。
createindexidx_test_1ontestusingbrin(crt_time);
空间索引
显然,空间检索需要空间索引。PostgreSQL中可以使用三种方法实现空间检索。
1. 几何类型的GIST索引
createindexidx_test_2ontestusinggist(pos);
该索引支持空间KNN搜索和空间位置确定等功能。
2. 几何类型的主索引
createindexidx_test_2ontestusingspgist(pos);
该索引支持空间KNN搜索和空间位置确定等功能。
3. Geohash和B-tree索引(将经度和纬度转换为Geohash并为hash值创建B-tree索引)。只需使用表达式索引。
createindexidx_test_3ontestusingbtree(ST_GeoHash(pos,15));
此索引支持前缀搜索(其能落实编码地理信息网格中包含的关系)。它属于有损索引,需要二次过滤。
GiST和SPGiST空间索引能够找到准确的地理位置信息,优于GEOHASH索引。但是,查询信息时需要特别注意。
GIN 索引
此索引类型的目标是对象属性字段JSONB或多个结构化对象属性字段。只需使用GIN索引。
例如:
createextensionbtree_gin;
非结构化索引:
createindexidx_test_4ontestusinggin(obj);
结构化索引:
createindexidx_test_4ontestusinggin(c1,c2,c3,c4,c5,c6,c7,c8,c9);
BitmapAnd和BitmapOr
但是,可以同时使用这些索引吗? PostgreSQL为多个索引提供bitmapAnd及bitmapOr接口。它们可以组合多个索引,减少需要扫描的数据库数量。
Heap,onesquare=onepage:+---------------------------------------------+|c____u_____X___u___X_________u___cXcc______u_|+---------------------------------------------+Rowsmarkedcmatchcustomerspkeycondition.Rowsmarkedumatchusernamecondition.RowsmarkedXmatchbothconditions.Bitmapscanfromcustomers_pkey:+---------------------------------------------+|100000000001000000010000000000000111100000000|bitmap1+---------------------------------------------+Onebitperheappage,inthesameorderastheheapBits1whenconditionmatches,0ifnotBitmapscanfromix_cust_username:+---------------------------------------------+|000001000001000100010000000001000010000000010|bitmap2+---------------------------------------------+OncethebitmapsarecreatedabitwiseANDisperformedonthem:+---------------------------------------------+|100000000001000000010000000000000111100000000|bitmap1|000001000001000100010000000001000010000000010|bitmap2&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&|000000000001000000010000000000000010000000000|Combinedbitmap+-----------+-------+--------------+----------+|||vvvUsedtoscantheheaponlyformatchingpages:+---------------------------------------------+|___________X_______X______________X__________|+---------------------------------------------+Thebitmapheapscanthenseekstothestartofeachpageandreadsthepage:+---------------------------------------------+|___________X_______X______________X__________|+---------------------------------------------+seek------->^seek-->^seek--------->^|||------------------------onlythesepagesread
例如:
select*fromtestwherec1...andcrt_timebetween?and?andtest->>c1in(?,?...);
根据统计数据自动使用适当的索引。如果需要,bitmapAnd和bitmapOr将在多个索引上自动执行合并扫描。跳过不需要扫描的页面,重新检查命中的页面。
堆表存储分级和分区
存储可以分为一级分区或多级分区:
1. 单一分区
例如,按时间划分。
createtabletest(idint8,crt_timetimestamp,--Timeposgeometry,--Locationobjjsonb--Objectdescription)PARTITIONBYrange(crt_time);createtabletest_201701PARTITIONOFtestforvaluesFROM(2017-01-01)TO(2017-02-01);......
2. 多层分区
例如,先按时间分区,然后按Geohash划分。
createtabletest_201701PARTITIONOFtestforvaluesFROM(2017-01-01)TO(2017-02-01)partitionbyrange(st_geohash(pos,15));...createtabletest_201701_prefix1PARTITIONOFtestforvaluesFROM(xxxx1)TO(xxxx2);--GenerateBOX(GRID)onamap,findcorrespondingboundariesanduse--boundariesaspartitioningconditions
使用分区时,如果查询条件包括分区键(如时间和空间范围),相应的分区将自动定位,这即为需要扫描的数据量。
创建面向对象属性的GIN索引,以实现高效查询。
索引分级与分区
与数据一样,索引在不使用分区表的情况下也支持分区逻辑。
空间索引+时间分区
createindexidx_20170101ontblusinggist(pos)wherecrt_timebetween2017-01-01and2017-01-02;...createindexidx_20170102ontblusinggist(pos)wherecrt_timebetween2017-01-02and2017-01-03;...
通过使用前述分区索引,可以在输入时间范围后快速定位目标数据,执行空间搜索。
select*fromtblwherecrt_timebetween2017-01-01and2017-01-02--Timeand(pos<->?)<?--Distancetoapointtobesearchedforand?--Otherconditionsorderbypos<->?--Sortbydistancelimit?;--Numberofresultstobereturned
可以使用更多的索引分区,比如用作搜索条件和商店类型的维度(对象属性)(假设它是可枚举的或在范围相对较小的情况下)。
createindexidx_20170101_mod0ontblusinggist(pos)wherecrt_timebetween2017-01-01and2017-01-02anddtype=0;...createindexidx_20170101_mod1ontblusinggist(pos)wherecrt_timebetween2017-01-01and2017-01-02anddtype=1;...
通过使用前面的分区索引,在输入时间范围或特定条件以执行空间搜索后,可以快速定位目标数据。
select*fromtblwherecrt_timebetween2017-01-01and2017-01-02--Timeand(pos<->?)<?--Distancetoapointtobesearchedforanddtype=0--Objectconditionand?--Otherconditionsorderbypos<->?--Sortbydistancelimit?;--Numberofresultstobereturned
请注意,前面的SQL查询可以实现最佳性能优化。
索引组织形式(或索引结构)可以由逻辑分区重新构造,可以用上述类似的索引创建方法覆盖所有条件。
CTID相交阵列连接扫描
如前所述,BitmapAnd和BitmapOr合并扫描是在多个索引或GIN索引中自动执行的。事实上,这种扫描也可以在SQL中显式执行。
每个条件渗透对应的CTID。
使用Intersect或Union生成满足总体需求的CTID。(Intersect对应于“and”条件;union对应于“or”条件。)
生成一个ctid数组。
示例
1. 创建对象提要数据表
postgres=#createtabletbl(idint,infotext,crt_timetimestamp,pospoint,c1int,c2int,c3int);CREATETABLE
2. 将5000万条测试数据写入表中
postgres=#insertintotblselectgenerate_series(1,50000000),md5(random()::text),clock_timestamp(),point(180-random()*180,90-random()*90),random()*10000,random()*5000,random()*1000;INSERT050000000
3. 创建对象索引
postgres=#createindexidx_tbl_1ontblusinggin(info,c1,c2,c3);CREATEINDEX
4. 创建时间索引
postgres=#createindexidx_tbl_2ontblusingbtree(crt_time);CREATEINDEX
5. 创建空间索引
postgres=#createindexidx_tbl_3ontblusinggist(pos);CREATEINDEX
6. 生成数据布局以方便后续查询
postgres=#selectmin(crt_time),max(crt_time),count(*)fromtbl;min|max|count----------------------------+----------------------------+----------2017-07-2217:59:34.136497|2017-07-2218:01:27.233688|50000000(1row)
7. 创建一个极限KNN查询函数
createorreplacefunctionff(point,float8,int)returnssetoftidas$declarev_recrecord;v_limitint:=$3;beginsetlocalenable_seqscan=off;--Forceindexthatexitswhenscannedrowsreachaspecificnumberforv_recinselect*,(pos<->$1)asdist,ctidfromtblorderbypos<->$1loopifv_limit<=0then--raisenotice"Sufficientdataobtained"return;endif;ifv_rec.dist>$2then--raisenotice"Allmatchingpointsreturned"return;elsereturnnextv_rec.ctid;endif;v_limit:=v_limit-1;endloop;end;$languageplpgsqlstrictvolatile;postgres=#select*fromff(point(100,100),100,100);ff-------------(407383,11)(640740,9)(26073,51)(642750,34)...(100rows)Time:1.061ms
8. CTID合并检索
显示符合以下条件的记录
(c1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10)andpos<->point(0,0)<5andcrt_timebetween2017-07-2217:59:34and2017-07-2217:59:40;
首先,分别查看每个条件,找匹配一个条件的记录数量,以及在索引扫描上所花时长。
1. 54,907条记录
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromtblwherec1in(1,2,3,4,100,200,99,88,77,66,55);QUERYPLAN-------------------------------------------------------------------------------------------------------------------------------BitmapHeapScanonpostgres.tbl(cost=820.07..65393.94rows=54151width=73)(actualtime=23.842..91.911rows=54907loops=1)Output:id,info,crt_time,pos,c1,c2,c3RecheckCond:(tbl.c1=ANY({1,2,3,4,100,200,99,88,77,66,55}::integer[]))HeapBlocks:exact=52778Buffers:sharedhit=52866->BitmapIndexScanonidx_tbl_1(cost=0.00..806.54rows=54151width=0)(actualtime=14.264..14.264rows=54907loops=1)IndexCond:(tbl.c1=ANY({1,2,3,4,100,200,99,88,77,66,55}::integer[]))Buffers:sharedhit=88Planningtime:0.105msExecutiontime:94.606ms(10rows)
2. 95,147条记录
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromtblwherec2<10;QUERYPLAN---------------------------------------------------------------------------------------------------------------------------------BitmapHeapScanonpostgres.tbl(cost=835.73..112379.10rows=99785width=73)(actualtime=69.243..179.388rows=95147loops=1)Output:id,info,crt_time,pos,c1,c2,c3RecheckCond:(tbl.c2<10)HeapBlocks:exact=88681Buffers:sharedhit=88734->BitmapIndexScanonidx_tbl_1(cost=0.00..810.79rows=99785width=0)(actualtime=53.612..53.612rows=95147loops=1)IndexCond:(tbl.c2<10)Buffers:sharedhit=53Planningtime:0.094msExecutiontime:186.201ms(10rows)
3. 149930条记录(为快速获得结果,PostgreSQL使用位图进行合并扫描)
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromtblwherec1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10;QUERYPLAN------------------------------------------------------------------------------------------------------------------------------------BitmapHeapScanonpostgres.tbl(cost=1694.23..166303.58rows=153828width=73)(actualtime=98.988..266.852rows=149930loops=1)Output:id,info,crt_time,pos,c1,c2,c3RecheckCond:((tbl.c1=ANY({1,2,3,4,100,200,99,88,77,66,55}::integer[]))OR(tbl.c2<10))HeapBlocks:exact=134424Buffers:sharedhit=134565->BitmapOr(cost=1694.23..1694.23rows=153936width=0)(actualtime=73.763..73.763rows=0loops=1)Buffers:sharedhit=141->BitmapIndexScanonidx_tbl_1(cost=0.00..806.54rows=54151width=0)(actualtime=16.733..16.733rows=54907loops=1)IndexCond:(tbl.c1=ANY({1,2,3,4,100,200,99,88,77,66,55}::integer[]))Buffers:sharedhit=88->BitmapIndexScanonidx_tbl_1(cost=0.00..810.79rows=99785width=0)(actualtime=57.029..57.029rows=95147loops=1)IndexCond:(tbl.c2<10)Buffers:sharedhit=53Planningtime:0.149msExecutiontime:274.548ms(15rows)
4. 60,687条记录(即使运用出色的KNN性能优化,仍然需要耗费195毫秒)。
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromff(point(0,0),5,1000000);QUERYPLAN----------------------------------------------------------------------------------------------------------------------FunctionScanonpostgres.ff(cost=0.25..10.25rows=1000width=6)(actualtime=188.563..192.114rows=60687loops=1)Output:ffFunctionCall:ff((0,0)::point,5::doubleprecision,1000000)Buffers:sharedhit=61296Planningtime:0.029msExecutiontime:195.097ms(6rows)
让我们看看不使用KNN优化需要多长时间。
结果非常令人惊讶——极限优化性能提高了一个数量级。
5. 2,640,751条记录
使用所有索引逐个扫描数据条件,得到ctid并执行ctid扫描。
现在,让我们来分解这个过程:
首先,让我们看看时间和对象属性的合并查询,成果非常惊人。使用位图BitmapOr时,查询可以跳过大多数数据块,并且扫描时间比单索引扫描要短。
注意,在此步骤中记录的数量减少到7,847条。
postgres=#explain(analyze,verbose,timing,costs,buffers)selectctidfromtblwherecrt_timebetween2017-07-2217:59:34and2017-07-2217:59:40and(c1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10);QUERYPLAN-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------BitmapHeapScanonpostgres.tbl(cost=35025.85..44822.94rows=7576width=6)(actualtime=205.577..214.821rows=7847loops=1)Output:ctidRecheckCond:(((tbl.c1=ANY({1,2,3,4,100,200,99,88,77,66,55}::integer[]))OR(tbl.c2<10))AND(tbl.crt_time>=2017-07-2217:59:34::timestampwithouttimezone)AND(tbl.crt_time<=2017-07-2217:59:40::timestampwithouttimezone))HeapBlocks:exact=6983Buffers:sharedhit=14343->BitmapAnd(cost=35025.85..35025.85rows=7581width=0)(actualtime=204.048..204.048rows=0loops=1)Buffers:sharedhit=7360->BitmapOr(cost=1621.11..1621.11rows=153936width=0)(actualtime=70.279..70.279rows=0loops=1)Buffers:sharedhit=141->BitmapIndexScanonidx_tbl_1(cost=0.00..806.54rows=54151width=0)(actualtime=15.860..15.860rows=54907loops=1)IndexCond:(tbl.c1=ANY({1,2,3,4,100,200,99,88,77,66,55}::integer[]))Buffers:sharedhit=88->BitmapIndexScanonidx_tbl_1(cost=0.00..810.79rows=99785width=0)(actualtime=54.418..54.418rows=95147loops=1)IndexCond:(tbl.c2<10)Buffers:sharedhit=53->BitmapIndexScanonidx_tbl_2(cost=0.00..33402.60rows=2462443width=0)(actualtime=127.101..127.101rows=2640751loops=1)IndexCond:((tbl.crt_time>=2017-07-2217:59:34::timestampwithouttimezone)AND(tbl.crt_time<=2017-07-2217:59:40::timestampwithouttimezone))Buffers:sharedhit=7219Planningtime:0.203msExecutiontime:216.697ms(20rows)
然后,看KNN的扫描时间:
注意,60,687条记录满足KNN距离条件,所以接下来将解释CTID合并扫描与原始扫描之间的性能比较。
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromff(point(0,0),5,1000000);QUERYPLAN----------------------------------------------------------------------------------------------------------------------FunctionScanonpostgres.ff(cost=0.25..10.25rows=1000width=6)(actualtime=188.563..192.114rows=60687loops=1)Output:ffFunctionCall:ff((0,0)::point,5::doubleprecision,1000000)Buffers:sharedhit=61296Planningtime:0.029msExecutiontime:195.097ms(6rows)
最后,将这些片段合并到ctid中。
select*fromff(point(0,0),5,1000000)intersectselectctidfromtblwherecrt_timebetween2017-07-2217:59:34and2017-07-2217:59:40and(c1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10);ff------------(1394,8)(3892,50)(6124,45)(7235,8)(7607,45)(11540,8)(13397,31)(14266,36)(18149,7)(19256,44)(24671,62)(26525,64)(30235,48)(13rows)Time:463.012ms
取得最终纪录。
select*fromtblwherectid=any(array(--arraystartselect*fromff(point(0,0),5,1000000)intersectselectctidfromtblwherecrt_timebetween2017-07-2217:59:34and2017-07-2217:59:40and(c1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10))--arrayend);id|info|crt_time|pos|c1|c2|c3---------+----------------------------------+----------------------------+----------------------------------------+------+------+-----104558|c4699c933d4e2d2a10d828c4ff0b3362|2017-07-2217:59:34.362508|(4.20534582808614,2.43749532848597)|99|4858|543291950|1c2901689ab1eb7653d8ad972f7aa376|2017-07-2217:59:34.776808|(2.5384977646172,1.09820357523859)|3|2131|360459345|9e46548f29d914019ce53a589be8ebac|2017-07-2217:59:35.148699|(0.715781506150961,3.1486327573657)|1|1276|8542633|c422d6137f9111d5c2dc723b40c7023f|2017-07-2217:59:35.334278|(0.0631888210773468,2.2334903664887)|4968|3|245570570|fc57bfc6b7781d89b17c90417bd306f7|2017-07-2217:59:35.39653|(3.14926156774163,1.04107855819166)|88|2560|561865508|34509c7f7640afaf288a5e1d38199701|2017-07-2217:59:36.052573|(3.12869547866285,2.34822122845799)|2|65|8751004806|afe9f88cbebf615a7ae5f41180c4b33f|2017-07-2217:59:36.362027|(1.13972157239914,3.28763140831143)|3|1639|2081069986|6b9f27bfde993fb0bae3336ac010af7a|2017-07-2217:59:36.507775|(4.51995821669698,2.08761331625283)|2|200|3551361182|7c4c1c208c2b2b21f00772c43955d238|2017-07-2217:59:37.155127|(1.7334086727351,2.18367457855493)|9742|0|2321444244|41bf6f8e4b89458c13fb408a7db05284|2017-07-2217:59:37.339594|(0.52773853763938,2.16670122463256)|1|2470|8201850387|6e0011c6db76075edd2aa7f81ec94129|2017-07-2217:59:38.243091|(0.0168232340365648,0.420973123982549)|100|4395|3211989439|6211907ac254a4a3ca54f90822a2095e|2017-07-2217:59:38.551637|(0.0274275150150061,0.490507003851235)|1850|5|742267673|898fdd54dcc5b14c27cf1c8b9afe2471|2017-07-2217:59:39.170035|(0.394239127635956,2.86229319870472)|2892|6|917(13rows)Time:462.715ms
过程花费462毫秒。
9. 测试原始SQL查询的性能: PostgreSQL Multi-Index BitmapAnd and BitmapOr跳过扫描
直接编写SQL查询,而不是使用多CTID扫描。
postgres=#explain(analyze,verbose,timing,costs,buffers)select*fromtblwherecrt_timebetween2017-07-2217:59:34and2017-07-2217:59:40and(c1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10)andpos<->point(0,0)<5;BitmapHeapScanonpostgres.tbl(cost=35022.06..44857.06rows=2525width=73)(actualtime=205.542..214.547rows=13loops=1)Output:id,info,crt_time,pos,c1,c2,c3RecheckCond:(((tbl.c1=ANY({1,2,3,4,100,200,99,88,77,66,55}::integer[]))OR(tbl.c2<10))AND(tbl.crt_time>=2017-07-2217:59:34::timestampwithouttimezone)AND(tbl.crt_time<=2017-07-2217:59:40::timestampwithouttimezone))Filter:((tbl.pos<->(0,0)::point)<5::doubleprecision)RowsRemovedbyFilter:7834HeapBlocks:exact=6983Buffers:sharedhit=14343->BitmapAnd(cost=35022.06..35022.06rows=7581width=0)(actualtime=203.620..203.620rows=0loops=1)Buffers:sharedhit=7360->BitmapOr(cost=1618.58..1618.58rows=153936width=0)(actualtime=71.660..71.660rows=0loops=1)Buffers:sharedhit=141->BitmapIndexScanonidx_tbl_1(cost=0.00..806.54rows=54151width=0)(actualtime=14.861..14.861rows=54907loops=1)IndexCond:(tbl.c1=ANY({1,2,3,4,100,200,99,88,77,66,55}::integer[]))Buffers:sharedhit=88->BitmapIndexScanonidx_tbl_1(cost=0.00..810.79rows=99785width=0)(actualtime=56.797..56.797rows=95147loops=1)IndexCond:(tbl.c2<10)Buffers:sharedhit=53->BitmapIndexScanonidx_tbl_2(cost=0.00..33402.60rows=2462443width=0)(actualtime=125.255..125.255rows=2640751loops=1)IndexCond:((tbl.crt_time>=2017-07-2217:59:34::timestampwithouttimezone)AND(tbl.crt_time<=2017-07-2217:59:40::timestampwithouttimezone))Buffers:sharedhit=7219Planningtime:0.160msExecutiontime:216.797ms(22rows)
性能如预期的那样好,之前解释过原因。KNN条件以外的条件已经将结果收敛到7,000条记录,因此没有必要使用包含KNN条件的索引。(即使使用KNN索引也需要195毫秒,因为有60,687条记录满足KNN条件。)
校验结果:
select*fromtblwherecrt_timebetween2017-07-2217:59:34and2017-07-2217:59:40and(c1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10)andpos<->point(0,0)<5;id|info|crt_time|pos|c1|c2|c3---------+----------------------------------+----------------------------+----------------------------------------+------+------+-----104558|c4699c933d4e2d2a10d828c4ff0b3362|2017-07-2217:59:34.362508|(4.20534582808614,2.43749532848597)|99|4858|543291950|1c2901689ab1eb7653d8ad972f7aa376|2017-07-2217:59:34.776808|(2.5384977646172,1.09820357523859)|3|2131|360459345|9e46548f29d914019ce53a589be8ebac|2017-07-2217:59:35.148699|(0.715781506150961,3.1486327573657)|1|1276|8542633|c422d6137f9111d5c2dc723b40c7023f|2017-07-2217:59:35.334278|(0.0631888210773468,2.2334903664887)|4968|3|245570570|fc57bfc6b7781d89b17c90417bd306f7|2017-07-2217:59:35.39653|(3.14926156774163,1.04107855819166)|88|2560|561865508|34509c7f7640afaf288a5e1d38199701|2017-07-2217:59:36.052573|(3.12869547866285,2.34822122845799)|2|65|8751004806|afe9f88cbebf615a7ae5f41180c4b33f|2017-07-2217:59:36.362027|(1.13972157239914,3.28763140831143)|3|1639|2081069986|6b9f27bfde993fb0bae3336ac010af7a|2017-07-2217:59:36.507775|(4.51995821669698,2.08761331625283)|2|200|3551361182|7c4c1c208c2b2b21f00772c43955d238|2017-07-2217:59:37.155127|(1.7334086727351,2.18367457855493)|9742|0|2321444244|41bf6f8e4b89458c13fb408a7db05284|2017-07-2217:59:37.339594|(0.52773853763938,2.16670122463256)|1|2470|8201850387|6e0011c6db76075edd2aa7f81ec94129|2017-07-2217:59:38.243091|(0.0168232340365648,0.420973123982549)|100|4395|3211989439|6211907ac254a4a3ca54f90822a2095e|2017-07-2217:59:38.551637|(0.0274275150150061,0.490507003851235)|1850|5|742267673|898fdd54dcc5b14c27cf1c8b9afe2471|2017-07-2217:59:39.170035|(0.394239127635956,2.86229319870472)|2892|6|917(13rows)
分区索引示例
假设前面的查询条件保持不变,使用分区索引来测试性能。
这是为了演示分区索引的极端效果。在实际场景中,集合级别可能没有那么高(例如按天集合或按ID散列集合)。只要集合是可能的,就可以展现出色的性能。
postgres=#createindexidx_tbl_4ontblusinggist(pos)wherecrt_timebetween2017-07-2217:59:34and2017-07-2217:59:40and(c1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10);CREATEINDEXTime:8359.330ms(00:08.359)
重构极值KNN优化函数
createorreplacefunctionff(point,float8,int)returnssetofrecordas$declarev_recrecord;v_limitint:=$3;beginsetlocalenable_seqscan=off;--Forceindexthatexitswhenscannedrowsreachaspecificnumberforv_recinselect*,(pos<->$1)asdistfromtblwherecrt_timebetween2017-07-2217:59:34and2017-07-2217:59:40and(c1in(1,2,3,4,100,200,99,88,77,66,55)orc2<10)orderbypos<->$1loopifv_limit<=0then--raisenotice"Sufficientdataobtained"return;endif;ifv_rec.dist>$2then--raisenotice"Allmatchingpointsreturned"return;elsereturnnextv_rec;endif;v_limit:=v_limit-1;endloop;end;$languageplpgsqlstrictvolatile;
查询性能:
postgres=#select*fromff(point(0,0),5,10000000)ast(idint,infotext,crt_timetimestamp,pospoint,c1int,c2int,c3int,distfloat8);id|info|crt_time|pos|c1|c2|c3|dist---------+----------------------------------+----------------------------+----------------------------------------+------+------+-----+-------------------1850387|6e0011c6db76075edd2aa7f81ec94129|2017-07-2217:59:38.243091|(0.0168232340365648,0.420973123982549)|100|4395|321|0.4213091410343191989439|6211907ac254a4a3ca54f90822a2095e|2017-07-2217:59:38.551637|(0.0274275150150061,0.490507003851235)|1850|5|74|0.491273232943761444244|41bf6f8e4b89458c13fb408a7db05284|2017-07-2217:59:37.339594|(0.52773853763938,2.16670122463256)|1|2470|820|2.23004532710301542633|c422d6137f9111d5c2dc723b40c7023f|2017-07-2217:59:35.334278|(0.0631888210773468,2.2334903664887)|4968|3|245|2.23438404136508291950|1c2901689ab1eb7653d8ad972f7aa376|2017-07-2217:59:34.776808|(2.5384977646172,1.09820357523859)|3|2131|360|2.765867313092471361182|7c4c1c208c2b2b21f00772c43955d238|2017-07-2217:59:37.155127|(1.7334086727351,2.18367457855493)|9742|0|232|2.788035202744092267673|898fdd54dcc5b14c27cf1c8b9afe2471|2017-07-2217:59:39.170035|(0.394239127635956,2.86229319870472)|2892|6|917|2.88931598221975459345|9e46548f29d914019ce53a589be8ebac|2017-07-2217:59:35.148699|(0.715781506150961,3.1486327573657)|1|1276|8|3.22896754478952570570|fc57bfc6b7781d89b17c90417bd306f7|2017-07-2217:59:35.39653|(3.14926156774163,1.04107855819166)|88|2560|561|3.316880007835811004806|afe9f88cbebf615a7ae5f41180c4b33f|2017-07-2217:59:36.362027|(1.13972157239914,3.28763140831143)|3|1639|208|3.47958123047986865508|34509c7f7640afaf288a5e1d38199701|2017-07-2217:59:36.052573|(3.12869547866285,2.34822122845799)|2|65|875|3.91188935630676104558|c4699c933d4e2d2a10d828c4ff0b3362|2017-07-2217:59:34.362508|(4.20534582808614,2.43749532848597)|99|4858|543|4.860691001307571069986|6b9f27bfde993fb0bae3336ac010af7a|2017-07-2217:59:36.507775|(4.51995821669698,2.08761331625283)|2|200|355|4.97877009299311(13rows)Time:0.592ms
太棒了!查询时间从200毫秒减少到1毫秒以内。
优化方法综述
优化方法回顾:
1. 为不同的数据类型构建不同的索引。
例如,对空间使用GiST或SP-GiST索引,对时间使用B树或BRIN索引,对多个对象属性使用GIN索引。索引的目的是缩小数据扫描的范围。
2. 方法五提到数据分区。
数据分区的目的是有意地组织数据,这意味着有意地组织数据以满足搜索需求。例如,如果时间是必需的查询条件或公共查询条件,那么可以按时间(分区)分割数据,以减少需要扫描的数据量。
3. 方法六描述了索引分区。
目的类似于方法五。方法五和方法六的区别在于分区在索引级别使用,因此当执行索引扫描时,数据命中率会直接提高。
4.方法七中的ctid合并扫描类似于PostgreSQL中的多索引bitmapAnd或bitmapOr扫描。
bitmapAnd/bitmapOr跳过不需要扫描的块,方法七中的ctid合并扫描跳过不需要扫描的行。
合并从多个索引扫描获得的ctid。跳过不需要扫描的行数。
如果当其他条件为“AND”时,过滤条件可以显著减少ctid(记录),则没有必要使用ctid合并扫描。相反,使用FILTER作为另一个条件。(这将略微增加CPU开销。)
5. 最好的功夫总是以最大的灵活性、自由和对每一个动作的无限想象为特征。
PostgreSQL实现多索引BitmapAnd或BitmapOr扫描,显著提高了多种条件(索引)下的数据命中率。
此外,PostgreSQL具有出色的CBO估计机制,它允许PostgreSQL不总是使用位图合并扫描的所有索引。这也是为什么在“测试原始SQL查询的性能——PostgreSQL多索引BitmapAnd位图或跳过扫描”一节中描述的性能更好。
6. 如何实现极端优化
采用方法五或六,并使用可修复的条件作为分区键来分区数据或索引。
对于其他条件,可以使用PostgreSQL中的多索引BitmapAnd或BitmapOr扫描来提高多条件(索引)的数据命中率。
我们可以看到,按照时间、空间和对象属性从5,000万数据块中进行多维检索所需的时间减少到了0.592毫秒。
7. 对于空间数据,除了使用GiST索引,我们还可以使用BRIN索引,这降低了成本。有条理地组织数据后,会使滤波性能良好。
感谢各位的阅读,以上就是“数据库中如何搜索时空行为数据”的内容了,经过本文的学习后,相信大家对数据库中如何搜索时空行为数据这一问题有了更深刻的体会,具体使用情况还需要大家实践验证。这里是亿速云,小编将为大家推送更多相关知识点的文章,欢迎关注!
声明:本站所有文章资源内容,如无特殊说明或标注,均为采集网络资源。如若本站内容侵犯了原著者的合法权益,可联系本站删除。