Academic report of Dr. Yang Jianye, University of New South Wales, Australia

Time: April 25, 2019 10:50

Report title: Techniques for set containment join query processing

Report time: 10:00 am on April 27, 2019

Report location: Conference Room 308, third floor, Computer Building, main campus

Speaker: Dr. Yang Jianye

Abstract: In the database field, set containment join is a classic, fundamental operation. Current mainstream algorithms fall into two categories, intersection-oriented and union-oriented, each with its own advantages and disadvantages. The former uses an inverted index and obtains the result by intersecting inverted lists, so the whole process needs no verification. Its drawback is that building the inverted index requires keeping multiple copies of each set in S, so the computational overhead rises sharply as the sets grow. The latter uses signature techniques and works in two stages, candidate generation and verification. Its advantages are a small index and efficient candidate generation; its disadvantage is the verification step, which makes the method time-consuming on datasets with large result sets. We propose a new union-oriented method, TT-Join, which also incorporates the advantages of the intersection-oriented approach. It builds prefix-tree indexes for R and S respectively and, on that basis, synchronizes the traversal of the two prefix trees to obtain the final result. We compare TT-Join experimentally with 7 existing methods on 20 benchmark datasets. The results show that TT-Join significantly outperforms existing methods on most datasets, by up to two orders of magnitude. To support larger datasets, we also extend TT-Join to a distributed computing framework (MapReduce); the experiments show that our data partitioning strategy is significantly better than random partitioning and existing set-similarity-based partitioning.
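
As a rough illustration of the intersection-oriented paradigm described in the abstract, the following Python sketch builds an inverted index over S and answers each record of R by intersecting inverted lists. It is a minimal teaching example, not the speaker's implementation and not TT-Join; the function names and the convention that the join returns pairs (i, j) with R[i] contained in S[j] are assumptions made here for illustration.

```python
from collections import defaultdict


def build_inverted_index(S):
    """Map each element to the ids of the records in S that contain it."""
    index = defaultdict(set)
    for sid, s in enumerate(S):
        for e in s:
            # Each record id is stored once per element it contains,
            # which is the replication cost mentioned in the abstract.
            index[e].add(sid)
    return index


def containment_join_intersection(R, S):
    """Return (i, j) pairs such that R[i] is a subset of S[j]."""
    index = build_inverted_index(S)
    all_ids = set(range(len(S)))
    result = []
    for rid, r in enumerate(R):
        # Intersect the inverted lists of r's elements, shortest first.
        lists = sorted((index.get(e, set()) for e in r), key=len)
        # An empty record is contained in every record of S.
        matches = set.intersection(*lists) if lists else all_ids
        # No verification step: the intersection is already exact.
        result.extend((rid, sid) for sid in sorted(matches))
    return result


R = [{1, 2}, {3}, {4, 5}]
S = [{1, 2, 3}, {2, 3}, {1, 2, 4, 5}]
pairs = containment_join_intersection(R, S)
```

The index stores one entry per (element, record) pair, so its size grows with the total number of elements in S rather than with the number of records.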

About the speaker: Dr. Yang Jianye is a senior algorithm engineer at Ant Financial, responsible for implementing the group's risk-control models and algorithms. He received his PhD from the University of New South Wales, Australia, in August 2017, under the supervision of Professor Xuemin Lin, an expert in the database field, and holds bachelor's and master's degrees from Xidian University. His main research interests are query techniques and algorithms for text databases and graph databases. His work has been published in top international journals and conferences in databases and data mining, including VLDBJ, ICDE, TKDE, KAIS, and WWWJ. These include 5 full papers in CCF Class A international journals and conferences and 2 full papers in CCF Class B international journals. He has served as a reviewer for several international journals and conferences and is an ACM/IEEE member.

Title: Efficient set containment join

Report time: 10:00 am, April 27, 2019

Report location: Conference Room 308, 3rd floor, Computer Building, main campus

Speaker: Dr. Yang Jianye

Abstract: Set containment join is a fundamental operation on massive collections of set values. Recent research focuses on in-memory set containment join algorithms, and several techniques have been developed following the intersection-oriented or union-oriented computing paradigms. Nevertheless, we observe that both paradigms have their limits due to the nature of the intersection and union operators. In particular, the intersection-oriented method relies on intersecting the relevant inverted lists. A nice property of this method is that the join computation is verification-free. However, the number of records explored during the join may be large because there are multiple replicas of each record in S. The union-oriented method, on the other hand, follows a candidate generation-and-verification paradigm that uses effective signatures. Unfortunately, it needs to verify the candidate pairs, which may be costly, especially when the join result is large. In this work, we propose a new union-oriented method, TT-Join, which not only enhances the advantage of previous union-oriented methods but also integrates the strengths of intersection-oriented methods by using a variant of the prefix tree structure. We conduct extensive experiments on 20 real-life and synthetic datasets, comparing our method with 7 existing methods. The experimental results demonstrate that TT-Join significantly outperforms the existing algorithms on most of the datasets and can achieve up to two orders of magnitude speedup. Furthermore, to support large-scale datasets, we extend our techniques to distributed systems on top of the MapReduce framework. With the help of carefully designed load-aware distribution mechanisms, our distributed join algorithm can achieve up to an order of magnitude speedup over the baseline methods.
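
The candidate generation-and-verification paradigm mentioned in the abstract can be sketched in Python as follows. This is a simplified illustration that assumes a plain bitmap signature and a nested loop over all pairs; it is not the signature scheme of any particular published method and not TT-Join, and the names `signature`, `containment_join_signature`, and the `bits` parameter are hypothetical.

```python
def signature(record, bits=16):
    """Hash every element of the record to one bit of a fixed-width bitmap."""
    sig = 0
    for e in record:
        sig |= 1 << (hash(e) % bits)
    return sig


def containment_join_signature(R, S, bits=16):
    """Return (i, j) pairs such that R[i] is a subset of S[j]."""
    s_sigs = [signature(s, bits) for s in S]
    result = []
    for rid, r in enumerate(R):
        r_sig = signature(r, bits)
        for sid, s_sig in enumerate(s_sigs):
            # Candidate generation: if r is a subset of s, every bit set
            # in r's signature must also be set in s's signature.
            if (r_sig & ~s_sig) == 0:
                # Verification: different elements can share a bit, so a
                # candidate must be checked against the real sets.
                if r <= S[sid]:
                    result.append((rid, sid))
    return result


R = [{1, 2}, {3}, {4, 5}]
S = [{1, 2, 3}, {2, 3}, {1, 2, 4, 5}]
pairs = containment_join_signature(R, S)
```

The signatures are small, which keeps the filter cheap, but every surviving candidate still costs one subset check. That per-pair check is the verification overhead the abstract refers to when the join result is large.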

Speaker profile: Dr. Jianye Yang is currently a senior data engineer at Ant Financial Services Group. He received his PhD in Computer Science from the University of New South Wales, Australia, in 2017, under the supervision of Professor Xuemin Lin. He received his Bachelor's and Master's degrees in Computer Science from Xidian University, China, in 2010 and 2013 respectively. His research interests include text data query processing and graph data analysis. He has published multiple papers in top-tier international conferences and journals, such as VLDBJ, ICDE, TKDE, KAIS, and WWWJ. He has served as a peer reviewer for top-tier international conferences and journals.