The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries with dynamic object and query dataset is valuable for many location-based applications. A practical method is to partition the data spa...The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries with dynamic object and query dataset is valuable for many location-based applications. A practical method is to partition the data space into grid cells, with both object and query table being indexed by this grid structure, while solving the problem by periodically joining cells of objects with queries having their influence regions intersecting the cells. In the worst case, all cells of objects will be accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With object cache strategy, queries remaining static in current processing cycle seldom need I/O cost, they can be returned quickly. The main I/O cost comes from moving queries, the query cache strategy is used to restrict their search-regions, which uses current results of queries in the main memory buffer. The queries can share not only the accessing of object pages, but also their influence regions. Theoretical analysis of the expected I/O cost is presented, with the I/O cost being about 40% that of the SEA-CNN method in the experiment results.展开更多
The increasing amount of user traffic on Internet discussion forums has led to a huge amount of unstructured natural language data in the form of user comments.Most modern recommendation systems rely on manual tagging...The increasing amount of user traffic on Internet discussion forums has led to a huge amount of unstructured natural language data in the form of user comments.Most modern recommendation systems rely on manual tagging,relying on administrators to label the features of a class,or story,which a user comment corresponds to.Another common approach is to use pre-trained word embeddings to compare class descriptions for textual similarity,then use a distance metric such as cosine similarity or Euclidean distance to find top k neighbors.However,neither approach is able to fully utilize this user-generated unstructured natural language data,reducing the scope of these recommendation systems.This paper studies the application of domain adaptation on a transformer for the set of user comments to be indexed,and the use of simple contrastive learning for the sentence transformer fine-tuning process to generate meaningful semantic embeddings for the various user comments that apply to each class.In order to match a query containing content from multiple user comments belonging to the same class,the construction of a subquery channel for computing class-level similarity is proposed.This channel uses query segmentation of the aggregate query into subqueries,performing k-nearest neighbors(KNN)search on each individual subquery.RecBERT achieves state-of-the-art performance,outperforming other state-of-the-art models in accuracy,precision,recall,and F1 score for classifying comments between four and eight classes,respectively.RecBERT outperforms the most precise state-of-the-art model(distilRoBERTa)in precision by 6.97%for matching comments between eight classes.展开更多
Recent development of wireless communication technologies and the popularity of smart phones .are making location-based services (LBS) popular. However, requesting queries to LBS servers with users' exact locations...Recent development of wireless communication technologies and the popularity of smart phones .are making location-based services (LBS) popular. However, requesting queries to LBS servers with users' exact locations may threat the privacy of users. Therefore, there have been many researches on generating a cloaked query region for user privacy protection. Consequently, an efficient query processing algorithm for a query region is required. So, in this paper, we propose k-nearest neighbor query (k-NN) processing algorithms for a query region in road networks. To efficiently retrieve k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve the query processing performance and storage usage. Finally, we show by our performance analysis that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.展开更多
基金Project (No.ABA048) supported by the Natural Science Foundationof Hubei Province,China
文摘The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries with dynamic object and query dataset is valuable for many location-based applications. A practical method is to partition the data space into grid cells, with both object and query table being indexed by this grid structure, while solving the problem by periodically joining cells of objects with queries having their influence regions intersecting the cells. In the worst case, all cells of objects will be accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With object cache strategy, queries remaining static in current processing cycle seldom need I/O cost, they can be returned quickly. The main I/O cost comes from moving queries, the query cache strategy is used to restrict their search-regions, which uses current results of queries in the main memory buffer. The queries can share not only the accessing of object pages, but also their influence regions. Theoretical analysis of the expected I/O cost is presented, with the I/O cost being about 40% that of the SEA-CNN method in the experiment results.
文摘The increasing amount of user traffic on Internet discussion forums has led to a huge amount of unstructured natural language data in the form of user comments.Most modern recommendation systems rely on manual tagging,relying on administrators to label the features of a class,or story,which a user comment corresponds to.Another common approach is to use pre-trained word embeddings to compare class descriptions for textual similarity,then use a distance metric such as cosine similarity or Euclidean distance to find top k neighbors.However,neither approach is able to fully utilize this user-generated unstructured natural language data,reducing the scope of these recommendation systems.This paper studies the application of domain adaptation on a transformer for the set of user comments to be indexed,and the use of simple contrastive learning for the sentence transformer fine-tuning process to generate meaningful semantic embeddings for the various user comments that apply to each class.In order to match a query containing content from multiple user comments belonging to the same class,the construction of a subquery channel for computing class-level similarity is proposed.This channel uses query segmentation of the aggregate query into subqueries,performing k-nearest neighbors(KNN)search on each individual subquery.RecBERT achieves state-of-the-art performance,outperforming other state-of-the-art models in accuracy,precision,recall,and F1 score for classifying comments between four and eight classes,respectively.RecBERT outperforms the most precise state-of-the-art model(distilRoBERTa)in precision by 6.97%for matching comments between eight classes.
基金supported by the Korea Institute of Science and Technology Information (KISTI)
文摘Recent development of wireless communication technologies and the popularity of smart phones .are making location-based services (LBS) popular. However, requesting queries to LBS servers with users' exact locations may threat the privacy of users. Therefore, there have been many researches on generating a cloaked query region for user privacy protection. Consequently, an efficient query processing algorithm for a query region is required. So, in this paper, we propose k-nearest neighbor query (k-NN) processing algorithms for a query region in road networks. To efficiently retrieve k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve the query processing performance and storage usage. Finally, we show by our performance analysis that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.