期刊文献+
共找到5篇文章
< 1 >
每页显示 20 50 100
Audiovisual Art Event Classification and Outreach Based on Web Extracted Data
1
作者 Andreas Giannakoulopoulos Minas Pergantis +1 位作者 Aristeidis Lamprogeorgos Stella Lampoura 《Journal of Software Engineering and Applications》 2025年第1期24-43,共20页
The World Wide Web provides a wealth of information about everything, including contemporary audio and visual art events, which are discussed on media outlets, blogs, and specialized websites alike. This information m... The World Wide Web provides a wealth of information about everything, including contemporary audio and visual art events, which are discussed on media outlets, blogs, and specialized websites alike. This information may become a robust source of real-world data, which may form the basis of an objective data-driven analysis. In this study, a methodology for collecting information about audio and visual art events in an automated manner from a large array of websites is presented in detail. This process uses cutting edge Semantic Web, Web Search and Generative AI technologies to convert website documents into a collection of structured data. The value of the methodology is demonstrated by creating a large dataset concerning audiovisual events in Greece. The collected information includes event characteristics, estimated metrics based on their text descriptions, outreach metrics based on the media that reported them, and a multi-layered classification of these events based on their type, subjects and methods used. This dataset is openly provided to the general and academic public through a Web application. Moreover, each event’s outreach is evaluated using these quantitative metrics, the results are analyzed with an emphasis on classification popularity and useful conclusions are drawn concerning the importance of artistic subjects, methods, and media. 展开更多
关键词 web data extraction Art Events Classification Artistic Outreach Online Media
在线阅读 下载PDF
Intelligent and Adaptive Web Data Extraction System Using Convolutional and Long Short-Term Memory Deep Learning Networks 被引量:4
2
作者 Sudhir Kumar Patnaik C.Narendra Babu Mukul Bhave 《Big Data Mining and Analytics》 EI 2021年第4期279-297,共19页
Data are crucial to the growth of e-commerce in today's world of highly demanding hyper-personalized consumer experiences,which are collected using advanced web scraping technologies.However,core data extraction e... Data are crucial to the growth of e-commerce in today's world of highly demanding hyper-personalized consumer experiences,which are collected using advanced web scraping technologies.However,core data extraction engines fail because they cannot adapt to the dynamic changes in website content.This study investigates an intelligent and adaptive web data extraction system with convolutional and Long Short-Term Memory(LSTM)networks to enable automated web page detection using the You only look once(Yolo)algorithm and Tesseract LSTM to extract product details,which are detected as images from web pages.This state-of-the-art system does not need a core data extraction engine,and thus can adapt to dynamic changes in website layout.Experiments conducted on real-world retail cases demonstrate an image detection(precision)and character extraction accuracy(precision)of 97%and 99%,respectively.In addition,a mean average precision of 74%,with an input dataset of 45 objects or images,is obtained. 展开更多
关键词 adaptive web scraping deep learning Long Short-Term Memory(LSTM) web data extraction You only look once(Yolo)
原文传递
An Efficient Mechanism for Product Data Extraction from E-Commerce Websites
3
作者 Malik Javed Akhtar Zahur Ahmad +3 位作者 Rashid Amin Sultan H.Almotiri Mohammed A.Al Ghamdi Hamza Aldabbas 《Computers, Materials & Continua》 SCIE EI 2020年第12期2639-2663,共25页
A large amount of data is present on the web which can be used for useful purposes like a product recommendation,price comparison and demand forecasting for a particular product.Websites are designed for human underst... A large amount of data is present on the web which can be used for useful purposes like a product recommendation,price comparison and demand forecasting for a particular product.Websites are designed for human understanding and not for machines.Therefore,to make data machine-readable,it requires techniques to grab data from web pages.Researchers have addressed the problem using two approaches,i.e.,knowledge engineering and machine learning.State of the art knowledge engineering approaches use the structure of documents,visual cues,clustering of attributes of data records and text processing techniques to identify data records on a web page.Machine learning approaches use annotated pages to learn rules.These rules are used to extract data from unseen web pages.The structure of web documents is continuously evolving.Therefore,new techniques are needed to handle the emerging requirements of web data extraction.In this paper,we have presented a novel,simple and efficient technique to extract data from web pages using visual styles and structure of documents.The proposed technique detects Rich Data Region(RDR)using query and correlative words of the query.RDR is then divided into data records using style similarity.Noisy elements are removed using a Common Tag Sequence(CTS)and formatting entropy.The system is implemented using JAVA and runs on the dataset of real-world working websites.The effectiveness of results is evaluated using precision,recall,and F-measure and compared with five existing systems.A comparison of the proposed technique to existing systems has shown encouraging results. 展开更多
关键词 Document object model rich data region common tag sequence web data extraction deep web mining
在线阅读 下载PDF
A Framework of Web Data Integrated LBS Middleware
4
作者 MENG Xiaofeng YIN Shaoyi XIAO Zhen 《Wuhan University Journal of Natural Sciences》 CAS 2006年第5期1187-1191,共5页
In this paper, we propose a flexible locationbased service (LBS) middleware framework to make the development and deployment of new location based applications much easier. Considering the World Wide Web as a huge d... In this paper, we propose a flexible locationbased service (LBS) middleware framework to make the development and deployment of new location based applications much easier. Considering the World Wide Web as a huge data source of location relative information, we integrate the common used web data extraction techniques into the middleware framework, exposing a unified web data interface for the upper applications to make them more attractive. Besides, the framework also emphasizes some common LBS issues, including positioning, location modeling, location-dependent query processing, privacy and secure management. 展开更多
关键词 location-based service (LBS) MIDDLEWARE web data extraction
在线阅读 下载PDF
Creating customized data services from web pages
5
作者 季光 Wang Guiling Han Yanbo 《High Technology Letters》 EI CAS 2013年第2期203-207,共5页
To extract structured data from a web page with customized requirements,a user labels some DOM elements on the page with attribute names.The common features of the labeled elements are utilized to guide the user throu... To extract structured data from a web page with customized requirements,a user labels some DOM elements on the page with attribute names.The common features of the labeled elements are utilized to guide the user through the labeling process to minimize user efforts,and are also utilized to retrieve attribute values.To turn the attribute values into a structured result,the attribute pattern needs to be induced.For this purpose,a space-optimized suffix tree called attribute tree is built to transform the document object model(DOM) tree into a simpler form while preserving its useful properties such as attribute sequence order.The pattern is induced bottom-up on the attribute tree,and is further used to build the structured result.Experiments are conducted and show high performance of our approach in terms of precision,recall and structural correctness. 展开更多
关键词 web data extraction structured data user labeling CUSTOMIZATION data service
在线阅读 下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部