Representative paper
Focused crawling strategies based on ontologies and simulated annealing methods for rainstorm disaster domain knowledge
Abstract:
At present, focused crawler is a crucial method for obtaining effective domain knowledge from massive heterogeneous networks. For most current focused crawling technologies, there are some difficulties in obtaining high-quality crawling results. The main difficulties are the establishment of topic benchmark models, the assessment of topic relevance of hyperlinks, and the design of crawling strategies. In this paper, we use domain ontology to build a topic benchmark model for a specific topic, and propose a novel multiple-filtering strategy based on local ontology and global ontology (MFSLG). A comprehensive priority evaluation method (CPEM) based on the web text and link structure is introduced to improve the computation precision of topic relevance for unvisited hyperlinks, and a simulated annealing (SA) method is used to avoid the focused crawler falling into local optima of the search. By incorporating SA into the focused crawler with MFSLG and CPEM for the first time, two novel focused crawler strategies based on ontology and SA (FCOSA), including FCOSA with only global ontology (FCOSAG) and FCOSA with both local ontology and global ontology (FCOSA_LG), are proposed to obtain topic-relevant webpages about rainstorm disasters from the network. Experimental results show that the proposed crawlers outperform the other focused crawling strategies on different performance metric indices.
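The abstract does not state the acceptance rule the authors use, so the following is only a minimal sketch of how simulated annealing might be applied to link selection in a focused crawler. It assumes a generic Metropolis criterion and a frontier of `(url, priority)` pairs; the function name `sa_pick`, the frontier layout, and the priority values are hypothetical and are not taken from the paper's MFSLG or CPEM definitions.

```python
import math
import random


def sa_pick(frontier, temperature, rng=random):
    """Pick one (url, priority) pair from the frontier.

    Assumption: a generic Metropolis rule, not the paper's exact method.
    The greedy choice is the highest-priority link. A randomly drawn
    alternative replaces it with probability exp(-delta / temperature),
    where delta is how much priority the alternative gives up.
    """
    if not frontier:
        raise ValueError("frontier is empty")
    best = max(frontier, key=lambda item: item[1])
    candidate = rng.choice(frontier)
    delta = best[1] - candidate[1]  # always >= 0
    if temperature > 0 and rng.random() < math.exp(-delta / temperature):
        return candidate  # accept a possibly worse link to escape a local optimum
    return best  # otherwise fall back to the greedy choice


# Hypothetical frontier: priorities stand in for a topic-relevance score.
frontier = [("a", 0.9), ("b", 0.4), ("c", 0.1)]
temperature = 1.0
for _ in range(3):
    url, priority = sa_pick(frontier, temperature)
    temperature *= 0.9  # geometric cooling, also an assumption
```

At high temperature the choice is close to uniform over the frontier; as the temperature falls toward zero it approaches a purely greedy, best-first crawl.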
Keywords:
Authors:
Jingfa LIU; Fan LI; Ruoyao DING; Zi'ang LIU
Affiliations:
Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou 510006, China; School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006, China; School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China; Faculty of Science, University of Alberta, Edmonton T6G2H6, Canada
Citation format:
[1] Jingfa LIU; Fan LI; Ruoyao DING; Zi'ang LIU. Focused crawling strategies based on ontologies and simulated annealing methods for rainstorm disaster domain knowledge [J]. Frontiers of Information Technology & Electronic Engineering, 2022(08): 1189-1204
Class A:
ontologies,crawler,hyperlinks,MFSLG,CPEM,unvisited,FCOSA,FCOSAG,webpages,crawlers
Class B:
Focused,crawling,strategies,simulated,annealing,methods,rainstorm,domain,knowledge,At,present,focused,crucial,obtaining,effective,from,massive,heterogeneous,networks,For,most,current,technologies,there,are,some,difficulties,high,quality,results,establishment,topic,benchmark,models,assessment,relevance,design,In,this,paper,ontology,build,specific,novel,multiple,filtering,strategy,local,global,comprehensive,priority,evaluation,text,structure,introduced,improve,computation,precision,avoid,falling,into,optima,search,By,incorporating,first,including,only,both,proposed,relevant,about,disasters,Experimental,show,that,outperform,other,different,performance,metric,indices
AB value:
0.380029