• Title/Summary/Keyword: Web data

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems, v.23 no.3, pp.155-175, 2017
  • As social data come into the spotlight, mainstream web search engines provide data indicating how many people searched for a specific keyword: web search traffic data. Web search traffic information aggregates the crowd of users who search for a given keyword. In various areas, web search traffic can serve as a useful variable representing ordinary users' attention to specific interests. Many studies use web search traffic data to nowcast or forecast social phenomena such as epidemics, consumer patterns, product life cycles, and financial investment, and such data have also begun to be applied to predicting inbound tourism. Proper demand prediction is needed because tourism is a high value-added industry that increases employment and foreign-exchange earnings. Among inbound tourists, Chinese tourists (Youke) are growing continuously: for many years Youke have been the largest group of inbound tourists to Korea, and they also generate the highest tourism revenue per visitor. Research into proper approaches to forecasting Youke demand is therefore important for both the public and private sectors, because accurate tourism demand prediction supports efficient decision making with limited resources. This study proposes an improved model that reflects the latest social issues by capturing the collective attention of individuals. Because traveling abroad is generally a high-involvement activity, potential tourists are likely to search extensively for information about their trip, and web search traffic data capture this attention during trip preparation in an instantaneous and dynamic way. This study therefore selected keywords that potential Chinese tourists are likely to search for on the internet. Baidu, the largest Chinese web search engine with over 80% market share, gives users access to its web search traffic data.
Qualitative interviews with potential tourists helped us understand their information-search behavior before a trip and identify the keywords for this study. The selected keywords were grouped into three levels according to how directly they relate to "Korean tourism"; this categorization shows which keywords explain Youke inbound demand, from the most closely related to the most distantly related. Web search traffic data for each keyword were gathered by a web crawler developed to collect search data from the Baidu Index. Using these automatically gathered variables, a linear model was built by multiple regression analysis, which suits operational decision and policy making because the relationships between variables are easy to explain. A model built on traditional variables was then compared, in terms of significance and R-squared, with a model that adds web search traffic variables to it, and the final model was composed from this comparison. The final regression model offers better explanatory power, as well as real-time immediacy and convenience, compared with the traditional model. Furthermore, this study demonstrates an intuitively visualized system for general use: the Youke Mining solution, which embeds the final regression model and provides several functions to support tourism decision making. It is built on data-science algorithms and a simple, well-designed interface. Finally, this research offers three significant implications, at the theoretical, practical, and policy levels. Theoretically, the Youke Mining system and model are a first step toward Youke inbound prediction using an interactive and instantaneous variable: web search traffic information, which represents tourists' attention while they prepare their trip. Baidu holds more than 80% of the web search engine market.
Practically, Baidu data can represent, in real time, the attention of potential tourists preparing their trips. Finally, in terms of policy, the Chinese tourist demand prediction model based on web search traffic can support tourism decision making for efficient resource management and for seizing opportunities toward successful policy.
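
The model comparison described above (a baseline regression on traditional variables versus the same regression with web search traffic variables added, judged by R-squared) can be sketched in plain Python. This is a minimal illustration, not the authors' model: the function names, the toy numbers, and the choice to solve the normal equations by Gaussian elimination are assumptions, and the significance tests the study also used are omitted.

```python
def ols_fit(rows, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of feature lists; an intercept column is prepended here.
    Returns the coefficient list [b0, b1, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the normal-equation system A b = c.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    c = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def r_squared(rows, y, b):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    preds = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

For example, with a hypothetical traditional variable `x1`, a hypothetical search-index variable `x2`, and arrivals `y`, one would fit `ols_fit([[v] for v in x1], y)` as the baseline and `ols_fit(list(zip(x1, x2)), y)` as the augmented model, then compare their `r_squared` values. A real analysis would also check coefficient significance and multicollinearity between keyword series.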

Monitoring of On-Line Nutrition Information-Analysis of Meta Data (인터넷 영양정보의 모니터링-메타데이터의 분석)

  • 강혜경;강명희;유경혜;이선영
    • Journal of Nutrition and Health, v.37 no.8, pp.688-700, 2004
  • This study was conducted to analyze the external appropriateness of online nutrition information as web information. As of April 25th, 2004, 497 web sites were selected from 5 internet search engines (Yahoo, Empas, Nate, Hanmir, Naver). Skilled personnel monitored them on 8 evaluation categories: clarity, purpose, authority, durability, advertisement, privacy and/or security, responsibility, and contents. Forty percent of the selected web sites were operated by companies with commercial purposes, such as internet shopping malls, and 5.6% by academies, societies, research institutions, schools/colleges, and public institutions. Most web sites (76.1%) were run to advertise and sell the companies' commodities, and 32.6% had food and nutrition information as their primary purpose. Ninety-three percent of web sites targeted healthy individuals across the whole life cycle. In particular, many web sites on obesity were offered by diet-related companies. Of the 497 web sites, 193 named the provider of the nutrition information, but only one third of these showed credible professional expertise. As sources of nutrition information, 52.7% of web sites used 'books in the major field', 42.0% 'newspapers', and 23.7% 'broadcasting'. Most web sites stated a 'setting-up date' but not a 'renewal date'. On 36% of web sites, operators took '2-3 days' to answer questions posted on the bulletin board. Forty-seven percent of web sites answered '1-10 questions' per week, but 40.1% did not answer any questions in a week. There were 118 web sites (23.7%) that recorded access counts, and 36.0% of these displayed advertisements. Around 96% of web sites provided feedback addresses. Among web site menus, 68.0% concerned self-advertisement and 64.0% nutrition information.
Each web site was scored on 13 selected items to judge its external quality, and scores were compared by operator type. Web sites managed by public institutions had the highest scores (9.5), and those of private vendors, food companies, and individuals had the lowest. Among search engines, Naver scored highest (7.0) and Nate lowest (6.1). As this was only a pilot study, it had several limitations in evaluation tools, time, and the number of sites monitored. To monitor online nutrition information actively, standardized monitoring forms should be developed through integrated studies.
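
The scoring step (one point per satisfied checklist item, then mean scores compared across operator types) is simple arithmetic that can be sketched as follows. The abstract does not list the 13 items, so the checklist here is a placeholder, and the function names and any sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Number of checklist items reported in the abstract; the items themselves
# are not listed there, so each site is represented by 13 placeholder booleans.
CHECKLIST_SIZE = 13


def site_score(items_met):
    """Count how many of the 13 checklist items a site satisfies."""
    if len(items_met) != CHECKLIST_SIZE:
        raise ValueError(f"expected {CHECKLIST_SIZE} items, got {len(items_met)}")
    return sum(1 for ok in items_met if ok)


def mean_score_by_operator(sites):
    """Average checklist score per operator type.

    sites: iterable of (operator_type, items_met) pairs.
    """
    groups = defaultdict(list)
    for operator, items in sites:
        groups[operator].append(site_score(items))
    return {op: mean(scores) for op, scores in groups.items()}
```

Grouping by search engine instead of operator type only requires passing the engine name as the first element of each pair.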

Development of a Geo Semantic Web System (Geo Semantic Web 시스템의 개발)

  • Kim, Joung-Joon;Shin, In-Su;Han, Ki-Joon
    • Spatial Information Research, v.18 no.5, pp.83-92, 2010
  • Recently, as the Geospatial Web has been combined with the Semantic Web to keep pace with information technology trends emphasizing interoperability, intelligence, and individualization, the Geo Semantic Web has been proposed: an intelligent geographic information Web service technology that can provide users with suitable information by efficiently connecting and integrating various types of spatial information and extensive aspatial information on the Web. Geo Semantic Web services require Geo Ontology processing technologies that enable computers to automatically process the knowledge and information scattered across the Web. However, standards for Geo Ontology processing technologies have not yet been established, and standardization organizations and various groups and agencies are conducting relevant studies. This paper analyzes the underlying theories and technologies related to Geo Ontology and develops a Geo Semantic Web system. The system comprises a Query Processing Manager, which analyzes and processes Geo Semantic queries and manages sessions; an Ontology Manager, which generates and queries the Geo Ontology and extracts spatial/aspatial data; and Clients. Finally, this paper demonstrates the utility of the Geo Semantic Web system by applying it to a hypothetical scenario that requires Geo Semantic queries.
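
The division of labor between the Query Processing Manager and the Ontology Manager can be illustrated with a toy in-memory triple store plus a distance filter. This Python sketch is a simplification under stated assumptions: the triple layout, the `type` predicate, the haversine distance filter, and all class and method names are hypothetical. A real Geo Semantic Web system would use RDF/OWL ontologies and a spatial query language, and session management is omitted here.

```python
import math


class OntologyManager:
    """Toy 'Geo Ontology': (subject, predicate, object) triples plus point geometries."""

    def __init__(self):
        self.triples = set()
        self.points = {}  # subject -> (lat, lon) in degrees

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def add_point(self, s, lat, lon):
        self.points[s] = (lat, lon)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


class QueryProcessingManager:
    """Combines an aspatial pattern (entity type) with a spatial filter (distance)."""

    def __init__(self, ontology):
        self.ontology = ontology

    def within(self, rdf_type, anchor, max_km):
        """Subjects of the given type lying within max_km of the anchor subject."""
        anchor_pt = self.ontology.points[anchor]
        hits = []
        for s, _, _ in self.ontology.match(p="type", o=rdf_type):
            pt = self.ontology.points.get(s)
            if pt is not None and haversine_km(anchor_pt, pt) <= max_km:
                hits.append(s)
        return sorted(hits)
```

A query such as "hospitals within 5 km of city hall" then becomes `QueryProcessingManager(onto).within("Hospital", "cityHall", 5.0)`, where the aspatial part is answered by triple matching and the spatial part by the distance test.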

Design of Metadata Model and Development of Management System for Electronic Documents on the Web (Web상의 전자문서를 위한 메타데이터 모델의 제안 및 관리시스템의 개발)

  • Jung, Hyo-Taeg;Yang, Young-Jong;Kim, Soon-Yong;Lee, Sang-Duk;Choy, Yoon-Chul
    • The Transactions of the Korea Information Processing Society, v.5 no.4, pp.924-941, 1998
  • It is not easy to access the required data on the Web with search engines, because they return too many results and do not provide enough information about each one. Metadata is data about data: it describes both the data itself and its contents. Using metadata, users can obtain sufficient information about the corresponding data and access exactly the data they need, which increases data usability. In this paper, several metadata technologies and metadata models that are being standardized or have already been adopted as standards are analyzed, and the SeriCore Metadata Model is proposed for science and technology documents on the Web, such as papers, project reports, technical reports, abstracts, and manuals, as well as graphic images. The SeriCore Metadata Management System, which can generate, store, and retrieve metadata effectively, is designed and implemented based on the SeriCore Metadata Model.
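
A metadata record with generate, store, and retrieve operations can be sketched as below. The abstract does not list the SeriCore elements, so apart from the document types it names, every field (identifier, title, creator, keywords) and both class names are hypothetical placeholders, and storage is a plain in-memory dictionary rather than a database.

```python
from dataclasses import dataclass, field

# Document types named in the abstract; all record fields below are hypothetical.
DOC_TYPES = {"paper", "project report", "technical report",
             "abstract", "manual", "graphic image"}


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str          # e.g. the document's URL
    title: str
    creator: str
    doc_type: str
    keywords: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        if self.doc_type not in DOC_TYPES:
            raise ValueError(f"unknown document type: {self.doc_type}")


class MetadataStore:
    """Generate, store, and retrieve metadata records (in memory, for illustration)."""

    def __init__(self):
        self._records = {}

    def generate(self, identifier, title, creator, doc_type, keywords=()):
        """Create a record, normalizing keywords to lowercase, and store it."""
        record = MetadataRecord(identifier, title, creator, doc_type,
                                frozenset(k.lower() for k in keywords))
        self._records[identifier] = record
        return record

    def search(self, keyword, doc_type=None):
        """Identifiers whose keywords contain `keyword`, optionally filtered by type."""
        kw = keyword.lower()
        return sorted(r.identifier for r in self._records.values()
                      if kw in r.keywords
                      and (doc_type is None or r.doc_type == doc_type))
```

Validating `doc_type` at construction time keeps malformed records out of the store, which is the point of having an explicit model behind the management system.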

Development of Integrated Retrieval System of the Biology Sequence Database Using Web Service (웹 서비스를 이용한 바이오 서열 정보 데이터베이스 및 통합 검색 시스템 개발)

  • Lee, Su-Jung;Yong, Hwan-Seung
    • The KIPS Transactions:PartD, v.11D no.4, pp.755-764, 2004
  • Recently, the rapid development of biotechnology has brought an explosion of biological data and of biological data hosts. Moreover, these data are highly distributed and heterogeneous, reflecting the distribution and heterogeneity of the molecular biology research community. As a consequence, the integration and interoperability of molecular biology databases are issues of considerable importance. Up to now, however, most integrated systems, such as link-based and data-warehouse-based systems, have had difficulty keeping data up to date when the schema and data of a data source change. For this reason, integrated systems based on web service technology, which allow biological data to be fully exploited, have been proposed. In this paper, we built an integrated system for bio sequence information based on web service technology. The developed system allows users to traverse disparate data resources and obtain data in various formats, such as BSML, GenBank, and FASTA. It also achieves better retrieval performance because the retrieval modules for the external databases run in parallel.
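
The parallel retrieval idea (query every external database at the same time, then merge the results into one output format) can be sketched with Python's `concurrent.futures`. The source callables are hypothetical stand-ins for web-service clients, the function names are invented, and only FASTA output is shown; BSML and GenBank conversion are not covered.

```python
from concurrent.futures import ThreadPoolExecutor


def to_fasta(seq_id, description, sequence, width=60):
    """Format one record as FASTA: a '>' header line, then wrapped sequence lines."""
    lines = [f">{seq_id} {description}".rstrip()]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


def integrated_search(sources, query):
    """Query every source concurrently and merge the results as FASTA text.

    sources: mapping of source name -> callable(query) returning a list of
    (seq_id, description, sequence) tuples. In a real system each callable
    would wrap a web-service client; here they are hypothetical stubs.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Submit all queries first so they run in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        merged = []
        # Collect in a stable (sorted) order; result() waits for each query.
        for name in sorted(futures):
            for seq_id, desc, seq in futures[name].result():
                merged.append(to_fasta(seq_id, f"[{name}] {desc}", seq))
    return "\n".join(merged)
```

Because all queries are submitted before any result is awaited, total latency is roughly that of the slowest source rather than the sum of all of them, which matches the performance argument in the abstract.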

Analysis of Computational Science and Engineering SW Data Format for Multi-physics and Visualization

  • Ryu, Gimyeong;Kim, Jaesung;Lee, Jongsuk Ruth
    • KSII Transactions on Internet and Information Systems (TIIS), v.14 no.2, pp.889-906, 2020
  • Analyzing multi-physics systems and visualizing simulation data are crucial but difficult tasks in computational science and engineering. In Korea, the Korea Institute of Science and Technology Information (KISTI) developed EDISON, a web-based computational science simulation platform, and the service is now in its ninth year. Hitherto, the EDISON platform has focused on providing a robust simulation environment and various computational science analysis tools. However, with the growth of collaborative research, data format standardization has become more important. In addition, as visualization of simulation data becomes more important for users' understanding, the need to analyze the input/output data of each software package has increased. It is therefore necessary to organize the data formats and metadata of the representative software provided by EDISON. In this paper, we analyzed computational fluid dynamics (CFD) and computational structural dynamics (CSD) simulation software in mechanical engineering, where several physical phenomena (fluids, solids, etc.) interact in complex ways. Additionally, to visualize various simulation result data, we used existing web visualization tools developed by third parties. In conclusion, this analysis of data formats provides a foundation for multi-physics simulation and a web-based visualization environment, enabling users to focus on simulation more conveniently.

Understanding the Food Hygiene of Cruise through the Big Data Analytics using the Web Crawling and Text Mining

  • Shuting, Tao;Kang, Byongnam;Kim, Hak-Seon
    • Culinary science and hospitality research, v.24 no.2, pp.34-43, 2018
  • The objective of this study was to acquire a general, text-based understanding of cruise food hygiene through big data analytics. For this purpose, data were collected by searching web pages and news on Google for the keywords "food hygiene, cruise" from October 1st, 2015 to October 1st, 2017 (two years). Data collection was performed with SCTM, a data collection and processing program, yielding 899 kb of text, approximately 20,000 words. For data analysis, UCINET 6.0 with its bundled visualization tool NetDraw was used. Words such as "jobs" and "news" showed high frequency, while the centrality (Freeman's degree centrality and eigenvector centrality) and proximity results produced rankings that differed from the frequency ranking. Meanwhile, CONCOR analysis produced 4 segments: a "food hygiene group", a "person group", a "location-related group", and a "brand group". This big-data diagnosis of food hygiene in the cruise industry is expected to provide useful implications for both academic research and practical application.
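
The frequency and Freeman degree-centrality steps can be illustrated with a small word co-occurrence network in Python. This is a sketch under assumptions: the study used SCTM and UCINET 6.0, whereas the tokenizer, the "same document" co-occurrence window, and the function names here are hypothetical, and eigenvector centrality and CONCOR are not reproduced.

```python
from collections import Counter
from itertools import combinations


def tokenize(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in stopwords]


def word_frequencies(docs, stopwords=frozenset()):
    """Total occurrences of each word across all documents."""
    return Counter(w for d in docs for w in tokenize(d, stopwords))


def degree_centrality(docs, stopwords=frozenset()):
    """Normalized Freeman degree centrality on a word co-occurrence graph.

    Two words are linked if they appear in the same document (an assumed
    co-occurrence window). Centrality = degree / (n - 1), n = number of words.
    """
    neighbours = {}
    for d in docs:
        vocab = sorted(set(tokenize(d, stopwords)))
        for w in vocab:
            neighbours.setdefault(w, set())
        for a, b in combinations(vocab, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {w: 0.0 for w in neighbours}
    return {w: len(nb) / (n - 1) for w, nb in neighbours.items()}
```

Comparing `word_frequencies(docs).most_common()` with the words sorted by `degree_centrality(docs)` shows how a frequent word need not be the most connected one, which is the kind of divergence between frequency and centrality rankings the abstract reports.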

Analysis User Action in Web Pages using Ajax technique (Ajax 를 이용한 사용자의 웹 페이지 이용 행태 분석)

  • Lee, Dong-Hoon;Yoon, Tae-Bok;Kim, Kun-Su;Lee, Jee-Hyong
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집), 2008.02a, pp.528-533, 2008
  • Web page evaluation is an important issue on the Internet, as the number of web pages is growing extremely fast. Frequency-based evaluation, such as page view (PV) counts, is widely used but insufficient on its own. Because users do not spend long on unnecessary or irrelevant web pages, we focused on how long users stay on a page to evaluate it. We also collect the actions users perform while using a web page in the browser, such as mouse pointer movements, mouse button clicks, and page scrolling. JavaScript can collect these actions, and Ajax can send the collected data to the server while the user is browsing, without notifying the user.
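
On the server side, the actions that the JavaScript logger sends via Ajax can be turned into per-page visiting durations. The Python sketch below assumes a hypothetical event schema of (page, timestamp in seconds, action) for a single user session, and it measures each visit from its first to its last recorded action, so idle time after the last action on a page is not counted.

```python
from collections import defaultdict


def dwell_times(events):
    """Estimate total time spent on each page from client-side action logs.

    events: iterable of (page, timestamp_seconds, action) tuples for one
    session (hypothetical schema). A visit starts at the first event on a
    page and ends at the last event before the page changes.
    """
    totals = defaultdict(float)
    visit_page = None
    visit_start = visit_last = 0.0
    for page, ts, _action in sorted(events, key=lambda e: e[1]):
        if page != visit_page:
            # Close the previous visit before starting a new one.
            if visit_page is not None:
                totals[visit_page] += visit_last - visit_start
            visit_page, visit_start = page, ts
        visit_last = ts
    if visit_page is not None:
        totals[visit_page] += visit_last - visit_start
    return dict(totals)
```

Unlike a page-view count, which would credit a page once per load however briefly it was open, this measure grows only while the user keeps interacting with the page.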

An Web Caching Method based on the Object Reference Probability Distribution Characteristics and the Life Time of Web Object (웹 객체의 참조확률분포특성과 평균수명 기반의 웹 캐싱 기법)

  • Na, Yun-Ji;Ko, Il-Seok
    • Convergence Security Journal, v.6 no.4, pp.91-99, 2006
  • Research on web caching has generally addressed performance improvement through structural approaches, new hybrid methods combining existing methods, and the caching methods themselves. Existing analyses of reference characteristics have focused on users' history and preferences, from a data-mining viewpoint based on log analysis. In this study, we analyze the reference characteristics of web objects from the viewpoint of their reference probability distribution and their mean lifetime, and based on this result we propose a new method for improving web-caching performance.
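
A cache that ranks objects by reference probability and remaining lifetime could look like the toy Python class below. The abstract does not give the paper's replacement formula, so the score used here (observed reference frequency multiplied by the fraction of the object's expected lifetime still remaining) and the class name `ProbLifetimeCache` are assumptions for illustration only.

```python
class ProbLifetimeCache:
    """Toy cache that evicts the object with the lowest usefulness score.

    Hypothetical score (not the paper's formula):
        (references to key / total requests) * (fraction of lifetime remaining)
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}   # key -> (value, created_at, lifetime)
        self.refs = {}    # key -> number of requests seen for the key
        self.total = 0    # total requests seen

    def _score(self, key, now):
        _, created, lifetime = self.store[key]
        remaining = max(0.0, 1.0 - (now - created) / lifetime)
        p_ref = self.refs.get(key, 0) / max(1, self.total)
        return p_ref * remaining

    def get(self, key, now):
        """Record a reference and return the cached value, or None on a miss."""
        self.total += 1
        self.refs[key] = self.refs.get(key, 0) + 1
        item = self.store.get(key)
        if item is None:
            return None
        value, created, lifetime = item
        if now - created >= lifetime:  # expired: drop it and report a miss
            del self.store[key]
            return None
        return value

    def put(self, key, value, now, lifetime):
        """Insert an object, evicting the lowest-scoring one if the cache is full."""
        if key not in self.store and len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda k: self._score(k, now))
            del self.store[victim]
        self.store[key] = (value, now, lifetime)
        self.refs.setdefault(key, 0)
```

Compared with plain LRU, this kind of score can evict a frequently requested object when it is about to expire anyway, and keep a less popular object that will stay valid for a long time.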

A Management Method for hierarchical Information Structures on Web Systems (계층적 정보 구조의 Web 시스템 관리 기술)

  • Choi, Yong-Jun;Lim, Kyung-Su;Hwang, Do-Sam;Kim, Chong-Gun
    • The Transactions of the Korea Information Processing Society, v.5 no.5, pp.1300-1310, 1998
  • Web information systems contain many static HTML documents and dynamic CGI application programs. A hyperlinked information environment on a Web system includes many mutually referencing documents, which causes data consistency problems both within a document and across documents. To solve these problems, we propose a management method for Web systems with a hierarchical information structure, together with a unified problem-solving approach. We constructed a large-scale practical Web system based on the proposed architecture. The proposed results can provide many advantages to webmasters.
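
One way to picture the inter-document consistency problem is a link table over a hierarchical path structure: moving a subtree must rewrite every reference into it, and references to missing documents must be reported. The Python sketch below is hypothetical (the class and method names are invented), covers static documents only rather than the CGI programs mentioned in the abstract, and does not check for name collisions when a subtree is moved.

```python
class DocumentTree:
    """Hierarchical site structure with link bookkeeping (illustrative only)."""

    def __init__(self):
        self.links = {}  # document path -> set of paths that document references

    def add(self, path, targets=()):
        self.links[path] = set(targets)

    def broken_links(self):
        """(source, target) pairs whose target document does not exist."""
        return sorted((src, t) for src, ts in self.links.items()
                      for t in ts if t not in self.links)

    def move(self, old_prefix, new_prefix):
        """Rename a subtree and rewrite every reference into it, site-wide."""
        def remap(p):
            # Match the prefix itself or paths strictly below it ("/dept/...").
            if p == old_prefix or p.startswith(old_prefix.rstrip("/") + "/"):
                return new_prefix + p[len(old_prefix):]
            return p
        self.links = {remap(src): {remap(t) for t in ts}
                      for src, ts in self.links.items()}
```

Keeping references in one table is what makes the unified approach possible: a single `move` updates both the hierarchy and every document that points into it, and `broken_links` can be run afterwards as a consistency check.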
