• Title/Summary/Keyword: keyword-based analysis

Intelligent Web Crawler for Supporting Big Data Analysis Services (빅데이터 분석 서비스 지원을 위한 지능형 웹 크롤러)

  • Seo, Dongmin;Jung, Hanmin
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.575-584 / 2013
  • The data types used for big-data analysis are very diverse, including news, blogs, SNS, papers, patents, and sensor data. In particular, the use of web documents that offer reliable data in real time is gradually increasing. Web crawlers that collect web documents automatically have grown in importance because big data is being used in many different fields and web data is growing exponentially every year. However, existing web crawlers cannot collect all of the web documents in a web site, because they follow only the URLs included in web documents already collected. In addition, existing web crawlers may collect web documents that other web crawlers have already collected, because information about the documents collected by each crawler is not managed efficiently across crawlers. This paper therefore proposes a distributed web crawler. To resolve the problems of existing web crawlers, the proposed crawler collects web documents through the RSS feeds of each web site and the Google search API. It provides fast crawling performance through a client-server model based on RMI and NIO that minimizes network traffic. Furthermore, it extracts the core content of a web document by comparing keyword similarity across the tags in the document. Finally, to verify the superiority of the proposed web crawler, we compare it with existing web crawlers in various experiments.
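
The keyword similarity comparison on tags mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so the Jaccard overlap used here, along with the function names `tokenize`, `jaccard`, and `pick_core_block` and the sample blocks, are assumptions made for illustration only, not the authors' method.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pick_core_block(title, blocks):
    """Return the (tag, text) block whose tokens overlap most with the title.

    Assumption: the page title stands in for the document's keywords.
    """
    keywords = tokenize(title)
    return max(blocks, key=lambda block: jaccard(keywords, tokenize(block[1])))

# Hypothetical page split into (tag name, text content) pairs.
blocks = [
    ("nav", "Home News Sports Login"),
    ("div", "Distributed web crawler collects documents through RSS feeds"),
    ("footer", "Copyright contact privacy policy"),
]
core = pick_core_block("Web crawler for RSS documents", blocks)
```

Here `core` is the `div` block, since it is the only block that shares tokens with the title. A real extractor would need an HTML parser and a more robust keyword source.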

A Study on Factors that Affect Intention to Use Accommodation Sharing Service (숙박공유서비스 이용의도에 영향을 미치는 요인에 관한 연구)

  • Yun, Jeong-Hwan;Lee, Sang-Joon
    • The Journal of Information Systems / v.26 no.3 / pp.187-209 / 2017
  • Purpose: The sharing economy is a key concept that is changing the paradigm of the world economy and management, and Time magazine selected it in 2011 as one of 10 ideas that will change the world. The purpose of this study is to verify the structural relationships among perceived value, perceived risk, network effect, usefulness, trust, and intention to use accommodation sharing services. In addition, the effects of utility and trust, and of experience in using accommodation services, are controlled by the effect of empirical value and reciprocal value among the perceived values of sharing economy services. The study also aims to propose a plan to promote accommodation sharing services. Design/methodology/approach: This study was designed to investigate the structural relationships among perceived value, perceived risk, network effect, usefulness, trust, and intention to use. Empirical analysis was performed using SPSS 21.0 and AMOS 21.0. Findings: Based on the results of this study, the following conclusions can be drawn. First, the higher the economic value of the accommodation sharing service, the more useful and reliable the service is perceived to be. Second, the higher the experiential value of the service, the higher its usefulness and reliability. Third, the higher the mutual-benefit value of the service, the higher the level of trust in the service, but usefulness is not affected. Fourth, the higher the perceived risk of the service, the lower the level of trust in the service, but service usability is not affected. Fifth, the larger the network effect of the service, the more useful the service is, but the reliability of the service is not affected.
Sixth, the higher the overall reliability of the accommodation sharing service, the higher the usefulness of the service. Seventh, the higher the overall usefulness and reliability of the service, the higher the intention to use it. Finally, to test the effect of usage experience on the trust and usefulness of the accommodation sharing service, a multiple group analysis was conducted on the relationship between trust and usefulness, and it showed a moderating effect in that path.

A Comparative Analysis of Disaster-Related Curriculum between Emergency Department and Nursing Department

  • Jung, Ji-Yeon
    • Journal of the Korea Society of Computer and Information / v.24 no.10 / pp.183-188 / 2019
  • This descriptive study compares and analyzes the current status of disaster-related curricula in emergency departments and nursing departments. The initial targets were 41 universities in South Korea that have an emergency department; after excluding universities that do not publish their curriculum on their homepage or that do not have both an emergency department and a nursing department, 30 universities were studied. The research data were collected from the universities' internet homepages and analyzed. The keywords 'Disaster', 'Catastrophe', and 'Emergency' were used to search the names of subjects. The curricula were summarized as frequencies and percentages by the offering status of disaster-related subjects, classification of major education, grade, credits, number of classes, practical hours, and total number of subjects. According to the study, emergency departments at 29 universities (96.7%) and nursing departments at 19 universities (63.3%) have disaster-related subjects in their curricula. The classes are offered in the second grade in emergency departments and in the fourth grade in nursing departments. As major subjects, the classes most commonly carry two credits. Based on the results of the study, knowledge, skills, and training courses are necessary to develop the ability to cope with disasters in the field. A curriculum that matches the role of health care resources will be required.

An Analysis of the Influence of Digital Media Device and Communication Utilization Capabilities on Entrepreneurial Intention : Focusing on the Mediating Effect of Risk-Taking and Proactiveness (디지털 미디어 기기 및 커뮤니케이션 활용역량이 창업의도에 미치는 영향에 대한 분석 : 위험감수성 및 진취성의 매개효과를 중심으로)

  • Lee, Sang Gil;Leen, Jae Mahn
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.16 no.1 / pp.113-126 / 2021
  • Since the COVID-19 pandemic began in the first half of 2020, the business environment has changed dramatically. Convergence in digital devices is likely to be the keyword of future business. Because of COVID-19, the ability to utilize digital media devices has emerged as an important topic as people shift their focus online. The pandemic has reminded us of how important digitalization is at every point of contact. This study analyzed the effects of digital media device and communication utilization capabilities on entrepreneurial intention, taking into account the mediating effects of risk-taking and proactiveness. A survey of 250 members of the general public was conducted, and 212 valid questionnaires were collected. The statistical analysis was performed using Amos23. The analysis showed that digital media device utilization and communication utilization did not directly affect entrepreneurial intention, but it confirmed that entrepreneurial risk-taking leads to an intention to start a business. On this basis, the study suggests that the development of start-up programs based on entrepreneurship and digital media utilization capabilities should be strengthened in a smart society centered on information and communication, in order to expand job creation for the digital generation.

A study on the effect of tax evasion controversy on corporate values in internet news portals through big data analysis (빅데이터 분석을 통한 인터넷 뉴스 포털에서의 탈세 논란이 기업 가치에 미치는 영향 연구)

  • Lee, Sang-Min;Park, Myung-Ho;Kim, Byung-Jun;Park, Dae-Keun
    • Journal of Internet Computing and Services / v.22 no.6 / pp.51-57 / 2021
  • If the tax authorities judge a company's tax-saving or tax-avoidance actions to be tax evasion rather than legitimate tax practice, the company bears not only the tax itself but also non-tax costs, such as damage to its corporate image and a decline in its stock price caused by a series of news articles about the tax evasion. This study therefore measures how often keywords related to tax evasion controversies appear on an internet news portal as an indicator of the severity of a case, and analyzes the effect of that frequency on corporate value. For top companies by market capitalization in the Korean stock market, we crawl related articles from the internet news portal using keywords associated with tax evasion controversies, generate a time series of keyword frequency for each company, and analyze the effect of that frequency on market capitalization relative to book value. Panel regression and impulse response analysis show that the frequency of appearance has a negative effect on market capitalization and that the effect gradually decreases over a period of up to 12 months. This study examines whether tax evasion issues affect the corporate value of Korean companies and suggests that these effects should be taken into account when entrepreneurs set up tax-planning schemes.
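
The per-company keyword frequency time series described in this abstract could be built roughly as follows. This is only a sketch: the monthly granularity, the function name `monthly_keyword_counts`, the record layout, and the sample articles are all assumptions, and the crawling step is omitted.

```python
from collections import Counter

def monthly_keyword_counts(articles, keywords):
    """Count keyword occurrences per (company, 'YYYY-MM') pair.

    `articles` is an iterable of (company, 'YYYY-MM-DD', text) tuples;
    this layout is hypothetical and stands in for crawled news data.
    """
    counts = Counter()
    for company, date, text in articles:
        lowered = text.lower()
        hits = sum(lowered.count(keyword.lower()) for keyword in keywords)
        if hits:
            counts[(company, date[:7])] += hits
    return counts

# Made-up articles for two placeholder companies.
articles = [
    ("CompanyA", "2020-01-15", "Tax evasion probe widens; tax evasion fine expected"),
    ("CompanyA", "2020-01-20", "Quarterly earnings beat forecasts"),
    ("CompanyA", "2020-02-03", "Court rules on tax evasion case"),
    ("CompanyB", "2020-01-09", "Authorities allege tax evasion"),
]
series = monthly_keyword_counts(articles, ["tax evasion"])
```

The resulting counts, keyed by company and month, would then form the explanatory variable in a panel regression against a corporate value measure.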

A Study on Scale of Participation Motive for Leisure Sports (여가 스포츠 참여동기 척도 분석에 관한 연구)

  • Kim, Ji-Young;Kim, Seung-Hyeon
    • 한국체육학회지인문사회과학편 / v.54 no.3 / pp.439-452 / 2015
  • The purpose of this study is to encourage continuous participation in sports and to provide basic data for promoting participation in leisure sports. To this end, the study conducted a factor scaling analysis of participation motives for leisure sports and subdivided the motives in order to analyze participants' psychological reactions. As for the method, the study collected master's and doctoral theses and academic journal articles on motives for sports participation, published from 1997 to 2012, from major Korean search engines. The keyword 'motive' was searched first, and then studies on participation motives for leisure sports were collected. Keywords that appeared when searching for 'motive' were combined with other keywords, and the word spacing between them was checked before the literature analysis was conducted. The results showed that participation motives for leisure sports were divided into a participation motive, an internal motive, an external motive, a leisure motive, and other motives. There were 23 factors for the participation motive, 17 factors each for the internal motive and the external motive, 8 factors for the leisure motive, and 57 factors for other motives. After excluding factors with similar or overlapping meanings, 76 factors were found to be used in studying participation motives for leisure sports.

Study of major issues and trends facing ports, using big data news: From 1991 to 2020 (뉴스 빅데이터를 활용한 항만이슈 변화연구 : 1991~2020)

  • Yoon, Hee-Young
    • Journal of Korea Port Economic Association / v.37 no.1 / pp.159-178 / 2021
  • This study analyzed issues and trends related to ports in 86,611 news articles covering the 30 years from 1991 to 2020, using BIGKinds, a big data news analysis service. The analysis was based on the keyword analysis, word cloud, and relationship diagram analysis offered by BIGKinds. The issues and trends for ports over the last 30 years are summarized as follows. First, during Phase 1 (1991-2000), individual ports such as Busan, Incheon, and Gwangyang tried to strengthen their own competitiveness. During Phase 2 (2001-2010), efforts were made to gain more professional and specialized port management capabilities by establishing the Busan Port Authority in 2004, the Incheon Port Authority in 2005, and the Ulsan Port Authority in 2007. During Phase 3 (2011-2020), the promotion of future-oriented, eco-friendly, and smart ports was the major issue. Efforts to reduce particulate matter and pollutants produced by ports accelerated, and attempts to build smart ports driven by port automation and digitalization also intensified. Lastly, because the maritime sector was severely hit in 2020 by the unexpected shock of the COVID-19 pandemic, a detailed analysis of trends and issues in 2019 and 2020 was carried out to examine the impact of the pandemic on the maritime industry. It found that the shipping and port industries experienced more drastic changes than ever while trying to prepare for a post-pandemic era and to promote future-oriented ports. This study made policy suggestions by analyzing port-related news articles and trends. Based on its findings, further studies on enhancing the competitiveness of ports and devising sustainable development strategies are expected to follow through comparative analysis of port issues in different countries, thereby advancing academic research on ports.

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.155-175 / 2017
  • As social data has come into the spotlight, mainstream web search engines have begun to provide data that indicate how many people searched for a specific keyword: web search traffic data. Web search traffic information aggregates the crowd of users who search for a specific keyword. In various areas, web search traffic can serve as a useful variable that represents the attention of ordinary users to specific interests. Many studies use web search traffic data to nowcast or forecast social phenomena, for example in epidemic prediction, consumer pattern analysis, product life cycles, and financial investment modeling. Web search traffic data have also begun to be applied to predicting inbound tourism. Proper demand prediction is needed because tourism is a high value-added industry that increases employment and foreign exchange earnings. Among inbound tourists, Chinese tourists (Youke) in particular continue to grow in number; for many years Youke have been the largest group of inbound tourists to Korea, and they also rank highest in tourism revenue per visitor. Research on proper demand prediction approaches for Youke is therefore important in both the public and private sectors. Accurate prediction of tourism demand is important for efficient decision making with limited resources. This study suggests an improved model that reflects the latest social issues by capturing the attention of groups of individuals. Traveling abroad is generally a high-involvement activity, so potential tourists are likely to search intensively for information about their trip. Web search traffic data present tourists' attention during trip preparation in an instantaneous and dynamic way. This study therefore attempted to select keywords that potential Chinese tourists are likely to search for on the internet. Baidu, China's largest web search engine with a share of over 80%, provides users with access to web search traffic data.
Qualitative interviews with potential tourists helped us understand their information search behavior before a trip and identify the keywords for this study. The selected keywords were categorized into three levels according to how directly they relate to "Korean Tourism". Classifying the keywords helps determine which ones can explain Youke inbound demand, from the closest category to the most distant. Web search traffic data for each keyword were gathered by a web crawler developed to crawl web search data from Baidu Index. Using the automatically gathered variables, a linear model was built by multiple regression analysis, which suits operational decision and policy making because the relationships among variables are easy to explain. After the linear regression models were built, a model composed of traditional variables was compared, by significance and R squared, with a model that adds web search traffic variables to the traditional model. After comparing the performance of the models, the final model was composed. The final regression model offers improved explanatory power and the advantages of real-time immediacy and convenience over the traditional model. Furthermore, this study demonstrates a system that visualizes the results intuitively for general use: the Youke Mining solution provides several functions for tourism decision making, including the embedded final regression model. The Youke Mining solution has algorithms based on data science and a well-designed, simple interface. Finally, this research has three significant implications in its theoretical, practical, and policy aspects. Theoretically, the Youke Mining system and the model in this research are a first step toward predicting Youke inbound demand using an interactive and instant variable: web search traffic information, which represents tourists' attention while they prepare their trip. Baidu accounts for more than 80% of the web search engine market.
Practically, Baidu data can represent, in real time, the attention of potential tourists who are preparing their trips. Finally, in terms of policy, the Chinese tourist demand prediction model based on web search traffic can be used in tourism decision making for efficient resource management and for optimizing opportunities for successful policy.

The Need for Paradigm Shift in Semantic Similarity and Semantic Relatedness : From Cognitive Semantics Perspective (의미간의 유사도 연구의 패러다임 변화의 필요성-인지 의미론적 관점에서의 고찰)

  • Choi, Youngseok;Park, Jinsoo
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.111-123 / 2013
  • Measures of semantic similarity/relatedness between two concepts play an important role in research on system integration and database integration. Moreover, current research on keyword recommendation and tag clustering depends strongly on this kind of semantic measure. For this reason, many researchers in various fields, including computer science and computational linguistics, have tried to improve methods for calculating semantic similarity/relatedness. The study of similarity between concepts aims to discover how a computational process can model the way a human determines the relationship between two concepts. Most research on calculating semantic similarity uses ready-made reference knowledge, such as a semantic network or dictionary, to measure concept similarity. The topological method calculates relatedness or similarity between concepts based on various forms of a semantic network, including a hierarchical taxonomy. This approach assumes that the semantic network reflects human knowledge well. The nodes in a network represent concepts, and ways to measure the conceptual similarity between two nodes are also regarded as ways to determine the conceptual similarity of two words (i.e., two nodes in a network). Topological methods can be categorized as node-based or edge-based, which are also called the information content approach and the conceptual distance approach, respectively. The node-based approach calculates similarity between concepts based on how much information the two concepts share in terms of a semantic network or taxonomy, while the edge-based approach estimates the distance between the nodes that correspond to the concepts being compared. Both approaches assume that the semantic network is static. That is, the topological approach does not consider changes in the semantic relations between concepts in a semantic network.
However, as information and communication technologies make it easier for people to share knowledge, the semantic relations between concepts in a semantic network may change. To explain this change, we adopt cognitive semantics. The basic assumption of cognitive semantics is that humans judge semantic relations based on their cognition and understanding of concepts. This cognition and understanding is called 'World Knowledge.' World knowledge can be categorized as personal knowledge and cultural knowledge. Personal knowledge is knowledge gained from personal experience. Everyone can have different personal knowledge of the same concept. Cultural knowledge is the knowledge shared by people who live in the same culture or use the same language. People in the same culture have a common understanding of specific concepts. Cultural knowledge can be the starting point for discussing changes in semantic relations. If the culture shared by people changes for some reason, their cultural knowledge may also change. Today's society and culture are changing at a fast pace, and the change in cultural knowledge is not a negligible issue in research on semantic relationships between concepts. In this paper, we propose future directions for research on semantic similarity. In other words, we discuss how research on semantic similarity can reflect changes in semantic relations caused by changes in cultural knowledge. We suggest three directions for future research on semantic similarity. First, the research should include a versioning and update methodology for semantic networks. Second, a dynamically generated semantic network can be used to calculate semantic similarity between concepts. If researchers can develop a methodology for extracting a semantic network from a given knowledge base in real time, this approach can solve many problems related to changes in semantic relations.
Third, a statistical approach based on corpus analysis can be an alternative to methods that use a semantic network. We believe that these proposed research directions can be a milestone for research on semantic relations.
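
As an illustration of the edge-based (conceptual distance) approach described in this abstract, the sketch below measures the shortest path between two concepts in a small static taxonomy. The toy taxonomy, the function names, and the `1 / (1 + distance)` scoring are assumptions chosen for illustration; the abstract does not prescribe a specific formula.

```python
from collections import deque

def path_length(graph, start, goal):
    """Breadth-first search for the number of edges on the shortest path.

    Returns None when the two concepts are not connected.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def edge_similarity(graph, a, b):
    """Map conceptual distance to a similarity in (0, 1]; 0.0 if unconnected."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1 + dist)

# Hypothetical taxonomy stored as an undirected adjacency map.
edges = [("animal", "dog"), ("animal", "cat"), ("animal", "bird"), ("bird", "sparrow")]
graph = {}
for parent, child in edges:
    graph.setdefault(parent, set()).add(child)
    graph.setdefault(child, set()).add(parent)

dog_cat = edge_similarity(graph, "dog", "cat")          # two edges apart
dog_sparrow = edge_similarity(graph, "dog", "sparrow")  # three edges apart
```

Because `graph` is fixed, the scores never change unless the network itself is edited, which is the static-network assumption the abstract questions.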

Korea National College of Agriculture and Fisheries in Naver News by Web Crolling : Based on Keyword Analysis and Semantic Network Analysis (웹 크롤링에 의한 네이버 뉴스에서의 한국농수산대학 - 키워드 분석과 의미연결망분석 -)

  • Joo, J.S.;Lee, S.Y.;Kim, S.H.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.23 no.2 / pp.71-86 / 2021
  • This study was conducted to find information on the university's image from words related to 'Korea National College of Agriculture and Fisheries (KNCAF)' in Naver News. For this purpose, word frequency analysis, TF-IDF evaluation, and semantic network analysis were performed using web crawling technology. In the word frequency analysis, 'agriculture', 'education', 'support', 'farmer', 'youth', 'university', 'business', 'rural', and 'CEO' were important words. In the TF-IDF evaluation, the key words were 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', 'young farmer', 'agriculture', 'Chonju', 'university', 'device', and 'spreading'. In the semantic network analysis, the bigrams showed high correlations in the order of 'youth' - 'farmer', 'digital' - 'agriculture', 'farming' - 'settlement', 'agriculture' - 'rural', and 'digital' - 'turnover'. When the importance of keywords was evaluated using five centrality indices, 'agriculture' ranked first. The keywords in second place were 'farmers' (Cc, Cb), 'education' (Cd, Cp), and 'future' (Ce). Spearman's rank correlation coefficient between centrality indices showed that the rankings were most similar between degree centrality and PageRank centrality. In the KNCAF articles of Naver News, words such as 'agriculture', 'education', 'support', 'farmer', and 'youth' were important in terms of word frequency. However, in the evaluation that includes document frequency, words such as 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', and 'young farmers' were found to be key words. For the centrality analysis that considers the network connectivity between words, Cd and Cp were suitable for evaluation. The words with strong centrality were 'agriculture', 'education', 'future', 'farmer', 'digital', 'support', and 'utilization'.
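
The difference between raw word frequency and the TF-IDF evaluation reported in this abstract can be seen in a minimal sketch. The weighting below (term frequency times `log(N / df)`) is one common TF-IDF variant and is an assumption, as are the function name `tf_idf` and the three toy documents; the abstract does not state which variant or tooling the authors used.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document.

    tf  = count of the word in the document / document length
    idf = log(number of documents / number of documents containing the word)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    scores = []
    for doc in docs:
        term_counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        })
    return scores

# Toy tokenized articles (hypothetical, not taken from the study's data).
docs = [
    ["agriculture", "education", "farmer"],
    ["agriculture", "drone", "farmer"],
    ["agriculture", "university", "support"],
]
scores = tf_idf(docs)
```

In this toy corpus 'agriculture' appears in every document, so its score is zero even though it is the most frequent word, while the rarer 'drone' scores highest in the second document. This mirrors why the frequency-based and TF-IDF-based keyword lists in the abstract differ.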