• Title/Abstract/Keywords: keyword frequency analysis

352 search results (processing time: 0.026 s)

XML Document Keyword Weight Analysis based Paragraph Extraction Model (XML 문서 키워드 가중치 분석 기반 문단 추출 모델)

  • Lee, Jongwon;Kang, Inshik;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • Vol. 21, No. 11
    • pp.2133-2138
    • 2017
  • Existing analysis of XML and other documents has centered on words. Such analysis can be implemented with a morpheme analyzer, but it only classifies the many words in a document and cannot capture the document's core content. For a user to understand a document efficiently, paragraphs containing a main word must be extracted and presented to the user. The proposed system searches for a keyword in a normalized XML document, extracts the paragraphs containing the keyword the user entered, and displays them. In addition, it reports the frequency and weight of the search keyword, and it provides ordering of the extracted paragraphs and redundancy elimination so that the user can understand the document. The proposed system can minimize the time and effort required to understand a document by letting the user grasp it without reading the whole text.
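The pipeline described in the abstract above (keyword search, paragraph extraction, frequency and weight reporting, duplicate removal, ordering) might be sketched roughly as follows. This is an illustration only: the `<p>` tag, the function name `extract_paragraphs`, and the definition of weight as keyword count divided by paragraph word count are all assumptions, since the abstract does not give the paper's actual weighting scheme.

```python
import re
import xml.etree.ElementTree as ET


def extract_paragraphs(xml_text, keyword, tag="p"):
    """Return (paragraph, frequency, weight) for paragraphs containing keyword.

    Assumption: weight = keyword count / paragraph word count. The paper's
    real weighting scheme is not stated in the abstract.
    """
    root = ET.fromstring(xml_text)
    keyword = keyword.lower()
    seen = set()
    results = []
    for element in root.iter(tag):
        # Collapse whitespace so identical paragraphs compare equal.
        text = " ".join("".join(element.itertext()).split())
        if not text or text in seen:
            continue  # skip empty and duplicate paragraphs
        seen.add(text)
        words = re.findall(r"\w+", text.lower())
        frequency = words.count(keyword)
        if frequency:
            results.append((text, frequency, frequency / len(words)))
    # Highest-weight paragraphs first.
    results.sort(key=lambda item: item[2], reverse=True)
    return results


SAMPLE = """<doc>
  <p>XML stores data. XML is text.</p>
  <p>Parsers read XML documents one element at a time.</p>
  <p>XML stores data. XML is text.</p>
  <p>Nothing relevant here.</p>
</doc>"""

hits = extract_paragraphs(SAMPLE, "xml")
for paragraph, frequency, weight in hits:
    print(f"{frequency}x  weight={weight:.2f}  {paragraph}")
```

In the sample, the repeated paragraph is reported once and the paragraph without the keyword is dropped, so only two paragraphs are returned.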

A Study on the International Research Trend in Education Development focused on Text Network Analysis(2002~2017) (교육개발협력에 관한 국제 학술지 연구 동향 고찰 : 텍스트 네트워크 분석을 중심으로(2002~2017))

  • Kim, Sang-Mi;Kim, Young-Hwan;Cho, Won-Gyeum
    • Korean Journal of Comparative Education
    • Vol. 28, No. 1
    • pp.1-24
    • 2018
  • The objective of the article is to identify the research trends and main traits reflected in the keywords of abstracts of research articles published in "International Journal of Education Development" from 2002 to 2017. To do this, Text Network Analysis (TNA) was applied to 966 papers from the journal, and the major findings are as follows. First, the frequency analysis of the keywords showed that keywords such as Administration of education program, Schools and instruction, Regional public administration, Educational support service, Elementary education, and Elementary and secondary school appeared more than 100 times and were also high in degree centrality. Second, the analysis of keywords by development goal period showed that several new keywords such as Elementary education, Elementary and secondary school, Education quality, Secondary education, and Educational planning have emerged frequently since the SDGs, and these keywords were also high in centrality. Third, the analysis by education level showed that keywords such as Elementary education, Administration of education program, and School children were high in frequency and degree centrality at the elementary level. At the secondary level, Schools and instruction, Administration of education program, and Academic achievement were high, and at the higher education level, college and university was high.

A study on the effect of tax evasion controversy on corporate values in internet news portals through big data analysis (빅데이터 분석을 통한 인터넷 뉴스 포털에서의 탈세 논란이 기업 가치에 미치는 영향 연구)

  • Lee, Sang-Min;Park, Myung-Ho;Kim, Byung-Jun;Park, Dae-Keun
    • Journal of Internet Computing and Services
    • Vol. 22, No. 6
    • pp.51-57
    • 2021
  • If the tax authorities judge a company's actions to save or avoid taxes to be tax evasion rather than legal tax planning, the company not only pays the tax but also bears non-tax costs, such as damage to its corporate image and a decline in its stock price, due to a series of tax evasion-related news articles. Therefore, this study measures the frequency of occurrence of tax evasion controversy keywords on an internet news portal as a proxy for the severity of a case, and analyzes the effect of that frequency on corporate value. For top companies by market capitalization in the Korean stock market, we crawl related articles from an internet news portal using tax evasion controversy keywords, generate a time series of the frequency of occurrence of those keywords by company, and analyze the effect of the frequency of appearance on book value versus market capitalization. Panel regression and impulse response analysis show that the frequency of appearance has a negative effect on market capitalization and that the effect gradually decreases over 12 months. This study examines whether tax evasion issues affect the corporate value of Korean companies and suggests that these influences should be taken into account when entrepreneurs set up tax-planning schemes.

Web Document Classification Based on Hangeul Morpheme and Keyword Analyses (한글 형태소 및 키워드 분석에 기반한 웹 문서 분류)

  • Park, Dan-Ho;Choi, Won-Sik;Kim, Hong-Jo;Lee, Seok-Lyong
    • The KIPS Transactions:PartD
    • Vol. 19D, No. 4
    • pp.263-270
    • 2012
  • With the development of high-speed Internet and massive database technology, the number of web documents is increasing rapidly, and classifying those documents automatically is therefore becoming increasingly important. In this study, we propose an effective method to extract document features based on Hangeul morpheme and keyword analyses, and to classify non-structured documents automatically by predicting their subjects. To extract document features, we first select terms using a morpheme analyzer, form the keyword set based on term frequency and subject-discriminating power, and score each keyword using the discriminating power. Then, we generate the classification model using commercial software that implements a decision tree, a neural network, and an SVM (support vector machine). Experimental results show that the proposed feature extraction method achieves considerable performance in classifying web documents by subject, i.e., an average precision of 0.90 and recall of 0.84 in the case of the decision tree.
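The keyword-set step in the abstract above (select terms, filter by term frequency, score by subject-discriminating power) could look roughly like the sketch below. The abstract does not define discriminating power, so the formula here (a term's count in its dominant subject divided by its count across all subjects) is an assumption, as are the function name `score_keywords` and the `min_frequency` parameter. Terms are given as pre-tokenized lists; in the paper they would come from a Hangeul morpheme analyzer.

```python
from collections import Counter, defaultdict


def score_keywords(labeled_docs, min_frequency=2):
    """Score terms by how concentrated they are in a single subject.

    Assumption: discriminating power = (count in the term's dominant subject)
    / (count across all subjects). The paper's actual formula is not stated.
    """
    per_subject = defaultdict(Counter)
    for subject, terms in labeled_docs:
        per_subject[subject].update(terms)

    totals = Counter()
    for counts in per_subject.values():
        totals.update(counts)

    scores = {}
    for term, total in totals.items():
        if total < min_frequency:
            continue  # drop rare terms from the keyword set
        best_subject, best_count = max(
            ((subject, counts[term]) for subject, counts in per_subject.items()),
            key=lambda pair: pair[1],
        )
        scores[term] = (best_subject, best_count / total)
    return scores


docs = [
    ("sports", ["match", "goal", "team", "news"]),
    ("sports", ["goal", "team", "season"]),
    ("finance", ["stock", "market", "news"]),
    ("finance", ["stock", "bank", "news"]),
]
keywords = score_keywords(docs)
for term, (subject, power) in sorted(keywords.items()):
    print(f"{term}: {subject} ({power:.2f})")
```

A term such as "goal", which occurs only in one subject, gets the maximum score of 1.0, while "news", which is spread across subjects, scores lower. The resulting scores would then be the input features for a classifier such as a decision tree.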

A Study on Phone Call Big Data Analytics (전화통화 빅데이터 분석에 관한 연구)

  • Kim, Jeongrae;Jeong, Chanki
    • Journal of Information Technology and Architecture
    • Vol. 10, No. 3
    • pp.387-397
    • 2013
  • This paper proposes an approach to big data analytics for phone call data. The analytical model for phone call data is composed of the PVPF (Parallel Variable-length Phrase Finding) algorithm, which identifies verbal phrases in natural language, and the word count algorithm, which measures the usage frequency of keywords. In the proposed model, we identify words using the PVPF algorithm and measure the usage frequency of the identified words using the word count algorithm in MapReduce. The results can be interpreted from various viewpoints. We design and implement the model based on HDFS (Hadoop Distributed File System) and verify the proposed approach through a case study of phone call data, extracting useful results through analysis of keyword correlation and usage frequency.
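The word count step mentioned in the abstract above follows the usual map, shuffle, reduce pattern, which can be imitated in a single process as in the sketch below. This is not Hadoop code and does not reproduce the PVPF algorithm: plain whitespace tokenization stands in for phrase identification, and the function names and sample call transcripts are made up for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) for every token in one transcript line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one word."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce over all records in one process."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


calls = ["refund request for order", "order status request", "refund status"]
counts = word_count(calls)
print(sorted(counts.items(), key=lambda item: (-item[1], item[0])))
```

On a real cluster the map and reduce functions would run in parallel over HDFS blocks; the per-key grouping in `shuffle` is what lets each reducer sum one word independently.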

Keyword Network Analysis of Trends in Research on Climate Change Education (키워드 네트워크 분석을 활용한 기후변화 교육 관련 연구동향 분석)

  • Kim, Soon Shik;Lee, Sang Gyun
    • Journal of the Korean Society of Earth Science Education
    • Vol. 13, No. 3
    • pp.226-237
    • 2020
  • The purpose of the research is to analyze research trends related to climate change education through network analysis based on keywords extracted from research titles. For this purpose, 62 papers were selected from Korean Citation Index (KCI) journals published from 2011 to 2020 using keywords such as "climate change" and "climate change education" in the Research Information Sharing Service. The analysis procedure consisted of selecting the papers to analyze, keyword extraction and purification, and keyword network analysis and visualization. Textom, Ucinet 6.0, and NetDraw were used to analyze frequency, degree centrality, and betweenness centrality. The results of the research showed that, first, Early 'Energy and Climate Change Education' had the highest frequency of papers examining climate change education. Second, the keywords/phrases that appeared most frequently in research on climate change education were "program," "energy," "analysis," "elementary school," "elementary school students," "development," and "impact." Third, the betweenness centrality analysis showed that the indices of 'program', 'primary students' and 'primary schools' were the highest, and the largest group was 'development and effect of teaching and learning programs'. Based on these results, it was concluded that future research on climate change education needs to be examined in further detail and expanded into more specific areas.
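The two measures named in the abstract above, degree centrality and betweenness centrality, can be computed on a small keyword co-occurrence network as in the sketch below. The study itself used Textom, Ucinet 6.0, and NetDraw; this standard-library version is only an illustration, the sample keyword sets are invented, and the betweenness routine is the standard Brandes algorithm for unweighted, undirected graphs with unnormalized scores.

```python
from collections import deque
from itertools import combinations


def build_network(keyword_sets):
    """Link every pair of keywords that co-occur in the same title."""
    graph = {}
    for keywords in keyword_sets:
        for word in keywords:
            graph.setdefault(word, set())
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes each node is directly linked to (needs >= 2 nodes)."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                        # nodes in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        paths = dict.fromkeys(graph, 0)   # number of shortest paths from source
        dist = dict.fromkeys(graph, -1)
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair is counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


titles = [
    ["program", "energy"],
    ["program", "elementary school"],
    ["program", "development", "impact"],
]
network = build_network(titles)
degree = degree_centrality(network)
between = betweenness_centrality(network)
print(max(between, key=between.get))
```

In this toy network "program" links to every other keyword, so it has the highest degree centrality and lies on every shortest path between keywords that are not directly connected.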

Essential Technical Patent Extraction Method Associated with Fintech Based on Text Mining (텍스트 마이닝을 통한 핀테크 연관 핵심 기술 특허 추출 방법)

  • Lee, Hwangro;Choi, Eunmi
    • Proceedings of the Korea Information Processing Society Conference
    • Korea Information Processing Society 2015 Fall Conference
    • pp.1219-1222
    • 2015
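  • Fintech, the convergence of finance and IT, is rapidly emerging as a new paradigm in the IT and financial industries. Identifying technology trends in fintech and deriving similar, related technologies gives businesses the strategic direction they need to gain an edge in market competition. However, when a technology spreads rapidly within a short period and the need to secure it early arises across industry, as with fintech, it is difficult to select keywords for searching similar technologies from a patent database alone. This paper proposes a method for extracting related-technology keywords through news article analysis, for situations in which a newly emerging technology is growing so quickly that registered patents alone make it cumbersome to identify the related technology areas. In particular, we extract related-technology keywords for payment, security, and user environment, which are regarded as important in fintech, through frequency analysis of the words contained in article text. Finally, using the extracted technology keywords, we collected and analyzed relevant patents from an actual patent search database and derived core technology patents that are highly related to fintech.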

Trend Analysis of FinTech and Digital Financial Services using Text Mining (텍스트마이닝을 활용한 핀테크 및 디지털 금융 서비스 트렌드 분석)

  • Kim, Do-Hee;Kim, Min-Jeong
    • Journal of Digital Convergence
    • Vol. 20, No. 3
    • pp.131-143
    • 2022
  • Focusing on FinTech keywords, this study analyzes newspaper articles and Twitter data using text mining methodology in order to understand trends in the domestic digital financial service industry. Frequency analysis was performed around four important points in the growth of the FinTech lifecycle: Mobile Payment Service, Internet Primary Bank, Data 3 Act, and MyData Businesses. Frequency analysis combining the keywords 'China', 'USA', and 'Future' with 'FinTech' was then used to assess the current and future position of the FinTech industry. Next, sentiment analysis was conducted on Twitter data to quantify consumers' expectations and concerns about FinTech services. By combining frequency analysis and sentiment analysis, this study offers a meaningful perspective in that it presents strategic directions that the government and companies can use to understand the future FinTech market.

A Relation Analysis between NDSL User Queries and Technical Terms (NDSL 검색 질의어와 기술용어간의 관계에 대한 분석적 연구)

  • Kang, Nam-Gyu;Cho, Min-Hee;Kwon, Oh-Seok
    • Journal of Information Management
    • Vol. 39, No. 3
    • pp.163-177
    • 2008
  • In this paper, we analyzed the relationship between the user query keywords that are used to search NDSL and the technical terms extracted from NDSL journals. For the analysis, we extracted about 833,000 query keywords from nearly 17 months of NDSL search logs and approximately 41,000,000 technical terms from NDSL, INSPEC, and FSTA journals. We used only the English noun phrases among those extracted, and then conducted experiments on equality analysis, relationship analysis, and frequency analysis.

Analysis of Qualitative Research on Science Education Trend in Korea Using Semantic Network Analysis (네트워크 분석을 통한 국내 과학교육 질적 연구동향 분석)

  • Lee, Sanggyun;Kim, Soonshik;Chae, Donghyun
    • Journal of the Korean Society of Earth Science Education
    • Vol. 10, No. 3
    • pp.290-307
    • 2017
  • The purpose of this study is to analyze research trends in qualitative research on science education, to provide basic data on such research, and to set the direction of follow-up research. The subjects of the study are articles in Korean Citation Index journals (KCI-listed and KCI listing candidates) that can be retrieved with the key phrases 'qualitative research' and 'science education' in Korean through the RISS service. In this study, descriptive statistical analysis is used to determine the number of research articles, classifying them by year and by journal. In addition, Semantic Network Analysis was conducted on keyword frequency and centrality across the research articles using krkwic and Ucinet6.0. The results show that, first, 138 research papers were published in 14 journals from 2005 to 2017. Second, the analysis of keyword frequency across the articles showed that 'elementary school teacher', 'gifted student', 'science teacher', and 'class' appeared more often than others. Third, according to the results of the whole-network analysis, 'Analysis', 'elementary school', and 'class' were identified as highly influential nodes, whereas 'Comparison', 'inquiry', 'recognition', and 'gifted students' were not close to the center of the network. Fourth, the keywords that appear in all sections are analysis, gifted students, and elementary school students, and they can be analyzed continuously based on studies, lessons or recognition, and characteristics. Based on the results of this study, we explored the past and present of research topics in qualitative research on science education and discussed future directions of study.