• Title/Summary/Keyword: keyword-based analysis

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, and these data are often written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can return results far from users' intentions. Even though much progress has been made in recent years in improving search engine performance so as to provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, thereby avoiding expensive sense-tagging processes. It tests the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. In the experiments, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities using cross-validation. Only nouns, the target of word sense disambiguation, were selected. 
In addition, 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model-based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiments show that the merged corpus yields better precision and recall. This study suggests that the method can practically enhance the performance of Internet search engines and help capture the meaning of a sentence more accurately in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. 
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
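
As a rough illustration of the supervised approach this abstract describes, the following is a minimal sketch of a Naïve Bayes sense classifier trained on sense-tagged example sentences. It is not the authors' implementation: the English toy data, the sense labels such as `bank/finance`, the function names, and the add-one (Laplace) smoothing are all assumptions made for the example, and no morphological analysis is performed.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count senses and context words from (sense_label, context_words) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(context, model):
    """Return the sense with the highest log posterior under Naive Bayes."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            # Add-one smoothing (an assumption) keeps unseen words from zeroing the score.
            score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical sense-tagged examples, standing in for dictionary example sentences.
examples = [
    ("bank/finance", ["money", "deposit", "loan"]),
    ("bank/finance", ["account", "money", "interest"]),
    ("bank/river", ["river", "water", "shore"]),
    ("bank/river", ["fishing", "river", "mud"]),
]
model = train(examples)
print(disambiguate(["deposit", "money"], model))
```

Each context word contributes independently to the score, which is the independence assumption the abstract calls unrealistic but effective in practice.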

International Research Trends Related to Inquiry in Science Education: Perception and Perspective on Inquiry, Support and Strategy for Inquiry, and Teacher Professional Development for Inquiry (과학교육에서 탐구 관련 국외 연구 동향 -탐구의 인식과 관점, 전략과 지원, 교사 전문성의 관점에서-)

  • Yu, Eun-Jeong;Byun, Taejin;Baek, Jongho;Shim, Hyeon-Pyo;Ryu, Kumbok;Lee, Dongwon
    • Journal of The Korean Association For Science Education / v.41 no.1 / pp.33-46 / 2021
  • Inquiry occupies an important place in science education, and research related to inquiry is widely conducted. However, because the concept of "inquiry" is so inclusive, each researcher perceives its meaning differently, and approaches vary. In addition, criticisms have been raised that classes using inquiry in science education do not guarantee meaningful changes in students. Therefore, this study attempts to identify trends in SSCI-level research papers dealing with inquiry in science education over the past three years to confirm the current status and effectiveness of inquiry. The papers analyzed were drawn from the International Journal of Science Education, Journal of Research in Science Teaching, Research in Science Education, and Science Education, and were limited to those that directly list "inquiry (enquiry)" as a keyword. The 75 extracted papers were classified, and an analysis frame was derived inductively by reflecting their subjects and characteristics. Specific cases for each category were presented under three aspects: perception and perspective on inquiry, support and strategy for inquiry, and teacher professional development for inquiry. The implications for scientific inquiry are as follows. First, rather than defining inquiry as an implicit proposition or presenting it as a step-by-step procedure, the studies sought to grasp the meaning of inquiry more comprehensively and holistically. Second, regarding whether inquiry-based instruction is effective in all of the cognitive, functional, and affective domains of science, the limitations are clearly presented, and the context-dependent and subject-specific properties and limitations of inquiry are emphasized. 
Third, uncertainty in inquiry-based science instruction can help learners begin their inquiry and develop interest, but in the process of recognizing data and restructuring knowledge, explicit and specific guidance and scaffolding should be provided at the appropriate time.

The Analysis of the Current Status of Medical Accidents and Disputes Researched in the Korean Web Sites (인터넷 사이트를 통해 살펴본 의료사고 및 의료분쟁의 현황에 관한 분석)

  • Cha, Yu-Rim;Kwon, Jeong-Seung;Choi, Jong-Hoon;Kim, Chong-Youl
    • Journal of Oral Medicine and Pain / v.31 no.4 / pp.297-316 / 2006
  • The increasing number of medical disputes is a remarkable social phenomenon. In particular, we must not overlook that the production and circulation of information related to medical accidents is increasing rapidly through the Internet. In this research, we evaluated the web sites that provided information related to medical accidents, found using the keyword "medical accidents" in March 2006, and classified the 28 web sites according to the type of establisher. We also analyzed the contents of the sites, and examined and compared their current status and the problems that need to be improved. Finally, we suggested possible solutions to prevent medical accidents. The detailed results are listed below. 1. Medical practitioners, the general public, and lawyers were all familiar with and mainly preferred the term "medical accidents". 2. Among the sites found with the keyword "medical accidents", lawyers had the most sites and medical practitioners had the fewest. 3. Many sites run by the general public and lawyers had their own medical record analysts, but there were few professional analysts for dentistry. 4. The general public was more interested in the prevention of medical accidents, whereas lawyers were more interested in the process after medical accidents. The sites run by medical practitioners dealt with the fewest remedies for medical accidents, compared with other sites. 5. The general public wanted third-party intervention in disputes, such as by the government, including a medical dispute arbitration law and/or the establishment of an independent medical dispute judgment institution. 6. In the comparison among the establishers of web sites, medical practitioners dealt with the fewest examples of medical accidents. 7. In counseling articles, cases related to dental accidents were presented as less important than they are in reality. 8. Whereas there were many articles about domestic cases related to bloody dental treatment, in the open counseling articles a large number concerned dental treatment not covered by insurance. 9. In comparing the information on medical accidents offered by each type of establisher, the general public tended to offer vocabulary, lawyers related laws, and medical practitioners medical knowledge. 10. They all cited news reported by the media to present the current status of domestic medical accidents. In particular, among the web sites run by the general public, NGOs provided plentiful statistical data related to medical accidents. 11. Only two web sites collected medical accident cases. Our research found that, in the flood of information, medical disputes can arise from wrong information from third parties, and that medical practitioners have the most passive attitude toward medical accidents. Thus, it is crucial to have a mutual exchange of information among lawyers, patients, and medical practitioners, so that, based on clear mutual understanding, accidents and disputes can be resolved more positively and actively.

Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants (텍스트 마이닝을 이용한 공군 부사관 지원자 자기소개서의 차별적 특성 분석)

  • Kwon, Hyeok;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.75-94 / 2021
  • The low birth rate and the shortened military service period are raising concerns about the selection of excellent military officers. The Republic of Korea became a low-birth-rate society in 1984 and an aged society in 2018, and is expected to become a super-aged society in 2025. In addition, the troop-oriented military is changing into a military oriented toward state-of-the-art weapons, and the military service period was reduced in 2018 to ease the burden of military service on young people and let them take up their roles in society earlier. Some observe that the application rate for military officers is falling because of shrinking manpower resources and a preference for the shortened mandatory military service over service as an officer. This calls for further consideration of policies for securing excellent military officers. Most related studies have used social science methodologies, but this study applies text mining, which is suitable for large-scale document analysis. This study extracts words with discriminative characteristics from the cover letters of Republic of Korea Air Force non-commissioned officer applicants and analyzes their pass/fail polarity. It consists of three steps. First, the applications are divided into general and technical fields, and the words characterizing the cover letters are ordered by the difference in their frequency ratios between the fields. The greater the difference in a word's proportion between the application fields, the more discriminative the word is defined to be. On this basis, we extract the top 50 words representing discriminative characteristics in the general field and the top 50 in the technical field. Second, the appropriate number of topics for the cover letters as a whole is determined through LDA, using the perplexity score and the coherence score. 
Based on the appropriate number of topics, we then use LDA to generate topics and probabilities, and estimate which topic each word of discriminative characteristic belongs to. Subsequently, the keyword indicators of the questions are used to set the candidate label indices, and the most appropriate indicator is set as the label for each topic, considering the topic-specific word distribution. Third, using L-LDA with the cover letters labeled as pass or fail, we generate topics and probabilities for each field under the pass and fail labels. Furthermore, from the topics and probabilities generated under the pass and fail labels, we extract only the words of discriminative characteristics that belong to labeled topics. Next, for each labeled discriminative word, we compute the difference between its probability under the pass label and its probability under the fail label. A positive value can be seen as having pass polarity, and a negative value as having fail polarity. This study is the first to reflect the characteristics of cover letters of Republic of Korea Air Force non-commissioned officer applicants rather than those of the private sector. Moreover, these methodologies apply text mining techniques to multiple documents, rather than survey or interview methods, which reduces analysis time and increases reliability for the entire population. For this reason, the methodology proposed in the study is also applicable to other forms of multiple documents in the field of military personnel. This study shows that L-LDA is more suitable than LDA for extracting the discriminative characteristics of Republic of Korea Air Force non-commissioned officer cover letters. Furthermore, this study proposes a methodology that uses a combination of LDA and L-LDA. 
Therefore, through the analysis of the results of the acquisition of non-commissioned Republic of Korea Air Force officers, we would like to provide information available for acquisition and promotional policies and propose a methodology available for research in the field of military manpower acquisition.
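
Two of the steps in this abstract reduce to simple arithmetic, sketched below under stated assumptions: ranking words by the difference in their frequency ratios between two application fields, and reading pass/fail polarity from the sign of a probability difference. The function names, the tokenized toy cover letters, and the per-word probabilities are hypothetical. In the study the probabilities come from L-LDA, which this sketch does not implement; it simply takes them as given.

```python
from collections import Counter

def frequency_ratios(docs):
    """Share of all tokens in a group of documents accounted for by each word."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def discriminative_words(docs_a, docs_b, top_n):
    """Words ranked by (ratio in group A) - (ratio in group B), largest first."""
    ratio_a, ratio_b = frequency_ratios(docs_a), frequency_ratios(docs_b)
    diffs = {w: ratio_a.get(w, 0.0) - ratio_b.get(w, 0.0)
             for w in set(ratio_a) | set(ratio_b)}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

def polarity(p_pass, p_fail):
    """Positive values lean toward 'pass', negative values toward 'fail'."""
    return {w: p_pass.get(w, 0.0) - p_fail.get(w, 0.0)
            for w in set(p_pass) | set(p_fail)}

# Hypothetical tokenized cover letters for the two application fields.
general = [["leadership", "service", "team"], ["service", "duty"]]
technical = [["maintenance", "engine", "service"], ["engine", "radar"]]
print(discriminative_words(technical, general, top_n=1))

# Hypothetical word probabilities under the pass and fail labels.
scores = polarity({"responsibility": 0.05, "engine": 0.02},
                  {"responsibility": 0.02, "engine": 0.03})
print(scores)
```

A word shared by both fields, such as `service` in the toy data, gets a small difference and therefore ranks low, which is the sense in which the ranking is discriminative.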

The Distribution and Characteristics of Protected Areas and Natural Resources in the Metropolitan Area in Blog Posts (블로그 게시물에 나타난 수도권 보전지역 및 자연자원의 분포 및 특성)

  • Lee, Sung-Hee;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture / v.50 no.5 / pp.30-39 / 2022
  • This study aimed to evaluate the awareness of conservation areas and green resources and analyze their characteristics by utilizing accumulated blog data created for specific places and objects. Among all the conservation areas and resources located in the Seoul metropolitan area, places that can be evaluated were classified, and sites were evaluated by dividing them into ten categories based on the number of blog posts written. As a result of the study, the users' awareness of forests was the highest, and the awareness of conservation areas and green resources was higher in urban areas than suburban areas. The result shows that the conservation areas and green resources located around the metropolitan area serve as natural tourist destinations while being the object of conservation for users. In addition, these results are in the same vein as the research results in domestic and foreign studies on the importance of ecosystem services in urban areas. Unlike existing research methods, this study is meaningful in that it identified the level of user awareness through social media analysis and applied it to evaluating conservation areas and green resources. It can be used as basic data to prepare a management plan considering public interest and awareness or to establish a development plan to increase awareness. In addition, the cumulative amount of blog content used in the study is meaningful in that it can identify and monitor users' interest in the space. However, it was not possible to examine the contents of each blog in detail because it was evaluated based on the amount of social media content. In addition, in the case of conservation areas and green resources, it is necessary to review and supplement the evaluation contents by adding keyword analysis and content analysis for the site to be evaluated as content other than the pure viewpoint of users may be mixed with development issues.

GenAI(Generative Artificial Intelligence) Technology Trend Analysis Using Bigkinds: ChatGPT Emergence and Startup Impact Assessment (빅카인즈를 활용한 GenAI(생성형 인공지능) 기술 동향 분석: ChatGPT 등장과 스타트업 영향 평가)

  • Lee, Hyun Ju;Sung, Chang Soo;Jeon, Byung Hoon
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.18 no.4 / pp.65-76 / 2023
  • In the field of technology entrepreneurship and startups, the development of Artificial Intelligence(AI) has emerged as a key topic for business model innovation. As a result, venture firms are making various efforts centered on AI to secure competitiveness(Kim & Geum, 2023). The purpose of this study is to analyze the relationship between the development of GenAI technology and the startup ecosystem by analyzing domestic news articles to identify trends in the technology startup field. Using BIG Kinds, this study examined the changes in GenAI-related news articles, major issues, and trends in Korean news articles from 1990 to August 10, 2023, focusing on the emergence of ChatGPT before and after, and visualized the relevance through network analysis and keyword visualization. The results of the study showed that the mention of GenAI gradually increased in the articles from 2017 to 2023. In particular, OpenAI's ChatGPT service based on GPT-3.5 was highlighted as a major issue, indicating the popularization of language model-based GenAI technologies such as OpenAI's DALL-E, Google's MusicLM, and VoyagerX's Vrew. This proves the usefulness of GenAI in various fields, and since the launch of ChatGPT, Korean companies have been actively developing Korean language models. Startups such as Ritten Technologies are also utilizing GenAI to expand their scope in the technology startup field. This study confirms the connection between GenAI technology and startup entrepreneurship activities, which suggests that it can support the construction of innovative business strategies, and is expected to continue to shape the development of GenAI technology and the growth of the startup ecosystem. Further research is needed to explore international trends, the utilization of various analysis methods, and the possibility of applying GenAI in the real world. These efforts are expected to contribute to the development of GenAI technology and the growth of the startup ecosystem.

A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data (빅데이터 기반 추천시스템 구현을 위한 다중 프로파일 앙상블 기법)

  • Kim, Minjeong;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.93-110 / 2015
  • A recommender system recommends products to customers who are likely to be interested in them. Based on automated information filtering technology, various recommender systems have been developed. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in a number of different domains such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors have liked most. Thus, CF works properly only when there is a sufficient number of ratings on common products from customers. When there is a shortage of customer ratings, CF forms an inaccurate neighborhood, resulting in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies have come to collect more data and to use a wide variety of large-scale information. Many companies therefore recognize the importance of utilizing big data, because it lets them improve their competitiveness and create new value. In particular, the use of personal big data in recommender systems is a rising issue, because personal big data facilitate more accurate identification of users' preferences and behaviors. The proposed recommendation methodology is as follows. First, multimodal user profiles are created from personal big data in order to grasp the preferences and behavior of users from various viewpoints. We derive five user profiles based on personal information such as ratings, site preferences, demographics, Internet usage, and topics in text. 
Next, the similarity between users is calculated based on the profiles, and the neighbors of each user are then found from the results. One of three ensemble approaches is applied to calculate the similarity: the similarity of the combined profile, the average similarity of each profile, or the weighted average similarity of each profile. Finally, the products that the neighbors prefer most are recommended to the target users. For the experiments, we used the demographic data and a very large volume of Web log transactions for 5,000 panel users of a company specializing in analyzing the rankings of Web sites. R and SAS E-miner were used to implement the proposed recommender system and to conduct the topic analysis using keyword search, respectively. To evaluate the recommendation performance, we used 60% of the data for training and 40% for testing. Five-fold cross-validation was also conducted to enhance the reliability of our experiments. The widely used F1 metric, which gives equal weight to recall and precision, was employed for our evaluation. In the evaluation, the proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using weighted average similarity showed the highest performance: the rate of improvement in F1 was 16.9 percent for the ensemble approach using weighted average similarity and 8.1 percent for the ensemble approach using the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when there is a shortage of customer ratings. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively. 
However, our methodology should be studied further with a view to its real-world application. We need to compare the differences in recommendation accuracy by applying the proposed method to different recommendation algorithms and then identify which combination shows the best performance.
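
The three ensemble approaches and the F1 metric named in this abstract can be sketched as follows. This is an illustration only: the abstract does not say which similarity measure was used, so cosine similarity is an assumption, as are the function names, the two-profile toy users, and the weights. Each user is represented as a list of profile vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def combined_similarity(profiles_a, profiles_b):
    """Approach 1: concatenate all profiles into one vector, then compare."""
    u = [x for profile in profiles_a for x in profile]
    v = [x for profile in profiles_b for x in profile]
    return cosine(u, v)

def average_similarity(profiles_a, profiles_b):
    """Approach 2: compare profile by profile and take the plain mean."""
    sims = [cosine(p, q) for p, q in zip(profiles_a, profiles_b)]
    return sum(sims) / len(sims)

def weighted_average_similarity(profiles_a, profiles_b, weights):
    """Approach 3: compare profile by profile and take a weighted mean."""
    sims = [cosine(p, q) for p, q in zip(profiles_a, profiles_b)]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Two hypothetical users, each with two profiles (e.g. ratings and site preference).
user_a = [[1.0, 0.0], [0.0, 1.0]]
user_b = [[1.0, 0.0], [1.0, 0.0]]
print(combined_similarity(user_a, user_b))
print(average_similarity(user_a, user_b))
print(weighted_average_similarity(user_a, user_b, weights=[3.0, 1.0]))
```

In the toy data the users agree on the first profile and disagree on the second, so weighting the first profile more heavily raises the overall similarity. How the study chose its weights is not stated in the abstract.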

A Study on the Improvement of Recommendation Accuracy by Using Category Association Rule Mining (카테고리 연관 규칙 마이닝을 활용한 추천 정확도 향상 기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.27-42 / 2020
  • Traditional companies with offline stores were unable to secure large display space because of cost. This limitation inevitably allowed only limited kinds of products to be displayed on the shelves, which deprived consumers of the opportunity to experience various items. Taking advantage of the virtual space of the Internet, online shopping goes beyond the physical space limitations of offline shopping and can display numerous products on web pages to satisfy consumers with a variety of needs. Paradoxically, however, this can also make it difficult for consumers to compare and evaluate too many alternatives in their purchase decision-making process. To address this side effect, various kinds of purchase decision support systems have been studied, such as keyword-based item search services and recommender systems. These systems can reduce search time for items, prevent consumers from leaving while browsing, and contribute to increased sales for sellers. Among those systems, recommender systems based on association rule mining techniques can effectively detect interrelated products from transaction data such as orders. The association between products obtained by statistical analysis provides clues for predicting how interested consumers will be in another product. However, since the algorithm is based on the number of transactions, products that have not yet sold enough in the early days after launch may not be included in the list of recommendations even though they are highly likely to sell. Such missing items may not get sufficient exposure to consumers to record sufficient sales, and then fall into a vicious cycle of declining sales and omission from the recommendation list. 
This is an inevitable outcome when recommendations are made based on past transaction histories rather than on potential future sales. This study started from the idea that incorporating an indirect means of identifying this potential would help select products worth recommending. Given that the attributes of a product affect consumers' purchasing decisions, this study reflects them in the recommender system. In other words, consumers who visit a product page have shown interest in the attributes of the product and would also be interested in other products with the same attributes. Under this assumption, the recommender system can use these attributes to select recommended products with a higher acceptance rate. Given that a category is one of the main attributes of a product, it can be a good indicator of not only direct associations between two items but also potential associations that have yet to be revealed. Based on this idea, the study devised a recommender system that reflects not only associations between products but also associations between categories. Through regression analysis, the two kinds of associations were combined to form a model that could predict the hit rate of recommendations. To evaluate the performance of the proposed model, another regression model was also developed based only on associations between products. Comparative experiments were designed to be similar to the environment in which products are actually recommended in online shopping malls. First, the association rules for all possible combinations of antecedent and consequent items were generated from the order data. Then, hit rates for each of the association rules were predicted from the support and confidence calculated for each of the models. 
The comparative experiments using order data collected from an online shopping mall show that the recommendation accuracy can be improved by further reflecting not only the association between products but also categories in the recommendation of related products. The proposed model showed a 2 to 3 percent improvement in hit rates compared to the existing model. From a practical point of view, it is expected to have a positive effect on improving consumers' purchasing satisfaction and increasing sellers' sales.
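
The support and confidence measures mentioned in this abstract follow their standard definitions, which can be sketched for both product-level and category-level rules. The order data, the item-to-category map, and the function names below are hypothetical, and the regression step that combines the two kinds of association into a hit-rate prediction is not shown.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = share of transactions containing both items
    confidence = share of antecedent transactions that also contain the consequent
    """
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    with_antecedent = sum(1 for t in transactions if antecedent in t)
    support = both / n if n else 0.0
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence

def to_categories(transactions, category_of):
    """Map each order's items to the set of categories they belong to."""
    return [{category_of[item] for item in t} for t in transactions]

# Hypothetical orders and a hypothetical item-to-category map.
orders = [
    {"tent", "lantern"},
    {"tent", "stove"},
    {"lantern", "novel"},
    {"tent", "lantern", "stove"},
]
category_of = {"tent": "camping", "lantern": "camping",
               "stove": "cooking", "novel": "books"}

print(rule_metrics(orders, "tent", "lantern"))  # product-level rule
category_orders = to_categories(orders, category_of)
print(rule_metrics(category_orders, "camping", "cooking"))  # category-level rule
```

A newly launched item with few orders has low product-level support, but its category may already have well-supported rules, which is the intuition behind using categories as an extra signal.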

A Study on Industry-specific Sustainability Strategy: Analyzing ESG Reports and News Articles (산업별 지속가능경영 전략 고찰: ESG 보고서와 뉴스 기사를 중심으로)

  • WonHee Kim;YoungOk Kwon
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.287-316 / 2023
  • As global energy crisis and the COVID-19 pandemic have emerged as social issues, there is a growing demand for companies to move away from profit-centric business models and embrace sustainable management that balances environmental, social, and governance (ESG) factors. ESG activities of companies vary across industries, and industry-specific weights are applied in ESG evaluations. Therefore, it is important to develop strategic management approaches that reflect the characteristics of each industry and the importance of each ESG factor. Additionally, with the stance of strengthened focus on ESG disclosures, specific guidelines are needed to identify and report on sustainable management activities of domestic companies. To understand corporate sustainability strategies, analyzing ESG reports and news articles by industry can help identify strategic characteristics in specific industries. However, each company has its own unique strategies and report structures, making it difficult to grasp detailed trends or action items. In our study, we analyzed ESG reports (2019-2021) and news articles (2019-2022) of six companies in the 'Finance,' 'Manufacturing,' and 'IT' sectors to examine the sustainability strategies of leading domestic ESG companies. Text mining techniques such as keyword frequency analysis and topic modeling were applied to identify industry-specific, ESG element-specific management strategies and issues. The analysis revealed that in the 'Finance' sector, customer-centric management strategies and efforts to promote an inclusive culture within and outside the company were prominent. Strategies addressing climate change, such as carbon neutrality and expanding green finance, were also emphasized. In the 'Manufacturing' sector, the focus was on creating sustainable communities through occupational health and safety issues, sustainable supply chain management, low-carbon technology development, and eco-friendly investments to achieve carbon neutrality. 
In the 'IT' sector, there was a tendency to focus on technological innovation and digital responsibility to enhance social value through technology. Furthermore, the key issues identified in the ESG factors were as follows: under the 'Environmental' element, issues such as greenhouse gas and carbon emission management, industry-specific eco-friendly activities, and green partnerships were identified. Under the 'Social' element, key issues included social contribution activities through stakeholder engagement, supporting the growth and coexistence of members and partner companies, and enhancing customer value through stable service provision. Under the 'Governance' element, key issues were identified as strengthening board independence through the appointment of outside directors, risk management and communication for sustainable growth, and establishing transparent governance structures. The exploration of the relationship between ESG disclosures in reports and ESG issues in news articles revealed that the sustainability strategies disclosed in reports were aligned with the issues related to ESG disclosed in news articles. However, there was a tendency to strengthen ESG activities for prevention and improvement after negative media coverage that could have a negative impact on corporate image. Additionally, environmental issues were mentioned more frequently in news articles compared to ESG reports, with environmental-related keywords being emphasized in the 'Finance' sector in the reports. Thus, ESG reports and news articles shared some similarities in content due to the sharing of information sources. However, the impact of media coverage influenced the emphasis on specific sustainability strategies, and the extent of mentioning environmental issues varied across documents. Based on our study, the following contributions were derived. 
From a practical perspective, companies need to consider their characteristics and establish sustainability strategies that align with their capabilities and situations. From an academic perspective, unlike previous studies on ESG strategies, we present a subdivided methodology through analysis considering the industry-specific characteristics of companies.

Value and Prosect of individual diary as research materials : Based on the "The 12th May Diaries Collection" (개인 일기의 연구 자료로서의 가치와 전망 "5월12일 일기컬렉션"을 중심으로)

  • Choi, Hyo Jin;Yim, Jin Hee
    • The Korean Journal of Archival Studies / no.46 / pp.95-152 / 2015
  • "Archives of Everyday Life" refers to an organization or facility that collects, appraises, selects, and preserves documents from the memory of individuals, groups, or a society by categorizing and classifying the lives and cultures of ordinary people. Such documents include materials such as diaries, autobiographies, letters, and notes. They also cover digital files or hypertext such as posts from blogs and online communities, or photos uploaded to social network services. Many research fields, including records management studies, have continuously argued for the necessity of collecting and preserving ordinary people's records of daily life, which are produced every moment. In particular, a diary is a written record reflecting the facts experienced by an individual and the individual's self-examination. Its originality, individuality, and uniqueness are considered truly valuable in a document regardless of the era. Lately, many diaries have been discovered and presented to the historical research community, and diverse researchers in the humanities and social sciences have embarked on more in-depth research on diaries, their authors, and the social background of the time. Furthermore, researchers in linguistics, educational studies, and psychology analyze the linguistic behaviors, status of cultural assimilation, and emotional or psychological changes of an author. In this study, we conduct a meta-study of various research on diaries in order to reaffirm the value of "The 12th May Diaries Collection" as an everyday life archive. "The 12th May Diaries Collection" consists of diaries produced and donated directly by citizens on the 12th of May every year. It was only in 2013 that the Digital Archiving Institute in Univ. of Myungji organized the first "Annual call for the 12th May". More than 2,000 items have now been collected, including handwritten diaries, digital documents, photos, and audio and video files. The age of participants also varies from children to senior citizens. 
In this study, a quantitative analysis of the collected diaries will be made, along with a closer examination of the detailed contents of each item. It is not difficult to find stories about family and friends, school life, concerns over career paths, and the daily lives and feelings of citizens across different generations, regions, and professions. Based on the keywords and descriptors of each item, a more comprehensive examination will be made. Additionally, considering the diverse formats and contents of the diaries, this study will suggest future research opportunities for these diaries in fields such as linguistics, educational studies, historical studies, and the humanities. Finally, this study will discuss the tasks and challenges necessary for "the 12th May Diaries Collection" to be continuously collected and preserved as Everyday Life Archives.