• Title/Summary/Keyword: Text data

Analyzing Perceptions of Unused Facilities in Rural Areas Using Big Data Techniques - Focusing on the Utilization of Closed Schools as a Youth Start-up Space - (빅데이터 분석 기법을 활용한 농촌지역 유휴공간 인식 분석 - 청년창업 공간으로써 폐교 활용성을 중심으로 -)

  • Jee Yoon Do; Suyeon Kim
    • Journal of Environmental Impact Assessment / v.32 no.6 / pp.556-576 / 2023
  • This study sought ways to use idle spaces in rural areas as a response to rural extinction. Using the keywords "startup," "youth start-up," "youth start-up+rural," and "start-up+rural," together with "idle facilities" and "closed schools," the study sought to identify perceptions of idle facilities in rural areas. It presents basic data for setting policy direction and exploring plans by reviewing frequency analysis, major keyword analysis, network analysis, sentiment analysis, and domestic and foreign cases. The analysis found, first, that idle facilities and closed schools act as important factors for regional regeneration. Second, for youth start-ups in rural areas, housing problems need to be solved alongside agricultural education. Third, young people tend to start businesses actively using digital technology, so a basis for digital utilization in agriculture needs to be established. Finally, to attract young people and revitalize the region, drawing on best practices at home and abroad, policy measures should be presented, in connection with local residents, that can serve as platforms for culture and education as well as start-ups. These results are significant in that they present implications for youth start-ups in rural areas by reviewing perceptions of start-ups, with the influx of young people as one alternative for using idle facilities and regenerating regions; if additional solutions are identified through field surveys, they can be used to set policy goals that fit reality.

Analysis of the ordering factors influencing the awarding price ratio of service contract in KONEPS

  • Jung-Sung Ha; Tae-Hong Choi; Wan-Sup Cho
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.239-248 / 2023
  • The purpose of this study is to analyze the factors of service contracts that affect the successful bid price rate (awarding price ratio), focusing on the case of Nara Market (KONEPS). The study differs from existing studies in that it analyzes service contracts affecting the successful bid price rate across the wide range of ordering organizations and bidders in Nara Market. Through a comparative analysis of the awarding price ratio for services, this work provides results comparable to those in the previous literature. The analytical model used five independent variables: budget, contract method, the days of the public notice, the awarding method, and the lowest awarding ratio. For the survey and analysis, big data on service bids in Nara Market over the past 18 years was collected using text mining and analyzed in a multi-dimensional way. The results of the analysis are as follows. (1) Budget does not determine the awarding price ratio, although this does not hold for small amounts. (2) The contract method affects the awarding price ratio. (3) As the days of the public notice increase, the awarding price ratio decreases. (4) The awarding method affects the awarding price ratio. (5) The lowest awarding ratio determines the awarding price ratio. Policy implications were drawn from the results of the empirical analysis.

Developing a deep learning-based recommendation model using online reviews for predicting consumer preferences: Evidence from the restaurant industry (딥러닝 기반 온라인 리뷰를 활용한 추천 모델 개발: 레스토랑 산업을 중심으로)

  • Dongeon Kim; Dongsoo Jang; Jinzhe Yan; Jiaen Li
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.31-49 / 2023
  • With the growth of the food-catering industry, consumer preferences and the number of dine-in restaurants are gradually increasing. Thus, personalized recommendation services are required to select a restaurant suitable for consumer preferences. Previous studies have used questionnaires and star-rating approaches, which do not effectively depict consumer preferences. Online reviews are the most essential sources of information in this regard. However, previous studies have aggregated online reviews into long documents and applied traditional machine-learning methods to them to extract semantic representations; such approaches fail to consider the surrounding words or context. Therefore, this study proposes a novel review textual-based restaurant recommendation model (RT-RRM) that uses deep learning to effectively extract consumer preferences from online reviews. The proposed model concatenates consumer-restaurant interactions with the extracted high-level semantic representations and predicts consumer preferences accurately and effectively. Experiments on real-world datasets show that the proposed model exhibits excellent recommendation performance compared with several baseline models.

Service Quality Evaluation based on Social Media Analytics: Focused on Airline Industry (소셜미디어 어낼리틱스 기반 서비스품질 평가: 항공산업을 중심으로)

  • Myoung-Ki Han; Byounggu Choi
    • Information Systems Review / v.24 no.1 / pp.157-181 / 2022
  • As competition in the airline industry intensifies, effective airline service quality evaluation has become one of the main challenges. In particular, as big data analytics has been touted as a new research paradigm, new research on service quality measurement using online review analysis has been attempted. However, these studies do not use review titles for analysis, rely on supervised learning that requires a lot of human intervention, and do not consider airline characteristics in classifying service quality dimensions. To overcome the limitations of existing studies, this study attempts to measure airline service quality and to classify it into the AIRQUAL service quality dimensions using online review text as well as titles, based on self-training and sentiment analysis. The results show an effective way of extracting the AIRQUAL service quality dimensions from online reviews, and find that each service quality dimension has a significant effect on service satisfaction. Furthermore, the effect of the review title on service satisfaction is also found to be significant. This study sheds new light on service quality measurement in the airline industry by using an advanced analytical approach to analyze the effects of service quality on customer satisfaction. This study also helps managers who want to improve customer satisfaction by providing high-quality service in the airline industry.

Analysis of Generative AI Technology Trends Based on Patent Data (특허 데이터 기반 생성형 AI 기술 동향 분석)

  • Seongmu Ryu; Taewon Song; Minjeong Lee; Yoonju Choi; Soonuk Seol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.1 / pp.1-9 / 2024
  • This paper analyzes the trends in generative AI technology based on patent application documents. To achieve this, we selected 5,433 generative AI-related patents filed in South Korea, the United States, and Europe from 2003 to 2023, and analyzed the data by country, technology category, year, and applicant, presenting it visually to find insights and understand the flow of technology. The analysis shows that patents in the image category account for 36.9%, the largest share, with a continuous increase in filings, while filings in the text/document and music/speech categories have either decreased or remained stable since 2019. Although the company with the highest number of filings is a South Korean company, four out of the top five filers are U.S. companies, and all companies have filed the majority of their patents in the U.S., indicating that generative AI is growing and competing centered around the U.S. market. The findings of this paper are expected to be useful for future research and development in generative AI, as well as for formulating strategies for acquiring intellectual property.

Analysis of Policy Trends in Convergence Research and Development Using Unstructured Text Data (비정형 텍스트 데이터를 활용한 융합연구개발의 정책 동향 분석)

  • Jiye Rhee; JaeEun Shin
    • Knowledge Management Research / v.25 no.2 / pp.177-191 / 2024
  • This study aims to analyze policy changes over time by conducting a textual analysis of the basic plan for activating convergence research and development. By examining the basic plan, this study looks into changes in convergence research policies and suggests future directions, thereby exploring strategic approaches that can contribute to the advancement of science and technology and to societal development in Korea. In particular, it sought to understand the policy changes proposed by the basic plan by identifying the relevance and trends of topics over time. Various analytical methods, such as TF-IDF analysis, topic modeling (LDA), and network (CONCOR) analysis, were used to identify the key topics of each period and grasp the trends in policy changes. The analysis revealed how topics cluster by period and how they change, providing directions for the convergence research ecosystem and for addressing pressing issues. The results are expected to give stakeholders such as governments, businesses, academia, and research institutions a macroscopic view of the policy changes proposed by previous basic plans.
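
The TF-IDF step named in the abstract above can be illustrated with a minimal sketch. The abstract does not say which weighting variant or tooling was used, so the formula here (relative term frequency times the natural log of N/df) and the toy token lists are assumptions made only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / document length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: one token list per planning period.
plans = [
    ["convergence", "research", "infrastructure", "research"],
    ["convergence", "research", "talent", "ecosystem"],
    ["convergence", "ecosystem", "social", "problem"],
]
scores = tf_idf(plans)
# Highest-weighted term per period.
top_terms = [max(s, key=s.get) for s in scores]
```

A term that appears in every period, such as "convergence" here, gets weight zero, which is why TF-IDF tends to surface the vocabulary that distinguishes one period from another.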

What Concerns Does ChatGPT Raise for Us?: An Analysis Centered on CTM (Correlated Topic Modeling) of YouTube Video News Comments (ChatGPT는 우리에게 어떤 우려를 초래하는가?: 유튜브 영상 뉴스 댓글의 CTM(Correlated Topic Modeling) 분석을 중심으로)

  • Song, Minho; Lee, Soobum
    • Informatization Policy / v.31 no.1 / pp.3-31 / 2024
  • This study aimed to examine public concerns in South Korea, considering the country's unique context, triggered by the advent of generative artificial intelligence such as ChatGPT. To achieve this, comments on 102 YouTube news videos related to ethical issues were collected using a Python scraper, and morphological analysis and preprocessing were carried out using Textom on 15,735 comments. These comments were then analyzed using a Correlated Topic Model (CTM). The analysis identified six primary topics within the comments: "Legal and Ethical Considerations"; "Intellectual Property and Technology"; "Technological Advancement and the Future of Humanity"; "Potential of AI in Information Processing"; "Emotional Intelligence and Ethical Regulations in AI"; and "Human Imitation." Structuring these topics based on a correlation coefficient value of over 10% revealed three main categories: "Legal and Ethical Considerations"; "Issues Related to Data Generation by ChatGPT (Intellectual Property and Technology, Potential of AI in Information Processing, and Human Imitation)"; and "Fear for the Future of Humanity (Technological Advancement and the Future of Humanity, Emotional Intelligence, and Ethical Regulations in AI)." The study confirmed the coexistence of various concerns along with the growing interest in generative AI like ChatGPT, including worries specific to the historical and social context of South Korea. These findings suggest the need for national-level efforts to ensure data fairness.

A comparative study on the performance of Transformer-based models for Korean speech recognition (트랜스포머 기반 모델의 한국어 음성인식 성능 비교 연구)

  • Changhan Oh; Minseo Kim; Kiyoung Park; Hwajeon Song
    • Phonetics and Speech Sciences / v.16 no.3 / pp.79-86 / 2024
  • Transformer models have shown remarkable performance in extracting meaningful information from sequential input data such as text and images, and are gaining attention as end-to-end models for speech recognition. This study compared the performances of the Transformer speech recognition model and its enhanced versions, the Conformer and E-Branchformer, when applied to Korean speech recognition. Using Korean speech data from AIHub, we prepared a training set of approximately 7,500 hours and evaluated the models using the ESPnet toolkit. Additionally, we compared syllables and subwords as recognition units and analyzed the performance differences with changes in the number of tokens using Byte Pair Encoding. The results showed that the E-Branchformer achieved the best performance in Korean speech recognition and Conformer outperformed Transformer but degraded in performance for long utterances owing to cross-attention alignment errors. We aimed to determine the optimal settings by analyzing the performance changes with subword token adjustments. This study comprehensively evaluated model accuracy and processing speed to maximize the efficiency of Korean speech recognition. This is expected to contribute to the training of large-scale Korean speech recognition models and improve Conformer recognition errors. Future research should include additional experiments with diverse Korean speech datasets and enhance the recognition performance through structural improvements in the Conformer.
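
The abstract above varies the number of subword tokens produced by Byte Pair Encoding. As a rough illustration of how the number of merges controls vocabulary size, here is a minimal sketch of BPE merge learning. It is not the tokenizer used in the study; the function name and the toy English word counts are hypothetical.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: frequency} dict."""
    # Start from characters: each word is a tuple of single-character symbols.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Hypothetical toy counts; a real setup would use Korean transcripts.
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 2)
```

Each merge adds one symbol to the token inventory, so a larger merge budget yields longer, rarer subword units, while zero merges leaves character-level (or, for Korean, syllable-level) units.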

Development of Topic Trend Analysis Model for Industrial Intelligence using Public Data (텍스트마이닝을 활용한 공개데이터 기반 기업 및 산업 토픽추이분석 모델 제안)

  • Park, Sunyoung; Lee, Gene Moo; Kim, You-Eil; Seo, Jinny
    • Journal of Technology Innovation / v.26 no.4 / pp.199-232 / 2018
  • There is a growing need to understand the business management environment through big data analysis at the industry and firm levels. Research using company disclosure information, which comprehensively covers a company's business performance and future plans, is attracting attention. However, research on developing applicable analytical models that leverage such corporate disclosure data is limited because of its unstructured nature. This study proposes a text-mining-based analytical model for industry- and firm-level analyses using publicly available company disclosure data. Specifically, we apply the LDA topic model and the word2vec word embedding model to U.S. SEC data from publicly listed firms and analyze trends in business topics at the industry and firm levels. Using LDA topic modeling on SEC EDGAR 10-K documents, we identify management topics across all industries. To compare topic trend patterns between industries, the software and hardware industries are compared over the most recent 20 years. Changes in management subjects at the firm level are also observed by comparing two companies in the software industry. The changes in topic trends provide a lens for identifying declining and growing management subjects at the industry and firm levels. By mapping companies and products (or services) through dimension reduction, applying the word2vec word embedding model and principal component analysis to firm-level 10-K documents in the software industry, we identify companies and products (services) with similar management subjects and how they change over decades. By suggesting a methodology for developing analysis models based on public management data at the industry and firm levels, this study may contribute a practical basis for identifying changes in management subjects. However, further research is required to provide a microscopic analytical model of the relationship between technology management strategy and management performance, covering various patterns of management topics such as frequent changes in management subjects or their momentum. More studies are also needed to develop a competitive context analysis model using product (service) portfolios across firms.

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun; Lee, Seo Ho; Lim, Gyu Jin; Yeo, Un Yeong; Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.57-78 / 2020
  • With the development of the Internet, consumers can easily check product information through E-Commerce. Product reviews used in the purchasing process are based on user experience, allowing consumers to act as producers of information as well as to refer to it. From the consumer's perspective this can make purchasing decisions more efficient, and from the seller's perspective it can help develop products and strengthen competitiveness. However, it takes considerable time and effort to read the vast number of product reviews offered by E-Commerce for the products a consumer wants to compare and to grasp the overall assessment and the assessment dimensions the consumer considers important. This is because product reviews are unstructured information, and it is difficult to immediately read the sentiment and assessment dimension of a review. For example, consumers who want to purchase a laptop would like to check how the products being compared are assessed on each dimension, such as performance, weight, delivery, speed, and design. Therefore, this paper proposes a method to automatically generate multi-dimensional product assessment scores from the reviews of the products to be compared. The method consists of two phases: a pre-preparation phase and an individual product scoring phase. In the pre-preparation phase, a dimension classification model and a sentiment analysis model are created from reviews of the large-category product group. By combining word embedding and association analysis, the dimension classification model addresses a limitation of existing studies, in which word embedding methods for finding the relevance between dimensions and words consider only the distance between words in sentences. The sentiment analysis model is a CNN model trained on data tagged as positive or negative at the phrase level for accurate polarity detection. The individual product scoring phase then applies these pre-prepared models to reviews split into phrases. Multi-dimensional assessment scores are obtained by grouping the phrases judged to describe each dimension and aggregating them by assessment dimension according to the proportion of reviews. In the experiment, approximately 260,000 reviews of the large-category product group were collected to build the dimension classification model and the sentiment analysis model. In addition, reviews of laptops from companies S and L sold through E-Commerce were collected and used as experimental data. The dimension classification model classified individual product reviews, broken down into phrases, into six assessment dimensions and combined the existing word embedding method with an association analysis indicating the frequency between words and dimensions. Combining word embedding and association analysis increased the accuracy of the model by 13.7%. The sentiment analysis model analyzed assessments more closely when trained at the phrase level rather than the sentence level; its accuracy was 29.4% higher than that of the sentence-based model. Through this study, both sellers and consumers can expect more efficient decision making in purchasing and product development, since they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme and analyzed together using word embedding and association analysis, improving the objectivity of multi-dimensional analysis. This will be an attractive analysis model in that it enables more effective service deployment in an evolving and fiercely competitive E-Commerce market while satisfying both sellers and consumers.
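
The scoring phase described in the abstract above, where phrase-level labels are aggregated by assessment dimension, might look roughly like the following sketch. The scoring rule (share of positive phrases per dimension), the function name, and the sample labels are assumptions for illustration; the paper's actual aggregation may differ.

```python
from collections import defaultdict


def dimension_scores(phrases):
    """Aggregate phrase-level labels into one score per dimension.

    `phrases` is an iterable of (dimension, polarity) pairs, where
    polarity is "pos" or "neg". The score is the share of positive
    phrases within each dimension (an assumed scoring rule).
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0})
    for dimension, polarity in phrases:
        counts[dimension][polarity] += 1
    return {
        dim: c["pos"] / (c["pos"] + c["neg"])
        for dim, c in counts.items()
    }


# Hypothetical output of the dimension classifier and sentiment model
# for one laptop, one pair per review phrase.
labeled = [
    ("performance", "pos"),
    ("performance", "pos"),
    ("performance", "neg"),
    ("weight", "neg"),
    ("design", "pos"),
    ("weight", "pos"),
]
laptop_scores = dimension_scores(labeled)
```

Computing the same dictionary for each product under comparison gives scores that can be set side by side, dimension by dimension.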