• Title/Summary/Keyword: 대표 벡터 (representative vector)


Developing a New Algorithm for Conversational Agent to Detect Recognition Error and Neologism Meaning: Utilizing Korean Syllable-based Word Similarity (대화형 에이전트 인식오류 및 신조어 탐지를 위한 알고리즘 개발: 한글 음절 분리 기반의 단어 유사도 활용)

  • Jung-Won Lee;Il Im
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.267-286
    • /
    • 2023
  • Conversational agents such as AI speakers use voice conversation for human-computer interaction. Voice recognition errors often occur in conversational situations, and recognition errors in user utterance records can be categorized into two types. The first type is misrecognition errors, where the agent fails to recognize the user's speech entirely. The second type is misinterpretation errors, where the user's speech is recognized and a service is provided, but the interpretation differs from the user's intention. Misinterpretation errors require separate error detection because they are recorded as successful service interactions. In this study, various text separation methods were applied to detect misinterpretation errors. For each separation method, the similarity of consecutive utterance pairs was calculated using word embedding and document embedding techniques, which convert words and documents into vectors. This approach goes beyond simple word-based similarity calculation to explore a new method for detecting misinterpretation errors. The research method involved using real user utterance records to train and develop a detection model based on patterns of misinterpretation error causes. The results revealed that the most significant improvement came from initial consonant extraction when detecting misinterpretation errors caused by the use of unregistered neologisms, and comparison with the other separation methods revealed different error types. This study has two main implications. First, for misinterpretation errors that are difficult to detect because they are not recognized as errors, the study proposed diverse text separation methods and found a novel one that improved performance remarkably. Second, if applied to conversational agents or voice recognition services requiring neologism detection, this approach can identify patterns of errors occurring from the voice recognition stage.
The study also proposed and verified that, even for utterances not categorized as errors, services can be provided according to the results users actually want.
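The initial consonant (choseong) extraction that the abstract reports as the best-performing separation method can be illustrated with a minimal sketch (generic Hangul decomposition via Unicode arithmetic, not the authors' actual code):

```python
# Initial consonant (choseong) extraction for precomposed Hangul syllables.
# Each syllable in U+AC00..U+D7A3 encodes (choseong, jungseong, jongseong);
# the choseong index is (code - 0xAC00) // 588, since 21 vowels * 28 codas = 588.
CHOSEONG = ["ㄱ", "ㄲ", "ㄴ", "ㄷ", "ㄸ", "ㄹ", "ㅁ", "ㅂ", "ㅃ", "ㅅ",
            "ㅆ", "ㅇ", "ㅈ", "ㅉ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

def initial_consonants(text: str) -> str:
    out = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:       # precomposed Hangul syllable
            out.append(CHOSEONG[(code - 0xAC00) // 588])
        else:                              # pass non-Hangul characters through
            out.append(ch)
    return "".join(out)

print(initial_consonants("한글"))  # → ㅎㄱ
```

Comparing utterance pairs at this choseong level makes two words with different vowels but the same consonant skeleton look similar, which plausibly helps with misheard neologisms.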

Development of a High-Performance Concrete Compressive-Strength Prediction Model Using an Ensemble Machine-Learning Method Based on Bagging and Stacking (배깅 및 스태킹 기반 앙상블 기계학습법을 이용한 고성능 콘크리트 압축강도 예측모델 개발)

  • Yun-Ji Kwak;Chaeyeon Go;Shinyoung Kwag;Seunghyun Eem
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.36 no.1
    • /
    • pp.9-18
    • /
    • 2023
  • Predicting the compressive strength of high-performance concrete (HPC) is challenging because of the use of additional cementitious materials; thus, the development of improved predictive models is essential. The purpose of this study was to develop an HPC compressive-strength prediction model using an ensemble machine-learning method of combined bagging and stacking techniques. The result is a new ensemble technique that integrates the existing ensemble methods of bagging and stacking to solve the problems of a single machine-learning model and improve the prediction performance of the model. The nonlinear regression, support vector machine, artificial neural network, and Gaussian process regression approaches were used as single machine-learning methods and bagging and stacking techniques as ensemble machine-learning methods. As a result, the model of the proposed method showed improved accuracy results compared with single machine-learning models, an individual bagging technique model, and a stacking technique model. This was confirmed through a comparison of four representative performance indicators, verifying the effectiveness of the method.
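The combination of bagging and stacking can be sketched in miniature (a toy with simple linear base learners and a least-squares meta-learner; the paper's actual base models are nonlinear regression, SVM, ANN, and Gaussian process regression, and all names below are illustrative only):

```python
import random

def ols(xs, ys):
    # ordinary least-squares fit of y ≈ a*x + b (guarding the degenerate case)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def bagged_linear(xs, ys, n_models=25, seed=0):
    # bagging: fit the same learner on bootstrap resamples, average predictions
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(ols([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(a * x + b for a, b in models) / n_models

def stack_weights(p1, p2, ys):
    # stacking: least-squares weights for combining two base models'
    # training predictions (2x2 normal equations solved by Cramer's rule)
    a11 = sum(p * p for p in p1)
    a12 = sum(p * q for p, q in zip(p1, p2))
    a22 = sum(q * q for q in p2)
    b1 = sum(p * y for p, y in zip(p1, ys))
    b2 = sum(q * y for q, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

# toy data: strength grows linearly with a single mix feature
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
base1 = bagged_linear(xs, ys)                  # bagged base model
mean_y = sum(ys) / len(ys)                     # trivial second base model
w1, w2 = stack_weights([base1(x) for x in xs], [mean_y] * len(ys), ys)
prediction = w1 * base1(2.0) + w2 * mean_y
```

The point of the combination is the same as in the paper: bagging stabilizes each base learner against resampling noise, and stacking learns how much to trust each one.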

Measuring Similarity of Android Applications Using Method Reference Frequency and Manifest Information (메소드 참조 빈도와 매니페스트 정보를 이용한 안드로이드 애플리케이션들의 유사도 측정)

  • Kim, Gyoosik;Hamedani, Masoud Reyhani;Cho, Seong-je;Kim, Seong Baeg
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.13 no.3
    • /
    • pp.15-25
    • /
    • 2017
  • As the value and importance of software grow, software theft and piracy become a much larger problem. To tackle this problem, an accurate method for detecting software theft and piracy is highly required. In particular, while software theft is relatively easy in the case of Android applications (apps), screening of illegal apps has not been properly performed in Android markets. In this paper, we propose a method to effectively measure the similarity between Android apps for detecting software theft at the executable file level. Our proposed method extracts method reference frequency and manifest information through static analysis of executable Android apps as the main features for similarity measurement. Each app is represented as an n-dimensional vector of these features, and cosine similarity is used as the similarity measure. We demonstrate the effectiveness of our proposed method by evaluating its accuracy in comparison with typical source code-based similarity measurement methods. In experiments on Android apps whose source files and executable files are both available, we found that the similarity degree measured at the executable file level is almost equivalent to the existing well-known similarity degree measured at the source file level.
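The vector representation and cosine step can be sketched as follows (the four-dimensional feature vectors are hypothetical; the paper's real vectors come from method reference frequencies and manifest information):

```python
import math

def cosine_similarity(u, v):
    # cosine of the angle between two equal-length feature vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hypothetical method-reference-frequency vectors for two apps
app_a = [12, 0, 3, 7]
app_b = [10, 1, 2, 8]
similarity = cosine_similarity(app_a, app_b)   # close to 1.0 for near-clones
```

Because cosine similarity ignores vector magnitude, an app padded with extra code but keeping the same reference profile still scores high, which suits theft detection.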

A Study on Teaching the Method of Lagrange Multipliers in the Era of Digital Transformation (라그랑주 승수법의 교수·학습에 대한 소고: 라그랑주 승수법을 활용한 주성분 분석 사례)

  • Lee, Sang-Gu;Nam, Yun;Lee, Jae Hwa
    • Communications of Mathematical Education
    • /
    • v.37 no.1
    • /
    • pp.65-84
    • /
    • 2023
  • The method of Lagrange multipliers, one of the most fundamental algorithms for solving equality-constrained optimization problems, has been widely used in basic mathematics for artificial intelligence (AI), linear algebra, optimization theory, and control theory. This method is an important tool that connects calculus and linear algebra, and it is actively used in artificial intelligence algorithms including principal component analysis (PCA). Therefore, it is desirable for instructors to motivate students who first encounter this method in college calculus. In this paper, we provide an integrated perspective for instructors to teach the method of Lagrange multipliers effectively. First, we provide visualization materials and Python-based code that help in understanding the principle of this method. Second, we give a full explanation of the relation between Lagrange multipliers and the eigenvalues of a matrix. Third, we give the proof of the first-order optimality condition, which is fundamental to the method of Lagrange multipliers, and briefly introduce its generalized version in optimization. Finally, we give an example of PCA on real data. These materials can be utilized in class for teaching the method of Lagrange multipliers.
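The link between Lagrange multipliers and eigenvalues that the paper explains is the standard PCA derivation (sketched here in the usual textbook form, with $S$ the sample covariance matrix): maximize the projected variance $w^{\top} S w$ under a unit-norm constraint.

```latex
\max_{w}\; w^{\top} S w \quad \text{subject to} \quad w^{\top} w = 1,
\qquad
L(w,\lambda) = w^{\top} S w - \lambda\,(w^{\top} w - 1),
\qquad
\nabla_{w} L = 2 S w - 2 \lambda w = 0 \;\Longrightarrow\; S w = \lambda w .
```

So the stationary points are eigenvectors of $S$, and the multiplier $\lambda$ equals the variance captured, which is why the first principal component is the eigenvector with the largest eigenvalue.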

Classification and discrimination of excel radial charts using the statistical shape analysis (통계적 형상분석을 이용한 엑셀 방사형 차트의 분류와 판별)

  • Seungeon Lee;Jun Hong Kim;Yeonseok Choi;Yong-Seok Choi
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.1
    • /
    • pp.73-86
    • /
    • 2024
  • An Excel radial chart is a very useful graphical method for conveying information about numerical data. However, it is not easy to discriminate or classify many individuals with it. In this case, after representing each individual of a radial chart as a shape, we need to apply shape analysis. For a radial chart, landmarks for shaping are formed in the same number as the variables representing the characteristics of the object, so we consider the shape that connects them with lines. If the shape becomes complicated because of a large number of variables, it is difficult to grasp easily even when visualized with a radial chart. Principal component analysis (PCA) is therefore performed on the variables to create a visually effective shape. The classification table and classification rate are checked by applying traditional discriminant analysis, support vector machine (SVM), and artificial neural network (ANN) techniques, before and after principal component analysis. In addition, the difference in discrimination between generalized Procrustes analysis (GPA) coordinates and Bookstein coordinates is compared. Bookstein coordinates are obtained by normalizing the position, rotation, and scale of the shape with respect to the base landmarks, and they show a higher classification rate than GPA coordinates.
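The baseline registration behind Bookstein coordinates can be sketched with complex arithmetic (a minimal version that maps the two base landmarks to 0 and 1; published treatments often send them to (-1/2, 0) and (1/2, 0) instead, which differs only by a fixed shift):

```python
def bookstein_coordinates(landmarks, i=0, j=1):
    # register a landmark configuration to its baseline (landmarks i and j):
    # a single complex division removes translation, rotation, and scale at once
    zs = [complex(x, y) for x, y in landmarks]
    base = zs[j] - zs[i]
    return [(z - zs[i]) / base for z in zs]

# a 3-landmark toy shape; after registration the baseline is fixed at 0 and 1
coords = bookstein_coordinates([(1.0, 1.0), (3.0, 1.0), (2.0, 2.0)])
```

After this normalization, only the remaining landmarks carry shape information, so classifiers can compare configurations directly.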

Korea National College of Agriculture and Fisheries in Naver News by Web Crawling: Based on Keyword Analysis and Semantic Network Analysis (웹 크롤링에 의한 네이버 뉴스에서의 한국농수산대학 - 키워드 분석과 의미연결망분석 -)

  • Joo, J.S.;Lee, S.Y.;Kim, S.H.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research
    • /
    • v.23 no.2
    • /
    • pp.71-86
    • /
    • 2021
  • This study was conducted to find information on the university's image from words related to 'Korea National College of Agriculture and Fisheries (KNCAF)' in Naver News. For this purpose, word frequency analysis, TF-IDF evaluation, and semantic network analysis were performed using web crawling technology. In the word frequency analysis, 'agriculture', 'education', 'support', 'farmer', 'youth', 'university', 'business', 'rural', and 'CEO' were important words. In the TF-IDF evaluation, the key words were 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', 'young farmer', 'agriculture', 'Chonju', 'university', 'device', and 'spreading'. In the semantic network analysis, the bigrams showed high correlations in the order of 'youth'-'farmer', 'digital'-'agriculture', 'farming'-'settlement', 'agriculture'-'rural', and 'digital'-'turnover'. When the importance of keywords was evaluated with five centrality indices, 'agriculture' ranked first, and the keywords in second place were 'farmer' (Cc, Cb), 'education' (Cd, Cp), and 'future' (Ce). Spearman's rank correlation coefficients between the centrality indices showed that degree centrality and PageRank centrality produced the most similar rankings. In terms of word frequency, the KNCAF articles in Naver News used words such as 'agriculture', 'education', 'support', 'farmer', and 'youth' as important words. However, in the evaluation that also considers document frequency, words such as 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', and 'young farmer' were found to be key words. Of the centrality analyses considering the network connectivity between words, Cd and Cp were suitable for evaluation, and the words with strong centrality were 'agriculture', 'education', 'future', 'farmer', 'digital', 'support', and 'utilization'.
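The TF-IDF weighting used for the keyword evaluation can be sketched in its common form (raw term frequency times log inverse document frequency; the study's exact variant is not specified in the abstract, and the sample tokens are illustrative):

```python
import math
from collections import Counter

def tf_idf(docs):
    # docs: list of token lists; returns one {term: weight} dict per document
    n = len(docs)
    df = Counter()                        # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    return [
        {term: count * math.log(n / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]

# terms appearing in every document (here 'farmer') get weight 0, which is
# why TF-IDF surfaces different keywords than raw frequency counting does
weights = tf_idf([["farmer", "education"], ["farmer", "drone"]])
```

This downweighting of ubiquitous terms is exactly why the TF-IDF keyword list in the study differs from the plain word-frequency list.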

Target Advertisement Service using a Viewer's Profile Reasoning (시청자 프로파일 추론 기법을 이용한 표적 광고 서비스)

  • Kim Munjo;Im Jeongyeon;Kang Sanggil;Kim Munchrul;Kang Kyungok
    • Journal of Broadcast Engineering
    • /
    • v.10 no.1 s.26
    • /
    • pp.43-56
    • /
    • 2005
  • In the existing broadcasting environment, it is not easy to provide bi-directional service between a broadcasting server and a TV audience. In uni-directional broadcasting environments, most TV programs are scheduled according to viewers' popular watching times, and the advertisement contents in these programs are mainly arranged by the popularity and ages of the audience. Audiences make an effort to sort and select their favorite programs, but the advertisements accompanying those programs are not served to the appropriate audiences efficiently. Such randomly provided advertisement contents can lead to audience indifference and avoidance. In this paper, we propose a target advertisement service for the appropriate distribution of advertisement contents. The proposed service estimates an audience member's profile without requiring any private information and provides target-advertised contents using the estimated profile. For the experiments, we used real audiences' TV usage histories, such as ages, genders, and viewing times of programs, from AC Nielsen Korea. We show the accuracy of the proposed target advertisement algorithm with NDS (Normalized Distance Sum) and the vector correlation method, and describe the implementation of our target advertisement service system.

Enhanced Growth Inhibition by Combined Gene Transfer of p53 and $p16^{INK4a}$ in Adenoviral Vectors to Lung Cancer Cell Lines (폐암세포주에 대한 p53 및 $p16^{INK4a}$의 복합종양억제유전자요법의 효과)

  • Choi, Seung -Ho;Park, Kyung-Ho;Seol, Ja-Young;Yoo, Chul-Gyu;Lee, Choon-Taek;Kim, Young-Whan;Han, Sung-Koo;Shim, Young-Soo
    • Tuberculosis and Respiratory Diseases
    • /
    • v.50 no.1
    • /
    • pp.67-75
    • /
    • 2001
  • Background: Two tumor suppressor genes, p53 and p16, which have different roles in controlling the cell cycle and inducing apoptosis, are frequently inactivated during carcinogenesis, including in lung cancer. Single tumor suppressor gene therapies using either p53 or p16 have been studied extensively. However, there is a paucity of reports regarding a combined gene therapy using these two genes. Methods: The combined effect of p53 and p16 gene transfer by an adenoviral vector on the growth of lung cancer cell lines and its interactive mechanism were investigated. Results: An isobologram showed that the co-transduction of p53 and p16 exhibited a synergistic growth-inhibitory effect on NCI-H358 and an additive effect on NCI-H23. Cell cycle analysis demonstrated the induction of a synergistic G1/S arrest by the combined p53 and p16 transfer. This synergistic interaction was again confirmed in a soft agar clonogenic assay. Conclusion: These observations suggest the potential of p53 and p16 combination gene therapy as another potent strategy in cancer gene therapy.


A Study on the Impact Factors of Contents Diffusion in Youtube using Integrated Content Network Analysis (일반영향요인과 댓글기반 콘텐츠 네트워크 분석을 통합한 유튜브(Youtube)상의 콘텐츠 확산 영향요인 연구)

  • Park, Byung Eun;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.3
    • /
    • pp.19-36
    • /
    • 2015
  • Social media is an emerging issue in content services and in the current business environment, and YouTube is the most representative social media service in the world. YouTube differs from conventional content services in its open user participation and content creation methods. To promote a content item on YouTube, it is important to understand the diffusion phenomena of contents and the structural characteristics of the network. Most previous studies analyzed impact factors of content diffusion from the viewpoint of general behavioral factors, while some researchers have used network structure factors; however, these two approaches have been used separately. This study analyzes the general impact factors on the view count together with content-based network structures. In addition, when building the content-based network, this study forms the network structure by analyzing user comments on 22,370 YouTube contents rather than relying on an individual user-based network. From this study, we statistically re-verified the causal relations between view count and both general factors and network factors. Moreover, by analyzing this integrated research model, we found that these factors affect the view count of YouTube in the following order: uploader followers, video age, betweenness centrality, comments, closeness centrality, clustering coefficient, and rating; degree centrality and eigenvector centrality affect the view count negatively. This research suggests some strategic points for utilizing content diffusion. First, it is necessary to manage general factors such as the number of uploader followers or subscribers, the video age, the number of comments, and the average rating points. The impact of average rating points is less important than previously thought; instead, it is necessary to increase the number of uploader followers strategically and keep the contents in the service as long as possible.
Second, we need to pay attention to the impacts of betweenness centrality and closeness centrality among the network factors. Users seem to search for related subjects or similar contents after watching a content item, so it is necessary to shorten the distance to other popular contents in the service. In other words, this study showed that view counts benefit from a reduced number of search attempts and increased similarity with many other contents, which is consistent with the result of the clustering coefficient impact analysis. Third, it is important to notice the negative impact of degree centrality and eigenvector centrality on the view count. If the number of connections to other contents grows too large, it means there are many similar contents, which may eventually spread the view counts across them. Moreover, very high eigenvector centrality means a content item is connected to popular contents, and it may lose view counts to those popular contents; it would be better to avoid connections to overly powerful popular contents. In this study we analyzed the diffusion phenomenon and verified the diffusion factors of YouTube contents using an integrated model consisting of general factors and network structure factors. In terms of social contribution, this study may provide useful information for the music and movie industries and other content vendors for effective content services, and it provides basic schemes that can be applied strategically in online content marketing. One limitation of this study is that it formed a content-based network for the network structure analysis, which is an indirect way to see the content network structure; more direct methods could be used to establish the content network.
Further research includes more detailed analyses, for example according to the types, domains, or other characteristics of the contents or users.
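Two of the centrality measures the study regresses on can be sketched for an unweighted graph (generic textbook definitions over a hypothetical adjacency list, not the authors' implementation):

```python
from collections import deque

def degree_centrality(adj, v):
    # fraction of the other nodes directly connected to v
    return len(adj[v]) / (len(adj) - 1)

def closeness_centrality(adj, v):
    # BFS shortest-path distances from v; closeness = (reached - 1) / sum(dist)
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# hypothetical comment-based content graph: a - b - c (b bridges a and c)
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```

In this tiny graph the bridge node b scores 1.0 on both measures while the endpoints score lower, mirroring the study's finding that being close to other contents matters for view counts.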

Analysis of Causality of the Increase in the Port Congestion due to the COVID-19 Pandemic and BDI(Baltic Dry Index) (COVID-19 팬데믹으로 인한 체선율 증가와 부정기선 운임지수의 인과성 분석)

  • Lee, Choong-Ho;Park, Keun-Sik
    • Journal of Korea Port Economic Association
    • /
    • v.37 no.4
    • /
    • pp.161-173
    • /
    • 2021
  • The shipping industry plummeted and was depressed due to the global economic crisis caused by the bankruptcy of Lehman Brothers in the US in 2008. In 2020, the shipping market also suffered a collapse amid the unstable global economic situation caused by the COVID-19 pandemic, but it unexpectedly turned upward from the end of 2020, and in 2021 it exceeded the market of the 2008 boom period. According to the Clarksons report published in May 2021, the cargo volume lost to the COVID-19 pandemic in 2020 had returned to the pre-pandemic level by the end of 2020, and tramp bulk carrier capacity equivalent to 103~104% of the Panamax fleet has been held up in ports due to congestion. Earnings across the bulker segments have risen to ten-year highs in recent months. In this study, the capacity and congestion ratio of Capesize and Panamax ships on the supply side and iron ore and coal seaborne tonnage on the demand side were considered as factors affecting the BDI, and the Granger causality test, impulse response function (IRF), and forecast error variance decomposition (FEVD) were performed using a VAR model to analyze the impact on the BDI of congestion caused by strengthened quarantine at ports during the COVID-19 pandemic and by delays in loading and discharging operations due to stevedore infections, and to predict the shipping market after the pandemic. In the Granger causality test of the variables and the BDI using time series data from January 2016 to July 2021, causality was found for the fleet and congestion variables. The impulse response function showed that the congestion variable was significant at both the upper and lower limits of the confidence interval, and the forecast error variance decomposition showed that the congestion variable had explanatory power of up to 25% for changes in the BDI. If congestion in ports decreases after the 'With Corona' transition, downside risk is expected in the shipping market.
The COVID-19 pandemic arose not from economic factors but from an ecological one, which makes it different from past economic crises and calls for analysis from a different point of view. This study is meaningful in that it analyzes the causality and explanatory power of the congestion factor caused by the pandemic.