• Title/Summary/Keyword: Large-set Classification

Hybrid metrics model to predict fault-proneness of large software systems (대형 소프트웨어 시스템의 결함경향성 예측을 위한 혼성 메트릭 모델)

  • Hong, Euy-Seok
    • The Journal of Korean Association of Computer Education / v.8 no.5 / pp.129-137 / 2005
  • Criticality prediction models that identify fault-prone spots from system design specifications play an important role in reducing the development costs of large systems such as telecommunication systems. Many criticality prediction models using complexity metrics have been suggested, but most of them require a training data set, and they are classification models that can only divide design entities into a fault-prone group and a non-fault-prone group. To solve these problems, this paper builds a new prediction model, HMM, using two styles of hybrid metrics. HMM has the advantages that it needs no training data and that it allows design entities to be compared and ranked by criticality (a hedged sketch of this ranking idea follows the entry). HMM is implemented and compared with a well-known prediction model, the BackPropagation neural network Model (BPM), in terms of internal characteristics and prediction accuracy.

  • PDF
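
The abstract does not define the two hybrid metrics, so the following Python sketch only illustrates the general idea of training-free criticality ranking: score each design entity by a weighted combination of normalized complexity metrics and sort by score. All metric names and weights are illustrative assumptions, not the paper's HMM.

```python
# Hypothetical illustration only: the paper's hybrid metrics are not given in
# the abstract. This ranks design entities by a weighted sum of normalized
# complexity metrics, with no training data involved.

def normalize(values):
    """Scale a list of metric values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def criticality_ranking(entities, weights):
    """Return (name, score) pairs sorted from most to least critical."""
    columns = {m: normalize([e[m] for e in entities]) for m in weights}
    scores = [(e["name"], sum(weights[m] * columns[m][i] for m in weights))
              for i, e in enumerate(entities)]
    return sorted(scores, key=lambda t: t[1], reverse=True)

# Assumed example metrics for three design entities.
entities = [
    {"name": "ModuleA", "fan_in": 3, "fan_out": 12, "loc": 900},
    {"name": "ModuleB", "fan_in": 8, "fan_out": 2, "loc": 300},
    {"name": "ModuleC", "fan_in": 5, "fan_out": 7, "loc": 1500},
]
print(criticality_ranking(entities, {"fan_in": 0.2, "fan_out": 0.4, "loc": 0.4}))
```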

Keyword Extraction from News Corpus using Modified TF-IDF (TF-IDF의 변형을 이용한 전자뉴스에서의 키워드 추출 기법)

  • Lee, Sung-Jick;Kim, Han-Joon
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.59-73 / 2009
  • Keyword extraction is an important and essential technique for text mining applications such as information retrieval, text categorization, summarization, and topic detection. A set of keywords extracted from large-scale electronic document data serves as significant features for text mining algorithms and contributes to improving the performance of document browsing, topic detection, and automated text classification. This paper presents a keyword extraction technique that can be used to detect topics for each news domain from a large document collection of Internet news portal sites. Basically, we use six variants of the traditional TF-IDF weighting model. On top of the TF-IDF model, we propose a word filtering technique called 'cross-domain comparison filtering' (a hedged sketch of the idea follows the entry). To demonstrate the effectiveness of our method, we analyze the usefulness of keywords extracted from Korean news articles and present how the keywords of each news domain change over time.

  • PDF
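
The six TF-IDF variants and the exact filtering rule are not given in the abstract; the Python sketch below shows one hedged reading of the idea: score words by standard TF-IDF within each news domain, then drop candidate keywords that also rank highly in too many other domains. The `top_k` and `overlap_cut` parameters are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Aggregate standard TF-IDF scores per word over a list of tokenized docs."""
    df = Counter(w for doc in docs for w in set(doc))
    n = len(docs)
    scores = Counter()
    for doc in docs:
        tf = Counter(doc)
        for w, f in tf.items():
            scores[w] += (f / len(doc)) * math.log(n / df[w])
    return scores

def cross_domain_filter(domain_scores, target, top_k=10, overlap_cut=0.5):
    """Keep top-scoring words of the target domain, dropping words that also
    rank near the top in a large share of the other domains."""
    others = [set(w for w, _ in s.most_common(top_k))
              for d, s in domain_scores.items() if d != target]
    keywords = []
    for w, _ in domain_scores[target].most_common(top_k * 3):
        shared = sum(w in o for o in others)
        if not others or shared / len(others) < overlap_cut:
            keywords.append(w)
        if len(keywords) == top_k:
            break
    return keywords

# Usage: domain_scores = {"sports": tf_idf(sports_docs), "economy": tf_idf(econ_docs)}
#        keywords = cross_domain_filter(domain_scores, "sports")
```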

Multi-classification Sensitive Image Detection Method Based on Lightweight Convolutional Neural Network

  • Yueheng Mao;Bin Song;Zhiyong Zhang;Wenhou Yang;Yu Lan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.5 / pp.1433-1449 / 2023
  • In recent years, the rapid development of social networks has led to a rapid increase in the amount of information available on the Internet, including a large amount of sensitive content related to pornography, politics, and terrorism. For sensitive image detection, existing machine learning algorithms suffer from large model size, long training time, and slow detection speed when used for auditing and supervision. In order to detect sensitive images more accurately and quickly, this paper proposes a multi-classification sensitive image detection method based on a lightweight Convolutional Neural Network. On the basis of the EfficientNet model, the method adopts the Ghost Module idea of the GhostNet model and adds the SE channel attention mechanism inside the Ghost Module for feature extraction (a sketch of this combination follows the entry). Experiments on the sensitive image data set constructed in this paper show that the proposed method reaches an accuracy of 94.46% in sensitive information detection, higher than that of similar methods. The model is then pruned through an ablation experiment, and the activation function is replaced by Hard-Swish, which reduces the parameters of the original model by 54.67%. While maintaining accuracy, the detection time for a single image is reduced from 8.88 ms to 6.37 ms. The results demonstrate that the method improves the precision of multi-class sensitive image identification and achieves higher accuracy than comparable algorithms with a significantly lighter model.
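
As a rough PyTorch sketch of the combination the abstract names (a GhostNet-style Ghost module with an SE channel-attention block, using Hard-Swish activations), the code below builds one such module; the channel ratio, kernel sizes, and SE reduction factor are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # squeeze, excite
        return x * w                                       # rescale channels

class GhostModuleSE(nn.Module):
    """Ghost module (primary conv + cheap depthwise 'ghost' features) followed
    by SE attention; hyperparameters here are assumptions, not the paper's."""
    def __init__(self, in_ch, out_ch, ratio=2, kernel=1, dw_kernel=3):
        super().__init__()
        init_ch = out_ch // ratio          # intrinsic feature channels
        ghost_ch = out_ch - init_ch        # cheaply generated channels
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, init_ch, kernel, padding=kernel // 2, bias=False),
            nn.BatchNorm2d(init_ch), nn.Hardswish())
        self.cheap = nn.Sequential(        # depthwise conv makes the "ghosts"
            nn.Conv2d(init_ch, ghost_ch, dw_kernel, padding=dw_kernel // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(ghost_ch), nn.Hardswish())
        self.se = SEBlock(out_ch)

    def forward(self, x):
        y = self.primary(x)
        return self.se(torch.cat([y, self.cheap(y)], dim=1))

x = torch.randn(1, 16, 64, 64)
print(GhostModuleSE(16, 32)(x).shape)  # torch.Size([1, 32, 64, 64])
```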

An Exploratory research on patent trends and technological value of Organic Light-Emitting Diodes display technology (Organic Light-Emitting Diodes 디스플레이 기술의 특허 동향과 기술적 가치에 관한 탐색적 연구)

  • Kim, Mingu;Kim, Yongwoo;Jung, Taehyun;Kim, Youngmin
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.135-155 / 2022
  • This study analyzes patent trends by deriving the sub-technical fields of the Organic Light-Emitting Diodes (OLED) industry and analyzing the technological value, originality, and diversity of each sub-technical field. To collect patent data, a set of international patent classification (IPC) codes related to OLED technology was defined, and OLED-related patents filed from 2005 to 2017 were collected using this set. The collected patent documents were then classified into 12 major technologies using the Latent Dirichlet Allocation (LDA) topic model (a sketch of this step follows the entry), and the trend of each technology was investigated. Patents related to touch sensors, modules, image processing, and circuit driving showed an increasing trend; virtual reality and user interface recently decreased; and thin film transistors, fingerprint recognition, and optical films showed a steady trend. To compare technological value, the number of forward citations, originality, and diversity of the patents in each technology group were investigated. Image processing, user interface (UI) and user experience (UX), module, and adhesive technologies, with high numbers of forward citations, originality, and diversity, showed relatively high technological value. The results provide useful information for establishing a company's technology strategy.
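
A minimal scikit-learn sketch of the classification step, assuming plain English preprocessing: fit a 12-topic LDA on patent abstracts and assign each patent to its dominant topic. The vectorizer settings and random seed are illustrative assumptions, not the paper's configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def lda_topics(abstracts, n_topics=12, top_n=8):
    """Fit LDA on patent abstracts; return top words per topic and each
    document's dominant topic index."""
    vec = CountVectorizer(max_df=0.9, min_df=2, stop_words="english")
    X = vec.fit_transform(abstracts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topic = lda.fit_transform(X)       # document-topic distribution
    words = vec.get_feature_names_out()
    topics = [[words[i] for i in comp.argsort()[-top_n:][::-1]]
              for comp in lda.components_]
    return topics, doc_topic.argmax(axis=1)
```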

Consideration for Setting Reference Range for Adrenocorticotropic Hormone Test according to Blood Collection Time (채혈 시간에 따른 부신피질 자극 호르몬 검사의 참고치 설정에 관한 고찰)

  • Ji-Hye Park;Jin-Ju Choi;Soo-Yeon Lim;Seon-Hee Yoo;Sun-Ho Lee
    • The Korean Journal of Nuclear Medicine Technology / v.27 no.1 / pp.42-46 / 2023
  • Purpose: The reference range stated for the Adrenocorticotropic Hormone reagent used in our laboratory is 10-60 pg/mL for 8 a.m. to 10 a.m. and 6-30 pg/mL for 8 p.m. to 10 p.m. However, outpatient blood is mainly collected between 10 a.m. and 6 p.m., accounting for 57.8% of the total. This study is therefore intended to support more accurate diagnosis by re-evaluating the reference range provided by the reagent manufacturer and setting split-timed reference ranges. Materials and Methods: Patients whose blood was collected before 10 a.m. formed group A (68 patients), and patients whose blood was collected after 10 a.m. formed group B (80 patients). A t-test was performed between the groups to test for significant differences, and we checked whether sex needed to be treated as a subgroup. The reference ranges were calculated by the Bayesian method and the Hoffmann method (a hedged sketch of the Hoffmann method follows this entry). Results: The reference range of group A was 8.6-60.6 pg/mL by the Bayesian method and 3.6-61.3 pg/mL by the Hoffmann method. The reference range of group B was 6.9-50.5 pg/mL by the Bayesian method and 2.3-48.9 pg/mL by the Hoffmann method. Conclusion: This study concluded that split-timed reference ranges are necessary. The later the blood collection time, the lower the Adrenocorticotropic Hormone level, indicating that collection time is important for patients with clinical significance. If a larger number of subjects is recruited in the future, a systematic and accurate reference range can be set.

  • PDF
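
The abstract names the Bayesian and Hoffmann methods without giving formulas, so the Python sketch below implements only one common reading of Hoffmann's indirect method: regress the ordered values on theoretical normal quantiles over a central portion assumed Gaussian, then extrapolate to the 2.5th and 97.5th percentiles. The central-portion bounds and the 1.96 multiplier are assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def hoffmann_reference_range(values, central=(0.25, 0.75)):
    """One reading of Hoffmann's indirect method: fit a line to the central,
    assumed-Gaussian portion of the normal quantile plot, then extrapolate to
    the 2.5th and 97.5th percentiles (z = +/-1.96)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    p = (np.arange(1, n + 1) - 0.5) / n   # plotting positions
    z = stats.norm.ppf(p)                 # theoretical normal quantiles
    mask = (p >= central[0]) & (p <= central[1])
    slope, intercept, *_ = stats.linregress(z[mask], x[mask])
    return intercept - 1.96 * slope, intercept + 1.96 * slope
```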

Application of Data Dictionary to BIM for Small and Medium Project (중소규모 사업용 BIM을 위한 데이터 사전의 활용)

  • Lee, Hwan Woo;Lee, Kyung Sub;Kim, Kwang Yang
    • Journal of the Computational Structural Engineering Institute of Korea / v.26 no.6 / pp.431-438 / 2013
  • The systemization of construction information over the whole life cycle of a facility is required to improve the productivity of the construction industry, and BIM (Building Information Modeling), a technology for managing information based on a 3D information model, has been actively suggested as one of the alternatives. However, current BIM adoption is concentrated on large projects, while small and medium projects based on BIM are largely neglected, even though information loss in small and medium projects is more serious than in large ones. It is also hard for small and medium companies to introduce BIM due to the lack of investment resources. This study therefore sets up a BIM-based information management system that fits the characteristics of small and medium projects without excessive investment. In this study, 'pseudo BIM' is defined as BIM for small and medium projects, and its concept is developed. ISO PLIB and the construction information classification system of MOLIT in Korea are used to construct the data dictionary for pseudo BIM (a hypothetical record layout is sketched after this entry). A pilot test is performed to verify the effectiveness of pseudo BIM.
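
The paper's actual dictionary schema is not given in the abstract; as a purely hypothetical Python illustration of what a pseudo BIM data-dictionary record keyed by a classification code might look like, consider:

```python
from dataclasses import dataclass, field

@dataclass
class DictionaryEntry:
    """Hypothetical pseudo BIM data-dictionary record; every field name here
    is an illustrative assumption, not the paper's schema."""
    code: str            # classification code (e.g., a MOLIT-style code)
    name: str            # element name
    definition: str      # human-readable definition
    unit: str            # unit of measure
    attributes: dict = field(default_factory=dict)  # PLIB-style properties

data_dictionary = {
    "F10-20": DictionaryEntry(
        code="F10-20", name="Reinforced concrete girder",
        definition="Horizontal RC member carrying deck loads", unit="EA",
        attributes={"strength_MPa": 40, "length_m": 25.0}),
}
print(data_dictionary["F10-20"].attributes["strength_MPa"])
```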

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most classification research has used learning-based models such as kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), and statistics-based methods such as the Bayesian classifier and NNA (Neural Network Algorithm). However, these approaches face space and time limitations when classifying the huge number of web pages on today's Internet. Moreover, most classification studies use a uni-gram feature representation, which poorly captures the real meaning of words, and Korean web page classification is further complicated by polysemy: many Korean words have multiple meanings. For these reasons, LSA (Latent Semantic Analysis) is proposed for classification in this environment (large data sets and word polysemy). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimensionality. This creates a new low-dimensional semantic space for representing vectors, which makes classification efficient and reveals the latent meaning of words and documents (or web pages). Although LSA works well, it has drawbacks for classification: when SVD reduces the matrix dimensions and creates the new semantic space, it selects dimensions that represent the vectors well, not dimensions that discriminate between them well, which is one reason why LSA does not improve classification performance as much as expected. In this paper, we propose a new supervised LSA that selects optimal dimensions that both discriminate and represent vectors well, minimizing these drawbacks and improving performance (a sketch follows this entry). The proposed method shows better and more stable performance than other LSA variants in low-dimensional spaces. In addition, we obtain further improvement in classification by creating and selecting features, removing stopwords, and weighting them statistically.

  • PDF
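
A short sketch of the supervised-selection idea, assuming scikit-learn: project documents with truncated SVD as in ordinary LSA, but then keep the latent dimensions that best separate the classes instead of those with the largest singular values. The ANOVA F-score used here is an assumed stand-in for the paper's selection criterion.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_selection import f_classif

def supervised_lsa(X, y, n_svd=300, n_keep=100):
    """Project a document-term matrix X (rows are documents) with truncated
    SVD, then keep the latent dimensions that best discriminate the class
    labels y (ANOVA F-score as an assumed selection criterion)."""
    svd = TruncatedSVD(n_components=n_svd, random_state=0)
    Z = svd.fit_transform(X)              # documents in latent space
    f_scores, _ = f_classif(Z, y)
    keep = np.argsort(f_scores)[::-1][:n_keep]
    return Z[:, keep], svd, keep          # reduced vectors + reuse artifacts
```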

Analysis on the National R&D Portfolio of Food Safety in Korea from 2008 to 2010 (최근 3년(2008-2010)간 식품안전 분야 국가연구개발사업 운영 현황 분석)

  • Kwak, No-Seong;Jeong, Jiwon;Lee, Jong-Kyung
    • Journal of Food Hygiene and Safety / v.28 no.2 / pp.115-123 / 2013
  • Food safety management should be based on scientific evidence. FAO and WHO present risk analysis as one of four principles of food safety management, and the WTO admits a country's own safety regulation only when it is made on the basis of risk assessment. Without scientific analysis, tracing and eliminating the cause of food poisoning is impossible, and research and development plays a key role in producing such scientific evidence. The Korean government ran over 40 programs in 11 agencies from 2008 to 2010, yet there are no statistics on food safety R&D at present. In this research, food safety projects conducted from 2008 to 2010 are compiled by analysing the National Science and Technology Information Service (NTIS). The analytical criteria are the names of the programs, the national standard classification of science and technology, and keywords. As a result, the Korea Food and Drug Administration, the Ministry for Food, Agriculture, Forestry and Fisheries, and the Rural Development Administration play the major roles in food safety R&D. The share of projects longer than one year should rise in order to obtain the data needed for risk assessment, which strongly needs improvement, and the research should be deepened so as to publish more SCI papers. The R&D portfolio should also shift to raise the share of biological hazards such as norovirus; to do so, a large number of food safety programs should be merged. The categories of food safety management and of hygiene/quality management of agricultural and livestock products in the national standard classification of science and technology should likewise be merged, because they were set up to reflect agencies' interests despite there being few differences between them.

A Study on the Reduction of Common Words to Classify Causes of Marine Accidents (해양사고 원인을 분류하기 위한 공통단어의 축소에 관한 연구)

  • Yim, Jeong-Bin
    • Journal of Navigation and Port Research / v.41 no.3 / pp.109-118 / 2017
  • Key words (KW) are sets of words intended to express clearly the important causes of marine accidents; they are determined by judges in the Korean maritime safety tribunals. KW selection currently has two main issues: consistency suffers because each judge applies a different subjective opinion, and the number of KW currently in use is large. To overcome these issues, the systematic framework used to construct KW needs to be optimized, with a minimal number of KW derived from a set of Common Words (CW). The purpose of this study is to identify a set of CW for developing such a systematic KW construction frame. To this end, a word reduction method that finds the minimum number of CW is proposed, using the Pareto distribution function and the Pareto index (a sketch of the cumulative-frequency idea follows the entry). A total of 2,642 KW were compiled and 56 baseline CW were identified in the data sets; these CW, along with their frequency of use across all KW, are reported. Through the word reduction experiments, an average reduction rate of 58.5% was obtained, and the CW estimated at the various reduction rates were verified using the Pareto chart. Based on this analysis, the development of a systematic KW construction frame is expected to be possible.
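
The abstract's Pareto machinery is not spelled out, so the Python sketch below captures only the cumulative-frequency core of the idea: count word usage across all keyword phrases and keep the smallest set of common words that reaches a coverage threshold. The 0.80 cutoff and the example phrases are assumed stand-ins for the paper's Pareto-index criterion and data.

```python
from collections import Counter

def reduce_to_common_words(keyword_phrases, coverage=0.80):
    """Count word usage across all keyword phrases and keep the smallest set
    of common words whose cumulative frequency reaches the coverage cutoff."""
    counts = Counter(w for phrase in keyword_phrases for w in phrase.split())
    total = sum(counts.values())
    common, running = [], 0
    for word, freq in counts.most_common():
        common.append((word, freq))
        running += freq
        if running / total >= coverage:
            break
    return common

phrases = ["improper lookout", "improper radar lookout", "engine failure",
           "improper engine maintenance"]
print(reduce_to_common_words(phrases))
```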

Statistical Analysis of Projection-Based Face Recognition Algorithms (투사에 기초한 얼굴 인식 알고리즘들의 통계적 분석)

  • 문현준;백순화;전병민
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.5A / pp.717-725 / 2000
  • Within the last several years, a large number of algorithms have been developed for face recognition, the majority of them view- and projection-based. Our definition of projection is not restricted to projecting the image onto an orthogonal basis; the definition is expansive and includes a general class of linear transformations of the image pixel values. The class includes correlation, principal component analysis, clustering, gray-scale projection, and matching pursuit filters. In this paper, we perform a detailed analysis of this class of algorithms by evaluating them on the FERET database of facial images. In our experiments, a projection-based algorithm consists of three steps. The first step is done off-line and determines the new basis for the images; the basis is either set by the algorithm designer or learned from a training set. The last two steps are performed on-line and carry out the recognition: the second step projects an image onto the new basis, and the third step recognizes a face in an image with a nearest neighbor classifier, with classification performed in the projection space (a minimal sketch of this pipeline follows the entry). Most evaluation methods report algorithm performance on a single gallery, which does not fully capture algorithm performance. In our study, we construct a set of independent galleries, which allows us to see how individual algorithm performance varies over different galleries, and we report the relative performance of the algorithms across them.

  • PDF
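
A minimal Python sketch of the three-step pipeline the abstract describes, with PCA ("eigenfaces") standing in for the learned projection basis (one member of the class the paper studies, not its only instance) and a 1-nearest-neighbor rule in the projection space. Inputs are assumed to be NumPy image arrays.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def train_projection(train_images, n_components=50):
    """Step 1 (off-line): learn a projection basis from a training set;
    PCA stands in for the general linear transformation."""
    X = train_images.reshape(len(train_images), -1)  # flatten images to rows
    return PCA(n_components=n_components).fit(X)

def recognize(pca, gallery, gallery_ids, probe):
    """Steps 2-3 (on-line): project gallery and probe onto the basis and
    label the probe by its nearest neighbor in the projection space."""
    G = pca.transform(gallery.reshape(len(gallery), -1))
    knn = KNeighborsClassifier(n_neighbors=1).fit(G, gallery_ids)
    return knn.predict(pca.transform(probe.reshape(1, -1)))[0]
```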