• Title/Summary/Keyword: Information Divide

Search Result 928, Processing Time 0.027 seconds

Korean Morphological Analysis Method Based on BERT-Fused Transformer Model (BERT-Fused Transformer 모델에 기반한 한국어 형태소 분석 기법)

  • Lee, Changjae;Ra, Dongyul
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.4
    • /
    • pp.169-178
    • /
    • 2022
  • Morphemes are most primitive units in a language that lose their original meaning when segmented into smaller parts. In Korean, a sentence is a sequence of eojeols (words) separated by spaces. Each eojeol comprises one or more morphemes. Korean morphological analysis (KMA) is to divide eojeols in a given Korean sentence into morpheme units. It also includes assigning appropriate part-of-speech(POS) tags to the resulting morphemes. KMA is one of the most important tasks in Korean natural language processing (NLP). Improving the performance of KMA is closely related to increasing performance of Korean NLP tasks. Recent research on KMA has begun to adopt the approach of machine translation (MT) models. MT is to convert a sequence (sentence) of units of one domain into a sequence (sentence) of units of another domain. Neural machine translation (NMT) stands for the approaches of MT that exploit neural network models. From a perspective of MT, KMA is to transform an input sequence of units belonging to the eojeol domain into a sequence of units in the morpheme domain. In this paper, we propose a deep learning model for KMA. The backbone of our model is based on the BERT-fused model which was shown to achieve high performance on NMT. The BERT-fused model utilizes Transformer, a representative model employed by NMT, and BERT which is a language representation model that has enabled a significant advance in NLP. The experimental results show that our model achieves 98.24 F1-Score.

Development and Biogenesis of Peroxisome in Oil-seed Plants (지방 저장 식물의 퍼옥시좀 생성과 발달)

  • Dae-Jae Kim
    • Journal of Life Science
    • /
    • v.33 no.8
    • /
    • pp.651-662
    • /
    • 2023
  • Peroxisomes, known as microbodies, are a class of morphologically similar subcellular organelles commonly found in most eukaryotic cells. They are 0.2~1.8 ㎛ in diameter and are bound by a single membrane. The matrix is usually finely granular, but occasionally crystalline or fibrillary inclusions are observed. They characteristically contain hydrogen peroxide (H2O2) generating oxidases and contain the enzyme catalase, thus confining the metabolism of the poisonous H2O2 within these organelles. Therefore, the eukaryotic organelles are greatly dynamic both in morphology and metabolism. Plant peroxisomes, in particular, are associated with numerous metabolic processes, including β-oxidation, the glyoxylate cycle and photorespiration. Furthermore, plant peroxisomes are involved in development, along with responses to stresses such as the synthesis of important phytohormones of auxins, salicylic acid and jasmonic acids. In the past few decades substantial progress has been made in the study of peroxisome biogenesis in eukaryotic organisms, mainly in animals and yeasts. Advancement of sophisticated techniques in molecular biology and widening of the range of genomic applications have led to the identification of most peroxisomal genes and proteins (peroxins, PEXs). Furthermore, recent applications of proteome study have produced fundamental information on biogenesis in plant peroxisomes, together with improving our understanding of peroxisomal protein targeting, regulation, and degradation. Nonetheless, despite this progress in peroxisome development, much remains to be explained about how peroxisomes originate from the endoplasmic reticulum (ER), then assemble and divide. Peroxisomes perform dynamic roles in many phases of plant development, and in this review, we focus on the latest progress in furthering our understanding of plant peroxisome functions, biogenesis, and dynamics.

The Clinical Usefulness Measurement of the Whole Body Percent Fat Calculated by the Part Bone Mineral Density Measurement (부분골밀도 측정을 통해 산출되는 체지방률의 임상적 유용성에 대한 평가)

  • Kang, Young-Eun;Kim, Eun-Hye;Kim, Ho-Sung;Choi, Jong-Sook;Choi, Woo-Jun
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.15 no.1
    • /
    • pp.3-9
    • /
    • 2011
  • Purpose: Generally dual energy X-ray absorptiometry has been used for the purpose of evaluation of osteoporosis and treatment. Recently the interest of obesity came to be high and body percent fat test is increasing. Existing measure of body fat have to scan the whole body can be evaluated, but only lumbar spine and hip measurements was assumed to be whole body fat as well as improving the software. It tries to check whether the part measured value not being whole body measurement has the validity or not compared with the value calculated with the method that it is different, it forgives through a correlation with a (BIA) and (BMI). Materials and Methods: In 2010, the body percent fat was measured among the examinee coming to the Asan Medical Center public health care center from March till August against 90 females more than 40 years old through (DXA) and BIA. BMI utilized the value which wrote an hight and weight measured through the body measuring instrument in the examinee information and is automatically calculated. In addition, it classified as the low weight ($13-18.5kg/m^2$), normal ($18.5-25kg/m^2$), and corpulence ($25-30kg/m^2$) based on BMI and so that it could check whether there was the difference according to the weight or not BMI and BIA and correlation between DXA were analyzed in each group. The statistical program for the analysis used SPSS 12.0. Results: The comparison of DXA at 3 which it divides into the low weight and normal and corpulence groups and BIA did not show the difference noted statistically in all groups and the between group comparison was exposed to do not have a meaning. The body percent fat measured by the correlation analysis result DXA at the state that it doesn't divide into the group showed the high correlation (r=0.908, p0.01) noted statistically compared with BMI and showed the high correlation noted statistically in a comparison with BIA (r=0.927, p0.01). Conclusion: It confirmed that the whole body percent fat presumed from the part bone density measurement showed the excel correlation compared with BIA and BMI and information is high. There is still no clear standard about the presumed whole body percent fat and it is difficult to evaluate the fat evaluation by the bone mineral density measurement. However, it is determined that the information offering which is more objective through the comparative study with the body percent fat which is very efficient and in that it can obtain till the information about a fat as well as diagnosis of the osteoporosis through the bone density checkup is measured by the afterward telegraph bone density checkup and is clinically useful is possible.

  • PDF

A Study on the Access in the Government Archives & Records Service of Korea (한국 정부기록보존소의 역사기록물 공개에 관한 검토)

  • Lee, Jin-Young
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.3 no.1
    • /
    • pp.129-140
    • /
    • 2003
  • The ultimate goal of preserving and maintaining the records is to use them practically. The effective use of records should be supported by the reasonable recordskeeping systems and access standards. In this report, I examined the Korean laws and administrative systems related to the public records access issues. After I pointed out major problems of the access laws, the Government Information Opening Act (GOIA), and the problems in practices, I suggested some alternatives for the betterment of the access system. The GIOA established "eight standards of exemption to access" not to open some information to protect national interests and privacy. The Public Records Management Act (PRMA) applies to the archives transferred to "professional archives." The two laws show fundamental differences in the ways to open the public records to public. First, the GIOA deals with the whole information (the records) that public institutions keep and maintain, while the PRMA deals with the records that were transferred to the Government Archives. Second, the GIOA provides with a legal procedure to open public records and the standards to open or not to open them, while the PRMA allows the Government Archives to decide whether the transferred records should be opened or not. Third, the GIOA applies to record producing agencies, while the PRMA applies to public archival institutions. One of the most critical inadequacies of the PRMA is that there are no standards to judge to open the archives through reclassification procedure. The GIOA also suggests only the type of information that is not accessible. It does not specify how long the records can be closed. The GARS does not include the records less than 30 years old as its objects of the reclassification. To facilitate the opening of the archives, we need to revise the GIOA and the PRMA. It is necessary to clearly divide the realms between the GIOA and the PRMA on the access of the archives. The PRMA should clarify the principles of the reclassification as well as reclassifying method and exceptions. The exemption standards of the GIOA should be revised to restrict the abuse of the exemption clauses, and they should not be applied to the archives in the GARS indiscreetly and unconditionally.

A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS (LDA, Top2Vec, BERTopic 모형의 토픽모델링 비교 연구 - 국외 문헌정보학 분야를 중심으로 -)

  • Yong-Gu Lee;SeonWook Kim
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.58 no.1
    • /
    • pp.5-30
    • /
    • 2024
  • The purpose of this study is to extract topics from experimental data using the topic modeling methods(LDA, Top2Vec, and BERTopic) and compare the characteristics and differences between these models. The experimental data consist of 55,442 papers published in 85 academic journals in the field of library and information science, which are indexed in the Web of Science(WoS). The experimental process was as follows: The first topic modeling results were obtained using the default parameters for each model, and the second topic modeling results were obtained by setting the same optimal number of topics for each model. In the first stage of topic modeling, LDA, Top2Vec, and BERTopic models generated significantly different numbers of topics(100, 350, and 550, respectively). Top2Vec and BERTopic models seemed to divide the topics approximately three to five times more finely than the LDA model. There were substantial differences among the models in terms of the average and standard deviation of documents per topic. The LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage of topic modeling, generating the same 25 topics for all models, the Top2Vec model tended to assign more documents on average per topic and showed small deviations between topics, resulting in even distribution of the 25 topics. When comparing the creation of similar topics between models, LDA and Top2Vec models generated 18 similar topics(72%) out of 25. This high percentage suggests that the Top2Vec model is more similar to the LDA model. For a more comprehensive comparison analysis, expert evaluation is necessary to determine whether the documents assigned to each topic in the topic modeling results are thematically accurate.

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.137-148
    • /
    • 2014
  • Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used

    . Past studies to improve CF performance typically used additional information other than users' evaluations such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate dataset. This study shows that performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. This study empirically shows that the characteristics of dataset can affect the performance of CF recommender systems. This helps researchers understand factors affecting performance of CF. This study also opens a door for future studies in the area of applying SNA to CF to analyze characteristics of dataset. In practice, this study provides guidelines to improve performance of CF recommender systems with a simple modification.

  • Exploratory Case Study for Key Successful Factors of Producy Service System (Product-Service System(PSS) 성공과 실패요인에 관한 탐색적 사례 연구)

    • Park, A-Rum;Jin, Dong-Su;Lee, Kyoung-Jun
      • Journal of Intelligence and Information Systems
      • /
      • v.17 no.4
      • /
      • pp.255-277
      • /
      • 2011
    • Product Service System(PSS), which is an integrated combination of product and service, provides new value to customer and makes companies sustainable as well. The objective of this paper draws Critical Successful Factors(CSF) of PSS through multiple case study. First, we review various concepts and types in PSS and Platform business literature currently available on this topic. Second, after investigating various cases with the characteristics of PSS and platform business, we select four cases of 'iPod of Apple', 'Kindle of Amazon', 'Zune of Microsoft', and 'e-book reader of Sony'. Then, the four cases are categorized as successful and failed cases according to criteria of case selection and PSS classification. We consider two methodologies for the case selection, i.e., 'Strategies for the Selection of Samples and Cases' proposed by Bent(2006) and the seven case selection procedures proposed by Jason and John(2008). For case selection, 'Stratified sample and Paradigmatic cases' is adopted as one of several options for sampling. Then, we use the seven case selection procedures such as 'typical', 'diverse', 'extreme', 'deviant', 'influential', 'most-similar', and 'mostdifferent' and among them only three procedures of 'diverse', 'most?similar', and 'most-different' are applied for the case selection. For PSS classification, the eight PSS types, suggested by Tukker(2004), of 'product related', 'advice and consulancy', 'product lease', 'product renting/sharing', 'product pooling', 'activity management', 'pay per service unit', 'functional result' are utilized. We categorize the four selected cases as a product oriented group because the cases not only sell a product, but also offer service needed during the use phase of the product. Then, we analyze the four cases by using cross-case pattern that Eisenhardt(1991) suggested. Eisenhardt(1991) argued that three processes are required for avoiding reaching premature or even false conclusion. The fist step includes selecting categories of dimensions and finding within-group similarities coupled with intergroup difference. In the second process, pairs of cases are selected and listed. The second step forces researchers to find the subtle similarities and differences between cases. The third process is to divide the data by data source. The result of cross-case pattern indicates that the similarities of iPod and Kindle as successful cases are convenient user interface, successful plarform strategy, and rich contents. The differences between the successful cases are that, wheares iPod has been recognized as the culture code, Kindle has implemented a low price as its main strategy. Meanwhile, the similarities of Zune and PRS series as failed cases are lack of sufficient applications and contents. The differences between the failed cases are that, wheares Zune adopted an undifferentiated strategy, PRS series conducted high-price strategy. From the analysis of the cases, we generate three hypotheses. The first hypothesis assumes that a successful PSS system requires convenient user interface. The second hypothesis assumes that a successful PSS system requires a reciprocal(win/win) business model. The third hypothesis assumes that a successful PSS system requires sufficient quantities of applications and contents. To verify the hypotheses, we uses the cross-matching (or pattern matching) methodology. The methodology matches three key words (user interface, reciprocal business model, contents) of the hypotheses to the previous papers related to PSS, digital contents, and Information System (IS). Finally, this paper suggests the three implications from analyzed results. A successful PSS system needs to provide differentiated value for customers such as convenient user interface, e.g., the simple design of iTunes (iPod) and the provision of connection to Kindle Store without any charge. A successful PSS system also requires a mutually benefitable business model as Apple and Amazon implement a policy that provides a reasonable proft sharing for third party. A successful PSS system requires sufficient quantities of applications and contents.

    Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

    • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
      • Journal of Intelligence and Information Systems
      • /
      • v.22 no.3
      • /
      • pp.45-69
      • /
      • 2016
    • Recently, sentiment analysis using open Internet data is actively performed for various purposes. As online Internet communication channels become popular, companies try to capture public sentiment of them from online open information sources. This research is conducted for the purpose of analyzing pulbic sentiment of Korean Top-10 companies using a multi-categorical sentiment lexicon. Whereas existing researches related to public sentiment measurement based on big data approach classify sentiment into dimensions, this research classifies public sentiment into multiple categories. Dimensional sentiment structure has been commonly applied in sentiment analysis of various applications, because it is academically proven, and has a clear advantage of capturing degree of sentiment and interrelation of each dimension. However, the dimensional structure is not effective when measuring public sentiment because human sentiment is too complex to be divided into few dimensions. In addition, special training is needed for ordinary people to express their feeling into dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training when they feel. People would not express their feeling in the way of dimensional structure like positive/negative or active/passive; rather they express theirs in the way of categorical sentiment like sadness, rage, happiness and so on. That is, categorial approach of sentiment analysis is more natural than dimensional approach. Accordingly, this research suggests multi-categorical sentiment structure as an alternative way to measure social sentiment from the point of the public. Multi-categorical sentiment structure classifies sentiments following the way that ordinary people do although there are possibility to contain some subjectiveness. In this research, nine categories: 'Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom' and 'Pain' are used as multi-categorical sentiment structure. To capture public sentiment of Korean Top-10 companies, Internet news data of the companies are collected over the past 25 months from a representative Korean portal site. Based on the sentiment words extracted from previous researches, we have created a sentiment lexicon, and analyzed the frequency of the words coming up within the news data. The frequency of each sentiment category was calculated as a ratio out of the total sentiment words to make ranks of distributions. Sentiment comparison among top-4 companies, which are 'Samsung', 'Hyundai', 'SK', and 'LG', were separately visualized. As a next step, the research tested hypothesis to prove the usefulness of the multi-categorical sentiment lexicon. It tested how effective categorial sentiment can be used as relative comparison index in cross sectional and time series analysis. To test the effectiveness of the sentiment lexicon as cross sectional comparison index, pair-wise t-test and Duncan test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin', 'SK' and 'Hanjin' were chosen to compare whether each categorical sentiment is significantly different in pair-wise t-test. Since category 'Sadness' has the largest vocabularies, it is chosen to figure out whether the subgroups of the companies are significantly different in Duncan test. It is proved that five sentiment categories of Samsung and Hanjin and four sentiment categories of SK and Hanjin are different significantly. In category 'Sadness', it has been figured out that there were six subgroups that are significantly different. To test the effectiveness of the sentiment lexicon as time series comparison index, 'nut rage' incident of Hanjin is selected as an example case. Term frequency of sentiment words of the month when the incident happened and term frequency of the one month before the event are compared. Sentiment categories was redivided into positive/negative sentiment, and it is tried to figure out whether the event actually has some negative impact on public sentiment of the company. The difference in each category was visualized, moreover the variation of word list of sentiment 'Rage' was shown to be more concrete. As a result, there was huge before-and-after difference of sentiment that ordinary people feel to the company. Both hypotheses have turned out to be statistically significant, and therefore sentiment analysis in business area using multi-categorical sentiment lexicons has persuasive power. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when figuring out public sentiment in business environment.

    Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

    • Kim, Myeong-Kyun;Cho, Yoonho
      • Journal of Intelligence and Information Systems
      • /
      • v.18 no.4
      • /
      • pp.59-77
      • /
      • 2012
    • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, performs a variety of analyses for firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In the paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. This study approaches to building the predictive models from the perspective of two different analyses. The first is the analysis period. We divide the analysis period into before and after the IMF financial crisis, and examine whether there is the difference between the two periods. The second is the prediction time. In order to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one year, two years and three years later. Therefore Total six prediction models are developed and analyzed. In this paper, we employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method which builds decision trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. There are well-known decision tree induction algorithms such as CHAID, CART, QUEST, C5.0, etc. Among them, we use C5.0 algorithm which is the most recently developed algorithm and yields performance better than other algorithms. We obtained data for the rights issue and financial analysis from TS2000 of Korea Listed Companies Association. A record of financial analysis data is consisted of 89 variables which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices. For the model building and test, we used 10,925 financial analysis data of total 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. Total 84 variables among financial analysis data are selected as the input variables of each model, and the rights issue status (issued or not issued) is defined as the output variable. To develop prediction models using C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of data for model building and 40% of data for model test. The results of experimental analysis show that the prediction accuracies of data after the IMF financial crisis (59.04% to 60.43%) are about 10 percent higher than ones before IMF financial crisis (68.78% to 71.41%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and the firm intention of rights issue has been more obvious. The experiment results also show that the stability-related indices have a major impact on conducting rights issue in the case of short-term prediction. On the other hand, the long-term prediction of conducting rights issue is affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of significant variables. This means that companies in different types of industries show their different types of patterns for rights issue. We conclude that it is desirable for stakeholders to take into account stability-related indices and more various financial analysis indices for short-term prediction and long-term prediction, respectively. The current study has several limitations. First, we need to compare the differences in accuracy by using different data mining techniques such as neural networks, logistic regression and SVM. Second, we are required to develop and to evaluate new prediction models including variables which research in the theory of capital structure has mentioned about the relevance to rights issue.

    Business Application of Convolutional Neural Networks for Apparel Classification Using Runway Image (합성곱 신경망의 비지니스 응용: 런웨이 이미지를 사용한 의류 분류를 중심으로)

    • Seo, Yian;Shin, Kyung-shik
      • Journal of Intelligence and Information Systems
      • /
      • v.24 no.3
      • /
      • pp.1-19
      • /
      • 2018
    • Large amount of data is now available for research and business sectors to extract knowledge from it. This data can be in the form of unstructured data such as audio, text, and image data and can be analyzed by deep learning methodology. Deep learning is now widely used for various estimation, classification, and prediction problems. Especially, fashion business adopts deep learning techniques for apparel recognition, apparel search and retrieval engine, and automatic product recommendation. The core model of these applications is the image classification using Convolutional Neural Networks (CNN). CNN is made up of neurons which learn parameters such as weights while inputs come through and reach outputs. CNN has layer structure which is best suited for image classification as it is comprised of convolutional layer for generating feature maps, pooling layer for reducing the dimensionality of feature maps, and fully-connected layer for classifying the extracted features. However, most of the classification models have been trained using online product image, which is taken under controlled situation such as apparel image itself or professional model wearing apparel. This image may not be an effective way to train the classification model considering the situation when one might want to classify street fashion image or walking image, which is taken in uncontrolled situation and involves people's movement and unexpected pose. Therefore, we propose to train the model with runway apparel image dataset which captures mobility. This will allow the classification model to be trained with far more variable data and enhance the adaptation with diverse query image. To achieve both convergence and generalization of the model, we apply Transfer Learning on our training network. As Transfer Learning in CNN is composed of pre-training and fine-tuning stages, we divide the training step into two. First, we pre-train our architecture with large-scale dataset, ImageNet dataset, which consists of 1.2 million images with 1000 categories including animals, plants, activities, materials, instrumentations, scenes, and foods. We use GoogLeNet for our main architecture as it has achieved great accuracy with efficiency in ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Second, we fine-tune the network with our own runway image dataset. For the runway image dataset, we could not find any previously and publicly made dataset, so we collect the dataset from Google Image Search attaining 2426 images of 32 major fashion brands including Anna Molinari, Balenciaga, Balmain, Brioni, Burberry, Celine, Chanel, Chloe, Christian Dior, Cividini, Dolce and Gabbana, Emilio Pucci, Ermenegildo, Fendi, Giuliana Teso, Gucci, Issey Miyake, Kenzo, Leonard, Louis Vuitton, Marc Jacobs, Marni, Max Mara, Missoni, Moschino, Ralph Lauren, Roberto Cavalli, Sonia Rykiel, Stella McCartney, Valentino, Versace, and Yve Saint Laurent. We perform 10-folded experiments to consider the random generation of training data, and our proposed model has achieved accuracy of 67.2% on final test. Our research suggests several advantages over previous related studies as to our best knowledge, there haven't been any previous studies which trained the network for apparel image classification based on runway image dataset. We suggest the idea of training model with image capturing all the possible postures, which is denoted as mobility, by using our own runway apparel image dataset. Moreover, by applying Transfer Learning and using checkpoint and parameters provided by Tensorflow Slim, we could save time spent on training the classification model as taking 6 minutes per experiment to train the classifier. This model can be used in many business applications where the query image can be runway image, product image, or street fashion image. To be specific, runway query image can be used for mobile application service during fashion week to facilitate brand search, street style query image can be classified during fashion editorial task to classify and label the brand or style, and website query image can be processed by e-commerce multi-complex service providing item information or recommending similar item.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.