• Title/Abstract/Keyword: Embedding method

Search results: 699 items

An Embedding/Extracting Method of Audio Watermark Information for High Quality Stereo Music (고품질 스테레오 음악을 위한 오디오 워터마크 정보 삽입/추출 기술)

  • Bae, Kyungyul
    • Journal of Intelligence and Information Systems / Vol. 24, No. 2 / pp.21-35 / 2018
  • Since the introduction of MP3 players, CD recordings have gradually disappeared and the music consumption environment has shifted to mobile devices. Smart devices have further increased the use of music through the playback, mass-storage, and search functions integrated into smartphones and tablets. When MP3 players first appeared, compressed music content was generally encoded at 128 kbps; as demand for higher quality grew, 384 kbps content appeared, and recently music in the losslessly compressed FLAC (Free Lossless Audio Codec) format has become popular. The download services of many Korean music sites are divided into unlimited downloads with technical protection and limited downloads without it. Digital Rights Management (DRM) technology is used as the technical protection measure for unlimited downloads, but such music can be played only on authenticated devices with DRM installed; even music the user has purchased cannot be used on other devices. Conversely, for music that is limited in quantity but not technically protected, there is no way to act against anyone who redistributes it, and for high-quality formats such as FLAC the loss is greater. In this paper, the author proposes an audio watermarking technology for copyright protection of high-quality stereo music. Two kinds of information, "Copyright" and "Copy_free", are generated using a turbo code. Each watermark consists of 9 bytes (72 bits); applying the turbo code for error correction increases the amount of information to be inserted to 222 bits. The 222-bit watermark was then expanded to 1,024 bits to be robust against additional errors and finally inserted into the stereo music. The turbo code can recover the raw data when less than about 15% of the code is damaged by an attack on the watermarked content; the expansion to 1,024 bits raises the probability of recovering the 222 bits from damaged content, making the watermark itself more resistant to attack. The proposed algorithm applies quantization in the DCT domain so that the watermark can be detected efficiently and the SNR is improved when the stereo music is converted to mono. As a result, the SNR exceeded 40 dB on average, an improvement of more than 10 dB over conventional quantization methods; this is a very significant result, corresponding to roughly a tenfold relative improvement in sound quality. In addition, the watermark can be completely extracted from music samples shorter than one second, even after MP3 compression at 128 kbps, whereas the conventional quantization method largely fails to extract the watermark even from 10-second samples, ten times the length. Since the watermark embedded into the music is 72 bits long, it provides sufficient capacity to identify music distributed worldwide: $2^{72}$ distinct values can identify about $4.7{\times}10^{21}$ items, so the watermark can serve as an identifier and be used for copyright protection in high-quality music services. The proposed algorithm can be used not only for high-quality audio but also to develop watermarking algorithms for other multimedia such as UHD (Ultra High Definition) TV and high-resolution images. Moreover, as digital devices advance, users are demanding high-quality music, and artificial intelligence assistants are arriving together with high-quality music and streaming services; the results of this study can be used to protect the rights of copyright holders in these industries.
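The abstract does not spell out the exact embedding rule, so the following is a minimal sketch of a generic quantization-index-modulation (QIM) style embed/extract in the DCT domain, in the spirit of the quantization-based watermarking described above; the frame length, coefficient index, and quantization step are illustrative assumptions rather than the paper's parameters, and the stereo-to-mono robustness and turbo-coded payload are not modeled.

```python
import numpy as np
from scipy.fft import dct, idct

FRAME = 1024   # samples per DCT frame (assumption)
STEP = 0.05    # quantization step; the paper's actual step is not given

def embed_bit(frame, bit, coeff=100):
    """Embed one watermark bit by forcing the parity of a quantized DCT coefficient."""
    c = dct(frame, norm='ortho')
    q = int(np.round(c[coeff] / STEP))
    if q % 2 != bit:          # adjust the quantizer index so its parity encodes the bit
        q += 1
    c[coeff] = q * STEP
    return idct(c, norm='ortho')

def extract_bit(frame, coeff=100):
    """Read the bit back from the parity of the quantized coefficient."""
    c = dct(frame, norm='ortho')
    return int(np.round(c[coeff] / STEP)) % 2

# round trip over a random audio frame
rng = np.random.default_rng(0)
frame = rng.normal(scale=0.1, size=FRAME)
assert extract_bit(embed_bit(frame, 1)) == 1
```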

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / Vol. 25, No. 2 / pp.141-166 / 2019
  • Recently, channels such as social media and SNS generate enormous amounts of data, and the portion of unstructured data represented as text has grown geometrically. Because it is impractical to read all of this text, it is important to access it quickly and grasp its key points; driven by this need for efficient understanding, many studies on text summarization for handling and exploiting huge volumes of text have been proposed. In particular, many recent methods apply machine learning and artificial intelligence algorithms to generate summaries objectively and effectively, so-called "automatic summarization". However, most text summarization methods proposed to date build the summary around the most frequent contents of the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often; if a summary covers only the major subjects, bias occurs and information is lost, making it difficult to ascertain every subject the documents contain. This bias can be avoided by summarizing with a balance between the topics a document has, so that every subject can be ascertained, but an unbalanced distribution across subjects still remains. To keep subjects balanced in the summary, it is necessary to consider the proportion of each subject in the original documents and to allocate portions to the subjects evenly, so that even sentences on minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that secures balance across all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summaries we use two summary evaluation criteria, "completeness" and "succinctness": completeness means the summary should fully cover the contents of the original documents, and succinctness means the summary should contain minimal internal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights indicating how strongly each term relates to each topic; from these weights, highly related terms for each topic can be identified, and the subjects of the documents emerge from topics composed of terms with similar meanings. A few terms that represent each subject well, called "seed terms" in this method, are then selected. Because these few terms cannot explain a subject fully, additional terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion, finding terms similar to the seed terms: after Word2Vec modeling produces word vectors, the similarity between any two terms is derived by cosine similarity, where a higher cosine similarity indicates a stronger relationship. Terms with high similarity to the seed terms of each subject are selected, and after filtering these expanded terms, the subject dictionary is finally constructed (see the sketch after this paragraph). The second phase allocates a subject to every sentence in the original documents. To grasp the content of each sentence, a frequency analysis is first conducted over the terms composing the subject dictionaries. TF-IDF weights for each subject are then calculated, indicating how much each sentence describes each subject. Because TF-IDF weights can grow without bound, the weights of each sentence are normalized to values between 0 and 1. Each sentence is then allocated to the subject with the maximum TF-IDF weight, yielding a sentence group for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences and form a similarity matrix; by repeatedly selecting sentences, a summary can be generated that fully covers the contents of the original documents while minimizing internal duplication. For evaluation, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews to generate summaries. A comparison between the proposed method's summaries and frequency-based summaries verified that the proposed method better preserves the balance of subjects originally present in the documents.
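As an illustration of the dictionary-construction phase, the sketch below expands a few seed terms into a subject dictionary with gensim's Word2Vec and cosine similarity, as the abstract describes; the toy sentences, seed terms, similarity threshold, and topn cutoff are hypothetical, and the topic-modeling step that selects the seeds is omitted.

```python
from gensim.models import Word2Vec

# toy tokenized review corpus (placeholder for the 50,000 TripAdvisor reviews)
sentences = [
    ["staff", "friendly", "helpful", "kind"],
    ["room", "clean", "spacious", "quiet"],
    ["staff", "kind", "helpful"],
]

model = Word2Vec(sentences, vector_size=300, window=5, min_count=1)

def expand_subject(seed_terms, topn=10, threshold=0.3):
    """Grow a subject dictionary: keep terms whose cosine similarity
    to a seed term exceeds the (illustrative) threshold."""
    expanded = set(seed_terms)
    for seed in seed_terms:
        if seed in model.wv:
            for term, sim in model.wv.most_similar(seed, topn=topn):
                if sim >= threshold:
                    expanded.add(term)
    return expanded

service_dict = expand_subject(["staff", "helpful"])   # hypothetical "service" subject
```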

Correlation of Proliferating Cell Nuclear Antigen (PCNA) Expression and S-phase Fraction, Survival Rate in Primary Non-Small Cell Lung Cancer (원발성 비소세포 폐암에서 PCNA의 발현정도와 암세포의 분열능 및 생존률과의 관계)

  • Yang, Sei-Hoon;Kim, Hak-Ryul;Gu, Ki-Seon;Jung, Byung-Hak;Jeong, Eun-Taik
    • Tuberculosis and Respiratory Diseases / Vol. 44, No. 4 / pp.756-765 / 1997
  • Background: To study the prognosis of patients with lung cancer, many investigators have reported methods for detecting cell proliferation in tissues, including PCNA, thymidine autoradiography, flow cytometry, and Ki-67. PCNA, also known as cyclin, is a proliferation-related nuclear protein, a 36 kD intranuclear polypeptide that is maximally elevated in the S phase of proliferating cells. In this study, PCNA was identified in paraffin-embedded tissue by immunohistochemistry, which has the advantages of simplicity and preservation of tissue architecture. The variation of PCNA expression is known to be related to the proliferating fraction, histologic type, anatomic (TNM) stage, degree of cell differentiation, S-phase fraction, and survival rate. We analyzed the correlation between PCNA expression and S-phase fraction and survival. Method: To investigate the expression of PCNA in primary lung cancer, we applied immunohistochemical staining to paraffin-embedded sections of 57 resected primary non-small cell lung cancer specimens, and the results were analyzed according to cell type, cell differentiation, TNM stage, S-phase fraction, and survival. Results: PCNA expression was divided into five groups according to the degree of staining (-, +, ++, +++, ++++). The squamous cell type showed higher positivity than adenocarcinoma. No significant difference related to TNM stage was observed, nor to the degree of cell differentiation. The S-phase fraction increased with PCNA positivity, but without reaching statistical significance. The 2-year survival rate and median survival time were 50% and 13 months for (-), 75% and 41.3 months for (+), 73% and 33.6 months for (++), 67% and 29.0 months for (+++), and 25% and 9 months for (++++), with statistical significance (p<0.05, Kaplan-Meier, generalized Wilcoxon). Conclusion: In this study, PCNA expression was most positive in squamous cell cancer, and there was no relationship between PCNA positivity and TNM stage, cellular differentiation, or S-phase fraction. However, patients with highly positive PCNA staining showed a poorer survival rate than patients with lower PCNA staining (p<0.05). We conclude that PCNA immunostaining is a simple and useful method for predicting survival from paraffin-embedded tissue in non-small cell lung cancer.


Temperature Compensation of Optical FBG Sensors Embedded Tendon for Long-term Monitoring of Tension Force of Ground Anchor (광섬유 센서 내장형 텐던을 이용한 그라운드 앵커의 장기 장력모니터링을 위한 온도보상)

  • Sung, Hyun-Jong;Kim, Young-Sang;Kim, Jae-Min;Park, Gui-Hyun
    • Journal of the Korean Geotechnical Society / Vol. 28, No. 5 / pp.13-25 / 2012
  • The ground anchor method is one of the most popular slope reinforcement technologies in Korea. For the long-term health monitoring of slopes reinforced by permanent anchors, monitoring the tension force of the ground anchors is very important. However, electromechanical sensors such as strain gauges and vibrating-wire (V/W) type load cells carry long-term reliability risks and suffer from noise during long-distance transmission and from electromagnetic interference (EMI). An optical FBG-sensor-embedded tendon was therefore developed to measure the strain of a 7-wire strand by embedding an FBG sensor into the center king cable of the strand. This FBG-sensor-embedded tendon has been successfully applied to short-term measurement of anchor force, but to adopt it for long-term monitoring, temperature compensation of the tendon is required. In this paper, we describe how to compensate for the change of underground temperature during long-term tension force monitoring of ground anchors using optical fiber sensors (FBG: Fiber Bragg Grating). A model test was carried out to determine the temperature sensitivity coefficient (${\beta}^{\prime}$) of the FBG-sensor-embedded tendon. The determined coefficient ${\beta}^{\prime}=2.0{\times}10^{-5}/^{\circ}C$ was verified by comparing the ground temperatures predicted from the proposed sensor using ${\beta}^{\prime}$ with ground temperatures measured by a ground thermometer. Finally, temperature compensation based on the ${\beta}^{\prime}$ value and ground temperature measurements from the Korea Meteorological Administration (KMA) was applied to the tension force monitoring results of tension-type and compression-type anchors that had been installed more than one year earlier at the test site. The temperature-compensated tension forces were compared with those measured by a conventional load cell over the same period. The test results show that the determined temperature sensitivity coefficient is valid and the proposed compensation method is appropriate: the compensated tension forces do not depend on the change of ground temperature and are consistent with the tension forces measured by the conventional load cell.
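To make the compensation step concrete, here is a minimal sketch assuming the standard linear FBG response, in which the relative Bragg wavelength shift is the sum of a strain term and a temperature term; the paper's ${\beta}^{\prime}=2.0{\times}10^{-5}/^{\circ}C$ is used for the thermal part, while the strain sensitivity K_EPS (about 0.78, i.e. 1 minus the photoelastic coefficient) and the example numbers are typical literature values, not values from the paper.

```python
BETA = 2.0e-5   # temperature sensitivity of the embedded tendon, 1/degC (from the paper)
K_EPS = 0.78    # strain sensitivity (1 - p_e); typical FBG value, assumed here

def compensated_strain(lam, lam0, delta_t):
    """Subtract the thermal part of the relative Bragg wavelength shift,
    then convert the remainder to mechanical strain of the tendon."""
    rel_shift = (lam - lam0) / lam0      # total relative wavelength shift
    return (rel_shift - BETA * delta_t) / K_EPS

# example: 1550 nm grating, +12 pm measured shift, ground 5 degC below the reference
strain = compensated_strain(1550.012e-9, 1550.000e-9, -5.0)
# the anchor tension then follows from this strain via the tendon's stiffness
```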

Expression of UT-A in Rat Kidney: Ultrastructural Immunocytochemistry (흰쥐 콩팥에서 요소운반체-A의 발현: 미세구조적 면역세포화학법)

  • Lim, Sun-Woo;Jung, Ju-Young;Kim, Wan-Young;Han, Ki-Hwan;Cha, Jung-Ho;Chung, Jin-Woong;Kim, Jin
    • Applied Microscopy / Vol. 32, No. 2 / pp.91-105 / 2002
  • Urea transport in the kidney is mediated by a family of transporter proteins that includes the renal urea transporters (UT-A) and the erythrocyte urea transporters (UT-B). The cDNAs of five isoforms of rat UT-A (UT-A1, UT-A2, UT-A3, UT-A4, and UT-A5) have been cloned. The purpose of this study was to examine the expression of UT-A using the L194 antibody, which recognizes UT-A1, UT-A2, and UT-A4. Male Sprague-Dawley rats weighing approximately 200 g were divided into three groups: control rats had free access to water, dehydrated rats were deprived of water for 3 days, and water-loaded rats had free access to 3% sucrose water for 3 days before being killed. The kidneys were preserved by in vivo perfusion through the abdominal aorta with 2% paraformaldehyde-lysine-periodate (PLP) or 8% paraformaldehyde solution for 10 min. Sections were processed for immunohistochemistry using the pre-embedding immunoperoxidase method and the immunogold method. In the normal rat kidney, UT-A1 was expressed intensely in the cytoplasm of inner medullary collecting duct (IMCD) cells, and UT-A2 was expressed on the plasma membrane of the terminal portion of the short-loop descending thin limb (DTL) cells (type I epithelium) and of the long-loop DTL cells (type II epithelium) in the initial part of the inner medulla. Immunoreactivity for UT-A1 in IMCD cells was decreased in dehydrated animals, whereas it was strongly increased in water-loaded animals compared with controls. In the short-loop DTL, immunoreactivity for UT-A2 increased in intensity in both the dehydrated and water-loaded groups. However, in the long-loop DTL of the outer part of the inner medulla, immunoreactivity for UT-A2 was markedly increased in the dehydrated group but not in the water-loaded group. In conclusion, in the rat kidney UT-A1 is located in the cytoplasm of IMCD cells, whereas UT-A2 is located in the plasma membrane of both short- and long-loop DTL cells. These immunohistochemical findings suggest that UT-A1 and UT-A2 may play different roles in urea transport and are regulated by different mechanisms.

Prediction of multipurpose dam inflow utilizing catchment attributes with LSTM and transformer models (유역정보 기반 Transformer및 LSTM을 활용한 다목적댐 일 단위 유입량 예측)

  • Kim, Hyung Ju;Song, Young Hoon;Chung, Eun Sung
    • Journal of Korea Water Resources Association / Vol. 57, No. 7 / pp.437-449 / 2024
  • Rainfall-runoff prediction studies that use deep learning while considering catchment attributes have been gaining attention. In this study, we selected two models: a Transformer model, suited to large-scale training through its self-attention mechanism, and an LSTM-based multi-state-vector sequence-to-sequence (LSTM-MSV-S2S) model with an encoder-decoder structure. These models were constructed to incorporate catchment attributes and to predict the inflow of 10 multi-purpose dam watersheds in South Korea. The experimental design compared three training methods: single-basin training (ST), pretraining (PT), and pretraining-finetuning (PT-FT). The model inputs consisted of 10 selected watershed attributes together with meteorological data, and inflow prediction performance was compared across the training methods. The results showed that the Transformer outperformed the LSTM-MSV-S2S model under the PT and PT-FT methods, with PT-FT yielding the highest performance; the LSTM-MSV-S2S model performed better under ST but worse under PT and PT-FT. Additionally, the embedding-layer activation vectors and the raw catchment attributes were used to cluster watersheds and analyze whether the models learned the similarities between them. The Transformer showed improved performance among watersheds with similar activation vectors, indicating that utilizing information from other pretrained watersheds enhances prediction performance. This study identified the suitable model and training method for each multi-purpose dam and highlights the necessity of constructing deep learning models with the PT and PT-FT methods for Korean watersheds.
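A minimal sketch of the PT-FT scheme follows, assuming a PyTorch regression setup: the model (Transformer or LSTM-MSV-S2S) is first pretrained on pooled samples from all basins, whose inputs combine meteorological forcings with the static catchment-attribute vector, and is then fine-tuned on the target dam's basin; the learning rates, epoch counts, and loader names are illustrative assumptions.

```python
import torch
from torch import nn, optim

def pretrain_finetune(model, basin_loaders, target_loader,
                      pt_epochs=50, ft_epochs=20):
    """PT-FT: pretrain on every basin's data, then fine-tune on the target basin."""
    loss_fn = nn.MSELoss()

    # Pretraining across all basins (PT)
    opt = optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(pt_epochs):
        for loader in basin_loaders:
            for x, y in loader:        # x = meteorology + catchment attributes
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    # Fine-tuning on the target basin only, with a smaller learning rate (FT)
    opt = optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(ft_epochs):
        for x, y in target_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```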

EFFECT OF LIGHT IRRADIATION MODES ON THE MARGINAL LEAKAGE OF COMPOSITE RESIN RESTORATION (광조사 방식이 복합레진 수복물의 변연누출에 미치는 영향)

  • 박은숙;김기옥;김성교
    • Restorative Dentistry and Endodontics / Vol. 26, No. 4 / pp.263-272 / 2001
  • The aim of this study was to investigate the influence of four different light-curing modes on the marginal leakage of Class V composite resin restorations. Eighty extracted human premolars were used. Wedge-shaped Class V cavities were prepared on the buccal surface of each tooth with a high-speed diamond bur, without bevel, and positioned so that half of the cavity lay above and half below the cemento-enamel junction. The depth, height, and width of the cavity were 2 mm, 3 mm, and 2 mm, respectively. The specimens were divided into 4 groups of 20 teeth each. All cavities were treated with the Prime & Bond$^{R}$ NT dental adhesive system (Dentsply DeTrey GmbH, Germany) according to the manufacturer's instructions and cured for 10 seconds, except group IV, which was cured for 3 seconds. All cavities were restored in bulk with Spectrum$^{TM}$ TPH A2 resin composite (Dentsply DeTrey GmbH, Germany). The composites were light-cured under 4 different modes: a regular-intensity group (600 mW/${cm}^2$, group I) was irradiated for 30 s, a low-intensity group (300 mW/${cm}^2$, group II) for 60 s, and an ultra-high-intensity group (1930 mW/${cm}^2$, group IV) for 3 s. A pulse-delay group (group III) was irradiated at 400 mW/${cm}^2$ for 2 s, followed after a 5-minute delay by 800 mW/${cm}^2$ for 10 s. The Spectrum$^{TM}$ 800 light-curing unit (Dentsply DeTrey GmbH, Germany) was used for groups I, II, and III, and the Apollo 95E (DMD, U.S.A.) for group IV. The composite resin specimens were finished and polished immediately after light curing, except group III, which was finished and polished during the delay time. Specimens were stored in physiologic saline at 37$^{\circ}C$ for 24 hours. After thermocycling (500$\times$, 5-55$^{\circ}C$), all teeth were covered with nail varnish up to 0.5 mm from the restoration margins, immersed in 2% methylene blue solution at 37$^{\circ}C$ for 24 hours, and rinsed with tap water for 24 hours. After embedding in clear resin, the specimens were sectioned with a water-cooled diamond saw (Isomet$^{TM}$, Buehler Co., Lake Bluff, IL, U.S.A.) along the longitudinal axis of the tooth so as to pass through the center of the restorations. The cut surfaces were examined under a stereomicroscope (SZ-PT, Olympus, Japan) at ${\times}$25 magnification; the images were captured with a CCD camera (GP-KR222, Panasonic, Japan) and stored on a computer with the Studio Grabber program. Dye penetration depth at the restoration/dentin and restoration/enamel interfaces was measured as a proportion of the entire depth of the restoration using image-analysis software (Scion Image, Scion Corp., U.S.A.). The data were analyzed statistically using one-way ANOVA and Tukey's method. The results were as follows: 1. The pulse-delay group did not show any significant difference in dye penetration rate from the other groups at either enamel or dentin margins (p>0.05). 2. At the dentin margin, the ultra-high-intensity group showed a significantly higher dye penetration rate than both the regular-intensity and low-intensity groups (p<0.05). 3. At the enamel margin, there were no statistically significant differences among the four groups (p>0.05). 4. Dentin margins showed significantly higher dye penetration rates than enamel margins in all groups (p<0.05).


THE CHANGE OF BONE FORMATION ACCORDING TO MAGNETIC INTENSITY OF MAGNET PLACED INTO TITANIUM IMPLANT SPECIMENS (타이타늄 임플랜트 시편 내부에 설치한 자석의 자성강도에 따른 골형성 변화)

  • Hwang Yun-Tae;Lee Sung-Bok;Choi Dae-Gyun;Choi Boo-Byung
    • The Journal of Korean Academy of Prosthodontics / Vol. 43, No. 2 / pp.232-247 / 2005
  • Purpose. The purpose of this investigation was to assess the possibility of clinical application in the areas of dental implants and bone grafts by histologically examining the bone formation around titanium implant specimens according to the magnetic field intensity of a neodymium magnet placed inside the specimens. Material and method. 1. Measurement of magnetic intensity: a magnet was placed inside the specimen, and the intensity of the magnetic field around the 1st and 3rd threads of the specimen was measured 20 times using a Gaussmeter (Kanetec Co., Japan). 2. Surgical procedure: male rabbits were anesthetized with fixed doses of ketamine (0.25 ml/kg) and Rompun (0.25 ml/kg). After the flat part of the tibia was incised, the titanium implant specimens were planted; the control group was sutured without a magnet, while in the experimental groups a Magnedisc 500 or Magnedisc 800 (Aichi Steel Co., Japan) was placed inside the specimen, fixed with pattern resin, and sutured. 3. Postoperative management: to prevent bacterial infection and inflammation, gentamycin and Ketopro were injected for 1 week from the day of operation, and the wound was dressed with Potadine. 4. Preparation for histomorphometric analysis: at 2, 4, and 8 weeks after surgery, the animals were sacrificed by ketamine overdose; specimens including the operated part and portions of the tibia were obtained and fixed in 10% PBS buffer solution. After embedding the specimens in Technovit 1200 and B.P. solution, H-E staining was performed; section thickness was 75 ${\mu}m$. In histological examination under the optical microscope, using the Kappa image-based program (Olympus Co., Japan), the bone contact ratio and bone area ratio at each part of the specimens were measured and analyzed. 5. Statistical analysis: statistical analysis was performed with the Mann-Whitney U-test. Results and conclusion. 1. In histomorphometric findings, increased new bone formation was shown in both control and experimental groups throughout the 2-, 4-, and 8-week experiment. After 4 weeks, more osteoblasts and osteoclasts, with significant bone remodeling, were seen in the experimental groups. 2. In histomorphometric analysis, the bone contact ratios were 38.5% for experimental group 1, 29.5% for experimental group 2, and 11.9% for the control group; the experimental groups were higher than the control group (p<0.05) (Fig. 6, Table IV). The bone area ratios were 60.9% for experimental group 2, 46.4% for experimental group 1, and 36.0% for the control group; there was no statistically significant difference between the experimental groups and the control group (Fig. 8, Table VII). 3. Comparing the bone contact ratios at each measurement site according to magnetic intensity, experimental group 2 (5.6 mT) was higher than the control group at the 1st thread (p<0.05), and experimental group 1 (1.8 mT) was higher than the control group at the 3rd thread (p<0.05) (Fig. 7, Tables V, VI). 4. Comparing the bone area ratios at each measurement site according to magnetic intensity, experimental group 2 (5.6 mT) was higher than the control group and experimental group 1 (4.0 mT) at the 1st thread (p<0.1), and experimental group 2 (4.4 mT) was higher than experimental group 1 (1.8 mT) at the 3rd thread (p<0.1) (Fig. 9, Tables IX, X). At the 3rd thread of the implant, experimental group 2 was highest, followed by experimental group 1 and the control group. There was a significant difference between the control group and experimental group 2 at the 1st thread, and between experimental groups 1 and 2 at the 1st and 3rd threads, but not between the control group and experimental group 1 (p<0.1).

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / Vol. 24, No. 2 / pp.59-83 / 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various deep learning models have been actively applied to English texts. In deep-learning-based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed to the model. In this case, word vectors generally refer to vector representations of the words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data; these have been widely used in studies on sentiment analysis of reviews from fields such as restaurants, movies, laptops, and cameras. Unlike in English, the morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective stem) and '고' (connective ending). Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study we use 'morpheme vectors' as input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. Several questions then arise. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model, which relies primarily on the form of words, to Korean with its high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect classification accuracy, especially when drawing morpheme vectors from Korean product reviews containing many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize these issues in three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can deep learning achieve a satisfactory level of classification accuracy in Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors reflecting them and compare the resulting classification accuracy using a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. As training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive the morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping, and 520,000 Naver News articles roughly corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ along three criteria. First, they come from two data sources: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of preprocessing, namely sentence splitting only, or additional spelling and spacing corrections after sentence separation. Third, they differ in the form of input fed into the word vector model: the morphemes themselves, or the morphemes with their POS tags attached. The morpheme vectors further vary in the considered range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model, with a context window of 5 and a vector dimension of 300. The results suggest that using same-domain text even with lower grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. POS tag attachment, devised for the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme did not appear to have any definite influence on classification accuracy.
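For illustration, the sketch below derives CBOW morpheme vectors with the study's stated hyperparameters (context window 5, vector dimension 300) using gensim, with POS tags attached to the morphemes; the KoNLPy Okt analyzer and the sample review text are assumptions for the sketch, since the abstract does not name the morpheme analyzer used.

```python
from gensim.models import Word2Vec
from konlpy.tag import Okt   # one possible Korean morpheme analyzer (assumption)

okt = Okt()
reviews = ["배송이 빠르고 제품이 예쁘고 좋아요"]   # illustrative review text

# split each review into morphemes and attach POS tags, e.g. '배송/Noun'
corpus = [[f"{morph}/{pos}" for morph, pos in okt.pos(review)]
          for review in reviews]

# CBOW (sg=0), context window 5, 300-dimensional vectors, as stated in the study
model = Word2Vec(corpus, vector_size=300, window=5, min_count=1, sg=0)

vector = model.wv[corpus[0][0]]   # morpheme vector fed into the non-static CNN
```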