• Title/Summary/Keyword: automatically


Development of Cloud Detection Method Considering Radiometric Characteristics of Satellite Imagery (위성영상의 방사적 특성을 고려한 구름 탐지 방법 개발)

  • Won-Woo Seo;Hongki Kang;Wansang Yoon;Pyung-Chae Lim;Sooahm Rhee;Taejung Kim
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_1
    • /
    • pp.1211-1224
    • /
    • 2023
  • Clouds cause many difficulties in observing land surface phenomena with optical satellites, in applications such as national land observation, disaster response, and change detection. Moreover, the presence of clouds affects not only the image processing stage but also the quality of the final data, so clouds must be identified and removed. In this study, we therefore developed a new cloud detection technique that automatically performs a series of processes: searching for and extracting the pixels closest to the spectral pattern of clouds in a satellite image, selecting the optimal threshold, and producing a cloud mask based on that threshold. The technique consists of three main steps. In the first step, the Digital Number (DN) image is converted into top-of-atmosphere (TOA) reflectance units. In the second step, preprocessing such as Hue-Saturation-Value (HSV) transformation, triangle thresholding, and maximum likelihood classification is applied to the TOA reflectance image, and the threshold for generating the initial cloud mask is determined for each image. In the third, post-processing step, noise in the initial cloud mask is removed and the cloud boundaries and interior are refined. As experimental data, CAS500-1 L2G images acquired over the Korean Peninsula from April to November, which show the diversity of the spatial and seasonal distribution of clouds, were used. To verify the performance of the proposed method, its results were compared with those generated by a simple thresholding method. In the experiment, the proposed method detected clouds more accurately than the existing method because its preprocessing accounts for the radiometric characteristics of each image. The results also showed that the influence of bright non-cloud objects (panel roofs, concrete roads, sand, etc.) was minimized. The proposed method improved results by more than 30% (F1-score) over the existing method, but showed limitations in certain images containing snow.
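
As a rough, illustrative sketch of the triangle-thresholding step named above (not the paper's implementation), the following Python snippet picks a reflectance threshold as the histogram bin farthest from the chord joining the histogram peak and the brightest occupied bin; the function name, bin count, and the final mask rule are assumptions.

```python
import numpy as np

def triangle_threshold(band, bins=256):
    """Triangle method: pick the histogram bin farthest from the chord
    joining the histogram peak and the brightest occupied bin."""
    hist, edges = np.histogram(band[np.isfinite(band)], bins=bins)
    peak = int(np.argmax(hist))
    tail = int(np.max(np.nonzero(hist)))   # brightest occupied bin
    x = np.arange(peak, tail + 1)
    y = hist[peak:tail + 1]
    # Perpendicular distance of each histogram point from the peak-tail chord
    x0, y0, x1, y1 = peak, hist[peak], tail, hist[tail]
    dist = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
    dist = dist / np.hypot(y1 - y0, x1 - x0)
    t_bin = x[int(np.argmax(dist))]
    return edges[t_bin]                    # threshold in reflectance units

# Hypothetical usage on a single TOA reflectance band:
# cloud_mask = toa_reflectance > triangle_threshold(toa_reflectance)
```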

CT-Derived Deep Learning-Based Quantification of Body Composition Associated with Disease Severity in Chronic Obstructive Pulmonary Disease (CT 기반 딥러닝을 이용한 만성 폐쇄성 폐질환의 체성분 정량화와 질병 중증도)

  • Jae Eun Song;So Hyeon Bak;Myoung-Nam Lim;Eun Ju Lee;Yoon Ki Cha;Hyun Jung Yoon;Woo Jin Kim
    • Journal of the Korean Society of Radiology
    • /
    • v.84 no.5
    • /
    • pp.1123-1133
    • /
    • 2023
  • Purpose: Our study aimed to evaluate the association between automated quantified body composition on CT and pulmonary function or quantitative lung features in patients with chronic obstructive pulmonary disease (COPD). Materials and Methods: A total of 290 patients with COPD were enrolled in this study. The volume of muscle and subcutaneous fat, the area of muscle and subcutaneous fat at T12, and bone attenuation at T12 were obtained from chest CT using a deep learning-based body segmentation algorithm. Parametric response mapping-derived emphysema (PRMemph), PRM-derived functional small airway disease (PRMfSAD), and airway wall thickness (AWT)-Pi10 were quantitatively assessed. The association between body composition and outcomes was evaluated using Pearson's correlation analysis. Results: The volume and area of muscle and subcutaneous fat were negatively associated with PRMemph and PRMfSAD (p < 0.05). Bone density at T12 was negatively associated with PRMemph (r = -0.1828, p = 0.002). The volume and area of subcutaneous fat and bone density at T12 were positively correlated with AWT-Pi10 (r = 0.1287, p = 0.030; r = 0.1668, p = 0.005; r = 0.1279, p = 0.031). However, muscle volume was negatively correlated with AWT-Pi10 (r = -0.1966, p = 0.001). Muscle volume was significantly associated with pulmonary function (p < 0.001). Conclusion: Body composition, automatically assessed using chest CT, is associated with the phenotype and severity of COPD.
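
The association analysis described above is plain Pearson correlation; a minimal sketch, with synthetic stand-in arrays since the patient data are not public, looks like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-patient measurements (n = 290 in the study);
# the scales and the negative trend are illustrative assumptions only.
muscle_volume = rng.normal(6000, 800, 290)                      # cm^3
prm_emph = 20 - 0.002 * muscle_volume + rng.normal(0, 4, 290)   # % emphysema

# The paper reports, e.g., r = -0.1828, p = 0.002 for bone attenuation vs. PRMemph.
r, p = stats.pearsonr(muscle_volume, prm_emph)
print(f"Pearson r = {r:.4f}, p = {p:.3g}")
```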

An Intervention Study on Integration of Family Planning and Maternal/Infant Care Services in Rural Korea (가족계획과 모자보건 통합을 위한 조산원의 투입효과 분석 -서산지역의 개입연구 평가보고-)

  • Bang, Sook;Han, Seung-Hyun;Lee, Chung-Ja;Ahn, Moon-Young;Lee, In-Sook;Kim, Eun-Shil;Kim, Chong-Ho
    • Journal of Preventive Medicine and Public Health
    • /
    • v.20 no.1 s.21
    • /
    • pp.165-203
    • /
    • 1987
  • This project was a service-cum-research effort with a quasi-experimental study design to examine the health benefits of an integrated Family Planning (FP)/Maternal & Child Health (MCH) service approach that provides crucial factors missing in the present on-going programs. The specific objectives were: 1) To test the effectiveness of trained nurse/midwives (MW) assigned as change agents in the Health Sub-Center (HSC) in bringing about changes in eight FP/MCH indicators, namely: (i) FP/MCH contacts between field workers and their clients, (ii) the use of effective FP methods, (iii) the inter-birth interval and/or open interval, (iv) prenatal care by medically qualified personnel, (v) medically supervised deliveries, (vi) the rate of induced abortion, (vii) maternal and infant morbidity, and (viii) perinatal & infant mortality. 2) To measure the integrative linkage (contacts) between MW & HSC workers and between HSC and clients. 3) To examine the organizational or administrative factors influencing integrative linkage between health workers. Study design: The above objectives called for a quasi-experimental design setting up study and control areas with and without a midwife. An active intervention program (FP/MCH minimum 'package' program) was conducted for a 2-year period from June 1982 to July 1984 in Seosan County, and 'before and after' surveys were conducted to measure the change. Service input: This study was undertaken by Soonchunhyang University in collaboration with WHO. After a baseline survey in 1981, trained nurse/midwives were introduced into two health sub-centers in a rural setting (Seosan County) for a 2-year period from 1982 to 1984. A major service input was the establishment of midwifery services in the existing health delivery system, with emphasis on the nurse/midwife's role as the link between health workers (nurse aides) and village health workers, and on the referral of risk patients to the private physician (OBGY specialist). An evaluation survey was made in August 1984 to assess the effectiveness of this alternative integrated approach in the study areas in comparison with the control area, which had normal government services. Method of evaluation: a. In this study, the primary objective was first to examine to what extent the FP/MCH package program brought about changes in the eight pre-determined indicators (outcome and impact measures). b. Nevertheless, this project did not automatically accept the assumption that if two or more activities were integrated, the results would automatically be better than those of a non-integrated or categorical program. There is a need to assess the 'integration process' itself within the package program. The process of integration was measured in terms of interactive linkages, i.e., the quantity & quality of contacts between workers & clients and among workers. Integrative linkages were hypothesized to be influenced by organizational factors at the HSC clinic level, including HSC goals, structure, authority, leadership style, resources, and the personal characteristics of HSC staff. The extent or degree of integration, as measured by the intensity of integrative linkages, was in turn presumed to influence programme performance.
Thus, as indicated diagrammatically in the original report, organizational factors constituted the independent variables, integration the intervening variable, and programme performance with respect to family planning and health services the dependent variable. Concerning organizational factors, however, due to the limited number of HSCs (2 in the study area and 3 in the control area), they were studied by participatory observation by an anthropologist who was independent of the project. In this observation, we examined whether the assumed integration process actually occurred or not, and if not, what the constraints were in producing an effective integration process. Summary of findings: A) Program effects and impact. 1. Effects on FP use: During this 2-year action period, FP acceptance increased from 58% in 1981 to 78% in 1984 in both the study and control areas. This increase in both areas was mainly due to the new family planning campaign driven by the Government over the same period. Therefore, there was no increment in the FP acceptance rate attributable to the additional input of MW into the on-going FP program. But in the study area, quality aspects of FP were somewhat improved, with a better continuation rate of IUDs & pills and more use of effective contraceptive methods in comparison with the control area. 2. Effects on use of MCH services: Between the study and control areas, however, there was a significant difference in maternal and child health care. For example, the coverage of prenatal care increased from 53% for the 1981 birth cohort to 75% for the 1984 birth cohort in the study area. In the control area, the same coverage increased from 41% (1981) to 65% (1984). It is noteworthy that almost two thirds of the recent birth cohort received prenatal care even in the control area, indicating a growing demand for MCH care as the family size norm becomes smaller. 3. There was a substantial increase in delivery care by medical professionals in the study area, with an annual increase rate of 10% due to the midwife input in the study areas. The project had about twice the effect on postnatal care (68% vs. 33%) and delivery care (45.2% vs. 26.1%). 4. The study area had better reproductive efficiency (wanted pregnancies with FP practice & healthy live births surviving to one year old) than the control area, especially among women under 30 (14.1% vs. 9.6%). The proportion of women who preferred the 1st trimester for their first prenatal care rose significantly in the study area as compared to the control area (24% vs. 13%). B) Effects on interactive linkage. 1. This project contributed several useful steps in the direction of service integration, namely: i) the health workers became familiar with procedures for working together with each other (especially with a midwife) in carrying out their FP/MCH work, and ii) the health workers got a feeling for the usefulness of family health records (statistical integration) in identifying targets in their own work and in caring for family health. 2. On the other hand, because of a lack of the required organizational factors, complete linkage was not obtained as the project intended. i) In regard to the government health workers' home-visiting activities, there was not much difference between the study & control areas, though the MW did more home visiting than the government health workers.
ii) In assessing the service performance of MW & health workers, the midwives balanced their workload between 40% FP, 40% MCH & 20% other activities (mainly immunization). However, 85~90% of the services provided by the health workers were other than FP/MCH, mainly immunizations such as the encephalitis campaign. In the control area, a similar pattern was observed: over 75% of their service was other than FP/MCH. The pattern therefore shows that the health workers are a long way from becoming multipurpose workers, even though the government is pushing in this direction. 3. Villagers were much more likely to visit the health sub-center clinic in the study area than in the control area (58% vs. 31%) and to receive more combined care (45% vs. 23%). C) Organizational factors (administrative integration issues). 1. When MW (new workers with higher qualifications) were introduced to the HSC, there were conflicts between the existing HSC workers (nurse aides with lower qualifications than the MW) and the MW during the beginning period of the project. The cause of the conflict was studied by an anthropologist, who pointed out that these functional integration problems stemmed from the structural inadequacies of the health sub-center organization, as indicated below: i) There was still no general consensus about the objectives and goals of the project between the project staff and the existing health workers. ii) There was no formal linkage between the responsibilities of each member's job in the health sub-center. iii) There was still little chance for midwives to play a catalytic role or to establish communicative networks between workers in order to link various knowledge and skills to provide better FP/MCH services in the health sub-center. 2. Based on the above findings, the project recommended the following to the County Chief (who has the power to control the administrative staff and the technical staff in his county): i) In order to solve the conflicts between the individual roles and functions in performing health care activities, there must be goals agreed upon by both sides. ii) The health sub-center must function as an autonomous organization to undertake the integrated health project. In order to do that, it is necessary to provide administrative support and to establish a communication system for supervision and control of the health sub-centers. iii) The administrative organization, tentatively, must be organized to bind the health workers', midwives' and director's jobs in an organic relationship in order to achieve the integrative system under the leadership of the health sub-center director. After this observation report was submitted, there was better understanding, arising from frequent meetings & communication between HW and MW in FP/MCH work, as the program developed. Lessons learned from the Seosan Project (on issues of FP/MCH integration in Korea): 1) A majority, about 80%, of couples are now practicing FP. As indicated by the study, there is a growing demand from clients for the health system to provide more MCH services than FP in order to maintain the small family size achieved through FP practice. It is fortunate that the government is now formulating an MCH policy for the year 2000 and revising MCH laws and regulations to emphasize more MCH care for achieving a small family size through family planning practice. 2) Goal consensus on FP/MCH should be reached among the health workers and administrators, especially to emphasize the need for care of the 'wanted' child.
But there is a long way to go to realize the 'real' integration of FP into MCH in Korea unless there is structural integration of FP/MCH, because categorical FP still takes first priority, to reduce the rate of population growth for economic reasons, and not yet for health/welfare reasons in practice. 3) There should be more financial allocation: (i) a midwife should be made available to help promote the MCH program and coordinate services, and (ii) there should be a health sub-center director who can provide leadership training for managing the integrated program. There is a need for 'organizational support' if the decision to integrate is made so as to obtain benefit from both FP & MCH. In other words, costs should be paid equally to both FP and MCH; the integration slogan itself, without the commitment to paying such costs, is powerless. 4) The need for management training for middle-level health personnel is all the more acute as the Government has already constructed 90 MCH centers attached to County Health Centers, but without adequate manpower, facilities, and guidelines for integrating the work of both FP and MCH. 5) The local government still considers these MCH centers only as delivery centers that take care only of visiting maternity cases. The MCH center should be a center for the management of all pregnancies occurring in the community and for the promotion of FP, with a systematic and effective linkage of the resources available in the county, i.e., Village Health Workers, Community Health Practitioners, Health Sub-center physicians & health workers, doctors and midwives in the MCH center, and OBGY specialists in clinics & hospitals, as practiced by the Seosan project at the primary health care level.


Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but such a model cannot capture the correlation between input units efficiently, since it is a probabilistic model based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used for neural language modeling (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). In order to train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the size of the dictionary is very large, which increases model complexity. In addition, word-level or morpheme-level models can only generate vocabulary contained in the training set. Furthermore, with highly morphological languages such as Turkish, Hungarian, Russian, Finnish or Korean, morpheme analyzers are more likely to cause errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit of Korean text. We constructed language models using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done with Old Testament texts using the deep learning package Keras based on Theano. After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, divided into training, validation, and test sets in a 70:15:15 proportion. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm took the longest time to train for both the 3- and 4-layer LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet the validation loss and perplexity were not improved significantly, and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model.
Although there were slight differences between the models in the completeness of the generated sentences, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost grammatically perfect. The results of this study are expected to be widely used for the processing of Korean in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
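
For orientation, the 3-LSTM-layer architecture described above can be sketched in Keras roughly as follows; the hidden width (512) and the choice of Nadam are illustrative assumptions not stated in the abstract, and modern TensorFlow Keras is used here in place of the Theano backend the paper ran on.

```python
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN, VOCAB = 20, 74   # 20-symbol input window, 74 unique symbols (as in the paper)

# 3-LSTM-layer variant predicting the 21st symbol from the preceding 20.
model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN, VOCAB)),        # one-hot encoded phoneme window
    layers.LSTM(512, return_sequences=True),
    layers.LSTM(512, return_sequences=True),
    layers.LSTM(512),
    layers.Dense(VOCAB, activation="softmax"),  # distribution over the next phoneme
])
model.compile(optimizer="nadam", loss="categorical_crossentropy")
# model.fit(x_train, y_train, ...)  # the paper used a 70:15:15 train/val/test split
```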

A Study on Web-based Technology Valuation System (웹기반 지능형 기술가치평가 시스템에 관한 연구)

  • Sung, Tae-Eung;Jun, Seung-Pyo;Kim, Sang-Gook;Park, Hyun-Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.23-46
    • /
    • 2017
  • While cases of evaluating the value of specific companies or projects have been concentrated in developed countries in North America and Europe since the early 2000s, systems and methodologies for estimating the economic value of individual technologies or patents have been increasingly activated. There are, of course, several online systems that qualitatively evaluate a technology's grade or patent rating, such as 'KTRS' of the KIBO and 'SMART 3.1' of the Korea Invention Promotion Association. Recently, however, a web-based technology valuation system, referred to as the 'STAR-Value system', which calculates quantitative values of a subject technology for various purposes such as business feasibility analysis, investment attraction, and tax/litigation, has been officially opened and is spreading. In this study, we introduce the types of methodology and evaluation models, the reference information supporting these theories, and how the associated databases are utilized, focusing on the various modules and frameworks embedded in the STAR-Value system. In particular, there are six valuation methods, including the discounted cash flow (DCF) method, a representative income-approach method that values anticipated future economic income at present, and the relief-from-royalty method, which calculates the present value of royalties, treating the contribution of the subject technology to the business value created as the royalty rate. We look at how the models and related supporting information (technology life, corporate (business) financial information, discount rate, industrial technology factors, etc.) can be used and linked in an intelligent manner. Based on the classification of the technology to be evaluated, such as the International Patent Classification (IPC) or Korea Standard Industry Classification (KSIC), the STAR-Value system automatically returns metadata such as technology cycle time (TCT), sales growth rate and profitability data of similar companies or industry sectors, weighted average cost of capital (WACC), and indices of industrial technology factors, and applies adjustment factors to them, so that the calculated technology value is highly reliable and objective. Furthermore, if the information on the potential market size of the target technology and the market share of the commercializing entity draws on data-driven information, or if the estimated value ranges of similar technologies by industry sector are provided from evaluation cases already completed and accumulated in the database, the STAR-Value system is anticipated to present a highly accurate value range in real time by intelligently linking its various support modules. Beyond the explanation of the various valuation models and their primary variables as presented in this paper, the STAR-Value system aims to operate more systematically and in a data-driven way by supporting an optimal model selection guideline module, an intelligent technology value range reasoning module, a similar-company-selection-based market share prediction module, etc.
In addition, this research on the development and intelligence of the web-based STAR-Value system is significant in that it widely disseminates a web-based system through which the theoretical foundations of the technology valuation field can be validated and applied in practice, and it is expected to be utilized in various areas of technology commercialization.
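
To make the income approach concrete, here is a generic discounted cash flow sketch in Python, assuming hypothetical cash flows over the technology's economic life and a WACC-based discount rate; this illustrates the DCF idea only, not the STAR-Value system's actual module.

```python
def dcf_value(cash_flows, discount_rate):
    """Present value of future cash flows, discounted once per year."""
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# Hypothetical example: 5-year technology life, WACC of 12%,
# projected free cash flows attributable to the technology.
print(dcf_value([100, 120, 130, 125, 110], 0.12))  # ≈ 419.3
```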

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.183-203
    • /
    • 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has brought various kinds of online news media, news about events occurring in society has increased greatly. Automatically summarizing key events from massive amounts of news data therefore helps users survey many events at a glance. In addition, if we build and provide an event network based on the relevance of events, it can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017 and, in preprocessing, kept only meaningful words and integrated synonyms using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date and to detect events at the peaks of each topic's distribution. A total of 32 topics were extracted by the topic modeling, and the occurrence point of each event was deduced from the point at which the corresponding topic distribution surged. As a result, a total of 85 events were detected, of which a final 16 events were retained after filtering with a Gaussian smoothing technique. We also calculated relevance scores between the detected events to construct the event network: using the cosine coefficient between co-occurring events, we calculated the relevance between events and connected related events. Finally, we set up the event network with each event as a vertex and the relevance score between events on the edges connecting the vertices. The event network constructed by our method allowed us to sort the major events in the political and social fields in Korea over the past year in chronological order and, at the same time, to identify which events are related to one another. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to analyze large amounts of data easily and to identify relationships between events that were difficult to detect with existing event detection. We also applied various text mining techniques and the Word2Vec technique in text preprocessing to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in analyzing Korean texts. The event detection and network construction techniques in this study have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily extract topics, topic words, and their distributions from a huge amount of data, and by using the date information of the collected news articles, the distribution by topic can be expressed as a time series. Second, by calculating relevance scores and constructing an event network from the co-occurrence of topics, which is difficult to grasp with existing event detection, we can present the connections between events in a summarized form. This is supported by the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify, through the event network, which event served as the starting point for a series of events.
The limitation of this study is that LDA topic modeling yields different results depending on the initial parameters and the number of topics, and the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher. Also, since each topic is assumed to be exclusive and independent, the relevance between topics is not taken into account. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
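
A minimal sketch of the pipeline's core steps, using gensim for LDA and a cosine function for event relevance; the toy documents, the omission of the peak-detection and Gaussian-smoothing stages, and all variable names are simplifications assumed for illustration.

```python
import numpy as np
from gensim import corpora, models

# Toy stand-ins for tokenized, preprocessed news articles (one list per article)
docs = [["assembly", "bill", "vote"], ["missile", "launch", "test"],
        ["assembly", "vote", "session"], ["missile", "test", "range"]]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

# The paper extracted 32 topics; a tiny corpus is used here purely for shape.
lda = models.LdaModel(bow, num_topics=32, id2word=dictionary, random_state=0)

# Document-topic matrix; summing rows per publication date gives the daily
# topic distribution whose surges mark candidate events.
theta = np.array([[w for _, w in lda.get_document_topics(b, minimum_probability=0)]
                  for b in bow])

def cosine(u, v):
    """Relevance score between two events' co-occurrence profiles."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```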

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, they have been employed to discover new market and/or technology opportunities and to support the rational decision making of business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the specific product level. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed index words of the Korea Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to KSIC indexes were extracted based on cosine similarity, and the market size of the extracted products, as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; Pearson's correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods based on sampling or requiring multiple assumptions. In addition, the level of the market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, it has high potential for practical application, since it can resolve unmet needs for detailed market size information in the public and private sectors.
Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining another measure, such as Jaccard similarity, with Word2Vec. Also, the product group clustering could be changed to other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect them to further improve the performance of the basic model conceptually proposed in this study.
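
The grouping step described above reduces to training Word2Vec on tokenized product names and thresholding cosine similarity around a KSIC index word. A minimal gensim sketch follows, with toy English tokens and an assumed similarity cutoff; the paper's optimized vector dimension (300) and window size (15) are kept.

```python
from gensim.models import Word2Vec

# Toy stand-ins for tokenized product names from company microdata
sentences = [["stainless", "kitchen", "sink"], ["aluminum", "window", "frame"],
             ["kitchen", "sink", "bowl"], ["pvc", "window", "frame"]]
model = Word2Vec(sentences, vector_size=300, window=15, min_count=1, seed=0)

# Cluster a product group around a (hypothetical) KSIC index word by cosine
# similarity; the 0.5 cutoff is illustrative. The market size of the group is
# then the sum of reported sales over the matched products.
group = [w for w, sim in model.wv.most_similar("sink", topn=50) if sim > 0.5]
```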

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.125-148
    • /
    • 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents, have been actively studied. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In academia, much research has been done on both the extraction approach, which selectively presents the main elements of a document, and the abstraction approach, which extracts the elements of a document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made much progress compared to techniques for automatic text summarization. Most existing studies dealing with the quality evaluation of summarization manually summarized documents, used them as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the quality of the automatic summary is measured by comparison with the reference document, an ideal summary. Reference documents are provided in two major ways; the most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, it takes a lot of time and cost, and the evaluation result may differ depending on the person doing the summarizing. Therefore, in order to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more frequently a term from the full text appears in the summary, the better the quality of the summary. However, since summarization essentially means condensing a large amount of content while minimizing omissions, a "good summary" based only on frequency does not always mean a "good summary" in the essential sense. In order to overcome the limitations of these previous studies of summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little duplicated content there is among the sentences of the summary, and completeness is defined as an element indicating how little of the original content is left out of the summary.
In this paper, we propose a method for the automatic quality evaluation of text summarization based on the concepts of succinctness and completeness. In order to evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized by hotel, and the quality of the summaries was evaluated in accordance with the proposed methodology. We also provide a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and propose a method to perform optimal summarization by changing the threshold of sentence similarity.
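
As a conceptual sketch of the two measures (not the paper's exact formulation), the snippet below represents each sentence as a set of content words, uses Jaccard overlap as the sentence similarity with an assumed threshold, and combines the two measures into an F-score:

```python
def jaccard(a, b):
    """Set-overlap similarity between two sentences given as word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def completeness(full_sents, summary_sents, threshold=0.3):
    """Fraction of source sentences covered by at least one summary sentence."""
    covered = sum(any(jaccard(s, t) >= threshold for t in summary_sents)
                  for s in full_sents)
    return covered / len(full_sents)

def succinctness(summary_sents, threshold=0.3):
    """1 minus the fraction of redundant summary-sentence pairs."""
    pairs = [(s, t) for i, s in enumerate(summary_sents)
             for t in summary_sents[i + 1:]]
    if not pairs:
        return 1.0
    redundant = sum(jaccard(s, t) >= threshold for s, t in pairs)
    return 1.0 - redundant / len(pairs)

def f_score(c, s):
    """Harmonic mean of completeness and succinctness (their trade-off)."""
    return 2 * c * s / (c + s) if c + s else 0.0
```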

The Precision Test Based on States of Bone Mineral Density (골밀도 상태에 따른 검사자의 재현성 평가)

  • Yoo, Jae-Sook;Kim, Eun-Hye;Kim, Ho-Seong;Shin, Sang-Ki;Cho, Si-Man
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.13 no.1
    • /
    • pp.67-72
    • /
    • 2009
  • Purpose: The ISCD (International Society for Clinical Densitometry) requires that users perform a mandatory precision test to raise their quality, even though there is no recommendation about patient selection for the test. Thus, we investigated the effect on the precision test of measuring the reproducibility of three bone density groups (normal, osteopenia, osteoporosis). Materials and Methods: 4 users performed precision tests on 420 patients (age: 57.8 ± 9.02) referred for BMD at Asan Medical Center (JAN-2008 ~ JUN-2008). In the first group (A), 4 users each selected 30 patients regardless of bone density condition and measured 2 regions (L-spine, femur) twice. In the second group (B), 4 users each measured the bone density of 10 patients in the same manner as in group (A), but with the patients divided into 3 stages (normal, osteopenia, osteoporosis). In the third group (C), 2 users each measured 30 patients in the same manner as in group (A), considering bone density condition. We used a GE Lunar Prodigy Advance (Encore V11.4) and analyzed the results by comparing %CV to the LSC using the precision tool from the ISCD. Checks were done using SPSS. Results: In group A, the %CV calculated by the 4 users (a, b, c, d) were 1.16, 1.01, 1.19, 0.65 g/cm² in the L-spine and 0.69, 0.58, 0.97, 0.47 g/cm² in the femur. In group B, the %CV calculated by the 4 users (a, b, c, d) were 1.01, 1.19, 0.83, 1.37 g/cm² in the L-spine and 1.03, 0.54, 0.69, 0.58 g/cm² in the femur. Comparing the results of groups A and B, we found no considerable differences. In group C, user 1's %CV for normal, osteopenia and osteoporosis were 1.26, 0.94, 0.94 g/cm² in the L-spine and 0.94, 0.79, 1.01 g/cm² in the femur, and user 2's %CV were 0.97, 0.83, 0.72 g/cm² in the L-spine and 0.65, 0.65, 1.05 g/cm² in the femur. Analyzing the results, we found almost no difference in reproducibility, although differences in several of the two users' result values affected total reproducibility. Conclusions: The precision test is an important factor in bone density follow-up. The better the machine's and the user's reproducibility, the more useful in clinics because of the low range of deviation. Users have to check the machine's reproducibility before the test and apply the same care when doing BMD tests on patients. In precision tests, differences in measured values are usually caused by ROI changes due to patient position. In the case of osteoporosis patients, it is more difficult to draw the initial ROI accurately than for normal and osteopenia patients, due to the lack of bone recognition, even though the ROI is made automatically by the computer software. However, the initial ROI is very important, and users have to draw a coherent ROI because the ROI Copy function is used at follow-up. In this study, we performed the precision test considering bone density condition and found that the LSC value stayed within 3%; there was no considerable difference. Thus, patient selection can be done regardless of bone density condition.
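
For reference, the precision arithmetic behind the ISCD tool mentioned above can be sketched as follows: a per-patient coefficient of variation from repeat scans, pooled as a root-mean-square, with the 95% least significant change taken as 2.77 times the precision error; the sample BMD pairs are illustrative only.

```python
import numpy as np

def rms_percent_cv(measurements):
    """measurements: shape (patients, repeats), BMD in g/cm^2.
    Returns the root-mean-square %CV across patients."""
    m = np.asarray(measurements, dtype=float)
    sd = m.std(axis=1, ddof=1)           # per-patient SD of repeat scans
    cv = 100.0 * sd / m.mean(axis=1)     # per-patient %CV
    return float(np.sqrt(np.mean(cv ** 2)))

def lsc_95(precision_error):
    """95% least significant change (ISCD): 2.77 x precision error."""
    return 2.77 * precision_error

scans = [[1.020, 1.031], [0.845, 0.852], [0.913, 0.905]]  # illustrative BMD pairs
p = rms_percent_cv(scans)
print(f"RMS %CV = {p:.2f}, LSC(95%) = {lsc_95(p):.2f}")
```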


Utility of Wide Beam Reconstruction in Whole Body Bone Scan (전신 뼈 검사에서 Wide Beam Reconstruction 기법의 유용성)

  • Kim, Jung-Yul;Kang, Chung-Koo;Park, Min-Soo;Park, Hoon-Hee;Lim, Han-Sang;Kim, Jae-Sam;Lee, Chang-Ho
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.14 no.1
    • /
    • pp.83-89
    • /
    • 2010
  • Purpose: The Wide Beam Reconstruction (WBR) algorithm provided by UltraSPECT, Ltd. (U.S.) improves image resolution by eliminating the effect of the collimator's line spread function and suppressing noise. It controls the resolution and noise level automatically and yields unsurpassed image quality. The aim of this study was to assess the usefulness of WBR for whole body bone scans in clinical application. Materials and Methods: Standard line source and single photon emission computed tomography (SPECT) reconstructed spatial resolution measurements were performed on an Infinia (GE, Milwaukee, WI) gamma camera equipped with low energy high resolution (LEHR) collimators. Line source measurements were acquired with total counts of 200 kcps and 300 kcps, and the SPECT phantoms were analyzed for spatial resolution while changing the matrix size. A clinical evaluation study was also performed with forty-three patients referred for bone scans. The first group was scanned at speeds of 20 and 30 cm/min with an administered dose of 740 MBq (20 mCi) of 99mTc-HDP, while the second group received doses of 740 and 1,110 MBq (20 and 30 mCi) of 99mTc-HDP at the same scan speed. The acquired data were reconstructed using both the typical clinical protocol in use and the WBR protocol. Patient information was removed and a blind reading was done on each reconstruction method; for each reading, a questionnaire was completed in which the reader was asked to evaluate the images on a 1-5 point scale. Results: Planar WBR data improved resolution by more than 10%; the Full Width at Half Maximum (FWHM) of the WBR data improved by about 16% (standard: 8.45, WBR: 7.09). SPECT WBR data improved resolution by more than about 50% by FWHM (standard: 3.52, WBR: 1.65). In the clinical evaluation study, there was no statistically significant difference between the two methods in either the bone to soft tissue ratio or the image resolution (first group p=0.07, second group p=0.458). Conclusion: The WBR method makes it possible to shorten the acquisition time of bone scans while providing improved image quality, and to reduce the dosage of radiopharmaceuticals, lowering the radiation dose. Therefore, the WBR method can be applied to a wide range of clinical applications to provide clinical value as well as image quality.
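
FWHM, the resolution figure quoted above, is the width of the line-source profile at half its peak value; a minimal sketch with linear interpolation at the half-maximum crossings follows (the profile values and pixel size are illustrative, not measured data).

```python
import numpy as np

def fwhm(profile, pixel_mm=1.0):
    """Full width at half maximum of a 1-D count profile, in mm."""
    y = np.asarray(profile, dtype=float)
    half = y.max() / 2.0
    above = np.where(y >= half)[0]
    l, r = above[0], above[-1]
    # Linearly interpolate the half-maximum crossing on each side of the peak
    left = l - 1 + (half - y[l - 1]) / (y[l] - y[l - 1]) if l > 0 else l
    right = r + (half - y[r]) / (y[r + 1] - y[r]) if r + 1 < len(y) else r
    return (right - left) * pixel_mm

profile = [1, 3, 10, 40, 90, 100, 85, 35, 8, 2, 1]   # counts across the line source
print(f"FWHM = {fwhm(profile, pixel_mm=1.0):.2f} mm")  # 3.50 mm for this toy profile
```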
