• Title/Summary/Keyword: 누락데이터 (missing data)

Search Result 166, Processing Time 0.022 seconds

A Performance Comparison of Land-Based Floating Debris Detection Based on Deep Learning and Its Field Applications (딥러닝 기반 육상기인 부유쓰레기 탐지 모델 성능 비교 및 현장 적용성 평가)

  • Suho Bak;Seon Woong Jang;Heung-Min Kim;Tak-Young Kim;Geon Hui Ye
    • Korean Journal of Remote Sensing / v.39 no.2 / pp.193-205 / 2023
  • Large amounts of land-based floating debris released during heavy rainfall have negative social, economic, and environmental impacts, but monitoring systems for the areas and amounts of floating debris accumulation are lacking. With the recent development of artificial intelligence technology, there is a need to survey large areas of water systems quickly and efficiently using drone imagery and deep learning-based object detection models. In this study, we acquired drone images as well as various other images and trained You Only Look Once (YOLO)v5s and the more recently developed YOLOv7 and YOLOv8s, comparing the performance of each model in order to propose an efficient detection technique for land-based floating debris. The qualitative performance evaluation showed that all three models detect floating debris well under normal circumstances, but the YOLOv8s model missed or duplicated objects when the image was overexposed or the water surface strongly reflected sunlight. The quantitative performance evaluation showed that YOLOv7 performed best, with a mean Average Precision (mAP) of 0.940 at an intersection over union (IoU) threshold of 0.5, compared with YOLOv5s (0.922) and YOLOv8s (0.922). When distortion was applied to the color and high-frequency components to compare model performance according to data quality, the YOLOv8s model showed the most obvious performance degradation and the YOLOv7 model the least. This study confirms that the YOLOv7 model is more robust than the YOLOv5s and YOLOv8s models in detecting land-based floating debris. The deep learning-based floating debris detection technique proposed in this study can identify the spatial distribution of floating debris by category, which can contribute to the planning of future cleanup work.
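
The abstract reports mAP at an IoU threshold of 0.5. As a minimal sketch of what that threshold means (not the authors' evaluation code), the snippet below computes the IoU of two axis-aligned boxes and checks whether a prediction would count as a match. The `(x1, y1, x2, y2)` box format and the function names are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a hit when its IoU with the ground truth reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

Average precision is then computed per class from the matches ranked by detection confidence, and mAP averages it over classes; that step is omitted here.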

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis has increased, cases of analyzing unstructured data and using the results have also been increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents, have been actively studied. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization. Most existing studies on the quality evaluation of summarization manually summarized documents, used the results as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, an automatic summary is produced from the full text through various techniques, and its quality is measured by comparison with the reference document, which serves as an ideal summary.
Reference documents are provided in two major ways. The most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, writing it takes a lot of time and cost, and the evaluation result may differ depending on who the summarizer is. Therefore, to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means reducing a large amount of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in the essential sense. To overcome the limitations of this previous work on summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the original content is missing from the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness.
To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the quality of the summaries was evaluated experimentally according to the proposed methodology. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-Score, and proposes a method for performing optimal summarization by changing the sentence similarity threshold.
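
The abstract does not give the exact formula for the combined score. As a hedged sketch, the snippet below assumes it is the harmonic mean of completeness and succinctness (by analogy with the usual F1 of precision and recall) and picks the similarity threshold whose summary scores highest. The function names and the `(threshold, completeness, succinctness)` tuple layout are hypothetical.

```python
def f_score(completeness, succinctness):
    """Harmonic mean of the two measures (assumed form; both expected in [0, 1])."""
    if completeness <= 0 or succinctness <= 0:
        return 0.0
    return 2 * completeness * succinctness / (completeness + succinctness)


def best_threshold(candidates):
    """Return the threshold whose (completeness, succinctness) pair maximizes the F-score.

    candidates: iterable of (threshold, completeness, succinctness) tuples,
    e.g. measured by summarizing the same reviews at several thresholds.
    """
    return max(candidates, key=lambda c: f_score(c[1], c[2]))[0]
```

With illustrative inputs such as `[(0.3, 0.9, 0.4), (0.5, 0.7, 0.7), (0.7, 0.4, 0.9)]`, the balanced middle setting wins, which reflects the trade-off the abstract describes: raising one measure at the expense of the other lowers the harmonic mean.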

A Study on the Improvement of Recommendation Accuracy by Using Category Association Rule Mining (카테고리 연관 규칙 마이닝을 활용한 추천 정확도 향상 기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.27-42 / 2020
  • Traditional companies with offline stores were unable to secure large display spaces because of cost. This limitation inevitably meant that only a limited range of products could be displayed on the shelves, depriving consumers of the opportunity to experience various items. Taking advantage of the virtual space of the Internet, online shopping goes beyond the physical space limitations of offline shopping and can display numerous products on web pages to satisfy consumers with a variety of needs. Paradoxically, however, this can also make it difficult for consumers to compare and evaluate too many alternatives in their purchase decision-making process. To address this side effect, various kinds of purchase decision support systems for consumers have been studied, such as keyword-based item search services and recommender systems. These systems can reduce search time for items, prevent consumers from leaving while browsing, and contribute to increased sales for sellers. Among those systems, recommender systems based on association rule mining techniques can effectively detect interrelated products from transaction data such as orders. The association between products obtained by statistical analysis provides clues for predicting how interested consumers will be in another product. However, since the algorithm is based on the number of transactions, products that have not yet sold enough in the early days after launch may not be included in the list of recommendations even though they are highly likely to sell. Such missing items may not get sufficient exposure to consumers to record sufficient sales, and then fall into a vicious cycle of declining sales and omission from the recommendation list.
This situation is an inevitable outcome when recommendations are made based on past transaction histories rather than on potential future sales. This study started from the idea that reflecting a means of identifying this potential indirectly would help to select products well suited for recommendation. Given that the attributes of a product affect consumers' purchasing decisions, this study was conducted to reflect them in recommender systems. In other words, consumers who visit a product page have shown interest in the attributes of the product and would also be interested in other products with the same attributes. On this assumption, the recommender system can use these attributes to select recommended products with a higher acceptance rate. Given that category is one of the main attributes of a product, it can be a good indicator of not only direct associations between two items but also potential associations that have yet to be revealed. Based on this idea, the study devised a recommender system that reflects not only associations between products but also associations between categories. Through regression analysis, the two kinds of associations were combined to form a model that could predict the hit rate of recommendations. To evaluate the performance of the proposed model, another regression model was also developed based only on associations between products. Comparative experiments were designed to resemble the environment in which products are actually recommended in online shopping malls. First, the association rules for all possible combinations of antecedent and consequent items were generated from the order data. Then, each model predicted the hit rate of each association rule from the rule's support and confidence.
The comparative experiments using order data collected from an online shopping mall show that recommendation accuracy can be improved by reflecting not only the associations between products but also the associations between categories when recommending related products. The proposed model showed a 2 to 3 percent improvement in hit rates compared to the existing model. From a practical point of view, it is expected to have a positive effect on improving consumers' purchasing satisfaction and increasing sellers' sales.
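
The support and confidence values that feed the regression models can be illustrated with a short sketch. It uses the standard definitions (support = share of orders containing both items; confidence = share of orders with the antecedent that also contain the consequent) and is not the paper's implementation; `pair_rules` and the sample orders in the usage note are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pair_rules(orders):
    """Support and confidence for every ordered (antecedent, consequent) item pair.

    orders: list of orders, each an iterable of item identifiers.
    """
    n = len(orders)
    item_count = Counter()
    pair_count = Counter()
    for order in orders:
        items = set(order)  # count each item at most once per order
        item_count.update(items)
        pair_count.update(permutations(sorted(items), 2))
    return {
        (a, b): {"support": c / n, "confidence": c / item_count[a]}
        for (a, b), c in pair_count.items()
    }
```

For example, given the orders `[["milk", "bread"], ["milk", "bread", "jam"], ["milk"], ["bread"]]`, the rule milk → bread has support 0.5 and confidence 2/3. Running the same function on category labels instead of item identifiers would give category-level rules, which is one plausible way to obtain the second kind of association the abstract mentions.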

Evaluation on the Immunization Module of Non-chart System in Private Clinic for Development of Internet Information System of National Immunization Programme in Korea (국가 예방접종 인터넷정보시스템 개발을 위한 의원정보시스템의 예방접종 모듈 평가연구)

  • Lee, Moo-Sik;Lee, Kun-Sei;Lee, Seok-Gu;Shin, Eui-Chul;Kim, Keon-Yeop;Na, Bak-Ju;Hong, Jee-Young;Kim, Yun-Jeong;Park, Sook-Kyung;Kim, Bo-Kyung;Kwon, Yun-Hyung;Kim, Young-Taek
    • Journal of agricultural medicine and community health / v.29 no.1 / pp.65-75 / 2004
  • Objectives: Immunization has been one of the most effective measures for preventing infectious diseases. Keeping the immunization rate high and monitoring it continuously is an important part of national infectious disease prevention policy. To this end, the Korean CDC introduced the National Immunization Registry Program (NIRP), which has been implemented at Public Health Centers (PHC) since 2000. The National Immunization Registry Program will be nearly complete once vaccination data are shared, connected, and transferred between the public and private sectors. The aim of this study was to evaluate the immunization module of the non-chart system in private clinics alongside the health information system of public health centers (made by POSDATA Co., LTD) and the immunization registry program (made by BIT Computer Co., LTD). Methods: The analysis and survey were carried out by specialists in the medical, health, and health information fields from November 2001 to January 2002. We analyzed the immunization module of the non-chart system in private clinics and made recommendations. Results and Conclusions: To improve the immunization module, the system will need to be revised in various functions, such as receipt and registration, preliminary medical examination, reference and inquiry, vaccine registration, printing of various sheets, transfer of vaccination data, issuance of vaccination certificates, reminder and recall, statistical calculation, and vaccine stock management. An accurate assessment of the current immunization module in each private non-chart system is needed. Further studies will be necessary to keep the system accurate under changing health policy related to the national immunization program. We hope that the results of this study may contribute to establishing the National Immunization Registry Program.

Economic Sanction and DPRK Trade - Estimating the Impact of Japan's Sanction in the 2000s - (대북 경제제재와 북한무역 - 2000년대 일본 대북제재의 영향력 추정 -)

  • Lee, Suk
    • KDI Journal of Economic Policy / v.32 no.2 / pp.93-143 / 2010
  • This paper estimates the impact of Japan's economic sanction on DPRK trade in the 2000s. It conceptualizes the effects of the sanction on DPRK trade, econometrically tests whether such effects exist in the case of Japan's sanction using currently available DPRK trade statistics, and measures the size of the effects after correcting the deficiencies of those statistics and reconstructing them. The main findings of the paper are as follows. First, Japan's sanction can have two different effects on DPRK trade: the 'Sanction Country Effect' and the 'Third Country Effect.' The former means that the sanction diminishes DPRK trade with Japan, while the latter refers to the effects on DPRK trade with other countries as well. The third country effect can arise not simply because the DPRK changes its trade routes to circumvent the sanction, but because the sanction forces the DPRK to readjust its major trade items and patterns. Second, no official DPRK trade statistics are currently available. Thus, so-called mirror data, that is, the statistics of the DPRK's trading partners, must be employed to analyze the sanction effects. However, all currently available mirror data suffer from three fundamental problems: 1) they may omit real trade partners of the DPRK; 2) they may confuse ROK trade with DPRK trade; 3) they cannot distinguish non-commercial trade from commercial trade, whereas only the latter is relevant to Japan's sanction. Given those problems, the following method is needed to reach a reasonable conclusion about the sanction effect: repeat the same analysis using all the mirror data currently available, which include KOTRA, IMF, and UN Commodity Trade Statistics, and then discuss only the results common to them. Third, the currently available mirror data indicate the following points. 1) DPRK trade is well explained by the gravity model.
2) Japan's sanction has not only the sanction country effect but also the third country effect on DPRK trade. 3) The third country effect differs between DPRK exports and imports. For exports, the mirror statistics reveal positive (+) third country effects on all of the major trade partners of the DPRK, including South Korea, China, and Thailand. For DPRK imports, however, such third country effects are not statistically significant even for South Korea and China. 4) This suggests that Japan's sanction has greater effects on DPRK imports than on its exports. Fourth, as far as DPRK exports are concerned, it is possible to resolve the abovementioned fundamental problems of mirror data and thus reconstruct more accurate statistics on DPRK trade. Those reconstructed statistics lead to the following conclusions. 1) Japan's economic sanction diminished the DPRK's exports to Japan from 2004 to 2006 by 103 million dollars on annual average (Sanction Country Effect). This amounts to around 60 percent of the DPRK's exports to Japan in 2003. 2) However, over the same period, the DPRK diverted its exports to other countries to cope with Japan's sanction, and as a result its exports to other countries increased by 85 million dollars on annual average (Third Country Effect). 3) This means that more than 80 percent of the sanction country effect was offset by the third country effect, and the actual impact of Japan's sanction on total DPRK exports was merely 30 million dollars on annual average. 4) The third country effect occurred mostly in inter-Korean trade. In fact, Japan's sanction increased DPRK exports to the ROK by 72 million dollars on annual average. In contrast, there was no statistically significant increase in DPRK exports to China caused by Japan's sanction. 5) This means that the DPRK confronted Japan's sanction and mitigated its impact primarily by using inter-Korean trade and thus the ROK.
Fifth, two things should be noted concerning the fourth set of results above. 1) The results capture only the third country effect caused by trade transfer. Facing Japan's sanction, the DPRK could transfer its existing trade with Japan to other countries. It could also change its main export items and increase exports of those new items to other countries, as mentioned in the first finding. However, the fourth set of results reflects only the former, not the latter. 2) Although Japan's sanction did not have a huge impact on DPRK exports, the same is not necessarily true for DPRK imports. Indeed, the currently available mirror statistics suggest that Japan's sanction has greater effects on DPRK imports. Hence it would not be wise to argue, based simply on the fourth set of results, that Japan's sanction did not have much impact on DPRK trade in general.

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Recently, channels such as social media and SNS have been creating enormous amounts of data. Among all kinds of data, the share of unstructured data represented as text has grown geometrically. Because it is difficult to check all text data, it is important to access those data rapidly and grasp the key points of the text. Driven by this need for efficient understanding, many studies on text summarization for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms, known as "automatic summarization", have recently been proposed to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date construct summaries focused on the frequency of contents in the original documents. Such summaries are limited in their ability to contain low-weight subjects that are mentioned less in the original text. If summaries include only the contents of major subjects, bias occurs and information is lost, so it is hard to ascertain every subject the documents contain. To avoid this bias, it is possible to summarize with a view to balance between the topics a document has so that all of its subjects can be ascertained, but an imbalance in the distribution of those subjects still remains. To retain the balance of subjects in a summary, it is necessary to consider the proportion of every subject the documents originally have and also to allocate the portions of subjects equally, so that even sentences on minor subjects can be sufficiently included in the summary. In this study, we propose a "subject-balanced" text summarization method that secures balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summaries, we use two summary evaluation metrics, "completeness" and "succinctness".
Completeness means that a summary should fully include the contents of the original documents, and succinctness means that a summary has minimal duplication within itself. The proposed method has three phases. The first phase is constructing subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, highly related terms can be identified for every topic, and the subjects of the documents can be found from the various topics composed of terms with similar meanings. Then, a few terms that represent each subject well are selected. In this method, they are called "seed terms". However, the seed terms are too few to explain each subject sufficiently, so enough terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for word expansion, finding terms similar to the seed terms. Word vectors are created after Word2Vec modeling, and from those vectors, the similarity between all terms can be derived using cosine similarity. The higher the cosine similarity between two terms, the stronger the relationship between them is defined to be. Terms that have high similarity values with the seed terms of each subject are therefore selected, and after filtering those expanded terms, the subject dictionary is finally constructed. The next phase is allocating subjects to every sentence in the original documents. To grasp the contents of all sentences first, frequency analysis is conducted with the specific terms that make up the subject dictionaries. The TF-IDF weight of each subject is calculated after the frequency analysis, making it possible to determine how much each sentence explains each subject. However, TF-IDF weights have the limitation that they can increase without bound, so the TF-IDF weights of every subject for each sentence are normalized to values between 0 and 1.
Then, by allocating to every sentence the subject with the maximum TF-IDF weight among all subjects, a sentence group is finally constructed for each subject. The last phase is summary generation. Sen2Vec is used to determine the similarity between subject sentences, and a similarity matrix can be formed. By repeatedly selecting sentences, it is possible to generate a summary that fully includes the contents of the original documents and minimizes duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between the summary from the proposed method and a frequency-based summary was also performed, and the results verified that the summary from the proposed method better retains the balance of all subjects that the documents originally have.
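
Three of the building blocks named in this abstract can be sketched in a few lines: cosine similarity between term vectors, normalization of weights to the 0 to 1 range, and allocation of a sentence to its highest-weight subject. This is an illustrative sketch only. The abstract does not say which normalization was used, so min-max scaling is an assumption, and the function names and example subjects are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max_normalize(values):
    """Rescale values to [0, 1]; min-max scaling is an assumed choice, not the paper's stated one."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def allocate_subject(weights_by_subject):
    """Pick the subject whose normalized weight is largest for a sentence."""
    return max(weights_by_subject, key=weights_by_subject.get)
```

For instance, a sentence with normalized weights `{"room": 0.2, "service": 0.9, "location": 0.5}` would be placed in the "service" group. In practice the term vectors would come from a trained Word2Vec model rather than hand-written lists.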