• Title/Summary/Keyword: 정렬 (sorting/alignment)

Search Result 2,797, Processing Time 0.028 seconds

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has produced many kinds of online news media, the volume of news about events in society has increased greatly. Automatically summarizing key events from massive amounts of news data therefore helps users survey many events at a glance. In addition, building and providing an event network based on the relevance between events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, kept only meaningful words through preprocessing with NPMI and Word2Vec, and merged synonyms. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, and events were detected by finding the peaks of each topic distribution. A total of 32 topics were extracted by the topic modeling, and the time of occurrence of each event was inferred from the point at which the corresponding topic distribution surged. As a result, a total of 85 events were detected, of which 16 final events were retained after filtering with a Gaussian smoothing technique. We also calculated a relevance score between the detected events in order to construct the event network. Using the cosine coefficient between co-occurring events, we calculated the relevance between events and connected them. Finally, we set up the event network with each event as a vertex and the relevance score between events as the weight of the edge connecting the corresponding vertices. 
The event network constructed with our method helped us arrange the major political and social events that occurred in Korea over the past year in chronological order and, at the same time, identify which events are related to one another. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to analyze large amounts of data easily and to identify relationships between events that were difficult to detect with existing methods. In text preprocessing, we applied various text mining techniques together with Word2Vec to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in the analysis of Korean text. The event detection and network construction techniques in this study have the following advantages in practical application. First, LDA topic modeling, which is an unsupervised learning method, can easily extract topics, topic words, and their distributions from huge amounts of data. Also, by using the date information of the collected news articles, the distribution of each topic can be expressed as a time series. Second, by calculating relevance scores from the co-occurrence of topics and constructing an event network, we can present the connections between events, which are difficult to grasp with existing event detection, in a summarized form. This is supported by the fact that the relevance-based event network proposed in this study was in fact arranged in order of occurrence time. The event network also makes it possible to identify the event that served as the starting point for a series of events. A limitation of this study is that LDA topic modeling yields different results depending on the initial parameters and the number of topics, and that the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher. 
Also, since each topic is assumed to be exclusive and independent, the model does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
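
The relevance scoring and network construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the event names, the co-occurrence vectors, and the `min_score` cutoff are all hypothetical, and the abstract does not say how the vectors were actually built.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine coefficient between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction; treat as unrelated
    return dot / (norm_u * norm_v)


def build_event_network(event_vectors, min_score=0.1):
    """Return {(event_a, event_b): score} for pairs scoring at least min_score.

    Events are the vertices; each returned pair is a weighted edge.
    min_score is a hypothetical cutoff, not a value from the study.
    """
    edges = {}
    for a, b in combinations(sorted(event_vectors), 2):
        score = cosine(event_vectors[a], event_vectors[b])
        if score >= min_score:
            edges[(a, b)] = score
    return edges


# Hypothetical co-occurrence vectors, one per detected event.
events = {
    "E1": [3, 0, 1, 0],
    "E2": [2, 0, 1, 0],
    "E3": [0, 4, 0, 1],
}
network = build_event_network(events)
```

In this toy input only E1 and E2 share non-zero dimensions, so they are the only pair connected by an edge.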

The study of MDCT of Radiation dose in the department of Radiology of general hospitals in the local area (일 지역 종합병원 영상의학과 MDCT선량에 대한 연구)

  • Shin, Jung-Sub
    • Journal of the Korean Society of Radiology / v.6 no.4 / pp.281-290 / 2012
  • Differences in MDCT radiation dose arising from different protocols between hospitals were analyzed in terms of CTDI, DLP, the number of slices, and DLP per slice, using 30 cases (10 each of the head, abdomen, and chest) from MDCT examinations in the departments of diagnostic imaging of three general hospitals in Gyeongsangbuk-do. Because the head CT protocol is relatively simple and head CT is the most frequent examination, both helical and normal scans of the head were performed, and differences in image quality, CTDI, DLP, eye dose, and thyroid dose were analyzed. For head CT, CTDI in two-thirds of the hospitals was significantly higher than in hospital A, which did not exceed the CTDI diagnostic reference level (IAEA 50 mGy, Korea 60 mGy) (p<0.001). DLP exceeded the diagnostic reference levels of both the IAEA (1,050 mGy·cm) and Korea (1,000 mGy·cm) in one-third of the hospitals, and two-thirds exceeded the Korean recommendation; these values were significantly higher than in hospital A, which did not exceed the diagnostic reference level (p<0.001). For abdomen CT, one-third of the hospitals showed 119 mGy, higher than the diagnostic reference levels of the IAEA (25 mGy) and Korea (20 mGy). DLP in all hospitals was higher than the Korean recommendation of 700 mGy·cm. Among the target hospitals, hospital C showed a high radiation dose in all examinations because a low pitch and a high tube current were used, with great importance placed on MPR and 3D images. To analyze the difference in radiation dose by scan method, normal and helical scans for head CT were performed on the same patient. As a result, CTDI and DLP of the helical scan were 63.4% and 93.7% higher, respectively, than those of the normal scan (p<0.05, p<0.01). However, the thyroid dose was 87.26% higher with the normal scan (p<0.01). The helical CT beam was bell-shaped across the deep and marginal parts, so the thyroid, which lies away from the central beam, received a low radiation dose. 
In addition, the helical scan used a perpendicular gantry angle, whereas the normal scan used a gantry angle parallel to the orbitomeatal line. Therefore, the thyroid dose decreased in the helical scan. However, the protocols in this study showed higher radiation doses than the diagnostic reference levels of the KFDA. To comply with the KFDA recommendation, a low tube current and a high pitch are required. In this study, the difference in image quality between the normal scan and the helical scan was not significant. Therefore, a standardized normal-scan protocol should generally be used, with thyroid shielding, except in special cases. We studied only a portion of CT cases in one local area, so the results cannot represent all cases. However, we confirmed that patient radiation dose exceeded the recommendation in some cases and that there was deviation between hospitals. To improve this situation, diagnostic imaging physicians and radiologic technologists should perform CT with optimized protocols to reduce CT radiation dose and should also disclose the radiation dose in respect of patients' right to know. However, they had little understanding of the situation. Therefore, efforts are needed from the relevant agencies, including education programs on CT radiation dose, disclosure of radiation dose from CT examinations, and the addition of radiation dose control and disclosure of CT information to hospital service evaluation and certification, as well as from health professionals, who should use the best protocols to realize optimized CT examination.

Analysis of Respiratory Motional Effect on the Cone-beam CT Image (Cone-beam CT 영상 획득 시 호흡에 의한 영향 분석)

  • Song, Ju-Young;Nah, Byung-Sik;Chung, Woong-Ki;Ahn, Sung-Ja;Nam, Taek-Keun;Yoon, Mi-Sun
    • Progress in Medical Physics / v.18 no.2 / pp.81-86 / 2007
  • Cone-beam CT (CBCT), acquired using an on-board imager (OBI) attached to a linear accelerator, is widely used for image-guided radiation therapy. In this study, the effect of respiratory motion on the quality of CBCT images was evaluated. A phantom system was constructed to simulate respiratory motion. One part of the system consists of a moving plate and a motor-driven component that controls the motion cycle and motion range. The other part is a solid water phantom containing a small cubic phantom ($2{\times}2{\times}2\,\mathrm{cm}^3$) surrounded by air, which simulates a small tumor volume in the air cavity of the lung. CBCT images of the phantom were acquired in 20 different cases and compared with the image acquired in the static state. The 20 cases consist of 4 motion ranges (0.7 cm, 1.6 cm, 2.4 cm, 3.1 cm) and 5 motion cycles (2, 3, 4, 5, 6 sec). The difference in CT number in the coronal image was evaluated as a measure of image quality degradation. The average pixel intensity values relative to the CT number of the static CBCT image were 71.07% at 0.7 cm motion range, 48.88% at 1.6 cm, 30.60% at 2.4 cm, and 17.38% at 3.1 cm. The tumor phantom size, defined as the length over which the CT number differs from that of air, increased with the motion range (2.1 cm: no motion, 2.66 cm: 0.7 cm motion, 3.06 cm: 1.6 cm motion, 3.62 cm: 2.4 cm motion, 4.04 cm: 3.1 cm motion). This study shows that respiratory motion in regions of inhomogeneous structure can degrade the image quality of CBCT and must be considered in the process of setup error correction using CBCT images.

A Study on the Availability of the On-Board Imager(OBI) and Cone-Beam CT(CBCT) in the Verification of Patient Set-up (온보드 영상장치(On-Board Imager) 및 콘빔CT(CBCT)를 이용한 환자 자세 검증의 유용성에 대한 연구)

  • Bak, Jino;Park, Sung-Ho;Park, Suk-Won
    • Radiation Oncology Journal / v.26 no.2 / pp.118-125 / 2008
  • Purpose: For on-line image guided radiation therapy (on-line IGRT), kV X-ray images and cone beam CT images were obtained by an on-board imager (OBI) and cone beam CT (CBCT), respectively. The images were then compared with simulated images to evaluate the patient's setup and correct for deviations. The setup deviations between the simulated images and the acquired images (kV or CBCT images) were computed with 2D/2D match or 3D/3D match programs, respectively. We then investigated the correctness of the calculated deviations. Materials and Methods: After simulation and treatment planning for the RANDO phantom, the phantom was positioned on the treatment table. The phantom was set up with the side wall lasers, which aligned the treatment setup of the phantom with the simulated images, after tolerance limits for laser line thickness had been established. After a known translation or rotation angle was applied to the phantom, kV X-ray images and CBCT images were obtained. Next, 2D/2D match and 3D/3D match with the simulation CT images were performed. Lastly, the results were analyzed for accuracy of positional correction. Results: In the case of the 2D/2D match using kV X-ray and simulation images, a setup correction within $0.06^{\circ}$ for rotation only, 1.8 mm for translation only, and 2.1 mm and $0.3^{\circ}$ for both rotation and translation, respectively, was possible. As for the 3D/3D match using CBCT images, a correction within $0.03^{\circ}$ for rotation only, 0.16 mm for translation only, and 1.5 mm for translation and $0.0^{\circ}$ for rotation, respectively, was possible. Conclusion: The use of OBI or CBCT for on-line IGRT provides the ability to exactly reproduce the simulated images in the setup of a patient in the treatment room. Fast detection and correction of a patient's positional error is possible in two dimensions via kV X-ray images from OBI and, with higher accuracy, in three dimensions via CBCT. 
Consequently, the on-line IGRT represents a promising and reliable treatment procedure.

A Study on the Amino Acid Components Soil Humus Composition (토양부식산(土壤腐植酸)의 형태별(形態別) Amino 산(酸) 함량(含量)에 관(關)한 연구(硏究))

  • Kim, Jeong-Je;Lee, Wi-Young
    • Korean Journal of Soil Science and Fertilizer / v.21 no.3 / pp.254-263 / 1988
  • The contents and distribution of amino acids in the humic acid and fulvic acid fractions of different types ($R_p$, B, A, P) were investigated. Extracted humic and fulvic acids were purified and analyzed. The results are summarized as follows. (1) Composition of humus: The total humus ($H_T$), amount of humic acid (a), amount of fulvic acid (b), and ${\Delta}logK$ all decrease in the order of $R_p$ > B > A > P type. The same trend was observed in total nitrogen and carbon. (2) Contents and composition of amino acids in humic acids. 1) The total amounts of amino acids in the humic acid fraction of the different types were in the following order for soils under coniferous forest trees: $R_p$ > B > A > P type, but for soils under deciduous forest trees the order was P > A > $R_p$ > B type. There were positive correlations between total amino acids and both total carbon and ${\Delta}logK$ for humic acids from soils under coniferous forest trees, but a negative correlation existed between total amino acids and C/N ratios. No significant correlation was found for samples taken from soils under deciduous forest trees. 2) The ratios of one group of amino acids to the others were compared. For soils under coniferous forest trees, the ratios of acidic amino acids were in the order of P > $R_p$ > B > A type, those of neutral amino acids followed the order of $R_p$ > B > A > P type, and those of the basic amino acids were in the order of B > A > $R_p$ > P type. Contents of total amino acids were in the order of neutral > acidic > basic amino acids. For soils under deciduous forest trees the order of the ratios was different. Acidic amino acids followed the order of A > P > B > $R_p$ type, neutral ones followed the order of P > $R_p$ > A > B type, and the basic amino acids followed the order of P $\geq$ A > B $\geq$ $R_p$ type, where the differences were very small. 
3) In general, aspartic acid, glycine, and glutamic acid were the major components in all samples. Histidine, tyrosine, and methionine were present only in small amounts. (3) Contents and composition of amino acids in fulvic acids. 1) The total amounts of amino acids in the different types of fulvic acids were in the order of $R_p$ > B > P > A type regardless of the origin of the samples. Positive correlations were observed between total amino acids and both total carbon and ${\Delta}logK$ for soils under coniferous forest trees. For soils under deciduous forest trees, positive correlations were observed among total amino acids, total nitrogen, total humus ($H_T$), total humic acid (a), and ${\Delta}logK$, but a negative correlation existed between total amino acids and C/N ratio. 2) The ratios among acidic, neutral, and basic amino acids of the different types were in the order of $R_p$ > B > P > A type. In this respect there was no difference between the two soils. 3) In general, glycine, aspartic acid, and alanine were the major constituents in all samples of the different types, while tyrosine and methionine were present in small amounts. Virtually no arginine was detected.

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.109-125 / 2019
  • As interest in intelligent search engines increases, various studies have been conducted to extract and utilize product-related features intelligently. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes the product. Therefore, it is necessary to handle synonyms of color terms in order to return accurate results for users' color-related queries. Previous studies have suggested dictionary-based approaches to process synonyms for color features. However, the dictionary-based approach has the limitation that it cannot handle unregistered color-related terms in user queries. To overcome this limitation of the conventional methods, this research proposes a model that extracts RGB values from an internet search engine in real time and outputs similar color names based on the designated color information. First, a color term dictionary was constructed for basic color search; it includes color names and the R, G, B values of each color from the Korean color standard digital palette program and the Wikipedia color list. The dictionary was made more robust by adding 138 color names, converted from English color names into Korean loanwords, with their corresponding RGB values. The final color dictionary thus includes a total of 671 color names and corresponding RGB values. The proposed method starts with the specific color that a user searched for. Then, the presence of the searched color in the built-in color dictionary is checked. If the color exists in the dictionary, its RGB values in the dictionary are used as the reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results for the searched color are crawled, and average RGB values are extracted from a certain middle area of each image. 
To extract the RGB values from images, a variety of approaches were attempted, since simply averaging the RGB values of the center area of the images has limits. As a result, clustering the RGB values in a certain area of the image and using the average value of the cluster with the highest density as the reference values showed the best performance. The reference RGB values of the searched color are then compared with the RGB values of all colors in the previously constructed color dictionary. A color list is created with the colors within the range of ${\pm}50$ for each of the R, G, and B values. Finally, using the Euclidean distance between the colors in this list and the reference RGB values of the searched color, up to five colors with the highest similarity become the final outcome. To evaluate the usefulness of the proposed method, we performed an experiment in which 300 color names and their corresponding RGB values were obtained through questionnaires. These were used to compare the RGB values obtained from four different methods, including the proposed method. The average Euclidean distance in CIE-Lab using our method was about 13.85, a relatively low distance compared to 3088 for the case using the synonym dictionary only and 30.38 for the case using the dictionary with the Korean synonym website WordNet. The variant of the proposed method without clustering showed an average Euclidean distance of 13.88, which implies that the DBSCAN clustering of the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with real-time synonym processing for new color names. This method removes the limitation of the dictionary-based approach, the conventional synonym processing method. 
This research can contribute to improving the intelligence of e-commerce search systems, especially for color search.
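
The final matching step of this abstract (a ${\pm}50$ window on each of R, G, and B, followed by ranking on Euclidean distance) might look roughly like the sketch below. It is an illustration under assumptions rather than the authors' code: the function name, the small palette, and the reference value are hypothetical, and returning up to five names is one reading of the abstract.

```python
import math


def similar_colors(reference, color_dict, window=50, top_k=5):
    """Return up to top_k color names closest to the reference RGB value.

    A color is a candidate only if each of its R, G, B values lies within
    +/- window of the reference; candidates are ranked by Euclidean distance.
    """
    candidates = []
    for name, rgb in color_dict.items():
        if all(abs(c - r) <= window for c, r in zip(rgb, reference)):
            candidates.append((math.dist(rgb, reference), name))
    candidates.sort()  # smallest distance (most similar) first
    return [name for _, name in candidates[:top_k]]


# Hypothetical four-entry dictionary; the study's dictionary has 671 names.
palette = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "tomato": (255, 99, 71),
    "navy": (0, 0, 128),
}

# Hypothetical reference RGB for a searched color term.
matches = similar_colors((230, 30, 40), palette)
```

Here "tomato" and "navy" fall outside the window on at least one channel, so only "crimson" and "red" are ranked, with "crimson" first.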

Development of a complex failure prediction system using Hierarchical Attention Network (Hierarchical Attention Network를 이용한 복합 장애 발생 예측 시스템 개발)

  • Park, Youngchan;An, Sangjun;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.127-148 / 2020
  • A data center is a physical facility for accommodating computer systems and related components, and is an essential foundation for next-generation core industries such as big data, smart factories, wearables, and smart homes. In particular, with the growth of cloud computing, a proportional expansion of data center infrastructure is inevitable. Monitoring the health of these data center facilities is a way to maintain and manage the system and prevent failure. If a failure occurs in some element of the facility, it may affect not only the equipment concerned but also other connected equipment, and may cause enormous damage. IT facilities in particular fail irregularly because of their interdependence, and the cause is difficult to determine. Previous studies predicting failure in data centers treated each server as a single, isolated state, without assuming that devices interact. Therefore, in this study, data center failures were classified into failures occurring inside the server (Outage A) and failures occurring outside the server (Outage B), and we focused on analyzing complex failures occurring inside the server. Failures external to the server include power, cooling, user errors, and so on. Since such failures can be prevented in the early stages of data center facility construction, various solutions are being developed. On the other hand, the cause of a failure occurring inside the server is difficult to determine, and adequate prevention has not yet been achieved. This is mainly because server failures do not occur in isolation: a failure may cause failures in other servers or may itself be triggered by other servers. In other words, whereas existing studies analyzed failures on the assumption of a single server that does not affect other servers, this study analyzed failures on the assumption that servers affect one another. 
To define the complex failure situation in the data center, failure history data for each piece of equipment in the data center were used. Four major failures are considered in this study: Network Node Down, Server Down, Windows Activation Services Down, and Database Management System Service Down. The failures that occur for each device are sorted in chronological order, and when a failure occurs in one piece of equipment and another piece of equipment fails within 5 minutes of that time, the failures are defined as occurring simultaneously. After constructing sequences of the devices that failed at the same time, the 5 devices that most frequently failed simultaneously within the sequences were selected, and the cases in which the selected devices failed at the same time were confirmed through visualization. Since the server resource information collected for failure analysis is a time series with temporal flow, we used Long Short-Term Memory (LSTM), a deep learning algorithm that can predict the next state from previous states. In addition, unlike the single-server case, the Hierarchical Attention Network deep learning model structure was used, considering that each server contributes to a complex failure to a different degree. This algorithm increases prediction accuracy by giving a server more weight as its impact on the failure increases. The study began by defining the types of failure and selecting the analysis targets. In the first experiment, the same collected data were treated as a single-server state and as a multiple-server state, and the results were compared and analyzed. The second experiment improved the prediction accuracy in the complex-server case by optimizing the threshold for each server. 
In the first experiment, under the single-server assumption, three of the five servers were predicted not to have failed even though failures had actually occurred. Under the multiple-server assumption, however, all five servers were predicted to have failed. These results support the hypothesis that servers affect one another. This study confirmed that prediction performance was superior when multiple servers were assumed than when a single server was assumed. In particular, applying the Hierarchical Attention Network algorithm, on the assumption that the effect of each server differs, helped improve the analysis. In addition, applying a different threshold for each server improved the prediction accuracy. This study showed that failures whose cause is difficult to determine can be predicted from historical data, and it presents a model that can predict failures occurring in servers in data centers. It is expected that failures can be prevented in advance using the results of this study.
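
The 5-minute rule this abstract uses to define simultaneous failures can be sketched as below. This is only one possible reading: the sketch assumes the window is measured from the first failure in each group, and the device names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def group_simultaneous(failures, window=timedelta(minutes=5)):
    """Group (time, device) failure records into simultaneous-failure groups.

    Records are sorted chronologically. A record joins the current group if
    it occurs within `window` of the group's FIRST failure (an assumption;
    the abstract does not say how the window is anchored).
    """
    groups = []
    for time, device in sorted(failures):
        if groups and time - groups[-1][0][0] <= window:
            groups[-1].append((time, device))
        else:
            groups.append([(time, device)])
    return groups


# Hypothetical failure log, deliberately out of order.
log = [
    (datetime(2020, 1, 1, 9, 7), "server-c"),
    (datetime(2020, 1, 1, 9, 0), "server-a"),
    (datetime(2020, 1, 1, 9, 10), "server-d"),
    (datetime(2020, 1, 1, 9, 3), "server-b"),
]
groups = group_simultaneous(log)
device_groups = [[device for _, device in group] for group in groups]
```

With this log, server-a and server-b form one group, while server-c starts a second group (7 minutes after server-a) that server-d then joins.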