• Title/Summary/Keyword: Similarity calculation


A Fast Digital Elevation Model Extraction Algorithm Using Gradient Correlation (Gradient Correlation을 이용한 고속 수치지형표고 모델 추출 방법)

  • Chul Soo Ye;Byung Min Jeon;Kwae Hi Lee
    • Korean Journal of Remote Sensing / v.14 no.3 / pp.250-261 / 1998
  • The purpose of this paper is to extract a DEM (Digital Elevation Model) quickly from satellite images. DEM extraction consists of three parts: first, modeling the satellite position and attitude; second, matching the two images to find corresponding points; and third, calculating the elevation of each point from the results of the first two parts. The satellite position and attitude are modeled using GCPs (ground control points). An area-based matching method is used to find corresponding points between the stereo satellite images. The elevation of each point is calculated using the exterior orientation parameters obtained from the modeling and the conjugate points obtained from the matching. In the DEM generation system, the matching procedure accounts for most of the processing time; therefore, to reduce the matching time, a new fast matching algorithm using gradient correlation and a fast similarity measure calculation method is proposed. In this paper, SPOT level 1A 6000$\times$6000 panchromatic satellite images are used to extract the DEM. The experimental results show that fast DEM extraction from satellite images is feasible.
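
The abstract does not give the details of its gradient correlation measure, so the following is only a minimal sketch of the general idea: area-based matching in which windows of gradient magnitude, rather than raw intensities, are compared by normalized cross-correlation. The function names, the window size, and the one-dimensional search along a row are all assumptions made for illustration, and none of the paper's speed-ups are reproduced.

```python
from math import sqrt

def gradient_magnitude(img):
    """Central-difference gradient magnitude (|gx| + |gy|) of a 2-D list.
    Border pixels are left at 0. (Illustrative choice, not the paper's operator.)"""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            out[y][x] = abs(gx) + abs(gy)
    return out

def window(img, y, x, half):
    """Square window of side 2*half+1 centred on (y, x)."""
    return [row[x - half:x + half + 1] for row in img[y - half:y + half + 1]]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized windows."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((p - ma) * (q - mb) for p, q in zip(fa, fb))
    den = sqrt(sum((p - ma) ** 2 for p in fa) * sum((q - mb) ** 2 for q in fb))
    return num / den if den else 0.0

def best_shift(left, right, y, x, half=1, max_shift=3):
    """Horizontal shift d in [0, max_shift] that maximizes the correlation
    between the gradient window at (y, x) in `left` and (y, x + d) in `right`."""
    gl, gr = gradient_magnitude(left), gradient_magnitude(right)
    ref = window(gl, y, x, half)
    scores = {d: ncc(ref, window(gr, y, x + d, half)) for d in range(max_shift + 1)}
    return max(scores, key=scores.get)
```

Calling `best_shift(left, right, y, x)` on a stereo pair returns the column offset of the best-matching point on the same row; a real system would constrain the search with the sensor model instead of scanning a fixed range.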

Algorithm to Search for the Original Song from a Cover Song Using Inflection Points of the Melody Line (멜로디 라인의 변곡점을 활용한 커버곡의 원곡 검색 알고리즘)

  • Lee, Bo Hyun;Kim, Myung
    • KIPS Transactions on Software and Data Engineering / v.10 no.5 / pp.195-200 / 2021
  • Due to the development of video sharing platforms, the amount of video uploads is exploding. Such videos often include various types of music, among which cover songs are included. In order to protect the copyright of music, an algorithm to find the original song of the cover song is essential. However, it is not easy to find the original song because the cover song is a modification of the composition, speed and overall structure of the original song. So far, there is no known effective algorithm for searching the original song of the cover song. In this paper, we propose an algorithm for searching the original song of the cover song using the inflection points of the melody line. Inflection points represent the characteristic points of change in the melody sequence. The proposed algorithm compares the original song and the cover song using the sequence of inflection points for the representative phrase of the original song. Since the characteristics of the representative phrase are used, even if the cover song is a song made by modifying the overall composition of the song, the algorithm's search performance is excellent. Also, since the proposed algorithm uses only the features of the inflection point sequence, the memory usage is very low. The efficiency of the algorithm was verified through performance evaluation.
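
As a rough illustration of the inflection-point idea, the sketch below treats an inflection point as a note at which the melodic direction reverses, and describes a phrase by the pitch intervals between successive inflection points. That definition, the MIDI-style pitch numbers, and the function names are assumptions; the abstract does not specify how the authors extract or compare the points.

```python
def inflection_points(pitches):
    """Pitches at which the melody changes direction (rising to falling or
    falling to rising). Repeated notes are ignored, so a slower or faster
    rendition with held notes yields the same points."""
    points = []
    prev_dir = 0
    for i in range(1, len(pitches)):
        diff = pitches[i] - pitches[i - 1]
        direction = (diff > 0) - (diff < 0)
        if direction == 0:
            continue
        if prev_dir != 0 and direction != prev_dir:
            points.append(pitches[i - 1])
        prev_dir = direction
    return points

def contour(pitches):
    """Intervals between successive inflection points. Using intervals rather
    than absolute pitches makes the description independent of key."""
    pts = inflection_points(pitches)
    return [b - a for a, b in zip(pts, pts[1:])]
```

Two phrases whose `contour` lists agree have the same turning-point shape even when one is transposed or contains held notes; storing only this short list is in line with the low memory use the abstract mentions.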

A Study of Computational Literature Analysis based Classification for a Pairwise Comparison by Contents Similarity in a section of Tokkijeon, 'Fish Tribe Conference' (컴퓨터 문헌 분석 기반의 토끼전 '어족회의' 대목 내용 유사도에 따른 이본 계통 분류 연구)

  • Kim, Dong-Keon;Jeong, Hwa-Young
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.15-25 / 2022
  • This study aims to identify the families and lineages of the variant editions (ibon) of the 'Fish Tribe Conference' section of "Tokkijeon" using computational literature analysis techniques. First, we encode the paragraph types of each variant to build a corpus and, based on this, use the Hamming distance to calculate the distance matrix between the variants. We then visualize the clustering pattern of the variants by applying multidimensional scaling and hierarchical clustering, and explore the characteristics of the lineages of the 'Fish Tribe Conference' section in comparison with the existing cluster analysis of the entire paragraphs of "Tokkijeon". As a result, unlike the cluster analysis of the entire paragraphs of "Tokkijeon", which yields six categories, the 'Fish Tribe Conference' section has five categories and some variant accesses. The contribution of this study is that the relative distances between the variants were measured and a systematic classification was performed in an objective and empirical way by calculation, and that the characteristics of the lineages of the 'Fish Tribe Conference' section were revealed in comparison with the analysis of the entire "Tokkijeon".
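
The Hamming-distance step can be sketched as follows. The encoding of each variant as a fixed-length string of paragraph-type codes is an assumption made for illustration; the abstract does not describe the actual coding scheme.

```python
def hamming(a, b):
    """Number of positions at which two equal-length code sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(codes):
    """Symmetric matrix of pairwise Hamming distances between encoded variants."""
    return [[hamming(a, b) for b in codes] for a in codes]
```

A matrix of this form is the usual input to multidimensional scaling and hierarchical clustering, the two visualization steps named in the abstract.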

Research on the Development of Distance Metrics for the Clustering of Vessel Trajectories in Korean Coastal Waters (국내 연안 해역 선박 항적 군집화를 위한 항적 간 거리 척도 개발 연구)

  • Seungju Lee;Wonhee Lee;Ji Hong Min;Deuk Jae Cho;Hyunwoo Park
    • Journal of Navigation and Port Research / v.47 no.6 / pp.367-375 / 2023
  • This study developed a new distance metric for vessel trajectories, applicable to marine traffic control services in the Korean coastal waters. The proposed metric is designed through the weighted summation of the traditional Hausdorff distance, which measures the similarity between spatiotemporal data and incorporates the differences in the average Speed Over Ground (SOG) and the variance in Course Over Ground (COG) between two trajectories. To validate the effectiveness of this new metric, a comparative analysis was conducted using the actual Automatic Identification System (AIS) trajectory data, in conjunction with an agglomerative clustering algorithm. Data visualizations were used to confirm that the results of trajectory clustering, with the new metric, reflect geographical distances and the distribution of vessel behavioral characteristics more accurately, than conventional metrics such as the Hausdorff distance and Dynamic Time Warping distance. Quantitatively, based on the Davies-Bouldin index, the clustering results were found to be superior or comparable and demonstrated exceptional efficiency in computational distance calculation.
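
One possible reading of the weighted summation is sketched below: the symmetric Hausdorff distance between the position sequences, plus the absolute difference in mean SOG, plus the absolute difference in COG variance. The dictionary layout, the weights `w_h`, `w_sog`, `w_cog`, and the exact form of each term are assumptions; the paper's definition may differ. COG is also treated as a plain number here, which ignores the 0/360 degree wrap-around.

```python
from math import hypot
from statistics import mean, pvariance

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(hypot(px - qx, py - qy) for qx, qy in b) for px, py in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def trajectory_distance(t1, t2, w_h=1.0, w_sog=1.0, w_cog=1.0):
    """Weighted sum of spatial and behavioural differences between two
    trajectories, each a dict with keys 'xy', 'sog' and 'cog'.
    The weights are hypothetical tuning parameters."""
    spatial = hausdorff(t1["xy"], t2["xy"])
    speed = abs(mean(t1["sog"]) - mean(t2["sog"]))
    course = abs(pvariance(t1["cog"]) - pvariance(t2["cog"]))
    return w_h * spatial + w_sog * speed + w_cog * course
```

Because the extra terms are scalar summaries of each trajectory, they add little cost on top of the Hausdorff computation, whereas Dynamic Time Warping needs a full alignment table for every pair.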

Metadata extraction using AI and advanced metadata research for web services (AI를 활용한 메타데이터 추출 및 웹서비스용 메타데이터 고도화 연구)

  • Sung Hwan Park
    • The Journal of the Convergence on Culture Technology / v.10 no.2 / pp.499-503 / 2024
  • Broadcast programs are delivered not only over the broadcaster's own channels but also through various media such as Internet replay, OTT, and IPTV services. In this setting, it is very important to provide search keywords that represent the characteristics of the content well. Broadcasters mainly enter key keywords manually during the production and archiving processes. This method does not yield enough core metadata and is also limited when content is recommended and used in other media services. This study supports securing a large amount of metadata by using closed caption data pre-archived through the DTV closed captioning server developed at EBS. First, core metadata was extracted automatically by applying Google's natural language AI technology. Next, as the core of the research, a method is proposed for finding core metadata that reflects priorities and content characteristics. To obtain differentiated metadata weights, importance was classified by applying the TF-IDF calculation method. The experiment yielded usable weight data. The string metadata obtained in this study, when combined with future work on string similarity measurement, can serve as the basis for securing sophisticated content recommendation metadata for content services provided to other media.
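
The TF-IDF weighting step can be illustrated with the textbook formula, term frequency times the logarithm of inverse document frequency. This is a minimal sketch over already-tokenized captions; the tokenization, the particular TF-IDF variant, and the function name are assumptions rather than the study's implementation.

```python
from collections import Counter
from math import log

def tfidf(docs):
    """Per-document TF-IDF weights for a list of token lists.
    tf = count / document length, idf = ln(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * log(n / df[t]) for t, c in tf.items()})
    return weights
```

A term that occurs in every programme's captions receives weight 0, so only terms that set one programme apart from the others rise to the top of its keyword list.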

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.221-241 / 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized. The importance of information classification is also increasing for the efficient management of digital information, which is being produced at an exponential rate. In this study, we tried to automatically classify and provide tailored information that can help companies make decisions on technology commercialization. To that end, we propose a method to classify information based on the Korea Standard Industry Classification (KSIC), which indicates the business characteristics of enterprises. The classification of information or documents has largely been based on machine learning, but there is not enough training data categorized on the basis of KSIC. Therefore, this study applied a method of calculating the similarity between documents. Specifically, a method and a model for presenting the most appropriate KSIC code are proposed by collecting the explanatory texts of each KSIC code and calculating their similarity with the document to be classified using the vector space model. IPC data were collected and classified by KSIC, and the methodology was then verified by comparing the results with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. In the verification, the highest agreement was obtained when the LT method, which is a kind of TF-IDF calculation formula, was applied. In that case, the first-ranked KSIC code matched in 53% of cases, and the cumulative match within the top five ranks was 76%. This confirms that technology, industry, and market information needed by SMEs can be classified by KSIC more quantitatively and objectively. In addition, the methods and results provided in this study can serve as basic data to support the qualitative judgment of experts in creating a linkage table between heterogeneous classification systems.
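
The vector space model step can be sketched as ranking classification codes by the cosine similarity between a document and each code's explanatory text. The sketch uses raw term counts for brevity; a TF-IDF weighting such as the LT variant named in the abstract would replace the `Counter` vectors. The code labels and token lists in the usage example are hypothetical placeholders, not real KSIC entries.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_codes(document_tokens, code_descriptions, top_k=5):
    """Return the top_k code labels ordered by similarity to the document.
    Ties are broken alphabetically by label so the result is deterministic."""
    doc_vec = Counter(document_tokens)
    scored = [(cosine(doc_vec, Counter(tokens)), code)
              for code, tokens in code_descriptions.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [code for _, code in scored[:top_k]]

# Usage with made-up labels and descriptions:
codes = {
    "X1": ["semiconductor", "electronic", "component", "manufacturing"],
    "X2": ["crop", "farming", "agriculture"],
    "X3": ["software", "programming", "computer"],
}
doc = ["semiconductor", "wafer", "manufacturing", "process"]
top = rank_codes(doc, codes, top_k=1)
```

Returning a ranked list rather than a single label matches the way the abstract reports results, as first-rank and cumulative fifth-rank agreement.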

Ecological Changes of Insect-damaged Pinus densiflora Stands in the Southern Temperate Forest Zone of Korea (I) (솔잎혹파리 피해적송림(被害赤松林)의 생태학적(生態学的) 연구(研究) (I))

  • Yim, Kyong Bin;Lee, Kyong Jae;Kim, Yong Shik
    • Journal of Korean Society of Forest Science / v.52 no.1 / pp.58-71 / 1981
  • Thecodiplosis japonensis is sweeping through the Pinus densiflora forests from the south-west to the north-east, destroying almost all the aged large trees and even the young ones. The front line of infestation is moving slowly but ceaselessly northwards as a long battle front. It is estimated that more than 40 percent of the P. densiflora forest area has already been damaged; however, some individuals escape the damage and contribute to restoring the site to its previous vegetation composition. When stands are attacked by this insect, drastic openings usually result in the upper story of the tree canopy, which is formed exclusively by P. densiflora, and environmental factors such as light, temperature, litter accumulation, soil moisture and others are naturally modified. With these changes after insect invasion, phytosociological changes of the vegetation proceed gradually as time passes. If we classify the forest by the history of the insect outbreak, namely non-attacked (healthy forest), recently damaged (the outbreak occurred about 1-2 years ago), severely damaged (occurred 5-6 years ago), damage prolonged (occurred 10 years ago) and restored (occurred about 20 years ago), any directional changes of vegetation composition can be traced along these four progressive stages. To elucidate these changes, three survey districts were set (see Tab. 1): (1) "Gongju", where the damage was severe and the outbreak occurred in 1977, (2) "Buyeo", where the damage was prolonged, and (3) "Gochang", as restored. All of these were located in the southern temperate forest zone, which is delimited mainly by the temperature factor and is generally accepted at present without opposition. In view of temperature, the amount and distribution of precipitation and various soil factors, the overall homogeneity of environmental conditions among the survey districts may be accepted.
However, this did not mean that small changes in edaphic and topographic conditions and microclimates can induce any alteration of vegetation patterns. Four survey plots were set in each district, with an inter-plot distance of 3 to 4 km, and four subplots were set within each survey plot. The size of a subplot was $10\,\mathrm{m}\times10\,\mathrm{m}$ for woody vegetation and $5\,\mathrm{m}\times5\,\mathrm{m}$ for ground cover vegetation less than 2 m high. The nested quadrat method was adopted. In selecting survey plots, the following criteria were taken into account: (1) natural growth with more than 80 percent crown density of the upper canopy and more than 5 hectares of area; (2) not affected by natural or artificial disturbances such as fire and thinning operations for the past three decades; (3) altitude lower than 500 m; (4) slope of less than 20 degrees; and (5) northerly aspect. An intensive vegetation survey was undertaken during the summer of 1980. The vegetation was divided into three categories for sampling: the upper layer (dominated mainly by the pine trees), the middle layer (composed of oak species and other broad-leaved trees as well as the pine), and the ground layer or lower layer (shrubby forms of woody plants). In this study the survey concentrated on woody species only. For the vegetation analysis, values of intensity, frequency, cover, relative importance, species diversity, dominance, and similarity and dissimilarity indices were calculated. When importance values were calculated, different relative weights were arbitrarily given to each layer as scores, i.e., 3 points for the upper layer, 2 for the middle layer and 1 for the ground layer. The formula then becomes: $$\mathrm{R.I.V.}=\frac{3\,(\mathrm{IV}_{\text{upper layer}})+2\,(\mathrm{IV}_{\text{middle layer}})+1\,(\mathrm{IV}_{\text{ground layer}})}{6}$$ The values of the Similarity Index were calculated on the basis of the Relative Importance Value of trees (sum of relative density, frequency and cover).
The formula used is: $$\mathrm{S.I.}=\frac{2C}{S_1+S_2}\times100=\frac{2C}{100+100}\times100=C\,(\%)$$ where $C$ is the sum of the lower of the two quantitative values for the species shared by the two communities, $S_1$ is the sum of all values for the first community, and $S_2$ is the sum of all values for the second community. In Tab. 3, the species composition of each plot is presented by layer and by district. Without exception, the species forming the upper layer of the stands was Pinus densiflora. The table gives the relative cover (%), the density (number of trees per $500\,\mathrm{m}^2$), the range of height and diameter at breast height, and the cone-bearing tendency. In the middle layer, Quercus spp. (Q. aliena, Q. serrata, Q. mongolica, Q. acutissima and Q. variabilis) and Pinus densiflora were dominant. The genera Rhododendron and Lespedeza were abundant in the ground vegetation, but some oaks were also present. (1) Gongju district: The total number of woody species found in this district was 26, and the relative importance value of Pinus densiflora in the upper layer was 79.1%. In the middle layer, the R.I.V. values of Quercus acutissima, Pinus densiflora and Quercus aliena were 22.8%, 18.7% and 10.0%, respectively, and in the ground vegetation Q. mongolica had 17.0%, Q. serrata 16.8%, Corylus heterophylla 11.8% and Q. dentata 11.3%, in that order. (2) Buyeo district: The number of species enumerated in this district was 36, and the R.I.V. of Pinus densiflora in the upper layer was 100%. In the middle layer, the R.I.V. values of Q. variabilis and Q. serrata were 8.6% and 8.5%, respectively. In the ground vegetation, 24 species were counted, none of which had more than 5% R.I.V. The mean R.I.V. of P. densiflora (totaling the three layers and averaging the four plots) was 57.7%, in contrast to 46.9% for the Gongju district. (3) Gochang district: The total number of woody species was 23, and the mean R.I.V. of Pinus densiflora was 66.0%, a greater value than in the two former districts. The next highest value was 6.5% for Q. serrata.
As time passes after the insect outbreak, the mean R.I.V. of P. densiflora increased in the order 46.9%, 57.7% and 66%. This implies that P. densiflora was returning to its original dominant state. The pooled importance of the genus Quercus decreased as that of Pinus densiflora increased. This trend contradicts the findings of the survey in the Kyonggi-do area (the central temperate forest zone) reported previously (Yim et al., 1980). Within the genus Quercus, Quercus acutissima, a warm-loving species, was more abundant in the southern temperate zone, with which the present research is concerned, than in the central temperate zone. The reverse was true of Q. mongolica, a cold-loving species. The species not common to the present survey and the previous report are Carpinus cordata, Betula davurica, Wisteria floribunda, Weigela subsessilis, Gleditsia japonica var. koraiensis, Acer pseudosieboldianum, Euonymus japonica var. macrophylla, Ribes mandshuricum, Pyrus calleryana var. fauriei, Tilia amurensis and Pyrus pyrifolia. In Figure 4 and Table 5, maximum species diversity (maximum H'), species diversity (H') and evenness (J') are presented. The similarity indices between districts are shown in Tab. 5. In Fig. 6, which shows a two-dimensional ordination of the plots on the basis of X and Y coordinates, the Ai plots aggregate on the left side, the Bi plots on the lower side, and the Ci plots on the upper-right side. The increasing and decreasing patterns of Relative Density and Relative Importance Value by genus or species are given in Fig. 7. Some of the patterns presented here are not consistent with those reported previously (Yim et al., 1980).
The present authors would like to attribute this to two distinct types of insect attack: one is the short-war type occurring in the southern temperate forest zone, in which the insect attack lasted only a few years; the other is the long-drawn war type observed in the temperate forest zone, in which the insect damage went on continuously for several years. These different patterns of infestation might have resulted in different courses of vegetational change. Analysis of the similarity indices between districts gave very convincing results: the value of the dissimilarity index was 30% between A and B, 27% between B and C and 35% between A and C (Table 6). The range of the similarity index was obtained by calculating every possible combination of plots between two districts. Longer isolation in time between communities brought a higher value of the dissimilarity index. The main components of the ground vegetation, 10 to 20 years after the insect outbreak, come to consist mainly of the genera Lespedeza and Rhododendron. The genus Quercus, which holds the top dominant state for a while after the insect attack, was giving way to Pinus densiflora. This implies that, provided the soil fertility, soil moisture and soil depth were good enough, the genus Quercus would never have been so easily taken over by a resistant species like Pinus densiflora, which forms the edaphic climax over vast areas of forest land. Quercus is usually regarded as the representative component of the undisturbed natural forest in the central part of this country.
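
The two formulas in this abstract translate directly into code. The sketch below computes the layer-weighted Relative Importance Value and the Similarity Index $2C/(S_1+S_2)\times100$ between two communities given as species-to-value mappings; the function names and the numbers in the usage example are illustrative and are not data from the paper.

```python
def relative_importance(iv_upper, iv_middle, iv_ground):
    """Layer-weighted importance value: weights 3, 2 and 1 for the upper,
    middle and ground layers, divided by the weight total of 6."""
    return (3 * iv_upper + 2 * iv_middle + 1 * iv_ground) / 6

def similarity_index(a, b):
    """S.I. = 2C / (S1 + S2) * 100, where C sums the smaller of the two
    values for each species present in both communities."""
    c = sum(min(a[s], b[s]) for s in a.keys() & b.keys())
    return 2 * c / (sum(a.values()) + sum(b.values())) * 100

# Usage with made-up importance values that each sum to 100:
plot_a = {"Pinus densiflora": 60, "Quercus serrata": 30, "Lespedeza": 10}
plot_b = {"Pinus densiflora": 50, "Quercus serrata": 20, "Rhododendron": 30}
si = similarity_index(plot_a, plot_b)
```

When both communities sum to 100, the index reduces to $C$ itself, as the abstract's formula notes, and the dissimilarity index is simply 100 minus this value.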


Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.141-156 / 2013
  • Social media is a representative form of Web 2.0 that shapes changes in users' information behavior by allowing users to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to the masses and to acquaintances. Social media data plays a significant role in the emerging Big Data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore paid attention to discovering meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of the fields of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets of Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets containing the candidates' names and 'election' on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and detects social trends effectively. The system also retrieves the list of terms that co-occur with given query terms.
We compare the results of term co-occurrence retrieval using the names of the influential candidates, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park', 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon', and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows the topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic shows one of two types of patterns, a rising tendency or a falling tendency, depending on the change in its probability distribution. To determine the relationship between topic trends on Twitter and social issues in the real world, we compare the topic trends with related news articles. We find that Twitter can track an issue faster than other media such as newspapers. The user network on Twitter differs from those of other social media because of the distinctive way relationships are formed on Twitter: users can build relationships by exchanging mentions. We visualize and analyze mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all of the candidates' names regardless of their political tendencies. This case study shows that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks could reveal different aspects of user behavior as a network found only on Twitter.