• Title/Summary/Keyword: duplicate detection


Concentric Circle-Based Image Signature for Near-Duplicate Detection in Large Databases

  • Cho, A-Young;Yang, Won-Keun;Oh, Weon-Geun;Jeong, Dong-Seok
    • ETRI Journal
    • /
    • v.32 no.6
    • /
    • pp.871-880
    • /
    • 2010
  • Many applications dealing with image management need a technique for removing duplicate images or for grouping related (near-duplicate) images in a database. This paper proposes a concentric circle-based image signature that makes it possible to detect near-duplicates rapidly and accurately. An image is partitioned by radius and angle levels from its center, and feature values are calculated from the average or variation of the partitioned sub-regions. The sequence of feature values is then formed into an image signature by hash generation, which reduces storage space and enables fast matching. Performance was evaluated through discriminability and robustness tests, which verify, respectively, the distinctiveness among different images and the invariance among modified images. In addition, we measured discriminability and robustness by analyzing the distribution of the hashed bits. The proposed method is robust to various modifications, as shown by its average detection rate of 98.99%. The experimental results show that the proposed method is suitable for near-duplicate detection in large databases.
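The partition-and-hash idea in the abstract can be sketched roughly as follows. This is a minimal NumPy sketch, assuming a grayscale image and a simple brighter-than-global-average bit rule; the paper's exact feature and hash functions may differ.

```python
import numpy as np

def concentric_signature(img, radius_levels=4, angle_levels=8):
    """Partition a grayscale image into ring/sector sub-regions around the
    center, take the mean intensity of each sub-region, and binarize the
    sequence into a compact bit signature (1 if a region is brighter than
    the global mean of all region features, else 0)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx)
    theta = np.arctan2(ys - cy, xs - cx)          # in [-pi, pi]
    # quantize radius and angle into the requested number of levels
    r_bin = np.minimum((r / (r.max() + 1e-9) * radius_levels).astype(int),
                       radius_levels - 1)
    a_bin = np.minimum(((theta + np.pi) / (2 * np.pi + 1e-9)
                        * angle_levels).astype(int), angle_levels - 1)
    feats = []
    for ri in range(radius_levels):
        for ai in range(angle_levels):
            region = img[(r_bin == ri) & (a_bin == ai)]
            feats.append(region.mean() if region.size else 0.0)
    feats = np.array(feats)
    return (feats > feats.mean()).astype(np.uint8)

def hamming(sig_a, sig_b):
    """Bit distance between two signatures; small distance = near-duplicate."""
    return int(np.count_nonzero(sig_a != sig_b))
```

Because the bits compare region means against their own global mean, a uniform brightness shift leaves the signature unchanged, which is one way such a scheme gains robustness to simple modifications.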

Improved Facial Component Detection Using Variable Parameter and Verification (가변 변수와 검증을 이용한 개선된 얼굴 요소 검출)

  • Oh, Jeong-su
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.3
    • /
    • pp.378-383
    • /
    • 2020
  • Viola & Jones' object detection algorithm performs well for face component (FC) detection, but parameter setting still causes problems such as duplicate detection, false detection, and non-detection. This paper proposes an improved FC detection algorithm that extends the Viola & Jones algorithm with a variable parameter to reduce non-detection and a verification step to reduce duplicate and false detections. The proposed algorithm reduces non-detection by varying the parameter value of the Viola & Jones algorithm until potentially valid FCs are detected, and it eliminates duplicate and false detections through verification that evaluates the size, position, and uniqueness of the detected FCs. Simulation results show that the proposed algorithm first includes the valid FCs among the detected objects and then retains only the valid FCs by removing the invalid ones.

A Post-Verification Method of Near-Duplicate Image Detection using SIFT Descriptor Binarization (SIFT 기술자 이진화를 이용한 근-복사 이미지 검출 후-검증 방법)

  • Lee, Yu Jin;Nang, Jongho
    • Journal of KIISE
    • /
    • v.42 no.6
    • /
    • pp.699-706
    • /
    • 2015
  • In recent years, near-duplicate images have increased explosively with the spread of the Internet and of image-editing technology that allows easy access to image content, and related research has been active. However, BoF (Bag-of-Features), the most frequently used method for near-duplicate image detection, can mistake identical features for different ones, or different features for identical ones, during the quantization process that approximates high-dimensional local features with low-dimensional codewords. A post-verification method is therefore required to overcome this limitation of vector quantization. In this paper, we propose and analyze the performance of a post-verification method for BoF that converts SIFT (Scale Invariant Feature Transform) descriptors into 128-bit binary codes and compares binary (Hamming) distances over the short ranked list returned by BoF. An experiment using 1,500 original images showed that near-duplicate detection accuracy improved by approximately 4% over the previous BoF method.
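The binarize-then-re-rank idea can be sketched as follows. This assumes SIFT descriptors have already been extracted as 128-dimensional vectors; the median-threshold bit rule here is an illustrative stand-in, not necessarily the paper's exact binarization.

```python
import numpy as np

def binarize_descriptor(desc):
    """Turn a 128-dim descriptor into a 128-bit code: each bit is 1 if
    that dimension exceeds the descriptor's own median value."""
    desc = np.asarray(desc, dtype=float)
    return (desc > np.median(desc)).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

def rerank(query_descs, ranked_list):
    """Post-verify a short BoF ranked list: score each candidate image by
    the mean Hamming distance from each binarized query descriptor to its
    nearest binarized candidate descriptor, then sort ascending."""
    q_codes = [binarize_descriptor(d) for d in query_descs]
    scored = []
    for image_id, descs in ranked_list:
        codes = [binarize_descriptor(d) for d in descs]
        dists = [min(hamming(q, c) for c in codes) for q in q_codes]
        scored.append((sum(dists) / len(dists), image_id))
    scored.sort()
    return [image_id for _, image_id in scored]
```

Because the re-ranking only touches the short list already returned by BoF, the extra Hamming comparisons add little cost relative to the initial retrieval.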

Evaluation of the Measurement of Trace Phenols by Adsorption/Thermal Desorption/Gas Chromatography/Mass Spectrometry (ATD/GC/MS) in Artificial Air (흡착관/열탈착 GC/MS 방법에 의한 모사시료 중의 미량 페놀 분석에 관한 평가)

  • 허귀석;이재환;황승만;정필갑;유연미;김정우;이대우
    • Journal of Korean Society for Atmospheric Environment
    • /
    • v.18 no.2
    • /
    • pp.127-137
    • /
    • 2002
  • Phenolic compounds in air are toxic even at low concentrations. We evaluated a total of five phenolic compounds (Phenol, o-Cresol, m-Cresol, 2-Nitrophenol, and 4-Chloro-3-methylphenol) in artificial air using ATD/GC/MS. To compare adsorption efficiency, three adsorbents (Tenax TA, Carbotrap, and Carbopack B) were tested, and Tenax TA was the most effective. The five phenolic compounds were very stable on the adsorbent tubes for 4 days at room temperature. Detection limits ranged from 0.05 to 0.08 ppb (assuming a 10 L air sample). The calibration curve was linear over the range of 22∼164 ng, and reproducibility was within 4%. Duplicate pairs (DPs) were sampled to demonstrate duplicate precision and sampling efficiency.
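Duplicate-pair precision of the kind mentioned at the end of the abstract is commonly reported as a relative percent difference (RPD). A minimal sketch with hypothetical measurements (the values below are illustrative, not from the paper):

```python
def relative_percent_difference(a, b):
    """RPD of a duplicate sample pair: the absolute difference expressed
    as a percentage of the pair mean, a common figure of merit for
    duplicate precision in air sampling."""
    return abs(a - b) / ((a + b) / 2.0) * 100.0

# hypothetical duplicate phenol measurements, in ppb
rpd = relative_percent_difference(10.2, 9.8)   # 0.4 ppb apart, mean 10.0
```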

A Study on the Duplicate Records Detection in the Serials Union Catalog (연속간행물 종합목록의 중복레코드 최소화 방안 연구)

  • Lee, Hye-jin;Choi, Ho-nam;Kim, Wan-jong;Kim, Soon-young
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2007.11a
    • /
    • pp.445-448
    • /
    • 2007
  • A serials union catalog is an essential bibliographic control tool for integrating and sharing serials information scattered across domestic libraries. By maintaining optimized catalog and holdings records, it provides users with reliable information about serials. Consistency of bibliographic records is therefore important, and the duplication ratio of records is a key criterion in database quality assessment. This paper examines the bibliographic data elements and proposes a duplicate detection process that minimizes duplicate records, thereby improving union catalog quality.


A Study on Duplicate Detection Algorithm in Union Catalog (종합목록의 중복레코드 검증을 위한 알고리즘 연구)

  • Cho, Sun-Yeong
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.37 no.4
    • /
    • pp.69-88
    • /
    • 2003
  • This study develops a new duplicate detection algorithm to improve database quality. The algorithm analyzes records by language and bibliographic type, and it checks the elements in the bibliographic data rather than just MARC fields. It computes a degree of similarity with weighted values to avoid eliminating records because of simple input errors. The study was performed on 7,649 records newly uploaded during the past year, matched against a sample master database of 210,000 records. The findings show that the new algorithm improved the duplicate recall rate by 36.2%.
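A field-weighted similarity of the kind described might look like the following sketch. The fields, weights, and threshold here are hypothetical; the paper derives its own values per language and bibliographic type.

```python
from difflib import SequenceMatcher

# Hypothetical field weights for illustration only.
WEIGHTS = {"title": 0.5, "publisher": 0.2, "issn": 0.2, "year": 0.1}

def field_sim(a, b):
    """Character-level similarity in [0, 1]; a graded score is tolerant
    of small input errors such as a single mistyped character."""
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_similarity(rec_a, rec_b):
    """Weighted similarity over bibliographic data elements, so that a
    simple input error in one field does not eliminate a true duplicate."""
    return sum(w * field_sim(rec_a.get(f, ""), rec_b.get(f, ""))
               for f, w in WEIGHTS.items())

def is_duplicate(rec_a, rec_b, threshold=0.85):
    return record_similarity(rec_a, rec_b) >= threshold
```

The graded score is the point: exact-match comparison would miss the typo'd pair below, while the weighted similarity still flags it.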

Efficient and Privacy-Preserving Near-Duplicate Detection in Cloud Computing (클라우드 환경에서 검색 효율성 개선과 프라이버시를 보장하는 유사 중복 검출 기법)

  • Hahn, Changhee;Shin, Hyung June;Hur, Junbeom
    • Journal of KIISE
    • /
    • v.44 no.10
    • /
    • pp.1112-1123
    • /
    • 2017
  • As content providers further offload content-centric services to the cloud, data retrieval over the cloud typically returns many redundant items because near-duplicate content is prevalent on the Internet. Simply fetching all data from the cloud severely degrades efficiency in terms of resource utilization and bandwidth, and data may be encrypted by multiple content providers under different keys to preserve privacy. Thus, locating near-duplicate data in a privacy-preserving way depends on the ability to deduplicate redundant search results and return the best matches without decrypting the data. To this end, we propose an efficient near-duplicate detection scheme for encrypted data in the cloud. Our scheme has the following benefits. First, a single query is enough to locate near-duplicate data even if they are encrypted under different keys by multiple content providers. Second, storage, computation, and communication costs are lower than in existing schemes, while the same level of search accuracy is achieved. Third, scalability is significantly improved by a novel and efficient two-round detection that locates near-duplicate candidates over large quantities of data in the cloud. An experimental analysis with real-world data demonstrates the applicability of the proposed scheme to a practical cloud system. Finally, the proposed scheme is on average 70.6% faster than an existing scheme.
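Setting the encryption layer aside entirely, the two-round shape of such a detection (a cheap coarse round to gather candidates, then an exact round to refine them) can be illustrated with plain random-hyperplane LSH. This is a generic sketch, not the paper's scheme.

```python
import numpy as np

def lsh_bucket(vec, planes):
    """Round 1: coarse bucketing by random-hyperplane LSH.  Vectors whose
    projections share all signs land in the same bucket and become
    near-duplicate candidates."""
    return tuple((np.asarray(vec) @ planes.T > 0).astype(int))

def two_round_detect(query, corpus, planes, radius=0.5):
    """Round 2: refine the candidate bucket by exact Euclidean distance,
    so only true near-duplicates within `radius` are returned."""
    q_key = lsh_bucket(query, planes)
    candidates = [(i, v) for i, v in corpus.items()
                  if lsh_bucket(v, planes) == q_key]
    return [i for i, v in candidates
            if np.linalg.norm(np.asarray(query) - np.asarray(v)) <= radius]
```

The efficiency argument is the same as in the abstract: round 1 touches every item only through a cheap hash comparison, so the expensive exact distance in round 2 runs on a small candidate set.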

Content-based Video Retrieval for Illegal Copying Contents Detection using Hashing (Hashing을 이용한 불법 복제 콘텐츠 검출을 위한 내용 기반 영상 검색)

  • Son, Heusu;Byun, Sung-Woo;Lee, Soek-Pil
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.67 no.10
    • /
    • pp.1358-1363
    • /
    • 2018
  • As Internet usage grows and digital media become more diversified, digital content has become much easier to distribute and share, which makes desired content easier to access. On the other hand, there is an increasing need to protect the copyright of digital works. Several ways to protect ownership are prevalent, but each has disadvantages. Among them, watermarking has the advantage of ensuring invisibility, but it is vulnerable to external attacks such as noise and signal processing. In this paper, we propose a method for detecting illegal content that is robust against such external attacks. We extract HSV and LBP features from images and use Euclidean-based hashing to shorten the search time over high-dimensional, near-duplicate videos. The results show that the proposed method achieves higher detection rates than watermarking techniques on images with fabrications or deformations.
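The LBP texture feature and a Euclidean-friendly binary hash can be sketched as follows. HSV extraction is omitted, and the random-projection hash is a generic stand-in for whatever Euclidean-based hashing the authors use.

```python
import numpy as np

def lbp_histogram(gray):
    """8-neighbour Local Binary Pattern: each interior pixel gets a byte
    whose bits say which neighbours are >= the centre; the image is then
    summarised as a normalised 256-bin histogram of those codes."""
    c = gray[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = gray[1 + dy:gray.shape[0] - 1 + dy,
                  1 + dx:gray.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def projection_hash(feature, planes):
    """Sign of random projections: feature vectors that are close in
    Euclidean distance tend to agree on most hash bits, which is what
    makes the short binary code usable for fast near-duplicate search."""
    return (planes @ feature > 0).astype(np.uint8)
```

In a full pipeline the HSV colour histogram would be concatenated with the LBP histogram before hashing; the binary codes then replace high-dimensional feature comparisons during search.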

Fast Handover Provision Mechanism through Reduction of CoA Configuration Time (CoA 설정 시간 단축을 통한 빠른 핸드오버 제공 메카니즘)

  • Choi, Ji-Hyoung;Lee, Dong-Chul;Kim, Dong-Il
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2007.10a
    • /
    • pp.79-82
    • /
    • 2007
  • With the spread of advanced mobile communication technology and mobile terminals, users demand seamless services while on the move. To meet this requirement, the IETF proposed FMIPv6 (Fast Handovers for Mobile IPv6). The FMIPv6 handover procedure proceeds through movement detection, new CoA configuration, and binding update, and each step introduces delay; the DAD (Duplicate Address Detection) of the CoA causes the largest delay. This paper proposes a delay-reduction scheme that omits the DAD process and instead stores information related to the mobile terminal's CoA in the AR (Access Router).
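The DAD-omission idea can be caricatured in a few lines. The delay figure is purely illustrative (real DAD timing follows the RetransTimer and DupAddrDetectTransmits parameters of IPv6 stateless address autoconfiguration), and the class below is a hypothetical toy, not the paper's protocol.

```python
# Illustrative DAD delay; actual timing is governed by RFC 4862 parameters.
DAD_DELAY_S = 1.0

class AccessRouter:
    """Toy AR that remembers CoAs it has already confirmed unique, so a
    handover arriving with a known CoA can skip Duplicate Address
    Detection and avoid its delay."""
    def __init__(self):
        self.known_coas = set()

    def configure_coa(self, coa):
        """Return the extra delay incurred while configuring this CoA."""
        if coa in self.known_coas:
            return 0.0                    # DAD omitted: no extra delay
        # normal path: run DAD once, then remember the verified address
        self.known_coas.add(coa)
        return DAD_DELAY_S
```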


Tree-Pattern-Based Clone Detection with High Precision and Recall

  • Lee, Hyo-Sub;Choi, Myung-Ryul;Doh, Kyung-Goo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.5
    • /
    • pp.1932-1950
    • /
    • 2018
  • The paper proposes a code-clone detection method that gives the highest possible precision and recall, without giving much attention to efficiency and scalability. The goal is to automatically create a reliable reference corpus that can be used as a basis for evaluating the precision and recall of clone detection tools. The algorithm takes an abstract-syntax-tree representation of source code and thoroughly examines every possible pair of all duplicate tree patterns in the tree, while avoiding unnecessary and duplicated comparisons wherever possible. The largest possible duplicate patterns are then collected in the set of pattern clusters that are used to identify code clones. The method is implemented and evaluated for a standard set of open-source Java applications. The experimental result shows very high precision and recall. False-negative clones missed by our method are all non-contiguous clones. Finally, the concept of neighbor patterns, which can be used to improve recall by detecting non-contiguous clones and intertwined clones, is proposed.