Search | Korea Science

A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion

Bart Dikmans;Dongwann Kang
- Journal of Information Processing Systems / v.19 no.5 / pp.602-613 / 2023
High-quality image datasets are in high demand for various applications. Although many online sources provide manually collected datasets, fully automating the dataset collection process remains a challenge. In this study, we surveyed the field of automatic image dataset generation by analyzing a collection of existing studies. Moreover, we examined fields that are closely related to automated dataset generation, such as query expansion, web scraping, and dataset quality. We assess how both noise and regional search engine differences can be addressed using automated search query expansion focused on hypernyms, while allowing for user-specific manual query expansion. Combining these aspects provides an outline of how a modern web scraping application can produce large-scale image datasets.
https://doi.org/10.3745/JIPS.04.0288
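
The hypernym-focused query expansion mentioned in the abstract above can be illustrated with a minimal sketch. The hypernym table, the function name `expand_query`, and the example terms are all hypothetical and are not taken from the surveyed paper; a real system would more likely draw hypernyms from a lexical database.

```python
# Hypothetical hypernym table (illustrative only, not from the paper).
HYPERNYMS = {
    "jaguar": ["big cat", "feline"],
    "python": ["snake", "reptile"],
}


def expand_query(term, manual_terms=()):
    """Return the bare term plus the term combined with each hypernym
    and with each user-supplied (manual) expansion term."""
    queries = [term]
    for extra in list(HYPERNYMS.get(term, [])) + list(manual_terms):
        query = f"{term} {extra}"
        if query not in queries:  # avoid duplicate queries
            queries.append(query)
    return queries


print(expand_query("jaguar", manual_terms=["animal"]))
```

Appending a hypernym such as "big cat" to an ambiguous term like "jaguar" is one way such an expansion could steer an image search away from unrelated results (for example, cars), which is the kind of noise the abstract refers to.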

Efficient Spatial Query Processing in Constraint Databases (제약 데이터베이스에서의 효율적인 공간질의 처리)

Woo, Sung-Koo;Ryu, Keun-Ho
- Journal of Korea Spatial Information System Society / v.11 no.1 / pp.79-86 / 2009
A tuple in a constraint database consists of a constraint logical formula, which allows the database to be represented and queried simply. Query operations on spatial data, such as selection, union, and intersection, must combine the constraint formulas of the related tuples. However, this can increase the amount of duplicated or unnecessary data and thus drive up the processing cost. This paper identifies problems with query processing results in constraint databases. It also proposes a tuple minimization summary method for the result relation and analyzes its effect on query processing efficiency. We found that eliminating unnecessary constraint formulas from constraint relations with the tuple minimization method improved the effectiveness of query processing.

Incorporating Deep Median Networks for Arabic Document Retrieval Using Word Embeddings-Based Query Expansion

Yasir Hadi Farhan;Mohanaad Shakir;Mustafa Abd Tareq;Boumedyen Shannaq
- Journal of Information Science Theory and Practice / v.12 no.3 / pp.36-48 / 2024
The information retrieval (IR) process often encounters a challenge known as query-document vocabulary mismatch, where user queries do not align with document content, impacting search effectiveness. Automatic query expansion (AQE) techniques aim to mitigate this issue by augmenting user queries with related terms or synonyms. Word embedding, particularly Word2Vec, has gained prominence for AQE due to its ability to represent words as real-number vectors. However, AQE methods typically expand individual query terms, potentially leading to query drift if not carefully selected. To address this, researchers propose utilizing median vectors derived from deep median networks to capture query similarity comprehensively. Integrating median vectors into candidate term generation and combining them with the BM25 probabilistic model and two IR strategies (EQE1 and V2Q) yields promising results, outperforming baseline methods in experimental settings.
https://doi.org/10.1633/JISTaP.2024.12.3.3
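
As a rough illustration of expanding a query as a whole rather than term by term, the sketch below takes the element-wise median of the query terms' vectors and ranks candidate words by cosine similarity to it. This is a simplifying assumption: it does not implement the deep median networks, BM25 weighting, or the EQE1 and V2Q strategies named in the abstract, and the toy embeddings and function names are hypothetical.

```python
import math
from statistics import median

# Toy 3-dimensional embeddings with made-up values, standing in for Word2Vec vectors.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "repair": [0.7, 0.3, 0.0],
    "garage": [0.8, 0.2, 0.0],
    "banana": [0.0, 0.1, 0.9],
}


def median_vector(terms):
    """Element-wise median of the vectors of all query terms."""
    vectors = [EMBEDDINGS[t] for t in terms]
    return [median(dim) for dim in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expansion_candidates(query_terms, k=1):
    """Rank non-query words by similarity to the query's median vector."""
    center = median_vector(query_terms)
    candidates = [w for w in EMBEDDINGS if w not in query_terms]
    candidates.sort(key=lambda w: cosine(center, EMBEDDINGS[w]), reverse=True)
    return candidates[:k]


print(expansion_candidates(["car", "engine", "repair"], k=1))
```

Because candidates are compared against a single vector summarizing the whole query, a word related to only one query term is less likely to be selected, which is the intuition behind limiting query drift.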

The MeSH-Term Query Expansion Models using LDA Topic Models in Health Information Retrieval (MeSH 기반의 LDA 토픽 모델을 이용한 검색어 확장)

You, Sukjin
- Journal of Korean Library and Information Science Society / v.52 no.1 / pp.79-108 / 2021
Information retrieval in the health field has several challenges. Health information terminology is difficult for consumers (laypeople) to understand. Formulating a query with professional terms is not easy for consumers because health-related terms are more familiar to health professionals. If health terms related to a query are automatically added, it would help consumers to find relevant information. The proposed query expansion (QE) models show how to expand a query using MeSH terms. The documents were represented by the MeSH terms found in the full-text articles (i.e., Bag-of-MeSH). The MeSH terms were then used to generate LDA (Latent Dirichlet Allocation) topic models. A query and the top k retrieved documents were used to find MeSH terms as topic words related to the query. LDA topic words were filtered by threshold values of topic probability (TP) and word probability (WP). Threshold values were effective in an LDA model with a specific number of topics to increase IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by the word score based on (TP * WP) and retrieved document ranking in an LDA model with specific thresholds. The QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model, compared with the baseline result.
https://doi.org/10.16981/kliss.52.1.202103.79
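
The threshold filtering and TP * WP scoring described in the abstract above might look roughly like the following sketch. The function name, data layout, threshold values, and probabilities are all hypothetical. The sketch also ignores the retrieved-document ranking that the abstract says contributes to the word score, and how a word that appears in several topics should be handled is an assumption here (its best score is kept).

```python
def select_expansion_terms(topic_probs, topic_words, tp_min, wp_min, k):
    """Pick the top-k topic words by TP * WP after threshold filtering.

    topic_probs: {topic: TP}, topic probability for the query context.
    topic_words: {topic: {word: WP}}, word probability within each topic.
    """
    scores = {}
    for topic, tp in topic_probs.items():
        if tp < tp_min:  # drop topics below the TP threshold
            continue
        for word, wp in topic_words[topic].items():
            if wp < wp_min:  # drop words below the WP threshold
                continue
            # Assumption: a word seen in several topics keeps its best score.
            scores[word] = max(scores.get(word, 0.0), tp * wp)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


# Made-up probabilities for illustration only.
topic_probs = {"t1": 0.6, "t2": 0.3, "t3": 0.1}
topic_words = {
    "t1": {"Neoplasms": 0.5, "Therapeutics": 0.2},
    "t2": {"Diabetes Mellitus": 0.5, "Neoplasms": 0.1},
    "t3": {"Asthma": 0.9},
}
print(select_expansion_terms(topic_probs, topic_words, tp_min=0.2, wp_min=0.15, k=2))
```

In this toy input, topic t3 is discarded by the TP threshold even though "Asthma" has a high WP, which shows how the two thresholds act independently before the combined score is used for ranking.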

Query Expansion based on Word Sense Community (유사 단어 커뮤니티 기반의 질의 확장)

Kwak, Chang-Uk;Yoon, Hee-Geun;Park, Seong-Bae
- Journal of KIISE / v.41 no.12 / pp.1058-1065 / 2014
In order to assist users who are executing a search, a query expansion method suggests keywords that are related to an input query. Recently, several studies have suggested keywords that are identified by finding domains using a clustering method over the retrieved documents. However, the clustering method is not well suited to presenting various domains because the number of clusters must be fixed. This paper proposes a method that suggests keywords by finding various domains related to the input queries using a community detection algorithm. The proposed method extracts words from the top 30 retrieved documents and builds communities according to the word graph. Then, keywords representing each community are derived, and these representative keywords are used for query expansion. In order to evaluate the proposed method, we compared our results to those of two baselines: searches performed by the Google search engine and keyword recommendation using TF-IDF on the search results. The results of the evaluation indicate that the proposed method outperforms the baselines with respect to diversity.
https://doi.org/10.5626/JOK.2014.41.12.1058

Content and Trajectory Retrievals of Moving Objects in Video Databases (비디오 데이타베이스에서 이동 객체의 내용 및 궤적 검색)

복경수;유재수
- Journal of KIISE:Databases / v.31 no.3 / pp.219-231 / 2004
Recently, together with the increasing use of multimedia data, many studies on moving objects in video databases have been conducted. Moving objects change their visual features and spatial positions over time in video data, and they are related to other objects or events. In this paper, we propose a new modeling and various query types of moving objects for content-based retrieval in video databases. The proposed modeling represents visual features, moving trajectories, and semantic contents related to objects. Therefore, it allows various query types to be processed. We also propose various query operators for the retrieval types. To show the superiority of our modeling, we implement the retrieval system and compare it with existing methods in terms of the supported query types. The proposed method supports various query types and improves the efficiency of query processing over the existing methods.

Building Intelligent User Interface Agent for Semantically Reformulating User Query in Medicine

Lim, Chae-Myung;Chu, Sung-Joon;Lee, Dong-Hoon;Park, Duck-Whan;Park, Tae-Young;Yang, Jung-Jin
- Proceedings of the KAIS Fall Conference / 2003.11a / pp.57-64 / 2003
Achieving the beneficial goals of recent discoveries in the human genome project still requires a way to retrieve and analyze the exponentially expanding bio-related information. Research in bio-related fields naturally applies discovered knowledge to the current problem and makes inferences to extract new information, where shared concepts and data containing information need to be defined and used in a coherent way. In such a professional domain, while the need to help users reduce their work and to improve search results has emerged, methods for systematic retrieval and adequate exchange of relevant information are still in their infancy. The design of our system aims at improving the quality of information retrieval in a professional domain by utilizing both corpus-based and concept-based ontology. Meta-rules that help users make an adequate query are formed into an ontology in the domain. The integration of this knowledge permits the system to retrieve relevant information in a more semantic and systematic fashion. This work mainly describes the query models, with details of the GUI and the secondary query generation of the system.

Implementation of Query Processor for Efficient Vehicle Monitoring and Control in e-Logistics (e-로지스틱스에서 효율적인 차량관제를 위한 질의 처리기 구현)

Kim, Dong-Ho;Kim, Jin-Suk;Ryu, Keun-Ho
- Journal of the Korean Association of Geographic Information Studies / v.7 no.3 / pp.35-47 / 2004
Telematics and LBS are rapidly emerging technology domains. In order to construct them efficiently, moving object technology that manages huge volumes of real-time location data is required. In particular, queries that obtain specific kinds of information closely related to the detailed applications are required in order to effectively retrieve and analyze the location data of moving objects in the logistics domain. Such queries also have a more complex structure than conventional database queries. An approach using a standard database query language, such as SQL, can be considered an effective alternative. In this paper, we not only propose a new query language based on SQL, named MOQL, for the query processing of vehicle monitoring and control in e-Logistics, but also design and implement the query processor.

Spatio-temporal Query Processing Systems for Ubiquitous Environments

Kim, Jeong Joon;Kang, Jeong Jin;Rothwell, Edward J.;Lee, Ki Young
- International Journal of Internet, Broadcasting and Communication / v.5 no.2 / pp.1-4 / 2013
With the recent development of ubiquitous computing technology, there is increasing interest and research in technologies such as sensors and RFID related to information recognition and location positioning in various ubiquitous fields. In particular, RTLS (Real-Time Locating Services), which deals with spatio-temporal data, is emerging as a promising technology. For these reasons, the ISO/IEC published an RTLS standard specification for compatibility and interoperability in RTLS. Therefore, in this paper, we designed and implemented Spatio-temporal Query Processing Systems for efficiently managing and searching the incoming spatio-temporal data stream of moving objects. The spatio-temporal middleware of the Spatio-temporal Query Processing Systems maintains interoperability among heterogeneous devices and guarantees data integrity in query processing through real-time processing of continuous spatio-temporal data streams and two-way synchronization of spatio-temporal DBMSs. The Web Server uses SOAP (Simple Object Access Protocol) messages between client and server for interoperability and translates the client's SOAP message into the CQL (Continuous Query Language) of the spatio-temporal middleware.
https://doi.org/10.7236/IJIBC.2013.5.2.1

Query Processing Systems in Sensor Networks (센서 네트워크에서 질의 처리 시스템)

Kim, Jeong-Joon;Chung, Sung-Taek
- The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.4 / pp.137-142 / 2017
Recently, along with the development of IoT technology, technologies for wirelessly sensing various data, such as sensor nodes, RFID, CCTV, and smartphones, have developed rapidly, and the use of sensor network technology has been actively pursued in many application fields. Therefore, as GeoSensor utilization increases, query processing systems for efficiently processing 2D data such as spatial sensor data are being actively researched. However, existing spatial query processing systems do not support the spatial-temporal data types and spatial-temporal operators needed to process spatial-temporal sensor data. Therefore, they are inadequate for processing spatial-temporal sensor data such as that of GeoSensor. Accordingly, this paper developed a spatial-temporal query processing system for efficient spatial-temporal query processing of spatial-temporal sensor data in a sensor network.
https://doi.org/10.7236/JIIBC.2017.17.4.137
