• Title/Summary/Keyword: huge data

Clustering for Analysis of Raman Hyperspectral Dental Data

  • Jung, Sung-Hwan
    • Journal of Korea Multimedia Society / v.16 no.1 / pp.19-28 / 2013
  • This research presents an effective ICA-based clustering method for analyzing huge Raman hyperspectral dental data. The hyperspectral dataset, captured with an HR800 micro-Raman spectrometer at UMKC-CRISP (University of Missouri-Kansas City Center for Research on Interfacial Structure and Properties), has 569 local points, each with 1,005 hyperspectral dentin data values. We compared clustering effectiveness and clustering time when using the full dataset directly and when using the scores obtained after PCA and ICA. In the experiments, clustering on the PCA and ICA scores not only revealed more detailed internal dentin information for medical analysis but also ran about 7 to 19 times faster. The ICA-based approach also outperformed the PCA-based one in both the detail of internal dentin information and clustering time. These results confirm the effectiveness of ICA for the analysis of Raman hyperspectral dental data.
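
The idea of clustering on reduced scores rather than on the full spectra can be sketched as follows. This is only an illustration under stated assumptions: it uses PCA (one component, found by power iteration) rather than ICA, a simple one-dimensional k-means, and a tiny made-up dataset in place of the 569 x 1,005 Raman measurements. All function names are hypothetical and not taken from the paper.

```python
def center(rows):
    """Subtract the per-column mean so PCA sees variation, not offsets."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def first_component(rows, iters=200):
    """Leading principal direction via power iteration on X^T X."""
    d = len(rows[0])
    v = [float(j + 1) for j in range(d)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]          # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]  # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def pca_scores(rows):
    """Project each centered row onto the first principal direction."""
    centered = center(rows)
    v = first_component(centered)
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]

def kmeans_1d(values, k=2, iters=50):
    """Plain k-means on scalar scores (k >= 2), centers seeded evenly over the range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Hypothetical "spectra": two groups with different band patterns.
spectra = [
    [1.0, 0.9, 1.1, 1.0, 0.0, 0.1, 0.0, 0.1],
    [0.9, 1.0, 1.0, 1.1, 0.1, 0.0, 0.1, 0.0],
    [1.1, 1.0, 0.9, 1.0, 0.0, 0.0, 0.1, 0.1],
    [0.0, 0.1, 0.0, 0.1, 1.0, 0.9, 1.1, 1.0],
    [0.1, 0.0, 0.1, 0.0, 0.9, 1.0, 1.0, 1.1],
    [0.0, 0.0, 0.1, 0.1, 1.1, 1.0, 0.9, 1.0],
]
labels = kmeans_1d(pca_scores(spectra))
```

Clustering then runs on one score per point instead of the full feature vector, which is the kind of saving the abstract reports; the actual speedup depends on the data and the number of components kept.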

A Study on the Influence of Expectation of Big Data Service on e-Commerce on the Use Intension (e-Commerce 상에서 빅데이터 서비스제공 기대가 이용의도에 미치는 영향 연구)

  • Kim, Young Kook;Yum, Su Whan;Kim, Jin Hyung;Bae, Suk Min;Jung, Jai Jin
    • Journal of Korea Multimedia Society / v.22 no.9 / pp.1132-1139 / 2019
  • Big data is prominently used as a prediction method for achieving goals, because it can analyze regularities in a vast amount of past data to predict future results. Furthermore, big data has a huge influence on very diverse academic fields. With this in mind, this study analyzed how e-Commerce usefulness regulates the effect of big data service expectations on the intention to use e-Commerce. The study categorized e-Commerce usefulness into quality recognition, service, and ease, and examined how each category acts on the relationship between big data service expectation and use intention.

A Big-Data Trajectory Combination Method for Navigations using Collected Trajectory Data (수집된 경로데이터를 사용하는 내비게이션을 위한 대용량 경로조합 방법)

  • Koo, Kwang Min;Lee, Taeho;Park, Heemin
    • Journal of Korea Multimedia Society / v.19 no.2 / pp.386-395 / 2016
  • In trajectory-based navigation systems, a huge amount of trajectory data is needed for efficient route exploration. However, it is very hard to collect trajectories for all possible start and destination combinations. As a practical solution, we propose a method that combines collected GPS trajectory data into additional generated trajectories with new start and destination combinations, without using road information. We present a trajectory combination algorithm and its implementation in the Scala programming language on the Spark platform for big data processing. The experimental results showed that the proposed method can effectively expand the collected trajectories into more than three hundred times as many valid trajectory paths.
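
One way to picture trajectory combination without road information is to splice two collected trajectories at a point where they meet. The sketch below is an assumption about the general idea, not the paper's algorithm (which is implemented in Scala on Spark). The function name `combine`, the tolerance `tol`, and the planar coordinates are all hypothetical.

```python
from math import hypot

def combine(a, b, tol=1e-6):
    """Join trajectory a to trajectory b at the first pair of points within tol.

    Returns a new trajectory that starts where a starts and ends where b ends,
    or None when the two trajectories never come close enough to be joined.
    """
    for i, (ax, ay) in enumerate(a):
        for j, (bx, by) in enumerate(b):
            if hypot(ax - bx, ay - by) <= tol:
                # Keep a up to the meeting point, then continue along b.
                return a[:i + 1] + b[j + 1:]
    return None

# Two hypothetical collected trajectories that cross at (1, 0).
east = [(0, 0), (1, 0), (2, 0)]
south = [(1, 2), (1, 1), (1, 0), (1, -1)]
new_route = combine(east, south)
```

Here `new_route` runs from the start of `east` to the destination of `south`, a start and destination pair that neither input covers. A real system would use geodesic distance on GPS coordinates and a spatial index rather than the quadratic scan shown here.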

XML Based Meta-data Specification for Industrial Speech Databases (산업용 음성 DB를 위한 XML 기반 메타데이터)

  • Joo Young-Hee;Hong Ki-Hyung
    • MALSORI / v.55 / pp.77-91 / 2005
  • In this paper, we propose an XML-based meta-data specification for industrial speech databases. Building speech databases is very time-consuming and expensive. Recently, with government support, a huge amount of speech corpus data has been collected as speech databases. However, the formats and meta-data of speech databases differ depending on the institution that constructed them. To improve the reusability and portability of speech databases, a standard representation scheme should be adopted by all institutions that construct speech databases. ETRI proposed an XML-based annotation scheme [5] for speech databases, but the scheme has an overly simple and flat modeling structure and may lead to duplicated information. To overcome these disadvantages of the previous scheme, we first define the speech database more formally and then identify the objects appearing in speech databases. We then design the data model for speech databases in an object-oriented way. Based on the designed data model, we develop the meta-data specification for industrial speech databases.

A practical application of cluster analysis using SPSS

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1207-1212 / 2009
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, clustering is conducted based on a similarity (or dissimilarity) matrix or on the original input text data. Various measures of similarity (or dissimilarity) between objects (or variables) have been developed. We introduce a real application of the clustering procedure in SPSS when only the distance matrix of the objects (or variables) is given as input data. This is particularly helpful for cluster analysis of huge data sets, where the size of the proximity matrix exceeds 1000. The SPSS syntax commands for matrix input data in clustering are given with numerical examples.
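
Clustering from a distance matrix alone, with no access to the original observations, can be illustrated outside SPSS as well. The sketch below uses single-linkage agglomerative clustering in plain Python. The linkage rule, the function name `single_linkage`, and the example matrix are assumptions chosen for illustration and are not the paper's SPSS syntax.

```python
def single_linkage(dist, n_clusters):
    """Agglomerative clustering that needs only a symmetric distance matrix.

    dist[i][j] is the distance between objects i and j. Clusters are merged
    by smallest minimum pairwise distance until n_clusters remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical proximity matrix: objects 0 and 1 are close, as are 2 and 3.
dist = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]
groups = single_linkage(dist, n_clusters=2)
```

This naive version rescans all cluster pairs on every merge, so it is suitable only for small matrices; it is meant to show that the proximity matrix is sufficient input, not to handle matrices of size 1000 or more efficiently.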

A Topic Modeling Approach to Marketing Strategies for Smartphone Companies (소셜미디어 토픽모델링을 통한 스마트폰 마케팅 전략 수립 지원)

  • Cha, Yoon-Jeong;Lee, Jee-Hye;Choi, Jee-Eun;Kim, Hee-Woong
    • Knowledge Management Research / v.16 no.4 / pp.69-87 / 2015
  • Given the huge amount of data produced by its users, SNS is a great source of customer insights. Since viral trends on SNS reflect customers' direct feedback, companies can draw highly meaningful business insights when such data is effectively analyzed and managed. However, while the importance of understanding SNS big data keeps growing, methods for analyzing atypical data such as SNS postings for business insights about products have not been well studied. This study aims to demonstrate how to exploit topic modeling to support marketing strategy generation and thereby leverage business processes. First, we conducted topic modeling analysis on Twitter data about Apple and Samsung smartphones. Then we comparatively examined the analysis results to draw meaningful market insights about each smartphone product. Finally, we derived a strategic marketing recommendation for each smartphone brand based on the findings.

Optimal Fingerprint Data Filtering Model for Location Based Services (위치기반 서비스 강화를 위한 최적 데이터 필터링 기법 및 측위 시스템 적용 모델)

  • Jung, Jun;Kim, Jae-Hoon
    • Korean Management Science Review / v.29 no.2 / pp.79-90 / 2012
  • With the rapid market penetration of smartphones, the importance of LBS (Location Based Service) has increased drastically. However, the traditional GPS method has a critical weakness: limited availability, for example in indoor environments. WPS is attracting attention as a widely applicable positioning method. In WPS, RSSI (Received Signal Strength Indication) data from all Wi-Fi APs (Access Points) are measured and stored in a huge database. The stored RSSI data form a single radio fingerprint map, from which the actual position of a target point can be estimated. The essential factor of a radio fingerprint database is the data integrity of RSSI. Because there are millions of APs in urban areas, RSSI measurement data are seriously contaminated. Therefore, we present a unified filtering method for RSSI measurement data. The filtering results show the effectiveness of the suggested method in the practical positioning system of a mobile operator.
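
The fingerprint pipeline described above (filter noisy RSSI readings, then match against a stored map) might look roughly like this. The abstract does not specify the unified filtering method, so a per-AP median is swapped in here as a stand-in, and matching is plain nearest neighbour. The names `filter_rssi` and `locate`, the `missing` default of -100 dBm, and the sample values are all hypothetical.

```python
from statistics import median

def filter_rssi(samples):
    """Collapse repeated readings per AP to their median to damp outliers.

    samples maps an AP identifier to a list of RSSI readings in dBm.
    (Median filtering is an assumed stand-in, not the paper's method.)
    """
    return {ap: median(readings) for ap, readings in samples.items()}

def locate(fingerprint_map, observed, missing=-100.0):
    """Return the map position whose stored fingerprint is closest to observed.

    fingerprint_map maps a position to {ap: rssi}. APs absent from one side
    are treated as a very weak signal (missing).
    """
    def squared_distance(stored):
        aps = set(stored) | set(observed)
        return sum(
            (stored.get(ap, missing) - observed.get(ap, missing)) ** 2
            for ap in aps
        )
    return min(fingerprint_map, key=lambda pos: squared_distance(fingerprint_map[pos]))

# Hypothetical radio fingerprint map with two surveyed positions.
fingerprint_map = {
    (0, 0): {"ap1": -40, "ap2": -70},
    (10, 0): {"ap1": -70, "ap2": -40},
}
# Raw measurements, including one contaminated reading for ap1.
raw = {"ap1": [-42, -41, -90], "ap2": [-69, -71, -70]}
filtered = filter_rssi(raw)
position = locate(fingerprint_map, filtered)
```

The contaminated -90 dBm reading does not shift the median for `ap1`, so the match still lands on the first surveyed position.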

The Prefix Array for Multimedia Information Retrieval in the Real-Time Stenograph (실시간 속기 자막 환경에서 멀티미디어 정보 검색을 위한 Prefix Array)

  • Kim, Dong-Joo;Kim, Han-Woo
    • Proceedings of the KIEE Conference / 2006.10c / pp.521-523 / 2006
  • This paper proposes an algorithm and a data structure to support real-time full-text search over streamed or broadcast multimedia data containing real-time stenograph text. The traditional indexing methods used in information retrieval rely on linguistic information and are therefore costly. We instead propose an algorithm and data structure based on the suffix array, a simple data structure with low space complexity. The suffix array is frequently useful for searching huge texts. However, the subtitle text of multimedia data grows over time, so a suffix array must be reconstructed as the subtitle text continually changes. We propose a data structure called the prefix array and a search algorithm that uses it.
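
The abstract does not give the prefix array's details, so the following is only one plausible reading, labeled as an assumption. Appending text changes every existing suffix but no existing prefix, so an index of prefixes, sorted by their reversed strings, can absorb each new character with a single insertion instead of a rebuild. The class `PrefixArray` and its methods are hypothetical, and the sketch favours clarity over efficiency.

```python
class PrefixArray:
    """Index over a growing text: prefix end positions sorted by reversed prefix."""

    def __init__(self):
        self.text = ""
        self.ends = []  # end offsets e, ordered by text[:e] reversed

    def _key(self, end):
        return self.text[:end][::-1]

    def append(self, ch):
        """Add one character; existing entries keep their relative order."""
        self.text += ch
        end = len(self.text)
        key = self._key(end)
        lo, hi = 0, len(self.ends)
        while lo < hi:  # binary search for the insertion point
            mid = (lo + hi) // 2
            if self._key(self.ends[mid]) < key:
                lo = mid + 1
            else:
                hi = mid
        self.ends.insert(lo, end)

    def find(self, pattern):
        """Return the start offsets of all occurrences of pattern."""
        rev = pattern[::-1]
        lo, hi = 0, len(self.ends)
        while lo < hi:  # first prefix whose reversed form is >= reversed pattern
            mid = (lo + hi) // 2
            if self._key(self.ends[mid])[:len(rev)] < rev:
                lo = mid + 1
            else:
                hi = mid
        starts = []
        while lo < len(self.ends) and self._key(self.ends[lo]).startswith(rev):
            starts.append(self.ends[lo] - len(pattern))
            lo += 1
        return sorted(starts)

index = PrefixArray()
for ch in "banana":  # characters arrive one at a time, as in a live subtitle stream
    index.append(ch)
hits = index.find("ana")
```

A pattern occurs in the text exactly when it is a suffix of some prefix, which is why searching the reversed pattern among reversed prefixes finds every occurrence. Building the comparison keys by slicing is quadratic in the worst case; a practical version would compare characters in place.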

DESIGN OF A CONTEXT ANALYSIS MODEL ON USN ENVIRONMENT

  • Jin, Cheng-Hao;Lee, Yong-Mi;Nam, Kwang-Woo;Lee, Jun-Wook;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.122-125 / 2008
  • Sensors used in many USN (Ubiquitous Sensor Network) domain applications generate a large amount of sensor stream data. The volume of sensor stream data is too huge to store in its entirety, and the data arrive too fast to handle each item individually. To provide rapid and reliable context analysis over sensor stream data, we propose a WHEN-DO context analysis model that supports sliding windows. The model works as follows: if the sensor stream data satisfy the condition in the 'WHEN' clause, the actions in the 'DO' clause are executed. The proposed WHEN-DO context analysis model can be applied to many other USN applications, such as monitoring the status of a building and taking actions under the corresponding context conditions.
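
A WHEN-DO rule over a sliding window could be sketched as below. The abstract does not define the model's syntax, so the class `WhenDoRule`, its parameters, and the temperature example are hypothetical, and a count-based window is assumed.

```python
from collections import deque

class WhenDoRule:
    """Run an action whenever a condition holds over the latest readings."""

    def __init__(self, window_size, when, do):
        self.window = deque(maxlen=window_size)  # oldest reading drops out automatically
        self.when = when  # condition over the current window (the WHEN clause)
        self.do = do      # action to execute (the DO clause)

    def push(self, reading):
        """Feed one reading; return True if the DO action fired."""
        self.window.append(reading)
        snapshot = list(self.window)
        if len(snapshot) == self.window.maxlen and self.when(snapshot):
            self.do(snapshot)
            return True
        return False

# Hypothetical building monitor: alert when the 3-reading average exceeds 30.
alerts = []
rule = WhenDoRule(
    window_size=3,
    when=lambda w: sum(w) / len(w) > 30,
    do=lambda w: alerts.append(max(w)),
)
fired = [rule.push(t) for t in [20, 25, 35, 40, 45]]
```

Only the last `window_size` readings are retained, which matches the abstract's point that the stream is too large to store in full.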

Genome data mining for everyone

  • Lee, Gir-Won;Kim, Sang-Soo
    • BMB Reports / v.41 no.11 / pp.757-764 / 2008
  • The genomic sequences of a huge number of species have been determined. Typically, these genome sequences and the associated annotation data are accessed through Internet-based genome browsers that offer a user-friendly interface. Intelligent use of the data should expedite biological knowledge discovery. Such activity is collectively called data mining and involves queries that can be simple, complex, and even combinational. Various tools have been developed to make genome data mining available to computational and experimental biologists alike. In this mini-review, some tools that have proven successful will be introduced along with examples taken from published reports.