• Title/Abstract/Keywords: huge data


Korean and English Sentiment Analysis Using the Deep Learning

  • 마렌드라;최형림;임성배
    • 한국산업정보학회논문지
    • /
    • Vol. 23, No. 3
    • /
    • pp.59-71
    • /
    • 2018
  • Social media is immensely popular among today's services. Data from social network services (SNSs) can be used for various objectives, such as text prediction or sentiment analysis. There is a great deal of Korean and English data on social media that can be used for sentiment analysis, but handling such huge amounts of unstructured data is difficult, and machine learning is needed to handle it. This research focuses on predicting Korean and English sentiment using a deep forward neural network (DNN) with a deep learning architecture and compares it with other methods, such as LDA MLP and GENSIM, using logistic regression. The findings indicate an accuracy of approximately 75% when predicting sentiment using the DNN and approximately 81% using latent Dirichlet allocation (LDA), with the corpus being approximately 64% accurate between English and Korean.

Clustering based on Dependence Tree in Massive Data Streams

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering
    • /
    • Vol. 6, No. 2
    • /
    • pp.182-186
    • /
    • 2008
  • RFID systems generate huge amounts of data quickly. The data are associated with locations, timestamps, and containment relationships. Efficient queries and updates are required for product tracking and monitoring. We propose a clustering technique for fast query processing. Our study presents state charts of temporal event flow, proposes dependence trees with data association, and uses them to cluster linked events. Our experimental evaluation shows the effectiveness of the proposed clustering technique based on dependence trees.

Receiver Operating Characteristic Analysis by Data Mining

  • 이성원;이제영
    • 한국통계학회:학술대회논문집
    • /
    • Proceedings of the Korean Statistical Society 2001 Autumn Conference
    • /
    • pp.195-197
    • /
    • 2001
  • Data mining is used to discover patterns and relationships in huge amounts of data. Researchers in many different fields have shown great interest in data mining analysis. Using the classification technique of data mining analysis, we present a model for the receiver operating characteristic (ROC) method. We show that this may help analyze the results of data mining techniques.
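
The abstract does not describe the model itself, so the following is only a minimal sketch of standard ROC analysis for a binary classifier, not the authors' method. The function names `roc_points` and `auc` and the sample labels and scores are hypothetical, chosen for illustration.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels: 1 for positive, 0 for negative; scores: higher means more positive.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical classifier output for six examples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve, auc(curve))
```

Plotting the returned points gives the ROC curve. The area under it summarizes how well the scores rank positives above negatives.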


Changes in Research Paradigms in Data Intensive Environments

  • Minsoo Park
    • International journal of advanced smart convergence
    • /
    • Vol. 12, No. 4
    • /
    • pp.98-103
    • /
    • 2023
  • As technology advanced dramatically in the late 20th century, a new era of science arrived. The emerging era of scientific discovery, variously described as e-Science, cyberscience, and the fourth paradigm, uses technologies required for computation, data curation, analysis, and visualization. The emergence of the fourth research paradigm will have such a huge impact that it will shake the foundations of science, and will also have a huge impact on the role of data-information infrastructure. In the digital age, the roles of data-information professionals are becoming more diverse. As e-Science emerges as a sustainable and growing part of research, data-information professionals and centers are exploring new roles to address the issues that arise from new forms of research. The functions that data-information professionals and centers can fundamentally provide in the e-Science area are data curation, preservation, access, and metadata. Basically, this involves discovering and using available technical infrastructure and tools, finding relevant data, establishing a data management plan, and developing tools to support research. A more advanced service is archiving and curating relevant data for long-term preservation and integration of datasets, and providing curation and data management services as part of a data management plan. Adaptation and change to the new information environment of the 21st century require strong and future-responsive leadership. There is a strong need to respond effectively to future challenges by exploring the role and function of data-information professionals in the future environment. Understanding what types of data-information professionals and skills will be needed in the future is essential to developing the talent that will lead the transformation. The new values and roles of data-information professionals and centers for 21st century researchers in STEAM are discussed.

An Algorithm for Tournament-based Big Data Analysis

  • 이현진
    • 디지털콘텐츠학회 논문지
    • /
    • Vol. 16, No. 4
    • /
    • pp.545-553
    • /
    • 2015
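  • All data has value in itself, but data collected in the real world is random and unstructured. To use such data efficiently, data transformation and analysis algorithms are applied to extract useful information from it. Data mining serves this purpose. Today, beyond a variety of data mining techniques for analyzing data, there is a need for the computational capacity to process large volumes of data efficiently and for fast analysis times. Hadoop is widely used to store large volumes of data, and the MapReduce framework is used to analyze the data stored in Hadoop. This paper proposes a tournament-based approach that improves efficiency when an algorithm that runs on a single machine is developed for the MapReduce framework. The method can be applied to a variety of algorithms, and its usefulness is demonstrated by applying it to two widely used data mining algorithms, k-means and k-nearest neighbor classification.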
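
The abstract does not spell out the tournament scheme, so the following is only one plausible reading, sketched in plain Python rather than Hadoop MapReduce. Each data partition first finds its own k nearest candidates (a map-like step), and the candidate lists are then merged pairwise in rounds, like a tournament bracket, until one list remains. All names here (`local_knn`, `merge`, `tournament_knn`) and the sample data are hypothetical.

```python
import heapq
import math


def local_knn(partition, query, k):
    """Map-like step: k nearest (distance, label) pairs within one partition."""
    scored = [(math.dist(point, query), label) for point, label in partition]
    return heapq.nsmallest(k, scored)


def merge(a, b, k):
    """One tournament match: keep the k best candidates from two lists."""
    return heapq.nsmallest(k, a + b)


def tournament_knn(partitions, query, k):
    """Merge per-partition candidates pairwise, round by round."""
    candidates = [local_knn(p, query, k) for p in partitions]
    if not candidates:
        return []
    while len(candidates) > 1:
        next_round = []
        for i in range(0, len(candidates), 2):
            pair = candidates[i:i + 2]
            # An unpaired list advances to the next round unchanged.
            next_round.append(merge(pair[0], pair[1], k) if len(pair) == 2 else pair[0])
        candidates = next_round
    return candidates[0]


# Hypothetical data split across three partitions: ((x, y), label).
partitions = [
    [((0, 0), "a"), ((5, 5), "b")],
    [((1, 0), "a"), ((9, 9), "b")],
    [((0, 2), "b")],
]
print(tournament_knn(partitions, query=(0, 0), k=2))
```

Because every match keeps only k candidates, the data passed between rounds stays small regardless of partition size, which is the property that makes this kind of merge attractive in a distributed setting.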

Design of a Smart Application using Big Data

  • 오선진
    • 한국인터넷방송통신학회논문지
    • /
    • Vol. 15, No. 6
    • /
    • pp.17-24
    • /
    • 2015
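  • With the rapid development of information technology and advanced wireless network application technology, vast amounts of data in diverse forms are being produced every moment, and the importance and value of big data analysis technology have recently been growing. Big data was once so vast that it was hard even to manage and was therefore of no use. Thanks to advances in data collection, computing equipment, and analysis tools, it can now yield new insights and value in many application areas that were impossible with small-scale data. In the real world, however, most big data still goes unused without being properly analyzed. Securing analysis technology for efficient big data processing is therefore an important prerequisite for gaining insight and creating new value from big data. In this paper, we study precise analysis methods and processing techniques that can process big data more efficiently and effectively extract the desired information of interest, and we design a smart application that applies them in practice.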

Design and Development of Electronic Attendance-absence Recording System Using Binary XML

  • 이재건;염세훈;방혜자
    • 디지털산업정보학회논문지
    • /
    • Vol. 11, No. 3
    • /
    • pp.11-19
    • /
    • 2015
  • Owing to recent developments in mobile devices, mobile device use and the number of related applications have been increasing. Most early mobile applications only displayed simple information, but they now process huge amounts of data. However, smart devices have limitations in processing massive data. In particular, as the size of the data increases, processing speed decreases, and so does program performance. If the hardware specification of the mobile device is insufficient, response time is drastically delayed. To overcome these drawbacks, most applications running on mobile devices communicate with servers to manage data. XML is a suitable language for sending and receiving data between servers and mobile devices, because it defines rules for document format and is a small, textual data format. However, mobile devices are limited in memory, CPU, and wireless network capacity for processing huge data, and XML takes considerable time to transmit between servers and devices and to process, so it can add overhead to service time. Binary XML is an alternative that improves data processing performance: it retains XML's benefits and minimizes XML size through binary coding. However, most binary XML formats used in applications are not suited to mobile applications. In this paper, we survey many kinds of binary XML and compare their merits and demerits to find a binary XML format for mobile applications. We propose how to use binary XML and implement an electronic attendance system using binary XML to overcome the limitations of XML and to reduce the data communication load between servers and devices.

Strategy for Determining the Structures of Large Biomolecules using the Torsion Angle Dynamics of CYANA

  • Jee, Jun-Goo
    • 한국자기공명학회논문지
    • /
    • Vol. 20, No. 4
    • /
    • pp.102-108
    • /
    • 2016
  • With the rapid increase of data on protein-protein interactions, the need for delineating the 3D structures of huge protein complexes has increased. The protocols for determining nuclear magnetic resonance (NMR) structure can be applied to modeling complex structures coupled with sparse experimental restraints. In this report, I suggest the use of multiple rigid bodies for improving the efficiency of NMR-assisted structure modeling of huge complexes using CYANA. By preparing a region of known structure as a new type of residue that has no torsion angle, one can facilitate the search of the conformational spaces. This method has a distinct advantage over the rigidification of a region with synthetic distance restraints, particularly for the calculation of huge molecules. I have demonstrated the idea with calculations of decaubiquitins that are linked via Lys6, Lys11, Lys27, Lys29, Lys33, Lys48, or Lys63, or head to tail. Here, the ubiquitin region consisting of residues 1-70 was treated as a rigid body with a new residue. The efficiency of the calculation was further demonstrated in Lys48-linked decaubiquitin with ambiguous distance restraints. The approach can be readily extended to either protein-protein complexes or large proteins consisting of several domains.

Stereo matching for large-scale high-resolution satellite images using new tiling technique

  • Hong, An Nguyen;Woo, Dong-Min
    • 전기전자학회논문지
    • /
    • Vol. 17, No. 4
    • /
    • pp.517-524
    • /
    • 2013
  • Stereo matching has attracted the attention of researchers because it plays an important role in computer vision, remote sensing, and photogrammetry. Although most methods perform well on small images, experiments applying them to large-scale data sets under uncontrolled conditions are still lacking. In this paper, we present an empirical study on stereo matching for large-scale high-resolution satellite images. A new method is studied to solve the problems of huge size and memory requirements when dealing with such images. By integrating the tiling technique with the well-known dynamic programming and coarse-to-fine pyramid scheme, and by using memory wisely, the suggested method can be applied to huge stereo satellite images. Analyzing 350 points from an image of size 8192 x 8192, the disparity results attain an acceptable accuracy, with an RMS error of 0.5459. Considering the trade-off between computational cost and accuracy, our method provides efficient stereo matching for huge satellite image files.

A bio-text mining system using keywords and patterns in a grid environment

  • Kwon, Hyuk-Ryul;Jung, Tae-Sung;Kim, Kyoung-Ran;Jahng, Hye-Kyoung;Cho, Wan-Sup;Yoo, Jae-Soo
    • 한국산업정보학회:학술대회논문집
    • /
    • 한국산업정보학회 2007 Spring Conference
    • /
    • pp.48-52
    • /
    • 2007
  • As a huge amount of literature including biological data has been generated in the post-genome era, it has become difficult for researchers to find useful knowledge in biological databases. Bio-text mining and related natural language processing techniques are key to intelligent knowledge retrieval from biological databases. We propose a bio-text mining technique for biologists who seek knowledge in the huge body of literature. First, a web robot is used to extract and transform related literature from remote databases. To improve retrieval speed, we generate an inverted file for the keywords in the literature. Then, a text mining system is used to extract given knowledge patterns and keywords. Finally, we construct a grid computing environment to guarantee processing speed in text mining even for huge literature databases. In an experiment on 10,000 bio-literature documents, the system shows 95% precision and 98% recall.
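
The inverted file mentioned in the abstract can be illustrated with a minimal sketch, assuming simple whitespace tokenization. The paper's actual file layout and tokenizer are not described, and the names `build_inverted_index` and `search`, along with the sample documents, are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased keyword to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for raw in text.lower().split():
            token = raw.strip(".,;:()")
            if token:  # skip tokens that were punctuation only
                index[token].add(doc_id)
    return index


def search(index, *keywords):
    """Return the IDs of documents that contain every given keyword."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


# Hypothetical miniature literature collection.
documents = {
    1: "BRCA1 binds RAD51.",
    2: "RAD51 repairs DNA",
    3: "BRCA1 mutation",
}
index = build_inverted_index(documents)
print(search(index, "BRCA1", "RAD51"))
```

A lookup touches only the posting sets of the queried keywords instead of scanning every document, which is why an inverted file speeds up retrieval over a large collection.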
