• Title/Summary/Keyword: Data Scientists

A New Support Vector Machines for Classifying Uncertain Data (불완전 데이터의 패턴 분석을 위한 $_{MI}$SVMs)

  • Kiyoung, Lee;Dae-Won, Kim;Doheon, Lee;Kwang H., Lee
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.10b
    • /
    • pp.703-705
    • /
    • 2004
  • Conventional support vector machines (SVMs) find optimal hyperplanes with maximal margins by treating all data points equally. In the real world, however, the data within a data set may differ in degree of uncertainty or importance due to noise, inaccuracies, or missing values. Hence, if all data are treated as equivalent, without considering such differences, the hyperplanes identified are likely to be suboptimal. In this paper, to more accurately identify the optimal hyperplane in a given uncertain data set, we propose a membership-induced distance from a hyperplane using membership values, and formulate three kinds of membership-induced SVMs.

XML Based Heterogeneous Sensory Data Management System (XML 기반의이기종 센서 데이터 관리 시스템)

  • Nawaz, Waqas;Fahim, Muhammad;Lee, Sung-Young;Lee, Young-Koo
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2011.06b
    • /
    • pp.305-306
    • /
    • 2011
  • Wireless sensor networks (WSNs) continuously generate large volumes of raw data that are inherently heterogeneous. These networks are normally application-specific, with no sharing or reuse of sensor data among applications. For applications and services to be developed independently of a particular network, sensor data need to be available in a more standardized form. In this paper, we propose an architecture for sensory data management. This Extensible Markup Language (XML) oriented architecture allows the sensor data to be understood and processed in a meaningful way by a variety of applications with different purposes. We developed a middle layer that transforms raw sensory data to XML and vice versa.

Issues and Empirical Results for Improving Text Classification

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.2
    • /
    • pp.150-160
    • /
    • 2011
  • Automatic text classification has a long history, and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Although much technical progress has been made, there is still room for improvement. In this paper, we discuss the remaining issues in improving text classification. Three issues are presented: automatic training data generation, noisy data treatment, and term weighting and indexing. Four studies addressing these issues, together with their empirical results, are introduced. First, a semi-supervised learning technique is applied to text classification to create training data efficiently. For effective noisy data treatment, a noisy data reduction method and a text classifier that is robust to noisy data are developed. Finally, the term weighting and indexing technique is revised by reflecting the importance of sentences in the term weight calculation using summarization techniques.

Influence of Data Preprocessing

  • Zhu, Changming;Gao, Daqi
    • Journal of Computing Science and Engineering
    • /
    • v.10 no.2
    • /
    • pp.51-57
    • /
    • 2016
  • In this paper, we study the influence of data preprocessing. We conclude that different preprocessing methods lead to different classification performance. Moreover, not all data preprocessing methods are necessary, and a criterion is given to determine which data preprocessing is necessary and which is effective. Experiments on some real-world data sets validate that different data preprocessing methods have different effects. Furthermore, experiments on several algorithms with different preprocessing methods confirm that preprocessing has a great influence on the performance of a classifier.

Korean Middle School Students' Epistemic Ideas of Claim, Data, Evidence, and Argument When Evaluating and Critiquing Arguments (한국 중학생들의 주장, 자료, 근거와 과학 논의에 대한 인식론적 이해조사)

  • Ryu, Suna
    • Journal of The Korean Association For Science Education
    • /
    • v.35 no.2
    • /
    • pp.199-208
    • /
    • 2015
  • An enhanced understanding of the nature of scientific knowledge (what counts as a scientific argument and how scientists justify their claims with evidence) has been central in Korean science instruction. However, despite its importance, scholars are generally concerned about the difficulty of both addressing and improving students' epistemic understanding, especially for young students. This study investigated Korean middle school students' epistemic ideas about claim, data, evidence, and argument when they read both text-based and data-inscription arguments. Compared with students in previous studies, Korean middle school students show a sophisticated understanding of the role of claim and evidence. Yet these students think that there is only a single way of interpreting data. When students' ideas from text-based and data-inscription arguments are compared, the majority of Korean students barely perceive text description as evidence and recognize only measured data as evidence.

e-Science Technologies in Synchrotron Radiation Beamline - Remote Access and Automation (A Case Study for High Throughput Protein Crystallography)

  • Wang Xiao Dong;Gleaves Michael;Meredith David;Allan Rob;Nave Colin
    • Macromolecular Research
    • /
    • v.14 no.2
    • /
    • pp.140-145
    • /
    • 2006
  • E-science refers to the large-scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. The Grid is a service-oriented architecture proposed to provide access to very large data collections, very large-scale computing resources, and remote facilities. Web services, which are server applications, enable online access to service providers. Web portal interfaces can further hide the complexity of accessing a facility's services. The main use of synchrotron radiation (SR) facilities by protein crystallographers is to collect the best possible diffraction data for reasonably well-defined problems. Significant effort is therefore being made throughout the world to automate SR protein crystallography facilities so that scientists can achieve high throughput, even if they are not expert in all the techniques. By applying the above technologies, the e-HTPX project, a distributed computing infrastructure, was designed to help scientists remotely plan, initiate, and monitor experiments for protein crystallographic structure determination. A description of both the hardware and the control software is given in this paper.

Applications of Ground-Based Remote Sensing for Precision Agriculture

  • Hong Soon-Dal;Schepers James S.
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • 2005.08a
    • /
    • pp.100-113
    • /
    • 2005
  • Leaf color and plant vigor are key indicators of crop health. These visual plant attributes are frequently used by greenhouse managers, producers, and consultants to make water, nutrient, and disease management decisions. Remote sensing techniques can quickly quantify soil and plant attributes, but humans are needed to translate such data into meaningful information. Over time, scientists have used reflectance data from individual wavebands to develop a series of indices that attempt to quantify things like soil organic matter content, leaf chlorophyll concentration, leaf area index, vegetative cover, amount of living biomass, and grain yield. The recent introduction of active sensors that function independently of natural light has greatly expanded the capabilities of scientists and managers to obtain useful information. Characteristics and limitations of active sensors need to be understood to optimize their use for making improved management decisions. Pot experiments involving sand culture were conducted in a greenhouse in 2003 and 2004 to evaluate corn and red pepper biomass. The rNDVI, gNDVI, and aNDVI measured by ground-based remote sensors were used for the evaluation. The results of the case study suggest that ground-based remote sensing, as a non-destructive real-time assessment of plant nitrogen status, is a useful tool for in-season crop nitrogen management, providing both spatial and temporal information.

Application of Cancer Genomics to Solve Unmet Clinical Needs

  • Lee, Se-Hoon;Sim, Sung Hoon;Kim, Ji-Yeon;Cha, SooJin;Song, Ahnah
    • Genomics & Informatics
    • /
    • v.11 no.4
    • /
    • pp.174-179
    • /
    • 2013
  • The large amount of data from cancer genome research has contributed to our understanding of cancer biology. Indeed, the genomics approach has a strong advantage for analyzing multi-factorial and complicated problems, such as cancer. It is time to think about the actual usage of cancer genomics in the clinical field. The clinical cancer field has many unmet needs in the management of cancer patients, which were defined in the pre-genomic era. Unmet clinical needs are not well known to bioinformaticians and even non-clinician cancer scientists. A personalized approach in the clinical field will bring potential additional challenges to cancer genomics, because most data to date have been population-based rather than individual-based. We can maximize the use of cancer genomics in the clinical field if cancer scientists, bioinformaticians, and clinicians think and work together in solving unmet clinical needs. In this review, we present one imaginary case of a cancer patient, with which we can think about unmet clinical needs to solve with cancer genomics in diagnosis, prediction of prognosis, monitoring of cancer status, and personalized treatment decisions.

An Assessment of ICT Infrastructure, Deployment and Applications in the Science and Technology (S&T) Research Institutions in Ghana

  • Kwafoa, Paulina Nana Yaa;Entsua-Mensah, Clement
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.11 no.1
    • /
    • pp.29-48
    • /
    • 2021
  • The paper discusses ICT infrastructure in the S&T research institutions in terms of the availability of computers, local or wide area networks, Internet connectivity and its reliability, the size of the bandwidth and its optimization, etc. It also examines the profile of the research scientists and the type of ICT infrastructure that is available for their use, as well as the reliability of the Internet connectivity within these research institutions. It looks at the broadband capacities of the research institutions and their ICT capabilities with respect to the technical and managerial support available to them. The study used the survey research method, with a questionnaire as well as personal observation, to gather the data. The data showed that the Internet connectivity and the size of the bandwidth that the R&D institutions subscribed to differed significantly. In addition, the extent to which the research scientists were able to access the Internet in their respective institutions depended on the quality of the local network in place. Generally, the investments in ICT were made for different management objectives, and these were meant to facilitate the generation of new knowledge as well as make measurable improvements in R&D activities.

A Preliminary Analysis of Observing Classroom Inquiry on a Web-based Discussion Board System

  • LEE, Soo-Young;LEE, Youngmin
    • Educational Technology International
    • /
    • v.12 no.2
    • /
    • pp.19-46
    • /
    • 2011
  • The purpose of the study was to identify the characteristics of classroom inquiry features exhibited on a web-based discussion board called the Message Board. Approximately 4,000 students from 80 schools, together with 60 on-line scientists, participated in the study. During the study, a total of 639 messages in the selected cluster were analyzed, and several patterns were identified. Three main features of classroom inquiry were analyzed: 1) the learner gives priority to evidence in responding to questions; 2) the learner formulates explanations from evidence; 3) the learner communicates and justifies explanations. The results are as follows. First, once learners identified and understood the questions posed by the curriculum, they needed to collect evidence or information in responding to the questions. Depending on the question that students were given, the types of evidence or data students needed to collect, and how to collect them, could vary. Second, the descriptions, explanations, and predictions that students formulated after summarizing evidence were observed on the Message Board. However, the extent to which students summarized evidence for descriptions, explanations, and predictions varied. In addition, students were able to make better use of evidence over time when they formulated descriptions and explanations. Third, the Message Board was designed to allow a great amount of learner self-direction. Classroom teachers and on-line scientists played an important role in providing guidance in developing inquiry. At the same time, development of content understanding also contributed to inquiry development.