• Title/Summary/Keyword: Data Definition Language


The Standard of Judgement on Plagiarism in Research Ethics and the Guideline of Global Journals for KODISA (KODISA 연구윤리의 표절 판단기준과 글로벌 학술지 가이드라인)

  • Hwang, Hee-Joong;Kim, Dong-Ho;Youn, Myoung-Kil;Lee, Jung-Wan;Lee, Jong-Ho
    • Journal of Distribution Science / v.12 no.6 / pp.15-20 / 2014
  • Purpose - In general, researchers try to abide by the code of research ethics, but many are not fully aware of plagiarism and unintentionally commit research misconduct when writing a paper. This research aims to provide researchers with a clear and easy guideline that helps them avoid accidental plagiarism by addressing the issue, and it is expected to contribute to building a climate of integrity and to encouraging creative research among scholars. Research design, data, methodology & Results - Plagiarism is considered a form of research misconduct, along with fabrication and falsification. It is defined as the improper use of another author's ideas, language, process, or results without giving appropriate credit. Plagiarism has nothing to do with examining the truth or assessing the value of research data, processes, or results; it is determined by whether a work conforms to widely accepted research ethics, including proper citation. Within academia, plagiarism goes beyond the legal boundary, encompassing any intentional wrongful appropriation of research created by another researcher. In summary, plagiarism is taking other people's creative ideas, research models, hypotheses, methods, definitions, variables, images, tables, and graphs, and using them without reasonable attribution to their true sources. There are various types of plagiarism. Some classify it into idea plagiarism, text plagiarism, mosaic plagiarism, and idea distortion. Others hold that plagiarism includes the uncredited use of another person's work without appropriate citations; self-plagiarism (using part of one's own previous research without proper citation); duplicate publication (publishing one's own previous work under a different title); and unethical citation (using quoted parts of another person's research without proper citation, as if the parts were being cited by the current author). When an author wants to cite a part that was previously drawn from another source, the author is supposed to indicate that the part is re-cited; if it is impractical to state all the sources, the author may mention the original source only. Today, various disciplines are developing their own measures to address plagiarism issues, especially duplicate publication, by requiring researchers to clearly reveal the true sources whenever they refer to other research. Conclusions - Research misconduct, including plagiarism, has broad and unclear boundaries that allow ambiguous definitions and diverse interpretations, so it is difficult for researchers to have a clear understanding of how to avoid plagiarism and how to cite others' work properly. However, if guidelines for detecting and avoiding plagiarism are developed with the characteristics of each discipline in mind (for example, the social sciences and the natural sciences might have different standards) and shared among researchers, a consensus and common understanding of the issue are likely to emerge. In particular, since duplicate publication appears more frequently than plagiarism, academic institutions will need to provide pre-warning and screening in evaluation processes in order to reduce researchers' mistakes and to prevent duplicate publication. What is critical for researchers is to clearly reveal true sources according to common citation rules and to borrow only the necessary amount of others' research.

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.25-38 / 2019
  • Selecting high-quality information that meets users' interests and needs from overflowing content is becoming ever more important. In this flood of information, attempts are being made to better reflect the user's intention in search results rather than treating an information request as a simple string, and large IT companies such as Google and Microsoft are focusing on knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance, in particular, is a field where text data analysis is expected to be useful and promising, because new information is constantly generated and the earlier the information, the more valuable it is. Automatic knowledge extraction can be effective in such areas, where the flow of information is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is hard to build corpora from different fields with the same algorithm, and it is difficult to extract good-quality triples. Second, producing labeled text data manually becomes harder as the extent and scope of knowledge grow and patterns are constantly updated. Third, performance evaluation is difficult because of the characteristics of unsupervised learning. Finally, defining the problem of automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome these limits and improve the semantic performance of stock-related information search, this study extracts knowledge entities using a neural tensor network and evaluates their performance. Unlike previous studies, the purpose here is to extract knowledge entities related to individual stock items. Various but relatively simple data-processing methods are applied in the presented model to solve the problems of previous research and to enhance the model's effectiveness. From these processes, this study has three significances. First, it offers a practical and simple automatic knowledge extraction method that can be applied directly. Second, it shows that performance evaluation is possible through a simple problem definition. Finally, the expressiveness of the knowledge is increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. For the empirical study confirming the usefulness of the presented model, experts' reports on 30 individual stocks (the top 30 items by publication frequency from May 30, 2017 to May 21, 2018) are used. The total number of reports is 5,600; 3,074 reports, about 55% of the total, are designated as the training set, and the remaining 45% as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using a neural tensor network, the same number of score functions as stocks are trained. Thus, when a new entity from the testing set appears, its score can be calculated by putting it into every score function, and the stock whose function yields the highest score is predicted as the item related to the entity (a minimal sketch of this scoring step follows this abstract). To evaluate the presented model, we confirm its predictive power and whether the score functions are well constructed by calculating the hit ratio over all reports of the testing set. As a result, the presented model shows 69.3% hit accuracy on the testing set of 2,526 reports. This hit ratio is meaningfully high despite some constraints on conducting the research. Looking at the prediction performance for each stock, only three stocks, LG ELECTRONICS, KiaMtr, and Mando, show markedly lower performance than average; this may be due to interference effects with other similar items and the generation of new knowledge. In this paper, we propose a methodology to find the key entities, or combinations of them, needed to search related information in accordance with the user's investment intention. Graph data are generated using only the named entity recognition tool and applied to the neural tensor network without a learning corpus or word vectors for the field. The empirical test confirms the effectiveness of the presented model as described above. However, some limits remain; most notably, the especially poor performance on only a few stocks shows the need for further research. Finally, through the empirical study, we confirmed that the learning method presented here can be used to match new text information semantically with the related stocks.
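The abstract does not give the paper's exact score-function formulation, so the sketch below uses the standard neural tensor network score of Socher et al. (2013) as a stand-in; the slice count k, the tanh nonlinearity, and the random parameters are illustrative assumptions, with one-hot entity vectors matching the paper's encoding.

    import numpy as np

    def ntn_score(e1, e2, W, V, b, u):
        # Standard NTN score: u^T tanh(e1^T W[1:k] e2 + V [e1; e2] + b).
        k = W.shape[0]                                      # number of tensor slices
        bilinear = np.array([e1 @ W[i] @ e2 for i in range(k)])
        linear = V @ np.concatenate([e1, e2])
        return u @ np.tanh(bilinear + linear + b)

    # One-hot entity vectors over a 100-entity vocabulary, as in the paper's setup.
    d, k = 100, 4
    rng = np.random.default_rng(0)
    W = rng.normal(size=(k, d, d), scale=0.1)
    V = rng.normal(size=(k, 2 * d), scale=0.1)
    b = np.zeros(k)
    u = rng.normal(size=k)

    e_new = np.eye(d)[7]      # one-hot vector for a newly observed entity
    e_stock = np.eye(d)[42]   # one-hot vector for an entity tied to a stock
    print(ntn_score(e_new, e_stock, W, V, b, u))

In the paper's setup, one such score function would be trained per stock, and a new entity would be assigned to the stock whose function yields the highest score.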

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the amount of available data has continued to grow. The main cause of this phenomenon is the significant increase in unstructured data, as smart devices enable users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of other information are clearly expressed in text data such as news, reports, papers, and articles, so active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. They share certain important characteristics; for example, both use text documents as input data and rely on many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case; however, if we observe that the target of the analysis is a positive or negative opinion, it can be regarded as a typical example of opinion mining. In other words, both methods are available for opinion classification, and in order to distinguish between them, a precise definition of each is needed. In this paper, we found that it is very difficult to distinguish the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion for distinguishing text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon (a minimal sketch of lexicon-based classification follows this abstract). We first established two prediction models, one based on opinion mining and the other on text mining; next, we compared the main processes used by the two models; finally, we compared their prediction accuracy on 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for documents with strong certainty was higher than for documents with weak certainty. Above all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain; additionally, the classification results can be clearly explained by means of the lexicon. This study has two limitations. First, the experimental results cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining and opinion mining for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
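Since the paper's distinguishing criterion is the use of a sentiment lexicon, a purely illustrative sketch of lexicon-based opinion classification is shown below; the lexicon entries, the naive whitespace tokenizer, and the zero decision threshold are all assumptions, standing in for the parsing and filtering steps the abstract mentions.

    # Illustrative sentiment lexicon; a real lexicon would be far larger.
    POSITIVE = {"great", "moving", "brilliant", "enjoyable"}
    NEGATIVE = {"dull", "boring", "predictable", "weak"}

    def classify(review: str) -> str:
        # Whitespace tokenization stands in for real parsing/filtering.
        tokens = review.lower().split()
        score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        return "positive" if score >= 0 else "negative"

    print(classify("a brilliant and moving film"))    # -> positive
    print(classify("dull boring and predictable"))    # -> negative

Because the lexicon is built once and reused, classification needs no per-domain training pass, which is the learning-time advantage the abstract reports.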

A study on detective story authors' style differentiation and style structure based on Text Mining (텍스트 마이닝 기법을 활용한 고전 추리 소설 작가 간 문체적 차이와 문체 구조에 대한 연구)

  • Moon, Seok Hyung;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.89-115 / 2019
  • This study was conducted to present, through data analysis, the stylistic differences between Arthur Conan Doyle and Agatha Christie, famous as writers of classical mystery novels, and further to present an analytical methodology for the study of style based on text mining. We chose mystery novels because the unique devices of classical mystery fiction carry strong stylistic characteristics, and we chose Arthur Conan Doyle and Agatha Christie, who are familiar to general readers, so that people unfamiliar with this line of research can still relate to it. The primary objective is to identify how differences manifest within the text and to interpret the effects of these differences on the reader. Accordingly, in addition to events and characters, which are key elements of mystery novels, the writers' grammatical style was included in the definition of style and analyzed. Two series and four books were selected from each writer, and the text was divided into sentences to secure data. After measuring and assigning a sentiment score to each sentence, the emotional arc over page progression was visualized as a graph, and the trend of event progression in each novel was identified under eight themes by applying topic modeling page by page. By organizing co-occurrence matrices and performing network analysis, we could visually trace changes in the relationships between characters as events progressed (a minimal sketch of such a co-occurrence matrix follows this abstract). In addition, all sentences were classified within a grammatical system based on six types of writing style to identify differences between writers and between works. This enabled us to identify not only each author's general grammatical writing style but also the stylistic characteristics inherent in their unconscious habits, and to interpret the effects of these characteristics on the reader. This series of research processes can help readers understand the context of an entire text based on a defined understanding of style, and can integrate stylistic studies that were previously conducted individually. Such understanding can also contribute to discovering and clarifying stylistic structure in unstructured data, including online text, which could in turn enable more accurate recognition of emotions and delivery of commands on interactive artificial intelligence platforms that convert voice into natural language. As attempts to analyze online texts, including new media, and to discover social phenomena and managerial value increase, this work is expected to contribute to more meaningful online text analysis and semantic interpretation. However, this study has limitations: the analysis data amount to only two series and four books per author, so the analysis was not attempted on a sufficient quantity of data; the writing characteristics applied were developed for Korean text even though the analyzed text was English; and the stylistic characteristics were limited to six types, which narrows the range of interpretation. It is also regrettable that the research analyzed classical mystery novels rather than text commonly used today, and that a wider variety of classical mystery writers was not compared. Subsequent research will attempt to increase the diversity of interpretation by taking into account a wider variety of grammatical systems and stylistic structures, and will also be applied to frequently used online text to assess its potential for interpretation. This is expected to enable the interpretation and definition of the specific structure of style, with a variety of possible uses.
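A minimal sketch of the character co-occurrence matrix behind the network analysis is shown below; the character names and the sentence-level co-occurrence window are illustrative assumptions, since the abstract does not specify the exact windowing.

    from collections import Counter
    from itertools import combinations

    CHARACTERS = ["holmes", "watson", "lestrade"]   # hypothetical character list

    def cooccurrence(sentences):
        # Count how often each pair of characters appears in the same sentence;
        # the pair counts become edge weights in the character network.
        counts = Counter()
        for s in sentences:
            present = sorted({c for c in CHARACTERS if c in s.lower()})
            for a, b in combinations(present, 2):
                counts[(a, b)] += 1
        return counts

    sents = ["Holmes nodded at Watson.",
             "Lestrade questioned Holmes and Watson."]
    print(cooccurrence(sents))
    # Counter({('holmes', 'watson'): 2, ('holmes', 'lestrade'): 1,
    #          ('lestrade', 'watson'): 1})

Computing such a matrix per chapter or page range, rather than over the whole book, is what lets the relationship graph be tracked as events progress.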

Development of Automated Region of Interest for the Evaluation of Renal Scintigraphy : Study on the Inter-operator Variability (신장 핵의학 영상의 정량적 분석을 위한 관심영역 자동설정 기능 개발 및 사용자별 분석결과의 변화도 감소효과 분석)

  • Lee, Hyoung-Koo;Song, Ju-Young;Suh, Tae-Suk;Choe, Bo-Young;Shin, Kyung-Sub
    • Progress in Medical Physics / v.12 no.1 / pp.41-50 / 2001
  • Quantitative analysis of renal scintigraphy is strongly affected by the location, shape, and size of the region of interest (ROI). When ROIs are drawn manually, they are not reproducible because of each operator's subjective judgment, which may lead to inconsistent results even when the same data are analyzed. In this study, the effect of ROI variation on the analysis of renal scintigraphy with manually drawn ROIs was investigated; to obtain more consistent results, methods for automated ROI definition were developed and the results of applying them were analyzed. Relative renal function, glomerular filtration rate, and mean transit time were selected as clinical parameters for analyzing the effect of the ROI, and the analysis tools were implemented in the programming language IDL 5.2. To obtain the renal scintigraphy data, 99mTc-DTPA was injected into 11 normal adults, and nine operators performed the analyses to study inter-operator variability. A threshold calculated from pixel gradient values and a border-tracing technique were used to define the renal ROI, after which the background ROI and aorta ROI were defined automatically using anatomical information and pixel values (a minimal sketch of the threshold step follows this abstract). The automatic methods for defining the renal ROI were classified into four groups according to the degree to which operator subjectivity was excluded. These automatic methods reduced inter-operator variability remarkably in comparison with the manual method and proved to be an effective tool for obtaining reasonable and consistent results in the quantitative analysis of renal scintigraphy.
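The original analysis tools were written in IDL 5.2; the NumPy sketch below is a loose translation of the gradient-based threshold idea described above, and the gradient-weighted averaging is an assumption about how pixel gradient values enter the threshold calculation.

    import numpy as np

    def gradient_threshold(image: np.ndarray) -> float:
        # Weight each pixel's value by its gradient magnitude, so the threshold
        # settles near the organ border, where intensity changes fastest.
        gy, gx = np.gradient(image.astype(float))
        grad = np.hypot(gx, gy)
        return float((image * grad).sum() / grad.sum())

    def renal_roi(image: np.ndarray) -> np.ndarray:
        # Binary ROI mask: pixels at or above the gradient-weighted threshold.
        return image >= gradient_threshold(image)

    frame = np.random.poisson(5.0, size=(128, 128)).astype(float)  # toy count image
    mask = renal_roi(frame)
    print(int(mask.sum()), "pixels inside the automated ROI")

Because the threshold is derived from the image itself rather than drawn by hand, every operator applying it to the same data obtains the same ROI, which is the source of the reduced inter-operator variability the abstract reports.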
