• Title/Summary/Keyword: 분류계수 (classification coefficient)

Detection of Pine Wilt Disease tree Using High Resolution Aerial Photographs - A Case Study of Kangwon National University Research Forest - (시계열 고해상도 항공영상을 이용한 소나무재선충병 감염목 탐지 - 강원대학교 학술림 일원을 대상으로 -)

  • PARK, Jeong-Mook;CHOI, In-Gyu;LEE, Jung-Soo
    • Journal of the Korean Association of Geographic Information Studies / v.22 no.2 / pp.36-49 / 2019
  • The objectives of this study were to extract "Field Survey Based Infection Tree of Pine Wilt Disease (FSB_ITPWD)" and "Object Classification Based Infection Tree of Pine Wilt Disease (OCB_ITPWD)" from the Research Forest at Kangwon National University, and to evaluate the spatial distribution characteristics and occurrence intensity of trees infested by pine wood nematode. The optimum weights for object-based classification (OCB) were 11 for Scale, 0.1 for Shape, 0.9 for Color, 0.9 for Compactness, and 0.1 for Smoothness. The overall classification accuracy was approximately 94%, and the Kappa coefficient was 0.85, which is very high. The OCB_ITPWD area was approximately 2.4ha, approximately 0.05% of the total area. When the stand structure, distribution characteristics, and topographic and geographic factors of OCB_ITPWD and FSB_ITPWD were compared, age class IV was the most abundant age class in both FSB_ITPWD (approximately 55%) and OCB_ITPWD (approximately 44%); the latter was 11 percentage points lower than the former. The diameter at breast height (DBH, at 1.2m from the ground) results showed that trees in the (below 14cm) and (below 28cm) DBH classes were the majority (approximately 93%) in OCB_ITPWD, while trees in the medium and (more than 30cm) DBH classes were the majority (approximately 87%) in FSB_ITPWD, indicating different DBH distributions. In terms of elevation, OCB_ITPWD was distributed mostly between 401 and 500m (approximately 30%), while FSB_ITPWD was distributed mostly between 301 and 400m (approximately 45%). Additionally, in terms of accessibility from the forest road, the "100m or less" class had the highest share for both OCB_ITPWD (24%) and FSB_ITPWD (31%), indicating that more trees were infected when a stand was closer to a forest road with higher accessibility. OCB_ITPWD hotspots were in compartments 31 and 32, and OCB_ITPWD was concentrated in areas with a higher age class and a higher DBH class.
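
For reference, the two accuracy figures quoted above (overall accuracy and the Kappa coefficient) can both be computed from a confusion matrix. The following is a minimal sketch, not the study's procedure: the function name and the 2x2 counts are hypothetical and serve only to illustrate the calculation.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] = number of samples of reference class i assigned to class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (infected vs. non-infected), for illustration only.
example = [[45, 5], [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same matrix.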

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research is classified into studies using structured data and studies using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually used technical analysis and fundamental analysis. In the big data era, the amount of information has rapidly increased, and artificial intelligence methodology that can find meaning by quantifying text, an unstructured data type that accounts for a large share of information, has developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The stock price prediction methodology adopted in many papers is to forecast a stock price with the news of the target company. However, according to previous research, not only news of a target company but also news of related companies can affect its stock price. Finding a highly relevant company is not easy, however, because of market-wide impact and random signals. Thus, existing studies have identified highly relevant companies based primarily on pre-determined international industry classification standards. According to recent research, however, homogeneity within sectors of the Global Industry Classification Standard varies, so forecasting stock prices with all companies in a sector taken together, rather than with only the relevant companies, can adversely affect predictive performance. To overcome this limitation, we first used random matrix theory with text mining for stock prediction. When the dimension of data is large, the classical limit theorems are no longer suitable, because statistical efficiency is reduced. 
Therefore, a simple correlation analysis in the financial market does not reflect the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering analysis, we then used a multiple kernel learning algorithm, which is an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices with features of the financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory shows better performance than previous studies when cluster analysis is performed on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt physics theory. This extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stressed that finding the right companies in the stock market is an important issue. This suggests that it is important not only to study artificial intelligence algorithms but also to study how to adjust the input values on a theoretical basis. 
Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) might have low relevance, and suggested that relevance needs to be defined theoretically rather than simply taken from the GICS.
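
The abstract does not spell out its filtering procedure. As a hedged illustration of the general idea, the sketch below applies a common random-matrix-theory recipe: eigenvalues of the correlation matrix below the Marchenko-Pastur upper bound are treated as noise, the largest eigenvalue is treated as the market-wide mode, and only the remaining modes are kept. The function name, the synthetic returns, and the choice to drop exactly one market mode are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def rmt_filtered_correlation(returns):
    """Filter a correlation matrix with the Marchenko-Pastur bound.

    returns: array of shape (T, N), T observations of N assets.
    Returns (filtered matrix, Marchenko-Pastur upper bound).
    """
    T, N = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    # Largest eigenvalue expected from pure noise with unit variance.
    lambda_max = (1.0 + np.sqrt(N / T)) ** 2
    keep = eigvals > lambda_max   # modes above the noise band
    keep[-1] = False              # assumption: the top mode is the market mode
    kept_vecs = eigvecs[:, keep]
    filtered = (kept_vecs * eigvals[keep]) @ kept_vecs.T
    return filtered, lambda_max


# Synthetic example: one market factor plus two group factors (hypothetical data).
rng = np.random.default_rng(0)
T, N = 500, 20
market = rng.standard_normal((T, 1))
group_a = np.repeat(rng.standard_normal((T, 1)), 10, axis=1)
group_b = np.repeat(rng.standard_normal((T, 1)), 10, axis=1)
returns = market + np.hstack([group_a, group_b]) + rng.standard_normal((T, N))

filtered, bound = rmt_filtered_correlation(returns)
print("Marchenko-Pastur upper bound:", bound)
print("mean within-group value:", filtered[:10, :10].mean())
print("mean cross-group value:", filtered[:10, 10:].mean())
```

The filtered matrix is no longer a correlation matrix in the strict sense (its diagonal is not 1); it serves as a similarity input for a clustering step such as the one described above.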

Influence of Microcracks in Geochang Granite on Brazilian Tensile Strength (거창화강암의 미세균열이 압열인장강도에 미치는 영향)

  • Park, Deok-Won
    • Korean Journal of Mineralogy and Petrology / v.34 no.3 / pp.193-208 / 2021
  • The characteristics of the microcrack lengths(①), microcrack spacings(②) and Brazilian tensile strengths(③) related to the six directions of rock cleavages(H2~R1) in Geochang granite were analyzed. First, the 18 cumulative graphs for the above three major factors representing unique characteristics of the rock cleavages were made. Through the general chart for these graphs classified into three planes and three rock cleavages, the 28 parameters on the length, spacing and Brazilian tensile strength have been determined. The results of correlation analysis among these parameters are summarized as follows. Second, the above parameters were classified into six groups(I~VI) according to the sorting order on the magnitude of parameter values among three rock cleavages and three planes. The values of parameters belonging to group I and II are in order of R(rift) < G(grain) < H(hardway) and H < G < R. The values of the 8 parameters on the length of line(os2, 𝚫s, 𝚫L and oSmean), the exponent(λLmean and λSmean), the slope(amean) and the anisotropy coefficient (Anmean) are in order of R < G < H and H'(hardway plane) < G'(grain plane) < R'(rift plane). Third, the noticeable differences in distribution patterns among the six types of charts for three planes and three rock cleavages are as follows. From the chart for three planes, the values of 𝚫L, 𝚫s and 𝚫σt, corresponding to the distance between two points where the two fitting lines meet on the X-axis, increase in the order of R' < H' < G'. In particular, the two graphs of R2 and G2 related to the length and Brazilian tensile strength are almost parallel to each other and show the distribution characteristics of hardway plane. Among the graphs related to the Brazilian tensile strength, the overall shape for hardway plane is similar to that for grain. 
From the chart for three rock cleavages, the slopes of the graphs related to the length increase in the order of R < G < H, while those of the graphs related to the spacing and Brazilian tensile strength decrease in the order of R < G < H. Lastly, the characteristics of variation among the six rock cleavages, the three planes and the three rock cleavages were visualized through the correlation chart among the above parameters from this study.

Development of nutrition quotient for elementary school children to evaluate dietary quality and eating behaviors (학령기 아동 대상 영양지수 개발과 타당도 검증)

  • Lee, Jung-Sug;Hwang, Ji-Yun;Kwon, Sehyug;Chung, Hae-Rang;Kwak, Tong-Kyung;Kang, Myung-Hee;Choi, Young-Sun;Kim, Hye-Young
    • Journal of Nutrition and Health / v.53 no.6 / pp.629-647 / 2020
  • Purpose: This study was undertaken to develop a nutrition quotient for elementary school children (NQ-C) for evaluating overall dietary quality and eating behaviors. Methods: The NQ-C was developed in 3 stages: item generation, item reduction, and validation. Candidate food behavior checklist (FBC) items of the NQ-C were derived from systematic literature reviews, expert in-depth interviews, statistical analyses of the fifth Korean National Health and Nutrition Examination Survey data, and national nutrition policies and recommendations. For the pilot survey, 260 elementary school students (128 second graders and 132 fifth graders) completed self-administered questionnaires as well as 24-hour dietary intakes, with the help of their parents and survey team staff if required. Based on the pilot survey results, expert reviews, and priorities of national nutrition policy and recommendations, checklist items were reduced from 41 to 24. A total of 20 items for the NQ-C were finally selected from the results of a nationwide survey of 1,144 samples. Construct validity of the NQ-C was assessed using confirmatory factor analysis with LInear Structural RELations (LISREL). Results: Exploratory factor analysis of the NQ-C identified 5 dimensions of diet (balance, diversity, moderation, practice and environment) that accounted for 46.2% of the total variance. Standardized path coefficients were used as weights of the items. The NQ-C and 5-factor scores of the subjects were calculated using the obtained weights of the FBC items. Conclusion: Our data indicate that the NQ-C is a useful and suitable instrument for assessing nutrition adequacy, dietary quality, and eating behaviors of Korean elementary school children.

Verification of International Trends and Applicability in the Republic of Korea for a Greenhouse Gas Inventory in the Grassland Biomass Sector (초지 바이오매스 부문 온실가스 인벤토리 구축을 위한 국제 동향과 국내 적용 가능성 평가)

  • Sle-gee Lee;Jeong-Gwan Lee;Hyun-Jun Kim
    • Journal of The Korean Society of Grassland and Forage Science / v.43 no.4 / pp.257-267 / 2023
  • Although grassland is considered one of the carbon stores in terrestrial ecosystems, the grassland section of the greenhouse gas inventory is limited by a lack of review and verification of biomass compared with soil organic carbon. Given that, both domestically and internationally, the calculation of greenhouse gas inventories is being upgraded to methods with higher scientific accuracy, research on standards and methods for calculating carbon accumulation in grassland biomass is required. The purpose of this study was to identify international trends in calculation methods for the grassland biomass sector that meet the Tier 2 method and to review variables applicable to the Republic of Korea. We identified the estimation methods and approach levels for grassland biomass from the National Inventory Reports submitted under the United Nations Framework Convention on Climate Change and categorized the main implications derived from overseas cases. In addition, a field survey was conducted on 28 grasslands in the Republic of Korea to analyse the applicability of the major issues. Four major international issues regarding grassland biomass were identified: 1) country-specific coefficients by land use; 2) calculations on woody plants; 3) loss and recovery due to wildfire; 4) amount of change due to human activities. Field surveys and analysis of the activity data available domestically showed a significant difference in the amount of carbon in biomass according to use type classification and climate zone-soil type classification. Therefore, in order to create an inventory of grassland biomass at the Tier 2 level, a policy and institutional system for producing activity data should be established, and country-specific coefficients for climate zones and soil types should be developed.

Studies on Grouping of the Varieties by Plant Type and their Ecological Variation for Peanut(Arachis hypogaea L.) (땅콩의 초형을 주로한 품종군분류 및 그들의 생태적 변이에 관한 연구)

  • Eun-Sup Lee
    • KOREAN JOURNAL OF CROP SCIENCE / v.18 / pp.124-155 / 1975
  • To obtain fundamental information for the varietal improvement of peanut and to study the ecological variation of important agronomic traits and the relationships among the traits studied, 489 introduced varieties were classified on the basis of their morphological and ecological differences at the Crop Experiment Station, Suweon, in 1968. A second study, conducted at the same location in 1969, investigated the ecological variation of the classified varietal groups under 5 different seeding dates from April 16 to July 7 at twenty-day intervals. The results obtained are summarized as follows: 1. The peanut varieties tested were classified into Spanish, Virginia Erect, Virginia Runner, Southeast Runner, Valencia and Semirunner on the basis of plant type, flowering time, number of grains per pod and grain size. 2. The characteristics of the classified varietal groups are as follows. (1) Spanish: erect, small grained and early maturing type. (2) Virginia Erect: erect, large grained and late flowering type. (3) Virginia Runner: runner, large grained and late maturing type. (4) Southeast Runner: runner, small grained and early maturing type. (5) Valencia: erect, small grained and early flowering type with 3-4 grains per pod. (6) Semirunner: semirunner, large grained and late flowering type. 3. The flowering period in each varietal group was consistently shortened by delayed seeding, and the degree of shortening was greater in the late flowering varietal groups. 4. The number of branches per plant generally decreased with later seeding in each group. However, Spanish and Virginia Runner had fewer branches in the first seeding than in the second, and the lowest number of branches was found in Spanish and the highest in Virginia Erect at all seeding dates. 5. 
Shelling ratio was high in Spanish and Southeast Runner at every seeding date and decreased remarkably with seeding after May. 6. The number of pods per plant in all varietal groups decreased remarkably with delayed seeding, and the decrease was greater in the large grain varietal groups. 7. Pod weight per plant was higher in the second seeding than in the first and decreased markedly in all seedings after the second. Among the cultivars tested, Southeast Runner had the highest pod weight per plant while Virginia Runner had the lowest. 8. Grain number per plant showed a tendency similar to that of pod weight per plant but was low in the large grain groups and high in the small grain groups at all seeding dates. 9. 100-grain weight was heaviest in the second seeding and decreased remarkably in the seedings after the second and also in the first seeding. 10. Yield per 10a varied considerably with seeding date in all groups. The yield increased in the second seeding (May 7) and decreased in the others. 11. The lengths of the main stem and branches were exceptionally short in the first seeding compared to the second in Spanish, while in the other varieties they tended to be the same between these two seeding dates; both traits decreased strikingly in all seedings after the second. This tendency strongly suggests the importance of environmental effects on peanut growth, as reflected in the changes due to different seeding dates. 12. Highly significant positive correlations were shown between yield and yield components such as pod weight per plant, 100-grain weight and the number of grains per plant in all varietal groups except Virginia Runner. However, the other characters were hardly correlated with yield, and differences in correlation coefficients among the seeding dates were found. 13. 
Path coefficients estimated for yield components to yield were higher for number of grains per plant, pod weight per plant and 100-grain weight in terms of direct effect, and those of the other components were negligible in all varietal groups. 14. Estimated heritabilities were generally high for pod number per plant, shelling ratio, 100-grain weight and number of grains per pod, and relatively low for the other traits.

STUDIES ON AVIAN VISCERAL LYMPHOMATOSIS I. THE INCREASED INCIDENSE AMONG CHICKEN FLOCKS AND PATHOLOGIC PICTURES (장기형임파종증(臟器型淋巴腫症)에 관(關)한 연구(硏究) 1. 계군(鷄群)에서의 임파종증(淋巴腫症)의 발생(發生) 및 병리학적소견(病理學的所見))

  • Kim, Uh Ho;Lim, Chang Hyeong
    • Korean Journal of Veterinary Research / v.4 no.1 / pp.35-42 / 1964
  • 1). An analysis was made of 3,500 postmortem diagnoses for the three years 1961 through 1963 to determine whether there was any actual incidence of avian visceral lymphomatosis in the field. Autopsied chickens that showed gross alterations accounted for 7.6 percent, or 266 cases. The diminished incidence of the disease in the second and third years seemed due to the year-by-year decrease in the total number of chicken flocks caused by difficult feed supply. 2). Because the breeds and lines of the chickens autopsied in this study were not clearly known, no distinct data on the incidence in various breeds were obtained. Some identified breeds were present in numbers too small to have any statistical significance. Surprisingly, no types of avian leukosis other than visceral lymphomatosis were observed in any appreciable number in this analysis. 3). Pathologic analysis of affected organs was made grossly and microscopically. In the gross pictures, the liver, spleen, kidney, ovary, and in some cases the intestine principally showed lesions, but the manifestation varied among organs. Among these organs, livers were affected most frequently, followed by spleens. The organs were classified and arranged according to the gross alterations: one-half of the livers were of the diffuse variety, one-fourth nodular, about one-seventh mixed, followed by the granular variety. Among the spleen samples, two-thirds were of the diffuse variety, one-fourth nodular, and only three cases follicular. Ovaries showed mostly follicular lesions; diffuse lesions accounted for less than one-fifth of the total specimens. Kidney lesions were almost all of the diffuse variety, and the intestine showed only nodular tumors. Microscopically, 42 cases of visceral lymphomatosis, comprising 24 livers, 10 spleens, 3 kidneys, 3 intestines and 2 ovaries, were examined. The tumor cells were lymphoid cells varying in size, shape and stainability. Mitotic figures were usually present. 
The proportion of the component cells varied among cases, and there were variations in the distribution of the tumor cells. The types of distribution were classified according to the standard proposed by Horiuchi as nodular, infiltrative and diffuse proliferation. In cases of visceral lymphomatosis of the livers and the spleens, the infiltrative, nodular and diffuse types of proliferation could all be distinguished. In the kidneys, the diffuse and nodular types of proliferation were observed. In the intestines and the ovaries, the infiltrative and diffuse types of proliferation were observed, respectively.

The Influence Evaluation of $^{201}Tl$ Myocardial Perfusion SPECT Image According to the Elapsed Time Difference after the Whole Body Bone Scan (전신 뼈 스캔 후 경과 시간 차이에 따른 $^{201}Tl$ 심근관류 SPECT 영상의 영향 평가)

  • Kim, Dong-Seok;Yoo, Hee-Jae;Ryu, Jae-Kwang;Yoo, Jae-Sook
    • The Korean Journal of Nuclear Medicine Technology / v.14 no.1 / pp.67-72 / 2010
  • Purpose: At Asan Medical Center we perform myocardial perfusion SPECT to evaluate the cardiac event risk of non-cardiac surgery patients. For patients with cancer, we check for tumor metastasis using a whole body bone scan and a whole body PET scan and then perform myocardial perfusion SPECT, in order to reduce unnecessary exams. For short-term inpatients, we perform $^{201}Tl$ myocardial perfusion SPECT a minimum of 16 hours after the whole body bone scan in order to reduce the hospitalization period, but the effect of crosstalk contamination caused by administering two different isotopes has not been properly evaluated. In our experiments, we therefore evaluated the influence of crosstalk contamination on $^{201}Tl$ myocardial perfusion SPECT using an anthropomorphic torso phantom and patient data. Materials and Methods: From August to September 2009, we analyzed 87 patients who underwent $^{201}Tl$ myocardial perfusion SPECT. Patients were classified according to whether a whole body bone scan had been performed the day before $^{201}Tl$ myocardial perfusion SPECT. The image data were acquired using dual energy windows in $^{201}Tl$ myocardial perfusion SPECT. We analyzed the ratio of $^{201}Tl$ to $^{99m}Tc$ counts in the image data of each patient group. For the phantom experiment, we used an anthropomorphic torso phantom with $^{201}Tl$ 14.8 MBq (0.4 mCi) in the myocardium and $^{99m}Tc$ 44.4 MBq (1.2 mCi) in the extracardiac region. We acquired images by $^{201}Tl$ myocardial perfusion SPECT without gating and analyzed spatial resolution using Xeleris ver 2.0551. 
Results: Comparing the counts in the $^{201}Tl$ window with those in the $^{99m}Tc$ window for patients who had a whole body bone scan the day before, the ratio decreased exponentially with elapsed time after the bone tracer injection, from 1:0.411 with Ventri (GE Healthcare, Wisconsin, USA) and 1:0.79 with Infinia (GE Healthcare, Wisconsin, USA) at 12 hours to 1:0.114 with Ventri and 1:0.249 with Infinia at 24 hours (Ventri p=0.001, Infinia p=0.001). Moreover, the ratio in patients who did not undergo a whole body bone scan averaged 1:$0.067{\pm}0.6$ with Ventri and 1:$0.063{\pm}0.7$ with Infinia. In the phantom experiment, the measured spatial resolution (FWHM) showed no change with the presence or absence of $^{99m}Tc$ or with the time elapsed after its administration (p=0.134). Conclusion: Through experiments using an anthropomorphic torso phantom and patient data, we confirmed that $^{201}Tl$ myocardial perfusion SPECT images acquired 16 hours after the bone tracer injection are not notably affected in spatial resolution by $^{99m}Tc$. However, this investigation addressed only image quality, so further investigation of patient radiation dose and of exam accuracy and precision is needed. An exact guideline on the exam interval should be established through accurate, standardized validation tests of the effect of crosstalk contamination from the use of different isotopes.
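
One component of the time dependence described above is the physical decay of $^{99m}Tc$. The sketch below computes the fraction of $^{99m}Tc$ activity remaining after a given delay. It is an illustration only: the half-life of about 6.01 hours is the standard physical value and is not taken from the abstract, the function name is hypothetical, and biological clearance, energy-window settings and detector response are all ignored, so the output is not comparable to the count ratios reported in the study.

```python
TC99M_HALF_LIFE_H = 6.01  # physical half-life of 99mTc in hours (standard value)


def remaining_fraction(elapsed_hours, half_life_hours=TC99M_HALF_LIFE_H):
    """Fraction of the initial activity left after elapsed_hours of physical decay."""
    return 0.5 ** (elapsed_hours / half_life_hours)


for hours in (12, 16, 24):
    fraction = remaining_fraction(hours)
    print(f"{hours:>2} h after injection: {fraction:.3f} of the 99mTc activity remains")
```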

Analysis of HBeAg and HBV DNA Detection in Hepatitis B Patients Treated with Antiviral Therapy (항 바이러스 치료중인 B형 간염환자에서 HBeAg 및 HBV DNA 검출에 관한 분석)

  • Cheon, Jun Hong;Chae, Hong Ju;Park, Mi Sun;Lim, Soo Yeon;Yoo, Seon Hee;Lee, Sun Ho
    • The Korean Journal of Nuclear Medicine Technology / v.23 no.1 / pp.35-39 / 2019
  • Purpose: Hepatitis B virus (HBV) infection is a major worldwide public health problem and is known as a major cause of chronic hepatitis, liver cirrhosis and liver cancer. Serologic tests for hepatitis B virus are essential for diagnosing and treating these diseases. In addition, with the development of molecular diagnostics, the detection of HBV DNA in serum is used to diagnose HBV infection and is recognized as an important indicator for assessing the response to antiviral treatment. We performed HBeAg assays using immunoradiometric assay (IRMA) and chemiluminescent microparticle immunoassay (CMIA) in hepatitis B patients treated with antiviral agents. The detection rate of HBV DNA in serum was measured by the real-time polymerase chain reaction (RT-PCR) method and compared. Materials and Methods: HBeAg serum tests and HBV DNA quantification tests were conducted on 270 hepatitis B patients undergoing antiviral treatment after diagnosis of HBV infection. Two serologic tests (IRMA, CMIA) with different detection principles were applied for the HBeAg serum test. Serum HBV DNA was quantitatively measured by real-time polymerase chain reaction (RT-PCR) using the Abbott m2000 System. Results: The detection rate of HBeAg was 24.1% (65/270) for IRMA and 82.2% (222/270) for CMIA. The detection rate of serum HBV DNA by RT-PCR was 29.3% (79/270). The measured serum HBV DNA concentration was $4.8{\times}10^7{\pm}1.9{\times}10^8IU/mL$ ($mean{\pm}SD$). The minimum value was 16IU/mL, the maximum value was $1.0{\times}10^9IU/mL$, and the quantitative detection limit was 15IU/mL. The detection rates and concentrations of HBV DNA by group according to the results of the HBeAg serological tests (IRMA, CMIA) were as follows. 
1) Group I (IRMA negative, CMIA positive, N = 169): HBV DNA detection rate 17.7% (30/169), $6.8{\times}10^5{\pm}1.9{\times}10^6IU/mL$; 2) Group II (IRMA positive, CMIA positive, N = 53): HBV DNA detection rate 62.3% (33/53), $1.1{\times}10^8{\pm}2.8{\times}10^8IU/mL$; 3) Group III (IRMA negative, CMIA negative, N = 36): HBV DNA detection rate 36.1% (13/36), $3.0{\times}10^5{\pm}1.1{\times}10^6IU/mL$; 4) Group IV (IRMA positive, CMIA negative, N = 12): HBV DNA detection rate 25% (3/12), $1.3{\times}10^3{\pm}1.1{\times}10^3IU/mL$. Conclusion: HBeAg detection rates differed greatly between the serological tests. This difference may be due to a number of factors, such as the characteristics of the antibody used in the assay kit, the epitope, and the HBV genotype. Comparing HBV DNA detection rates and concentrations across the groups classified by serologic results confirmed that both were highest in Group II (IRMA positive, CMIA positive, N = 53).
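
The four groups partition the 270 patients, so the overall figures in the Results can be rebuilt from the group counts. The short sketch below does that tally using only numbers quoted in the abstract; the dictionary layout and variable names are illustrative choices, and percentages computed this way may differ from the printed ones in the last digit because of rounding.

```python
# Group counts as reported in the abstract: patients (n) and HBV DNA positives.
groups = {
    "I":   {"irma": False, "cmia": True,  "n": 169, "dna_pos": 30},
    "II":  {"irma": True,  "cmia": True,  "n": 53,  "dna_pos": 33},
    "III": {"irma": False, "cmia": False, "n": 36,  "dna_pos": 13},
    "IV":  {"irma": True,  "cmia": False, "n": 12,  "dna_pos": 3},
}

total = sum(g["n"] for g in groups.values())
irma_pos = sum(g["n"] for g in groups.values() if g["irma"])
cmia_pos = sum(g["n"] for g in groups.values() if g["cmia"])
dna_pos = sum(g["dna_pos"] for g in groups.values())

print(f"patients: {total}")
print(f"HBeAg positive by IRMA: {irma_pos}/{total} = {100 * irma_pos / total:.1f}%")
print(f"HBeAg positive by CMIA: {cmia_pos}/{total} = {100 * cmia_pos / total:.1f}%")
print(f"HBV DNA detected: {dna_pos}/{total} = {100 * dna_pos / total:.1f}%")
for name, g in groups.items():
    print(f"Group {name}: {g['dna_pos']}/{g['n']} = {100 * g['dna_pos'] / g['n']:.1f}%")
```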

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for product-level market information. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than those previously offered. We applied the Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into multidimensional vector space by Word2Vec training. 
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to KSIC index words were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied for the first time to market size estimation, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order to the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. 
Also, the method of product group clustering can be changed to other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
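
The grouping-and-summing step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors are hand-made stand-ins for trained Word2Vec embeddings (the study reports 300 dimensions), and the product names, sales figures, threshold, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def estimate_market_size(index_vector, product_vectors, sales, threshold):
    """Group products similar to an index word and sum their sales.

    Returns (list of product names in the group, summed sales).
    """
    members = [
        name
        for name, vector in product_vectors.items()
        if cosine_similarity(index_vector, vector) >= threshold
    ]
    return members, sum(sales[name] for name in members)


# Hypothetical toy embeddings and sales figures, for illustration only.
product_vectors = {
    "laptop computer": [0.9, 0.1, 0.0],
    "notebook pc": [0.8, 0.2, 0.1],
    "instant noodles": [0.0, 0.1, 0.9],
}
sales = {"laptop computer": 120.0, "notebook pc": 80.0, "instant noodles": 50.0}
index_vector = [1.0, 0.0, 0.0]  # stand-in for the embedding of an index word

members, market_size = estimate_market_size(
    index_vector, product_vectors, sales, threshold=0.8
)
print(members, market_size)
```

Raising the threshold narrows the product group and lowering it widens the group, which corresponds to the adjustable market category level mentioned in the abstract.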