Derivation of Digital Music's Ranking Change Through Time Series Clustering (시계열 군집분석을 통한 디지털 음원의 순위 변화 패턴 분류)
- Journal of Intelligence and Information Systems, v.26 no.3, pp.171-191, 2020
This study focused on digital music, one of the most valuable cultural assets in modern society and one that occupies a particularly important position in the flow of the Korean Wave. Digital music data were collected based on the "Gaon Chart," a well-established music chart in Korea, and the ranking changes of the songs that entered the chart over 73 weeks were recorded. Patterns with similar characteristics were then derived through time series cluster analysis, and a descriptive analysis was performed on the notable features of each pattern. The research process of this study is as follows. First, in the data collection stage, time series data were gathered to track the ranking change of each piece of digital music. In the data processing stage, the collected data were matched with the rankings over time, and the music title and artist name were processed. The analysis was then performed sequentially in two stages, an exploratory analysis followed by an explanatory analysis. The data collection period was limited to the period before the 'music bulk buying phenomenon', a reliability issue related to music rankings in Korea. Specifically, it covers 73 weeks, from the first week of December 31, 2017 to January 6, 2018 through the week of May 19 to May 25, 2019. The analysis targets were limited to digital music released in Korea. In particular, digital music was collected based on the "Gaon Chart", a well-known music chart in Korea. Unlike the private music charts serviced in Korea, the Gaon Chart is approved by government agencies and has basic reliability; it can therefore be considered more publicly trustworthy than the ranking information provided by other services. The contents of the collected data are as follows: for every song that entered the top 100 of the chart within the collection period, the week and ranking, the name of the music, the name of the artist, the name of the album, the Gaon index, the production company, and the distribution company were collected. Through data collection, 7,300 chart entries in the top 100 were identified over the 73 weeks. Because a song is frequently included in the chart for two weeks or more, duplicate songs were removed in a pre-processing step: the number and location of the duplicated songs were checked with a duplicate check function, and the duplicates were deleted to form the analysis data. This yielded a list of 742 unique songs for analysis out of the 7,300 chart entries. A total of 16 patterns were then derived through time series cluster analysis of the ranking changes. Based on these patterns, two representative patterns were identified: 'Steady Seller' and 'One-Hit Wonder'. Furthermore, the two patterns were subdivided into five patterns in consideration of the survival period of the music on the chart and the music ranking. The important characteristics of each pattern are as follows. First, the artist's superstar effect and the bandwagon effect were strong in the one-hit-wonder pattern; when consumers choose digital music, they are strongly influenced by the superstar effect and the bandwagon effect.
Second, the Steady Seller pattern identified the music that has been chosen by consumers for a very long time, and revealed the patterns of the most frequently selected music in terms of consumer needs. Contrary to popular belief, the steady seller (mid-term) pattern, not the one-hit-wonder pattern, received the most choices from consumers. Particularly noteworthy is that the 'Climbing the Chart' phenomenon, which runs contrary to the typical pattern, was confirmed within the steady-seller pattern. This study focuses on the change in the ranking of music over time, a relatively neglected field, centering on digital music. In addition, a new approach to music research was attempted by subdividing the patterns of ranking change rather than predicting the success and ranking of music.
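The abstract does not state which clustering algorithm or distance measure was used, so the following is only a minimal sketch of the general idea: cluster fixed-length weekly rank trajectories with scikit-learn's KMeans. The padding value, the toy trajectories, and the cluster count used here are assumptions, not the paper's settings.

    # Minimal sketch: cluster weekly chart-rank trajectories into ranking-change patterns.
    # Each song's trajectory is padded to a fixed length with rank 101 ("off the chart").
    import numpy as np
    from sklearn.cluster import KMeans

    MAX_WEEKS = 73          # length of the observation window in the study
    OFF_CHART = 101         # filler rank for weeks a song is not in the top 100

    def pad_trajectory(ranks, length=MAX_WEEKS, fill=OFF_CHART):
        """Pad a song's weekly rank list to a fixed-length vector."""
        padded = list(ranks) + [fill] * (length - len(ranks))
        return np.array(padded[:length], dtype=float)

    # Hypothetical input: {song_id: [rank in week 1, rank in week 2, ...]}
    trajectories = {
        "song_a": [5, 2, 1, 1, 3, 8, 20, 55],
        "song_b": [90, 70, 40, 15, 9, 7, 6, 6, 5, 5],
        "song_c": [1, 30, 95],
    }

    X = np.vstack([pad_trajectory(r) for r in trajectories.values()])

    # The study derived 16 clusters; with only 3 toy songs we use 2 here.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    for song, label in zip(trajectories, km.labels_):
        print(song, "-> pattern", label)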
Recently, channels such as social media and SNS have been creating enormous amounts of data. Among all kinds of data, the portion of unstructured data represented as text has increased geometrically. Because it is difficult to read all of this text, it is important to access the data rapidly and grasp the key points of each text. Due to this need for efficient understanding, many text summarization studies for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms, so-called "automatic summarization", have recently been proposed to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date construct the summary based on the frequency of content in the original documents. Such summaries have a limitation in covering low-weight subjects that are mentioned only a few times in the original text. If a summary includes content from the major subjects only, bias occurs and information is lost, so it becomes difficult to ascertain every subject the documents contain. To avoid this bias, one can summarize with balance between the topics of a document so that every subject can be ascertained, but the imbalance in the distribution of those subjects still remains. To retain a balance of subjects in the summary, it is necessary to consider the proportion of every subject the documents originally have and also to allocate portions to the subjects equally, so that even sentences from minor subjects can be included in the summary sufficiently. In this study, we propose a "subject-balanced" text summarization method that maintains balance between all subjects and minimizes the omission of low-frequency subjects. For the subject-balanced summary, we use two summary evaluation concepts, "completeness" and "succinctness". Completeness means that the summary should fully include the contents of the original documents, and succinctness means that the summary has minimum duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate the degree to which each term is related to each topic. From the derived weights, highly related terms for every topic can be identified, and the subjects of the documents can be found from the topics composed of terms with similar meanings. A few terms that represent each subject well are then selected; in this method, they are called "seed terms". However, these terms are too few to explain each subject sufficiently, so additional terms similar to the seed terms are needed for a well-constructed subject dictionary. Word2Vec is used for this word expansion: it finds terms similar to the seed terms. Word vectors are created by Word2Vec modeling, and from those vectors the similarity between all terms can be derived using cosine similarity. The higher the cosine similarity between two terms, the stronger the relationship between them. Terms that have high similarity values with the seed terms of each subject are selected, and after filtering these expanded terms the subject dictionary is finally constructed. The next phase allocates a subject to every sentence in the original documents. To grasp the contents of the sentences, a frequency analysis is first conducted with the specific terms that compose the subject dictionaries. The TF-IDF weight of each subject is calculated after the frequency analysis, making it possible to figure out how much each sentence explains each subject.
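A minimal sketch of the seed-term expansion step described above (phase 1), assuming gensim's Word2Vec and a hypothetical seed-term list; the actual corpus, hyperparameters, and filtering rules are not specified in the abstract.

    # Sketch of subject-dictionary expansion with Word2Vec (hypothetical data).
    from gensim.models import Word2Vec

    # Tokenized review sentences (toy corpus standing in for the TripAdvisor data).
    corpus = [
        ["the", "room", "was", "clean", "and", "the", "bed", "comfortable"],
        ["breakfast", "food", "was", "tasty", "and", "the", "buffet", "varied"],
        ["staff", "were", "friendly", "and", "the", "service", "was", "fast"],
    ]

    model = Word2Vec(corpus, vector_size=50, window=5, min_count=1, seed=0)

    # Hypothetical seed terms for a "food" subject; expand each seed with the
    # most cosine-similar terms to build the subject dictionary.
    seed_terms = {"food": ["food", "breakfast"]}
    subject_dictionary = {}
    for subject, seeds in seed_terms.items():
        expanded = set(seeds)
        for seed in seeds:
            for term, sim in model.wv.most_similar(seed, topn=5):
                if sim > 0.0:          # the similarity threshold is an assumption
                    expanded.add(term)
        subject_dictionary[subject] = expanded

    print(subject_dictionary)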
However, the TF-IDF weight has the limitation that it can grow without bound, so the TF-IDF weights of the subjects in each sentence are normalized to values between 0 and 1. Each sentence is then assigned the subject with the maximum TF-IDF weight among all subjects, and sentence groups are finally constructed for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between the sentences of each subject, and a similarity matrix is formed. By repeatedly selecting sentences, it is possible to generate a summary that fully covers the contents of the original documents while minimizing duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews are used to construct the subject dictionaries and 23,087 reviews are used to generate summaries. A comparison between the summary from the proposed method and a frequency-based summary is performed, and the results verify that the summary from the proposed method better retains the balance of all subjects that the documents originally have.
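The abstract describes repetitive sentence selection over a sentence-similarity matrix but does not give the exact selection rule. A minimal greedy sketch of that idea follows, using TF-IDF sentence vectors in place of Sen2Vec and a hypothetical per-subject quota and similarity threshold.

    # Greedy subject-balanced sentence selection (sketch, not the paper's exact rule).
    # Per subject, pick sentences that are dissimilar to those already chosen,
    # approximating "completeness with minimum duplication".
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical sentence groups produced by the subject-allocation phase.
    subject_sentences = {
        "food": ["breakfast was tasty", "the buffet had many options",
                 "breakfast was tasty and warm"],
        "service": ["staff were friendly", "check-in was fast"],
    }

    def select_for_subject(sentences, quota=2, max_sim=0.8):
        vectors = TfidfVectorizer().fit_transform(sentences)
        sim = cosine_similarity(vectors)
        chosen = []
        for idx in range(len(sentences)):
            # Skip sentences too similar to one already selected (succinctness).
            if any(sim[idx, j] > max_sim for j in chosen):
                continue
            chosen.append(idx)
            if len(chosen) == quota:
                break
        return [sentences[i] for i in chosen]

    summary = {s: select_for_subject(sents) for s, sents in subject_sentences.items()}
    print(summary)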
Purpose: The radiochromic film (Gafchromic EBT3, Ashland Advanced Materials, USA) and the 3-dimensional analysis system Dosimetry Check™ (DC, MathResolutions, USA) were evaluated for patient-specific quality assurance (QA) of helical tomotherapy. Materials and Methods: Depending on the tumor position, three types of targets, the abdominal tumor (130.6 ㎤), the retroperitoneal tumor (849.0 ㎤), and the whole abdominal metastasis tumor (3131.0 ㎤), were applied to a humanoid phantom (Anderson Rando Phantom, USA). We established a total of 12 comparative treatment plans using four geometric conditions of beam irradiation: field widths (FW) of 2.5 cm and 5.0 cm and pitches of 0.287 and 0.43. Ionization measurements (1D) with EBT3, obtained by inserting the film into the cheese phantom (2D), were compared to DC measurements of the 3D dose reconstruction on CT images derived from beam fluence log information. For the clinical feasibility evaluation of the DC, dose reconstruction was also performed using the same cheese phantom as in the EBT3 method. The recalculated dose distributions quantitatively revealed the dose errors during the actual irradiation, compared to the treatment plan on the same CT images. The thread effect, which may appear in helical tomotherapy, was analyzed by ripple amplitude (%). We also performed gamma index analysis (DD: 3%/DTA: 3 mm, pass threshold limit: 95%) to check the pattern of the dose distribution. Results: The ripple amplitude measurement showed the highest average of 23.1% in the peritoneum tumor. In the radiochromic film analysis, the absolute dose agreed on average within 0.9±0.4%, and the gamma index analysis averaged 96.4±2.2% (passing rate: >95%), which could be limited for large target sizes such as the whole abdominal metastasis tumor. In the DC analysis with the humanoid phantom for the FW of 5.0 cm, the average of the three regions was 91.8±6.4% in the 2D and 3D plans. The three planes (axial, coronal, and sagittal) and the dose profile could be analyzed against the planned dose distributions for the entire peritoneum tumor and the whole abdominal metastasis target. The dose errors based on the dose-volume histogram in the DC evaluations increased with FW and pitch. Conclusion: The DC method can perform dose error analysis on 3D patient image data from the measured beam fluence log information alone, without any additional dosimetry tools, for patient-specific quality assurance. There may also be no limit on the applicable tumor location and size; therefore, the DC could be useful in patient-specific QA during helical tomotherapy treatment of large and irregular tumors.
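For reference, the gamma index combines a dose-difference and a distance-to-agreement criterion. The following is a minimal 1D sketch only, with global normalization to the maximum reference dose assumed (the abstract does not state local vs. global normalization) and toy dose profiles in place of measured data.

    # Minimal 1D gamma-index sketch (global 3%/3 mm normalization assumed).
    import numpy as np

    def gamma_pass_rate(ref_dose, eval_dose, positions_mm, dd=0.03, dta_mm=3.0,
                        threshold=0.95):
        """Return the fraction of points with gamma <= 1 for two 1D dose profiles."""
        d_norm = dd * ref_dose.max()               # global dose-difference criterion
        gammas = []
        for x_e, d_e in zip(positions_mm, eval_dose):
            dist = (positions_mm - x_e) / dta_mm
            dose = (ref_dose - d_e) / d_norm
            gammas.append(np.sqrt(dist**2 + dose**2).min())
        gammas = np.array(gammas)
        rate = float(np.mean(gammas <= 1.0))
        return rate, rate >= threshold

    # Toy profiles (hypothetical): a slightly shifted Gaussian dose peak.
    x = np.linspace(-50, 50, 201)
    ref = 2.0 * np.exp(-x**2 / (2 * 15**2))
    ev = 2.0 * np.exp(-(x - 1.0)**2 / (2 * 15**2))
    print(gamma_pass_rate(ref, ev, x))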
The objective of this research is to test the feasibility of developing a statewide truck traffic forecasting methodology for Wisconsin by using Origin-Destination surveys, traffic counts, classification counts, and other data that are routinely collected by the Wisconsin Department of Transportation (WisDOT). Development of a feasible model will permit estimation of future truck traffic for every major link in the network. This will provide the basis for improved estimation of future pavement deterioration. Pavement damage rises exponentially as axle weight increases, and trucks are responsible for most of the traffic-induced damage to pavement. Consequently, forecasts of truck traffic are critical to pavement management systems. The Pavement Management Decision Supporting System (PMDSS) prepared by WisDOT in May 1990 combines pavement inventory and performance data with a knowledge base consisting of rules for evaluation, problem identification and rehabilitation recommendation. Without a reasonable truck traffic forecasting methodology, PMDSS is not able to project pavement performance trends in order to make assessments and recommendations for future years. However, none of WisDOT's existing forecasting methodologies has been designed specifically for predicting truck movements on a statewide highway network. For this research, the Origin-Destination survey data available from WisDOT, including two stateline areas, one county, and five cities, are analyzed and the zone-to-zone truck trip tables are developed. The resulting Origin-Destination Trip Length Frequency (OD TLF) distributions by trip type are applied to the Gravity Model (GM) for comparison with comparable TLFs from the GM. The gravity model is calibrated to obtain friction factor curves for the three trip types, Internal-Internal (I-I), Internal-External (I-E), and External-External (E-E). Both "macro-scale" calibration and "micro-scale" calibration are performed. The comparison of the statewide GM TLF with the OD TLF for the macro-scale calibration does not provide suitable results because the available OD survey data do not represent an unbiased sample of statewide truck trips. For the "micro-scale" calibration, "partial" GM trip tables that correspond to the OD survey trip tables are extracted from the full statewide GM trip table. These "partial" GM trip tables are then merged and a partial GM TLF is created. The GM friction factor curves are adjusted until the partial GM TLF matches the OD TLF. Three friction factor curves, one for each trip type, resulting from the micro-scale calibration produce a reasonable GM truck trip model. A key methodological issue for GM calibration involves the use of multiple friction factor curves versus a single friction factor curve for each trip type in order to estimate truck trips with reasonable accuracy. A single friction factor curve for each of the three trip types was found to reproduce the OD TLFs from the calibration data base. Given the very limited trip generation data available for this research, additional refinement of the gravity model using multiple friction factor curves for each trip type was not warranted. In traditional urban transportation planning studies, the zonal trip productions and attractions and region-wide OD TLFs are available. However, for this research, the information available for the development of the GM model is limited to Ground Counts (GC) and a limited set of OD TLFs.
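For context, a gravity model distributes trips as T_ij proportional to P_i * A_j * F(d_ij), with the friction factors F calibrated until the modeled trip length frequency matches the observed one. The following is a minimal sketch only, with a hypothetical three-zone system and an exponential friction function standing in for the calibrated friction factor curves, neither of which comes from the report.

    # Minimal gravity-model trip distribution sketch (hypothetical 3-zone example).
    # T_ij = P_i * A_j * F_ij / sum_k(A_k * F_ik)
    import numpy as np

    productions = np.array([100.0, 200.0, 150.0])     # truck trip productions P_i
    attractions = np.array([120.0, 180.0, 150.0])     # truck trip attractions A_j
    distance = np.array([[1.0, 10.0, 20.0],
                         [10.0, 1.0, 15.0],
                         [20.0, 15.0, 1.0]])           # zone-to-zone distances

    def friction(d, beta=0.1):
        """Exponential friction factor curve (assumed functional form)."""
        return np.exp(-beta * d)

    F = friction(distance)
    weights = attractions * F                          # A_j * F_ij
    trips = productions[:, None] * weights / weights.sum(axis=1, keepdims=True)

    print(np.round(trips, 1))       # trip table T_ij; each row sums to its production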
The GM is calibrated using the limited OD data, but the OD data are not adequate to obtain good estimates of truck trip productions and attractions. Consequently, zonal productions and attractions are estimated using zonal population as a first approximation. Then, Selected Link based (SELINK) analyses are used to adjust the productions and attractions and possibly recalibrate the GM. The SELINK adjustment process involves identifying the origins and destinations of all truck trips that are assigned to a specified "selected link" as the result of a standard traffic assignment. A link adjustment factor is computed as the ratio of the actual volume for the link (ground count) to the total assigned volume. This link adjustment factor is then applied to all of the origin and destination zones of the trips using that "selected link". Selected link based analyses are conducted by using both 16 selected links and 32 selected links. The SELINK analysis using 32 selected links provides the least %RMSE in the screenline volume analysis. In addition, the stability of the GM truck estimating model is preserved by using 32 selected links with three SELINK adjustments, that is, the GM remains calibrated despite substantial changes in the input productions and attractions. The coverage of zones provided by 32 selected links is satisfactory. Increasing the number of repetitions beyond four is not reasonable because the stability of the GM model in reproducing the OD TLF reaches its limits. The total volume of truck traffic captured by 32 selected links is 107% of total trip productions. But more importantly, SELINK adjustment factors for all of the zones can be computed. Evaluation of the travel demand model resulting from the SELINK adjustments is conducted by using screenline volume analysis, functional class and route specific volume analysis, area specific volume analysis, production and attraction analysis, and Vehicle Miles of Travel (VMT) analysis. Screenline volume analysis using four screenlines with 28 check points is used for evaluation of the adequacy of the overall model. The total trucks crossing the screenlines are compared to the ground count totals. LV/GC ratios of 0.958 by using 32 selected links and 1.001 by using 16 selected links are obtained. The %RMSE for the four screenlines is inversely proportional to the average ground count totals by screenline. The magnitude of %RMSE for the four screenlines resulting from the fourth and last GM run by using 32 and 16 selected links is 22% and 31% respectively. These results are similar to the overall %RMSE achieved for the 32 and 16 selected links themselves of 19% and 33% respectively. This implies that the SELINK analysis results are reasonable for all sections of the state. Functional class and route specific volume analysis is possible by using the available 154 classification count check points. The truck traffic crossing the Interstate highways (ISH) with 37 check points, the US highways (USH) with 50 check points, and the State highways (STH) with 67 check points is compared to the actual ground count totals. The magnitude of the overall link volume to ground count ratio by route does not show any specific pattern of over- or underestimation. However, the %RMSE for the ISH shows the least value while that for the STH shows the largest value. This pattern is consistent with the screenline analysis and the overall relationship between %RMSE and ground count volume groups.
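A minimal sketch of the SELINK adjustment step described above, using hypothetical numbers: the adjustment factor is the ratio of the observed ground count on a selected link to the volume assigned to it, applied to the productions and attractions of every zone whose trips use that link.

    # Sketch of the selected-link (SELINK) adjustment (hypothetical numbers).
    ground_count = 4200.0          # observed truck volume on the selected link
    assigned_volume = 3500.0       # model-assigned truck volume on the same link
    adjustment = ground_count / assigned_volume        # 1.2

    # Zones whose trips traverse the selected link (hypothetical zone IDs/values).
    productions = {"zone_12": 500.0, "zone_47": 800.0}
    attractions = {"zone_12": 450.0, "zone_47": 900.0}

    adj_productions = {z: p * adjustment for z, p in productions.items()}
    adj_attractions = {z: a * adjustment for z, a in attractions.items()}
    print(adj_productions, adj_attractions)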
Area specific volume analysis provides another broad statewide measure of the performance of the overall model. The truck traffic in the North area with 26 check points, the West area with 36 check points, the East area with 29 check points, and the South area with 64 check points is compared to the actual ground count totals. The four areas show similar results. No specific patterns in the LV/GC ratio by area are found. In addition, the %RMSE is computed for each of the four areas. The %RMSEs for the North, West, East, and South areas are 92%, 49%, 27%, and 35% respectively, whereas the average ground counts are 481, 1383, 1532, and 3154 respectively. As for the screenline and volume range analyses, the %RMSE is inversely related to average link volume. The SELINK adjustments of productions and attractions resulted in a very substantial reduction in the total in-state zonal productions and attractions. The initial in-state zonal trip generation model can now be revised with a new trip production trip rate (total adjusted productions/total population) and a new trip attraction trip rate. Revised zonal production and attraction adjustment factors can then be developed that only reflect the impact of the SELINK adjustments that cause increases or decreases from the revised zonal estimates of productions and attractions. Analysis of the revised production adjustment factors is conducted by plotting the factors on the state map. The east area of the state, including the counties of Brown, Outagamie, Shawano, Winnebago, Fond du Lac, and Marathon, shows comparatively large values of the revised adjustment factors. Overall, both small and large values of the revised adjustment factors are scattered around Wisconsin. This suggests that more independent variables beyond just population are needed for the development of the heavy truck trip generation model. More independent variables, including zonal employment data (office employees and manufacturing employees) by industry type, zonal private trucks owned, and zonal income data, which are not available currently, should be considered. A plot of the frequency distribution of the in-state zones as a function of the revised production and attraction adjustment factors shows the overall adjustment resulting from the SELINK analysis process. Overall, the revised SELINK adjustments show that the productions for many zones are reduced by a factor of 0.5 to 0.8, while the productions for relatively few zones are increased by factors from 1.1 to 4, with most of the factors in the 3.0 range. No obvious explanation for the frequency distribution could be found. The revised SELINK adjustments overall appear to be reasonable. The heavy truck VMT analysis is conducted by comparing the 1990 heavy truck VMT forecasted by the GM truck forecasting model, 2.975 billion, with the WisDOT computed data. This gives an estimate that is 18.3% less than the WisDOT computation of 3.642 billion VMT. The WisDOT estimates are based on sampling the link volumes for USH, STH, and CTH. This implies potential error in sampling the average link volume. The WisDOT estimate of heavy truck VMT cannot be tabulated by the three trip types, I-I, I-E (E-I), and E-E. In contrast, the GM forecasting model shows that the proportion of E-E VMT out of total VMT is 21.24%. In addition, tabulation of heavy truck VMT by route functional class shows that the proportion of truck traffic traversing the freeways and expressways is 76.5%.
Only 14.1% of total freeway truck traffic is I-I trips, while 80% of total collector truck traffic is I-I trips. This implies that freeways are traversed mainly by I-E and E-E truck traffic while collectors are used mainly by I-I truck traffic. Other tabulations such as average heavy truck speed by trip type, average travel distance by trip type, and the VMT distribution by trip type, route functional class, and travel speed are useful information for highway planners to understand the characteristics of statewide heavy truck trip patterns. Heavy truck volumes for the target year 2010 are forecasted by using the GM truck forecasting model. Four scenarios are used. For better forecasting, ground count-based segment adjustment factors are developed and applied. ISH 90 & 94 and USH 41 are used as example routes. The forecasting results using the ground count-based segment adjustment factors are satisfactory for long range planning purposes, but additional ground counts would be useful for USH 41. Sensitivity analysis provides estimates of the impacts of the alternative growth rates, including information about changes in the trip types using key routes. The network-based GM can easily model scenarios with different rates of growth in rural versus urban areas, small versus large cities, and in-state zones versus external stations.
At the initial stage of Internet advertising, banner advertising came into fashion. As the Internet developed into a central part of daily life and competition in the on-line advertising market grew fierce, there was not enough space for banner advertising, which rushed to portal sites only. All these factors were responsible for an upsurge in advertising prices. Consequently, the high-cost and low-efficiency problems of banner advertising were raised, which led to the emergence of keyword advertising as a new type of Internet advertising to replace its predecessor. In the beginning of the 2000s, when Internet advertising became active, display advertising including banner advertising dominated the Net. However, display advertising showed signs of gradual decline and registered negative growth in 2009, whereas keyword advertising grew rapidly and started to outdo display advertising as of 2005. Keyword advertising refers to the advertising technique that exposes relevant advertisements at the top of search sites when one searches for a keyword. Instead of exposing advertisements to unspecified individuals like banner advertising, keyword advertising, a targeted advertising technique, shows advertisements only when customers search for a desired keyword, so that only highly prospective customers are given a chance to see them. In this context, it is also referred to as search advertising. It is regarded as more aggressive advertising with a higher hit rate than previous advertising in that, instead of the seller discovering customers and running an advertisement for them as with TV, radio, or banner advertising, it exposes advertisements to visiting customers. Keyword advertising makes it possible for a company to seek publicity on line simply by making use of a single word and to achieve maximum efficiency at minimum cost. The strong point of keyword advertising is that customers are allowed to directly contact the products in question through advertising that is more efficient than the advertisements of mass media such as TV and radio. The weak point of keyword advertising is that a company should have its advertisement registered on each and every portal site and finds it hard to exercise substantial supervision over its advertisement, there being a possibility of its advertising expenses exceeding its profits. Keyword advertising serves as the most appropriate method of advertising for the sales and publicity of small and medium enterprises, which need maximum advertising effect at a low advertising cost. At present, keyword advertising is divided into CPC advertising and CPM advertising. The former is known as the most efficient technique and is also referred to as advertising based on the metered rate system; a company pays for the number of clicks on the keywords that users have searched. This model is representatively adopted by Overture, Google's Adwords, Naver's Clickchoice, and Daum's Clicks. CPM advertising is based on the flat rate payment system, making a company pay for its advertisement on the basis of the number of exposures, not the number of clicks. This method fixes a price for the advertisement on the basis of 1,000 exposures and is mainly adopted by Naver's Timechoice, Daum's Speciallink, and Nate's Speedup. At present, the CPC method is most frequently adopted.
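As a simple illustration of the two pricing models described above, under CPC the cost scales with clicks, while under CPM it scales with impressions. The prices and volumes below are hypothetical and are not taken from the text.

    # Toy comparison of CPC and CPM charging (hypothetical prices and volumes).
    impressions = 100_000          # number of ad exposures
    click_through_rate = 0.02      # 2% of exposures result in a click (assumed)
    cost_per_click = 500           # price per click under CPC (assumed)
    cost_per_mille = 5_000         # price per 1,000 exposures under CPM (assumed)

    clicks = impressions * click_through_rate
    cpc_cost = clicks * cost_per_click
    cpm_cost = impressions / 1_000 * cost_per_mille

    print(f"CPC cost: {cpc_cost:,.0f} for {clicks:,.0f} clicks")
    print(f"CPM cost: {cpm_cost:,.0f} for {impressions:,} exposures")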
The weak point of the CPC method is that advertising costs can rise through repeated clicks from the same IP. If a company makes good use of strategies for maximizing the strong points of keyword advertising and complementing its weak points, it is highly likely to turn its visitors into prospective customers. Accordingly, an advertiser should analyze customers' behavior and approach them in a variety of ways, trying hard to find out what they want. With this in mind, he or she has to put multiple keywords into use when running ads. When first running an ad, the advertiser should give priority to which keyword to select. The advertiser should consider how many individuals using a search engine will click the keyword in question and how much money he or she has to pay for the advertisement. As the popular keywords that users of search engines frequently use are expensive in terms of unit cost per click, advertisers without much money for advertising at the initial phase should pay attention to detailed keywords suitable to their budget. Detailed keywords are also referred to as peripheral keywords or extension keywords, which can be seen as combinations of major keywords. Most keywords are in the form of text. The biggest strong point of text-based advertising is that it looks like search results, causing little antipathy. But it fails to attract much attention because most keyword advertising is in the form of text. Image-embedded advertising is easy to notice due to its images, but it is exposed on the lower part of a web page and regarded as an advertisement, which leads to a low click-through rate. However, its strong point is that its prices are lower than those of text-based advertising. If a company owns a logo or a product that is easy for people to recognize, the company is well advised to make good use of image-embedded advertising so as to attract Internet users' attention. Advertisers should analyze their logs and examine customers' responses based on the events of the sites in question and the composition of products as a vehicle for monitoring customer behavior in detail. Besides, keyword advertising allows them to analyze the advertising effects of exposed keywords through log analysis. Log analysis refers to a close analysis of the current situation of a site by analyzing information about visitors on the basis of the number of visitors, page views, and cookie values. A user's IP, the pages used, the time of use, and cookie values are stored in the log files generated by each Web server. The log files contain a huge amount of data. As it is almost impossible to analyze these log files directly, one is supposed to analyze them by using log analysis solutions. The generic information that can be extracted from log analysis tools includes the total number of page views, the average number of page views per day, the number of basic page views, the number of page views per visit, the total number of hits, the average number of hits per day, the number of hits per visit, the number of visits, the average number of visits per day, the net number of visitors, the average number of visitors per day, one-time visitors, visitors who have come more than twice, and average usage hours.
Such data are also deemed useful for analyzing the situation and current status of rival companies as well as for benchmarking. As keyword advertising exposes advertisements exclusively on search-result pages, competition among advertisers attempting to preoccupy popular keywords is very fierce. Some portal sites keep giving priority to existing advertisers, whereas others give all advertisers a chance to purchase the keywords in question after the advertising contract is over. If an advertiser wants to rely on keywords sensitive to season and timeliness on sites that give priority to established advertisers, he or she may as well purchase a vacant advertising slot so as not to miss the appropriate timing for advertising. However, Naver does not give priority to existing advertisers for any keyword advertisements. In this case, one can preoccupy keywords by entering into a contract after confirming the contract period for advertising. This study is designed to take a look at marketing for keyword advertising and to present effective strategies for keyword advertising marketing. At present, the Korean CPC advertising market is virtually monopolized by Overture. Its strong points are that Overture is based on the CPC charging model and that advertisements are registered at the top of the most representative portal sites in Korea. These advantages make it the most appropriate medium for small and medium enterprises to use. However, the CPC method of Overture has its weak points, too; that is, the CPC method is not the only perfect advertising model among the search advertisements in the on-line market. So it is absolutely necessary that small and medium enterprises, including independent shopping malls, complement the weaknesses of the CPC method and make good use of strategies for maximizing its strengths so as to increase their sales and to create a point of contact with customers.
Medical services are a fundamental and essential service in all urban areas. The location and accessibility of medical service facilities and institutions are critical to the diagnosis, control, and prevention of illness and disease. The purpose of this paper is to present the results of a study on the location of medical facilities in Kwangju and the utilization of these facilities by the inhabitants. The findings are summarized as follows: (1) Korea, like many countries, is now witnessing an increase in the age of its population as a result of higher living standards and better medical services. Korea is also experiencing a rapid increase in health care costs. To ensure easy access to medical consultation, diagnosis, and treatment for individuals, the hierarchically efficient location of medical facilities, low medical costs, equalized medical services, and preventive medical care are important. (2) In Korea, the quality of medical services has improved significantly, as evident from the increased numbers of medical facilities and medical personnel. However, there is still a need not only for quantitative improvements but also for a more equitable distribution and location of medical services. (3) There are 503 medical facilities in Kwangju, each serving 2,556 people on average, a lower level of provision than the national average of 1,498 inhabitants per facility. A higher locational quotient (illustrated in the sketch at the end of this summary) and a satisfactory population per medical facility were found at the civic center. On the other hand, problem regions such as the traditional residential area in Buk-gu, the Moo-deung mountain area, and the outer areas of west Kwangju still maintain rural characteristics. (4) In the study area there are 86 general medicine clinics, which provide basic medical services, i.e., one clinic per 14,949 residents. As a basic service, its higher locational quotient appeared in the residential areas. A lower population concentration per clinic was found in the civic center and in the former town center, Songjeong-dong. In recently built residential areas and in the civic center, the lack of general medicine clinics is not a serious medical services issue because of the surplus of medical specialists in Korea. People are inclined to seek a consultation with a specialist in a specific field rather than consult a general practitioner. As a result of this phenomenon, there are 81 internal medicine facilities. Of these, 32.1% provide services to people who are not referred by a primary care physician but who self-diagnose and then choose a medical facility specializing in what they believe to be their health problem. Areas in the city, called dongs, without any internal medicine facilities make up 50% of the total 101 dongs. (5) There are 78 surgical facilities within the area, and there is little difference in their locational appearance from the internal medicine facilities. There are also 71 pediatric health clinics for people under 15 years of age in this area, representing one clinic per 5,063 people. In quantitative terms, this is a positive situation. Accessibility is the most important factor in facility choice, so these clinics should be evenly located in proportion to the distribution of demand. However, 61% of the 102 dongs have no pediatric clinics because of the uneven location. (6) There are 43 obstetric and gynecological clinics in Kwangju, and the number of residents served per clinic is 15,063. These services need to be provided regularly, so their number should be increased.
There are 37 ENT clinics in the study area, with the lower concentration in Dong-gu (32.4%) showing no locational differences by dong. There are 23 dermatology clinics, with the largest concentration in Dong-gu. There are 17 ophthalmic clinics, concentrated in the residential areas because of the primary function of this type of specialization. (7) The inhabitants' use of general medicine clinics, internal medicine clinics, pediatric clinics, and ENT clinics indicates a trend toward primary or routine medical services. Obstetric and gynecological clinics are used on a regular basis. In choosing a general medicine clinic, internal medicine clinic, pediatric clinic, or ENT clinic, accessibility is the key factor, while for the choice of a general hospital, surgical clinic, or obstetric and gynecological clinic, faith and trust in the medical practitioner is the priority consideration. (8) Considering the efficient use of medical facilities in terms of location and management, I suggest the following: First, primary care facilities should be evenly distributed in every area. In Kwangju, the number of medical facilities is the lowest among the six largest cities in Korea. Moreover, they are concentrated in Dong-gu and in newly developed areas. Medical facilities should be located within 30 minutes of each person's home. For regional development there is a need to develop a plan to balance, for example, taxes and funds supporting personnel, equipment, and facilities. Secondly, medical services should be coordinated to ensure consistent, appropriate, quality services. Primary medical facilities should take charge of out-patient activities, and every effort should be made to standardize and equalize equipment and facility resources and to ensure ongoing development and training in the primary services field. A few specialty medical facilities and general hospitals should establish a priority service for incurable and terminally ill patients. (9) The management scheme for the inhabitants' efficient use of medical services is as follows: The first task is to efficiently manage medical facilities and related services. A higher quality of medical services can be accomplished within the rapidly changing medical environment. A network of social, administrative, and medical organizations within an area should be established to promote information gathering and sharing strategies to better assist the community. Statistics and trends on the occurrence of diseases, births, deaths, and the medical and environmental conditions of the poor or estranged people should be maintained and monitored. The second task is to increase resources in the areas of disease prevention and health promotion. Currently the focus is on the treatment and care of individuals with illness or disease. A strong emphasis should also be placed on promoting the prevention of illness and injury within the community, not only through public health offices but also via medical service facilities. Home medical care should be established and medical testing centers should be located as an ordinary service level. Also, reduced medical costs for the physically handicapped, cardiac patients, and mentally ill or handicapped patients should be considered.
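The locational quotient referred to in the findings above compares a district's share of facilities with its share of population. The following is a minimal sketch with hypothetical dong-level numbers, not data from the study.

    # Locational quotient (LQ) sketch for medical facilities (hypothetical data).
    # LQ > 1 means a district holds more than its population-proportional share.
    districts = {
        # district: (number of clinics, population)
        "civic_center": (60, 50_000),
        "residential_a": (25, 120_000),
        "outer_area": (5, 80_000),
    }

    total_clinics = sum(c for c, _ in districts.values())
    total_population = sum(p for _, p in districts.values())

    for name, (clinics, pop) in districts.items():
        lq = (clinics / total_clinics) / (pop / total_population)
        print(f"{name}: LQ = {lq:.2f}")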
Korea has stepped up efforts to investigate and catalog its flora and fauna to conserve the biodiversity of the Korean Peninsula and secure biological resources since the ratification of the Convention on Biological Diversity (CBD) in 1992 and the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits (ABS) in 2010. Thus, after its establishment in 2007, the National Institute of Biological Resources (NIBR) of the Ministry of Environment of Korea initiated a project called the Korean Indigenous Species Investigation Project to investigate indigenous species on the Korean Peninsula. For 15 years since its beginning in 2006, this project has been carried out in five phases: Phase 1 from 2006-2008, Phase 2 from 2009-2011, Phase 3 from 2012-2014, Phase 4 from 2015-2017, and Phase 5 from 2018-2020. Before this project, in 2006, the number of indigenous species surveyed was 29,916. The figure was cumulatively aggregated at the end of each phase as 33,253 species for Phase 1 (2008), 38,011 species for Phase 2 (2011), 42,756 species for Phase 3 (2014), 49,027 species for Phase 4 (2017), and 54,428 species for Phase 5 (2020). The number of indigenous species surveyed grew rapidly, showing an approximately 1.8-fold increase as the project progressed. These statistics showed an annual average of 2,320 newly recorded species during the project period. Among the recorded species, a total of 5,242 new species were reported in scientific publications, a great scientific achievement. During this project period, newly recorded species on the Korean Peninsula were identified using the recent taxonomic classifications as follows: 4,440 insect species (including 988 new species), 4,333 invertebrate species except for insects (including 1,492 new species), 98 vertebrate species (fish) (including nine new species), 309 plant species (including 176 vascular plant species, 133 bryophyte species, and 39 new species), 1,916 algae species (including 178 new species), 1,716 fungi and lichen species (including 309 new species), and 4,812 prokaryotic species (including 2,226 new species). The number of collected biological specimens in each phase was aggregated as follows: 247,226 for Phase 1 (2008), 207,827 for Phase 2 (2011), 287,133 for Phase 3 (2014), 244,920 for Phase 4 (2017), and 144,333 for Phase 5 (2020). A total of 1,131,439 specimens were obtained, with an annual average of 75,429. More specifically, 281,054 insect specimens, 194,667 invertebrate specimens (except for insects), 40,100 fish specimens, 378,251 plant specimens, 140,490 algae specimens, 61,695 fungi specimens, and 35,182 prokaryotic specimens were collected. The cumulative number of researchers involved in this project, who were nearly all professional taxonomists and graduate students majoring in taxonomy across the country, was around 5,000, with an annual average of 395. The number of researchers/assistant researchers (mainly graduate students) participating in Phase 1 was 597/268; 522/191 in Phase 2; 939/292 in Phase 3; 575/852 in Phase 4; and 601/1,097 in Phase 5. During this project period, 3,488 papers were published in major scientific journals. Of these, 2,320 papers were published in domestic journals and 1,168 papers were published in Science Citation Index (SCI) journals.
During the project period, a total of 83.3 billion won (annual average of 5.5 billion won), or approximately US $75 million (annual average of US $5 million), was invested in investigating indigenous species and collecting specimens. This project was a large-scale research study led by the Korean government. It is considered a successful example of Korea's compressed development, as it attracted almost all of the taxonomists in Korea and made remarkable achievements with a massive budget in a short time. The results from this project led to the National List of Species of Korea, where all species are organized by taxonomic classification. Information regarding the National List of Species of Korea is available to experts, students, and the general public (https://species.nibr.go.kr/index.do). The information, including descriptions, DNA sequences, habitats, distributions, ecological aspects, images, and multimedia, has been digitized, contributing to scientific advancement in research fields such as phylogenetics and evolution. The species information also serves as a basis for projects aimed at species distribution and biological monitoring, such as climate-sensitive biological indicator species. Moreover, the species information helps bio-industries search for useful biological resources. Perhaps the most meaningful achievement of this project is its support for nurturing young taxonomists such as graduate students. This project has continued for the past 15 years and is still ongoing. Efforts to address issues, including species misidentification and invalid synonyms, still have to be made to enhance taxonomic research. Research needs to be conducted to investigate another 50,000 species out of the estimated 100,000 indigenous species on the Korean Peninsula.
Over the past decade, deep learning has been in the spotlight among various machine learning algorithms. In particular, CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images or voices, has been popularly applied to classification and prediction problems. In this study, we investigate a way to apply CNN to business problem solving. Specifically, this study proposes to apply CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN has strength in interpreting images. Thus, the model proposed in this study adopts CNN as a binary classifier that predicts the stock market direction (upward or downward) by using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. Then, it creates time series graphs for the divided dataset in step 2. The size of the image in which the graph is drawn is
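The abstract is cut off before the image size and the remaining steps are given, so the following is only a minimal sketch of a CNN binary classifier on chart images, not the CNN-FG architecture itself: the 32x32 grayscale input size, the layer configuration, and the random stand-in data are all assumptions.

    # Minimal CNN binary classifier sketch for chart-image inputs (assumed shapes).
    import numpy as np
    from tensorflow.keras import layers, models

    IMG_SIZE = 32   # assumed; the abstract is cut off before stating the size

    model = models.Sequential([
        layers.Input(shape=(IMG_SIZE, IMG_SIZE, 1)),      # grayscale graph image
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),            # up (1) vs down (0)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Dummy data standing in for 5-day fluctuation-graph images and direction labels.
    x = np.random.rand(100, IMG_SIZE, IMG_SIZE, 1).astype("float32")
    y = np.random.randint(0, 2, size=(100,))
    model.fit(x, y, epochs=1, batch_size=16, verbose=0)
    print(model.predict(x[:3], verbose=0))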
Recently, the increasing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration rate of smart devices are producing a large amount of data. Owing to this phenomenon, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis have been continuously increasing. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each demander of analysis. However, increasing interest in big data analysis has spurred computer programming education and the development of many programs for data analysis. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the demanders of analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing. In particular, a lot of attention is focused on using text data. The emergence of new platforms and techniques using the web has brought about the mass production of text data and active attempts to analyze it. Furthermore, the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Many text mining techniques are utilized in this field for various research purposes; among them, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is evaluated as a very useful technique in that it reflects the semantic elements of the documents. Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, it is essential to analyze the entire document set at once to identify the topic of each document. This requirement makes the analysis process time-consuming when topic modeling is applied to a large number of documents. In addition, it has a scalability problem: the processing time increases exponentially with the number of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide and conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by repeatedly applying topic modeling to each unit. This method can be used for topic modeling on a large number of documents with limited system resources, and it can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost, because the documents in each location or place can be analyzed without combining all the analysis target documents. However, despite its many advantages, this method has two major problems. First, the relationship between local topics derived from each unit and global topics derived from the entire document set is unclear; in other words, local topics can be identified in each unit, but global topics cannot. Second, a method for measuring the accuracy of the proposed methodology needs to be established. That is, assuming that the global topics are the ideal answer, the difference between a local topic and a global topic needs to be measured.
Because of these difficulties, research on this approach has not been conducted sufficiently compared with other studies dealing with topic modeling. In this paper, we propose a topic modeling approach to solve the two problems above. First of all, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced entire document cluster (RGS, reduced global set) that consists of delegated documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. In addition, we verify the accuracy of the proposed methodology by detecting whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire document set. We also propose a reasonable method for comparing the results of both approaches.
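A minimal sketch of the divide-and-conquer idea described above, using gensim's LdaModel on hypothetical toy data: LDA is run on each local sub-cluster, and each local topic is then mapped to the most similar topic of the reduced global set (RGS) by cosine similarity of the topic-word vectors. The exact mapping rule, corpus sizes, and parameters of the paper are not reproduced here.

    # Sketch of divide-and-conquer topic modeling with local-to-RGS topic mapping.
    import numpy as np
    from gensim import corpora, models

    local_sets = [
        [["stock", "market", "price"], ["price", "rise", "market"]],
        [["movie", "actor", "film"], ["film", "award", "actor"]],
    ]
    rgs_docs = [["stock", "price"], ["film", "actor"]]    # delegated documents

    # Shared dictionary so topic-word vectors are comparable across models.
    dictionary = corpora.Dictionary(d for s in local_sets for d in s)

    def lda_topics(docs, num_topics=1):
        corpus = [dictionary.doc2bow(d) for d in docs]
        lda = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary,
                              random_state=0)
        return lda.get_topics()                           # shape: topics x vocab

    rgs_topics = lda_topics(rgs_docs, num_topics=2)

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Map each local topic to its most similar RGS (global) topic.
    for i, local in enumerate(local_sets):
        for topic in lda_topics(local):
            best = max(range(len(rgs_topics)),
                       key=lambda g: cosine(topic, rgs_topics[g]))
            print(f"local set {i}: mapped to RGS topic {best}")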
Market timing is an investment strategy used to obtain excess returns from financial markets. In general, detecting market timing means determining when to buy and sell in order to obtain excess returns from trading. In many market timing systems, trading rules have been used as an engine to generate trading signals. On the other hand, some researchers have proposed rough set analysis as a proper tool for market timing because, by using a control function, it does not generate a trading signal when the market pattern is uncertain. Numeric data for rough set analysis must be discretized because rough sets only accept categorical data for analysis. Discretization searches for proper "cuts" in the numeric data that determine intervals, and all values that lie within each interval are transformed into the same value. In general, there are four methods for data discretization in rough set analysis, including equal frequency scaling, expert's knowledge-based discretization, minimum entropy scaling, and na
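Of the discretization methods listed (the abstract is cut off mid-list), equal frequency scaling is the simplest to illustrate: cuts are placed so that each interval holds roughly the same number of observations. A minimal sketch with hypothetical price-change data follows.

    # Equal-frequency discretization sketch (hypothetical price-change data).
    # Cuts are chosen so each interval contains roughly the same number of values,
    # turning numeric attributes into categories usable by rough set analysis.
    import numpy as np

    values = np.array([-2.1, -0.5, 0.0, 0.3, 0.8, 1.2, 1.9, 2.4, 3.0])
    n_bins = 3

    # Cut points at the 1/3 and 2/3 quantiles of the observed values.
    cuts = np.quantile(values, [i / n_bins for i in range(1, n_bins)])
    categories = np.digitize(values, cuts)      # 0, 1, 2 = low / medium / high

    for v, c in zip(values, categories):
        print(f"{v:5.1f} -> interval {c}")
    print("cuts:", cuts)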