Application and Research of Monte Carlo Sampling Algorithm in Music Generation

  • MIN, Jun (College of Electronics and Information Engineering, Tongji University) ;
  • WANG, Lei (College of Electronics and Information Engineering, Tongji University) ;
  • PANG, Junwei (College of Electronics and Information Engineering, Tongji University) ;
  • HAN, Huihui (College of Electronics and Information Engineering, Tongji University) ;
  • LI, Dongyang (College of Electronics and Information Engineering, Tongji University) ;
  • ZHANG, Maoqing (College of Electronics and Information Engineering, Tongji University) ;
  • HUANG, Yantai (College of Automation and Electrical Engineering, Zhejiang University of Science and Technology)
  • Received : 2021.10.16
  • Accepted : 2022.09.05
  • Published : 2022.10.31

Abstract

Composing music is an inspired yet challenging task, in that the process involves many considerations such as assigning pitches, determining rhythm, and arranging accompaniment. Algorithmic composition aims to develop algorithms for music composition. Recently, algorithmic composition using artificial intelligence technologies has received considerable attention. In particular, computational intelligence is widely used and achieves promising results in the creation of music. This paper attempts to provide a survey of music generation based on the Monte Carlo (MC) algorithm. First, MIDI music files are transformed into numerical data. Within these data, the time series is fitted with the logistic fitting method to obtain the regular pattern of its distribution. Besides the time series, the converted data also include duration, pitch, and velocity. Second, MC simulation is applied to these quantities to summarize their respective distribution laws. The two main control parameters are the discrete sampling value and the standard deviation. After these parameters are processed, the data are converted back to a MIDI file, and the result is compared with the output generated by an LSTM neural network to evaluate the music comprehensively.

Keywords

1. Introduction

Composing melodious music is a challenging task because many musical elements need to be considered, such as pitch, rhythm, chord, timbre, musical form, and accompaniment. In the past, music composition was usually accomplished by a few talented people[1-4]. Algorithmic composition, which formulates the creation of music as a formal problem, facilitates the development of algorithms for music composition.

Algorithmic composition enables automatic composition by using computers and mathematics. In 1959, Hiller and Isaacson first programmed the Illinois Automatic Computer (ILLIAC) to generate music algorithmically[5]. The piece was composed by the computer and then transcribed into a score for a string quartet to perform. At present, companies such as Google and Sony are actively exploring this field. In Google's Magenta project, which is based on TensorFlow[6], the latest achievement is the generation of a 90-second melody. FlowMachines[7], an artificial intelligence system developed by Sony, can learn various music styles from a huge music library and independently create music in different styles with its style transformation, optimization, and interaction technology.

From a mathematical perspective, composing music can be viewed as a stochastic process; therefore, mathematical models such as Markov chains are useful for composition[8]. Music composition using mathematical models has the advantages of low complexity and fast response, which make it suitable for real-time applications. In recent years, other techniques have appeared in papers on automatic composition, such as hierarchical structure[9], cellular automata theory[10], experience-based composition[11], answer set programming[12], the possibility construction spatial thinking model[13], and probability logic[14].

Advances in artificial intelligence (AI) promote its application to algorithmic composition. Spurred by the progress of deep neural networks, the Support Vector Machine (SVM)[15], Deep Neural Networks (DNN)[16], the Convolutional Neural Network (CNN)[17][18], and especially the Recurrent Neural Network (RNN)[19-23] have gathered renewed interest. Practice has shown that most of the network models used to generate music are based on recurrent neural networks and their variants. Among RNNs, the LSTM network is commonly used to process and generate music files[24]. Although the LSTM can model music and synthesize new note sequences, it is invariant only in time. For example, when the training data are shifted forward or backward by some time steps, the model does not treat them as different data during training. When the training data are moved several semitones up or down, however, the model is not transposition invariant, which unnecessarily increases the time cost.

Therefore, in practice, it is not difficult to find that if music is generated only by the LSTM network, neither the running time of the system nor the timbre of the music is satisfactory. To achieve transposition invariance and enable the system to recognize the interval relationship between chords, the relative position of the notes, rather than their absolute position in the whole score, must be considered when designing the music generation structure. In light of the shortcomings of the above methods, the main contributions of this paper are as follows.

1) The structure of the music samples is modeled as a probability distribution; that is, the system counts the probability of each note appearing in each specific bar and at each tone.

2) The system achieves shift invariance by learning the probability distribution of the notes in the music samples, so that it can recognize the interval relationship between chords.

3) A complete music evaluation index is established and used to evaluate the generated music.
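The idea behind the first two contributions, that relative note positions stay the same under transposition, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `interval_distribution` and the example melody are hypothetical, and pitches are assumed to be MIDI note numbers.

```python
from collections import Counter


def interval_distribution(pitches):
    """Relative frequency of each melodic interval (in semitones)."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    counts = Counter(intervals)
    total = len(intervals)
    return {step: n / total for step, n in counts.items()}


# Hypothetical melody as MIDI pitch numbers (60 = middle C).
melody = [60, 62, 64, 62, 60, 67, 64, 60]
transposed = [p + 5 for p in melody]  # the same melody, five semitones higher

original_dist = interval_distribution(melody)
shifted_dist = interval_distribution(transposed)
print(original_dist == shifted_dist)
```

Because only the differences between neighbouring pitches are counted, the two distributions coincide, whereas a distribution over absolute pitches would change with every transposition.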

The MC method can simulate the process of music generation well. The advantage of using the MC algorithm to create music is that the system considers the relative position of the notes rather than their absolute position in the whole score, and the running time is greatly shortened. At the same time, music generation can be regarded as a process that combines theoretical experience and accidental inspiration. For the theoretical experience, the MC algorithm can be trained on the data set, summarize the rules of the notes, and learn the distribution of notes in the whole score. The influence of accidental inspiration, however, requires breakthroughs and innovation from different directions. In addition, the evaluation of compositions is a paramount issue that needs to be addressed in AI-based composition systems.

The paper is organized as follows. Section 2 reviews the studies on music generation by artificial intelligence algorithms. Section 3 introduces the proposed method and its theory and recapitulates the process of the experiment. Section 4 summarizes the result analysis and evaluation. Finally, Section 5 presents a summary of this paper and provides some suggestions for future research topics.

2. Related work

As the application of deep learning techniques to music generation has advanced rapidly, several examples of using algorithms to generate music have appeared. Smith et al. use several variants of multilayer neural networks to generate music, which can capture the structure and rhythm of music well. Different from the traditional model, this model takes the music data as a whole. At the same time, it divides the music data by time, and each part can be regarded as a music sample. These segments are abstracted into higher-level music structures, and the process is repeated in the main structure of the system until the final result is generated[25]. Nayebi et al.[26] compare the practicability of the unit structures of the LSTM network and the Gated Recurrent Unit (GRU) in music generation; these models take WAV music files as the input samples of the network and output the waveform of the music from a random number. However, processing WAV files takes too much time, and the approach was not tested on a large amount of data, so it is not fully convincing. Bonn G et al.[27] put forward the answer set programming method, which uses answer sets to build an automatic composition system called Anton. Although this method can generate a short melody well, it cannot generate complete music, and it also has some shortcomings in rhythm coding and melody processing. Aguilera G et al.[28] propose a probabilistic logic for generating counterpoint notes in the Polyphonic Logic method, which uses a probability algorithm to generate counterpoint for a fixed melody; however, the output is unstable even when the same music sample is input, and the algorithm does not consider the genre characteristics of the melody. Voss and Clark[29] observed 1/f noise and proposed using it for composition. They used independent 1/f noise sources in a simple algorithm to determine the duration and pitch of successive notes of a melody. The music obtained by this method was judged by most listeners to be much more pleasing than that obtained using either a white noise source or a \(1/f^{2}\) noise source. Sertan and Chordia[30] utilized a variable-length Markov model that considers pitch, rhythm, instrument, and key to predict the subsequent sequence of Turkish folk music. Their work also introduces pitch-related viewpoints that are specifically aimed at modeling the unique melodic properties of this music. Prechtl et al.[31] generated music for games by using Markov chains with musical features such as tempo, velocity, volume, and chords. They outline an implementation of the approach in an actual game, focusing primarily on how the music system traces the game's emotional narrative by periodically querying certain narrative parameters and adjusting the musical features of its output accordingly.

3. Proposed method

From the review in the first two sections, it can be concluded that the time cost and the quality of the final music are two important evaluation criteria in music generation. Although synthesizing music algorithmically still requires human participation, it can be regarded as an effective auxiliary means of music generation. Moreover, the music generated by different models has its own characteristics, and if the system contains only one method, the final audio files will not be satisfactory. A music generation system therefore needs to combine a variety of means, and such systems will develop in a hybrid direction. From the perspective of the system's structure, the process of music generation is divided into three main parts: music data preprocessing, feature extraction, and music generation. First, the music data are preprocessed: the MIDI files are converted into a music matrix or a similar form of digital symbols that can be processed by a computer directly. Second, the music matrix is processed with the MC method to summarize the probability distribution and characteristics of these data. Finally, the digital music symbols are restored to MIDI music files.

From the point of view of data analysis, the MC method can summarize the distribution law of duration, pitch, and velocity in music files, while the logistic fitting method is used to fit the time characteristics. The music files generated by the MC method are compared with those generated by the LSTM system under Euler's music evaluation mechanism; the results show that the system based on the MC method consumes less time and generates music of better quality. The structure of the paper is shown in Fig. 1.

Fig. 1. The structure of the paper

3.1 Data Processing

A total of 215 world-famous piano pieces in MIDI format are used as the sample data. The MIDI samples are processed with conversion tools, so that the duration, pitch, and velocity at each time node, as well as output information such as notes, pitch, and beat, are expressed as numbers. On this basis, the music files can be processed by the algorithm. For instance, Table 1 shows a fragment of the note matrix for the famous piano piece <For Elise>, and Fig. 2 shows the corresponding musical notation.

Table 1. The note matrix for MIDI

Fig. 2. The part musical notation for <For Elise>

In Table 1, the first column indicates the onset of the notes in beats (based on ticks per quarter note) and the second column indicates the duration of the notes in these same beat values. The third column indicates the MIDI channel (1-16), and the fourth indicates the MIDI pitch, where the middle C (C4) is 60. The fifth column is the velocity describing how fast the key of the note is pressed (0-127). The last two columns correspond to the first two (onset in beats, duration in beats) except that seconds are used instead of beats. The numbers of the column time (beats) and duration (beats) are twice as much as the time (sec) and duration (sec) respectively.
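The seven-column layout described above can be sketched as a small Python record. This is only an illustration: the class name `Note`, the helper `make_note`, and the example values are hypothetical, and the tempo of 120 beats per minute is an assumption inferred from the statement that the beat values are twice the values in seconds.

```python
from dataclasses import dataclass


@dataclass
class Note:
    onset_beats: float
    duration_beats: float
    channel: int        # MIDI channel, 1-16
    pitch: int          # MIDI pitch, 60 = middle C (C4)
    velocity: int       # key velocity, 0-127
    onset_sec: float
    duration_sec: float


def make_note(onset_beats, duration_beats, channel, pitch, velocity, bpm=120.0):
    """Build one row of the note matrix, deriving the two columns in seconds."""
    sec_per_beat = 60.0 / bpm  # 0.5 s per beat at the assumed 120 BPM
    return Note(onset_beats, duration_beats, channel, pitch, velocity,
                onset_beats * sec_per_beat, duration_beats * sec_per_beat)


# Hypothetical row: a note starting on beat 2 and lasting half a beat.
row = make_note(2.0, 0.5, 1, 76, 60)
print(row)
```

Keeping both time bases in one record mirrors the table, although either pair of columns can be recomputed from the other once the tempo is fixed.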

3.2 MC Sampling Algorithm

By sampling random numbers, the MC sampling algorithm can predict complex trends. The basic idea is to find the probability of occurrence of certain elements of the problem, grasp the geometric quantities and motion characteristics of the data distribution, and then simulate the process of generating the data mathematically. Because it is based on a probability model, the final result can be regarded as an approximate solution. First, the MC sampling algorithm is used to simulate the distribution of the values of duration, pitch, and velocity in music, and the rules of distribution are summarized to form a probability distribution model. Second, the numerical characteristics of the music generation model are estimated statistically, and the optimal solution for the actual results is calculated quantitatively. The discrete sampling values of the output are then calculated by simulating the input distribution of the music sample model, and the best estimate of the output is obtained directly from these discrete values. To achieve these goals, the following framework is built:

1) Establish the measurement model relating the output Y to the inputs \(X_i\) and analyze the main error sources, then establish the uncertainty evaluation model \(y = f(X_i)\);

2) Determine the input (music samples) and the number of simulations (10000);

3) Generate several groups of pseudo-random numbers by computer and feed them into the music model to obtain N model output values \(y_r\) (r = 1, 2, ..., N). Calculate the optimal output value and the standard uncertainty from these N values with formula (1) and formula (2);

\(\begin{aligned}\bar{y}=\frac{1}{N} \sum_{r=1}^{N} y_{r}\end{aligned}\)       (1)

\(\begin{aligned}u(y)=\sqrt{\frac{1}{N-1} \sum_{r=1}^{N}\left(y_{r}-\bar{y}\right)^{2}}\end{aligned}\)       (2)

4) Obtain the maximum and minimum of the N model values. Divide the range between them into several intervals (equal to the number of final outputs, 100) and count the frequency of the values in each interval. At the same time, determine the inclusion interval from the inclusion probability P, so that the expanded uncertainty of the output Y can also be obtained.
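The four steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `mc_estimate` and `histogram` are hypothetical, the input is assumed to be a normal distribution of pitch values (mean 60, standard deviation 12), and the model is the identity function.

```python
import math
import random


def mc_estimate(model, draw_input, n_trials=10_000, coverage=0.99, seed=0):
    """Steps 2-4: simulate, then apply formulas (1) and (2) to the outputs."""
    rng = random.Random(seed)
    values = sorted(model(draw_input(rng)) for _ in range(n_trials))
    mean = sum(values) / n_trials                                           # formula (1)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n_trials - 1))  # formula (2)
    # Symmetric inclusion interval read off the sorted model values.
    tail = int(round(n_trials * (1 - coverage) / 2))
    return mean, std, values[tail], values[n_trials - 1 - tail], values


def histogram(values, n_bins=100):
    """Step 4: count how many values fall into each of n_bins equal intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


# Assumed input distribution and identity model, for illustration only.
mean, std, low, high, values = mc_estimate(lambda x: x,
                                           lambda rng: rng.gauss(60.0, 12.0))
counts = histogram(values)
print(round(mean, 2), round(std, 2), round(low, 2), round(high, 2))
```

The fixed seed only makes the sketch reproducible; a real run would draw its inputs from the distributions learned from the music samples.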

The choice of the sampling number N directly affects the reliability of the evaluation results and the feasibility of the simulation when the numerical characteristics of the music generation model are estimated and the optimal solution for the actual results is calculated. As N increases, the output index values become more accurate, but the calculation time also grows until the computation becomes infeasible. To obtain stable statistics, the amount of sampling therefore has to be increased or decreased manually when the MC sampling algorithm is run. In mathematical terms, the standard deviation of each result must be less than half of the numerical tolerance of the standard uncertainty. The steps of the uncertainty adjustment in the MC sampling process are as follows:

1) Let \(n_{adj}\) be a suitable positive integer; its value is generally 1 or 2;

2) Set \(N = \max(J, 10^{4})\), where J is the smallest integer greater than or equal to 100 / (1-P); in this paper P = 99 %, so N = 10000;

3) Set h = 1, which denotes the first MC simulation;

4) Run the MC algorithm with the calculation tool to generate N groups of random numbers for the input quantities \(X_i\) of the measurement model;

5) Obtain the N model values \(y_r\) (r = 1, 2, ..., N). According to formula (1) and formula (2), calculate the estimate \(y^{(h)}\) and the standard uncertainty \(u(y^{(h)})\), and calculate the endpoints \(y_{low}^{(h)}\) and \(y_{high}^{(h)}\) of the inclusion interval for the inclusion probability P from the sorted values;

6) If h = 1, let h = h + 1 and return to step 4);

7) According to formula (3) and formula (4), calculate the average y and the standard deviation \(s_y\) of the h estimates of Y;

\(\begin{aligned}y=\frac{1}{h} \sum_{j=1}^{h} y^{(j)}\end{aligned}\)       (3)

\(\begin{aligned}s_{y}=\sqrt{\frac{1}{h(h-1)} \sum_{j=1}^{h}\left(y^{(j)}-y\right)^{2}}\end{aligned}\)       (4)

8) In the same way, calculate the standard deviations \(s_{u(y)}\), \(s_{y_{low}}\), and \(s_{y_{high}}\) corresponding to u(y), \(y_{low}\), and \(y_{high}\);

9) Calculate u(y) from all h × N model values;

10) Calculate the numerical tolerance δ of u(y): if \(u(y) = c \times 10^{4}\), where c is a positive integer with \(n_{adj}\) decimal digits, so that c = 1, then \(\begin{aligned}\delta=\frac{c \times 10^{4}}{2}=\frac{10^{4}}{2};\end{aligned}\)

11) If any of \(s_y\), \(s_{u(y)}\), \(s_{y_{low}}\), \(s_{y_{high}}\) is greater than half of δ, let h = h + 1 and return to step 4). Otherwise, the calculation is considered stable. From all h × N model values, calculate the optimal value of y, the standard uncertainty u(y), and the inclusion interval for the inclusion probability P.

ALGORITHM: MC SAMPLING ALGORITHM

Input: the round counter h = 1, the number of MC trials per round N, the number of significant digits \(n_{adj}\), and the inclusion probability P.

Output: the estimate \(y^{(h)}\), the standard uncertainty \(u(y^{(h)})\), and the inclusion interval endpoints \(y_{low}^{(h)}\), \(y_{high}^{(h)}\).

For h = 1, 2, ... do

1: Calculate \(y^{(h)}\), \(u(y^{(h)})\), \(y_{low}^{(h)}\) and \(y_{high}^{(h)}\), and obtain the standard deviations of their averages \(s_y\), \(s_{u(y)}\), \(s_{y_{low}}\) and \(s_{y_{high}}\).

2: Calculate the standard uncertainty u(y).

3: Calculate the numerical tolerance δ of u(y).

4: If any of \(s_y\), \(s_{u(y)}\), \(s_{y_{low}}\) and \(s_{y_{high}}\) exceeds δ/2, go to step 1 with the next h; otherwise stop.

End for
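The adaptive loop can be sketched in Python as below. This is a hedged illustration rather than the paper's implementation: all function names are hypothetical, the input distribution (normal, mean 60, standard deviation 12) and the identity model are assumptions, and the numerical tolerance is computed with the common convention of half a unit in the last retained significant digit of u(y), which is assumed here instead of the fixed value quoted in step 10.

```python
import math
import random


def summarize(values, coverage):
    """Estimate, standard uncertainty and inclusion-interval endpoints."""
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n                                          # formula (1)
    u = math.sqrt(sum((v - mean) ** 2 for v in ordered) / (n - 1))   # formula (2)
    tail = int(round(n * (1 - coverage) / 2))
    return mean, u, ordered[tail], ordered[n - 1 - tail]


def std_of_average(xs):
    """Formulas (3) and (4): spread of the average of the h round results."""
    h = len(xs)
    avg = sum(xs) / h
    return math.sqrt(sum((x - avg) ** 2 for x in xs) / (h * (h - 1)))


def numerical_tolerance(u, n_dig):
    """Assumed convention: half a unit in the n_dig-th significant digit of u (u > 0)."""
    exponent = math.floor(math.log10(u)) - (n_dig - 1)
    return 0.5 * 10.0 ** exponent


def adaptive_mc(model, draw_input, coverage=0.99, n_dig=1, seed=0, max_rounds=50):
    rng = random.Random(seed)
    n_trials = max(math.ceil(100 / (1 - coverage)), 10_000)          # step 2
    rounds, pooled = [], []
    while True:
        values = [model(draw_input(rng)) for _ in range(n_trials)]   # step 4
        rounds.append(summarize(values, coverage))                   # step 5
        pooled.extend(values)
        result = summarize(pooled, coverage)                         # steps 9 and 11
        if len(rounds) >= 2:                                         # step 6
            delta = numerical_tolerance(result[1], n_dig)            # step 10
            spreads = [std_of_average(col) for col in zip(*rounds)]  # steps 7-8
            if all(s <= delta / 2 for s in spreads):
                break
        if len(rounds) >= max_rounds:   # safety stop, not part of the listed steps
            break
    return result, len(rounds)


(mean, u, low, high), n_rounds = adaptive_mc(lambda x: x,
                                             lambda rng: rng.gauss(60.0, 12.0))
print(n_rounds, round(mean, 2), round(u, 2), round(low, 2), round(high, 2))
```

Raising `n_dig` tightens the tolerance and typically increases the number of rounds, which is the trade-off between accuracy and running time discussed above.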

3.3 Experiments

The time distributions of the music samples are independent of each other, which matches the kind of data to which the logistic regression model applies. The essence of logistic regression is to divide the probability of occurrence by the probability of non-occurrence and then take the logarithm. This transformation removes the mismatch in value range and the curved relationship between the dependent and independent variables: the ratio of the two probabilities (the odds) widens the range of values, and the logarithm then transforms the dependent variable. Extensive experiments show that after this transformation the relationship between the dependent and independent variables is often linear. Logistic fitting can therefore be applied both when the dependent variable is discrete and when it is continuous. Fig. 3 shows the nonlinear (logistic) curve fitting of the time data.

Fig. 3. Nonlinear curve fitting of time

As seen from Fig. 3 (a), logistic regression analysis describes the distribution law of the time series well. When the music samples are fitted with the logistic model, the reduced chi-square (Reduced Chi-Sqr) is 0.03577 and the coefficient of determination, COD (R^2), is 0.9997; the formulas for Chi-Sqr and COD (R^2) are given in Table 4. The conventional residuals fluctuate within a reasonable range. Logistic regression analysis can therefore summarize the distribution law of the time series data well.

Fig. 3 (b) shows the values obtained by fitting the time series of the music samples with the logistic method. The error between the fitted values and the original data lies between -0.5 and 0.5, so the fitted data represent the original data well.
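The log-odds transformation described in this section can be checked numerically. The following Python sketch is an illustration only: the parameters `a = 0.4` and `b = 1.5` and the helper names are hypothetical, and the data are synthetic points on a logistic curve rather than the onset times of the music samples.

```python
import math


def logistic(x, a, b):
    """Probability of occurrence under a logistic model with intercept a, slope b."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def logit(p):
    """Log of the odds: probability of occurrence over probability of non-occurrence."""
    return math.log(p / (1.0 - p))


def fit_line(xs, ys):
    """Ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


# Synthetic data: probabilities that follow an S-shaped logistic curve.
xs = [i * 0.5 for i in range(-6, 7)]
ps = [logistic(x, a=0.4, b=1.5) for x in xs]

# After the logit transformation the relationship is a straight line.
intercept, slope = fit_line(xs, [logit(p) for p in ps])
print(round(intercept, 6), round(slope, 6))
```

The straight-line fit recovers the intercept and slope used to generate the curve, which is the linearization the text refers to.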

Fig. 4 shows the value of the duration simulated by the MC method.

Fig. 4. Simulation value for the duration

Fig. 4 (a) shows the data analysis of the duration with the MC method. During the MC simulation, the final duration values conform to the basic laws of music creation, and the changes of the curve in Fig. 4 (b) correspond to the prelude, climax, and ending of the music. The fitted duration values show a certain regularity as time advances.

The pitch and velocity data are processed in the same way, which yields Fig. 5 and Fig. 6. Fig. 5 (a) shows the data analysis of the pitch with the MC method. Fig. 5 (b) shows the final values generated by the MC algorithm; most of the values are concentrated around 60 (the average value).

Fig. 5. Simulation value for the pitch

Fig. 6. Simulation value for the velocity

From the point of view of their mathematical distribution, the pitch and velocity values are concentrated around a center but also increase or decrease to a certain degree. Finally, the processed data are restored to the form of Table 2 (the first 30 groups of data are shown, with 4 decimal places).

Table 2. The note matrix for the final result

In Table 2, R represents the row number, T (sec) represents the onset in seconds, D (sec) represents the duration in seconds, P represents the pitch, V represents the velocity, T (beats) represents the onset in beats, and D (beats) represents the duration in beats.
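One plausible way to assemble rows in this column order from sampled values is sketched below in Python. It is not the authors' procedure: the function `generate_rows` and the observed value lists are hypothetical, onsets are assumed to follow one another without overlap (the paper instead fits the onset times with a logistic curve), and the tempo of 120 beats per minute is assumed so that beat values are twice the values in seconds.

```python
import random


def generate_rows(durations, pitches, velocities, n_rows=30, bpm=120.0, seed=0):
    """Draw notes from observed values and lay them out as R, T(sec), D(sec), P, V, T(beats), D(beats)."""
    rng = random.Random(seed)
    beats_per_sec = bpm / 60.0
    rows, onset = [], 0.0
    for r in range(1, n_rows + 1):
        # Choosing uniformly from the observed values reproduces their frequencies.
        duration = rng.choice(durations)
        pitch = rng.choice(pitches)
        velocity = rng.choice(velocities)
        rows.append((r, round(onset, 4), round(duration, 4), pitch, velocity,
                     round(onset * beats_per_sec, 4),
                     round(duration * beats_per_sec, 4)))
        onset += duration  # assumption: each note starts when the previous one ends
    return rows


# Hypothetical observed values standing in for the statistics of the samples.
durations = [0.25, 0.25, 0.5, 1.0]
pitches = [57, 60, 60, 64, 67]
velocities = [50, 60, 60, 70]

rows = generate_rows(durations, pitches, velocities)
for row in rows[:3]:
    print(row)
```

Repeating a value in the observed list raises its sampling probability, which is a simple stand-in for the learned distribution of each parameter.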

Next, the note matrix is converted into a MIDI file and displayed as musical notation. A fragment of the music file is selected, and the result is shown in Fig. 7.

Fig. 7. The part musical notation for the final result

As Fig. 7 shows, the music generated by the MC method is tidy in its note distribution. At the same time, there is a strong connection between the scales, and the generated music sounds smooth.

Fig. 8 shows the proportion of each note in the generated sample. The proportions of the notes accord with the laws of music generation. The final music samples not only have a main melody but also show diversity in the distribution of notes.

Fig. 8. Proportion of notes

Fig. 9 shows that the generated pieces capture the basic harmonic relationships between melody and accompaniment and contain consistent rhythmic patterns. It is also easy to observe that each shift is accompanied by the desired changes in rhythm density and note density, as seen in Fig. 4 to Fig. 6.

Fig. 9. The output of beat time and speed time

4. Evaluation

The generated music samples are evaluated with the subjective index of the melody measure. To obtain the objective melody evaluation index table of the music, the scale data are fed into the continuous chaotic model for simulation. Table 3 is the objective melody evaluation index table of music.

Table 3. Melody evaluation index table

According to Euler's theory and music cognitive psychology, the complexity of a melody is related to auditory comfort. If listeners need little mental calculation when listening to a melody, the melody is comfortable to enjoy. In other words, the more complex the information that must be received and understood, the less comfortable the listeners feel. The melody measure is calculated as follows:

1) Calculate the interval between every two notes in the melody;

2) For each interval i (i = 1, 2, ..., m, where m is the number of intervals), define \(a_i = n(i) \times n(d)\), where \(n(i)/n(d)\) is the frequency ratio of the current interval, and decompose \(a_i\) into prime factors:

\(\begin{aligned}a_{i}=p_{1}^{k_{1}} p_{2}^{k_{2}} \cdots p_{n}^{k_{n}}\end{aligned}\)       (5)

In formula (5), \(p_j\) is the j-th prime number in the sequence of primes, \(k_j\) is the number of times that prime appears in the factorization, and n is the index of the largest prime factor of \(a_i\).

3) The melodic metrics of the interval ai can be defined as:

\(\begin{aligned}G\left(a_{i}\right)=\sum_{j=1}^{n}\left(k_{j} p_{j}-k_{j}\right)+1\\\end{aligned}\)       (6)

4) Finally, the melodic metrics of the whole melody can be defined as:

\(\begin{aligned}G=\frac{1}{m} \sum_{i=1}^{m} G\left(a_{i}\right)\\\end{aligned}\)       (7)
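Formulas (5) to (7) can be evaluated with a short Python sketch. The function names are hypothetical, and the intervals are assumed to be supplied as just-intonation frequency ratios (for example 3/2 for a perfect fifth); how the paper maps MIDI pitch differences to such ratios is not stated, so that mapping is left out.

```python
import math


def prime_factors(n):
    """Return {prime: exponent} for n >= 1, as in formula (5)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def interval_measure(numerator, denominator):
    """Formula (6) for one interval given as a frequency ratio."""
    g = math.gcd(numerator, denominator)
    a = (numerator // g) * (denominator // g)
    return sum(k * (p - 1) for p, k in prime_factors(a).items()) + 1


def melody_measure(ratios):
    """Formula (7): average interval measure over the m intervals of a melody."""
    return sum(interval_measure(n, d) for n, d in ratios) / len(ratios)


# Assumed example: a perfect fifth, a perfect fourth and a major third.
example = [(3, 2), (4, 3), (5, 4)]
print([interval_measure(n, d) for n, d in example], melody_measure(example))
```

Simple ratios such as the octave (2/1) receive small values, while ratios built from larger primes or higher powers receive larger ones, which matches the idea that less mental calculation means a more comfortable interval.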

Formulas (5), (6), and (7) are used to calculate the melodic metrics of the system. Fig. 10 shows the musical notation of the output of the LSTM, for which the sample was unprocessed.

Fig. 10. The musical notation of the output with the LSTM

As Fig. 10 shows, the music generated by the LSTM method is chaotic in its note distribution. At the same time, there are large gaps between the scales, and the generated music does not sound very comfortable.

S.E. of regression reflects the average difference between the estimated values of the dependent variable and its actual values. The smaller the value, the smaller the difference between the estimated and actual values and the more representative the estimate. Sum squared resid measures the effect of variables and random errors. Log-likelihood reflects the fitting state of the model when there is only an intercept. Generally, the value is negative; the smaller its absolute value, the better the fit. F-statistic indicates the significance of the whole fitting process; the larger the value, the better the fit. S.D. dependent var is the square root of the variance, which reflects the degree of dispersion of a data set; in general, the smaller the standard deviation, the more stable the values. The Akaike info criterion (AIC) measures the goodness of fit of the statistical model; the smaller the value, the better the fit. The Schwarz criterion and the Hannan-Quinn criterion are similar to AIC: the smaller the value, the better the fit. The closer the Durbin-Watson (DW) stat is to 0, the higher the correlation between the fitted data and the original data. The calculation of these elements is shown in Table 4. Table 5 and Table 6 show that the DW value of the data generated by the MC method is closer to 0.
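Several of these quantities can be computed directly from the actual and fitted values. Because the formulas of Table 4 are not reproduced in this text, the Python sketch below assumes the standard textbook definitions; the function name `regression_summary` and the example data are hypothetical.

```python
import math


def regression_summary(actual, fitted, n_params):
    """Standard regression diagnostics from actual and fitted values."""
    n = len(actual)
    residuals = [a - f for a, f in zip(actual, fitted)]
    ssr = sum(e * e for e in residuals)                       # sum squared resid
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)         # total sum of squares
    dw_num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, n))
    return {
        "sum_squared_resid": ssr,
        "se_of_regression": math.sqrt(ssr / (n - n_params)),
        "r_squared": 1.0 - ssr / sst,
        "sd_dependent_var": math.sqrt(sst / (n - 1)),
        "durbin_watson": dw_num / ssr,
    }


# Hypothetical data: five observations and a two-parameter fit.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
summary = regression_summary(actual, fitted, n_params=2)
for name, value in summary.items():
    print(name, round(value, 4))
```

The information criteria and the log-likelihood are omitted because their values depend on the assumed error distribution, which the text does not specify.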

Table 4. The Elements of Regression Analysis

Table 5. Output Regression Analysis Based On MC

Table 6. Output Regression Analysis Based On LSTM

In Table 5 and Table 6, the elements have the meanings given above. During the experiment, the parameter VELOCITY was deleted from Table 6, because the velocity generated by the LSTM system is constant (equal to 40) and is therefore meaningless for regression analysis. The final R-squared of the fit for the music generated by the MC method is 0.997, and the corresponding value for LSTM is 0.982. Both values show that the fits largely coincide with the raw data, regardless of how the output was generated. The comparison shows that, except for Log-likelihood and S.D. dependent var, the results generated by the MC method are always better than those of LSTM. Except for the F-statistic, lower values indicate better results for all metrics.

With the LSTM and the same number of samples, the generated music sounds unpleasant and unstable. The reason is that the distribution of notes lacks regularity, so the music sounds abrupt and the melody changes without transition. Over many attempts, the LSTM system also consumed a lot of time and computing resources. Most importantly, its outputs differ greatly from run to run and do not have a unified style. Although LSTM handles time series well, for melodious music the melody matters more than the time series.

By contrast, the melody generated by the MC method is more natural than the LSTM result, and the music does not sound too abrupt. The above simulation results and statistical indices show that the music generated by the MC method has a more stable melodic contour and fewer dissonant intervals, whereas the melody generated by the LSTM system has more leaping intervals and larger local fluctuations.

Therefore, compared with the LSTM system, the MC sampling algorithm achieves a good balance in the music field: it can not only deal with the time series of music samples but also summarize the distribution of duration, pitch, and velocity.

Fig. 11 and Fig. 12 show the melody change diagrams of the samples generated by LSTM and MC, respectively. In Fig. 11 (a) and Fig. 12 (a), the dynamics of each note are represented by the depth of the color: the lighter the dynamics, the closer the color is to blue; otherwise it is closer to red. Fig. 11 (b) and Fig. 12 (b) show the changes of the notes from the second beat to the 16th beat of the generated music at intervals of two beats. Using images to observe the melody change is offered here as a suggestion and is also an engaging way to describe the melody of the output.

Fig. 11. Melody change of LSTM

Fig. 12. Melody change of MC

5. Conclusions And Future Work

We have presented a deep generative approach to music structure analysis based on the MC algorithm. The model represents the essential characteristics of sections, homogeneity, repetitiveness, and regularity, with the MC algorithm. The experimental results show that the proposed method is effective for musical structure analysis.

The proposed method considers homogeneity, repetitiveness, and regularity, but not novelty, which has been emphasized in conventional research. Exploiting this aspect remains an avenue for future work. It is also important to handle further levels of hierarchy, since music has a hierarchical structure that moves from motives and phrases to sections and section groups.

References

  1. G. Hu, H. Wang, "Most Likely Optimal Subsampled Markov Chain Monte Carlo," Journal of Systems Science &, vol. 34, no. 3, pp. 1121-1134, 2021.
  2. S. Hu, X. Wang, "Exponential Convergence Rates of Markov Chains under a Weaken Minorization Condition," Acta Mathematica Sinica, vol. 34, no. 12, pp. 1829-1836, 2018. https://doi.org/10.1007/s10114-018-7541-8
  3. B. McFee, J. Kim, M. Cartwright, et al, "Open-source practices for music signal processing research: Recommendations for transparent, sustainable, and reproducible audio research," IEEE Signal Processing Magazine, vol. 36, no. 1, pp. 128-137, 2018. https://doi.org/10.1109/MSP.2018.2875349
  4. X. Wang, D. Ding, H. Dong, X. Zhang, "Neural-Network-Based Control for Discrete-Time Nonlinear Systems with Input Saturation Under Stochastic Communication Protocol," IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 12, pp. 766-778, 2021. https://doi.org/10.1109/JAS.2021.1003922
  5. L. Hiller, L. Isaacson, Experimental Music: Composition With an Electronic Computer, New York, NY, USA: McGraw-Hill, 1959.
  6. C. Walder, "Modelling Symbolic Music: Beyond the Piano Roll," in Proc. of JMLR: Workshop and Conference Proceedings, 63, 174-189, Jun. 2016.
  7. F. Ghedini, P. Francois, R. Pierre, "Creating music and texts with flow machines," Multidisciplinary Contributions to the Science of Creative Thinking, pp. 325-343, July 2016.
  8. V. Nasiri, A. A. Darvishsefat, R. Rafiee, A. Shirvany, M. A. Hemat, "Land use change modeling through an integrated Multi-Layer Perceptron Neural Network and Markov Chain analysis (case study: Arasbaran region, Iran)," Journal of Forestry Research, vol. 30, no. 3, pp. 943-957, Apr. 2019. https://doi.org/10.1007/s11676-018-0659-9
  9. H. Takamori, T. Nakatsuka, S. Fukayama, M. Goto, S. Morishima, "Audio-based automatic generation of a piano reduction score by considering the musical structure," in Proc. of International Conference on Multimedia Modeling (ICMM), pp. 169-181, Dec. 2019.
  10. M. Muller, A. Arzt, S. Balke, M. Dorfer, G. Widmer, "Cross-modal music retrieval and applications: An overview of key methodologies," IEEE Signal Processing Magazine, vol. 36, no. 1, pp. 52-62, 2019. https://doi.org/10.1109/MSP.2018.2868887
  11. M. Xu, Z. Wang, G. Xia, "Transferring piano performance control across environments," in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 221-225, 2019.
  12. AllMusic, "Allmusic-record reviews, streaming songs, genres and bands," Aug. 2020.
  13. M. Muller, "An educational guide through the FMP notebooks for teaching and learning fundamentals of music processing," Signals, vol. 2, no. 2, pp. 245-285, 2021. https://doi.org/10.3390/signals2020018
  14. B. D. Smith, G. E. Garnett, "Reinforcement Learning and the Creative, Automated Music Improviser," in Proc. of EvoMUSART 2012: Evolutionary and Biologically Inspired Music, Sound, Art and Design, pp. 223-234, 2012.
  15. Z. Wang, X. Gus, "A framework for automated popsong melody generation with piano accompaniment arrangement," arXiv preprint, Dec, 2018.
  16. J. Gillick, A. Roberts, J. Engel, D. Eck, D. Bamman, "Learning to groove with inverse sequence transformations," in Proc. of the 36th International Conference on Machine Learning, ICML, pp. 2269-2279, 2019.
  17. M. Dorfer, J. Schluter, A. Vall, F. Korzeniowski, G. Widmer, "End-to-end cross-modality retrieval with cca projections and pairwise ranking loss," International Journal of Multimedia Information Retrieval, vol. 7, no. 2, pp. 117-128, 2018. https://doi.org/10.1007/s13735-018-0151-5
  18. M. Niu, Y. Lin, Q. Zou, "sgRNACNN: identifying sgRNA on-target activity in four crops using ensembles of convolutional neural networks," Plant molecular biology, vol. 105, no. 4, pp. 483-495, 2021. https://doi.org/10.1007/s11103-020-01102-y
  19. H. Dong, W. Hsiao, L. Yang, Y. Yang, "Musegan: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment," in Proc. of Thirty-Second AAAI Conference on Artificial Intelligence, AAAI Press, vol. 32, no. 1, pp. 34-41, 2018.
  20. P. Sangkloy, J. Lu, C. Fang, F. Yu, J. Hays, "Scribbler: Controlling deep image synthesis with sketch and color," in Proc. of 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR. Honolulu, HI, USA, pp. 5400-5409, 2017.
  21. G. Hadjeres, F. Nielsen, "Anticipation-rnn: enforcing unary constraints in sequence generation, with application to interactive music generation," Neural Computing and Applications, vol. 32, pp. 995-1005, 2020. https://doi.org/10.1007/s00521-018-3868-4
  22. K. Ganguli, S. Gulati, X. Serra, P. Rao, "Data-driven exploration of melodic structures in hindustani music," in Proc. of The 17th International Society for Music Information Retrieval Conference, 2016.
  23. K. Lu, C. S. Foo, K. K. Teh, H. D. Tran, V. R. Chandrasekhar, "Semi-supervised audio classification with consistency-based regularization," in Proc. of Interspeech 2019, pp. 3654-3658, 2019.
  24. S. Hochreiter, J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, Nov. 1997. https://doi.org/10.1162/neco.1997.9.8.1735
  25. L. Yang, A. Lerch, "On the evaluation of generative models in music," Neural Computing and Applications, vol. 32, pp. 4773-4784, 2020. https://doi.org/10.1007/s00521-018-3849-7
  26. A. A. S. Gunawan, A. P. Iman, D. Suhartono, "Algorithmic Music Generation using Recurrent Neural Networks," Atlantis Press, vol. 13, no. 1, pp. 645-654, 2020.
  27. G. Boenn, "The Farey Sequence as a Model for Musical Rhythm and Meter," Computational Models of Rhythm and Meter, pp. 83-112, 2018.
  28. G. Aguilera, L. Riggi, K. Miller, T. Roslin, R. Bommarco, "Organic fertilisation enhances generalist predators and suppresses aphid growth in the absence of specialist predators," Journal of Applied Ecology, vol. 58, no. 7, pp. 1455-1465, 2021. https://doi.org/10.1111/1365-2664.13862
  29. R. F. Voss, J. Clarke, "1/f noise in music and speech," Nature, vol. 258, pp. 317-318, 1975. https://doi.org/10.1038/258317a0
  30. P. Chordia, "Modeling melodic improvisation in Turkish folk music using variable-length Markov models," in Proc. of Int. Soc. Music Inf. Retrieval Conf, pp. 269-274, 2011.
  31. P. Anthony, L. Robin, W. Alistair, S. Robert, "Algorithmic music as intelligent game music," in Proc. of AISB Anniversary Conv, pp. 1-4, Apr. 2014.