import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import yfinance as yf  # import data via Yahoo Finance
sp_list = ['SPY']
today = datetime.today()
# Get the last business day (Monday goes back 3 days, Sunday 2, any other day 1)
offset = max(1, (today.weekday() + 6) % 7 - 3)
timed = timedelta(days=offset)
today_business = today - timed
print("d1 =", today_business)
today = today_business.strftime("%Y-%m-%d")  # yyyy-mm-dd
symbols_list = sp_list
start = '2000-01-01'
end = today
print('S&P 500 Stock Download')
r = yf.download(symbols_list, start=start, end=end)
# Replace all NaN data with zero
df_pivot = r.fillna(0)
treasury_yield = df_pivot['Close']
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture
# 0. Create dataset
sp_list = ['SPY']
symbols_list = sp_list
start = '2008-01-01'
print('S&P 500 Stock Download')
r = yf.download(symbols_list, start=start, end=end)
# Daily simple returns (pct_change) of the close price: the observations to cluster
daily_returns = r['Close'].pct_change()
daily_returns = daily_returns.iloc[1:]  # the first row is NaN, so drop it
X = daily_returns.values
gmm = GaussianMixture(n_components=3).fit(X.reshape(-1, 1))  # instantiate and fit the model
from sklearn.mixture import GaussianMixture
# 0. Create the two-dimensional dataset
# Note: `inflation` and `spy` are not defined in this snippet; they are assumed to be
# monthly series (inflation data and S&P 500 prices) prepared beforehand
gmm_data = pd.DataFrame({'inflation': inflation.weight, 'market_return': spy.close.values})
gmm_data = gmm_data.to_numpy()
X = gmm_data
# Sorted grid over both variables, e.g. for plotting the fitted density
x, y = np.meshgrid(np.sort(X[:, 0]), np.sort(X[:, 1]))
XY = np.array([x.flatten(), y.flatten()]).T
gmm = GaussianMixture(n_components=3).fit(X)  # fit on the data X, not on the grid
print('Converged:', gmm.converged_)  # check whether the model has converged
Source: Deephub IMBA
This article is about 2,300 words; suggested reading time: 5 minutes.
This article shows how to apply Gaussian mixture models (GMMs) to financial markets and the economy.
Introduction
Over the past decade, it has become ever easier for ordinary people to enter the stock market, and the amount of money flowing into and out of the market each day has reached record highs.
As an investor, you can build up experience and intuition about when to buy or sell in various ways. One of the easiest is to ask friends or other investors, but you will soon be flooded with contradictory points of view.
This article tries to tackle the golden question of making money (when should I buy or sell?) with rigorous mathematical tools rather than rigid opinions.
I will demonstrate how to use a Gaussian mixture model to help decide when to move money into or out of the market.
Mathematically, the condition of the market at any given time can be called a “market state”. A market can be described by any number of concepts, such as bear versus bull market, or high versus low volatility. Instead of working with each concept separately, we can cluster trading days into states according to a set of characteristics, which works much better.
Because market states are not clearly defined, there is no response variable that represents them. Using an unsupervised machine learning model to identify the market state may therefore work much better than a supervised model; this is the working assumption of this article.
Supervised and unsupervised machine learning
The difference between these two approaches is whether the dataset is labeled. Supervised learning uses labeled input and output data, whereas unsupervised learning algorithms work without labels. A label is the response variable: the numeric or categorical value we are trying to predict. So when a supervised algorithm is used, the variable to be predicted is clearly defined. A very simple but powerful example of supervised learning is linear regression, which predicts y from x.
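To make the contrast concrete, here is a minimal sketch on synthetic data (all numbers are made up for illustration): scikit-learn's LinearRegression needs the labels y, while GaussianMixture receives only the observations and returns cluster ids it discovered on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Supervised: every input x comes with a label y (the response variable)
x = rng.normal(size=(200, 1))
y = 2.0 * x[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(x, y)  # learns y = slope * x + intercept

# Unsupervised: only observations are given, there is no response variable
returns = np.concatenate([rng.normal(-0.02, 0.02, 100),
                          rng.normal(0.01, 0.005, 300)]).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(returns)
states = gmm.predict(returns)  # cluster ids discovered from the data alone

print("slope and intercept:", reg.coef_[0], reg.intercept_)
print("points per discovered cluster:", np.bincount(states, minlength=2))
```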
Gaussian mixture model (GMM)
A Gaussian mixture model is a superposition of several normal distributions in a p-dimensional space, where the number of dimensions equals the number of variables. For example, with a single variable (S&P 500 index returns), the GMM is fitted to one-dimensional data. GMMs can be used to model stock market states and in other financial applications. One characteristic of stock market returns is their heavy tails, produced by high-volatility days. Being able to capture these high-volatility days in the tails of the distribution is important for retaining that information in the model.
The figure above shows multimodal data with four clusters. A Gaussian mixture model is a clustering model that assigns labels to such unlabeled data.
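The superposition of normal distributions described above can be written directly as a weighted sum of normal densities. The sketch below uses two hypothetical regimes, a calm one and a rarer high-volatility one, with made-up weights and standard deviations (both means set to 0), and compares the mixture's tail probability with that of a single normal distribution with the same variance. mixture_pdf and mixture_tail are illustrative helper names, and the kurtosis formula is the closed form for a zero-mean mixture only.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical regimes: a calm one (80% of days) and a high-volatility one (20%)
weights = np.array([0.8, 0.2])
means = np.array([0.0, 0.0])
stds = np.array([0.01, 0.03])

def mixture_pdf(x):
    """Density of the 1-D mixture: a weighted sum of normal densities."""
    x = np.atleast_1d(x)[:, None]
    return (weights * norm.pdf(x, loc=means, scale=stds)).sum(axis=1)

def mixture_tail(t):
    """P(|X| > t) under the mixture."""
    return np.sum(weights * (norm.sf(t, loc=means, scale=stds)
                             + norm.cdf(-t, loc=means, scale=stds)))

# A single zero-mean normal distribution with the same overall variance as the mixture
mix_var = np.sum(weights * (stds**2 + means**2)) - np.sum(weights * means) ** 2
single_std = np.sqrt(mix_var)

t = 4 * single_std
tail_mixture = mixture_tail(t)
tail_single = 2 * norm.sf(t, scale=single_std)

# Closed-form kurtosis of a zero-mean mixture: 3 * sum(w * s^4) / (sum(w * s^2))^2
excess_kurtosis = 3 * np.sum(weights * stds**4) / np.sum(weights * stds**2) ** 2 - 3

print(f"P(|X| > 4 sd): mixture {tail_mixture:.2e}, single normal {tail_single:.2e}")
print(f"excess kurtosis of the mixture: {excess_kurtosis:.2f}")
```

Even though each component is normal, the mixture puts far more probability beyond four standard deviations than the single normal does, which is the heavy-tail behavior that makes GMMs attractive for returns.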
One of the main benefits of using a GMM for unsupervised clustering is that each cluster can take an elliptical shape. A Gaussian mixture model uses not only the mean but also the covariance to form a cluster.
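In scikit-learn, the cluster shape is controlled by covariance_type; with "full", every component gets its own covariance matrix, whose eigenvectors and eigenvalues give the orientation and axis lengths of that cluster's ellipse. A sketch on two synthetic clusters (hypothetical data, one of them strongly correlated):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two hypothetical clusters: one strongly correlated, one round
a = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=300)
b = rng.multivariate_normal([4.0, -2.0], [[0.3, 0.0], [0.0, 0.3]], size=300)
data = np.vstack([a, b])

# covariance_type="full": each component has its own, possibly rotated, ellipse
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(data)

for k, cov in enumerate(gmm.covariances_):
    # eigh returns eigenvalues in ascending order; the last eigenvector is the major axis
    eigvals, eigvecs = np.linalg.eigh(cov)
    angle = np.degrees(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))
    print(f"component {k}: mean={gmm.means_[k].round(2)}, "
          f"axis std devs={np.sqrt(eigvals).round(2)}, major-axis angle={angle:.0f} deg")
```

The correlated cluster comes out as a long, tilted ellipse, while the other stays nearly round; a spherical-only method such as k-means cannot express that difference.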
Another advantage of the GMM approach is that it is completely data-driven: whatever data is provided to the model is used for clustering. Importantly, each cluster's label can simply be a number, because the underlying features are driven by the data rather than by human opinion.
The mathematics of GMM
The goal of a Gaussian mixture model is to assign each data point to one of n normal distributions. To do so, the expectation-maximization (EM) algorithm is used to estimate the parameters of each normal distribution.
Step 1: Randomly initialize the parameters of the starting normal distributions.
Step 2: Expectation (E-step): given the current parameter values, compute the expected values of the hidden variables for each sample, that is, the probability that each sample belongs to each distribution.
Step 3: Maximization (M-step): given the current hidden-variable values, compute the maximum-likelihood estimates of the parameters.
Step 4: Compute the likelihood of the data under the combined parameters (state weights, means, and covariances).
Step 5: Repeat steps 2 to 4 until convergence.
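The five steps can be sketched in a few lines of NumPy for the one-dimensional case. This is an illustrative implementation under simplifying assumptions (1-D data, random data points as initial means, a small floor on the standard deviations to avoid a collapsing component), not the code inside scikit-learn; fit_gmm_em is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import norm

def fit_gmm_em(x, k, n_iter=500, tol=1e-8, seed=0):
    """Illustrative EM for a 1-D Gaussian mixture (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialization (equal weights, random data points as means, pooled std)
    weights = np.full(k, 1.0 / k)
    means = rng.choice(x, size=k, replace=False)
    stds = np.full(k, x.std())
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Step 2 (E-step): responsibilities P(cluster c | x_i) under the current parameters
        dens = weights * norm.pdf(x[:, None], loc=means, scale=stds)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (M-step): maximum-likelihood updates given the responsibilities
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
        stds = np.maximum(stds, 1e-6)  # guard against a collapsing component
        # Step 4: log-likelihood of the data under the updated parameters
        ll = np.log((weights * norm.pdf(x[:, None], loc=means, scale=stds)).sum(axis=1)).sum()
        # Step 5: stop once the log-likelihood improves by less than tol
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, stds, ll

rng = np.random.default_rng(42)
sample = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(1.0, 1.0, 700)])
w, mu, sd, ll = fit_gmm_em(sample, k=2)
order = np.argsort(mu)  # component order is arbitrary, so sort by mean for display
print("weights:", w[order].round(2), "means:", mu[order].round(2), "stds:", sd[order].round(2))
```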
The probability that each data point belongs to a cluster is shown below. Evaluating it for every pair of indices gives the probability that each data point belongs to each individual cluster, so the resulting matrix has one row per data point and one column per cluster. Because it is a probability matrix, each row i sums to 1.
Index i represents each data point (vector), and index c represents a given cluster; with three clusters, c is 1, 2, or 3.
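In standard notation (K clusters, mixing weights π_c, n data points x_i in p dimensions; π, K, n and p are notation introduced here), the responsibility of cluster c for point i, the multivariate Gaussian density it uses, and the log-likelihood that EM monitors can be written as:

```latex
r_{ic} = \frac{\pi_c \, \mathcal{N}(x_i \mid \mu_c, \Sigma_c)}
              {\sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)},
\qquad \sum_{c=1}^{K} r_{ic} = 1

\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} \, |\Sigma|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)

\log L = \sum_{i=1}^{n} \log \sum_{c=1}^{K} \pi_c \, \mathcal{N}(x_i \mid \mu_c, \Sigma_c)
```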
The above is the multivariate Gaussian formula, in which μ (mu, the mean vector) and Σ (Sigma, the covariance matrix) are the parameters to be estimated with the EM algorithm.
Another key concept is that every Gaussian distribution in our space is unbounded, so the distributions overlap. Depending on its position, each data point is assigned a probability by each distribution, and the probabilities of a data point belonging to the individual clusters sum to 1.
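scikit-learn exposes this probability matrix through predict_proba: one row per data point, one column per cluster. A quick check on hypothetical data drawn from three overlapping normal distributions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical 1-D data drawn from three overlapping normal distributions
x = np.concatenate([rng.normal(-1.0, 0.5, 200),
                    rng.normal(0.0, 0.3, 300),
                    rng.normal(1.5, 0.6, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(x)
resp = gmm.predict_proba(x)  # shape (n_points, n_clusters)

print(resp.shape)                          # (700, 3)
print(resp[:3].round(3))                   # each point gets a probability from every cluster
print(np.allclose(resp.sum(axis=1), 1.0))  # True: each row sums to 1
```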
Finally, because EM is an iterative process, we need to measure progress at each step to know when to stop. For this we use the model's likelihood function: once it stops improving, the parameters have converged.
GMM implementation
This section is divided into two parts, each presenting one application of GMM.
Using GMM to divide S&P 500 returns into three states
The data comes from Yahoo Finance. First I need to decide how many states are needed to represent the market environment. We will assume three states: a bear market, a sideways (range-bound) market, and a bull market.
I will fit the GMM to S&P 500 returns.
Implementing GMM on one-dimensional data in Python is very simple.
scikit-learn's GaussianMixture model can find the states we want.
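Here is a self-contained variant of the one-dimensional fit. It uses synthetic returns instead of the Yahoo Finance download so that it runs offline, and the regime parameters are invented. Because the model returns arbitrary numeric component ids, the sketch sorts them by mean return before naming them bear, sideways and bull; that naming rule is an assumption layered on top of the model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Synthetic daily returns standing in for the S&P 500 series (hypothetical parameters)
returns = np.concatenate([rng.normal(-0.015, 0.025, 150),   # turbulent, falling days
                          rng.normal(0.000, 0.008, 600),    # quiet, range-bound days
                          rng.normal(0.008, 0.010, 250)])   # steadily rising days
X = returns.reshape(-1, 1)  # sklearn expects a 2-D array: (n_samples, n_features)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)

# Component ids are arbitrary numbers, so order them by mean return before naming them
order = np.argsort(gmm.means_.ravel())
names = {order[0]: "bear", order[1]: "sideways", order[2]: "bull"}
for k in order:
    print(f"{names[k]:>8}: mean={gmm.means_[k, 0]:+.4f}, "
          f"std={np.sqrt(gmm.covariances_[k, 0, 0]):.4f}, "
          f"weight={gmm.weights_[k]:.2f}")
state_names = np.array([names[k] for k in labels])  # one state name per trading day
```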
Judging from the analysis above, two states may be enough.
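One common way to compare different numbers of states, which the article itself does not use, is the Bayesian information criterion (BIC). scikit-learn provides it as GaussianMixture.bic, and lower values are better. A sketch on hypothetical two-regime returns:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Hypothetical daily returns from two regimes: calm and volatile
returns = np.concatenate([rng.normal(0.0005, 0.008, 800),
                          rng.normal(-0.001, 0.025, 200)]).reshape(-1, 1)

# Fit 1 to 5 components; BIC trades goodness of fit against the number of parameters
bics = {}
for k in range(1, 6):
    model = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(returns)
    bics[k] = model.bic(returns)  # lower is better

best_k = min(bics, key=bics.get)
for k, value in bics.items():
    print(f"{k} components: BIC = {value:.1f}")
print("number of states preferred by BIC:", best_k)
```

BIC is only one criterion; an economically interpretable set of states can matter as much as the statistically preferred count.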
One problem that may arise is convergence. Depending on the initial conditions and on the convergence threshold used in the EM algorithm, the model may converge to different distributions. This needs further investigation.
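The sensitivity to initial conditions and to the stopping threshold can be inspected directly. In scikit-learn, init_params, random_state and tol control the starting point and the EM threshold. The attributes converged_, n_iter_ and lower_bound_ report what happened. n_init repeats the fit from several starting points and keeps the best result. A sketch on synthetic returns (invented parameters):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
# Hypothetical daily returns from three overlapping regimes
returns = np.concatenate([rng.normal(-0.010, 0.020, 200),
                          rng.normal(0.000, 0.006, 500),
                          rng.normal(0.006, 0.010, 300)]).reshape(-1, 1)

# Same data, different random starting points and a loose stopping threshold (tol)
for seed in range(3):
    gmm = GaussianMixture(n_components=3, init_params="random", tol=1e-2,
                          max_iter=100, random_state=seed).fit(returns)
    print(f"seed={seed}: converged={gmm.converged_}, iterations={gmm.n_iter_}, "
          f"lower bound={gmm.lower_bound_:.4f}, "
          f"sorted means={np.sort(gmm.means_.ravel()).round(4)}")

# n_init restarts EM from several initializations and keeps the best lower bound
best = GaussianMixture(n_components=3, n_init=10, tol=1e-6, max_iter=1000,
                       random_state=0).fit(returns)
print("best lower bound:", round(best.lower_bound_, 4))
```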
Classifying the US economy by fitting a GMM to macroeconomic data
To demonstrate GMM visually, I will use two-dimensional data (two variables). Each cluster then corresponds to a bivariate normal distribution that can be drawn in three dimensions: the first dimension is the inflation rate (x), the second is the monthly S&P 500 return (y), and the third is the joint probability density of x and y, that is, how likely a given combination of x and y is.
The figure shows a major advantage of GMM over other clustering algorithms: normal distributions can produce elliptical clusters. This property comes from the covariance matrix.
Given two-dimensional data, the GMM produces three distinct states.
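A sketch of the two-dimensional case with synthetic monthly data in percent. The three regimes and their covariances are invented and stand in for the inflation and S&P 500 series. Note that score_samples returns the log of the fitted joint density, so it is exponentiated to obtain the third dimension on a grid.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# Hypothetical monthly data in percent: column 0 = inflation (x), column 1 = S&P 500 return (y)
data = np.vstack([
    rng.multivariate_normal([2.0, 1.0], [[0.10, -0.05], [-0.05, 4.0]], size=120),
    rng.multivariate_normal([5.0, -1.0], [[0.40, -0.30], [-0.30, 25.0]], size=60),
    rng.multivariate_normal([1.0, 0.5], [[0.05, 0.0], [0.0, 1.0]], size=80),
])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(data)

# Third dimension: the fitted joint density p(x, y) evaluated on a grid
xs = np.linspace(data[:, 0].min(), data[:, 0].max(), 50)
ys = np.linspace(data[:, 1].min(), data[:, 1].max(), 50)
gx, gy = np.meshgrid(xs, ys)
grid = np.column_stack([gx.ravel(), gy.ravel()])
density = np.exp(gmm.score_samples(grid)).reshape(gx.shape)  # score_samples gives log p(x, y)

states = gmm.predict(data)        # most likely state for each month
probs = gmm.predict_proba(data)   # soft assignment to each of the three states
print(density.shape, np.bincount(states, minlength=3))
```

The density grid can be passed to a contour or surface plot to reproduce the kind of picture described in the text.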
Finally, to build a meaningful model you should consider more variables; in reality, the US economy and its performance are described by many different indicators. We can keep adding any number of dimensions, but before moving to N dimensions it is important to understand the correlation structure of the data fed to the model.
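Before adding dimensions, the correlation matrix shows which inputs carry overlapping information. The sketch below uses hypothetical indicators, two of which are nearly redundant. Standardizing with StandardScaler is a common preprocessing choice, not something the article prescribes.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Hypothetical indicators: two strongly correlated series plus one independent one
n = 240
base = rng.normal(size=n)
indicators = np.column_stack([
    base + 0.1 * rng.normal(size=n),   # e.g. an inflation measure
    base + 0.1 * rng.normal(size=n),   # a second, nearly redundant inflation measure
    rng.normal(size=n),                # e.g. market returns
])

corr = np.corrcoef(indicators, rowvar=False)
print(corr.round(2))  # a large off-diagonal entry flags redundant inputs

# Standardize so that no indicator dominates just because of its units
scaled = StandardScaler().fit_transform(indicators)
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(scaled)
print(gmm.means_.shape)  # (3, 3): one mean vector per state, one entry per indicator
```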
Summary
This has been a brief introduction to applying GMM to financial markets and the economy; keep in mind that it is only an introduction. GMM was introduced here as a more robust way to classify stock market price data into states. The connection between market states and the economy needs more in-depth research.