{"id":524,"date":"2020-05-17T19:58:46","date_gmt":"2020-05-17T23:58:46","guid":{"rendered":"https:\/\/www.olivineresearch.com\/blog\/?p=524"},"modified":"2020-05-28T21:19:37","modified_gmt":"2020-05-29T01:19:37","slug":"pairs-trading-the-math-behind-the-code","status":"publish","type":"post","link":"https:\/\/www.olivineresearch.com\/blog\/?p=524","title":{"rendered":"Pairs Trading: The Math behind the Code"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"524\" class=\"elementor elementor-524\" data-elementor-settings=\"[]\">\n\t\t\t\t\t\t<div class=\"elementor-inner\">\n\t\t\t\t\t\t\t<div class=\"elementor-section-wrap\">\n\t\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-d4e9aaf elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d4e9aaf\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-fc796a1\" data-id=\"fc796a1\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d74d1df elementor-widget elementor-widget-text-editor\" data-id=\"d74d1df\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p>Over the past several years, I have experimented with different programming languages (Python, R, GNU Octave and Matlab) to implement facets of quantitative trading techniques.\u00a0<span style=\"letter-spacing: 0px;\">Personally, I have found this very fulfilling because it brings together three disciplines that 
interest me \u2013 finance, programming and math \u2013 in no particular order. My approach has evolved over several years, and I have often had to course correct and go back to the drawing board. Since some of the actual code has been used commercially, I want to respect those obligations and refrain from using proprietary code in this blog. However, I have also come to realize that the code in and of itself (or the programming language) is only of secondary importance.\u00a0What really matters is an appreciation of the mathematical principles underlying the trading strategy and a disciplined approach to risk management. My strong sense is that many of these core concepts can be applied in a health care data analysis setting,\u00a0<\/span>without<span style=\"letter-spacing: 0px;\">\u00a0codifying everything under the<\/span><span style=\"letter-spacing: 0px;\">\u00a0hard label of AI\/ML.<\/span><\/p><p><span style=\"letter-spacing: 0px;\">I used different programming languages mostly out of curiosity about the feature sets of each, and I have gravitated towards open source when possible. However, for the Pairs Trading example below, I have used snippets of Matlab code for efficiency. 
<b><i>PS<\/i><\/b>: I do wish GNU Octave had an elegant implementation of the Augmented Dickey Fuller test, which I use below in Matlab.<\/span><\/p><p>\u00a0<\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-fb1cbd1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"fb1cbd1\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9fb3960\" data-id=\"9fb3960\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-bc9735b elementor-widget elementor-widget-text-editor\" data-id=\"bc9735b\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p><b>What is Pairs Trading?<\/b><\/p>\n<p>Pairs trading is a market neutral strategy (i.e., in theory, a correctly constructed strategy can provide excess returns irrespective of whether the broader market moves up, down or sideways). This makes it, at least on the surface, an appealing hedge strategy. However, plenty of academic and market research calls into question the efficacy of such a strategy. In my opinion, the failures are primarily due to the following reasons:<\/p>\n<p>1) Picking the wrong assets to pairs trade. 
The first instinct is to pick firms that seem to operate in the same space, e.g., (Pepsi, Coca Cola), (Intel, AMD), (Home Depot, Lowes), (Analog Devices, Maxim Integrated), for a pairs trade. However, without really analyzing and back-testing the relevant data for these pairs, the trading strategy may not achieve its objective.&nbsp;<\/p>\n<p>2) Poor testing compounded by complicated models and data snooping. Quantitatively oriented market participants often deploy overly parameterized, and therefore complicated, models. Parameters are added just to prove a preconceived notion.<\/p>\n<p>3) Wrong duration. A strategy may work only for a specific time period. Extrapolating those results to a different duration (or going out of sample) may yield disastrous results.<\/p>\n<p><span style=\"letter-spacing: 0px;\">4) Relying on quantitative techniques alone with no consideration of economic fundamentals. There may be a good business reason why the pairs move contrary to expectations. However, quantitatively oriented investors are loath to make the adjustment.<\/span><\/p>\n<p><span style=\"letter-spacing: 0px;\"><b>How is Pairs Trading Implemented?<\/b><\/span><\/p>\n<p><span style=\"letter-spacing: 0px;\">Pairs trading is generally implemented by picking <\/span><strong style=\"letter-spacing: 0px;\">a pair of securities (<\/strong><i style=\"letter-spacing: 0px;\">I don&#8217;t want to get too nuanced, but this pair can really be several composites and does not have to be limited to 2 securities<\/i><span style=\"letter-spacing: 0px;\">)&nbsp;to buy (go long) and sell (go short) at the right price, for the right duration and in the right ratio, within a sound risk management framework. To identify these criteria and relationships, advanced mathematical and statistical techniques are used. 
But at its very core is a simple <\/span><b style=\"letter-spacing: 0px;\">mean-reversion concept of time series.<\/b><\/p>\n<p>Let&#8217;s assume A is one time series and B is the other time series such that:<\/p>\n<p>A = Ratio (Alpha) * B + White noise or Error term<\/p>\n<p>If A and B make for a good pairs trade, then the ratio must revert to its mean over a period of time.&nbsp;<span style=\"letter-spacing: 0px;\">The mean reversion can be evaluated by testing for co-integration and stationarity of the time series. <\/span><b style=\"letter-spacing: 0px;\"><u>Remember, it is co-integration and not correlation<\/u><\/b><span style=\"letter-spacing: 0px;\"> and it is fairly easy to confuse the two.<\/span><\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-16cad86 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"16cad86\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d99f24c\" data-id=\"d99f24c\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ad418cc elementor-widget elementor-widget-text-editor\" data-id=\"ad418cc\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p><strong>Correlation<\/strong> between two time series is in effect 
a measure of how their returns move together over a shorter time horizon. In simpler terms, if two stocks are strongly correlated based on daily returns, there is a high probability that they move up or down in tandem on most days. However, there is no guarantee that the stock prices will track each other over a longer time period. Over a longer time horizon, the prices can drift apart persistently and may never return to the mean (or exhibit any mean reversion).&nbsp;<b style=\"text-decoration-line: underline;\"> If anyone mistakenly enters into a pairs trade based on a correlation analysis, a rude awakening may be in the offing.<\/b> It is very natural to jump into a pairs trade on, say, (Home Depot, Lowes), (Intel, AMD) or (Pepsi, Coca Cola) without checking whether such a strategy makes statistical sense for the duration considered. Though some of these stocks may exhibit strong daily correlations and make for good pair trades on a very short term basis, they may end up drifting contrary to expectations over a longer time frame.&nbsp;<\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-d703fa1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d703fa1\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ae91f9b\" data-id=\"ae91f9b\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element 
elementor-element-68b5fe2 elementor-widget elementor-widget-text-editor\" data-id=\"68b5fe2\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p>Combinations of some time series can exhibit the stationarity property \u2013 which means that the drift away from the mean is fairly contained. Though possible, it is very difficult to find individual stocks that exhibit stationarity consistently over longer time periods. There is plenty of academic work indicating that stocks generally follow a random walk and that there is no way to predict future movements from past movements in a precise and dependable manner. To model individual stock movements, one would then have to rely on stochastic processes (e.g., Geometric Brownian Motion, GBM) to mimic that randomness, as in the following formula.<\/p><p>X<sub>t<\/sub> = \u03bc + X<sub>t-1<\/sub> + \u03f5<sub>t<\/sub><\/p><p>Where:<br \/>X<sub>t<\/sub> is the log of the price of the stock at time t<br \/>X<sub>t-1<\/sub> is the log of the price of the stock at time t-1<br \/>\u03bc is the drift constant<br \/>and \u03f5<sub>t<\/sub> is the error or noise term at time t<\/p><p class=\"MsoNormal\">\u00a0<\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-73f2288 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"73f2288\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1914a7d\" data-id=\"1914a7d\" 
data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-08f3d2c elementor-widget elementor-widget-text-editor\" data-id=\"08f3d2c\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p>That said, if you expand your horizon to a pair of stocks then, depending on the ratio in which you buy one and short the other (<b>also referred to as the hedge ratio<\/b>), the overall value of this pair could exhibit stationary or mean reverting properties. Such pairs are then considered to be <b>co-integrated<\/b>. Remember that this pair portfolio consists of a long position and a short position, and the \u201cspread\u201d is the difference in the market prices. The strategy is simply to buy the pair when the spread is low and sell\/short the pair when the spread is high, on the premise that the \u201cspread\u201d is mean reverting.&nbsp;<\/p>\n<p>Going back to the time series example from a few paragraphs back:<\/p>\n<p>A = Alpha * B + White noise or Error term<\/p>\n<p>A and B are two time series data sets for two securities.&nbsp;<span style=\"letter-spacing: 0px;\">At various points, the ratio Alpha will drift above and below the mean, presenting trading opportunities. When Alpha is smaller than the mean, we buy A and short B. When Alpha is larger than the mean, we do the opposite &#8211; short A and buy B.<\/span><\/p>\n<p><\/p>\n<p>There are several statistical tests for this; a commonly used one is the Dickey Fuller test (another is the Johansen test). Almost all statistical and math software packages (including Matlab, R and Python) offer this functionality. It is also available as open source in the spatial-econometrics toolbox maintained by James P. 
LeSage with support from the National Science Foundation (<a href=\"https:\/\/www.spatial-econometrics.com\/\">https:\/\/www.spatial-econometrics.com\/<\/a>). I will use the cadf function from this toolbox in the example below. The beauty of the cadf function is that it does two things: first, it computes the hedge ratio by running a linear regression between the two price series A and B above to form a portfolio. Then, as a two-for-one, it also tests the portfolio for stationarity.<\/p>\n<p>Keep in mind that co-integration properties can exist across multiple time series (not just 2). If you want to test more than 2 series for cointegration, then the Johansen test is probably more apt.<\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-e646e7e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"e646e7e\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-cdd57e4\" data-id=\"cdd57e4\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-80ffd15 elementor-widget elementor-widget-text-editor\" data-id=\"80ffd15\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p>The Augmented Dickey Fuller Test checks for mean reversion: if a price series is mean reverting, its change over the next period is 
proportional to the difference between the mean price and the current price. If you want to read more about the mathematical soup to nuts of ADF, you can refer to the notes here: <a href=\"http:\/\/faculty.smu.edu\/tfomby\/eco6375\/BJ%20Notes\/ADF%20Notes.pdf\">http:\/\/faculty.smu.edu\/tfomby\/eco6375\/BJ%20Notes\/ADF%20Notes.pdf<\/a><\/p><p>With that basic understanding, let&#8217;s analyze two stocks (ADI and MXIM) that display high correlation but are not co-integrated, though one can easily be fooled into assuming they are because they operate in the same industry &#8211; semiconductors.<span style=\"letter-spacing: 0px;\">\u00a0Here is how the stocks have performed over the past 3 years.<\/span><\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-dac4213 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dac4213\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a24650b\" data-id=\"a24650b\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-246b0cb elementor-widget elementor-widget-image\" data-id=\"246b0cb\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-image\">\n\t\t\t\t\t\t\t\t\t\t<img width=\"960\" height=\"441\" 
src=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/mxim_adi-1024x470.png\" class=\"attachment-large size-large\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/mxim_adi-1024x470.png 1024w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/mxim_adi-300x138.png 300w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/mxim_adi-768x352.png 768w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/mxim_adi.png 1092w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/>\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-44220d9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"44220d9\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5c194bc\" data-id=\"5c194bc\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0b15942 elementor-widget elementor-widget-text-editor\" data-id=\"0b15942\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p><strong>Running a quick lagged correlation coefficient test in Matlab on these two stocks over the past 3 years on daily returns shows a correlation of .8119. 
With a p-value of 0, this indicates that the two stocks are significantly correlated.<\/strong><\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-d3a0776 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d3a0776\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1e748a9\" data-id=\"1e748a9\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-11e543b elementor-widget elementor-widget-text-editor\" data-id=\"11e543b\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p>However, the results of the cointegration\/augmented Dickey Fuller test indicate the following.<\/p><p><a href=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/matlab_cadf.png\"><img loading=\"lazy\" class=\"aligncenter size-full wp-image-535\" src=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/matlab_cadf.png\" alt=\"\" width=\"831\" height=\"256\" srcset=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/matlab_cadf.png 831w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/matlab_cadf-300x92.png 300w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/matlab_cadf-768x237.png 768w\" sizes=\"(max-width: 
831px) 100vw, 831px\" \/><\/a><\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6a7ff13 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6a7ff13\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-ab7de29\" data-id=\"ab7de29\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-74e2776 elementor-widget elementor-widget-text-editor\" data-id=\"74e2776\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p>The t-statistic of this test is -1.9416, which falls well short of the 90% critical value (it is not negative enough), so we cannot reject the null hypothesis of no cointegration and should treat this pair as NOT cointegrated. The hedge ratio is 1.8053, but this does not seem to be a good candidate for pairs trading. 
<b>Relying on correlation alone would have been the wrong move!<\/b><\/p><p>\u00a0<\/p><p>Now, let&#8217;s look at another pair (SPY and QCOM), again over the past 3 years, and run the same cointegration test.<\/p><p><img loading=\"lazy\" class=\"aligncenter size-full wp-image-550\" src=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/spy_qcom.png\" alt=\"\" width=\"917\" height=\"286\" srcset=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/spy_qcom.png 917w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/spy_qcom-300x94.png 300w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/spy_qcom-768x240.png 768w\" sizes=\"(max-width: 917px) 100vw, 917px\" \/><\/p><p>\u00a0<\/p><p>As you can see, the t-statistic is a much better -3.3217, which is beyond the 90% critical value, so we can reject the null hypothesis of no cointegration with more than 90% confidence. This would need more validation before putting on the trade at a hedge ratio of 4.174, but it definitely seems more promising than the previous pair.\u00a0<\/p><p>I will not go into generating specific trading signals because that exercise is very nuanced and there are several attributes to consider, including training\/testing the data set and specific signaling algorithms. However, as a very simplistic illustration, t<span style=\"letter-spacing: 0px;\">he chart below shows the z-score based buy and sell signals for the SPY\/QCOM pair, based on the assumption that the ratio reverts to the mean. The trading signals are generated from the rolling mean (60 day moving average) and the rolling standard deviation (60 day). 
The z score is (5 day moving average &#8211; 60 day moving average) divided by (60 day standard deviation).\u00a0 Though this set up looks promising, this needs to be validated further.<\/span><\/p><p>\u00a0<\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a1c599a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a1c599a\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f5725f1\" data-id=\"f5725f1\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c1ee3f2 elementor-widget elementor-widget-image\" data-id=\"c1ee3f2\" data-element_type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-image\">\n\t\t\t\t\t\t\t\t\t\t<img width=\"960\" height=\"411\" src=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/Capture_SPYQCOM.png\" class=\"attachment-large size-large\" alt=\"\" loading=\"lazy\" srcset=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/Capture_SPYQCOM.png 994w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/Capture_SPYQCOM-300x129.png 300w, https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/Capture_SPYQCOM-768x329.png 768w\" sizes=\"(max-width: 960px) 100vw, 960px\" 
\/>\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a1deab6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a1deab6\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t\t\t<div class=\"elementor-row\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5d53283\" data-id=\"5d53283\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-column-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t<div class=\"elementor-widget-wrap\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5f82174 elementor-widget elementor-widget-text-editor\" data-id=\"5f82174\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<div class=\"elementor-text-editor elementor-clearfix\"><p>All that said, keep in mind that:<\/p><p>1) When regime shifts occur, past data may not reliably indicate what will happen in the future.\u00a0<\/p><p>2) If there are obvious pair trading combos, arbitrageurs will already be at work, eroding the profitability of the idea.\u00a0<\/p><p>3) The other basic assumption underlying the algo is that the financial data is normally distributed (in reality, however, financial data may not be normally distributed and may exhibit fat tail behavior).<\/p><p>4) Don&#8217;t be wedded to any strategy, and refrain from cherry-picking data points to prove a preconceived notion. 
If testing on in-sample and out-of-sample data sets does not provide statistically significant and consistent results, it is probably telling you not to rely on the strategy.<\/p><p>5) There is beauty in simplicity. Keep the model and algorithm as uncomplicated as possible.<\/p><p><span style=\"letter-spacing: 0px;\">Notwithstanding the above, an appreciation of the data and underlying math can help you structure your code effectively and increase your odds of success, as long as you don&#8217;t throw risk management out of the window.\u00a0<\/span><\/p><\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Over the past several years, I have experimented with different programming languages (Python, R, GNU Octave and Matlab) to implement facets of quantitative trading techniques.&nbsp;Personally, I have found this very fulfilling because it brings together three disciplines that interest me \u2013 finance, programming and math \u2013 in no particular order. My approach has evolved over several years, and I have often had to course correct and go back to the drawing board. Since some of the actual code has been used commercially, I want to respect those obligations and refrain from using proprietary code in this blog. However, I have also come to realize that the code in and of itself (or the programming language) is only of secondary importance.&nbsp;What really matters is an appreciation of the mathematical principles underlying the trading strategy and a disciplined approach to risk management. My strong sense is that many of these core concepts can be applied in a health care data analysis setting,&nbsp;without&nbsp;codifying everything under the&nbsp;hard label of AI\/ML. 
I used different programming languages mostly out of curiosity about the feature sets of each, and I have gravitated towards open source when possible. However, for the Pairs Trading example below, I have used snippets of Matlab code for efficiency. PS: I do wish GNU Octave had an elegant implementation of the Augmented Dickey Fuller test, which I use below in Matlab. What is Pairs Trading? Pairs trading is a market neutral strategy (i.e., in theory, a correctly constructed strategy can provide excess returns irrespective of whether the broader market moves up, down or sideways). This makes it, at least on the surface, an appealing hedge strategy. However, plenty of academic and market research calls into question the efficacy of such a strategy. In my opinion, the failures are primarily due to the following reasons: 1) Picking the wrong assets to pairs trade. The first instinct is to pick firms that seem to operate in the same space, e.g., (Pepsi, Coca Cola), (Intel, AMD), (Home Depot, Lowes), (Analog Devices, Maxim Integrated), for a pairs trade. However, without really analyzing and back-testing the relevant data for these pairs, the trading strategy may not achieve its objective.&nbsp; 2) Poor testing compounded by complicated models and data snooping. Quantitatively oriented market participants often deploy overly parameterized, and therefore complicated, models. Parameters are added just to prove a preconceived notion. 3) Wrong duration. A strategy may work only for a specific time period. Extrapolating those results to a different duration (or going out of sample) may yield disastrous results. 4) Relying on quantitative techniques alone with no consideration of economic fundamentals. There may be a good business reason why the pairs move contrary to expectations. 
However, quantitatively oriented investors are loath to make the adjustment. How is Pairs Trading Implemented? Pairs trading is generally implemented by picking a pair of securities (without getting too nuanced, this pair can really be several composites and does not have to be limited to 2 securities)&nbsp;to buy (go long) and sell (go short) at the right price, right duration and right ratio within an optimal risk management framework. To identify these criteria and relationships, advanced mathematical and statistical techniques are used. But at its very core is a simple mean-reversion concept for time series. Let&#8217;s assume A is one time series and B is the other time series such that: A = Ratio (Alpha) * B + White noise or Error term. If A and B make for a good pairs trade, then the ratio should revert to its mean over a period of time.&nbsp;The mean reversion can be evaluated by testing for co-integration and stationarity of the&nbsp;time series. Remember,&nbsp;it is co-integration and not correlation, and it is fairly easy to confuse the two. Correlation between two time series is in effect a measurement of the relationship between their returns over a shorter time horizon. In simpler terms, if the stocks are strongly correlated based on daily returns, there is a high probability that the stocks move up or down in tandem on most days. However, there is no guarantee that the stock prices will track each other over a longer time period. Over a longer time horizon, the drift between the two series can be somewhat persistent and may never return to the mean (or exhibit any mean reversion).&nbsp; If anyone mistakenly enters into a pairs trade based on a correlation analysis, a rude awakening may be in the offing. 
It is very natural to jump into a pairs trade on, say, (Home Depot, Lowes), (Intel, AMD) or (Pepsi, Coca Cola) without checking whether such a strategy makes statistical sense for the duration considered. Though some of these stocks may exhibit strong daily correlations and make for good pair trades on a very short term basis, they may end up drifting contrary to expectations over a longer time frame.&nbsp; Combinations of some time series can exhibit the stationarity property \u2013 which means that the drift away from the mean is fairly contained. Though possible, it is very difficult to find individual stocks that exhibit stationarity consistently over longer time periods. There is plenty of academic work that indicates that stocks generally follow a random walk and there is no way to predict future movements based on past movements in a precise and dependable manner. To model individual stock movements, one would then have to rely on stochastic processes (e.g. Geometric Brownian Motion, GBM) to mimic the randomness inherent in this formula: &#8220;Xt = \u03bc + Xt-1 + \u03f5t&#8221; Where: &#8220;Xt&#8221; is the log of the price of the stock at time t, &#8220;Xt-1&#8221; is the log of the price of the stock at time t-1, &#8220;\u03bc&#8221; is the drift constant and &#8220;\u03f5t&#8221; is the error or noise term at time t. That said, if you expand your horizon to a pair of stocks, then depending on what ratio of one you buy versus short (also referred to as the hedge ratio), the overall value of this pair could exhibit stationary or mean reverting properties. Such pairs are then considered to be co-integrated. Remember that this pair portfolio consists of the long and the short, and the \u201cspread\u201d is the difference in the market prices. 
The strategy is simply to buy the pair when the spread is low and sell\/short the pair when the spread is high, with the underpinning premise that the \u201cspread\u201d is mean reverting.&nbsp; Going back to the time series example from a few paragraphs back: A = Alpha * B + White noise or Error term. A and B are two time series data sets for two securities.&nbsp;At various points, the ratio Alpha will drift above and below its mean, presenting trading opportunities. When Alpha is smaller than the mean, we buy A and short B. When Alpha is larger than the mean, we do the opposite &#8211; short A and buy B. There are several statistical techniques to test for this, and a commonly used one is the Dickey Fuller test (another one is the Johansen test). Almost all statistical and math software packages (including Matlab, R and Python) offer this functionality. This function is also available open source from the spatial-econometrics toolbox maintained by James P. LeSage with support from the National Science Foundation (https:\/\/www.spatial-econometrics.com\/). I will use the cadf function from this toolbox in the example below. The beauty of the cadf function is that it does two things: first it computes the hedge ratio by running a linear regression between the two price series A and B above to form a portfolio. Then, as a two for one, it also tests for stationarity on the portfolio. Keep in mind that co-integration properties can exist across multiple time series (not just 2). If you want to test more than 2 series for cointegration, then the Johansen test is probably more apt. The fundamental premise of the Augmented Dickey Fuller Test is that the change of a certain price series over the next period is proportional to the difference between the mean price and the current price. 
If you want to read more about the mathematical soup to nuts of ADF, you can refer to the notes here: http:\/\/faculty.smu.edu\/tfomby\/eco6375\/BJ%20Notes\/ADF%20Notes.pdf With that basic understanding, let&#8217;s analyze two stocks (ADI and MXIM) that display high correlation but are not co-integrated, though one can easily be fooled into making that assumption because they operate in the same industry &#8211; semiconductors.&nbsp;Here is how the stocks have performed over the past 3 years. Running a quick lagged correlation coefficient test in Matlab on these two stocks over the past 3 years on daily returns shows a correlation of .8119. With a reported p-value of 0, this indicates that the two stocks are significantly correlated. However, the results of the cointegration\/augmented Dickey Fuller test indicate the following. The t-statistic of this test is \u20131.9416, which is not negative enough to clear the 90% critical value, so we cannot reject the null hypothesis of no cointegration and should treat this pair as NOT cointegrated. The hedge ratio is 1.8053 but this does not seem to be a good candidate for pairs trading. Going on correlation alone would have been the wrong move! Now, let&#8217;s look at two other stocks (SPY and QCOM), again over the past 3 years, and run the same cointegration test. As you can see, the t-statistic is a much better -3.3217, which is more negative than the 90% critical value, so the null hypothesis of no cointegration can be rejected at that confidence level. This would need more validation before putting on the trade at a hedge ratio of 4.174 but definitely seems more promising than the previous pair.&nbsp; I will not go into generating specific trading signals because that exercise is very nuanced and there are several attributes to consider, including training\/testing the data set and specific signaling algorithms. However, as a very simplistic illustration, the chart below indicates the z-score based buy and sell signals for the SPY\/QCOM pair based on the assumption that the ratio reverts to the mean. 
The trading signals are generated from a rolling mean (60 day moving average) and a rolling standard deviation (60 day). The z score is (5 day moving average &#8211; 60 day moving average) divided by (60 day standard deviation).&nbsp; Though this setup looks promising, it needs to be validated further. All that said, keep in mind that: 1) When regime shifts occur, past data may not reliably indicate what will happen in the future.&nbsp; 2) If there are obvious pair trading combos, the arbitrageurs will already be at work, shrinking the marginal utility of the idea.&nbsp; 3) The other basic assumption underlying the algo is that the financial data is normally distributed (in reality, however, financial data may not be normally distributed and may exhibit fat tail behavior). 4) Don&#8217;t be wed to any strategy and refrain from picking data points to prove a preconceived notion. If testing on in-sample and out-of-sample data sets does not provide statistically significant and consistent results, it is probably telling you not to rely on the strategy. 5) There is beauty in simplicity. Keep the model and algorithm as uncomplicated as possible. 
Notwithstanding the above,&nbsp; an appreciation of the data and underlying math can help effectively structure your code and increase your odds of success; as long as you don&#8217;t throw risk management out of the window.&nbsp;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_s2mail":"yes"},"categories":[28],"tags":[24,25],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v14.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Pairs Trading: The Math behind the Code &lt; Olivine Research<\/title>\n<meta name=\"description\" content=\"Understanding the difference between correlation and cointegration for constucting pairs trading algos\" \/>\n<meta name=\"robots\" content=\"index, follow\" \/>\n<meta name=\"googlebot\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta name=\"bingbot\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.olivineresearch.com\/blog\/?p=524\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Pairs Trading: The Math behind the Code &lt; Olivine Research\" \/>\n<meta property=\"og:description\" content=\"Understanding the difference between correlation and cointegration for constucting pairs trading algos\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.olivineresearch.com\/blog\/?p=524\" \/>\n<meta property=\"og:site_name\" content=\"Olivine Research\" \/>\n<meta property=\"article:published_time\" content=\"2020-05-17T23:58:46+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-05-29T01:19:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/mxim_adi-1024x470.png\" 
\/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Olivine18\" \/>\n<meta name=\"twitter:site\" content=\"@Olivine18\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.olivineresearch.com\/blog\/#website\",\"url\":\"https:\/\/www.olivineresearch.com\/blog\/\",\"name\":\"Olivine Research\",\"description\":\"A Random Walk -&gt; Technology | Finance | Healthcare | Programming\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/www.olivineresearch.com\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/www.olivineresearch.com\/blog\/?p=524#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/www.olivineresearch.com\/blog\/wp-content\/uploads\/2020\/05\/mxim_adi.png\",\"width\":1092,\"height\":501},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.olivineresearch.com\/blog\/?p=524#webpage\",\"url\":\"https:\/\/www.olivineresearch.com\/blog\/?p=524\",\"name\":\"Pairs Trading: The Math behind the Code &lt; Olivine Research\",\"isPartOf\":{\"@id\":\"https:\/\/www.olivineresearch.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.olivineresearch.com\/blog\/?p=524#primaryimage\"},\"datePublished\":\"2020-05-17T23:58:46+00:00\",\"dateModified\":\"2020-05-29T01:19:37+00:00\",\"author\":{\"@id\":\"https:\/\/www.olivineresearch.com\/blog\/#\/schema\/person\/01ffd75d83b4c7f0f7ffa9fcee242a9d\"},\"description\":\"Understanding the difference between correlation and cointegration for constucting pairs trading 
algos\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.olivineresearch.com\/blog\/?p=524\"]}]},{\"@type\":[\"Person\"],\"@id\":\"https:\/\/www.olivineresearch.com\/blog\/#\/schema\/person\/01ffd75d83b4c7f0f7ffa9fcee242a9d\",\"name\":\"Administrator\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/524"}],"collection":[{"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=524"}],"version-history":[{"count":50,"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/524\/revisions"}],"predecessor-version":[{"id":670,"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/524\/revisions\/670"}],"wp:attachment":[{"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=524"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=524"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.olivineresearch.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=524"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}