<h1><a href="http://blog.omega-prime.co.uk/2021/02/14/bubble">US equity valuations</a> (2021-02-14)</h1>
<p>The outlook for the US stock market looks unusually poor.</p>
<p>(Nothing in this post is financial advice, and it does not represent the views of my employer.)</p>
<h1 id="expected-returns-are-low">Expected returns are low</h1>
<p><a href="http://pages.stern.nyu.edu/~adamodar/">Aswath Damodaran</a>, an academic who studies valuation, publishes regular estimates of the expected return on equity investments. This is constructed from public company earnings forecasts (sourced from investment bank research reports). As of 2020, the expected returns to holding equity look worse than at any time since the 1960s:
<img src="/2021/02/damodaran-implied-erp.png" alt="" /></p>
<p>Cliff Asness runs AQR, one of the largest quantitative investment managers in the world. In an early-2020 article he <a href="https://www.aqr.com/Insights/Perspectives/Is-Systematic-Value-Investing-Dead">points out</a> that, measured by their stock-price-to-book-value ratio, high-growth stocks were then about 6 times as expensive as low-growth stocks, which is historically unprecedented. I’m sure the ratio is even higher today:
<img src="/2021/02/asness-pb-spread.png" alt="" /></p>
<p>A more common metric of equity market valuation is the <a href="https://www.multpl.com/shiller-pe">Shiller price-to-earnings ratio</a> of the S&P 500. By this metric, US bigcaps are more expensive than at any time since 2000:
<img src="/2021/02/cape.png" alt="" /></p>
<h1 id="sentiment-is-strong">Sentiment is strong</h1>
<p>There were 464 IPOs<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> in 2020: the most in any year since 1999. A <a href="https://onlinelibrary.wiley.com/doi/full/10.1111/j.1540-6261.2006.00885.x">classic measure</a> of the level of exuberance in the market is the first-day return earned by investing in an IPO (the “IPO pop”). In 2020, the <a href="https://site.warrington.ufl.edu/ritter/ipo-data/">average first-day return</a> was 41.6%, the highest since 2000.</p>
<p><img src="/2021/02/ipo-first-day-returns.png" alt="" /></p>
<p>IPO activity was subdued for the first few months of 2020 due to the pandemic. If you measure the IPO pop over just the last few months of 2020, it <a href="https://seekingalpha.com/article/4395387-2020-ipo-bubble-just-reached-dot-com-levels">looks even higher</a>: maybe 46-52%.</p>
<p>US investor sentiment is clearly exceptionally strong, which is a bit confusing given that the real economy is in tatters, US politics is in gridlock, and the 2020s look to be the pivotal decade in which China will <a href="https://www.linkedin.com/pulse/big-cycles-over-last-500-years-ray-dalio/">overtake the US</a> as the leading world power. Perhaps this bullish sentiment is justified, but it’s a little difficult for me to see why.</p>
<h1 id="low-information-retail-traders-are-piling-in">Low-information retail traders are piling in</h1>
<p>Retail investors are making up an increasing amount of the market. Federal Reserve data shows that the US <a href="https://fred.stlouisfed.org/series/PSAVERT">personal savings rate</a> is at levels that haven’t been seen since the 1970s. The reason for this is that people are stuck at home with little to spend their savings on, and the US government is handing out stimulus cheques and other forms of income support with wild abandon. A lot of these savings are ending up in the stock market.</p>
<p>Retail investors characteristically like to invest in products with lottery-like payoffs: i.e. negative expected return, but with a low-probability possibility of a huge upside. Two classic ways to buy a lottery ticket in the stock market are to trade options, and to invest in penny stocks.</p>
<p>What we see is that the use of options is at almost an all-time high: equity option volumes are <a href="https://www.businesswire.com/news/home/20201202005584/en/OCC-November-2020-Total-Volume-Up-71-Percent-From-a-Year-Ago">up 80%</a> versus the same period last year. (Notoriously, retail traders used the leverage afforded by these options to <a href="https://www.ft.com/content/f2929fcc-cc62-4d20-8397-2040ce3f595e">enormously drive up the price</a> of Gamestop Corp.) The increased use of options may partly be caused by new fintech entrants like Robinhood that make it easier than ever for retail clients to begin option trading.</p>
<p>In the penny stock arena, data <a href="https://twitter.com/sentimentrader/status/1349367424722853888/photo/1">shows</a> that 2020 experienced unprecedented levels of trading:
<img src="/2021/02/penny-stock-volume.png" alt="" /></p>
<p>We also see retail enthusiasm spilling over into the crypto markets, where Bitcoin, Ethereum and Dogecoin have been rallying hugely as money pours into these highly volatile assets.</p>
<!--It is tempting to tell a story where retail flows have temporarily pushed up asset prices, but when these unreliable flows stop or reverse, there will be some reversal in prices back towards long-term average valuations.-->
<h1 id="what-are-the-risks">What are the risks?</h1>
<p>Although several alarm bells are ringing, and valuations look a bit overheated, it’s not at all clear to me that there is any immediate danger of a huge stock price crash.</p>
<p>One commonly-cited catalyst for a crash is that Biden is expected to raise the corporate tax rate to 28%. When Trump cut the rate from 35% to 21% in 2017, this was regarded as being a key factor in the 2017 to early-2018 equity market rally. An announcement of a tax rate increase by the new administration could conceivably act as a shock in the other direction in the near-term. However, this is not necessarily a 100% convincing argument: for example, you could counter that if the US authorizes $2k stimulus cheques, that will provide a second wind to the retail inflows and push the markets to even greater heights.</p>
<p>In my view it is completely possible that equity prices just continue to grow for years to come, even in defiance of the fundamentals, in a pure Keynesian beauty contest fuelled by institutional investors seeking yield and by easy money from the Fed’s low interest rate policies. There is <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/jofi.12818">evidence</a> that since the 1990s, increases in equity market valuation have been substantially driven by FOMC decisions, so the Federal Reserve clearly has enormous power to pump the market, and there is no near-term sign of this power facing any challenges.</p>
<p>What’s more, there is (almost) <a href="https://www.investopedia.com/terms/t/tina-there-no-alternative.asp">no alternative</a> to owning US equities:</p>
<ul>
<li>
<p>International equities don’t offer the same breadth of investment possibilities, particularly in the technology sector. Whatever you think about the valuation of these tech companies, I think it’s important to own a bit of them, even if for no other reason than to hedge the “AI risk” that software continues to eat the world, and actually automates you out of a job!</p>
</li>
<li>
<p>Traditional alternatives like fixed income products now offer negative real yields. This is most notable in the EU and environs, where nominal rates are negative: in Denmark, you can now actually get <em>paid</em> to take out a mortgage. Furthermore, with Federal Reserve money creation also at extremely high levels, and big federal spending expansion expected, there is an elevated risk of inflation. Fixed income products are exposed to this risk in a way that equities mostly are not.</p>
</li>
</ul>
<p>So, are these indicators totally non-actionable? Perhaps. Nonetheless, I think it may be worth downweighting the US a little bit at this point. Personally I’m looking at rotating some of my own savings into international equities (particularly mainland China), and exploring opportunities to earn high yields in the crypto space.</p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>When including SPACs, CEFs and REIT offerings in the count of IPOs. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<h1><a href="http://blog.omega-prime.co.uk/2020/11/02/herd-immunity">How close is England to covid herd immunity?</a> (2020-11-02)</h1>
<p>I don’t have an answer to this question, but I’ve gathered a few related stats that helped to get my own thinking into order.</p>
<p>The UK government is running two <a href="https://www.imperial.ac.uk/medicine/research-and-impact/groups/react-study/real-time-assessment-of-community-transmission-findings/">randomized assays</a> of the population to monitor the coronavirus situation in England:</p>
<ul>
<li>REACT1 is a “PCR” test of approximately 150k people per month. This measures the number of people who currently have the virus.</li>
<li>REACT2 is an “IgG antibody” test of approximately 150k people per month. This measures the number of people who have had the virus at some point in the past. It is unknown how long covid remains detectable via antibody test, but the REACT authors say: “most infected people mount an IgG antibody response detectable after 14-21 days although levels may start to wane after ~90 days”</li>
</ul>
<p>By comparing REACT2 prevalence estimates to the fraction of the population with a confirmed case according to the <a href="https://coronavirus.data.gov.uk/cases">government data dashboard</a>, we can estimate that the true number of cases in the population is roughly 7-10x the number of diagnoses to date.</p>
<table>
<thead>
<tr>
<th> </th>
<th>Start</th>
<th>End</th>
<th>N</th>
<th>Sample Prevalence</th>
<th>Cumulative Cases</th>
<th>Cumulative Cases (% Pop)</th>
<th>Sample Prevalence/Cumulative Cases (% Pop)</th>
</tr>
</thead>
<tbody>
<tr>
<td>REACT2 Round 1</td>
<td>20th Jun</td>
<td>13th Jul</td>
<td>99908</td>
<td>5.96%</td>
<td>245000</td>
<td>0.44%</td>
<td>14.60</td>
</tr>
<tr>
<td>REACT2 Round 2</td>
<td>31st Jul</td>
<td>13th Aug</td>
<td>105829</td>
<td>4.83%</td>
<td>270000</td>
<td>0.48%</td>
<td>10.73</td>
</tr>
<tr>
<td>REACT2 Round 3</td>
<td>15th Sep</td>
<td>28th Sep</td>
<td>159367</td>
<td>4.38%</td>
<td>370000</td>
<td>0.66%</td>
<td>7.10</td>
</tr>
</tbody>
</table>
<p>At the time of writing, 1053k people have tested positive, so conservatively 7M people have already had covid (i.e. 12% of the country). As a sanity check on this number, the REACT2 Round 1 <a href="https://www.imperial.ac.uk/media/imperial-college/institute-of-global-health-innovation/Ward-et-al-120820.pdf">report</a> itself estimated 3.36M people as of the end of June.</p>
<p>Herd immunity refers to the situation where the reproduction rate, \(R\), is less than 1 (and thus the number of infected people starts to shrink) because enough people are infected that the virus can’t find susceptible hosts faster than it is killed by the carrier’s immune system.</p>
<p>Assuming people mix homogeneously, you need a fraction \(\geq 1-(1/R_0)\) of the population to have got the virus to achieve this, where \(R_0\) is a property of the virus that measures how easy it is to spread. <a href="https://www.nature.com/articles/s41577-020-00451-5">Estimates</a> place covid \(R_0\) around 2.5-4, with 2.5 looking like the <a href="https://royalsociety.org/-/media/policy/projects/set-c/set-covid-19-R-estimates.pdf">consensus</a> estimate for England. Thus the herd immunity threshold sits around 60%, i.e. 34M infected people. We would have to 5x the number of people infected so far to reach this level, probably increasing cumulative deaths by 5x in the process (i.e. incurring 187k new deaths).</p>
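<p>Here is a small sketch of that arithmetic in Python (the population figure and current death toll below are rough assumed values, not numbers taken from the sources above):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>population = 56_000_000        # England, approximately (assumed)
already_infected = 7_000_000   # the conservative estimate derived above
deaths_so_far = 47_000         # approximate covid deaths at the time of writing (assumed)
r0 = 2.5                       # consensus-ish estimate for England

threshold = 1 - 1/r0                               # herd immunity fraction: 60%
needed = threshold * population                    # about 34M people
extra_factor = needed / already_infected           # roughly 5x the infections so far
extra_deaths = deaths_so_far * (extra_factor - 1)  # roughly 180k, same order as the figure above

print(f"threshold {threshold:.0%}, need {needed/1e6:.0f}M infected, "
      f"{extra_factor:.1f}x current, implying about {extra_deaths/1e3:.0f}k further deaths")
</code></pre></div></div>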
<p>However, some argue that under <a href="https://www.bmj.com/content/370/bmj.m3563">more realistic</a> population-mixing assumptions, the threshold could be somewhat lower than this: perhaps as low as 10-20% of the population i.e. 5.6-11M people, which is a range that we have already reached.</p>
<p>The REACT1 reports use the prevalence numbers to estimate a nationwide R. This currently stands at 1.56, after rebounding from a low of 0.57 reached around the time that the most restrictive lockdown measures were eased on May 10th:</p>
<table>
<thead>
<tr>
<th> </th>
<th>Start</th>
<th>End</th>
<th>N</th>
<th>Sample Prevalence</th>
<th>R</th>
</tr>
</thead>
<tbody>
<tr>
<td>REACT1 Round 1</td>
<td>1st May</td>
<td>1st Jun</td>
<td>120620</td>
<td>0.16%</td>
<td>0.57</td>
</tr>
<tr>
<td>REACT1 Round 2</td>
<td>19th Jun</td>
<td>7th Jul</td>
<td>159199</td>
<td>0.09%</td>
<td>0.6</td>
</tr>
<tr>
<td>REACT1 Round 3</td>
<td>24th Jul</td>
<td>11th Aug</td>
<td>162821</td>
<td>0.04%</td>
<td>1.3</td>
</tr>
<tr>
<td>REACT1 Round 4</td>
<td>20th Aug</td>
<td>8th Sep</td>
<td>154325</td>
<td>0.13%</td>
<td>1.7</td>
</tr>
<tr>
<td>REACT1 Round 5</td>
<td>18th Sep</td>
<td>5th Oct</td>
<td>174949</td>
<td>0.60%</td>
<td>1.2</td>
</tr>
<tr>
<td>REACT1 Round 6 (partial)</td>
<td>16th Oct</td>
<td>25th Oct</td>
<td>85971</td>
<td>1.28%</td>
<td>1.56</td>
</tr>
</tbody>
</table>
<p>The R estimates here are convincingly higher than 1, so my feeling is that unfortunately the optimists who expect herd immunity to be reached with just 10-20% of the population infected seem to be incorrect at this point, and we would have to incur substantial numbers of new deaths to bring \(R\) under 1 in the absence of measures such as the recently-announced lockdown.</p>
<!-- (If a significant number of people are being reinfected by covid, that could be a complicating factor, but this doesn't seem to be a widespread phenomenon at this point.) -->
<!--
The UK government's reaction to call for a second lockdown at this point seems justifiable. Our understanding of the IFR has evolved only slightly since my [initial post|http://blog.omega-prime.co.uk/2020/03/22/coronavirus-lockdown/], and it [seems|https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-34-ifr/] that it is about 1.15% in rich countries that skew old. The UK government will probably end up spending more than 300bn GBP in total fighting the virus, and by doing so will plausibly avoid 1.15%*(66.5 million people) = 764k deaths, which works out to a reasonable cost of 392k GBP per life, consistent with standard NICE costings.
-->
<h1><a href="http://blog.omega-prime.co.uk/2020/04/01/l1-regression">L1 regression solved six ways</a> (2020-04-01)</h1>
<p>The most fundamental technique in statistical learning is ordinary least squares (OLS) regression. If we have a vector of observations \(y\) and a matrix of features associated with each observation \(X\), then we assume the observations are a linear function of the features plus some (iid) random noise, \(\epsilon\):</p>
<p>\[ y = Xb + \epsilon \]</p>
<p>The maximum likelihood estimate of the regression coefficients \(b\) is that which minimizes the sum of squares error \(e(b)\) in our reconstruction of \(y\) i.e.:</p>
<p>\[ e(b) = (y - Xb)^T (y - Xb) \]</p>
<p>You can minimize \(e\) analytically by setting the derivative with respect to \(b\) equal to zero:</p>
<div>\[
\begin{align}
e(b) &= y^T y - 2y^T Xb + b^T X^T Xb \\
\frac{de}{db} &= -2X^T y + 2 X^T Xb \\
&= 0 \\
X^T Xb &= X^T y \\
b &= (X^T X)^{-1} X^T y
\end{align}
\]</div>
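<p>As a quick numerical sanity check of this closed form (the synthetic data and shapes below are purely illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

np.random.seed(0)
N, F = 200, 5                       # observations, features
X = np.random.randn(N, F)
y = X @ np.random.randn(F) + 0.1*np.random.randn(N)

# Closed-form OLS estimate: b = (X^T X)^{-1} X^T y
b_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Should agree with a library least-squares solver
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(b_closed, b_lstsq)
</code></pre></div></div>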
<p>So far so straightforward. Things get more interesting when you consider a small variation on this scheme that is very useful in practice: L1 penalized regression aka the “lasso”. In this scheme, we augment the \(e\) function we are trying to minimize with a small penalty proportional to the size of the \(b\) coefficients:</p>
<p>\[ e(b) = (y - Xb)^T (y - Xb) + \gamma \sum_i |b_i| \]</p>
<p>This has the effect of driving many of the \(e\)-minimizing \(b\) components to exactly zero. <em>Sparse</em> solutions like this have advantages in interpretability and may result in regression models that generalise better out-of-sample than naive OLS estimates do. L1 penalties also have applications beyond simple regression: for example, they are the foundational tool in <a href="https://en.wikipedia.org/wiki/Compressed_sensing">compressed sensing</a>.</p>
<p>But: how do we find the \(b\) that minimizes this L1-penalised \(e\)? It’s a bit tricky because the \(|b|\) term is not differentiable. However, we know that a global optimum <em>does</em> exist because \(e\) is a convex function: \(e\) is the sum of two convex components.</p>
<p>This technical post looks at minimizing this convex but non-smooth function through 6 different methods. My developments will be based on Parikh and Boyd’s notes on <a href="https://web.stanford.edu/~boyd/papers/pdf/prox_algs.pdf">“Proximal Algorithms”</a>, Boyd and Vandenberghe’s book <a href="https://web.stanford.edu/~boyd/cvxbook/">“Convex Optimization”</a>, Boyd, Xiao and Mutapcic’s notes on <a href="https://web.stanford.edu/class/ee392o/subgrad_method.pdf">“Subgradient Methods”</a> and Ryan Tibshirani’s <a href="https://www.youtube.com/playlist?list=PLjbUi5mgii6AVdvImLB9-Hako68p9MpIC">video lecture series</a> on convex optimization.</p>
<h1 id="subgradient-descent">Subgradient descent</h1>
<p>Even though functions like \(|b|\) are not differentiable, you can still define a <em>subgradient</em> for them. A subgradient \(g\) of \(f\) at \(x\) satisfies:
\[
\forall y. f(y) \geq f(x) + g^T(y - x)
\]</p>
<p>So the subgradient of \(|b|\) (with respect to \(b\)) at 0 is the set \([-1,1]\). Abusing notation, we can say that the subgradient set for our \(e\) is:
\[
2 X^T Xb - 2X^T y + \gamma ~ \textbf{if}(b = 0, [-1,1], \textbf{sign}(b))
\]</p>
<p>The <em>subgradient method</em> for minimization starts with initial guess \(x^{(0)}\) and then updates that guess by:
\[
x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}
\]
Where \(\alpha_k\) is a step size chosen according to some schedule (e.g. \(\alpha_k = a/\sqrt{k}\) for fixed \(a\)), and \(g^{(k)}\) is any subgradient. Subgradient descent is guaranteed to converge to the optimal value for any convex objective, though it might be extremely slow.</p>
<p>In Python:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X</span> <span class="o">=</span> <span class="o">...</span> <span class="n">N</span> <span class="o">*</span> <span class="n">F</span> <span class="n">matrix</span> <span class="o">...</span>
<span class="n">y</span> <span class="o">=</span> <span class="o">...</span> <span class="n">N</span> <span class="n">vector</span> <span class="o">...</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="n">gamma</span> <span class="o">=</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">N</span><span class="p">)</span> <span class="o">*</span> <span class="n">alpha</span>
<span class="k">def</span> <span class="nf">evaluate_objective</span><span class="p">(</span><span class="n">betas</span><span class="p">):</span>
<span class="n">errors</span> <span class="o">=</span> <span class="n">y</span><span class="p">[</span><span class="bp">None</span><span class="p">,</span> <span class="p">:]</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">betas</span><span class="p">,</span> <span class="n">X</span><span class="o">.</span><span class="n">T</span><span class="p">)</span>
<span class="n">sse</span> <span class="o">=</span> <span class="p">(</span><span class="n">errors</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">gamma</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">betas</span><span class="p">)</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">sse</span>
<span class="k">def</span> <span class="nf">subgradient_descent</span><span class="p">():</span>
<span class="n">betas</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">2000</span><span class="p">,</span> <span class="n">F</span><span class="p">))</span>
<span class="n">betas</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">a</span> <span class="o">=</span> <span class="mf">1e-4</span>
<span class="n">XTX</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">T</span> <span class="o">@</span> <span class="n">X</span>
<span class="n">XTy</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="n">eps</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">finfo</span><span class="p">(</span><span class="n">betas</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span><span class="o">.</span><span class="n">eps</span>
<span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">betas</span><span class="p">)):</span>
<span class="n">penalty_subg</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">where</span><span class="p">(</span>
<span class="n">np</span><span class="o">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">t</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:])</span> <span class="o"><=</span> <span class="n">eps</span><span class="p">,</span>
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">F</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span>
<span class="n">np</span><span class="o">.</span><span class="n">sign</span><span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">t</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:])</span>
<span class="p">)</span>
<span class="n">subg</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">XTX</span><span class="p">,</span> <span class="n">betas</span><span class="p">[</span><span class="n">t</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:])</span> <span class="o">-</span> <span class="n">XTy</span> <span class="o">+</span> <span class="n">gamma</span> <span class="o">*</span> <span class="n">penalty_subg</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="n">a</span><span class="o">/</span><span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
<span class="n">betas</span><span class="p">[</span><span class="n">t</span><span class="p">]</span> <span class="o">=</span> <span class="n">betas</span><span class="p">[</span><span class="n">t</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">alpha</span><span class="o">*</span><span class="n">subg</span>
<span class="n">sse</span> <span class="o">=</span> <span class="n">evaluate_objective</span><span class="p">(</span><span class="n">betas</span><span class="p">)</span>
<span class="k">return</span> <span class="n">betas</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">sse</span><span class="p">)]</span>
</code></pre></div></div>
<h1 id="coordinate-descent">Coordinate descent</h1>
<p><a href="https://en.wikipedia.org/wiki/Coordinate_descent">Coordinate descent</a> iterates over each dimension to be optimized and minimizes the objective with respect to that single variable holding all others fixed. This method is popular with practitioners due to it’s simplicity, though it is of little interest to theorists due to slow convergence.</p>
<p>Let’s optimize our objective, \(e(b)\) over input \(b_k\). Rewriting the objective:</p>
<div>\[
\begin{align}
e(b) &= (y - Xb)^T (y - Xb) + \gamma \sum_i \|b_i\| \\
&= \sum_{j} (y_j - \sum_{i \neq k} X_{j,i} b_i - X_{j,k} b_k)^2 + \gamma \sum_{i \neq k} \|b_i\| + \gamma \|b_k\|
\end{align}
\]</div>
<p>We can minimize this under the assumption that \(b_k>0\):</p>
<div>\[
\begin{align}
\frac{de(b)}{db_k} &= -\sum_{j} 2 X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i - X_{j,k} b_k) + \gamma \\
0 &= \gamma - 2 \sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i) + 2 \sum_{j} X_{j,k} X_{j,k} b_k \\
b_k &= \frac{\sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i) - \frac{1}{2}\gamma}{\sum_{j} X_{j,k}^2}
\end{align}
\]</div>
<p>By a similar argument we can show that if \(b_k<0\), the minimizer is:</p>
<div>\[
\begin{align}
b_k &= \frac{\sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i) + \frac{1}{2}\gamma}{\sum_{j} X_{j,k}^2}
\end{align}
\]</div>
<p>Finally, if \(b_k=0\) we can use subgradients: 0 will be in the subgradient set of \(e(b)\) with respect to \(b\) if and only if:</p>
<div>\[
\begin{align}
\sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i) &\in [-\frac{\gamma}{2},\frac{\gamma}{2}]
\end{align}
\]</div>
<p>Putting it together, we can define the minimizing \(b_k\) by cases:</p>
<div>\[
b_k = \begin{cases}
\frac{\sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i) - \frac{1}{2}\gamma}{\sum_{j} X_{j,k}^2} &
\sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i) > \frac{1}{2}\gamma \\
\frac{\sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i) + \frac{1}{2}\gamma}{\sum_{j} X_{j,k}^2} &
\sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i) < -\frac{1}{2}\gamma \\
0 & \text{otherwise}
\end{cases}
\]</div>
<p>This definition can be simplified by defining a “soft thresholding” function as follows (this will also be useful later on):</p>
<div>\[
\begin{align}
\text{threshold}(x, \mu) &= \text{sign}(x)\text{max}(|x| - \mu, 0) \\
b_k &= \frac{\text{threshold}(\sum_{j} X_{j,k} (y_j - \sum_{i \neq k} X_{j,i} b_i), \frac{1}{2}\gamma)}{\sum_{j} X_{j,k}^2}
\end{align}
\]</div>
<p>As a Python program:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">soft_threshold</span><span class="p">(</span><span class="n">beta</span><span class="p">,</span> <span class="n">thold</span><span class="p">):</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">sign</span><span class="p">(</span><span class="n">beta</span><span class="p">)</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">fmax</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">beta</span><span class="p">)</span><span class="o">-</span><span class="n">thold</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">coordinate_descent</span><span class="p">():</span>
<span class="n">betas</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">20</span><span class="p">,</span> <span class="n">F</span><span class="p">))</span>
<span class="n">betas</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">r</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">betas</span><span class="p">)):</span>
<span class="n">beta</span> <span class="o">=</span> <span class="n">betas</span><span class="p">[</span><span class="n">r</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:]</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">F</span><span class="p">):</span>
<span class="n">projection</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span>
<span class="n">X</span><span class="p">[:,</span> <span class="n">k</span><span class="p">],</span>
<span class="n">y</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">[:,</span> <span class="p">:</span><span class="n">k</span><span class="p">],</span> <span class="n">beta</span><span class="p">[:</span><span class="n">k</span><span class="p">])</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">[:,</span> <span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">:],</span> <span class="n">beta</span><span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">:])</span>
<span class="p">)</span>
<span class="n">threshold</span> <span class="o">=</span> <span class="mf">0.5</span><span class="o">*</span><span class="n">gamma_01</span>
<span class="n">scale</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">[:,</span> <span class="n">k</span><span class="p">],</span> <span class="n">X</span><span class="p">[:,</span> <span class="n">k</span><span class="p">])</span>
<span class="n">beta</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">soft_threshold</span><span class="p">(</span><span class="n">projection</span><span class="p">,</span> <span class="n">threshold</span><span class="p">)</span><span class="o">/</span><span class="n">scale</span>
<span class="n">betas</span><span class="p">[</span><span class="n">r</span><span class="p">]</span> <span class="o">=</span> <span class="n">beta</span>
<span class="n">sse</span> <span class="o">=</span> <span class="n">evaluate_objective</span><span class="p">(</span><span class="n">betas</span><span class="p">)</span>
<span class="k">return</span> <span class="n">betas</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">sse</span><span class="p">)]</span>
</code></pre></div></div>
<h1 id="proximal-gradient-descent">Proximal gradient descent</h1>
<p>To explain proximal gradient descent, let’s first refresh our memory about standard gradient descent. For a smooth function \(f(x)\), we find the minimum of the function by starting with an initial guess \(x^{(0)}\) and then iteratively updating by subtracting \(t\) times the gradient:
\[
x^{(k+1)} = x^{(k)} - t\nabla f(x^{(k)})
\]</p>
<!-- There are several ways to choose the step size $t$ at each iteration. One popular choice is do a *line search*: start with some initial fixed $t$ then iterate with $t^+ = 0.8 t$ while $f(x - t\nabla f(x)) > f(x) - \frac{1}{2}t\|\nabla f(x)\|_2^2$.
In particular, line search ensures that no update to $x$ increases $f(x)$. -->
<p>Now notice that each of these update steps is equivalent to minimizing a kind of second-order Taylor expansion of \(f\) around \(x^{(k)}\), i.e.:</p>
<p>\[
f(x^+) \approx f(x^{(k)}) + (x^+ - x^{(k)})^T \nabla f(x^{(k)}) + (x^+ - x^{(k)})^T \nabla^2 f(x^{(k)}) (x^+ - x^{(k)})
\]</p>
<p>We approximate further by making the simplifying assumption that \(\nabla^2 f(x) = \frac{1}{2t}I\), giving:
\[
df/dx^+ \approx \nabla f(x^{(k)}) - \frac{1}{t} x^{(k)} + \frac{1}{t} x^+
\]</p>
<p>And the minimum \(\text{argmin}_{x^+} f(x^+)\) is achieved when this gradient is zero i.e.:
\[
x^+ = x^{(k)} - t \nabla f(x^{(k)})
\]</p>
<p>With the <a href="https://en.wikipedia.org/wiki/Proximal_gradient_method">proximal gradient method</a>, we begin by rewriting \(e(b)\) as the sum of two parts \(e(b) = f(b) + g(b)\) where \(f\) is differentiable. So for our example L1-penalized objective:</p>
<div>\[
\begin{align}
f(b) &= (y - Xb)^T (y - Xb) \\
g(b) &= \gamma \sum_i \|b_i\|
\end{align}
\]</div>
<p>We now proceed as with gradient descent, but perform the simplified Taylor expansion on the smooth (\(f\)) part only. At each gradient descent step we therefore want to solve the problem:</p>
<div>
\begin{align}
\text{argmin}_{x^+} e(x^+)
&= \text{argmin}_{x^+} f(x^{(k)}) + (x^+-x^{(k)})^T \nabla f(x^{(k)}) + \frac{1}{2t}(x^+-x^{(k)})^T (x^+-x^{(k)}) + g(x^+) \\
&= \text{argmin}_{x^+}
\frac{1}{2t}\|x^+ - (x^{(k)} - t \nabla f(x^{(k)}))\|_2^2 + g(x^+) \\
&= \text{prox}_{tg} (x^{(k)} - t \nabla f(x^{(k)}))
\end{align}
</div>
<p>Where \(\text{prox}_{tg}\) is just a compact way of denoting this minimization: this is called the <a href="https://en.wikipedia.org/wiki/Proximal_operator">“proximal operator”</a> of \(g\) with scale parameter \(t\).</p>
<p>Often it’s much easier to analytically solve the minimization being done by the proximal operator, while it might be very hard or impossible to solve the original \(e\) minimization. For our L1-penalized case, we want to be able to solve problems like this:</p>
<div>
\begin{align}
\text{prox}_{tg}(x) &= \text{argmin}_{z}
\frac{1}{2t}\|z - x\|_2^2 + \gamma \sum_i \|z_i\|
\end{align}
</div>
<p>In our case the minimization in the proximal operator <strong>is</strong> much easier to solve than the original \(e\) minimization. The proximal operator does have a closed form solution that can be <a href="https://math.stackexchange.com/a/511106">readily derived</a> via subgradients, but trying to use subgradients to minimize the original \(e\) <a href="https://stats.stackexchange.com/questions/174003/why-is-my-derivation-of-a-closed-form-lasso-solution-incorrect">fails</a>. The proximal gradient method has managed to turn a hard problem into an easy one!</p>
<p>The solution to \(\text{prox}_{tg}(x)\) is just the soft-thresholding function again:
<!-- <div>
\begin{align}
[\text{prox}_{tg}(x)]_i &= \begin{cases}
0 & |x_i| \leq t\gamma \\
x_i - t\gamma \text{sign}(x_i) & |x_i| > t\gamma
\end{cases} \\
&= \text{sign}(x_i)\text{max}(|x_i| - t\gamma, 0)
\end{align}
</div> --></p>
<div>
\begin{align}
[\text{prox}_{tg}(x)]_i &= \text{threshold}(x_i, t\gamma)
\end{align}
</div>
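<p>As a quick check that soft-thresholding really does solve this little minimization, we can compare it against a generic numerical optimizer on an arbitrary small instance (the particular \(t\), \(\gamma\) and \(x\) below are made up purely for illustration):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from scipy.optimize import minimize

def soft_threshold(beta, thold):
    # Same soft-thresholding function as defined above
    return np.sign(beta)*np.fmax(0, np.abs(beta)-thold)

t, gamma = 0.3, 1.5
x = np.array([2.0, -0.2, 0.7, -3.1])

def prox_objective(z):
    return (1/(2*t))*np.sum((z - x)**2) + gamma*np.sum(np.abs(z))

numerical = minimize(prox_objective, np.zeros_like(x), method="Nelder-Mead").x
analytic = soft_threshold(x, t*gamma)
print(numerical.round(3), analytic.round(3))  # the two should agree closely
</code></pre></div></div>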
<p>We can put this method together as a Python program as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">evaluate_sse</span><span class="p">(</span><span class="n">beta</span><span class="p">):</span>
<span class="n">resid</span> <span class="o">=</span> <span class="n">y</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">beta</span><span class="p">)</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">resid</span><span class="p">,</span> <span class="n">resid</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">proximal_gradient_descent</span><span class="p">():</span>
<span class="n">betas</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">20</span><span class="p">,</span> <span class="n">F</span><span class="p">))</span>
<span class="n">betas</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">XTX</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">T</span> <span class="o">@</span> <span class="n">X</span>
<span class="n">XTy</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">betas</span><span class="p">)):</span>
<span class="n">g</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">XTX</span><span class="p">,</span> <span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="o">-</span> <span class="n">XTy</span>
<span class="c1"># Line search: iterate to find a step size `t` which results
</span> <span class="c1"># in us reducing the value of the objective function
</span> <span class="n">t</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">soft_threshold</span><span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">t</span> <span class="o">*</span> <span class="n">g</span><span class="p">,</span> <span class="n">t</span><span class="o">*</span><span class="n">gamma</span><span class="p">)</span>
<span class="c1"># G is the "generalized gradient" of the objective,
</span> <span class="c1"># so our update looks like a "gradient descent step":
</span> <span class="c1"># betas[k] = betas[k-1] - tG
</span> <span class="n">G</span> <span class="o">=</span> <span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">])</span><span class="o">/</span><span class="n">t</span>
<span class="n">e</span> <span class="o">=</span> <span class="n">evaluate_sse</span><span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">])</span>
<span class="n">last_e</span> <span class="o">=</span> <span class="n">evaluate_sse</span><span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="n">max_e</span> <span class="o">=</span> <span class="n">last_e</span> <span class="o">-</span> <span class="n">t</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">g</span><span class="p">,</span> <span class="n">G</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">t</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span><span class="o">*</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">G</span><span class="p">,</span> <span class="n">G</span><span class="p">)</span>
<span class="k">if</span> <span class="n">e</span> <span class="o"><=</span> <span class="n">max_e</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">t</span> <span class="o">*=</span> <span class="mf">0.8</span>
<span class="n">sse</span> <span class="o">=</span> <span class="n">evaluate_objective</span><span class="p">(</span><span class="n">betas</span><span class="p">)</span>
<span class="k">return</span> <span class="n">betas</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">sse</span><span class="p">)]</span>
</code></pre></div></div>
<h1 id="accelerated-proximal-gradient-descent">Accelerated proximal gradient descent</h1>
<p>A popular variant of proximal gradient descent is to use some <a href="https://dominikschmidt.xyz/nesterov-momentum/">“Nesterov momentum”</a>:</p>
<div>
\begin{align}
v &= x^{(k-1)} + \frac{k-2}{k+1}(x^{(k-1)} - x^{(k-2)}) \\
x^{(k)} &= \text{prox}_{tg} (v - t \nabla f(v))
\end{align}
</div>
<p>Unlike proximal gradient descent, this is no longer a descent method: some iterations might increase the value of the objective! Nonetheless, the method can be proven to converge so long as the gradient of the smooth part \(f\) does not change too quickly: technically \(\nabla f\) should be <a href="https://en.wikipedia.org/wiki/Lipschitz_continuity">Lipschitz-continuous</a>.</p>
<p>In Python:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">accelerated_proximal_gradient_descent</span><span class="p">():</span>
<span class="n">betas</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">20</span><span class="p">,</span> <span class="n">F</span><span class="p">))</span>
<span class="n">betas</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">XTX</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">T</span> <span class="o">@</span> <span class="n">X</span>
<span class="n">XTy</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">betas</span><span class="p">)):</span>
<span class="n">g</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">XTX</span><span class="p">,</span> <span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:])</span> <span class="o">-</span> <span class="n">XTy</span>
<span class="c1"># Iterate to find a step size `t` which doesn't move us too far
</span> <span class="c1"># from the objective
</span> <span class="n">t</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="p">((</span><span class="n">k</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span><span class="o">/</span><span class="p">(</span><span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span><span class="o">*</span><span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">2</span><span class="p">])</span>
<span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">soft_threshold</span><span class="p">(</span><span class="n">v</span> <span class="o">-</span> <span class="n">t</span> <span class="o">*</span> <span class="n">g</span><span class="p">,</span> <span class="n">t</span><span class="o">*</span><span class="n">gamma</span><span class="p">)</span>
<span class="n">G</span> <span class="o">=</span> <span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:]</span> <span class="o">-</span> <span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">,</span> <span class="p">:])</span><span class="o">/</span><span class="n">t</span>
<span class="n">e</span> <span class="o">=</span> <span class="n">evaluate_sse</span><span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">,</span> <span class="p">:])</span>
<span class="n">last_e</span> <span class="o">=</span> <span class="n">evaluate_sse</span><span class="p">(</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="p">:])</span>
<span class="n">max_e</span> <span class="o">=</span> <span class="n">last_e</span> <span class="o">+</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">g</span><span class="p">,</span> <span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">-</span> <span class="n">v</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">t</span><span class="p">))</span><span class="o">*</span><span class="p">((</span><span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">-</span> <span class="n">v</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="nb">sum</span><span class="p">()</span>
<span class="k">if</span> <span class="n">e</span> <span class="o"><=</span> <span class="n">max_e</span><span class="p">:</span>
<span class="k">break</span>
<span class="n">t</span> <span class="o">*=</span> <span class="mf">0.8</span>
<span class="n">sse</span> <span class="o">=</span> <span class="n">evaluate_objective</span><span class="p">(</span><span class="n">betas</span><span class="p">)</span>
<span class="k">return</span> <span class="n">betas</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">sse</span><span class="p">)]</span>
</code></pre></div></div>
<h1 id="alternating-direction-method-of-multipliers-admm">Alternating direction method of multipliers (ADMM)</h1>
<p>ADMM is a method of solving minimization problems of the form \(f(x) + g(z)\) subject to linear constraints \(Ax+Bz=c\). The <em>augmented Lagrangian</em> for this problem is just the standard <a href="https://en.wikipedia.org/wiki/Lagrange_multiplier">Lagrangian</a> plus a quadratic penalty for the constraint violation:</p>
<div>
\begin{align}
L_\rho(x, z, \lambda) &= f(x) + g(z) + \lambda^T(Ax+Bz-c) + (\rho/2) \|Ax+Bz-c\|_2^2
\end{align}
</div>
<p>This quadratic penalty will prove to be key, because just like in the proximal gradient descent case it will help ensure that the Lagrangian has a sensible gradient.</p>
<p>To minimize using ADMM, we keep estimates for \(x\), \(z\) and \(\lambda\) which we iteratively update as follows:</p>
<div>
\begin{align}
x^{(k+1)} &= \text{argmin}_x L_\rho(x, z^{(k)}, \lambda^{(k)}) \\
z^{(k+1)} &= \text{argmin}_z L_\rho(x^{(k+1)}, z, \lambda^{(k)}) \\
\lambda^{(k+1)} &= \lambda^{(k)} + \rho(Ax^{(k+1)} + Bz^{(k+1)} - c)
\end{align}
</div>
<p>Under weak conditions (mostly just convexity of \(f\) and \(g\)), it can be shown that this procedure converges on the constrained global minimum!</p>
<p>How can we solve the L1 regression problem in this framework? It’s not at first obvious, because in the L1 problem there are no constraints at all, but ADMM seems to be useful only for optimization with linear constraints. The small trick we need is to instantiate the constraint \(Ax+Bz=c\) to enforce that \(x=z\) i.e. \(A=I\), \(B=-I\) and \(c=0\). If \(f\) and \(g\) are the residual error and regularization functions – as they were in the proximal gradient case – then we have:</p>
<div>
\begin{align}
L_\rho(x, z, \lambda) &= (y - Xx)^T (y - Xx) + \gamma \sum_i \|z_i\| + \lambda^T(x-z) + (\rho/2) \|x-z\|_2^2
\end{align}
</div>
<p>Minimizing \(L_\rho(x, z, \lambda)\) with respect to \(x\) is straightforward:</p>
<div>
\begin{align}
0 &= 2 X^T X x - 2 X^T y + \lambda + \rho(x - z) \\
2 X^T X x + \rho x &= 2 X^T y - \lambda + \rho z \\
x &= (2 X^T X + \rho I)^{-1}(2 X^T y - \lambda + \rho z)
\end{align}
</div>
<p>The problem of minimizing \(L_\rho(x, z, \lambda)\) with respect to \(z\) is tricky, but you can show it is equivalent to solving the proximal operator for \(g\), which is helpful because we already know how to evaluate <strong>that</strong>:</p>
<div>
\begin{align}
& \text{argmin}_z L_\rho(x, z, \lambda) \\
=& \text{argmin}_z \gamma \sum_i \|z_i\| - \lambda^T z + (\rho/2) (z^T z - 2 x^T z) \\
=& \text{argmin}_z \gamma \sum_i \|z_i\| + \rho/2(z^T z - 2 z^T (x + \lambda/\rho) + (x + \lambda/\rho)^T (x + \lambda/\rho)) \\
=& \text{argmin}_z \gamma \sum_i \|z_i\| + \rho/2 \|z - (x + (1/\rho)\lambda)\|^2_2 \\
=& \text{prox}_{\frac{1}{\rho}g} (x + (1/\rho)\lambda)
\end{align}
</div>
<p>Putting this together as a Python function:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">admm</span><span class="p">():</span>
<span class="n">betas</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">50</span><span class="p">,</span> <span class="n">F</span><span class="p">))</span>
<span class="n">betas</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="p">:]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">XTX</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">T</span> <span class="o">@</span> <span class="n">X</span>
<span class="n">XTy</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="c1"># Works well on my examples, but algorithm can be sensitive to choice of rho:
</span> <span class="c1"># Section 3.4.1 of https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
</span> <span class="c1"># has some ideas here
</span> <span class="n">rho</span> <span class="o">=</span> <span class="mi">100</span>
<span class="k">def</span> <span class="nf">argmin_x</span><span class="p">(</span><span class="n">z</span><span class="p">,</span> <span class="n">lamb</span><span class="p">):</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">inv</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">XTX</span> <span class="o">+</span> <span class="n">rho</span><span class="p">)</span> <span class="o">@</span> <span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">XTy</span> <span class="o">-</span> <span class="n">lamb</span> <span class="o">+</span> <span class="n">rho</span><span class="o">*</span><span class="n">z</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">argmin_z</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">lamb</span><span class="p">):</span>
<span class="n">t</span> <span class="o">=</span> <span class="mi">1</span><span class="o">/</span><span class="n">rho</span>
<span class="k">return</span> <span class="n">soft_threshold</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="n">t</span><span class="o">*</span><span class="n">lamb</span><span class="p">,</span> <span class="n">t</span><span class="o">*</span><span class="n">gamma</span><span class="p">)</span>
<span class="n">x</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="n">lamb</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">F</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">F</span><span class="p">),</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">F</span><span class="p">)</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">betas</span><span class="p">)):</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">argmin_x</span><span class="p">(</span><span class="n">z</span><span class="p">,</span> <span class="n">lamb</span><span class="p">)</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">argmin_z</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">lamb</span><span class="p">)</span>
<span class="n">lamb</span> <span class="o">+=</span> <span class="n">rho</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="o">-</span><span class="n">z</span><span class="p">)</span>
<span class="n">betas</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">z</span>
<span class="n">sse</span> <span class="o">=</span> <span class="n">evaluate_objective</span><span class="p">(</span><span class="n">betas</span><span class="p">)</span>
<span class="k">return</span> <span class="n">betas</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">sse</span><span class="p">)]</span>
</code></pre></div></div>
<h1 id="lars">LARS</h1>
<p><a href="https://en.wikipedia.org/wiki/Least-angle_regression">LARS</a>, or “least angle regression”, is an algorithm that can be used to efficiently solve the L1 regression problem for a whole range of \(\lambda\) values. I defer to Wikipedia and the very readable <a href="http://statweb.stanford.edu/~tibs/ftp/lars.pdf">paper</a> to describe this approach. There is less value in understanding the details of LARS than of ADMM and the other descent methods, because the descent methods can be applied to many problems, not just the L1 problem.</p>
<h1 id="conclusion">Conclusion</h1>
<p>If you’re trying to use L1 regression in practice, just throw Scikit-Learn at the problem :-). But personally I’m not completely satisfied by just using a black box: it’s fascinating to know something about what is going on under the hood to solve this tricky optimization problem.</p>

<h1 id="coronavirus-lockdown">Coronavirus lockdown: balanced on a knife edge (2020-03-22)</h1>
<p><a href="http://blog.omega-prime.co.uk/2020/03/22/coronavirus-lockdown">http://blog.omega-prime.co.uk/2020/03/22/coronavirus-lockdown</a></p>
<p>Countries around the world are going into more-or-less complete states of lockdown in an effort to stop the spread of novel coronavirus. The question I find myself asking is whether the obvious economic cost of this can possibly justify the benefits. I built a model to answer this question and I find that, contrary to my priors, lockdown is actually justifiable under reasonable assumptions.</p>
<p>The key inputs to the model are:</p>
<ol>
<li>What fraction of cases are symptomatic?</li>
<li>What fraction of symptomatic cases result in death?</li>
<li>How do these quantities vary with age?</li>
<li>In the absence of lockdown, how much of the population will eventually contract the disease?</li>
<li>What are the costs of lockdown?</li>
</ol>
<p>At this stage, the first 3 questions can be answered with some precision.</p>
<h2 id="asymptomatic-case-fraction">Asymptomatic case fraction</h2>
<p>On the Diamond Princess, 3063 of the 3711 people on board were tested, so we know with unusual confidence that roughly 50% of cases were asymptomatic (318 asymptomatic cases versus 308 symptomatic). Estimates from Wuhan <a href="https://www.medrxiv.org/content/10.1101/2020.03.04.20031104v1.full.pdf">transmission dynamics</a> come up with a very similar figure:</p>
<p><img src="/2020/03/wuhan-asymptomatic.png" alt="" /></p>
<h2 id="case-fatality-rate">Case fatality rate</h2>
<p><code class="highlighter-rouge">sCFR</code> is the symptomatic case fatality rate: the fraction of people with symptoms who will eventually die. I consider three sources of evidence for this:</p>
<ul>
<li>We know the number of infections and deaths on the Diamond Princess with certainty (there are a few people who haven’t yet died or recovered, but this is negligible). Implied sCFR values from a <a href="https://cmmid.github.io/topics/covid19/severity/diamond_cruise_cfr_estimates.html">model</a> fit to the data from the ship are as follows:</li>
</ul>
<p><img src="/2020/03/diamond-princess-cfr.png" alt="" /></p>
<ul>
<li>We can also make an estimate using data from Wuhan, which is well-studied at this point.
<ul>
<li>One estimate of the symptomatic case fatality rate can be made by dividing the number of deaths by the number of symptomatic cases 14 days ago (it <a href="https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30183-5/fulltext">takes</a> <a href="https://jamanetwork.com/journals/jama/article-abstract/2761044">about</a> 10 days from the onset of symptoms to ICU admission). This <a href="https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30195-X/fulltext">gives</a> a high sCFR of about 5.7%:</li>
</ul>
</li>
</ul>
<p><img src="/2020/03/naive-wuhan-cfr.jpg" alt="" /></p>
<ul>
<li>However, it seems very likely that a large number of Wuhan cases remain undiagnosed. Studies that try to adjust for this by using <a href="https://www.nature.com/articles/s41591-020-0822-7">estimated transmission dynamics</a> or <a href="https://www.medrxiv.org/content/10.1101/2020.03.04.20031104v1">the results of tests done by other countries on travellers from Wuhan</a> find sCFR much more in line with the Diamond Princess experience:</li>
</ul>
<p><img src="/2020/03/wuhan-cfr0.png" alt="" />
<img src="/2020/03/wuhan-cfr1.png" alt="" /></p>
<p>We can take the mean of these three models to estimate the sCFR:</p>
<table>
<thead>
<tr>
<th>Age Range</th>
<th>sCFR Diamond Princess</th>
<th>sCFR Wuhan (from transmission rates)</th>
<th>sCFR Wuhan (from Travellers)</th>
<th>sCFR Mean</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 - 9</td>
<td>0.00%</td>
<td>0.01%</td>
<td>0.50%</td>
<td>0.17%</td>
</tr>
<tr>
<td>10 - 19</td>
<td>0.00%</td>
<td>0.02%</td>
<td>1%</td>
<td>0.34%</td>
</tr>
<tr>
<td>20 - 29</td>
<td>0.20%</td>
<td>0.09%</td>
<td>0.30%</td>
<td>0.20%</td>
</tr>
<tr>
<td>30 - 39</td>
<td>0.22%</td>
<td>0.18%</td>
<td>0.20%</td>
<td>0.20%</td>
</tr>
<tr>
<td>40 - 49</td>
<td>0.42%</td>
<td>0.40%</td>
<td>0.35%</td>
<td>0.39%</td>
</tr>
<tr>
<td>50 - 59</td>
<td>1.29%</td>
<td>1.30%</td>
<td>1%</td>
<td>1.20%</td>
</tr>
<tr>
<td>60 - 69</td>
<td>3.61%</td>
<td>4.60%</td>
<td>2%</td>
<td>3.34%</td>
</tr>
<tr>
<td>70 - 79</td>
<td>8.00%</td>
<td>9.80%</td>
<td>3%</td>
<td>6.93%</td>
</tr>
<tr>
<td>80 - 89</td>
<td>14.76%</td>
<td>18%</td>
<td>5.50%</td>
<td>12.75%</td>
</tr>
</tbody>
</table>
<p>For comparison, the <a href="https://www.cdc.gov/flu/about/burden/2018-2019.html">sCFR for seasonal flu</a> is 0.1% overall. The average over all age groups for coronavirus is about 0.7%. The Spanish Flu seems to have had sCFR of between <a href="https://rybicki.blog/2018/04/11/1918-influenza-pandemic-case-fatality-rate/?fbclid=IwAR3SYYuiERormJxeFZ5Mx2X_00QRP9xkdBktfmzJmc8KR-iqpbK8tGlNqtQ">10 and 20 percent</a>.</p>
<h2 id="if-uncontrolled-how-widespread-would-it-be">If uncontrolled, how widespread would it be?</h2>
<p>Seasonal flu has R0 <a href="https://www.ncbi.nlm.nih.gov/pubmed/19545404">1.3</a> and about 3-11% of the population get it each year. Spanish Flu <a href="https://www.infectioncontroltoday.com/public-health/100-years-after-spanish-flu-lessons-learned-and-challenges-future">had</a> an R0 of 2.0 and spread to about <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2720273/">one third</a> of the world. Measles has an R0 of 10 and <a href="http://www.med.mcgill.ca/epidemiology/courses/EPIB591/Fall%202010/mid-term%20presentations/Paper9.pdf">spread</a> to 80% of the Faroe Islands when it arrived there.</p>
<p>If uncontrolled, coronavirus R0 is <a href="https://www.ijidonline.com/article/S1201-9712(20)30091-6/fulltext">probably</a> around 2.2. Based on the experience with other diseases we can expect that uncontrolled it would spread to somewhere between one-third and 50% of the population.</p>
<h2 id="what-are-the-benefits-of-lockdown">What are the benefits of lockdown?</h2>
<p>I take the sCFR and the probability of having a symptomatic infection, and multiply them by the <a href="https://en.wikipedia.org/wiki/Demography_of_the_United_Kingdom#Age_structure">number of people</a> in each demographic group in the United Kingdom to find the expected number of deaths in each age category.</p>
<p>Using <a href="https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/lifeexpectancies/datasets/nationallifetablesunitedkingdomreferencetables">actuarial life tables</a> I work out the number of years of life lost as a result of each death. Because many of the people who die have preexisting conditions, I say that they would only have lived for 80% as long as an average person of the same age had the coronavirus not done them in.</p>
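<p>To make the mechanics concrete, here is a minimal sketch of that calculation for a single age bucket. The function and the inputs are illustrative stand-ins (my own rough numbers, not the exact figures from my spreadsheet), but they approximately reproduce the 60 - 69 row of the table below.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def expected_losses(scfr, p_symptomatic_infection, population, years_left,
                    preexisting_condition_discount=0.8):
    """Expected deaths and person-years lost for one age bucket."""
    deaths = scfr * p_symptomatic_infection * population
    person_years = deaths * years_left * preexisting_condition_discount
    return deaths, person_years

# Illustrative inputs for the 60 - 69 bucket: sCFR of 3.34%, a ~25% chance of a
# symptomatic infection (~50% attack rate times ~50% symptomatic), ~6.9m people
# in the bucket and ~20 years of remaining life expectancy.
deaths, person_years = expected_losses(0.0334, 0.25, 6_900_000, 20)
print(deaths, person_years)   # roughly 58,000 deaths and 920,000 person-years
</code></pre></div></div>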
<p>Results are as follows:</p>
<table>
<thead>
<tr>
<th>Age Range</th>
<th>sCFR Mean</th>
<th>Unconditional Death Probability</th>
<th>Expected Deaths</th>
<th>Expected Person-Years Lost</th>
</tr>
</thead>
<tbody>
<tr>
<td>0 - 9</td>
<td>0.17%</td>
<td>0.01%</td>
<td>631</td>
<td>38,583</td>
</tr>
<tr>
<td>10 - 19</td>
<td>0.34%</td>
<td>0.09%</td>
<td>6,530</td>
<td>347,408</td>
</tr>
<tr>
<td>20 - 29</td>
<td>0.20%</td>
<td>0.06%</td>
<td>5,085</td>
<td>230,593</td>
</tr>
<tr>
<td>30 - 39</td>
<td>0.20%</td>
<td>0.06%</td>
<td>5,010</td>
<td>188,274</td>
</tr>
<tr>
<td>40 - 49</td>
<td>0.39%</td>
<td>0.12%</td>
<td>10,854</td>
<td>325,418</td>
</tr>
<tr>
<td>50 - 59</td>
<td>1.20%</td>
<td>0.30%</td>
<td>23,035</td>
<td>522,255</td>
</tr>
<tr>
<td>60 - 69</td>
<td>3.34%</td>
<td>0.83%</td>
<td>56,897</td>
<td>902,152</td>
</tr>
<tr>
<td>70 - 79</td>
<td>6.93%</td>
<td>1.56%</td>
<td>69,716</td>
<td>682,942</td>
</tr>
<tr>
<td>80 - 89</td>
<td>12.75%</td>
<td>3.19%</td>
<td>76,964</td>
<td>390,975</td>
</tr>
<tr>
<td>90+</td>
<td>17.75%</td>
<td>4.44%</td>
<td>21,126</td>
<td>49,688</td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
<td>275,848</td>
<td>3,678,287</td>
</tr>
</tbody>
</table>
<p>So allowing uncontrolled spreading costs maybe 3.6 million person-years of life and 275k deaths in the UK. It seems likely from the Chinese experience that lockdown can prevent all or most (90%+) of this loss.</p>
<p>My full model is available <a href="https://docs.google.com/spreadsheets/d/11Ck055L2qItnzBDFQL-VHoq1eDquFLepq1tmcvIEmtw/edit?usp=sharing">online</a>.</p>
<h2 id="what-are-the-costs-of-lockdown">What are the costs of lockdown?</h2>
<p>The costs of lockdown are easy to enumerate but hard to quantify. Losses include:</p>
<ul>
<li>Unemployment and resultant financial hardship</li>
<li>Higher expected future taxation to recoup the costs of government intervention</li>
<li>Loss of wealth, both in the form of stock/bonds held in investment accounts, and in ownership stakes of e.g. small privately-held business. This may cause things like later retirement if pension schemes are unable to meet their obligations to pensioners.</li>
<li>Loss of utility due to social distancing: fewer parties, no cinema or theatre trips, no foreign holidays.</li>
</ul>
<p>The best-articulated endgame for lockdown is from <a href="https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf">Imperial College</a>. Their model assumes periods of on-again-off-again lockdown for 18 months designed to “flatten the curve” to a level that the healthcare system can deal with. After 18 months herd immunity has reached a level where lockdown can be completely lifted. You would also expect a vaccine or wide-scale blood plasma infusions to be deployable within 12-18 months, which puts a cap on the length of any lockdown.</p>
<p>I crudely estimate the costs of lockdown by saying that all 63 million people in the UK are in lockdown for 12 months and suffer a 15% quality of life penalty as a result. This implies 9.5 million person-years of life lost to lockdown. What I find striking is that this number is only a factor of 2 to 3 higher than the estimated 3.6 million person-year benefit from lockdown. So while from a cost-benefit-analysis point of view lockdown <strong>does</strong> look like the wrong choice, with only slightly different assumptions it would start to look attractive.</p>
<p>Another way to compare is to multiply the number of person-years saved by lockdown by the £30,000-per-person-year threshold <a href="https://onlinelibrary.wiley.com/doi/pdf/10.1002/psb.1562">used by the NHS</a> to decide whether or not the public should pay for a medical treatment. This method implies that we should be willing to spend about £110bn to avoid the consequences of uncontrolled spread. Goldman Sachs estimates that lockdown would <a href="https://thehill.com/policy/finance/488648-goldman-sachs-says-gdp-could-fall-24-percent-in-second-quarter">cost</a> about 6-7% of GDP in the long run, i.e. about £130bn <a href="https://www.statista.com/statistics/281744/gdp-of-the-united-kingdom-uk-since-2000/">in the UK</a>. From this financial point of view a policy of lockdown also looks totally defensible, and whether or not the costs exceed the benefits will depend on your exact assumptions.</p>
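<p>The arithmetic behind these two comparisons is simple, but spelling it out makes the moving parts explicit. This is just a restatement of the numbers quoted above, not new analysis:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Cost of lockdown, measured in person-years of lost utility
uk_population = 63_000_000
lockdown_years = 1.0
quality_of_life_penalty = 0.15
lockdown_cost_person_years = uk_population * lockdown_years * quality_of_life_penalty
# = 9.45 million person-years, versus ~3.7 million person-years saved

# Benefit of lockdown, measured in pounds at the NHS willingness-to-pay threshold
person_years_saved = 3_678_287      # total from the table above
nice_threshold_gbp = 30_000         # per person-year of life
benefit_gbp = person_years_saved * nice_threshold_gbp
# = ~110bn GBP, versus a ~130bn GBP estimate of the GDP cost of lockdown
</code></pre></div></div>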
<h2 id="conclusion">Conclusion</h2>
<p>Whether or not to go into lockdown is a hard decision because reasonable people can disagree on whether it makes sense from a cost-benefit perspective. This was surprising to me!</p>
<p>Areas where you might reasonably disagree with my model include:</p>
<ul>
<li>If you think that the costs of the lockdown will lead to a utility drop of less than 15%, or can be implemented by locking down the country for substantially less than 12 months</li>
<li>If you don’t subscribe to the ethical theory of utilitarianism at all (mine is a very utilitarian approach!)</li>
<li>If you believe that the lives of old people should be counted equally with the lives of the young then your estimate of the negative consequences of uncontrolled spread would increase</li>
</ul>
<p>On the other hand, if you think that almost all of the people who die from coronavirus would have died soon from something else, then the case for lockdown will look untenable.</p>
<p>A few factors that don’t really make a difference to the model:</p>
<ul>
<li>In an uncontrolled spread scenario, many people would get ill but not die. Assuming that every symptomatic person is sick for 7 days and then recovers, the UK loses only 297,404 person-years to sickness, which isn’t even close to the losses due to actual deaths (a quick sanity check of this figure follows the list). It seems to me that deaths are the right thing to focus on in the “costs” column, though I concede that it’s possible that a small number of those who recover will suffer permanent respiratory damage: I don’t have a firm grip on how common this is.</li>
<li>In a recession, mortality actually <a href="https://www.npr.org/2018/01/09/576669311/hidden-brain-great-recession-deaths?t=1584893233639">falls</a> due to better cardiovascular health and fewer traffic accidents (you don’t need to commute to work if you are unemployed!). This probably only saves around 472,500 person-years so can’t do much to offset the utility losses from a lockdown-induced recession.</li>
</ul>
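<p>As a rough check of the 297,404 figure, here is my own back-of-envelope arithmetic, reusing the approximate attack rate and symptomatic fraction from earlier in the post:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uk_population = 63_000_000
p_symptomatic_infection = 0.25        # ~50% attack rate times ~50% symptomatic
sick_days = 7
person_years_sick = uk_population * p_symptomatic_infection * sick_days / 365.25
# = ~302,000 person-years, in the same ballpark as the 297,404 quoted above
</code></pre></div></div>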
<p>One key limitation of my approach here is that I only compare two policy scenarios: complete lockdown and uncontrolled spread. Countries with effective government responses (e.g. China, South Korea and Singapore) have found a middle ground between these two extremes that involves comprehensive efforts to trace detected cases back to other potential infectees. These approaches seem clearly preferable to either of the scenarios I consider, but to date no European government seems capable of implementing them.</p>
<p>Perhaps a partial lockdown of just the most at-risk groups could also make sense (h/t <a href="https://infoproc.blogspot.com/2020/03/covid-19-cocoon-vulnerable-save-economy.html">Steve Hsu</a>). However, proposals to quarantine just the oldest in our society don’t obviously make sense to me, since only ~55% of the lost person-years are suffered by those who are 60 or older: although young people are less likely to die, they also have more time left to live!</p>

<h1 id="international-house-prices">Predicting international house prices (2020-03-01)</h1>
<p><a href="http://blog.omega-prime.co.uk/2020/03/01/international-house-prices">http://blog.omega-prime.co.uk/2020/03/01/international-house-prices</a></p>
<p>What factors influence house prices? This is a perennial topic of dinner party discussion, but the standard of the debate rarely rises above offering anecdotal evidence. Frustrated with the status quo, I decided to tackle the question with statistics. In this post I look at which macroeconomic factors are associated with future house price rises or falls.</p>
<p>(This post is not intended to be investment advice and does not represent the opinions of my employer.)</p>
<p>To do this, I perform a <a href="https://en.wikipedia.org/wiki/Fama%E2%80%93MacBeth_regression">Fama-MacBeth regression</a> (a short code sketch of the procedure follows the list):</p>
<ol>
<li>I gathered quarterly timeseries for several macro factors covering many countries, plus quarterly house price returns in those countries</li>
<li>For each quarter in my sample, I perform a linear regression to predict next quarter’s house price returns from this quarter’s macro factors.</li>
<li>Now I have, for each factor, a timeseries of regression coefficients. If a <a href="https://en.wikipedia.org/wiki/Statistical_hypothesis_testing">hypothesis test</a> rejects the null that this timeseries has zero mean, there is a statistically significant association between the factor and future returns.</li>
</ol>
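<p>A minimal sketch of this procedure is below. It assumes a pandas DataFrame <code class="highlighter-rouge">panel</code> with one row per (quarter, country) holding the factor values and next quarter’s return; the column names here are illustrative rather than the ones in my actual code.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from scipy import stats

def fama_macbeth(panel, factor_columns):
    """Per-quarter cross-sectional OLS, then a t-test on the coefficient means."""
    quarterly_coefs = []
    for _, df in panel.groupby("quarter"):
        # Regress next quarter's returns on this quarter's factors (with intercept)
        X = np.column_stack([np.ones(len(df)), df[factor_columns].to_numpy()])
        y = df["next_quarter_return"].to_numpy()
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        quarterly_coefs.append(beta[1:])          # drop the intercept
    quarterly_coefs = np.array(quarterly_coefs)
    # Null hypothesis: the mean of each coefficient timeseries is zero
    t_stat, p_value = stats.ttest_1samp(quarterly_coefs, 0.0)
    return quarterly_coefs.mean(axis=0), t_stat, p_value
</code></pre></div></div>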
<p>I tested several factors that have channels of potential causality to house prices:</p>
<ul>
<li>Rent: if houses yield a lot of rent relative to their price, they could rise in price as people do e.g. more buy-to-let transactions</li>
<li>Affordability: house prices that are low relative to income could imply that housing will rise in the future</li>
<li>Past price rises: if house prices have risen in the recent past, this might continue in the future</li>
<li>Population growth: if the population is growing you might expect prices to rise as supply struggles to keep up</li>
<li>Economic growth measured several ways (GDP growth, median wage growth, per-capita consumption growth etc): if people have more money to spend they might use some of it to bid up existing houses</li>
<li>Interest rates: low rates make it easier to take out a big mortgage which lets them spend more on housing</li>
<li>National indebtedness: if people already carry a lot of debt they might be unwilling to take on a lot of housing debt, which could help hold valuations down</li>
</ul>
<p>In brief: my findings are that house prices rise most quickly in countries where prices have risen over the past year, which offer attractive rental yields, have experienced high per-capita consumption growth over the last year, and have low short-term interest rates. The other factors tested are either subsumed by one of these strong predictors, or seem unrelated (at least, as far as I can tell given my sample: weak effects might only be discoverable in the presence of data from more countries and time periods).</p>
<p>My methodology is similar to that of Case and Shiller in their 1990 <a href="https://www.nber.org/papers/w3368.pdf">paper</a> “Forecasting Prices and Excess Returns in the Housing Market”. The key differences are that I consider the international housing market, while they look at differences between US states, and that they only look at a subset of the variables I consider. I also have the advantage of having 30 years more data than they do!</p>
<p>The countries in my sample were as follows:</p>
<ul>
<li>Developed Europe: Austria (AUT), Belgium (BEL), Switzerland (CHE), Germany (DEU), Denmark (DNK), Spain (ESP), Finland (FIN), France (FRA), the United Kingdom (GBR), Ireland (IRL), Iceland (ISL), Italy (ITA), Luxembourg (LUX), the Netherlands (NLD), Norway (NOR), Portugal (PRT), Sweden (SWE)</li>
<li>Emerging Europe: Czechia (CZE), Estonia (EST), Greece (GRC), Hungary (HUN), Lithuania (LTU), Latvia (LVA), Poland (POL), Russia (RUS), Slovakia (SVK), Slovenia (SVN)</li>
<li>Asia: Australia (AUS), China (CHN), Indonesia (IDN), India (IND), Israel (ISR), Japan (JPN), Korea (KOR), New Zealand (NZL)</li>
<li>North America: Canada (CAN), Mexico (MEX), the USA (USA)</li>
<li>South America: Chile (CHL), Colombia (COL)</li>
<li>Africa: South Africa (ZAF)</li>
</ul>
<p>If you had borrowed money at the prevailing local-currency short-term interest rate and invested it in housing in the top 3 countries in this list as picked by my model <sup id="fnref:2"><a href="#fn:2" class="footnote">1</a></sup>, you would have grown your initial borrowing almost 900% from 1979 to 2020. This far outpaces the approximately 40% cumulative growth that you would have experienced in, say, the USA or Canada during that time:</p>
<p><img src="/2020/03/strategy_topn_simple.png" alt="" /></p>
<p>If UK/USA/Canadian house price appreciation looks a bit lower than you are expecting in this graph, remember that interest costs have been deducted from all the timeseries here, and I don’t include any income you would have made from renting the property out.</p>
<p>This graph makes it look like you’d be crazy not to follow my trading strategy rather than own a home in a single country for a long period of time. However, the graph is misleading in that I have made no provision for transaction costs – in reality you’d end up giving a lot of the strategy return back in the form of real estate agent fees and housing transaction tax (“stamp duty” in the UK).</p>
<p>As one estimate of transaction costs, consider that UK stamp duty is <a href="https://www.gov.uk/stamp-duty-land-tax/residential-property-rates">roughly</a> 2%, real estate agent fees are <a href="https://www.which.co.uk/money/mortgages-and-property/home-movers/selling-a-house/the-cost-of-selling-a-house-a4gmu3n57y9v#headline_3">about</a> 1.5% and you could reasonably incur many other costs (e.g. conveyancing and mortgage arrangement fees) adding up to 0.5%, for a total cost of about 4% every time you switch the house you own. If you switched houses every quarter over the 168 quarters from 1979 to 2020, that would imply that 672% of your 900% return would be eaten by fees! In reality you would want to switch houses less frequently than this, but it would still be a drag on strategy returns.</p>
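<p>That 672% figure is just the simple (non-compounded) sum of the per-switch costs:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stamp_duty, agent_fees, other_costs = 0.02, 0.015, 0.005
cost_per_switch = stamp_duty + agent_fees + other_costs    # 4% per house switch
quarters = 168                                              # 1979 to 2020
total_cost = cost_per_switch * quarters                     # 6.72, i.e. 672%
</code></pre></div></div>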
<p>In the following sections I dive further into the technical detail of the model and discuss other trading strategies that you might use to exploit this model.</p>
<h1 id="which-returns">Which returns?</h1>
<p>How do you measure the returns to housing investment? One naive method is to look at the average selling price within each period, and say that the extent to which this grows from period to period represents the returns which a homeowner would have experienced. However, this is obviously flawed:</p>
<ol>
<li>The houses being sold at time T are different from those that were sold at time T-1, so you are not measuring the price change in a fixed housing investment.</li>
<li>People are unwilling to sell houses if they would realize a loss, so the housing transactions contributing to the average are mostly drawn from the sample of homes with higher-than-average appreciation.</li>
<li>This methodology does not take account of quality improvements that have been made to houses between sales (e.g. regular maintenance, refurbishment or extension).</li>
</ol>
<p>These factors tend to bias up the reported returns to housing investment. Economists are well aware of these issues and have a few measurement methods that try to adjust for the bias. I principally rely on house price returns from the <a href="https://www.dallasfed.org/~/media/documents/institute/wpapers/2011/0099.pdf">Dallas Fed’s house price indexes</a>, which use the repeat-sales method to help debias. In this method, returns are estimated by looking at the average price growth experienced at any given fixed addresses, which avoids e.g. the returns being biased upwards if most housing transactions are of expensive new-builds. For countries not covered by the Dallas Fed, I estimate returns using <a href="https://data.oecd.org/price/housing-prices.htm">OECD’s average-sale-price index</a>.</p>
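<p>To illustrate the idea, here is a toy sketch of a Bailey-Muth-Nourse-style repeat-sales estimator (not the Dallas Fed’s actual methodology): each pair of sales of the same address contributes one observation of log price growth, and the index levels are recovered by regressing those growth observations on period dummies. The addresses and prices are invented.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def repeat_sales_index(sales, n_periods):
    """sales: iterable of (address, period, price); returns an index with period 0 = 1."""
    by_address = {}
    for address, period, price in sales:
        by_address.setdefault(address, []).append((period, price))
    rows, growths = [], []
    for pairs in by_address.values():
        pairs.sort()
        for (t0, p0), (t1, p1) in zip(pairs, pairs[1:]):
            dummies = np.zeros(n_periods)
            dummies[t1], dummies[t0] = 1.0, -1.0
            rows.append(dummies)
            growths.append(np.log(p1 / p0))
    log_index, *_ = np.linalg.lstsq(np.array(rows), np.array(growths), rcond=None)
    return np.exp(log_index - log_index[0])

sales = [("1 Acacia Ave", 0, 200_000), ("1 Acacia Ave", 2, 230_000),
         ("2 Bell Rd",    0, 150_000), ("2 Bell Rd",    1, 155_000),
         ("3 Cowan St",   1, 300_000), ("3 Cowan St",   3, 360_000)]
print(repeat_sales_index(sales, n_periods=4))   # roughly [1.0, 1.03, 1.15, 1.24]
</code></pre></div></div>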
<p>I do not attempt to subtract maintenance or improvement costs from returns, which is definitely a flaw in my methodology. Unfortunately I don’t have good data about how these costs vary between countries so it’s not clear how to do better here.</p>
<p>I measure all returns in local currency terms, subtracting the local short-term interest rate, to give what are usually called “excess returns”. This is the metric we should use if we imagine that we are foreign investors in the local property market and we hedge our foreign exchange risk by selling an <a href="https://en.wikipedia.org/wiki/Foreign_exchange_swap">FX swap</a> so that we don’t lose or gain money if exchange rates move.</p>
<p>For example, imagine that we are a UK based investor in Turkish property. If Turkish property appreciates 20% but the Turkish Lira falls 14% against the pound we’ve only made a 6% gain. On the face of it, we’ll be taking on a lot of foreign exchange risk: the Lira is <a href="https://uk.tradingview.com/symbols/TRYGBP/">volatile</a>, and we could easily lose 40% of our investment in a year just due to Lira depreciation. The smart thing to do is to enter into a GBP/TRY swap in the amount of our real estate investment: if TRY depreciates, we will gain money on the swap that offsets our real estate losses, i.e we are <em>hedged</em> with respect to currency movements.</p>
<p>The cost of an FX swap like this depends on the interest rate differential between the two countries: if GBP interest rates are 1% and Turkish rates are 12% then we’ll have to pay 11% annually to sell TRY via this swap, so our expected return on the Turkish real estate investment is equal to the expected return in local currency terms minus Turkish rates, plus GBP rates. Because GBP rates are the same regardless of which country we are investing in, in order to compare countries in which we might invest, we are most interested in predicting local currency returns minus local rates, i.e. excess returns.</p>
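<p>A tiny worked example of that logic, with made-up numbers (nothing here reflects actual Turkish market data):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>local_property_return = 0.20   # Turkish property appreciation, in lira terms
try_short_rate = 0.12          # Turkish short-term interest rate
gbp_short_rate = 0.01          # UK short-term interest rate

# Excess return is what the rest of this post tries to predict
excess_return = local_property_return - try_short_rate            # 8%

# An FX-hedged GBP investor earns the excess return plus their home rate,
# because the GBP/TRY swap costs roughly the rate differential
hedged_gbp_return = excess_return + gbp_short_rate                # 9%
</code></pre></div></div>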
<p>Other options would be to predict:</p>
<ul>
<li>Returns in local currency terms without adjustment for short-term interest rates: this is flawed because it overstates the returns available from investing in countries with very high inflation</li>
<li>Returns in dollar terms: this is the best variable to predict if you are going to make real estate investments in foreign countries without hedging the FX risk. However, the dollar FX rate introduces considerable noise into the returns which greatly weakens the power of our statistical tests.</li>
<li>Real returns in local currency terms. This avoids overstating the returns available in high-inflation countries, and does not introduce USD FX rate noise, but it’s not possible to guarantee that you will actually earn real returns, so the practical relevance of predicting these returns is unclear. Although there is clearly a positive association between high interest rates and high inflation, FX swaps cost an amount proportional to the interest rate and <strong>not</strong> proportional to the inflation rate.</li>
</ul>
<p>I do not include rental income in the returns that I’m trying to predict: there is usually little uncertainty about the yield you will earn, so there is no need to model it!</p>
<h1 id="population-growth">Population growth</h1>
<p>I obtained population data from the OECD and performed a Fama-MacBeth regression predicting excess returns from some of the fields.</p>
<p>This table summarizes the results: each column represents one model, the cells holding the average regression coefficient, the <a href="https://en.wikipedia.org/wiki/T-statistic"><code class="highlighter-rouge">t</code>-statistic</a> for the regression coefficient timeseries, and a number of stars summarizing the <a href="https://en.wikipedia.org/wiki/P-value"><code class="highlighter-rouge">p</code>-value</a> <sup id="fnref:3"><a href="#fn:3" class="footnote">2</a></sup> from <code class="highlighter-rouge">*</code> (<code class="highlighter-rouge">p < 0.05</code>) to <code class="highlighter-rouge">**</code> (<code class="highlighter-rouge">p < 0.01</code>) or <code class="highlighter-rouge">***</code> (<code class="highlighter-rouge">p < 0.005</code>).</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">population_15_to_64_growth</td>
<td style="text-align: left">0.00928 (0.026)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">2.48 (2.61)</td>
</tr>
<tr>
<td style="text-align: left">population_15_to_64_fraction</td>
<td style="text-align: left">-0.00364 (-0.09)</td>
<td style="text-align: left">0.00552 (0.145)</td>
<td style="text-align: left">-0.0248 (-0.582)</td>
</tr>
<tr>
<td style="text-align: left">population_total_growth</td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.232 (-0.515)</td>
<td style="text-align: left">-3.41* (-2.94)</td>
</tr>
</tbody>
</table>
<p>The only population model which shows a significant coefficient is one which includes both the growth rate in the total population and the 15-64 growth rate, and the coefficients on the two rates have opposite signs. This looks like a clearly spurious result caused by the collinearity of these two variables.</p>
<p>Interestingly, if you include rental income in the returns then there is a weak positive association with population growth. So (weirdly) it appears that population growth causes high rental yields, but not house price appreciation:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">population_15_to_64_growth</td>
<td style="text-align: left">1.79** (4.67)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">8.43* (4.23)</td>
</tr>
<tr>
<td style="text-align: left">population_15_to_64_fraction</td>
<td style="text-align: left">-0.0232 (-0.525)</td>
<td style="text-align: left">0.0256 (0.568)</td>
<td style="text-align: left">-0.194* (-3.17)</td>
</tr>
<tr>
<td style="text-align: left">population_total_growth</td>
<td style="text-align: left"> </td>
<td style="text-align: left">2.25* (3.89)</td>
<td style="text-align: left">-10.4 (-2.89)</td>
</tr>
</tbody>
</table>
<h1 id="economic-growth">Economic growth</h1>
<p>I take quarterly median wage growth from the OECD <a href="https://data.oecd.org/earnwage/average-wages.htm">wage database</a>, and real GDP and private consumption growth from the OECD <a href="https://stats.oecd.org/Index.aspx?DataSetCode=QNA">quarterly national accounts</a> data. I tested both quarterly data, and rolling means evaluated over the trailing year of data (to smooth out intra-year variation). Regression results are as follows:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
<th style="text-align: left">3</th>
<th style="text-align: left">4</th>
<th style="text-align: left">5</th>
<th style="text-align: left">6</th>
<th style="text-align: left">7</th>
<th style="text-align: left">8</th>
<th style="text-align: left">9</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">real_wage_growth</td>
<td style="text-align: left">0.508*** (4.06)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.0768 (0.705)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">local_real_personal_disposable_income_growth</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.227*** (3.62)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.154* (2.47)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">real_private_consumption_growth</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.396*** (6.7)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.26*** (3.89)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">real_gdp_growth</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.314*** (5.76)</td>
<td style="text-align: left">0.0945 (1.56)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">real_wage_growth_1y</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.116* (3.3)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.0372 (-1.15)</td>
</tr>
<tr>
<td style="text-align: left">local_real_personal_disposable_income_growth_1y</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.161*** (7.38)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.0221 (0.79)</td>
</tr>
<tr>
<td style="text-align: left">real_private_consumption_growth_1y</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.153*** (9.14)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.0967*** (3.79)</td>
</tr>
<tr>
<td style="text-align: left">real_gdp_growth_1y</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.151*** (6.56)</td>
<td style="text-align: left">0.0559 (2.27)</td>
</tr>
</tbody>
</table>
<p>Broadly, the pattern is that real per-capita private consumption growth has the strongest association with future house price appreciation, and the magnitude of this is quite large: a 1% increase in consumption predicts a 0.15% to 0.4% increase in house prices.</p>
<h1 id="momentum">Momentum</h1>
<p>In most markets, recent price changes tend to continue into the future. This is one of the most robust observations in quantitative finance, and it has been validated on data going back as far as <a href="https://www.aqr.com/Insights/Research/Journal-Article/A-Century-of-Evidence-on-Trend-Following-Investing">1880</a>. It seems that housing is no exception to this rule: the average rate of price growth over the last year is a strong predictor of future returns:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
<th style="text-align: left">3</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">momentum</td>
<td style="text-align: left">0.174*** (14.9)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.0195 (-0.592)</td>
</tr>
<tr>
<td style="text-align: left">excess_momentum</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.177*** (18.2)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.227*** (6.86)</td>
</tr>
<tr>
<td style="text-align: left">usd_momentum</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.0911*** (12)</td>
<td style="text-align: left">-0.0317 (-1.72)</td>
</tr>
</tbody>
</table>
<p>Here I have tested the rate of growth as computed from local returns (<code class="highlighter-rouge">momentum</code>), from excess returns, and from dollar returns. Each works, but the strongest predictor of future excess returns is the measure calculated from excess returns themselves. The model suggests that if prior period returns were 1%, then next period returns are expected to be 0.18%.</p>
<p>This is a fairly short-term effect. If you lag the momentum variable by 1 year it has no predictive power, and looking at the momentum effect from the prior 6 quarters separately (rather than taking the mean of the first 4 to form <code class="highlighter-rouge">excess_momentum</code>) we find that only the first 4 quarters get a significant positive coefficient:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">excess_momentum</td>
<td style="text-align: left">0.198*** (17.5)</td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">excess_momentum_lag_1y</td>
<td style="text-align: left">-0.0165 (-1.79)</td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">excess_momentum_lag0</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.385*** (8.25)</td>
</tr>
<tr>
<td style="text-align: left">excess_momentum_lag1</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.197** (2.54)</td>
</tr>
<tr>
<td style="text-align: left">excess_momentum_lag2</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.106* (1.87)</td>
</tr>
<tr>
<td style="text-align: left">excess_momentum_lag3</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.179*** (3.75)</td>
</tr>
<tr>
<td style="text-align: left">excess_momentum_lag4</td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.113* (-1.74)</td>
</tr>
<tr>
<td style="text-align: left">excess_momentum_lag5</td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.0071 (-0.116)</td>
</tr>
</tbody>
</table>
<h1 id="value-and-carry">Value and carry</h1>
<p>Momentum is the most famous quantitative factor, but “value” and “carry” are <a href="https://www.factorresearch.com/research-value-momentum-carry-across-asset-classes">almost as famous</a>. Value is the idea that assets that have a low market price relative to their “fundamental” value will tend to outperform in the future, and “carry” is roughly the idea that assets that are expected to yield a lot of cashflow relative to their market price will outperform.</p>
<p>In housing investment, the obvious “value” measure is to look at the ratio of incomes to house prices. If a country has incomes that are low relative to house prices, that housing market might be overvalued. “Carry” is simply the rental yield, and if housing is like other markets we expect countries with high rental yields to have house prices that appreciate in the future.</p>
<p>Rental yield and price-to-income are available in the <a href="https://data.oecd.org/price/housing-prices.htm">OECD housing database</a>, but only as indexes which are normalized such that all countries have value 100 in the index as of 2015. This frustrates cross-country comparison, so I convert from the index to an absolute level of rental yield and price-to-income by scaling the index such that the last observation matches the rental yield/price-to-income data from <a href="https://www.globalpropertyguide.com/">Global Property Guide</a> and <a href="https://www.numbeo.com/cost-of-living/">Numbeo</a>, two great resources for quantitative data about the current state of the housing market worldwide.</p>
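<p>The rescaling itself is simple; here is a sketch of the rental-yield case (the numbers are invented, and the same trick applies to price-to-income):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# OECD series: an index of the rental yield, normalized so that 2015 = 100
oecd_rent_yield_index = np.array([92.0, 96.0, 100.0, 104.0, 109.0])

# Absolute rental yield today, taken from Global Property Guide / Numbeo
latest_absolute_yield = 0.043

# Scale the whole index so its last observation matches the absolute figure
rental_yield = oecd_rent_yield_index * (latest_absolute_yield / oecd_rent_yield_index[-1])
</code></pre></div></div>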
<p>Regressing the value and carry factors together with momentum, we do see roughly the expected pattern, but it is very weak compared to momentum:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
<th style="text-align: left">3</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">momentum</td>
<td style="text-align: left">0.177*** (7.68)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">carry</td>
<td style="text-align: left">-0.0113 (-0.138)</td>
<td style="text-align: left">0.1 (1.15)</td>
<td style="text-align: left">0.0609 (0.602)</td>
<td style="text-align: left">0.159 (1.53)</td>
</tr>
<tr>
<td style="text-align: left">value</td>
<td style="text-align: left">0.00375** (3.63)</td>
<td style="text-align: left">0.00189 (1.56)</td>
<td style="text-align: left">0.00416** (3.35)</td>
<td style="text-align: left">0.00276 (2.24)</td>
</tr>
<tr>
<td style="text-align: left">excess_momentum</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.174*** (13)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">usd_momentum</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.0834*** (6.01)</td>
<td style="text-align: left"> </td>
</tr>
</tbody>
</table>
<p>The “value” effect is not always statistically significant, but the coefficient always has the right sign. The “carry” effect is never significant but has the right sign in the most important specification, where momentum is measured via <code class="highlighter-rouge">excess_momentum</code>.</p>
<h1 id="interest-rates">Interest rates</h1>
<p>I take long and short term interest rates from the OECD <a href="https://stats.oecd.org/Index.aspx?DataSetCode=MEI_FIN">MEI database</a>. I form the “ranked” version of each of these by sorting countries on the interest rate variable and then assigning countries a real number in the range -1 to 1 which is a linear function of their index after sorting. Interest rates exhibit very strong dispersion across countries, and ranking helps avoid the regression results being driven purely by one or two outlier countries.</p>
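<p>The ranking transformation is straightforward; here is a minimal sketch (my own implementation of the description above, not the exact code used for the tables):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def rank_to_unit_range(values):
    """Map each value to -1..1 linearly in its sorted position (lowest = -1, highest = 1)."""
    order = np.argsort(np.argsort(values))   # 0-based rank of each entry
    return 2.0 * order / (len(values) - 1) - 1.0

short_term_rates = np.array([0.5, 12.0, 1.75, 3.0])   # illustrative, in percent
print(rank_to_unit_range(short_term_rates))           # roughly [-1, 1, -0.33, 0.33]
</code></pre></div></div>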
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
<th style="text-align: left">3</th>
<th style="text-align: left">4</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">short_term_rates</td>
<td style="text-align: left">-0.93*** (-5.65)</td>
<td style="text-align: left">-0.593*** (-6.36)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">long_term_rates</td>
<td style="text-align: left">0.397 (1.84)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.625*** (-5.05)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_short_term_rates</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.00836*** (-5.87)</td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_long_term_rates</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.00114 (0.811)</td>
<td style="text-align: left">-0.00595*** (-6.78)</td>
</tr>
</tbody>
</table>
<p>As you would expect, high short term rates are associated with lower excess returns in the future: a 1% higher short term rate means 0.6% lower price appreciation. This is partly a mechanical effect, since excess returns are defined as local price returns minus short term rates, but in fact this pattern remains even if we predict local price returns instead:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
<th style="text-align: left">3</th>
<th style="text-align: left">4</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">short_term_rates</td>
<td style="text-align: left">-0.404 (-2.67)</td>
<td style="text-align: left">-0.326*** (-3.55)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">long_term_rates</td>
<td style="text-align: left">0.155 (0.821)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.527* (-2.32)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_short_term_rates</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.00284 (-2.04)</td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_long_term_rates</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.000378 (-0.284)</td>
<td style="text-align: left">-0.00316*** (-3.44)</td>
</tr>
</tbody>
</table>
<h1 id="household-indebtedness">Household indebtedness</h1>
<p>In this set of tests, I take indebtedness measures from the OECD annual <a href="https://stats.oecd.org/Index.aspx?DataSetCode=NAAG">national accounts</a> data. I don’t find an association between any of these features and future returns:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
<th style="text-align: left">3</th>
<th style="text-align: left">4</th>
<th style="text-align: left">5</th>
<th style="text-align: left">6</th>
<th style="text-align: left">7</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">household_saving_fraction</td>
<td style="text-align: left">0.022 (0.792)</td>
<td style="text-align: left">-0.000814 (-0.0259)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">household_debt_fraction</td>
<td style="text-align: left">0.000586 (0.463)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.000552 (0.428)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">net_lending_pct_gdp</td>
<td style="text-align: left">-0.0368 (-1.3)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.00645 (0.188)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_household_saving_fraction</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.000316 (-0.139)</td>
<td style="text-align: left">-0.000481 (-0.455)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_household_debt_fraction</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">4.15e-05 (0.0258)</td>
<td style="text-align: left"> </td>
<td style="text-align: left">-3.97e-05 (-0.0254)</td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_net_lending_pct_gdp</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.000689 (-0.419)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.00077 (-0.839)</td>
</tr>
</tbody>
</table>
<h1 id="rental-growth">Rental growth</h1>
<p>I take measures of rental growth as used to compute <a href="https://stats.oecd.org/Index.aspx?DataSetCode=PRICES_CPI">OECD consumer price indexes</a> and regress returns on them. If anything it seems like rental growth is negatively related to future price returns, which is the opposite of what I would expect. The relationship is weak, however, with only one model showing a weakly significant result:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
<th style="text-align: left">1</th>
<th style="text-align: left">2</th>
<th style="text-align: left">3</th>
<th style="text-align: left">4</th>
<th style="text-align: left">5</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">cpi_actual_rentals</td>
<td style="text-align: left">-0.239* (-1.46)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">cpi_actual_rentals_1y</td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.039 (-1.31)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_cpi_actual_rentals_1y</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.00182 (-1.16)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">cpi_imputed_rentals</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">1.15 (1.26)</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">cpi_imputed_rentals_1y</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">0.00429 (0.107)</td>
<td style="text-align: left"> </td>
</tr>
<tr>
<td style="text-align: left">rank_cpi_imputed_rentals_1y</td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: left">-0.000391 (-0.3)</td>
</tr>
</tbody>
</table>
<h1 id="composite-model">Composite model</h1>
<p>Taking the most robust predictors from each of the models above, we can form a model that simultaneously takes account of all these factors:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">excess_momentum</td>
<td style="text-align: left">0.146*** (14.4)</td>
</tr>
<tr>
<td style="text-align: left">carry</td>
<td style="text-align: left">0.314*** (3.74)</td>
</tr>
<tr>
<td style="text-align: left">value</td>
<td style="text-align: left">0.000202 (0.199)</td>
</tr>
<tr>
<td style="text-align: left">real_private_consumption_growth_1y</td>
<td style="text-align: left">0.0858*** (3)</td>
</tr>
<tr>
<td style="text-align: left">short_term_rates</td>
<td style="text-align: left">-0.496*** (-6.43)</td>
</tr>
</tbody>
</table>
<p>This model has an average R-squared of 58%, meaning it explains well over half of the variation in the international cross-section of house price returns.</p>
<p>If we try to predict returns inclusive of rental income, the carry factor obviously becomes much more important but the others remain mostly unaffected:</p>
<table>
<thead>
<tr>
<th style="text-align: left"> </th>
<th style="text-align: left">0</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">excess_momentum</td>
<td style="text-align: left">0.14*** (13.9)</td>
</tr>
<tr>
<td style="text-align: left">carry</td>
<td style="text-align: left">1.29*** (15.6)</td>
</tr>
<tr>
<td style="text-align: left">value</td>
<td style="text-align: left">0.00028 (0.279)</td>
</tr>
<tr>
<td style="text-align: left">real_private_consumption_growth_1y</td>
<td style="text-align: left">0.0843*** (3.02)</td>
</tr>
<tr>
<td style="text-align: left">short_term_rates</td>
<td style="text-align: left">-0.5*** (-6.59)</td>
</tr>
</tbody>
</table>
<p>Which countries are the best investments as of the time of writing this post? From best (rank 0) to worst (rank 33):</p>
<table>
<thead>
<tr>
<th style="text-align: right"> </th>
<th style="text-align: left">ex rental yield</th>
<th style="text-align: left">inc rental yield</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">0</td>
<td style="text-align: left">LVA</td>
<td style="text-align: left">LVA</td>
</tr>
<tr>
<td style="text-align: right">1</td>
<td style="text-align: left">PRT</td>
<td style="text-align: left">PRT</td>
</tr>
<tr>
<td style="text-align: right">2</td>
<td style="text-align: left">SVK</td>
<td style="text-align: left">SVK</td>
</tr>
<tr>
<td style="text-align: right">3</td>
<td style="text-align: left">HUN</td>
<td style="text-align: left">HUN</td>
</tr>
<tr>
<td style="text-align: right">4</td>
<td style="text-align: left">SVN</td>
<td style="text-align: left">SVN</td>
</tr>
<tr>
<td style="text-align: right">5</td>
<td style="text-align: left">GRC</td>
<td style="text-align: left">LTU</td>
</tr>
<tr>
<td style="text-align: right">6</td>
<td style="text-align: left">LTU</td>
<td style="text-align: left">GRC</td>
</tr>
<tr>
<td style="text-align: right">7</td>
<td style="text-align: left">EST</td>
<td style="text-align: left">POL</td>
</tr>
<tr>
<td style="text-align: right">8</td>
<td style="text-align: left">POL</td>
<td style="text-align: left">IRL</td>
</tr>
<tr>
<td style="text-align: right">9</td>
<td style="text-align: left">LUX</td>
<td style="text-align: left">EST</td>
</tr>
<tr>
<td style="text-align: right">10</td>
<td style="text-align: left">AUT</td>
<td style="text-align: left">LUX</td>
</tr>
<tr>
<td style="text-align: right">11</td>
<td style="text-align: left">NLD</td>
<td style="text-align: left">NLD</td>
</tr>
<tr>
<td style="text-align: right">12</td>
<td style="text-align: left">CZE</td>
<td style="text-align: left">BEL</td>
</tr>
<tr>
<td style="text-align: right">13</td>
<td style="text-align: left">BEL</td>
<td style="text-align: left">AUT</td>
</tr>
<tr>
<td style="text-align: right">14</td>
<td style="text-align: left">IRL</td>
<td style="text-align: left">DNK</td>
</tr>
<tr>
<td style="text-align: right">15</td>
<td style="text-align: left">DNK</td>
<td style="text-align: left">CZE</td>
</tr>
<tr>
<td style="text-align: right">16</td>
<td style="text-align: left">DEU</td>
<td style="text-align: left">ESP</td>
</tr>
<tr>
<td style="text-align: right">17</td>
<td style="text-align: left">SWE</td>
<td style="text-align: left">USA</td>
</tr>
<tr>
<td style="text-align: right">18</td>
<td style="text-align: left">ESP</td>
<td style="text-align: left">DEU</td>
</tr>
<tr>
<td style="text-align: right">19</td>
<td style="text-align: left">FRA</td>
<td style="text-align: left">SWE</td>
</tr>
<tr>
<td style="text-align: right">20</td>
<td style="text-align: left">FIN</td>
<td style="text-align: left">FIN</td>
</tr>
<tr>
<td style="text-align: right">21</td>
<td style="text-align: left">USA</td>
<td style="text-align: left">ITA</td>
</tr>
<tr>
<td style="text-align: right">22</td>
<td style="text-align: left">NOR</td>
<td style="text-align: left">NOR</td>
</tr>
<tr>
<td style="text-align: right">23</td>
<td style="text-align: left">CHE</td>
<td style="text-align: left">CHL</td>
</tr>
<tr>
<td style="text-align: right">24</td>
<td style="text-align: left">ITA</td>
<td style="text-align: left">FRA</td>
</tr>
<tr>
<td style="text-align: right">25</td>
<td style="text-align: left">CHL</td>
<td style="text-align: left">CHE</td>
</tr>
<tr>
<td style="text-align: right">26</td>
<td style="text-align: left">JPN</td>
<td style="text-align: left">CAN</td>
</tr>
<tr>
<td style="text-align: right">27</td>
<td style="text-align: left">GBR</td>
<td style="text-align: left">NZL</td>
</tr>
<tr>
<td style="text-align: right">28</td>
<td style="text-align: left">NZL</td>
<td style="text-align: left">GBR</td>
</tr>
<tr>
<td style="text-align: right">29</td>
<td style="text-align: left">CAN</td>
<td style="text-align: left">JPN</td>
</tr>
<tr>
<td style="text-align: right">30</td>
<td style="text-align: left">KOR</td>
<td style="text-align: left">ZAF</td>
</tr>
<tr>
<td style="text-align: right">31</td>
<td style="text-align: left">AUS</td>
<td style="text-align: left">AUS</td>
</tr>
<tr>
<td style="text-align: right">32</td>
<td style="text-align: left">ZAF</td>
<td style="text-align: left">RUS</td>
</tr>
<tr>
<td style="text-align: right">33</td>
<td style="text-align: left">RUS</td>
<td style="text-align: left">KOR</td>
</tr>
</tbody>
</table>
<h1 id="trading-strategy">Trading strategy</h1>
<p>If we could short housing, one way to exploit this model would be to go long or short housing in proportion to the expected returns estimate from our best model as of <code class="highlighter-rouge">n</code> periods ago. If we did this, our returns would look like this, where <code class="highlighter-rouge">n</code> ranges from 0 to 8:</p>
<p><img src="/2020/03/strategy_long_short.png" alt="" /></p>
<p>The number in brackets in the legend of this chart is the <a href="https://en.wikipedia.org/wiki/Sharpe_ratio">Sharpe ratio</a>, i.e. the ratio of the strategy’s mean return to the standard deviation of its returns, computed within each year. The Sharpe ratio is the best single measure of how good a trading strategy is, and our best result here of 2.76 is very respectable: as a point of comparison, the US stock market has been doing extremely well over the last few years, yet it has realized a Sharpe of only 1 or so.</p>
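<p>To make the mechanics concrete, here is a minimal sketch (in Python/pandas) of how this kind of long-short backtest and its Sharpe ratio could be computed. The names <code class="highlighter-rouge">expected</code> and <code class="highlighter-rouge">realized</code> (period-by-country DataFrames of forecast and realized price returns) are illustrative, not the actual code behind these charts:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def long_short_backtest(expected, realized, lag=0, periods_per_year=4):
    """Hold each country long/short in proportion to the expected-return estimate
    from `lag` periods ago, scaled to unit gross exposure each period."""
    signal = expected.shift(lag)
    weights = signal.div(signal.abs().sum(axis=1), axis=0)
    strategy = (weights * realized).sum(axis=1)              # per-period portfolio return
    sharpe = np.sqrt(periods_per_year) * strategy.mean() / strategy.std()
    return strategy, sharpe
</code></pre></div></div>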
<p>Unfortunately, we can’t short the housing market, so a more realistic strategy is to hold all countries in proportion to their expected return, avoiding any with negative expected returns:</p>
<p><img src="/2020/03/strategy_long_only.png" alt="" /></p>
<p>(Note that from 1991 to 1994 this strategy is entirely invested in cash because all countries are expected to underperform.)</p>
<p>Another idea is to hold just the top <code class="highlighter-rouge">n</code> countries, where <code class="highlighter-rouge">n</code> ranges from 1 to 5:</p>
<p><img src="/2020/03/strategy_topn.png" alt="" /></p>
<p>Note that all of these strategies completely ignore the rental income portion of the return. This makes a big difference. If we set the expected rental return in the next period equal to the return over the previous period and then buy housing in the top <code class="highlighter-rouge">n</code> countries according to this metric, then our total return (including rental income) is much more strongly positive, with a maximum Sharpe of 2.86:</p>
<p><img src="/2020/03/strategy_topn_yieldful.png" alt="" /></p>
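<p>A minimal sketch of the top-<code class="highlighter-rouge">n</code> selection including the rental yield forecast (again with illustrative names rather than the real backtest code):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def top_n_total_return(expected_price_ret, rental_yield, realized_total_ret, n=3):
    """Each period, buy (equal-weighted) the n countries with the highest forecast
    total return: the model's expected price return plus last period's rental yield."""
    forecast = expected_price_ret + rental_yield.shift(1)
    in_top_n = (forecast.rank(axis=1, ascending=False) <= n).astype(float)
    weights = in_top_n.div(in_top_n.sum(axis=1), axis=0)
    return (weights * realized_total_ret).sum(axis=1)
</code></pre></div></div>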
<p>Strikingly, you can get results almost as good by just purchasing housing in the countries exhibiting the highest current rental yield, which is a strategy commonly employed by housing speculators in the UK:</p>
<p><img src="/2020/03/strategy_topn_yield_yieldful.png" alt="" /></p>
<h1 id="conclusion">Conclusion</h1>
<p>House prices are strongly predictable (R-squared 58%) and mostly driven by momentum, rental yields, per-capita consumption growth, and short-term interest rates. Trading strategies based on buying just those countries with the highest predicted price appreciation appear to realize very high, economically significant, Sharpe ratios, but it’s unclear how implementable these are given high transaction costs and the long-only nature of the housing market. The impact of government regulations on housing ownership, and of taxes, is also uncertain, but these would likely drag down returns somewhat. The dubious nature of house price index data, which do not fully account for the costs of maintenance and refurbishment, should also give us pause.</p>
<h1 id="footnotes">Footnotes</h1>
<div class="footnotes">
<ol>
<li id="fn:2">
<p>It’s important to note that the model-implied forward returns in this test have <strong>not</strong> been generated by a model fit on the returns that we are trying to predict. Instead, we do a “rolling out-of-sample backtest” where the model-implied expected return for the following quarter, <code class="highlighter-rouge">q+1</code>, is generated from the following (there is a code sketch after the list):</p>
<ul>
<li>The current macroeconomic factors for <code class="highlighter-rouge">q</code>, and</li>
<li>The mean regression coefficient seen over all regressions up to the most recent one that could have been known at the time: i.e. the regression which predicts house returns in period <code class="highlighter-rouge">q</code> from macroeconomic factors known in <code class="highlighter-rouge">q-1</code></li>
</ul>
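<p>In code, the procedure looks roughly like this sketch, where <code class="highlighter-rouge">fit_regression</code>, <code class="highlighter-rouge">factors</code> and <code class="highlighter-rouge">returns</code> are assumed names rather than the actual backtest code:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def expected_return(q, factors, returns, fit_regression):
    """Forecast the return in quarter q+1 using only information available at q.
    `factors[t]` are the macro factors known at t, `returns[t]` is the house price
    return over period t, and `fit_regression` returns one coefficient vector."""
    past_coefs = [fit_regression(y=returns[t], x=factors[t - 1])
                  for t in range(1, q + 1)]
    beta = np.mean(past_coefs, axis=0)   # mean coefficient across all usable regressions
    return factors[q] @ beta
</code></pre></div></div>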
<p><a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>My <code class="highlighter-rouge">p</code>-values are not derived from the Student’s <code class="highlighter-rouge">t</code>-distribution, but rather from a Newey-West test. This is necessary because the left-hand side of our regression (house price returns) is autocorrelated. <a href="#fnref:3" class="reversefootnote">↩</a></p>
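<p>For anyone wanting to reproduce this kind of test: <code class="highlighter-rouge">statsmodels</code> exposes HAC (Newey-West) standard errors via the <code class="highlighter-rouge">cov_type</code> argument of <code class="highlighter-rouge">fit</code>. A minimal sketch, where the lag length and variable names are purely illustrative:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import statsmodels.api as sm

def newey_west_ols(y, X, lags=4):
    """OLS point estimates with Newey-West (HAC) standard errors, which stay valid
    when the regression residuals are autocorrelated and heteroskedastic."""
    return sm.OLS(y, sm.add_constant(X)).fit(cov_type="HAC", cov_kwds={"maxlags": lags})

# e.g. newey_west_ols(house_returns, macro_factors).pvalues  (names are illustrative)
</code></pre></div></div>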
</li>
</ol>
</div>

<h1>Is Bitcoin a junk asset?</h1>
<p>2019-04-11T19:01:00+01:00 http://blog.omega-prime.co.uk/2019/04/11/bitcoin-junk</p>
<p>One reason to believe that Bitcoin is a poor investment is because it is a rather volatile asset. As a general rule, “lottery like” assets with high variance in their valuations are known to underperform low-risk equivalents:</p>
<ul>
<li>You are not <a href="https://smile.amazon.co.uk/Missing-Risk-Premium-Volatility-Investing/dp/1470110970?sa-no-redirect=1">rewarded</a> in the stock market for holding volatile equities which are heavily exposed to systematic risk. In fact, such equities have returns that are <a href="https://onlinelibrary.wiley.com/doi/full/10.1111/j.1540-6261.2006.00836.x">lower</a>, or at least <a href="https://febs.onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-6261.1992.tb04398.x">no higher</a> than their low-risk counterparts.</li>
<li>In the betting markets, investing in a diverse portfolio of risky long shot bets
has strong <a href="https://www.tandfonline.com/doi/abs/10.1080/00036840410001674240">negative returns</a></li>
<li>Out-of-the-money options have <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1340767">lower</a> returns than you would expect from theory</li>
</ul>
<p>All these effects are violations of academic expectations about investor rationality & efficient markets. It is believed that they come about either due to irrational investor demand for lottery-like payoffs, or as a rational response to leverage constraints.</p>
<p>One way that academics measure these effects is to construct “factor returns” that are the weighted average of the returns of other assets. For example, you might construct the factor returns to “quality” by working out what you would earn if you took a long position in the stock of companies with low year-to-year variation in earnings, low debt, and strong earnings growth, while shorting debt-ridden shrinking companies with no earnings consistency. By averaging the returns of many stocks you ensure that the factor return is unrelated to the return of any one stock, and it just captures the part of the return that can be explained by “quality”. Of interest to us, the quantitative fund manager <a href="https://www.aqr.com">AQR</a> publishes two factor return timeseries that might plausibly be related to the returns on lottery-like assets:</p>
<ul>
<li><a href="https://www.aqr.com/Insights/Datasets/Betting-Against-Beta-Equity-Factors-Daily">Betting against beta</a>: measures the returns to buying stocks which are poorly correlated to the stock market as a whole, while short-selling stocks which strongly correlate with the market.</li>
<li><a href="https://www.aqr.com/Insights/Datasets/Quality-Minus-Junk-Factors-Daily">Quality minus junk</a>: measures the returns to buying “quality” stocks which are profitable, growing, and low-risk, while selling their “junk” counterparts.</li>
</ul>
<p>Each of these hypothetical investment strategies shorts assets with lottery-like characteristics and consequently has strong (before-cost) returns:
<img src="/2019/04/factor-returns.png" alt="" /></p>
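<p>For concreteness, the basic long-short construction described above can be sketched as follows. This is a toy version only: AQR’s published factors involve size sorts, careful weighting and other refinements that I ignore here, and in practice the scores should be lagged to avoid look-ahead bias.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

def long_short_factor(stock_returns, scores, quantile=0.3):
    """Each period: equal-weight long the stocks in the top `quantile` by score
    (e.g. quality) and short those in the bottom `quantile`."""
    factor = {}
    for date in stock_returns.index:
        s = scores.loc[date]
        longs = stock_returns.loc[date, s >= s.quantile(1 - quantile)]
        shorts = stock_returns.loc[date, s <= s.quantile(quantile)]
        factor[date] = longs.mean() - shorts.mean()
    return pd.Series(factor)
</code></pre></div></div>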
<p>So, is Bitcoin a junky lottery ticket with poor prospects for the future? One way to test this is to check whether the price co-moves with other junky assets. I correlated Bitcoin returns from 2013 to date with those on various other assets:</p>
<ul>
<li>The global QMJ and BAB factors, plus two regional equivalents that are constructed with just stocks in the USA and Japan respectively</li>
<li>Traditional equity alternatives such as gold and high yield/investment grade bonds</li>
<li>Various equity indexes, with a particular focus on Asian indexes due to the popularity of cryptocurrency investment in those countries</li>
<li>Other crypto-currencies</li>
</ul>
<p><img src="/2019/04/return-corr-2013.png" alt="" /></p>
<p>This suggests that cryptocurrencies are remarkably uncorrelated with almost every other asset. In particular, there is no notable correlation with junky assets or BAB/QMJ. This is good news for those worried that Bitcoin is valueless because it is a lottery, but by the same token, it is bad news for those who think Bitcoin is valuable because it is “digital gold”: there is no correlation with high-quality assets such as the precious metal or investment grade bonds.</p>
<p>The correlation between Bitcoin and DOGE is particularly interesting. Dogecoin is a junk asset because no-one believes it is valuable, <a href="https://www.smh.com.au/technology/it-was-a-piss-take-the-aussie-behind-joke-cryptocurrency-dogecoin-and-how-it-reached-2-4b-20180125-p4yyvf.html">not even the creators</a>, so the fact that Bitcoin correlates with it could suggest that Bitcoin also has junk characteristics.</p>
<p>The picture does not change much if we restrict just to correlations during the 2018-2019 crypto bear market:</p>
<p><img src="/2019/04/return-corr-2018.png" alt="" /></p>
<p>In fact, looking at the rolling correlation between Bitcoin and some of these other assets, there is little evidence of change in the correlation structure over time (with the notable exception of Ethereum):</p>
<p><img src="/2019/04/return-rank-corr-ts.png" alt="" /></p>
<p>One possible concern with these correlative results is that the cryptocurrency markets could lead/lag traditional asset markets due to e.g. being less efficient. If this were true then my correlation estimates above would be artificially low: to correct for this, I looked at an alternative correlation metric that is robust to lagging. Specifically, to measure the correlation between assets A and B, I regress the (ranked) timeseries of A returns on the returns of B, plus 1-day lagged/leading versions of B returns. The square-root of the coefficient of determination for this linear model is our lag-robust correlation metric. For ease of interpretation, I choose the sign of the square root to match that of the Pearson correlation between A and B returns.</p>
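<p>A sketch of this lag-robust metric (illustrative only; it glosses over details such as whether the right-hand-side returns are also ranked):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
import pandas as pd
import statsmodels.api as sm

def lag_robust_corr(a, b):
    """Regress the ranked returns of A on the contemporaneous, 1-day lagged and
    1-day leading returns of B; return sqrt(R^2), signed to match the Pearson
    correlation between A and B."""
    df = pd.DataFrame({"a": a, "b": b, "b_lag": b.shift(1), "b_lead": b.shift(-1)})
    df = df.dropna().rank()
    fit = sm.OLS(df["a"], sm.add_constant(df[["b", "b_lag", "b_lead"]])).fit()
    return np.sign(a.corr(b)) * np.sqrt(fit.rsquared)
</code></pre></div></div>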
<p>The results of this analysis are rather interesting: Bitcoin exhibits a small positive loading on US QMJ, which is not shared by the other two cryptos. Although all three cryptocurrencies have a negative loading on Japanese QMJ, Dogecoin and Ethereum get substantially more negative coefficients. (The QMJ pattern repeats in the BAB equivalents.) We also see Ethereum and Bitcoin pick up a positive correlation with the gold price, and this correlation is not shared by Dogecoin.</p>
<p><img src="/2019/04/return-rank-lacorr-2018.png" alt="" /></p>
<h2 id="conclusion">Conclusion</h2>
<p>Overall it seems that there is hope here for those who are bullish on Bitcoin because they believe it is a new safe-haven asset. This is an argument that I am sympathetic to:</p>
<ul>
<li>Bitcoin has several structural advantages over competitor stores-of-value such as gold:
<ul>
<li>It is cheaper to store: forget security guards and fortresses, you just need to secure a short secret</li>
<li>It is faster and cheaper to transmit than gold</li>
<li>Your transactions <a href="https://edition.cnn.com/2019/01/26/uk/venezuela-maduro-bank-of-england-gold-withdrawal-gbr-intl/index.html">cannot be blocked by foreign governments</a></li>
</ul>
</li>
<li>The fact that Bitcoin has no “intrinsic value” is irrelevant. The same is basically true for gold: a currency has value only because other people agree that it does, and this is sufficient to support the valuation even in the absence of other uses of the currency. Money is the bubble that <a href="https://medium.com/sunrise-over-the-merkle-trees/peter-thiel-on-crypto-investing-ada9112c9148">never has to pop</a>.</li>
</ul>
<p>Does crypto have value other than as a store of value? Personally, I have a hard time seeing a strong case here. Focusing on store-of-value applications, the game theory suggests that – in just the same way that gold outcompeted silver as a global store of value – a single cryptocurrency should take ~100% market share. It also seems very likely that if any cryptoasset reaches this point, it will be the coin with the best “brand name” i.e. Bitcoin. I therefore think that Bitcoin is the only worthwhile cryptocurrency investment, and expect that the returns to holding non-Bitcoin cryptos should be zero or negative on average.</p>
<p>Is this thesis strong enough to justify a big position in Bitcoin? While the coin has conceptual potential as a store of value, and this potential is somewhat supported by the correlative evidence, the Dogecoin correlations are clear evidence that most of what happens to Bitcoin’s value is still driven by idiosyncratic demand/risk factors that have nothing to do with my thesis and instead reflect the fortunes of cryptocurrency as a whole. Because I’m not optimistic about the prospects for the asset class, and Bitcoin’s price is mostly being driven by systematic risk to crypto, I have cut my Bitcoin position significantly.</p>

<h1>Do children make you happy?</h1>
<p>2019-03-25T19:17:00+00:00 http://blog.omega-prime.co.uk/2019/03/25/children-and-happiness</p>
<p>Do children make you happy? This is a question that anyone who is considering starting a family should be asking themselves. Among my peers there is a widespread assumption that children are a key ingredient in a happy middle age, but it’s never been clear to me that this is so.</p>
<p>I’m at a time in my life where this has become a useful question to answer, so I turned to the same place I look when I want to know the actual truth about any issue of substance: <a href="https://scholar.google.co.uk">Google Scholar</a>.</p>
<p>In the pro column, you might expect children to make you happier because:</p>
<ul>
<li>You get to watch your kid grow up, which can be quite rewarding if they become a good and successful person</li>
<li>Having children fulfils a biological imperative: you’d expect evolution to have set up your biology such that procreating <em>was</em> a happy experience, to encourage it to happen</li>
<li>Working on a shared project can help you bond with your partner</li>
<li>Having children is one of the few experiences that you are likely to share with most of the rest of the population, and shared experiences are useful means of making friends. Just think about how much time groups of young people spend talking about school with each other, or how often groups of older people are found to be discussing their children.</li>
<li>There is social pressure to have children. The pressure is exerted both by your parents, and by those of your friends who have chosen the route of childbirth. Bowing to this pressure should defuse this one potential source of stress!</li>
</ul>
<p>These are all compelling reasons why procreating might improve happiness, but you can also make the bear case for childbirth:</p>
<ul>
<li>Having children is financially costly: developed-world estimates suggest a cost of raising a child to age 21 of <a href="https://en.wikipedia.org/wiki/Cost_of_raising_a_child">about £200k</a>. Not only does this represent a direct hit to your happiness inasmuch as this money could be spent on alternative sources of joy (yachting, world travel, starting your own business), it also increases the chance that you’ll become financially stressed, which is a very clear source of unhappiness that is easily strong enough to e.g. break up a relationship.</li>
<li>It is also very costly in terms of your time: married women with children do 4 hours <a href="https://www.jstor.org/stable/2117814">more housework</a> than the childless on average (men do 1 hour more). This is certainly a lower bound on total time costs: you’ll also spend a bunch of time transporting them, shopping for them, researching schools etc.</li>
<li>It restricts the kind of choices you can make in your life. Want to go on holiday during the school term to avoid the rush? No-can-do. Work overseas for a few years? Probably highly disruptive to your child’s education. Impulsive visit to a fancy restaurant? Probably not, unless the babysitter has last-minute availability.</li>
<li>Having children creates risks to your happiness that would not otherwise exist. If your child is injured, becomes a drug addict, is bullied at school, is taken away from you in a divorce etc this is likely to be bad for your mental health.</li>
</ul>
<p>What does the evidence say? It would be misleading to simply compare the reported happiness of people with and without children, because it could be that people without children are naturally more unhappy for unrelated reasons – for example, because they tend to be poorer, and are less likely to be in long-term relationships. Or, it could be that those who have children are unusually happy for unrelated reasons: e.g. they may be recently married and in an unusually financially stable position (which is what precipitated the decision to give birth in the first place). To address this, there are essentially two approaches:</p>
<ol>
<li>Compare how happiness changes over time for those who have children. If there is a significant benefit to childbirth, you should see a significant and persistent increase in happiness in the year of birth. This is known as a longitudinal or event-study approach.</li>
<li>Compare the childless to those with children, but try to statistically control for those sources of variation in happiness that are unrelated to having children: i.e. a cross-sectional approach.</li>
</ol>
<p>I think the most compelling evidence comes from the event studies. Comparing self-reported happiness with a previous report from the same person is a very robust control for unmeasurable individual effects, such as a genetic predisposition to be happy.</p>
<p>The most careful <a href="https://www.demogr.mpg.de/papers/working/wp-2012-013.pdf">event</a> <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2167277">studies</a> I could find in the literature actually couple the longitudinal approach with additional controls for other variables that might explain happiness. This analysis suggests that the happiness change around childbirth <em>is</em> positive, but that these gains disappear entirely within a year:
<img src="/2019/03/happiness-before-after-study.png" alt="" /></p>
<p>This effect is fairly large: on their scale, a 0.5 point decrease is comparable to the negative effects of losing your job, so a 0.5 point increase around childbirth is certainly nice to have!</p>
<p>Another <a href="https://www.jstor.org/stable/3401473">study</a> investigates the effect of having children cross-sectionally, comparing self-reported happiness in pairs of monozygotic twins. The use of twins provides a strong control for unobservable genetic variation, which the study authors couple with controls for a number of other factors. They find a striking pattern where being in a partnership is strongly associated with life satisfaction, but having children does not have a clear additive effect: women’s happiness increases, but men’s does not. This chart shows the additive effect on a scale with 0 = not particularly satisfied, 2 = very satisfied:
<img src="/2019/03/twin-study.png" alt="" /></p>
<p>This twin study does not look at the number of years since childbirth, so it’s possible that the improvement in female happiness occurs only in the year of birth, which would make these results match the time-series evidence from the event studies. Consistent with this idea, the study finds that having children does not significantly affect the subjective well-being of either male or female twins aged 50–70 years.</p>
<p>Another interesting way to test the childbirth-happiness effect is via a natural experiment. Couples that require IVF to give birth will have the IVF treatment fail or succeed for reasons that are uncorrelated with non-parenthood-related determinants of later happiness. One <a href="https://www.aeaweb.org/articles?id=10.1257/aer.20141467">Danish study</a> exploited this to show that there was no significant difference in depression or divorce rate in the 10 years following IVF treatment between couples who do and do not have children as a result of that treatment. In fact, the data suggest that children perhaps cause <em>increased</em> depression in every year after the birth year:
<img src="/2019/03/life-outcomes-after-ivf-failure.png" alt="" /></p>
<p>All in all, the evidence seems clear that children do not increase your happiness more than very temporarily. What’s more, none of this addresses the concern that the people who do choose to become parents may be exactly those who will benefit from it. For example, if you have always had an image of yourself as a mother, then fulfilling that goal will make you happy (albeit temporarily). However, if you have always been indifferent to the idea, then you may find that even this temporary boost in happiness fails to materialise. I think this argues that if you are in enough doubt about whether to have children that you are reading this blog post, you should expect improved happiness for even less than the 1 year that the evidence suggests is the average benefit.</p>
<h2 id="later-life">Later life</h2>
<p>These studies only consider your happiness over the first 18 years of a child’s life, when they are likely to be living with you. Perhaps substantial gains to having children accrue much later in life, when you are free to enjoy their company without experiencing many of the negatives? There is less evidence on this due to the very long horizons involved, but looking at the simple average happiness by age for both the childless and parents in German data, it seems that there is no particular difference:
<img src="/2019/03/happiness-later-life.png" alt="" /></p>
<p>There are several other reasons to believe that improved happiness will not manifest later in life. Studies suggest that parents’ reported well-being actually <a href="https://www.jstor.org/stable/2095629?seq=1#page_scan_tab_contents">improves</a> once adult children leave the home, and that all types of parent, including “empty nest” parents, are at <strong>least</strong> as <a href="https://www.ncbi.nlm.nih.gov/pubmed/16433280">depressed</a> as equivalent non-parents. Childless elders are just as <a href="https://journals.sagepub.com/doi/abs/10.1177/0192513X07303895">embedded into their communities</a> as parents, rarely express regrets about not having had children, and frequency of <a href="https://journals.sagepub.com/doi/abs/10.1177/016402757913004">interaction</a> with adult children has no bearing on adult happiness. In short, there seems to be no evidence that children contribute to happiness in old age either.</p>
<!-- Also small-N survey evidence supports this: https://link.springer.com/article/10.1007/BF00986583 -->
<h2 id="conclusion">Conclusion</h2>
<p>Going into my literature review, I didn’t have a strong view on whether children improved happiness, though if anything I expected to find a positive effect, consistent with “folk wisdom”. That is not at all what I found, and I was rather surprised by the strength of the evidence against the idea. There may be many fine reasons to have kids, but the selfish hope for a boost in happiness is not one of them.</p>
<!--
FIXME
It's just a shame that it doesn't seem to persist - it certainly doesn't seem worth having kids just to experience this short-term gain.
* [Antinatilists](https://en.wikipedia.org/wiki/Antinatalism) also have philosophical objections to having children: you might argue that minimizing suffering is more important than maximizing happiness, or that it is immoral to have them because it's impossible for them to give consent to being born. You can also argue that having children is bad for the environment. (I have to admit I don't personally find these arguments very compelling!)
TODO:
- Defuse argument that I assumed the conclusion coming in
- Defuse argument that there might be long run effects (more than 21 years after birth)
-->

<h1>Making money in cryptocurrency without price risk</h1>
<p>2018-03-28T20:58:00+01:00 http://blog.omega-prime.co.uk/2018/03/28/making-money-in-cryptocurrency-without-price-risk</p>
<p>Cryptocurrency trading is a high-risk business, with annualized volatility
of many tokens exceeding 100%. While I think that every investor should hold <strong>some</strong> Bitcoin (as part of holding the <a href="https://en.wikipedia.org/wiki/Market_portfolio">CAPM market portfolio</a>), it’s probably unwise to commit more than <a href="https://qoppac.blogspot.co.uk/2017/12/obligatory-bitcoin-post.html">a few percentage points of your net worth</a>.</p>
<p>However – there are ways to make money in crypto that do not involve taking on any price risk! In this post I will describe five strategies that I’ve been trading in my personal account with some success.</p>
<h2 id="margin-lending">Margin lending</h2>
<p><a href="https://www.bitfinex.com/">BitFinex</a> is one of the major cryptocurrency exchanges. As well as offering “cash” trading, it lets investors trade on margin. If a margin trader wants to take a leveraged position in, say, Bitcoin they need to borrow dollars to purchase the coin. On BitFinex, these dollars are provided by other users of BitFinex who have deposited them on the exchange and made them available for lending.</p>
<p>When you lend dollars on BitFinex you lose the use of the dollars for a short fixed term. The borrower may repay the loan at any time during the term, but while the loan remains open you receive interest. Historically, the interest rate has been very substantial – often in excess of 100% annualized!</p>
<p align="center">
<img src="/2018/03/margin_lending.png" alt="" width="800px" />
</p>
<p>Implementing this trade is fairly straightforward:</p>
<ol>
<li>Deposit USD. You pay <a href="https://support.bitfinex.com/hc/en-us/articles/115004261674-Deposit-Fees">0.1%</a> to do this via a bank transfer.
<ul>
<li>At one time, BitFinex did not support USD deposits, so the only way to get USD on the system was to buy cryptocurrency elsewhere, transfer it to BitFinex, and then sell it for USD. This may still be cheaper/easier than arranging a bank transfer.</li>
</ul>
</li>
<li>Set up <a href="https://github.com/HFenter/MarginBot">MarginBot</a> to automatically offer your dollars on the lending market.</li>
</ol>
<p>At the time of writing, this is not a particularly profitable strategy – after BitFinex take their 15% cut of the interest paid, you can expect to make only about 4.5% annualized. In my opinion this return is not sufficient to compensate you for the risks of the investment, namely:</p>
<ol>
<li><strong>Tether risk</strong>: when you own USD on BitFinex you don’t own real dollars, but rather units of a token called “TetherUSD”. This token is somewhat dodgy:
<ol>
<li>It is theoretically redeemable at the <a href="http://tether.to">issuer</a> for real dollars, but the issuer has not had functioning international banking relationships since April 2017, making this something of a moot point. (It is, however, generally possible to trade it 1:1 for real dollars at <a href="https://www.kraken.com/">Kraken</a>.)</li>
<li>Accusations are <a href="https://twitter.com/bitfinexed">occasionally thrown around</a> suggesting that Tether is not actually backed 1-for-1 with real dollars. Personally I don’t believe this, but it is a risk – it is particularly strange that the audit that had been commissioned to put paid to this accusation <a href="https://www.coindesk.com/tether-confirms-relationship-auditor-dissolved/">never happened</a>.</li>
<li>I think it is somewhat likely that Tether will be shut down as a <a href="https://blog.bitmex.com/tether/">money laundering scheme</a> in the medium-to-long term.</li>
</ol>
</li>
<li><strong>Hack risk</strong>: around 2016-08-03, BitFinex was <a href="https://themerkle.com/bitfinex-to-socialize-losses-customers-will-lose-36-of-deposited-funds/">hacked</a>, and they socialized the resulting losses across all customers of the platform, resulting in a 36% haircut to everyone’s positions. They did eventually make everyone whole again, presumably from the substantial trading profits generated by the exchange, so my backtest results above do not account for this loss.</li>
<li><strong>Margin lending risk</strong>: when you lend to a margin trader, your funds are secured by the cryptocurrency that the trader purchases. If their position value gets too low, the BitFinex platform will automatically sell it in the market to make sure that sufficient funds are available to repay the margin loan. However, in unusual market conditions with fast price changes or thin books, it may not be possible to liquidate all traders in the market without imposing losses on the lenders.</li>
<li><strong>Fixed term lending risk</strong>: you cannot close the loan out early if you need the funds in a hurry. This is mitigated by the fact that the usual lending period is only 2 days.</li>
</ol>
<p>That all being said, I think it is somewhat likely that the returns to margin lending will at some point improve to something like their historical norms.</p>
<table>
<thead>
<tr>
<th> </th>
<th>Annualized returns</th>
</tr>
</thead>
<tbody>
<tr>
<td>2015</td>
<td>18.4%</td>
</tr>
<tr>
<td>2016</td>
<td>12.4%</td>
</tr>
<tr>
<td>2017</td>
<td>30.9%</td>
</tr>
<tr>
<td>2018 to date</td>
<td>15.2%</td>
</tr>
<tr>
<td>Post-hack to date</td>
<td>24.5%</td>
</tr>
</tbody>
</table>
<p>To estimate historical returns to margin lending, I used data from <a href="https://www.bfxdata.com/datadownload/">bfxdata.com</a>. To cross-check the validity of the data, I compared the volume-weighted average rate (VWAR) from that dataset to my own realized VWAR in MarginBot:</p>
<p align="center">
<img src="/2018/03/margin_lending_vs_marginbot.png" alt="" width="800px" />
</p>
<p>As you can see, it’s a pretty good match. Some of the outliers are probably due to operational issues I had running MarginBot reliably when I first started using it, and during periods of high load on Bitfinex. Others may be the result of traders actively seeking to close high-rate loans early and replace them with cheaper financing – an effect I do not try to account for in my backtest.</p>
<h2 id="bitmex-hedged-short-perpetual">BitMEX hedged short perpetual</h2>
<p><a href="http://bitmex.com/">BitMEX</a> is a cryptocurrency derivatives exchange. Because futures positions do not have to be fully-funded, it provides an alternative to margin borrowing for traders who wish to take on leveraged positions in crypto.</p>
<p>One of the products that is listed on BitMEX is <code class="highlighter-rouge">XBTUSD</code>. This is a <a href="https://www.bitmex.com/app/perpetualContractsGuide">perpetual swap</a> whose value is roughly anchored to the price of Bitcoin via a mechanism called “funding”. Specifically, every 8 hours, shorts will earn (and longs will pay) interest on the notional USD value of their positions at a rate <code class="highlighter-rouge">F</code> determined by the following formula:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F = P + Clamp(0.01% - P, -0.05%, 0.05%)
</code></pre></div></div>
<p>Where <code class="highlighter-rouge">P</code> is the <a href="https://www.bitmex.com/app/index/.XBTUSDPI8H">average premium</a> at which <code class="highlighter-rouge">XBTUSD</code> traded over the BitMEX Bitcoin reference rate in the last 8 hours. If P was zero, you would therefore earn 0.01% every 8 hour period i.e. 11.6% annualized just from the funding component. In practice, P tends to be positive, which makes the rate higher than this, but also much more volatile! (Note that if P is negative then shorts can end up paying longs rather than vice-versa.)</p>
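<p>As a sketch of the arithmetic (this is just my reading of the formula above, not BitMEX’s implementation):</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def funding_rate(premium):
    """8-hourly funding rate received by shorts, per the formula above."""
    clamp = min(max(0.0001 - premium, -0.0005), 0.0005)
    return premium + clamp

def annualize(rate_8h):
    """Compound an 8-hourly rate over the 3 * 365 funding periods in a year."""
    return (1 + rate_8h) ** (3 * 365) - 1

# annualize(funding_rate(0.0)) is roughly 0.116, i.e. the 11.6% figure quoted above
</code></pre></div></div>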
<p>Shorting <code class="highlighter-rouge">XBTUSD</code> is a risky proposition, but you can neutralize this risk by going long spot Bitcoin. This means that any USD losses on your <code class="highlighter-rouge">XBTUSD</code> short will be exactly offset by gains in the value of the Bitcoin, but you’ll still earn the funding rate <code class="highlighter-rouge">F</code>.</p>
<p>Specifically, to initiate this trade you should:</p>
<ol>
<li>Buy <code class="highlighter-rouge">Q</code> units of Bitcoin at the prevailing cash price <code class="highlighter-rouge">C</code></li>
<li>Short <code class="highlighter-rouge">C*Q</code> USD of <code class="highlighter-rouge">XBTUSD</code> on BitMEX – you should be able to do this at a price
not more than 0.1% from <code class="highlighter-rouge">C</code></li>
</ol>
<p>The value of the portfolio created by steps 1 and 2 is <code class="highlighter-rouge">Q*C</code> USD, and if the Bitcoin cash price later changes to <code class="highlighter-rouge">C'</code> then it’s easy to see that the P&L on your Bitcoin position <code class="highlighter-rouge">Q*(C'-C)</code> USD should be <a href="https://www.bitmex.com/app/pnlGuide">offset</a> exactly by that on the <code class="highlighter-rouge">XBTUSD</code> position <code class="highlighter-rouge">C*Q*((1/C')-(1/C)) = Q*((C/C')-1)</code> BTC i.e. <code class="highlighter-rouge">Q*((C/C')-1)*C' = Q*(C-C')</code> USD.</p>
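<p>A quick numerical check of that claim, with hypothetical prices:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def hedged_pnl_usd(Q, C, C_new):
    """USD P&L (ignoring funding) of holding Q BTC in spot while short C*Q USD
    of the XBTUSD inverse swap entered at price C, marked at the new price C_new."""
    spot_pnl_usd = Q * (C_new - C)
    swap_pnl_btc = C * Q * (1 / C_new - 1 / C)   # short inverse-contract P&L, in BTC
    return spot_pnl_usd + swap_pnl_btc * C_new

# hedged_pnl_usd(2, 8000, 6500) is zero (up to floating point): the two legs offset exactly
</code></pre></div></div>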
<p>One wrinkle with this calculation is that it ignores the cumulative effect of the funding <code class="highlighter-rouge">F</code>. In the long term you hope that <code class="highlighter-rouge">F</code> will be positive. If it is, then your BitMEX margin account will be credited with Bitcoin with value equal to the interest you are owed. This will create long Bitcoin price risk, and you will need to explicitly “reinvest” in the strategy by shorting more <code class="highlighter-rouge">XBTUSD</code> to remain perfectly hedged and earn the compounded returns I simulate in my backtests. Equally, if <code class="highlighter-rouge">F</code> proves to be negative you will need to delever by buying back <code class="highlighter-rouge">XBTUSD</code> or else you will grow steadily more short Bitcoin.</p>
<p>You should probably transfer all/most the Bitcoin you purchased in the first step to BitMEX to act as margin for your <code class="highlighter-rouge">XBTUSD</code> short. This helps avoid the short being stopped out, which would leave you with unhedged long exposure to Bitcoin.</p>
<p>If you had followed this strategy historically you would have earned a healthy return with an <a href="https://en.wikipedia.org/wiki/Information_ratio">annualized IR</a> of 3.7:</p>
<table>
<thead>
<tr>
<th> </th>
<th>Annualized returns</th>
</tr>
</thead>
<tbody>
<tr>
<td>2016-06-02 to 2016-12-29</td>
<td>98.7%</td>
</tr>
<tr>
<td>2017</td>
<td>65.9%</td>
</tr>
<tr>
<td>2017-04-21 to date</td>
<td>69.0%</td>
</tr>
<tr>
<td>2018 to date</td>
<td>-0.00239%</td>
</tr>
</tbody>
</table>
<p align="center">
<img src="/2018/03/xbtusd_short.png" alt="" width="800px" />
</p>
<p>(I highlight the 2017-04-21 breakpoint above because this is the date at which <a href="https://blog.bitmex.com/site_announcement/bitcoin-usd-swap-funding-rate-calculation-changes/">BitMEX changed their funding formula</a> to use the fixed 0.01% rate mentioned above.)</p>
<p>The principal risks of this strategy are:</p>
<ol>
<li><strong>BitMEX counterparty risk</strong>: BitMEX may be hacked or abscond with your deposits. They do appear to be careful about security but nothing is impossible.</li>
<li>
<p><strong>Margin risk</strong>: traders on BitMEX do not have to fully-fund their positions. Therefore, to avoid imposing losses on the rest of the ecosystem BitMEX operate a system where a losing trader may have their positions liquidated in the market. Just as with the equivalent system at Bitfinex, there is no guarantee that this will be possible if the market is somehow disorderly, and in this case BitMEX might not be able to meet its obligations to other users.</p>
<p>As an ameliorating factor, BitMEX operates an <a href="https://www.bitmex.com/app/insuranceFund">insurance fund</a> whose value currently stands at ~$40m. The fund is automatically topped up whenever a forced liquidation executes at a price better than expected, and the fund will be drawn upon to make traders whole in the event that BitMEX’s margining system fails.</p>
</li>
<li>
<p><strong>Basis risk</strong>: in my P&L calculation above, I assumed that the price of <code class="highlighter-rouge">XBTUSD</code> remains equal to that of spot Bitcoin. This is a somewhat reasonable assumption, given that the funding mechanism anchors the price very closely to the value of BitMEX’s <a href="https://www.bitmex.com/app/index/.BXBT">.BXBT index</a>, and that index is derived from quotes on two reputable exchanges that do not use Tether, but it’s not impossible to imagine that the two might become disconnected, leaving you less than perfectly hedged.</p>
</li>
<li><strong>Premium risk</strong>: as you can see above, the strategy is far from riskless and hasn’t made any money this year at all. If the premium rate remains low/negative you can lose your initial capital even if nothing technically goes “wrong” with the trade.</li>
</ol>
<h2 id="bitmex-hedged-short-futures">BitMEX hedged short futures</h2>
<p>As well as offering the <code class="highlighter-rouge">XBTUSD</code> perpetual swap, BitMEX lists traditional futures which settle at expiry to the Bitcoin USD price. You do not receive/pay any funding cost when holding these futures, but it is usually the case that the futures trade at a premium to spot Bitcoin. This suggests another trading strategy, which is known as “cash and carry” in the BitMEX community:</p>
<ol>
<li>Buy <code class="highlighter-rouge">Q</code> units of Bitcoin at the prevailing spot price <code class="highlighter-rouge">C</code></li>
<li>Short <code class="highlighter-rouge">Q*F</code> USD of the current Bitcoin futures contract at the prevailing price <code class="highlighter-rouge">F > C</code></li>
<li>Sell your Bitcoin exactly at the final settlement price of the future i.e. sell it <a href="https://www.bitmex.com/app/index/.BXBT30M">TWAP over the window 11:30-12:00 UTC</a> on the expiry day.</li>
</ol>
<p>Your initial position value is <code class="highlighter-rouge">Q*C</code> USD. If the final settlement price is <code class="highlighter-rouge">C'</code> then your final P&L is <code class="highlighter-rouge">Q*(C'-C)</code> USD from the spot position and <code class="highlighter-rouge">Q*F*((1/C')-(1/F))=Q*((F/C')-1)</code> BTC from the futures position i.e. <code class="highlighter-rouge">Q*(F-C')</code> USD.</p>
<p>Therefore, your total P&L is <code class="highlighter-rouge">Q*(F-C') + Q*(C'-C) = Q*(F-C)</code> USD. If <code class="highlighter-rouge">k = F/C</code> (i.e. <code class="highlighter-rouge">k > 1</code>) then this simplifies to <code class="highlighter-rouge">Q*C*(k-1)</code> USD i.e. you earn a certain return-on-capital of <code class="highlighter-rouge">k-1</code>.</p>
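<p>A small numerical illustration, again with hypothetical prices:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def carry_pnl_usd(Q, C, F, C_settle):
    """Long Q BTC bought at spot C, short Q*F USD of futures sold at F,
    both unwound at the settlement price C_settle."""
    spot_pnl_usd = Q * (C_settle - C)
    fut_pnl_btc = Q * F * (1 / C_settle - 1 / F)
    return spot_pnl_usd + fut_pnl_btc * C_settle   # always Q*(F - C) = Q*C*(k - 1)

# With C = 8000 and F = 8400 (so k = 1.05), both carry_pnl_usd(1, 8000, 8400, 5000)
# and carry_pnl_usd(1, 8000, 8400, 12000) come to 400 USD: a locked-in 5% on capital
</code></pre></div></div>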
<p>Just like in the <code class="highlighter-rouge">XBTUSD</code> case above, you will want to deposit your spot Bitcoin on BitMEX to avoid your short being liquidated. Unlike in the <code class="highlighter-rouge">XBTUSD</code> case, you should never lose capital due to fluctuations in the funding rate – if funding costs are negative then you will be able to see at the time of entry that <code class="highlighter-rouge">k < 1</code>, and you should simply choose not to trade at all.</p>
<p>Returns to this strategy have been good but the IR is only 1 or so, probably due to the frequent spikes visible in the equity curve. (I suspect these spikes may be more an artifact of my backtest being slightly sloppy with timestamps, rather than a real effect.)</p>
<table>
<thead>
<tr>
<th> </th>
<th>Annualized returns</th>
</tr>
</thead>
<tbody>
<tr>
<td>2015</td>
<td>47.2%</td>
</tr>
<tr>
<td>2016</td>
<td>69.9%</td>
</tr>
<tr>
<td>2017</td>
<td>16.9%</td>
</tr>
<tr>
<td>2018 to date</td>
<td>116%</td>
</tr>
</tbody>
</table>
<p align="center">
<img src="/2018/03/xbt_futures_short.png" alt="" width="800px" />
</p>
<p>The key risks of this trade are:</p>
<ol>
<li>
<p><strong>BitMEX counterparty risk</strong> and <strong>Margin risk</strong>, just as described for <code class="highlighter-rouge">XBTUSD</code> above.</p>
</li>
<li><strong>Fixed term lending risk</strong>: you should never lose money with this strategy, but this only applies if you hold the position until expiry. It is entirely possible that your mark-to-market P&L will be negative at some/all times before expiry. In this case, liquidating your position to recover the committed funds will cause you to take a loss.</li>
<li>
<p><strong>Basis risk</strong>: the basis risks of this trade are similar to those of the <code class="highlighter-rouge">XBTUSD</code> trade, but there is one additional complication. It is BitMEX’s practice to derive the final settlement price strictly from the value of a Bitcoin on the expiry date. This means that in the event of a chain split that happens before contract expiry, the futures will settle at the value of the coin <a href="https://blog.bitmex.com/policy-on-bitcoin-hard-forks-update/">ex that distribution</a>.</p>
<p>It so happens that this policy actually benefits us as shorts, creating a new source of P&L: the value of a forked coin is always positive, so a fork announcement will tend to drive the value of the futures down while leaving the value of our spot bitcoin unaffected.</p>
<p>(This same rule actually created some extremely profitable trading opportunities around the SegWit2X split late last year. Traders on BitMEX did not seem to price in the value of the SegWit2X distribution, so it was possible to buy Bitcoin, short Bitcoin futures, and short SegWit2X futures on Bitfinex to lock in a large, deterministic profit.)</p>
</li>
</ol>
<h2 id="bitmex-hedged-short-altcoin-futures">BitMEX hedged short altcoin futures</h2>
<p>BitMEX lists futures contracts on some altcoins too (e.g. Ethereum, Dash, Ripple and Monero to name just a few). These futures settle to the value of the altcoin in XBT terms, but frequently trade at a premium to that spot price. This means that the following trade is profitable:</p>
<ol>
<li>Buy <code class="highlighter-rouge">Q</code> units of the altcoin at a price <code class="highlighter-rouge">B*C</code> USD, where <code class="highlighter-rouge">C</code> is the price of Bitcoin in USD and <code class="highlighter-rouge">B</code> the price of the alt in BTC.</li>
<li>Short <code class="highlighter-rouge">Q</code> units of altcoin futures at the prevailing price <code class="highlighter-rouge">E</code>.</li>
<li>Short <code class="highlighter-rouge">Q*E*F</code> USD of Bitcoin futures at the prevailing price <code class="highlighter-rouge">F</code>.</li>
<li>At futures expiry, sell your altcoins for USD.</li>
</ol>
<p>Once the trade is closed out, your P&L will be as follows:</p>
<ul>
<li>Your spot position will earn <code class="highlighter-rouge">Q*(B'*C'-B*C)</code> USD</li>
<li>Your altcoin futures short will earn <code class="highlighter-rouge">Q*(E-B')*C'</code> USD</li>
<li>Your Bitcoin futures short will earn <code class="highlighter-rouge">Q*E*F*((1/C')-(1/F))*C'</code> USD</li>
<li>If <code class="highlighter-rouge">j=E/B</code> and <code class="highlighter-rouge">k=F/C</code> then it is easy to verify that this amounts to a total P&L of <code class="highlighter-rouge">Q*B*C*(j*k - 1)</code> USD – i.e. you will certainly make money so long as <code class="highlighter-rouge">j*k > 1</code> (this is checked numerically in the sketch after this list).</li>
</ul>
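<p>And a numerical check of that identity, with hypothetical prices:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def alt_carry_pnl_usd(Q, B, C, E, F, B_settle, C_settle):
    """Long Q alts bought at B BTC each (with Bitcoin at C USD), short Q alt futures
    at E BTC, and short Q*E*F USD of Bitcoin futures at F; all unwound at settlement."""
    spot_pnl = Q * (B_settle * C_settle - B * C)
    alt_fut_pnl = Q * (E - B_settle) * C_settle
    btc_fut_pnl = Q * E * F * (1 / C_settle - 1 / F) * C_settle
    return spot_pnl + alt_fut_pnl + btc_fut_pnl   # always Q*B*C*(j*k - 1)

# With B=0.05, C=8000, E=0.052, F=8200 we have j*k = 1.04 * 1.025 = 1.066, so the
# P&L is Q * 400 * 0.066 = 26.40 USD per alt, whatever B_settle and C_settle end up being
</code></pre></div></div>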
<p>This trade has the advantage that it lets you “double-dip” in the futures premium - if both BTC and altcoin futures are trading at a premium, you can earn a return-on-capital that is roughly the sum of those premia. However, there are a number of serious issues with running the strategy in practice:</p>
<ul>
<li>
<p><strong>Margin call risk</strong>: you cannot deposit your long altcoin position on BitMEX to use as margin, because BitMEX only accepts Bitcoins as collateral. This means that you will have to source a (potentially unlimited) number of bitcoins from somewhere to keep your short alive in the event that it moves against you.</p>
<p>Not only does this introduce the risk that you may not be able to hold the position to expiry, but it also makes the return-on-capital of the strategy comparatively less attractive.</p>
<p>Adding to the seriousness of this issue, BitMEX does not operate a “portfolio margining” system. Imagine that the price of an altcoin remains absolutely fixed in USD terms, but Bitcoin experiences a huge price spike. If this happens, your short Bitcoin futures position will have significant negative P&L that will be offset almost completely by positive P&L on your altcoin futures short. Despite the fact that your account has no net P&L and so is not at risk of defaulting, BitMEX will nonetheless <a href="https://www.bitmex.com/app/isolatedMargin#Portfolio-Margining">close out your Bitcoin short</a>, leaving you unhedged against a Bitcoin price drop.</p>
</li>
<li>
<p><strong>High trading costs</strong>: BitMEX has industry-leading fees of <a href="https://www.bitmex.com/app/fees">7.5bps taker/-2.5bps maker</a> for Bitcoin derivatives, but they charge a comparatively chunky 25bps taker/-5bps maker for the altcoins.</p>
</li>
<li>
<p><strong>Thin markets</strong>: the altcoin futures markets are extremely illiquid in comparison to the Bitcoin ones. Not only can bid-ask spreads be very substantial, but this may also increase the risk of an adverse event where BitMEX’s automatic deleveraging is unable to liquidate busted positions at prices which avoid imposing losses on counterparties.</p>
</li>
</ul>
<p>Quite apart from these concerns, all the standard risks associated with trading the BitMEX Bitcoin futures that we mentioned earlier still apply.</p>
<p>Altcoin futures seem to have been trading on BitMEX since early-to-mid 2017. The returns have been decent since that time:</p>
<table>
<thead>
<tr>
<th>Coin</th>
<th>Annualized returns</th>
</tr>
</thead>
<tbody>
<tr>
<td>BCH</td>
<td>10.5%</td>
</tr>
<tr>
<td>DASH</td>
<td>26.7%</td>
</tr>
<tr>
<td>ETH</td>
<td>15.6%</td>
</tr>
<tr>
<td>LTC</td>
<td>31.4%</td>
</tr>
<tr>
<td>XMR</td>
<td>26.9%</td>
</tr>
<tr>
<td>XRP</td>
<td>15.5%</td>
</tr>
<tr>
<td>ZEC</td>
<td>59.2%</td>
</tr>
</tbody>
</table>
<p>Note that these returns are in Bitcoin terms – you could make an even higher risk-free profit in USD terms by also putting on the Bitcoin futures hedge as described above.</p>
<p>The equity curve looks even trashier than the XBT futures variant. This is probably due to a combination of timestamp sloppiness and the fact that the only source for altcoin marks I could find was the Bitfinex daily price API, which probably does not provide timestamps that are synchronized with BitMEX’s. These limitations mostly affect the mark-to-market P&L, so the overall return numbers above should still be trustworthy.</p>
<p align="center">
<img src="/2018/03/alt_futures_short.png" alt="" width="800px" />
</p>
<h2 id="cryptofacilities-hedged-short-futures">CryptoFacilities hedged short futures</h2>
<p>This final strategy is a simple variant on the “BitMEX hedged short futures” strategy above, but adapted to <a href="http://cryptofacilities.com">CryptoFacilities</a>, an FCA-registered, London-based exchange offering futures on the XBT/USD, XRP/USD and XRP/XBT rates.</p>
<p>Because the XRP/USD futures are settled in Ripple, if you short them while holding an offsetting long Ripple position then you lock in a return on capital which depends only on the ex ante premium of the futures over the spot rate. This is entirely analogous to the long Bitcoin/short BitMEX Bitcoin futures trade.</p>
<p>CryptoFacilities offers contracts which expire roughly one week, one month, and 3 months ahead. Historically, it has been common for the weekly future to trade at a 1-2% premium to spot, and the quarterly contract frequently traded at a premium of 10-15%, so this trade has had annualized returns of 60-70%. Unfortunately I don’t have any backtest results to share with you here because it is not presently possible to get price data for expired contracts from the CryptoFacilities API (according to their support team they do plan to add this feature, but there is no ETA for this).</p>
<p>The risks of this trade are very similar to those of the BitMEX equivalent. However, you do face one risk that is unique to CryptoFacilities. On the CryptoFacilities platform, your counterparty for a trade is not CryptoFacilities themselves, but rather the customer that you traded with. In the event that the other customer has their positions liquidated, the CryptoFacilities system will try to buy them back in the market and assign the resulting positions to you so you aren’t left unhedged. However, this is a best-effort process which CryptoFacilities does not guarantee the success of, so if the system fails due to e.g. thin markets then you may have your own position closed out at an unfavourable price and with no recourse.</p>
<h2 id="closing-words">Closing words</h2>
<p>The strategies outlined above have historically been able to return 50%+ annualized without any exposure to cryptocurrency price risk. In my opinion these are all examples where the market has mispriced risk, because the return has been more than sufficient to compensate for the expected value of the losses.</p>
<p>All the strategies are somewhat correlated, and their returns seem to depend on the level of speculative fervour in the crypto markets. This makes sense, because all of these strategies are just margin lending in one guise or another, and increased demand for speculative dollars should raise the cost of renting those dollars. This does mean that these strategies may be long crypto in a sense – while you may not lose any money if crypto tanks, you won’t make very much either!</p>
<p>Indeed, as of the time of writing, all of these strategies are offering very minimal returns of 5% or less annualized. This is doubtless related to the fact that we have been in a bear market for crypto for 3 months now. Because of these low returns, I’m not trading any of these strategies right now, but I stand ready to invest again should the market turn.</p>
<p>Finally, an obligatory note that you should not blindly follow the advice in this post! I’m just some dude on the internet with an unhealthy interest in financial market anomalies, not any kind of qualified investment manager or financial adviser.</p>
<p>Please note that all my backtests indicate returns that may not have been achievable historically (in particular, because I don’t account for trading costs), and may not be at all indicative of the returns that are available in the future. Even though these strategies try to avoid raw Bitcoin price risk, there are still substantial risks involved in implementing them, and you will have to decide for yourself whether those are suitable, after carefully considering all the factors.</p>

<h1>Being concrete about the benefits of tax efficient index investment</h1>
<p>2017-04-11T00:16:24+01:00 http://blog.omega-prime.co.uk/2017/04/11/being-concrete-about-the-benefits-of-tax-efficient-index-investment</p>
<p>In my <a href="/2017/04/08/tax-efficient-and-financing-efficient-uk-individual-investing">last post</a> I discussed the methods that a UK individual could use to make investments. There were plenty of different methods, all with their own unique tradeoffs.</p>
<p>In this post I’m going to focus just on the issue of tax efficiency. Let me remind you that I’m definitely not a tax professional and this post just reflects my current understanding of the situation. You probably shouldn’t rely on it to be correct, and should seek independent advice before using any of this info.</p>
<p>That being said, what I have done is write a simulation of an unleveraged FTSE 100 investment from 1989 to 2016-11-01, as achieved via four methods:</p>
<ol>
<li>Index-tracking ETF</li>
<li>Index future</li>
<li>Spread bet</li>
<li>CFD</li>
</ol>
<p>With a 100,000 GBP initial investment, the most efficient investing method (spread betting) has a final account value of 610,718 GBP: 163k higher than the least efficient investment method (index futures), which had a final value of 447,629 GBP. That’s a 36% difference!</p>
<p>CFDs and ETFs came somewhere in the middle: the CFD investor would have had 521,090 GBP at the end, while the ETF holder would have 491,957 GBP – and this is <em>with</em> the generous assumption that the fees charged by the ETF provider are 0%.</p>
<p>Spread bets win for a simple reason: they don’t pay any capital gains or income tax at all. With an unleveraged investment, the high financing costs of a spread bet are irrelevant. Note that my model assumes that your spread bet provider pays you the full value of a dividend when you hold a long position. It is by no means guaranteed that this applies to your provider, but there are a few companies out there that <strong>do</strong> do things this way: I will cover some of them in the last part of this post.</p>
<p>Why are index futures so inefficient? The reason is that index future returns are net of the risk free rate. This means that you have to stick your unmargined money in a bank account to earn back the risk free rate, and that means you end up paying income tax – the most onerous of all the taxes. Over the sample period the futures strategy ends up paying 115,309 GBP in income tax alone. The capital gains tax obligations are a relatively modest 19,236 GBP. The trading costs of this option are also relatively high (thanks to the quarterly contract roll), at 2,156 GBP but this is dwarfed by the tax charges.</p>
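<p>A toy one-year calculation illustrates the mechanism. This is deliberately simplified – it ignores the annual allowances, the timing of realizations, and uses round illustrative rates rather than the full simulation behind the figures above:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def futures_route_after_tax(capital, total_return, risk_free, income_tax=0.40, cgt=0.20):
    """Toy one-year calculation: the future delivers the index return net of the
    risk-free rate (taxed as a capital gain), while the unmargined cash sits in a
    bank account earning the risk-free rate (taxed as income)."""
    futures_gain = capital * (total_return - risk_free)
    interest = capital * risk_free
    return capital + futures_gain * (1 - cgt) + interest * (1 - income_tax)

def spread_bet_after_tax(capital, total_return):
    """Spread bet gains are not taxed at all."""
    return capital * (1 + total_return)

# With a 10% risk-free rate (roughly the early-1990s level) and a 12% total return:
# futures_route_after_tax(100_000, 0.12, 0.10) -> 107,600 vs spread_bet_after_tax -> 112,000
</code></pre></div></div>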
<p>Note that my backtest period includes a period of rather high interest rates in the UK (rates fluctuated <a href="http://www.tradingeconomics.com/united-kingdom/interest-rate">around 10%</a> at the beginning of the period). It’s likely that investing in futures is more tax-efficient nowadays than it was historically.</p>
<p>CFDs benefit from being able to treat dividend payments on the underlying as capital gains. The CFD investor would have paid only 49,914 GBP in capital gains tax over the period, and no income tax at all. The fact that this number is roughly half the total tax burden of the index future investor reflects the fact that the higher rate of income tax is about twice the rate of capital gains tax.</p>
<p>Finally, the ETF investor would have paid a mix: 46,314 GBP in income tax, and 16,897 GBP in capital gains tax. This is a total tax burden somewhat higher than that paid by the CFD investor, but it differs in that the CFD investor’s capital gains liabilities mostly arise towards the end of the test (2013 and later), while the ETF has been dribbling dividend income away to the taxman in almost every year since inception (actually, 1994 is the first year in which the ETF dividend income exceeds the tax-free threshold).</p>
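<p>To make the mechanics a little more concrete, here is a minimal sketch of how a single year of the futures strategy gets taxed under the assumptions used here. It is deliberately simplified (it ignores loss carry-forward, the personal savings allowance and the lower capital gains band, and the rates are hard-coded assumptions rather than anything read from the simulation), but it shows why the futures wrapper leaks so much to income tax:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">// Simplified sketch of one simulated year for the futures wrapper.
// Assumptions: higher-rate income tax of 40%, capital gains tax of 20%,
// and the 11,300 GBP annual CGT allowance. Not the code of the real backtest.
class FuturesYearTaxSketch {
    static final double INCOME_TAX_RATE = 0.40;
    static final double CGT_RATE = 0.20;
    static final double CGT_ALLOWANCE = 11_300;

    static double taxForYear(double interestReceived, double futuresPnl) {
        // Interest on the unmargined cash is taxed as income at the full marginal rate...
        double incomeTax = INCOME_TAX_RATE * Math.max(0, interestReceived);
        // ...while the futures P&L is a capital gain, taxed only above the allowance.
        double taxableGain = Math.max(0, futuresPnl - CGT_ALLOWANCE);
        return incomeTax + CGT_RATE * taxableGain;
    }

    public static void main(String[] args) {
        // E.g. 100,000 GBP deposited at 5% interest while the futures position gains 8,000 GBP:
        System.out.println(taxForYear(100_000 * 0.05, 8_000)); // 2000.0, all of it income tax
    }
}</code></pre></figure>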
<p>Naturally, for those who are willing and able to invest all their money within an ISA, all of this discussion is irrelevant – in this case, all dividends and capital gains will be tax free anyway, so you may as well just buy an ETF. Individuals who have hit their ISA contribution cap, or who want to do things that are incompatible with ISAs (e.g. hold futures and options, or use leverage) may however find this information useful.</p>
<p>Note finally that my tests make a number of assumptions:</p>
<ul>
<li>You are a UK higher rate taxpayer</li>
<li>Today’s tax regime applies across all of history</li>
<li>For the index future results: that you can save in an easy-access account offering 0.5% above LIBOR</li>
<li>You realize your gains somehow to take full advantage of your annual tax free allowance</li>
<li>I couldn’t find monthly FTSE 100 index returns anywhere (I know this sounds weird, but I really did try quite hard and they were nowhere to be found) so I backed them out from Quandl’s <a href="https://www.quandl.com/collections/futures/liffe-ftse-100-index-futures">FTSE 100 index future data</a> by assuming an annual dividend yield of <a href="http://www.ftse.com/Analytics/FactSheets/Home/DownloadSingleIssue?issueName=UKX">3.83%</a>. At least this should mean that my results aren’t affected by shocks to the level of market-expected dividends.</li>
</ul>
<h3 id="spread-betting-providers">Spread-Betting Providers</h3>
<p>From the above we can see that spread betting can be advantageous. However, this conclusion is sensitive to the amount of dividends that the provider passes on to the bettor. My computed final account value of 610,718 GBP assumes 100% of dividends are passed on, but if just 90% are passed on then you will have only 552,525 GBP at the end – still good, but not much better than a CFD investment. With 85% passed on the final value is 525,472 GBP, and if your provider is cheeky enough to pass on only 80% then the final value would be 499,698 GBP – almost as bad as holding an ETF.</p>
<p>I gathered some info from around the web about the charges imposed by various spread-betting providers. For long term investors the relevant bits of info are the financing rate and the fraction of dividends that are passed through to you (for a long position). The below table compares a few providers on these criteria, again assuming an investment into the FTSE 100.</p>
<p>(Note that I expect that you would only pay the financing cost on the value of your position that exceeds your cash deposit, so the financing rate may not be at all important for a totally unleveraged investor.)</p>
<table>
<tr>
<th>Provider</th>
<th>Financing Rate<br /> (LIBOR + X%)</th>
<th>Dividend Passthrough</th>
</tr>
<tr>
<td><a href="http://www.ayondo.com/en/markets/">Ayondo</a></td>
<td><a href="http://www.ayondo.com/en/learn/spread-betting/financing/">2.5%</a></td>
<td><a href="http://www.ayondo.com/en/learn/spread-betting/corporate-actions/">100%</a></td>
</tr>
<tr>
<td><a href="http://www.cityindex.co.uk/">CityIndex</a></td>
<td><a href="http://www.cityindex.co.uk/financing-and-charges.aspx">2.5%</a></td>
<td>I expect 90% given <a href="http://www.cityindex.co.uk/financing-and-charges.aspx">"net dividends"</a>, which matches <a href="http://www.financial-spread-betting.com/spreadbetting/Cityindex-cfds.html">another source</a></td>
</tr>
<tr>
<td><a href="http://www.cmcmarkets.co.uk">CMC Markets</a></td>
<td><a href="http://www2.cmcmarkets.co.uk/help/eng/cfd/your_daily_statement.jsp?subsection=3">3%</a></td>
<td><a href="http://www2.cmcmarkets.co.uk/help/eng/cfd/your_daily_statement.jsp?subsection=1">100%</a></td>
</tr>
<tr>
<td><a href="https://www.corespreads.com">CoreSpreads</a></td>
<td><a href="https://www.corespreads.com/knowledge-base/overnight-financing-what-is-it/">2.5%</a></td>
<td><a href="https://www.corespreads.com/knowledge-base/dividends-how-are-they-treated-in-a-spread-bet/">90%</a></td>
</tr>
<tr>
<td><a href="http://www.etxcapital.co.uk/">ETX</a></td>
<td><a href="http://www.etxcapital.co.uk/support/details/what-are-your-overnight-financing-charges">3%</a></td>
<td><a href="http://www.etxcapital.co.uk/support/details/what-is-a-dividend">100%</a> but another (older) source <a href="http://www.trade2win.com/boards/spread-betting-cfds/101974-dividends-what-do-you-get-2.html">says 90%</a></td>
</tr>
<tr>
<td><a href="https://www.gkfx.com">GKFX</a></td>
<td>??</td>
<td><a href="https://www.gkfx.co.uk/faq/orders/what-are-index-and-equity-dividends-and-why-are-they-applied-on">100%?</a>, see also <a href="https://www.gkfx.co.uk/Dividends">here</a></td>
</tr>
<tr>
<td><a href="https://www.ig.com/uk">IG</a></td>
<td><a href="https://www.ig.com/uk/shares-spread-bet-product-details">2.5%</a></td>
<td>I expect 90% given <a href="https://www.ig.com/uk/shares-spread-bet-product-details">"net dividends"</a>, but other sources <a href="http://www.trade2win.com/boards/">say 85%</a></td>
</tr>
<tr>
<td><a href="https://www.intertrader.com/spread_betting.html">InterTrader</a></td>
<td><a href="https://www.intertrader.com/spread_betting/spread_betting_costs.html">2.5%</a></td>
<td><a href="https://www.intertrader.com/trader_education/faq.html">80%</a></td>
</tr>
<tr>
<td><a href="https://www.lcg.com">LCG</a></td>
<td><a href="https://www.lcg.com/uk/indices/spreads-costs/">2.5%</a></td>
<td><a href="http://www.spread-betting.com/compare/spread-bets-pay-partial-dividends">Perhaps 80%</a></td>
</tr>
<tr>
<td><a href="https://www.spreadex.com/financials/">SpreadEx</a></td>
<td><a href="https://www.spreadex.com/financials/indices/uk-100-daily/MLkO14u">??</a></td>
<td><a href="https://www.spreadex.com/financials/range-of-markets/indices/market-information/#indicesFAQs">100%</a></td>
</tr>
</table>
<p>Without considering any other factors, Ayondo seems like the best deal, with full dividend passthrough and low financing costs.</p>Tax-efficient and financing-efficient UK individual investing2017-04-08T10:50:51+01:002017-04-08T10:50:51+01:00http://blog.omega-prime.co.uk/2017/04/08/tax-efficient-and-financing-efficient-uk-individual-investing<p>In my <a href="/2017/04/05/the-case-for-leverage-in-personal-investing">last post</a> I gave an example of a situation where individual investors might want to borrow money for investment purposes. This post will give an overview of the methods that individuals can use to achieve that leverage efficiently. I will also cover tax considerations, some of which may be relevant even to unleveraged positions. Much of what I cover here will be UK specific, particularly when it comes to taxes.</p>
<p>Before we begin I should probably say that I’m not a tax accountant, a lawyer, a professional financial advisor, or anything else: I’m just a guy with access to Google and an interest in efficiency. You should probably speak to a professional before acting on any of the info in this article! I <em>do</em> work for an investment company, but I’m certainly not speaking for them here, and the information in this post has little-to-no relevance to their business. This is simply a summary of my understanding based on my research – I haven’t actually tried most of these methods in practice. I would appreciate feedback if you notice any errors.</p>
<h3 id="secured-lending">Secured Lending</h3>
<p>A secured loan such as a mortgage or <a href="https://en.wikipedia.org/wiki/Home_equity_line_of_credit">HELOC</a> is the form of borrowing that is probably most familiar to people. Because these loans are backed by an asset (i.e. probably your house), you can get very good interest rates: I see 2 year fixed teaser rates as low as 1.2% AER, which is less than 1% above the overnight GBP LIBOR rate of 0.225%.</p>
<p>The obvious downside of this form of borrowing is that the amount you can borrow is limited by the amount of home equity you have.</p>
<h3 id="margin">Margin</h3>
<p>Many stock brokerages offer margin accounts to their customers. A margin account is one where you are allowed to borrow to invest more than you have deposited into the account. The borrowed capital is secured by the equity in the account, which must meet a minimum value threshold (“margin requirement”), normally defined as some fraction of the total notional value of the account.</p>
<p>The broker I’m most familiar with is <a href="https://www.interactivebrokers.co.uk">Interactive Brokers</a> (IB). Roughly speaking, their <a href="https://gdcdyn.interactivebrokers.com/en/index.php?f=4745">rules</a> allow a margin account to borrow up to 100% of the value of the equity in the account (i.e. achieve 2x leverage). The interest rates charged are fairly low: right now, for GBP borrowing they charge 1.5% above LIBOR. The rates get more competitive if you borrow more – loans above GBP 80,000 only attract a charge of 1% over LIBOR.</p>
<p>If you’re using a broker’s margin facility you obviously need to accept their schedule of trading costs too. Luckily, IB’s fees are just as competitive as their margin charges, and start at around <a href="https://www.interactivebrokers.co.uk/en/index.php?f=1590&p=stocks1">6 GBP for an equity trade</a>.</p>
<h3 id="futures">Futures</h3>
<p>If you want to invest in an asset on which a liquid <a href="https://en.wikipedia.org/wiki/Futures_contract">futures contract</a> exists, this can be a very cheap way to achieve leverage. At the time of writing, a single <a href="https://www.theice.com/products/38716764/FTSE-100-Index-Future">FTSE 100 index future contract</a> has face value of around 73,000 GBP. The returns on this contract will closely match the returns on investing that same face value in an index tracking ETF. However, unlike with an ETF investment, if you purchase one of these futures contracts, you don’t need to invest those tens of thousands of pounds upfront – instead, you just need to deposit a certain amount of margin with your broker. Right now, IB only <a href="https://www.interactivebrokers.co.uk/en/index.php?f=marginnew&p=fut">require</a> about 6,500 GBP of deposited margin for a FTSE position held overnight, so you can potentially achieve 10x leverage without paying any financing costs.</p>
<p>If you use futures to make long-term investments in assets it is important to understand how the returns you earn on futures differ from those on the underlying asset. By an arbitrage argument you can show that the price of a futures contract should be equal to the <a href="https://en.wikipedia.org/wiki/Forward_price">forward price</a> F:</p>
<p align="center">
<img src="/2017/04/forward-price.png" alt="" width="200px" />
</p>
<p>Where S is the spot price of the underlying instrument (in our example, this would be the FTSE 100 index), T is the time to expiry of the contract, r is the risk-free rate and q is the “cost of carry”. The cost of carry is essentially a measure of the return you earn just by holding a position in the underlying. For an equity index future like the FTSE the cost of carry will be positive because by holding the components of the FTSE you actually earn dividends. For commodity futures the cost of carry may be negative because you will actually have to pay to store your oil or whatever.</p>
<p>If the spot price of the underlying stays unchanged, the daily return R on the contract will be:</p>
<p align="center">
<img src="/2017/04/forward-returns.png" alt="" width="400px" />
</p>
<p>This illustrates the key difference between holding the underlying and the future. With the future, you don’t just earn the returns of the underlying asset – the value of your contract also decays each day by an amount related to the difference between the cost of carry and the risk free rate. If this decay costs you money, the future is said to be in <a href="https://en.wikipedia.org/wiki/Contango">“contango”, otherwise it is in “backwardation”</a>. You can somewhat offset the decay due to the risk free rate part of this by depositing the notional amount of your investment in an account that earns the risk free rate. However, even if you do this, you wouldn’t expect the returns on your position to perfectly match those of the ETF because the forward price is determined based on the <strong>expected</strong> risk free rate and cost of carry. If interest rates are unexpectedly low, or dividend payouts are unexpectedly high, then your futures investment will underperform the equivalent ETF, so you are bearing some additional risk with the futures investment.</p>
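<p>To put some rough numbers on this, here is a small sketch of the forward price and of the daily drift you would see on the contract if the spot level never moved. The index level, rate and yield are illustrative assumptions only:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">// Sketch of the forward price F = S * exp((r - q) * T) and of the daily drift of
// the contract when the spot stays unchanged. All inputs are illustrative guesses.
class FuturesCarrySketch {
    public static void main(String[] args) {
        double spot = 7000;    // S: FTSE 100 level
        double r = 0.0025;     // assumed risk-free rate (annualised)
        double q = 0.038;      // assumed cost of carry: the index dividend yield
        double years = 0.25;   // T: time to expiry of the contract

        double forward = spot * Math.exp((r - q) * years);
        System.out.println("Forward price: " + forward); // slightly below spot here

        // With the spot unchanged, the contract price converges towards spot as expiry
        // approaches, so a long position drifts by roughly (q - r) / 365 per day:
        // positive here because the dividend yield exceeds the risk-free rate.
        System.out.println("Approx daily drift: " + (q - r) / 365);
    }
}</code></pre></figure>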
<p>The other issue with futures contracts is that if you hold them for the long term you will need to deal with the fact that they have a limited lifespan. For example, the June 2017 FTSE 100 contract expires on the 16th of June. On or before that date you will need to sell your position in the June contract and buy an equivalent position in a later (e.g. the September 2017) contract, or else you’ll stop earning any returns from the 17th June onwards. This regular roll process incurs transaction costs which act as a drag on your investment. Thankfully, futures contracts are generally very cheap to trade: brokerage fees are low – IB only charge <a href="https://www.interactivebrokers.co.uk/en/index.php?f=commission&p=futures1">GBP 1.70 per contract</a> to trade FTSE 100 futures – and most futures have extremely tight bid-ask spreads that are essentially negligible from the perspective of a long term investor.</p>
<p>One problem that makes index future investment particularly tricky for the individual investor is that these contracts generally have rather large notional value. The ~70k GBP value of one FTSE contract mentioned above is quite typical. So if you only have a small account, you can’t really use futures unless you’re willing to accept enormous leverage and all the risks that entails.</p>
<h3 id="contracts-for-difference">Contracts For Difference</h3>
<p>Contracts For Difference (CFDs) are an instrument you can buy from a counterparty who specialises in them. Big UK names in this area are <a href="https://www.ig.com/uk/cfd-trading">IG</a>, <a href="http://www.cityindex.co.uk/cfd-trading/">CityIndex</a> and <a href="https://www.cmcmarkets.com/en-gb/cfds">CMC Markets</a>, though IB <a href="https://gdcdyn.interactivebrokers.com/en/index.php?f=1170">also offers them</a>. Like a futures contract, these products let you earn the returns on a big notional investment in an asset without putting down the full amount of that investment upfront – instead, you just need to deposit some margin. Depending on the provider and the reference asset, the margin requirements can be very low: CityIndex seems to only require <a href="http://www.cityindex.co.uk/range-of-markets/indices/">0.5% margin</a> for a UK index investment, allowing for a frankly crazy 200x level of leverage. IB only require <a href="http://ibkb.interactivebrokers.com/article/1984">5% margin</a>.</p>
<p>Also like a future, CFDs are not available on just any underlying. It’s easy to bet on equity indexes, FX, and big-cap stocks with CFDs. It is also reasonably common to find bond or commodity CFDs, but not all providers will offer a full range here (IB <a href="http://ibkb.interactivebrokers.com/article/1984">don’t offer any</a>). The other characteristic that CFDs share with futures is low trading costs: for index CFDs, providers commonly only charge a spread of 1 index point, i.e. about 0.01% for the FTSE 100. IB as usual offer a good price of only <a href="https://gdcdyn.interactivebrokers.com/en/index.php?f=commission&p=cfd2">0.005% per trade</a> for their version of the FTSE 100.</p>
<p>Now we turn to the <em>differences</em> between CFDs and futures. For starters, unlike futures, CFDs <strong>do</strong> have financing costs, and they are chunky: typical rates from CityIndex and friends are <a href="https://www.ig.com/uk/spread-betting-cfds-charges">2%-2.5% above LIBOR</a>, with IB again offering an unusually good deal by only charging <a href="https://gdcdyn.interactivebrokers.com/en/index.php?f=1595&p=cfds2">a 1.5% spread</a>. On the plus side, if you hold a position in an asset via a CFD you will receive dividends on that underlying, something that is not true when using a future.</p>
<h3 id="spread-bets">Spread Bets</h3>
<p>Spread bets are a bit of a UK specific way to lever yourself up. Many companies offering CFDs in the UK also offer spread bets. These are essentially CFDs in all but name, and will face almost identical trading and financing costs as compared to the equivalent CFD product. They are also generally available on exactly the same set of underlyings. The key difference between a CFD and a spread bet is that spread bets are treated as gambling rather than investing by the tax system, with the consequence that earnings via one of these instruments are subject to neither capital gains nor income tax!</p>
<p>I will return to the issue of tax later, as there is quite a lot to say on the topic.</p>
<h3 id="options">Options</h3>
<p><a href="https://en.wikipedia.org/wiki/Option_(finance)">Options</a> are a slightly more complex way to gain leverage than the above alternatives. The idea here is that if you want to make a leveraged long bet on e.g. the FTSE, you can achieve that by buying a long-dated call option with a strike price somewhere around the current level of the index. Because the strike price is high, you will be able to purchase the option relatively cheaply, but you can potentially recieve a very high return. For example, let’s say the FTSE is around 7000: you might be able to buy an option on 1x the index expiring in two years with a strike of 7000 for around 400 GBP. If the index is up 10% to 7700 at that time, then you will earn a profit of 700-400 = 300 GBP i.e. a 75% return on your investment, so you have effectively have 7.5x leverage.</p>
<p>Like futures or CFDs, options are only available on certain underlying assets. In the US, you can buy options on equity indexes with expiries up to three years in the future: these are known as <a href="http://www.cboe.com/products/stock-index-options-spx-rut-msci-ftse/s-p-500-index-options/spx-leaps">LEAPs</a>. Exchange-traded options with long expiries are also available in other countries: for example, the ICE lists <a href="https://www.theice.com/products/38716770/FTSE-100-Index-Option">FTSE 100</a> options expiring a couple of years ahead.</p>
<p>Also like with futures, investments via options won’t earn any dividends, but neither will they attract financing charges. Individual investors may have difficulty with the fact that these options have notional values in excess of 50,000 GBP (in the US, S&P 500 <a href="http://www.cboe.com/products/stock-index-options-spx-rut-msci-ftse/s-p-500-index-options/mini-spx-index-options-xsp">mini options</a> with smaller notionals are available, but they only list about 1 year out).</p>
<p>There are two other ways of purchasing options that may be more suitable for the UK small investor as they let you take a position in a smaller size. Firstly, companies offering spread bets tend to also sell options. I haven’t looked into this, but given how expensive their spread bet financing is, I would not be surprised if their options were substantially overpriced. Secondly, you can purchase a “covered warrant”, which is essentially an exchange-listed option targeted at individual investors. <a href="https://sglistedproducts.co.uk/warrant">Societe Generale</a> offers them via the <a href="http://www.londonstockexchange.com/exchange/prices-and-markets/covered-warrants/covered-warrants-home.html">London Stock Exchange</a>: i.e. these options can be purchased just like a regular stock.</p>
<p>I did have a brief look into whether covered warrants offered good value for money. Specifically, I looked into the cost of <a href="https://sglistedproducts.co.uk/productdetailspage/symbol:SE91/">SE91</a>, a call option on the FTSE 100 with strike price 8000 and expiry December 2018 (the longest dated option available at the time of writing). When I looked at it, the warrant was quoting at around 0.25 GBP with a spread of 0.002 GBP (1%):</p>
<p align="center">
<img src="/2017/04/warrant-quote.png" alt="" />
</p>
<p>The equivalent exchange-traded option had a mid price of about 156 GBP on a spread of about 50 GBP (32%):</p>
<p align="center">
<img src="/2017/04/warrant-ice-quote.png" alt="" />
</p>
<p>The exchange-traded option is for a notional exposure 1000x larger than the warrant, which explains the order-of-magnitude difference between the prices. Taking this into account, the warrant looks pretty expensive, with even the bid price of 0.2488 GBP being higher than the equivalent exchange-traded option ask of 0.1805 GBP.</p>
<p>We can quantify exactly how much more expensive the warrant is by using the <a href="https://en.wikipedia.org/wiki/Black–Scholes_model">Black-Scholes option valuation model</a>. Given the level and volatility of the FTSE at the time, the model implies a fair market value for the option of 153 GBP, which lies within the bid-ask spread we actually observe on the exchange:</p>
<p align="center">
<img src="/2017/04/warrant-ice-quote-model.png" alt="" />
</p>
<p>SocGen are trying to charge us about 250 GBP for equivalent exposure. To put this in the same terms as the financing costs for the other instruments (i.e. as a spread to LIBOR), we can tweak the borrow cost assumption in this model until we get the right price out:</p>
<p align="center">
<img src="/2017/04/warrant-price-matched-model.png" alt="" />
</p>
<p>So it looks like the SocGen options are effectively offering leverage at a cost of LIBOR plus 2.75%, which is not a good deal. Trading them might still make sense so long as you are intending to hold them short-term, because they have much narrower bid-ask spreads than the exchange-traded equivalent, but in this case you’d probably end up better off buying the options OTC from a spread betting company.</p>
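<p>If you want to reproduce this sort of calculation yourself, the sketch below shows the approach: price the call with Black-Scholes, add a financing spread to the drift (but not to the discounting), and bisect on that spread until the model reproduces the price being charged. The market inputs are rough guesses chosen to be in the same ballpark as the screenshots above, not the exact numbers I used, so treat the output as indicative only:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">// Sketch: back out the financing spread implied by an expensive call price.
// Black-Scholes with a borrow spread b added to the drift; the normal CDF uses
// the standard Zelen & Severo polynomial approximation. All market inputs below
// are illustrative guesses rather than the actual quotes.
class ImpliedFinancingSpreadSketch {
    static double normCdf(double x) {
        if (x < 0) return 1 - normCdf(-x);
        double t = 1 / (1 + 0.2316419 * x);
        double poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
                + t * (-1.821255978 + t * 1.330274429))));
        return 1 - Math.exp(-x * x / 2) / Math.sqrt(2 * Math.PI) * poly;
    }

    // Call price with risk-free rate r, dividend yield q and extra borrow spread b.
    static double call(double s, double k, double t, double vol, double r, double q, double b) {
        double drift = r + b - q;
        double d1 = (Math.log(s / k) + (drift + vol * vol / 2) * t) / (vol * Math.sqrt(t));
        double d2 = d1 - vol * Math.sqrt(t);
        double forward = s * Math.exp(drift * t);
        return Math.exp(-r * t) * (forward * normCdf(d1) - k * normCdf(d2));
    }

    public static void main(String[] args) {
        double s = 7300, k = 8000, t = 1.7, vol = 0.14, r = 0.0025, q = 0.038;
        double target = 250; // what the warrant effectively charges per 1x index exposure

        double lo = 0, hi = 0.10; // bisect: the call price is increasing in the spread
        for (int i = 0; i < 60; i++) {
            double mid = (lo + hi) / 2;
            if (call(s, k, t, vol, r, q, mid) < target) lo = mid; else hi = mid;
        }
        System.out.println("Implied financing spread over the risk-free rate: " + lo);
    }
}</code></pre></figure>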
<p>I won’t consider options further as frankly speaking I find them harder to analyse than the alternatives.</p>
<h3 id="uk-taxes-on-investments">UK Taxes On Investments</h3>
<p>There are four forms of tax that are relevant to investors operating in the UK:</p>
<ul>
<li>Stamp duty: payable upon purchasing an asset</li>
<li>Dividend tax: payable upon receiving dividends from an asset</li>
<li>Income tax: payable upon receiving non-dividend income from an asset</li>
<li>Capital gains tax: payable upon sale of an asset</li>
</ul>
<p>Stamp duty is the simplest of the four. It’s a <a href="https://www.gov.uk/guidance/stamp-duty-reserve-tax-the-basics#how-much-is-payable">flat 0.5% charge</a> upon the purchase of shares in individual companies. It is not payable on the purchase of ETFs, futures contracts, or spread bets, so it is mostly not relevant here.</p>
<p>The amount of dividend tax you pay depends on your total income in a year, and can <a href="https://www.gov.uk/government/publications/dividend-allowance-factsheet/dividend-allowance-factsheet#examples">range</a> from 0% (if your dividends amount to less than the current tax-free allowance of 5,000 GBP) to 38.1% (if you pay “additional rate” tax of 45% on income above 150,000 GBP).</p>
<p>Income tax is payable on income from an asset that is not considered to be a dividend. Basically, if the asset is a bond, or a fund <a href="http://monevator.com/bonds-and-bond-funds-taxed/">more than 60% invested in bonds</a>, you will have to pay income tax instead of dividend tax. Income tax can range from 0% (if you earn less than the current Personal Allowance of 11,500 GBP) to 60% (if this extra income pushes you into the 100,000 GBP to 123,000 GBP band where the Personal Allowance is withdrawn). For more info see <a href="https://www.gov.uk/income-tax-rates/current-rates-and-allowances">HMRC</a> and <a href="http://www.contractorcalculator.co.uk/marginal_tax_rates_explained.aspx">this</a> discussion of the marginal tax rate. It’s not 100% clear to me what the tax treatment is on the final repayment of principal made by a bond issuer. I suspect the final repayment is treated as a capital gain, and for UK government debt at least it seems that <a href="http://www.dmo.gov.uk/index.aspx?page=gilts/gilt_faq#ISA_and_Tax">no capital gains tax is payable</a>.</p>
<p>Capital gains tax is payable on realised gains in excess of the annual <a href="https://www.gov.uk/capital-gains-tax/allowances">11,300 GBP</a> threshold. Higher rate taxpayers (i.e. those earning above 45,000 GBP) will <a href="https://www.gov.uk/capital-gains-tax/rates">pay 20%</a> on anything above this. Those who don’t pay higher rate tax may only pay 10% on some amount of their gains.</p>
<p>Capital gains tax is perhaps the trickiest of the taxes. Firstly, you need to know that it’s calculated based on your net realised gain during a year. So if you make a gain by selling some asset, you can avoid paying tax on that by selling another asset on which you have booked a loss. If you realise a net loss during a year, that can be <a href="https://www.gov.uk/capital-gains-tax/losses">carried forward indefinitely</a> to be set against future capital gains.</p>
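<p>As a concrete illustration of the netting rules, here is a deliberately simplified sketch for a higher rate taxpayer. It ignores carried-forward losses, the 10% band, and the bed-and-breakfasting rules discussed below:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">// Simplified capital gains tax calculation for a higher rate taxpayer: gains and
// losses realised in the year are netted, the 11,300 GBP allowance is deducted,
// and 20% is due on whatever remains. Loss carry-forward is ignored.
class CgtSketch {
    static double cgtDue(double realisedGains, double realisedLosses) {
        double netGain = realisedGains - realisedLosses;
        double taxable = Math.max(0, netGain - 11_300);
        return 0.20 * taxable;
    }

    public static void main(String[] args) {
        System.out.println(cgtDue(20_000, 0));     // (20,000 - 11,300) * 20% = 1,740
        System.out.println(cgtDue(20_000, 9_000)); // net gain of 11,000 is within the allowance: 0
    }
}</code></pre></figure>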
<p>Secondly, note that the tax free amount of 11,300 GBP is a “use it or lose it” proposition: if you don’t have 11,300 GBP of gains to report in a year then you won’t be able to make use of it, and it will vanish forever. This ends up being another reason to invest in a diversified portfolio of assets: if you are diversified then you’re likely to have <em>some</em> asset that you can liquidate during a tax year to take advantage of the allowance (just be careful that you don’t fall foul of the “bed and breakfasting” rules – see <a href="http://www.iii.co.uk/articles/29414/five-strategies-reduce-your-cgt-liability">this guide to realizing capital gains</a> for more info).</p>
<p>One general theme of all this is that you end up paying less tax on capital appreciation than on dividend or interest income.</p>
<h3 id="tax-efficient-investing">Tax Efficient Investing</h3>
<p>For concreteness, let’s say we are interested in making a (either leveraged or unleveraged) investment in equity indexes. How do these taxes apply to the investing methods discussed above, i.e. ETFs, futures, CFDs and spread bets? As already mentioned, none of these assets attract stamp duty. But what about dividend and capital gains tax?</p>
<p>ETFs are relatively straightforward: you pay dividend tax on the distributions, and capital gains tax upon selling an ETF that has increased in value. This may mean that it is more tax efficient to purchase an ETF that reinvests dividends for you (like <a href="https://www.ishares.com/uk/intermediaries/en/products/253716/ishares-ftse-100-ucits-etf-acc-fund">CUKX</a>) rather than one that distributes them (such as <a href="https://www.ishares.com/uk/intermediaries/en/products/251795/ishares-ftse-100-ucits-etf-inc-fund">ISF</a>).</p>
<p>Futures contracts are straightforward: there are no dividends, so you simply pay capital gains tax. One potential problem is that you won’t have much control over when you realise gains for capital gains purposes because you’ll probably be rolling the contracts quarterly anyway. Furthermore, if you have taken that portion of your equity that does <em>not</em> go towards the margin requirement, and invested it in an interest-bearing account, then you will have to pay income tax on any interest income. For tax purposes it might be most efficient to invest in a zero-coupon government bond which will not attract either income tax or capital gains tax, but this might be more trouble than it is worth.</p>
<p>The tax treatment of CFDs is interesting. All cashflows due to the CFD are <a href="https://www.gov.uk/hmrc-internal-manuals/capital-gains-manual/cg56101">considered to be capital gains by HMRC</a> – what’s surprising is that this includes both the interest you pay to support the position, and any payments you receive as a result of the underlying making a dividend payment. This makes CFDs rather attractive: you can end up paying capital gains tax rates on dividend income, and benefit from being able to use your interest payments to reduce capital gains liability, reducing the effective cost of margin by up to 20%.</p>
<p>Finally we come to spread bets: as mentioned earlier, bets are subject to different rules, so you don’t pay any tax at all on these. The flip side to this is of course that if you make a loss, you aren’t able to offset it against capital gains elsewhere. It’s not totally clear to me whether this treatment applies to payments made on the spread bet as a result of dividend adjustment, but it looks like it may do. This is why some spread betting providers (e.g. <a href="https://www.corespreads.com/knowledge-base/dividends-how-are-they-treated-in-a-spread-bet/">CoreSpreads</a>) only pay out 80% or 90% of the value of any dividend to the punter. One last thing to note is that the spread betting providers themselves pay a <a href="https://www.gov.uk/guidance/general-betting-duty-pool-betting-duty-and-remote-gaming-duty#rates">betting duty of 3%</a> on the difference between punters’ losses and profits: this will of course be passed on to you in the form of higher fees.</p>
<h3 id="summary">Summary</h3>
<p>This is a lot of info to take in, so I’ve tried to summarize the most important points below. Trading costs assume a 100,000 GBP investment in the FTSE 100.</p>
<table>
<thead>
<tr>
<th></th>
<th>Secured Lending</th>
<th>Margin</th>
<th>Futures</th>
<th>CFDs</th>
<th>Spread Bets</th>
</tr>
</thead>
<tbody>
<tr>
<th>Available underlying</th>
<td>Anything</td>
<td>Anything</td>
<td>Equity indexes, debt, commodities,
FX, certain equities (though liquidity may be limited)</td>
<td colspan="2">Equity indexes, debt (sometimes), commodities (sometimes), FX, certain equities</td>
</tr>
<tr>
<th>Approximate max leverage</th>
<td>4x (assuming 75% LTV)</td>
<td>2x</td>
<td>10x</td>
<td>200x</td>
<td>200x</td>
</tr>
<tr>
<th>Financing cost above LIBOR</th>
<td>1%</td>
<td>1% to 1.5%</td>
<td>0%</td>
<td>1.5% to 2.5% (20% less if treatable as capital loss)</td>
<td>2% to 2.5%</td>
</tr>
<tr>
<th>Other holding costs</th>
<td colspan="2">0.09% (<a href="https://www.vanguard.co.uk/uk/portal/detail/etf/overview?portId=9509&assetCode=EQUITY##overview">Vanguard's VUKE</a> ongoing charge)</td>
<td>0.014% (quarterly <a href="https://www.interactivebrokers.co.uk/en/index.php?f=commission&p=futures1">roll costs</a>)</td>
<td colspan="2">0%</td>
</tr>
<tr>
<th>Trading costs (FTSE 100)</th>
<td colspan="2">0.09% (VUKE 0.06% bid-ask spread, 0.03% <a href="https://www.interactivebrokers.co.uk/en/index.php?f=1590&p=stocks1">commission</a>)</td>
<td><a href="https://www.interactivebrokers.co.uk/en/index.php?f=commission&p=futures1">0.0017%</a></td>
<td><a href="https://www.interactivebrokers.co.uk/en/index.php?f=commission&p=cfd2">0.005%</a></td>
<td><a href="http://www.cityindex.co.uk/range-of-markets/indices/">0.01%</a></td>
</tr>
<tr>
<th>Dividend treatment</th>
<td colspan="2">Paid in full by ETF provider</td>
<td>None, but expected dividends become a positive carry on holding the contract</td>
<td><a href="https://www.interactivebrokers.com/en/index.php?f=deliveryExerciseActions&p=corpActionCFDs&conf=am">Paid in full</a></td>
<td>Generally <a href="https://www.ig.com/uk/indices-spread-bet-product-details">paid in full</a> but some providers may withhold 10%-20%</td>
</tr>
</tbody>
</table>
<h3 id="a-return-boosting-idea">A return-boosting idea</h3>
<p>As a final note, here’s something I just noticed and haven’t seen mentioned anywhere else. If investing via a CFD, spread bet or futures contract, you only need to deposit margin with your broker. If you’re only using 1x leverage, this means that roughly 90% of the notional value of your investment is free for use elsewhere, so long as you are able to move it back to the margin account if needed.</p>
<p>What’s interesting is that as an individual investor it’s <a href="http://www.moneysupermarket.com/savings/easy-access-accounts/?goal=SAV_EASYACCESS">straightforward</a> to find bank accounts that pay more than the risk free rate – even though these accounts do enjoy full backing from a sovereign government, and so are risk free in practice. For example, right now I can see an easy-access (aka demand deposit) account from <a href="https://www.rcibank.co.uk/">RCI Bank</a> accruing interest daily and paying an AER of 1.1% on balances up to 1 million pounds – i.e. about 0.9% above LIBOR. This is higher than the financing cost of a futures position (though not a CFD or spread bet), so it seems to me that there is reason to believe that the returns on a futures investment will actually beat out the equivalent ETF, so long as you do invest the “spare” equity in this way.</p>The case for leverage in personal investing2017-04-05T21:21:15+01:002017-04-05T21:21:15+01:00http://blog.omega-prime.co.uk/2017/04/05/the-case-for-leverage-in-personal-investing<p>The standard advice for personal investing that I see all around the web is to put your money into one or more low cost equity index tracking funds. Commentators also sometimes recommend an allocation to bonds (e.g. a 60/40 split between stocks and bonds), though this advice seems to become less popular with every passing month of the bull market.</p>
<p>However, the more I learn about investment, the more I come to think that this answer is suboptimal. To see why, let’s consider a simplified world where we have exactly two assets to which we can allocate our wealth: stocks and bonds.</p>
<h3 id="portfolio-theory">Portfolio Theory</h3>
<p>The historical evidence (see e.g. the excellent book <a href="https://www.amazon.co.uk/Expected-Returns-Investors-Harvesting-Rewards/dp/1119990726">Expected Returns</a>) is that the returns on bonds and equities are uncorrelated, with bonds having lower volatility (aka standard deviation) than equities – US treasuries experienced an annualized volatility of 4.7% a year between 1990 and 2009, while US equities had a volatility of 15.5% over the same period.</p>
<p>Given their lower volatility, bonds are clearly less risky than equities. However, you would hope that if you invest in equities rather than bonds, then your willingness to accept the inherently higher risks is compensated for by a higher expected return. This idea can be captured mathematically as the <a href="https://en.wikipedia.org/wiki/Sharpe_ratio">Sharpe ratio</a>, which measures the reward you receive per unit risk taken. Specifically, the Sharpe ratio S is equal to the ratio between the expected “excess return” of the investment, and the standard deviation of those returns. The excess return is defined as the amount of the expected return R above a risk-free rate Rf (e.g. the rate of return you can get by lending overnight in the money markets). Putting it all together we get this formula for S:</p>
<p align="center">
<img src="/2017/04/sharpe-ratio.svg" alt="Thanks Wikipedia :)" width="200px" />
</p>
<p>All other things being equal, you probably want to invest in assets with as high a Sharpe ratio as possible.</p>
<p>It can be tricky to figure out what the Sharpe ratio is for investments of interest, but history suggests that stocks and bonds both have similar Sharpe ratios of roughly 0.3. What’s more, the correlation between their returns is close to 0. This last fact is important because <a href="https://en.wikipedia.org/wiki/Modern_portfolio_theory">portfolio theory</a> shows that you can form an investment with high Sharpe ratio by holding a diversified portfolio of two or more uncorrelated assets with low Sharpe ratios. Assuming that historical returns, volatilities and correlations are good guides to the future, the portfolio of stocks and bonds with highest Sharpe is one that holds roughly $3 of bonds for every $1 of stocks: this ratio comes about because the volatility of stocks is approximately 3 times that of bonds. This optimal portfolio has a Sharpe ratio of about 0.42, i.e. 1.41 (= sqrt(2)) times that of either asset class by itself.</p>
<p>You can get a feel for how this optimization process works by playing with my <a href="http://www.omega-prime.co.uk/capm/">online portfolio theory tool</a>.</p>
<h3 id="leverage">Leverage</h3>
<p>What’s striking about this optimal portfolio is that it’s very different from the normal advice: it puts 75% of capital into bonds, much more than even the most conservative conventional advice of a 40% allocation. The obvious objection to the bond-heavy optimal portfolio is of course that it will have very low expected return compared to one with a bigger weighting on equities. This is absolutely true: a naive weighted average of the volatilities would be 0.75*(bond volatility) + 0.25*(stock volatility) = 0.75*4.7% + 0.25*15.5% = 7.4%, and because the two assets are roughly uncorrelated the volatility of the optimal portfolio is actually even lower than that, at sqrt((0.75*4.7%)^2 + (0.25*15.5%)^2) = 5.2%. Because this is only about a third of the volatility of equities by themselves, we’d expect the excess return on the portfolio to only be about (0.42/0.3)/3 = 1.41/3 = 0.47 times that of a pure-equity allocation, which definitely sounds like bad news for the optimal portfolio.</p>
<p>However, this is a solvable problem – to recover the high expected returns we desire, we simply have to borrow money to invest greater notional amounts into the portfolio. If we borrow, then using roughly 3x leverage (i.e. borrowing so as to invest $3 for every $1 of capital we actually control) would scale the volatility of our portfolio up from 5.2% to the equity-like level of about 15.5%. If we assume that we could borrow at the risk-free rate, then because our portfolio has a Sharpe ratio higher than that of plain equities, the expected excess return in this scenario would be 41% higher than that of equities alone. So we earn better returns than with a pure-equity play even though we are running similar risks.</p>
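<p>Here is a small sketch of those calculations, assuming (as above) zero correlation between the two asset classes, equal Sharpe ratios of 0.3, and the historical volatility estimates of 4.7% and 15.5%:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">// Two-asset sketch: inverse-volatility weights, the diversified portfolio volatility
// and Sharpe ratio under zero correlation, and the leverage needed to bring the
// portfolio back up to equity-like risk. Inputs are the historical estimates above.
class TwoAssetPortfolioSketch {
    public static void main(String[] args) {
        double bondVol = 0.047, stockVol = 0.155, sharpe = 0.30;

        // Inverse-volatility weights: optimal for equal-Sharpe, uncorrelated assets.
        double wBond = (1 / bondVol) / (1 / bondVol + 1 / stockVol);  // ~0.77
        double wStock = 1 - wBond;                                    // ~0.23

        double excessReturn = sharpe * (wBond * bondVol + wStock * stockVol);
        double portVol = Math.sqrt(Math.pow(wBond * bondVol, 2)
                                 + Math.pow(wStock * stockVol, 2));   // ~5.2%
        double portSharpe = excessReturn / portVol;                   // ~0.42

        double leverage = stockVol / portVol;                         // ~3x to match equity risk
        System.out.printf("weights %.2f/%.2f, vol %.3f, Sharpe %.2f, leverage %.1fx%n",
                wBond, wStock, portVol, portSharpe, leverage);
        // Levered excess return relative to a pure-equity allocation: ~1.41
        System.out.println(leverage * excessReturn / (sharpe * stockVol));
    }
}</code></pre></figure>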
<p>Everyone has probably been told at some point that diversification is good, but the way it is usually explained is by saying that diversification reduces your risk, which sounds worthy but sort of boring ☺. When you realise that this risk reduction means that you free up some “risk budget” which you can use to achieve extra returns via leverage, diversification starts sounding more exciting!</p>
<p>Of course, in reality we are unlikely to be able to borrow at the risk-free rate, but there are ways to borrow that are only slightly more expensive than this – depending on the currency, companies and investment funds regularly borrow at rates as low as 0.5% above the risk-free rate. Even as an individual investor, there are ways in which you can borrow quite cost-effectively – this is a topic I will cover in a future post. The cost of leverage is of course a constraint that we should bear in mind, though: the fact that we face borrowing costs rules out activities such as levering up short term bonds (e.g. US treasury bills) which have very low volatility (because they take on very little interest rate risk).</p>
<h3 id="risk-parity">Risk Parity</h3>
<p>The approach to investing outlined above can be roughly summarized as:</p>
<ol>
<li>Assume that all investible assets have the same Sharpe ratio</li>
<li>Therefore decide to allocate to them in an amount inversely proportional to their volatility (following the advice of portfolio theory and mean-variance optimization for Sharpe ratio maximization)</li>
<li>Leverage up the resulting portfolio to achieve a particular desired level of risk</li>
</ol>
<p>This method is also known as “risk parity”. Famously, it’s the strategy used by Bridgewater’s <a href="https://www.bridgewater.com/research-library/the-all-weather-strategy/">All Weather hedge fund</a>, which has achieved a Sharpe ratio of roughly 0.5 since its inception in 1996. (All Weather invests in asset classes other than stocks and bonds, so we would expect it to have a higher Sharpe than our earlier prediction of 0.42, simply due to the extra diversification.)</p>
<p>What’s particularly interesting about risk parity is that it’s not actually immediately obvious that taking bigger risks with your money leads to a sufficient extra level of return to compensate you for those risks. For example, take the case of stocks and bonds. From 1990 to 2009, US equities returned a (geometric) mean of 8.5% per year while treasuries returned 6.8%: so equities did earn a higher return, but one that doesn’t seem commensurate with the 3 times higher volatility experienced. Furthermore, global equities (which had similar volatility to US equities) actually only returned 5.9% i.e. considerably less than US bonds!</p>
<p>The fact that risk-taking is under-compensated is actually a well-known anomaly: an excellent paper on the subject is <a href="http://pages.stern.nyu.edu/~lpederse/papers/BettingAgainstBeta.pdf">Betting Against Beta</a> which suggests the reason may be that many investors are either unwilling or unable to use leverage. Whatever the cause, it’s good news for risk parity, because this means that the low-risk assets you are levering up actually have a higher Sharpe ratio than the high-risk assets that you (relatively speaking) disprefer, so your portfolio’s Sharpe ratio will be even better than you would naively expect. This subject is explored further in <a href="https://www.aqr.com/~/media/files/papers/understanding-risk-parity.pdf">this readable paper</a> on risk parity from AQR.</p>Portfolio mean-variance optimisation in the browser2017-04-04T20:39:37+01:002017-04-04T20:39:37+01:00http://blog.omega-prime.co.uk/2017/04/04/portfolio-mean-variance-optimisation-in-the-browser<p>This is just to announce that I’ve written a small <a href="http://www.omega-prime.co.uk/capm/">tool</a> to visualise the risk/reward tradeoffs associated with investing in a diversified portfolio of risky assets.</p>
<p>Specifically, the tool shows the effects of applying classic <a href="https://en.wikipedia.org/wiki/Modern_portfolio_theory">mean-variance optimisation</a> to maximise the <a href="https://en.wikipedia.org/wiki/Sharpe_ratio">Sharpe ratio</a> of your portfolio.</p>Faster ordered maps for Java2016-12-29T18:22:31+00:002016-12-29T18:22:31+00:00http://blog.omega-prime.co.uk/2016/12/29/faster-ordered-maps-for-java<p>Sorted maps are useful alternatives to standard unordered hashmaps. Not only do they tend to make your programs more deterministic, they also make some kinds of queries very efficient. For example, one thing we frequently want to do at <a href="https://gsacapital.com/">work</a> is find the most recent observation of a sparse timeseries as of a particular time. If the series is represented as an ordered mapping from time to value, then this question is easily answered in log time by a bisection on the mapping.</p>
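<p>To make that use case concrete, here is what the as-of query looks like against the JDK’s built-in ordered map; the same pattern applies to any NavigableMap-style collection (check the btreemap javadoc for its exact API):</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.util.Map;
import java.util.TreeMap;

// As-of lookup on a sparse timeseries stored as an ordered map from time to value.
class AsOfLookup {
    public static void main(String[] args) {
        TreeMap<Long, Double> series = new TreeMap<>();
        series.put(100L, 1.0);
        series.put(200L, 2.0);
        series.put(500L, 3.0);

        // Most recent observation at or before t=350: the entry with the greatest
        // key <= 350, found in logarithmic time.
        Map.Entry<Long, Double> asOf = series.floorEntry(350L);
        System.out.println(asOf.getKey() + " -> " + asOf.getValue()); // 200 -> 2.0
    }
}</code></pre></figure>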
<p>A disadvantage to using ordered maps is that they tend to have higher constant factors than simple unordered ones. Java is no exception to this rule: as we will see below, the standard HashMap outperforms the ordered TreeMap equivalent by two or three times.</p>
<p>My new open source Java library, <a href="https://github.com/batterseapower/btreemap">btreemap</a>, is an attempt to ameliorate these constant factors. As the name suggests, it is based on <a href="https://en.wikipedia.org/wiki/B-tree">B-tree technology</a> rather than the <a href="https://en.wikipedia.org/wiki/Red%E2%80%93black_tree">red-black trees</a> that are used in TreeMap. These “mechanically sympathetic” balanced tree data structures improve cache locality by using tree nodes with high fanout.</p>
<p>My library offers both boxed (<a href="http://static.javadoc.io/uk.co.omega-prime/btreemap/1.1.0/uk/co/omegaprime/btreemap/BTreeMap.html">BTreeMap</a>) and type-specialized (e.g. <a href="http://static.javadoc.io/uk.co.omega-prime/btreemap/1.1.0/uk/co/omegaprime/btreemap/IntIntBTreeMap.html">IntIntBTreeMap</a>) unboxed variants of the core data structure. Benchmarking it against some competing sorted collections (including <a href="http://fastutil.di.unimi.it/docs/it/unimi/dsi/fastutil/ints/Int2IntRBTreeMap.html">fastutil</a> and <a href="http://mapdb.org">MapDB 1</a>) reveals it beats them by a good margin, though it still carries a performance penalty versus the simple HashMap:</p>
<p><img src="/2016/12/implementation.png" alt="" /></p>
<p>So switching from TreeMap to IntIntBTreeMap may be worth a 2x performance increase. Nice!</p>
<p>These benchmarks:</p>
<ul>
<li>Use <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> to ensure e.g. that the JVM is warmed up</li>
<li>Were run on an <code class="highlighter-rouge">int</code> to <code class="highlighter-rouge">int</code> map with 100k keys</li>
<li>Do not include <code class="highlighter-rouge">lowerKey</code> numbers for HashMap or Int2IntRBTreeMap, which do not support the operation</li>
<li>Are in-memory only and do not make use of the persistence features of MapDB</li>
</ul>
<p>Performance benefits depend on the amount of data you are working with. Small working sets may fit into level 1 or 2 cache, so will pay a relatively small penalty for a lack of cache locality. This graph shows how throughput depends on the number of keys in the working set, where keys are distributed between a varying number of fixed-size (100 key) maps. B-trees do not start to show a performance advantage until we reach 10k keys or so:</p>
<p align="center">
<img src="/2016/12/size-small.png" alt="" />
</p>
<p>There are a few interesting things about the implementation:</p>
<ul>
<li>
<p>The obvious way to represent a tree node is as an object with two fields: a fixed size array of children, and a size (number of children that are present). However, in Java this means taking two indirections when you want to access a child (you need to first load the address of the array from the object, then load the child at an offset from that base address). Instead, I define tree nodes as a class with one field for each possible child, and then use <a href="http://mydailyjava.blogspot.co.uk/2013/12/sunmiscunsafe.html">sun.misc.Unsafe</a> for fast random access to these fields. This change made <code class="highlighter-rouge">get</code> about 10% faster in my testing.</p>
</li>
<li>
<p>The internal nodes store the links to their children in sorted order. Therefore, you’d expect binary search to be a good way to find the child associated with a particular key. In practice I found that linear search was 20% or more faster, probably due to better branch prediction.</p>
</li>
<li>
<p>To avoid copy-pasting the code for each unboxed version of the data structure, I had to come up with a horrible templating language partly based on <a href="http://jtwig.org/">JTwig</a>. <a href="http://openjdk.java.net/jeps/169">Value types</a> can’t come fast enough!</p>
</li>
</ul>4 things you didn’t know you could do with Java2016-08-18T19:31:41+01:002016-08-18T19:31:41+01:00http://blog.omega-prime.co.uk/2016/08/18/4-things-you-didnt-know-you-could-do-with-java<p>Java is often described as a simple programming language. While this is arguably true, it has still retained the ability to surprise me after using it full time for years.</p>
<p>This blog post describes four features that are obscure enough that they’ve surprised either me or one of my seasoned colleagues.</p>
<h3 id="abstract-over-thrown-exception-type">Abstract over thrown exception type</h3>
<p>Yes, <code class="highlighter-rouge">throws</code> clauses may contain type variables. This means that this sort of thing is admissable:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">interface</span> <span class="nc">ExceptionalSupplier</span><span class="o"><</span><span class="n">T</span><span class="o">,</span> <span class="n">E</span> <span class="kd">extends</span> <span class="n">Throwable</span><span class="o">></span> <span class="o">{</span>
<span class="n">T</span> <span class="nf">supply</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">E</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">class</span> <span class="nc">FileStreamSupplier</span>
<span class="kd">implements</span> <span class="n">ExceptionalSupplier</span><span class="o"><</span><span class="n">FileInputStream</span><span class="o">,</span> <span class="n">IOException</span><span class="o">></span> <span class="o">{</span>
<span class="nd">@Override</span>
<span class="kd">public</span> <span class="n">FileInputStream</span> <span class="nf">supply</span><span class="o">()</span> <span class="kd">throws</span> <span class="n">IOException</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">FileInputStream</span><span class="o">(</span><span class="k">new</span> <span class="n">File</span><span class="o">(</span><span class="s">"foo.txt"</span><span class="o">));</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>As the example suggests, this pattern is frequently useful if you want to do functional abstraction without having to rethrow exceptions as <code class="highlighter-rouge">RuntimeException</code> everywhere.
The place people are most likely to encounter this for the first time is when looking at the <a href="https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Throwables.html#propagateIfPossible(java.lang.Throwable, java.lang.Class)"><code class="highlighter-rouge">Throwables</code></a> utilities in Guava.</p>
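<p>As a hypothetical illustration of why this is handy (the helper below is made up for this post, it’s not from any library), a generic wrapper can forward whatever checked exception the supplier declares instead of wrapping it in a <code class="highlighter-rouge">RuntimeException</code>:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper: callers see the supplier's own checked exception type.
class ExceptionalSupplierDemo {
    static <T, E extends Throwable> T supplyLogged(ExceptionalSupplier<T, E> supplier) throws E {
        System.out.println("about to call supplier");
        return supplier.supply();  // throws E, not a wrapped RuntimeException
    }

    static FileInputStream openFoo() throws IOException {
        // The IOException from FileStreamSupplier.supply() surfaces with its original type.
        return supplyLogged(new FileStreamSupplier());
    }
}</code></pre></figure>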
<h3 id="intersection-types">Intersection types</h3>
<p>This refers to the ability to write down a type that is the set intersection of two other types – so the type <code class="highlighter-rouge">A & B</code> is only inhabited by values that are instances of both <code class="highlighter-rouge">A</code> and <code class="highlighter-rouge">B</code>. Java lets you use this sort of type within generic bounds, but not anywhere else:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">class</span> <span class="nc">IntersectionType</span> <span class="o">{</span>
<span class="kd">public</span> <span class="o"><</span><span class="n">T</span> <span class="kd">extends</span> <span class="n">List</span> <span class="o">&</span> <span class="n">Iterator</span><span class="o">></span> <span class="kt">void</span> <span class="nf">consume</span><span class="o">(</span><span class="n">T</span> <span class="n">weirdThing</span><span class="o">)</span> <span class="o">{</span>
<span class="n">weirdThing</span><span class="o">.</span><span class="na">iterator</span><span class="o">().</span><span class="na">next</span><span class="o">();</span>
<span class="n">weirdThing</span><span class="o">.</span><span class="na">next</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>One place this comes up in practice is where you need to know that something is both of some useful type and also, for resource management purposes, <code class="highlighter-rouge">Closeable</code>.</p>
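<p>For instance, a hypothetical utility (again, not from any real library) that wants to drain a cursor and then release whatever resource backs it can require both capabilities at once:</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;

// Hypothetical utility: the argument must be both an Iterator and Closeable,
// e.g. a cursor over results held in some external resource.
class CloseableIteration {
    static <T, I extends Iterator<T> & Closeable> void drainAndClose(I cursor) throws IOException {
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
    }
}</code></pre></figure>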
<h3 id="constructor-type-parameters">Constructor type parameters</h3>
<p>Constructors are somewhat analogous to static methods. Did you know that just like static methods, constructors can take type arguments? Observe:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">class</span> <span class="nc">ConstructorTyArgs</span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">List</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">strings</span><span class="o">;</span>
<span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="nf">ConstructorTyArgs</span><span class="o">(</span><span class="n">List</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="n">xs</span><span class="o">,</span> <span class="n">Function</span><span class="o"><</span><span class="n">T</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="n">f</span><span class="o">)</span> <span class="o">{</span>
<span class="n">strings</span> <span class="o">=</span> <span class="n">xs</span><span class="o">.</span><span class="na">stream</span><span class="o">().</span><span class="na">map</span><span class="o">(</span><span class="n">f</span><span class="o">).</span><span class="na">collect</span><span class="o">(</span><span class="n">Collectors</span><span class="o">.</span><span class="na">toList</span><span class="o">());</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">useSite</span><span class="o">()</span> <span class="o">{</span>
<span class="k">new</span> <span class="o"><</span><span class="n">Integer</span><span class="o">></span> <span class="nf">ConstructorTyArgs</span><span class="o">(</span>
<span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">),</span>
<span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="o">.</span><span class="na">toString</span><span class="o">()</span> <span class="o">+</span> <span class="s">"!"</span><span class="o">);</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>This feature is useless enough that I’ve never felt any desire to do this. In fact, I only noticed it when I was reading a formal grammar for Java.</p>
<p>Note that the type parameters you write here are not the same as the type parameters of the enclosing class (if any). This means that unfortunately there is no way to write the static <code class="highlighter-rouge">create</code> method below as a constructor, since it requires refining the bounds on the class type parameters:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">class</span> <span class="nc">Comparablish</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="o">{</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">T</span> <span class="n">value</span><span class="o">;</span>
<span class="kd">private</span> <span class="kd">final</span> <span class="n">Comparator</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="n">comparator</span><span class="o">;</span>
<span class="kd">public</span> <span class="nf">Comparablish</span><span class="o">(</span><span class="n">T</span> <span class="n">value</span><span class="o">,</span> <span class="n">Comparator</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="n">comparator</span><span class="o">)</span> <span class="o">{</span>
<span class="k">this</span><span class="o">.</span><span class="na">value</span> <span class="o">=</span> <span class="n">value</span><span class="o">;</span>
<span class="k">this</span><span class="o">.</span><span class="na">comparator</span> <span class="o">=</span> <span class="n">comparator</span><span class="o">;</span>
<span class="o">}</span>
<span class="kd">static</span> <span class="o"><</span><span class="n">T</span> <span class="kd">extends</span> <span class="n">Comparable</span><span class="o"><</span><span class="n">T</span><span class="o">>></span> <span class="n">Comparablish</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="nf">create</span><span class="o">(</span><span class="n">T</span> <span class="n">value</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="n">Comparablish</span><span class="o"><</span><span class="n">T</span><span class="o">>(</span><span class="n">value</span><span class="o">,</span> <span class="n">Comparator</span><span class="o">.</span><span class="na">naturalOrder</span><span class="o">());</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<h3 id="inline-classes">Inline classes</h3>
<p>I’m sure everyone is aware that you can declare <em>anonymous</em> inner classes within a method body like this:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">class</span> <span class="nc">AnonymousInnerClass</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">method</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">Object</span><span class="o">()</span> <span class="o">{</span>
<span class="kt">int</span> <span class="nf">foo</span><span class="o">()</span> <span class="o">{</span> <span class="k">return</span> <span class="mi">1</span><span class="o">;</span> <span class="o">}</span>
<span class="o">}.</span><span class="na">foo</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>But did you know that you can declare <em>named</em> inner classes too?</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">class</span> <span class="nc">InlineClass</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">method</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">class</span> <span class="nc">MyIterator</span> <span class="kd">implements</span> <span class="n">Iterator</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span> <span class="o">{</span>
<span class="kd">private</span> <span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="nd">@Override</span> <span class="kd">public</span> <span class="n">Integer</span> <span class="nf">next</span><span class="o">()</span> <span class="o">{</span> <span class="k">return</span> <span class="n">i</span><span class="o">++;</span> <span class="o">}</span>
<span class="nd">@Override</span> <span class="kd">public</span> <span class="kt">boolean</span> <span class="nf">hasNext</span><span class="o">()</span> <span class="o">{</span> <span class="k">return</span> <span class="kc">false</span><span class="o">;</span> <span class="o">}</span>
<span class="o">}</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">MyIterator</span><span class="o">().</span><span class="na">next</span><span class="o">()</span> <span class="o">+</span> <span class="k">new</span> <span class="n">MyIterator</span><span class="o">().</span><span class="na">next</span><span class="o">();</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>The inner class (which may not be static) can close over local variables and is subject to the same scoping rules as a variable. In particular, this means that a named inner class can use itself recursively from within its definition, but you can’t declare a mutually recursive group of multiple classes like this:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kt">int</span> <span class="nf">rec</span><span class="o">()</span> <span class="o">{</span>
<span class="kd">class</span> <span class="nc">A</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">f</span><span class="o">()</span> <span class="o">{</span> <span class="k">return</span> <span class="k">new</span> <span class="n">B</span><span class="o">().</span><span class="na">g</span><span class="o">();</span> <span class="o">}</span>
<span class="o">}</span>
<span class="kd">class</span> <span class="nc">B</span> <span class="o">{</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">g</span><span class="o">()</span> <span class="o">{</span> <span class="k">return</span> <span class="k">new</span> <span class="n">A</span><span class="o">().</span><span class="na">f</span><span class="o">();</span> <span class="o">}</span>
<span class="o">}</span>
<span class="k">return</span> <span class="k">new</span> <span class="nf">B</span><span class="o">().</span><span class="na">g</span><span class="o">();</span>
<span class="o">}</span></code></pre></figure>MaxJava is often described as a simple programming language. While this is arguably true, it has still retained the ability to surprise me after using it full time for years.A Cambridge Computer Science degree summarised in 58 crib sheets2016-07-12T21:21:18+01:002016-07-12T21:21:18+01:00http://blog.omega-prime.co.uk/2016/07/12/a-cambridge-computer-science-degree-summarised-in-58-crib-sheets<p>From 2005 to 2008 I was an undergraduate studying <a href="http://www.cl.cam.ac.uk/">Computer Science at Cambridge</a>.
My method of preparing for the exams was to summarise each lecture course into just a few sides of A4, which I’d then commit to memory in their entirety.</p>
<p>To make them shorter and hence easier to memorise, I’d omit all but truly essential information from each crib sheet. For example, I wouldn’t include any formula if it was easily derivable from first principles, and I certainly didn’t waste any words on conceptual explanations. As a consequence, these sheets certainly aren’t the best choice for those learning a subject for the first time, but they might come in handy as a refresher for those with some familiarity with the subject.</p>
<p>So without further ado, here is my summary of a complete Cambridge Computer Science degree in 58 crib sheets:</p>
<table>
<tr>
<td>Advanced System Topics</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/advanced-system-topics.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/advanced-system-topics.lyx">lyx</a></td>
</tr>
<tr>
<td>Algorithms</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/algorithms.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/algorithms.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Algorithms II</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/algorithms-ii.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/algorithms-ii.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Artificial Intelligence I</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/artifical-intelligence-i.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/artifical-intelligence-i.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Bioinformatics</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/bioinformatics.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/bioinformatics.lyx">lyx</a></td>
</tr>
<tr>
<td>Business Studies</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/business-studies.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/business-studies.lyx">lyx</a></td>
</tr>
<tr>
<td>C And C++</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/c-and-c++.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/c-and-c++.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Comparative Architectures</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/comparative-architectures.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/comparative-architectures.lyx">lyx</a></td>
</tr>
<tr>
<td>Compiler Construction</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/compiler-construction.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/compiler-construction.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Computation Theory</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computation-theory.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computation-theory.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Computer Design</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computer-design.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computer-design.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Computer Graphics</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computer-graphics.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computer-graphics.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Computer Systems Modelling</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computer-systems-modelling.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computer-systems-modelling.lyx">lyx</a></td>
</tr>
<tr>
<td>Computer Vision</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computer-vision.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/computer-vision.lyx">lyx</a></td>
</tr>
<tr>
<td>Concepts In Programming Languages</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/concepts-in-programming-languages.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/concepts-in-programming-languages.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Concurrent Systems And Applications</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/concurrent-systems-and-applications.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/concurrent-systems-and-applications.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Databases</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/databases.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/databases.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Denotational Semantics</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/denotational-semantics.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/denotational-semantics.lyx">lyx</a></td>
</tr>
<tr>
<td>Digital Communications</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/digital-communications.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/digital-communications.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Digital Communications II</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/digital-communications-ii.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/digital-communications-ii.lyx">lyx</a></td>
</tr>
<tr>
<td>Digital Electronics</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/digital-electronics.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/digital-electronics.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Digital Signal Processing</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/digital-signal-processing.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/digital-signal-processing.lyx">lyx</a></td>
</tr>
<tr>
<td>Discrete Mathematics I</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/discrete-mathematics-i.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/discrete-mathematics-i.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Discrete Mathematics II</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/discrete-mathematics-ii.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/discrete-mathematics-ii.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Distributed Systems</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/distributed-systems.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/distributed-systems.lyx">lyx</a></td>
</tr>
<tr>
<td>ECAD</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/ecad.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/ecad.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Economics And Law</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/economics-and-law.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/economics-and-law.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Floating Point Computation</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/floating-point-computation.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/floating-point-computation.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Foundations Of Computer Science</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/foundations-of-computer-science.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/foundations-of-computer-science.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Foundations Of Functional Programming</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/foundations-of-functional-programming.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/foundations-of-functional-programming.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Human Computer Interaction</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/human-computer-interaction.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/human-computer-interaction.lyx">lyx</a></td>
</tr>
<tr>
<td>Information Retrieval</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/information-retrieval.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/information-retrieval.lyx">lyx</a></td>
</tr>
<tr>
<td>Information Theory And Coding</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/information-theory-and-coding.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/information-theory-and-coding.lyx">lyx</a></td>
</tr>
<tr>
<td>Introduction To Security</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/introduction-to-security.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/introduction-to-security.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Logic And Proof</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/logic-and-proof.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/logic-and-proof.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Mathematical Methods For CS</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mathematical-methods-for-cs.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mathematical-methods-for-cs.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Mathematics I</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mathematics-i.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mathematics-i.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Mathematics II</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mathematics-ii.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mathematics-ii.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Mathematics III</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mathematics-iii.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mathematics-iii.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Mechanics And Relativity</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mechanics-and-relativity.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/mechanics-and-relativity.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Natural Language Processing</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/natural-language-processing.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/natural-language-processing.lyx">lyx</a></td>
</tr>
<tr>
<td>Operating Systems</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/operating-systems.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/operating-systems.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Optimising Compilers</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/optimising-compilers.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/optimising-compilers.lyx">lyx</a></td>
</tr>
<tr>
<td>Oscillations And Waves</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/oscillations-and-waves.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/oscillations-and-waves.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Probability</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/probability.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/probability.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Professional Practice And Ethics</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/professional-practice-and-ethics.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/professional-practice-and-ethics.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Programming In Java</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/programming-in-java.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/programming-in-java.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Prolog</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/prolog.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/prolog.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Quantum And Statistical Mechanics</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/quantum-and-statistical-mechanics.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/quantum-and-statistical-mechanics.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Regular Languages And Finite Automata</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/regular-languages-and-finite-automata.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/regular-languages-and-finite-automata.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Semantics Of Programming Languages</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/semantics-of-programming-languages.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/semantics-of-programming-languages.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Software Design</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/software-design.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/software-design.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Software Engineering</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/software-engineering.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/software-engineering.doc">doc</a></td>
<td></td>
</tr>
<tr>
<td>Specification And Verification I</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/specification-and-verification-i.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/specification-and-verification-i.lyx">lyx</a></td>
</tr>
<tr>
<td>Specification And Verification II</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/specification-and-verification-ii.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/specification-and-verification-ii.lyx">lyx</a></td>
</tr>
<tr>
<td>Topics In Concurrency</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/topics-in-concurrency.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/topics-in-concurrency.lyx">lyx</a></td>
</tr>
<tr>
<td>Types</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/types.pdf">pdf</a></td>
<td></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/types.lyx">lyx</a></td>
</tr>
<tr>
<td>VLSI Design</td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/vlsi-design.pdf">pdf</a></td>
<td><a href="http://www.omega-prime.co.uk/files/crib-sheets/vlsi-design.docx">doc</a></td>
<td></td>
</tr>
</table>
<p>Because I only created crib sheets for subjects that I thought I might choose to answer questions on during the exam, this list does not cover every available course (though it’s probably at least 70% of them). The other thing to note is that Cambridge requires Computer Science students to take some courses in natural science during their first year: the crib sheets that I’ve included (e.g. “Mechanics And Relativity” and “Oscillations And Waves”) reflect my specialization in physics.</p>MaxFrom 2005 to 2008 I was an undergraduate studying Computer Science at Cambridge. My method of preparing for the exams was to summarise each lecture course into just a few sides of A4, which I’d then commit to memory in their entirety.Datastructures for external memory2016-07-05T06:49:31+01:002016-07-05T06:49:31+01:00http://blog.omega-prime.co.uk/2016/07/05/datastructures-for-external-memory<p>Something I recently became interested in is map data structures for
external memory — i.e. ways of storing indexed data that are optimized
for storage on disk.</p>
<p>In a typical analysis of algorithm time complexity, you assume it takes
constant time to access memory or perform a basic CPU operation such as
addition. This is of course not wholly accurate: in particular, cache
effects mean that memory access time varies wildly depending on what
exact address you are querying. In a system where your algorithm may
access external memory, this becomes even more true — a CPU that takes
1ns to perform an addition may easily find itself waiting 5ms (i.e. 5
million ns) for a read from a spinning disk to complete.</p>
<p>An alternative model of complexity is the Disk Access Machine (DAM). In
this model, reading one <em>block</em> of memory (of fixed size <code class="highlighter-rouge">B</code>) has
constant time cost, and <strong>all</strong> other operations are free. Just like its
conventional cousin this is clearly a simplification of reality, but
it’s one that lets us succinctly quantify the disk usage of various data
structures.</p>
<p>At the time of writing, this is the performance we can expect from the
storage hierarchy:</p>
<table>
<tr>
<th>Category</th>
<th>Representative device</th>
<th>Sequential Read Bandwidth</th>
<th>Sequential Write Bandwidth</th>
<th>4KB Read IOPS</th>
<th>4KB Write IOPS</th>
</tr>
<tr>
<td>Mechanical disk</td>
<td><a href="http://www.tomshardware.com/charts/hdd-charts-2013/-04-Write-Throughput-Average-h2benchw-3.16,Marque_fbrandx46,2904.html">Western Digital Black WD4001FAEX (4TB)</a></td>
<td>130MB/s</td>
<td>130MB/s</td>
<td>110</td>
<td>150</td>
</tr>
<tr>
<td>SATA-attached SSD</td>
<td><a href="http://www.tomshardware.com/reviews/samsung-850-pro-ssd-performance,3861.html">Samsung 850 Pro (1TB)</a></td>
<td>550MB/s</td>
<td>520MB/s</td>
<td>10,000</td>
<td>36,000</td>
</tr>
<tr>
<td>PCIe-attached SSD</td>
<td><a href="http://www.tomshardware.com/reviews/intel-750-series-ssd,4096.html">Intel 750 (1.2TB)</a></td>
<td>2,400MB/s</td>
<td>1,200MB/s</td>
<td>440,000</td>
<td>290,000</td>
</tr>
<tr>
<td>Main memory</td>
<td><a href="http://www.techspot.com/news/62129-ddr3-vs-ddr4-raw-bandwidth-numbers.html">Skylake @ 3200MHz</a></td>
<td>42,000MB/s</td>
<td>48,000MB/s</td>
<td colspan="2">16,100,000 (<a href="http://www.7-cpu.com/cpu/Skylake.html">62ns/operation</a>)</td> <!-- FIXME: implies read bandwidth higher than sequential? 100ns more reasonable? -->
</tr>
</table>
<p>(In the above table, all IOPS figures are reported assuming a queue
depth of 1, so will tend to be worst case numbers for the SSDs.)</p>
<p>Observe that the implied bandwidth of random reads from a mechanical
disk is (110 * 4KB/s) i.e. 440KB/s — approximately 300 times slower
than the sequential read case. In contrast, random read bandwidth from a
PCIe-attached SSD is (440,000 * 4KB/s) = 1.76GB/s i.e. only about 1.4
times slower than the sequential case. So you still pay a penalty for
random access even on SSDs, but it’s much lower than the equivalent cost
on spinning disks.</p>
<p>One way to think about the IOPS numbers above is to break them down into
that part of the IOPS that we can attribute to the time necessary to
transfer the 4KB block (i.e. <code class="highlighter-rouge">4KB/Bandwidth</code>) and whatever is left,
which we can call the seek time (i.e. <code class="highlighter-rouge">(1/IOPS) - (4KB/Bandwidth)</code>):</p>
<table>
<tr>
<th>Category</th>
<th>Implied Seek Time From Read</th>
<th>Implied Seek Time From Write</th>
<th>Mean Implied Seek Time</th>
</tr>
<tr>
<td>Mechanical Disk</td>
<td>9.06ms</td>
<td>6.63ms</td>
<td>7.85ms</td>
</tr>
<tr>
<td>SATA-attached SSD</td>
<td>92.8us</td>
<td>20.2us</td>
<td>56.5us</td>
</tr>
<tr>
<td>PCIe-attached SSD</td>
<td>645ns</td>
<td>193ns</td>
<td>419ns</td>
</tr>
</table>
<p>If we are using the DAM to model programs running on top of one of these
storage mechanisms, which block size <code class="highlighter-rouge">B</code> should we choose such that
algorithm costs derived from the DAM are a good guide to real-world time
costs? Let’s say that our DAM cost for some algorithm is <code class="highlighter-rouge">N</code> block
reads. Consider two scenarios:</p>
<ul>
<li>If these reads are all contiguous, then the true time cost (in
seconds) of the reads will be <code class="highlighter-rouge">N*(B/Bandwidth) + Seek Time</code></li>
<li>If they are all random, then the true time cost is
<code class="highlighter-rouge">N*((B/Bandwidth) + Seek Time)</code>, i.e. <code class="highlighter-rouge">(N - 1)*Seek Time</code> more than
the sequential case</li>
</ul>
<p>The fact that the same DAM cost can correspond to two very different
true time costs suggests that we should try to choose a block size
that minimises the difference between the two possible true costs. With
this in mind, a sensible choice is to set <code class="highlighter-rouge">B</code> equal to the product of
the seek time and the bandwidth of the device. If we do this, then in the
random-access scenario (where the DAM most underestimates the cost):</p>
<ul>
<li>Realized IOPS will be at least half of peak IOPS for the storage
device.</li>
<li>Realized bandwidth will be at least half of peak bandwidth for the
storage device.</li>
</ul>
<p>If we choose <code class="highlighter-rouge">B</code> smaller than the bandwidth/seek time product then we’ll
get IOPS closer to device maximum, but only at the cost of worse
bandwidth. Likewise, larger blocks than this will reduce IOPS but boost
bandwidth. The proposed choice of <code class="highlighter-rouge">B</code> penalises both IOPS and bandwidth
equally. Applying this idea to the storage devices above:</p>
<table>
<tr>
<th>Category</th>
<th>Implied Block Size From Read</th>
<th>Implied Block Size From Write</th>
<th>Mean Implied Block Size</th>
</tr>
<tr>
<td>Mechanical Disk</td>
<td>1210KB</td>
<td>883KB</td>
<td>1040KB</td>
</tr>
<tr>
<td>SATA-attached SSD</td>
<td>52.3KB</td>
<td>10.8KB</td>
<td>31.6KB</td>
</tr>
<tr>
<td>PCIe-attached SSD</td>
<td>1.59KB</td>
<td>243B</td>
<td>933B</td>
</tr>
</table>
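<p>As a quick sanity check, here’s a small Python sketch of the back-of-the-envelope calculation behind these two tables, using the mechanical disk figures from above (and assuming that the quoted 130MB/s means binary megabytes):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"># Figures for the mechanical disk row of the first table
BANDWIDTH = 130 * 1024 * 1024   # bytes/s (sequential read)
READ_IOPS = 110                 # 4KB random reads per second
IO_BYTES  = 4 * 1024

transfer_time = IO_BYTES / BANDWIDTH            # time spent actually moving the 4KB
seek_time     = 1 / READ_IOPS - transfer_time   # whatever is left we call "seek"
block_size    = seek_time * BANDWIDTH           # the proposed choice of B

print(seek_time * 1e3)      # ~9.06ms, matching "Implied Seek Time From Read"
print(block_size / 1024)    # ~1.2MB, in line with the 1210KB in the table above</code></pre></figure>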
<p>On SSDs the smallest writable/readable unit of storage is the <em>page</em>. On
current generation devices, a page tends to be around <a href="http://codecapsule.com/2014/02/12/coding-for-ssds-part-2-architecture-of-an-ssd-and-benchmarking/">8KB in
size</a>.
It’s gratifying to see that this is within an order of magnitude of our
SSD block size estimates here.</p>
<p>Interestingly, the suggested block sizes for mechanical disks are much
larger than the typical block sizes used in operating systems and
databases, where 4KB virtual memory/database pages are common (and
certainly much larger than the 512B sector size of most spinning disks).
I am of course <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.219.7269&rep=rep1&type=pdf">not the
first</a>
to observe that typical database page sizes appear to be far too small.</p>
<h2 id="applying-the-dam">Applying the DAM</h2>
<p>Now we’ve decided how we can apply the DAM to estimate disk costs that
will translate (at least roughly) to real-world costs, we can actually
apply the model to the analysis of some algorithms. Before we begin,
some interesting features of the DAM:</p>
<ul>
<li>Binary search is not optimal. Binary-searching <code class="highlighter-rouge">N</code> items takes
<code class="highlighter-rouge">O(log (N/B))</code> block reads, but <code>O(log<sub>B</sub> N)</code> search is possible with
other algorithms.</li>
<li>Sorting by inserting items one at a time into a B-tree and then
traversing the tree is not optimal. The proposed approach takes
<code>O(N log<sub>B</sub> N)</code> but it’s possible to sort in <code class="highlighter-rouge">O((N/B) * log (N/B))</code>.</li>
<li>Unlike with the standard cost model, many map data structures have
different costs for lookup and insertion in the DAM, which means
that e.g. adding <code class="highlighter-rouge">UNIQUE</code> constraints to database indexes can
actually change the complexity of inserting into the index (since
you have to do lookup in such an index before you know whether an
insert should succeed).</li>
</ul>
<p>Now let’s cover a few map data structures. We’ll see that the maps that
do well in the DAM model will be those that are best able to
sequentialize their access patterns to exploit the block structure of
memory.</p>
<h2 id="2-3-tree">2-3 Tree</h2>
<p>The <a href="https://en.wikipedia.org/wiki/2–3_tree">2-3 tree</a> is a balanced
tree structure where every leaf node is at the same depth, and all
internal nodes have either 1 or 2 keys — and therefore have either 2 or
3 children. Leaf nodes have either 1 or 2 key/value pairs.</p>
<p>Lookup in this tree is entirely straightforward and has complexity
<code class="highlighter-rouge">O(log N)</code>. Insertion into the tree proceeds recursively starting from
the root node:</p>
<ol>
<li>If inserting into a leaf, we add the data item to the leaf. Note
that this may mean that the leaf temporarily contains 3 key/value
pairs, which is more than the usual limit.</li>
<li>If inserting into an internal node, we recursively add the data item
to the appropriate child. After doing this, the child may contain 3
keys, in which case we pull one up to this node, creating a new
sibling in the process. If this node already contained 2 keys this
will in turn cause it to become oversized. An example of how this
might look is:
<a href="/2016/07/23-internal.png"><img src="/2016/07/23-internal.png" alt="" /></a></li>
<li>If, after the recursion completes, the root node contains 3 keys,
then we pull a new root node (with one key) out of the old root,
like so:
<a href="/2016/07/23-root.png"><img src="/2016/07/23-root.png" alt="" /></a></li>
</ol>
<p>It’s easy to see that this keeps the tree balanced. This insertion
process also clearly has <code class="highlighter-rouge">O(log N)</code> time complexity, just like lookup.
The data structure makes no attempt to exploit the fact that memory is
block structured, so both insertion and lookup have identical complexity
in the DAM and the standard cost model.</p>
<h2 id="b-tree">B-Tree</h2>
<p>The <a href="https://en.wikipedia.org/wiki/B-tree">B-tree</a> (and the very closely
related <a href="https://en.wikipedia.org/wiki/B%2B_tree">B+tree</a>) is probably
the most popular structure for external memory storage. It can be seen
as a simple generalisation of the 2-3 tree where, instead of each
internal node having 1 or 2 keys, it instead has between <code class="highlighter-rouge">m</code> and <code class="highlighter-rouge">2m</code>
keys for any <code class="highlighter-rouge">m > 0</code>. We then set <code class="highlighter-rouge">m</code> to the maximum value so that one
internal node fits exactly within our block size <code class="highlighter-rouge">B</code>, i.e. <code class="highlighter-rouge">m = O(B)</code>.</p>
<p>In the DAM cost model, lookup in a B-tree has time complexity
<code>O(log<sub>B</sub> N)</code>. This is because we can access each internal node’s set of
at least <code class="highlighter-rouge">m</code> keys using a single block read — i.e. in <code class="highlighter-rouge">O(1)</code> — and this
lets us make a choice between at least <code class="highlighter-rouge">m+1 = O(B)</code> child nodes.</p>
<p>For similar reasons to the lookup case, inserting into a B-tree also has
time cost <code>O(log<sub>B</sub> N)</code> in the DAM.</p>
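<p>To get a feel for what <code class="highlighter-rouge">m = O(B)</code> buys you, here’s a rough back-of-the-envelope sketch in Python. The 32KB block and 16 byte key sizes are just illustrative choices (the same ones I use in the experiments at the end of this post):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import math

BLOCK_BYTES, KEY_BYTES = 32 * 1024, 16

branching = BLOCK_BYTES // KEY_BYTES     # each block read exposes O(B) children
n = 10**9                                # a billion keys

btree_depth  = math.ceil(math.log(n, branching))    # O(log_B N) block reads
binary_depth = math.ceil(math.log2(n / branching))  # O(log (N/B)) for binary search

print(branching, btree_depth, binary_depth)         # 2048 children; 3 vs 19 reads</code></pre></figure>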
<h2 id="buffered-repository-tree">Buffered Repository Tree</h2>
<p>A <a href="http://www.daimi.au.dk/~large/ioS06/BGVW.pdf">buffered repository
tree</a>, or BRT, is a
generalization of a 2-3 tree where each internal node is associated with
an additional <em>buffer</em> of size <code class="highlighter-rouge">k = O(B)</code>. When choosing <code class="highlighter-rouge">k</code> a sensible
choice is to make it just large enough to use all the space within a
block that is not occupied by the keys of the internal node.</p>
<p>When inserting into this tree, we do not actually modify the tree
structure immediately. Instead, a record of the insert just gets
appended to the root node’s buffer until that buffer becomes full. Once
it is full, we’re sure to be able to spill at least <code class="highlighter-rouge">k/3</code> insertions to
one child node. These inserts will be buffered at the lower level in
turn, and may trigger recursive spills to yet-deeper levels.</p>
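<p>To make the buffering and spilling mechanics concrete, here’s a deliberately simplified Python sketch. This is not the real structure (it uses a fixed two-level binary tree and never splits or rebalances nodes), but it shows how writes accumulate in a buffer and then move down in batches:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">K = 4   # buffer capacity in messages (O(B) in the real structure)

class Leaf:
    def __init__(self):
        self.items = {}
    def insert_all(self, msgs):
        self.items.update(msgs)

class Internal:
    def __init__(self, pivot, left, right):
        self.pivot, self.left, self.right = pivot, left, right
        self.buffer = {}
    def insert_all(self, msgs):
        self.buffer.update(msgs)
        if len(self.buffer) < K:
            return                       # cheap case: just append to the buffer
        # Buffer is full: spill the messages destined for whichever child has
        # the most of them. (With the 2 or 3 children of the real structure,
        # this guarantees a batch of at least k/3 moves down in one go.)
        left_msgs  = {k: v for k, v in self.buffer.items() if k < self.pivot}
        right_msgs = {k: v for k, v in self.buffer.items() if k >= self.pivot}
        child, batch = max([(self.left, left_msgs), (self.right, right_msgs)],
                           key=lambda pair: len(pair[1]))
        child.insert_all(batch)
        for k in batch:
            del self.buffer[k]

root = Internal(100, Leaf(), Leaf())
for i in range(10):
    root.insert_all({i * 30: str(i)})    # most inserts never touch the leaves</code></pre></figure>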
<p>What is the time complexity of insertion? Some insertions will be very
fast because they just append to the buffer, while others will involve
extensive spilling. To smooth over these differences, we therefore
consider the amortized cost of an insertion. If we insert <code class="highlighter-rouge">N</code> elements
into the tree, then at each of the <code class="highlighter-rouge">O(log (N/B))</code> levels of the tree
we’ll spill at most <code class="highlighter-rouge">O(N/(k/3)) = O(N/B)</code> times. This gives a total cost
for the insertions of <code class="highlighter-rouge">O((N/B) log (N/B))</code>, which is an amortized cost
of <code class="highlighter-rouge">O((log (N/B))/B)</code>.</p>
<p>Lookup proceeds pretty much as normal, except that the buffer at each
level must be searched before any child nodes are considered. In the
DAM, this additional search has cost <code class="highlighter-rouge">O(1)</code>, so lookup cost becomes
<code class="highlighter-rouge">O(log (N/B))</code>.</p>
<p>Essentially what we’ve done with this structure is greatly sped up the
insertion rate by exploiting the fact that the DAM lets us batch up
writes into groups of size <code class="highlighter-rouge">O(B)</code> for free. This is our first example of
a structure whose insertion cost is <em>lower</em> than its lookup cost.</p>
<h2 id="b-ε-tree">B-ε Tree</h2>
<p>It turns out that it’s possible to see the B-tree and the BRT as the two
most extreme examples of a whole family of data structures.
Specifically, both the B-tree and the BRT are instances of a more
general notion called a
<a href="http://www.cs.au.dk/~gerth/papers/alcomft-tr-03-75.pdf">B-ε</a> tree,
where ε is a real variable ranging between 0 and 1.</p>
<p>A B-ε tree is a generalisation of a 2-3 tree where each internal node
has between <code class="highlighter-rouge">m</code> and <code class="highlighter-rouge">2m</code> keys, where <code>0 < m = O(B<sup>ε</sup>)</code>. Each node is also
accompanied by a buffer of size <code class="highlighter-rouge">k = O(B)</code>. This buffer space is used to
queue pending inserts, just like in the BRT.</p>
<p>One possible implementation strategy is to set <code class="highlighter-rouge">m</code> so that one block is
entirely full with keys when <code class="highlighter-rouge">ε = 1</code>, and so that <code class="highlighter-rouge">m = 2</code> when <code class="highlighter-rouge">ε = 0</code>.
The <code class="highlighter-rouge">k</code> value can then be chosen to exactly occupy any space within the
block that is not being used for keys (so in particular, if <code class="highlighter-rouge">ε = 1</code> then
<code class="highlighter-rouge">k = 0</code>). With these definitions it’s clear that the <code class="highlighter-rouge">ε = 1</code> case
corresponds to a B-tree and <code class="highlighter-rouge">ε = 0</code> gives you a BRT.</p>
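<p>Here’s a small Python sketch of that parameter choice (the 32KB block and 16 byte keys are again just illustrative assumptions):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">BLOCK_BYTES, KEY_BYTES = 32 * 1024, 16

def node_parameters(epsilon):
    # A node holds at most 2m keys, so size m against half a block of keys:
    # epsilon = 1 fills the block with keys, epsilon = 0 degenerates to m = 2.
    max_keys = BLOCK_BYTES // (2 * KEY_BYTES)
    m = max(2, int(max_keys ** epsilon))            # m = O(B^epsilon)
    buffer_bytes = BLOCK_BYTES - 2 * m * KEY_BYTES  # k: whatever space is left
    return m, buffer_bytes

print(node_parameters(1.0))   # B-tree:       (1024, 0), all keys and no buffer
print(node_parameters(0.0))   # BRT:          (2, 32704), 2-3 tree plus a big buffer
print(node_parameters(0.5))   # fractal tree: (32, 31744)</code></pre></figure>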
<p>As you would expect, the B-ε insertion algorithm operates in essentially
the same manner as described above for the BRT. To derive the time
complexity of insertion, we once again look at the amortized cost.
Observe that the structure will have
<code>O(log<sub>B<sup>ε</sup></sub> (N/B)) = O((log<sub>B</sub> (N/B))/ε) = O((log<sub>B</sub> N)/ε)</code> levels and that on
each spill we’ll be able to push down at least <code>O(B<sup>1-ε</sup>)</code> elements to a
child. This means that after inserting <code class="highlighter-rouge">N</code> elements into the tree, we’ll
spill at most <code>O(N/(B<sup>1-ε</sup>)) = O(N*B<sup>ε-1</sup>)</code> times. This gives a total cost
for the insertions of <code>O(N*B<sup>ε-1</sup>*(log<sub>B</sub> N)/ε)</code>, which is an amortized cost
of <code>O((Bε-1/ε)*log<sub>B</sub> N)</code>.</p>
<p>The time complexity of lookups is just the number of levels in the tree
i.e. <code>O((log<sub>B</sub> N)/ε)</code>.</p>
<h2 id="fractal-tree">Fractal Tree</h2>
<p>These complexity results for the B-ε tree suggest a tantalising
possibility: if we set <code class="highlighter-rouge">ε = ½</code> we’ll have a data structure whose
asymptotic insert time will be strictly better (by a factor of <code class="highlighter-rouge">sqrt B</code>)
than that of B-trees, but which has exactly the same asymptotic lookup
time. This data structure is given the exotic name of a <a href="https://en.wikipedia.org/wiki/Fractal_tree_index">“fractal
tree”</a>. Unfortunately,
the idea is <a href="http://www.google.com/patents/US8489638">patented</a>
<a href="http://www.google.co.uk/patents/US8185551">by</a> the founders of Tokutek
(now
<a href="https://www.percona.com/about-percona/newsroom/press-releases/percona-acquires-tokutek">Percona</a>),
so they’re only used commercially in Percona products like TokuDB. If
you want to read more about what you are missing out on, there’s a good
article on the <a href="https://www.percona.com/blog/2013/07/02/tokumx-fractal-treer-indexes-what-are-they/">company
blog</a>
and a
<a href="http://insideanalysis.com/wp-content/uploads/2014/08/Tokutek_lsm-vs-fractal.pdf">whitepaper</a>.</p>
<h2 id="log-structured-merge-tree">Log-Structured Merge Tree</h2>
<p>The final data structure we’ll consider, the <a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree">log-structured merge
tree</a> (LSMT)
rivals the popularity of the venerable B-tree and is the technology
underlying most “NoSQL” stores.</p>
<p>In a LSMT, you maintain your data in a list of B-trees of varying sizes.
Lookups are accomplished by checking each B-tree in turn. To avoid
lookups having to check too many B-trees, we arrange that we never have
too many small B-trees in the collection.</p>
<p>There are two classes of LSMT that fit this general scheme:
<strong>size-tiered</strong> and <strong>levelled</strong>.</p>
<p>In a <strong>levelled</strong> LSMT, your collection is a list of B-trees of size at
most <code class="highlighter-rouge">O(B)</code>, <code class="highlighter-rouge">O(B*k)</code>, <code class="highlighter-rouge">O(B*k2)</code>, <code class="highlighter-rouge">O(B*k3)</code>, etc for some growth factor
<code class="highlighter-rouge">k</code>. Call these level 0, 1, 2 and so on. New items are inserted into
the level 0 tree. When this tree exceeds its size bound, it is merged into
the level 1 tree, which may trigger recursive merges in turn.</p>
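<p>A toy model of the levelled scheme, with sorted Python lists standing in for the B-trees (and tiny, purely illustrative values of <code class="highlighter-rouge">B</code> and <code class="highlighter-rouge">k</code>):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">B, K = 4, 2          # level i holds one "tree" of at most B * K**i items
levels = []

def insert(key, value):
    run = [(key, value)]
    for i in range(len(levels) + 1):
        if i == len(levels):
            levels.append([])
        merged = sorted(levels[i] + run)    # stand-in for merging two B-trees
        if len(merged) <= B * K**i:
            levels[i] = merged
            return
        # Level i is over its size bound: empty it and merge the whole lot
        # into level i+1, which may overflow in turn.
        levels[i] = []
        run = merged

def lookup(key):
    for level in levels:                    # check each tree in turn, smallest first
        for k, v in level:
            if k == key:
                return v
    return None

for i in range(20):
    insert(i, str(i))</code></pre></figure>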
<p>Observe that if we insert <code class="highlighter-rouge">N</code> items into a levelled LSMT, there will be
<code class="highlighter-rouge">O(logk (N/B))</code> B-trees and the last one will have <code class="highlighter-rouge">O(N/B)</code> items in it.
Therefore lookup has complexity <code>O(log<sub>B</sub> N * log<sub>k</sub> (N/B))</code>. To derive the
update cost, observe that the items in the last level have been merged
down the full <code>O(log<sub>k</sub> (N/B))</code> levels, and they will have been merged
into on average <code class="highlighter-rouge">O(k)</code> times in each level before moving down to the
next. Therefore the amortized insertion cost is
<code>O((k * log<sub>k</sub> (N/B)) / B)</code>.</p>
<p>If we set <code>k = B<sup>½</sup></code> (i.e. <code class="highlighter-rouge">k = sqrt B</code>) then lookup and insert complexity simplify to
<code>O((log<sub>B</sub> N)<sup>2</sup>)</code> and <code>O(log<sub>B</sub> N / sqrt B)</code> respectively.</p>
<p>In a <strong>size-tiered</strong> LSMT things are slightly different. In this scheme
we have a staging buffer of size <code class="highlighter-rouge">O(B)</code> and more than one tree at each
level: specifically, at level <code class="highlighter-rouge">i >= 0</code>, we have up to <code class="highlighter-rouge">k</code> B-trees of
size exactly <code>O(B*k<sup>i</sup>)</code>. New items are inserted into the staging buffer.
When it runs out of space, we turn it into a B-tree and insert it into
level 0. If this would cause us to have more than <code class="highlighter-rouge">k</code> trees in the level, we
merge the <code class="highlighter-rouge">k</code> trees together into one tree of size <code class="highlighter-rouge">O(B*k)</code> that we can
try to insert into level 1, which may in turn trigger recursive merges.</p>
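<p>The size-tiered variant, in the same toy style (again with small illustrative values of <code class="highlighter-rouge">B</code> and <code class="highlighter-rouge">k</code>):</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">B, K = 4, 3          # level i holds up to K runs of roughly B * K**i items each
staging = []
levels = []

def insert(key, value):
    staging.append((key, value))
    if len(staging) < B:
        return
    run = sorted(staging)                   # turn the staging buffer into a "B-tree"
    staging.clear()
    i = 0
    while True:
        if i == len(levels):
            levels.append([])
        levels[i].append(run)
        if len(levels[i]) <= K:
            return
        # Too many runs at this level: merge them into one bigger run and
        # carry it down to the next level, which may overflow in turn.
        run = sorted(x for r in levels[i] for x in r)
        levels[i] = []
        i += 1</code></pre></figure>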
<p>The complexity arguments we made for levelled LSMT carry over almost
unchanged into this new setting, showing that the two schemes have
identical costs. LSMTs match the insert performance of fractal trees,
but suffer the cost of an extra log factor when doing lookup. To try to
improve lookup time, in practice most LSMT implementations store each
B-tree along with a <a href="https://en.wikipedia.org/wiki/Bloom_filter">Bloom
filter</a> which allows them to
avoid accessing a tree entirely when a key of interest is certainly not
included in it.</p>
<p>There are several <a href="http://www.cs.umb.edu/~poneil/lsmtree.pdf">good</a>
<a href="http://www.benstopford.com/2015/02/14/log-structured-merge-trees/">overviews</a>
of LSMTs available online.</p>
<h2 id="experiments">Experiments</h2>
<p>To validate my knowledge of these data structures, I wrote a <a href="https://github.com/batterseapower/btree/blob/master/btree.py">Python
program</a>
that tries to perform an apples-to-apples comparison of various B-ε tree
variants. The code implements the data structure and also logs how many
logical blocks it would need to touch if the tree was actually
implemented on a block-structured device (in reality I just represent it
as a Python object). I assume that as many of the nodes towards the top
of the tree as possible are stored in memory and so don’t hit the block
device.</p>
<p>I simulate a machine with 1MB of memory and 32KB pages. Keys are assumed
to be 16 bytes and values 240 bytes. With these assumptions we can see how
the number of block device pages we need to write to varies with the
number of keys in the tree for each data structure:</p>
<p><a href="/2016/07/uncached_writes.png"><img src="/2016/07/uncached_writes.png" alt="uncached_writes" /></a></p>
<p>These experimental results match what we would expect from the
theoretical analysis: the BRT has a considerable advantage over the
alternatives when it comes to writes, B-trees are the worst, and fractal
trees occupy the middle ground.</p>
<p>The equivalent results for reads are as follows:</p>
<p><a href="/2016/07/uncached_reads.png"><img src="/2016/07/uncached_reads.png" alt="uncached_reads" /></a></p>
<p>This is essentially a mirror image of the write results, showing that
we’re fundamentally making a trade-off here.</p>
<h2 id="summary">Summary</h2>
<p>We can condense everything we’ve learnt above into the following table:</p>
<table>
<thead>
<tr>
<th>Structure</th>
<th>Lookup</th>
<th>Insert</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://en.wikipedia.org/wiki/2–3_tree">2-3 Tree</a></td>
<td bgcolor="#ffaaaa"><tt>O(log N)</tt></td>
<td bgcolor="#ffaaaa"><tt>O(log N)</tt></td>
</tr>
<tr>
<td><a href="http://www.cs.au.dk/~gerth/papers/alcomft-tr-03-75.pdf">B-ε Tree</a></td>
<td><tt>O((log<sub>B</sub> N)/ε)</tt></td>
<td><tt>O((B<sup>ε-1</sup>/ε)*log<sub>B</sub> N)</tt></td>
</tr>
<tr>
<td><a href="https://en.wikipedia.org/wiki/B-tree">B-Tree</a> (<tt>ε=1</tt>)</td>
<td bgcolor="#aaffaa"><tt>O(log<sub>B</sub> N)</tt></td>
<td bgcolor="#ffdddd"><tt>O(log<sub>B</sub> N)</tt></td>
</tr>
<tr>
<td><a href="https://en.wikipedia.org/wiki/Fractal_tree_index">Fractal Tree</a> (<tt>ε=½</tt>)</td>
<td bgcolor="#aaffaa"><tt>O(log<sub>B</sub> N)</tt></td>
<td bgcolor="#ddffdd"><tt>O(log<sub>B</sub> N / sqrt B)</tt></td>
</tr>
<tr>
<td><a href="http://www.daimi.au.dk/~large/ioS06/BGVW.pdf">Buffered Repository Tree</a> (<tt>ε=0</tt>)</td>
<td bgcolor="#ffaaaa"><tt>O(log (N/B))</tt></td>
<td bgcolor="#aaffaa"><tt>O((log (N/B))/B)</tt></td>
</tr>
<tr>
<td><a href="https://en.wikipedia.org/wiki/Log-structured_merge-tree">Log Structured Merge Tree</a></td>
<td bgcolor="#ddffdd"><tt>O((log<sub>B</sub> N)<sup>2</sup>)</tt></td>
<td bgcolor="#ddffdd"><tt>O(log<sub>B</sub> N / sqrt B)</tt></td>
</tr>
</tbody>
</table>
<p>These results suggest that you should always prefer to use a fractal
tree to any of a B-tree, LSMT or 2-3 tree. In the real world, things may
not be so clear cut: in particular, because of the fractal tree patent
situation, it may be difficult to find a free and high-quality
implementation of that data structure.</p>
<p>Most engineering effort nowadays is being directed at improving
implementations of B-trees and LSMTs, so you probably want to choose one
of these two options depending on whether your workload is read or write
heavy, respectively. Some would argue, however, that all database
workloads are essentially write bound, given that you can usually
optimize a slow read workload by simply adding some additional indexes.</p>MaxSomething I recently became interested in is map data structures for external memory — i.e. ways of storing indexed data that are optimized for storage on disk.Compression of floating point timeseries2016-01-25T22:43:17+00:002016-01-25T22:43:17+00:00http://blog.omega-prime.co.uk/2016/01/25/compression-of-floating-point-timeseries<p>I recently had cause to investigate fast methods of storing and transferring financial timeseries. Naively, timeseries can be represented in memory or on disk as simple dense arrays of floating point numbers. This is an attractive representation with many nice properties:</p>
<ul>
<li>Straightforward and widely used.</li>
<li>You have random access to the nth element of the timeseries with no further indexes required.</li>
<li>Excellent locality-of-reference for applications that process timeseries in time order, which is the common case.</li>
<li>Often natively supported by CPU vector instructions.</li>
</ul>
<p>However, it is not a particularly space-efficient representation. Financial timeseries have considerable structure (e.g. Vodafone’s price on T is likely to be very close to the price on T+1), and this structure can be exploited by compression algorithms to greatly reduce storage requirements. This is important either when you need to store a large number of timeseries, or need to transfer a smaller number of timeseries over a bandwidth-constrained network link.</p>
<p>Timeseries compression has received quite a bit of attention from both the academic/scientific programming community (see e.g. <a href="http://cs.txstate.edu/~burtscher/research/FPC/">FPC</a> and <a href="http://paperhub.s3.amazonaws.com/7558905a56f370848a04fa349dd8bb9d.pdf">PFOR</a>) and also practitioner communities such as the demoscene (see <a href="http://www.farbrausch.com/~fg/seminars/workcompression_download.pdf">this presentation</a> by a member of Farbrausch). This post summarises my findings about the effect that a number of easy-to-implement “filters” have on the final compression ratio.</p>
<p>In the context of compression algorithms, filters are simple invertible transformations that are applied to the stream in the hopes of making the stream more compressible by subsequent compressors. Perhaps the canonical example of a filter is the <a href="https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform">Burrows-Wheeler transform</a>, which has the effect of moving runs of similar letters together. Some filters will turn a decompressed input stream (from the user) of length N into an output stream (fed to the compressor) of length N, but in general filters will actually have the effect of making the stream <em>longer</em>. The hope is that the gains in compressibility are enough to recover the bytes lost to any encoding overhead imposed by the filter.</p>
<p>In my application, I was using the compression as part of an RPC protocol that would be used interactively, so I wanted to keep decompression time very low, and for ease-of-deployment I wanted to get results in the context of Java without making use of any native code. Consequently I was interested in which choice of filter and compression algorithm would give a good tradeoff between performance and compression ratio.</p>
<p>I determined this experimentally. In my experiments, I used timeseries associated with 100 very liquid US stocks retrieved from <a href="http://finance.yahoo.com">Yahoo Finance</a>, amounting to 69MB of CSVs split across 6 fields per stock (open/high/low/close and adjusted close prices, plus volume). This amounted to 12.9 million floating point numbers.</p>
<h2 id="choice-of-compressor">Choice of compressor</h2>
<p>To decide which compressors were contenders, I compressed these price timeseries with a few pure-Java implementations of the algorithms:</p>
<table>
<tr>
<th>Compressor</th>
<th>Compression time (s)</th>
<th>Decompression time (s)</th>
<th>Compression ratio</th>
</tr>
<tr>
<td>None</td>
<td>0.0708</td>
<td>0.0637</td>
<td>1.000</td>
</tr>
<tr>
<td>Snappy (org.iq80.snappy:snappy-0.3)</td>
<td>0.187</td>
<td>0.115</td>
<td>0.843</td>
</tr>
<tr>
<td>Deflate <tt>BEST_SPEED</tt> (JDK 8)</td>
<td>4.59</td>
<td>4.27</td>
<td>0.602</td>
</tr>
<tr>
<td>Deflate <tt>DEFAULT_COMPRESSION</tt> (JDK 8)</td>
<td>5.46</td>
<td>4.29</td>
<td>0.582</td>
</tr>
<tr>
<td>Deflate <tt>BEST_COMPRESSION</tt> (JDK 8)</td>
<td>7.33</td>
<td>4.28</td>
<td>0.580</td>
</tr>
<tr>
<td>BZip2 <tt>MIN_BLOCKSIZE</tt> (org.apache.commons:commons-compress-1.10)</td>
<td>1.79</td>
<td>0.756</td>
<td>0.540</td>
</tr>
<tr>
<td>BZip2 <tt>MAX_BLOCKSIZE</tt> (org.apache.commons:commons-compress-1.10)</td>
<td>1.73</td>
<td>0.870</td>
<td>0.515</td>
</tr>
<tr>
<td>XZ <tt>PRESET_MIN</tt> (org.apache.commons:commons-compress-1.10 + org.tukaani:xz-1.5)</td>
<td>2.66</td>
<td>1.20</td>
<td>0.469</td>
</tr>
<tr>
<td>XZ <tt>PRESET_DEFAULT</tt> (org.apache.commons:commons-compress-1.10 + org.tukaani:xz-1.5)</td>
<td>9.56</td>
<td>1.15</td>
<td>0.419</td>
</tr>
<tr>
<td>XZ <tt>PRESET_MAX</tt> (org.apache.commons:commons-compress-1.10 + org.tukaani:xz-1.5)</td>
<td>9.83</td>
<td>1.13</td>
<td>0.419</td>
</tr>
</table>
<p>These numbers were gathered from a custom benchmark harness which simply compresses and then decompresses the whole dataset once. However, I saw the same broad trends confirmed by a JMH benchmark of the same combined operation:</p>
<table>
<tr>
<th>Compressor</th>
<th>Compress/decompress time (s)</th>
<th>JMH compress/decompress time (s)</th>
</tr>
<tr>
<td>None</td>
<td>0.135</td>
<td>0.127 ± 0.002</td>
</tr>
<tr>
<td>Snappy (org.iq80.snappy:snappy-0.3)</td>
<td>0.302</td>
<td>0.215 ± 0.003</td>
</tr>
<tr>
<td>Deflate <tt>BEST_SPEED</tt> (JDK 8)</td>
<td>8.86</td>
<td>8.55 ± 0.15</td>
</tr>
<tr>
<td>Deflate <tt>DEFAULT_COMPRESSION</tt> (JDK 8)</td>
<td>9.75</td>
<td>9.35 ± 0.09</td>
</tr>
<tr>
<td>Deflate <tt>BEST_COMPRESSION</tt> (JDK 8)</td>
<td>11.6</td>
<td>11.4 ± 0.1</td>
</tr>
<tr>
<td>BZip2 <tt>MIN_BLOCKSIZE</tt> (org.apache.commons:commons-compress-1.10)</td>
<td>2.55</td>
<td>3.10 ± 0.04</td>
</tr>
<tr>
<td>BZip2 <tt>MAX_BLOCKSIZE</tt> (org.apache.commons:commons-compress-1.10)</td>
<td>2.6</td>
<td>3.77 ± 0.31</td>
</tr>
<tr>
<td>XZ <tt>PRESET_MIN</tt> (org.apache.commons:commons-compress-1.10 + org.tukaani:xz-1.5)</td>
<td>3.86</td>
<td>4.08 ± 0.12</td>
</tr>
<tr>
<td>XZ <tt>PRESET_DEFAULT</tt> (org.apache.commons:commons-compress-1.10 + org.tukaani:xz-1.5)</td>
<td>10.7</td>
<td>11.1 ± 0.1</td>
</tr>
<tr>
<td>XZ <tt>PRESET_MAX</tt> (org.apache.commons:commons-compress-1.10 + org.tukaani:xz-1.5)</td>
<td>11.0</td>
<td>11.5 ± 0.4</td>
</tr>
</table>
<p>What we see here is rather impressive performance from BZip2 and Snappy. I expected Snappy to do well, but BZip2’s good showing surprised me. In some previous (unpublished) microbenchmarks I’ve not seen GZipInputStream (a thin wrapper around Deflate with DEFAULT_COMPRESSION) be quite so slow, and my results also seem to contradict other Java <a href="https://github.com/ning/jvm-compressor-benchmark/wiki">compression</a> <a href="http://java-performance.info/performance-general-compression/">benchmarks</a>.</p>
<p>One contributing factor may be that the structure of the timeseries I was working with in that unpublished benchmark was quite different: there was a lot more repetition (runs of NaNs and zeroes), and compression ratios were consequently higher.</p>
<p>In any event, based on these results I decided to continue my evaluation with both Snappy and BZip2 MIN_BLOCKSIZE. It’s interesting to compare these two compressors because, unlike BZ2, Snappy doesn’t perform any entropy encoding.</p>
<h2 id="filters">Filters</h2>
<p>The two filters that I evaluated were <em>transposition</em> and <em>zig-zagged delta encoding</em>.</p>
<h1 id="transposition">Transposition</h1>
<p>The idea behind transposition (also known as <a href="http://www.blosc.org/blog/new-bitshuffle-filter.html">“shuffling”</a>) is as follows. Let’s say that we have three floating point numbers, each occupying 4 bytes:</p>
<p align="center">
<a href="/2016/01/diagrams01.png"><img src="/2016/01/diagrams01.png" alt="" /></a>
</p>
<p>On a <a href="https://en.wikipedia.org/wiki/Endianness">big-endian</a> system this will be represented in memory row-wise by the 4 consecutive bytes of the first float (MSB first), followed by the 4 bytes of the second float, and so on. In contrast, a transposed representation of the same data would encode all of the MSBs first, followed by all of the second-most-significant bytes, and so on, in a column-wise fashion:</p>
<p align="center">
<a href="/2016/01/diagrams02.png"><img src="/2016/01/diagrams02.png" alt="" /></a>
</p>
<p>The reason you might think that writing the data column-wise would improve compression is that you might expect that e.g. the most significant bytes of a series of floats in a timeseries would be very similar to each other. By moving these similar bytes closer together you increase the chance that compression algorithms will be able to find repeating patterns in them undisturbed by the essentially random content of the LSB.</p>
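<p>A sketch of what this looks like in code (not the exact code from the repository linked below), assuming the input length is a whole number of items:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">// Sketch of byte-level transposition: the b-th output block contains the
// b-th byte of every item, so bytes of similar significance end up adjacent.
static byte[] transpose(byte[] rowWise, int itemSize) {
    int items = rowWise.length / itemSize;
    byte[] columnWise = new byte[rowWise.length];
    for (int item = 0; item &lt; items; item++) {
        for (int b = 0; b &lt; itemSize; b++) {
            columnWise[b * items + item] = rowWise[item * itemSize + b];
        }
    }
    return columnWise;
}</code></pre></figure>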
<h1 id="field-transposition">Field transposition</h1>
<p>Analogous to the byte-level transposition described above, we might also try transposition at the level of a float subcomponent. Recall that floating point numbers are divided into sign, exponent and mantissa components. For single precision floats this looks like:</p>
<p align="center">
<a href="/2016/01/diagrams03.png"><img src="/2016/01/diagrams03.png" alt="" /></a>
</p>
<p>Inspired by this, another thing we might try is transposing the data field-wise – i.e. serializing all the signs first, followed by all the exponents, then all the mantissas:</p>
<p align="center">
<a href="/2016/01/diagrams04.png"><img src="/2016/01/diagrams04.png" alt="" /></a>
</p>
<p>(Note that I’m inserting padding bits to keep multibit fields byte aligned – more on this later on.)</p>
<p>We might expect this transposition technique to improve compression by preventing changes in unrelated fields from causing us to be unable to spot patterns in the evolution of a certain field. A good example of where this might be useful is the sign bit: for many timeseries of interest we expect the sign bit to be uniformly 1 or 0 (i.e. all negative or all positive numbers). If we encoded the float without splitting it into fields, then that one very predictable bit would be mixed in with 31 much more varied bits, which makes the pattern much harder to spot.</p>
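<p>A simplified sketch of this field-wise split for single-precision floats (purely for clarity each sign occupies a whole byte here, and the NaN/zero special cases discussed below are ignored):</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">// Sketch: split each single-precision float into sign, exponent and mantissa
// streams (IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 mantissa bits).
static void split(float[] values, byte[] signs, byte[] exponents, byte[] mantissas) {
    for (int i = 0; i &lt; values.length; i++) {
        int bits = Float.floatToRawIntBits(values[i]);
        signs[i]     = (byte) (bits &gt;&gt;&gt; 31);           // sign, padded to a whole byte here
        exponents[i] = (byte) ((bits &gt;&gt;&gt; 23) &amp; 0xFF);  // 8-bit exponent
        int mantissa = bits &amp; 0x7FFFFF;                // 23-bit mantissa, padded to 3 bytes
        mantissas[3 * i]     = (byte) (mantissa &gt;&gt;&gt; 16);
        mantissas[3 * i + 1] = (byte) (mantissa &gt;&gt;&gt; 8);
        mantissas[3 * i + 2] = (byte) mantissa;
    }
}</code></pre></figure>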
<h2 id="delta-encoding">Delta encoding</h2>
<p>In delta encoding, you encode consecutive elements of a sequence not by their absolute value but rather by how much larger they are than the previous element in the sequence. You might expect this to aid later compression of timeseries data because, although a timeseries might have an overall trend, you would expect the day-to-day variation to be essentially unchanging. For example, Vodafone’s stock price might be generally trending up from 150p at the start of the year to 200p at the end, but you expect it won’t usually change by more than 10p on any individual day within that year. Therefore, by delta-encoding the sequence you would expect to increase the probability of the sequence containing a repeated substring and hence its compressibility.</p>
<p>This idea can be combined with transposition, by applying the transposition to the deltas rather than the raw data to be compressed. If you do go this route, you might then apply a trick called zig-zagging (used in e.g. <a href="https://developers.google.com/protocol-buffers/docs/encoding?hl=en#signed-integers">protocol buffers</a>) and store your deltas such that small negative numbers are represented as small positive ints. Specifically, you might store the delta -1 as 1, 1 as 2, -2 as 3, 2 as 4 and so on. The reasoning behind this is that you expect your deltas to be both positive and negative, but certainly clustered around 0. By using zig-zagging, you tend to cause the MSB of your deltas to become 0, which then in turn leads to extremely compressible long runs of zeroes in your transposed version of those deltas.</p>
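<p>A sketch of the combined delta + zig-zag transform on a stream of integer field values (the mapping is the same one that protocol buffers uses for signed integers):</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">// Sketch: delta encoding with zig-zagging. Each value is stored as its
// difference from the previous one, remapped so that deltas near zero
// (positive or negative) become small numbers: 0, -1, 1, -2, 2 ... map to 0, 1, 2, 3, 4 ...
static int[] zigZagDeltas(int[] values) {
    int[] encoded = new int[values.length];
    int previous = 0;
    for (int i = 0; i &lt; values.length; i++) {
        int delta = values[i] - previous;
        encoded[i] = (delta &lt;&lt; 1) ^ (delta &gt;&gt; 31);  // zig-zag the signed delta
        previous = values[i];
    }
    return encoded;
}</code></pre></figure>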
<h2 id="special-cases">Special cases</h2>
<p>One particular floating point number is worth discussing: NaN. It is very common for financial timeseries to contain a few NaNs scattered throughout. For example, when a stock exchange is on holiday no official close prices will be published for the day, and this tends to be represented as a NaN in a timeseries of otherwise similar prices.</p>
<p>Because NaNs are both common and very dissimilar to other numbers that we might encounter, we might want to encode them with a special short representation. Specifically, I implemented a variant of the field transposition above, where the sign bit is actually stored extended to a two bit “descriptor” value with the following interpretation:</p>
<table>
<tr><th>Bit 1</th><th>Bit 2</th><th>Interpretation</th></tr>
<tr><td>0</td><td>0</td><td>Zero</td></tr>
<tr><td>0</td><td>1</td><td>NaN</td></tr>
<tr><td>1</td><td>0</td><td>Positive</td></tr>
<tr><td>1</td><td>1</td><td>Negative</td></tr>
</table>
<p>The mantissa and exponent are not stored if the descriptor is 00 or 01 (i.e. if the value is zero or NaN).</p>
<p>Note that this representation of NaNs erases the distinction between different NaN values, when in reality there are e.g. 16,777,214 distinct single precision NaNs. This technically makes this a lossy compression technique, but in practice it is rarely important to be able to distinguish between different NaN values. (The only application that I’m aware of that actually depends on the distinction between NaNs is <a href="http://www.lua-users.org/lists/lua-l/2009-11/msg00089.html">LuaJIT</a>.)</p>
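<p>As a sketch, the descriptor for a single-precision value can be computed like this (this simplified version also collapses -0.0 into the zero code):</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">// Sketch: the 2-bit descriptor for a single-precision value. Zeroes and NaNs
// get short codes and carry no exponent or mantissa, and every NaN collapses
// to the single NaN code.
static int descriptor(float value) {
    if (value == 0.0f)      return 0b00;  // zero
    if (Float.isNaN(value)) return 0b01;  // NaN
    return (Float.floatToRawIntBits(value) &gt;&gt;&gt; 31) == 0 ? 0b10   // positive
                                                         : 0b11;  // negative
}</code></pre></figure>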
<h2 id="methodology">Methodology</h2>
<p>In my experiments (available on <a href="https://github.com/batterseapower/timeseries-compression">Github</a>) I tried all combinations of the following compression pipeline:</p>
<ol>
<li>
<p>Field transposition: on (start by splitting each number into 3 fields) or off (treat whole floating point number as a single field)?</p>
</li>
<li>
<p>(Only if field transposition is being used.) Special cases: on or off?</p>
</li>
<li>
<p>Delta encoding: off (store raw field contents) or on (store each field as an offset from the previous field)? When delta encoding was turned on, I additionally used zig-zagging.</p>
</li>
<li>
<p>Byte transposition: given that I have a field, should I transpose the bytes of that field? In fact, I exhaustively investigated all possible byte-aligned transpositions of each field.</p>
</li>
<li>
<p>Compressor: BZ2 or Snappy?</p>
</li>
</ol>
<p>I denote a byte-level transposition as a list of numbers summing to the number of bytes in one data item. So for example, a transposition for 4-byte numbers which wrote all of the LSBs first, followed by all of the next-most-significant bytes and so on, would be written as [1, 1, 1, 1], while one that broke each 4-byte quantity into two 16-bit chunks would be written [2, 2], and the degenerate case of no transposition would be [4]. Note that numbers occur in the list in increasing order of the significance of the bytes in the item that they manipulate.</p>
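<p>The set of byte-aligned transpositions of an n-byte field is exactly the set of compositions of n, of which there are 2<sup>n-1</sup>: 8 for a 4-byte single, and 64 for the 7 padded bytes of a double mantissa. A sketch of how they can be enumerated for the exhaustive search:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.util.ArrayList;
import java.util.List;

// Sketch: enumerate every byte-aligned transposition of an n-byte field,
// i.e. every list of positive integers that sums to n.
static List&lt;List&lt;Integer&gt;&gt; transpositions(int n) {
    List&lt;List&lt;Integer&gt;&gt; result = new ArrayList&lt;&gt;();
    if (n == 0) {
        result.add(new ArrayList&lt;&gt;());  // the empty transposition
        return result;
    }
    for (int first = 1; first &lt;= n; first++) {
        for (List&lt;Integer&gt; rest : transpositions(n - first)) {
            List&lt;Integer&gt; transposition = new ArrayList&lt;&gt;();
            transposition.add(first);
            transposition.addAll(rest);
            result.add(transposition);
        }
    }
    return result;
}</code></pre></figure>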
<p>As discussed above, in the case where a field (such as the sign or mantissa) wasn’t an exact multiple of 8 bits wide, my filters padded the field to the nearest byte boundary before sending it to the compressor. This means that the filtering process actually makes the data substantially larger (e.g. 52-bit double mantissas are padded to 56 bits, becoming 7.7% larger in the process). This not only makes the filtering code simpler, but also turns out to be essential for good compression when using Snappy, which is only able to detect byte-aligned repetition.</p>
<p>Without further ado, let’s look at the results.</p>
<h1 id="dense-timeseries">Dense timeseries</h1>
<p>I begin by looking at dense timeseries where NaNs do not occur. With such data, it’s clear that we won’t gain from the “special cases” encoding above, so results in this section are derived from a version of the compression code where we just use 1 bit to encode the sign.</p>
<h1 id="single-precision-floats">Single-precision floats</h1>
<p>The minimum compressed size (in bytes) achieved for each combination of parameters is as follows:</p>
<!--
Compressor BZ2 Snappy
Exponent Method Mantissa Method
Delta Delta 6364312 9067141
Literal 6283216 8622587
Literal Delta 6372444 9071864
Literal 6306624 8626114
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Exponent Method</th>
<th>Mantissa Method</th>
<th>BZ2</th>
<th>Snappy</th>
</tr>
</thead>
<tbody>
<tr>
<th rowspan="2" valign="top">Delta</th>
<th>Delta</th>
<td>6364312</td>
<td>9067141</td>
</tr>
<tr>
<th>Literal</th>
<td>6283216</td>
<td>8622587</td>
</tr>
<tr>
<th rowspan="2" valign="top">Literal</th>
<th>Delta</th>
<td>6372444</td>
<td>9071864</td>
</tr>
<tr>
<th>Literal</th>
<td>6306624</td>
<td>8626114</td>
</tr>
</tbody>
</table>
<p>(This table says that, for example, if we delta-encode the float exponents but literal-encode the mantissas, then the best transposition scheme achieved a compressed size of 6,283,216 bytes.)</p>
<p>The story here is that delta encoding is strictly better than literal encoding for the exponent, but conversely literal encoding is better for the mantissa. In fact, if we look at the performance of each possible mantissa transposition, we can see that delta encoding tends to underperform in those cases where the MSB is split off into its own column, rather than being packaged up with the second-most-significant byte. This result is consistent across both BZ2 and Snappy.</p>
<!--
Mantissa Codec
[3] 6364312
[2, 1] 6514554
[1, 1, 1] 7321926
[1, 2] 7394645
Name: Delta, dtype: int32
Mantissa Codec
[2, 1] 6283216
[3] 6753718
[1, 1, 1] 7226293
[1, 2] 7824557
Name: Literal, dtype: int32
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Mantissa Transposition</th>
<th>Mantissa Method</th>
<th>BZ2</th>
<th>Snappy</th>
</tr>
</thead>
<tbody>
<tr>
<th rowspan="2" valign="top"><tt>[1, 1, 1]</tt></th>
<th>Delta</th>
<td>7321926</td>
<td>9199766</td>
</tr>
<tr>
<th>Literal</th>
<td>7226293</td>
<td>9154821</td>
</tr>
<tr>
<th rowspan="2" valign="top"><tt>[1, 2]</tt></th>
<th>Delta</th>
<td>7394645</td>
<td>9317258</td>
</tr>
<tr>
<th>Literal</th>
<td>7824557</td>
<td>9420206</td>
</tr>
<tr>
<th rowspan="2" valign="top"><tt>[2, 1]</tt></th>
<th>Delta</th>
<td>6514554</td>
<td>9099462</td>
</tr>
<tr>
<th>Literal</th>
<td>6283216</td>
<td>8622587</td>
</tr>
<tr>
<th rowspan="2" valign="top"><tt>[3]</tt></th>
<th>Delta</th>
<td>6364312</td>
<td>9067141</td>
</tr>
<tr>
<th>Literal</th>
<td>6753718</td>
<td>9475316</td>
</tr>
</tbody>
</table>
<p>The other interesting feature of these results is that transposition tends to hurt BZ2 compression ratios. It always makes things worse with delta encoding, but even with literal encoding only one particular transposition ([2, 1]) actually strongly improves a BZ2 result. Things are a bit different for Snappy: although once again delta is always worse with transposition enabled, transposition <strong>always</strong> aids Snappy in the literal case – though once again the effect is strongest with [2, 1] transposition.</p>
<p>The strong showing for [2, 1] transposition suggests to me that the lower-order bits of the mantissa are more correlated with each other than they are with the MSB. This makes some sense: because equities trade with a fixed <a href="https://en.wikipedia.org/wiki/Tick_size">tick size</a>, prices are actually quantised into a relatively small number of values. This will tend to cause the lower order bits of the mantissa to become correlated.</p>
<p>Finally, we can ask what would happen if we didn’t make the mantissa/exponent distinction at all and instead just packed those two fields together:</p>
<!--
Compressor BZ2 Snappy
Number Method
Delta 6254386 8847839
Literal 6366714 8497983
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Method</th>
<th>BZ2</th>
<th>Snappy</th>
</tr>
</thead>
<tbody>
<tr>
<th>Delta</th>
<td>6254386</td>
<td>8847839</td>
</tr>
<tr>
<th>Literal</th>
<td>6366714</td>
<td>8497983</td>
</tr>
</tbody>
</table>
<p>These numbers don’t show any clear preference for either of the two approaches. For BZ2, delta performance is improved by not doing the splitting, at the cost of larger outputs when using the literal method, while for Snappy we have the opposite: literal performance is improved while delta performance is harmed. What is true is that, in the best case, the compressed sizes we observe here are better than the best achievable sizes in the split case.</p>
<p>In some ways it is quite surprising that delta encoding ever beats literal encoding in this scenario, because it’s not clear that the deltas we compute here are actually meaningful and hence likely to generally be small.</p>
<p>We can also analyse the best available transpositions in this case. Considering BZ2 first:</p>
<!--
Number Codec Delta Number Codec Literal
Rank
1 [4] 6254386 [2, 2] 6366714
2 [3, 1] 6260446 [2, 1, 1] 6438215
3 [2, 2] 6395954 [3, 1] 7033810
4 [2, 1, 1] 6402354 [4] 7109668
5 [1, 1, 1, 1] 7136612 [1, 1, 1, 1] 7327437
6 [1, 2, 1] 7227282 [1, 1, 2] 7405128
7 [1, 1, 2] 7285350 [1, 2, 1] 8039403
8 [1, 3] 7337277 [1, 3] 8386647
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>BZ2 Rank</th>
<th>Delta Transposition</th>
<th>Size</th>
<th>Literal Transposition</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>[4]</td>
<td>6254386</td>
<td>[2, 2]</td>
<td>6366714</td>
</tr>
<tr>
<th>2</th>
<td>[3, 1]</td>
<td>6260446</td>
<td>[2, 1, 1]</td>
<td>6438215</td>
</tr>
<tr>
<th>3</th>
<td>[2, 2]</td>
<td>6395954</td>
<td>[3, 1]</td>
<td>7033810</td>
</tr>
<tr>
<th>4</th>
<td>[2, 1, 1]</td>
<td>6402354</td>
<td>[4]</td>
<td>7109668</td>
</tr>
<tr>
<th>5</th>
<td>[1, 1, 1, 1]</td>
<td>7136612</td>
<td>[1, 1, 1, 1]</td>
<td>7327437</td>
</tr>
<tr>
<th>6</th>
<td>[1, 2, 1]</td>
<td>7227282</td>
<td>[1, 1, 2]</td>
<td>7405128</td>
</tr>
<tr>
<th>7</th>
<td>[1, 1, 2]</td>
<td>7285350</td>
<td>[1, 2, 1]</td>
<td>8039403</td>
</tr>
<tr>
<th>8</th>
<td>[1, 3]</td>
<td>7337277</td>
<td>[1, 3]</td>
<td>8386647</td>
</tr>
</tbody>
</table>
<p>Just as above, delta encoding does best when no transposition at all is used, and generally gets worse as the transposition gets more and more “fragmented”. On the other hand, literal encoding does well with transpositions that tend to keep together the first two bytes (i.e. the exponent + the leading bits of the mantissa).</p>
<p>Now let’s look at the performance of the unsplit data when compressed with Snappy:</p>
<!--
Number Codec Delta Number Codec Literal
Rank
1 [3, 1] 8847839 [2, 1, 1] 8497983
2 [2, 1, 1] 8883392 [1, 1, 1, 1] 9033135
3 [1, 1, 1, 1] 8979988 [1, 2, 1] 9311863
4 [1, 2, 1] 9093582 [3, 1] 9404027
5 [2, 2] 9650796 [2, 2] 10107042
6 [4] 9659888 [1, 1, 2] 10842190
7 [1, 1, 2] 9847987 [4] 10942085
8 [1, 3] 10524215 [1, 3] 11722159
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Snappy Rank</th>
<th>Delta Transposition</th>
<th>Size</th>
<th>Literal Transposition</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>[3, 1]</td>
<td>8847839</td>
<td>[2, 1, 1]</td>
<td>8497983</td>
</tr>
<tr>
<th>2</th>
<td>[2, 1, 1]</td>
<td>8883392</td>
<td>[1, 1, 1, 1]</td>
<td>9033135</td>
</tr>
<tr>
<th>3</th>
<td>[1, 1, 1, 1]</td>
<td>8979988</td>
<td>[1, 2, 1]</td>
<td>9311863</td>
</tr>
<tr>
<th>4</th>
<td>[1, 2, 1]</td>
<td>9093582</td>
<td>[3, 1]</td>
<td>9404027</td>
</tr>
<tr>
<th>5</th>
<td>[2, 2]</td>
<td>9650796</td>
<td>[2, 2]</td>
<td>10107042</td>
</tr>
<tr>
<th>6</th>
<td>[4]</td>
<td>9659888</td>
<td>[1, 1, 2]</td>
<td>10842190</td>
</tr>
<tr>
<th>7</th>
<td>[1, 1, 2]</td>
<td>9847987</td>
<td>[4]</td>
<td>10942085</td>
</tr>
<tr>
<th>8</th>
<td>[1, 3]</td>
<td>10524215</td>
<td>[1, 3]</td>
<td>11722159</td>
</tr>
</tbody>
</table>
<p>The Snappy results are very different from the BZ2 case. Here, the same sort of transpositions tend to do well with both the literal and delta methods. The kinds of transpositions that are successful are those that keep together the exponent and the leading bits of the mantissa, though even fully-dispersed transpositions like [1, 1, 1, 1] put in a strong showing.</p>
<p>That’s a lot of data, but what’s the bottom line? For Snappy, splitting the floats into mantissa and exponent before processing does seem to produce slightly more consistently small outputs than working with unsplit data. The BZ2 situation is less clear but only because the exact choice doesn’t seem to make a ton of difference. Therefore, my recommendation for single-precision floats is to delta-encode exponents, and to use literal encoding for mantissas with [2, 1] transposition.</p>
<h1 id="double-precision-floats">Double-precision floats</h1>
<p>While there were only 4 different transpositions for single-precision floats, there are 2 ways to transpose a double-precision exponent, and 64 ways to transpose the mantissa. This makes the parameter search for double precision considerably more computationally expensive. The results are:</p>
<!--
Compressor BZ2 Snappy
Exponent Method Mantissa Method
Delta Delta 6485895 12463583
Literal 6500390 10437550
Literal Delta 6456152 12469132
Literal 6475579 10439869
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Exponent Method</th>
<th>Mantissa Method</th>
<th>BZ2</th>
<th>Snappy</th>
</tr>
</thead>
<tbody>
<tr>
<th rowspan="2" valign="top">Delta</th>
<th>Delta</th>
<td>6485895</td>
<td>12463583</td>
</tr>
<tr>
<th>Literal</th>
<td>6500390</td>
<td>10437550</td>
</tr>
<tr>
<th rowspan="2" valign="top">Literal</th>
<th>Delta</th>
<td>6456152</td>
<td>12469132</td>
</tr>
<tr>
<th>Literal</th>
<td>6475579</td>
<td>10439869</td>
</tr>
</tbody>
</table>
<p>These results show interesting differences between BZ2 and Snappy. For BZ2 there is not much in it, but it’s consistently always better to literal-encode the exponent and delta-encode the mantissa. For Snappy, things are exactly the other way around: delta-encoding the exponent and literal-encoding the mantissa is optimal.</p>
<p>The choice of exponent transposition scheme has the following effect:</p>
<!--
== Delta or literal better for the exponent?
Compressor BZ2 Snappy
Exponent Codec Exponent Method
[1, 1] Delta 6485895 10437550
Literal 6492837 10439869
[2] Delta 6496032 10560467
Literal 6456152 10564737
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Exponent Transposition</th>
<th>Exponent Method</th>
<th>BZ2</th>
<th>Snappy</th>
</tr>
</thead>
<tbody>
<tr>
<th rowspan="2" valign="top"><tt>[1, 1]</tt></th>
<th>Delta</th>
<td>6485895</td>
<td>10437550</td>
</tr>
<tr>
<th>Literal</th>
<td>6492837</td>
<td>10439869</td>
</tr>
<tr>
<th rowspan="2" valign="top"><tt>[2]</tt></th>
<th>Delta</th>
<td>6496032</td>
<td>10560467</td>
</tr>
<tr>
<th>Literal</th>
<td>6456152</td>
<td>10564737</td>
</tr>
</tbody>
</table>
<p>It’s not clear, but [1, 1] transposition might be optimal. Bear in mind that double exponents are only 11 bits long, so the lower 5 bits of the LSB being encoded here will always be 0. Using [1, 1] transposition might better help the compressor get a handle on this pattern.</p>
<!--
== Delta or literal better for the mantissa when using Snappy?
Mantissa Codec Delta Mantissa Codec Literal
Rank
1 [5, 1, 1] 12463583 [3, 2, 1, 1] 10437550
2 [1, 3, 2, 1] 12678887 [1, 1, 1, 2, 1, 1] 10437550
3 [6, 1] 12737838 [1, 2, 2, 1, 1] 10437550
4 [4, 2, 1] 12749067 [2, 1, 2, 1, 1] 10437550
5 [7] 12766804 [1, 1, 1, 1, 1, 1, 1] 10598739
6 [1, 3, 1, 1, 1] 12820360 [2, 1, 1, 1, 1, 1] 10598739
7 [3, 1, 2, 1] 12824154 [3, 1, 1, 1, 1] 10598739
8 [4, 1, 1, 1] 12890981 [1, 2, 1, 1, 1, 1] 10598739
9 [3, 3, 1] 12915900 [1, 1, 1, 1, 2, 1] 10715444
10 [3, 1, 1, 1, 1] 12946767 [3, 1, 2, 1] 10715444
11 [3, 4] 12980988 [2, 1, 1, 2, 1] 10715444
12 [2, 2, 2, 1] 12988305 [1, 2, 1, 2, 1] 10715444
13 [2, 2, 1, 1, 1] 13144026 [1, 3, 1, 1, 1] 10843992
14 [3, 2, 1, 1] 13164515 [1, 1, 2, 1, 1, 1] 10859665
15 [1, 3, 3] 13290747 [2, 2, 1, 1, 1] 10859665
16 [1, 1, 2, 2, 1] 13360711 [1, 3, 2, 1] 10942563
17 [4, 3] 13366221 [2, 2, 2, 1] 10946079
18 [1, 4, 1, 1] 13471949 [1, 1, 2, 2, 1] 10946079
19 [3, 1, 3] 13515593 [2, 1, 3, 1] 10951911
20 [1, 1, 2, 1, 1, 1] 13516479 [1, 1, 1, 3, 1] 10951911
== Delta or literal better for the mantissa when using BZ2?
Mantissa Codec Delta Mantissa Codec Literal
Rank
1 [7] 6456152 [6, 1] 6475579
2 [6, 1] 6498686 [1, 5, 1] 6495555
3 [4, 3] 6962167 [2, 4, 1] 6510416
4 [4, 2, 1] 6994270 [1, 1, 4, 1] 6510416
5 [1, 6] 7040401 [3, 3, 1] 6692613
6 [3, 4] 7054835 [2, 1, 3, 1] 6692613
7 [1, 5, 1] 7092230 [1, 1, 1, 3, 1] 6692613
8 [2, 5] 7108000 [1, 2, 3, 1] 6692613
9 [2, 4, 1] 7176760 [1, 6] 6926231
10 [3, 3, 1] 7210100 [7] 6931475
11 [1, 1, 5] 7335755 [2, 5] 6936096
12 [1, 1, 4, 1] 7391040 [1, 1, 5] 6936096
13 [2, 2, 2, 1] 7502334 [2, 1, 4] 6983984
14 [1, 1, 1, 4] 7522644 [1, 2, 4] 6983984
15 [2, 2, 3] 7527811 [3, 4] 6983984
16 [3, 1, 3] 7557472 [1, 1, 1, 4] 6983984
17 [3, 1, 2, 1] 7568613 [2, 2, 2, 1] 7216000
18 [5, 2] 7594170 [1, 1, 2, 2, 1] 7216000
19 [1, 3, 2, 1] 7607076 [4, 2, 1] 7241303
20 [1, 3, 3] 7621445 [1, 3, 2, 1] 7244281
-->
<p>When looking at the best mantissa transpositions, there are so many possible transpositions that we’ll consider BZ2 and Snappy one by one, examining just the top 10 transposition choices for each. BZ2 first:</p>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>BZ2 Rank</th>
<th>Delta Mantissa Transposition</th>
<th>Size</th>
<th>Literal Mantissa Transposition</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>[7]</td>
<td>6456152</td>
<td>[6, 1]</td>
<td>6475579</td>
</tr>
<tr>
<th>2</th>
<td>[6, 1]</td>
<td>6498686</td>
<td>[1, 5, 1]</td>
<td>6495555</td>
</tr>
<tr>
<th>3</th>
<td>[4, 3]</td>
<td>6962167</td>
<td>[2, 4, 1]</td>
<td>6510416</td>
</tr>
<tr>
<th>4</th>
<td>[4, 2, 1]</td>
<td>6994270</td>
<td>[1, 1, 4, 1]</td>
<td>6510416</td>
</tr>
<tr>
<th>5</th>
<td>[1, 6]</td>
<td>7040401</td>
<td>[3, 3, 1]</td>
<td>6692613</td>
</tr>
<tr>
<th>6</th>
<td>[3, 4]</td>
<td>7054835</td>
<td>[2, 1, 3, 1]</td>
<td>6692613</td>
</tr>
<tr>
<th>7</th>
<td>[1, 5, 1]</td>
<td>7092230</td>
<td>[1, 1, 1, 3, 1]</td>
<td>6692613</td>
</tr>
<tr>
<th>8</th>
<td>[2, 5]</td>
<td>7108000</td>
<td>[1, 2, 3, 1]</td>
<td>6692613</td>
</tr>
<tr>
<th>9</th>
<td>[2, 4, 1]</td>
<td>7176760</td>
<td>[1, 6]</td>
<td>6926231</td>
</tr>
<tr>
<th>10</th>
<td>[3, 3, 1]</td>
<td>7210100</td>
<td>[7]</td>
<td>6931475</td>
</tr>
</tbody>
</table>
<p>We can see that literal encoding tends to beat delta encoding, though the very best size was in fact achieved via a simple untransposed delta representation. In both the literal and the delta case, the encodings that do well tend to keep the middle 5 bytes of the mantissa grouped together, which supports our idea that these bytes tend to be highly correlated, with most of the information being encoded in the MSB.</p>
<p>Turning to Snappy:</p>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Snappy Rank</th>
<th>Delta Mantissa Transposition</th>
<th>Size</th>
<th>Literal Mantissa Transposition</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>[5, 1, 1]</td>
<td>12463583</td>
<td>[3, 2, 1, 1]</td>
<td>10437550</td>
</tr>
<tr>
<th>2</th>
<td>[1, 3, 2, 1]</td>
<td>12678887</td>
<td>[1, 1, 1, 2, 1, 1]</td>
<td>10437550</td>
</tr>
<tr>
<th>3</th>
<td>[6, 1]</td>
<td>12737838</td>
<td>[1, 2, 2, 1, 1]</td>
<td>10437550</td>
</tr>
<tr>
<th>4</th>
<td>[4, 2, 1]</td>
<td>12749067</td>
<td>[2, 1, 2, 1, 1]</td>
<td>10437550</td>
</tr>
<tr>
<th>5</th>
<td>[7]</td>
<td>12766804</td>
<td>[1, 1, 1, 1, 1, 1, 1]</td>
<td>10598739</td>
</tr>
<tr>
<th>6</th>
<td>[1, 3, 1, 1, 1]</td>
<td>12820360</td>
<td>[2, 1, 1, 1, 1, 1]</td>
<td>10598739</td>
</tr>
<tr>
<th>7</th>
<td>[3, 1, 2, 1]</td>
<td>12824154</td>
<td>[3, 1, 1, 1, 1]</td>
<td>10598739</td>
</tr>
<tr>
<th>8</th>
<td>[4, 1, 1, 1]</td>
<td>12890981</td>
<td>[1, 2, 1, 1, 1, 1]</td>
<td>10598739</td>
</tr>
<tr>
<th>9</th>
<td>[3, 3, 1]</td>
<td>12915900</td>
<td>[1, 1, 1, 1, 2, 1]</td>
<td>10715444</td>
</tr>
<tr>
<th>10</th>
<td>[3, 1, 1, 1, 1]</td>
<td>12946767</td>
<td>[3, 1, 2, 1]</td>
<td>10715444</td>
</tr>
</tbody>
</table>
<p>The Snappy results are strikingly different from the BZ2 ones. In this case, just like BZ2, literal encoding tends to beat delta encoding, but the difference is much more pronounced than in the BZ2 case. Furthermore, the kinds of transpositions that minimize the size of the literal-encoded data here are very different from the transpositions that were successful with BZ2: in that case we wanted to keep the middle bytes together, while here the scheme [1, 1, 1, 1, 1, 1, 1] where every byte has its own column is not far from optimal.</p>
<p>And now considering results for the case where we do not split the floating point number into mantissa/exponent components:</p>
<!--
== GZip/BZ2 should dominate Snappy?
Compressor BZ2 Snappy
Number Method
Delta 6401147 11835533
Literal 6326562 9985079
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Method</th>
<th>BZ2</th>
<th>Snappy</th>
</tr>
</thead>
<tbody>
<tr>
<th>Delta</th>
<td>6401147</td>
<td>11835533</td>
</tr>
<tr>
<th>Literal</th>
<td>6326562</td>
<td>9985079</td>
</tr>
</tbody>
</table>
<p>These results show a clear preference for literal encoding, which is definitely what we expect, given that delta encoding is not obviously meaningful for an unsplit number. We also see results that are universally better than those for the split case: it seems that splitting the number into fields is actually a fairly large pessimisation! This is probably caused by the internal fragmentation implied by our byte-alignment of the data, which is a much greater penalty for doubles than it was for singles. It would be interesting to repeat the experiment without byte-alignment.</p>
<p>We can examine which transposition schemes do best in the unsplit case. BZ2 first:</p>
<!--
== Delta or literal better for the number when using BZ2?
Number Codec Delta Number Codec Literal
Rank
1 [8] 6401147 [6, 2] 6326562
2 [7, 1] 6407935 [1, 5, 2] 6351132
3 [6, 2] 6419062 [2, 4, 2] 6360533
4 [6, 1, 1] 6440333 [1, 1, 4, 2] 6360533
5 [4, 3, 1] 6834375 [3, 3, 2] 6562859
6 [4, 4] 6866167 [1, 2, 3, 2] 6562859
7 [4, 2, 1, 1] 6903123 [2, 1, 3, 2] 6562859
8 [4, 2, 2] 6903322 [1, 1, 1, 3, 2] 6562859
9 [1, 7] 6979216 [6, 1, 1] 6598104
10 [1, 6, 1] 6983998 [1, 5, 1, 1] 6621003
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>BZ2 Rank</th>
<th>Delta Transposition</th>
<th>Size</th>
<th>Literal Transposition</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>[8]</td>
<td>6401147</td>
<td>[6, 2]</td>
<td>6326562</td>
</tr>
<tr>
<th>2</th>
<td>[7, 1]</td>
<td>6407935</td>
<td>[1, 5, 2]</td>
<td>6351132</td>
</tr>
<tr>
<th>3</th>
<td>[6, 2]</td>
<td>6419062</td>
<td>[2, 4, 2]</td>
<td>6360533</td>
</tr>
<tr>
<th>4</th>
<td>[6, 1, 1]</td>
<td>6440333</td>
<td>[1, 1, 4, 2]</td>
<td>6360533</td>
</tr>
<tr>
<th>5</th>
<td>[4, 3, 1]</td>
<td>6834375</td>
<td>[3, 3, 2]</td>
<td>6562859</td>
</tr>
<tr>
<th>6</th>
<td>[4, 4]</td>
<td>6866167</td>
<td>[1, 2, 3, 2]</td>
<td>6562859</td>
</tr>
<tr>
<th>7</th>
<td>[4, 2, 1, 1]</td>
<td>6903123</td>
<td>[2, 1, 3, 2]</td>
<td>6562859</td>
</tr>
<tr>
<th>8</th>
<td>[4, 2, 2]</td>
<td>6903322</td>
<td>[1, 1, 1, 3, 2]</td>
<td>6562859</td>
</tr>
<tr>
<th>9</th>
<td>[1, 7]</td>
<td>6979216</td>
<td>[6, 1, 1]</td>
<td>6598104</td>
</tr>
<tr>
<th>10</th>
<td>[1, 6, 1]</td>
<td>6983998</td>
<td>[1, 5, 1, 1]</td>
<td>6621003</td>
</tr>
</tbody>
</table>
<p>Recall that double precision floating point numbers have 11 bits of exponent and 52 bits of mantissa. We can actually see that showing up in the literal results above: the transpositions that do best are those that either pack together the exponent and the first bits of the mantissa, or have a separate column for just the exponent information (e.g. [1, 5, 2] or [2, 4, 2]).</p>
<p>And Snappy:</p>
<!--
== Delta or literal better for the number when using Snappy?
Number Codec Delta Number Codec Literal
Rank
1 [5, 1, 1, 1] 11835533 [1, 1, 1, 2, 1, 1, 1] 9985079
2 [1, 3, 2, 1, 1] 12137223 [2, 1, 2, 1, 1, 1] 9985079
3 [6, 1, 1] 12224432 [3, 2, 1, 1, 1] 9985079
4 [4, 2, 1, 1] 12253670 [1, 2, 2, 1, 1, 1] 9985079
5 [1, 3, 1, 1, 1, 1] 12280165 [1, 1, 1, 1, 1, 1, 1, 1] 10232996
6 [3, 1, 2, 1, 1] 12281694 [2, 1, 1, 1, 1, 1, 1] 10232996
7 [5, 1, 2] 12338551 [1, 2, 1, 1, 1, 1, 1] 10232996
8 [4, 1, 1, 1, 1] 12399017 [3, 1, 1, 1, 1, 1] 10232996
9 [3, 1, 1, 1, 1, 1] 12409691 [1, 1, 1, 1, 2, 1, 1] 10343471
10 [2, 2, 2, 1, 1] 12434147 [2, 1, 1, 2, 1, 1] 10343471
-->
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>Snappy Rank</th>
<th>Delta Transposition</th>
<th>Size</th>
<th>Literal Transposition</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>[5, 1, 1, 1]</td>
<td>11835533</td>
<td>[1, 1, 1, 2, 1, 1, 1]</td>
<td>9985079</td>
</tr>
<tr>
<th>2</th>
<td>[1, 3, 2, 1, 1]</td>
<td>12137223</td>
<td>[2, 1, 2, 1, 1, 1]</td>
<td>9985079</td>
</tr>
<tr>
<th>3</th>
<td>[6, 1, 1]</td>
<td>12224432</td>
<td>[3, 2, 1, 1, 1]</td>
<td>9985079</td>
</tr>
<tr>
<th>4</th>
<td>[4, 2, 1, 1]</td>
<td>12253670</td>
<td>[1, 2, 2, 1, 1, 1]</td>
<td>9985079</td>
</tr>
<tr>
<th>5</th>
<td>[1, 3, 1, 1, 1, 1]</td>
<td>12280165</td>
<td>[1, 1, 1, 1, 1, 1, 1, 1]</td>
<td>10232996</td>
</tr>
<tr>
<th>6</th>
<td>[3, 1, 2, 1, 1]</td>
<td>12281694</td>
<td>[2, 1, 1, 1, 1, 1, 1]</td>
<td>10232996</td>
</tr>
<tr>
<th>7</th>
<td>[5, 1, 2]</td>
<td>12338551</td>
<td>[1, 2, 1, 1, 1, 1, 1]</td>
<td>10232996</td>
</tr>
<tr>
<th>8</th>
<td>[4, 1, 1, 1, 1]</td>
<td>12399017</td>
<td>[3, 1, 1, 1, 1, 1]</td>
<td>10232996</td>
</tr>
<tr>
<th>9</th>
<td>[3, 1, 1, 1, 1, 1]</td>
<td>12409691</td>
<td>[1, 1, 1, 1, 2, 1, 1]</td>
<td>10343471</td>
</tr>
<tr>
<th>10</th>
<td>[2, 2, 2, 1, 1]</td>
<td>12434147</td>
<td>[2, 1, 1, 2, 1, 1]</td>
<td>10343471</td>
</tr>
</tbody>
</table>
<p>Here we see the same pattern as we did above: Snappy seems to prefer “more transposed” transpositions than BZ2 does, and we even see a strong showing for the maximal split [1, 1, 1, 1, 1, 1, 1, 1].</p>
<p>To summarize: for doubles, it seems that regardless of which compressor you use, you are better off not splitting into mantissa/exponent portions, and just literal encoding the whole thing. If using Snappy, [1, 1, 1, 1, 1, 1, 1, 1] transposition seems to be the way to go, but the situation is less clear with BZ2: [6, 2] did well in our tests but it wasn’t a runaway winner.</p>
<p>If for some reason you did want to use splitting and you are also going to use BZ2, then [6, 1] literal encoding for the mantissas and literal encoding for the exponents seems like a sensible choice. If you are a Snappy user, then I would suggest that a principled choice would be to use [1, 1, 1, 1, 1, 1, 1] literal encoding for the mantissas and likewise [1, 1] literal encoding for the exponent.</p>
<h1 id="sparse-timeseries">Sparse timeseries</h1>
<p>Let’s now look at the sparse timeseries case, where many of the values in the timeseries are NaN. In this case, we’re interested in evaluating how useful the “special case” optimization above is in improving compression ratios.</p>
<!--
No Split Split w/ Special Cases Split wout/ Special Cases
Knockout
0.00 9985079 10445497 10437550
0.10 10819117 9909251 11501526
0.50 8848393 6238097 10255811
0.75 5732076 3628543 7218287
-->
<p>To evaluate this, I replaced a fraction of numbers in my test dataset with NaNs and looked at the best possible size result for a few such fractions. The compressed size in each case is:</p>
<table class="dataframe">
<thead>
<tr style="text-align: right;">
<th>NaN fraction</th>
<th>No Split</th>
<th>Split w/ Special Cases</th>
<th>Split wout/ Special Cases</th>
</tr>
</thead>
<tbody>
<tr>
<th>0.00</th>
<td>9985079</td>
<td>10445497</td>
<td>10437550</td>
</tr>
<tr>
<th>0.10</th>
<td>10819117</td>
<td>9909251</td>
<td>11501526</td>
</tr>
<tr>
<th>0.50</th>
<td>8848393</td>
<td>6238097</td>
<td>10255811</td>
</tr>
<tr>
<th>0.75</th>
<td>5732076</td>
<td>3628543</td>
<td>7218287</td>
</tr>
</tbody>
</table>
<p align="center">
<a href="/2016/01/special-cases.png"><img src="/2016/01/special-cases.png" alt="" /></a>
</p>
<p>Note that for convenience here the only compressor I tested was Snappy – i.e. BZ2 was not tested. I also didn’t implement special cases in the no-split case, because an artifact of my implementation is that the special-casing is done at the same time as the float is split into its three component fields (sign, mantissa, exponent).</p>
<p>As we introduce small numbers of NaNs to the data, both the no-split and non-special-cased data get larger. This is expected, because we’re replacing predictable timeseries values at random with totally dissimilar values and hence adding entropy. The special-cased split shrinks because this increasing entropy is compensated for by the very short codes we have chosen for NaNs (for which we do pay a very small penalty in the NaNless case). At very high numbers of NaNs, the compressed data for all methods shrinks as NaNs become the rule rather than the exception.</p>
<p>A high fraction of NaNs (10% or more) is probably realistic for real-world financial data, so implementing the special cases definitely does seem worthwhile. The improvement would probably be considerably less if we looked at BZ2-based results, though.</p>
<h2 id="closing-thoughts">Closing thoughts</h2>
<p>One general observation is that delta encoding is very rarely the best choice, and when it <em>is</em> the best, the gains are usually marginal when compared to literal encoding. This is interesting because Fabian Giesen came to exactly the same conclusion (that delta encoding is redundant when you can do transposition) in the <a href="http://www.farbrausch.com/~fg/seminars/workcompression_download.pdf">excellent presentation</a> that I linked to earlier.</p>
<p>By applying these techniques to the dataset I was dealing with at work, I was able to get a nice improvement in compression ratio, on the order of 10%-20% over and above what I could achieve with naive use of Snappy, so I consider the work a success, but I don’t intend to do any more research in the area. However, there are definitely more experiments that could be done in this vein. In particular, interesting questions are:</p>
<ul>
<li>How robust are the findings of this post when applied to other datasets?</li>
<li>What if we don’t byte-align everything, i.e. remove some of the useless padding bits? Does that improve the BZ2 case? (My preliminary experiments showed that it made Snappy considerably worse.)</li>
<li>Why <em>exactly</em> are the results for BZ2 and Snappy so different? Presumably it relates to the lack of an entropy encoder in Snappy, but it is not totally clear to me how this leads to the results above.</li>
</ul>MaxI recently had cause to investigate fast methods of storing and transferring financial timeseries. Naively, timeseries can be represented in memory or on disk as simple dense arrays of floating point numbers. This is an attractive representation with many nice properties:Easy publishing to Maven Central with Gradle2015-11-30T23:24:45+00:002015-11-30T23:24:45+00:00http://blog.omega-prime.co.uk/2015/11/30/easy-publishing-to-maven-central-with-gradle<p>I recently released my first open source library for Java,
<a href="http://batterseapower.github.io/mdbi/">MDBI</a>. I learnt a lot about the
Java open-source ecosystem as part of this process, and this blog
summarises that in the hope that it will be useful to others.
Specifically, the post will explain how to set up a project using the
modern <a href="http://gradle.org/">Gradle</a> build system to build code and
deploy it to the standard <a href="http://search.maven.org/">Maven Central</a>
repository from the command line really easily.</p>
<h1 id="getting-started">Getting started</h1>
<p>In the Haskell ecosystem, everyone uses
<a href="https://www.haskell.org/cabal/">Cabal</a> and
<a href="http://hackage.haskell.org/">Hackage</a>, which are developed by the same
people and tightly integrated. In contrast, Java’s ecosystem is a bit
more fragmented: build systems and package repositories are managed by
different organisations, and you need to do a bit of integration work to
join everything up.</p>
<p>In particular, in order to get started we’re going to have to sign up
with two different websites: <a href="http://central.sonatype.org/pages/ossrh-guide.html">Sonatype
OSS</a> and
<a href="https://bintray.com/">Bintray</a>:</p>
<ul>
<li>
<p>No-one can publish directly to Maven Central: instead you need to
publish your project to an <a href="https://maven.apache.org/guides/mini/guide-central-repository-upload.html">“approved
repository”</a>,
from where it will be synced to Central. Sonatype OSS is an approved
repository that Sonatype (the company that runs Maven Central)
provide free of charge specifically for open-source projects. We
will use this to get our artifacts into Central, so go and <a href="http://central.sonatype.org/pages/ossrh-guide.html">follow
the sign-up instructions
now</a>.</p>
<p>Your application will be manually reviewed by a Sonatype employee
and approved within one or two working days. If you want an example
of what this process looks like you can <a href="https://issues.sonatype.org/browse/OSSRH-18967">take a look at the ticket I
raised for my MDBI
project</a>.</p>
</li>
<li>
<p>Sonatype OSS is a functional enough way to get your artifacts onto
Central, but it has some irritating features. In particular, when
you want to make a release you need to first push your artifacts to
OSS, and then use an ugly and confusing web interface called
<a href="https://oss.sonatype.org/">Sonatype Nexus</a> to actually “promote”
this to Central. I wanted the release to Central to be totally
automated, and the easiest way to use that is to have a 3rd party
deal with pushing to and then promoting from OSS. For this reason,
you should also sign up with <a href="https://bintray.com/">Bintray</a> (you
can do this with one click if you have a GitHub account).</p>
<p>Bintray is run by a company called JFrog and basically seems to be a
Nexus alternative. JFrog run a Maven repository called
<a href="https://bintray.com/bintray/jcenter">JCenter</a>, and it’s easy to
publish to that via Bintray. Once it’s on JCenter we’ll be able to
push and promote it on Sonatype OSS fully automatically.</p>
</li>
</ul>
<p>We also need to create a Bintray “package” within your Bintray Maven
repository. Do this via the Bintray interface — it should be
self-explanatory. Use the button on the package page to request it be
linked to JCenter (this was approved within a couple of hours for me).</p>
<p>We’ll also need a GPG public/private key pair. Let’s set that up now:</p>
<ol>
<li>Open up a terminal and run <code class="highlighter-rouge">gpg --gen-key</code>. Accept all the defaults
about the algorithm to use, and enter a name, email and passphrase
of your choosing.</li>
<li>
<p>If you run <code class="highlighter-rouge">gpg --list-public-keys</code> you should see something like
this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/Users/mbolingbroke/.gnupg/pubring.gpg
--------------------------------------
pub 2048R/3117F02B 2015-11-18
uid Max Bolingbroke
sub 2048R/15245385 2015-11-18
</code></pre></div> </div>
<p>Whatever is in place of <code class="highlighter-rouge">3117F02B</code> is the name of your key. I’ll
call this <code class="highlighter-rouge">$KEYNAME</code> from now on.</p>
</li>
<li>Run
<code class="highlighter-rouge">gpg --keyserver hkp://pool.sks-keyservers.net --send-keys $KEYNAME</code>
to publish your key.</li>
<li>Run <code class="highlighter-rouge">gpg -a --export-key $KEYNAME</code> and
<code class="highlighter-rouge">gpg -a --export-secret-key $KEYNAME</code> to get your public and secret
keys as ASCII text. <a href="https://bintray.com/profile/edit">Edit your Bintray
account</a> and paste these into the
“GPG Signing” part of the settings.</li>
<li>Edit your personal Maven repository on Bintray and select the option
to “GPG Sign uploaded files automatically”. Don’t use Bintray’s
public/private key pair.</li>
</ol>
<p>Now you have your Bintray and OSS accounts we can move on to setting up
Gradle.</p>
<h1 id="gradle-setup">Gradle setup</h1>
<p>The key problem we’re trying to solve with our Gradle build is producing
a set of JARs that meet the <a href="http://central.sonatype.org/pages/requirements.html">Maven Central
requirements</a>. What
this boils down to is ensuring that we provide:</p>
<ul>
<li>The actual JAR file that people will run.</li>
<li>Source JARs containing the code that we built.</li>
<li>Javadoc JARs containing the compiled HTML help files.</li>
<li>GPG signatures for all of the above. (This is why we created a GPG
key above.)</li>
<li>A POM file containing project metadata.</li>
</ul>
<p>To satisfy these requirements we’re going to use
<a href="https://github.com/bmuschko/gradle-nexus-plugin">gradle-nexus-plugin</a>.
The resulting (unsigned, but otherwise Central-compliant) artifacts will
then be uploaded to Bintray (and eventually Sonatype OSS + Central)
using
<a href="https://github.com/bintray/gradle-bintray-plugin">gradle-bintray-plugin</a>.
I also use one more plugin — Palantir’s
<a href="https://github.com/palantir/gradle-gitsemver">gradle-gitsemver</a> — to
avoid having to update the Gradle file whenever the version number
changes. Our Gradle file begins by pulling all those plugins in:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span class="n">buildscript</span> <span class="o">{</span>
<span class="n">repositories</span> <span class="o">{</span>
<span class="n">jcenter</span><span class="o">()</span>
<span class="n">maven</span> <span class="o">{</span> <span class="n">url</span> <span class="s2">"http://dl.bintray.com/palantir/releases"</span> <span class="o">}</span>
<span class="o">}</span>
<span class="n">dependencies</span> <span class="o">{</span>
<span class="n">classpath</span> <span class="s1">'com.bmuschko:gradle-nexus-plugin:2.3.1'</span>
<span class="n">classpath</span> <span class="s1">'com.jfrog.bintray.gradle:gradle-bintray-plugin:1.4'</span>
<span class="n">classpath</span> <span class="s1">'com.palantir:gradle-gitsemver:0.7.0'</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">apply</span> <span class="nl">plugin:</span> <span class="s1">'java'</span>
<span class="n">apply</span> <span class="nl">plugin:</span> <span class="s1">'com.bmuschko.nexus'</span>
<span class="n">apply</span> <span class="nl">plugin:</span> <span class="s1">'com.jfrog.bintray'</span>
<span class="n">apply</span> <span class="nl">plugin:</span> <span class="s1">'gitsemver'</span></code></pre></figure>
<p>Now we have the usual Gradle configuration describing how to build the
JAR. Note the use of the <code class="highlighter-rouge">semverVersion()</code> function (provided by the
<code class="highlighter-rouge">gradle-gitsemver</code> plugin) which returns a version number derived from
the most recent Git tag of the form <code class="highlighter-rouge">vX.Y.Z</code>. Despite the name of
the plugin, there is no requirement to actually adhere to the principles
of <a href="http://semver.org/">Semantic Versioning</a> to use it: the only
requirements for the version numbers are syntactic.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span class="n">version</span> <span class="nf">semverVersion</span><span class="o">()</span>
<span class="n">group</span> <span class="s1">'uk.co.omega-prime'</span>
<span class="kt">def</span> <span class="n">projectName</span> <span class="o">=</span> <span class="s1">'mdbi'</span>
<span class="kt">def</span> <span class="n">projectDescription</span> <span class="o">=</span> <span class="s1">'Max\'s DataBase Interface: a simple but powerful JDBC wrapper inspired by JDBI'</span>
<span class="n">sourceCompatibility</span> <span class="o">=</span> <span class="mf">1.8</span>
<span class="n">jar</span> <span class="o">{</span>
<span class="n">baseName</span> <span class="o">=</span> <span class="n">projectName</span>
<span class="n">manifest</span> <span class="o">{</span>
<span class="n">attributes</span> <span class="s1">'Implementation-Title'</span><span class="o">:</span> <span class="n">projectName</span><span class="o">,</span>
<span class="s1">'Implementation-Version'</span><span class="o">:</span> <span class="n">version</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">repositories</span> <span class="o">{</span>
<span class="n">mavenCentral</span><span class="o">()</span>
<span class="o">}</span>
<span class="n">dependencies</span> <span class="o">{</span>
<span class="n">compile</span> <span class="nl">group:</span> <span class="s1">'com.google.code.findbugs'</span><span class="o">,</span> <span class="nl">name:</span> <span class="s1">'jsr305'</span><span class="o">,</span> <span class="nl">version:</span> <span class="s1">'3.0.1'</span>
<span class="n">testCompile</span> <span class="nl">group:</span> <span class="s1">'org.xerial'</span><span class="o">,</span> <span class="nl">name:</span> <span class="s1">'sqlite-jdbc'</span><span class="o">,</span> <span class="nl">version:</span> <span class="s1">'3.8.11.2'</span>
<span class="n">testCompile</span> <span class="nl">group:</span> <span class="s1">'junit'</span><span class="o">,</span> <span class="nl">name:</span> <span class="s1">'junit'</span><span class="o">,</span> <span class="nl">version:</span> <span class="s1">'4.12'</span>
<span class="o">}</span></code></pre></figure>
<p>(Obviously your group, project name, description, dependencies etc will
differ from this. Hopefully it’s clear which parts of this example
Gradle file you’ll need to change for your project and which you can
copy verbatim.)</p>
<p>Now we need to configure <code class="highlighter-rouge">gradle-nexus-plugin</code> to generate the POM. Just
by the act of including the plugin we have already arranged for the
appropriate JARs to be generated, but the plugin can’t figure out the
full contents of the POM by itself.</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span class="n">modifyPom</span> <span class="o">{</span>
<span class="n">project</span> <span class="o">{</span>
<span class="n">name</span> <span class="n">projectName</span>
<span class="n">description</span> <span class="n">projectDescription</span>
<span class="n">url</span> <span class="s1">'http://batterseapower.github.io/mdbi/'</span>
<span class="n">scm</span> <span class="o">{</span>
<span class="n">url</span> <span class="s1">'https://github.com/batterseapower/mdbi'</span>
<span class="n">connection</span> <span class="s1">'scm:https://batterseapower@github.com/batterseapower/mdbi.git'</span>
<span class="n">developerConnection</span> <span class="s1">'scm:git://github.com/batterseapower/mdbi.git'</span>
<span class="o">}</span>
<span class="n">licenses</span> <span class="o">{</span>
<span class="n">license</span> <span class="o">{</span>
<span class="n">name</span> <span class="s1">'The Apache Software License, Version 2.0'</span>
<span class="n">url</span> <span class="s1">'http://www.apache.org/licenses/LICENSE-2.0.txt'</span>
<span class="n">distribution</span> <span class="s1">'repo'</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">developers</span> <span class="o">{</span>
<span class="n">developer</span> <span class="o">{</span>
<span class="n">id</span> <span class="s1">'batterseapower'</span>
<span class="n">name</span> <span class="s1">'Max Bolingbroke'</span>
<span class="n">email</span> <span class="s1">'batterseapower@hotmail.com'</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">nexus</span> <span class="o">{</span>
<span class="n">sign</span> <span class="o">=</span> <span class="kc">false</span>
<span class="o">}</span></code></pre></figure>
<p>Note that I’ve explicitly turned off the automatic artifact signing
capability of the Nexus plugin. Theoretically we should be able to keep
this turned on, and sign everything locally before pushing to Bintray.
This would mean that we wouldn’t have to give Bintray our private key.
In practice, if you sign things locally Bintray seems to mangle the
signature filenames so they become unusable…</p>
<p>Finally, we need to configure the Bintray sync:</p>
<figure class="highlight"><pre><code class="language-groovy" data-lang="groovy"><span class="k">if</span> <span class="o">(</span><span class="n">hasProperty</span><span class="o">(</span><span class="s1">'bintrayUsername'</span><span class="o">)</span> <span class="o">||</span> <span class="n">System</span><span class="o">.</span><span class="na">getenv</span><span class="o">().</span><span class="na">containsKey</span><span class="o">(</span><span class="s1">'BINTRAY_USER'</span><span class="o">))</span> <span class="o">{</span>
<span class="c1">// Used by the bintray plugin</span>
<span class="n">bintray</span> <span class="o">{</span>
<span class="n">user</span> <span class="o">=</span> <span class="n">System</span><span class="o">.</span><span class="na">getenv</span><span class="o">().</span><span class="na">getOrDefault</span><span class="o">(</span><span class="s1">'BINTRAY_USER'</span><span class="o">,</span> <span class="n">bintrayUsername</span><span class="o">)</span>
<span class="n">key</span> <span class="o">=</span> <span class="n">System</span><span class="o">.</span><span class="na">getenv</span><span class="o">().</span><span class="na">getOrDefault</span><span class="o">(</span><span class="s1">'BINTRAY_KEY'</span><span class="o">,</span> <span class="n">bintrayApiKey</span><span class="o">)</span>
<span class="n">publish</span> <span class="o">=</span> <span class="kc">true</span>
<span class="n">pkg</span> <span class="o">{</span>
<span class="n">repo</span> <span class="o">=</span> <span class="s1">'maven'</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">projectName</span>
<span class="n">licenses</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'Apache-2.0'</span><span class="o">]</span>
<span class="n">vcsUrl</span> <span class="o">=</span> <span class="s1">'https://github.com/batterseapower/mdbi.git'</span>
<span class="n">version</span> <span class="o">{</span>
<span class="n">name</span> <span class="o">=</span> <span class="n">project</span><span class="o">.</span><span class="na">version</span>
<span class="n">desc</span> <span class="o">=</span> <span class="n">projectDescription</span>
<span class="n">released</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Date</span><span class="o">()</span>
<span class="n">mavenCentralSync</span> <span class="o">{</span>
<span class="n">user</span> <span class="o">=</span> <span class="n">System</span><span class="o">.</span><span class="na">getenv</span><span class="o">().</span><span class="na">getOrDefault</span><span class="o">(</span><span class="s1">'SONATYPE_USER'</span><span class="o">,</span> <span class="n">nexusUsername</span><span class="o">)</span>
<span class="n">password</span> <span class="o">=</span> <span class="n">System</span><span class="o">.</span><span class="na">getenv</span><span class="o">().</span><span class="na">getOrDefault</span><span class="o">(</span><span class="s1">'SONATYPE_PASSWORD'</span><span class="o">,</span> <span class="n">nexusPassword</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="n">configurations</span> <span class="o">=</span> <span class="o">[</span><span class="s1">'archives'</span><span class="o">]</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>We do this conditionally because we still want people to be able to use
the Gradle file even if they don’t have your username and password set
up. In order to make these credentials available to the script when run
on your machine, you need to create a <code class="highlighter-rouge">~/.gradle/gradle.properties</code> file
with contents like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># These 3 are optional: they'll be needed if you ever use the nexus plugin with 'sign = true' (the default)
signing.keyId=
signing.password=
signing.secretKeyRingFile=
nexusUsername=
nexusPassword=
bintrayUsername=
bintrayApiKey=
</code></pre></div></div>
<p>You can see the complete, commented, Gradle file that I’m using in my
project <a href="https://github.com/batterseapower/mdbi/blob/46a0ea7a09b312dfae6b5cd0997f3703ed28a28c/build.gradle">on
Github</a>.</p>
<h1 id="your-first-release">Your first release</h1>
<p>We should now be ready to go (assuming your Sonatype OSS and JCenter
setup requests have been approved). Let’s make a release! Go to the
terminal and type:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git tag v1.0.0
gradle bintrayUpload
</code></pre></div></div>
<p>If everything works, you’ll get a <code class="highlighter-rouge">BUILD SUCCESSFUL</code> message after a
minute or so. Your new version should be visible on the Bintray package
page (and JCenter) immediately, and will appear on Maven Central shortly
afterwards.</p>
<p>If you want to go the whole hog and have your continuous integration
(e.g. the excellent <a href="http://travis-ci.org">Travis</a>) make these automatic
deploys after every passing build, <a href="http://szimano.org/automatic-deployments-to-jfrog-oss-and-bintrayjcentermaven-central-via-travis-ci-from-sbt/">this guide for
SBT</a>
looks useful. However, I didn’t go this route so I can’t say how it
could be adapted for Gradle.</p>
<p>A nice benefit of publishing to Maven Central is that
<a href="http://www.javadoc.io">javadoc.io</a> will host your docs for free totally
automatically. Check it out!</p>
<p>Overall I found the process of publishing Java open source to
Maven Central needlessly confusing, with many more moving parts than I
was expecting. The periods of waiting for 3rd parties to approve my
project were also a little frustrating, though in fairness the
turnaround time was quite impressive given that they were doing the work
for free. Hopefully this guide will help make the process a little less
frustrating for other Gradle users in the future.</p>MaxI recently released my first open source library for Java, MDBI. I learnt a lot about the Java open-source ecosystem as part of this process, and this blog summarises that in the hope that it will be useful to others. Specifically, the post will explain how to set up a project using the modern Gradle build system to build code and deploy it to the standard Maven Central repository from the command line really easily.Max’s DataBase Interface2015-11-22T21:41:17+00:002015-11-22T21:41:17+00:00http://blog.omega-prime.co.uk/2015/11/22/maxs-database-interface<p>It became necessary to write a Java database access library, since the ones that were available were somehow unsatisfactory. It’s called MDBI, and I’ve written more about it on the <a href="http://batterseapower.github.io/mdbi/">Github page</a>.</p>MaxIt became necessary to write a Java database access library, since the ones that were available were somehow unsatisfactory. It’s called MDBI, and I’ve written more about it on the Github page.Beware: java.nio.file.WatchService is subtly broken on Linux2015-11-14T11:06:41+00:002015-11-14T11:06:41+00:00http://blog.omega-prime.co.uk/2015/11/14/beware-java-nio-file-watchservice-is-subtly-broken-on-linux<p>This blog describes a bug that I reported to Oracle a month or so ago but still doesn’t seem to have made its way through to the official tracker.</p>
<p>The problem is that on Linux, file system events that should be delivered by a <a href="http://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchService.html">WatchService</a> can be silently discarded or delivered against the wrong <a href="http://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchKey.html">WatchKey</a>. So for example, it’s possible to <a href="http://docs.oracle.com/javase/7/docs/api/java/nio/file/Path.html#register(java.nio.file.WatchService,%20java.nio.file.WatchEvent.Kind...)">register</a> two directories, A and B, with a WatchService waiting for ENTRY_CREATE events, then create a file A/C but get an event with the WatchKey for B and <a href="http://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchEvent.html#context()">WatchEvent.context</a> C.</p>
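<p>A sketch of that scenario is below. Because the problem is a race between registration and event delivery, a single run like this will almost always behave correctly; the misdelivery only shows up occasionally when registrations race with event delivery:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.nio.file.*;
import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;

// Sketch of the scenario described above: watch two directories and report
// which WatchKey a creation event arrives on. On an affected Linux JDK the
// event for a/c can occasionally be delivered on b's key instead of a's.
public class WatchKeyMixupDemo {
    public static void main(String[] args) throws Exception {
        Path a = Files.createDirectories(Paths.get("a"));
        Path b = Files.createDirectories(Paths.get("b"));

        WatchService watcher = FileSystems.getDefault().newWatchService();
        WatchKey keyA = a.register(watcher, ENTRY_CREATE);
        WatchKey keyB = b.register(watcher, ENTRY_CREATE);

        Files.createFile(a.resolve("c"));

        WatchKey key = watcher.take();
        for (WatchEvent&lt;?&gt; event : key.pollEvents()) {
            System.out.println("Delivered on " + (key == keyA ? "A" : key == keyB ? "B" : "other")
                    + " with context " + event.context());
        }
        key.reset();
    }
}</code></pre></figure>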
<p>The reason for this is a bug in the JDK’s <a href="https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/solaris/classes/sun/nio/fs/LinuxWatchService.java#L314">LinuxWatchService</a>. This class wraps an <a href="http://man7.org/linux/man-pages/man7/inotify.7.html">inotify</a> instance, and also a thread that spins using poll to wait for either:</p>
<ul>
<li>A file system event to be delivered on the inotify FD, or</li>
<li>A byte to arrive on a FD corresponding to a pipe which is owned by the LinuxWatchService</li>
</ul>
<p>Whenever a registration request is made by the user of the LinuxWatchService, the request is enqueued and then a single byte is written to the other end of this pipe to wake up the background thread, which will then make the actual registration with the kernel.</p>
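<p>If you haven’t seen this “self-pipe” wake-up idiom before, the sketch below shows the same idea in pure Java using java.nio.channels.Pipe and a Selector. It is only an analogy for illustration: LinuxWatchService itself does this with a native socketpair and poll(2), not with NIO channels, and the names here are mine.</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Analogy only: the real LinuxWatchService uses a native socketpair and poll(2),
// but the wake-up pattern is the same.
class WakeupLoop {
    private final Selector selector = Selector.open();
    private final Pipe pipe = Pipe.open();
    private final Queue&lt;Runnable&gt; requests = new ConcurrentLinkedQueue&lt;&gt;();

    WakeupLoop() throws IOException {
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);
    }

    // Called from user threads: enqueue the request, then write one byte to wake the loop
    void submit(Runnable request) throws IOException {
        requests.add(request);
        pipe.sink().write(ByteBuffer.wrap(new byte[] { 1 }));
    }

    // The background thread calls this repeatedly; select() blocks until a wake-up byte
    // (or some other registered channel) becomes readable
    void runOnce() throws IOException {
        selector.select();
        selector.selectedKeys().clear();

        ByteBuffer drain = ByteBuffer.allocate(64);
        while (pipe.source().read(drain) > 0) {
            drain.clear();                       // drain all pending wake-up bytes
        }

        Runnable request;
        while ((request = requests.poll()) != null) {
            request.run();                       // perform the pending registrations
        }
    }
}</code></pre></figure>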
<p>The core loop of this background thread is where the bug lies. The loop body looks like this:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// wait for close or inotify event</span>
<span class="n">nReady</span> <span class="o">=</span> <span class="n">poll</span><span class="o">(</span><span class="n">ifd</span><span class="o">,</span> <span class="n">socketpair</span><span class="o">[</span><span class="mi">0</span><span class="o">]);</span>
<span class="c1">// read from inotify</span>
<span class="k">try</span> <span class="o">{</span>
<span class="n">bytesRead</span> <span class="o">=</span> <span class="n">read</span><span class="o">(</span><span class="n">ifd</span><span class="o">,</span> <span class="n">address</span><span class="o">,</span> <span class="n">BUFFER_SIZE</span><span class="o">);</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="n">UnixException</span> <span class="n">x</span><span class="o">)</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">x</span><span class="o">.</span><span class="na">errno</span><span class="o">()</span> <span class="o">!=</span> <span class="n">EAGAIN</span><span class="o">)</span>
<span class="k">throw</span> <span class="n">x</span><span class="o">;</span>
<span class="n">bytesRead</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="o">}</span>
<span class="c1">// process any pending requests</span>
<span class="k">if</span> <span class="o">((</span><span class="n">nReady</span> <span class="o">></span> <span class="mi">1</span><span class="o">)</span> <span class="o">||</span> <span class="o">(</span><span class="n">nReady</span> <span class="o">==</span> <span class="mi">1</span> <span class="o">&&</span> <span class="n">bytesRead</span> <span class="o">==</span> <span class="mi">0</span><span class="o">))</span> <span class="o">{</span>
<span class="k">try</span> <span class="o">{</span>
<span class="n">read</span><span class="o">(</span><span class="n">socketpair</span><span class="o">[</span><span class="mi">0</span><span class="o">],</span> <span class="n">address</span><span class="o">,</span> <span class="n">BUFFER_SIZE</span><span class="o">);</span>
<span class="kt">boolean</span> <span class="n">shutdown</span> <span class="o">=</span> <span class="n">processRequests</span><span class="o">();</span>
<span class="k">if</span> <span class="o">(</span><span class="n">shutdown</span><span class="o">)</span>
<span class="k">break</span><span class="o">;</span>
<span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="n">UnixException</span> <span class="n">x</span><span class="o">)</span> <span class="o">{</span>
<span class="k">if</span> <span class="o">(</span><span class="n">x</span><span class="o">.</span><span class="na">errno</span><span class="o">()</span> <span class="o">!=</span> <span class="n">UnixConstants</span><span class="o">.</span><span class="na">EAGAIN</span><span class="o">)</span>
<span class="k">throw</span> <span class="n">x</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="c1">// iterate over buffer to decode events</span>
<span class="kt">int</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="k">while</span> <span class="o">(</span><span class="n">offset</span> <span class="o"><</span> <span class="n">bytesRead</span><span class="o">)</span> <span class="o">{</span>
<span class="kt">long</span> <span class="n">event</span> <span class="o">=</span> <span class="n">address</span> <span class="o">+</span> <span class="n">offset</span><span class="o">;</span>
<span class="kt">int</span> <span class="n">wd</span> <span class="o">=</span> <span class="n">unsafe</span><span class="o">.</span><span class="na">getInt</span><span class="o">(</span><span class="n">event</span> <span class="o">+</span> <span class="n">OFFSETOF_WD</span><span class="o">);</span>
<span class="kt">int</span> <span class="n">mask</span> <span class="o">=</span> <span class="n">unsafe</span><span class="o">.</span><span class="na">getInt</span><span class="o">(</span><span class="n">event</span> <span class="o">+</span> <span class="n">OFFSETOF_MASK</span><span class="o">);</span>
<span class="kt">int</span> <span class="n">len</span> <span class="o">=</span> <span class="n">unsafe</span><span class="o">.</span><span class="na">getInt</span><span class="o">(</span><span class="n">event</span> <span class="o">+</span> <span class="n">OFFSETOF_LEN</span><span class="o">);</span>
<span class="c1">// Omitted: the code that actually does something with the inotify event</span>
<span class="o">}</span></code></pre></figure>
<p>The issue is that two read calls are made by this body: once with the inotify FD ifd, and once with the pipe FD socketpair[0]. Crucially, both reads target the same buffer, address. If data happens to be available both via the pipe and via inotify, the read from the pipe overwrites the start of the inotify data that is still waiting to be decoded, corrupting the first few bytes of the event stream. As it happens, the first few bytes of an event denote which watch descriptor the event is for, so the issue usually manifests as an event being delivered against the wrong directory (or, if the resulting watch descriptor is not actually valid, as the event being silently dropped).</p>
<p>Note that this issue can only occur if you are registering watches while simultaneously receiving events. If your program just sets up some watches at startup and then never registers/cancels watches again you probably won’t be affected. This, plus the fact that it is only triggered by registration requests and events arriving very close together, is probably why this bug has gone undetected since the very first release of the WatchService code.</p>
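<p>For completeness, here is a sketch of the kind of workload that sets up those conditions: one thread keeps registering fresh directories while another generates events in an already-watched directory. All the names are invented for illustration, and since the misdelivery is a race there is no guarantee any particular run will trigger it:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.nio.file.*;
import java.util.concurrent.TimeUnit;
import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;

public class WatchServiceStress {
    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("watch-stress");
        Path watched = Files.createDirectory(root.resolve("watched"));

        WatchService ws = FileSystems.getDefault().newWatchService();
        WatchKey watchedKey = watched.register(ws, ENTRY_CREATE);

        // Thread 1: keep registering new directories, so registration requests race
        // with event delivery on the background thread
        Thread registrar = new Thread(() -> {
            try {
                for (int i = 0; i < 10_000; i++) {
                    Path dir = Files.createDirectory(root.resolve("dir-" + i));
                    dir.register(ws, ENTRY_CREATE);
                }
            } catch (Exception e) { e.printStackTrace(); }
        });

        // Thread 2: keep generating events in the directory we registered up front
        Thread creator = new Thread(() -> {
            try {
                for (int i = 0; i < 10_000; i++) {
                    Files.createFile(watched.resolve("file-" + i));
                }
            } catch (Exception e) { e.printStackTrace(); }
        });

        registrar.start();
        creator.start();

        // Every "file-N" event should arrive on watchedKey; if one turns up on a
        // different key, the misdelivery described above has happened
        while (registrar.isAlive() || creator.isAlive()) {
            WatchKey key = ws.poll(100, TimeUnit.MILLISECONDS);
            if (key == null) continue;
            for (WatchEvent&lt;?&gt; event : key.pollEvents()) {
                if (event.context().toString().startsWith("file-") && key != watchedKey) {
                    System.out.println("Misdelivered: " + event.context() + " on " + key.watchable());
                }
            }
            key.reset();
        }
    }
}</code></pre></figure>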
<p>I’ve worked around this myself by using the inotify API directly via <a href="https://en.wikipedia.org/wiki/Java_Native_Access">JNA</a>. This reimplementation also let me fix an unrelated WatchService <a href="https://bugs.openjdk.java.net/browse/JDK-7057783">“feature”</a>, which is that <a href="http://docs.oracle.com/javase/7/docs/api/java/nio/file/WatchKey.html#watchable()">WatchKey.watchable</a> can point to the wrong path if a directory is renamed. So if you create a directory A, start watching it for ENTRY_CREATE events, rename the directory to B, and then create a file B/C, the WatchKey.watchable you get from the WatchService will be A rather than B, so naive code will derive the incorrect full path A/C for the new file.</p>
<p>In my implementation, a WatchKey is invalidated if the directory it watches is renamed, so a user of the class has the opportunity to reregister the new path and obtain a WatchKey with the correct WatchKey.watchable if they so desire. I think this is much saner behaviour!</p>
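<p>Finally, to give a flavour of the JNA approach mentioned above, here is a minimal sketch of driving inotify from Java. This is not my actual implementation: the class and method names are mine, only inotify_init, inotify_add_watch and read are bound, the IN_CREATE constant is copied from sys/inotify.h, and error handling is omitted:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java">import com.sun.jna.Library;
import com.sun.jna.Native;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class InotifySketch {
    // Bindings for the three libc calls we need
    public interface CLib extends Library {
        CLib INSTANCE = (CLib) Native.loadLibrary("c", CLib.class);
        int inotify_init();
        int inotify_add_watch(int fd, String pathname, int mask);
        int read(int fd, byte[] buf, int count);
    }

    private static final int IN_CREATE = 0x00000100; // value from sys/inotify.h

    public static void main(String[] args) {
        int fd = CLib.INSTANCE.inotify_init();
        int wd = CLib.INSTANCE.inotify_add_watch(fd, "/tmp", IN_CREATE);
        System.out.println("Watching /tmp with watch descriptor " + wd);

        byte[] buf = new byte[4096];
        int n = CLib.INSTANCE.read(fd, buf, buf.length); // blocks until at least one event arrives

        // struct inotify_event { int wd; uint32_t mask; uint32_t cookie; uint32_t len; char name[]; }
        ByteBuffer bb = ByteBuffer.wrap(buf, 0, n).order(ByteOrder.nativeOrder());
        while (bb.remaining() >= 16) {
            int eventWd = bb.getInt();
            int mask    = bb.getInt();
            int cookie  = bb.getInt();
            int len     = bb.getInt();
            byte[] nameBytes = new byte[len];
            bb.get(nameBytes);
            // the name field is NUL-padded out to len bytes
            String name = new String(nameBytes, StandardCharsets.UTF_8).replace("\0", "");
            System.out.println("wd=" + eventWd + " mask=0x" + Integer.toHexString(mask) + " name=" + name);
        }
    }
}</code></pre></figure>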