Fake data, part 3: Bypassing the central limit theorem

May 11, 2010

May 2017: This is from my other blog that's no longer online. The original comments are no longer available, but you are welcome to add more.

Now that I have my black swan distribution, and I verified that it fits market returns remarkably well while being mathematically tractable, I want to use it to generate artificial market data.

Generating a series of closing prices is easy.

1. Select a starting price and take its logarithm. This is the initial value of a running total.
2. Select a mean μ and standard deviation σ of log returns ln(P₀/P₁), where P₀ is the most recent price and P₁ is the previous price.
3. Generate a uniformly distributed random probability p between 0 and 1. Plug it into the inverse black swan distribution (using a=1.6):
$$B^{-1}(p;\mu,s) = \mu - 2 s \, \sinh \left[\frac {\tanh^{-1}(1-2p)} {a} \right]$$
4. Add the result to the running total.
5. Go to step 3. Repeat as often as desired.

Then simply calculate the antilog or exponential of the values in the running total to get prices.
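As a rough illustration of the steps above, here is a minimal Python sketch using only the standard library. The original work was done in Excel's Visual Basic, so Python is a substitution, and the function names (`inv_black_swan`, `uniform_open`, `closing_prices`) and the example parameter values are hypothetical. The shape parameter s is passed in directly rather than derived from σ.

```python
import math
import random

def inv_black_swan(p, mu, s, a=1.6):
    """Inverse black swan CDF from the post: mu - 2*s*sinh(atanh(1 - 2p) / a)."""
    return mu - 2.0 * s * math.sinh(math.atanh(1.0 - 2.0 * p) / a)

def uniform_open(rng):
    """Draw p strictly inside (0, 1); p == 0 is rejected because atanh(1) is undefined."""
    p = rng.random()
    while p == 0.0:
        p = rng.random()
    return p

def closing_prices(start_price, mu, s, count, seed=None):
    """Return `count` closing prices from a black swan random walk in log space."""
    rng = random.Random(seed)
    log_price = math.log(start_price)        # step 1: running total starts at ln(price)
    prices = []
    for _ in range(count):                    # steps 3 to 5, repeated
        log_price += inv_black_swan(uniform_open(rng), mu, s)
        prices.append(math.exp(log_price))    # antilog turns the total back into a price
    return prices

if __name__ == "__main__":
    # Illustrative values only; they are not taken from the post.
    print(closing_prices(100.0, mu=0.0003, s=0.007, count=5, seed=1))
```

Passing a `seed` makes a run repeatable, which helps when comparing distributions generated with different settings.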

This works fine. But what if we want to generate more than just single prices? How do I get a series of daily price bars, with high, low, and closing values for my fake market? The way to mimic real market behavior is to generate a series of n values for each "day" and record the highest, lowest, and final price for high, low, and close of the day. That is, define S_d as a daily step and S_i as an intraday step:

$$S_d = \sum_{i=1}^n S_i$$

where

$$S_i = B^{-1}(p,\mu_i,s_i) = \mu_i - 2 s_i \, \sinh \left[\frac {\tanh^{-1}(1-2p)} {a} \right]$$

$$\mu_i = \frac{\mu_d}{n}, \quad s_i = \frac {s_d}{\sqrt n}$$

Black swan random walk sampling every n steps

The scaling of the intraday mean μ_i and standard deviation σ_i (corresponding to the shape parameter s_i) is necessary to preserve the desired mean μ_d and standard deviation σ_d for daily values. As described in the previous part, the approximation for the distribution shape parameter s is:

$$s \approx \frac{\sqrt{6}\,\sigma} {\pi \tanh^{-1} \left(\frac{1}{a}\right) \sqrt{\left[\tanh^{-1} \left(\frac{1}{a}\right)\right]^2+2}}$$

...using a=1.6.

But look at what happened! A disaster. The figure shows the resulting distributions for 1, 10, and 100 intraday steps. Look at the big rounded top made up of green dots representing n=100. As the number of intraday steps n grows, the resulting distribution of daily values approaches a normal distribution.
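The straightforward intraday construction described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the squared term in the approximation for s is read as the square of tanh⁻¹(1/a), and each bar's high and low are taken over the n intraday values of that day only.

```python
import math
import random

A = 1.6  # shape exponent a used in the post

def inv_black_swan(p, mu, s, a=A):
    """Inverse black swan CDF: mu - 2*s*sinh(atanh(1 - 2p) / a)."""
    return mu - 2.0 * s * math.sinh(math.atanh(1.0 - 2.0 * p) / a)

def shape_from_sigma(sigma, a=A):
    """The post's approximation for s (assumes the square applies to atanh(1/a))."""
    t = math.atanh(1.0 / a)
    return math.sqrt(6.0) * sigma / (math.pi * t * math.sqrt(t * t + 2.0))

def uniform_open(rng):
    """Draw p strictly inside (0, 1) so atanh never receives 1."""
    p = rng.random()
    while p == 0.0:
        p = rng.random()
    return p

def naive_daily_bars(start_price, mu_d, sigma_d, n, days, seed=None):
    """Build (high, low, close) bars from n independent intraday steps per day."""
    rng = random.Random(seed)
    mu_i = mu_d / n                                  # intraday mean
    s_i = shape_from_sigma(sigma_d) / math.sqrt(n)   # intraday shape parameter
    log_price = math.log(start_price)
    bars = []
    for _ in range(days):
        path = []
        for _ in range(n):
            log_price += inv_black_swan(uniform_open(rng), mu_i, s_i)
            path.append(log_price)
        bars.append((math.exp(max(path)), math.exp(min(path)), math.exp(path[-1])))
    return bars
```

Histogramming the daily log returns from `naive_daily_bars` for n = 1, 10, and 100 is one way to reproduce the drift toward a normal shape described here.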

I'm back where I started! The whole point was to find something better than a normal distribution to model the markets. I don't want to end up with a normal distribution, I want to end up with my black swan distribution.

So what happened? The central limit theorem, evidently the "sovereign law" of probability theory, came in and took over. It says that the sum of many independent random variables with a finite mean and variance tends toward a normal distribution, regardless of the underlying distribution we start with.

Although the black swan distribution has infinite kurtosis, it does have a finite mean and variance. This means, if we generate a bunch of tiny black swan steps to build large "daily" steps in a random walk, the large steps will approximate a normal distribution.

How can I break this law?

After much experimentation, I found a way to get past the central limit theorem. If we perturb μ_d to vary black-swanly with each daily step (not each intraday step), the intraday values still appear to follow the black swan distribution, and so do their sums!

$$\large \varepsilon = \begin {cases} \frac{ B^{-1}(p,0,s_d)}{\sqrt{n}\sqrt{n+1}} = \frac{- 2 s_d \, \sinh \left[\frac {\tanh^{-1}(1-2p)}{a} \right]}{\sqrt{n}\sqrt{n+1}} & n>1 \\ 0 & n=1 \end {cases}$$

Re-generate ε once per day, and use that value for all intraday steps (if you generate a new ε for each intraday step, you end up with a normal distribution again). For each intraday step S_i, use these values of μ_i and s_i:

$$\mu_i = \frac{\mu_d}{n}+\varepsilon, \quad s_i = \frac {s_d}{n}$$

Note that s_i is now divided by n rather than √n, because most of the daily variance comes from ε. For large n, the adjustment basically perturbs the intraday mean each day by a small black swan distribution having a standard deviation of about σ_d/n. For n>50 or so, one can simply use n by itself in the denominator of the expression for ε. Again, p is a random probability between 0 and 1.
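A minimal Python sketch of this perturbation scheme follows, again assuming hypothetical function names and standard-library random numbers rather than the Excel setup actually used. It takes the daily shape parameter s_d directly and draws ε with a zero mean, once per day.

```python
import math
import random

def inv_black_swan(p, mu, s, a=1.6):
    """Inverse black swan CDF: mu - 2*s*sinh(atanh(1 - 2p) / a)."""
    return mu - 2.0 * s * math.sinh(math.atanh(1.0 - 2.0 * p) / a)

def uniform_open(rng):
    """Draw p strictly inside (0, 1) so atanh never receives 1."""
    p = rng.random()
    while p == 0.0:
        p = rng.random()
    return p

def epsilon(p, s_d, n):
    """Daily perturbation of the intraday mean; zero when there is one step per day."""
    if n == 1:
        return 0.0
    return inv_black_swan(p, 0.0, s_d) / math.sqrt(n * (n + 1))

def perturbed_daily_bars(start_price, mu_d, s_d, n, days, seed=None):
    """Build (high, low, close) bars, drawing epsilon once per day."""
    rng = random.Random(seed)
    s_i = s_d / n                                    # intraday shape parameter
    log_price = math.log(start_price)
    bars = []
    for _ in range(days):
        eps = epsilon(uniform_open(rng), s_d, n)     # one draw, reused all day
        mu_i = mu_d / n + eps
        path = []
        for _ in range(n):
            log_price += inv_black_swan(uniform_open(rng), mu_i, s_i)
            path.append(log_price)
        bars.append((math.exp(max(path)), math.exp(min(path)), math.exp(path[-1])))
    return bars
```

Treating ε and the intraday steps as independent, the daily variance under these definitions works out to σ_d²[n/(n+1) + 1/n], which is within about 0.04% of σ_d² for n = 50.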

Here's how it worked out:

That seems to work. I can now generate an artificial series of high-low-close prices that have similar statistical properties to actual markets. Now I must investigate whether any dependencies exist between successive values in the series.

I want to mention that the plots in this article use the Rnd() function in Excel's Visual Basic. The Visual Basic random number generator is fairly crude by modern standards, which may explain the noisy tails of the distributions shown (using 16,000 samples) versus the relatively cleaner tails of the distributions measured from actual market data (using fewer samples). I can't know for sure unless I try a better random number generator, but I'm not inclined to do so now. I'm satisfied with these results.
