Generating and plotting stock or crypto data with Matplotlib

We generate high, low, open, close and volume data using a simple method for a fictitious stock or crypto, plot it on a candlestick chart and use a custom font from disk to render the plot.

Context

I wanted a simple method for generating "stock price like" data for a dummy stock or crypto token to use in future projects, so I've created that here and then added some steps for charting it.

Concepts covered in this walkthrough:

  • Generate fictitious stock or crypto price data using a random walk driven by NumPy's normal distribution. The price moves by a varying percentage each day, and the high and low are derived from the open and close prices
  • Add two charts to a matplotlib plot using a gridspec, setting the relative size of each
  • Use a font file stored on disk (that is not part of the matplotlib font cache) to render some of the text

Final result:

Generating the data

First we set some dates to work with (I chose the month of January 2023) and a starting price for the stock or token, let's say $100.

daily_changes is a list of percentage change amounts (which can be positive or negative) to apply, one for each day. Using a numpy random.normal distribution, I set the mean to 0 because I want the % changes to be centred around 0 with an equal chance of increasing or decreasing. I chose a value of 0.03 for the standard deviation, meaning that about 95% of the values would be expected to fall within +/- 0.06 (two standard deviations) and about 99.7% within +/- 0.09 (i.e. 9%), which seemed a reasonable maximum movement in a day for example data.

Numpy cumprod (cumulative product) is then applied to the list of daily changes, with 1 added to each, and the result is multiplied by the start price. For example if the start price is $100 and the change amount is 0.05 then we get $100 * 1.05 giving $105. Maybe the next day the change amount is -0.03, and then we get $105 * 0.97 = $101.85, and so on through the list. This simulates a random walk in a straightforward manner, since at each 'step' we move a small and randomly determined distance from the current position.
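A minimal sketch of this step is below. The variable names, the seed and the use of numpy's Generator API are assumptions made for illustration; calling np.random.normal directly would work the same way.

```python
import numpy as np
import pandas as pd

# Assumed setup for illustration: January 2023 and a $100 starting price
dates = pd.date_range("2023-01-01", "2023-01-31", freq="D")
start_price = 100.0

# Seeded generator so the example is reproducible (the seed is arbitrary)
rng = np.random.default_rng(42)

# One percentage change per day: mean 0, standard deviation 0.03
daily_changes = rng.normal(loc=0.0, scale=0.03, size=len(dates))

# Compound the changes: price_n = start_price * (1 + c_1) * ... * (1 + c_n)
daily_prices = start_price * np.cumprod(1 + daily_changes)

# The worked example from the text: +5% and then -3%, starting from $100
example = 100 * np.cumprod(1 + np.array([0.05, -0.03]))
print(example.round(2))  # the values 105.0 and 101.85
```

Because each price is the previous price multiplied by (1 + change), the prices stay positive as long as no single change is -100% or lower.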

Once the daily prices are calculated I then take the open and close prices as a subset of the list. In this case the close price of one day is the open price of the next. In a real stock market or any situation where the market 'closes for business' this assumption won't always be true, because of after-hours trading etc. In a market such as the main crypto exchanges where trading is 'continuous' it will generally follow this pattern.
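One way to take those subsets is with two offset slices. In this sketch, prepending the start price so that it becomes the first day's open is an assumption, as are the names open_prices and close_prices.

```python
import numpy as np

# Hypothetical price path: a start price followed by five daily prices
start_price = 100.0
rng = np.random.default_rng(0)
daily_prices = start_price * np.cumprod(1 + rng.normal(0.0, 0.03, size=5))

# Prepend the start price so there is one more price than there are days
prices = np.concatenate(([start_price], daily_prices))

open_prices = prices[:-1]   # every price except the last
close_prices = prices[1:]   # every price except the first

# Each day's close is the next day's open
assert np.array_equal(close_prices[:-1], open_prices[1:])
```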

Finally I generate the high and low price for each day by adding a small, randomly generated amount to the open/close figures for the high, and subtracting one for the low. Again, this is a simplifying assumption, since in real life the opening price (or the closing price in a downward market) may be the same as the 'low' value.
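A sketch of one way to do this: push the high above the larger of open and close, and the low below the smaller. The 2% maximum spread and the uniform distribution are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical open and close prices for four days
open_prices = np.array([100.0, 103.0, 101.5, 104.0])
close_prices = np.array([103.0, 101.5, 104.0, 102.0])
n = len(open_prices)

# Assumed spread: each extreme sits up to 2% beyond the open/close range
upper = np.maximum(open_prices, close_prices)
lower = np.minimum(open_prices, close_prices)
high_prices = upper * (1 + rng.uniform(0, 0.02, size=n))
low_prices = lower * (1 - rng.uniform(0, 0.02, size=n))
```

Taking the maximum and minimum first guarantees that the high is never below the open or close and the low is never above them.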

Having calculated the data points I then create a DataFrame and add a date column based on my start and end date.
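Putting the pieces together might look like the sketch below. The column names, the spread used for high and low, and the volume range are all assumptions; the text above does not describe how volume is generated, so random integers are used here purely as a placeholder.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

start_date, end_date = "2023-01-01", "2023-01-31"
dates = pd.date_range(start_date, end_date, freq="D")
n = len(dates)

# n + 1 prices so that each day's close can be the next day's open
prices = 100.0 * np.cumprod(1 + rng.normal(0.0, 0.03, size=n + 1))
open_prices, close_prices = prices[:-1], prices[1:]

df = pd.DataFrame({
    "open": open_prices,
    "high": np.maximum(open_prices, close_prices) * (1 + rng.uniform(0, 0.02, size=n)),
    "low": np.minimum(open_prices, close_prices) * (1 - rng.uniform(0, 0.02, size=n)),
    "close": close_prices,
    # Placeholder volume: the range is an arbitrary assumption
    "volume": rng.integers(500_000, 2_000_000, size=n),
})

# Date column built from the start and end date
df["date"] = dates
```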

Plotting a candlestick chart

matplotlib code can get quite verbose but is ultimately fairly straightforward... (Libraries such as seaborn abstract some of this to a higher level and I am planning on introducing more seaborn code here in the future.)

First I define some color constants to make use of later, and declare the shape of the basic plot area. I've made use of gridspec to plot the two bar charts (price and volume) on top of each other in a grid layout (gridspec can also be used to create more complex layouts and I have a future post planned on that). I've chosen the height_ratios to be 3 and 1 here, meaning the first plot will take up more space than the second plot in a 3:1 ratio.
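A minimal sketch of that layout is below. The color values, the figure size and the axes names are placeholders chosen for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Hypothetical color constants (the hex values are placeholders)
GREEN = "#2ca02c"
RED = "#d62728"

fig = plt.figure(figsize=(10, 6))

# Two rows, one column; the top row is three times the height of the bottom
gs = GridSpec(2, 1, height_ratios=[3, 1])

ax_price = fig.add_subplot(gs[0])
ax_volume = fig.add_subplot(gs[1])

ratio = ax_price.get_position().height / ax_volume.get_position().height
```

Here ratio comes out at 3, confirming that height_ratios controls the relative height of the two axes rather than their absolute size.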

Then I plot the two charts as subplots.

The candlestick chart is made up of two series of data: the main bars which are based on the open and close prices, and are red if the close price was lower than the open price or otherwise green (this is a convention for this type of chart). The vertical lines represent the high and low range for that day.
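The two series can be drawn with a bar call for the bodies and a vlines call for the high-low range, as in this sketch. The small DataFrame is made-up data, and the named colors and bar width are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up prices for four days, for illustration only
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=4, freq="D"),
    "open": [100.0, 103.0, 101.5, 104.0],
    "close": [103.0, 101.5, 104.0, 102.0],
    "high": [104.0, 103.5, 105.0, 104.5],
    "low": [99.0, 100.5, 101.0, 101.0],
})

# Red if the close is below the open, otherwise green
colors = np.where(df["close"] < df["open"], "red", "green").tolist()

fig, ax = plt.subplots()

# Thin vertical lines for the day's high-low range
ax.vlines(df["date"], df["low"], df["high"], colors=colors, linewidth=1)

# Bodies: start at the lower of open/close, height is the absolute move
body_bottom = df[["open", "close"]].min(axis=1)
body_height = (df["close"] - df["open"]).abs()
ax.bar(df["date"], body_height, bottom=body_bottom, color=colors, width=0.6)
```

With a date x-axis the bar width is measured in days, so 0.6 leaves a small gap between neighbouring candles.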

The volume plot goes in the second 'slot' of the grid layout and is a fairly self-explanatory bar plot. The sharex parameter is used to ensure that the x-axis (its range, ticks and so on) is shared between this and the first plot. This makes sense intuitively, since the x-axis is a range of dates and the two charts directly relate to each other.
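A sketch of the shared axis, using a plain line as a stand-in for the candlesticks. The data and the axes names are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.gridspec import GridSpec

# Made-up data for five days
dates = pd.date_range("2023-01-01", periods=5, freq="D")
close = np.array([100.0, 103.0, 101.5, 104.0, 102.0])
volume = np.array([1_200_000, 950_000, 1_500_000, 1_100_000, 1_750_000])

fig = plt.figure(figsize=(10, 6))
gs = GridSpec(2, 1, height_ratios=[3, 1])

ax_price = fig.add_subplot(gs[0])
# sharex ties this x-axis to the price chart's x-axis
ax_volume = fig.add_subplot(gs[1], sharex=ax_price)

ax_price.plot(dates, close)    # stand-in for the candlestick series
ax_volume.bar(dates, volume)
```

After this, changing the x-limits on either axes changes them on both.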

Formatting the chart and using a custom font

Now it is time to do some formatting on the plots. I've removed the borders, amended the x-axis to show all the dates, removed the x-ticks and labels from the prices chart and removed the tick marks from both y-axes. By default the volume was showing in "scientific" notation (e.g. 1e6 rather than 1,000,000) so I've set that back to use a plain style.
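The formatting steps described above could be written roughly as follows. The data is made up, the date format and label rotation are assumptions, and the exact calls in the notebook may differ.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.gridspec import GridSpec

# Made-up data for one week
dates = pd.date_range("2023-01-01", periods=7, freq="D")
close = np.array([100.0, 103.0, 101.5, 104.0, 102.0, 105.5, 104.8])
volume = np.array([1_200_000, 950_000, 1_500_000, 1_100_000,
                   1_750_000, 1_300_000, 990_000])

fig = plt.figure(figsize=(10, 6))
gs = GridSpec(2, 1, height_ratios=[3, 1])
ax_price = fig.add_subplot(gs[0])
ax_volume = fig.add_subplot(gs[1], sharex=ax_price)
ax_price.plot(dates, close)    # stand-in for the candlestick series
ax_volume.bar(dates, volume)

for ax in (ax_price, ax_volume):
    # Remove the borders
    for spine in ax.spines.values():
        spine.set_visible(False)
    # Remove the y-axis tick marks but keep the labels
    ax.tick_params(axis="y", length=0)

# Hide the x ticks and labels on the price chart only
ax_price.tick_params(axis="x", bottom=False, labelbottom=False)

# Show a tick for every date on the volume chart
ax_volume.xaxis.set_major_locator(mdates.DayLocator())
ax_volume.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))
ax_volume.tick_params(axis="x", labelrotation=90)

# Plain numbers instead of scientific notation on the volume axis
ax_volume.ticklabel_format(axis="y", style="plain")
```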

For the tick labels, x and y axis labels I used the font Consolas which is installed on my system and visible to the matplotlib font cache.

The plot title is created using a ttf font file that is stored in my Downloads area but not installed to the PC or visible to matplotlib. To make use of a file like that we can create a FontProperties object from matplotlib.font_manager, passing the path to the font file as fname, and then pass that object as the fontproperties argument when adding text to the plot, in this case for creating the overall title using the Playfair Display Medium font. Of course an installed font could be used here in the same way as the tick labels; I just wanted to demonstrate making use of a font without installing it to my system.
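A minimal sketch of loading a font from a file path is below. So that it runs anywhere, it points at the DejaVu Sans file that ships inside matplotlib's data directory as a stand-in; that path, the title text and the size are assumptions, and in practice font_path would be the location of the downloaded Playfair Display ttf file.

```python
from pathlib import Path

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# Stand-in font file bundled with matplotlib; replace with the path to
# your own downloaded .ttf file
font_path = Path(matplotlib.get_data_path()) / "fonts" / "ttf" / "DejaVuSans.ttf"

# fname makes matplotlib load the font straight from the file
title_font = FontProperties(fname=str(font_path), size=18)

fig, ax = plt.subplots()
fig.suptitle("Fictitious stock price, January 2023", fontproperties=title_font)
```

The same FontProperties object can be passed to other text calls, such as ax.set_title or ax.set_xlabel, through their fontproperties argument.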

Jupyter notebook

The complete Jupyter notebook for the above can be found here (GitHub Gist). As mentioned above, I made use of the installed Consolas font and downloaded the Playfair font to a local drive, so the font name and file path will need amending if other fonts are to be used.