
Time series data analytics: Steam Engine edition

A digital display and real time predictive analytics for a model steam engine.

02 December 2020 - D.K.E. Green

Category: Technical

Where we will end up!

Much of my work has been on 'machine learning for engineering applications'. I decided that I could demonstrate different aspects of my work on a hobby sized project.

This post shows how I modified a Wilesco D222 to use a digital read out.

Further, the digital readout predicts the behaviour of the steam engine up to 90 seconds into the future using an on-line Vector Autoregression model and a particle filter.

Wilesco D222
The Wilesco D222 model steam engine. Image from here.

The steam engine in operation.

Steam engine and screen
The final product, steam engine and digital display.

Screen close up
A close up of the display screen.

Under the hood, this uses an InfluxDB database, a RabbitMQ message queue and a nodejs webserver. Keep reading to find out how I got there and how it all works.

Industrial revolution

There is a saying that you should 'find a job you love, then you will never work a day in your life'. All well and good, and I live by that as much as I can. However, there is a flip side: if your hobbies are your work, then what do you do for fun? When the sense of fun goes out of your life, you suffer, and so do the people around you.

Due to Covid, my office was, naturally, closed. During this time, I was doing a bit of cleaning and saw the Wilesco D222 in its box. It was being temporarily stored at my place. I had the sudden urge to do something with this thing, instead of letting it languish.

The steam engine had been obtained as a part of a technology demonstration project. At the time, the team I was on was working on data from large gas turbine engines. For that project, we were taking gas turbine performance data and using it to form predictions about when maintenance would be required. Later in that project, the team wanted to do predictive data analytics close to the failure state of the engines. It might not surprise you that we were not allowed to induce failure conditions in these giant, real world gas turbines.

A large gas turbine
A large gas turbine. Image courtesy of imgur.

Enter the model steam engine. The steam engine came by mail. As this was a controlled model, the team could induce controlled failures in the engine and analyse the resultant behaviour. I set up a digital sensor suite. For the steam engine, the pressure was the key value we needed to monitor. The manufacturer specification says to keep it under 1.5 bar. We took that as the failure state of interest. I used a particle filter to predict the future behaviour of the steam engine. Fun stuff, but projects like that never feel complete. There is always more that can be done.

How will this work?

Back in the present, I see the D222 sitting there and decide to implement real time tracking, with a digital sensor display. At the time I didn't know why the urge took me, but the new project had begun. I figured it would at least make for an interesting post, so off I went.

I needed a plan, and plans mean understanding your constraints. I would only use components that I already had lying around. Since I had worked on this engine in the past, I already had the pressure sensor required for this to really work. The pressure sensor is specialised equipment and pretty pricey, so it's not something one typically has in a box somewhere. Other solutions are possible, however, if you don't want to just buy an all in one package.

As this was a 'just for fun' project, the implementation would need to be fast. I set myself the goal of having everything done in 2 days, plus one day for writing this post. This 2 day limit would not count the work I had done in the past, just the time to get everything together for this post. The lead time for the steam engine was something like 8 weeks when we first ordered it! Still, a few days seemed reasonable for what I already had.

On the software side, I would stick to simple and fast implementations of what I needed. I didn't have time to deal with build, install or auth problems. I had already written some code for the steam engine in the past, so I could make use of what had already been done. That might mean some compromises on the software tools, but hopefully nothing too serious.

The prediction engine would use the technology I had developed earlier to infer a forward time dynamical system model in real time. The prediction engine would then run a particle filter forward in time to collect summary statistics estimating the future behaviour of the system.

Overview of the plan

The steam engine pressure and temperature would be read in real time by a dedicated server. I decided to add an extra complication. The steam engine can transmit power to a dynamo, and this dynamo can power an LED. To measure this, I would build a phototransistor circuit and measure the voltage of the circuit with an Arduino. Sure, I could just measure the voltage of the dynamo directly, but where is the fun in that?

Wilesco M66 dynamo and light
Wilesco M66 dynamo and light. Image from here

The various sensors would write their values to a time series database for persistent storage. At the same time, a message queue would broadcast the sensor readings. A second server, the prediction server, would listen out for sensor message readings (by subscribing to the message queue) and use these readings to produce predictions for the sensor values. By using a message queue, it would be easy to push sensor readings to the prediction server as soon as the readings were available. This way, the prediction server does not need to poll the database.

A webserver would host a browser-based interface to the sensor data and predictions. The webserver would query the databases, so at least the system would be secure if it was moved to a network at a later date. The interface itself would run on a Raspberry Pi, independent of the data collection server.

System diagram
The plan for the data collection system.

Hardware list

The main parts required would be:

  • Wilesco D222 model steam engine
  • Wilesco M66 dynamo and light
  • ESI GD4200-USB digital pressure and temperature transducer
  • Arduino Uno
  • Various electronics: a phototransistor, breadboard, wiring and a resistor.
  • Raspberry Pi (a model 3 B that I had spare) and touchscreen.

Other odds and ends would be required, such as power cords, USB cables and so on. A Windows PC would also be required to run the DLL for the pressure gauge API.

Software plan

I wanted the data to be stored properly in a database. As I would be working quickly, I would only run local docker images. I didn't have time for network issues. Besides, if I was ever to scale up this project, it would be easy to port a working system to some cloud server.

Even though I use MongoDB quite regularly, it is not the best way to store time series data. I decided to use InfluxDB, an open source dedicated time series database. Getting a local instance of InfluxDB running would be easy. Moreover, InfluxDB is perfectly suited to the type of data I would have, so this was an easy choice to make.

I also needed a message queue system for this to work. I decided that RabbitMQ (a network message broker) would work well. Although it is not the optimal choice for streaming data (Apache Kafka is a more robust solution), RabbitMQ makes it easy to get something running quickly. Since I was highly time constrained, I didn't want to get bogged down in difficult installs. RabbitMQ is fast enough for the small number of sensors I would be processing.

The pressure gauge API is proprietary and Windows only. I would read the gauge using a server in C# on a Windows machine. The Arduino data collection would need to be done via serial port to a dedicated server. I would write this server in Python.

The 'prediction server' would also be written in Python. This server would take the incoming data and fit a simple Vector Autoregression prediction model. Since this project was being done so fast, I wouldn't have time to explore more complicated time series prediction models. I have a lot of experience in that area, and there is always a heavy time cost in fitting model parameters. Simple autoregression would have to do! This needed to run in real time, so I had to be careful about how to actually structure the data. When computing things quickly, cache is king, so it's important to think about the layout of the data. I would have to balance code complexity with performance.

Putting it all together

With my plan formulated, I set about making it happen.

My first challenge was the weather. It was raining outside (since this is the UK), so I would have to work inside on my carpet. It is, of course, sunny outside now that I am doing the writing part.

Remember, safety first! I had worked with this exact steam engine in the past and was confident that my indoor setup would be ok. A good-sized cardboard box was enough to catch the little oil sprays that the D222 produces. Not very classy, but good enough for a speed-build. Cardboard isn't ideal since the engine gets quite hot, but was ok for a few short runs. Note that the D222 is electrically powered and does not produce any sort of flame. Do not run a flame powered model steam engine on top of a cardboard box or on carpet! Again, safety first! Don't burn down your home!

Setting up the dynamo

Since I couldn't work outside, I decided to mount the dynamo on a Meccano base. The dynamo didn't quite line up with the holes on the base plate. I had to use a drill to widen the holes on the feet of the dynamo. Not a big deal.

The wide base would allow me to weigh down the base plate. For this, I used a simple 0.5kg barbell weight as it fit nicely in the box. The dynamo was connected to the drive train on the steam engine by a few rubber bands. Wilesco does sell official steel drive chains, but rubber bands worked well enough.

The dynamo, of course, has a polarity that is dependent on the direction that the steam engine is spinning in. The engine can spin in both directions, but the light connected to the dynamo will only work with the correct polarity. I didn't want to make things more complicated by adding any more circuitry than the bare minimum. So, to circumvent the polarity problem, I made sure that I would kick off the steam engine manually to ensure that it rotated the right way.

The dynamo on the base plate
The dynamo on the base plate.

Note the oil sprays in the dynamo base plate image. Not something to get on the carpet! The drive train for the steam engine is visible at the bottom of the image.

Getting RabbitMQ and InfluxDB working

Before installing the sensors, I needed to make sure I had the network infrastructure I needed up and running. I ran the simplest possible docker images to get RabbitMQ and InfluxDB working.

First, InfluxDB. The recommended simplest settings just worked:

docker run -p 8086:8086 -v $PWD:/var/lib/influxdb influxdb

where $PWD (the current working directory on the host) is mounted into the container as the InfluxDB data directory.

RabbitMQ was a bit more troublesome. The recommended settings for the docker image didn't run, but with a bit of finesse and googling, things were working fine:

docker run -d --hostname my-rabbit --name some-rabbit -p 5672:5672 -p 15672:15672 rabbitmq:3-management

where some-rabbit is the container name you would like to use. Port 5672 is the actual message queue; port 15672 is the management server. You can and should change these settings for your use case.

Although these are very simple settings with no auth enabled, it was enough for my purposes. If you were running servers for these databases in production, you would need proper user and password settings!

With these docker images up, I could proceed to the next part of the project.

Getting the Arduino working

With the dynamo hooked up to the LED, I needed to get the phototransistor readings from the Arduino out to the message queue and database.

First thing was to get the Arduino working.

Arduino ready to go
Arduino ready to go.

Arduino circuit diagram
The circuit diagram for the light level sensor. Note that I set R1 to 10kΩ.

Sticking with my classy speed-build aesthetic, I cable tied everything to a piece of wood that came with some grapes that I bought from the supermarket. Mostly just to keep everything off the carpet and in place. The circuit was simple to build and is shown below.

Light level sensor breadboard
Rapidly done wiring for the light level sensor. Also, more exciting shots of my carpet.

Note that the phototransistor is close to the light source. It would work better to cover both (to avoid external interference), but I wanted something I could see for this project.

The Arduino code is dead simple. It reads the light level in a loop, then writes the value out in JSON format. This formatting is not strictly necessary, but I like using a consistent protocol. JSON isn't ideal for serial, but again, the time limit I gave myself meant that I had to move quickly, rather than use a dedicated binary format.

const int readSensorPin = A0;
int sensorValue = 0;

void setup() {
  Serial.begin(9600);
}

void loop() {
  sensorValue = analogRead(readSensorPin);
  Serial.print("{ \"lightV\": ");
  Serial.print(sensorValue);
  Serial.println(" }");
  delay(100);
}

The Arduino itself was connected via USB to a PC. I put together some Python code to read the serial port (emulated over the USB connection) and send the data to Influx and RabbitMQ. The serial port was read with pySerial. To talk to RabbitMQ, I used the Python pika bindings. Finally, to talk to InfluxDB, I used the influxdb Python bindings. All of these libraries can be installed easily using pip.
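To give a feel for the shape of that server, here is a minimal sketch of the parsing step, with the pySerial, pika and InfluxDB calls left as comments (parse_reading is a hypothetical helper name for this post, not code from my actual server):

```python
import json
from datetime import datetime, timezone


def parse_reading(line: bytes) -> dict:
    """Parse one JSON line from the Arduino and stamp it with UTC milliseconds.

    The Arduino emits lines like b'{ "lightV": 512 }' as described above.
    timezone.utc here is the stdlib equivalent of pytz.utc.
    """
    reading = json.loads(line.decode("ascii"))
    reading["ts"] = int(datetime.now(tz=timezone.utc).timestamp() * 1000)
    return reading


# In the real server, roughly (assuming the usual pySerial/pika/influxdb APIs):
#   with serial.Serial("/dev/ttyACM0", 9600) as port:
#       reading = parse_reading(port.readline())
#       channel.basic_publish(exchange="sensors", routing_key="",
#                             body=json.dumps(reading))
#       influx_client.write_points([{"measurement": "light",
#                                    "fields": {"lightV": reading["lightV"]}}])
```

The device path, exchange name and measurement name above are illustrative placeholders.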

I started recording the light sensor data. I quickly realised that I was accumulating way more data than I really needed for this project. I revised the light sensor server to average out the readings over a one second period before it sends any data over the network. I didn't compute a full running average, instead the averages are over a set window. These details are important depending on the statistical application you have in mind. For my purposes, the simpler method was fine. The reduced data load was also ok.
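As a sketch, the windowed averaging can be done with a small accumulator like the following (WindowAverager is a hypothetical name; it uses non-overlapping one-second bins rather than a running average, matching the approach described above):

```python
class WindowAverager:
    """Average readings over fixed, non-overlapping time windows."""

    def __init__(self, window_ms=1000):
        self.window_ms = window_ms
        self.bucket_start = None  # timestamp (ms) that opened the current window
        self.values = []

    def add(self, ts_ms, value):
        """Add a reading; return the window average when a window closes, else None."""
        if self.bucket_start is None:
            self.bucket_start = ts_ms
        if ts_ms - self.bucket_start >= self.window_ms:
            # Window closed: emit the average and start a new window.
            avg = sum(self.values) / len(self.values)
            self.bucket_start = ts_ms
            self.values = [value]
            return avg
        self.values.append(value)
        return None
```

The server would send a message (and an InfluxDB point) only when add() returns a value, cutting the network load to roughly one message per second.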

I had to ensure that a consistent time stamp was used. Python's UTC datetime implementation doesn't automatically account for the timezone. This was fixed by importing pytz and using the right timezone by calling datetime.now(tz=pytz.utc). It is important in a project like this to keep the times used to record everything consistent. Sticking to UTC is the best way that I have found. This also happens to work nicely with InfluxDB. Conversion to UTC milliseconds can be done as follows:

import datetime
import pytz

# milliseconds since the Unix epoch, in UTC
ms = int(datetime.datetime.now(tz=pytz.utc).timestamp() * 1000)

Getting the pressure gauge working

Installing the pressure gauge is easy once the right connectors are in place. It simply screws directly into the boiler.

Pressure gauge installation
Pressure gauge installation - note the airbrush connector pieces to convert the gauge thread to the boiler thread diameter.

The pressure gauge doesn't fit directly into the boiler! The gauge I have uses a 1/4” BSP connection. The various Wilesco engines use M5, M6 and M8 threaded connections to the boiler. Back when I first worked on the steam engine, to get everything to fit together, I ended up finding airbrush hose adapter pieces that fit the bill. Sure, you can manufacture whatever you want, but spending a few pounds on Amazon worked fine for me. I had these parts already, so I could stick to the timeframe.

The pressure gauge has a simple usb connection to a PC. Great, this will be easy you think. Not so. Getting the pressure gauge software working is much trickier than the Arduino code. There is a custom, proprietary API that the manufacturer provides. I won't go into all the detail here, but working with this API is not a very nice experience. The documentation is not particularly clear. It is a bit of a struggle, but it is possible to (eventually) get the right values out of the gauge. Luckily for me though, I had already spent many hours figuring this out months ago. Does this really count as a 2 day build? You decide.

I used C# to interface with the API DLL for the pressure gauge.

As with the Python server, with the data in hand, I could send the numbers over the network to InfluxDB and via RabbitMQ without too much hassle.

Writing the prediction server

With the actual data able to be read, my next task was the prediction server. The pressure gauge and the light sensor (average) readings were set to be approximately once per second. Given the rate at which the values changed, I figured that I could look ahead around 90 seconds from the current data values and predict the future outcomes (assuming the current trajectory was maintained).

The prediction server listens to the RabbitMQ channels, so it can be notified when new data is available. The forward time transition model is assumed to be of the form: $$Y = CZ$$

where $Y$ is the set of sensor readings at time $t$ and $Z$ is the set of sensor readings at earlier time steps, as well as a bias vector, i.e.

$$\begin{bmatrix} p_t \\ d_t \\ l_t \end{bmatrix} = C \begin{bmatrix} p_{t-1} \\ d_{t-1} \\ l_{t-1} \\ \vdots \\ p_{t-n} \\ d_{t-n} \\ l_{t-n} \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

where $p_t$, $d_t$ and $l_t$ refer to the sensor readings at time $t$ for the pressure, temperature and light sensors respectively. The values $b_1$, $b_2$, $b_3$ are bias values. These values should each be equal to either one or zero (enabled or disabled). $C$ is a matrix of coefficients.

The value $n$ is the number of time steps back that the autoregressive model should use when computing the future value. This parameter needs to be tuned. Such a parameter can be found with Bayesian model likelihood techniques, but manual tuning was enough for this project. Similarly, whether enabling or disabling the bias is better can be inferred statistically, but in the interest of time I just tested different settings until I found something that seemed to work ok.

The task of the prediction server is to compute the coefficients, $C$, so that the current sensor readings can be used to predict future values. This is a simplified sort of Vector Autoregression model.

To compute the coefficients, I did the simplest possible thing and used the Moore-Penrose pseudo-inverse of $Z$ to compute $C$: $$C = YZ^\dagger$$

The full Python code is fairly long, but the guts of the prediction engine is only really two lines:

lagsPinv = np.linalg.pinv(lagValues)
coeffVals = np.dot(resultValues,lagsPinv)

This can be done more efficiently (computation wise), but it is hard to beat these two lines for programmer efficiency! If you lay out the matrices in the right way (sort of Toeplitz matrix over time) you can keep track of which lags belong to which result values nicely.
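As a sketch of that layout (the function names here are illustrative, not from my actual server; I'm assuming the series arrives as one row per sensor, one column per time step):

```python
import numpy as np


def build_lag_matrices(series, n_lags, bias=True):
    """Stack a multivariate time series into result matrix Y and lag matrix Z.

    series: (n_sensors, T) array, one row per sensor, columns ordered in time.
    Returns Y (readings at time t) and Z (readings at t-1..t-n, plus optional
    bias rows of ones), so that the model Y = C Z can be fitted.
    """
    n_sensors, T = series.shape
    n_cols = T - n_lags
    # Y: each column is the sensor vector at time t, for t = n_lags .. T-1
    Y = series[:, n_lags:]
    # Z: Toeplitz-style stack of the k-step-back readings for each column of Y
    blocks = [series[:, n_lags - k : T - k] for k in range(1, n_lags + 1)]
    Z = np.vstack(blocks)
    if bias:
        Z = np.vstack([Z, np.ones((n_sensors, n_cols))])
    return Y, Z


def fit_coefficients(Y, Z):
    """Least-squares fit of C in Y = C Z via the pseudo-inverse, as in the text."""
    return Y @ np.linalg.pinv(Z)
```

For a single sensor decaying as $x_t = 0.5\,x_{t-1}$, fitting with one lag and no bias recovers a coefficient of 0.5, which is a handy sanity check on the layout.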

This isn't the most sophisticated statistical model, but I had set myself a very strict time limit and so it would have to suffice. Do you spot the problem here? The sensor readings come at different times. The pressure and temperature readings occur at the same time, as they are read by the same server. The light readings arrive at different times and at a different rate. To make my life simple, I decided to compute the prediction coefficients whenever the pressure/temperature readings were updated. When the pressure values are updated, the most recent light sensor reading is used. This isn't perfect, but since the sensor rates are quite close together, it was good enough for this project.

Given the prediction coefficients, $C$, I needed to compute some summary statistics. To achieve this, every time the prediction coefficients are calculated, I spawn 1000 particles and run them forward for 90 seconds worth of pressure gauge time intervals. The initial particle positions are sampled from a normal distribution about the last sensor reading, using a fixed standard deviation. The standard deviation is a tunable parameter. Although you could compute the error in more detail (see this paper I wrote), this was more than required here. For this project, I simply tuned the standard deviation until it looked ok. This seemed to work well enough. For a production type problem, I would absolutely invest the time in using the proper probability theory.
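The particle initialisation is just a draw from a normal distribution around the latest state. A minimal sketch (the function name and the sigma value are illustrative; in my build sigma was hand-tuned):

```python
import numpy as np


def init_particles(last_state, n_particles=1000, sigma=0.05):
    """Sample initial particles around the most recent lag vector.

    last_state: (state_dim,) vector of the latest stacked sensor lags.
    Returns a (state_dim, n_particles) array: one column per particle,
    each perturbed by zero-mean Gaussian noise with the given std dev.
    """
    state = np.asarray(last_state, dtype=float).reshape(-1, 1)
    noise = np.random.normal(0.0, sigma, size=(state.shape[0], n_particles))
    return state + noise
```

One refinement I'm glossing over: the bias rows of the state arguably should be held fixed at one rather than perturbed, but for hand-tuned noise levels the difference is small.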

As with the coefficient computation, the core of the particle filter computation is simple:

for j in range(0, numIntervals):
    updatedPositions = np.dot(coeffVals, particlePositions)
    # shift particle lags down one time step (note, 3 sensor values per step);
    # .copy() guards against the overlapping slice assignment
    particlePositions[3:-3] = particlePositions[0:-6].copy()

    # write the newest predicted values into the top rows
    particlePositions[0:3, :] = updatedPositions

I had an issue that the prediction system was generating a huge amount of data. I decided that downsampling the predictions for outputs would suffice, so I computed summary statistics of the particles at ten evenly spaced intervals for the 90 second prediction window. The summary statistics assume that the particles are normally distributed. This is pretty reasonable given that the initial particle distribution is assumed normal and that the vector autoregression model is just a polynomial of the time series values.
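A minimal sketch of that summary step (the function name is illustrative; I'm assuming the particle trajectory for one sensor is laid out as a steps-by-particles array, and using the normal-distribution assumption for the 95% bands):

```python
import numpy as np


def summarise_particles(trajectory, n_points=10):
    """Downsample a particle trajectory to mean and 95% confidence bands.

    trajectory: (n_steps, n_particles) array of one sensor's predicted values.
    Returns an (n_points, 3) array of [lower, mean, upper] at evenly
    spaced steps across the prediction window.
    """
    idx = np.linspace(0, trajectory.shape[0] - 1, n_points).astype(int)
    mu = trajectory[idx].mean(axis=1)
    sigma = trajectory[idx].std(axis=1)
    # Under the normal assumption, the 95% interval is mu +/- 1.96 sigma
    return np.stack([mu - 1.96 * sigma, mu, mu + 1.96 * sigma], axis=1)
```

Ten rows of three floats per prediction is a far more manageable write load for InfluxDB than 1000 particles at every time step.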

Once the prediction server has computed the necessary values, it sends the data to the InfluxDB for storage.

Getting ahead of things just a little, you can see the output of the prediction server in the figure below. The green line is the temperature output (the curved line; the other two, pressure and light, are just flat). At the end of the data, the line splits into 3 parts. These are the mean and the upper and lower confidence intervals, computed as $$\mathrm{C.I.}_{95\%} = \mu \pm 1.96\sigma$$ where $\mu$ is the prediction mean and $\sigma$ is the prediction standard deviation. Although the normal distribution assumed for the particles isn't perfect, I have found that using confidence intervals to summarise complicated data works quite well, as long as the empirical distribution being approximated is unimodal. I have written academically on this topic, but the assumptions here are good enough for this project.

Close up of prediction server output
Close up of output from the prediction server.

Of course, the prediction server doesn't really compute the future or anything like that. It simply says 'given the current trajectory, where am I likely to end up'. I had the idea to set some alarm if it was predicted that there was some probability for the pressure to exceed 1.5 bar, but I was running low on time! Something I can leave for another build.

Writing the webserver and setting up the display screen

I thought there would not be too much to say about the interface. It is a Raspberry Pi running a web browser. The webserver powering the interface is a barebones node server (so, written in javascript). The webserver hosts the display page HTML and provides access to the database. The web interface triggers an AJAX request once per second to fetch the latest data from the server.

The main challenge here is picking a good chart extension that is free. Previously, I have used c3.js, which is based on d3.js. I decided to migrate to chart.js since I had been meaning to give it a try for some time. How hard could it be? Well, not that bad but there were some hiccups. I spent quite a bit of time trying to get the charts to look the way I wanted. There are also issues with resizing charts dynamically. I ended up having to just disable dynamic animation. Otherwise, I liked working with the library. In the end, I got something fit for purpose, but without some of the bells and whistles that I might've liked. I think it is probably possible to get them working, but it's likely quite a bit of effort.

The interface screen
Another close up of the interface screen.

Here is a shot of the interface screen after everything was working well. I realised afterwards that the accessibility isn't great (because of the chart series colours). Something to look out for in the future. This image was taken while I was messing about with the light outputs, so the light data time series is a bit spiky.

I had to pack everything down to quite small so that it would fit on one screen. Not a huge problem, but I wouldn't want to try to fit too much more on to this one screen.

Here goes nothing!

With all the pieces in place, it was time to run the engine for real. Hopefully it wouldn't blow up in my face. I restarted all the docker images, cleared the database and started all the servers. The interface was up. I turned on the power, and…

Turning on the power.

… waited. It takes about 5 minutes to heat up.

Finally, the pressure gauge started to move. I opened the throttle and…

Trying again...

Let there be light!

Lighting up the LED
Lighting up the LED.

Everything (sort of) worked! The model steam engine aficionado will notice that the rubber bands slipped to the wrong part of the drive train when this photo was taken. Kind of annoying, but easily fixable. Especially with the official drive chains. Oh well! Running the LED with the dynamo requires some overthrottling of the engine, so I didn't want to keep the light on for too long either.

The spinning disc visible on the drive train in some of the images, and the small microchip on the blue pad, is an optical encoder. In an earlier project with the engine, I was recording the velocity directly. The chip is disconnected and not being used for this project. I ended up removing the encoder disc after a few steam engine runs (this is why it is not visible in some images).

The digital display tracked the data! The prediction model seemed to work pretty well. I didn't verify it though. To complete this, I'd have to close the loop and check that the model assumptions were ok.

It all worked out in the end.

Overall, a big success!


It was quite fun to get everything put together, even with the tight timeframe. After working on this project, I felt a sense of renewal. I didn't realise it at the start, but this was just what I needed to reinvigorate that sense of joy one feels when completing a project. Perhaps this is why I decided on a whim to put the digital steam engine system monitor together: what I really needed was to just do a project for fun, rather than work. Maintaining a sense of fun is vital if you want to keep yourself and the people around you happy.

It was a reminder that, despite what is in the news these days, technology does not need to be a burden. It can be a source of creativity, inspiration and fulfilment. It is not technology that produces outcomes, good or bad, it is people. We choose how we feel about the world and how we can respond to the world as it develops. Energy restored, I'm looking forward to the next project!

If you liked this article, please get in contact! You can send an email or chat in real time on discord.