Modelling what might have been in southeast BC and northwest Wasington

Goldilocks and the three metrics

Q Factor formula

How do we measure the “goodness” of an operations scheme for running a model railroad? Well, having a computer program that can easily run studies of a layout over many days with different operating schemes, train timetables, and other parameters, we can generate a lot of statistics from each run and then compare them, but what measures indicate a better or a worse outcome? We have all attended good and not so good operating sessions and can probably list a number of aspects that made one better than the other. The trick is to find a way to quantify at least some of these so that the program can better serve as a planning tool, and we don’t need to impose on our friends while we “try it and see”. There are certainly some aspects that people enjoy about a session that can’t be quantified, but maybe we can find some way to approximate a measure of “goodness”.

Stepping back from calculations for a moment, the main purpose of an operating session is to move cars around a layout to where they should be, through the running of various trains according to some operating scheme. The non train aspects, such as dispatching, train orders, etc., are outside of the scope of this discussion, because they are not simulated by the program. With that in mind, I think that the following characteristics can be accepted as indicative of a good, well run operating session.

  • Industries should have the number of cars they require.
  • A good number of the cars should move at least once.
  • All cars that are ready to move should move.
  • Only specialized cars should be off the layout in storage.
  • Yards, interchange, and other nonrevenue locations should have a minimum number of cars.

In other words, the layout should be well utilized, cars should flow smoothly to where they should be going, and there should be lots of activity to keep the crews busy during the session. Layouts with too few cars in play to provide sufficient interest or those with so many cars plugging up everywhere that hardly anything moves tend to be less satisfying for crews. Some variation is of course desirable, but extremes are not.

So the trick is to find some measures in the program that can capture these ideas so that we can judge if one operating scheme is better than another. The idea is to have a number of metrics about a session that can be combined to produce an overall score. If each one has a maximum value of 1.0, meaning that the aspect is as good as it can be, then we may be able to just multiply them together and still consider 1.0 as the “perfect” result.

Some of these characteristics tend to be at odds with one another. For example, if we add more cars to a layout so that we have a higher fraction of the industries fully served, it may result in cars not flowing through yards as quickly due to congestion and limits. This reduces the overall mobility and can lead to gridlock.

Here are three ideas for what we might measure.

  • Pickup Ratio – The fraction of times when a car is ready to move, and a train should move it, but it does not pick it up due to some limit on length, space ahead, etc. If a car always gets moved when it is ready, then the score would be 1.0.
  • Q Factor – The fraction of cars that are at revenue tracks (industries) and staging, versus those on nonrevenue tracks and in storage. We want the cars to be moving about the layout and not sitting waiting somewhere. Again, 1.0 is best.
  • Revenue Usage – The fraction of the track space at industries that is occupied by cars. Obviously this cannot always be 100 percent, as some industries only receive cars infrequently, but we also don’t want them never more than say half full either.

These three metrics are easy to calculate by the program, and are reported in a summary file for each study run.

To play around with how all of this might look, I plotted these measures for the S&BC for different numbers of cars, and different interchange probabilities. Other posts talk about some of the issues we have had with too many cars clogging up the Grand Forks yard because they are waiting to be interchanged between the two railways. The solution seemed to be to minimize the shipments that require interchange by coercing them to be selected less often, so this was a good test case to see if these metrics would support this idea in a quantifiable way.

The following graph shows the effect of having 40, 60, 70, 80, and 98 cars on the layout, with the train schedules and everything else the same. The probabilities of interchange shipments were not reduced. We would expect that as more cars are added, the revenue track usage would go up, as it is very low with only 40 cars in play, but we might end up with so much congestion that cars start to be unable to move when they should. This in fact is what happens. The green line shows the track usage going up from just over 0.2 to 0.7, meaning that we have a lot more cars delivered to the industries from the additional cars. But, the yellow line shows that there is a price to pay for this, namely congestion, as the rate of car pickup drops from over 0.9 (very good) to about 0.35 (very bad), which means that a lot of the cars that are ready to move are being skipped by trains when they should otherwise be picking them up.

Qfactor graph with interchange at 1.0

The blue line is what I am calling the Qfactor, using the formula shown at the start of this post. It is the fraction of cars at revenue and staging tracks, divided by the total number of cars. It simply indicates what portion of the cars are on the “good” parts of the layout, actively participating in the running of trains, versus those that are mostly sitting idle for one reason or another. In this case it is mostly constant and independent of the number of cars in play because there is extra capacity at the industries that can accept more cars, however a roughly proportionate number of additional cars will also end up on the non-revenue tracks such as yards while they wait longer to be delivered. This is a good example of why we need more than one metric, as there are opposing tends at work.

And finally, the red line is just a combination of the other three, in an attempt to provide a single value. I’m not sure yet how much value this may have. Time will tell.

A second set of studies were then run with the same numbers of cars, but with a much lower probability for the interchange shipments. This should result in far fewer cars waiting in the Grand Forks yard to be interchanged from one train to another and more at the industries. All of the shipments that required interchange between trains were set to a very low probability, so that they would be selected only as a “last resort” if there were no other direct shipments available.

Qfactor graph with interchange at 0.01

This time we have a very different set of curves. The car pickup ratio (yellow) is very high throughout, meaning that cars never have to wait long after they are ready to move, which is what was expected. The industry tracks are never very full at only 30% (green) so that is a concern, and the Q factor (blue) starts out very good, but drops rapidly as more cars are added. A closer examination shows that this is because almost all of the extra cars end up getting stuck in storage and never making it on to the layout. This in turn is because none of the shipments will be exchanged in the yard, so there is nowhere for them to accumulate and be in play on the layout. In short, the trains leaving staging quickly get saturated by cars freely going between staging and the industries, with none sitting around waiting. The flat portions of the yellow and green lines shows that the extra cars have no effect on them. This would probably be considered a good operating session with freely flowing cars, so long as it was acceptable to have low utilization of the industries. One conclusion from this is that the “natural” number of cars for this operating scheme is somewhere between 40 and 60 cars. The way to have more cars on the layout is probably to run more trains so that there is more capacity out of staging to feed the layout. This might be worth trying sometime.

So, in summary, I think these three metrics give a good insight into the nature of an operating session simulation. I expect to improve my understanding of their implications as more experience is gained, but for now, they should serve well for planning, comparing, and contrasting different operational schemes. Oh, and what does Goldilocks have to do with this? Nothing at all.