A bed forecasting app for the NHS

Woohoo (and thanks to the NIHR and NHS-X!). Super chuffed to have won support from NHS-X in the AI in Health and Care Award scheme.

Here’s the plain English summary from the grant:

More than 1000 patients have elective surgery cancelled every week in the NHS, causing anxiety for patients and families (D Wong, 2017). Cancellations are upsetting, inefficient, may cause harm, delay treatment, and let slip therapeutic opportunities. Bed capacity is the primary reason for these cancellations, yet bed capacity fluctuates. Operational efficiency depends on an accurate view of near-future demand, but this is challenging because a hospital is a complex system with many interdepartmental flows, and individual emergencies seem unpredictable.

This challenge can be met by operational modelling combined with Artificial Intelligence. In 2012, we built a demand forecasting solution that supports operational teams to reschedule elective surgery and reallocate resources to avoid last-minute cancellations and bed shortages. We deployed and customised this for a single ward: a cardio-thoracic critical care unit at Great Ormond Street Hospital (C Pagel, 2017).

However, that technology could not be scaled, nor readily adapted to work in different institutions. In 2018-19, in partnership with ‘INFORM’ (a translational data science team at University College Hospital), we generalised the underpinning mathematics of the solution and built a prototype that could scale.

Our approach is distinct from most bed modelling approaches in that we deliver forecasts that are local (ward level), and near future (over the next week). Unlike most other bed forecasting models, we predict future demand not future bed utilisation. Bed utilisation is best thought of as demand already mitigated by actual supply. More simply, we are interested in bed demand before not after cancellations have occurred.

This creates a window for local teams to better use their existing bed capacity by flexing staffing levels, or rescheduling surgical operating lists. More importantly, local teams are enabled to innovate: to find local solutions to predicted bottlenecks and to better use their existing bed capacity.

We now seek support to

  1. Upgrade the AI component of the model so that it can learn from a wider range of patient clinical characteristics (lab results, clinical history, vital signs, etc.)
  2. Extend the mathematics to include staffing constraints, which must also influence bed availability.
  3. Encapsulate the mathematical model in a software application that is resilient and ‘connectable’ to hospitals across the NHS (‘interoperability’). Our application would support both hospitals using predominantly paper notes (most nonetheless have an electronic patient booking system) and hospitals that are already fully paperless.
  4. Test the application with clinical and operational teams so that it is reliable, easy and safe to use.

High quality local bed forecasts have the potential to allow the NHS to run at higher capacity, safely and efficiently. This can reduce costs and waste, reduce short-notice cancellations, and ultimately reduce anxiety and suffering for our patients.

Coronavirus and race

Doing some thinking about whether or not we can estimate the effect of ethnicity on outcomes for coronavirus. There’s a nice discussion on the use of race/ethnicity and causal inference on CrossValidated.

The key message is that because race is not modifiable, it doesn’t make sense to think of it in terms of a causal pathway. Causal effects imply an action: a drug treats a disease. Giving that same drug to another person, one would expect a similar effect. We cannot ‘give’ race to another person, so we can’t say that race causes a disease. In the same way, we cannot make someone older, or female, or blonde. These are not factors that we control, and therefore they do not cause an outcome.

This does not mean that outcomes cannot differ for different racial groups. Here we are interested in the predictive effect, not the causal effect. And here the challenge is distinguishing whether the effect of race is mediated via genetics or via its social and economic consequences.

With respect to COVID19, the specific question of interest is whether there is a genetic predisposition to worse outcomes amongst BAME (Black, Asian and Minority Ethnic) groups, or whether the effect is entirely mediated through the associated social and economic factors.

Also see

VanderWeele TJ, Robinson WR (2014) On the causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology 25: 473-484.

Folder templates for data science projects

There are arguably three pieces of computing jargon and architecture
that we need to teach to avoid a lot of pain later: paths, folders or
directories, and environments. None of these are interesting in and of
themselves, but a small investment now will pay dividends. This post
talks about how to organise your files and folders. And unlike MacOS,
iPads, and iPhones, which hide these details, we are keen that you see
the nuts and bolts of how these things work.

  1. Use relative paths (including for symlinks) so that your entire
     project structure is transportable. For example, to create the
     symlink in bash: `ln -s ../data ./code/data`. To load data within
     an R script: `mydata <- readRDS('data/mydata.RDS')`
  2. Separate inputs and outputs
    • specifically, separate and protect the input data
  3. Write notes (readme.md)
  4. Define what needs to go under version control
  5. Think about using a build tool … make, doit, …
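For the build-tool point, here is a minimal sketch of what this could look like with make. The file and script names are hypothetical, invented to match the template layout; note that recipe lines must be indented with a tab.

```makefile
# Hypothetical pipeline: data-raw -> data -> figures
data/clean.RDS: data-raw/raw.csv code/a1prep/clean.R
	Rscript code/a1prep/clean.R

figures/plot.pdf: data/clean.RDS code/a2analysis/plot.R
	Rscript code/a2analysis/plot.R
```

Running `make figures/plot.pdf` then rebuilds only the steps whose inputs have changed.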

A template project structure:

├── code
│   ├── _config
│   ├── a0utils
│   ├── a1prep
│   ├── a2analysis
│   ├── data -> ../data
│   ├── data-raw -> ../data-raw
│   ├── figures -> ../figures
│   ├── labbook
│   │   └── CCYY-MM-DD.py
│   ├── library
│   ├── readme.md
│   └── tables -> ../tables
├── data
├── data-raw
├── figures
├── filing
├── readme.md
├── tables
├── tmp
├── todo.todo
└── write
    ├── manuscript.md
    └── readme.md

When creating the symlinks, use relative paths so that you can recreate
the template and move it as you wish. See below.

# from the project root
$ cd code
$ ln -s ../data data
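The whole skeleton can be scripted. This is a sketch that assumes the directory names from the template tree above, using a temporary directory as a stand-in for your project root:

```shell
# Sketch: recreate the template skeleton with relative symlinks
set -e
root=$(mktemp -d)          # stand-in for your project root
cd "$root"
mkdir -p code/_config code/a0utils code/a1prep code/a2analysis \
         code/labbook code/library \
         data data-raw figures filing tables tmp write
# relative symlinks, so the whole tree can be moved or copied intact
cd code
for d in data data-raw figures tables; do
    ln -s ../"$d" "$d"
done
```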

Put just the ./code directory under version control. Never write to
the raw input data (./data-raw): treat it as read-only.

For an alternative approach, see Templates for reproducible research,
which goes much further and splits directories formally using a build
tool (waf).


Sidenotes: Footnotes and Marginal Notes

One of the most distinctive features of Tufte’s style is his extensive use of sidenotes. Sidenotes are like footnotes, except they don’t force the reader to jump their eye to the bottom of the page, but instead display off to the side in the margin. Perhaps you have noticed their use in this document already. You are very astute.
This is a side note. Notice there is a number preceding the note.

Sidenotes are a great example of the web not being like print. On sufficiently large viewports, Tufte CSS uses the margin for sidenotes, margin notes, and small figures. On smaller viewports, elements that would go in the margin are hidden until the user toggles them into view. The goal is to present related but not necessary information such as asides or citations as close as possible to the text that references them. At the same time, this secondary information should stay out of the way of the eye, not interfering with the progression of ideas in the main text.

Sidenotes consist of two elements: a superscript reference number that goes inline with the text, and a sidenote with content. To add the former, just put a label and dummy checkbox into the text where you want the reference to go, like so:
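Based on the class and attribute names described below, the markup is along these lines (a reconstruction following Tufte CSS conventions):

```html
<label for="sn-demo" class="margin-toggle sidenote-number"></label>
<input type="checkbox" id="sn-demo" class="margin-toggle"/>
```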

You must manually assign a reference id to each side or margin note, replacing “sn-demo” in the for and the id attribute values with an appropriate descriptor. It is useful to use prefixes like sn- for sidenotes and mn- for margin notes.

Immediately adjacent to that sidenote reference in the main text goes the sidenote content itself, in a span with class sidenote. This tag is also inserted directly in the middle of the body text, but is either pushed into the margin or hidden by default. Make sure to position your sidenotes correctly by keeping the sidenote-number label close to the sidenote itself.

If you want a sidenote without footnote-style numberings, then you want a margin note. This is a margin note. Notice there isn’t a number preceding the note. On large screens, a margin note is just a sidenote that omits the reference number. This lessens the distracting effect taking away from the flow of the main text, but can increase the cognitive load of matching a margin note to its referent text. However, on small screens, a margin note is like a sidenote except its viewability-toggle is a symbol rather than a reference number. This document currently uses the symbol ⊕ (⊕), but it’s up to you.

Margin notes are created just like sidenotes, but with the marginnote class for the content and the margin-toggle class for the label and dummy checkbox. For instance, here is the code for the margin note used in the previous paragraph:

This is a margin note. Notice there isn’t a number preceding the note.
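Reconstructed from the classes named above (marginnote for the content, margin-toggle for the label and dummy checkbox), that markup looks roughly like this:

```html
<label for="mn-demo" class="margin-toggle">&#8853;</label>
<input type="checkbox" id="mn-demo" class="margin-toggle"/>
<span class="marginnote">
  This is a margin note. Notice there isn't a number preceding the note.
</span>
```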

Figures in the margin are created as margin notes, as demonstrated in the next section.

The end of history?

A hundred years later the RCT may seem like the end of this history.1 However, in critical care we are more aware than most that this would be a poor ending. Clinical trials have been notoriously fruitless in our field, and despite much promising pre-clinical work, this has been especially true in sepsis research.[Riedemann:2003] The main problem is that the delivery of a clinical trial is akin to measuring the meridian line. These are expensive juggernauts that can only ask one question at a time. Where the answer is subtle, the funds to power the trial machine will be exhausted before a small difference is detected.

There are new strategies that aim to make the clinical trial more agile.2 However much we supercharge the randomised trial, it will never be able to keep pace with our need to understand the universe of clinical medicine. If big data is going to be the answer to this, then it must show itself deserving of the trust that we place in an RCT. Google and friends are telling us that this will be machine learning and artificial intelligence. However, if the diet of machine learning is big data, then we are likely to be disappointed. Methods which learn from data do not alone produce theory. Mendelian inheritance, the structure of the double helix, and the general theory of relativity were not problems with data waiting for machine learning to solve. Yes, it is possible that we could feed IBM’s Watson the position of the stars as documented by the ancients. Watson would likely do a good job of recognising that certain spots of light, the planets,3 did not move in the same way as others. But to expect that, from this, Watson would suggest gravity, the Copernican universe, and Newton’s laws of motion is magical thinking.

  1. The End of History is a 1992 essay by Francis Fukuyama that argued that Western liberal democracy would be the final endpoint of social and political development. A quarter of a century later this claim seems rather premature.

  2. This includes both platform trials, and now REMAP (Randomized Embedded Multifactorial Adaptive Platform). Here new treatment options are continuously added and removed, as they are discovered and assessed, and the randomisation is embedded in health care delivery. The EHR can even provide the realtime data collection and feedback loop.[Angus:2015jw]

  3. from the Greek, wanderers, because their position relative to the other stars was not constant. 

50 Years of Data Science …

50 Years of Data Science: Journal of Computational and Graphical Statistics: Vol 26, No 4:

This paper is a great find. Not least because the argument (statistics versus data science) was already in full swing 50 years ago.

I have no problem with predictive modelling, but it is a different task. And it does seem that the emphasis on ML has obscured (in a pendulum swing from the days of Tukey) the importance of understanding the generative model. From Donoho …

Predictive modeling is effectively silent about the underlying mechanism generating the data, and allows for many different predictive algorithms, preferring to discuss only accuracy of prediction made by different algorithms on various datasets. The relatively recent discipline of machine learning, often sitting within computer science departments, is identified by Breiman as the epicenter of the predictive modeling culture.

I like to think that our lab is pulling hard at the pendulum. That we care massively about the underlying mechanism. That for me is the ‘science’ in ‘data science’. Science because when right it tells us something about how the world works, not just how it will be. The difference between a super accurate weather forecast, and understanding the principles of the atmosphere and the climate. None of that devalues predictive modelling, but these are separate activities.

Abandoning statistical significance is both sensible and practical …

Abandoning statistical significance is both sensible and practical « Statistical Modeling, Causal Inference, and Social Science:

The replication crisis in science is not the product of the publication of unreliable findings. The publication of unreliable findings is unavoidable: as the saying goes, if we knew what we were doing, it would not be called research. Rather, the replication crisis has arisen because unreliable findings are presented as reliable.

A manifesto for our lab?

This eloquent exposition of why clinicians are necessary to data science feels like a manifesto for the lab.

We argue that a failure to adequately describe the role of subject-matter expert knowledge in data analysis is a source of widespread misunderstandings about data science. Specifically, causal analyses typically require not only good data and algorithms, but also domain expert knowledge.

And a general critique of ML as a method to improve health

A goal of data science is to help make better decisions. For example, in health settings, the goal is to help decision-makers—patients, clinicians, policy-makers, public health officers, regulators—decide among several possible strategies. Frequently, the ability of data science to improve decision making is predicated on the basis of its success at prediction. However, the premise that predictive algorithms will lead to better decisions is questionable.

And why the human orrery is a dangerous myth

the distinction between prediction and causal inference (counterfactual prediction) becomes unnecessary for decision making when the relevant expert knowledge can be readily encoded and incorporated into the algorithms. For example, a purely predictive algorithm that learns to play Go can perfectly predict the counterfactual state of the game under different moves, and a predictive algorithm that learns to drive a car can accurately predict the counterfactual state of the car if, say, the brakes are not operated. Because these systems are governed by a set of known game rules (in the case of games like Go) or physical laws with some stochastic components (in the case of engineering applications like self-driving cars) …

Or more specifically …

…contrary to some computer scientists’ belief, “causal inference” and “reinforcement learning” are not synonyms. Reinforcement learning is a technique that, in some simple settings, leads to sound causal inference. However, reinforcement learning is insufficient for causal inference in complex causal settings

Git tips

I have been using Git now for a couple of years, and have struggled to understand what it does. I get the basic concept (that it keeps a record of the changes I make to my code) but it also sometimes seems to get in the way. I have read Think like (a) Git and The thing about git in the last couple of days and learned a few really useful things.

In no particular order …


  • the idea that git commits can be ‘wasted’ – you don’t need to keep everything or worry about a commit being perfect. Commit if you feel like it.
  • think of branches as save points (via Think like (a) git)

Crafting a commit (or not)

  • in contrast, you can also ‘craft’ a commit: this is the idea behind the staging area (or index), and is nicely covered in The Thing About Git. Here you can almost imagine writing your git commit message before you commit (i.e. I fixed problem X). Then simply add those files (or parts of files — aka ‘hunks’). Adding ‘hunks’ is the task of …
  • git add --patch – this is genius. When you are preparing a commit then you don’t need to commit an entire file. Running git add --patch myfilename.here allows you to run through the diffs in the file and only stage those parts you wish to. You can also think of this as a bit of a backwards solution to the classic git commit --amend which allows you to add things you forgot to the previous commit.

Git patch options

y - stage this hunk
n - do not stage this hunk
a - stage this and all the remaining hunks in the file
d - do not stage this hunk nor any of the remaining hunks in the file
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks

Selectively applying changes from one branch to another

Common scenario: work in one branch would be useful in another but you don’t want to merge the branches.

If current branch is this_branch and the branch with the changes you want to pull is called that_branch

To pull across a specific commit:

git cherry-pick <commit>

This applies just that single commit to your current branch (but not necessarily a whole file).

To pull across specific file(s):

git checkout that_branch path/to/myfile1 path/to/myfile2

To interactively apply changes from that_branch to your current branch, hunk by hunk:

git checkout this_branch
git checkout -p that_branch


via Jason Rudolph
SO answer
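End to end, pulling a file across looks like this. A sketch in a throwaway repo (branch and file names from the example above; assumes git is installed):

```shell
# Sketch: pull one file across from another branch, without merging
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b that_branch
git config user.email you@example.com
git config user.name you
echo "useful work" > myfile1
git add myfile1
git commit -q -m "work on that_branch"
git checkout -q -b this_branch
git rm -q myfile1
git commit -q -m "this_branch diverges"
# grab just the file from that_branch; it arrives already staged
git checkout that_branch -- myfile1
```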

Using Git for writing

  • git diff --word-diff=color … wow! A way to inspect word by word changes in the file. Much better suited to using Git when writing. I should write more about this whole topic! In the meantime, also note
    • type -S while viewing the diff to toggle the wrapping of long lines and make everything more readable (via someone45 at StackOverflow … Thanks!).

Visualising your commits

  • a free git visualization tool called GitX
  • the command line version of the above is git log --graph --decorate --oneline

Finding and restoring something you deleted

git log --diff-filter=D --summary

then restore it

git checkout <deleting_commit>^ -- <file_path>

(via kablamo and Charles Bailey on stackoverflow)

Things I still need to get my head around

  • cherry picking
  • the rebase command


Early notes: imagine a development branch and a feature branch. While you’re working on the new feature, you also make (possibly) separate changes to the development branch. You don’t want to destroy the feature (it’s not done yet), but you want these recent changes from the development branch in your feature. Then rebase. ‘Rebasing’ refers to moving the point where your feature branch separated from the development branch forward in time. In fact, you move it forward to ‘capture’ as many of the recent changes on development as you like.

And if it turns out that there are conflicts between your feature branch and the development branch, the rebase will pause so that you can resolve them manually; an ‘interactive’ rebase gives you even finer control over how each commit is replayed.
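The scenario in these notes can be walked through in a throwaway repo. A sketch (branch names from the notes above; assumes git is installed):

```shell
# Sketch: rebase an unfinished feature branch onto a moved-on development branch
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b development
git config user.email you@example.com
git config user.name you
echo base > file.txt
git add file.txt
git commit -q -m "base commit"
git checkout -q -b feature
echo feature-work > feature.txt
git add feature.txt
git commit -q -m "start feature"
git checkout -q development          # development moves on without feature
echo more >> file.txt
git commit -q -am "development moves on"
git checkout -q feature
git rebase -q development            # replay feature's commits on the new tip
```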

A re-read of Think like (a) Git is in order!