Doing some thinking about whether or not we can estimate the effect of ethnicity on outcomes for coronavirus. There’s a nice discussion on the use of race/ethnicity and causal inference on CrossValidated.
The key message is that because race is not modifiable, it doesn’t make sense to think of it in terms of a causal pathway. Causal effects imply an action: a drug treats a disease. If we give that same drug to another person, we would expect a similar effect. We cannot ‘give’ race to another person, so we can’t say that race causes a disease. In the same way, we cannot make someone older, or female, or blonde. These are not factors that we control, and therefore they do not cause an outcome.
This does not mean that outcomes cannot differ between racial groups. Here we are interested in the predictive effect, not the causal effect. The challenge is to distinguish whether the association with race is mediated via genetics or via its social and economic consequences.
With respect to COVID-19, the specific question of interest is whether there is a genetic predisposition to worse outcomes amongst BAME (Black, Asian and Minority Ethnic) groups, or whether this is entirely mediated through the associated social and economic factors.
There are arguably three pieces of computing jargon and architecture
that we need to teach to avoid a lot of pain later: paths, folders or
directories, and environments. None of these are interesting in and of
themselves but a small investment now will pay dividends. This post
talks about how to organise your files and folders. Unlike macOS,
iPads, and iPhones, which hide these details, we are keen that you see
the nuts and bolts of how these things work.
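Use relative paths (including for symlinks) so that your entire
project structure is transportable. For example, to make the data
folder visible from inside the code folder:

```bash
# Run from the project root. A relative link target is resolved from
# the folder that contains the link (code/), hence ../data, not data.
ln -s ../data ./code/data
```

To load data within an R script:

```r
mydata <- readRDS('data/mydata.RDS')
```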
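The fix from `data` to `../data` is easy to miss, so here is a minimal Python sketch that checks it. It assumes a POSIX-like system where unprivileged symlinks are allowed. The project names and `mydata.txt` are hypothetical. The sketch builds two throwaway projects, links `code/data` with each target, and then moves the working project to show that a relative link survives the move.

```python
import os
import shutil
import tempfile
from pathlib import Path


def make_project(root: Path, name: str, link_target: str) -> Path:
    """Create root/name with data/ and code/, then link code/data -> link_target."""
    project = root / name
    (project / "data").mkdir(parents=True)
    (project / "code").mkdir()
    (project / "data" / "mydata.txt").write_text("hello")
    # Like `ln -s <link_target> ./code/data` run from the project root: the
    # target string is stored verbatim and later resolved relative to the
    # folder holding the link (code/), not to where the command was run.
    os.symlink(link_target, project / "code" / "data")
    return project


def link_works(project: Path) -> bool:
    """True if code/data/mydata.txt can be reached through the link."""
    return (project / "code" / "data" / "mydata.txt").exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        good = make_project(root, "good", "../data")
        bad = make_project(root, "bad", "data")
        print("../data works:", link_works(good))  # True
        print("data works:", link_works(bad))      # False: code/data points at itself

        # A relative link keeps working after the whole project is moved.
        moved = root / "elsewhere" / "good"
        moved.parent.mkdir()
        shutil.move(str(good), str(moved))
        print("after move:", link_works(moved))    # True
```

An absolute target such as `/home/me/project/data` would also pass the first check. However, it would break as soon as the project is copied to another machine or another user's folder.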
One of the most distinctive features of Tufte’s style is his extensive use of sidenotes. Sidenotes are like footnotes, except they don’t force the reader to jump their eye to the bottom of the page, but instead display off to the side in the margin. Perhaps you have noticed their use in this document already. You are very astute. This is a side note. Notice there is a number preceding the note.
Sidenotes are a great example of the web not being like print. On sufficiently large viewports, Tufte CSS uses the margin for sidenotes, margin notes, and small figures. On smaller viewports, elements that would go in the margin are hidden until the user toggles them into view. The goal is to present related but not necessary information such as asides or citations as close as possible to the text that references them. At the same time, this secondary information should stay out of the way of the eye, not interfering with the progression of ideas in the main text.
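Sidenotes consist of two elements: a superscript reference number that goes inline with the text, and a sidenote with content. To add the former, just put a label and dummy checkbox into the text where you want the reference to go, like so: `<label for="sn-demo" class="margin-toggle sidenote-number"></label><input type="checkbox" id="sn-demo" class="margin-toggle"/>`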
You must manually assign a reference id to each side or margin note, replacing “sn-demo” in the for and the id attribute values with an appropriate descriptor. It is useful to use prefixes like sn- for sidenotes and mn- for margin notes.
Immediately adjacent to that sidenote reference in the main text goes the sidenote content itself, in a span with class sidenote. This tag is also inserted directly in the middle of the body text, but is either pushed into the margin or hidden by default. Make sure to position your sidenotes correctly by keeping the sidenote-number label close to the sidenote itself.
If you want a sidenote without footnote-style numberings, then you want a margin note. This is a margin note. Notice there isn’t a number preceding the note. On large screens, a margin note is just a sidenote that omits the reference number. This lessens the distracting effect taking away from the flow of the main text, but can increase the cognitive load of matching a margin note to its referent text. However, on small screens, a margin note is like a sidenote except its viewability-toggle is a symbol rather than a reference number. This document currently uses the symbol ⊕ (`&#8853;`), but it’s up to you.
Margin notes are created just like sidenotes, but with the marginnote class for the content and the margin-toggle class for the label and dummy checkbox. For instance, here is the code for the margin note used in the previous paragraph:
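```html
<label for="mn-demo" class="margin-toggle">&#8853;</label>
<input type="checkbox" id="mn-demo" class="margin-toggle"/>
<span class="marginnote">This is a margin note. Notice there isn’t a number preceding the note.</span>
```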
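Getting the `for`/`id` pairing right by hand is error-prone, so one option is to generate the markup. The sketch below is a hypothetical Python helper (`tufte_note` is not part of Tufte CSS). It emits the label, dummy checkbox, and content span described above, using the `sn-`/`mn-` prefix convention and the ⊕ toggle for margin notes. It escapes the note text but assumes the id is already a safe descriptor.

```python
from html import escape


def tufte_note(note_id: str, content: str, margin: bool = False) -> str:
    """Return Tufte CSS markup for a sidenote (default) or a margin note.

    The label's `for` and the checkbox's `id` must match, so both are
    built from the same `note_id` (e.g. "sn-demo" or "mn-demo").
    """
    if margin:
        # Margin notes: no reference number; the toggle is the ⊕ symbol.
        label = f'<label for="{note_id}" class="margin-toggle">&#8853;</label>'
        span_class = "marginnote"
    else:
        # Sidenotes: the empty label is styled as the superscript number.
        label = f'<label for="{note_id}" class="margin-toggle sidenote-number"></label>'
        span_class = "sidenote"
    checkbox = f'<input type="checkbox" id="{note_id}" class="margin-toggle"/>'
    body = f'<span class="{span_class}">{escape(content)}</span>'
    # Keep the three pieces adjacent so the note sits next to its reference.
    return label + checkbox + body


if __name__ == "__main__":
    print(tufte_note("sn-demo", "This is a sidenote."))
    print(tufte_note("mn-demo", "This is a margin note.", margin=True))
```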
Figures in the margin are created as margin notes, as demonstrated in the next section.
A hundred years later the RCT may seem like the end of this history.1 However, in critical care we are more aware than most that this would be a poor ending. Clinical trials have been notoriously fruitless in our field, and despite much promising pre-clinical work, this has been especially true in sepsis research.[Riedemann:2003] The main problem is that delivering a clinical trial is akin to measuring the meridian line. These are expensive juggernauts that can only ask one question at a time. Where the answer is subtle, the funds to power the trial machine will be exhausted before a small difference can be detected.
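The arithmetic behind "exhausted before a small difference is detected" is easy to sketch. The example below uses the textbook normal-approximation sample-size formula for comparing two proportions, which is a simplification and not a full trial design. The 30% control mortality is purely illustrative and is not taken from any trial.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate patients per arm for a two-sided test of two proportions.

    Textbook normal approximation with unpooled variance:
        n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    delta = p_control - p_treated
    if delta == 0:
        raise ValueError("no difference to detect")
    z = NormalDist()  # standard normal: mean 0, sd 1
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)


if __name__ == "__main__":
    # Hypothetical 30% control mortality; halve the absolute benefit each time.
    for benefit in (0.10, 0.05, 0.025):
        n = n_per_arm(0.30, 0.30 - benefit)
        print(f"absolute reduction {benefit:.3f}: about {n} patients per arm")
```

The difference appears squared in the denominator. As a result, halving the effect we hope to detect roughly quadruples the number of patients needed, which is why subtle answers are so expensive.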
There are new strategies that aim to make the clinical trial more agile.2 However much we supercharge the randomised trial, it will never be able to keep pace with our need to understand the universe of clinical medicine. If big data is going to be the answer to this, then it must show itself deserving of the trust that we place in an RCT. Google and friends are telling us that this will be machine learning and artificial intelligence. However, if the diet of machine learning is big data, then we are likely to be disappointed. Methods that learn from data do not, on their own, produce theory. Mendelian inheritance, the structure of the double helix, and the general theory of relativity were not problems with data waiting for machine learning to solve. Yes, it is possible that we could feed IBM’s Watson the position of the stars as documented by the ancients. Watson would likely do a good job of recognising that certain spots of light, the planets3, did not move in the same way as others. But to expect that from this Watson would suggest gravity, the Copernican universe, and Newton’s laws of motion is magical thinking.
1. The End of History and the Last Man is a 1992 book by Francis Fukuyama that argued that Western liberal democracy would be the final endpoint of social and political development. A quarter of a century later this claim seems rather premature.
2. These include platform trials and, now, REMAP (Randomized Embedded Multifactorial Adaptive Platform) trials. Here new treatment options are continuously added and removed as they are discovered and assessed, and the randomisation is embedded in health care delivery. The EHR can even provide the real-time data collection and feedback loop.[Angus:2015jw]
3. From the Greek for ‘wanderers’, because their position relative to the other stars was not constant.
This paper is a great find, not least because the argument (statistics versus data science) was already in full swing 50 years ago.
I have no problem with predictive modelling, but it is a different task. And it does seem that the emphasis on ML has (in a pendulum swing from the days of Tukey) obscured the importance of understanding the generative model. From Donoho …
Predictive modeling is effectively silent about the underlying mechanism generating the data, and allows for many different predictive algorithms, preferring to discuss only accuracy of prediction made by different algorithm on various datasets. The relatively recent discipline of machine learning, often sitting within computer science departments, is identified by Breiman as the epicenter of the predictive modeling culture.
I like to think that our lab is pulling hard at the pendulum, and that we care massively about the underlying mechanism. That, for me, is the ‘science’ in ‘data science’: science because, when right, it tells us something about how the world works, not just how it will be. It is the difference between a super-accurate weather forecast and understanding the principles of the atmosphere and the climate. None of that devalues predictive modelling, but these are separate activities.
The replication crisis in science is not the product of the publication of unreliable findings. The publication of unreliable findings is unavoidable: as the saying goes, if we knew what we were doing, it would not be called research. Rather, the replication crisis has arisen because unreliable findings are presented as reliable.
This eloquent exposition of why clinicians are necessary to data science feels like a manifesto for the lab.
We argue that a failure to adequately describe the role of subject-matter expert knowledge in data analysis is a source of widespread misunderstandings about data science. Specifically, causal analyses typically require not only good data and algorithms, but also domain expert knowledge.
And a general critique of ML as a method to improve health
A goal of data science is to help make better decisions. For example, in health settings, the goal is to help decision-makers—patients, clinicians, policy-makers, public health officers, regulators—decide among several possible strategies. Frequently, the ability of data science to improve decision making is predicated on the basis of its success at prediction. However, the premise that predictive algorithms will lead to better decisions is questionable.
…the distinction between prediction and causal inference (counterfactual prediction) becomes unnecessary for decision making when the relevant expert knowledge can be readily encoded and incorporated into the algorithms. For example, a purely predictive algorithm that learns to play Go can perfectly predict the counterfactual state of the game under different moves, and a predictive algorithm that learns to drive a car can accurately predict the counterfactual state of the car if, say, the brakes are not operated. Because these systems are governed by a set of known game rules (in the case of games like Go) or physical laws with some stochastic components (in the case of engineering applications like self-driving cars), …
Or more specifically …
…contrary to some computer scientists’ belief, “causal inference” and “reinforcement learning” are not synonyms. Reinforcement learning is a technique that, in some simple settings, leads to sound causal inference. However, reinforcement learning is insufficient for causal inference in complex causal settings
I have been using Git now for a couple of years, and have struggled to understand what it does. I get the basic concept (that it keeps a record of the changes I make to my code) but it also sometimes seems to get in the way. I have read Think like (a) Git and The thing about git in the last couple of days and learned a few really useful things.
In no particular order …
the idea that git commits can be ‘wasted’ – you don’t need to keep everything or worry about a commit being perfect. Commit if you feel like it.
in contrast, you can also ‘craft’ a commit: this is the idea behind the staging area (or index), and is nicely covered in The Thing About Git. Here you can almost imagine writing your git commit message before you commit (i.e. I fixed problem X). Then simply add those files (or parts of files — aka ‘hunks’). Adding ‘hunks’ is the task of …
git add --patch – this is genius. When you are preparing a commit, you don’t need to stage an entire file. Running git add --patch myfilename.here lets you step through the diffs in the file and stage only the parts you want. You can also think of this as a bit of a backwards solution to the classic git commit --amend, which allows you to add things you forgot to the previous commit.
Git patch options
y - stage this hunk
n - do not stage this hunk
a - stage this and all the remaining hunks in the file
d - do not stage this hunk nor any of the remaining hunks in the file
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks
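To make the staging-area idea concrete, here is a toy Python model of how the y, n, a, and d answers decide which hunks of one file end up staged. It is not how Git is implemented. The hunks are plain strings standing in for diff fragments. The navigation answers (j, J, k, K) and s are left out, because they change the order or size of hunks rather than the final staged/unstaged decision.

```python
from typing import Iterable, List


def stage_hunks(hunks: List[str], answers: Iterable[str]) -> List[str]:
    """Toy model: which hunks of one file end up staged.

    y stages this hunk, n skips it,
    a stages this and every remaining hunk, d skips this and every remaining hunk.
    """
    staged: List[str] = []
    replies = iter(answers)
    for i, hunk in enumerate(hunks):
        answer = next(replies, None)
        if answer == "y":
            staged.append(hunk)
        elif answer == "n":
            continue
        elif answer == "a":
            staged.extend(hunks[i:])  # this hunk plus all the rest
            break
        elif answer == "d":
            break  # nothing further in this file is staged
        else:
            raise ValueError(f"answer not covered by this model: {answer!r}")
    return staged


if __name__ == "__main__":
    # Hypothetical hunks from one file, labelled by what they change.
    hunks = ["fix typo", "debug print", "fix problem X", "refactor"]
    print(stage_hunks(hunks, ["y", "n", "y", "n"]))  # ['fix typo', 'fix problem X']
    print(stage_hunks(hunks, ["n", "a"]))  # ['debug print', 'fix problem X', 'refactor']
```

The first call is the ‘crafted commit’ case: only the hunks that belong to the message "I fixed problem X" (plus the typo) are staged, and the debug print stays out.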
Selectively applying changes from one branch to another
Common scenario: work in one branch would be useful in another but you don’t want to merge the branches.
Suppose the current branch is this_branch and the branch with the changes you want to pull is called that_branch.
To pull across a specific commit, use git cherry-pick. It pulls just that commit’s changes, which is not necessarily a whole file.
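The difference between taking one commit and taking a whole file can be sketched with a toy Python model. All names here are hypothetical, and the model is much simpler than Git: a file is a list of lines, and a commit edits some lines of one file. Cherry-picking applies only that commit’s edits, whereas copying the file brings across its entire state on that_branch. Real cherry-picks perform a three-way merge and can conflict; this sketch skips all of that.

```python
from typing import Dict, List, Tuple

# Toy types: a file is a list of lines; a commit edits lines of one file.
Files = Dict[str, List[str]]
Commit = Tuple[str, Dict[int, str]]  # (filename, {line_index: new_text})


def apply_commit(files: Files, commit: Commit) -> Files:
    """Return a copy of `files` with one commit's line edits applied."""
    name, edits = commit
    updated = {k: list(v) for k, v in files.items()}
    for index, text in edits.items():
        updated[name][index] = text
    return updated


def branch_state(base: Files, commits: List[Commit]) -> Files:
    """Replay every commit on a branch to get its latest snapshot."""
    state = base
    for commit in commits:
        state = apply_commit(state, commit)
    return state


def cherry_pick(current: Files, commit: Commit) -> Files:
    """Bring across just the changes made by one commit."""
    return apply_commit(current, commit)


def checkout_file(current: Files, other: Files, name: str) -> Files:
    """Copy the whole file as it stands on the other branch."""
    updated = {k: list(v) for k, v in current.items()}
    updated[name] = list(other[name])
    return updated


if __name__ == "__main__":
    base: Files = {"analysis.R": ["load", "clean", "model"]}
    that_branch = [("analysis.R", {1: "clean v2"}), ("analysis.R", {2: "model v2"})]
    that_state = branch_state(base, that_branch)
    this_state = base  # this_branch has no new commits in this example

    print(cherry_pick(this_state, that_branch[1])["analysis.R"])
    # ['load', 'clean', 'model v2']  <- only the second commit's change
    print(checkout_file(this_state, that_state, "analysis.R")["analysis.R"])
    # ['load', 'clean v2', 'model v2']  <- the file exactly as on that_branch
```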
git diff --word-diff=color … wow! A way to inspect word by word changes in the file. Much better suited to using Git when writing. I should write more about this whole topic! In the meantime, also note
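To get a feel for what a word-level diff does, here is a small Python approximation using the standard library’s difflib. It splits on whitespace and marks removals as [-old-] and additions as {+new+}, the markers that git diff --word-diff=plain uses. Git’s own word splitting (configurable with --word-diff-regex), its spacing, and its colour output differ from this sketch.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark word-level changes between two strings in git's plain style."""
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        # A 'replace' shows up as a removal followed by an addition.
        if op in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


if __name__ == "__main__":
    print(word_diff("Clinical trials have been fruitless in our field",
                    "Clinical trials have been notoriously fruitless in critical care"))
    # Clinical trials have been {+notoriously+} fruitless in [-our field-] {+critical care+}
```

A line-based diff would flag the whole sentence as changed. For prose, where one edit usually touches a few words in a long paragraph, the word view is far easier to read.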
Early notes: imagine a development branch and a feature branch. While you’re working on the new feature, you also make (possibly separate) changes to the development branch. You don’t want to destroy the feature (it’s not done yet), but you want these recent changes from the development branch in your feature. Then rebase. ‘Rebasing’ refers to moving the point where your feature branch separated from the development branch forward in time. In fact, you can move it forward to ‘capture’ as many of the recent changes on development as you like.
And if it turns out that there are conflicts between your feature branch and the development branch, Git pauses the rebase so that you can resolve them manually (then git rebase --continue, or git rebase --abort to back out). An ‘interactive’ rebase (git rebase -i) additionally lets you reorder, squash, or drop your feature commits as they are replayed.
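Here is a toy Python model of the idea. The names are hypothetical, and the model is far simpler than Git, which writes the replayed commits as new objects with new hashes and stops on conflicts. Each history is a list of commit labels. The split point is the shared prefix of the two branches. Rebasing replays the feature-only commits, marked with a prime to show they are new copies, on top of as much of development as you choose to capture.

```python
from typing import List, Optional


def split_point(feature: List[str], development: List[str]) -> int:
    """Length of the shared history, i.e. where the branches diverged."""
    n = 0
    while n < min(len(feature), len(development)) and feature[n] == development[n]:
        n += 1
    return n


def rebase(feature: List[str], development: List[str],
           upto: Optional[int] = None) -> List[str]:
    """Replay feature-only commits on top of development[:upto].

    upto=None captures every development commit; a smaller value moves
    the split point forward only part of the way.
    """
    split = split_point(feature, development)
    end = len(development) if upto is None else upto
    if end < split:
        raise ValueError("cannot rebase onto a point before the split")
    new_base = development[:end]
    # Primes mark replayed copies: the originals are not moved, they are redone.
    return new_base + [commit + "'" for commit in feature[split:]]


if __name__ == "__main__":
    development = ["A", "B", "C", "D"]
    feature = ["A", "B", "F1", "F2"]
    print(rebase(feature, development))          # ['A', 'B', 'C', 'D', "F1'", "F2'"]
    print(rebase(feature, development, upto=3))  # ['A', 'B', 'C', "F1'", "F2'"]
```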