Author: drstevok

Calculate the SOFA score in R

A follow on from the Charlson score function previously posted. Here are functions to calculate the SOFA score.

Please note

  • it’s almost inconceivable that your data will be similar to mine, and you will be able to just use these ‘as is’; however, they might provide a useful skeleton.
  • there are some add-ons included (e.g. if a blood gas is not available then you can still generate the SOFA respiratory score using oxygen saturations and the S:F ratio via this (slightly flawed) proposal)
  • there are some arbitrary decisions too (i.e. vasopressin use is considered to assign patients to 4 SOFA points for cardiovascular dysfunction)

Bootstrap at the cluster or the unit level

I have been using the bootstrap more often recently, but the data that I use is typically structured with patients nested in hospitals. The wonderful Cross Validated recommends that any sampling that is to be done should respect the structure of the data.

This means first sampling (with replacement) hospitals, and then sampling (with replacement again) within each hospital before re-assembling the data.

There is a better explanation along with a code snippet from the biostats department at Vanderbilt. However, with 48 hospitals and 15,000 patients, this ran very slowly.

I have re-written this using the data.table with a good (great?) improvement in speed (but some loss of flexibility).

# Generate sample data
# --------------------

tdt <- data.table(id=1:15000, site=1:50, age=round(rnorm(n=10000, mean=65, sd=18)))

# Non data.table version
# ----------------------

resample <- function(dat, cluster, replace) {
  # exit early for trivial data
  if(nrow(dat) == 1 || all(replace==FALSE))
  # sample the clustering factor
  cls <- sample(unique(dat[[cluster[1]]]), replace=replace[1])
  # subset on the sampled clustering factors
  sub <- lapply(cls, function(b) subset(dat, dat[[cluster[1]]]==b))
  # sample lower levels of hierarchy (if any)
  if(length(cluster) > 1)
    sub <- lapply(sub, resample, cluster=cluster[-1], replace=replace[-1])
  # join and return samples, sub)

# Data.table version
# ------------------

rsample2 <- function(data=tdt, id.unit=id.u, id.cluster=id.c) {

    # Generate within cluster ID (needed for the sample command)
    tdt[, "id.within" := .SD[,.I], by=id.cluster, with=FALSE]

    # Random sample of sites
    bdt <- data.table(sample(unique(tdt[[id.cluster]]), replace=TRUE))

    # Use random sample of sites to select from original data
    # then
    # within each site sample with replacement using the within site ID
    bdt <- tdt[bdt, .SD[sample(.SD$id.within, replace=TRUE)],by=.EACHI]

    # return data sampled with replacement respecting clusters
    bdt[, id.within := NULL] # drop id.within

cluster <- c("site", "id")
system.time(d <- resample(tdt,cluster,c(T,T)))
> system.time(d <- resample(tdt,cluster,c(T,T)))
   user  system elapsed 
  8.597   0.092   8.786 
system.time(d <- rsample2(tdt, id.unit="pid", id.cluster="site"))
> system.time(d <- rsample2(tdt, id.unit="pid", id.cluster="site"))
   user  system elapsed 
  0.051   0.001   0.052 

Get up, git up

I have been using SourceTree as a GUI for git, but just came across GitUp. It starts off just looking like a pretty way to view your repository with a fairly typical graph, but you can in fact work from within the graph.

Gitup 160113

To me, this makes things seem much more intuitive since I can see where I am, and how my work fits in with previous bits of work.

Not only that it lays an undo/redo layer on top of your work. No more trying to work out how to unravel a series of misguided commits.

Worth a look.