Bootstrap at the cluster or the unit level

I have been using the bootstrap more often recently, but the data that I use are typically structured with patients nested in hospitals. The wonderful Cross Validated recommends that any resampling should respect the structure of the data.

This means first sampling (with replacement) hospitals, and then sampling (with replacement again) within each hospital before re-assembling the data.
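To make the two stages concrete, here is a minimal base R sketch of one bootstrap replicate. It is only an illustration under stated assumptions, not the Vanderbilt snippet or my data.table version: the function name `cluster_resample`, the column name `hospital_id`, and the added `boot_cluster` column are all hypothetical.

```r
# One replicate of a two-stage (cluster) bootstrap.
# Assumes `df` has one row per patient and a column identifying the hospital;
# the default column name `hospital_id` is hypothetical.
cluster_resample <- function(df, cluster_col = "hospital_id") {
  # Row indices of the patients belonging to each hospital
  rows_by_cluster <- split(seq_len(nrow(df)), df[[cluster_col]], drop = TRUE)

  # Stage 1: sample hospitals with replacement
  drawn <- sample(names(rows_by_cluster), length(rows_by_cluster), replace = TRUE)

  # Stage 2: within each drawn hospital, sample patients with replacement
  pieces <- lapply(seq_along(drawn), function(i) {
    idx <- rows_by_cluster[[drawn[i]]]
    # Indexing via sample.int() avoids sample()'s special case for length-one input
    out <- df[idx[sample.int(length(idx), replace = TRUE)], , drop = FALSE]
    # New cluster label, so a hospital drawn twice counts as two separate clusters
    out$boot_cluster <- i
    out
  })

  # Re-assemble the resampled data
  do.call(rbind, pieces)
}

# Toy data: three hospitals with 4, 2 and 3 patients
set.seed(1)
patients <- data.frame(
  hospital_id = rep(c("A", "B", "C"), times = c(4, 2, 3)),
  outcome = rnorm(9)
)
boot <- cluster_resample(patients)
```

Because hospitals differ in size, the number of rows in `boot` can change from one replicate to the next. If a model with a hospital-level term is fitted to the replicate, `boot_cluster` is probably the identifier to use rather than `hospital_id`.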

There is a better explanation along with a code snippet from the biostats department at Vanderbilt. However, with 48 hospitals and 15,000 patients, this ran very slowly.

I have rewritten this using the data.table package, with a good (great?) improvement in speed (but some loss of flexibility).