Folder templates for data science projects

There are arguably three pieces of computing jargon and architecture
that we need to teach to avoid a lot of pain later: paths, folders (or
directories), and environments. None of these is interesting in and of
itself, but a small investment now will pay dividends. This post
talks about how to organise your files and folders. And unlike macOS,
iPads, and iPhones, we are keen that you see the nuts and bolts of how
these things work.

  1. Use relative paths (including for symlinks) so that your entire
    project structure is transportable. A relative symlink target is
    resolved from the link's own location, so from the project root:
        ln -s ../data ./code/data
    To load data within an R script:
        mydata <- readRDS('data/mydata.RDS')
  2. Separate inputs and outputs
    • specifically, separate and protect the input data
  3. Write notes (readme.md)
  4. Define what needs to go under version control
  5. Think about using a build tool … make, doit, …
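
Protecting the input data (point 2) can be done with ordinary file permissions. The sketch below is one possible approach rather than part of the original template: `protect_raw` is a hypothetical helper name, and the default directory `data-raw` is an assumption taken from the template layout in this post.

```bash
# Hypothetical helper: strip write permission from a raw-data directory
# and everything inside it, so a script cannot overwrite the inputs by
# accident. The default directory name (data-raw) is an assumption.
protect_raw() {
    raw_dir=${1:-data-raw}
    if [ ! -d "$raw_dir" ]; then
        echo "protect_raw: no such directory: $raw_dir" >&2
        return 1
    fi
    # a-w removes write permission for user, group and others
    chmod -R a-w "$raw_dir"
}
```

Run `protect_raw` from the project root after copying the raw files in. To add more raw files later, restore write permission with `chmod -R u+w data-raw`, copy them in, and run `protect_raw` again. Note that permissions guard against accidents only; a privileged user can still write.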

A template project structure.

    mypaper
    ├── code
    │   ├── _config
    │   ├── a0utils
    │   ├── a1prep
    │   ├── a2analysis
    │   ├── data -> ../data
    │   ├── data-raw -> ../data-raw
    │   ├── figures -> ../figures
    │   ├── labbook
    │   │   └── CCYY-MM-DD.py
    │   ├── library
    │   ├── readme.md
    │   └── tables -> ../tables
    ├── data
    ├── data-raw
    ├── figures
    ├── filing
    ├── readme.md
    ├── tables
    ├── tmp
    ├── todo.todo
    └── write
        ├── manuscript.md
        └── readme.md

Put just the ./code directory under version control. Never write to ./data-raw.
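
The template can be created in one step with a small shell function. This is a minimal sketch, not the author's own script: `make_project` is a hypothetical name, and it leaves out the dated labbook file (`CCYY-MM-DD.py`) because that name is a pattern rather than a literal file.

```bash
# Hypothetical helper: create the template layout under a project name.
make_project() {
    proj=$1
    if [ -z "$proj" ]; then
        echo "usage: make_project NAME" >&2
        return 1
    fi
    mkdir -p "$proj/code/_config" "$proj/code/a0utils" "$proj/code/a1prep" \
             "$proj/code/a2analysis" "$proj/code/labbook" "$proj/code/library" \
             "$proj/data" "$proj/data-raw" "$proj/figures" "$proj/filing" \
             "$proj/tables" "$proj/tmp" "$proj/write"
    touch "$proj/readme.md" "$proj/code/readme.md" "$proj/todo.todo" \
          "$proj/write/readme.md" "$proj/write/manuscript.md"
    # Relative links: each target is resolved from the link's own directory
    # (code/), so ../data still points at the right place after the whole
    # project is moved or copied elsewhere.
    for d in data data-raw figures tables; do
        [ -L "$proj/code/$d" ] || ln -s "../$d" "$proj/code/$d"
    done
}
```

Calling `make_project mypaper` builds the tree shown above. Because the links are relative, renaming or moving `mypaper` does not break `code/data`, which is the point of rule 1.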

For an alternative approach, see Templates for reproducible research
projects, which goes much further and splits directories formally using
a build tool (waf).

Syntax highlighting for R in the terminal

So RStudio seemed to be running really slowly today, which prompted me to try using R in the terminal. This works nicely with R-Box. In other words, type in Sublime Text and execute in R (via iTerm).

This all worked much more quickly, and the plots show up in a lovely quartz window. I lose a lot of the easy point-and-click functionality, but I never used the text editor so I don’t miss that.

What I did miss was syntax highlighting. The solution (via StackOverflow as usual) is a super cool little package called colorout.

Before (top half) and after (bottom half) of my screen. Which looks nicer?

141128 iTerm colorout screenshot

Don’t forget to load the package in your {{EJS1-14}} file.

Calculate the SOFA score in R

A follow-on from the Charlson score function previously posted. Here are functions to calculate the SOFA score.

Please note

  • it’s almost inconceivable that your data will be similar to mine, and you will be able to just use these ‘as is’; however, they might provide a useful skeleton.
  • there are some add-ons included (e.g. if a blood gas is not available then you can still generate the SOFA respiratory score using oxygen saturations and the S:F ratio via this (slightly flawed) proposal)
  • there are some arbitrary decisions too (e.g. vasopressin use is taken to assign patients 4 SOFA points for cardiovascular dysfunction)

Get up, git up

I have been using SourceTree as a GUI for git, but just came across GitUp. It starts off just looking like a pretty way to view your repository with a fairly typical graph, but you can in fact work from within the graph.

Gitup 160113

To me, this makes things seem much more intuitive since I can see where I am, and how my work fits in with previous bits of work.

Not only that, it lays an undo/redo layer on top of your work. No more trying to work out how to unravel a series of misguided commits.

Worth a look.