How did quantitative editing work at FiveThirtyEight? This is the process I followed when working with our journalists or freelancers.

Here’s a quick how-to guide in presentation form:


What I need:
 

  • Original data as it was collected

  • Documentation (codebook, etc.)

  • Any code that recodes variables

  • Analysis code

Any code provided should be complete and commented, and should run from start to finish; extraneous analyses should be removed or, at minimum, commented out.

Ideally, code should be consolidated in one file (or two: recoding and analysis). If it’s spread over more files than that, please make clear the order in which they should be reviewed and run.
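One way to make the run order unambiguous is a small driver script that lists the files in sequence. The following is a minimal sketch, not a required format; the file names `01_recode.py` and `02_analysis.py` and the helper `run_pipeline` are hypothetical placeholders.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical file names; substitute the project's own scripts, in run order.
PIPELINE = ["01_recode.py", "02_analysis.py"]


def run_pipeline(scripts, workdir="."):
    """Run each script in the listed order, stopping at the first failure."""
    completed = []
    for name in scripts:
        path = Path(workdir) / name
        # check=True raises CalledProcessError if a script exits non-zero,
        # so a broken step cannot be silently skipped.
        subprocess.run([sys.executable, str(path)], check=True)
        completed.append(name)
    return completed
```

Calling `run_pipeline(PIPELINE)` from the project folder then doubles as the "runs start to finish" check: if any step fails, the run stops there.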



Questions


Questions I’ll probably want you to answer in advance:

  • Where did the data come from?

  • Is there any part of the method you’re unsure about or would like advice on?

  • What is the core argument of the piece, in one or two sentences?

Questions I’m going to be asking as I review:

(You don’t need to provide answers to these questions, but it would be great if you had them. If there’s a question you can’t answer, or where the answer is complicated, it’s a good idea to talk to me about it in advance!)

Data collection: 

  • How was the data collected? 

    • Did it come from a poll/survey, was it scraped (from where? Is there code?), or was it hand-collected (please no)?

    • Can I verify where it came from (e.g. from a website)?

    • Could I re-collect it, or is that impossible (e.g. proprietary data, or something has changed since it was collected)?

  • Double checking: was any aspect of the data collected manually?



Data management:

  • What recoding of the variables was done after the data was collected?

  • Is any other data merged in?

  • Was any of the data manipulated manually after it was collected? 

    • If yes, is there a record of these changes?
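To illustrate what a record of manual changes could look like, here is a minimal sketch in Python. It assumes a toy dataset and a single made-up correction; the records, field names, and the helpers `apply_corrections` and `recode` are all hypothetical. The idea is that the raw data stays untouched, each manual fix is stored as data with a reason, and recoded variables sit alongside the originals.

```python
import copy

# Hypothetical raw records, standing in for the original data as collected.
raw = [
    {"id": 1, "party": "D", "age": "34"},
    {"id": 2, "party": "R", "age": "340"},  # made-up example of a data-entry typo
    {"id": 3, "party": "I", "age": "51"},
]

# Each manual fix is written down as data, with a reason (hypothetical entry).
corrections = [
    {"id": 2, "field": "age", "old": "340", "new": "34",
     "reason": "typo, checked against the source document"},
]

# Recoding expressed as an explicit mapping (hypothetical labels).
PARTY_LABELS = {"D": "Democrat", "R": "Republican", "I": "Independent"}


def apply_corrections(records, log):
    """Return a corrected copy of records; the raw input is left untouched."""
    fixed = copy.deepcopy(records)
    by_id = {row["id"]: row for row in fixed}
    for c in log:
        row = by_id[c["id"]]
        # Refuse to apply a fix if the raw value is not what the log expects.
        if row[c["field"]] != c["old"]:
            raise ValueError(f"unexpected value for id {c['id']}: {row[c['field']]!r}")
        row[c["field"]] = c["new"]
    return fixed


def recode(records):
    """Add recoded variables next to the originals rather than overwriting them."""
    out = []
    for row in records:
        new = dict(row)
        new["party_label"] = PARTY_LABELS[row["party"]]
        new["age_years"] = int(row["age"])
        out.append(new)
    return out


clean = recode(apply_corrections(raw, corrections))
```

Because the corrections live in a list rather than in someone's memory, a reviewer can read them, question them, and rerun the whole chain from the original data.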

Analysis:

  • What analysis was done here?

  • What decisions were made:

    • choice of method? Did you try anything else?

    • is there any data subsetting? 

    • is there any subgroup analysis? How large are the subgroups?

    • how many tests are run? What kinds of tests?

  • Is there uncertainty in the estimates? 

    • How do you show that uncertainty (e.g. confidence intervals, standard errors, hypothesis tests)?

  • Is there anything you’re unsure about?
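The questions about subgroup size, number of tests, and uncertainty can be made concrete with a short sketch. It assumes the simple Wald (normal-approximation) interval for a proportion and a Bonferroni adjustment for multiple tests, which are only two of many possible choices; the counts and the helper names `proportion_ci` and `bonferroni_threshold` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Wald (normal-approximation) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because the approximation can overshoot near the edges.
    return max(0.0, p - margin), min(1.0, p + margin)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni adjustment."""
    return alpha / n_tests


# Hypothetical counts: the same 60% share in the full sample and in a subgroup.
full_sample = proportion_ci(600, 1000)
subgroup = proportion_ci(30, 50)

# Hypothetical case: 20 tests run at an overall alpha of 0.05.
per_test_alpha = bonferroni_threshold(0.05, 20)
```

The same point estimate carries a much wider interval in the 50-person subgroup than in the 1,000-person sample, and running 20 tests shrinks the per-test threshold well below 0.05. Both are reasons the number of tests and the subgroup sizes matter in review.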

Argument:

  • What’s the core argument being made here? 

    • What evidence is provided to support it?

  • Is it a causal argument? 

    • If yes, what evidence speaks to causality?