https://twitter.com/peyton_k/status/1354477972561010696
lol
Attention getter! The appropriate way to challenge one's research is to write a proper replication essay and highlight its weaknesses, not to do it on Twitter.
Dumb take. This paper is so facially terrible that replication would be a waste of anyone's time. Are you one of the authors?
So his main discomfort with the paper is that the analysis is (severely) underpowered?
Following up: Not in any defence of the authors (they've obviously made themselves look dumb by including all the cop x age, race, and year interactions), but if they had just focused on the main effect of "cop" (which I think is their main finding), it's essentially a bunch of t-tests with around N=80 in the treatment group. Still highly underpowered (I certainly wouldn't have written a paper based on it myself), but I don't think the issue is as bad as the guy on Twitter is making it out to be. Am I missing something here?
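To put a rough number on "highly underpowered": here is a minimal back-of-the-envelope sketch of the power of a two-sample test with 80 treated units. The control-group size (8,000) and the candidate effect sizes are my own assumptions for illustration, not figures from the paper, and the large-sample z approximation stands in for the t-tests described above.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def approx_power(d, n_treat=80, n_control=8000, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test for a
    standardized mean difference d (large-sample approximation).
    n_treat=80 matches the thread; n_control is a hypothetical."""
    se = math.sqrt(1 / n_treat + 1 / n_control)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d / se
    # P(reject) when the true standardized effect is d
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

for d in (0.1, 0.2, 0.3, 0.5):
    print(f"d = {d}: power ~ {approx_power(d):.2f}")
```

With only 80 in the treatment group, the treatment-side variance dominates no matter how large the control group is, so power stays well below the conventional 80% for small-to-moderate effects. Only quite large effects are reliably detectable, which feeds directly into the "significant and large in magnitude from a small n" skepticism raised later in the thread.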
This is exactly what I thought. Why not focus on the main effects of “cop” and call it a small, convenience sample of cops compared to general population? Fine for what it is. All these interaction terms are ridiculous given number of cops in dataset.
Yes. This.
The Twitter OP seems like a third-year grad student. To say these authors "lied" using statistics is misleading. They did some questionable things and clearly have unreliable estimates, but it is unfair to say they "lied." Relax, dude.
7cc3 here. Let me elaborate a bit more. I suspect (purely speculatively) that what's really going on is that they couldn't find a significant main effect initially without the silly interactions, so they played around with different specifications and included the sets of interactions, not because the interactions themselves are of much interest in the paper, but specifically because they lead to significant main effects.
One can do this because, due to the small n in the treatment group, there is a great amount of noise associated with the main effects. So if you chase noise by playing around with different specifications and picking winners, you will end up with stars on your coefficients based on pure noise. This is the sort of issue that Gelman posts about time and time again on his blog (to be fair, even Heckman was called out for making this sort of error there). In general, if an estimated effect is 1) based on a small n in the treatment group, 2) significant, and 3) large in magnitude, then one should be very skeptical. The main effect here has all three, so at a bare minimum, it's bad science.
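The noise-chasing mechanism described above is easy to demonstrate by simulation. The sketch below is purely illustrative: the true effect is exactly zero, the "specifications" are arbitrary subgroup splits of an 80-unit treatment group (stand-ins for the age/race/year interactions, not the paper's actual model), and we keep the best-looking p-value from each simulated dataset. The realized false-positive rate of the nominal 5% test is well above 5%.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
Z = NormalDist()

def pval(treat, ctrl):
    """Two-sided p-value for a difference in means
    (large-sample z approximation to the t-test)."""
    se = math.sqrt(stdev(treat) ** 2 / len(treat)
                   + stdev(ctrl) ** 2 / len(ctrl))
    z = abs(mean(treat) - mean(ctrl)) / se
    return 2 * (1 - Z.cdf(z))

n_sims, hits = 1000, 0
for _ in range(n_sims):
    # Null world: treated and control drawn from the same distribution.
    treat = [random.gauss(0, 1) for _ in range(80)]
    ctrl = [random.gauss(0, 1) for _ in range(800)]
    # "Specification search": also test arbitrary subgroups of the
    # treatment group and keep whichever p-value looks best.
    ps = [pval(treat, ctrl),
          pval(treat[:40], ctrl),
          pval(treat[40:], ctrl),
          pval(treat[::2], ctrl),
          pval(treat[1::2], ctrl)]
    if min(ps) < 0.05:
        hits += 1

print(f"nominal 5% test, realized false-positive rate ~ {hits / n_sims:.2f}")
```

Even with only five (correlated) specifications, picking the winner roughly doubles or triples the false-positive rate; with the full grid of cop x age, race, and year interactions, the multiplicity is far worse.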
If this is what's really going on, then yes, this is a typical case of p-hacking, and all the harsh criticisms are justified. But, without any direct evidence, I think it's right to give the authors the benefit of the doubt. So in his subsequent tweets, what the guy on Twitter really needed to demonstrate was that the main effects go away without the interactions, not that the year x "cop" interactions were highly unstable. Without such evidence, trashing the paper with lines like "how to lie with statistics" and "STAT101 students would be held to a higher standard" seems extremely condescending to me.