YOUR SAMPLE IS SMALL ⏱14:20
my name's nick holmes and this is the error bar: a podcast about brain science & statistics. nothing like any other podcast about brain science & statistics.
episode #37, 15th March 2024 #brain #sex #small #sample #size
our main brain stories this episode...
SEX IN THE BRAIN
every single newspaper in the universe covered the story about a research paper that finds some small differences between men's & women's brains.
this story was in episode #37. original article: Ryali et al., 2024 (Proceedings of the National Academy of Sciences of the United States of America), reported in: The New Scientist by Clare Wilson on 19th March 2024 & The New Scientist by Clare Wilson on 19th February 2024 & The Guardian by Gina Rippon on 22nd February 2024 & The Times by Tom Whipple on 20th February 2024 & The Telegraph by Sarah Knapton on 19th February 2024 #finally #sex #brain #decision #centuries
YOUR SAMPLE IS SMALL
it was inevitable that a podcast focussing on statistics would come around to the question of sample size. some time in 2023 i decided that i would no longer criticise any scientific study on the basis of its sample size. the sample size is how many independent pieces of evidence, typically from different people, are gathered in a single study to answer a research question.
in this second statistical essay, i explore why i think that sample size - on its own - is never a useful thing to think about when evaluating a single study or a single piece of evidence.
the sample size criticism is a popular one. students use it in essays & dissertations to address the 'critical evaluation' component of their assessments; politicians use it to discredit opinion polls; non-scientists use it to discredit data & scientific research; i've even heard a much-loved & respected satirical investigative journalist (Ian Hislop, for those in the UK) use it to discount scientific studies appearing in the news.
but qualified, highly-trained professional scientists also use the sample size criticism. indeed, they use it a big number of times.
in the story that precedes this essay, i pointed out that at least 600 previous studies have asked whether male & female brains are different, & that these 600 studies were meticulously reviewed & analysed in a majestic paper by Eliot and colleagues in 2021. 600 studies is a big number of studies. for a systematic review & meta-analysis in this field of research, 600 is a very large sample.
when doing systematic reviews - or studying the life expectancy of the population of Sweden, as discussed in last month's episode - the question of how many studies to include, or of the study sample size needed, becomes irrelevant: the researcher's job is to collect all the relevant data & to explain which data might be missing & what impact it might have on the conclusions.
but in any individual research study there is a real problem: how much data should we gather, when should we stop collecting, & what is enough to answer our questions or test our hypotheses?
there are papers offering advice on this problem. i remember seeing a few years ago - likely on twitter - a paper stating as a simple truth that "you need more participants to study a statistical interaction than you do a main effect". this claim depends on specific & unnecessary assumptions & it is not at all a simple fact - i will return to it in a later episode. all i want to say here is that you can, if you wish, look for advice about the sample size you need for your study. & you will find that advice. you will find a big amount of advice.
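to show why the interaction claim is not a simple fact, here is a minimal simulation sketch - my own illustration, not from that paper. the 2x2 between-subjects design, the cell means, the standard deviation of 1 & the n of 20 per cell are all assumptions chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def power(cell_means, n=20, sims=2000, alpha=0.05):
    # proportion of simulated experiments in which the main effect of A
    # & the A x B interaction contrast reach significance
    hits = np.zeros(2)
    for _ in range(sims):
        # four independent groups: a1b1, a1b2, a2b1, a2b2; sd = 1
        cells = [rng.normal(m, 1.0, n) for m in cell_means]
        means = np.array([c.mean() for c in cells])
        mse = np.mean([c.var(ddof=1) for c in cells])      # pooled error variance
        se = np.sqrt(mse * 4 / n)                          # SE of a +/-1 contrast
        crit = stats.t.ppf(1 - alpha / 2, df=4 * (n - 1))
        t_main = (means @ [1, 1, -1, -1]) / se             # main effect of A
        t_inter = (means @ [1, -1, -1, 1]) / se            # A x B interaction
        hits += np.abs([t_main, t_inter]) > crit
    return hits / sims   # [power for main effect, power for interaction]

# effect of A present only at B1: the two tests have identical power
print(power([0.5, 0.0, 0.0, 0.0]))
# full cross-over: the interaction is by far the better-powered test
print(power([0.5, -0.5, -0.5, 0.5]))
```

under an 'on-off' pattern of effects, the main effect & the interaction are equally easy to detect; under a cross-over, the interaction is the stronger test. "interactions need more participants" is only true under one particular assumption about how the world works.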
so instead of providing any more advice - i would probably argue there is none to provide - i will instead illustrate a few cases where sample size has come up in discussion & say what i think about it. the first case is very fresh in my mind.
in October 2023, i submitted the largest paper of my academic career. there were more participants, more study visits, more data-points & more experiments than any other paper i've worked on. the data collection spanned the longest period - punctuated by covid - of any project i've written up. in my mind at least, there was nothing 'small' about this study.
i received two very good & careful reviews for my paper, & the reviewers' input really improved the work & strengthened its arguments. but i remain haunted by one reviewer's comment, which was about my study's sample size. i paraphrase the reviewer here: "it is great that the authors have conducted multiple experiments, but the sample size of only 12 participants in each experiment is limited".
this comment really troubled me. & it continues to trouble me. i've been playing the science & publishing games for twenty years, & i have never put more effort, thought, code & data into any other paper. yet still, the reviewers comment on sample size. i don't want to use this platform as an advert for my fantastic paper, freely available now at the Journal of Neurophysiology, but i do want to use it to ask this simple question:
when is a sample big enough?
it strikes me that reviewers don't tend to say that a sample is "too large". but why not? if they are happy to say that a sample is "too small", then there must be a sample of a bigger number that is the "right size" & perhaps also a sample of a yet bigger number that is "too large".
so let's talk about this Goldilocks Zone of study sample size.
"On September 1, 1953, Scoville removed Molaison's medial temporal lobes on both hemispheres including the hippocampi and most of the amygdalae and entorhinal cortex, the major sensory input to the hippocampi." this quote, from Wikipedia documents the creation of perhaps the most famous & important single case study in the history of human neuroscience. patient HM had both temporal lobes removed & after this drastic brain surgery lost his explicit anterograde memory - that is, his memory for new events & experiences.
N=1 for a study of the effect of temporal lobectomy on human memory is a large & sufficient sample size. we don't need another HM in human neuroscience. likewise, for many areas of brain research, a single neuropsychological case study, a single lesion in a single monkey's brain, or a single psychophysical experiment on a rarely-occurring genetic variation in human photoreceptor pigments, is a large, necessary, & sufficient sample.
if 1 is a sufficient minimum sample size for some studies, is there also a maximum at the other end of the sampling spectrum? let's talk about BIG DATA.
in episode 27 i reviewed a claim, from a high-profile brain scanning study, that thousands of participants are needed in every brain scanning study if we are to discover brain-behaviour correlations. that is simply wrong, for reasons i discussed before - thousands is the number required to replicate that particular study, not every study, & especially not good, powerful, well-designed studies.
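to put rough numbers on that argument, here is a back-of-envelope sketch - my arithmetic, not the paper's - of how many participants are needed to detect a correlation of size r with 80% power at alpha = .05, using the standard Fisher z approximation:

```python
from math import ceil, log
from scipy.stats import norm

def n_for_r(r, power=0.80, alpha=0.05):
    # approximate N needed to detect a correlation r (Fisher z method)
    z_r = 0.5 * log((1 + r) / (1 - r))   # Fisher z transform of r
    z_a = norm.ppf(1 - alpha / 2)        # two-sided significance threshold
    z_b = norm.ppf(power)                # quantile for the desired power
    return ceil(((z_a + z_b) / z_r) ** 2 + 3)

for r in (0.05, 0.1, 0.3, 0.5):
    print(f"r = {r:.2f} needs N ≈ {n_for_r(r)}")
# r = 0.05 needs ~3100 people; r = 0.30 needs ~85; r = 0.50 needs ~30
```

"thousands of participants" only follows if you assume the effects are tiny; for the larger effects that a good, powerful, well-designed study targets, a few dozen participants can be plenty.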
in episode 25 i criticised a study with 1.2 million datapoints on the grounds that a lot of those datapoints were, well, junk. it doesn't matter how big the sample is if the data are junk. better to spend tax dollars on collecting good data.
1 is sometimes enough & 1.2 million is sometimes too many. just thinking about these extremes leads me to conclude that the criticism "your sample is small" is meaningless. in the absence of any other justification or evidence, it cannot have any meaningful interpretation.
so what should we do?
well, there is no simple answer, & listening to my statistical essays each month is not going to provide you with it. (but do please keep listening!)
for each study, you need the best-possible design, the best-possible sampling strategy, the best-possible data collection, & the best-possible data analysis. doing the best-possible systematic review & best-possible meta-analysis before or after the study is also not a bad idea, if you have the time - see episode 36.
how to define the best-possible sample? well, this will differ for every study, whether it's a final year undergraduate project, a master's project, a PhD project, or a massive multi-centre study funded by millions & worked on by dozens.
what about effect-size? well, if you already know the effect size for your study, you don't need to do the study, so save the resources & spend them on discovering important effects whose size you don't yet know.
what about the minimal effect-size of interest? well, OK, that makes sense to me - given all the other constraints that apply to this study - time, staff, money, participants, resources - what kind of result would be worth putting in the effort to collect those N pieces of evidence?
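as a concrete sketch of that reasoning - where the d = 0.4 threshold, the cost per participant & the two-group design are all illustrative assumptions of mine, not recommendations - a minimal effect-size-of-interest calculation might look like this:

```python
from statsmodels.stats.power import TTestIndPower

# smallest standardized group difference we would still care about
sesoi_d = 0.4                        # illustrative assumption, not a rule

# N per group for 80% power at alpha = .05, two-sided, two groups
n_per_group = TTestIndPower().solve_power(effect_size=sesoi_d,
                                          power=0.80, alpha=0.05)
print(f"N per group: {n_per_group:.0f}")   # ~100 per group

cost_per_participant = 50            # hypothetical cost per person tested
budget = 2 * n_per_group * cost_per_participant
print(f"rough budget: {budget:.0f}") # ~10,000 in whatever currency applies
```

if the time, staff or money for that N aren't available, the honest options are a larger minimal effect of interest, a more sensitive design, or not running the study at all.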
when students, politicians, journalists or reviewers comment that your sample is small, you just need to ask: how big should it be, why, & who's paying?