transcript of episode 37: YOUR SAMPLE IS SMALL, 15th March 2024

[🎶 INTRO: "Spring Swing" by Dee Yan-Kee 🎶]

welcome to the error bar: the 3rd most popular brain-based stats podcast

my name's nick holmes and this is the error bar: a podcast about brain science & statistics that is nothing like any other podcast about brain science & statistics

here is the brain news on the 15th March 2024:



they've done it! neuroscientists have, finally, finally, this time, & for the first time & for the six hundred & twenty third time, discovered some differences between men's & women's brains.

we can all put down our gonad scanners & brain measurers for science has - at long last - done it.

that, at least, is what you might reasonably conclude from seeing that *every news outlet in the solar system* last month covered the story of yet another brain scanning study finding some small differences between the sexes. using deep learning or some other AI thing.

the most hyperbolic praise was from the New Scientist, which stated that this single new study is "finally cutting through the historical discrimination & gender politics to get at the truth".

the truth. this time only!

regular listeners to the error bar will recall that in episode 7 i reported on what i called a "majestic review" of over 600 previous brain imaging studies conducted over 30 years. the author of that study - Professor Lise Eliot - was quoted by the New Scientist: "When you control for brain size, all of the claims about volume differences of individual structures between men and women either disappear or become extremely small."

yet still - still - we are told by those same newspapers that, despite there being 600 previous studies of the brain differences between men & women, it is the new one - & only the new one - that finally contains the truth about this vexed question.

students of neuroscience: ignore all 600 previous studies & just read this one.


so what is the new study about?

i don't know.

sadly, dear listener, despite the historic importance of everyone knowing about the differences between men's & women's brains, the report containing this vital & final truth is paywalled. we might be able to find out the truth about cerebral sex in 6 months when the wall comes down.

it is not a little ironic that this new truthy brain sex study was possible only because the researchers analysed thousands of scans from the freely-available UK BioBank dataset, paid for & heroically maintained by the UK taxpayer. yet we can only read about this crucial new analysis if we pay an obscure scientific society in the US several tens of dollars.

if you want the error bar to pay several tens of dollars to read this study, please send bitcoin to the usual address.

the best place to learn about this topic & the implications of the new study is the Guardian, where Professor Gina Rippon lays out the issues, free for all to read.


see you again next time


the science was by Ryali et al. in Proceedings of the National Academy of Sciences of the United States of America; reported in The New Scientist by @ClareWilsonMed on 19/Feb/24, & The Guardian by Gina Rippon on 22/Feb/24, & The Times by @whippletom on 20/Feb/24, & The Telegraph by Sarah Knapton on 19/Feb/24


it was inevitable that a podcast focussing on statistics would come around to the question of sample size. some time in 2023 i decided that i would no longer criticise any scientific study on the basis of its sample size. the sample size is how many independent pieces of evidence, typically from different people, are gathered in a single study to answer a research question.

in this second statistical essay, i explore why i think that sample size - on its own - is never a useful thing to think about when evaluating a single study or a single piece of evidence.

the sample size criticism is a popular one. students use it in essays & dissertations; politicians use it to discredit opinion polls; i've even heard a much-loved & respected satirical investigative journalist (Ian Hislop, for those in the UK), use it to discount scientific studies appearing in the news. non-scientists use sample size to discredit data & scientific research, students to address the 'critical evaluation' component of their assessments.

but qualified, highly-trained professional scientists also use the sample size criticism. indeed, they use it a big number of times.

in the story that precedes this a-log, i pointed out that at least 600 previous studies have asked whether male & female brains are different, & that these 600 studies were meticulously reviewed & analysed in a majestic paper by Eliot and colleagues in 2021. 600 studies is a big number of studies. for a systematic review & meta-analysis in this field of research, 600 is a very large sample.

when doing systematic reviews - or studying the life expectancy of the population of Sweden, as discussed in last month's episode - the question of how many studies to include, or of the study sample size needed, becomes irrelevant: the researcher's job is to collect all the relevant data & to explain which data might be missing & what impact it might have on the conclusions.

but in any individual research study there is a real problem: how much data should we gather, when should we stop collecting it, & how much is enough to answer our questions or test our hypotheses?

there are papers on this problem that offer advice. i remember seeing a few years ago - likely on twitter - a paper stating as a simple truth that "you need more participants to study a statistical interaction than you do a main effect". this claim depends on specific & unnecessary assumptions & it is not at all a simple fact - i will return to this in a later episode. all i want to say here is that you can, if you wish, look for advice about the sample size you need for your study. & you will find that advice. you will find a big amount of advice.

so instead of providing any more advice - i would probably argue there is none to provide - i will instead illustrate a few cases where sample size has come up in discussion & say what i think about it. the first case is very fresh in my mind.

in October 2023, i submitted the largest paper of my academic career. there were more participants, more study visits, more data-points & more experiments than any other paper i've worked on. the data collection spanned the longest period - punctuated by covid - of any project i've written up. in my mind at least, there was nothing 'small' about this study.

i received two very good & careful reviews for my paper, & the reviewers' input really improved the work & strengthened its arguments. but i remain haunted by one reviewer's comment, which was about my study's sample size. i paraphrase the reviewer here: "it is great that the authors have conducted multiple experiments, but the sample size of only 12 participants in each experiment is limited".

this comment really troubled me. & it continues to trouble me. i've been playing the science & publishing games for twenty years, & i have never put more effort, thought, code & data into any other paper. yet still, the reviewers comment on sample size. i don't want to use this platform as an advert for my fantastic paper, freely available now in the Journal of Neurophysiology, but i do want to use it to ask this simple question:

when is a sample big enough?

it strikes me that reviewers don't tend to say that a sample is "too large". but why not? if they are happy to say that a sample is "too small", then there must be a sample of a bigger number that is the "right size" & perhaps also a sample of a yet bigger number that is "too large".

so let's talk about this Goldilocks Zone of study sample size.

"On September 1, 1953, Scoville removed Molaison's medial temporal lobes on both hemispheres including the hippocampi and most of the amygdalae and entorhinal cortex, the major sensory input to the hippocampi." this quote, from Wikipedia documents the creation of perhaps the most famous & important single case study in the history of human neuroscience. patient HM had both temporal lobes removed & after this drastic brain surgery lost his explicit anterograde memory - that is, his memory for new events & experiences.

N=1 for a study of the effect of temporal lobectomy on human memory is a large & sufficient sample size. we don't need another HM in human neuroscience. likewise, for many areas of brain research, a single neuropsychological case study, a single lesion in a single monkey's brain, or a single psychophysical experiment on a rarely-occurring genetic variation in human photoreceptor pigments, is a large, necessary, & sufficient sample.

if 1 is a sufficient minimum sample size for some studies, is there also a maximum at the other end of the sampling spectrum? let's talk about BIG DATA.

in episode 27 i reviewed a claim, from a high-profile brain scanning study, that thousands of participants are needed in every brain scanning study if we are to discover brain-behaviour correlations. that is simply wrong, for reasons i discussed before - it's only the number required to replicate that study, not every study, & especially not good, powerful, well-designed studies.

in episode 25 i criticised a study with 1.2 million datapoints on the grounds that a lot of those datapoints were, well, junk. it doesn't matter how big the sample is if the data are junk. better to spend tax dollars on collecting good data.

1 is sometimes enough & 1.2 million is sometimes too many. just thinking about these extremes leads me to conclude that the criticism "your sample is small" is meaningless. in the absence of any other justification or evidence, it cannot have any meaningful interpretation.

so what should we do?

well, there is no simple answer, & listening to my statistical essays each month is not going to provide you with it. (but do please keep listening!)

for each study, you need the best-possible design, the best-possible sampling strategy, the best-possible data collection, & the best-possible data analysis. doing the best-possible systematic review & best-possible meta-analysis before or after the study is also not a bad idea, if you have the time - see episode 36.

how to define the best-possible sample? well, this will differ for every study, whether it's a final year undergraduate project, a master's project, a PhD project, or a massive multi-centre study funded by millions & worked on by dozens.

what about effect-size? well, if you know what the effect size for your study is, you don't need to do the study, so save the resources & spend them on discovering important effects whose size you don't yet know.

what about the minimal effect-size of interest? well, OK, that makes sense to me - given all the other constraints that apply to this study - time, staff, money, participants, resources - what kind of result would be worth putting in the effort to collect those N pieces of evidence?
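the minimal-effect-of-interest idea can be sketched in a few lines of code. this is an illustration, not advice: it assumes the simplest possible design - a two-sample t-test - & uses the standard normal approximation to the power calculation. the numbers (a smallest effect of interest of d = 0.5, alpha = 0.05, power = 0.8) are made up for the example.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """approximate participants per group for a two-sample t-test,
    using the normal approximation: n = 2 * ((z_crit + z_power) / d) ** 2,
    where d is the smallest standardised effect worth detecting."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = z.inv_cdf(power)         # quantile for the desired power
    return ceil(2 * ((z_crit + z_power) / d) ** 2)

# if the smallest effect worth knowing about is d = 0.5,
# roughly 63 participants per group are needed at 80% power
print(n_per_group(0.5))  # → 63
```

note how the answer falls out of the constraints you chose, not out of any universal rule: halve the smallest effect of interest & the required sample roughly quadruples. (an exact t-test calculation gives a slightly larger number, about 64 per group, because the t distribution has fatter tails than the normal.)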

when students, politicians, journalists or reviewers comment that your sample is small, you just need to ask: how big should it be, why, & who's paying?

[🎶 OUTRO: "Cosmopolitan - Margarita - Bellini" by Dee Yan-Kee 🎶]

it's closing time at the error bar, but do drop in next time for more brain news, fact-checking & neuro-opinions. take care.

the error bar was devised & produced by Dr Nick Holmes from University of Birmingham's School of Sport, Exercise and Rehabilitation Sciences. the music by Dee Yan-Kee is available from the free music archive. find us at the error bar dot com, on twitter at bar error, or email talk at the error bar dot com