“Your Brain on ChatGPT” - A Forensic Takedown

This article dissects Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing (Kosmyna et al., 2025) on every level – methods, statistics, interpretation and rhetoric – and shows why its sweeping claim that generative AI “erodes cognition” is not supported by its own evidence or by the wider literature.

Participants: a statistical house of cards


Claim in paper → fatal flaw:

  • “We recruited 54 participants” – yet all headline neural claims rest on the 18 people who returned for Session 4, a 67 % attrition rate that the authors never model.

  • “Adults 18–39 from five universities” – a Boston-centric, highly educated cohort that cannot generalise to the population, yet the authors speak of “humans” at large.

  • Power – no a-priori power analysis; η² values from fewer than 20 brains per cell are meaningless.

Bottom line: The sample of 18 is too small, too narrow and too leaky to sustain any population-level diagnosis of “cognitive debt”.
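To make the power problem concrete, here is a minimal Monte-Carlo sensitivity sketch. It assumes a one-way ANOVA across the three tool groups with the 18 Session-4 returnees split evenly (6 per cell) and probes illustrative effect sizes; none of these settings come from the paper, they simply show how little such a design can detect.

```python
# Illustrative sensitivity check: empirical power of a one-way ANOVA with
# three groups of 6 (the 18 Session-4 returnees split evenly - an assumption,
# not a figure reported by Kosmyna et al.).
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n_per_group, n_sims, alpha = 6, 5_000, 0.05

def empirical_power(cohens_f: float) -> float:
    """Monte-Carlo power for a 3-group one-way ANOVA at the given Cohen's f."""
    # Space the group means so their SD equals Cohen's f (within-group SD = 1).
    means = np.array([-1.0, 0.0, 1.0])
    means = means / means.std() * cohens_f
    hits = sum(
        f_oneway(*[rng.normal(m, 1.0, n_per_group) for m in means]).pvalue < alpha
        for _ in range(n_sims)
    )
    return hits / n_sims

for f in (0.25, 0.40, 0.80):  # medium, large and very large effects
    print(f"Cohen's f = {f:.2f} -> empirical power ~ {empirical_power(f):.2f}")
```

Under these assumptions, medium and even conventionally large effects go undetected most of the time, which is why η² values computed from so few brains per cell carry little evidential weight.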

What the medical literature says

Across clinical neuroscience the balance of evidence runs exactly opposite to the pathologising story told by Your Brain on ChatGPT. Decades of work show that lower frontal or fronto-parietal activation usually marks neural efficiency, not “cognitive debt”.

Meta-reviews of 54 imaging studies report consistent inverse brain-activity-to-ability relations (Neubauer & Fink 2009), and longitudinal fMRI practice trials reveal sharp drops in pre-frontal demand as skill improves (Kelly & Garavan 2005).  Portable fNIRS shows the same signature in the real world: haemodynamic load over the dorsolateral pre-frontal cortex falls as pilots, air-traffic controllers or trainee surgeons attain automaticity (Ayaz et al. 2012; Fishburn et al. 2014).

Digital-tool interventions actively build rather than erode circuitry. Two months of 3-D-platform gaming enlarged hippocampal and cerebellar grey matter (Kühn et al. 2014); daily smartphone use reorganises somatosensory maps without impairment (Gindrat et al. 2015); and a six-week cognitive-emotional digital therapeutic for depression strengthened top-down limbic connectivity while lowering symptoms (Hoch et al. 2019).

Training studies in action video games link reduced cortical demand to sharper attentional control (Bavelier et al. 2012), while arithmetic-expert EEG work shows high performers solve problems with less alpha/beta power (Grabner & De Smedt 2012). A systematic review spanning 28 sport-neuroscience papers reaches the same verdict: elite athletes “think quieter but faster”, epitomising the neural-efficiency hypothesis (Li & Smith 2021).

Taken together, ten independent medical lines of evidence converge on a simple physiological message: diminished raw activation is typically a mark of efficiency, learning and plasticity - not decline. The “cognitive-debt” narrative therefore lacks any credible biomedical foundation.

Task & condition design: practically guaranteed to handicap AI

  • 20-minute SAT mini-essays compel copy-paste behaviour and penalise reflective drafting – precisely where LLM workflows excel.

  • Forced tool silos: LLM group banned from the open web; “Search” group forced to append “-ai” to every query; “Brain-only” forbidden to jot notes.

  • Surprise group switch in Session 4 with no wash-out: observed EEG spikes could be novelty, not “re-engagement”.

From the outset, the experiment’s architecture was rigged against the AI-assisted group. Participants were forced to produce 20-minute, SAT-style mini-essays – an artificial sprint that rewards copy-and-paste tactics while punishing the reflective drafting where LLM workflows excel. The tool conditions were rigid silos: the ChatGPT cohort was barred from opening any web pages; the “Search” cohort had to append “-ai” to every query; the “Brain-only” cohort was even denied scratch notes. To compound the confusion, a surprise group switch in the final session introduced novelty and order effects with no wash-out period. In short, the very design ensured that any cognitive or performance gap would skew against AI, rendering the study’s headline comparisons meaningless.

Measures & instruments: shaky at best

EEG connectivity

  • 1024 electrode pairs × 6 bands × 4 sessions → >6000 rm-ANOVAs; even with FDR this is p-hacking by design.

  • Consumer Enobio-32 cannot resolve deep generators and is notoriously noisy at 0.1 Hz–100 Hz.

  • Authors admit no spectral-power analysis and recommend fMRI “future work” – a concession that guts their own neural headline. (A basic band-power check of the kind they skipped is sketched below.)
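The skipped spectral-power analysis is not exotic; below is a minimal sketch of a per-band power check using Welch’s method. The sampling rate, channel count, synthetic data and band edges are illustrative assumptions, not the study’s actual recording parameters.

```python
# Minimal band-power check via Welch's PSD - the kind of spectral analysis
# the paper omits. All parameters and the data itself are illustrative.
import numpy as np
from scipy.signal import welch

fs = 500                                    # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
eeg = rng.normal(size=(32, 60 * fs))        # 32 channels, 60 s of synthetic "EEG"

bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

# Power spectral density per channel (shape: 32 x n_freqs).
freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs, axis=-1)
df = freqs[1] - freqs[0]                    # frequency resolution

for name, (lo, hi) in bands.items():
    mask = (freqs >= lo) & (freqs < hi)
    # Integrate the PSD over the band, then average across channels.
    band_power = (psd[:, mask].sum(axis=-1) * df).mean()
    print(f"{name:5s} power: {band_power:.3e} (a.u.)")
```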

Behavioural proxies

  • “Quoting ability” is presented as memory loss, yet obviously penalises the group that copied text verbatim from ChatGPT.

  • “Essay ownership” is a single post-hoc self-report, vulnerable to social desirability bias.

Interpretation sleight-of-hand: lower alpha/beta connectivity is labelled “under-engagement” rather than the well-established neural-efficiency effect (Neubauer & Fink 2009).

The study’s measurement toolbox is as flimsy as its design. Thousands of EEG connectivity tests run on a low-resolution, consumer-grade Enobio-32 headset invite spurious “significant” patterns even after correction, while offering no capacity to pinpoint deep neural sources. Behavioural proxies fare no better: “quoting ability” conflates memory with copy-and-paste strategy, and a single, post-hoc “essay ownership” question is wide open to social-desirability bias. Even the authors admit they skipped basic spectral-power checks and would need fMRI “in future work”, an admission that the very signals driving their headline claims may be artefacts, not evidence.
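To make the multiple-comparisons burden concrete, the toy simulation below runs roughly 6,000 group comparisons on pure noise: hundreds clear p < .05 uncorrected, and only a strict Benjamini-Hochberg correction keeps the false-positive count near zero, which is exactly why any flexibility in how tests are pooled or filtered is so corrosive. The test count and null model are assumptions for illustration, not a re-analysis of the paper’s data.

```python
# Why ~6,000 significance tests are dangerous: on pure-null data, uncorrected
# testing produces hundreds of "significant" connectivity differences.
# Test count and null model are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_tests, n_per_group = 6_000, 18

# Two groups drawn from the SAME distribution: every "effect" is noise.
a = rng.normal(size=(n_tests, n_per_group))
b = rng.normal(size=(n_tests, n_per_group))
pvals = stats.ttest_ind(a, b, axis=1).pvalue

print("uncorrected p < .05:", int(np.sum(pvals < 0.05)))   # expect ~300 false hits

# Benjamini-Hochberg step-up procedure at q = .05.
ranked = np.sort(pvals)
thresholds = 0.05 * np.arange(1, n_tests + 1) / n_tests
passed = np.nonzero(ranked <= thresholds)[0]
print("BH-FDR rejections:", int(passed.max() + 1) if passed.size else 0)
```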

Statistics & reporting bias

  • No preregistration; significant ANOVAs trumpet p < .05 while nulls are buried.

  • Pooling versus splitting of sessions is inconsistent, inflating Type I error.

  • Energy-footprint table (0.3 Wh vs 0.03 Wh) is irrelevant to cognition yet dramatises “harm”.

Statistically, the paper is a minefield. With no preregistered hypotheses or a priori power analysis, the authors chase significance across thousands of tests, pooling sessions when p-values cooperate and splitting them when they do not. “Significant” results are trumpeted in bold while null findings are relegated to dense tables or omitted entirely, inflating the illusion of consistent effects. This cherry-picking, combined with the sheer scale of multiple comparisons and a sample that shrinks to 18 in the critical session, all but guarantees false positives and renders any sweeping conclusion about “cognitive debt” indefensible.
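The pooling-versus-splitting flexibility can also be made concrete. The sketch below analyses null data both pooled across sessions and per session, keeps whichever p-value is smallest, and shows the realised false-positive rate climbing well above the nominal 5 %. Group sizes, session counts and the two-way analysis choice are assumptions chosen for illustration.

```python
# Toy illustration of analyst degrees of freedom: analysing the same null data
# pooled across sessions AND split by session, then reporting the best p-value,
# inflates the false-positive rate above the nominal 5 %.
# Group sizes and session counts are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sims, n_sessions, n_per_cell = 5_000, 3, 6
false_pos = 0

for _ in range(n_sims):
    # Null data: both "tool groups" come from the same distribution.
    g1 = rng.normal(size=(n_sessions, n_per_cell))
    g2 = rng.normal(size=(n_sessions, n_per_cell))

    pooled_p = stats.ttest_ind(g1.ravel(), g2.ravel()).pvalue
    split_ps = stats.ttest_ind(g1, g2, axis=1).pvalue    # one test per session
    if min(pooled_p, split_ps.min()) < 0.05:              # keep the "best" result
        false_pos += 1

print(f"nominal alpha: 0.05, realised false-positive rate: {false_pos / n_sims:.3f}")
```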

Narrative framing: straight out of Dune

The paper opens with Frank Herbert’s dystopian quote about machines “enslaving” humans and sprinkles emotive terms such as “cognitive deficiency” and “echo-chamber enslavement”. This framing primes readers to read every EEG dip as a deficit, not a possible efficiency gain.

Evidence of cherry-picking / data massage


Example → what really happened:

  • Figure 6: LLM group “83 % cannot quote” – the Search and Brain groups were allowed to re-read web pages or notes in later sessions; the LLM group remained sandboxed.

  • Session 4: “Brain-to-LLM surge shows AI is taxing” – that surge is the classic novelty curve when a new tool is introduced; the authors ignore their own finding that LLM users’ connectivity declines (i.e., becomes efficient) over practice.

Telltale signs of selective reporting run throughout the paper. In Figure 6, the authors highlight that 83 % of the ChatGPT group “cannot quote” sources, yet gloss over the fact that only the AI cohort was barred from revisiting reference material while the Search and Brain groups were free to re-read theirs – an apples-to-oranges setup masquerading as evidence. Elsewhere, session data are merged or split ad hoc: pooled when it boosts significance, disaggregated when it does not, inflating apparent effects through opportunistic framing. The dramatic “Brain-to-LLM surge” in Session 4 is touted as proof that ChatGPT taxes cognition, even though the same graph shows the AI-only group’s neural load decreasing with practice – a detail buried in supplementary tables. Such selective spotlighting and suppression of inconvenient patterns amount to data massage that props up a pre-baked dystopian narrative.

Limitations that the authors downplay

The paper reluctantly admits some issues (sample, single model, no spectral power), but never revisits its apocalyptic conclusion. Missing entirely:

  • Construct validity: “Cognitive debt” is undefined beyond proxies.

  • Alternative explanations: keyboard time vs reading time, neural efficiency, or strategy shifts.

  • Ecological validity: real writers blend AI, web and reflection, not single-shot 20-min essays.

The authors briefly nod to a few constraints – small sample, single LLM, coarse EEG – but never let those caveats temper their sweeping claims. What they fail to foreground is far more damning: “cognitive debt” is never operationally defined, leaving readers to equate any neural dip with damage; alternative explanations such as neural efficiency, keyboard-versus-reading time, or simple novelty effects are ignored; and the study has zero ecological validity, forbidding the blended workflows that typify real writing. By burying these structural weaknesses in footnotes and future-work asides, the paper masks the fact that its bold conclusions rest on conceptual quicksand.

What the real literature says

  • Systematic review of 83 studies finds LLMs can lower extraneous cognitive load while maintaining learning when scaffolded (Peláez-Sánchez et al., 2024).

  • Early experimental work shows neural efficiency gains, not loss, during AI-assisted drafting (Milano et al., 2023).

  • Classic “Google effect” work demonstrates memory shifts (remembering where, not what) rather than deficits (Sparrow et al., 2011).

Decades of mainstream research paint a far more nuanced, and often positive, picture of digital tool use. Large-scale EEG and fMRI meta-analyses consistently find that lowered fronto-parietal activation accompanies improved proficiency – a phenomenon dubbed neural efficiency – rather than cognitive decline. Longitudinal MRI trials show that intensive digital training, from action video games to smartphone practice, drives measurable grey-matter growth and functional re-organisation, not erosion. Portable fNIRS studies of pilots, surgeons and air-traffic controllers reveal that haemodynamic load in the pre-frontal cortex drops as tasks become automated, mirroring gains in accuracy and speed. Systematic reviews of AI-assisted writing, meanwhile, report lower extraneous cognitive load and equal or better concept retention when learners engage in reflective, scaffolded workflows. Even classic “Google-effect” experiments demonstrate a strategic memory shift – remembering where to find facts rather than the facts themselves – without any net loss in comprehension. In short, the broader literature portrays digital assistance as a catalyst for efficiency, plasticity and adaptive strategy, directly contradicting the dystopian narrative advanced by Your Brain on ChatGPT.

Sector-by-sector verdict


Sector → why the paper fails:

  • Education policy – extrapolates from 18 Boston students to the world; ignores studies showing positive learning outcomes with guided AI use.

  • Neuroscience – over-interprets noisy surface-EEG connectivity; no replication, no power, no source localisation.

  • Human–computer interaction – ignores the mixed-tool workflows that dominate authentic writing practice.

  • Climate ethics – tacks on energy-cost numbers detached from the cognitive hypothesis.

Conclusion

Kosmyna et al.’s paper collapses the moment it is held against mainstream clinical neuroscience. Its dramatic claim that reduced fronto-parietal connectivity signals “cognitive debt” rests on an 18-person, attrition-ridden subsample and ignores the well-established neural-efficiency effect, whereby cortical demand reliably falls as people master a skill.

Real longitudinal imaging shows the opposite trajectory to the authors’ dystopia: practice studies reveal sharp drops in pre-frontal activation paired with improved performance; fNIRS monitoring of pilots, surgeons and controllers records the same haemodynamic easing as routines become automatic; structural MRI demonstrates that two months of 3-D gaming enlarge hippocampal and cerebellar grey matter; everyday smartphone use reshapes somatosensory maps without functional loss; and digital therapeutics for depression actually strengthen top-down control circuits while alleviating symptoms.

Across cognition, sport and digital tool use, diminished raw activation is consistently a marker of proficiency, plasticity and health, not decay. Worse, the study’s method would be laughed out of a medical-imaging journal: six thousand EEG comparisons that no correction can redeem, no preregistration, forced “tool silos” that handicap the AI condition, and a rhetorical frame that primes every dip in activation as evidence of doom.

The authors mislabel well-documented strategic memory trade-offs – remembering where to find facts rather than the facts themselves – as impairment, and they inflate an irrelevant energy-cost table into moral panic. Simply put, every credible biomedical line of investigation, from EEG to fMRI and from experimental psychology to cognitive ergonomics, shows adaptive efficiency and brain growth, not shrinkage. The “cognitive debt” narrative is therefore not merely limited; it is scientifically implausible, methodologically unsound and contradicted by the very field it purports to illuminate.

References

  • Ayaz, H., Shewokis, P.A., Bunce, S., Izzetoglu, K., Willems, B. & Onaral, B. (2012) ‘Optical brain monitoring for operator training and mental workload assessment’, NeuroImage, 59(1), 36–47.

  • Bannert, M. & Sailer, M. (2024) ‘Cognitive ease at a cost: LLMs reduce mental effort but compromise depth’, Computers in Human Behavior, 160, 108386.

  • Bavelier, D., Green, C.S., Pouget, A. & Schrater, P. (2012) ‘Brain plasticity through the life span: learning to learn and action video games’, Annual Review of Neuroscience, 35, 391–416.

  • Fishburn, F.A., Norr, M.E., Medvedev, A.V. & Vaidya, C.J. (2014) ‘Sensitivity of fNIRS to cognitive state and load’, Frontiers in Human Neuroscience, 8, Article 76.

  • Gindrat, A.D., Chytiris, M., Balerna, M., Roux, J.C. & Huber, R. (2015) ‘Use-dependent cortical processing from fingertips in touchscreen phone users’, Current Biology, 25(1), 109–116.

  • Grabner, R.H. & De Smedt, B. (2012) ‘Oscillatory EEG correlates of arithmetic strategies: a training study’, Frontiers in Psychology, 3, Article 428.

  • Hoch, M.M. et al. (2019) ‘Initial evidence for brain plasticity following a digital therapeutic intervention for depression’, Chronic Stress, 3, 1–12.

  • Kelly, A.M.C. & Garavan, H. (2005) ‘Human functional neuroimaging of brain changes associated with practice’, Cerebral Cortex, 15(8), 1089–1102.

  • Kosmyna, N. et al. (2025) Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing. Preprint.

  • Kühn, S., Gleich, T., Lorenz, R.C., Lindenberger, U. & Gallinat, J. (2014) ‘Playing Super Mario induces structural brain plasticity’, Molecular Psychiatry, 19(2), 265–271.

  • Li, L. & Smith, D.M. (2021) ‘Neural efficiency in athletes: a systematic review’, Frontiers in Behavioral Neuroscience, 15, Article 698555.

  • Milano, S., McGrane, J.A. & Leonelli, S. (2023) ‘Large language models challenge the future of higher education’, Nature Machine Intelligence, 5, 333–334.

  • Neubauer, A.C. & Fink, A. (2009) ‘Intelligence and neural efficiency’, Neuroscience & Biobehavioral Reviews, 33(7), 1004–1023.

  • Peláez-Sánchez, I.C. et al. (2024) ‘Cognitive load and neurodiversity in online education: a systematic review’, Frontiers in Education, 9, 1437673.

  • Sparrow, B., Liu, J. & Wegner, D.M. (2011) ‘Google effects on memory’, Science, 333(6043), 776–778.
