Great new publication on Data-Driven Learning in Language Education

With the publication ‘DATA-DRIVEN LEARNING IN AND OUT OF THE LANGUAGE CLASSROOM’ Pascual Pérez-Paredes and Alex Boulton, leading experts and zealous advocates of DDL-based approaches in language education, add another item to their impressive contribution to research on theory development and promotion of data-driven language learning practice in both academic and pre-tertiary contexts.


From Rules to Discovery: The Learner at the Centre

At the heart of DDL is a shift from traditional, rule-based instruction to a more inductive, learner-centred approach. Instead of simply memorising grammar rules or vocabulary lists, students engage directly with real-world language data—authentic texts that reflect how language is actually used. This process encourages learners to discover patterns, build hypotheses, and develop a deeper, more personalised understanding of language.

As part of the Cambridge Elements series and its access policy I could download the PDF version (still possible till June 6) and get a first impression.
Chapter titles (such as Querying Corpora in the Language Classroom, Hands-on DDL: What Teachers Need to Know, and Data-Driven Learning in the Wild: Fostering Learner Autonomy) and the contents of those that I skimmed confirm the actual realisation of the practical intention as indicated in the related line of the book’s abstract: […] a practical guide for language teachers and graduate students intending to explore or upgrade their use of corpora in the language classroom and beyond.

As I hoped and expected the relationship between the use of GenAI and tools like ChatGPT is also addressed. Paragraph 5.3.2 highlights that while GenAI offers valuable capabilities—such as easy querying in natural language and generating language examples—there are notable limitations, including issues with authenticity and Hallucinations, where the AI provides fabricated examples.


Based on their experiments putting the AI to the test by asking ChatGPT authentic, real corpus content, the authors point out the fundamental difference between these approaches. Traditional Data-Driven Learning (DDL)—which uses real language databases to show students authentic examples of how words and phrases are actually used—offers something AI currently cannot provide: verified, transparent language data. As they put it, we shouldn’t force AI to imitate DDL since “we already have the tools for DDL and the imitation is a poor copy.”

But rather than taking sides in what could easily become a polarized debate the authors make a compelling case by arguing convincingly that we shouldn’t view this as an either-or situation. They advocate for what they call “a synthesis of both approaches,” using each tool for what it does best. AI excels at accessibility and quick explanations, while traditional corpus tools provide the authenticity and transparency that serious language analysis requires.

Apart from the widely documented effectiveness of DDL-based approaches for language learning there are more reasons why adoption of language-pedagogical use of human-made corpora in pre-tertiary contexts at this moment in time should be promoted (IMO). By way of example I mention here the support DDL methodologies can offer language teachers to also contribute to cross-curriculum 21st skills development by teaching learners to evaluate AI-generated language critically, comparing it to authentic language data to identify discrepancies and biases (while developing student language awareness and digital literacy competences (by using discipline specific tools).

However, although DDL has great potential to transform language teaching, also the most recent research shows that actual use in general (primary/secondary) school education is rare and indicates that the integration of Data-Driven Learning (DDL) in initial language teacher education is still limited. Structural investments in teacher training (both initial and ongoing), practice-oriented tools, and institutional support are essential to bridge this gap, the more since (language) teachers are expected to be able to enhance students’ digital literacy and responsible technology use (see e.g. the component Facilitating Learners’ Digital Competence in the DigCompEdu framework and as proposed in Empowering Learners with AI, Area 4 part of PedAIComp (Zou et al.,2025)

At TELLConsult we gladly contribute to professionalising inservice teaching staff in this domain and added this most useful book to our Data-Driven Language Learning Resources Padlet to support the related courses of our ErasmusPlus offer.

Furthermore, FYI, in this context, to compensate for and address issues related to known limitations of current LMMs as described above, we included professional development on data-driven language learning methodologies in our latest EU project proposal ‘Data-driven Applications and Pedagogies for Language Education’ (DAPLE), anticipating -at the time of writing- further cross fertilization of AI and corpus-based pedagogies (Crosthwaite & Baisa, 2023) and increasing availability of (young) learner friendly tools such as CorpusMate and CorpusChat (Ma et al., 2021)

References

Crosthwaite, P., & Baisa, Vit. (2023) Generative AI and the end of corpus-assisted data-driven learning? Not so fast!, Applied Corpus Linguistics, Volume 3, Issue 3, 2023, 100066, ISSN 2666-7991,
https://doi.org/10.1016/j.acorp.2023.100066.

Ma, Q., Tang, J., & Lin, S. (2021). The development of corpus-based language pedagogy for TESOL teachers: a two-step training approach facilitated by online collaboration. Computer Assisted Language Learning, 35(9), 2731–2760. https://doi.org/10.1080/09588221.2021.1895225

Zou, D., Xie, H. and Kohnke, L. (2025), Navigating the Future: Establishing a Framework for Educators’ Pedagogic Artificial Intelligence Competence. Eur J Educ, 60: e70117. https://doi.org/10.1111/ejed.70117

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.