The FineDesc Learner Corpus comprises the successful written productions of candidates who sat the CertAcles Exam Suite at CEFR levels B1, B2 or C1. The candidates took their exams at one of the University Language Centres that collaborated in this project, and they gave permission for their texts to be included in the learner corpus. The FineDesc Learner Corpus consists of 1,309,507 words in 6,375 documents written by 3,412 candidates (further information on the corpus breakdown is available in the section ‘learner corpus figures’).
These candidates are either L1 Spanish monolinguals or bilinguals, with Spanish being one of their languages together with another co-official language in Spain (Galician, Catalan, Basque, Valencian). Candidates reported English as their first, second or third foreign language, or as their second language. In addition to the candidate’s L1(s) and the status they give to English in their plurilingual repertoire, two further pieces of information were collected for each candidate: their gender (male, female, non-binary or unknown, the last for candidates who preferred not to disclose this information) and whether or not they attended a preparation course before taking this high-stakes language accreditation exam. Together, these constitute the candidate variables in the FineDesc Learner Corpus.
The CEFR level was assigned to each text in the FineDesc Learner Corpus on the basis of ratings by two independent CEFR/CV experts, who judged the candidate’s text to be at the level specified. Apart from the CEFR level, other variables regarding the texts in the learner corpus were compiled to allow for comprehensive analyses of learner language. These variables are the formality of the text (formal, informal, semiformal), the text type (article, blog, email, essay, letter, post (on a forum/website), proposal, report, review and short story), the main communicative function(s) in the text (applying, describing, describing & suggesting, expressing an opinion, expressing an opinion and suggesting, making a complaint, making an inquiry, making a request, making a suggestion, narrating, offering, persuading) and the main topic of the text.
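For readers who want to work with these text variables programmatically, the following is a minimal sketch of how one text's metadata could be represented and filtered in Python. It is an illustration only: the class name `TextRecord`, the field names, the helper `select` and the sample values are all assumptions made for this example and do not reflect the corpus's actual file format or labels. Only the sets of permitted values are taken from the description above.

```python
from dataclasses import dataclass

# Permitted values, copied from the variable lists in the corpus description.
CEFR_LEVELS = {"B1", "B2", "C1"}
FORMALITY = {"formal", "informal", "semiformal"}
TEXT_TYPES = {
    "article", "blog", "email", "essay", "letter",
    "post", "proposal", "report", "review", "short story",
}
FUNCTIONS = {
    "applying", "describing", "describing & suggesting",
    "expressing an opinion", "expressing an opinion and suggesting",
    "making a complaint", "making an inquiry", "making a request",
    "making a suggestion", "narrating", "offering", "persuading",
}


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical container for the text-level variables of one document."""
    cefr_level: str
    formality: str
    text_type: str
    function: str
    topic: str  # free text; the description gives no closed list of topics

    def __post_init__(self):
        # Reject values that are not in the documented lists.
        checks = [
            (self.cefr_level, CEFR_LEVELS, "CEFR level"),
            (self.formality, FORMALITY, "formality"),
            (self.text_type, TEXT_TYPES, "text type"),
            (self.function, FUNCTIONS, "communicative function"),
        ]
        for value, allowed, label in checks:
            if value not in allowed:
                raise ValueError(f"unknown {label}: {value!r}")


def select(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Invented sample records, for illustration only.
sample = [
    TextRecord("B1", "informal", "email", "making a request", "holidays"),
    TextRecord("B2", "formal", "letter", "making a complaint", "a faulty product"),
    TextRecord("C1", "formal", "report", "describing & suggesting", "campus facilities"),
]
formal_texts = select(sample, formality="formal")
```

Validating against closed sets at construction time is one possible design choice; it catches mistyped labels early, at the cost of having to update the sets if the corpus uses different spellings.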
Parts of the FineDesc Learner Corpus have been analysed by the members of the FineDesc Project to describe learner language in three main text types, namely correspondence, creative writing, and reports and essays, with regard to linguistic, sociolinguistic and pragmatic competences.
The FineDesc Learner Corpus can be used for research/teaching purposes provided you cite it as follows:
Díez-Bedmar, M. B. (2025). FineDesc Learner Corpus 2.0 (España, 2510243469857). SafeCreative. https://www.safecreative.org/validity
Acknowledgements
The compilation of this learner corpus was made possible by the collaboration of eight University Language Centres (Centro de Estudios Avanzados en Lenguas Modernas, Universidad de Jaén; Centro de Idiomas, Universidad Miguel Hernández de Elche; Centro de Idiomas de la Universidad de Valladolid; Centro de Lenguas, Universidad Politécnica de Madrid; Centro de Linguas, Fundación Universidade de Vigo; Centro Universitario de Lenguas Modernas, Universidad de Zaragoza; Escola d’Idiomes Moderns, Universitat de Barcelona; Servei de Llengües Modernes, Universitat de Girona) and by the BA and MA students who transcribed the handwritten exams into electronic format once they had been fully anonymized (Estefanía Troyano Gómez, Mónica Mora Ruiz, Ignacio Barrionuevo Vasco, Ángela Araque García, Víctor Torres García, Laia Burguera Miñana, Carmen Torres Castillo, María Rosa Criado Cañuelo, Adrián Podovia Podovia, Alice L. Marriott, María García Baños, Juan Espinosa Huertas, Noelia Espinar Espejo, Lucía González Garrido, María Guadalupe Ramiro Fernández, Alejandra Ruíz Rodrigo, Sandro Mederos Feo and Samuel Díaz-Roncero Tejero).
The FineDesc Learner Corpus was compiled as part of the FineDesc research project ‘Making the CEFR/CV more user-friendly: fine-tuning descriptors with Learner Corpus Research (LCR) results’ (Grant PID2020-117041GA-I00), funded by the Spanish Ministry of Science, Innovation and Universities (MICIU/AEI/10.13039/501100011033).