GOOFRE version 2: voir et traiter 600 milliards de mots

Etienne Brunet; Laurent Vanni

doi:10.5007/1807-9288.2014v10n2p75

Authors

Etienne Brunet Université de Nice Sophia Antipolis
Laurent Vanni Université de Nice Sophia Antipolis

DOI:

https://doi.org/10.5007/1807-9288.2014v10n2p75

Abstract

The Google Books data have doubled in two years, passing the mark of 500 billion words. A new processing pass has gone back over the scanned images to offer a more faithful reading. For the first time, the recorded texts have also benefited from disambiguation and lemmatization. Finally, the Culturomics site has provided the tools needed to distribute them. It was therefore appropriate to carry out a new assessment and to build a new database, equipped with all the statistical apparatus that the exploitation of large corpora requires, whether online or locally.

Author Biographies

Etienne Brunet, Université de Nice Sophia Antipolis

Étienne Brunet is an emeritus professor at the University of Nice Sophia Antipolis and founder of the Bases, Corpus, Language Laboratory. He works in computational linguistics and textual statistics, fields in which he is a pioneer and a world reference. He is the designer of the academic software Hyperbase2 with Pierre Guiraud and Charles Muller. Brunet has written over a hundred articles and a dozen books. Notable among them is his theoretical and practical reference work on literary lexicometry, Le vocabulaire français de 1789 à nos jours (Genève-Paris, Slatkine-Champion, 1981, 3 volumes, 1824 p.).

Laurent Vanni, Université de Nice Sophia Antipolis

Laurent Vanni is an engineer at the University of Nice Sophia Antipolis. He is a member of the Bases, Corpus, Language Laboratory team.

Published

2014-12-16

Issue

Vol. 10 No. 2 (2014)

Section

Articles

License

Authors who have their works published in Texto Digital agree that:

Copyright remains with the authors, who grant the journal the right of first publication of their submitted manuscripts. All materials published by the journal are under an Attribution 4.0 International - Creative Commons License, which allows them to be shared provided that authorship and first publication are credited.
The Attribution 4.0 International - Creative Commons License allows copying and redistribution of the material in any medium or format, as well as its adaptation for any purpose, even commercially.
Authors may enter into separate, additional contracts for non-exclusive distribution of the version of their work published by the journal (e.g. to deposit it in an institutional repository or publish it as a book chapter), with explicit acknowledgment of authorship and of first publication in Texto Digital.
