Description
Course in English
This course offers an introduction to automatic text processing, from how to represent text numerically to the basic machine learning algorithms developed for these representations. It should be taken in parallel with SD-TSIA 210, which introduces general machine learning methods. This course does not address deep learning for natural language processing, since SD-TSIA 203 follows in the next period. Rather, it provides a detailed tour of pre-deep-learning methods for natural language processing and will help contextualize the development of deep learning, of which natural language processing is one of the main application domains. It is strongly recommended for students who wish to take courses on NLP/LLMs in their third year.
Learning objectives
- Being able to explain the difficulties specific to language data in text form, and the associated tasks.
- Understanding the different ways of representing text numerically, and the basic machine learning methods used for classical applications.
- Understanding why neural networks are so successful at representing and processing language data.
- Applying those methods to several simple tasks, using Python and specialized libraries (NLTK, Gensim, Scikit-learn).
24 hours in person (16 blocks or time slots)
Relevant degree(s)
Associated track
For students in the non-degree international exchange program
Machine learning (theoretical foundations) and the basics of neural networks
For students in the engineering degree (Diplôme d'ingénieur)
Students are expected to take SD-TSIA 210 Machine learning
Grading format
Numerical, out of 20
Letter/European grade
For students in the engineering degree (Diplôme d'ingénieur)
Requirements for earning credit:
Evaluation: lab report and final exam.
The course unit is passed if the final grade is >= 10
- ECTS credits earned: 2.5 ECTS
- Elective course unit credits earned: 2.5
The grade obtained counts toward your GPA.
For students in the non-degree international exchange program
The grade obtained counts toward your GPA.
Detailed program
The techniques and concepts that will be studied include:
- Text pre-processing and representation: tokenization, document representation and word embeddings; how they can be used for classical NLP tasks.
- An introduction to non-neural language models.
- Hidden Markov models (HMMs) and their application to NLP tasks.
- A first application of simple neural models to text representation.
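
To give a concrete flavour of the first two items of the program, here is a minimal sketch of tokenization, a bag-of-words document representation and a non-neural bigram language model with add-one smoothing. It is an illustration only, not course material: the function names, the toy corpus and the naive tokenization rule are all assumptions, and it relies on the Python standard library alone rather than on NLTK, Gensim or Scikit-learn.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(tokens, vocabulary):
    """Represent a document as a vector of token counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def train_bigram_model(documents):
    """Count contexts and bigrams, with <s> and </s> marking sentence boundaries."""
    contexts, bigrams = Counter(), Counter()
    for tokens in documents:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])  # every symbol that is followed by another
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams


def bigram_probability(prev, word, contexts, bigrams, vocab_size):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


# Hypothetical toy corpus, chosen only for illustration.
corpus = ["The cat sat on the mat.", "The dog sat on the log."]
documents = [tokenize(text) for text in corpus]

# Document representation: one count vector per document.
vocabulary = sorted({token for doc in documents for token in doc})
vectors = [bag_of_words(doc, vocabulary) for doc in documents]

# Language model: the possible next symbols are the vocabulary plus </s>.
contexts, bigrams = train_bigram_model(documents)
vocab_size = len(vocabulary) + 1
p_sat_given_cat = bigram_probability("cat", "sat", contexts, bigrams, vocab_size)
```

Here `vectors` holds one count vector per document, indexed by the alphabetically sorted vocabulary, and `p_sat_given_cat` is the smoothed estimate of seeing "sat" right after "cat". Smoothing gives unseen bigrams a small non-zero probability, which is one simple way to handle the data sparsity that makes language data difficult.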