
Scientific & Technical Teaching - APM_4AI12_TP: Machine Learning for Text Mining

Domain: Mathematics

Description

Course in English
 
This course offers an introduction to automatic text processing, from how to represent text numerically to the basic machine learning algorithms developed for these representations. It should be taken in parallel with SD-TSIA 210, which introduces general machine learning methods. This course does not cover deep learning for natural language processing, which is taught in SD-TSIA 203 in the following period. Instead, it provides a detailed tour of pre-deep-learning methods for natural language processing, which helps put the development of deep learning in context, since NLP is one of its main application domains. It is strongly recommended for students who plan to take NLP/LLM courses in their third year.

Learning objectives

- Being able to explain the difficulties specific to textual language data and the associated tasks.
- Understanding the different ways of representing text numerically, and the basic machine learning methods used for classical applications.
- Understanding what makes neural networks successful at representing and processing language data.
- Applying these methods to several simple tasks using Python and specialized libraries (NLTK, Gensim, Scikit-learn).

For students in the non-degree international exchange program

Machine learning (theoretical foundations) and the basics of neural networks

For students in the Diplôme d'ingénieur (engineering degree) program

Students are expected to take SD-TSIA 210 Machine Learning.

Grading format

Numerical, out of 20

Letter grade/European grade

For students in the Diplôme d'ingénieur (engineering degree) program

Requirements for earning credit:

Evaluation: lab report and final exam.

The course unit (UE) is validated if the final grade is >= 10
  • ECTS credits earned: 2.5 ECTS
  • Elective course unit credits earned: 2.5

The grade obtained counts toward your GPA.

For students in the non-degree international exchange program

The grade obtained counts toward your GPA.

Detailed syllabus

The techniques and concepts that will be studied include:
- Text pre-processing and representation: tokenization, document representation, and word embeddings, and how they can be used for classical NLP tasks.
- An introduction to non-neural language models.
- Hidden Markov models (HMMs) and their application to NLP tasks.
- A first application of simple neural models to text representation.
