
ATHENS course TPT33: Information Extraction (Télécom ParisTech)

Field: Computer Science.


In this course, students will learn the basics of semantic information extraction, i.e. the art and science of extracting facts from natural language documents. This includes algorithms for extraction from the Web, as well as the essentials of natural language processing and knowledge representation. We will also touch upon the Semantic Web. The goal is to understand the technology behind today's large knowledge bases such as Google's Knowledge Graph, NELL, DBpedia, and YAGO.

Learning objectives

The course will consist of lectures and practical exercises (labs). The lectures will be interactive, with small quizzes to check the understanding of the topics. The course will cover:
• Knowledge representation (RDF, RDFS, OWL)
• Named Entity Recognition (Regular Expressions, Tries)
• Named Entity Annotation (Rule-based and statistical)
• Design of extraction algorithms and evaluation
• Disambiguation (context-based, coherence-based)
• Instance Extraction (Hearst extraction, set expansion, iteration)
• Fact extraction from structured sources (Wrapper induction, extraction from Wikipedia)
• Fact extraction from text (DIPRE algorithm, POS annotation)
• Dependency Grammars
• Extraction by reasoning
• Basics of Predicate Logic
• Basics of Probability Theory
• Programming in Java: data structures, Input/Output, File handling
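
As an illustration of the "Hearst extraction" topic listed above, here is a minimal Java sketch. It is not taken from the course material. It assumes a single pattern ("<class> such as <A>, <B>, and <C>"), a one-word lowercase class, and capitalized one-word instances; the class name HearstSketch, the method extract, and the example sentence are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HearstSketch {
    // "<class> such as <A>, <B>, and <C>": one lowercase class word,
    // followed by a list of capitalized one-word instance names.
    // This single pattern is a simplifying assumption for the sketch.
    private static final Pattern SUCH_AS = Pattern.compile(
        "([a-z]+) such as ((?:[A-Z]\\w*)(?:(?:, (?:and |or )?| and | or )[A-Z]\\w*)*)");

    // Separators inside the matched list: commas and the words "and" / "or".
    private static final Pattern SEPARATOR =
        Pattern.compile(",\\s*(?:and\\s+|or\\s+)?|\\s+(?:and|or)\\s+");

    /** Returns facts of the form "type(instance, class)" found in the text. */
    public static List<String> extract(String text) {
        List<String> facts = new ArrayList<>();
        Matcher m = SUCH_AS.matcher(text);
        while (m.find()) {
            String cls = m.group(1);
            for (String instance : SEPARATOR.split(m.group(2))) {
                facts.add("type(" + instance + ", " + cls + ")");
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        // Hypothetical example sentence.
        String sentence = "He visited cities such as Paris, Berlin, and Rome.";
        for (String fact : extract(sentence)) {
            System.out.println(fact);
        }
    }
}
```

A realistic extractor would need noun-phrase detection for multi-word classes and instances, and would combine several patterns; the regular expression here only covers the simplest case.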

30 hours of in-person teaching (20 blocks or time slots)

Minimum / maximum enrollment:


Degree program(s) concerned

Grading format

Numerical, out of 20

Letter grade / reduced grade scale

For students in the non-degree international exchange program

The grade obtained counts toward your GPA.

For students in the Diplôme d'ingénieur (engineering degree) program

The course unit (UE) is passed if the final grade >= 10
  • ECTS credits earned: 3 ECTS
  • Elective course unit credits earned: 3

The grade obtained counts toward your GPA.

Teaching methods

Evaluation by labs.