Course unit MOB_0AT33_TP | Catalogue 2024-2025

Description

In this course, students will learn the basics of semantic information extraction, i.e. the art and science of extracting facts from natural language documents. This includes algorithms for extraction from the Web, as well as the essentials of natural language processing and knowledge representation. We will also touch upon the Semantic Web. The goal is to understand the technology behind today's large knowledge bases such as Google's Knowledge Graph, NELL, DBpedia, and YAGO.

Learning objectives

The course consists of lectures and practical exercises (labs). The lectures are interactive, with short quizzes to check students' understanding of each topic. The course covers:
Knowledge representation (RDF, RDFS, OWL)
Named Entity Recognition (Regular Expressions, Tries)
Named Entity Annotation (Rule-based and statistical)
Design of extraction algorithms and evaluation
Disambiguation (context-based, coherence-based)
Instance Extraction (Hearst extraction, set expansion, iteration)
Fact extraction from structured sources (Wrapper induction, extraction from Wikipedia)
Fact extraction from text (DIPRE algorithm, POS annotation)
Dependency Grammars
Extraction by reasoning
Prerequisites:
Basics of Predicate Logic
Basics of Probability Theory
Programming in Java: data structures, Input/Output, File handling

30 hours of in-person teaching (20 blocks or time slots)

Minimum / maximum enrollment:

/30

Degree program(s) concerned

Engineering degree (Diplôme d'ingénieur)

Grading format

Numerical grade out of 20

Letter grade / reduced grade scale

For students in the engineering degree program (Diplôme d'ingénieur)

The course unit is passed if the final grade >= 10

ECTS credits earned: 3 ECTS
Elective course unit credits earned: 3

The grade obtained counts toward your GPA.

Teaching methods

Evaluation is based on the labs.

ATHENS course - MOB_0AT33_TP: Information Extraction (Télécom Paris)

Field > Computer Science.