Description
In this course, students will learn the basics of semantic information extraction, i.e., the art and science of extracting facts from natural language documents. This includes algorithms for extraction from the Web, as well as the essentials of natural language processing and knowledge representation. We will also touch upon the Semantic Web. The goal is to understand the technology behind today's large knowledge bases such as Google's Knowledge Graph, NELL, DBpedia, and YAGO.
Learning objectives
The course consists of lectures and practical exercises (labs). The lectures are interactive, with short quizzes to check understanding of the topics. The course will cover:
Knowledge representation (RDF, RDFS, OWL)
Named Entity Recognition (Regular Expressions, Tries)
Named Entity Annotation (Rule-based and statistical)
Design of extraction algorithms and evaluation
Disambiguation (context-based, coherence-based)
Instance Extraction (Hearst extraction, set expansion, iteration)
Fact extraction from structured sources (Wrapper induction, extraction from Wikipedia)
Fact extraction from text (DIPRE algorithm, POS annotation)
Dependency Grammars
Extraction by reasoning
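
To give a flavor of the instance extraction topic listed above, here is a minimal sketch of a single Hearst-style pattern ("<class> such as <A>, <B>, and <C>"). It is an illustration under simplifying assumptions, not the course's own implementation: the function name `extract_instances`, the two regular expressions, and the example sentence are all hypothetical, the class is taken to be the single word before "such as", and the instance list is assumed to end at the next period or semicolon.

```python
import re

# Simplified Hearst pattern "<class> such as <A>, <B>, and <C>".
# Assumption: the class is the single word before "such as", and the
# list of instances runs up to the next period or semicolon.
SUCH_AS = re.compile(r"(\w+) such as ([^.;]+)")
# Separators inside the instance list: commas, "and", "or".
SEPARATOR = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_instances(text):
    """Return (instance, class) pairs found by the simplified pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        class_name, instance_list = match.group(1), match.group(2)
        for instance in SEPARATOR.split(instance_list):
            instance = instance.strip()
            if instance:
                pairs.append((instance, class_name))
    return pairs


if __name__ == "__main__":
    # Made-up example sentence, not taken from the course material.
    print(extract_instances("She visited cities such as Paris, Berlin, and Rome."))
```

A realistic extractor would use several such patterns and noun-phrase chunking rather than a single word for the class; this sketch only shows the basic idea of matching a lexical pattern and splitting the enumerated instances.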
Prerequisites:
Basics of Predicate Logic
Basics of Probability Theory
Programming in Java: data structures, Input/Output, File handling
30 contact hours (20 blocks or time slots)
Minimum / maximum enrollment:
/30
Degree program(s) concerned
Grading format
Numerical, out of 20
Letter/reduced grade
For students of the Diplôme d'ingénieur
The course unit is passed if the final grade >= 10
- ECTS credits earned: 3 ECTS
- Elective course unit credits earned: 3
The grade obtained counts toward your GPA.