Natural Language Processing with Machine Learning (2016)

ECTS credits: 4 ECTS

 

Course parameters:
Language: English
Level of course: PhD
Time of year: Two hours on Wednesday afternoon and two hours on Friday morning, in weeks 5, 7, 9, and 11 2016
No. of hours: 16 contact hours over four weeks; 32 hours of preparation; 48 hours of assignments. Total: 96 hours
Capacity limits: 25

 

Objectives of the course:
This course gives a tour of data-intensive natural language processing (NLP). We will study the machine processing of human languages from an artificial intelligence perspective, using statistical machine learning (ML). The course will introduce the field and demonstrate complete practical examples across several real-world tasks. Covering basic structures and theory, the course will leave participants knowing how to match a machine learning tool to a problem in the context of language processing. Topics covered include entity recognition, sentiment analysis, and processing for indexing and retrieval. The machine learning skills required to complete the course are included (i.e. types of learner, representations, and evaluation).

 

Learning outcomes and competences:
At the end of the course, the student should be able to:

  • Understand a natural language processing pipeline
  • Build a small search engine
  • Code, use and evaluate a statistical machine learning tool
  • Describe sequence labelling, instance labelling, multi-class labelling, unsupervised and supervised learning, and machine learning evaluation
  • Describe the biases present in various machine learning approaches and how these affect language processing
  • Characterise and select a machine learning paradigm given an NLP problem setting
  • Perform tokenisation, part-of-speech tagging, named entity extraction and sentiment analysis
  • Program in Python using NLTK
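
To give a flavour of the "evaluate" outcome above, here is a minimal sketch of precision, recall and F1 for a single target label. It is an illustration only and is not taken from the course materials: the function name, the "pos"/"neg" labels and the toy data are all assumptions, and it uses only the Python standard library.

```python
def precision_recall_f1(gold, predicted, positive="pos"):
    """Score predicted labels against gold labels for one target label.

    All names and labels here are hypothetical, chosen for illustration.
    """
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # Guard against division by zero when a count is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up sentiment labels, not real results.
gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "neg", "pos", "pos"]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Reporting precision and recall separately, rather than accuracy alone, shows whether a system errs by over-predicting or under-predicting the target label.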

 

Compulsory programme:
Four mini-assignments, one for each week of the course (20% each), and a final oral exam.

 

Course contents:
The course will be held in a seminar style. It has four parts, each with a related mini-assignment.

  1. The first part is an introduction to statistical NLP, with a tokenisation/segmentation and indexing assignment, leading to a small search engine.
  2. The second part introduces machine learning and its evaluation, as well as some simple language models, leading to a sentiment analysis system.
  3. The third part looks at part-of-speech tagging and feature extraction, complemented by sequence labelling and feature selection, resulting in a tagger and an analysis of which ML paradigms fit it best.
  4. The final part will examine named entity recognition, social media, and unsupervised methods, with a free choice of final assignment task (given lecturer approval).
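
As a rough illustration of where the first part leads, the sketch below tokenises a few documents, builds an inverted index, and answers boolean AND queries. It is a minimal example under stated assumptions, not the course's assignment solution: the function names and the toy corpus are hypothetical, and the regular-expression tokeniser is a standard-library stand-in for what a toolkit such as NLTK would provide.

```python
import re
from collections import defaultdict


def tokenise(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenise(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query token (boolean AND)."""
    tokens = tokenise(query)
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical toy corpus for illustration.
docs = {
    1: "Statistical NLP uses machine learning.",
    2: "Search engines index tokens.",
    3: "Machine learning for search and retrieval.",
}
index = build_index(docs)
print(sorted(search(index, "machine learning")))  # prints [1, 3]
```

A fuller system would add ranking (for example, term weighting) on top of this boolean retrieval step.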

 

Prerequisites:
Core: Programming experience; basic maths and probability skills.

Not strictly necessary but useful: Python programming; machine learning knowledge; information retrieval skills.

The course is at PhD level, so appropriate reading and research skills are assumed.

 

Name of lecturer:
Dr. L. Derczynski

 

Type of course/teaching methods:
There are four taught hours per segment of the course, followed by ~12 hours of assignment work. Seminars are given in workshop / tutorial style.

Students are expected to do some reading before and after class. Achieving top assignment grades will require some scientific analysis of the system's results.

 

Literature:
Manning & Schütze - Foundations of Statistical Natural Language Processing
Jurafsky & Martin - Speech and Language Processing

 

Course homepage:
derczynski.com/sheffield/teaching.html 

 

Course assessment:
Four mini-assignments, one for each week of the course (20% each), and a final oral exam.

 

Provider:
Department of Computer Science, Aarhus University

 

Time:
Two hours in the afternoon on Wednesdays and two hours in the morning on Fridays, on the following dates:

  • Wednesday, 3 February 2016, 14:00-16:00
  • Friday, 5 February 2016, 9:00-11:00
  • Wednesday, 17 February 2016, 14:00-16:00
  • Friday, 19 February 2016, 9:00-11:00
  • Wednesday, 2 March 2016, 14:00-16:00
  • Friday, 4 March 2016, 9:00-11:00
  • Wednesday, 16 March 2016, 14:00-16:00
  • Friday, 18 March 2016, 9:00-11:00

Place:
Building 5524, room 137 (InCuba), Aarhus University

 

Registration:
Deadline for registration is 16 November 2015
For registration: send an e-mail to ira@cs.au.dk
If you have any questions, please contact Ira Assent, e-mail: ira@cs.au.dk

 

Comments on content: 
Revised 20.06.2016