Machine Learning - A.Y. 2019/2020/2021

Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following "strictly static" program instructions. Machine learning algorithms can be applied to virtually any scientific and non-scientific field (health, security and cyber-security, management, finance, automation, robotics, marketing, etc.).

| Instructor | Telephone | Office hours | Office |
|---|---|---|---|
| Paola Velardi | 06-49918356 | send e-mail | Via Salaria 113 - 3rd floor, n. 3412 |
| Stefano Faralli (Lab Assistant) | | send e-mail | Unitelma Sapienza |

Course schedule

FIRST semester:

| When | Where |
|---|---|
| Monday 14:00-17:00 | Aula 7a, Castro Laurenziano |
| Thursday 14:00-16:00 | Aula 7a, Castro Laurenziano |

Lessons in 2020 will be held in person and via streaming, according to the COVID-19 rules decided by SAPIENZA for all courses. Recorded lessons will be available ONLY to students who register in the Google group. Registered students will receive a Meet or Zoom link.

Important Notes

The course is taught in English. Attending classes is HIGHLY recommended (homework, challenge, laboratory).

Homework and self-assessment tests are distributed via the Google group; you MUST register.

Summary of Course Topics

The course introduces motivations, paradigms and applications of machine learning. This is to be considered an introductory course. An advanced course is offered during the second semester: Deep Learning and Applied Artificial Intelligence, which is mostly concerned with image processing applications. Other applications, along with in-depth explanation of specific algorithms, are introduced in many of our offered courses.

If, based on the syllabus provided hereafter, you feel that you already have a basic understanding of the field, I suggest that you attend the advanced course directly.

The course has 3 objectives:

1. EXPLANATION OF THE MACHINE LEARNING WORKFLOW (steps required for a successful ML project, from data engineering, to selection of algorithms and hyper-parameter tuning, to evaluation)

2. IN-BREADTH COVERAGE OF MACHINE LEARNING ALGORITHMS (classifiers and regressors, off-the-shelf and deep methods, supervised and unsupervised)

3. LABS & USAGE OF POPULAR ML PLATFORMS

Topics:

Supervised learning: decision and regression trees, naïve Bayes, support vector machines, neural networks, introduction to deep learning, ensemble methods.

Unsupervised learning: Clustering, association rules (if time allows).

Reinforcement learning: Q-learning.

Building machine learning systems: feature engineering, model selection, hyperparameter tuning, error analysis.

Laboratory

In-class labs (bring your computer on Lab days!) are dedicated to learning to design practical machine learning systems: feature engineering, model selection, error analysis. We will mostly use the scikit-learn library and TensorFlow.
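The workflow steps named above (data preparation, model selection, hyperparameter tuning, evaluation, error analysis) can be sketched with scikit-learn roughly as follows. This is only an illustrative sketch, not lab material: the Iris dataset, the decision tree classifier, the `max_depth` grid and the function name `run_workflow` are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def run_workflow():
    """Hypothetical end-to-end example: data -> tuning -> evaluation."""
    # 1. Data: a small built-in dataset (assumption, used only for illustration).
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )

    # 2. Pre-processing and model chained in one pipeline, so the scaler
    #    is fitted on the training folds only.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ])

    # 3. Hyperparameter tuning with 5-fold cross-validation on the training set.
    param_grid = {"tree__max_depth": [2, 3, 5, None]}
    search = GridSearchCV(pipeline, param_grid, cv=5)
    search.fit(X_train, y_train)

    # 4. Evaluation on the held-out test set.
    y_pred = search.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    # 5. Error analysis: the confusion matrix shows which classes get confused.
    errors = confusion_matrix(y_test, y_pred)
    return search.best_params_, accuracy, errors


if __name__ == "__main__":
    best_params, accuracy, errors = run_workflow()
    print("best parameters:", best_params)
    print("test accuracy:", accuracy)
    print(errors)
```

Keeping the test set out of the grid search is the key design choice here: the reported accuracy then estimates performance on unseen data rather than on data used for tuning.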

After a couple of introductory labs, labs will be organized in challenges.

Lab material (slides, datasets for challenges) will be provided before lab days via the Google group. The lab assistant is Dr. Stefano Faralli.


Pre-requisites (IMPORTANT!)

Students must be familiar with the Python programming language.
Familiarity with Python arrays is a fundamental prerequisite!
As a quick reference to all the expected language prerequisites, please refer to the following "cheat sheet": https://perso.limsi.fr/pointal/_media/python:cours:mementopython3-english.pdf

If you wish to learn Python from scratch or just refine your programming skills, we suggest reading the following documentation: https://docs.python.org/3/

Please be aware that we will NOT spend time during the course covering background skills that master's students in Computer Science are expected to have already!

Textbooks

There are plenty of on-line books and resources on Machine Learning. We list here some of the most widely used textbooks:

Additional useful texts:

Resources:

A dataset search engine: https://toolbox.google.com/datasetsearch

Exam rules (please read carefully*)

* The teacher will NOT answer emails asking for details already provided on this page.

Structure of the exam

The final exam grade is based on 2 or 3 tasks, depending upon the maximum grade you would like to obtain.

For those willing to obtain 28-30L:

  1. Written exam on course material
  2. Scikit-learn/Keras project (or other tools/platforms, however labs will use scikit-learn & Keras). The choice of the project topic is free; see details and examples later on this page.
  3. One coding challenge - The challenge will consist of a practical project on the topics taught during the laboratory (students will be required to follow the entire machine learning workflow). The solutions can be delivered online during a pre-defined time interval; detailed instructions and rules will be available later.

For those willing to obtain from 18 to 27:

  1. Written exam on course material
  2. One coding challenge - The challenge will consist of a practical project on the topics taught during the laboratory (students will be required to follow the entire machine learning workflow). The solutions can be delivered online during a pre-defined time interval; detailed instructions and rules will be available later.

How to prepare for the written test

Self-assessment homework is distributed after each lesson to members of the Google group. The written exam will include closed questions and open questions similar to those in the self-assessments, so it is important that each week you take the time to work on them. It is also advisable that you share your solutions with other students on the Google group for discussion.

Of course, to answer self-assessment questions, you need to study the provided material and use any other material you may find useful.

The written test

Each written test will include a set of (relatively simple) closed-answer questions and 2-4 (depending on complexity) open-answer exercises, on both practical and theoretical issues. Closed-answer questions are usually simple but are a FILTER: students who do not correctly answer at least 75% of the closed questions will NOT pass the exam.

The test has a maximum weight of 19/30.

The Coding Challenge


During the last part of this year's course the teacher of the lab will announce a "coding challenge". The challenge will consist of the development of a solution for a given ML problem.
The solution to be delivered (before the end of the academic year, exam session of September 2021) consists of the necessary working code implementing the entire ML pipeline "from the data to the system evaluation".

Other instructions will be provided at the time of the challenge announcement.

The challenge has a maximum weight of 8/30.

The Project

The project is NOT an obligation. Those willing to achieve a higher grade should submit a project. Teams of 2 are preferred.

A very good project can increase your final grade up to 30L.

How a project is evaluated:

  • Relatively simple problem, feature engineering needed, medium-large dataset, use of algorithms on available platforms, use of scikit-learn or a more efficient implementation of an existing algorithm (e.g. some ad-hoc software developed), performance evaluation: up to 2 points (which means that you can obtain 29, depending on the test and challenge results)
  • Original problem, complex dataset with non-trivial feature engineering, thorough data analysis and feature/hyper-parameter fitting, not straightforward use of algorithms or new algorithm or ad-hoc implementation, performance evaluation and insight on results: up to 4 points (which means that you can obtain 30L, depending on the test and challenge results)
Three very good projects (2017-2019): Deep-Reinforcement-Learning-Proyect-Documentation-Alfonso-Oriola.pdf, A Framework for Genetic Algorithms, RainForestML2016Pantea.pdf, ProjectAlessi-Colace-Facial-expressions.zip

Two among the best 2020 projects: DiCataldo_Boldrini_GameOfLife.pdf, TimeSeriesCOVIDAragonaPodo.pdf

Other information

During the COVID emergency, written tests will be held online. Instructions are sent to the Google Group members.

  • IMPORTANT: To assess the number of participants in each written exam, a Google form will be sent via the Google group about two weeks BEFORE the exam date. Please check your @studenti mail on a regular basis. Please note that registering for a test date via the Google form does not exempt you from registering on INFOSTUD. I cannot register your final grade in a given exam session IF YOU HAVE NOT REGISTERED on INFOSTUD for that session. Furthermore, to register a grade I need both the result of the written test AND the project (and they must both be >=18). However, you do not need to deliver both simultaneously. You can, e.g., pass the test in January and deliver the project in June. I will then register the grade in June.
  • IMPORTANT: during the test you can't use ANY material. You need to bring pen, paper, and a calculator (a cell phone is OK, but it must be visible on the desk).

Registering on INFOSTUD and the Google form

  • IMPORTANT: INFOSTUD sessions have a start date and an end date. This is because I can't register a grade until you pass the test, deliver the project, and pass the challenge, so there is no single date I can establish. Note that on INFOSTUD you can see only the start date of an exam session. THIS IS NOT the date of the test! Usually, there are two test dates within any exam session. You can register for a test through the Google form I circulate before each test date. The Google form is used by the instructor (me) to understand how many students will participate in a written exam session, and to organize the session accordingly (e.g., sending Zoom invitations for online tests). INFOSTUD, instead, is the official Sapienza exam registration site. Please remember to register on INFOSTUD as well IF you believe that during the session (winter or summer) you will be able to obtain a final grade, which means that you deliver the project and the challenge, and that you pass the test. If you pass the test in one session, you can deliver the challenge and/or the project in the subsequent session; I will save your result. The contrary (delivering the project before passing the test) is NOT advisable. Unfortunately, some students never manage to pass the test.

Google Group

*MANDATORY!!*

Please subscribe to the Machine Learning 2020-21 group on Google Groups.

Slides and course materials (download only those with date=2020, the others need to be updated)

NOTE: SLIDES are not sufficient for an in-depth understanding of discussed topics. For each set of slides, please refer to the provided papers and additional references.

| Timetable | Topic | Slides (PPT / PDF) | Suggested readings (pointers are also in slides) |
|---|---|---|---|
| 2020 | Introduction to ML. Course syllabus and course organization. | 1.ML2020Introductionlight.pptx; 1.ML2020Introductionlight.pdf | |
| 2020 | Building ML systems (ML pipeline, types of ML systems) | 2.BuildingMachineLearningSystems.pptx; 2.BuildingMachineLearningSystems.pdf | https://towardsdatascience.com/architecting-a-machine-learning-pipeline-a847f094d1c7 (ML pipeline); https://medium.com/@aj.tambe.02/types-of-ml-systems-160601843758 (types of ML systems); https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html (hyperparameter tuning); see also the "Google AutoML" project for hyperparameter tuning with structured data: https://cloud.google.com/automl-tables/ |
| 2020 | Basic algorithms: Decision and Regression Trees, Perceptron model | 3.dtrees.pptx; 3b.Perceptron_.pptx; 3.dtrees.pdf | Decision Trees: http://www.cs.princeton.edu/courses/archive/spr07/cos424/papers/mitchell-dectrees.pdf; Regression Trees: http://www2.stat.duke.edu/~rcs46/lectures_2017/08-trees/08-tree-regression.pdf; Example of perceptron epochs: https://sefiks.com/2020/01/04/a-step-by-step-perceptron-example/ |
| 2020 | Data pre-processing and feature engineering | 2b.FeatureEngineering.pptx; 2b.FeatureEngineering.pdf | http://www.machinelearningtutorial.net/2017/06/17/feature-engineering-in-machine-learning/; other useful links are found in slides |
| 2020 | Performance Evaluation: error estimates, confidence intervals, one/two-tail test | 4.evaluation.pptx | chapter5-ml-EVALUATION.pdf |
| 2019 | Neural Networks | 5.neural.pptx; IntuitionNN.pptx | https://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf |
| 2019 | Deep Learning (Convolutional NN and denoising autoencoders) | 6.Deeplearningsplit1.pptx; 6.Deeplearningsplit2.pptx; 6.DeepLeaningsplit3.pptx | https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/; see also this nice video: https://www.youtube.com/watch?v=aircAruvnKk |
| 2019 | Ensemble methods (bagging, boosting, Decision Forest) | 8.ensembles.pptx | Random Forests: http://www.math.mcgill.ca/yyang/resources/doc/randomforest.pdf; https://onlinelibrary.wiley.com/doi/pdf/10.1002/widm.1249 |
| 2019 | Support Vector Machines | 9.svm.pptx | http://cs229.stanford.edu/notes/cs229-notes3.pdf |
| 2019 | Probabilistic learning: Maximum Likelihood Learning, Maximum A Posteriori Estimate, Naive Bayes | 10.naivebayes.pdf | http://www.cs.columbia.edu/~mcollins/em.pdf |
| 2019 | Unsupervised learning: Clustering | 11.clustering.pptx | https://www.researchgate.net/publication/282523039_A_Comprehensive_Survey_of_Clustering_Algorithms |
| | Unsupervised learning: Association Rules | | Not in 2020 |
| 2019 | Reinforcement Learning and Q-Learning | 12.reinforcement.pptx; 12.reinforcement.pdf | https://github.com/junhyukoh/deep-reinforcement-learning-papers#all-papers; https://skymind.ai/wiki/deep-reinforcement-learning; https://christian-igel.github.io/paper/RLiaN.pdf |
| | Unsupervised Learning: Genetic Algorithms | | Not in 2020 |

Syllabus (2020)

  • What is machine learning. Types of learning.
  • Workflow of ML systems.
  • Classifiers and regressors: Decision Trees and Regression Trees
  • Feature engineering
  • Evaluation: performance measures, confidence intervals and hypothesis testing
  • Neural Networks
  • Introduction to Deep Learning (Convolutional networks, Denoising Autoencoders)
  • Ensemble methods
  • Support Vector Machines
  • Maximum Likelihood Learning (MLE, MAP) and Naive Bayes
  • Clustering
  • Reinforcement learning and Q-Learning, Deep Q
  • Tools: Scikit-learn, TensorFlow, Keras
Topic revision: r294 - 2020-09-17 - PaolaVelardi






 