
Machine Learning - A.Y. 2020/2021/2022

(14/09/2021) Lessons will start on Wednesday 22nd, in Aula 2L, Castro Laurenziano. Students are warmly invited to attend this lesson in person, unless they reside abroad or outside the Lazio Region. In the coming days I will provide a Zoom link for remote students who are unable to attend in person. Please subscribe to the Google group (see below) to receive updates. Students are also required to have the green pass.

IMPORTANT: please read carefully all emails sent to the Google group, and info on this web page concerning exam rules.

Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from data in order to make data-driven predictions or decisions, rather than following "strictly static" program instructions. Machine learning algorithms can be applied to virtually any scientific and non-scientific field (health, security and cyber-security, management, finance, automation, robotics, marketing..).

Instructor   Role   Email   Office
Paola Velardi   Professor   velardi AT di DOT uniroma1 DOT it   Via Salaria 113, 3rd floor
Stefano Faralli   Lab Assistant   stefano DOT faralli AT uniroma1 DOT it   Unitelma Sapienza

Course schedule

FIRST semester AY 2021-22 (to be confirmed):

When   Where
Wednesday 16:00-18:00   Aula 2L, Castro Laurenziano
Thursday 13:00-16:00   Aula 1L, Castro Laurenziano

Important Notes

The course is taught in English. Attending classes is HIGHLY recommended (homework, challenge, laboratory).

Homework and self-assessment tests are distributed via the Google group; you MUST register.

Summary of Course Topics

The course introduces motivations, paradigms and applications of machine learning. This is to be considered an introductory course. An advanced course is offered during the second semester: Deep Learning and Applied Artificial Intelligence, which is mostly concerned with image processing applications. Other applications, along with in-depth explanation of specific algorithms, are introduced in many of our offered courses.

If, based on the syllabus provided hereafter, you feel that you already have a basic understanding of the field, I suggest that you attend the advanced course directly.

The course has 3 objectives:

1. EXPLANATION OF THE MACHINE LEARNING WORKFLOW (steps required for a successful ML project, from data engineering, to selection of algorithms and hyper-parameter tuning, to evaluation; a minimal code sketch follows this list)

2. IN-BREADTH COVERAGE OF MACHINE LEARNING ALGORITHMS (classifiers and regressors, off-the-shelf and deep methods, supervised and unsupervised)

3. LABS & USAGE OF POPULAR ML PLATFORMS
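
As an illustration of objective 1, here is a minimal sketch of the workflow in scikit-learn (the library used in the labs); the dataset, the SVM model and the parameter grid are arbitrary toy choices for illustration, not course material.

    # Illustrative sketch of the ML workflow (toy dataset, arbitrary model and grid).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    # 1) Data engineering: load data and hold out a test set
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    # 2) Algorithm selection: scaling + SVM combined in a single pipeline
    pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

    # 3) Hyper-parameter tuning via cross-validated grid search
    grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", 0.01]}, cv=5)
    grid.fit(X_tr, y_tr)

    # 4) Evaluation on the held-out test set
    print("best parameters:", grid.best_params_)
    print("test accuracy:", accuracy_score(y_te, grid.predict(X_te)))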

Topics:

Supervised learning: decision and regression trees, naïve Bayes, support vector machines, neural networks, convolutional neural networks, recurrent neural networks, ensemble methods.

Unsupervised learning: Clustering, autoencoders.

Semi-supervised learning: Reinforcement learning, Q-learning.

Building machine learning systems: feature engineering, model selection, hyperparameter tuning, error analysis.

Laboratory

In-class labs (bring your computer on lab days!) are dedicated to learning to design practical machine learning systems: feature engineering, model selection, error analysis. We will mostly use the scikit-learn library and TensorFlow.
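
To give a flavour of the code written during the labs, here is a small TensorFlow/Keras sketch on synthetic data; the data, network and training settings are illustrative assumptions, while the actual lab material is distributed via the Google group.

    # Illustrative TensorFlow/Keras sketch on synthetic data (not an actual lab exercise).
    import numpy as np
    import tensorflow as tf

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4)).astype("float32")   # 200 synthetic samples, 4 features
    y = (X.sum(axis=1) > 0).astype("int32")           # simple binary target

    # A tiny feed-forward network for binary classification
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, verbose=0)
    print("training accuracy:", model.evaluate(X, y, verbose=0)[1])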

After a couple of introductory labs, labs will be organized in challenges.

Lab material (slides, datasets for challenges) will be provided before lab days via the Google group. The lab assistant is Dr. Stefano Faralli.


Pre-requisites (IMPORTANT!)

Students must be familiar with the Python programming language.
Familiarity with Python arrays is a fundamental prerequisite!
As a quick reference to all the expected language prerequisites, please refer to the following "cheat sheet": link

If you wish to learn Python from scratch, or just to refine your programming skills, we suggest you read the following documentation: link

Please be aware that we will NOT spend time during the course covering background competences that Master's students in Computer Science are expected to already have!!!

For students less familiar with Python and ML programming environments, two lab lessons have been pre-recorded and made available, on a shared Drive folder, to the students who subscribed to the Google Group.

Textbooks

There are plenty of on-line books and resources on Machine Learning. We list here some of the most widely used textbooks:

Additional useful texts:

Resources:

Exam rules (please read carefully)

The teacher will NOT answer emails asking for details already provided in this page.

Structure of the exam

The final exam grade is based on 2 or 3 tasks, depending upon the maximum grade you would like to obtain.

The exam has 3 parts:

  1. Written exam on course material
  2. Scikit-learn/Keras project (or other tools/platforms; however, the labs will use scikit-learn & Keras). The choice of the project topic is free, see details and examples later in this page. The project is NOT mandatory; however, the maximum achievable grade without the project is 28.
  3. One coding challenge - The challenge will consist of a practical project on the topics taught during the laboratory (students will be required to follow the entire machine learning workflow).
The challenge and the project cannot be delivered independently: they must be delivered JOINTLY (not delivering the project together with the challenge is interpreted as NOT wishing to submit the project, since it is not mandatory).

The instructor will specify the submission deadlines BEFORE each exam session. If in a session you pass the written test but do not deliver the challenge and/or the project, you will not lose the test result: the final grade will be registered in a subsequent session, once all the exam duties are completed.

To summarize:

For those aiming at a grade from 18 to 28:

  1. Written exam on course material (60% of the final grade; max 16.8, i.e. 0.6 × 28)
  2. One coding challenge (40% of the final grade; max 11.2, i.e. 0.4 × 28). The challenge consists of a practical project on the topics taught during the laboratory (students will be required to follow the entire machine learning workflow). The solutions can be delivered online during a pre-defined time interval; detailed instructions and rules will be available later.
To obtain up to 3 extra points (from 29 to 31 = 30L) on top of the grade obtained with the test and the challenge, you must deliver a project of your choice. Details are provided below.

The written test

Each written test may include a set of (relatively simple) closed-answer questions and always includes 2-4 open-answer exercises (depending on complexity), on both practical and theoretical issues. Closed-answer questions are usually simple but act as a FILTER: students who do not answer at least 75% of the closed questions correctly will NOT pass the exam.

To prepare for the test, self-assessments will be sent almost every week throughout the course via the Google group. By accessing the group you can retrieve all previous emails, in case you subscribe late. Self-assessments are important, since they are very similar to possible exam questions. Note that these tests will NOT be corrected by the instructor, who will only occasionally send solutions. The Google group is a perfect way to share and discuss your solutions with your classmates.

Every year the teacher creates a course folder on Drive (shared with the Google Group) where you can download the self-assessments and upload your solutions (which will be checked occasionally, NOT systematically). Students can read each other's solutions and discuss them on the Google group.

The Coding Challenge


During the last part of this year's course the lab teacher will announce a "coding challenge". The challenge will consist of the development of a solution for a given ML problem.
The solution to be delivered (delivery deadlines are communicated before the beginning of each session) consists of working code implementing the entire ML pipeline, "from the data to the system evaluation".
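
Purely as an illustration of the expected shape of such a deliverable, a minimal skeleton is sketched below; the file name challenge_data.csv, the "label" column and the choice of a random forest are placeholders, since the real task, data and rules are announced with the challenge.

    # Hedged skeleton of an ML pipeline "from the data to the system evaluation".
    # "challenge_data.csv", the "label" column and the model choice are placeholders.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    df = pd.read_csv("challenge_data.csv")               # placeholder dataset
    X, y = df.drop(columns=["label"]), df["label"]       # assumed target column

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_tr, y_tr)                                # training

    print(classification_report(y_te, model.predict(X_te)))  # evaluation / error analysis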

Other instructions will be provided at the time of the challenge announcement.

The Project

The project is NOT mandatory: those wishing to achieve a higher grade should submit one. Teams of 2 are preferred.

A good project can increase your final grade up to 30L.

How a project is evaluated:

  • Relatively simple problem, feature engineering needed, medium-large dataset, use of algorithms on available platforms, use of scikit-learn or a more efficient implementation of an existing algorithm (e.g., some ad-hoc software developed), performance evaluation: 0.5-2 points (which means that you can obtain 29 or 30, depending on the test and challenge results)
  • Original problem, complex dataset with non-trivial feature engineering, thorough data analysis and feature/hyper-parameter fitting, non-straightforward use of algorithms or a new algorithm or an ad-hoc implementation, performance evaluation and insight into the results: up to 3 points (which means that you can obtain 30L, depending on the test and challenge results)
Three very good projects (2017-2019): Deep-Reinforcement-Learning-Proyect-Documentation-Alfonso-Oriola.pdf, A Framework for Genetic Algorithms, RainForestML2016Pantea.pdf, ProjectAlessi-Colace-Facial-expressions.zip

Two among the best 2020 projects: DiCataldo_Boldrini_GameOfLife.pdf TimeSeriesCOVIDAragonaPodo.pdf

Projects must be uploaded by students to the course folder on Drive (accessible to the members of the Google group). The project sub-folder includes a spreadsheet where students MUST specify whether or not they intend to deliver a project; if they do, they must add the date of release, the title and the co-authors (max 2!!).

Exam Rules

NOTE: Exam procedures may change according to the pandemic; this section will be updated accordingly. Currently (21-22 winter session) exams are in presence only.

  • IMPORTANT: during the test you cannot use ANY material. For in-presence tests you need to bring pen, paper and a calculator (a cellular phone is OK, but it must be visible on the desk). Online tests are based on the Zoom platform and require 2 (TWO) connections: one on your computer and one on your cellular phone (for environmental supervision by the instructors). The platform for the test is exam.net.

DEADLINES (21-22):

Projects AND challenge must be delivered strictly before:

11.59 PM, January 10, 2022 (to have the final grade registered in January)

11.59 PM, February 10, 2022 (to have the final grade registered in February)

Further deadlines will be announced for the exams of

- June

- July

- September

VERY IMPORTANT: the role of INFOSTUD and Google forms

  • There are 3 main exam sessions: SUMMER (June-July), PRE-FALL (September) and WINTER (January-February). INFOSTUD sessions have a start date and an end date, because I cannot register a grade until you have passed the test, delivered the project, and passed the challenge; so there is no single date I can establish. Note that on INFOSTUD you can only see the start date of an exam session: THIS IS NOT the date of the test! The "exact" dates of the exams are published before each session, FOR ALL THE COURSES, on the Computer Science web site.
This is what you need to do:
  1. Register for an exam session on INFOSTUD. You need to do this BEFORE the session starts (usually MAY, AUGUST and DECEMBER are the INFOSTUD registration periods); you can check the registration period on INFOSTUD. Remember that a session may include more than one written test date: if you plan to take your test in July, you still need to register in May! Being registered on INFOSTUD is necessary for me to register your final grade. However, if in a session you pass the test but do not deliver the challenge, I cannot register the final grade, so you need to register again in the subsequent exam session.
  2. You must also register for a test through the Google form I circulate before each test date. The Google form is used by the instructor (me) to understand how many students will take part in a written exam session and to organize it accordingly (e.g., sending Zoom invitations for online tests). It is an informal document I use for organizational purposes, and it does not exempt you from registering on INFOSTUD.

Google Group

*MANDATORY!! (Google groups vary every academic year)*

Please subscribe to the Machine Learning 2020-21 group on Google Groups

NOTE: for security reasons, only students with a Sapienza email can access - sorry about that. Students who receive their official email later will still be able to access the material, recorded lessons and mail once they receive their @studenti address.

Machine Learning 2021-22 group

Slides and course materials (download only those dated 2021; the others still need to be updated)

NOTE: SLIDES are not sufficient for an in-depth understanding of discussed topics. For each set of slides, please refer to the provided papers and additional references.

Timetable of topics (year, topic, slides in PPT/PDF, and suggested readings; pointers to readings are also in the slides):

2021 - Introduction to ML. Course syllabus and course organization.
Slides: 1.ML2021Introductionlight.pptx, 1.ML2021Introductionlight.pdf

2021 - Building ML systems (ML pipeline, types of ML systems)
Slides: 2.BuildingMachineLearningSystems.pptx, 2.BuildingMachineLearningSystems.pdf
Readings:
https://towardsdatascience.com/architecting-a-machine-learning-pipeline-a847f094d1c7 (ML pipeline)
https://medium.com/@aj.tambe.02/types-of-ml-systems-160601843758 (types of ML systems)
https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html (hyperparameter tuning)
See the open-source AutoML platform H2O and the "Google AutoML" project: https://cloud.google.com/automl-tables/

2021 - Basic algorithms: Decision and Regression Trees, Perceptron model
Slides: 3.dtrees.pptx, 3.dtrees.pdf, 3b.Perceptron_.pptx
Readings:
Decision Trees: http://www.cs.princeton.edu/courses/archive/spr07/cos424/papers/mitchell-dectrees.pdf
Regression Trees: http://www2.stat.duke.edu/~rcs46/lectures_2017/08-trees/08-tree-regression.pdf
Example of perceptron epochs: https://sefiks.com/2020/01/04/a-step-by-step-perceptron-example/

2021 - Neural Networks (backpropagation algorithm)
Slides: 5.neural.pptx, 5.neural.pdf
Readings: additional readings

2021 - Data pre-processing and feature engineering
Slides: 2b.FeatureEngineering.pptx, 2b.FeatureEngineering.pdf
Readings:
http://www.machinelearningtutorial.net/2017/06/17/feature-engineering-in-machine-learning/
Other useful links are found in the slides.

2021 - Performance Evaluation: error measures for classifiers and regressors, error estimates, confidence intervals, one/two-tail tests
Slides: 4.evaluation.pptx, 4.evaluation.pdf
Readings: chapter5-ml-EVALUATION.pdf

2021 - Deep Learning (Convolutional NN and denoising autoencoders)
Slides: 7.Deeplearning.pptx, 7.Deeplearning.pdf
Readings:
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
See also this nice video: https://www.youtube.com/watch?v=aircAruvnKk

2021 - Ensemble methods (bagging, boosting, decision/regression forests, gradient boosting)
Slides: 8.ensembles.pptx, 8.ensembles.pdf
Readings:
Random Forests: http://www.math.mcgill.ca/yyang/resources/doc/randomforest.pdf
https://onlinelibrary.wiley.com/doi/pdf/10.1002/widm.1249

2021 - Support Vector Machines
Slides: 9.svm.pptx, 9.svm.pdf
Readings: http://cs229.stanford.edu/notes/cs229-notes3.pdf

2021 - Probabilistic learning: Maximum Likelihood Learning, Maximum A Posteriori Estimate, Naive Bayes
Slides: 10.naivebayes.pdf
Readings: http://www.cs.columbia.edu/~mcollins/em.pdf

2020 - Unsupervised learning: Clustering
Slides: 11.clustering.pptx, 11.clustering.pdf
Readings: https://www.researchgate.net/publication/282523039_A_Comprehensive_Survey_of_Clustering_Algorithms

Unsupervised learning: Association Rules - not covered in 2020 (no slides)

2020 - Unsupervised Learning: Reinforcement Learning and Q-Learning
Slides: 12.reinforcement.pptx, 12.reinforcement.pdf
Readings:
https://github.com/junhyukoh/deep-reinforcement-learning-papers#all-papers
https://skymind.ai/wiki/deep-reinforcement-learning
https://christian-igel.github.io/paper/RLiaN.pdf

2020 - RNN and LSTM sequential deep learning methods
Slides: 13.RNN.pptx, 13.RNN.pdf
Readings: not in 2020

Syllabus (2020)

  • What is machine learning. Types of learning.
  • Workflow of ML systems.
  • Classifiers and regressors: Decision Trees and Regression Trees
  • Feature engineering
  • Evaluation: performance measures, confidence intervals and hypothesis testing
  • Neural Networks
  • Introduction to Deep learning (Convolutional networks, Denoising Autoencoders, RNN and LSTM)
  • Ensemble methods (Boosting, Bagging, AdaBoost, Random Forest, Gradient Boosting)
  • Support Vector Machines
  • Maximum Likelihood Learning (MLE, MAP) and Naive Bayes
  • Clustering
  • Reinforcement learning and Q-Learning, Deep Q
  • Tools: scikit-learn, TensorFlow, Keras