---+!! Machine Learning - A.Y. 2023/2024

Lessons will start in September; see https://corsidilaurea.uniroma1.it/en/corso/2023/29932/home for an updated schedule and classroom locations.

---+!! IMPORTANT: Please read carefully all emails sent to the Google group, and the information on this web page concerning exam rules. The professor will not answer emails with questions whose answer is on this web page.

Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from data in order to make data-driven predictions or decisions, rather than following "strictly static" program instructions. Machine learning algorithms can be applied to virtually any scientific and non-scientific field (health, security and cyber-security, management, finance, automation, robotics, marketing...).

| *Instructor* | *Role* | *Email* | *Office* |
| Paola Velardi | Professor | velardi AT di DOT uniroma1 DOT it | Via Salaria 113 - 3rd floor |
| Bardh Prenkaj | Lab Assistant | prenkaj AT di DOT uniroma1 DOT it | Via Salaria 113 - 3rd floor |

---++ Course schedule, FIRST semester, AY 2023-24

| *When* | | *Where* |
| Wednesday | 16:00-19:00 | Aula 2, Via del Castro Laurenziano (Aule Ingegneria) |
| Thursday | 11:00-13:00 | Aula Alfa, Via Salaria 113, ground floor |

---++ Important Notes

The course is taught in English. Attending classes is HIGHLY recommended (homework, challenge, laboratory).

*Homework and self-assessment tests are distributed via the Google group; you MUST register.*

---++ Summary of Course Topics

The course introduces motivations, paradigms and applications of machine learning. This is to be considered an *introductory* course. An advanced course is offered during the second semester: _Deep Learning and Applied Artificial Intelligence_, which is mostly concerned with image processing applications.
Other applications, along with in-depth explanations of specific algorithms, are introduced in many of our other courses. If, based on the syllabus provided hereafter, you feel that you already have a basic understanding of the field, *I suggest that you attend the advanced course directly*. Since students enrolled in our master's course in Computer Science come from different degrees, it is impossible to avoid some overlap.

The course has 3 objectives:
   1. EXPLANATION OF THE *MACHINE LEARNING WORKFLOW*: data selection, integration and transformation, data pre-processing, selection of appropriate ML algorithms, hyper-parameter tuning and model fitting, evaluation, and reporting (visual analytics)
   1. *IN-BREADTH* COVERAGE OF THE MAIN CATEGORIES of MACHINE LEARNING and DEEP LEARNING ALGORITHMS (classifiers and regressors, off-the-shelf and deep methods, supervised and unsupervised)
   1. *HANDS-ON EXPERIENCE* with the ML workflow on real data (during LABS)

*Topics:*

*Supervised learning*: decision and regression trees, neural networks, ensemble methods; deep methods: convolutional neural networks, recurrent neural networks.

*Unsupervised learning*: clustering, autoencoders.

*Building machine learning systems*: data selection, data types (structured, unstructured, symbolic, numeric, sequential), feature engineering, data pre-processing, model selection, hyperparameter tuning, and error analysis.

*Visual analytics*: data visualization and reporting.

*Explainability* of AI systems.

---++ Laboratory

In-class labs (bring your computer on lab days!) are dedicated to *learning to design practical machine learning systems*: feature engineering, model selection, error analysis. Experiments/labs will be conducted using [[https://colab.google/][Google Colab]].

Lab material (slides, datasets for challenges) will be provided _before lab days_ *via the Google group*. The lab assistant is Dr. [[mailto:prenkaj@di.uniroma1.it][Bardh Prenkaj]].
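To give a concrete (and unofficial) flavor of what the labs practice, here is a minimal sketch of the workflow (data selection, pre-processing, hyper-parameter tuning, model fitting, evaluation) using scikit-learn, which is available by default in Google Colab. The dataset and the parameter grid are illustrative choices, not lab material:

```python
# Hypothetical minimal example (not official lab material): the ML workflow
# on scikit-learn's built-in Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Data selection: a small, clean benchmark dataset.
X, y = load_iris(return_X_y=True)

# 2. Hold out a test set for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# 3. Pre-processing and model combined in a single pipeline.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", DecisionTreeClassifier(random_state=42)),
])

# 4. Hyper-parameter tuning with cross-validation on the training set only.
search = GridSearchCV(pipe, {"clf__max_depth": [2, 3, 5]}, cv=5)
search.fit(X_train, y_train)

# 5. Evaluation on the held-out test set.
acc = accuracy_score(y_test, search.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

The point of the sketch is the structure: each numbered step corresponds to a stage of the workflow listed in the course objectives. The lab datasets will be larger and messier, which is where pre-processing and error analysis become the interesting part.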
Labs will begin towards the end of the course (tentatively, from mid-November).

*The application domain this year will be MACHINE LEARNING in e-HEALTH and Finance.*

---++ Pre-requisites (IMPORTANT!)

Students must be familiar with the Python programming language. Familiarity with Python arrays is a *fundamental* prerequisite! As a quick reference to the expected language prerequisites, please refer to the following "cheat sheet": [[https://docs.python.org/3/][link]]

If you wish to learn Python from scratch, or just to refine your programming skills, we suggest that you read the following documentation: [[https://docs.python.org/3/][link]]

Please be aware that we will *NOT spend time during the course covering background competencies* that master's students in Computer Science are expected to have already!

For students less familiar with _Python_ and ML programming environments, two "basic programming" lab lessons have been *pre-recorded* and made available in a shared folder to students who subscribe to the Google Group.

---++ Textbooks

Please see the "readings" column in the Course Material table below. In addition, there are plenty of on-line books and resources on Machine Learning.
We list here some of the most widely used textbooks:

   * [[https://mitpress.mit.edu/books/fundamentals-machine-learning-predictive-data-analytics][Fundamentals of Machine Learning for Predictive Data Analytics]], MIT Press

Additional useful texts:

   * Stuart Russell, Peter Norvig, [[http://aima.cs.berkeley.edu/][Artificial Intelligence: A Modern Approach]], Prentice Hall, 2009 (and subsequent editions)
   * Hal Daumé III, [[http://ciml.info/][A Course in Machine Learning]] (freely available book)

Resources:

   * UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets.html
   * KAGGLE datasets: https://www.kaggle.com/datasets
   * ...and many other freely available dataset resources

---++ Exam rules (please *read carefully*)

*The teacher will NOT answer emails asking for details already provided on this page.*

*Structure of the exam*

The final exam grade is based on *2 tasks:*
   1. A written *exam* on course material.
   1. A coding *challenge*: a practical project regarding the topics taught during the laboratory (students will be required to follow the entire machine learning workflow on real and complex datasets). *The challenge cannot be delivered freely: the instructor will specify the submission deadlines BEFORE each exam session.*

If in a session you pass the written test but do not deliver the challenge, *you will not lose the test result, provided you complete everything by September 2023 (after that date, the new program and new exam rules will be applied)*. The final grade will be registered in a subsequent session, when all the exam duties are completed.
*The final grade is computed in the following way:*
   1. Written *exam* on course material *(60% of the final grade)*.
   1. One coding *challenge (40% of the final grade)*: a practical project regarding the topics taught during the laboratory (students will be required to follow the entire machine learning workflow). The solutions can be delivered on-line during a pre-defined time interval; detailed instructions and rules will be available later.

---+ The written test

Examples of written tests with solutions can be found in the shared drive folder. To prepare for the test, *self-assessments* will be sent almost every week throughout the course via the Google group. By accessing the group you can retrieve all previous emails, in case you subscribe later. Self-assessments are important, since they are very similar to possible exam questions. Note that these tests *will NOT be corrected by the instructor*, who will only occasionally send solutions. The Google group is a perfect way to share and discuss your solutions with your classmates. Every year the teacher creates a course folder on Drive (shared with the Google Group) where you can download self-assessments and upload your solutions (which will be checked occasionally, NOT systematically). Students can read each other's solutions and discuss them on the Google group.

---+ The Coding Challenge

During the last part of this year's course, the lab teacher will announce a "coding challenge". The challenge will consist of the development of a solution for a given ML problem. The solution to be delivered (delivery deadlines are communicated before the beginning of each session) consists of the necessary working code implementing the entire ML pipeline, "from data pre-processing to system evaluation".

---+++ For the A.Y. 23-24, details on the challenge will be provided in early December 2023.
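As a purely illustrative note, the 60%/40% weighting described in the exam rules above can be written as a one-line computation. The rounding of the result to an integer grade is a hypothetical choice for the example, not an official rule:

```python
def final_grade(written: float, challenge: float) -> int:
    """Combine the two exam components with the 60%/40% weighting stated
    in the exam rules. Rounding to the nearest integer is a hypothetical
    choice for this illustration, not an official rule."""
    return round(0.6 * written + 0.4 * challenge)

print(final_grade(28, 30))  # prints 29
```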
The topic of the 2023 challenge will be communicated during the labs. Challenge details and lab programs are distributed to the members of the Google group. Attendance is highly recommended.

---+ Exam Rules

*IMPORTANT*: During the test, you *can't use ANY material*. You must bring your computer, since we use [[https://exam.net/][exam.net]] as the platform for accessing the test, asking questions to the teacher, and uploading your solutions. Please make sure to be [[https://exam.net/how-it-works][acquainted]] with the usage of this platform.

---+ *DEADLINES FOR CHALLENGE DELIVERY (2024):*

The deadlines refer to the exam session periods.

   * January: January 6th
   * February: February 1st
   * June: June 1st
   * July: July 1st
   * September: September 1st

---++ IMPORTANT: read here to understand the role of INFOSTUD and Google forms

*There are 3 main exam sessions: SUMMER (June-July), PRE-FALL (September) and WINTER (January-February).* ML INFOSTUD sessions have a start date and an end date. This is because *I can't register a grade until you pass both the test and the challenge*, so there is no single date I can establish. Note that on INFOSTUD you can see *only the start date* of an exam session. *THIS IS NOT the date of the test!*

*The "exact" dates of the exams are published before each session, FOR ALL THE COURSES, on the [[https://corsidilaurea.uniroma1.it/en/corso/2023/29932/programmazione][Computer Science web site]].*

*This is what you need to do:*
   1. Register for an exam session on INFOSTUD. You need to do this BEFORE the session starts (usually, MAY, AUGUST and DECEMBER are the INFOSTUD registration periods). You can check the registration period on INFOSTUD. Remember that a session may include *more than one written test date*.
If you plan to take your test in July, _you still need to register in May!_ Being registered on INFOSTUD is necessary for me to register your final grade. However, if in a session you pass the test but do not deliver the challenge, I can't register the final grade, so you need to *register again* in the subsequent exam session. The best option is to register on INFOSTUD only in the session in which you plan to COMPLETE all your duties. For example, if you plan to take the test in June and deliver the challenge in September, only register for the September session.
   1. You must also register for a test through the Google form *I circulate before each test date*. The Google form is used by the instructor (me) to understand how many students will participate in a written exam session, and to organize the session accordingly (e.g., sending Zoom invitations for on-line tests). It is an informal document I use for organizational purposes, and it *does not exempt you* from registering on INFOSTUD.

---+ Google Group (*MANDATORY!* Google groups vary every academic year)

*NOTE: Only students with a Sapienza email can request access! Students who register with some delay will still be able to access the material when they receive their @studenti email.*

%RED% *Click here to subscribe to the 23-24 group:* %ENDCOLOR% [[https://groups.google.com/a/di.uniroma1.it/g/machinelearning23-24][Machine Learning 2023-24 group]]

---++ Syllabus and course materials

(%RED%Download only those with date=2023%ENDCOLOR%; the others need to be updated.)

NOTE: *SLIDES are not sufficient for an in-depth understanding of the discussed topics*. For each set of slides, please refer to the provided papers and additional references.
<table border="1" cellpadding="0" cellspacing="1"> <tbody> <tr><th style="text-align: left;">Timetable</th><th style="text-align: center;">Topic</th><th style="text-align: right;">PPT</th><th>PDF</th><th>Suggested readings (pointers are also in slides)</th></tr> <tr><th style="padding-left: 30px;">2023</th> <td style="padding-left: 30px;">Introduction to ML. Course syllabus and course organization. READ THE EXAM RULES CAREFULLY!</td> <td style="padding-left: 30px;"> [[%ATTACHURL%/1.ML2023Introduction.pptx][1.ML2023Introduction.pptx]]</td> <td style="padding-left: 30px;"> [[%ATTACHURL%/1.ML2023Introduction.pdf][1.ML2023Introduction.pdf]]</td> <td style="padding-left: 30px;"> https://www.gsb.stanford.edu/insights/explainer-what-machine-learning <br /> https://www.youtube.com/watch?v=jGwO_UgTS7I&ab_channel=StanfordOnline </td> </tr> <tr><th style="text-align: left; padding-left: 30px;">2023</th> <td>Building ML systems (ML pipeline, types of ML systems)</td> <td> [[%ATTACHURL%/2.BuildingMachineLearningSystems.pptx][2.BuildingMachineLearningSystems.pptx]] </td> <td> [[%ATTACHURL%/2.BuildingMachineLearningSystems.pdf][2.BuildingMachineLearningSystems.pdf]]</td> <td> https://towardsdatascience.com/architecting-a-machine-learning-pipeline-a847f094d1c7 (ML pipeline)<br /> https://medium.com/@aj.tambe.02/types-of-ml-systems-160601843758 (types of ML systems)</td> </tr> <tr><th style="text-align: left; padding-left: 30px;">2023</th> <td> Basic algorithms: Decision and Regression Trees, Perceptron model </td> <td> [[%ATTACHURL%/3.dtrees.pptx][3.dtrees.pptx]]<br /> [[%ATTACHURL%/3b.Perceptron_.pptx][3b.Perceptron_.pptx]] </td> <td> [[%ATTACHURL%/3.dtrees.pdf][3.dtrees.pdf]]<br /> [[%ATTACHURL%/3b.Perceptron_.pdf][3b.Perceptron_.pdf]] </td> <td> Decision Trees:
http://www.cs.princeton.edu/courses/archive/spr07/cos424/papers/mitchell-dectrees.pdf<br /> Regression Trees: http://www2.stat.duke.edu/~rcs46/lectures_2017/08-trees/08-tree-regression.pdf<br /> Example of perceptron epochs: https://sefiks.com/2020/01/04/a-step-by-step-perceptron-example/ <br /> A better linear and non-linear non-deep separator: [[https://see.stanford.edu/materials/aimlcs229/cs229-notes3.pdf][SVM]]</td> </tr> <tr><th style="text-align: left;">2023</th> <td>Neural Networks (backpropagation algorithm explained with computational graphs)</td> <td> [[%ATTACHURL%/5.neural.pptx][5.neural.pptx]]</td> <td> [[%ATTACHURL%/5.neural.pdf][5.neural.pdf]]</td> <td> Additional readings: https://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf <br /> [[https://towardsdatascience.com/understanding-backpropagation-abcc509ca9d0][backpropagation explained]] [[http://www.cs.columbia.edu/~mcollins/ff2.pdf][computational graphs and backpropagation]] <br /> Several papers on specific issues are pointed to in the slides</td> </tr> <tr><th>2023</th> <td> Deep Architectures (Convolutional NN and denoising autoencoders) </td> <td> [[%ATTACHURL%/7.Deeplearning.pptx][7.Deeplearning.pptx]]</td> <td> [[%ATTACHURL%/7.Deeplearning.pdf][7.Deeplearning.pdf]]</td> <td> Links to relevant publications are in the slides <br /> [[https://pavisj.medium.com/convolutions-and-backpropagations-46026a8f5d2c][backpropagation in graphs]] (general rules to compute backpropagation in complex computational graphs) <br /> https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/<br /> See also this nice video: https://www.youtube.com/watch?v=aircAruvnKk</td> </tr> <tr><th>2023</th> <td>RNN and LSTM sequential Deep Learning methods</td> <td> [[%ATTACHURL%/13.RNN.pptx][13.RNN.pptx]]</td> <td> [[%ATTACHURL%/13.RNN.pdf][13.RNN.pdf]]</td> <td> https://arxiv.org/pdf/1909.09586.pdf </td> </tr> <tr><th>2023</th> <td>Ensemble
methods (bagging, boosting, Decision/Regression Forests, gradient boosting machines)</td> <td> [[%ATTACHURL%/8.ensembles.pptx][8.ensembles.pptx]]</td> <td> [[%ATTACHURL%/8.ensembles.pdf][8.ensembles.pdf]]</td> <td> Random Forests: http://www.math.mcgill.ca/yyang/resources/doc/randomforest.pdf <br /> https://onlinelibrary.wiley.com/doi/pdf/10.1002/widm.1249 (other links in the slides) </td> </tr> <tr><th>2023</th> <td>Probabilistic learning: Maximum Likelihood Learning, Maximum A Posteriori Estimation, Naive Bayes</td> <td> [[%ATTACHURL%/10.naivebayes.pptx][10.naivebayes.pptx]]</td> <td> [[%ATTACHURL%/10.naivebayes.pdf][10.naivebayes.pdf]]</td> <td> http://www.cs.columbia.edu/~mcollins/em.pdf <br /> (more in slides) </td> </tr> <tr><th>2023</th> <td>Unsupervised learning: Clustering</td> <td>If time allows, but here are the slides: [[%ATTACHURL%/11.clustering.pptx][11.clustering.pptx]]</td> <td> [[%ATTACHURL%/11.clustering.pdf][11.clustering.pdf]]</td> <td>https://www.researchgate.net/publication/282523039_A_Comprehensive_Survey_of_Clustering_Algorithms</td> </tr> <tr><th> </th> <td>Prescriptive analytics: recommender systems</td> <td> not in 2023</td> <td> </td> <td> https://github.com/junhyukoh/deep-reinforcement-learning-papers#all-papers <br /> https://skymind.ai/wiki/deep-reinforcement-learning <br /> https://christian-igel.github.io/paper/RLiaN.pdf</td> </tr> <tr> <td> *2023* </td> <td>The ML pipeline: data source identification, data preparation (structured, unstructured, symbolic, numeric), data transformation, model fitting, and hyperparameter tuning</td> <td> [[%ATTACHURL%/14.FeatureEngineering.pptx][14.FeatureEngineering.pptx]]</td> <td> [[%ATTACHURL%/14.FeatureEngineering.pdf][14.FeatureEngineering.pdf]]</td> <td> http://www.machinelearningtutorial.net/2017/06/17/feature-engineering-in-machine-learning/ <br /> https://www.automl.org/wp-content/uploads/2019/05/AutoML_Book_Chapter1.pdf
<br /> See the open-source AutoML tool [[https://www.h2o.ai/products/h2o-automl/][H2O]] and the "Google AutoML" project: https://cloud.google.com/automl-tables/ <br /> A highly cited paper on hyperparameter optimization techniques in NN: [[%ATTACHURL%/Hyperparameter_optimization_survey_2020.pdf][Hyperparameter_optimization_survey_2020.pdf]] <br /> Many other links are in the course slides </td> </tr> <tr> <td> *2023* </td> <td>Evaluation: learning curves, evaluation measures, true vs. sample error, confidence intervals, hypothesis testing (one- and two-tailed tests)</td> <td> [[%ATTACHURL%/4.evaluation.pptx][4.evaluation.pptx]] <br /> NOTE: in 2023, some of the later topics in the slides were not presented. Precisely, we introduced only the two-tailed test for hypothesis comparison, but NOT what follows, up to the end of the slides. </td> <td> [[%ATTACHURL%/4.evaluation.pdf][4.evaluation.pdf]]</td> <td> </td> </tr> <tr> <td> *2023* </td> <td>Explainable ML, Counterfactual ML</td> <td> [[%ATTACHURL%/15.ExplainableAI.pptx][15.ExplainableAI.pptx]]</td> <td> </td> <td> Survey on Explainability in ML: [[%ATTACHURL%/SurveyExplainability.pdf][SurveyExplainability.pdf]] <br /> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7824368/ <br /> LIME: https://homes.cs.washington.edu/~marcotcr/blog/lime/</td> </tr> <tr> <td> *2022* </td> <td>Data reporting: Visual Analytics and ML4Viz (machine learning for data visualization)</td> <td>not in 2023</td> <td> </td> <td> Visualization for ML: [[%ATTACHURL%/SurveyOfVisualAnalytics4ML2021.pdf][SurveyOfVisualAnalytics4ML2021.pdf]] <br /> ML for Visualization: https://arxiv.org/abs/2012.00467 </td> </tr> </tbody> </table>
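As an unofficial complement to the "perceptron epochs" reading listed in the table above, here is a self-contained sketch of the classic perceptron update rule, learning the AND function with NumPy. The learning rate and epoch count are arbitrary illustrative choices:

```python
# Illustrative sketch (not course code): perceptron epochs on the AND
# function, using the classic update rule w += lr * (y - y_hat) * x.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                        # AND labels
Xb = np.hstack([X, np.ones((4, 1))])              # append a bias column

w = np.zeros(3)                                   # weights incl. bias
lr = 0.1
for epoch in range(20):                           # a few epochs suffice here
    for xi, yi in zip(Xb, y):
        y_hat = 1 if xi @ w >= 0 else 0           # step activation
        w += lr * (yi - y_hat) * xi               # update only on mistakes

preds = [(1 if xi @ w >= 0 else 0) for xi in Xb]
print(preds)  # prints [0, 0, 0, 1]
```

Since AND is linearly separable, the perceptron convergence theorem guarantees that the loop stops making mistakes after a finite number of epochs; the deep architectures in the slides generalize exactly this unit.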
Topic revision: r387 - 2023-11-28 - PaolaVelardi