---++!! *Big Data Computing*
---+++!! Master's Degree in Computer Science
---+++!! Academic Year 2018-2019, spring semester

<table width="100%" border=0 cellpadding=0>
<tr>
<td width="65%" valign="top">
<table width="100%" border=0 cellspacing=0>
<tr> <td valign="top">
<font color=#AF0F0F size="+1"><b>Instructor: [[http://www.dsi.uniroma1.it/~finocchi][Irene Finocchi]]</b></font>

*Office Hours*: by appointment.<br />Office: Città Universitaria, Department of Statistical Sciences, 4th floor, room 21. <br />E-mail: irene.finocchi AT uniroma1.it
</td></tr>
<tr> <td valign="top">
<div align="left"> </div>
<div align="left"><b><p> </p>Meeting times and location</b></div>

| *Day* | *Time* | *Room* |
| Monday | 10:30-13:00 | Aula 2 (Via del Castro Laurenziano 7) |
| Friday | 8:00-10:30 | Aula 2 (Via del Castro Laurenziano 7) |

</td></tr>
</table>
</td>
<td width="35%" valign="top" style="border-left: 1px solid #999999; ">
%TOC%
</td>
</tr>
</table>

---++ News

   * *September exam grades*: [[https://drive.google.com/file/d/1CQ5cbCZREjoNcrzGBKk2QG_Vqq57vbUx/view?usp=sharing][here]]. Final grades for students who passed both parts are [[https://drive.google.com/file/d/1Sj6TAhSCGJ_L-0W7tFWS2kfJCEah0GeC/view?usp=sharing][here]]. I will proceed with grade registration on Infostud unless you let me know, by October 8, that you are not going to accept the grade and want to improve it.
   * *July 1 final grades*: [[https://drive.google.com/file/d/10lmxMPBiCPCgapjfs7AJ7MKB8PNCEfk6/view?usp=sharing][final grades]]. I will proceed with grade registration on Infostud (for students who passed both parts) unless you let me know, by July 31, that you are not going to accept the grade and want to improve it.
   * *September exam*: *September 4*, *9:00*, *aula 34* Dipartimento di Statistica (fourth floor). Students who have already passed Part 1 can arrive at 10:30.
   * *July 1 grades*: [[https://drive.google.com/file/d/14BsOiTJ8I-LEcBscEw64nTH3EGWTcFb4/view?usp=sharing][here]].
The final grades for students who have passed both the first and the second part, considering also the paper hacking reports and presentations, will be available at the end of this week.
   * *June 6 final grades*: [[https://drive.google.com/file/d/1n1N_EE20yWNUviPH8OAiOAKcmBQoc2er/view?usp=sharing][final grades]]. I will proceed with grade registration on Infostud (for students who passed both parts) unless you let me know, by July 3 (next Wednesday), that you are not going to accept the grade and want to improve it. (June 6 exam grades are available [[https://drive.google.com/file/d/1veEV6GCdpJco6CC0Z3Uxte1JC4GW0O9v/view?usp=sharing][here]])
   * *Midterm grades*: [[https://drive.google.com/file/d/1Y1eub3gIJ43ZzF5y9fhe2zzgC-EK67UF/view?usp=sharing][here]]
   * *Student presentations June 3*: Sala riunioni 34 (Statistica), Department of Statistical Sciences, Città Universitaria (Main Campus at Piazzale Aldo Moro), building CU002 (http://www.dss.uniroma1.it/sites/default/files/files/mappaCUSapienza.pdf), 4th floor, Room 34.
   * *Summer exams*: *June 6* at 9:00 in *Aula P1* and *July 1* at 9:00 in *Aula P2* (Main Campus). Students who have already passed the first part can arrive at 10:30.
<!-- * *January exam grades*: [[https://drive.google.com/file/d/13ltMAvZjTLtFHWdqDWEpOkrbjHCYo2WV/view?usp=sharing][here]]. Please, let me know by *Friday, February 1*, if you *do not accept* your final grade. Otherwise, I will register it on Infostud (after that, you should receive an automatic email notification). * *Final grades September session*: [[https://drive.google.com/file/d/1g6_p01eCvdRfMi29mXZIfuM9r7iwTaIq/view?usp=sharing][here]]. Please, let me know by *Thursday, October 11*, if you *do not accept* your final grade. Otherwise, I will register it on Infostud (after that, you should receive an automatic email notification). * *Grades September 6 exam*: [[https://drive.google.com/file/d/1jA1M3RBJuwwSjKuC6yjB8bKow_Siv5Xl/view?usp=sharing][here]]. 
The final grades for students who have passed both the first and the second part, considering also the paper hacking reports and presentations, will be available at the beginning of next week. * *July 5 final grades*: [[https://drive.google.com/file/d/1sfGIfjnbF0ywELCqAG3AvKsuH3NBi9Xv/view?usp=sharing][here]]. Please, let me know by *Thursday, July 26*, if you *do not accept* your final grade. Otherwise, I will register it on Infostud (after that, you should receive an automatic email notification). Student 1826973 should contact me by email. * *Grades July 5 exam*: [[https://drive.google.com/file/d/18RVxWvAUG8vN9dlntyhH2KOpbT1EBChD/view?usp=sharing][here]]. The final grades for students who have passed both the first and the second part, considering also the paper hacking reports and presentations, will be available at the beginning of next week. * *July 5 exam*: *part 1* starts at *15:00*, *part 2* starts at *16:30*. * *June 6 final grades*: [[https://drive.google.com/file/d/1emsW_NypsIyTNb5QtxnrXJL0NjEvoszv/view?usp=sharing][here]]. Please, let me know by Wednesday, July 4, if you *do not accept* your final grade. Otherwise, I will register your grade on Infostud (after that, you should receive an automatic email notification). * *Grades June 6 exam*: [[https://drive.google.com/file/d/1OHwjp_T8fvGc_0hlqTF_ddL2oDek7Xgp/view?usp=sharing][here]]. The final grades for students who have passed both the first and the second part, considering also the paper hacking reports and presentations, will be available at the beginning of next week. * *Next exam*: due to an unexpected commitment, the next BDC exam is postponed from June 27 to *July 5*. Location: Aula 2. The first part will start at *15:00* (sharp) and the second part will start at 16:30. Late students will not be admitted. Please, register on Infostud to the June 27 session. * *June 6 exam*: the exam will be divided into two parts. 
The first part will start at 9:00 (sharp) and the second part will start at 10:30. Late students will not be admitted. * *Midterm scores*: [[https://drive.google.com/file/d/1-VuP3EuVYOnXADZIMLiboCYAOw_q0IJ4/view?usp=sharing][here]] * *Paper hacking*: instructions and advice on paper report writing are available [[https://drive.google.com/file/d/1dJvRv7fyZNJr0no-IR72AXnDi33RFqU7/view?usp=sharing][here]]. Please hand in your report no later than May 5. * *Paper assignment*: the assignment of papers is available [[https://docs.google.com/spreadsheets/d/1-DgONoBBoc9lSDpORNGQORreyjJcy6ArLdl4EPZ3eYk/edit?usp=sharing][here, tab Paper-Assignment]]. Please hand in your report (more details soon) no later than May 5. * *Paper hacking session*: at [[https://drive.google.com/drive/folders/1AlHY6QvGCn1U0rMZeie-MyWqdFbbXr9q?usp=sharing][this link]] you can find a selection of papers for the paper hacking session. In groups of 2, skim through the papers (titles and abstracts) and bid for your preferred ones: please insert at least 5 bids, possibly with ties if you have no real preferences on what to read. Bids can be inserted [[https://docs.google.com/spreadsheets/d/1-DgONoBBoc9lSDpORNGQORreyjJcy6ArLdl4EPZ3eYk/edit?usp=sharing][here]]. Provide your name and insert numbers as in the example at row 2 (1 denotes the highest preference). *Deadline: this Friday at midnight.* If you do not bid by the deadline, I will likely assign you a random paper. More details on Friday during the lesson. * Please *register for the first midterm* using *[[http://twiki.di.uniroma1.it/twiki/view/Prenotazioni/2018_04_23_BigDataComputing][this link]]*. Registration is *mandatory*. * <font color="#AF0F0F"> *Google group registration:* </font> Please use your institutional email or let me know that you are a student attending the BDC class. * <font color="#AF0F0F"> *Course starting date for A.Y. 
2017-2018:* </font> *Last-minute update.* Due to extreme weather conditions, [[https://www.uniroma1.it/it/notizia/didattica-sospesa-il-26-febbraio-emergenza-maltempo][all teaching activities at Sapienza are cancelled]] tomorrow. Hence, the course will start on Friday *March 2, 8:30*, Aula 2 Via del Castro Laurenziano 7 (Aule L Ingegneria). * *Restricted November session*: the restricted exam session (for students who have failed to graduate within the prescribed time; repeating students; part-time students; working students) will be on *November 3*, *Aula Seminari*, *9:30*. Please register for the exam on Infostud if you are entitled to participate. * Students who have passed both parts, sent me the homework, and didn't hear back from me about final grade registration on Infostud: please send me an email with subject "BDC final grade registration" and your names. Thanks. * June 28 results: [[https://drive.google.com/file/d/0B1yYvm6QgJReb04tamtlOVRpbnM/view?usp=sharing][part 1]], [[https://drive.google.com/file/d/0B1yYvm6QgJReSzAzdlBQTDJ5MDg/view?usp=sharing][part 2]] * [[https://drive.google.com/file/d/0B1yYvm6QgJReM1lHVTRfTV81dGc/view?usp=sharing][First midterm grades]] <table> <tr> <td rowspan="3"> <img src="%ATTACHURLPATH%/midterm.jpg" alt="midterm.jpg" width='136' height='133' /></td> <td> * <font color=#AF0F0F><b>Midterm:</b></font> Tuesday *April 4*, Aula Alfa, <b>10:00 - 14:00</b> </td> </tr> </table> * <span class="WYSIWYG_COLOR" style="color: crimson;"> *Homework discussion and exam registration* </span>: *June 28, 9:30, Aula Seminari* (for students who sent me both homeworks and passed both midterms or June 6 final). * Please *register for the second midterm* using *[[Prenotazioni.2016_05_24_BigDataComputingMidterm2][this link]]*. You can participate independently of the outcome of the first midterm. 
The program is indeed divided into two parts: the first part covers topics explained in the first half of the course, until April 11 (included), the second part covers all the remaining topics. Once one part is passed (either through a midterm or through a final exam), the score for that part remains valid for the entire academic year. * *Final exams, summer term*: * On *May 24* we'll have a second midterm (covering topics explained in the second half of the course, from April 12 - included - to May 23). * Final exam dates: *June 6* (9:30, Aula Alfa) and *June 30* (9:30, Aula Alfa). * You must *register* for the final exams on *Infostud*. This is required even if you passed both midterms, so that I can formally register your score on Infostud. * Final exams, starting in June, will be divided into two parts: you can do one part in a session and another one in a different session. Once one part is passed, the score for that part remains valid for the entire academic year. * Virtual machine with a lightweight Ubuntu distribution (LXLE) containing the latest release of Hadoop: *[[https://drive.google.com/file/d/0B1yYvm6QgJReUlgyc0pBQk1OOGc/view?usp=sharing][download LXLE here]]* (file size: 4GB) . In order to use the virtual machine, you need !VirtualBox (available on the Oracle website). <p> </p> * [[https://docs.google.com/spreadsheets/d/106mnEY8tZ6eiCH2mCgJ8z95JOwo7djCF7khdcEg6x94/edit?usp=sharing][Current contest results here]] * *Installing Hadoop*: we'll shortly introduce !MapReduce and its open source implementation Hadoop. Hadoop installation can pose some issues, depending on your platform. It would be therefore very useful if you could try to install Hadoop 2.2.0 right now, so that you can bring your laptop to the class with a working Hadoop version when requested (you can work in pairs). Together with [[http://www.dsi.uniroma1.it/~fusco][Dr. 
Emanuele Fusco]], we have prepared a short installation guide (there are many tutorials on the Web, but not all of them are correct or up to date): [[%ATTACHURL%/installHadoop.pdf][here it is]]. The tutorial should be self-contained. If something goes wrong, just let us know by 1) posting questions to the Google group (if they are of general interest), or 2) sending us an email to arrange a meeting during office hours. It is best to try the installation soon, so that you can get support in time! -->

---++ Course description

As data sets grow to terabyte and petabyte scales, traditional models and paradigms of sequential computation become obsolete. The course will focus on fundamental algorithmic and programming issues posed by big-data computing, tackling some major data mining problems on a variety of computational models used for managing massive information structures. We will study how algorithm design techniques and technological aspects of modern computing platforms interact and adapt to each other. The emphasis will be on:
   * !MapReduce as a programming model for distributed data mining on large clusters of computers
   * Data streaming techniques for mining huge and rapidly changing streams of data on the fly
   * External memory algorithms for processing data stored on slow secondary memories
<p> </p>

The lectures will follow an experimental and problem-driven approach. The goal for the class is to be broad and to touch upon a variety of techniques, introducing standard practices as well as cutting-edge research topics in this area. Hands-on programming sessions will be held to guide the students in the use of good programming practices and advanced programming frameworks, such as Hadoop. Students will learn the proper settings in which to use each paradigm, the advantages and disadvantages of each model, and how to design and analyze algorithms and write efficient code in different big data settings.

<font color="#AF0F0F"> *Learning outcomes:* </font>
   * Knowledge of big data processing frameworks (part of the Hadoop ecosystem)
   * Knowledge of advanced computational models, focusing on data streaming, !MapReduce-style parallelism, and external memory
   * Ability to write efficient code taking into account architectural features of modern computing platforms (including distributed systems)
   * Familiarity with data mining problems and techniques
   * Ability to study advanced research topics in big data systems and algorithmics for massive data
   * Performance analysis skills using back-of-the-envelope calculations, mathematical and experimental tools
<p> </p>

---++ Lectures and readings

Readings, notes, slides, papers, and code are posted *[[BDC/Schedule][here]]* after each lecture. *[[BDC/ScheduleOld][Schedule 2018]]*

There are no required textbooks for this class: many lessons explore cutting-edge topics and no single book covers all of them systematically. Some resources that we will use along the way are:
   * J. Leskovec, A. Rajaraman, and J. Ullman, _[[http://www.mmds.org/][Mining of Massive Datasets]]_. Available online.
   * T. White. _Hadoop: The Definitive Guide - Storage and Analysis at Internet Scale_ (4th edition). O'Reilly Media.
   * C. Demetrescu and I. Finocchi. _[[%ATTACHURL%/SurveyStreaming08-DemetrescuFinocchi.pdf][Algorithms for data streams]]_. In Handbook of Applied Algorithms, John Wiley and Sons, 2008.
   * K. Mehlhorn, P. Sanders. _Algorithms and data structures: The basic toolbox_, Springer, 2009. [[http://www.mpi-inf.mpg.de/~mehlhorn/Toolbox.html][Book web site]].
<!-- * R. Bryant, D. O'Hallaron: _Computer Systems: A Programmer's Perspective_, Prentice Hall, 2003. -->
<!-- * J. Bentley. _Programming pearls_, 2/ed, Addison-Wesley, 2000. [[http://netlib.bell-labs.com/cm/cs/pearls/][Book web site]].-->
<!-- -- - ++ Homeworks The homework for A.Y. 2017-2018 (including a small software project in !MapReduce) will be posted here during the course. 
Old homeworks: * [[%ATTACHURL%/BigData-hw1-aa2016-2017.pdf][Homework A.Y. 2016-2017]] * [[%ATTACHURL%/BigData-hw1-aa2015-2016.pdf][Homework 1 A.Y. 2015-2016]] * [[https://drive.google.com/file/d/0B1yYvm6QgJReRlZzVVlXa2xSLXc/view?usp=sharing][Homework 2 A.Y. 2015-2016]] -->
<!-- * [[%ATTACHURL%/BigData-hw1-aa2014-2015.pdf][Homework BDC 2015]] * [[%ATTACHURL%/BigData-hw3-aa2013-2014.pdf][Homework 3 BDC 2014]] * [[%ATTACHURL%/BigData-hw2-aa2013-2014.pdf][Homework 2 BDC 2014]] * [[%ATTACHURL%/BigData-hw1-aa2013-2014.pdf][Homework 1 BDC 2014]] -->
<!-- * [[%ATTACHURL%/hw2-aa2011-2012.pdf][Mini-homework on data streams 2012]] * [[%ATTACHURLPATH%/hw1-aa2011-2012.pdf][Homework 1 2012]] -->

---++ Grading

<!-- * <font color="#AF0F0F"> *Homework* </font> assigned during the course, including a small software project in !MapReduce (you can work in *groups*, typically composed of two persons. In exceptional cases I could allow groups of 3.) <br /> -->
   * <font color="#AF0F0F"> *Written exam* </font> with both multiple choice and open questions (if time permits, we will schedule two midterms)<p> </p>
   * <font color="#AF0F0F"> *Reading assignment:* </font> read a scientific paper on big data systems, write a report, and (possibly) present the paper in class (you can work in *groups*, typically composed of two people; in exceptional cases I may allow groups of three). <!-- <br />Reading sessions will be organized with a conference-style: each paper should be read by all students, who should prepare questions for the speakers. -->

---++ Google group

The group will be used for technical discussions, homework assignments, and last-minute messages. Subscribing is recommended!

| *Subscribe to Big Data Computing (Sapienza, Irene Finocchi)* |
| Email: <input type=text name=email> <input type=submit name="sub" value="Subscribe"> |
| [[http://groups.google.com/group/big-data-computing-sapienza-finocchi?hl=en][Visit this group]] |
Topic revision: r114 - 2019-10-05 - IreneFinocchi
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.