
Advanced Parallel Architecture - Architetture avanzate e parallele

Annalisa Massini

Office Hours: appointment by email

  • The presentation of papers is scheduled for October 11 at 10:00 am in the Seminari room, via Salaria 113, 3rd floor.
    • Students wishing to give the presentation on October 11 must send an email to the professor.

  • Paper presentation:
    • Papers for oral presentation must be approved by the teacher. Please send a copy of the proposed paper to the teacher for approval.
    • Students are required to present the selected paper using slides, in 20-30 minutes.

  • As an alternative to the paper presentation, students can present the results of a project whose topic and goals have been approved by the teacher.

Aim of the course

The aim of this course is to develop an understanding and appreciation of computer systems and to learn how to harness parallelism to sustain performance improvements, building on the knowledge of Computer Architecture acquired in undergraduate courses. The course presents: the classification of parallel architectures according to Flynn's taxonomy, and a classification of the architectures in the MIMD class; interconnection networks and the features of different communication requests; cache coherence protocols; performance metrics and measurement; and performance optimization. A deep knowledge of computer architecture, a careful use of the different forms of parallelism, and performance analysis should always support the design of parallel algorithms and the study of strategies for problem decomposition and load distribution.
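The cache coherence protocols mentioned above (covered in Lectures 20 and 21) can be illustrated with a minimal sketch of the three-state MSI snooping protocol for a single cache block. This is an illustration under simplifying assumptions, not the exact protocol from the course slides: the `Cache`, `Bus`, `read` and `write` names are hypothetical, bus transactions are assumed atomic, and a write hit on a Shared copy is modeled with a full BusRdX instead of a separate upgrade transaction.

```python
from enum import Enum


class State(Enum):
    M = "Modified"   # only valid copy, dirty
    S = "Shared"     # clean copy, possibly held by other caches too
    I = "Invalid"    # no valid copy


class Cache:
    """Hypothetical private cache holding a single block, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.state = State.I


class Bus:
    """Hypothetical shared bus: every cache snoops every transaction."""

    def __init__(self, caches):
        self.caches = caches
        self.log = []

    def broadcast(self, requester, op):
        for c in self.caches:
            if c is requester:
                continue
            if op == "BusRd" and c.state is State.M:
                # Another cache wants to read: write back dirty data, keep a shared copy.
                self.log.append(f"{c.name} flushes block")
                c.state = State.S
            elif op == "BusRdX" and c.state is not State.I:
                # Another cache wants to write: flush if dirty, then invalidate.
                if c.state is State.M:
                    self.log.append(f"{c.name} flushes block")
                c.state = State.I


def read(bus, cache):
    if cache.state is State.I:      # read miss: fetch a shared copy
        bus.broadcast(cache, "BusRd")
        cache.state = State.S
    # S or M: read hit, no bus traffic


def write(bus, cache):
    if cache.state is not State.M:  # write miss, or write hit on a Shared copy
        bus.broadcast(cache, "BusRdX")
        cache.state = State.M
    # M: write hit, no bus traffic


if __name__ == "__main__":
    p0, p1 = Cache("P0"), Cache("P1")
    bus = Bus([p0, p1])
    read(bus, p0)
    read(bus, p1)    # both copies Shared
    write(bus, p1)   # P1 -> Modified, P0 invalidated
    read(bus, p0)    # P1 flushes, both Shared again
    print(p0.state.name, p1.state.name, bus.log)  # prints S S ['P1 flushes block']
```

The invalidation on BusRdX is what keeps a single writer per block; a write-update protocol would instead broadcast the new value to the sharers.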

Syllabus

  • Limitations of the Von Neumann architecture.
  • Instruction pipeline and arithmetic operations pipeline. Vector processors. Dataflow architecture. Multicore and multithreading.
  • Parallel Architectures. Flynn’s Taxonomy and other classifications. Forms of parallelism. SIMD and MIMD architectures.
  • Interconnection topologies and interconnection networks. Routing Functions. Static Networks. Dynamic Networks. Combining Networks.
  • Cache Coherence: Snooping Protocols and Directory-based Protocols. Memory Consistency. Message Passing Systems.
  • Manycore Architectures: GPU (and CUDA).
  • Performance metrics and measurement. Amdahl's Law. Performance optimization: work distribution and load balance, locality, communication.
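Amdahl's Law, listed in the syllabus above, fits in a few lines of Python. Following Hennessy and Patterson (Chapter 1), with f the fraction of execution time that benefits from an enhancement and s the speedup of that fraction, the overall speedup is 1 / ((1 - f) + f / s), which is bounded by 1 / (1 - f) as s grows. The function names are hypothetical, and reading s as a processor count assumes a perfectly parallel fraction with no communication or load-imbalance overhead.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup per Amdahl's Law.

    fraction_enhanced: fraction of the original execution time that can use the enhancement
    speedup_enhanced: factor by which that fraction is sped up (e.g. number of processors)
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    # The serial part (1 - f) is unaffected; the enhanced part shrinks by a factor s.
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced: float) -> float:
    """Upper bound on the speedup as speedup_enhanced grows without limit: 1 / (1 - f)."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # Assume 90% of the run time is parallelizable: speedup can never exceed 10x.
    for p in (2, 8, 64, 1024):
        print(f"{p:5d} processors: speedup {amdahl_speedup(0.9, p):.2f}")
    print(f"limit: {amdahl_limit(0.9):.2f}")
```

The loop shows diminishing returns: going from 64 to 1024 processors gains little, because the 10% serial part dominates.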



Lectures will be held on Tuesday and Thursday, 12:00-14:00 in Aula Alfa.

Lecture 1 - February 21st, 2017 Introduction to the course. Motivations for parallel architectures. Application trends. Technology trends. (Culler, Singh, Gupta - Ch. 1) Lecture 1 - Introduction (part 1)
Lecture 2 - February 23rd, 2017 Architectural Trends: Bit Level Parallelism, Instruction Level Parallelism, Thread Level Parallelism. Flynn's Taxonomy. Considerations on performance: Speed-up and Communication cost. (Culler, Singh, Gupta - Ch. 1) Lecture 2 - Introduction (part 2)
Lecture 3 - February 28th, 2017 Summary on computer architecture. Von Neumann's architecture. Instruction execution, Instruction Set, Instruction format. Addressing modes. Hardwired and microprogrammed CU. Lecture 3 - Computer architecture and organization (part 1)
Lecture 4 - March 2nd, 2017 Summary on computer architecture. Modules and connections. Bus. Memory Hierarchy. Cache Memory. Lecture 4 - Computer architecture and organization (part 2)
Lecture March 7th, 2017 Cancelled
Lecture 5 - March 9th, 2017 Main memory. I/O modules. Computer architecture and organization (part 3). Instruction pipelining (Hennessy, Patterson - Appendix C, Sections C.1, C.2). Lecture 5 - Pipeline (part 1)
Lecture 6 - March 14th, 2017 Pipeline hazards. Exercises. Lecture 6 - Pipeline (part 1 + part 2)
Lecture 7 - March 16th, 2017 Arithmetic operations. Pipeline of arithmetic operations. Lecture 7 - Computer arithmetic
Lecture 8 - March 21st, 2017 Redundant number representations for carry-free addition. Modified Signed Digit (MSD) and Redundant Binary. Lecture 8 - Redundant number representations
Lecture 9 - March 23rd, 2017 Residue number system. Circuit evaluation: delay and area. Lecture 9 - Residue systems and circuit evaluation
Lecture 10 - March 28th, 2017 State of the art in Bioinformatics and project proposals - Tiziana Castrignanò (CINECA). Castrignanò - Project Proposals 2017
Lecture 11 - March 30th, 2017 Data dependences and name dependences. Loop-carried dependences. (Hennessy, Patterson - Chapter 3, Sect. 3.1 and Chapter 4, Sect. 4.5) Lecture 11 - Instruction level parallelism and Loop-carried dependences
Lecture April 4th, 2017 Cancelled (Big Data midterm exam)
Lecture April 6th, 2017 Cancelled due to illness
Lecture 12 - April 11th, 2017 Exercises on circuit propagation time and area; residue number systems; RB representation; instruction pipelining; true dependences, output dependences, antidependences, loop-carried dependences; pipelined operations. Lecture 12 - Exercises - RB representation
Lecture 13 - April 20th, 2017 Midterm exam.
Lecture 14 - April 27th, 2017 Classifications of parallel architectures. Interconnection networks. Lecture 14 - Parallel architectures - Interconnection Networks
Lecture 15 - May 2nd, 2017 Evaluation of interconnection networks. Multistage interconnection networks (MINs). Clos networks. Benes networks. Equivalence of log N-stage MINs. Equivalence classes for (2 log N - 1)-stage MINs. Lecture 15 - Interconnection Networks; Equivalence classes paper
Lecture 16 - May 4th, 2017 Exercises on interconnection networks.
Lecture 17 - May 9th, 2017 Vector architectures. Description and block diagram of the CRAY-1. Vector architecture optimizations. Lecture 17 - Vector Architectures (Hennessy, Patterson - Chapter 4, Sect. 4.2)
Lecture 18 - May 11th, 2017 Graphics Processing Units. Lecture 18 - GPUs (Hennessy, Patterson - Chapter 4, Sect. 4.2; Kirk, Hwu - Chapters 3, 4, 5; Barlas - Chapter 6)
Lecture 19 - May 16th, 2017 Exercises on GPUs (Kirk, Hwu - Chapters 3, 4). Exercises on GPU
Lecture 20 - May 18th, 2017 Cache Coherence in Shared Memory Systems. Lecture 20 - Cache Coherence (Hennessy, Patterson - Chapter 5, Sect. 5.2 and 5.4), slides 1-43
Lecture 21 - May 23rd, 2017 Cache Coherence in Shared Memory Systems. Lecture 20 - Cache Coherence (Hennessy, Patterson - Chapter 5, Sect. 5.2 and 5.4), slides 44-60. Exercises on the snooping protocol. Lecture 21 - Cache Coherence Exercises
Lecture 22 - May 25th, 2017 Amdahl's Law and the performance equation. Amdahl & Performance (Hennessy, Patterson - Chapter 1, Sect. 1.9)
Lecture 22 - Amdahl's Law and Performance Equation
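Lectures 9 and 12 above cover residue number systems, where addition and multiplication need no carry propagation between digits. A minimal sketch, assuming the small, hypothetical moduli set {3, 5, 7} (dynamic range 3 x 5 x 7 = 105; hardware designs typically use sets such as {2^n - 1, 2^n, 2^n + 1}): each operation acts independently on every residue channel, and conversion back to an ordinary integer uses the Chinese Remainder Theorem. All function names are illustrative.

```python
from math import gcd, prod

# Hypothetical, pairwise coprime moduli chosen for illustration; representable range is 0..104.
MODULI = (3, 5, 7)


def check_moduli(moduli):
    """Raise if the moduli are not pairwise coprime (needed for a unique representation)."""
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError(f"moduli {a} and {b} are not coprime")


def to_rns(x, moduli=MODULI):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    # Each channel is computed independently: no carry propagates between residues.
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    # Multiplication is also channel-wise and carry-free.
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(r, moduli=MODULI):
    """Reverse conversion with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    x = 0
    for r_i, m_i in zip(r, moduli):
        big_m_i = big_m // m_i
        # pow(value, -1, modulus) computes a modular inverse (Python 3.8+).
        x += r_i * big_m_i * pow(big_m_i, -1, m_i)
    return x % big_m


if __name__ == "__main__":
    check_moduli(MODULI)
    a, b = to_rns(17), to_rns(23)
    s = rns_add(a, b)
    print(a, b, s, from_rns(s))  # prints (2, 2, 3) (2, 3, 2) (1, 0, 5) 40
```

Results are only meaningful while the true result stays within the dynamic range; 17 + 23 = 40 < 105, whereas an overflowing result would silently wrap modulo 105.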

Past year lectures

Textbooks

  • Parallel Computer Architecture: A Hardware/Software Approach, David E. Culler, Jaswinder P. Singh and Anoop Gupta, Morgan Kaufmann, 1998
  • Computer Architecture: A Quantitative Approach, Fifth Edition, John L. Hennessy, David A. Patterson, Morgan Kaufmann, 2011
  • Programming Massively Parallel Processors: A Hands-on Approach, David B. Kirk, Wen-mei W. Hwu, Morgan Kaufmann, 2010
  • Multicore and GPU Programming: An Integrated Approach, Gerassimos Barlas, Morgan Kaufmann, 2014

Exam

  • Students attending the lectures can take a mid-term exam and a final exam, or alternatively the full exam. Both the mid-term/final exams and the full exam consist of a written test and exercises.
  • Project or oral exam.

Text of exams


-- AnnalisaMassini

Topic revision: r80 - 2017-10-04 - AnnalisaMassini

Copyright © 2008-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.