CS5063 Data Industry Practicum (DIP)

AUP - Master Human Rights and Data Science, Prof. Claudia Roda

Note the dates of feedback sessions. Make sure you mark your calendars with all dates. All reading lists are tentative and subject to change until 2 weeks before each workshop.

Introductory Lecture (January 16)

Tue 16.1, Q704, 9:00 - 10:20
Data Industry Practicum introduction, Claudia Roda (PPT and Course syllabus)

Exploring Future(s) of data protection and the environment (January 19 - 23)
Régis Chatellier is the Innovation & Foresight Project Manager at the Technologies and Innovation Department of the CNIL (French Data Protection Authority). His work is at the intersection of innovation, technology, digital humanities, society, regulation and ethics, and includes future studies and reports to actively participate in the data protection and data ethics debate and support the CNIL's future stance (on smart cities, design and privacy, civic tech, metaverses, ecology, AI, etc.)

Our relationship with freedoms could evolve in a context where we need to change our behaviour to mitigate climate change. These changes will take place in a context where we have collectively created, used and fed systems to monitor and measure our own actions by voluntarily measuring our health data, movements, etc. What could happen with regard to data protection in possible/plausible futures?


Fri 19.1, Q704, 09:00 - 10:20
Workshop Intro, Claudia Roda
Before this lecture you should have studied the mandatory readings listed below.

Mon 22.1, Q709, 16:30 - 19:30
Exploring Future(s) of data protection and the environment - Régis Chatellier

Mandatory reading:

If you read French, the following is also very relevant

Tue 23.1, Q709, 14:00 - 17:00
Exploring Future(s) of data protection and the environment - Régis Chatellier

Sunday, 4.2, workshop assignment due

Fri 9.2, Q709, 12:10 - 14:10
Feedback session, Régis Chatellier

Artificial intelligence, regulation and human rights (30 January)

Conference Agence française de développement (AFD), Artificial intelligence, regulation and human rights.

NOTE: CONFERENCE AT THE SITE OF THE AFD - REGISTER AS SOON AS POSSIBLE

Dramatic advances in artificial intelligence (AI) offer unprecedented opportunities in fields ranging from medicine and education to the monitoring and protection of biodiversity. 

However, AI also has the potential to upend the way societies function, and governments are racing to introduce regulations for a fast-evolving technology. This is why, in the lead-up to European Data Protection Day, AFD Group is organizing the conference "Artificial intelligence, regulation and human rights."

 


January 30, 2024 - 13h30 - 18h00.

Auditorium du Mistral, 3 place Louis Armand 75012 Paris

Inclusive Data Visualisations (26 January - 2 February)
An AUP alumna, Alex Phuong Nguyen is Director of Product & Analytics at Ulula, a Canadian stakeholder technology company. In addition to product management, Alex leads efforts to derive value from data. Prior to joining Ulula, Alex contributed to developing and implementing the data strategy and innovation agenda in the Executive Office of the UN Secretary-General in New York. Her background is in international human and labour rights.

Data visualization is an important component of most data science projects, not only enabling better interpretation of the data, but also facilitating data scrubbing and exploration. People who design visualizations need to address questions about what they should display and why, but also pay increased attention to how they should display it. This workshop explores how different visualization techniques may affect the information conveyed and introduces the concepts behind visual accessibility. Students will work on a small real-world project requiring them to apply the principles introduced during the workshop.


Tue. 23.1, Q704, 9:00 - 10:20
Workshop Intro, Claudia Roda
Before this lecture you should have studied the mandatory readings listed below.

Fri. 26.1, Q709 (Speaker online), 14:30 - 16:00
Ethical Considerations in Data Visualization, Alex Phuong Nguyen


Mandatory readings:

  1. Data Visualization in Data Science (local copy)
  2. Statistics, lies and the virus: five lessons from a pandemic
  3. How Deceptive are Deceptive Visualizations (pdf)

Recommended:

  1. A Great Way to Think About Data Science: The Bowtie
  2. How To Break A Scale (pdf)
  3. Practicing Good Ethics in Data Visualization
  4. Visual Arrangements Influence Comparisons (pdf)

Visualization exercise and Data for the visualization exercise (accessible only to students at the time of the workshop)

Thu. 1.2, Q609 (Speaker online), 17:15 - 18:15
Ethical Considerations in Data Visualization, Alex Phuong Nguyen

Fri. 2.2, Q609 (Speaker online) 14:00 - 15:00
Feedback session

Sunday, 11.2, Final version of assignment due

Data management in refugee protection and humanitarian settings (February 2-8)
Wellington Pereira Carneiro is a UN humanitarian worker and researcher. He obtained his PhD in International Relations from the University of Brasilia, Brazil, in 2012 and holds two master’s degrees: an MSt in International Human Rights Law from the University of Oxford in the United Kingdom and an LLM in International Law from the University Drujby Narodov, Moscow (1996). He received his bachelor’s in law from the University do Vale do Paraiba in 1992, where he lectured from 1999 to 2000, and he also lectured at the University Center of Brasilia from 2007 to 2009. In 2004 he joined the International Civil Service of the United Nations and has served in many conflict and troubled areas, including emergency humanitarian operations in Chad/Cameroon (2008) and Uganda (South Sudan emergency, 2017), and has completed assignments in Brazil, Sudan, Kazakhstan, Colombia, Lebanon, Angola and, most recently, the Russian Federation. He published a book on crimes against humanity in 2015 and has authored several articles on topics related to human rights and humanitarian action.

By June 2023, 110 million people had been forcibly displaced worldwide due to persecution, conflict, violence, human rights violations, or events seriously disturbing public order, including environmental disasters. Around 75% of the world’s refugees and others in need of international protection are hosted in low- or middle-income countries, which may be unable to cope with the amount of aid to be provided in their territories. This makes aid a multi-billion-dollar industry with huge data management challenges: registration, provision of several types of assistance, family tracing, persons with specific needs, unaccompanied children, and other issues that raise sensitive data protection challenges.

Protecting individuals' personal data is an integral part of protecting their life and dignity. In situations of humanitarian crises or conflict, data protection may acquire a critical life-saving significance due to persecution, ethnic grievances, or discrimination. For humanitarian organizations data protection is of fundamental importance.

The workshop provides a solid overview of the international refugee and forced displacement regime and its interaction with other human rights, as well as an introduction to the Data Protection Policy and principles and their practical application in processing the personal data of persons served by humanitarian organizations. Case studies and concrete “Data protection impact assessments” by participants will provide a practical dimension to the workshop.


Fri 2.2, Q704, 9:00 - 10:20
Workshop Intro, Claudia Roda
Before this lecture you should have studied the mandatory readings listed below.

Mon 5.2, Q709, 15:30 - 18:30
Data for Refugees Protection, Wellington Pereira Carneiro
Mandatory readings:

Tue 6.2, Q709, 13:45 - 16:45
Data for Refugees Protection, Wellington Pereira Carneiro

Thu. 8.2, workshop assignment due

Thu. 8.2, Q709, 17:00 - 19:00
Feedback session, Wellington Pereira Carneiro

Health Data Governance (February 9-13)
Eric Sutherland is a Senior Health Economist leading the OECD’s work in Digital Health, bringing together policy guidance for digital tools, integrated data, and responsible analytics including artificial intelligence. In that role, he is accountable for measuring and evolving the OECD’s Recommendation on Health Data Governance (2017) and supporting digital health policy that provides data protection (e.g., security and privacy) and timely access to quality data to optimize the use of data for information, insights, and impact among individuals, health workers, policy makers, researchers, and innovators. Prior to joining the OECD, Eric led the Secretariat for a pan-Canadian Health Data Strategy, bringing together experts and governmental leaders from across Canada to establish an integrated health data ecosystem that makes better use of data for health systems, public health, population health, research, and care. Eric authored the Pan-Canadian Health Data and Information Governance Framework and Toolkit and has taught courses in data science, health data governance, and privacy.

The objective of these workshops is to convey the complexity of enabling cross-border data collaborations and to identify levers that can enable better use of data for the public good. Two 3-hour workshops will explore the concept of health data governance using a practical example of enabling cross-border data collaboration to develop better treatments for people with rare diseases. The first session will explore the breadth of health data governance and its benefits for individuals, communities, innovators, and society, considering data supply chains, data models, and data stewardship. The second session will go deeper to consider the imperative to minimise data-related harms, whether from the use or the non-use of data, and the relationship with privacy, access, and consent. This will then be applied to a use case on the use of artificial intelligence to help people suffering from rare diseases, which participants will build on for their submission. A final 2-hour workshop will allow participants to present their approach to rare diseases and receive feedback.


Fri 9.2, Q704, 9:00 - 10:20
Workshop Intro, Claudia Roda
Before this lecture you should have studied the mandatory readings listed below.

Mon 12.2, Q709, 15:30 - 18:30
Health Data Governance, Eric Sutherland (PPT)
Mandatory readings:

Tue 13.2, Q709, 13:45 - 16:45
Health Data Governance, Eric Sutherland 

Sun 18.2, workshop assignment due

Fri 1.3, Q709 12:10 - 14:10
Feedback session, Eric Sutherland 

Tech and data tectonics: how geo-economics and geo-politics influence digital sovereignty and resilience (February 23 - 27)
An AUP alum, Yann Lechelle is co-founder and CEO of probabl.ai, a company that aims to democratize data science and machine learning. Over the past three decades, from Paris to Los Angeles, via Cambridge UK and New York, and back, Yann has been leveraging software to help unlock business opportunities in many fields: financial markets, cartoon animation, yield management, digital art on new media, mobile app discovery and monetization, and applied AI to deploy state-of-the-art voice recognition on the edge. Before founding probabl.ai, Yann developed a small cloud company into a credible regional alternative to the three major cloud hyperscalers, growing the team threefold to 600 employees while handling complex change management. Yann has played key roles as an entrepreneur, key shareholder, advisory board member, and angel investor. He is a strong advocate of the Paris tech ecosystem as a co-founding member of France Digitale and HUB AI Paris, as well as Entrepreneur-in-Residence at INSEAD Business School.

In these sessions, we'll explore the organization of digital technology on a large scale, focusing on a few major players who dominate the value chain. This dominance allows them to capture significant value, which could hinder innovation and our control over data processing and value creation, especially as artificial intelligence advances. We'll examine monopolies and oligopolies, antitrust efforts, and ways to achieve a more balanced playing field. We'll also consider the impact on society and democracy's future. Your assignment, should you choose to accept it, will be to conceive strategies, policies, and methods to enhance societal resilience and maximize technological benefits for humanity.

 


Fri 23.2, Q704, 9:00 - 10:20
Workshop Intro, Claudia Roda
Before this lecture you should have studied the mandatory readings listed below.

Mon 26.2, Q709, 15:30 - 18:30
Tech and data tectonics, Yann Lechelle
Mandatory readings:

Objective #1: read through, let the numbers speak for themselves and to you, make a mental map of who does what, where, at what scale. Map to national GDP.

Objective #2: consider counter measures, movements, techniques, approaches, philosophies

Advanced reading, optional:

Tue 27.2, Q709, 13:45 - 16:45
Tech and data tectonics, Yann Lechelle (Assignment)

Sun 2.3, workshop assignment due

Fri 22.3, Q709 12:10 - 14:10
Feedback session, Yann Lechelle 

Privacy and ethics in product design and operation (March 18 - 19)
An AUP alumna, Maria-Martina Yalamova is Vice President, Privacy Counsel at NBCUniversal Media, LLC. She is an international privacy lawyer experienced in the media and entertainment, and technology industries. She has advised leading Internet, technology and pharmaceutical companies on electronic communications, data privacy, security, e-commerce, online advertising, and intellectual property issues. Maria-Martina started her privacy career as a researcher and policy advisor at the London-based NGO, Privacy International, where she worked on privacy and human rights issues across the Asia Pacific region. 

The life-cycle of digital product design and operation requires the consideration of various data privacy and ethics issues that involve active collaboration between privacy and business legal teams, data scientists, and product and technology experts. This workshop will allow students to better understand this “privacy by design” process by working through real-life examples and solving problems as a team.


Mon 18.3, Online, 10:30 - 11:30
Workshop Intro, Claudia Roda
Before this lecture you should have studied the mandatory readings listed below.

Mon 18.3, Q709, 15:30 - 18:30
Privacy and ethics in product design and operation, Maria-Martina Yalamova (PPT)

Mandatory readings:

Recommended readings:

Tue 19.3, Q709, 13:45 - 16:45
Privacy and ethics in product design and operation, Maria-Martina Yalamova

Sun 30.3, workshop assignment due

Fri 19.4, Q709 (speaker online) 12:10 - 14:10
Feedback session, Maria-Martina Yalamova

AI and Privacy in Finance (March 22 - 26)
Pagona Tsormpatzoudi is Senior Vice President, Assistant General Counsel, Privacy, Artificial Intelligence & Data Protection at MasterCard. She is responsible for legal compliance, policy and regulatory engagement on privacy and data protection for MasterCard Cyber and Intelligence solutions which inform the overall MasterCard safety and security strategy. She often advises on new technologies, such as artificial intelligence and digital identity. Previously, she was a researcher at the Center for IT and IP Law (KU Leuven). 

Artificial Intelligence (AI) has revolutionized the way we do business, the way we work, the way we live our lives. Besides the many benefits, AI may also bring risks to fundamental human rights, including the rights to privacy and data protection. This workshop explores, through practical examples, regulatory issues on AI development and deployment.


Fri 22.3, Q704, 09:00 - 10:20
Workshop Intro, Claudia Roda
Before this lecture you should have studied the mandatory readings listed below. Please note that this workshop and the next one take place in the same week, so you have double the preparation to do.

Mon 25.3, Q709, 15:30 - 18:30
AI and Privacy in Finance, Pagona Tsormpatzoudi (PPT)

Mandatory readings:

Tue 26.3, Q709, 13:45 - 16:45
AI and Privacy in Finance, Pagona Tsormpatzoudi (Exercise)

Sun 14.4, workshop assignment due (Please note that the assignments for this and the next workshop are due on the same date)

Mon 22.4, Q709 (speaker online) 12:10 - 14:10
Feedback session, Pagona Tsormpatzoudi

How good is language technology for most of the world’s languages? (March 22 - 28)
An AUP alumna, Anna Kazantseva is a Research Officer at the National Research Council of Canada. Her immediate work and research interests are in applications of Natural Language Technology for the revitalization of Indigenous languages in Canada. She has a long-term interest in making digitized literature easy to access, to find and to read: automatic summarization of literature, topical segmentation of literature, information retrieval in the context of literature, document similarity, modelling narrative structure, and so on.  

Most of us think of language as written, typed, transmitted through various media, and recorded, with large repositories of data or books available. However, most of the world’s 7000 languages have recent or no writing systems, are mostly spoken rather than written, and are passed from person to person, not through media or books. They are weakening and disappearing at an alarming rate.

In this workshop we will take a quick look at the state of the art in language technology and examine the gap between so-called “well-resourced” languages like English or French and “under-resourced” languages like Mayan, Mohawk or Inuktitut. We will look at why most modern language technology cannot be applied to these smaller languages, from both ethical and technical points of view.

We will also look at some of the tools that can be used for building language technology when little or no data is available and we will build one or two small applications.


Fri 22.3, Q704, 09:00 - 10:20
Workshop Intro, Claudia Roda
Before this lecture you should have studied the mandatory readings listed below. Please note that this workshop and the previous one take place in the same week, so you have double the preparation to do.

Wed 27.3, Q609 (speaker online), 17:00 - 18:30
How good is language technology for most of the world’s languages? Anna Kazantseva

Mandatory readings:

Thu 28.3, Q709 (speaker online), 17:00 - 18:30
How good is language technology for most of the world’s languages? Anna Kazantseva

Sun 14.4, workshop assignment due (Please note that the assignments for this and the previous workshop are due on the same date)

Tue 23.4, Q709 (speaker online) 17:00 - 18:30
Feedback session, Anna Kazantseva

Workshop Conclusion and Portfolio Presentation

Fri 19.4, Q704, 9:00 - 10:20
Review of draft version of portfolio (sample structure), Claudia Roda

Fri 26.4, Q704, 9:00 - 11:30
Portfolio final draft, Claudia Roda

Tue 7.5, Q704, 9:00 - 11:00
Final exam period, portfolio presentation, Claudia Roda

Please also reserve the following dates

All class periods: T, F period 1 (9:00 - 10:30)

Mon 5.2, 15:30 - 18:30
Tue 6.2, 13:45 - 16:45
Mon 19.2, 15:30 - 18:30
Tue 20.2, 13:45 - 16:45