Week 1: Crowdsourcing for High-quality Data Collection and The ImageNet Story
Artificial Intelligence is at the center of many recent advances in areas such as transportation and finance. One reason is that, over the past decade, we have developed methods to harness human intelligence at scale.
This week we will introduce and discuss the crowdsourcing paradigm and the importance of high-quality data.
Topics we will cover this week:
- The intuition behind crowdsourcing
- The role of crowdsourcing platforms
- The need for high-quality data for AI models
- What is ImageNet, the gap it filled, and how it was built
Week 2: Quality Control Mechanisms for Crowdsourcing
The quality of crowdsourced human input is one of the most crucial factors affecting the overall value of the paradigm. This week we will discuss the challenges that make quality difficult to guarantee.
Topics we will cover this week:
- Workers' motives and behaviors
- Quality control mechanisms in crowdsourcing
- Incentives in crowdsourcing (such as gamification)
- Cognitive aspects and psychometric methods
Week 3: Factors Affecting Quality in Crowdsourcing
Researchers and practitioners in human computation and crowdsourcing have identified several factors that affect the quality of crowdsourced data. This week we will discuss some of the recent work on these factors.
Topics we will cover this week:
- Tradeoff between task pricing and quality of output
- The role of workers' demographics, qualifications and skills
- The importance of task clarity and work environments
- The concepts of task packaging, task framing and task priming
Week 4: Human Input for Data Creation and Model Evaluation in AI
This week we will cover the importance of data collection, annotation, and engineering.
Topics we will cover this week:
- The importance of data collection
- Data generation
- The role of crowdsourcing in advanced machine learning
- Taxonomy of microtasks
Week 5: Reducing Worker Effort: Active Learning
This week we will explore the challenges of collecting large-scale data and how to overcome them.
Topics we will cover this week:
- Approaches to reducing worker effort
- The implications of reducing labeling effort
- The key idea of active learning
- Query strategies for selecting informative instances
Week 6: Interpreting, Evaluating, and Debugging ML Models
This week we will discuss strategies for evaluating, debugging, and interpreting machine learning models.
Topics we will cover this week:
- The notion of model interpretability
- The role of humans in the interpretability process
- Debugging ML pipelines and related challenges