WageningenX: Big Data for Agri-Food: Principles and Tools

As the big data era unfolds, sensor and information technologies are evolving quickly. As a result, science and business are generating enormous amounts of data. Ideally, this data provides valuable insights for decision-making in real time, but processing it the traditional way is no longer possible. Join Wageningen University & Research, the #1 university in Animal Sciences and Agriculture, and learn how best to handle big data sets. Enrol now.

Big Data for Agri-Food: Principles and Tools
6 weeks
6–10 hours per week
Progress at your own speed
Optional upgrade available

There is one session available:

After a course session ends, it will be archived.
Starts Feb 21

About this course

Demystify complex big data technologies
Compared to traditional data processing, modern tools can be complex to grasp. Before we can use these tools effectively, we need to know how to handle big data sets. You will understand how and why certain principles – such as immutability and pure functions – enable parallel data processing (‘divide and conquer’), which is necessary to manage big data.

During this course you will acquire the foundation of principles from which to move forward: how to recognise and put into practice the scalable solution that’s right for your situation.

The insights and tools of this course are independent of any programming language, but user-friendly examples are provided in Python, Hadoop HDFS and Apache Spark. Although these principles can also be applied to other sectors, we will use examples from the agri-food sector.
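
As a rough illustration of why immutability and pure functions make ‘divide and conquer’ possible, the sketch below runs a tiny map-reduce in plain Python. The milk-yield records, the helper names (`split`, `chunk_total`, `total_litres`) and the chunk size are hypothetical, made up for this illustration and not taken from the course material.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical records (cow_id, litres of milk per day), stored in a tuple
# so that no worker can modify the shared input.
RECORDS = (
    ("cow-1", 28.5), ("cow-2", 31.0), ("cow-3", 25.5),
    ("cow-4", 30.0), ("cow-5", 27.0), ("cow-6", 29.0),
)


def split(records, size):
    """Divide the records into chunks without modifying the input."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def chunk_total(chunk):
    """Pure function: the result depends only on the chunk passed in."""
    return sum(litres for _, litres in chunk)


def add(a, b):
    """Pure, associative combiner used in the reduce step."""
    return a + b


def total_litres(records, size=2, workers=3):
    chunks = split(records, size)
    # Map step: every chunk is processed independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_total, chunks))
    # Reduce step: combine the partial results into one answer.
    return reduce(add, partials, 0.0)


if __name__ == "__main__":
    print(total_litres(RECORDS))
```

Because `chunk_total` reads only its own chunk and changes nothing, the chunks can be handled in any order and by any worker. Threads are used here only to keep the sketch self-contained; on a cluster, each chunk would typically be sent to a different machine.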

Data collection and processing in an Agri-food context
Agri-food deserves special focus when it comes to choosing robust data management technologies due to its inherent variability and uncertainty. Wageningen University & Research’s knowledge domain is healthy food and the living environment. That makes our data experts especially well equipped to bridge the agri-food business on the one hand, and data science and artificial intelligence (AI) on the other.

Combining data from the latest sensing technologies with machine learning and deep learning methodologies allows us to unlock insights we didn’t have access to before. In the areas of smart farming and precision agriculture this allows us to:

  • Better manage dairy cattle by combining animal-level data on behaviour, health and feed with milk production and composition from milking machines.
  • Reduce the amount of fertilisers (nitrogen), pesticides (chemicals) and water used on crops by monitoring individual plants with a robot or drone.
  • More accurately predict crop yields on a continental scale by combining current with historic data on soil, weather patterns and crop yields.

In short, this course’s foundational knowledge and skills for big data prepare you for the next step: to find more effective and scalable solutions for smarter, innovative insights.

For whom?
You are a manager or researcher with a big data set on your hands, perhaps considering investing in big data tools. You’ve done some programming before, but your skills are a bit rusty. You want to learn how to effectively and efficiently manage very large datasets. This course will enable you to see and evaluate opportunities for the application of big data technologies within your domain. Enrol now.

This course has been partially supported by the European Union Horizon 2020 Research and Innovation program (Grant #810 775, Dragon).

At a glance

  • Institution: WageningenX
  • Subject: Data Analysis & Statistics
  • Level: Intermediate
  • Prerequisites:

    A university education and/or working knowledge of math and science and, of course, being a computer science enthusiast will help a lot!

  • Language: English
  • Video Transcript: English
  • Associated skills: Deep Learning, Precision Agriculture, Written Composition, Scalability, Animal Science, Innovation, Pesticides, Artificial Intelligence, Apache Hadoop, Research, Decision Making, Information Technology, Data Collection, Management, Machine Learning, Data Science, MapReduce, Foods, Computer Science, Immutability, Data Processing, Soil Science, Apache Spark, Agriculture, Data Management, Big Data, Python (Programming Language), Fertilizers, Hadoop Distributed File System (HDFS)

What you'll learn

  • Recognize big data characteristics (volume, velocity, variety, veracity)
  • The difference between scaling up and scaling out
  • Big data principles: immutability and pure functions
  • Processing big data with map-reduce, using clusters
  • Understand technologies: distributed file systems, Hadoop
  • How dataframes and wrapper technology (Apache Spark) make life easier
  • The big data workflow and pipeline
  • How data is organized in datalakes, using lazy evaluation
  • Develop insight into how to apply this to your own case

  • Module 1: Big data definition and characteristics
    In module 1, you will learn how to recognize the characteristics of a big data problem in agriculture, to see where its biggest challenge lies. Should the solution focus on size, speed, various formats or uncertainty of data? Should you scale up or scale out?

  • Module 2: Big data principles: what are they and why do we need them
    In module 2, you'll learn the principles that are required for scaling out: immutability and pure functions, and map-reduce. What are these and why do we need them?

  • Module 3: Bring those principles to practice
    Module 3 shows you how to bring those principles into practice. You will learn what a cluster is, and how a distributed file system in a client-server architecture works, with Hadoop. You will understand why such a system is indeed scalable.

  • Module 4: Big data technologies that make implementation so much easier
    Module 4 goes further into the application of big data technology, the “big data stack of technologies”. The main message here is that if you know what you want to do, these technologies can take the work out of your hands. For example, you will see Apache Spark, a big data technology platform, that applies map-reduce for you.

  • Module 5: The big data workflow and pipeline; the how and why of datalakes
    Module 5 dives deeper into the data. You'll learn about datalakes and why a datalake is different from a traditional database. You'll understand what a big data workflow looks like and what a pipeline is.
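
The lazy evaluation mentioned above in connection with datalakes can be sketched with plain Python generators: the pipeline is described first, and no data is read until a result is actually requested. The soil-moisture readings, the function names and the threshold are hypothetical and serve only as an illustration; this is not the course's own code, and platforms such as Apache Spark apply the same idea at a much larger scale.

```python
# Hypothetical raw lines as they might sit in a datalake: "field_id,moisture".
RAW_LINES = ["field-a,0.31", "field-b,0.12", "field-c,0.18", "field-d,0.40"]


def read_readings(lines):
    """Parse lines one at a time; nothing is parsed until a value is requested."""
    for line in lines:
        field_id, moisture = line.split(",")
        yield field_id, float(moisture)


def dry_fields(readings, threshold):
    """Keep only the fields whose moisture is below the threshold."""
    for field_id, moisture in readings:
        if moisture < threshold:
            yield field_id


def build_pipeline(lines, threshold):
    """Describe the workflow; calling this does not read any data yet."""
    return dry_fields(read_readings(lines), threshold)


if __name__ == "__main__":
    pipeline = build_pipeline(RAW_LINES, threshold=0.20)  # no work done yet
    print(list(pipeline))  # the data is read and filtered only here
```

Separating the description of a pipeline from its execution means the raw data can stay untouched in the datalake, and only the records needed for a particular question are processed.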

Frequently Asked Questions

Do you have questions about the MOOCs (Massive Open Online Courses) and/or related online programmes of Wageningen University & Research? To help you find answers, we have created a list of frequently asked questions about enrolling in, participating in and finishing a MOOC.
