Web scraping using Python

When:	Th 05-06-2025 13:00 - 17:00
Where:	CIT Smitsborg 5431.0074 BlueGene, Nettelbosje 1 Groningen

Web scraping is valuable for collecting data from online sources, especially when no downloadable datasets are available. However, scraping can be confusing for beginners, and if done carelessly, it can be ineffective, unethical, or even legally problematic.

This workshop will introduce you to the basics of web scraping in a clear, practical way. You'll learn how to extract useful data from websites using Python, explore essential tools like BeautifulSoup and Selenium, and understand the ethical and legal considerations of responsible scraping.

Why should you attend?

Whether you're conducting a literature review, gathering course data, monitoring public policy updates, or collecting forum discussions, much of the information you need is available on websites, but not as downloadable files. Web scraping allows you to automate the process of extracting this data for analysis and research.

In this hands-on workshop, you'll:

Learn how to identify and extract useful data from websites using tools like BeautifulSoup and Selenium
Understand the structure of web pages and how to inspect elements effectively
Gain awareness of the ethical and legal boundaries of web scraping in research
See real examples of scraping
Leave with code templates and practical knowledge that you can adapt and apply to your own project

Who should attend?

This workshop is ideal for:

Undergraduate and graduate students working on research projects
PhD candidates gathering data for their theses or literature reviews
Faculty or research staff needing customized datasets for analysis
Administrative staff interested in automating data collection from websites

Requirements

No prior experience with web scraping is required. This session is designed for beginners and is especially relevant for students, researchers, and staff working with online information in academic contexts. Participants should be comfortable with basic Python; prior experience writing complete Python scripts is helpful but not required.

Content

This workshop will combine theory, discussion, and practical exercises to help you create solutions for real cases.

What is web scraping?
Identifying and extracting useful data from websites
Basics of HTML structure and web page elements
Using Python and tools like Requests, BeautifulSoup, or Selenium
Ethical and legal boundaries of scraping
If time allows: reverse engineering API calls to scrape dynamic content
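
To give a flavour of the "identify and extract" step listed above, here is a minimal sketch, not the workshop's own material. It uses only Python's standard-library html.parser as a stand-in so that it runs without installing anything; BeautifulSoup typically does the same job in fewer lines. The HTML fragment, the "course" class name, and the helper names are hypothetical, chosen purely for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <h1>Course list</h1>
  <ul>
    <li><a class="course" href="/courses/python">Python basics</a></li>
    <li><a class="course" href="/courses/scraping">Web scraping</a></li>
    <li><a href="/about">About us</a></li>
  </ul>
</body></html>
"""


class CourseLinkParser(HTMLParser):
    """Collect (text, href) pairs for <a class="course"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None  # set while we are inside a matching <a>
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("class") == "course":
            self._current_href = attr_map.get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


def extract_course_links(html):
    """Return a list of (link text, href) tuples found in the given HTML."""
    parser = CourseLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


course_links = extract_course_links(SAMPLE_HTML)
for title, href in course_links:
    print(f"{title}: {href}")
```

On a real site, the HTML would first be downloaded (for example with Requests) and the tag and class to look for would be found by inspecting the page in the browser.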

Preparation

Pick one of the options below, ranked from most difficult to easiest:

Install Python from the official website (python.org) and set up an IDE such as VS Code or PyCharm.
Install Anaconda (which includes the Spyder IDE) or Spyder on its own; either gives you Python with a user-friendly IDE and useful packages.
Use Google Colab directly from your browser, working from your Google Drive; no installation is needed.

Result

By the end of this workshop, you will have a working web scraping script that can extract data from a real website. You will also gain a solid understanding of how to navigate HTML structures, use essential scraping tools, and apply ethical best practices to ensure your data collection is responsible, effective, and reproducible.
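
One widely used technical courtesy when scraping is to check a site's robots.txt before fetching a page. The sketch below shows this with Python's standard-library urllib.robotparser. It is an illustration only: the robots.txt content, the user-agent string, and the example.org URLs are made up, and respecting robots.txt does not by itself settle the ethical or legal questions the workshop discusses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical name identifying our scraper to the site.
USER_AGENT = "research-scraper-example"


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser object."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = build_rules(SAMPLE_ROBOTS_TXT)

# Ask whether our user agent may fetch each URL under the parsed rules.
allowed_public = rules.can_fetch(USER_AGENT, "https://example.org/courses/")
allowed_private = rules.can_fetch(USER_AGENT, "https://example.org/private/grades")

# Requested pause between requests in seconds, or None if not specified.
delay = rules.crawl_delay(USER_AGENT)

print("public page allowed:", allowed_public)
print("private page allowed:", allowed_private)
print("crawl delay (s):", delay)
```

For a live site, the rules would be loaded from the site's own robots.txt (RobotFileParser offers set_url and read for this) instead of an inlined string.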

Trainer

This course will be given by Emin Tatar.

Enrollment and course fee

Attendance is free for UG staff and PhD students, but registration is required.

More information

For further questions, email the coordinator, Theo van Mourik (t.j.van.mourik@rug.nl).
