Webscraping using Python
When: Th 05-06-2025 13:00 - 17:00
Where: CIT Smitsborg 5431.0074 BlueGene, Nettelbosje 1 Groningen
Web scraping is valuable for collecting data from online sources, especially when no downloadable datasets are available. However, scraping can be confusing for beginners, and if done carelessly, it can be ineffective, unethical, or even legally problematic.
This workshop will introduce you to the basics of web scraping in a clear, practical way. You'll learn how to extract useful data from websites using Python, explore essential tools like BeautifulSoup and Selenium, and understand the ethical and legal considerations of responsible scraping.
Why should you attend?
Whether you're conducting a literature review, gathering course data, monitoring public policy updates, or collecting forum discussions, much of the information you need is available on websites, but not as downloadable files. Web scraping allows you to automate the process of extracting this data for analysis and research.
In this hands-on workshop, you'll:
- Learn how to identify and extract useful data from websites using tools like BeautifulSoup and Selenium
- Understand the structure of web pages and how to inspect elements effectively
- Gain awareness of the ethical and legal boundaries of web scraping in research
- See real examples of scraping
- Leave with code templates and practical knowledge that you can adjust and apply to your own project
Who should attend?
This workshop is ideal for:
- Undergraduate and graduate students working on research projects
- PhD candidates gathering data for their theses or literature reviews
- Faculty or research staff needing customized datasets for analysis
- Administrative staff interested in automating data collection from websites
Requirements
No prior experience with web scraping is required. This session is designed for beginners and is especially relevant for students, researchers, and staff working with online information in academic contexts. Participants should be comfortable with basic Python. Prior experience in writing Python scripts is helpful but not required.
Content
This workshop will combine theory, discussion, and practical exercises to help you create solutions for real cases.
- What is web scraping?
- Identify and extract useful data from websites
- Basics of HTML structure and web page elements
- Using Python and tools like Requests, BeautifulSoup, or Selenium
- Ethical and legal boundaries of scraping
- If time allows, reverse engineering API calls for dynamic content scraping
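To give a feel for what "identifying and extracting data from HTML structure" means in practice, here is a minimal sketch. It is not taken from the workshop materials: it uses Python's built-in html.parser module instead of BeautifulSoup so that it runs without installing anything, and the sample page, the LinkCollector class, and the extract_links function are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <h1>Course list</h1>
  <ul>
    <li><a href="/courses/python">Python basics</a></li>
    <li><a href="/courses/scraping">Web scraping</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs from every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None  # href of the <a> tag we are inside, if any
        self._text_parts = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears between <a href=...> and </a>.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


def extract_links(html):
    """Return a list of (text, href) tuples found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


for text, href in extract_links(SAMPLE_HTML):
    print(f"{text}: {href}")
```

Libraries such as BeautifulSoup wrap this kind of tag-by-tag traversal in a more convenient search interface, which is one reason they are commonly preferred for real scraping scripts.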
Preparation
- Install Python or set up a Python environment (for example from python.org, Anaconda, Spyder, or Google Colab)
- It is recommended that you create a virtual environment for this workshop (see https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/)
- It is recommended that you bring your laptop. If you don’t have a laptop, please contact t.j.van.mourik@rug.nl so we can arrange one for you.
Result
By the end of this workshop, you will have a working web scraping script that can extract data from a real website. You will also gain a solid understanding of how to navigate HTML structures, use essential scraping tools, and apply ethical best practices to ensure your data collection is responsible, effective, and reproducible.
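One common building block of responsible scraping is checking a site's robots.txt before requesting pages. The sketch below is an illustration only, not part of the workshop materials: it uses the standard library's urllib.robotparser, and the robots.txt content, the "workshop-bot" user agent, and the example.org URLs are assumptions invented for the example. Note that robots.txt is only one signal; a site's terms of use and applicable law still apply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# For a real site you would load <site>/robots.txt instead.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url, user_agent="workshop-bot"):
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# A path not covered by any Disallow rule is allowed.
print(may_fetch(parser, "https://example.org/courses/"))

# A path under /private/ is disallowed for all user agents.
print(may_fetch(parser, "https://example.org/private/grades"))

# The requested pause between requests, in seconds, if the site declares one.
print(parser.crawl_delay("workshop-bot"))
```

A scraping script could call a check like may_fetch before every request and sleep for the declared crawl delay between requests, which helps keep the load on the target site low.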
Trainer
This course will be given by Emin Tatar.
Enrollment and course fee
Attendance is free for UG staff and PhD students, but registration is required.
More information
For further questions, you can email the coordinator, Theo van Mourik (t.j.van.mourik@rug.nl).