“Intro to Python for Working with Text” workshop series

By Filipa Calado

Pratt Institute School of Information

Welcome to the series!

As one of the most popular, versatile, and beginner-friendly programming languages, Python can be used for a variety of tasks, from analyzing data to building websites.

This five-part workshop series introduces participants to the Python programming language for working with text. The first workshop begins with core concepts, such as data types, variables, functions, loops, and conditional statements. The second and third workshops move to data gathering and processing with web scraping, APIs, and text cleaning methods. The fourth and fifth workshops explore text analysis with Natural Language Processing (NLP) and deep learning tools. See a more detailed description of each workshop below.

Workshop overviews

Workshop 1: “Introduction to Python Fundamentals”

  • Offers a basic introduction to core concepts in Python programming, grounded in a critical awareness of data and of what happens to data at various levels of transformation and abstraction.
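
As a rough illustration of the core concepts this workshop covers (variables, data types, functions, loops, and conditional statements), here is a minimal sketch. The sample sentence and the `count_long_words` function are invented for this example and are not taken from the workshop materials.

```python
# A sample string (hypothetical text, not from the workshop materials).
sentence = "Python can be used for a variety of tasks"


def count_long_words(text, min_length=5):
    """Count the words in `text` that have at least `min_length` characters."""
    count = 0                        # an integer variable
    for word in text.split():        # loop over a list of strings
        if len(word) >= min_length:  # conditional statement
            count += 1
    return count


long_words = count_long_words(sentence)
print(long_words)
```

Changing `min_length` when calling the function shows how one piece of code can be reused with different settings.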

Workshop 2: “Python for Web Scraping and APIs”

  • Introduces the ethics, legality, and programmatic methods of extracting data from the web. Builds on core concepts from the introductory session (such as loops and conditional statements) and adds new concepts in object-oriented programming and working with Python libraries. Participants practice scraping metadata from current “anti-trans” bills in the USA.

  • libraries: requests, bs4, and pandas
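
To give a sense of how object-oriented programming and web scraping fit together, here is a minimal sketch that pulls link text and addresses out of a small piece of HTML. It is only a stand-in: the workshop itself uses requests and bs4, whereas this sketch swaps in the standard library's `html.parser` so that it runs without installing anything or contacting a website. The HTML snippet and the `LinkCollector` class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li><a href="/bills/1">Example Bill One</a></li>
  <li><a href="/bills/2">Example Bill Two</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (title, href) pairs from every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []            # results are stored on the object
        self._current_href = None  # set only while inside an <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        # Record text only when it appears inside a link.
        if self._current_href is not None and data.strip():
            self.links.append((data.strip(), self._current_href))

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
for title, href in collector.links:
    print(title, href)
```

The resulting list of pairs is the kind of structure that could then be loaded into a pandas DataFrame for further work.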

Workshop 3: “Python for Text Cleaning”

  • Experiments with approaches for wrangling text data into formats for analysis, with emphasis on removing unwanted elements that may skew analysis. While building on skills for writing loops and conditional statements and working with external libraries, participants will learn to write functions and scripts for running customized text cleaning processes.

  • libraries: pandas, spaCy
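
A customized cleaning function of the kind described above might look something like the following sketch. It relies only on the standard library's `re` module rather than pandas or spaCy, and the stop-word list, the sample string, and the `clean_text` function are all hypothetical choices made for illustration.

```python
import re

# Hypothetical stop-word list; a real project would likely use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to"}


def clean_text(text):
    """Return a list of lowercase words with tags, punctuation, digits, and stop words removed."""
    text = re.sub(r"<[^>]+>", " ", text)           # strip leftover HTML tags
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep only letters and whitespace
    return [word for word in text.split() if word not in STOP_WORDS]


tokens = clean_text("<p>The Text of the Bill, Section 2.</p>")
print(tokens)
```

Which elements count as "unwanted" depends on the research question; removing digits, for example, would be a poor choice if section numbers matter to the analysis.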

Workshop 4: “Python for Text Analysis”

  • Explores methods for finding and analyzing textual patterns through popular tasks in Natural Language Processing. Participants practice writing code to annotate and extract text according to specific features from current “anti-trans” bills in the USA.

  • libraries: spaCy

Workshop 5: “Python for Text Generation (with Machine Learning)”

  • Using the anti-trans bills data prepared in previous workshops, participants practice fine-tuning a small text generation model and learn how to use machine learning for research.

  • libraries: transformers

Upcoming Digital Scholarship Workshops

See more workshop offerings (including on Python) at the Princeton University Library. We have upcoming workshops on working with data, digital publishing, and more.

Want to talk about Python or another digital project or tool? Sign up for a consultation with Digital Scholarship at Princeton.

For other workshops and reference materials on computational methods at Princeton University, check out Research Computing.

Sources

This curriculum is inspired by the Graduate Center Digital Initiatives Digital Humanities Research Institute Python workshop.

The opening challenge takes text from the Feminist Data Manifest-No by M. Cifor, P. Garcia, et al.

For more instruction with Python, please see these books:

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.