lists
The list data type is perhaps the most common data type for working with text. Textual data tends to be saved as a list, or a collection of separate strings, each string consisting of a sequence of characters that represents a single word. It is important to remember that while humans might read each string as a word, the computer does not ascribe any semantic value to it, only processing it as a sequence of characters.
list indexing
List indexing enables programmers to access specific items in a list. It is super useful for working with large lists, such as a full text, or even a collection of texts. It works by accessing items according to their position within a list.
By “position,” I mean where that item is located in a list. The moment a list is created in Python, it comes with an implicit index, where each item is assigned a numerical position based on its location. The first item is assigned to the position 0, the second item is assigned 1, the third item is assigned 2, the fourth item is assigned 3, and so on.
The syntax for list indexing is to declare the list name followed by brackets that contain the index. The first item in a list would be accessed with the index 0.
breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango',
'coffee']
breakfast[0]
'granola'
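To see the implicit index for yourself, one option is Python's built-in enumerate() function, which pairs each item with its position. This is only a sketch: the for loop and enumerate() have not been introduced in this lesson, so treat it as a preview rather than something to memorize.

```python
breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango', 'coffee']

# enumerate() pairs each item with its position, counting from 0
positions = list(enumerate(breakfast))

# print each position next to the item stored there
for position, item in positions:
    print(position, item)
```

The first line printed pairs 0 with 'granola', and the last pairs 4 with 'coffee'.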
Advanced question: Why does list indexing begin with 0?
The answer is a bit complicated, and gets into technical details of how computer memory works in programming. Put simply, it has to do with memory locations being based on offsets. Because the first item of a list is also the origin location of the list, its offset is 0. In other words, to go to the first item in the list, there is no movement (or offset) from the origin location. The second item of the list requires one movement from the origin, and is therefore assigned to the position 1.
The third item in a list would be accessed with the index 2.
breakfast[2]
'strawberries'
Python offers a specific way of accessing the last item of a list, which can be really useful for taking a peek at the end of a list. The last item in a list is accessed with the index -1. This might seem a little strange, but if you imagine the list items forming a circle which goes clockwise, then going backward (counterclockwise) one step from the first item would take you to the last item. That backward step is represented with -1.
breakfast[-1]
'coffee'
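As a quick check on the idea of counting backward, the sketch below compares -1 with the position you would get by counting forward. It uses the built-in len() function, which has not been introduced in this section; the variable names are only for illustration.

```python
breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango', 'coffee']

# len() counts the items in the list (5 here), so the last position is 5 - 1 = 4
last_position = len(breakfast) - 1

last_by_counting = breakfast[last_position]  # counting forward from 0
last_by_negative = breakfast[-1]             # one step backward from the start
second_to_last = breakfast[-2]               # two steps backward

print(last_by_counting, last_by_negative, second_to_last)
```

Both ways of asking for the last item give 'coffee', and -2 gives 'mango'.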
list slicing
Like list indexing, list slicing uses a list’s implicit index to access items. Unlike list indexing, however, slicing enables someone to access a range, or a “slice,” of items from a list. This capability becomes convenient when working with large lists. For example, we might take a look into a section of a list without printing the entire list, or we might grab a portion of that list to create a new list.
To get a slice, you need to enter the starting and ending point of the section which you want to grab. In between these values, insert a colon, which represents every item that occurs in between the start and end points. A list slice therefore looks like the following:
breakfast[2:5]
['strawberries', 'mango', 'coffee']
The syntax for list slicing is a bit wonky, but like list indexing, it is just a matter of getting used to the particulars of Python’s way of counting by starting with zero, as well as the rules for inclusive and exclusive values.
Here’s the tricky part: the starting point is always inclusive, and the ending point is always exclusive. This means that the slice will take everything starting at position 2, up to, but excluding, the item in position 5. (Our five-item list has no position 5, so the slice simply stops at the end of the list.) This rule for list slicing can cause some confusion, and it’s best just to memorize it.
One more important thing to know is the colon. In its position between the start and end points, it signifies everything in between. But if there is nothing before or after the colon, then the slice extends all the way to the beginning or end of the list. For example, the syntax [1:] will take everything starting at index 1 (or the second item) until the end of the list. Similarly, [:4] will take everything from the start up to, but excluding, index 4, which means the first four items.
This trick with the colon can be really nifty for grabbing the first 10 or first 100 items from a list, allowing you to take peeks into the list items without printing the entire list. We will use this trick a lot in future workshops.
breakfast[3:]
['mango', 'coffee']
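The inclusive start and exclusive end can be checked with a few more slices. This is a minimal sketch using the same five-item list; the variable names (middle, first_four, from_one) are made up for illustration.

```python
breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango', 'coffee']

# start is inclusive, end is exclusive: positions 1 and 2, but not 3
middle = breakfast[1:3]

# nothing before the colon: start from the beginning (positions 0, 1, 2, 3)
first_four = breakfast[:4]

# nothing after the colon: continue to the end of the list
from_one = breakfast[1:]

print(middle)
print(first_four)
print(from_one)

# slicing builds a new list; the original still has all five items
print(len(breakfast))
```

A handy consequence of the exclusive end point is that the number of items in a slice is the end minus the start: [1:3] gives 3 - 1 = 2 items.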
list methods
Like strings, lists also have specific methods that only work with list type data. List methods enable one to do various things to data contained in lists.
Some of the more popular methods have to do with creating and modifying items in a list. One list method, append(), adds items to a list. For example, to a list of breakfast items, one can add additional yummy items using append():
breakfast = ['bagel', 'cream cheese', 'coffee']
breakfast.append('orange juice')
breakfast
['bagel', 'cream cheese', 'coffee', 'orange juice']
Other methods, remove() and pop(), take items out of a list. While remove() takes out a specific item (declared within the parentheses), pop() with empty parentheses takes out the last item in a list, “popping” it off, so to speak.
breakfast.remove('bagel')
breakfast
['cream cheese', 'coffee', 'orange juice']
breakfast.pop()
breakfast
['cream cheese', 'coffee']
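Two details about these methods are easy to miss: remove() only takes out the first matching item, and pop() hands back the item it removed. The sketch below illustrates both with a made-up list that contains 'coffee' twice. It also passes a position to pop(), which goes a step beyond what is described above.

```python
breakfast = ['bagel', 'coffee', 'cream cheese', 'coffee']

# remove() deletes only the first 'coffee' it finds
breakfast.remove('coffee')
after_remove = list(breakfast)  # a copy, so we can look at this stage later

# pop() removes the last item and also returns it
last_item = breakfast.pop()

# pop() can also take a position; pop(0) removes and returns the first item
first_item = breakfast.pop(0)

print(after_remove)
print(last_item, first_item)
print(breakfast)
```

After these three steps, only 'cream cheese' is left in the list.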
Besides adding and deleting items, we can also sort items in a list. For these, we use the sort() and reverse() methods.
To show the power of these methods, we’ll work with a larger list, like a list of words from this workshop’s description. First, we will copy and paste the workshop description as a single string. Then, we will split the string, saving the output to a new variable. Once the data is in list format, we will run the sort() and reverse() methods.
Notice the syntax wrapping the string here: triple quotation marks (instead of single ones). Triple quotes enable our string to span multiple lines and to include other quotation marks without breaking the string format.
text = '''
1. We refuse to operate under the assumption that risk and harm
associated with data practices can be bounded to mean the same
thing for everyone, everywhere, at every time. We commit to
acknowledging how historical and systemic patterns of violence
and exploitation produce differential vulnerabilities for
communities.
2. We refuse to be disciplined by data, devices, and practices
that seek to shape and normalize racialized, gendered, and
differently-abled bodies in ways that make us available to be
tracked, monitored, and surveilled. We commit to taking back
control over the ways we behave, live, and engage with data and
its technologies.
3. We refuse the use of data about people in perpetuity. We
commit to embracing agency and working with intentionality,
preparing bodies or corpuses of data to be laid to rest when they
are not being used in service to the people about whom they were
created.
4. We refuse to understand data as disembodied and thereby
dehumanized and departicularized. We commit to understanding
data as always and variously attached to bodies; we vow to
interrogate the biopolitical implications of data with a keen
eye to gender, race, sexuality, class, disability, nationality,
and other forms of embodied difference.
5. We refuse any code of phony “ethics” and false proclamations of
transparency that are wielded as cover, as tools of power, as forms
for escape that let the people who create systems off the hook from
accountability or responsibility. We commit to a feminist data
ethics that explicitly seeks equity and demands justice by helping
us understand and shift how power works.'''
# when splitting a string, remember that we have to save the
# results to a new variable
text_split = text.split()
# with the sort method, we don't have to save the results to
# a new variable. It sorts the existing list in place
# (and returns None, so nothing is printed).
text_split.sort()
Notice a few things:
I lied. Well, I omitted something. I included a string method when I said I was only going to show you list methods. Can you spot the rogue string method? Hint: you’ve seen it before, and it is attached to a string type of object.
Some methods return new objects, while other methods simply change existing ones. The sort() method is one that changes the existing object. Unlike the split() method, we don’t have to save the results of sort() to a new variable. sort(), and other methods like reverse(), change the item “in place,” meaning that the existing list is modified. split(), by contrast, creates a new object, which needs to be saved. This concept of changing things in place is a bit advanced, so there is no rush to grasp it now.
The resulting list is sorted by character order rather than strict dictionary order, which means that numbers, most punctuation, and capital letters appear before lowercase letters. This explains why the results may not seem alphabetically sorted at first.
See the little hashtag # symbol? That’s how you write a comment in Python. Putting the # at the start of the line tells Python not to process that line’s contents, because it’s meant for human readers.
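The difference between changing a list in place and creating a new object can be sketched as follows. The sentence being split is a short excerpt from the text above, and the built-in sorted() function is included only for contrast; it is not covered in this lesson.

```python
# split() returns a new list, so we save it to a variable
words = 'We refuse to be disciplined by data'.split()

# sorted() builds a new sorted list and leaves the original order alone
new_list = sorted(words)
unchanged = list(words)  # copy of the original order, for comparison

# sort() changes the list in place and returns None
result = words.sort()

print(result)     # None
print(words)      # 'We' comes first because capital letters sort before lowercase
print(unchanged)  # still in sentence order

# reverse() also works in place, flipping the current order
words.reverse()
print(words)
```

Saving the result of sort() to a variable, as in the line with result above, is a common slip: the variable ends up holding None rather than the sorted list.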
Remember that these list methods will not work on string type data. Try running a few of them on a string, just to see what happens.
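If you would like to see the outcome without your notebook stopping on an error, here is one possible sketch. It uses try and except, which have not been covered in this lesson, to catch the error and print its message instead.

```python
word = 'coffee'

try:
    # append() is a list method, so calling it on a string fails
    word.append('!')
except AttributeError as error:
    # Python reports that strings have no attribute called 'append'
    message = str(error)
    print(message)
```

The same kind of error appears if you try sort(), reverse(), remove(), or pop() on a string.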
The new concepts you learned in this lesson (list methods, indexing, and slicing) will be crucial tools for working with lists in future lessons and workshops. You will have plenty of opportunities to revisit the concepts over and over again until they start to seem more intuitive.
In the next lessons, we will introduce skills for working more programmatically with lists.