lists#

The list data type is perhaps the most common data type for working with text. Textual data tends to be saved as a list, or a collection of separate strings, each string consisting of a sequence of characters that represents a single word.

It is important to remember that while humans might read each string as a word, the computer does not ascribe any semantic value to it, only processing it as a sequence of characters.

list indexing#

List indexing enables programmers to access specific items in a list. It is super useful for working with large lists, such as a full text, or even a collection of texts. It works by accessing items according to their position within a list.

By “position,” I mean where that item is located in a list. The moment a list is created in Python, it comes with an implicit index, where each item is assigned a numerical position based on its location. The first item is assigned the position 0, the second item is assigned 1, the third item is assigned 2, the fourth item is assigned 3, and so on.

The syntax for list indexing is to declare the list name followed by brackets that contain the index. The first item in a list would be accessed with the index 0.

breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango',
'coffee']

breakfast[0]
'granola'
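To make the implicit index visible, here is a minimal sketch that uses Python's built-in enumerate() to pair each item with its position. The variable names first_item and third_item are just for illustration.

```python
breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango',
             'coffee']

# enumerate() pairs each item with its position, counting from 0
for position, item in enumerate(breakfast):
    print(position, item)

# The same positions can be used inside the brackets
first_item = breakfast[0]   # 'granola'
third_item = breakfast[2]   # 'strawberries'
```

Running this prints each position next to its item, from 0 granola down to 4 coffee.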

Advanced question: Why does list indexing begin with 0?

The answer is a bit complicated, and gets into technical details of how computer memory works in programming. Put simply, it has to do with memory locations being based on offsets. Because the first item of a list is also the origin location of the list, its offset is 0. In other words, to go to the first item in the list, there is no movement (or offset) from the current, or origin, location. The second item of the list requires 1 movement from the origin location, and is therefore assigned to the location 1.

The third item in a list would be accessed with the index 2.

breakfast[2]
'strawberries'

Python offers a specific way of accessing the last item of a list, which can be really useful for taking a peek at the end of a list. The last item in a list is accessed with the index -1. This might seem a little strange, but if you imagine the list items forming a circle which goes clockwise, then going backward (counterclockwise) one step would take you to the last item. That backward step is represented with -1.

breakfast[-1]
'coffee'
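As a quick check on the idea of counting backward, the sketch below compares the index -1 with the position you would get by counting forward from 0. The names last_index, same_item, and second_to_last are made up for this example.

```python
breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango',
             'coffee']

# Counting forward, the last position is the length of the list minus 1
last_index = len(breakfast) - 1      # 4

last_item = breakfast[-1]            # 'coffee'
same_item = breakfast[last_index]    # also 'coffee'

# Stepping backward twice lands on the second-to-last item
second_to_last = breakfast[-2]       # 'mango'
```

The advantage of -1 is that you do not need to know how long the list is.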

list slicing#

Like list indexing, list slicing uses a list’s implicit index to access items. Unlike list indexing, however, slicing enables someone to access a range, or a “slice,” of items from a list. This capability becomes convenient when working with large lists. For example, we might take a look at a section of a list without printing the entire list, or we might grab a portion of that list to create a new list.

To get a slice, you need to enter the starting and ending point of the section which you want to grab. In between these values, insert a colon, which represents every item that occurs in between the start and end points. A list slice therefore looks like the following:

breakfast[2:5]
['strawberries', 'mango', 'coffee']

The syntax for list slicing is a bit wonky, but like list indexing, it is just a matter of getting used to the particulars of Python’s way of counting by starting with zero, as well as the rules for inclusive and exclusive values.

Here’s the tricky part: the starting point is always inclusive, and the ending point is always exclusive. This means that the slice will take everything starting at position 2, up until, but excluding, the item in position 5. This rule for list slicing can cause some confusion, and it’s best just to memorize it.
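One way to get used to the inclusive start and exclusive end is to look at the positions a slice actually covers. This is a small sketch using the same breakfast list; the names start, end, section, and shorter are only for illustration.

```python
breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango',
             'coffee']

start, end = 2, 5

# Takes positions 2, 3, and 4. Position 5 is excluded.
section = breakfast[start:end]   # ['strawberries', 'mango', 'coffee']

# Moving the end point back by one drops 'coffee' (position 4)
shorter = breakfast[2:4]         # ['strawberries', 'mango']

# When both points fall within the list, the slice holds end - start items
print(len(section))              # 3
```

A handy side effect of the rule: subtracting the start from the end tells you how many items you will get.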

One more important thing to know is the colon. In its position between the start and end point, it signifies everything in between. But if there is nothing before or after the colon, then the colon represents everything from the beginning or until the end of the list. For example, the syntax [1:] will take everything starting at index 1 (or the second item) until the end of the list. Similarly, [:4] will take everything from the start up to, but not including, index position 4 (that is, the first four items).

This trick with the colon can be really nifty for grabbing the first 10 or first 100 items from a list, allowing you to take peeks into the list items without printing the entire list. We will use this trick a lot in future workshops.

breakfast[3:]
['mango', 'coffee']
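Here is a short sketch of the open-ended slices described above, including the trick of peeking at the first 10 items of a longer list. The list called numbers is a hypothetical stand-in for a large list and is not part of the workshop data.

```python
breakfast = ['granola', 'cashew yogurt', 'strawberries', 'mango',
             'coffee']

# Nothing after the colon: go all the way to the end
from_second = breakfast[1:]   # ['cashew yogurt', 'strawberries', 'mango', 'coffee']

# Nothing before the colon: start from the beginning, stop before index 4
first_four = breakfast[:4]    # ['granola', 'cashew yogurt', 'strawberries', 'mango']

# A hypothetical longer list, just to show the "peek" trick
numbers = list(range(100))
peek = numbers[:10]           # the first 10 items: 0 through 9
print(peek)
```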

list methods#

Like strings, lists also have specific methods that only work with list type data. List methods enable one to do various things to data contained in lists.

Some of the more popular methods have to do with creating and modifying items in a list. One list method, append(), adds items to a list. For example, to a list of breakfast items, one can add additional yummy items using append():

breakfast = ['bagel', 'cream cheese', 'coffee']
breakfast.append('orange juice') 
breakfast 
['bagel', 'cream cheese', 'coffee', 'orange juice']

Other methods, remove() and pop(), take items out of a list. While remove() takes out a specific item (declared within the parentheses), pop() with empty parentheses takes out the last item in a list, “popping” it off, so to speak.

breakfast.remove('bagel') 
breakfast
['cream cheese', 'coffee', 'orange juice']
breakfast.pop()
breakfast
['cream cheese', 'coffee']
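Two details are easy to miss here, and the sketch below tries to show them: remove() only takes out the first matching item, and pop() hands back the item it removed. The list has a second 'coffee' added purely for illustration, and the names last and first are made up.

```python
# 'coffee' appears twice on purpose, to show what remove() does
breakfast = ['bagel', 'cream cheese', 'coffee', 'orange juice', 'coffee']

# remove() takes out only the first item that matches
breakfast.remove('coffee')
# breakfast is now ['bagel', 'cream cheese', 'orange juice', 'coffee']

# pop() removes the last item and gives it back, so it can be saved
last = breakfast.pop()      # 'coffee'

# pop() also accepts a position, here the first item
first = breakfast.pop(0)    # 'bagel'

print(breakfast)            # ['cream cheese', 'orange juice']
```

Saving the result of pop() is optional, but it is useful when you want to do something with the item you took out.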

Besides adding and deleting items, we can also reorder the items in a list. For this, we use the sort() method, which puts items in order, and the reverse() method, which flips the current order.

To show the power of these methods, we’ll work with a larger list, like a list of words from this workshop’s description. First, we will copy and paste the workshop description as a single string. Then, we will split the string, saving the output to a new variable. Once the data is in list format, we will run the sort() and reverse() methods.

Notice the syntax wrapping the string here: triple quotation marks instead of single ones. The triple quotes enable our string to span multiple lines and include other quotation marks without breaking the string format.

text = ''' 
  1. We refuse to operate under the assumption that risk and harm
  associated with data practices can be bounded to  mean the same
  thing for everyone, everywhere, at every time. We commit to
  acknowledging how historical and systemic patterns of violence 
  and exploitation produce differential vulnerabilities for 
  communities.
  2. We refuse to be disciplined by data, devices, and practices 
  that seek to shape and normalize racialized, gendered, and 
  differently-abled bodies in ways that make us available to be 
  tracked, monitored, and surveilled. We commit to taking back 
  control over the ways we behave, live, and engage with data and 
  its technologies.
  3. We refuse the use of data about people in perpetuity. We 
  commit to embracing agency and working with intentionality, 
  preparing bodies or corpuses of data to be laid to rest when they 
  are not being used in service to the people about whom they were 
  created.
  4. We refuse to understand data as disembodied and thereby 
  dehumanized and departicularized. We commit to understanding 
  data as always and variously attached to bodies; we vow to 
  interrogate the biopolitical implications of data with a keen 
  eye to gender, race, sexuality, class, disability, nationality, 
  and other forms of embodied difference.
  5. We refuse any code of phony “ethics” and false proclamations of 
  transparency that are wielded as cover, as tools of power, as forms 
  for escape that let the people who create systems off the hook from 
  accountability or responsibility. We commit to a feminist data 
  ethics that explicitly seeks equity and demands justice by helping 
  us understand and shift how power works.'''
  
# when splitting a string, remember that we have to save the
# results to a new variable
text_split = text.split() 

# with the sort method, we don't have to save the results to
# a new variable. It changes the existing list in place
# and does not return or print anything.
text_split.sort()

Notice a couple of things:

  1. I lied. Well, I omitted something. I included a string method when I said I was only going to show you list methods. Can you spot the rogue string method? Hint: you’ve seen it before, and it is attached to a string type of object.

  2. Some methods return new objects, while other methods simply change existing ones. The sort() method is one that changes the existing object. Unlike the split() method, we don’t have to save the results of sort() to a new variable. sort(), and other methods like reverse(), change the list “in place,” meaning that the existing list is modified. split(), by contrast, creates a new object, which needs to be saved. This concept of changing things in place is a bit advanced, so there is no rush to grasp it now.

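  3. The resulting list is ordered by each character’s numeric code rather than strictly alphabetically, which means that numbers and most common punctuation will appear before any letters, and uppercase letters will appear before lowercase ones. This will explain why the results may not seem alphabetically sorted at first.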

  4. See the little hashtag # symbol? That’s how you write a comment in Python. Putting the # at the start of a line tells Python not to process that line’s contents, because it is meant for human readers.
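The points above about changing a list in place and about sort order can be checked with a small sketch. It uses one sentence from the text rather than the whole thing, and the variable names words, result, in_order, and flipped are made up for illustration.

```python
sentence = '2. We refuse to be disciplined by data, devices, and practices'

# split() is a string method: it returns a new list, which we save
words = sentence.split()

# sort() is a list method: it changes words in place and returns None
result = words.sort()
print(result)    # None

# Numbers come first, then uppercase words, then lowercase words
in_order = list(words)
print(in_order)
# ['2.', 'We', 'and', 'be', 'by', 'data,', 'devices,', 'disciplined',
#  'practices', 'refuse', 'to']

# reverse() also works in place: it flips whatever order the list is in
words.reverse()
flipped = list(words)
print(flipped[0], flipped[-1])   # to 2.
```

Note that reverse() does not sort anything by itself. It only flips the current order, so it gives reverse alphabetical order here because the list was sorted first.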

Remember that these list methods will not work on string type data. Try running a few of them on a string, just to see what happens.
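If you would like to see what happens without your program stopping on the error, here is a minimal sketch that wraps the attempt in try and except. The variable names word and message are only for illustration.

```python
word = 'coffee'
message = None

try:
    # append() is a list method, so a string does not have it
    word.append('!')
except AttributeError as error:
    # Python raises an AttributeError; we save its text to look at it
    message = str(error)

print(message)   # 'str' object has no attribute 'append'
```

The error message is Python's way of saying that strings and lists each come with their own set of methods.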

The new concepts you learned in this lesson (list methods, indexing, and slicing) will be crucial tools for working with lists in future lessons and workshops. You will have plenty of opportunities to revisit the concepts over and over again until they start to seem more intuitive.

In the next lessons, we will introduce skills for working more programmatically with lists.