scraping with bs4

anti-trans legislation

This section uses requests and bs4 to scrape basic metadata about legislative bills that limit trans people’s rights in the USA. This dataset will be part of later lessons on “cleaning data” and “analyzing data”, further on in this curriculum, where you will be working with the full text of the bills themselves.

what is bs4?

bs4 is short for BeautifulSoup4, a Python package for parsing HTML data. bs4’s power comes from using Python syntax to access and manipulate HTML elements. This means that it uses the Python language and its syntax to get information from pages written in the web’s main computer language, HTML.

I explain what the code below does in “comments” contained within each cell. Comments in Python are written on lines that begin with a hashtag #. They are like annotations for the code. The # which starts the comment line indicates to the computer that it should ignore that line (in other words, that the line is meant for human readers).

# import the following libraries for web scraping

import requests # to make HTTP requests
from bs4 import BeautifulSoup # our web scraping library
import lxml # our parser (to handle html data)

NOTE: If you get an error importing one of the above libraries, make sure you have them installed. On Jupyter, that means running the following in your command line program (like Terminal or Git Bash):

pip install requests
pip install bs4
pip install lxml

On Colab, run the same code, but within your Python notebook. Be sure to put an exclamation mark before pip, like:

!pip install requests
!pip install bs4
!pip install lxml
# save the data from the website as a "soup" object

site = requests.get('https://translegislation.com/bills/2025/US') # gets the URL
html_code = site.content # saves the HTML code
soup = BeautifulSoup(html_code, 'lxml') # creates a soup object, using lxml as the parser
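
A request can fail (for example, if the page has moved or the site is down), so it can help to check the response before parsing it. The following is a minimal sketch, not part of the original lesson: the function names fetch_html and make_soup are hypothetical, and it uses Python's built-in "html.parser" so that it runs even without lxml installed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its raw HTML, or raise if the request failed."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.content


def make_soup(html):
    """Parse HTML with an explicitly named parser."""
    # "html.parser" ships with Python; "lxml" would also work if it is installed
    return BeautifulSoup(html, "html.parser")


# usage (needs a network connection):
# soup = make_soup(fetch_html("https://translegislation.com/bills/2025/US"))
```

Naming the parser explicitly also avoids the warning that bs4 prints when it has to guess which parser to use.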

the soup object

The word “object” is something you’ll hear often in Python. It means a collection of data and functions that can work on that data. You can think of it as a way of representing real world objects (like this web page) that is organized and accessible, so you can search and manipulate that information with Python.
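
To make the idea of an object concrete, here is a toy sketch in plain Python. The Page class is invented for illustration and has nothing to do with how bs4 is built internally; it only shows data and functions living together on one object.

```python
class Page:
    """A toy object: it bundles data (attributes) with functions (methods)."""

    def __init__(self, title, links):
        self.title = title   # data stored on the object
        self.links = links   # more data stored on the object

    def count_links(self):
        """A method: a function that works on the object's own data."""
        return len(self.links)


page = Page("United States Bills", ["/bills/2025/US/HB1", "/bills/2025/US/HB28"])
print(page.title)          # read data with dot syntax (no parentheses)
print(page.count_links())  # call a method with dot syntax plus parentheses
```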

Let’s take an initial look at what this BeautifulSoup object allows us to do. It parses the HTML source into its individual HTML elements, or “tags,” and makes it possible for us to access those tags using Python's dot syntax: without parentheses to read information about an element (a property), and with parentheses to do things (a function).

two ways of accessing html elements:

1. selecting html elements using dot syntax

Once we have created our soup, we can use dot syntax to access html elements as a Python property. Notice the result includes the entire html element (with opening and closing tags) that we are searching for.

# get title

soup.title
<title>United States Bills | Anti-trans legislation</title>
# checking for third level header element

soup.h3
<h3 class="chakra-heading css-1vygpf9"><style data-emotion="css f4h6uy">.css-f4h6uy{transition-property:var(--chakra-transition-property-common);transition-duration:var(--chakra-transition-duration-fast);transition-timing-function:var(--chakra-transition-easing-ease-out);cursor:pointer;-webkit-text-decoration:none;text-decoration:none;outline:2px solid transparent;outline-offset:2px;color:inherit;}.css-f4h6uy:hover,.css-f4h6uy[data-hover]{-webkit-text-decoration:underline;text-decoration:underline;}.css-f4h6uy:focus,.css-f4h6uy[data-focus]{box-shadow:var(--chakra-shadows-outline);}</style><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1">US<!-- --> <!-- -->HB1</a></h3>
# commented out, because output is too large!

# soup.div

Where is the div on the page? Go back to the inspector and see if you can find the first div.

2. selecting html elements using .find()

We can do the same as above, but using the find() method.

soup.find('title')
<title>United States Bills | Anti-trans legislation</title>
soup.find('h3')
<h3 class="chakra-heading css-1vygpf9"><style data-emotion="css f4h6uy">.css-f4h6uy{transition-property:var(--chakra-transition-property-common);transition-duration:var(--chakra-transition-duration-fast);transition-timing-function:var(--chakra-transition-easing-ease-out);cursor:pointer;-webkit-text-decoration:none;text-decoration:none;outline:2px solid transparent;outline-offset:2px;color:inherit;}.css-f4h6uy:hover,.css-f4h6uy[data-hover]{-webkit-text-decoration:underline;text-decoration:underline;}.css-f4h6uy:focus,.css-f4h6uy[data-focus]{box-shadow:var(--chakra-shadows-outline);}</style><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1">US<!-- --> <!-- -->HB1</a></h3>
# commented out, because output is too large!

# soup.find('div')

narrowing down our data by text, class, or href

getting just text

We can access the text within each tag, getting rid of tags like <p> or <h3>, by using the text property.

soup.h3.text
'US HB1'

You can chain tag names to reach more specific, nested elements. For example, soup.div.a finds the first div, and then the first a inside it.

soup.div.a.text
'Trans Legislation Tracker'

Using a variable, we can save just the text. This will be useful later, when we write more complex code, and migrate our data into a spreadsheet.

# saving the text from the level 3 header element to "bill_title"

bill_title = soup.h3.text
bill_title
'US HB1'

searching by HTML attributes: class and href

HTML attributes contain additional information about an HTML tag. To access an attribute like class, we use the syntax tag['attr']. For example, we can look up an element's CSS classes using this syntax. (Read more about CSS classes).

# note that this prints the value of each attribute (like the name of the class), not
# the actual text contained within the larger element. For that, use the `text` property.

soup.h3['class']
['chakra-heading', 'css-1vygpf9']
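
The sketch below shows a few related ways to inspect attributes. It runs on a small HTML string modeled on the output above rather than on the live page, and it uses the built-in "html.parser" as a stand-in for lxml.

```python
from bs4 import BeautifulSoup

html = '<h3 class="chakra-heading css-1vygpf9"><a href="/bills/2025/US/HB1">US HB1</a></h3>'
soup = BeautifulSoup(html, "html.parser")

h3 = soup.h3
print(h3["class"])   # class can hold several values, so bs4 returns a list
print(h3.a.attrs)    # all attributes of the a tag, as a dictionary
print(h3.get("id"))  # .get() returns None when the attribute is missing
# h3["id"] would raise a KeyError instead, because this h3 has no id
```

Using .get() is a reasonable choice when you are not sure every element carries the attribute you are asking for.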

A popular attribute is href, which stands for hyperlink reference, and it contains the link’s URL address.

link_location = soup.h3.a['href']
# the result is a relative path: the part of the URL that comes after the site's domain

link_location
'/bills/2025/US/HB1'
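
Because the href here is a relative path, it needs the site's address in front of it before it can be opened on its own. One way to build the full URL is urljoin from Python's standard library; this is a small sketch using the page URL and the href shown above.

```python
from urllib.parse import urljoin

page_url = "https://translegislation.com/bills/2025/US"  # the page we scraped
link_location = "/bills/2025/US/HB1"                     # the relative href

# urljoin combines the site's address with the relative path
full_link = urljoin(page_url, link_location)
print(full_link)
```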

Once we have a class value, we can get more granular in our searching. Here, we would use the find() method. This is useful when the page contains many elements with the same tag, and you want more specificity.

For example, if we want to access a particular element that has a specific class name, include class_='xxx' in your find() or find_all() call. (The trailing underscore is needed because class is a reserved word in Python.)

soup.find('h3', class_='css-1vygpf9')
<h3 class="chakra-heading css-1vygpf9"><style data-emotion="css f4h6uy">.css-f4h6uy{transition-property:var(--chakra-transition-property-common);transition-duration:var(--chakra-transition-duration-fast);transition-timing-function:var(--chakra-transition-easing-ease-out);cursor:pointer;-webkit-text-decoration:none;text-decoration:none;outline:2px solid transparent;outline-offset:2px;color:inherit;}.css-f4h6uy:hover,.css-f4h6uy[data-hover]{-webkit-text-decoration:underline;text-decoration:underline;}.css-f4h6uy:focus,.css-f4h6uy[data-focus]{box-shadow:var(--chakra-shadows-outline);}</style><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1015">US<!-- --> <!-- -->HB1015</a></h3>
soup.find_all('h3', class_='css-1vygpf9')
[<h3 class="chakra-heading css-1vygpf9"><style data-emotion="css f4h6uy">.css-f4h6uy{transition-property:var(--chakra-transition-property-common);transition-duration:var(--chakra-transition-duration-fast);transition-timing-function:var(--chakra-transition-easing-ease-out);cursor:pointer;-webkit-text-decoration:none;text-decoration:none;outline:2px solid transparent;outline-offset:2px;color:inherit;}.css-f4h6uy:hover,.css-f4h6uy[data-hover]{-webkit-text-decoration:underline;text-decoration:underline;}.css-f4h6uy:focus,.css-f4h6uy[data-focus]{box-shadow:var(--chakra-shadows-outline);}</style><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1">US<!-- --> <!-- -->HB1</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1015">US<!-- --> <!-- -->HB1015</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1016">US<!-- --> <!-- -->HB1016</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1017">US<!-- --> <!-- -->HB1017</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1028">US<!-- --> <!-- -->HB1028</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1139">US<!-- --> <!-- -->HB1139</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1208">US<!-- --> <!-- -->HB1208</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1282">US<!-- --> <!-- -->HB1282</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1866">US<!-- --> <!-- -->HB1866</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB2107">US<!-- --> <!-- -->HB2107</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link 
css-f4h6uy" href="/bills/2025/US/HB2197">US<!-- --> <!-- -->HB2197</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB2202">US<!-- --> <!-- -->HB2202</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB2378">US<!-- --> <!-- -->HB2378</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB2387">US<!-- --> <!-- -->HB2387</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB2616">US<!-- --> <!-- -->HB2616</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB2617">US<!-- --> <!-- -->HB2617</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB28">US<!-- --> <!-- -->HB28</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3205">US<!-- --> <!-- -->HB3205</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3247">US<!-- --> <!-- -->HB3247</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3406">US<!-- --> <!-- -->HB3406</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3492">US<!-- --> <!-- -->HB3492</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3518">US<!-- --> <!-- -->HB3518</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3688">US<!-- --> <!-- -->HB3688</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3765">US<!-- --> <!-- -->HB3765</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3792">US<!-- --> <!-- 
-->HB3792</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3802">US<!-- --> <!-- -->HB3802</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3917">US<!-- --> <!-- -->HB3917</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3944">US<!-- --> <!-- -->HB3944</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB3950">US<!-- --> <!-- -->HB3950</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4016">US<!-- --> <!-- -->HB4016</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4021">US<!-- --> <!-- -->HB4021</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4121">US<!-- --> <!-- -->HB4121</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4138">US<!-- --> <!-- -->HB4138</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4213">US<!-- --> <!-- -->HB4213</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4249">US<!-- --> <!-- -->HB4249</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4363">US<!-- --> <!-- -->HB4363</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4512">US<!-- --> <!-- -->HB4512</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4552">US<!-- --> <!-- -->HB4552</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4553">US<!-- --> <!-- -->HB4553</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a 
class="chakra-link css-f4h6uy" href="/bills/2025/US/HB461">US<!-- --> <!-- -->HB461</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4618">US<!-- --> <!-- -->HB4618</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4730">US<!-- --> <!-- -->HB4730</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4754">US<!-- --> <!-- -->HB4754</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4779">US<!-- --> <!-- -->HB4779</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB4953">US<!-- --> <!-- -->HB4953</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB498">US<!-- --> <!-- -->HB498</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB5047">US<!-- --> <!-- -->HB5047</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB5050">US<!-- --> <!-- -->HB5050</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB5116">US<!-- --> <!-- -->HB5116</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB5166">US<!-- --> <!-- -->HB5166</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB5304">US<!-- --> <!-- -->HB5304</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB5342">US<!-- --> <!-- -->HB5342</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB600">US<!-- --> <!-- -->HB600</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB650">US<!-- --> 
<!-- -->HB650</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB653">US<!-- --> <!-- -->HB653</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB742">US<!-- --> <!-- -->HB742</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB800">US<!-- --> <!-- -->HB800</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB925">US<!-- --> <!-- -->HB925</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HR157">US<!-- --> <!-- -->HR157</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HR167">US<!-- --> <!-- -->HR167</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HR199">US<!-- --> <!-- -->HR199</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HR224">US<!-- --> <!-- -->HR224</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HR26">US<!-- --> <!-- -->HR26</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HR47">US<!-- --> <!-- -->HR47</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HR475">US<!-- --> <!-- -->HR475</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HR536">US<!-- --> <!-- -->HR536</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB1147">US<!-- --> <!-- -->SB1147</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB1551">US<!-- --> <!-- -->SB1551</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link 
css-f4h6uy" href="/bills/2025/US/SB1658">US<!-- --> <!-- -->SB1658</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB1811">US<!-- --> <!-- -->SB1811</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB1988">US<!-- --> <!-- -->SB1988</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB2008">US<!-- --> <!-- -->SB2008</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB2037">US<!-- --> <!-- -->SB2037</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB204">US<!-- --> <!-- -->SB204</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB209">US<!-- --> <!-- -->SB209</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB2091">US<!-- --> <!-- -->SB2091</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB2251">US<!-- --> <!-- -->SB2251</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB2296">US<!-- --> <!-- -->SB2296</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB2385">US<!-- --> <!-- -->SB2385</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB2702">US<!-- --> <!-- -->SB2702</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB312">US<!-- --> <!-- -->SB312</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB382">US<!-- --> <!-- -->SB382</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB405">US<!-- --> <!-- -->SB405</a></h3>, 
<h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB576">US<!-- --> <!-- -->SB576</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB591">US<!-- --> <!-- -->SB591</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB74">US<!-- --> <!-- -->SB74</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB851">US<!-- --> <!-- -->SB851</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB9">US<!-- --> <!-- -->SB9</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SB977">US<!-- --> <!-- -->SB977</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SR21">US<!-- --> <!-- -->SR21</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SR22">US<!-- --> <!-- -->SR22</a></h3>, <h3 class="chakra-heading css-1vygpf9"><a class="chakra-link css-f4h6uy" href="/bills/2025/US/SR295">US<!-- --> <!-- -->SR295</a></h3>]
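
Class names such as css-1vygpf9 look machine-generated (the style tags in the output carry a data-emotion attribute), so they may change when the site is updated; that is an assumption about this site, not something the lesson confirms. The sketch below shows two alternatives on a small, simplified HTML string: matching on the more descriptive class, and using a CSS selector with select().

```python
from bs4 import BeautifulSoup

# simplified stand-in for the real page, for illustration only
html = """
<h3 class="chakra-heading css-1vygpf9"><a class="chakra-link" href="/bills/2025/US/HB1">US HB1</a></h3>
<h3 class="chakra-heading css-1vygpf9"><a class="chakra-link" href="/bills/2025/US/HB28">US HB28</a></h3>
<h3 class="other-heading"><a href="/about">About</a></h3>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches a tag if any one of its classes equals the given value
by_class = soup.find_all("h3", class_="chakra-heading")

# select() takes a CSS selector; a[href^="/bills/"] means
# "a tags whose href starts with /bills/"
bill_links = soup.select('h3 a[href^="/bills/"]')

print(len(by_class))
print([a["href"] for a in bill_links])
```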
# save h3 element to a variable 

bill_title = soup.find('h3', class_='css-1vygpf9')
bill_title
<h3 class="chakra-heading css-1vygpf9"><style data-emotion="css f4h6uy">.css-f4h6uy{transition-property:var(--chakra-transition-property-common);transition-duration:var(--chakra-transition-duration-fast);transition-timing-function:var(--chakra-transition-easing-ease-out);cursor:pointer;-webkit-text-decoration:none;text-decoration:none;outline:2px solid transparent;outline-offset:2px;color:inherit;}.css-f4h6uy:hover,.css-f4h6uy[data-hover]{-webkit-text-decoration:underline;text-decoration:underline;}.css-f4h6uy:focus,.css-f4h6uy[data-focus]{box-shadow:var(--chakra-shadows-outline);}</style><a class="chakra-link css-f4h6uy" href="/bills/2025/US/HB1">US<!-- --> <!-- -->HB1</a></h3>

Making variables is useful for layering other operations on top, like getting text.

bill_title.text
'US HB1'

looping through find_all() to get just text

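Want to print out every tag of a specific type? Then we use find_all(). Note: we use find_all() rather than find(), because find() returns only the first match, while find_all() returns a list-like object containing every match, which we can loop over.

Let’s do this to get all of the bill names, with just the text. Because find_all() returns a list of elements rather than a single element, calling .text on the result directly raises an AttributeError, as the first cell below shows. Looping over the results and calling .text on each element works.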

soup.find_all('div', class_ ='css-4rck61').text
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[30], line 1
----> 1 soup.find_all('div', class_ ='css-4rck61').text

File ~/.conda/envs/jb/lib/python3.11/site-packages/bs4/element.py:2433, in ResultSet.__getattr__(self, key)
   2431 def __getattr__(self, key):
   2432     """Raise a helpful exception to explain a common code fix."""
-> 2433     raise AttributeError(
   2434         "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2435     )

AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
for i in soup.find_all('div', class_ ='css-4rck61'):
    print(i.text)
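
If you want to keep the text rather than just print it, the same loop can collect it into a list. This sketch uses a tiny HTML string in place of the live page, with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

html = "<h3>US HB1</h3><h3>US HB1015</h3><h3>US HB1016</h3>"
soup = BeautifulSoup(html, "html.parser")

headers = soup.find_all("h3")           # a list-like ResultSet of h3 elements
bill_names = [h.text for h in headers]  # call .text on each element in turn

print(len(headers))
print(bill_names)
```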

methods vs attributes

The decision to use dot syntax (like soup.h3.a) or a method (like find() or find_all()) depends on what you’re trying to do, and what kind of data you have about the thing you’re trying to scrape. In many cases, you could use either one.

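The difference between the two is how you ask for the data. With dot syntax, you access the element as if it were an attribute, or property, of the soup object; behind the scenes, bs4 treats soup.h3 as a shortcut for soup.find('h3'). By calling a method like find() yourself, you’re executing a function to find the data, and you can pass it extra arguments, such as a class, to narrow the search.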
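
In bs4 the two approaches are closely related: for a tag name, dot syntax returns the same element that find() does. A short sketch on a made-up HTML string illustrates this, along with what happens when the tag is not on the page.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Demo</title></head><body><h3>US HB1</h3></body></html>"
soup = BeautifulSoup(html, "html.parser")

# dot syntax and find() return the same first matching element
print(soup.h3 == soup.find("h3"))

# if the tag does not exist, both return None rather than raising an error
print(soup.table)
print(soup.find("table"))
```

Because a missing tag gives None, chaining further (for example soup.table.text) would raise an AttributeError, which is worth keeping in mind when a page's layout varies.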

individual challenge:

Let’s try doing the same thing two different ways. What if I wanted to get the link, the value of the href attribute, using both methods and attributes?

# use find to search by element and class. Now, grab the link. 

link = soup.find('h3', class_='css-1vygpf9').a['href']

print(link)
/bills/2025/US/HB1
# there are multiple ways to do this! If you don't need the class, just use dot syntax

soup.h3.a['href']
'/bills/2025/US/HB1'

You can also call find_all() without a class to loop over every element with a given tag. Here, we print the text of every h3 on the page.

for i in soup.find_all('h3'):
    print(i.text)
US HB1
US HB1015
US HB1016
US HB1017
US HB1028
US HB1139
US HB1208
US HB1282
US HB1866
US HB2107
US HB2197
US HB2202
US HB2378
US HB2387
US HB2616
US HB2617
US HB28
US HB3205
US HB3247
US HB3406
US HB3492
US HB3518
US HB3688
US HB3765
US HB3792
US HB3802
US HB3917
US HB3944
US HB3950
US HB4016
US HB4021
US HB4121
US HB4138
US HB4213
US HB4249
US HB4363
US HB4512
US HB4552
US HB4553
US HB461
US HB4618
US HB4730
US HB4754
US HB4779
US HB4953
US HB498
US HB5047
US HB5050
US HB5116
US HB5166
US HB5304
US HB5342
US HB600
US HB650
US HB653
US HB742
US HB800
US HB925
US HR157
US HR167
US HR199
US HR224
US HR26
US HR47
US HR475
US HR536
US SB1147
US SB1551
US SB1658
US SB1811
US SB1988
US SB2008
US SB2037
US SB204
US SB209
US SB2091
US SB2251
US SB2296
US SB2385
US SB2702
US SB312
US SB382
US SB405
US SB576
US SB591
US SB74
US SB851
US SB9
US SB977
US SR21
US SR22
US SR295

getting data from each bill card

For our project, we want to scrape information about each bill contained within the bill cards.

Like all good programmers, we will break our task up into a number of steps:

  1. isolate the bill_cards data from the rest of the webpage

  2. pick out the information we want from the bill cards

Each of these steps itself contains smaller steps, which we will figure out as we go along. Let’s begin with the first step. Here, we want to separate out that information (within the bill cards) from the rest of the website. This will make it easier to then go grab the elements we need later.

step 1: isolate our bill_cards data from the rest of the web page

First, create a new object called bill_cards, which enables us to narrow down the parts of the website that we want to scrape.

# to get the element and class for the cards, use the inspector

bill_cards = soup.find_all('div', class_ ='css-4rck61')

Let’s use a loop to check that we have all the right data. In the next section, we will be able to pick out specific pieces of text, based on their HTML markup.

# printing out all the text contained in the "bill cards" div

for i in bill_cards:
    print(i.text)
US HB1015HEALTHCAREINTRODUCEDTo amend title 18, United States Code, to provide for certain rules for housing or transportation based on gender and to provide for a limitation on gender-related medical treatment.To amend title 18, United States Code, to provide for certain rules for housing or transportation based on gender and to provide for a limitation on gender-related medical treatment.View Bill
US HB1016BATHROOMINTRODUCEDTo prohibit individuals from accessing or using single-sex facilities on Federal property other than those corresponding to their biological sex, and for other purposes.To prohibit individuals from accessing or using single-sex facilities on Federal property other than those corresponding to their biological sex, and for other purposes.View Bill
US HB1017BATHROOMINTRODUCEDTo prohibit an entity from receiving Federal funds if such entity permits an individual to access or use a single-sex facility on the property of such entity that does not correspond to the biological sex of such person, and for other purposes.To prohibit an entity from receiving Federal funds if such entity permits an individual to access or use a single-sex facility on the property of such entity that does not correspond to the biological sex of such person, and for other purposes.View Bill
US HB1028SPORTSINTRODUCEDTo modify eligibility requirements for amateur sports governing organizations.View Bill
US HB1139BIRTH CERTIFICATESINTRODUCEDTo prohibit the Secretary of State from issuing a passport, passport card, or Consular Report of Birth Abroad that includes the unspecified (X) gender designation, and for other purposes.View Bill
US HB1282EDUCATIONINTRODUCEDTo prohibit Federal funding for institutions of higher education that carry out diversity, equity, and inclusion initiatives, and for other purposes.View Bill
US HB28SPORTSENGROSSEDProtection of Women and Girls in Sports Act of 2025To amend the Education Amendments of 1972 to provide that for purposes of determining compliance with title IX of such Act in athletics, sex shall be recognized based solely on a person's reproductive biology and genetics at birth.It shall be a violation of subsection (a) for a recipient of Federal financial assistance who operates, sponsors, or facilitates athletic programs or activities to permit a person whose sex is male to participate in an athletic program or activity that is designated for women or girls.View Bill
US HB461MILITARYINTRODUCEDEliminate DEI in the Military ActTo prohibit the use of Federal funds for any DEI activity in the Armed Forces, and for other purposes.View Bill
US HB498HEALTHCAREINTRODUCEDTo amend title XIX of the Social Security Act to prohibit Federal Medicaid funding for gender transition procedures for minors.View Bill
US HB600HEALTHCAREINTRODUCEDWHO is Accountable ActTo prohibit the use of funds to seek membership in the World Health Organization or to provide assessed or voluntary contributions to the World Health Organization.No funds available to any Federal department or agency may be used to seek membership by the United States in the World Health Organization or to provide assessed or voluntary contributions to the World Health Organization until such time as the Secretary of State certifies to Congress that the World Health Organization meets the conditions described in subsection (b): [...] (7) The World Health Organization has ceased all funding for, engagement in, and messaging with respect to certain controversial and politically charged issues that are non-germane to the World Health Organization’s directive, including— (A) so-called "gender identity" and harmful rhetoric relating to "gender affirming care";View Bill
US HB653HEALTHCAREINTRODUCEDTo protect children from medical malpractice in the form of gender transition procedures.View Bill
US HB742HEALTHCAREINTRODUCEDTo prohibit Federal funds from being used to provide certain gender transition procedures to minors.View Bill
US HB800OTHERINTRODUCEDTo enact into law the executive order relating to ending diversity, equity, and inclusion programs in the Federal Government, and for other purposes.To enact into law the executive order relating to ending diversity, equity, and inclusion programs in the Federal Government, and for other purposes.View Bill
US HR26OTHERINTRODUCEDDeeming certain conduct of members of Antifa as domestic terrorism and designating Antifa as a domestic terrorist organization.Deeming certain conduct of members of Antifa as domestic terrorism and designating Antifa as a domestic terrorist organization. [...] Whereas, in August 2022, Antifa ardently defended the sexualization of children by guarding a "kid-friendly" drag show at a North Texas distillery;View Bill
US HR47SPORTSINTRODUCEDConcerning the National Collegiate Athletic Association policy for eligibility in women's sports.The House of Representatives (1) calls on the National Collegiate Athletic Association (referred to in this resolution as "NCAA") to revoke its transgender student-athlete eligibility policy that directly discriminates against female student athletes;View Bill
US SB209HEALTHCAREINTRODUCEDA bill to protect children from medical malpractice in the form of gender-transition procedures.View Bill
US SB312HEALTHCAREINTRODUCEDA bill to establish a Federal tort against pediatric gender clinics and other entities pushing gender-transition procedures that cause bodily injury to children or harm the mental health of children.View Bill
US SB74SPORTSINTRODUCEDFair Play for Girls ActA bill to require the Attorney General to submit to Congress a report relating to violence against women in athletics.View Bill
US SB9SPORTSINTRODUCEDProtection of Women and Girls in Sports Act of 2025A bill to provide that for purposes of determining compliance with title IX of the Education Amendments of 1972 in athletics, sex shall be recognized based solely on a person's reproductive biology and genetics at birth.To provide that for purposes of determining compliance with title IX of the Education Amendments of 1972 in athletics, sex shall be recognized based solely on a person’s reproductive biology and genetics at birth.View Bill
US SR21SPORTSINTRODUCEDA resolution designating October 10, 2025, as "American Girls in Sports Day".Whereas the National Association of Intercollegiate Athletics (NAIA) has instituted new policies to protect biological girls in sports and ensure that only student athletes whose biological sex is female will be allowed to compete in NAIA-sponsored women's sports teams; Whereas it is imperative that women's and girl's opportunities to compete athletically are protected; and Whereas October 10th, as represented by the Roman numerals "XX", signifies the female XX chromosomes: Now, therefore, be it Resolved, That the Senate— (1) recognizes October 10, 2025, as "American Girls in Sports Day"; [...] (3) recognizes the importance of Title IX in protecting biological women in sports; and (4) calls on sports-governing bodies in the United States and abroad to protect biological women and girls in sports.View Bill
US SR22SPORTSINTRODUCEDA resolution concerning the National Collegiate Athletic Association policy for eligibility in women's sports.The House of Representatives (1) calls on the National Collegiate Athletic Association (referred to in this resolution as "NCAA") to revoke its transgender student-athlete eligibility policy that directly discriminates against female student athletes;View Bill

step 2: group challenge: pick out information from each bill card

Now that we have narrowed down our data to bill_cards, we can search within this code for the individual elements we want. For our dataset, we want to scrape the following information:

  • bill title

  • bill caption (if it exists!)

  • bill category

  • bill description

  • link to bill

Using the inspector, take 5-10 minutes to create a list of HTML elements and attributes that correspond to the above information. Work with a partner.

Once we have identified the relevant HTML elements, we can extract them. To do that, we will write a loop that goes through each item in bill_cards, one by one, and pulls out the title, category, caption, description, and link.

Note: loops are a way of programmatically going through a dataset and doing something to each item in it, such as extracting a piece of information. Read more about loops in the intro workshop.
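As a quick refresher, here is a minimal sketch of a loop. The list bill_ids and the variable short_ids are made up for this illustration (they are not part of the scraper); the loop visits each item in turn and does the same small thing to it.

```python
# A small sample list, used only for illustration.
bill_ids = ["US HB1064", "US HB1112", "US HB1276"]

# Visit each item, one by one, and save a shortened version of it.
short_ids = []
for bill_id in bill_ids:
    short_ids.append(bill_id.replace("US ", ""))  # drop the "US " prefix

print(short_ids)
```

The loop we write for the bill cards follows the same pattern: one pass per card, with the same extraction steps applied each time.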

Below, I explain the code logic by writing it out in “pseudo-code” in the comments. Pseudo-code is a cross between normal language and programming language that is useful for working out how to write the actual code in Python.

# for each card in bill_cards:
#     get the title in h3.text
#     get the category in span.text
#     get the caption in h2.text (if any)
#     get the description in p.text (if any)
#     get the link in the a tag, class "chakra-link"
# find all of the bill cards, then run the loop on them
bill_cards = soup.find_all('div', class_='css-4rck61')

for item in bill_cards[:10]: # only the first ten cards, just to check if it is working
    print(item.h3.text) # title
    print(item.span.text) # category
    print(item.h2.text) # caption
    print(item.p.text) # description (if any)
    print(item.a['href']) # link (relative: prepend https://translegislation.com for the full URL)
US HB1064
MILITARY
Ensuring Military Readiness Act of 2023
To provide requirements related to the eligibility of transgender individuals from serving in the Armed Forces.
/bills/2024/US/HB1064
US HB1112
MILITARY
Ensuring Military Readiness Act of 2023
To provide requirements related to the eligibility of individuals who identify as transgender from serving in the Armed Forces.
/bills/2024/US/HB1112
US HB1276
HEALTHCARE
Protect Minors from Medical Malpractice Act of 2023
To protect children from medical malpractice in the form of gender transition procedures.
/bills/2024/US/HB1276
US HB1399
HEALTHCARE
Protect Children’s Innocence Act
To amend chapter 110 of title 18, United States Code, to prohibit gender affirming care on minors, and for other purposes.
/bills/2024/US/HB1399
US HB1490
INCARCERATION
Preventing Violence Against Female Inmates Act of 2023
To secure the dignity and safety of incarcerated women.
/bills/2024/US/HB1490
US HB1585
EDUCATION
Prohibiting Parental Secrecy Policies In Schools Act of 2023
To require a State receiving funds pursuant to title II of the Elementary and Secondary Education Act of 1965 to implement a State policy to prohibit a school employee from conducting certain social gender transition interventions.
/bills/2024/US/HB1585
US HB216
EDUCATION
My Child, My Choice Act of 2023
To prohibit Federal education funds from being provided to elementary schools that do not require teachers to obtain written parental consent prior to teaching lessons specifically related to gender identity, sexual orientation, or transgender studies, and for other purposes.
/bills/2024/US/HB216
US HB3101
OTHER
TPA Act Traditional Passport Act
To prohibit the issuance of a passport with any gender designation other than "male" or "female", and for other purposes.
/bills/2024/US/HB3101
US HB3102
OTHER
TSA Act Traditional Screening Application Act
To prohibit the Transportation Security Administration from using the "X" gender designation in the TSA PreCheck advanced security program, and for other purposes.
/bills/2024/US/HB3102
US HB3328
HEALTHCARE
Protecting Children From Experimentation Act of 2023
To amend chapter 110 of title 18, United States Code, to prohibit gender transition procedures on minors, and for other purposes.
/bills/2024/US/HB3328
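The loop above assumes that every card contains every element. In bs4, asking for a tag that is not there (for example item.h2 on a card with no caption) gives back None, and calling .text on None raises an AttributeError. The links are also relative, as the output shows. Below is a minimal sketch of one way to handle both issues. The names sample_html, text_or_empty, and BASE_URL are hypothetical and introduced only for this example, and the markup is a simplified, invented stand-in for two bill cards (the second has placeholder values and no caption), not the site's actual HTML.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://translegislation.com"  # assumed base for the relative links

# Simplified stand-in markup; the second card is a placeholder with no <h2> caption.
sample_html = """
<div class="css-4rck61">
  <h3>US HB1064</h3><span>MILITARY</span>
  <h2>Ensuring Military Readiness Act of 2023</h2>
  <p>To provide requirements related to the eligibility of transgender individuals from serving in the Armed Forces.</p>
  <a class="chakra-link" href="/bills/2024/US/HB1064">View Bill</a>
</div>
<div class="css-4rck61">
  <h3>EXAMPLE BILL</h3><span>OTHER</span>
  <p>Placeholder description for a card without a caption.</p>
  <a class="chakra-link" href="/bills/2024/US/EXAMPLE">View Bill</a>
</div>
"""

def text_or_empty(tag):
    """Return the tag's text, or an empty string if the tag is missing (None)."""
    return tag.get_text(strip=True) if tag is not None else ""

# html.parser ships with Python, so this sketch does not need lxml.
sample_soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for card in sample_soup.find_all("div", class_="css-4rck61"):
    link = card.find("a", class_="chakra-link")
    rows.append({
        "title": text_or_empty(card.h3),
        "category": text_or_empty(card.span),
        "caption": text_or_empty(card.h2),      # '' when there is no caption
        "description": text_or_empty(card.p),   # '' when there is no description
        "link": urljoin(BASE_URL, link["href"]) if link is not None else "",
    })

for row in rows:
    print(row)
```

Collecting each card as a dictionary, rather than printing it, also keeps the values together in a form that is easy to write out to a file later.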

Excellent work! In the next section, we will write some code to save this data as a spreadsheet, in the form of a CSV file.