saving scraped data
For our project, remember that we want to scrape information about each bill contained within the bill cards.
And like all good programmers, we broke our task up into a number of steps, some of which we’ve already done in the previous notebook:
1. isolate the bill_cards data from the rest of the webpage (already done)
2. pick out the information we want from the bill cards (already done)
3. process our information into lists
4. add more data to our lists
5. save that information to a csv file
Now, we are on step three: processing the elements and saving them into lists. Each of these steps itself contains smaller steps, which we will figure out as we go along.
Before continuing our work, we will import the libraries we need and create our soup object (which holds our website content) and our bill_cards object (which holds our bill card data).
import requests
from bs4 import BeautifulSoup
site = requests.get('https://translegislation.com/bills/2024/US')
html_code = site.content
soup = BeautifulSoup(html_code, 'lxml')
Note: if this cell fails with "FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?", the lxml parser is not installed in your environment. Install it (for example with pip install lxml) and re-run the cell, or pass 'html.parser', which ships with Python, in place of 'lxml'.
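If you are not sure whether lxml is available, one option is to fall back to Python's built-in parser. The sketch below assumes a hypothetical helper named make_soup (it is not part of the original notebook) and parses a small inline HTML string instead of the live site:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, preferred="lxml"):
    """Parse html with the preferred parser, falling back to html.parser."""
    try:
        return BeautifulSoup(html, preferred)
    except FeatureNotFound:
        # lxml is an optional extra; html.parser ships with Python
        return BeautifulSoup(html, "html.parser")


sample_html = "<div><h3>US HB1064</h3></div>"  # stand-in for site.content
sample_soup = make_soup(sample_html)
print(sample_soup.h3.text)
```

The two parsers can build slightly different trees from badly formed HTML, so it is worth re-checking your selectors if you switch between them.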
# to get the element and class for the cards, use the inspector
bill_cards = soup.find_all('div', class_ ='css-4rck61')
Now, we can write our loop that grabs all the elements we want.
# runs the loop on the bill cards
for item in bill_cards[:10]:  # only the first ten cards, just to check if it is working
    print(item.h3.text)  # title
    print(item.h2.text)  # caption
    print(item.span.text)  # category
    print(item.p.text)  # description (if any)
    print(item.a['href'])  # relative link; prepend https://translegislation.com to get the full URL
US HB1064
Ensuring Military Readiness Act of 2023
MILITARY
To provide requirements related to the eligibility of transgender individuals from serving in the Armed Forces.
/bills/2024/US/HB1064
US HB1112
Ensuring Military Readiness Act of 2023
MILITARY
To provide requirements related to the eligibility of individuals who identify as transgender from serving in the Armed Forces.
/bills/2024/US/HB1112
US HB1276
Protect Minors from Medical Malpractice Act of 2023
HEALTHCARE
To protect children from medical malpractice in the form of gender transition procedures.
/bills/2024/US/HB1276
US HB1399
Protect Children’s Innocence Act
HEALTHCARE
To amend chapter 110 of title 18, United States Code, to prohibit gender affirming care on minors, and for other purposes.
/bills/2024/US/HB1399
US HB1490
Preventing Violence Against Female Inmates Act of 2023
INCARCERATION
To secure the dignity and safety of incarcerated women.
/bills/2024/US/HB1490
US HB1585
Prohibiting Parental Secrecy Policies In Schools Act of 2023
EDUCATION
To require a State receiving funds pursuant to title II of the Elementary and Secondary Education Act of 1965 to implement a State policy to prohibit a school employee from conducting certain social gender transition interventions.
/bills/2024/US/HB1585
US HB216
My Child, My Choice Act of 2023
EDUCATION
To prohibit Federal education funds from being provided to elementary schools that do not require teachers to obtain written parental consent prior to teaching lessons specifically related to gender identity, sexual orientation, or transgender studies, and for other purposes.
/bills/2024/US/HB216
US HB3101
TPA Act Traditional Passport Act
OTHER
To prohibit the issuance of a passport with any gender designation other than "male" or "female", and for other purposes.
/bills/2024/US/HB3101
US HB3102
TSA Act Traditional Screening Application Act
OTHER
To prohibit the Transportation Security Administration from using the "X" gender designation in the TSA PreCheck advanced security program, and for other purposes.
/bills/2024/US/HB3102
US HB3328
Protecting Children From Experimentation Act of 2023
HEALTHCARE
To amend chapter 110 of title 18, United States Code, to prohibit gender transition procedures on minors, and for other purposes.
/bills/2024/US/HB3328
step 3: process our information into lists
Now, the next step is to assign a variable for each item. This allows us to save the data to the variable name, and later, to add it to a list.
for item in bill_cards[:10]:
    title = item.h3.text
    caption = item.h2.text
    category = item.find('span').text
    description = item.p.text
    # the href already starts with /bills/..., so we only prepend the site root
    link = 'https://translegislation.com' + item.a['href']
    print(title, caption, category, description, link)
US HB1064 Ensuring Military Readiness Act of 2023 MILITARY To provide requirements related to the eligibility of transgender individuals from serving in the Armed Forces. https://translegislation.com/bills/2024/US/HB1064
US HB1112 Ensuring Military Readiness Act of 2023 MILITARY To provide requirements related to the eligibility of individuals who identify as transgender from serving in the Armed Forces. https://translegislation.com/bills/2024/US/HB1112
US HB1276 Protect Minors from Medical Malpractice Act of 2023 HEALTHCARE To protect children from medical malpractice in the form of gender transition procedures. https://translegislation.com/bills/2024/US/HB1276
US HB1399 Protect Children’s Innocence Act HEALTHCARE To amend chapter 110 of title 18, United States Code, to prohibit gender affirming care on minors, and for other purposes. https://translegislation.com/bills/2024/US/HB1399
US HB1490 Preventing Violence Against Female Inmates Act of 2023 INCARCERATION To secure the dignity and safety of incarcerated women. https://translegislation.com/bills/2024/US/HB1490
US HB1585 Prohibiting Parental Secrecy Policies In Schools Act of 2023 EDUCATION To require a State receiving funds pursuant to title II of the Elementary and Secondary Education Act of 1965 to implement a State policy to prohibit a school employee from conducting certain social gender transition interventions. https://translegislation.com/bills/2024/US/HB1585
US HB216 My Child, My Choice Act of 2023 EDUCATION To prohibit Federal education funds from being provided to elementary schools that do not require teachers to obtain written parental consent prior to teaching lessons specifically related to gender identity, sexual orientation, or transgender studies, and for other purposes. https://translegislation.com/bills/2024/US/HB216
US HB3101 TPA Act Traditional Passport Act OTHER To prohibit the issuance of a passport with any gender designation other than "male" or "female", and for other purposes. https://translegislation.com/bills/2024/US/HB3101
US HB3102 TSA Act Traditional Screening Application Act OTHER To prohibit the Transportation Security Administration from using the "X" gender designation in the TSA PreCheck advanced security program, and for other purposes. https://translegislation.com/bills/2024/US/HB3102
US HB3328 Protecting Children From Experimentation Act of 2023 HEALTHCARE To amend chapter 110 of title 18, United States Code, to prohibit gender transition procedures on minors, and for other purposes. https://translegislation.com/bills/2024/US/HB3328
It works! Now let’s save it to lists.
# a bunch of empty lists where we will dump our data
titles = []
captions = []
categories = []
descriptions = []
links = []
# our for loop that saves each item we want from the bill_cards
for item in bill_cards:
    title = item.h3.text
    category = item.find('span').text
    caption = item.h2.text
    description = item.p.text
    link = 'https://translegislation.com' + item.a['href']
    # adding the items to the empty lists
    titles.append(title)
    categories.append(category)
    captions.append(caption)
    descriptions.append(description)
    links.append(link)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[10], line 13
11 category = item.find('span').text
12 caption = item.h2.text
---> 13 description = item.p.text
14 link = 'https://translegislation.com' + item.a['href']
16 # adding the items to the empty lists
AttributeError: 'NoneType' object has no attribute 'text'
individual challenge:
Google this error and try to understand what it means. Then try out a solution from Stack Overflow, making sure to swap in your own variable names.
# a bunch of empty lists where we will dump our data
titles = []
captions = []
categories = []
descriptions = []
links = []
# our for loop that saves each item we want from the bill_cards
for item in bill_cards:
    title = item.h3.text
    category = item.find('span').text
    caption = item.h2.text
    # some cards have no <p> tag, so item.p is None for them
    if item.p is not None:
        description = item.p.text
    else:
        description = 'No bill description'
    link = 'https://translegislation.com' + item.a['href']
    # adding the items to the empty lists
    titles.append(title)
    categories.append(category)
    captions.append(caption)
    descriptions.append(description)
    links.append(link)
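A check for a missing tag can also be wrapped in a small helper, so that it is easy to reuse for any element that might be absent from a card. This is only a sketch: text_or_default is a hypothetical name, and the two cards below are made-up stand-ins for the real bill cards.

```python
from bs4 import BeautifulSoup


def text_or_default(tag, default="No bill description"):
    """Return the tag's stripped text, or default when the tag is missing."""
    if tag is None:
        return default
    return tag.get_text(strip=True)


# two made-up cards: the second one has no <p> description
sample_html = """
<div class="card"><h3>US HB1</h3><p>To do something.</p></div>
<div class="card"><h3>US HB2</h3></div>
"""
sample_cards = BeautifulSoup(sample_html, "html.parser").find_all("div", class_="card")
sample_descriptions = [text_or_default(card.p) for card in sample_cards]
print(sample_descriptions)
```

Because the helper always returns a string, every card contributes exactly one entry to the list, which keeps all of the lists the same length.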
step 4: adding more data to our lists
Before saving our dataset to a spreadsheet, we are going to do a bit more data gathering. This will enable us to make a more robust dataset at the end. Here, we are going to get the link directly to the bill page on LegiScan.
Like the previous sections, I’m going to use comments to write some pseudo-code that separates out the steps of the larger task. This is good practice for all programmers.
## now, we will get the link to state bill, in the following steps:
## first, make a list of URLs:
## then, for each URL, make a soup.
## then, for each soup, get the link to the state bill, called "extension"
## then, add the link extension to the root, saving it as "urls"
## finally, add the urls to a new list, called "legiscan links"
for item in bill_cards[:10]:
    extension = 'https://translegislation.com/' + item.a['href']
    print(extension)
https://translegislation.com//bills/2024/US/HB1064
https://translegislation.com//bills/2024/US/HB1112
https://translegislation.com//bills/2024/US/HB1276
https://translegislation.com//bills/2024/US/HB1399
https://translegislation.com//bills/2024/US/HB1490
https://translegislation.com//bills/2024/US/HB1585
https://translegislation.com//bills/2024/US/HB216
https://translegislation.com//bills/2024/US/HB3101
https://translegislation.com//bills/2024/US/HB3102
https://translegislation.com//bills/2024/US/HB3328
urls = []
for item in bill_cards:
    extension = 'https://translegislation.com/' + item.a['href']
    urls.append(extension)
urls[:10]
['https://translegislation.com//bills/2024/US/HB1064',
'https://translegislation.com//bills/2024/US/HB1112',
'https://translegislation.com//bills/2024/US/HB1276',
'https://translegislation.com//bills/2024/US/HB1399',
'https://translegislation.com//bills/2024/US/HB1490',
'https://translegislation.com//bills/2024/US/HB1585',
'https://translegislation.com//bills/2024/US/HB216',
'https://translegislation.com//bills/2024/US/HB3101',
'https://translegislation.com//bills/2024/US/HB3102',
'https://translegislation.com//bills/2024/US/HB3328']
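Notice that joining the root and the href with + leaves a double slash in each URL, because the root ends with / and the href starts with one. As an alternative sketch, the standard library's urljoin resolves a relative link against a base URL and handles the slashes for us. The names BASE_URL, sample_hrefs, and sample_urls are illustrative only:

```python
from urllib.parse import urljoin

BASE_URL = "https://translegislation.com/"

# two hrefs copied from the bill cards above
sample_hrefs = ["/bills/2024/US/HB1064", "/bills/2024/US/HB1112"]

# urljoin resolves each relative href against the base URL
sample_urls = [urljoin(BASE_URL, href) for href in sample_hrefs]
print(sample_urls)
```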
# making a soup object of *every* page that is linked
# this may take several seconds
soups = []
for item in urls:
    site = requests.get(item)
    html_code = site.content
    page_soup = BeautifulSoup(html_code, 'lxml')  # a new name, so the main soup is not overwritten
    soups.append(page_soup)
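Since this loop sends one request per bill, it can help to pause between requests and to check that each download succeeded. The sketch below is one possible way to do that, not the notebook's own method: fetch_soups, its delay_seconds parameter, and the get parameter (which exists only so the function can be tried without a network connection) are all assumptions.

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch_soups(page_urls, delay_seconds=1.0, get=requests.get):
    """Download each URL and return a list of (url, soup) pairs.

    Pages that fail to download are reported and skipped.
    """
    results = []
    for url in page_urls:
        try:
            response = get(url, timeout=10)
            response.raise_for_status()  # raises on 4xx/5xx status codes
        except requests.RequestException as error:
            print(f"skipping {url}: {error}")
            continue
        results.append((url, BeautifulSoup(response.content, "html.parser")))
        time.sleep(delay_seconds)  # pause so we do not overload the server
    return results
```

It could be called as fetch_soups(urls). Each soup is returned together with its URL because a skipped page would otherwise leave the list of soups shorter than the list of URLs, with no way to tell which page is missing.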
legiscan_links = []
congress_links = []
for item in soups:
    # we are getting two links here, one to legiscan and one to the congress website
    # (named page_links so the earlier links list is not overwritten)
    page_links = item.find_all('a', class_='chakra-link css-oga2ct')
    anchor1 = page_links[0]['href']  # link to legiscan
    legiscan_links.append(anchor1)
    anchor2 = page_links[1]['href']  # link to congress
    congress_links.append(anchor2)
legiscan_links
['https://legiscan.com/US/text/HB1064/id/2737306',
'https://legiscan.com/US/text/HB1112/id/2742708',
'https://legiscan.com/US/text/HB1276/id/2755407',
'https://legiscan.com/US/text/HB1399/id/2796538',
'https://legiscan.com/US/text/HB1490/id/2761146',
'https://legiscan.com/US/text/HB1585/id/2763467',
'https://legiscan.com/US/text/HB216/id/2654610',
'https://legiscan.com/US/text/HB3101/id/2830677',
'https://legiscan.com/US/text/HB3102/id/2815463',
'https://legiscan.com/US/text/HB3328/id/2818358',
'https://legiscan.com/US/text/HB3329/id/2818922',
'https://legiscan.com/US/text/HB3462/id/2827206',
'https://legiscan.com/US/text/HB3887/id/2833147',
'https://legiscan.com/US/text/HB429/id/2674746',
'https://legiscan.com/US/text/HB4365/id/2846650',
'https://legiscan.com/US/text/HB4367/id/2866079',
'https://legiscan.com/US/text/HB4398/id/2835893',
'https://legiscan.com/US/text/HB4665/id/2843997',
'https://legiscan.com/US/text/HB4821/id/2849882',
'https://legiscan.com/US/text/HB5/id/2761423',
'https://legiscan.com/US/text/HB5327/id/2839779',
'https://legiscan.com/US/text/HB5636/id/2842877',
'https://legiscan.com/US/text/HB5893/id/2847457',
'https://legiscan.com/US/text/HB5894/id/2847458',
'https://legiscan.com/US/text/HB6040/id/2847526',
'https://legiscan.com/US/text/HB6258/id/2852634',
'https://legiscan.com/US/text/HB6658/id/2866510',
'https://legiscan.com/US/text/HB6728/id/2866449',
'https://legiscan.com/US/text/HB7183/id/2930086',
'https://legiscan.com/US/text/HB7187/id/2930340',
'https://legiscan.com/US/text/HB734/id/2787793',
'https://legiscan.com/US/text/HB736/id/3023788',
'https://legiscan.com/US/text/HB7725/id/2974064',
'https://legiscan.com/US/text/HB8070/id/3011826',
'https://legiscan.com/US/text/HB8433/id/3012543',
'https://legiscan.com/US/text/HB8580/id/3020996',
'https://legiscan.com/US/text/HB8708/id/3016017',
'https://legiscan.com/US/text/HB8752/id/3013700',
'https://legiscan.com/US/text/HB8771/id/3020997',
'https://legiscan.com/US/text/HB8774/id/3020988',
'https://legiscan.com/US/text/HB8997/id/3014401',
'https://legiscan.com/US/text/HB8998/id/3021001',
'https://legiscan.com/US/text/HB9026/id/3014403',
'https://legiscan.com/US/text/HB9027/id/3014469',
'https://legiscan.com/US/text/HB9028/id/3014468',
'https://legiscan.com/US/text/HB9029/id/3014470',
'https://legiscan.com/US/text/HB9218/id/3019830',
'https://legiscan.com/US/text/HB9586/id/3021417',
'https://legiscan.com/US/text/HB985/id/2727973',
'https://legiscan.com/US/text/HJR160/id/3003197',
'https://legiscan.com/US/text/HJR165/id/3015576',
'https://legiscan.com/US/text/HR115/id/2692544',
'https://legiscan.com/US/text/HR1223/id/2996410',
'https://legiscan.com/US/text/HR282/id/2773507',
'https://legiscan.com/US/text/HR298/id/2786011',
'https://legiscan.com/US/text/HR518/id/2828339',
'https://legiscan.com/US/text/HR536/id/2830680',
'https://legiscan.com/US/text/HR769/id/2852616',
'https://legiscan.com/US/text/SB1595/id/2819634',
'https://legiscan.com/US/text/SB1597/id/2819703',
'https://legiscan.com/US/text/SB1709/id/2827463',
'https://legiscan.com/US/text/SB187/id/2696929',
'https://legiscan.com/US/text/SB200/id/2702901',
'https://legiscan.com/US/text/SB2357/id/2836565',
'https://legiscan.com/US/text/SB2394/id/2836690',
'https://legiscan.com/US/text/SB2797/id/2841880',
'https://legiscan.com/US/text/SB3035/id/2844163',
'https://legiscan.com/US/text/SB3438/id/2865749',
'https://legiscan.com/US/text/SB3729/id/2927908',
'https://legiscan.com/US/text/SB435/id/2727671',
'https://legiscan.com/US/text/SB457/id/2734132',
'https://legiscan.com/US/text/SB4638/id/3014404',
'https://legiscan.com/US/text/SB613/id/2746832',
'https://legiscan.com/US/text/SB635/id/2752091',
'https://legiscan.com/US/text/SB752/id/2760328',
'https://legiscan.com/US/text/SJR90/id/3003899',
'https://legiscan.com/US/text/SJR96/id/3009679',
'https://legiscan.com/US/text/SR267/id/2831179',
'https://legiscan.com/US/text/SR53/id/2696872',
'https://legiscan.com/US/text/SR669/id/2998369']
congress_links
['https://www.congress.gov/bill/118th-congress/house-bill/1064/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/1112/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/1276/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/1399/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/1490/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/1585/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/216/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/3101/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/3102/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/3328/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/3329/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/3462/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/3887/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/429/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/4365/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/4367/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/4398/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/4665/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/4821/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/5/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/5327/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/5636/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/5893/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/5894/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/6040/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/6258/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/6658/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/6728/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/7183/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/7187/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/734/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/736/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/7725/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8070/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8433/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8580/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8708/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8752/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8771/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8774/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8997/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/8998/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/9026/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/9027/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/9028/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/9029/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/9218/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/9586/all-info',
'https://www.congress.gov/bill/118th-congress/house-bill/985/all-info',
'https://www.congress.gov/bill/118th-congress/house-joint-resolution/160/all-info',
'https://www.congress.gov/bill/118th-congress/house-joint-resolution/165/all-info',
'https://www.congress.gov/bill/118th-congress/house-resolution/115/all-info',
'https://www.congress.gov/bill/118th-congress/house-resolution/1223/all-info',
'https://www.congress.gov/bill/118th-congress/house-resolution/282/all-info',
'https://www.congress.gov/bill/118th-congress/house-resolution/298/all-info',
'https://www.congress.gov/bill/118th-congress/house-resolution/518/all-info',
'https://www.congress.gov/bill/118th-congress/house-resolution/536/all-info',
'https://www.congress.gov/bill/118th-congress/house-resolution/769/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/1595/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/1597/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/1709/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/187/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/200/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/2357/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/2394/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/2797/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/3035/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/3438/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/3729/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/435/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/457/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/4638/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/613/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/635/all-info',
'https://www.congress.gov/bill/118th-congress/senate-bill/752/all-info',
'https://www.congress.gov/bill/118th-congress/senate-joint-resolution/90/all-info',
'https://www.congress.gov/bill/118th-congress/senate-joint-resolution/96/all-info',
'https://www.congress.gov/bill/118th-congress/senate-resolution/267/all-info',
'https://www.congress.gov/bill/118th-congress/senate-resolution/53/all-info',
'https://www.congress.gov/bill/118th-congress/senate-resolution/669/all-info']
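The loop above relies on the LegiScan link always coming first and the Congress link second on every page. If a page listed them in a different order, or left one out, the lists would be wrong or the loop would raise an IndexError. A more defensive sketch is to pick each link by its domain instead of its position. Here find_link is a hypothetical helper, and the sample hrefs are copied from the output above:

```python
from urllib.parse import urlparse


def find_link(hrefs, domain):
    """Return the first href whose hostname matches domain, else None."""
    for href in hrefs:
        host = urlparse(href).hostname or ""
        if host == domain or host.endswith("." + domain):
            return href
    return None


# deliberately listed in the "wrong" order
sample_hrefs = [
    "https://www.congress.gov/bill/118th-congress/house-bill/1064/all-info",
    "https://legiscan.com/US/text/HB1064/id/2737306",
]
print(find_link(sample_hrefs, "legiscan.com"))
print(find_link(sample_hrefs, "congress.gov"))
print(find_link(sample_hrefs, "example.org"))  # no match, so this prints None
```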
step 5: saving our data to a CSV
This is the final step. First, we will import two libraries for working with tabular data: pandas and csv.
Then, we will add each of our lists into the “DataFrame” (the pandas term for a tabular type of object), where they will appear as separate columns. Finally, we will save our DataFrame as a .csv file.
# importing the necessary libraries
import pandas as pd
import csv
# creating empty lists to hold all of our data
titles = []
captions = []
categories = []
descriptions = []
# extracting the data from the bill cards
for item in bill_cards:
    title = item.h3.text
    category = item.find('span').text
    caption = item.h2.text
    # some cards have no <p> tag, so item.p is None for them
    if item.p is not None:
        description = item.p.text
    else:
        description = 'No bill description'
    # adding the items to the empty lists
    titles.append(title)
    categories.append(category)
    captions.append(caption)
    descriptions.append(description)
# remember that "urls", "legiscan_links", and "congress_links" are already saved as lists, so we don't have to create them here
# creating a dataframe, with separate columns to hold each of our lists
df = pd.DataFrame(
    {'title': titles,
     'caption': captions,
     'category': categories,
     'description': descriptions,
     'url': urls,
     'legiscan': legiscan_links,
     'congress': congress_links
    })
# checking the first 5 lines of the dataframe
df.head()
| | title | caption | category | description | url | legiscan | congress |
|---|---|---|---|---|---|---|---|
| 0 | US HB1064 | Ensuring Military Readiness Act of 2023 | MILITARY | To provide requirements related to the eligibi... | https://translegislation.com//bills/2024/US/HB... | https://legiscan.com/US/text/HB1064/id/2737306 | https://www.congress.gov/bill/118th-congress/h... |
| 1 | US HB1112 | Ensuring Military Readiness Act of 2023 | MILITARY | To provide requirements related to the eligibi... | https://translegislation.com//bills/2024/US/HB... | https://legiscan.com/US/text/HB1112/id/2742708 | https://www.congress.gov/bill/118th-congress/h... |
| 2 | US HB1276 | Protect Minors from Medical Malpractice Act of... | HEALTHCARE | To protect children from medical malpractice i... | https://translegislation.com//bills/2024/US/HB... | https://legiscan.com/US/text/HB1276/id/2755407 | https://www.congress.gov/bill/118th-congress/h... |
| 3 | US HB1399 | Protect Children’s Innocence Act | HEALTHCARE | To amend chapter 110 of title 18, United State... | https://translegislation.com//bills/2024/US/HB... | https://legiscan.com/US/text/HB1399/id/2796538 | https://www.congress.gov/bill/118th-congress/h... |
| 4 | US HB1490 | Preventing Violence Against Female Inmates Act... | INCARCERATION | To secure the dignity and safety of incarcerat... | https://translegislation.com//bills/2024/US/HB... | https://legiscan.com/US/text/HB1490/id/2761146 | https://www.congress.gov/bill/118th-congress/h... |
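When a DataFrame is built from a dictionary of lists, pandas requires every list to have the same length and raises a ValueError otherwise. If that happens, a quick check like the sketch below shows which list is the odd one out. The helper name check_equal_lengths and the sample data are illustrative only:

```python
def check_equal_lengths(columns):
    """Return each column's length, raising ValueError if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths


sample_columns = {
    "title": ["US HB1", "US HB2"],
    "category": ["MILITARY", "OTHER"],
}
print(check_equal_lengths(sample_columns))
```

With the real data, the same dictionary that is passed to pd.DataFrame could be passed to this function first.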
# saving the dataframe as a csv file
df.to_csv('bill_data.csv')
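By default, to_csv also writes the DataFrame's row numbers as an unnamed first column; passing index=False leaves them out. The csv library we imported is not actually needed for the pandas route, but it can write the same kind of file on its own. The following is a minimal sketch with made-up sample lists, writing to an in-memory buffer rather than a real file:

```python
import csv
import io

sample_titles = ["US HB1064", "US HB1112"]
sample_categories = ["MILITARY", "MILITARY"]

# io.StringIO stands in for open('bill_data.csv', 'w', newline='')
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "category"])  # header row
# zip pairs up the lists so each bill becomes one row
writer.writerows(zip(sample_titles, sample_categories))
print(buffer.getvalue())
```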
And that’s all! If you are on Google Colab, check your sidebar under the “files” tab. You should see a .csv file containing the data we’ve scraped from the translegislation.com website. Well done!
In the next section, we will look at an API method for getting legislative data, and save that data to a CSV file. In that activity, you’ll see the differences in handling data across web scraping and API methods.