Legiscan API
Now we turn to APIs: how to construct an API call, and how to navigate and download the results. You will notice how much easier the data is to manage, because it arrives in a clean, structured format (JSON), in contrast to the messy HTML that we get from web scraping.
The Legiscan API provides access to legislative data, such as bills, sponsors, and hearing information, from all 50 states. To use the Legiscan API, you first need to create an account on Legiscan and request an API key. The key (which we will incorporate into our API call) authenticates you so that you can access the data on Legiscan.
After getting our API key, we can create our API call. The first step is to import the libraries that we need in order to use the API. Then we can construct our API call by putting together the root URL, the key, and the query.
import requests
import pandas as pd
import time
# the components of the API call, which make up the "request" variable
url = 'https://api.legiscan.com/?key='
key = 'YOUR_API_KEY'  # replace this placeholder with your own key
page = 1
request = requests.get(url + key + '&op=getSearch&state=ALL&query=transgender' + '&page=' + str(page))
# print out the full URL, which we can paste into the browser to get
# an interactive look at the raw results
print(url + key + '&op=getSearch&state=ALL&query=transgender' + '&page=' + str(page))
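Gluing the URL together by hand works, but it is easy to drop an `&` or to forget to escape a space in the search term. The sketch below is one alternative, not part of the original workshop code: a hypothetical helper called `build_search_url` that assembles the same root, key, and query with the standard library's `urlencode`. The parameter names (`key`, `op`, `state`, `query`, `page`) are copied from the URL used above; the helper name and `BASE_URL` are our own.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.legiscan.com/'  # same root as in the tutorial


def build_search_url(key, query, state='ALL', page=1):
    """Return the full getSearch URL for one page of results.

    Hypothetical helper for illustration; the parameter names mirror
    the hand-built URL in the tutorial.
    """
    params = {
        'key': key,
        'op': 'getSearch',
        'state': state,
        'query': query,
        'page': page,
    }
    # urlencode joins the pairs with '&' and escapes special characters
    return BASE_URL + '?' + urlencode(params)


# 'YOUR_API_KEY' is a placeholder, not a working key
example_url = build_search_url('YOUR_API_KEY', 'transgender')
print(example_url)
```

The resulting string can be passed to `requests.get()` exactly like the concatenated URL, and a multi-word search term such as `'gender identity'` is escaped automatically.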
The requests.get() call above sent our request and stored the response in the “request” object. From there, we call the .json() method to navigate through the results, which are in JSON format.
Accessing items in JSON involves using brackets to indicate the keys.
# get the page_total and the count from the request summary
page_total = request.json()['searchresult']['summary']['page_total']
count = request.json()['searchresult']['summary']['count']
print('Page total: ' + str(page_total) + '\n' + 'Total results: ' + str(count))
Page total: 13
Total results: 604
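To make the bracket notation concrete without needing an API key, here is a small sketch that parses a made-up, heavily trimmed response. Only the `searchresult`, `summary`, `page_total`, and `count` keys and their values come from the output above; everything else about the sample is an assumption for illustration. It also parses the JSON once and reuses the result, instead of calling `.json()` for every lookup.

```python
import json

# a made-up, trimmed-down response body, for illustration only
sample_text = '{"searchresult": {"summary": {"page_total": 13, "count": 604}}}'

# with requests, the equivalent step would be: data = request.json()
data = json.loads(sample_text)

# each pair of brackets steps one level deeper into the nested structure
sample_summary = data['searchresult']['summary']
sample_page_total = sample_summary['page_total']
sample_count = sample_summary['count']

# .get() returns a fallback instead of raising KeyError for a missing key
sample_missing = sample_summary.get('not_a_real_key', 'n/a')

print('Page total: ' + str(sample_page_total))
print('Total results: ' + str(sample_count))
```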
Now we write two loops. The first one gathers the data from each page of the results. The second loop parses that data into a DataFrame object (a tabular, spreadsheet-like format) so we can examine it and eventually save it to a CSV file.
# request the additional pages of the query by adding 1 to the 'page'
# parameter until it reaches the page_total. Store each page of
# requests in a list. Wait 3 seconds between each request to avoid
# overloading the API
pages = []
for i in range(page_total):
    page = i + 1
    request = requests.get(url + key + '&op=getSearch&state=ALL&query=transgender' + '&page=' + str(page))
    time.sleep(3)
    pages.append(request.json())
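The same paging logic can be wrapped in a function so that it can be tried out without touching the network. In this sketch, `collect_pages` and `fake_fetch` are hypothetical names of our own: `collect_pages` accepts any function that returns the parsed JSON for a given page number, reads `page_total` from the first page, and then requests the remaining pages with a pause in between. Unlike the loop above, it reuses the first page rather than fetching it a second time.

```python
import time


def collect_pages(fetch_page, delay=3):
    """Fetch page 1, read page_total from it, then fetch the rest.

    fetch_page is any function that takes a page number and returns
    the parsed JSON (a dict) for that page.
    """
    first = fetch_page(1)
    page_total = first['searchresult']['summary']['page_total']
    pages = [first]
    for page in range(2, page_total + 1):
        time.sleep(delay)  # pause between requests to go easy on the API
        pages.append(fetch_page(page))
    return pages


def fake_fetch(page):
    # stand-in for a real request; 'fetched_page' is not a Legiscan
    # field, it only lets us see which page number was asked for
    return {'searchresult': {'summary': {'page_total': 3}},
            'fetched_page': page}


demo_pages = collect_pages(fake_fetch, delay=0)
print(len(demo_pages))
```

With the real API, `fetch_page` could be a small function that calls `requests.get()` on the URL for that page and returns `.json()`.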
# for each page of the request, parse the results and add them to a
# dataframe. each page is a json object with individual results labeled
# '0' through '49' and a 'summary' entry nested under 'searchresult'.
# Ignore the summary and use pandas.concat to add each of the results
# from the request to a dataframe
df = pd.DataFrame()
for page in pages:
    results = page['searchresult']
    # a page can hold fewer than 50 results (the last one, for example),
    # so only read the result keys that are actually present
    for i in range(50):
        if str(i) in results:
            df = pd.concat([df, pd.DataFrame(results[str(i)], index=[i])])
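Calling `pd.concat` once per result copies the growing DataFrame each time, and it leaves an index that restarts at 0 on every page. One common alternative, sketched here as an assumption rather than as the workshop's method, is to collect the individual results in a plain list first and build the DataFrame in a single step. The helper name `flatten_results` is hypothetical, and the demo pages are trimmed to two fields taken from the results shown in this tutorial.

```python
def flatten_results(pages):
    """Return every numbered result from every page as a list of dicts."""
    rows = []
    for page in pages:
        results = page['searchresult']
        # keep only the numbered keys ('0', '1', ...), skipping 'summary'
        numbered = [k for k in results if k.isdigit()]
        for k in sorted(numbered, key=int):
            rows.append(results[k])
    return rows


# two tiny demo pages shaped like the structure described above
demo = [
    {'searchresult': {
        'summary': {'page_total': 2},
        '0': {'state': 'UT', 'bill_number': 'HB0316'},
        '1': {'state': 'DC', 'bill_number': 'B25-0460'},
    }},
    {'searchresult': {
        'summary': {'page_total': 2},
        '0': {'state': 'VT', 'bill_number': 'JRH004'},
    }},
]

rows = flatten_results(demo)
print(rows)
```

Passing the list to `pd.DataFrame(rows)` then produces the table in one call, with a single running index from 0 to the number of results minus one.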
Now we have our data in a tabular structure, thanks to the DataFrame object provided by the pandas library. The DataFrame format lets us examine our data much as we would in a spreadsheet.
df
 | relevance | state | bill_number | bill_id | change_hash | url | text_url | research_url | last_action_date | last_action | title |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100 | UT | HB0316 | 1819064 | bb27e8c4d929c9331af7b02dc6d81348 | https://legiscan.com/UT/bill/HB0316/2024 | https://legiscan.com/UT/text/HB0316/2024 | https://legiscan.com/UT/research/HB0316/2024 | 2024-02-12 | Senate/ 1st reading (Introduced) in Senate Rul... | Inmate Assignment Amendments |
1 | 99 | DC | B25-0460 | 1778035 | e96fc947b1b4170adf7a3fe91291a61b | https://legiscan.com/DC/bill/B25-0460/2023 | https://legiscan.com/DC/text/B25-0460/2023 | https://legiscan.com/DC/research/B25-0460/2023 | 2023-09-22 | Notice of Intent to Act on B25-0460 Published ... | Transgender and Gender-Diverse Mortality and F... |
2 | 99 | DC | CER25-0143 | 1782702 | 9aff2f06c9f9b38306f8f3a8e83183c8 | https://legiscan.com/DC/bill/CER25-0143/2023 | https://legiscan.com/DC/text/CER25-0143/2023 | https://legiscan.com/DC/research/CER25-0143/2023 | 2023-11-24 | Resolution ACR25-0141, Effective from Nov 07, ... | Transgender Day of Remembrance Recognition Res... |
3 | 99 | VT | JRH004 | 1751508 | bfb4a15ca9ece1f262e6ed5759e192ba | https://legiscan.com/VT/bill/JRH004/2023 | https://legiscan.com/VT/text/JRH004/2023 | https://legiscan.com/VT/research/JRH004/2023 | 2023-04-07 | Senate Message, adopted in concurrence | Joint resolution recognizing March 31, 2023 as... |
4 | 99 | US | HR886 | 1784730 | 15a2a26d5c91782333a2b37fc8083154 | https://legiscan.com/US/bill/HR886/2023 | https://legiscan.com/US/text/HR886/2023 | https://legiscan.com/US/research/HR886/2023 | 2023-11-21 | Referred to the House Committee on the Judiciary. | Supporting the goals and principles of Transge... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
49 | 6 | US | HB2670 | 1757049 | ccfa8fba0550b39bb71af841122e2132 | https://legiscan.com/US/bill/HB2670/2023 | https://legiscan.com/US/text/HB2670/2023 | https://legiscan.com/US/research/HB2670/2023 | 2023-12-22 | Became Public Law No: 118-31. | CONVENE Act of 2023 Sensible Classification Ac... |
0 | 4 | MI | HB4437 | 1757146 | 604fc33c04c0b4d89cce5b8bd0d51893 | https://legiscan.com/MI/bill/HB4437/2023 | https://legiscan.com/MI/text/HB4437/2023 | https://legiscan.com/MI/research/HB4437/2023 | 2023-09-06 | Disapproved Line Item(s) Re-referred To Commit... | Appropriations: omnibus; appropriations for mu... |
1 | 4 | NY | S04007 | 1690727 | 1579358405b6c681a7bd5ed500b7ac14 | https://legiscan.com/NY/bill/S04007/2023 | https://legiscan.com/NY/text/S04007/2023 | https://legiscan.com/NY/research/S04007/2023 | 2023-05-03 | SIGNED CHAP.57 | Enacts into law major components of legislatio... |
2 | 3 | NY | S04004 | 1690688 | 30e29d6956eec2d3ff19a58c35ad73f8 | https://legiscan.com/NY/bill/S04004/2023 | https://legiscan.com/NY/text/S04004/2023 | https://legiscan.com/NY/research/S04004/2023 | 2023-05-01 | SUBSTITUTED BY A3004D | Makes appropriations for the support of govern... |
3 | 3 | NY | A03004 | 1690608 | b862da143af26ca5a2997011fc933673 | https://legiscan.com/NY/bill/A03004/2023 | https://legiscan.com/NY/text/A03004/2023 | https://legiscan.com/NY/research/A03004/2023 | 2023-05-12 | thru line veto memo.36 | Makes appropriations for the support of govern... |
604 rows × 11 columns
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 604 entries, 0 to 3
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 relevance 604 non-null int64
1 state 604 non-null object
2 bill_number 604 non-null object
3 bill_id 604 non-null int64
4 change_hash 604 non-null object
5 url 604 non-null object
6 text_url 604 non-null object
7 research_url 604 non-null object
8 last_action_date 604 non-null object
9 last_action 604 non-null object
10 title 604 non-null object
dtypes: int64(2), object(9)
memory usage: 56.6+ KB
df.to_csv('legiscan_api_results.csv')
That’s it!
In the next workshop, we will look at analyzing the plain text from some of these bills, which we will access through the congress.gov website.