Legiscan API


Now we turn to APIs: how to construct an API call, and how to navigate and download the results. You will notice how much easier the data is to manage, since it arrives in a clean, structured format (JSON), as opposed to the messy HTML we get from web scraping.

The Legiscan API provides access to legislative data, such as bills, sponsors, and hearing information, from all 50 states (as the results below show, it also covers the U.S. Congress and the District of Columbia). To use the Legiscan API, you first need to create an account on Legiscan and request an API key. The key, which we will include in our API call, authenticates you and gives you access to the data on Legiscan.
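Because the key is tied to your account, it is safer not to paste it directly into a notebook you might share. A minimal sketch, assuming you have stored the key in an environment variable before starting Jupyter; the variable name `LEGISCAN_API_KEY` and the helper `load_api_key` are hypothetical choices for this example, not something Legiscan requires:

```python
import os

def load_api_key(var_name='LEGISCAN_API_KEY'):
    """Return the Legiscan API key stored in an environment variable.

    LEGISCAN_API_KEY is a hypothetical variable name chosen for this example.
    """
    key = os.environ.get(var_name)
    if not key:
        # fail early with a clear message instead of sending a request without a key
        raise RuntimeError(f'Set the {var_name} environment variable to your Legiscan API key')
    return key
```

With the variable set (for example, `export LEGISCAN_API_KEY=...` in a terminal before launching Jupyter), `key = load_api_key()` can stand in for typing the key into the code.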

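After getting our API key, we can create our API call. The first step is to import the libraries we need. Then we construct the API call by putting together the root URL, the key, and the query. Here the query uses the getSearch operation to search bills in all states (state=ALL) for the term "transgender", starting with the first page of results.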

import requests
import pandas as pd
import time

# the components of the API call, which make up the "request" variable
url = 'https://api.legiscan.com/?key='
key = 'YOUR_API_KEY'  # insert your key here, as a string
page = 1
request = requests.get(url + key + '&op=getSearch&state=ALL&query=transgender' + '&page=' + str(page))

# print the full URL, which we can paste into the browser to get
# an interactive look at the raw results
print(url + key + '&op=getSearch&state=ALL&query=transgender' + '&page=' + str(page))
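Concatenating strings works, but it is easy to drop an `&` or forget to escape a space in a multi-word query. As a sketch, the standard library's `urllib.parse.urlencode` can assemble the same query string from a dictionary. The `build_search_url` helper is hypothetical; its parameter names simply mirror the ones in the call above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.legiscan.com/'

def build_search_url(key, query, state='ALL', page=1):
    """Assemble a getSearch URL from its parts (hypothetical helper).

    urlencode escapes special characters, e.g. a space becomes '+'.
    """
    params = {'key': key, 'op': 'getSearch', 'state': state,
              'query': query, 'page': page}
    return BASE_URL + '?' + urlencode(params)

print(build_search_url('YOUR_API_KEY', 'transgender'))
# https://api.legiscan.com/?key=YOUR_API_KEY&op=getSearch&state=ALL&query=transgender&page=1
```

With `requests`, you can skip building the string yourself: `requests.get(BASE_URL, params=params)` takes the same dictionary and does the encoding for you.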

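The call to requests.get() has already sent the request; the object we named "request" actually holds the server's response. Calling its .json() method parses the JSON in that response into Python dictionaries and lists that we can navigate.

Accessing items in the parsed JSON means chaining square brackets with key names, one level of nesting at a time. Below, the summary nested under searchresult tells us how many pages of results, and how many results in total, the query returned.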

# get the page_total and the count from the request summary
page_total = request.json()['searchresult']['summary']['page_total']
count = request.json()['searchresult']['summary']['count']
print('Page total: ' + str(page_total) + '\n' + 'Total results: ' + str(count))
Page total: 13
Total results: 604
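To see why the keys are chained this way, it helps to look at the shape of a search response. The JSON below is a made-up, heavily trimmed stand-in modeled only on the keys this notebook uses (`searchresult`, `summary`, `page_total`, `count`, and the numbered results); it is not real Legiscan output, and a real response carries more fields.

```python
import json

# a made-up, trimmed response; a real page holds up to 50 numbered results
raw = '''
{
  "searchresult": {
    "summary": {"page_total": 13, "count": 604},
    "0": {"state": "UT", "bill_number": "HB0316"},
    "1": {"state": "DC", "bill_number": "B25-0460"}
  }
}
'''
data = json.loads(raw)  # the same kind of nested dict that request.json() returns

results = data['searchresult']
summary = results['summary']
print(summary['page_total'], summary['count'])  # 13 604

# every key except 'summary' is a numbered result
bills = [results[k] for k in results if k != 'summary']
print([b['bill_number'] for b in bills])  # ['HB0316', 'B25-0460']

# .get() returns a default instead of raising KeyError when a key is absent
print(results.get('2', 'no result 2 on this page'))
```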

Now we write two loops. The first gathers the data from every page of results. The second parses that data into a DataFrame (a tabular, spreadsheet-like format) so we can examine it and eventually save it to a CSV file.

# request every page of the query, from 1 up to page_total, by
# changing the 'page' parameter. Store each page of results in a
# list. Wait 3 seconds between requests to avoid overloading the API

pages = []
for i in range(page_total):
    page = i + 1
    request = requests.get(url + key + '&op=getSearch&state=ALL&query=transgender' + '&page=' + str(page))
    request.raise_for_status()  # stop if the server returned an HTTP error
    time.sleep(3)
    pages.append(request.json())
# for each page of the request, parse the results and add them to a
# dataframe. Each page is a JSON object whose 'searchresult' holds the
# individual results, labeled '0' through '49', plus a 'summary'.
# Ignore the summary and use pandas.concat to add each result to
# the dataframe

df = pd.DataFrame()
for page in pages:
    results = page['searchresult']
    for i in range(50):
        # the last page may hold fewer than 50 results; skip missing labels
        if str(i) in results:
            df = pd.concat([df, pd.DataFrame(results[str(i)], index=[i])])
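The two loops can also be wrapped in functions, which makes them easier to reuse for another query and to test without calling the API. This is a sketch of an alternative, not the approach used above: `fetch_all_pages`, `parse_results`, and `fake_fetch` are hypothetical helpers, and `fetch_page` stands for any function that takes a page number and returns the parsed JSON (for example, a small wrapper around `requests.get(...).json()`). Instead of calling `pd.concat` once per result, it collects the results as dictionaries in a list.

```python
import time

def fetch_all_pages(fetch_page, page_total, delay=3.0):
    """Call fetch_page(1) ... fetch_page(page_total), pausing between calls."""
    pages = []
    for page in range(1, page_total + 1):
        pages.append(fetch_page(page))
        if page < page_total:
            time.sleep(delay)  # be polite to the API between requests
    return pages

def parse_results(pages):
    """Flatten the numbered results from every page into one list of dicts."""
    rows = []
    for page in pages:
        results = page['searchresult']
        # keys are '0', '1', ... plus 'summary'; sort the numbered keys numerically
        numbered = sorted((k for k in results if k != 'summary'), key=int)
        rows.extend(results[k] for k in numbered)
    return rows

def fake_fetch(page):
    # offline stand-in for the API: page 1 has two results, page 2 has one
    sizes = {1: 2, 2: 1}
    results = {str(i): {'bill_id': page * 100 + i} for i in range(sizes[page])}
    results['summary'] = {'page_total': 2}
    return {'searchresult': results}

rows = parse_results(fetch_all_pages(fake_fetch, page_total=2, delay=0))
print([r['bill_id'] for r in rows])  # [100, 101, 200]
```

With real data, `pd.DataFrame(rows)` builds the table in a single step and numbers the rows 0 to n-1, instead of restarting the index at 0 on each page as the `index=[i]` approach does (visible in the output below).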

Now our data is in a tabular structure, a DataFrame from the pandas library, which lets us examine it much as we would a spreadsheet.

df
|  | relevance | state | bill_number | bill_id | change_hash | url | text_url | research_url | last_action_date | last_action | title |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | UT | HB0316 | 1819064 | bb27e8c4d929c9331af7b02dc6d81348 | https://legiscan.com/UT/bill/HB0316/2024 | https://legiscan.com/UT/text/HB0316/2024 | https://legiscan.com/UT/research/HB0316/2024 | 2024-02-12 | Senate/ 1st reading (Introduced) in Senate Rul... | Inmate Assignment Amendments |
| 1 | 99 | DC | B25-0460 | 1778035 | e96fc947b1b4170adf7a3fe91291a61b | https://legiscan.com/DC/bill/B25-0460/2023 | https://legiscan.com/DC/text/B25-0460/2023 | https://legiscan.com/DC/research/B25-0460/2023 | 2023-09-22 | Notice of Intent to Act on B25-0460 Published ... | Transgender and Gender-Diverse Mortality and F... |
| 2 | 99 | DC | CER25-0143 | 1782702 | 9aff2f06c9f9b38306f8f3a8e83183c8 | https://legiscan.com/DC/bill/CER25-0143/2023 | https://legiscan.com/DC/text/CER25-0143/2023 | https://legiscan.com/DC/research/CER25-0143/2023 | 2023-11-24 | Resolution ACR25-0141, Effective from Nov 07, ... | Transgender Day of Remembrance Recognition Res... |
| 3 | 99 | VT | JRH004 | 1751508 | bfb4a15ca9ece1f262e6ed5759e192ba | https://legiscan.com/VT/bill/JRH004/2023 | https://legiscan.com/VT/text/JRH004/2023 | https://legiscan.com/VT/research/JRH004/2023 | 2023-04-07 | Senate Message, adopted in concurrence | Joint resolution recognizing March 31, 2023 as... |
| 4 | 99 | US | HR886 | 1784730 | 15a2a26d5c91782333a2b37fc8083154 | https://legiscan.com/US/bill/HR886/2023 | https://legiscan.com/US/text/HR886/2023 | https://legiscan.com/US/research/HR886/2023 | 2023-11-21 | Referred to the House Committee on the Judiciary. | Supporting the goals and principles of Transge... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 49 | 6 | US | HB2670 | 1757049 | ccfa8fba0550b39bb71af841122e2132 | https://legiscan.com/US/bill/HB2670/2023 | https://legiscan.com/US/text/HB2670/2023 | https://legiscan.com/US/research/HB2670/2023 | 2023-12-22 | Became Public Law No: 118-31. | CONVENE Act of 2023 Sensible Classification Ac... |
| 0 | 4 | MI | HB4437 | 1757146 | 604fc33c04c0b4d89cce5b8bd0d51893 | https://legiscan.com/MI/bill/HB4437/2023 | https://legiscan.com/MI/text/HB4437/2023 | https://legiscan.com/MI/research/HB4437/2023 | 2023-09-06 | Disapproved Line Item(s) Re-referred To Commit... | Appropriations: omnibus; appropriations for mu... |
| 1 | 4 | NY | S04007 | 1690727 | 1579358405b6c681a7bd5ed500b7ac14 | https://legiscan.com/NY/bill/S04007/2023 | https://legiscan.com/NY/text/S04007/2023 | https://legiscan.com/NY/research/S04007/2023 | 2023-05-03 | SIGNED CHAP.57 | Enacts into law major components of legislatio... |
| 2 | 3 | NY | S04004 | 1690688 | 30e29d6956eec2d3ff19a58c35ad73f8 | https://legiscan.com/NY/bill/S04004/2023 | https://legiscan.com/NY/text/S04004/2023 | https://legiscan.com/NY/research/S04004/2023 | 2023-05-01 | SUBSTITUTED BY A3004D | Makes appropriations for the support of govern... |
| 3 | 3 | NY | A03004 | 1690608 | b862da143af26ca5a2997011fc933673 | https://legiscan.com/NY/bill/A03004/2023 | https://legiscan.com/NY/text/A03004/2023 | https://legiscan.com/NY/research/A03004/2023 | 2023-05-12 | thru line veto memo.36 | Makes appropriations for the support of govern... |

604 rows × 11 columns

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 604 entries, 0 to 3
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   relevance         604 non-null    int64 
 1   state             604 non-null    object
 2   bill_number       604 non-null    object
 3   bill_id           604 non-null    int64 
 4   change_hash       604 non-null    object
 5   url               604 non-null    object
 6   text_url          604 non-null    object
 7   research_url      604 non-null    object
 8   last_action_date  604 non-null    object
 9   last_action       604 non-null    object
 10  title             604 non-null    object
dtypes: int64(2), object(9)
memory usage: 56.6+ KB
# index=False leaves out the row labels, which only repeat 0-49 for each page
df.to_csv('legiscan_api_results.csv', index=False)

That’s it!

In the next workshop, we will analyze the plain text of some of these bills, which we will access through the congress.gov website.