Legiscan API


Now we turn to APIs: how to construct an API call, and how to navigate and download the results. You will notice how much easier the data is to manage, because it arrives in a clean, structured format (JSON) rather than as the messy HTML that we get from web scraping.

The Legiscan API provides access to legislative data, such as bills, sponsors, and hearing information, from all 50 states, as well as Washington, DC, and the US Congress, as the results later in this section show. To use the Legiscan API, you first need to create an account on Legiscan and request an API key. The key, which we will incorporate into our API call, authenticates you so that you can access the data on Legiscan.

After getting our API key, we can create our API call. The first step is to import the libraries that we need in order to use the API. Then we can construct our API call by putting together the root URL, the key, and the query.

import requests
import pandas as pd
import time

# the components of the API call, which make up the "request" variable
url = 'https://api.legiscan.com/?key='
key = 'YOUR_API_KEY'  # insert your key here, as a string
page = 1
full_url = url + key + '&op=getSearch&state=ALL&query=transgender' + '&page=' + str(page)
request = requests.get(full_url)

# print out the full URL, which we can paste into the browser to get 
# an interactive look at the raw results
print(full_url)
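String concatenation works, but it is easy to drop an `&` or to forget to escape a space in a multi-word query. As a minimal sketch, the same URL can be assembled from a dictionary of parameters with `urlencode` from Python's standard library. The helper name `build_search_url` and the placeholder key are illustrative assumptions and are not part of the Legiscan documentation; the parameter names are the ones used in the call above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.legiscan.com/'

def build_search_url(key, query, state='ALL', page=1):
    """Assemble a getSearch URL from its parts (hypothetical helper)."""
    params = {
        'key': key,
        'op': 'getSearch',
        'state': state,
        'query': query,
        'page': page,
    }
    # urlencode joins the pairs with '&' and escapes special characters
    return BASE_URL + '?' + urlencode(params)

print(build_search_url('YOUR_API_KEY', 'transgender'))
```

The resulting string can be passed to `requests.get` exactly as before.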

The requests.get call above has already sent the request, and the “request” variable holds the response. From there, we call the .json() method to parse the results, which are in JSON format, into nested Python dictionaries.

Accessing items in the parsed JSON involves chaining square brackets, one per key, to move down through the nested levels.

# get the page_total and the count from the request summary
summary = request.json()['searchresult']['summary']
page_total = summary['page_total']
count = summary['count']
print('Page total: ' + str(page_total) + '\n' + 'Total results: ' + str(count))

Page total: 13
Total results: 604
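To see the bracket access in isolation, without an API key or a network connection, here is a small sketch that works on a hand-written JSON string. The string only mimics the nesting of the summary used above and is an assumption for illustration; a real response contains many more fields. The key `missing_key` is deliberately made up to show what happens when a key is absent.

```python
import json

# A hand-written sample that mimics the nesting of a search response.
# Only the keys used above are included; a real response has more fields.
raw = '{"searchresult": {"summary": {"page_total": 13, "count": 604}}}'

data = json.loads(raw)                      # parse JSON text into nested dicts
summary = data['searchresult']['summary']   # one pair of brackets per level

print(summary['page_total'])
print(summary['count'])

# .get() returns a default instead of raising KeyError for a missing key
print(summary.get('missing_key', 'not present'))
```

Using `.get()` with a default is a defensive choice for fields that may not appear in every response, while plain brackets fail loudly, which is often what you want while exploring.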

Now we write two loops. The first gathers the data from every page of the results. The second parses that data into a DataFrame object (a tabular, spreadsheet-like format) so we can examine it and eventually save it to a CSV file.

# request the additional pages of the query by adding 1 to the 'page' 
# parameter until it reaches the page_total. Store each page of  
# requests in a list. Wait 3 seconds between each request to avoid 
# overloading the API

pages = []
for i in range(page_total):
    page = i + 1
    request = requests.get(url + key + '&op=getSearch&state=ALL&query=transgender' + '&page=' + str(page))
    time.sleep(3)
    pages.append(request.json())
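The pagination loop above can also be wrapped in a function, which makes it possible to try the logic without contacting the API. The sketch below is one possible arrangement, not the only one: the names `fetch_all_pages`, `fetch_page`, and `fake_fetch` are hypothetical, and the stand-in fetcher returns made-up dictionaries rather than real Legiscan data.

```python
import time

def fetch_all_pages(fetch_page, page_total, delay=3):
    """Call fetch_page(1), fetch_page(2), ... and collect the results.

    fetch_page is any function that takes a page number and returns the
    parsed JSON for that page (a hypothetical interface for this sketch).
    """
    pages = []
    for page in range(1, page_total + 1):
        pages.append(fetch_page(page))
        if page < page_total:
            time.sleep(delay)   # pause between requests, not after the last
    return pages

# Offline demonstration with a stand-in for the real API call
def fake_fetch(page):
    return {'searchresult': {'summary': {'page': page}}}

demo = fetch_all_pages(fake_fetch, page_total=3, delay=0)
print(len(demo))
```

With the real API, `fetch_page` could wrap the `requests.get(...).json()` call from the loop above.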

# for each page of the request, parse the results and add them to a 
# dataframe. each page is a json object with individual results labeled
# '0' through '49' and a 'summary' nested under searchresult. Ignore 
# the summary and use pandas.concat to combine the results from 
# every page into one dataframe

frames = []
for page in pages:
    results = page['searchresult']
    for i in range(50):
        # the last page may hold fewer than 50 results, so skip any
        # label that is missing
        if str(i) in results:
            frames.append(pd.DataFrame(results[str(i)], index=[i]))
df = pd.concat(frames)
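The parsing step can be tried on its own with a small hand-built page. The following sketch assumes only what the comments above describe, namely that results sit next to a `summary` entry under `searchresult`. The function name `extract_results` is hypothetical, and the sample page is trimmed to two fields per bill, with values taken from the first two rows of the results shown below.

```python
def extract_results(page):
    """Return the result dictionaries from one page, skipping 'summary'."""
    results = page['searchresult']
    rows = []
    for key, value in results.items():
        if key == 'summary':
            continue            # the summary is metadata, not a bill
        rows.append(value)
    return rows

# A trimmed, hand-built page with the same nesting as a real one
sample_page = {
    'searchresult': {
        'summary': {'page_total': 1, 'count': 2},
        '0': {'state': 'UT', 'bill_number': 'HB0316'},
        '1': {'state': 'DC', 'bill_number': 'B25-0460'},
    }
}

rows = extract_results(sample_page)
print(rows)
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)`, which builds one row per dictionary. Iterating over the keys instead of counting from 0 to 49 avoids hard-coding the page size, although it assumes that `summary` is the only non-result entry.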

Now we have our data in a tabular structure, thanks to the DataFrame object from the pandas library. This format lets us examine our data much as we would in a spreadsheet. Note that the index restarts at 0 for each page of results, so index values repeat.

df

	relevance	state	bill_number	bill_id	change_hash	url	text_url	research_url	last_action_date	last_action	title
0	100	UT	HB0316	1819064	bb27e8c4d929c9331af7b02dc6d81348	https://legiscan.com/UT/bill/HB0316/2024	https://legiscan.com/UT/text/HB0316/2024	https://legiscan.com/UT/research/HB0316/2024	2024-02-12	Senate/ 1st reading (Introduced) in Senate Rul...	Inmate Assignment Amendments
1	99	DC	B25-0460	1778035	e96fc947b1b4170adf7a3fe91291a61b	https://legiscan.com/DC/bill/B25-0460/2023	https://legiscan.com/DC/text/B25-0460/2023	https://legiscan.com/DC/research/B25-0460/2023	2023-09-22	Notice of Intent to Act on B25-0460 Published ...	Transgender and Gender-Diverse Mortality and F...
2	99	DC	CER25-0143	1782702	9aff2f06c9f9b38306f8f3a8e83183c8	https://legiscan.com/DC/bill/CER25-0143/2023	https://legiscan.com/DC/text/CER25-0143/2023	https://legiscan.com/DC/research/CER25-0143/2023	2023-11-24	Resolution ACR25-0141, Effective from Nov 07, ...	Transgender Day of Remembrance Recognition Res...
3	99	VT	JRH004	1751508	bfb4a15ca9ece1f262e6ed5759e192ba	https://legiscan.com/VT/bill/JRH004/2023	https://legiscan.com/VT/text/JRH004/2023	https://legiscan.com/VT/research/JRH004/2023	2023-04-07	Senate Message, adopted in concurrence	Joint resolution recognizing March 31, 2023 as...
4	99	US	HR886	1784730	15a2a26d5c91782333a2b37fc8083154	https://legiscan.com/US/bill/HR886/2023	https://legiscan.com/US/text/HR886/2023	https://legiscan.com/US/research/HR886/2023	2023-11-21	Referred to the House Committee on the Judiciary.	Supporting the goals and principles of Transge...
...	...	...	...	...	...	...	...	...	...	...	...
49	6	US	HB2670	1757049	ccfa8fba0550b39bb71af841122e2132	https://legiscan.com/US/bill/HB2670/2023	https://legiscan.com/US/text/HB2670/2023	https://legiscan.com/US/research/HB2670/2023	2023-12-22	Became Public Law No: 118-31.	CONVENE Act of 2023 Sensible Classification Ac...
0	4	MI	HB4437	1757146	604fc33c04c0b4d89cce5b8bd0d51893	https://legiscan.com/MI/bill/HB4437/2023	https://legiscan.com/MI/text/HB4437/2023	https://legiscan.com/MI/research/HB4437/2023	2023-09-06	Disapproved Line Item(s) Re-referred To Commit...	Appropriations: omnibus; appropriations for mu...
1	4	NY	S04007	1690727	1579358405b6c681a7bd5ed500b7ac14	https://legiscan.com/NY/bill/S04007/2023	https://legiscan.com/NY/text/S04007/2023	https://legiscan.com/NY/research/S04007/2023	2023-05-03	SIGNED CHAP.57	Enacts into law major components of legislatio...
2	3	NY	S04004	1690688	30e29d6956eec2d3ff19a58c35ad73f8	https://legiscan.com/NY/bill/S04004/2023	https://legiscan.com/NY/text/S04004/2023	https://legiscan.com/NY/research/S04004/2023	2023-05-01	SUBSTITUTED BY A3004D	Makes appropriations for the support of govern...
3	3	NY	A03004	1690608	b862da143af26ca5a2997011fc933673	https://legiscan.com/NY/bill/A03004/2023	https://legiscan.com/NY/text/A03004/2023	https://legiscan.com/NY/research/A03004/2023	2023-05-12	thru line veto memo.36	Makes appropriations for the support of govern...

604 rows × 11 columns

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 604 entries, 0 to 3
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   relevance         604 non-null    int64 
 1   state             604 non-null    object
 2   bill_number       604 non-null    object
 3   bill_id           604 non-null    int64 
 4   change_hash       604 non-null    object
 5   url               604 non-null    object
 6   text_url          604 non-null    object
 7   research_url      604 non-null    object
 8   last_action_date  604 non-null    object
 9   last_action       604 non-null    object
 10  title             604 non-null    object
dtypes: int64(2), object(9)
memory usage: 56.6+ KB

# save the results to a CSV file in the current working directory
df.to_csv('legiscan_api_results.csv')

That’s it!

In the next workshop, we will look at analyzing the plain text from some of these bills, which we will access through the congress.gov website.