Import Data to Python Using BLS Open Data API
Now, let's start getting the publicly available data from the US Bureau of Labor Statistics. The API has two versions, version 1 and version 2. Version 1 can be used without registering for an API key, but it has more limitations.
I decided to use version 2 of the BLS API, primarily because it is more flexible. Version 1 allows only 25 queries per day, which is not enough for my project: I expect to test features that request data through the API repeatedly, so 25 requests per day would be used up very quickly.
Version 1 API | Version 2 API |
---|---|
Query up to 10 years of data at a time | Query up to 20 years of data at a time |
Query up to 25 series per request | Query up to 50 series per request |
Maximum of 25 queries per day | Maximum of 500 queries per day |
Does not require an API key | Requires an API key |
Configs for the BLS API
To start off, I need to define some settings for the BLS API. Each series has its own Series ID, which can be found in the Data Viewer on the BLS website.
config.py
# Settings for the BLS API
from datetime import datetime

# Settings for global variables
start_year = datetime.now().year - 19
end_year = datetime.now().year
bls_series = ['APU0000708111', 'APU000072610', 'APU0000709112', 'APU0000702111', 'APU0000704111', 'APU0000FF1101',
              'APU0000706111', 'APU0000711211', 'APU0000701312', 'APU0000717311', 'APU0000703111', 'APU0000702421',
              'APU0000711311', 'APU0000712311', 'CUUR0000SA0']
bls_series_name = ['eggs', 'electricity', 'milk', 'bread', 'bacon', 'chicken_breast', 'chicken_whole', 'bananas',
                   'rice', 'coffee', 'ground_chuck', 'cookies', 'oranges', 'tomatoes', 'cpi_values']
url_bls = 'https://api.bls.gov/publicAPI/v2/timeseries/data/'
I use the datetime library to keep track of the current year, because the BLS Open Data API only supports up to 20 years of data per query.
- start_year: Current year – 19 years
- end_year: Current year
- bls_series: Contains all the series that I want to query
- bls_series_name: Names of the datasets, used as column names for a Pandas DataFrame (in the same order as bls_series)
- url_bls: URL that is used in the API request
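As a quick sanity check on these settings, the sketch below confirms that the year window covers 20 calendar years (both endpoints are included, which is why the offset is 19 and not 20) and that every series ID has a matching column name. The limits are the version 2 figures from the table above. The names max_years, max_series and series_to_name are my own, and the two lists are shortened for illustration.

```python
from datetime import datetime

# Version 2 limits, taken from the comparison table above
max_years = 20
max_series = 50

current_year = datetime.now().year
start_year = current_year - 19
end_year = current_year

# Shortened copies of the lists in config.py, for illustration only
bls_series = ['APU0000708111', 'APU000072610', 'CUUR0000SA0']
bls_series_name = ['eggs', 'electricity', 'cpi_values']

# Both endpoints are included, so the window covers 20 calendar years
years_requested = end_year - start_year + 1

assert years_requested <= max_years
assert len(bls_series) <= max_series
assert len(bls_series) == len(bls_series_name)

# Pair each series ID with the column name it will receive
series_to_name = dict(zip(bls_series, bls_series_name))
print(years_requested, series_to_name)
```

Because the column names are assigned by position, keeping the two lists the same length and in the same order is what ties each price column to the right series.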
Extracting data using BLS API
The __init__ function of the Updater class (defined in fetch_data.py further down) pulls the settings from the config.py we set up earlier. This lets our Updater class know where to fetch the data from and what kind of data we want.
The response data from the request has the following structure.
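As a rough sketch, a trimmed response for a single series looks like the Python dictionary below. The key names follow the JSON that the BLS API returns as far as I know it; the values are examples only (the two egg prices match the sample output later in this post), and a real response carries extra fields that are left out here.

```python
# Trimmed, illustrative response for one series (values are examples only)
sample_response = {
    'status': 'REQUEST_SUCCEEDED',
    'message': [],
    'Results': {
        'series': [
            {
                'seriesID': 'APU0000708111',
                'data': [
                    {'year': '2023', 'period': 'M05', 'periodName': 'May', 'value': '2.666'},
                    {'year': '2023', 'period': 'M04', 'periodName': 'April', 'value': '3.270'},
                ],
            },
        ],
    },
}

# Walk down to the records of the first series
first_series = sample_response['Results']['series'][0]
latest_record = first_series['data'][0]
print(first_series['seriesID'], latest_record['year'], latest_record['period'], latest_record['value'])
```

Note that the year, period and value all arrive as strings, and the newest record comes first.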

Helper Functions
In the output, Results contains a key called series, which is a list of dictionaries, one for each series we request through the API. Each of these series dictionaries has a key called data, which holds a list of dictionaries, one per record for that series.
Each dictionary in that list contains the year of the record, the month of the record, and the value for that month of that year. This is not easy to work with, so we need a way to process the data into a friendlier shape.
Here I will introduce a few functions that help me achieve that. The first function converts values from the dictionaries into a list. It takes in a list of dictionaries and a key, goes through each dictionary, and extracts the item that matches the key. Lastly, it returns the extracted items as a nested list.
misc_func.py
"""
Helper functions for both the EIA and BLS API
"""
import pandas as pd


# Function that converts dictionaries to list
def dict_to_list(dicts, key):
    """
    Function that takes in a list of dictionaries and returns the value stored under the specified key in each
    of the dictionaries
    :param key: Key to look up in each dictionary
    :param dicts: List of dicts
    :return: Nested list of items (one inner list wrapped in an outer list)
    """
    list_array = []
    # .get returns None when a dictionary does not have the key
    list_item = [item.get(key) for item in dicts]
    list_array.append(list_item)
    return list_array
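To make the return shape concrete, here is a small usage sketch. The helper is repeated in condensed form so the snippet runs on its own, and the records are made-up examples in the same shape as the API output.

```python
def dict_to_list(dicts, key):
    """Condensed copy of the helper above, repeated so this snippet runs on its own."""
    return [[item.get(key) for item in dicts]]


records = [
    {'year': '2023', 'period': 'M05', 'value': '2.666'},
    {'year': '2023', 'period': 'M04', 'value': '3.270'},
]

values = dict_to_list(records, 'value')
print(values)  # [['2.666', '3.270']]

# A key that is absent from the records yields None for each record
missing = dict_to_list(records, 'footnotes')
print(missing)  # [[None, None]]
```

The extra outer list looks redundant here, but it lets a caller collect one inner list per series with a simple += and hand the result straight to a DataFrame.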
The second helper function converts a list of dictionaries into a Pandas DataFrame. The concept is simple: given the list of series dictionaries, it looks up the records of each series under one key (here data) and then pulls one field from every record under a second key (here value). It calls the previous function to do the extraction, so each series becomes one list, and the final product is a Pandas DataFrame with one column per series.
# Function for the retrieve data function that converts list of dict into a Pandas dataframe for the BLS data
def dict_to_df(list_dict, key):
    """
    Takes in a list of dict and converts it into pandas dataframe
    :param key: List of two keys: key[0] selects the list of records in each dict, key[1] the field in each record
    :param list_dict: List of dict
    :return: Pandas dataframe
    """
    temp_list = []
    for item in list_dict:
        # Each series contributes one inner list of values
        temp_list += dict_to_list(item[key[0]], key[1])
    # One row per series at this point, so transpose to get one column per series
    data_df = pd.DataFrame(temp_list).transpose()
    return data_df
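The following sketch runs dict_to_df on a tiny hand-made input with two series. Both helpers are repeated in condensed form so the snippet is self-contained; the series list is illustrative, with values borrowed from the sample output later in this post.

```python
import pandas as pd


def dict_to_list(dicts, key):
    return [[item.get(key) for item in dicts]]


def dict_to_df(list_dict, key):
    temp_list = []
    for item in list_dict:
        temp_list += dict_to_list(item[key[0]], key[1])
    return pd.DataFrame(temp_list).transpose()


# Two illustrative series with two records each
series = [
    {'seriesID': 'APU0000708111', 'data': [{'value': '2.666'}, {'value': '3.270'}]},
    {'seriesID': 'APU0000709112', 'data': [{'value': '4.042'}, {'value': '4.042'}]},
]

df = dict_to_df(series, ['data', 'value'])
df.columns = ['eggs', 'milk']
print(df)
```

One thing worth keeping in mind with this design: rows are matched by position, not by date, so it assumes that every series returns the same periods in the same order.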
The last helper function parses the date into a more standard format. Right now, the API only provides the year of the record and the month, either as M plus the month number (M01, M02, …, M12) or as the month name (January, February, …). I want to combine these into a year-month (YYYY-MM) format.
# Function to parse year and month in yyyy-mm format for the BLS data
def bls_parse_date(list_dict):
    """
    The function parses dates for the BLS data from separate year and period fields into YYYY-MM strings. It takes
    in a list of dicts and reads the 'year' and 'period' keys of each one
    :param list_dict: List of dictionaries that contains the year and period
    :return: Pandas dataframe with one YYYY-MM string per record
    """
    # Parse the dictionaries into a list
    year = dict_to_list(list_dict, 'year')
    month = dict_to_list(list_dict, 'period')
    year_month = []
    for date in list(zip(*year, *month)):
        # Join year and period together, e.g. ('2023', 'M05') becomes '2023-M05'
        year_month += ['-'.join(date)]
    # Remove letter M in period
    year_month = [str(item).replace('M', '') for item in year_month]
    return pd.DataFrame(year_month)
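Here is a short usage sketch for the date helper, again with condensed copies of the functions so it runs on its own. The final step, which turns the YYYY-MM strings into real timestamps with pd.to_datetime, is an optional extra of mine and not part of the original helper.

```python
import pandas as pd


def dict_to_list(dicts, key):
    return [[item.get(key) for item in dicts]]


def bls_parse_date(list_dict):
    year = dict_to_list(list_dict, 'year')
    month = dict_to_list(list_dict, 'period')
    year_month = ['-'.join(date) for date in zip(*year, *month)]
    year_month = [item.replace('M', '') for item in year_month]
    return pd.DataFrame(year_month)


records = [
    {'year': '2023', 'period': 'M05'},
    {'year': '2023', 'period': 'M04'},
]

dates = bls_parse_date(records)
print(dates[0].tolist())  # ['2023-05', '2023-04']

# Optional extra step (not in the original helper): convert to timestamps,
# which land on the first day of each month
timestamps = pd.to_datetime(dates[0], format='%Y-%m')
print(timestamps.dt.strftime('%Y-%m-%d').tolist())  # ['2023-05-01', '2023-04-01']
```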
Putting Everything Together
Next, I am going to create a file that fetches the data using the BLS API.
fetch_data.py
"""This file is used to retrieve data from the US BLS and US EIA website using their public API"""
# Importing libraries
import requests
import json
import pandas as pd
import api_keys
import config
import data.misc_func as misc_func  # helper functions defined above


class Updater:
    def __init__(self):
        # Pull the settings defined in config.py
        self.bls_series = config.bls_series
        self.bls_url = config.url_bls
        self.start_year = config.start_year
        self.end_year = config.end_year
Finally, we have the main function that pulls the data from the BLS API. The function uses the Requests library to get data from the API, with the parameters in config.py describing the data I need. After it pulls the data, I pass it through the helper functions I created above to standardize it and put it into a Pandas DataFrame.
I am quite new to the Requests library. Fortunately, the BLS provides sample code to help me get started; otherwise, I would probably have spent hours just getting it to work.
Link to sample code here: https://www.bls.gov/developers/api_python.htm
    # API to retrieve data from the US BLS website (method of the Updater class)
    def retrieve_data_bls(self, bls_series, bls_series_name):
        """
        Takes in a list of series id and retrieve their data values from the website
        :param bls_series: List of series id in strings
        :param bls_series_name: List of column names, in the same order as bls_series
        :return: Pandas Dataframe
        """
        # Requesting data through API
        headers = {'Content-type': 'application/json'}
        data = json.dumps({'seriesid': bls_series,
                           'startyear': self.start_year,
                           'endyear': self.end_year,
                           "registrationkey": api_keys.bls_api_key})
        p = requests.post(self.bls_url, data=data, headers=headers)
        json_data = json.loads(p.text)
        # Get the data from a list of dictionaries into a nested list
        # Then the nested list is put into a dataframe
        bls_df = misc_func.dict_to_df(json_data['Results']['series'], ['data', 'value'])
        bls_df.columns = bls_series_name
        # Extract the value date from the first series and put it as a new column into the bls df
        bls_df['year_month'] = misc_func.bls_parse_date(json_data['Results']['series'][0]['data'])
        return bls_df
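The function above assumes the request succeeded. Since the daily query limit is easy to hit while testing, a small guard before parsing can give a clearer error than a KeyError on 'Results'. The sketch below is one possible way to do that; check_bls_response is a hypothetical helper of mine, and it assumes the response carries a status field that reads REQUEST_SUCCEEDED on success and a message list describing any problem, which is my understanding of the API's JSON output.

```python
def check_bls_response(payload):
    """Hypothetical guard: raise if the payload does not look like a successful BLS reply."""
    status = payload.get('status')
    if status != 'REQUEST_SUCCEEDED':
        messages = '; '.join(payload.get('message', []))
        raise RuntimeError(f'BLS request failed ({status}): {messages}')
    series = payload.get('Results', {}).get('series', [])
    if not series:
        raise RuntimeError('BLS response contains no series')
    return series


# Illustrative payloads, not real API output
good = {
    'status': 'REQUEST_SUCCEEDED',
    'message': [],
    'Results': {'series': [{'seriesID': 'CUUR0000SA0', 'data': []}]},
}
bad = {'status': 'REQUEST_NOT_PROCESSED', 'message': ['example failure message'], 'Results': {}}

series = check_bls_response(good)
print(len(series))

try:
    check_bls_response(bad)
except RuntimeError as exc:
    print(exc)
```

In retrieve_data_bls, such a check would sit right after json.loads, with its return value used in place of json_data['Results']['series'].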
The resulting DataFrame has the following structure.
>>> data = Updater()
>>> bls_api = data.retrieve_data_bls(config.bls_series, config.bls_series_name)
>>> bls_api.columns
Index(['eggs', 'electricity', 'milk', 'bread', 'bacon', 'chicken_breast',
       'chicken_whole', 'bananas', 'rice', 'coffee', 'ground_chuck', 'cookies',
       'oranges', 'tomatoes', 'cpi_values', 'year_month'],
      dtype='object')
A sample of the data extracted from the API:
>>> bls_api.head()
    eggs  electricity   milk  bread  ...  oranges  tomatoes  cpi_values  year_month
0  2.666        0.165  4.042  1.951  ...    1.512     1.798     304.127     2023-05
1  3.270        0.165  4.042  1.989  ...    1.530     1.874     303.363     2023-04
2  3.446        0.166  4.098  1.936  ...    1.509     1.932     301.836     2023-03
3  4.211        0.168  4.163  1.896  ...    1.549     1.990     300.840     2023-02
4  4.823        0.168  4.204  1.888  ...    1.514     2.110     299.170     2023-01
This is the data requested through the BLS API, held in a Pandas DataFrame.
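The API delivers each value as a JSON string, so unless a later step converts them, the price columns hold text, not numbers. A minimal sketch of one way to convert them is shown below, using a small stand-in DataFrame in place of the real result; the conversion and the sorting are my own additions and not part of the code in this post.

```python
import pandas as pd

# Small stand-in for the DataFrame returned by retrieve_data_bls
bls_api = pd.DataFrame({
    'eggs': ['2.666', '3.270'],
    'milk': ['4.042', '4.042'],
    'year_month': ['2023-05', '2023-04'],
})

value_columns = [col for col in bls_api.columns if col != 'year_month']
# errors='coerce' turns anything non-numeric into NaN instead of raising
bls_api[value_columns] = bls_api[value_columns].apply(pd.to_numeric, errors='coerce')

# Oldest record first; YYYY-MM strings sort correctly as plain text
bls_api = bls_api.sort_values('year_month').reset_index(drop=True)
print(bls_api.dtypes)
```

With numeric columns in place, operations such as averaging a price or dividing it by cpi_values work as expected.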

The Full Code for the BLS API
"""This file is used to retrieve data from the US BLS and US EIA website using their public API"""
# Importing libraries
import requests
import json
import api_keys
import config
import data.misc_func as misc_func


class Updater:
    def __init__(self):
        self.bls_url = config.url_bls
        self.eia_url = config.url_eia
        self.start_year = config.start_year
        self.end_year = config.end_year

    # API to retrieve data from the US BLS website
    def retrieve_data_bls(self, bls_series, bls_series_name):
        """
        Takes in a list of series id and retrieve their data values from the website
        :param bls_series: List of series id in strings
        :param bls_series_name: List of column names, in the same order as bls_series
        :return: Pandas Dataframe
        """
        # Requesting data through API
        headers = {'Content-type': 'application/json'}
        data = json.dumps({'seriesid': bls_series,
                           'startyear': self.start_year,
                           'endyear': self.end_year,
                           "registrationkey": api_keys.bls_api_key})
        p = requests.post(self.bls_url, data=data, headers=headers)
        json_data = json.loads(p.text)
        # Get the data from a list of dictionaries into a nested list
        # Then the nested list is put into a dataframe
        bls_df = misc_func.dict_to_df(json_data['Results']['series'], ['data', 'value'])
        bls_df.columns = bls_series_name
        # Check the data before returning the dataframe (data_check is not shown in this post)
        bls_df = data_check(bls_df)
        # Extract the value date from the first series and put it as a new column into the bls df
        bls_df['year_month'] = misc_func.bls_parse_date(json_data['Results']['series'][0]['data'])
        return bls_df