Using API to Extract Data from the US BLS Website

Check out my other posts:

How to Gather Data from EIA API

Creating a Python Dashboard using Dash

Import Data to Python Using BLS Open Data API

Now, let's start getting the publicly available data from the US Bureau of Labor Statistics (BLS). The API has two versions, version 1 and version 2. Version 1 can be used without registering for an API key, but it has more limitations.

I decided to use version 2 of the BLS API, mainly because its limits are higher. Version 1 allows only 25 queries per day, which is not enough for my project: I expect to test many features that request data through the API, and 25 requests per day would be used up very quickly.

Version 1 API                            | Version 2 API
-----------------------------------------+-----------------------------------------
Query up to 10 years of data at a time   | Query up to 20 years of data at a time
Query up to 25 series per request        | Query up to 50 series per request
Maximum of 25 queries per day            | Maximum of 500 queries per day
Does not require an API key              | Requires an API key

Table comparing Version 1 and Version 2 of the API

Configs for the BLS API

To start off, I need to define some settings for the BLS API. Each series has its own Series ID, which can be found in the Data Viewer on the BLS website.

config.py

# Setting for the BLS API
from datetime import datetime

# Settings for global variables
start_year = datetime.now().year - 19
end_year = datetime.now().year

bls_series = ['APU0000708111', 'APU000072610', 'APU0000709112', 'APU0000702111', 'APU0000704111', 'APU0000FF1101',
              'APU0000706111', 'APU0000711211', 'APU0000701312', 'APU0000717311', 'APU0000703111', 'APU0000702421',
              'APU0000711311', 'APU0000712311', 'CUUR0000SA0']

bls_series_name = ['eggs', 'electricity', 'milk', 'bread', 'bacon', 'chicken_breast', 'chicken_whole', 'bananas',
                   'rice', 'coffee', 'ground_chuck', 'cookies', 'oranges', 'tomatoes', 'cpi_values']

url_bls = 'https://api.bls.gov/publicAPI/v2/timeseries/data/'

I am using the datetime library to keep track of the current year, since the BLS open data API only supports up to 20 years of data per query.

  • start_year: Current year – 19 years
  • end_year: Current year
  • bls_series: Contains all the series that I want to query
  • bls_series_name: Names of the datasets, used as column names for a Pandas DataFrame
  • url_bls: URL that is used in the API request
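
Because a single version 2 request covers at most 20 years, a longer history would have to be requested in several windows. The sketch below shows one way to split a year range; year_windows is a hypothetical helper for illustration and is not part of config.py or the BLS API.

```python
def year_windows(first_year, last_year, max_years=20):
    """Yield (start, end) year pairs, each spanning at most max_years years.

    Hypothetical helper: the 20-year default mirrors the version 2 limit.
    """
    start = first_year
    while start <= last_year:
        # The end year is inclusive, so a 20-year window is start .. start + 19
        end = min(start + max_years - 1, last_year)
        yield start, end
        start = end + 1


windows = list(year_windows(1990, 2023))
print(windows)
```

Each pair could then be used as the start_year and end_year of one request.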

Extracting data using BLS API

The __init__ function of the Updater class (shown in fetch_data.py below) pulls in the settings from the config.py we set up earlier. This lets the Updater class know where to fetch the data from and what kind of data we want.

The response data from the request has the following structure.

Example of the data returned by the API

Helper Functions

In the output, Results contains a key called series, which is a list of dictionaries, one for each series requested through the API. Each of those dictionaries has a key called data, which is a list of dictionaries, each holding one record for that series.

Each dictionary in that list contains the year of the record, the month of the record, and the value for that month of that year. This is not easy to work with, so we need some way to process the data into a more convenient shape.
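
To make that nesting concrete, here is a trimmed, hand-written stand-in for a response. It is only a sketch of the shape: a real response carries more fields than shown here, and the two values are copied from the sample output later in this post rather than fetched live.

```python
# Illustrative shape only, not a real API response
sample_response = {
    "status": "REQUEST_SUCCEEDED",
    "Results": {
        "series": [
            {
                "seriesID": "APU0000708111",
                "data": [
                    {"year": "2023", "period": "M05", "periodName": "May", "value": "2.666"},
                    {"year": "2023", "period": "M04", "periodName": "April", "value": "3.270"},
                ],
            }
        ]
    },
}

# Walk down the nesting: Results -> series -> data -> one record
first_series = sample_response["Results"]["series"][0]
latest = first_series["data"][0]
print(first_series["seriesID"], latest["year"], latest["period"], latest["value"])
```

Note that every field, including value, arrives as a string.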

Here I will introduce a few functions that help me achieve that. The first function extracts values from a list of dictionaries: it takes a list of dictionaries and a key, goes through each dictionary, picks out the value stored under that key, and returns the values as a nested list.

misc_func.py

"""
Helper function for both the EIA and BLS API
"""

import pandas as pd


# Function that converts a list of dictionaries to a nested list
def dict_to_list(dicts, key):
    """
    Function that takes in a list of dictionaries and returns the value stored under the specified key in each of
    the dictionaries
    :param key: Key to look up in each dictionary
    :param dicts: List of dicts
    :return: Nested list of items
    """

    list_array = []

    list_item = [item.get(key) for item in dicts]

    list_array.append(list_item)

    return list_array

The second helper function converts a list of dictionaries into a Pandas DataFrame. The concept is simple: given a list of dictionaries, one per series, I want to pull out the records stored under one key and, from each record, the value stored under a second key. It calls the previous function to extract those values into a list, and the final product is a Pandas DataFrame with one column per series.

# Function for the retrieve data function that converts list of dict into a Pandas dataframe for the BLS data
def dict_to_df(list_dict, key):
    """
    Takes in a list of dict and converts it into pandas dataframe
    :param key: List of two keys: the key holding each series' records, then the key to extract from each record
    :param list_dict: List of dict
    :return: Pandas dataframe
    """

    temp_list = []

    for item in list_dict:
        temp_list += dict_to_list(item[key[0]], key[1])

    data_df = pd.DataFrame(temp_list).transpose()

    return data_df
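
As a quick check of what these two helpers produce, the sketch below runs them on a small hand-written sample with two series. The helpers are restated compactly so the snippet runs on its own; the sample values are illustrative.

```python
import pandas as pd


def dict_to_list(dicts, key):
    # Same behaviour as the helper above: one inner list of values per call
    return [[item.get(key) for item in dicts]]


def dict_to_df(list_dict, key):
    temp_list = []
    for item in list_dict:
        temp_list += dict_to_list(item[key[0]], key[1])
    # One row per series before the transpose, one column per series after it
    return pd.DataFrame(temp_list).transpose()


# Hand-written sample: two series with two records each
series = [
    {"data": [{"value": "2.666"}, {"value": "3.270"}]},
    {"data": [{"value": "0.165"}, {"value": "0.165"}]},
]

df = dict_to_df(series, ["data", "value"])
print(df)
```

The values are still strings at this point, so a conversion such as df.astype(float) may be needed before doing any arithmetic.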

The last helper function parses the date into a more standard format. The API only provides the year of the record and the month, either as M plus the month number in the period field (M01, M02, …, M12) or as a name in periodName (January, February, …). I want to transform this into a year-month (YYYY-mm) format.

# Function to parse year and month in yyyy-mm format for the BLS data
def bls_parse_date(list_dict):
    """
    The function is to parse date for the BLS data from year and period to YYYY-mm. It takes in a list of dicts,
    and reads the 'year' and 'period' keys of each one
    :param list_dict: List of dictionaries that contains the year and period
    :return: Pandas dataframe with one YYYY-mm string per record
    """

    # dict_to_list returns a nested list, so take the single inner list
    year = dict_to_list(list_dict, 'year')[0]
    month = dict_to_list(list_dict, 'period')[0]

    # Join year and period together, removing the letter M from the period (e.g. 'M05' -> '05')
    year_month = ['-'.join([y, m.replace('M', '')]) for y, m in zip(year, month)]

    return pd.DataFrame(year_month)
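
If real dates are preferred over YYYY-mm strings, one option is to hand the strings to pandas afterwards. This is a sketch of an optional extra step, not something the helper above does; pd.to_datetime fills in the first day of the month when only year and month are given.

```python
import pandas as pd

# Strings in the same YYYY-mm form that bls_parse_date returns
year_month = pd.Series(["2023-05", "2023-04", "2023-03"])

# Parse into datetime64 values; the day defaults to the 1st of the month
dates = pd.to_datetime(year_month, format="%Y-%m")
print(dates.dt.strftime("%Y-%m-%d").tolist())
```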


Putting Everything Together

Next, I am going to create a file that helps me fetch the data using BLS API.

fetch_data.py

"""This file is used to retrieve data from the US BLS and US EIA website using their public API"""

# Importing libraries
import requests
import json
import api_keys
import config
import data.misc_func as misc_func


class Updater:

    def __init__(self):
        self.bls_series = config.bls_series
        self.bls_url = config.url_bls
        self.start_year = config.start_year
        self.end_year = config.end_year

Finally, we have the main function that pulls the data from the BLS API. It uses the Requests library to call the API, with the parameters from config.py describing the data I need. After the data comes back, I pass it through the helper functions created above to standardize it and put it into a Pandas DataFrame.

I am quite new to the Requests library. Fortunately, the BLS provides sample code to help me get started; otherwise, I would probably have spent hours just getting it to work.

Link to sample code here: https://www.bls.gov/developers/api_python.htm

    # API to retrieve data from the US BLS website
    def retrieve_data_bls(self, bls_series, bls_series_name):
        """
        Takes in a list of series ids and retrieves their data values from the website
        :param bls_series: List of series ids as strings
        :param bls_series_name: List of column names, in the same order as bls_series
        :return: Pandas DataFrame
        """

        # Requesting data through API
        headers = {'Content-type': 'application/json'}
        data = json.dumps({'seriesid': bls_series,
                           'startyear': self.start_year,
                           'endyear': self.end_year,
                           "registrationkey": api_keys.bls_api_key})

        p = requests.post(self.bls_url, data=data, headers=headers)
        json_data = json.loads(p.text)

        # Get the data from a list of dictionaries into a nested list
        # Then the nested list is put into a dataframe
        bls_df = misc_func.dict_to_df(json_data['Results']['series'], ['data', 'value'])
        bls_df.columns = bls_series_name

        # Extract the dates from the first series and add them as a new column in the bls df
        bls_df['year_month'] = misc_func.bls_parse_date(json_data['Results']['series'][0]['data'])

        return bls_df
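
The function above assigns column names by position and takes the dates from the first series only. A more defensive variant, sketched below, checks the status field of the response and matches each series by its seriesID instead. The names series_to_frame and names_by_id are hypothetical, the sample response is hand-written, and the sketch assumes every value is numeric.

```python
import pandas as pd


def series_to_frame(json_data, names_by_id):
    """Build a DataFrame with one column per series, matched by seriesID."""
    # The API reports the outcome of a request in a top-level "status" field
    if json_data.get("status") != "REQUEST_SUCCEEDED":
        raise RuntimeError(f"BLS request failed: {json_data.get('message')}")

    columns = {}
    for series in json_data["Results"]["series"]:
        name = names_by_id[series["seriesID"]]
        # Index each value by its own YYYY-mm label so that series of
        # different lengths still line up (assumes numeric values)
        columns[name] = pd.Series(
            {f"{rec['year']}-{rec['period'].replace('M', '')}": float(rec["value"])
             for rec in series["data"]}
        )
    return pd.DataFrame(columns)


# Hand-written sample: series in a different order and of different lengths
sample = {
    "status": "REQUEST_SUCCEEDED",
    "Results": {"series": [
        {"seriesID": "APU000072610",
         "data": [{"year": "2023", "period": "M05", "value": "0.165"}]},
        {"seriesID": "APU0000708111",
         "data": [{"year": "2023", "period": "M05", "value": "2.666"},
                  {"year": "2023", "period": "M04", "value": "3.270"}]},
    ]},
}
names = {"APU0000708111": "eggs", "APU000072610": "electricity"}
df = series_to_frame(sample, names)
print(df)
```

Months that a series does not report show up as NaN rather than shifting the other rows.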


The resulting DataFrame has the following structure.

>>> data = Updater()
>>> bls_api = data.retrieve_data_bls(config.bls_series, config.bls_series_name)
>>> bls_api.columns
Index(['eggs', 'electricity', 'milk', 'bread', 'bacon', 'chicken_breast',
       'chicken_whole', 'bananas', 'rice', 'coffee', 'ground_chuck', 'cookies',
       'oranges', 'tomatoes', 'cpi_values', 'year_month'],
      dtype='object')

Sample of the data extracted from the API.

>>> bls_api.head()
    eggs  electricity   milk  bread  ...  oranges  tomatoes  cpi_values  year_month
0  2.666        0.165  4.042  1.951  ...    1.512     1.798     304.127     2023-05
1  3.270        0.165  4.042  1.989  ...    1.530     1.874     303.363     2023-04
2  3.446        0.166  4.098  1.936  ...    1.509     1.932     301.836     2023-03
3  4.211        0.168  4.163  1.896  ...    1.549     1.990     300.840     2023-02
4  4.823        0.168  4.204  1.888  ...    1.514     2.110     299.170     2023-01

This is the data requested through the BLS API, stored in a Pandas DataFrame.
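
The rows come back newest first. If oldest-first order is more convenient, for plotting for example, the frame can be sorted on year_month, since YYYY-mm strings sort chronologically as plain text. The sketch below uses three rows copied from the sample output above.

```python
import pandas as pd

# Three rows taken from the sample output, newest first
bls_sample = pd.DataFrame({
    "eggs": [2.666, 3.270, 3.446],
    "year_month": ["2023-05", "2023-04", "2023-03"],
})

# Sort oldest first and renumber the rows
oldest_first = bls_sample.sort_values("year_month").reset_index(drop=True)
print(oldest_first)
```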

The Full Code for the BLS API

Github: https://github.com/thecodingmango/cpi_dashboard

"""This file is used to retrieve data from the US BLS and US EIA website using their public API"""

# Importing libraries
import requests
import json
import api_keys
import config
import data.misc_func as misc_func


class Updater:

    def __init__(self):
        self.bls_url = config.url_bls
        self.eia_url = config.url_eia
        self.start_year = config.start_year
        self.end_year = config.end_year

    # API to retrieve data from the US BLS website
    def retrieve_data_bls(self, bls_series, bls_series_name):
        """
        Takes in a list of series ids and retrieves their data values from the website
        :param bls_series: List of series ids as strings
        :param bls_series_name: List of column names, in the same order as bls_series
        :return: Pandas DataFrame
        """

        # Requesting data through API
        headers = {'Content-type': 'application/json'}
        data = json.dumps({'seriesid': bls_series,
                           'startyear': self.start_year,
                           'endyear': self.end_year,
                           "registrationkey": api_keys.bls_api_key})

        p = requests.post(self.bls_url, data=data, headers=headers)
        json_data = json.loads(p.text)

        # Get the data from a list of dictionaries into a nested list
        # Then the nested list is put into a dataframe
        bls_df = misc_func.dict_to_df(json_data['Results']['series'], ['data', 'value'])
        bls_df.columns = bls_series_name

        # Check the data before returning the dataframe
        # (data_check is not defined in this post)
        bls_df = data_check(bls_df)

        # Extract the dates from the first series and add them as a new column in the bls df
        bls_df['year_month'] = misc_func.bls_parse_date(json_data['Results']['series'][0]['data'])

        return bls_df