
October 29, 2017 11:45 PM

A Python script to manage regulatory data of U.S. Bank Holding Companies

The Chicago Fed offers great data on bank holding companies; click here for the full page, and here for the download page. The data set is large and deep: the number of variables is stunning (more than 3,000). The format is a flat text file in which a caret (^) separates the data fields.
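To give a feel for the format, here is a minimal sketch of reading a caret-delimited file with Python's standard csv module. The sample text, its column names, and its values are assumptions made up for illustration; a real file has thousands of columns, and the helper name read_caret_file is hypothetical.

```python
import csv
import io

# Illustrative sample only: the column names and values are assumptions,
# not an extract from the actual Chicago Fed files.
SAMPLE = (
    "RSSD9001^RSSD9999^BHCK2170\n"
    "1234567^20170630^1000\n"
    "7654321^20170630^2500\n"
)


def read_caret_file(handle):
    """Return one dict per data row from a caret-delimited text stream."""
    # QUOTE_NONE treats quote characters as ordinary data, so a stray
    # double quote in a field does not swallow the following delimiters.
    reader = csv.DictReader(handle, delimiter="^", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_caret_file(io.StringIO(SAMPLE))
print(len(rows), "rows; first row:", rows[0])
```

For a file on disk, pass an open file object instead of the io.StringIO wrapper, for example open(path, newline="").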

How to manage this data efficiently? The Chicago Fed offers a SAS script to manage the data, but SAS is costly software written by consultants who approach software as if it were bouillabaisse.

Alternatively, one can download the data and then import it into Stata; however, the processing is slow and cumbersome.

Python, however, offers a fine solution. I wrote a script that merges the BHC data into one data set.

You can choose to output the data to CSV or Stata format. You can also send the data to your MySQL server.
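The merging idea can be sketched roughly as follows. This is not the script itself, only a small standard-library illustration that stacks several caret-delimited files and writes one CSV. It assumes that files from different quarters may carry different sets of columns, and the function names and sample data are hypothetical.

```python
import csv
import io


def merge_caret_files(handles):
    """Stack rows from several caret-delimited streams.

    Returns (columns, rows), where columns is the union of all header
    names in first-seen order.
    """
    columns = {}  # dict used as an ordered set
    rows = []
    for handle in handles:
        reader = csv.DictReader(handle, delimiter="^", quoting=csv.QUOTE_NONE)
        for name in reader.fieldnames or []:
            columns.setdefault(name, None)
        rows.extend(reader)
    return list(columns), rows


def write_csv(columns, rows, out):
    """Write the merged rows as CSV; missing values become empty fields."""
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Two made-up quarterly files; the second has one extra column.
q1 = io.StringIO("RSSD9001^RSSD9999^BHCK2170\n1234567^20170331^900\n")
q2 = io.StringIO(
    "RSSD9001^RSSD9999^BHCK2170^BHCK3210\n1234567^20170630^1000^120\n"
)

columns, rows = merge_caret_files([q1, q2])
out = io.StringIO()
write_csv(columns, rows, out)
merged_csv = out.getvalue()
print(merged_csv)
```

If pandas is installed, the same merged table could instead be written with DataFrame.to_stata for Stata output or DataFrame.to_sql for a database such as MySQL; the latter needs a SQLAlchemy connection and a MySQL driver.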

You should install Python and add the full SciPy stack. If you are not familiar with Python, read Python for Kids and Python for Data Analysis. If you want to download the bank holding company data quickly, you need curl.

Note that this all works like a charm under Linux (Mint 17). Linux simply works better if you want to use large data and free software.

If you want to give it a go, visit my Git page for more information:


About Martien Lubberink

Victoria University Of Wellington
Associate Professor Accounting and Capital

I completed my PhD in Economics at Groningen University. I have since worked at Maastricht University and at Lancaster University. After my sabbatical year at UNC Chapel Hill, I joined De Nederlandsche Bank, the central bank of the Netherlands. Here...
