Python-ize your research

Martien Lubberink
May 25, 2017

I agree with David Veenman that Stata is great software. However, Stata progresses slowly. Every now and then there is an update, but development is largely top-down. From the bottom up, you can extend Stata with ado files. However, if you try to write an ado file yourself you are bound to get stuck: Stata's programming language is a real challenge.

An even bigger pain is Stata's approach to data, which is arcane: you can have only one dataset in memory at a time, and running scripts reminds me of GW-BASIC; it is hardly programming. This leads to code maintenance problems.
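For contrast, here is a minimal pandas sketch of what working with several datasets at once looks like. The firm identifiers, industries and returns are made-up toy data, and the merge and grouping are only an illustration of the workflow, not a recipe from any particular study.

```python
import pandas as pd

# Two hypothetical toy datasets, both held in memory at the same time
firms = pd.DataFrame({"gvkey": [1001, 1002, 1003],
                      "industry": ["Banks", "Retail", "Banks"]})
returns = pd.DataFrame({
    "gvkey": [1001, 1001, 1002, 1003],
    "year": [2015, 2016, 2016, 2016],
    "ret": [0.05, -0.02, 0.10, 0.03],
})

# Merge into a third dataset; 'firms' and 'returns' stay available unchanged
panel = returns.merge(firms, on="gvkey", how="left")

# Summaries can be computed on any of the three datasets
avg_by_industry = panel.groupby("industry")["ret"].mean()
print(avg_by_industry)
```

There is no need to save, clear and reload data between steps: each dataset is simply a named object.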

I tried R as an alternative to Stata. Yes, the code is versatile, but there are too many structures and packages that do similar things: data.table, data.frame, matrices. This leads to messiness too.

Both R and Stata may have served us well in the past, when dedicated programmers could contribute to the development of these packages, and when only universities would pay for the software, thus restricting access.

Enter Python. Today great software is free, and thousands of people are happy to contribute new features. This is where Python has a clear advantage: it is bottom-up, but without the chaos that characterises R. A large Python support community, notably on Stack Overflow, helps you move on quickly.

And it must be said: there is a lot going for Python:

  • Python is free.
  • Python has great research packages, notably SciPy, pandas, NumPy and statsmodels. So you can run Fama-MacBeth regressions and two-way clustering in Python.
  • pandas seems to be written for the financial industry; for example, it has a wealth of date functions. As for financial functions, Python offers basically all the ones that Excel does.
  • The code is nicely structured, code maintenance is easy.
  • Python is much more programming than scripting, which lets you produce clean code with your own functions that take care of routine tasks.
  • There is a Datastream package for Python.
  • Updating Python (via Anaconda) is seamless. 
  • Graphs in Python are mouth-watering.
  • Stack Overflow offers lots of support.
  • WRDS supports Python.
  • Teaching Python can be done using PythonAnywhere.
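To make the Fama-MacBeth and "write your own functions" points concrete, here is a minimal sketch of a reusable Fama-MacBeth routine built only on pandas and NumPy. The function name `fama_macbeth`, its arguments and the toy data are hypothetical; it runs a plain cross-sectional OLS per period and uses the simple time-series standard error of the slopes, without any Newey-West adjustment.

```python
import numpy as np
import pandas as pd


def fama_macbeth(df, y, xs, period="year"):
    """Hypothetical helper: per-period OLS, then average coefficients over periods."""
    coefs = []
    for _, g in df.groupby(period):
        # Design matrix with an intercept column for this cross-section
        X = np.column_stack([np.ones(len(g))] + [g[x].to_numpy() for x in xs])
        beta, *_ = np.linalg.lstsq(X, g[y].to_numpy(), rcond=None)
        coefs.append(beta)
    coefs = pd.DataFrame(coefs, columns=["const"] + list(xs))
    mean = coefs.mean()
    # Fama-MacBeth standard error: std of period estimates / sqrt(number of periods)
    se = coefs.std(ddof=1) / np.sqrt(len(coefs))
    return pd.DataFrame({"coef": mean, "se": se, "t": mean / se})


# Toy panel: y = 1 + 1*x in 2015 and y = 3 + 3*x in 2016
data = pd.DataFrame({
    "year": [2015] * 3 + [2016] * 3,
    "x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "y": [1.0, 2.0, 3.0, 3.0, 6.0, 9.0],
})
res = fama_macbeth(data, "y", ["x"])
print(res)
```

Once written, the same function can be reused across projects, which is exactly the kind of routine task that is awkward to package in Stata.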

All in all, given the world we live in today, Python looks promising: it offers maximum freedom to analyse data the way you want.


Victor Van Pelt
May 31, 2017

Helpful post!

Each tool has its benefits and costs. While you listed a few good reasons to use Python, I believe both R and Stata have their advantages.

In the end, I believe it's best to figure out the extent to which a particular tool supports what you are trying to do.