Using Python And Pandas Datareader to Analyze Financial Data

Finance and economics are becoming more and more interesting for all kinds of people, regardless of their career or profession. This is because we are all affected by economic data, we are increasingly interested in staying up-to-date, and we have a wealth of information at hand.

Every day, billions of bytes of financial data are sent over the Internet, whether it is the price of a share, an e-commerce transaction, or even information on a country's GDP. All this data, when properly organized and managed, can be used to build amazing and insightful software applications.

We will use Python to access public financial data, organize it, and combine it to gain new insights into how money makes the world go round. We will focus mainly on two Python modules: pandas and pandas-datareader.

We will use these modules to import data from some of the largest financial organizations in the world, as well as data stored locally on our computers. By the end of the notebook, you should feel comfortable importing financial data, either from a public source or from a local file, into Python, organizing that data, and combining it with other datasets.

Importing data via Datareader

Many financial institutions, stock exchanges, and global banks make large amounts of the data they store publicly available. Most of this data is well organized, updated live, and accessible through an application programming interface (API), which gives programming languages such as Python a way to download and import it.

The pandas-datareader module is specifically designed to interface with some of the world's most popular financial data APIs, and import their data into an easily digestible pandas DataFrame. Each financial API is accessed by a different function exposed by pandas-datareader. Generally, accessing each API requires a different set of arguments and information to be provided by the programmer.

We will import data from several of these APIs and play with them. For a complete list of all data that the pandas-datareader can access, you can consult the official documentation.

Let's make a small program that uses pandas-datareader to get GDP data from the World Bank API. First we will install the library as follows (please run the command yourself in the next cell, I already have it installed!):

!pip3 install pandas_datareader

Here we have imported the data from the World Bank via wb. And we imported the Python datetime library in order to set the start and end dates of the analysis. For more information about Python datetimes, see this other notebook!

The download method takes several parameters that we can pass in order to get the data from the World Bank. Among them is the indicator, which identifies the data series we want. You can find more information about the parameters here.

As we can see, with this command we have obtained the GDP per capita data for three countries over a given date range.

Getting the NASDAQ Symbols

The NASDAQ Stock Exchange identifies each of its shares with a unique symbol:

It also provides a useful API for accessing the symbols currently traded on it. Pandas-datareader provides several functions to import data from the NASDAQ API through its nasdaq_trader sub-module.

To import the list of stock symbols, we use the get_nasdaq_symbols function from nasdaq_trader. It is done in this way:

When called, it will go to the NASDAQ API, and import the list of symbols currently being traded. The advantage of using pandas-datareader is that all the logic to interact with the NASDAQ API or any other API is encapsulated in easy-to-use sub-modules and functions like the ones above.

We have a total of 9,338 different stocks listed on the NASDAQ, with columns containing different economic and financial information about each stock. But what if we want to access a single symbol/stock?

Technical Analysis in Finance

Technical analysis in finance is the type of analysis performed by means of statistics and charts on stocks (or indices, in our case). Let's see how to do something very simple with Plotly, a Python charting library. In this case we'll access Microsoft's daily quotes. Let's do it!

We have accessed MSFT's data. We did this by importing .data from pandas-datareader and giving it the alias web. Under the hood we are using Yahoo Finance's API to import the data, but pandas-datareader allowed us to do it in a very simple way. Now we are going to plot the result to do some technical analysis.

We just did something very interesting: we charted MSFT's stock with up-to-date data! Today is November 20, 2020, so the last data point on my chart is from that date. You can do the same: hover over the end of the chart and see the stock's latest quote. You could access your own investment portfolio, run this code daily, and perform technical analysis on those tickers!

Data filtering by date

Many of the APIs that pandas-datareader connects to allow us to filter the data we get by date. Financial institutions tend to keep records going back several decades, and when we import that data, it is useful to be able to specify exactly which period we want.

An API that does just that is the Federal Reserve Bank of St. Louis (FRED), which we can access by first importing the pandas_datareader.data sub-module and then calling its DataReader function:

The DataReader function takes 4 arguments: the name of the data series (e.g. 'GDP'), the data source (here 'fred'), and the start and end dates.

By changing the start and end dates, you can easily filter the data you receive.

Using the Shift() operation

Once we've imported a DataFrame full of financial data, there are some pretty cool ways to manipulate it. In this exercise we will look at shift(), a DataFrame function that moves all the rows in a column up or down.

[Screenshot: a DataFrame column before and after calling shift()]

Shift() can be called on a single column (as in the image above) or on the entire DataFrame, in which case all columns are shifted. You can also shift by more than one row, and in either direction.

# shifts all rows down by 1
dataframe.shift(1)

# shifts all rows in the name column up 5
dataframe['name'].shift(-5)

# shifts all rows in the price column down 3
dataframe['price'].shift(3)

Shift is particularly useful when it comes to financial data. For example, it can be used to help calculate the percentage growth between one row and the next, or to find the difference in stock prices over a series of days. Let's see an example.

We have imported GDP data from FRED; now we will create a new column called Growth containing the difference (in absolute value) between consecutive periods.

We can now see the absolute differences in this new column. An important clarification: the first row of the Growth column is 'NaN' because, being the first row of the dataset, there is no previous row to compare it with.
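A self-contained sketch of the same calculation, using made-up GDP values rather than the FRED download:

```python
import pandas as pd

# hypothetical GDP values, just to illustrate the calculation
df = pd.DataFrame({'GDP': [100.0, 104.0, 103.0, 110.0]})

# absolute difference between each row and the previous one;
# shift(1) lines each value up against its predecessor
df['Growth'] = (df['GDP'] - df['GDP'].shift(1)).abs()
print(df)
```

The first entry of Growth comes out as NaN, exactly as described above.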

Calculating basic financial statistics

Two useful calculations that can be made with financial data are variance and covariance. To illustrate these concepts, let's use the example of a DataFrame that measures stock and bond prices over time

[Screenshot: a DataFrame of stock and bond prices over time]

Variance

Variance measures how far a set of numbers is from its average. In finance, it is used to determine the volatility of investments.

dataframe['stocks'].var() # 106427
dataframe['bonds'].var() # 2272

In the variance calculations above, the variance of stocks is much larger than that of bonds (106427 vs 2272). That's because stock prices are more dispersed than bond prices, indicating that stocks are the more volatile investment.
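A runnable sketch of the same idea with made-up prices (the numbers differ from the example above, but the relationship holds):

```python
import pandas as pd

# made-up prices: stocks are widely dispersed, bonds tightly clustered
dataframe = pd.DataFrame({
    'stocks': [100.0, 350.0, 620.0, 950.0, 120.0],
    'bonds':  [102.0, 105.0, 98.0, 101.0, 100.0],
})

# the more dispersed the prices, the larger the variance
print(dataframe['stocks'].var())
print(dataframe['bonds'].var())
```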

Covariance

Covariance, in a financial context, describes the relationship between the returns of two different investments over a period of time, and can be used to help balance a portfolio. Calling cov() on our DataFrame produces a matrix containing the covariance between each pair of columns. Covariance is closely related to correlation. In our example data, when stock prices go up, bond prices go down; we can use the covariance function to see this numerically.

dataframe.cov()

The above code produces the following DataFrame output:

[Screenshot: the covariance matrix returned by dataframe.cov()]

Each value above represents the covariance between two columns. The higher the number, the more the two investments tend to move up and down together. The upper-right and lower-left entries hold the covariance between stocks and bonds. Here the value is negative, which indicates that stocks and bonds tend to move in opposite directions. To see this with an example, let's work through a more involved exercise!
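A small runnable sketch showing a negative covariance, with made-up prices where stocks and bonds move in opposite directions:

```python
import pandas as pd

# made-up prices: every time stocks rise, bonds fall
dataframe = pd.DataFrame({
    'stocks': [100.0, 200.0, 300.0, 400.0, 500.0],
    'bonds':  [110.0, 105.0, 100.0, 95.0, 90.0],
})

# 2x2 matrix: variances on the diagonal, covariances off-diagonal
cov_matrix = dataframe.cov()
print(cov_matrix)
```

The off-diagonal entries come out negative, numerically confirming the opposite movement.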

Calculating the Logarithmic Return

The logarithmic return is one of the three methods for calculating return, and it assumes that returns are compounded continuously rather than over sub-periods. It is calculated by taking the natural logarithm of the final value divided by the initial value.

As an example, say the final value of an investment was $11 and the initial value was $10. The log return is ln(11/10), about 9.53%.
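That arithmetic can be checked directly:

```python
import numpy as np

# log return for an investment that went from $10 to $11
r = np.log(11 / 10)
print(round(float(r) * 100, 2))  # 9.53 (percent)
```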

In mathematics and statistics, a distinction is often made between discrete and continuous data. The log return is the more theoretical, continuous version. In the real world, however, most people think of returns divided into discrete periods.

So the logarithmic return is the non-discrete, continuous version: if the whole period were divided into an infinite number of sub-periods, what would the return be? It is, in that sense, theoretical.

Except over very short time periods (less than a day), and in theoretical applications involving calculus and the precise measurement of curves, logarithmic returns are not commonly used.

Price data are useful, but in this case, since we want to compare each data set, it would be even better if instead of daily/annual prices, we had information on the returns of daily/annual prices.

As a first step, let's define a function called log_return, which should accept one parameter, prices.

The equation for calculating the log return between two prices is: natural_log(current price / previous price).

In our case we want to run this equation for every day/year of price data in our imported Series (a Series is a single column of a DataFrame).

The pandas shift() function can be used to divide each current price by the previous price in the Series.

prices / prices.shift(1)

And we can use NumPy's natural logarithm function to get the log return for each entry in the new Series.

import numpy as np
np.log(Series)

Let's do it
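Putting the two pieces together, the log_return function might look like this (the sample prices are made up for illustration):

```python
import numpy as np
import pandas as pd

def log_return(prices):
    # natural log of each price divided by the previous price
    return np.log(prices / prices.shift(1))

# hypothetical daily prices
prices = pd.Series([10.0, 11.0, 10.5])
returns = log_return(prices)
print(returns)
```

The first entry is NaN (no previous price), the second is ln(11/10) ≈ 0.0953, and the third is negative because the price fell.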

Let's use our new log_return function to calculate the logarithmic return of the NASDAQ, GDP, and the other indicators we've already loaded.

Now we are going to pass each variable to the log_return function

Comparison of return volatility

We are now ready to compare the volatility of each type of data. Remember that variance, in the context of financial data, tells us how volatile an investment is. We'll use the var() function in Pandas to calculate the variance of returns on stocks and World Bank data, and print the results.
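A self-contained sketch of that comparison, using hypothetical price series in place of the imported data (the NASDAQ-like series is deliberately made to swing more than the S&P 500-like series):

```python
import numpy as np
import pandas as pd

def log_return(prices):
    return np.log(prices / prices.shift(1))

# hypothetical price series standing in for the imported data
nasdaq = pd.Series([100.0, 103.0, 99.0, 108.0, 104.0])
sap = pd.Series([100.0, 101.0, 100.5, 101.5, 101.0])

# the higher the variance of the returns, the more volatile the series
print('nasdaq variance:', log_return(nasdaq).var())
print('sap variance:   ', log_return(sap).var())
```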

The results can be interpreted in various ways, but in general, the higher the variance, the more volatile the data.

What conclusions can be drawn from these data? Which data set was the most volatile? Did any data sets have similar variances?

For example, the sap series (the S&P 500 log returns) has a variance of about 8.86e-05, which is equivalent to 0.0000886.

The S&P 500, a set of 500 large companies listed on U.S. stock exchanges, has the smallest variation, and therefore is the least volatile. Because the S&P 500 index is a weighted measure of many stocks in a variety of industries, it is considered a safer and more diversified investment.

Stocks are interesting. The NASDAQ 100 is more volatile than the S&P 500, which, when you think about it, makes sense since the S&P 500 is much more diversified and follows the market more closely.

So finally we have GDP and exports.

Exports are very volatile, which could have to do with industries that have moved abroad in the last 20 years, and global competition for the production of goods.

GDP is actually quite similar to the NASDAQ 100 in terms of volatility, which is perhaps an interesting correlation.

I hope you enjoyed this reading!