How do we get data to analyze? One way is to use webscrapping (that is how the AirBNB dataset was created). Another way is to use APIs (application programming interfaces). These allow us to get real time data without having to store our own copy.
This tutorial is divided into 4 parts:
In this example we use the website https://financialmodelingprep.com
to get data about stocks. The site can give us real time pricing data as well as financial metrics and other information about the company.
We will start out by getting some basic data and saving it as an excel workbook for later use.
import requests # this lets us call the API to the external website
import pandas as pd
def getdata(stock: str):
"""This function gets data from the API and returns specific values
:param stock: Official stock ticker symbol
:returns share_price: Array of current share price
:returns cash: Array of cash on hand
:returns debt: Array of debt
:returns qRev: Array of quarterly revenue
:returns ceo: Name of the CEO
"""
# Company Quote Group of Items
company_quote = requests.get(f"{base}quote/{stock}?apikey={key}")
company_quote = company_quote.json()
share_price = float(company_quote[0]['price'])
# Balance Sheet Group of Items
BS = requests.get(f"{base}financials/balance-sheet-statement/{stock}?period=quarter&apikey={key}")
BS = BS.json()
# Total Cash
cash = float(BS['financials'][0]['Cash and short-term investments'])
# Total Debt
debt = float(BS['financials'][0]['Total debt'])
# Income Statement Group of Items
IS = requests.get(f"{base}financials/income-statement/{stock}?period=quarter&apikey={key}")
IS = IS.json()
# Most Recent Quarterly Revenue
qRev = float(IS['financials'][0]['Revenue'])
# Company Profile Group of Items
company_info = requests.get(f"{base}company/profile/{stock}?apikey={key}")
company_info = company_info.json()
# Chief Executive Officer
ceo = company_info['profile']['ceo']
return (share_price, cash, debt, qRev, ceo)
base = 'https://financialmodelingprep.com/api/v3/'
key = '<YOUR KEY HERE>'
tickers = ('AAPL', 'MSFT', 'GOOG', 'T', 'CSCO', 'INTC', 'ORCL', 'AMZN', 'FB', 'TSLA', 'NVDA')
data = map(getdata, tickers)
quarter = '1Q2020'
df = pd.DataFrame(data,
columns=['Share Price ($)', 'Total Cash', 'Total Debt', f'{quarter} Revenue', 'CEO'],
index=tickers)
writer = pd.ExcelWriter('example.xlsx')
df.to_excel(writer, 'Statistics')
writer.save()
df.head(10)
# convert to billions for easier reading
cols_to_convert = ['Total Cash','Total Debt','1Q2020 Revenue']
df[cols_to_convert] = df[cols_to_convert] / 10**9
df.head(10)
Let’s explore the RDS-A stock ticker.
# getting historical data for RDS-A. This code calls the API and transforms the result into a DataFrame.
import pandas as pd
from pandas.io.json import json_normalize
df = pd.read_json("https://financialmodelingprep.com/api/v3/quotes/nyse?apikey=<YOUR KEY HERE>")
df.head()
key = '<YOUR KEY HERE>'
ticker = "T"
target = "https://financialmodelingprep.com/api/v3/historical-price-full/{}?apikey={}".format(ticker, key)
df = pd.read_json(target)
df = json_normalize(df['historical'])
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.to_csv('{} data.csv'.format(ticker))
df = pd.read_csv('data/RDS-A data.csv')
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
ticker = "RDS-A"
df.head()
%matplotlib inline
df['adjClose'].plot()
A basic method of analysis for stocks is the Bollinger band. This uses the moving average and the moving standard deviation to identify points where the trader should buy/sell. We start by defining how the Bollinger Bands are calculated and defining how we want the data to be plotted. These defined functions can be used again for analizing another stock.
import matplotlib.pyplot as plt
%matplotlib inline
def bollinger_bands(df, target_col: str = 'adjClose')->pd.DataFrame:
"""Calculates Bollinger Bands and returns an updated DataFrame.
:param df: DataFrame
:param target_col: column that will be used for the calcuations
:type target_col: str
:return df: df with additional columns
:rtype df: pd.DataFrame
"""
df['20 Day MA'] = df[target_col].rolling(20).mean()
df['20 Day MA_lower bound'] = df['20 Day MA'] - df[target_col].rolling(20).std()*2
df['20 Day MA_upper bound'] = df['20 Day MA'] + df[target_col].rolling(20).std()*2
return df
def bb_plot(df: pd.DataFrame = df, target_col: str = 'adjClose'):
"""Calculates time series plot with Bollinger Bands
:param df: DataFrame
:param target_col: column that will be used for the calcuations
:type target_col: str
:return: plot
:rtype: matplotlib.pyplot
"""
x = df.index
y= df[['adjClose', '20 Day MA', '20 Day MA_lower bound','20 Day MA_upper bound']]
plt.fill_between(x, rds['20 Day MA_lower bound'],rds['20 Day MA_upper bound'], alpha = .5)
plt.plot(x,y)
plt.title("Bollinger Bands for {}".format(ticker))
plt.xlabel('Date (Year/Month)')
plt.ylabel('Price(USD)')
plt.legend(y)
plt.show()
return plt
rds = bollinger_bands(df.sort_values(by='date'), 'adjClose')
bb_plot(rds)
The basic strategy is to buy/sell short if the actual value moves beyond the bands. The assumption is that the stock will return toward the mean.
Let’s test our returns if we follow this strategy.
import numpy as np
def bb_strategy(df: pd.DataFrame)->pd.DataFrame:
"""Calculates the returns of implementing the Bollinger Bands
:param df: DataFrame
:return df: df with additional columns for Positing, Market Return, and Strategy Return
:rtype df: pd.DataFrame
"""
df['Position'] = None
#Fill our newly created position column - set to sell (-1) when the price hits the upper band, and set to buy (1) when it hits the lower band
for row in range(len(df)):
if (df['adjClose'].iloc[row] > df['20 Day MA_upper bound'].iloc[row]) and (df['adjClose'].iloc[row-1] < df['20 Day MA_upper bound'].iloc[row-1]):
df['Position'].iloc[row] = -1
if (df['adjClose'].iloc[row] < df['20 Day MA_lower bound'].iloc[row]) and (df['adjClose'].iloc[row-1] > df['20 Day MA_lower bound'].iloc[row-1]):
df['Position'].iloc[row] = 1
#Forward fill our position column to replace the "None" values with the correct long/short positions to represent the "holding" of our position
#forward through time
df['Position'].fillna(method='ffill',inplace=True)
#Calculate the daily market return and multiply that by the position to determine strategy returns
df['Market Return'] = np.log(df['adjClose'] / df['adjClose'].shift(1))
df['Strategy Return'] = df['Market Return'] * df['Position']
return df
rds = bb_strategy(rds)
rds['Strategy Return'].cumsum().plot()
Yikes! Our returns are pretty bad just following the Bollinger Band strategy. We should do more analysis and try another strategy.
This section lists some ideas for extending the tutorial that you may wish to explore.
This section provides more resources on the topic if you are looking to go deeper.
In this tutorial, you were introduced to accessing data using APIs. Specifically, you learned:
Use the easiest option to get data and then take the data and do amazing analysis.