Python is well known for its data science and analysis capabilities, thanks to packages such as NumPy, Matplotlib and Pandas. Each of these packages also offers a variety of graphing and visualisation tools, most of which are already very well documented.
Recently, there has been a spate of new visualisation-focused packages. Some have been well received and well documented; others, less so. Plotly, which we will explore in this post, falls into the latter group.
For this exercise, we will use real-world data on properties sold in the city of Perth, Western Australia. We will use it to investigate the ever-popular question: ‘What is the current state of the market in July 2017?’
The general steps in this exercise are:
- Data is scraped from real estate listing websites over a period of time, using a scraper built in Python with BeautifulSoup
- Data is stored in a PostgreSQL database that can be added to over time
- For analysis, the database is exported to a Jupyter Notebook which serves as a working environment
- The plotly package is installed in the Jupyter Notebook
To set some context, our data set looks like this:
Let’s start by exporting our data from the PostgreSQL database. While Jupyter Notebook can connect directly to the database, for simplicity we will export a static CSV and upload it to our notebook. The header option writes the column names as the first row of the file, so that pandas can pick them up when reading it.
psql: \copy tablename to 'filename' csv header;
We then upload the CSV file to the server that hosts the notebook, and initialise the notebook with the relevant packages:
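    import pandas as pd
    import numpy as np
    import plotly
    import plotly.offline
    # plotly.plotly (the online mode module) is left out:
    # every chart in this post is drawn in offline mode
    from plotly.graph_objs import *
    import plotly.figure_factory as ff

    # render charts inline in the notebook, in offline mode
    plotly.offline.init_notebook_mode(connected=True)

    # load the exported CSV and preview the first rows
    df = pd.read_csv("file_location")
    df.head()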
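As an aside, the direct database link mentioned earlier might look something like the sketch below. It is only an illustration: an in-memory SQLite table with made-up rows stands in for the PostgreSQL database, the table name `sold` is hypothetical, and the column names are borrowed from the code later in this post. With PostgreSQL you would hand pandas a connection to that database instead.

```python
import sqlite3

import pandas as pd

# Stand-in database: an in-memory SQLite table with made-up rows.
# The table name "sold" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sold (Date TEXT, Price INTEGER, Bed INTEGER, Type TEXT)")
conn.executemany(
    "INSERT INTO sold VALUES (?, ?, ?, ?)",
    [
        ("2017-07-03", 520000, 3, "House"),
        ("2017-07-05", 410000, 2, "Unit"),
    ],
)
conn.commit()

# Read the query result straight into a DataFrame, parsing Date as a datetime.
df_direct = pd.read_sql_query("SELECT * FROM sold", conn, parse_dates=["Date"])
conn.close()

print(df_direct.head())
```

The result is kept in `df_direct` so that it does not overwrite the `df` loaded from the CSV.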
For our first attempt, let’s build a simple bar chart of property value over time.
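    # group dates by week: convert to datetimes and shift back 7 days, so that each
    # weekly (W-MON) bin is labelled by the Monday before it rather than the Monday that closes it
    df.Date = pd.to_datetime(df.Date) - pd.to_timedelta(7, unit='D')

    # one row per (week, price) pair; the result goes into a new variable
    # so that df keeps all of its columns for the later charts
    weekly = df.groupby([pd.Grouper(key='Date', freq='W-MON'), 'Price'])['Type'].count()
    weekly = weekly.reset_index()

    # initialise bar chart using Plotly offline mode
    data = [Bar(x=weekly.Date, y=weekly.Price)]
    plotly.offline.iplot(data, filename='jupyter/basic_bar')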
That was pretty straightforward. A chart like this can be created by any number of other programs, so it is not terribly impressive.
Let’s try something a little more complex: a heatmap that delves further into the types of properties being sold.
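The 7-day shift in the grouping step is easy to misread, so here is a small sketch of what it does. The dates and prices below are made up purely for illustration and are not taken from the Perth data set.

```python
import pandas as pd

# Made-up sales, for illustration only.
sales = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2017-07-04", "2017-07-06", "2017-07-11", "2017-07-12"]),
        "Price": [500000, 450000, 600000, 550000],
    }
)

# Without the shift, each W-MON bin is labelled by the Monday that closes it.
unshifted = sales.groupby(pd.Grouper(key="Date", freq="W-MON"))["Price"].count()

# Shifting every date back 7 days moves each label one week earlier,
# to the Monday just before the bin starts.
shifted = sales.assign(Date=sales["Date"] - pd.Timedelta(days=7))
shifted = shifted.groupby(pd.Grouper(key="Date", freq="W-MON"))["Price"].count()

print(unshifted)
print(shifted)
```

The counts per week are identical in both cases; only the date each week is labelled with changes, which affects where the bars sit on the x-axis.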
    import plotly.graph_objs as go

    # group data by number of bedrooms per week
    # (df must still be the full data set, including its Bed column)
    beds = df.groupby(['Bed', pd.Grouper(key='Date', freq='W-MON')])['Type'].count()
    beds = beds.reset_index()

    # initialise heatmap
    data = [go.Heatmap(
        z=beds.Type,
        x=beds.Date,
        y=beds.Bed
    )]
    layout = go.Layout(
        title='Perth properties sold by number of bedrooms (May - Jul 2017)',
        yaxis=dict(title='Number of beds'),
        xaxis=dict(title='Date sold')
    )
    fig = go.Figure(data=data, layout=layout)
    plotly.offline.iplot(fig, filename='datetime-heatmap')
From this figure, we can see that 3- and 4-bedroom properties dominate the Perth market. We can also see sales volume dropping off towards the end of the month.
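The heatmap above is fed three long-format columns (count, week, bedrooms). To inspect the same grid as a table, one option is to pivot the counts into a bedrooms-by-week matrix. The sketch below assumes weekly counts in the same Bed / Date / Type layout; the numbers are made up for illustration and are not results from the Perth data.

```python
import pandas as pd

# Made-up weekly counts in the same long format as the grouped data
# (columns Bed, Date, Type); the numbers are illustrative only.
counts = pd.DataFrame(
    {
        "Bed": [3, 3, 4, 4, 2],
        "Date": pd.to_datetime(
            ["2017-07-03", "2017-07-10", "2017-07-03", "2017-07-10", "2017-07-03"]
        ),
        "Type": [12, 9, 10, 7, 3],
    }
)

# One row per bedroom count, one column per week.
# Combinations with no sales are missing after the pivot, so fill them with 0.
grid = counts.pivot(index="Bed", columns="Date", values="Type").fillna(0)

# Row labels, column labels and the 2-D values of the grid.
z = grid.values.tolist()
x = list(grid.columns)
y = list(grid.index)

print(grid)
```

A matrix like this makes gaps obvious (here, no 2-bedroom sales in the second week), and the same `z`, `x` and `y` could also be passed to `go.Heatmap`, which accepts a 2-D `z`.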
This is just the tip of the iceberg in terms of what Plotly can offer. It will be interesting to see where it lands amongst all the other visualisation tools in the market.