Below is a simple analysis of English houses prices over time. All analysis is conducted in SQL. Python is used to connect to the database and display the results. The dataset, which I import from an SQLite database, was retrieved from this URL: https://www.kaggle.com/datasets/justinas/housing-in-london¶
In [1]:
import pandas as pd
import sqlite3
conn = sqlite3.connect('property_db.db')
db_cursor = conn.cursor()

Display areas ordered by 2020 prices¶

In [2]:
query_result = pd.read_sql_query("""


SELECT
SUBSTR(date,0,8) AS year_month,
area,
average_price

FROM property_details

WHERE date = '2020-01-01'
AND borough_flag = '0'
AND area NOT IN ('outer london', 'inner london', 'england')

ORDER by average_price DESC


""", conn)

query_result
Out[2]:
year_month area average_price
0 2020-01 london 476588
1 2020-01 south east 320700
2 2020-01 east of england 286999
3 2020-01 south west 254320
4 2020-01 west midlands 200628
5 2020-01 east midlands 195707
6 2020-01 yorks and the humber 165383
7 2020-01 north west 164769
8 2020-01 north east 126592

Display areas ordered by difference between 2020 price and average England price¶

In [3]:
query_result = pd.read_sql_query("""


WITH area_england_avg AS (
    SELECT
    area,
    average_price,
    -- Create column for England only
    (SELECT average_price FROM property_details WHERE date = '2020-01-01' AND area = 'england') AS average_price_england

    FROM property_details

    WHERE date = '2020-01-01'
    AND borough_flag = '0'
    AND area NOT IN ('outer london', 'inner london', 'england')
)

SELECT *,
(average_price - average_price_england) as difference_from_england_average

FROM area_england_avg

ORDER by difference_from_england_average DESC;


""", conn)

query_result
Out[3]:
area average_price average_price_england difference_from_england_average
0 london 476588 247355 229233
1 south east 320700 247355 73345
2 east of england 286999 247355 39644
3 south west 254320 247355 6965
4 west midlands 200628 247355 -46727
5 east midlands 195707 247355 -51648
6 yorks and the humber 165383 247355 -81972
7 north west 164769 247355 -82586
8 north east 126592 247355 -120763

Display year-over-year England price differences, ordered by year¶

In [7]:
query_result = pd.read_sql_query("""


SELECT
SUBSTR(date,0,8) AS year_month,
average_price,
(average_price - LAG(average_price, 1) OVER (ORDER BY date)) AS difference_year_over_year

FROM property_details

WHERE area = 'england'
AND date like "%-01-01"

ORDER by date;


""", conn)

query_result
Out[7]:
year_month average_price difference_year_over_year
0 1995-01 53203 NaN
1 1996-01 52333 -870.0
2 1997-01 55789 3456.0
3 1998-01 61902 6113.0
4 1999-01 65766 3864.0
5 2000-01 75219 9453.0
6 2001-01 84377 9158.0
7 2002-01 96215 11838.0
8 2003-01 121610 25395.0
9 2004-01 139719 18109.0
10 2005-01 158572 18853.0
11 2006-01 166544 7972.0
12 2007-01 181824 15280.0
13 2008-01 191750 9926.0
14 2009-01 162673 -29077.0
15 2010-01 174458 11785.0
16 2011-01 174442 -16.0
17 2012-01 174179 -263.0
18 2013-01 176816 2637.0
19 2014-01 188265 11449.0
20 2015-01 202856 14591.0
21 2016-01 220361 17505.0
22 2017-01 231593 11232.0
23 2018-01 241061 9468.0
24 2019-01 244641 3580.0
25 2020-01 247355 2714.0

Display regional differences between 2000 and 2020 prices, ordered by price difference percent¶

In [8]:
query_result = pd.read_sql_query("""


-- Create CTE for 2000 and 2020 data
WITH prices_2000 AS (
    SELECT
    area,
    average_price as average_price_2000

    FROM property_details

    WHERE date = '2000-01-01'
    AND borough_flag = '0'
    AND area NOT IN ('outer london', 'inner london', 'england')
    ),
    
    prices_2020 AS (
    SELECT
    area,
    average_price as average_price_2020

    FROM property_details

    WHERE date = '2020-01-01'
    AND borough_flag = '0'
    AND area NOT IN ('outer london', 'inner london', 'england')
    )

SELECT
prices_2000.area,
average_price_2000,
average_price_2020,
-- Calculate price difference
(average_price_2020 - average_price_2000) AS price_increase,
(100 * (average_price_2020 - average_price_2000) / average_price_2000) AS price_increase_percent

FROM prices_2000
-- Join CTEs
LEFT JOIN prices_2020
ON prices_2000.area = prices_2020.area

ORDER BY price_increase_percent DESC


""", conn)

query_result
Out[8]:
area average_price_2000 average_price_2020 price_increase price_increase_percent
0 london 130411 476588 346177 265
1 east of england 84192 286999 202807 240
2 east midlands 58080 195707 137627 236
3 yorks and the humber 50489 165383 114894 227
4 north west 51177 164769 113592 221
5 west midlands 63833 200628 136795 214
6 south west 81381 254320 172939 212
7 south east 104358 320700 216342 207
8 north east 47604 126592 78988 165
In [ ]: