Below is a simple analysis of London houses prices over time. In this notebook I filter, transform, and visualise data using the Julia programming language. The dataset, which I import from a CSV, was retrieved from this URL: https://www.kaggle.com/datasets/justinas/housing-in-london¶

Import data, select columns, remove non-London data¶

In [1]:
using DataFrames, CSV, Printf, Plots, Dates

df = DataFrame(CSV.File("housing_in_london.csv"));
df = filter(:borough_flag => !=(0), df)
df = select(df, [:date, :area, :average_price])

first(df, 5)
Out[1]:
5×3 DataFrame
Rowdateareaaverage_price
DateString31Int64
11995-01-01city of london91449
21995-02-01city of london82203
31995-03-01city of london79121
41995-04-01city of london77101
51995-05-01city of london84409

Create user defined function to convert column to proper case¶

In [2]:
function to_proper_case(s)
    s = split(s)
    for i in 1:length(s)
        s[i] = uppercase(s[i][1]) * lowercase(s[i][2:end])
    end
    join(s, " ")
end

df[!, :area] = map(to_proper_case, df[!, :area])

first(df, 5)
Out[2]:
5×3 DataFrame
Rowdateareaaverage_price
DateStringInt64
11995-01-01City Of London91449
21995-02-01City Of London82203
31995-03-01City Of London79121
41995-04-01City Of London77101
51995-05-01City Of London84409

Create data frame calculating Tower Hamlets price differences over time¶

In [3]:
df_tower_hamlets = filter(:area => ==("Tower Hamlets"), df)
df_tower_hamlets = filter(row -> month(row.date) == 1, df_tower_hamlets)

df_tower_hamlets[!, :average_price_lag] = circshift(df_tower_hamlets[!, :average_price], (+1))
df_tower_hamlets[1, :average_price_lag] = 0
df_tower_hamlets[!, :price_dif] = df_tower_hamlets[:, :average_price] .- df_tower_hamlets[:, :average_price_lag]

df_tower_hamlets = df_tower_hamlets[2:end, :] # Remove first row (which has no data in lag column)

first(df_tower_hamlets, 5)
Out[3]:
5×5 DataFrame
Rowdateareaaverage_priceaverage_price_lagprice_dif
DateStringInt64Int64Int64
11996-01-01Tower Hamlets67552598657687
21997-01-01Tower Hamlets74439675526887
31998-01-01Tower Hamlets901927443915753
41999-01-01Tower Hamlets1074649019217272
52000-01-01Tower Hamlets12820010746420736

Create two data frames for Jan 1996 and Jan 2020 data¶

In [4]:
df_Jan_1996 = filter(row -> row[:date] == Date("1996-01-01"), df);
df_Jan_2020 = filter(row -> row[:date] == Date("2020-01-01"), df);

Join data frames and calculate 24 year price increases¶

In [5]:
df_1996_2020 = leftjoin(df_Jan_1996, df_Jan_2020, on = [:area => :area], makeunique=true);
In [6]:
df_1996_2020 = rename!(df_1996_2020,
    "average_price" => "average_price_1996",
    "average_price_1" => "average_price_2020");

df_1996_2020[!, :price_increase_abs_24y] = df_1996_2020[:, :average_price_2020] .- df_1996_2020[:, :average_price_1996]
df_1996_2020[!, :price_increase_percent_24y] = (df_1996_2020[:, :average_price_2020] .- df_1996_2020[:, :average_price_1996]) ./ df_1996_2020[:, :average_price_1996] * 100

df_1996_2020 = select(df_1996_2020, [:area, :average_price_1996, :average_price_2020, :price_increase_abs_24y, :price_increase_percent_24y])
df_1996_2020 = sort(df_1996_2020, :price_increase_percent_24y, rev = true);

first(df_1996_2020, 5)
Out[6]:
5×5 DataFrame
Rowareaaverage_price_1996average_price_2020price_increase_abs_24yprice_increase_percent_24y
StringInt64Int64?Int64Float64
1Hackney66327571882505555762.216
2Lambeth68780526690457910665.76
3Southwark65753502612436859664.394
4Westminster1338431019028885185661.361
5Islington89406673418584012653.213

Plot top 10 areas with highested prices in 2020¶

In [7]:
df_1996_2020 = sort(df_1996_2020, :average_price_2020, rev = true);
df_1996_2020_top_10 = first(df_1996_2020, 10);

x = df_1996_2020_top_10[!, :area]
y = df_1996_2020_top_10[!, :average_price_2020]/1000

data_label = [(@sprintf("£%.1fk", yi),9) for yi in y]
(ymin,ymax) = extrema(y) # Compute both min and max, return as a 2-tuple
label_vertical_offset = 0.02*ymax

top_10_price= Plots.bar(x,y,
    ylabel="Price (£1,000s)",
    title="Top 10 Highest Average House Price (2020)",
    legend=false,
    xrotation=45,
    fill="green",
    size=(900,805))

annotate!(x,y.+label_vertical_offset, data_label, ylim = (0,ymax+label_vertical_offset))
Out[7]:

Plot top 10 areas with highest price increases (absolute)¶

In [8]:
df_1996_2020 = sort(df_1996_2020, :price_increase_abs_24y, rev = true);
df_1996_2020_top_10 = first(df_1996_2020, 10);

x = df_1996_2020_top_10[!, :area]
y = df_1996_2020_top_10[!, :price_increase_abs_24y]/1000

data_label = [(@sprintf("£%.1fk", yi),9) for yi in y]
(ymin,ymax) = extrema(y)
label_vertical_offset = 0.02*ymax

top_10_increase_abs = Plots.bar(x,y,
    ylabel="Price Increase (£1,000s)",
    title="Top 10 Highest Increases in Average House Price (1996-2020)",
    legend=false,
    xrotation=45,
    size=(900,805))

annotate!(x,y.+label_vertical_offset, data_label, ylim = (0,ymax+label_vertical_offset))
Out[8]:

Plot top 10 areas with highest price increases (percentage)¶

In [9]:
df_1996_2020 = sort(df_1996_2020, :price_increase_percent_24y, rev = true);
df_1996_2020_top_10 = first(df_1996_2020, 10);

x = df_1996_2020_top_10[!, :area]
y = df_1996_2020_top_10[!, :price_increase_percent_24y]

data_label = [(@sprintf("+%.1f%%", yi),9) for yi in y]
(ymin,ymax) = extrema(y)
label_vertical_offset = 0.02*ymax

top_10_increase_percent = Plots.bar(x,y,
    ylabel="Percentage Increase",
    title="Top 10 Highest Percentage Increases in Average House Price (1996-2020)",
    legend=false,
    xrotation=45,
    size=(900,805))

annotate!(x,y.+label_vertical_offset, data_label, ylim = (0,ymax+label_vertical_offset))
Out[9]:

Plot change in Tower Hamlets house prices over time¶

In [10]:
y_max = maximum(df_tower_hamlets.price_dif/1000)
y_min = minimum(df_tower_hamlets.price_dif/1000)

tower_hamlets_trend = Plots.plot(df_tower_hamlets.date, df_tower_hamlets.price_dif/1000,
    xlabel="Date", ylabel="Price Change (£1,000s)", title="Change in Average Tower Hamlets House Price (1996-2020)",
    marker=:circle, markersize=8,
    legend=false,
    color=:purple,
    size=(900,725),
    ylims=(y_min*1.1, y_max*1.1),
    #xrotation=90
)

hline!(tower_hamlets_trend, [0], line=(:dash, :red))
Out[10]:
In [ ]: