Sunday, May 7, 2017

Correlations among the Fraser family members using R

So I've been fiddling around with R over the weekends. R is both a programming language and the open-source software environment for statistical computing that runs it. I use R at work, and it comes with a full suite of packages for various kinds of data manipulation, analysis, etc.

Not too long ago, I wondered whether R could be used for investing-related purposes. Sure enough, there is a package called quantmod, which is a treasure trove for traders. After some fiddling around, what I like most about quantmod is how easily it imports data from various financial sources (e.g. Yahoo Finance, Google Finance) straight into R.

As I'm currently invested in some of the Fraser family members, I thought it would be fun to write an R function that tells me how correlated the Fraser family members are.

Before proceeding further, here are the assumptions I have made. First, I set the time period from 1 January 2016 to 31 December 2016. Second, I did not include Frasers Logistics & Industrial Trust, as it does not have a full year's worth of data for that period. Third, I assumed that the relationships among the Fraser family members are monotonic, i.e. as one counter's price rises, the other's tends to move consistently in one direction, though not necessarily at a constant rate. Monotonic relationships are less restrictive than linear ones: every linear relationship is monotonic, but not every monotonic relationship is linear. Therefore, I use Spearman's rank correlation, which measures the strength of a monotonic relationship, rather than Pearson's correlation, which measures only linear association.
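Since everything below rests on Spearman's correlation, here is a minimal sketch of how it is computed. It is written in Python with only the standard library rather than in R (in R, `cor(x, y, method = "spearman")` does the same job): rank each series, let tied values share the average of the positions they occupy, then take Pearson's correlation of the ranks. The helper names are illustrative.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)  # first sorted position holding this value
        last[v] = pos             # last sorted position holding this value
    return [(first[v] + last[v]) / 2 for v in values]


def pearson(x, y):
    """Pearson's r (undefined, i.e. division by zero, if either series is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks instead of the raw values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    return pearson(average_ranks(x), average_ranks(y))


# Monotonic but non-linear: y = x**3 always rises with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y))  # 1.0: the ranks agree perfectly
print(pearson(x, y))   # below 1.0: the points do not lie on a straight line
```

The cubic example shows why the monotonic assumption matters: the ranks line up perfectly even though the raw values are not linearly related.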

The function I wrote takes three arguments: (a) the list of stocks to be correlated with one another, (b) the start date, and (c) the end date.

The function performs the following steps:
1). use quantmod to download the price data for a list of stock symbols/tickers from Yahoo Finance
2). retain only the closing price of each counter between the start date and the end date (both inclusive)
3). join the closing prices into one dataset, with each column representing one counter
4). produce a scatterplot matrix and the Spearman's correlation table.
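The four steps above can be sketched in Python as follows. This is not the original R function: the quantmod download in step 1 is replaced by a hypothetical in-memory dictionary `prices` of made-up closing prices keyed by date, and step 3 keeps only the dates on which every counter has a close (an inner join, which is an assumption here about how the join should behave). The scatterplot part of step 4 is left out; the sketch only builds the correlation table.

```python
from datetime import date
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho: Pearson's r on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def correlate_closes(prices, start, end):
    """prices: {ticker: {date: close}}, a hypothetical stand-in for downloaded data.

    Returns the joined dates, one column of closes per ticker, and the
    pairwise Spearman table as {ticker_a: {ticker_b: rho}}.
    """
    if len(prices) < 2:
        raise ValueError("need at least two tickers to correlate")
    # Step 2: keep only closes between start and end, both inclusive.
    window = {t: {d: c for d, c in s.items() if start <= d <= end}
              for t, s in prices.items()}
    # Step 3: inner join on the dates where every ticker has a close.
    dates = sorted(set.intersection(*(set(s) for s in window.values())))
    columns = {t: [s[d] for d in dates] for t, s in window.items()}
    # Step 4: Spearman correlation for every pair of columns.
    table = {a: {b: spearman(columns[a], columns[b]) for b in columns}
             for a in columns}
    return dates, columns, table


# Made-up closing prices for three hypothetical counters.
d = [date(2015, 12, 31)] + [date(2016, 1, day) for day in (4, 5, 6, 7, 8)]
prices = {
    "AAA": dict(zip(d, [9.00, 1.00, 1.10, 1.05, 1.20, 1.30])),
    "BBB": {d[0]: 3.0, d[1]: 2.0, d[2]: 2.2, d[3]: 2.1, d[5]: 2.6},  # no close on 7 Jan
    "CCC": dict(zip(d, [4.0, 5.0, 4.8, 4.9, 4.7, 4.5])),
}
dates, columns, table = correlate_closes(prices, date(2016, 1, 1), date(2016, 12, 31))
for a in table:
    print(a, {b: round(rho, 2) for b, rho in table[a].items()})
```

The 31 December 2015 closes are dropped by the date filter, and 7 January is dropped by the join because one counter has no close that day. In the R version, steps 1 and 2 would presumably rely on quantmod's `getSymbols()` and `Cl()`.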

So, here's the scatterplot matrix produced by R:

At first glance, I thought that something was wrong with the output. For example, the scatterplot in the first column from the left, second row from the top, has F&N on its x-axis and FCL on its y-axis. Its mirror image is in the second column from the left, first row from the top, where F&N is on the y-axis and FCL is on the x-axis. The two scatterplots really do look different from each other once the counters swap axes! I've checked the underlying raw data and everything seems to be correct. Most likely, this is because swapping the axes reflects the whole cloud of points across the panel's diagonal, and a reflected cloud can look quite different to the eye, even though it contains exactly the same pairs of prices and gives exactly the same correlation.
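A small sketch with made-up numbers (not real F&N or FCL prices) shows what the mirror-image panel contains: the same points with their coordinates swapped, i.e. reflected across the line y = x, and the same Spearman's rho whichever series sits on the x-axis. The variable names are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho: Pearson's r on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return sxy / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


# Hypothetical closes for two counters (made-up numbers).
fnn = [7.9, 8.1, 8.0, 8.4, 8.3, 8.6]
fcl = [1.48, 1.50, 1.55, 1.52, 1.58, 1.60]

lower_panel = list(zip(fnn, fcl))               # F&N on the x-axis, FCL on the y-axis
upper_panel = [(y, x) for x, y in lower_panel]  # mirror panel: the axes are swapped

# Every point in the mirror panel is an original point reflected across y = x ...
assert all((y, x) in lower_panel for x, y in upper_panel)
# ... and the rank correlation is the same whichever series is on which axis.
print(spearman(fnn, fcl), spearman(fcl, fnn))
```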

What about the Spearman's correlation table?

Over 2016, the closing prices of Frasers Centrepoint Trust were positively associated with those of Frasers Commercial Trust. Frasers Hospitality Trust was the least associated with the other counters in the Frasers family (its correlation coefficients with the other counters are generally smaller).

That's all for now.

In the meantime, I shall touch up my code. I realized that it has no error handling (e.g. if only one symbol/ticker is given as input, it should issue a warning instead of failing with an error). Also, the scatterplots could be made more visually appealing (most likely with the ggplot2 package).
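One way to add the missing error handling is to validate the inputs before anything is downloaded. Below is a hypothetical Python sketch; the name `check_inputs` and the specific checks are assumptions, not the post's code. In R, the same idea would be a `warning()` call followed by an early return.

```python
import warnings
from datetime import date


def check_inputs(tickers, start, end):
    """Hypothetical pre-flight check for the correlation function.

    Warns (instead of failing with an error later) and returns False when the
    inputs cannot produce a correlation table; returns True otherwise.
    """
    distinct = list(dict.fromkeys(tickers))  # drop duplicate tickers, keep order
    if len(distinct) < 2:
        warnings.warn(f"need at least two distinct tickers, got {len(distinct)}; "
                      "nothing to correlate")
        return False
    if start > end:
        warnings.warn(f"start date {start} is after end date {end}")
        return False
    return True


print(check_inputs(["AAA"], date(2016, 1, 1), date(2016, 12, 31)))         # warns, False
print(check_inputs(["AAA", "BBB"], date(2016, 1, 1), date(2016, 12, 31)))  # True
```

Returning a flag after a warning lets the caller skip the download and plotting steps cleanly rather than crash halfway through.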

Readers, if you want to know whether a counter correlates with another counter, do drop a comment. I'm keen to test my function out further. =P

Just specify the list of counters, start date, and end date!


  1. THANK YOU!

    This is so cool. Now I know that I can balance my FHT with FCot as both are fairly high yielding counters that are poorly correlated with each other.


  2. how about a full regression with all 5? :D

    1. Hi Dan,

      Hmm. You mean combinations of 4 variables predicting the 5th variable?

      I'll try to figure it out.