Concepton

a device that is generating concepts

NASDAQ Network – Part I


Yes, this is what I am doing at 1 am, when market shifts remind me that there is a hidden beauty in the pseudo-chaotic signals that rule our lives. Then I go back to work, which is an attempt to organize the stock market into a single network and to understand it better through the network's static and dynamic properties. What can we find there? Lots of cool stuff: hidden connections, clusters, and links within and across sectors. What can we derive from it? Investment considerations, robustness properties of the market, pathologies and more…

I am going to post the findings in small chunks, the way I actually work on this 🙂

So how shall we start? With the raw data, of course. We need trading records of financial instruments for some period of time. I have two data sets, which differ in time range and granularity:

  • 1 week of data at 5-minute granularity for about 3.5k NASDAQ company stocks and exchange-traded funds (ETFs)
  • 10 years of data at daily granularity for NASDAQ stocks

The next step is to choose a set (or a subset, depending on the required scope), gather the data and, most likely, clean it up and align the formats. Once the data is ready, we run a cross-correlation. This gives us an N×N matrix of correlation coefficients (R squared, or Rsq) between every pair of stocks. From this point on we will call each company stock or ETF a “Node” and a connection between two of them an “Edge”. This is because, as I said, we are going to build a network, and those are the terms for its basic components.
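The correlation step can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not the exact pipeline behind the figures: the tickers and prices are made up, the helper names `pearson_r` and `build_edges` are hypothetical, and the correlation is computed directly on price series (whether to correlate prices or returns is a separate choice).

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a flat series as uncorrelated
    return cov / sqrt(var_x * var_y)

def build_edges(series):
    """Return {(ticker_a, ticker_b): Rsq} for every unordered pair of tickers."""
    edges = {}
    for a, b in combinations(sorted(series), 2):
        r = pearson_r(series[a], series[b])
        edges[(a, b)] = r * r
    return edges

# Made-up toy prices, purely illustrative (not real market data).
prices = {
    "AAA": [10.0, 10.5, 11.0, 11.5, 12.0],
    "BBB": [20.0, 21.0, 22.0, 23.0, 24.0],  # moves in lockstep with AAA
    "CCC": [5.0, 4.0, 6.0, 3.0, 5.5],
}
edges = build_edges(prices)
for pair, rsq in edges.items():
    print(pair, round(rsq, 3))
```

For N instruments this yields N(N-1)/2 pairs, which is why the threshold discussed next matters so much for 3.5k stocks.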

Applying an Rsq threshold significantly reduces the number of stock pairs that count as correlated. By how much? Well, exponentially. This is important, since it reduces the load on our system and makes the analysis faster. It also gives the investigation the focus it needs:

Number of edges (stock connections, Y axis) as a function of the threshold applied to Rsq (X axis). The drop is exponential, so it is shown on a logarithmic scale.
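A curve like this one comes from a simple sweep: count the surviving edges at each threshold. Below is a rough sketch with hypothetical Rsq values; the function name `count_edges` and the numbers are assumptions for illustration only.

```python
def count_edges(weights, threshold):
    """Number of edges whose Rsq is strictly above the threshold."""
    return sum(1 for w in weights if w > threshold)

# Hypothetical Rsq values for ten stock pairs (not real data).
rsq_values = [0.12, 0.35, 0.48, 0.61, 0.77, 0.83, 0.91, 0.93, 0.96, 0.99]

thresholds = [0.0, 0.5, 0.8, 0.9, 0.95]
curve = [(t, count_edges(rsq_values, t)) for t in thresholds]

for t, n in curve:
    print(f"Rsq > {t:.2f}: {n} edges")
```

Plotting the second column of `curve` on a logarithmic Y axis gives the kind of chart shown above.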

I prefer to work with highly correlated signals, Rsq > 0.9, on the 5-minute data, and with a somewhat lower threshold for annual-scale signals. This is enough data for a single person to dig into during the night. For example, on my data set the 0.9 Rsq cutoff leaves 364 nodes (companies and funds) and 5778 edges (connections between them).
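Node and edge counts like these fall out of a plain filter: keep the edges above the cutoff, then collect the tickers that still appear in at least one of them. A minimal sketch, with a hypothetical `filter_network` helper and made-up weights:

```python
def filter_network(edges, threshold):
    """Keep edges with Rsq above the threshold and the nodes they touch."""
    kept = {pair: w for pair, w in edges.items() if w > threshold}
    nodes = {ticker for pair in kept for ticker in pair}
    return nodes, kept

# Made-up edge table: (ticker_a, ticker_b) -> Rsq.
edges = {
    ("AAA", "BBB"): 0.97,
    ("AAA", "CCC"): 0.92,
    ("BBB", "CCC"): 0.88,
    ("CCC", "DDD"): 0.40,
    ("DDD", "EEE"): 0.15,
}

nodes, kept = filter_network(edges, 0.9)
print(len(nodes), "nodes,", len(kept), "edges")
```

Stocks with no edge above the cutoff simply drop out of the network, which is how roughly 3.5k instruments shrink to a few hundred nodes.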

To get started easily, we need some visualization software. I like Gephi. We import the edge table, with the Rsq values defined as “Weights”. So what does it look like?
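For the import, an edge table in CSV form is enough. The sketch below assumes the column names that Gephi's spreadsheet importer recognizes (Source, Target, Type, Weight); the `write_gephi_edges` helper, the tickers and the weights are made up for illustration, and the exact import options depend on your Gephi version.

```python
import csv
import io

def write_gephi_edges(edges, out):
    """Write an undirected, weighted edge table to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), w in sorted(edges.items()):
        writer.writerow([a, b, "Undirected", f"{w:.4f}"])

# Made-up edges that survived the threshold.
edges = {("AAA", "BBB"): 0.97, ("AAA", "CCC"): 0.92}

buffer = io.StringIO()  # swap for open("edges.csv", "w", newline="") to write a file
write_gephi_edges(edges, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Node attributes such as sector and size would go into a separate node table keyed by the same ticker identifiers.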

Cross-correlation network of 364 NASDAQ company stocks and funds on 16 Apr 2014, with correlation higher than 0.9, based on 5-minute granularity sampling. Colors are based on market sector, node size on capital value.

Beautiful, isn't it?

Rsq > 0.95 gives a much more focused picture:

Cross-correlation network of 116 NASDAQ company stocks and funds on 16 Apr 2014, with correlation higher than 0.95, based on 5-minute granularity sampling. Colors are based on market sector, node size on capital value.

Now we can inspect the network by zooming in and filtering by sector. For example, Health Care and Pharma:

Cross-correlation network of 26 NASDAQ company stocks in the Health Care and Pharma sector on 16 Apr 2014, with correlation higher than 0.95, based on 5-minute granularity sampling. Node size is based on capital value, connection width on Rsq value.

At this stage we can ask ourselves various questions. For example, what do two highly correlated stocks look like, and why are they correlated?

Sometimes it makes perfect sense, e.g. for FOXA (Class A shares) and FOX (Class B shares):

Fox

In other cases you might find that the intraday correlation is not representative but incidental (or caused by a rare common event), so there is a need to switch to the annual scale.

If we build a portfolio of Pharma companies, should we take them all? If that does not really make sense, then which part? A representative from each cluster? Well, we can run a clustering algorithm and a centrality algorithm, and probably choose based on those considerations.
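To make the "representative from each cluster" idea concrete, here is a deliberately crude sketch. It swaps in connected components for a real clustering algorithm and degree for a real centrality measure, so it only illustrates the selection logic, not the method used later in this series. The tickers, edges and function names are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map each node to the set of its neighbours (undirected graph)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def connected_components(adj):
    """Stand-in for clustering: groups of nodes reachable from each other."""
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

def representatives(edges):
    """Pick the highest-degree node per component (ties broken alphabetically)."""
    adj = build_adjacency(edges)
    return [min(comp, key=lambda n: (-len(adj[n]), n))
            for comp in connected_components(adj)]

# Made-up edges that survived the Rsq threshold.
edges = [("AAA", "BBB"), ("AAA", "CCC"), ("AAA", "DDD"),
         ("BBB", "CCC"), ("EEE", "FFF"), ("FFF", "GGG")]
picks = representatives(edges)
print(picks)
```

On a dense network such as the 0.9 Rsq one, most nodes are likely to land in a single component, so a finer method (for example the modularity-based clustering available in Gephi) would be needed to get meaningful clusters.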

 

All of this and more in the next parts…

 


Author: Andrey Gabdulin

www.gabdulin.com Product Development
