Modeling Relevant Sources for a Currency: USDJPY

Introduction

Right Relevance (RR) provides curated information and intelligence on ~50 thousand topics. This includes:

  • Topic relationships including related topics & semantic information like synonyms.
  • Topical influencers (~2.5M) with score and rank.
  • Topical content and information in the form of articles, videos and conversations.

Additionally, Right Relevance provides an Insights offering that combines the above Topics and Influencers information with real-time conversations to provide actionable intelligence, with visualizations, to support decision making. The Insights service applies to events such as elections, emerging technologies, activism, conferences and product launches.

This report is part of a series applying the Relevance-as-a-Service (RaaS) Insights technology to financial markets intelligence, beginning with financial instruments such as equities, commodities, bonds and forex. It is the third report in the series, after $AMZN & Crude Oil. Its focus is modeling the most relevant sources for a currency, with ‘usdjpy’ as the example.

Hypothesis for Application to Financial Instruments

The scale and availability of data are increasing exponentially. This is a boon overall, but it exposes some serious problems: the difficulty of extracting relevance and intelligence from data at this scale, high costs, misinformation and, even more seriously, disinformation (aka fake news) at scale.

We’ve previously outlined how trust from influencers (trusted sources) can be inductively applied to the fake news problem by providing a measure of trust & verifiability in addition to our core value prop of relevance.

We’ve applied the Right Relevance RaaS Insights technology to several scenarios listed below with great success.

This is our attempt to apply the same set of technologies and approaches to financial instruments. In this report, the instrument is a currency: ‘usdjpy’.

In the financial domain, several complex models exist, ranging from high-speed, low-latency quant trading to longer-term analysis and backtesting, among others. Most of these models struggle to handle data at the current scale. Cost and latency are growing problems as the scale continues to increase, and errors due to misinformation and disinformation are a growing risk.

The hypothesis for this analysis rests on identifying relevant and verifiable/trustworthy sources (via influencers) to monitor for any given financial security, so that the number of users/accounts to monitor drops by three orders of magnitude. By extension, roughly three orders of magnitude less data needs to be analyzed.

We’ll outline three distinct ways to find relevant sources via our analysis. The resulting sets can be used individually or combined into a superset, depending on specific needs.

Data & Duration

The report leverages tweets sampled from April 1st to May 8th, 2017. These tweets, along with Right Relevance topics, topical communities and articles data, form the basis for the analysis.

The phrases used for gathering tweets are: “usdjpy”, “usd jpy”
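As an illustration of phrase-based gathering, here is a minimal Python sketch that keeps only tweets mentioning one of the two phrases. The matching rules (case-insensitive, word-boundary matching, flexible whitespace) and the sample tweets are assumptions made for this example; the report does not describe how matching is actually performed.

```python
import re

# The two phrases come from the report; everything else is illustrative.
PHRASES = ("usdjpy", "usd jpy")


def _phrase_to_regex(phrase):
    """Escape each word and allow any run of whitespace between words."""
    return r"\s+".join(re.escape(word) for word in phrase.split())


# Case-insensitive; \b stops "usdjpy" from matching inside a longer word.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(_phrase_to_regex(p) for p in PHRASES) + r")\b",
    re.IGNORECASE,
)


def matches_usdjpy(text):
    """Return True if the tweet text mentions one of the tracked phrases."""
    return _PATTERN.search(text) is not None


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "USDJPY testing resistance ahead of BOJ",
    "Long usd jpy into the close",
    "$USDJPY breakout?",
    "EURUSD quiet today",
]
relevant = [t for t in sample_tweets if matches_usdjpy(t)]
```

Here `relevant` keeps the first three sample tweets and drops the EURUSD one.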

Most of the summary report is extracted from the analysis collateral in the form of:

  1. Gephi Communities Graph Visual: Extracts are shown below.
  2. Tableau Online Dashboard: Visualizes graph analysis results, including flocks, top trending terms, top hashtags, top Users/accounts, RR topics, top tweets and several other measures in the form of tables and charts. Faceting is supported per flock, RR topic and Twitter/RR account.

For access to Tableau data and the complete graphs please send email to biz@rightrelevance.com.

The analysis methodology is outlined at http://54.244.44.22/insights

Communities Graph & RR Topics-based Identification

Community detection graph algorithms like Walktrap and InfoMap are used to identify communities (as sub-graphs) in our engagements graph built using Neo4j & R. Graph visualizations are done via Gephi.

The all-engagements graph (Fig 1), which includes mentions, shows several small, scattered but active subgraphs (aka communities or flocks). It provides a high-level visual overview of the overall conversations related to ‘usdjpy’, along with the accounts/users with the most engagements in that context.

Figure 1: All Engagements Graph for ‘usdjpy’

For a zoomable, clickable version, see the link here.

The RTs-only graph (Fig 2) is fairly similar in terms of showing no identifiable large flock(s) but a series of active smaller flocks.

Figure 2: RTs-only Graph for ‘usdjpy’

For a zoomable, clickable version, see the link here.
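To make the distinction between the two graphs concrete, the following Python sketch aggregates engagement records into weighted directed edges, once for all engagement kinds and once for retweets only. The record layout, the account names and the `build_graph` helper are hypothetical; the report's own pipeline uses Neo4j & R and is not shown here.

```python
from collections import Counter

# Hypothetical engagement records: (source_account, target_account, kind).
engagements = [
    ("acct_a", "acct_hub", "retweet"),
    ("acct_b", "acct_hub", "retweet"),
    ("acct_a", "acct_b", "mention"),
    ("acct_c", "acct_a", "reply"),
    ("acct_b", "acct_hub", "retweet"),
]


def build_graph(records, kinds=None):
    """Aggregate engagement records into weighted directed edges.

    kinds: optional set of engagement kinds to keep (None keeps all).
    Returns a Counter mapping (source, target) -> number of engagements.
    """
    edges = Counter()
    for source, target, kind in records:
        if kinds is None or kind in kinds:
            edges[(source, target)] += 1
    return edges


all_engagements = build_graph(engagements)
rts_only = build_graph(engagements, kinds={"retweet"})
```

The edge weights (engagement counts) could then feed a community detection step or a graph visualization tool.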

Fig 3 visualizes the graph produced by the tried and tested approach of superimposing relevant Right Relevance topics, in this case ‘usdjpy’, over the engagements graph. This technique highlights nodes (aka users/accounts) in the graph that are Right Relevance influencers for the topic ‘usdjpy’, to pinpoint real influence.

Figure 3: All Engagements Graph with relevant RR Topics Superimposed
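Conceptually, superimposing a topic over the graph amounts to intersecting the accounts in the graph with the topic's influencer list. A minimal Python sketch follows; the account names, the score values and the `highlight_influencers` helper are all hypothetical and do not reflect the actual Right Relevance data model.

```python
# Hypothetical inputs: accounts present in the engagements graph, and a
# topic-influencer list with scores (the score scale is made up).
graph_nodes = {"acct_a", "acct_b", "acct_c", "acct_d"}
topic_influencers = {"acct_b": 71.5, "acct_d": 64.0, "acct_z": 90.2}


def highlight_influencers(nodes, influencers):
    """Return graph accounts that are also topic influencers, best score first."""
    hits = [(name, score) for name, score in influencers.items() if name in nodes]
    return sorted(hits, key=lambda item: item[1], reverse=True)


highlighted = highlight_influencers(graph_nodes, topic_influencers)
```

Note that `acct_z` has the highest topic score but is not highlighted, because it did not take part in the conversations captured by the graph.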

The list of accounts from the graph in Fig 3 can be extracted from Tableau using the RR Topic ‘usdjpy’ as a facet. Select the topic (Fig 4) and then click the ‘Top Tables’ tab next to ‘Dashboard’ in the top menu.

Figure 4: ‘usdjpy’ RR Topics Facet

RR Topics faceting leads to the lists (Fig 5) of top accounts for ‘usdjpy’ by several measures. This data is available via Right Relevance Insights API.

Figure 5: Top Accounts using RR Topic ‘usdjpy’ As Facet

Some of the top users by this method are:

Figure 6: Top 6 ‘usdjpy’ Accounts using RR Topics As Facets

The above shows a varied set of accounts, with users like @RealBrianWatt and @ZFXtrading having comparatively high influence in ‘usdjpy’ in spite of having two orders of magnitude fewer followers than some of the bigger accounts.

These lists of RR influencers from the graph and Tableau, which are influencers in topics relevant to our scenario, form the first set of accounts that we believe need to be monitored for ‘usdjpy’ related news and information.

Network Connectors-based Identification

Right Relevance ‘engagement influence’ measures are calculated by a set of graph analysis algorithms that measure the quality and quantity of engagements (RTs, mentions, replies), reach of tweets etc. within the context of a subject (event, trend etc.).

We apply several methods, including PageRank and Betweenness centrality, to measure Flock influence. The meaning of rankings within this methodology is documented at Twitter Conversation Performance Measures.
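For readers unfamiliar with PageRank, here is a minimal power-iteration sketch in Python. It is a textbook formulation under stated assumptions (edges point from the engaging account to the engaged account, damping factor 0.85, toy edge list); it is not the Right Relevance implementation, whose details the report does not give.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on directed (source, target) edges.

    Accounts with no outgoing edges spread their rank uniformly.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling accounts is redistributed to everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for node in nodes:
            if out_links[node]:
                share = damping * rank[node] / len(out_links[node])
                for target in out_links[node]:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph: three accounts retweet "hub", and "hub" retweets "a".
toy_edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
ranks = pagerank(toy_edges)
top_account = max(ranks, key=ranks.get)
```

In this toy graph the heavily retweeted `hub` account ends up with the highest rank, and `a` inherits much of that rank simply by being retweeted by `hub`.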

Prior work has repeatedly shown us that PageRank is susceptible to high engagement and follower counts. Betweenness centrality, which measures the degree to which a node forms a bridge or critical link between all other users, gives us our top network connectors list. It is a measure of influence with respect to an account’s value as an information and/or communication hub.
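The bridging idea behind betweenness centrality can be sketched with Brandes' algorithm. The Python version below treats the graph as undirected and unweighted, which is a simplifying assumption for illustration; the report does not say how directions or weights are handled. The toy graph is made up.

```python
from collections import deque


def betweenness_centrality(edges):
    """Unnormalized betweenness (Brandes' algorithm) on an undirected,
    unweighted graph given as (u, v) pairs."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    score = {node: 0.0 for node in neighbors}
    for source in neighbors:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {node: [] for node in neighbors}
        paths = {node: 0 for node in neighbors}
        distance = {node: -1 for node in neighbors}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in neighbors[node]:
                if distance[nxt] < 0:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
                if distance[nxt] == distance[node] + 1:
                    paths[nxt] += paths[node]
                    predecessors[nxt].append(node)
        # Accumulate dependencies, walking back from the farthest nodes.
        dependency = {node: 0.0 for node in neighbors}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1.0 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2.0 for node, value in score.items()}


# Toy graph: two triangles joined through a single "bridge" account.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "bridge"), ("bridge", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]
scores = betweenness_centrality(toy_edges)
top_connectors = sorted(scores, key=scores.get, reverse=True)[:3]
```

The `bridge` account has only two connections yet scores highest, because every shortest path between the two triangles runs through it. This is the sense in which the measure surfaces connectors rather than merely popular accounts.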

Figure 7: Top 50 ‘usdjpy’ Connectors

Fig 7 is the list of top 50 connector accounts. Brian Watt (@RealBrianWatt), Miad Kasravi (@ZFXtrading) and FXStreet News (@FXstreetNews) show up in the top 10 here too.

Figure 8: Top 2 Connector Accounts

V (@Vtradez) and Bamabroker (@Bamabroker) are the top 2 accounts by this measure.

We have found Betweenness Centrality to be the leading way to identify valuable accounts as it bubbles up accounts with potentially real influence in terms of news and information dissemination on a given subject.

This forms the second set of accounts we use to monitor ‘usdjpy’ related information.

Flocks-based Identification

The engagements or “flocking” in the context of a subject (topic, event etc.) can lead to building of temporal communities with local influence that is not obvious by the standalone influence of the individuals or without the context of the event. The subgraphs aka communities formed by applying community detection graph algorithms are termed as ‘Flocks’.

In this approach, we pinpoint the most important accounts for ‘usdjpy’ news and information via flocks analysis. Flocks analysis also helps identify accounts for more fine grained information feeds within the overall usdjpy domain.

Fig 9 lists the top 10 flocks in the context of ‘usdjpy’ related conversations. The Twitter handle of the top PageRank account in each flock is used as the flock name. The full list is available via the public Tableau dashboard.

Figure 9: Top 10 ‘usdjpy’ Flocks
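The flock naming convention (the handle of the top PageRank account in each flock) can be sketched in a few lines of Python. The flock ids, account names, scores and the `name_flocks` helper are hypothetical values chosen for illustration.

```python
# Hypothetical inputs: a flock id per account and a PageRank score per account.
membership = {"acct_a": 0, "acct_b": 0, "acct_c": 1, "acct_d": 1, "acct_e": 1}
pagerank_score = {
    "acct_a": 0.31,
    "acct_b": 0.12,
    "acct_c": 0.08,
    "acct_d": 0.27,
    "acct_e": 0.22,
}


def name_flocks(membership, scores):
    """Map each flock id to the handle of its highest-scoring member."""
    best = {}
    for account, flock_id in membership.items():
        current = best.get(flock_id)
        if current is None or scores[account] > scores[current]:
            best[flock_id] = account
    return best


flock_names = name_flocks(membership, pagerank_score)
```

With these made-up inputs, flock 0 would be named after `acct_a` and flock 1 after `acct_d`.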

The top 2 flocks are reviewed below from a selection point of view. The same methodology can be applied to other flocks.

Another advantage of this is the ability to backtest and identify which flock provides the most value.

Flock: ‘zerohedge’

The top trending terms (Fig 10) for this flock are forex heavy including ‘usdjpy’.

Figure 10: Top Trending Terms for flock ‘zerohedge’

The top hashtags and RR topics (Fig 11) confirm the relevance of this flock to our subject. One of the top hashtags for this flock, #BOJ, has direct relevance to ‘usdjpy’.

Figure 11: Top Hashtags, RR Topics & Users for flock ‘zerohedge’

Zerohedge (@zerohedge), V (@Vtradez), Brian Watt (@RealBrianWatt), David Brady CFA (@GlobalProTrader) and Dale Pinkert (@ForexStopHunter) form the top 5 users for this flock.

Flock: ‘Bamabroker’

The top trending terms, hashtags and RR topics show the relevance of the flock, with a healthy mix of precious metals (#gold, ‘precious metals’) and stock markets charting (#onechart, stock markets, stocks, charting etc.) thrown in.

Figure 12: Top Trending Terms for flock ‘Bamabroker’

Fig 13 shows the top accounts/users for this flock.

Figure 13: Top RR Topics & Hashtags for flock ‘Bamabroker’

This flock should be considered for ‘usdjpy’ and may need filtering to isolate precious metals and charting specific conversations.

Conclusions

With respect to our hypothesis, relevance and verifiability have been deeply tested using the Right Relevance platform for over 3 years. Injecting relevant topics and influencers’ graphs from the core technology platform, and cross-checking every account selection by Betweenness Centrality and Flocks, provides another layer of confirmation.
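One simple way to picture combining and cross-checking the three account sets is with plain set operations, as in the Python sketch below. The account names are hypothetical, and the "at least two methods agree" threshold is an assumption made for this example, not a rule stated in the report.

```python
# Hypothetical account sets from the three identification methods.
topic_influencers = {"acct_a", "acct_b", "acct_c"}
network_connectors = {"acct_b", "acct_c", "acct_d"}
flock_leaders = {"acct_c", "acct_e"}

methods = [topic_influencers, network_connectors, flock_leaders]

# Superset: every account surfaced by at least one method.
monitor_all = set().union(*methods)

# Cross-check: count how many methods agree on each account.
votes = {acct: sum(acct in method for method in methods) for acct in monitor_all}

# Assumed threshold: keep accounts confirmed by two or more methods.
confirmed = {acct for acct, count in votes.items() if count >= 2}
```

The superset favors coverage, while the confirmed subset favors confidence; which one to monitor depends on the specific need.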

With respect to scale, using Twitter sampled data as our testbed, we’re looking at a potential initial scale of ~300M accounts and ~1B tweets. Our relevant sets are reduced to the 5-10K range for users and fewer than 5K tweets/day for related terms. Even assuming simple algorithms reduce the original scale by an order of magnitude, we’re looking at a further reduction of three orders of magnitude in accounts and tweets, while providing relevance, verifiability and engagement-based trust. As shown in the US election analysis, we can further mitigate bot impact (some mitigation is ingrained in our initial cutoffs). We can also dynamically update these sets on a daily, weekly or monthly basis as required.
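The account-side arithmetic can be checked with a short Python calculation. It uses the ~300M figure and the upper end of the 5-10K range from the text, and takes the "one order of magnitude from simple algorithms" assumption literally as a division by 10.

```python
import math

initial_accounts = 300_000_000  # ~300M accounts (from the report)
relevant_accounts = 10_000      # upper end of the 5-10K range (from the report)

# Assumption from the text: simple algorithms already cut the scale tenfold.
after_simple_filter = initial_accounts / 10

# Reductions expressed in orders of magnitude (base-10 logarithm of the ratio).
total_reduction = math.log10(initial_accounts / relevant_accounts)
further_reduction = math.log10(after_simple_filter / relevant_accounts)

print(f"total: {total_reduction:.2f}, beyond simple filtering: {further_reduction:.2f}")
```

At the upper end of the range this works out to roughly 4.5 orders of magnitude in total, of which about 3.5 remain after the assumed tenfold cut from simple filtering.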

Our contention is that this would lead to:

  1. Reduction in noise due to much higher relevance
  2. Higher degree of trust and verifiability in data, leading to fewer errors, especially catastrophic ones
  3. Substantial reduction in cost for data and processing
  4. Decisive latency edge, since both simple and complex models can rapidly churn through far less data and produce results before others
  5. Enabling more complex models that can do deeper analysis, as noise is massively reduced and the signal-to-noise ratio in the data is much higher

Next, we’re building feeds for ‘usdjpy’ currency using the sets of accounts above and providing access to our users for trial purposes.

Please contact biz@rightrelevance.com for more details.
