Canada should lead in the battle of intelligence platforms

I have been a VP at Ficstar, a leading data extraction platform, for nearly two years, and in that time I have seen the need for better platforms for using the data we gather. When I started here, we were simply pulling data from competitor websites and feeding it directly to our clients, mostly as raw files delivered over FTP.

My first task when I got there was to make sure that we weren’t simply sending our data away to clients who were often not as sophisticated in data techniques as my staff and I. We decided that the delivery of such “intelligence” would be best done via an API, and that giving clients constant access to up-to-date (and historical) data would make a great deal of sense.

Changing the way web scraping was done

This is when we built out the specifications for a RESTful API that clients could authenticate against (via OAuth2) and use to query our data stores (where we encrypted all the data at rest). Clients get back results from the most up-to-date extractions or, if they have paid for the service, can query back in time for specifics about the past.
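As a rough illustration, a client-side call against such an API might look like the sketch below. The endpoint names, parameters, and payload shape here are my own assumptions for the example, not our actual API:

```python
import requests

# Hypothetical endpoints -- illustrative only.
TOKEN_URL = "https://api.example.com/oauth2/token"
DATA_URL = "https://api.example.com/v1/extractions"

# OAuth2 client-credentials flow: exchange client id/secret for a bearer token.
token_resp = requests.post(TOKEN_URL, data={
    "grant_type": "client_credentials",
    "client_id": "my-client-id",
    "client_secret": "my-client-secret",
})
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# Query the most recent extractions; a paid plan might also allow
# historical queries via a date-range parameter like "since".
resp = requests.get(
    DATA_URL,
    headers={"Authorization": f"Bearer {access_token}"},
    params={"site": "competitor.example", "since": "2016-01-01"},
)
resp.raise_for_status()
for record in resp.json()["results"]:
    print(record)
```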

Layering on insight

Now that we had this notion of a database holding data in perpetuity, and were able to charge a good amount for the regular extractions we were making for clients, I thought we should become a consumer of that data ourselves and present clients with further insight into it. At first, this journey took us to business insight that was very specific to the spare-parts industries of the world – businesses such as car parts or electronic parts (for consumer goods).

Going one step further

The next step beyond the diagnostic value of this data, which we have nailed down quite nicely, is to create an internal tool that will allow us to go beyond the Business Intelligence / dashboard view of the world for these clients and layer on predictive capability – hopefully reaching a product that is prescriptive and can help operators really understand what should be done.

[Figure: Georgian Partners’ descriptive → predictive → prescriptive analytics maturity model]

The holy grail

With all this data in house for so many clients, we have the opportunity to get really smart with it and start doing things our clients don’t yet know how to do. This should lead to very large profit margins on our main product lines, since we can then charge for analysis and prediction done on the data they already have.

In order to power this, I have been tinkering with a concept very akin to what Microsoft has built on its Azure cloud service – Azure ML Studio.

I really like the way Microsoft has gone about creating this platform; however, they just happen to be on the wrong cloud for us to use. Its process-flow tree (or graph) structure is perfect not only for layering on data cleansing (which is always required in machine learning) but also for doing the entire job: splitting the data set(s) into training, validation, and test data, training the model (albeit, I have not tested how fast it does this), and helping you evaluate different models.

All of this is done in a GUI, which makes life so much easier.
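Outside of a GUI, that same end-to-end flow can be sketched in a few lines of Python. Here is a minimal sketch using scikit-learn; the file, column names, and model choice are purely illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# Illustrative data set -- file and column names are hypothetical.
df = pd.read_csv("extractions.csv").dropna(subset=["price"])  # basic cleansing

X = df[["competitor_price", "stock_level", "days_since_update"]]
y = df["price"]

# Split into training, validation, and test sets (60/20/20).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

# Train one candidate model and evaluate it; in practice you would
# compare several models on the validation set before touching the test set.
model = GradientBoostingRegressor().fit(X_train, y_train)
print("validation MAE:", mean_absolute_error(y_val, model.predict(X_val)))
print("test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```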

Cloud vs on-premise

The biggest issue I see with this is that I want it to be portable and to have a marketplace aspect, so that my staff can reuse the models we will eventually create and share specific ones with each other. A federated marketplace, much like what Amazon itself does for its EC2 images, would be the perfect paradigm for this (and for me).


Having this notion of a market for models will open the platform up not only to my staff but to anyone we may, at some point in the future, look to sell it to. It would allow us to serve specific models and hypotheses to our clients and have them submit models back to us – either for us to validate (because they may lack the expertise) or simply to host on their behalf and serve back when they or their team(s) are logged in.
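As a back-of-the-napkin sketch of what a marketplace entry might look like – everything here, from the class names to the sharing rules, is my own assumption rather than an existing product:

```python
import joblib
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ModelListing:
    """A hypothetical marketplace entry: a trained model plus metadata."""
    name: str
    owner: str              # staff member or client who published it
    description: str
    model_path: str         # serialized model artifact on shared storage
    shared_with: list = field(default_factory=list)  # client/staff IDs
    published_at: datetime = field(default_factory=datetime.utcnow)

def publish(model, listing: ModelListing) -> ModelListing:
    # Serialize the trained model so others can load and reuse it.
    joblib.dump(model, listing.model_path)
    return listing

def load(listing: ModelListing, requester: str):
    # Enforce the sharing rules before serving the model back.
    if requester != listing.owner and requester not in listing.shared_with:
        raise PermissionError(f"{requester} may not access {listing.name}")
    return joblib.load(listing.model_path)
```

The key design choice is that access control lives with the listing itself, so the same mechanism works whether a model is shared between two staff members or served back to a client’s team.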

Deploying such a platform

Lately at the Apache and O’Reilly conferences that I’ve attended, I’ve noticed a slight shift away from Apache Mesos, which is a great orchestration framework for deploying applications, to a new contender (well, new to me at this point) called Kubernetes.

From what I have seen (in presentations) and read, this is a Google development and seems very promising: it uses Docker as its underlying image format and does a lot of the same things I had grown accustomed to with Solaris Zones. This is, of course, not true virtualization, as Docker containers run as processes on the underlying operating system rather than on a hypervisor, but with today’s available compute this should not be a problem for us.
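To get a feel for the API, here is a minimal sketch using the official kubernetes Python client to inspect what is running in a cluster – assuming a working kubeconfig on the machine, with the namespace purely illustrative:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (e.g. ~/.kube/config).
config.load_kube_config()

v1 = client.CoreV1Api()

# List the pods in a namespace -- each pod wraps one or more Docker containers.
pods = v1.list_namespaced_pod(namespace="default")
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)
```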


Conclusion

I could very well see building a product in the very near future that contains a lot of these building blocks and is ready for real time by layering on replicated queues such as Apache Kafka, with a web-ready frontend written in AngularJS or ReactJS. With proper logon to our platform, I could segregate clients to specific sets of nodes (in Kubernetes terminology) and make sure we control how they access GPUs to do the best job of training these models.
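For the real-time layer, publishing extraction events with the kafka-python client might look like the sketch below; the broker address, topic name, and payload shape are all assumptions for illustration:

```python
import json
from kafka import KafkaProducer

# Connect to the Kafka cluster; replication is configured on the topic itself.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish each new extraction as an event; downstream consumers
# (dashboards, model trainers) can then react in real time.
producer.send("extractions", {"site": "competitor.example", "sku": "AB-123", "price": 19.99})
producer.flush()
```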

Seeing as no one outside the very big cloud providers is doing this, we could then choose to keep it for ourselves or decide whether or not it would be good to sell.

Keep your fingers crossed that I can get this approved in the coming months and can fund it – I’d like to be your “as a service” provider for all things data, pipeline and machine learning in 2017!
