InfoSum's first product touts decentralized big data insights

From TechCrunch - March 9, 2018

Nick Halstead's new startup, InfoSum, is launching its first product today, moving one step closer to his founding vision of a data platform that can help businesses and organizations unlock insights from big data silos without compromising user privacy, data security or data protection law. So a pretty high bar then.

If the underlying tech lives up to the promises being made for it, the timing for this business looks very good indeed, with the European Union's new General Data Protection Regulation (GDPR) mere months away from applying across the region, ushering in a new regime of eye-wateringly large penalties to incentivize data handling best practice.

InfoSum bills its approach to collaboration around personal data as fully GDPR compliant, because it says it doesn't rely on sharing the actual raw data with any third parties.

Rather, a mathematical model is used to make a statistical comparison, and the platform delivers aggregated (but still, says Halstead, useful) insights. Though he says the regulatory angle is fortuitous, rather than the full inspiration for the product.

"Two years ago, I saw that the world definitely needed a different way to think about working on knowledge about people," he tells TechCrunch. "Both for privacy [reasons] (there isn't a week where we don't see some kind of data breach; they happen all the time) but also privacy isn't enough by itself. There has to be a commercial reason to change things."

The commercial imperative he reckons he's spied is around how unmanageable big data can become when it's pooled for collaborative purposes.

Datasets invariably need a lot of cleaning up to make different databases align and overlap. And the process of cleaning and structuring data so it can be usefully compared can run to multiple weeks. Yet that effort has to be put in before you really know if it will be worth your while doing so.

That snag of time + effort is a major barrier preventing even large companies from doing more interesting things with their data holdings, argues Halstead.

So InfoSum's first product, called Link, is intended to give businesses a glimpse of "the art of the possible", as he puts it, in just a couple of hours, rather than the "nine, ten weeks" he says it might otherwise take them.

"I set myself a challenge: could I get through the barriers that companies have around privacy, security, and the commercial risks when they handle consumer data. And, more importantly, when they need to work with third parties or need to work across their corporation where they've got numbers of consumer data and they want to be able to look at that data and look at the combined knowledge across those."

"That's really where I came up with this idea of non-movement of data. And that's the core principle of what's behind InfoSum: I can connect knowledge across two data sets, as if they've been pooled."

Halstead says that the problem with the traditional data pooling route (so copying and sharing raw data with all sorts of partners, or even internally, thereby expanding the risk vector surface area) is that it's risky. The myriad data breaches that regularly make headlines nowadays are a testament to that.

But that's not the only commercial consideration in play, as he points out that raw data which has been shared is immediately less valuable, because it can't be sold again.

"If I give you a data set in its raw form, I can't sell that to you again: you can take it away, you can slice it and dice it as many ways as you want. You won't need to come back to me for another three or four years for that same data," he argues. "From a commercial point of view [what we're doing] makes the data more valuable. In that data is never actually having to be handed over to the other party."

Not blockchain for privacy

Decentralization, as a technology approach, is also of course having a major moment right now, thanks to blockchain hype. But InfoSum is definitely not blockchain. Which is a good thing. No sensible person should be trying to put personal data on a blockchain.

"The reality is that all the companies that say they're doing blockchain for privacy aren't using blockchain for the privacy part, they're just using it for a trust model, or recording the transactions that occur," says Halstead, discussing why blockchain is terrible for privacy.

"Because you can't use the blockchain and say it's GDPR compliant or privacy safe. Because the whole transparency part of it and the fact that it's immutable. You can't have an immutable database where you can't then delete users from it. It just doesn't work."

Instead he describes InfoSum's technology as "blockchain-esque", "because everyone stays holding their data. The trust is then that because everyone holds their data, no one needs to give their data to everyone else. But you can still crucially, through our technology, combine the knowledge across those different data sets."

So what exactly is InfoSum doing to the raw personal data to make it privacy safe? Halstead claims it goes beyond hashing or encrypting it. "Our solution goes beyond that: there is no way to re-identify any of our data because it's not ever represented in that way," he says, further claiming: "It is absolutely 100 per cent data isolation, and we are the only company doing this in this way."

"There are solutions out there where traditional models are pooling it but with encryption on top of it. But again if the encryption gets broken the data is still ending up being in a single silo."

InfoSum's approach is based on mathematically modeling users, using a "one way model", and using that to make statistical comparisons and serve up aggregated insights.

"You can't read things out of it, you can only test things against it," he says of how it's transforming the data. "So it's only useful if you actually knew who those users were beforehand, which obviously you're not going to. And you wouldn't be able to do that unless you had access to our underlying code-base. Everyone else either uses encryption or hashing or a combination of both of those."

This one-way modeling technique is in the process of being patented, so Halstead says he can't discuss the "fine details", but he does mention a long-standing technique for optimizing database communications, called Bloom filters, saying those sorts of "principles" underpin InfoSum's approach.

Although he also says it's using those kind of techniques differently. Here's how InfoSum's website describes this process (which it calls Quantum):

InfoSum Quantum irreversibly anonymises data and creates a mathematical model that enables isolated datasets to be statistically compared. Identities are matched at an individual level and results are collated at an aggregate level, without bringing the datasets together.
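For readers unfamiliar with the Bloom filters Halstead mentions, here is a minimal Python sketch of the general idea. It is not InfoSum's Quantum, whose details are not public; the class name, bit-array size, number of hashes and the use of SHA-256 are all assumptions made for illustration. It shows the property he describes: you can test a known value against the structure, but you cannot read the stored values back out of it.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: supports adding and membership tests, not enumeration."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits      # assumed size, chosen for illustration
        self.num_hashes = num_hashes  # assumed hash count, chosen for illustration
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means probably present.
        return all(self.bits[pos] for pos in self._positions(item))


customers = BloomFilter()
for email in ["alice@example.com", "bob@example.com"]:
    customers.add(email)

# Items that were added always test positive.
print(customers.might_contain("alice@example.com"))
```

The filter stores only set bits, so there is no list of emails to extract. A value that was never added will usually test negative, though false positives are possible by design.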

On the surface, the approach shares a similar structure to Facebook's Custom Audiences product, where advertisers' customer lists are locally hashed and then uploaded to Facebook for matching against its own list of hashed customer IDs, with any matches used to create a custom audience for ad targeting purposes.
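The hash-and-match pattern described above can be sketched roughly as follows. This is an illustration of the general technique only, not Facebook's actual pipeline: the function names are hypothetical, and the choice of SHA-256 and of lowercasing and trimming emails before hashing are assumptions.

```python
import hashlib


def normalize_and_hash(email: str) -> str:
    """Lowercase and trim an email, then return its SHA-256 hex digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def match_hashed_lists(advertiser_emails, platform_emails):
    """Return the set of hashes that appear on both sides."""
    advertiser_hashes = {normalize_and_hash(e) for e in advertiser_emails}
    platform_hashes = {normalize_and_hash(e) for e in platform_emails}
    return advertiser_hashes & platform_hashes


# Hypothetical sample data: only one customer appears in both lists.
advertiser = ["Alice@Example.com ", "carol@example.com"]
platform = ["alice@example.com", "bob@example.com"]

matches = match_hashed_lists(advertiser, platform)
print(len(matches))  # 1
```

Both sides must normalize identically before hashing, otherwise the same customer produces different digests and the match is missed.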

Though Halstead argues InfoSum's platform offers more for even this kind of audience building marketing scenario, because its users can use "much more valuable knowledge" to model on: knowledge they would not comfortably share with Facebook because of the commercial risks of handing over that "first person" valuable data.

"For instance if you had an attribute that defined which were your most valuable customers, you would be very unlikely to share that valuable knowledge, yet if you could safely then it would be one of the most potent indicators to model upon," he suggests.

He also argues that InfoSum users will be able to achieve greater marketing insights via collaborations with other users of the platform vs being a customer of Facebook Custom Audiences, because Facebook simply does not "open up its knowledge".

"You send them your customer lists, but they don't then let you have the data they have," he adds. "InfoSum for many DMPs [data management platforms] will allow them to collaborate with customers so the whole purchasing of marketing can be much more transparent."

He also emphasizes that marketing is just one of the use-cases InfoSum's platform can address.

Decentralized bunkers of data

One important clarification: InfoSum customers' data does get moved, but it's moved into a private isolated bunker of their choosing, rather than being uploaded to a third party.

"The easiest one to use is where we basically create you a 100 per cent isolated instance in Amazon [Web Services]," says Halstead. "We've worked with Amazon on this so that we've used a whole number of techniques so that once we create this for you, you put your data into it; we don't have access to it. And when you connect it to the other part we use this data modeling so that no data then moves between them."

"The bunker is an isolated instance," he adds, elaborating on how communications with these bunkers are secured. "It has its own firewall, a private VPN, and of course uses standard SSL security. And once you have finished normalising the data it is turned into a form in which all PII [personally identifiable information] is deleted."

"And of course like any other security related company we have had independent security companies penetration test our solution and look at our architecture design."

Other key pieces of InfoSum's technology are around data integration and identity mapping, aimed at tackling the (inevitable) problem of data in different databases/datasets being stored in different formats. Which again is one of the commercial reasons why big data silos often stay just that: silos.

Halstead gave TechCrunch a demo showing how the platform ingests and connects data, with users able to use simple steps to teach the system what is meant by data types stored in different formats (such as that "f" means the same as "female" for gender category purposes) to smooth the data mapping and try to get it as clean as possible.
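The kind of value mapping shown in the demo can be illustrated with a short Python sketch. This is not InfoSum's implementation; the mapping table, function name and the "unknown" fallback are hypothetical choices made for illustration.

```python
# Hypothetical mapping a user might "teach" the system for a gender category.
GENDER_SYNONYMS = {
    "f": "female",
    "female": "female",
    "m": "male",
    "male": "male",
}


def normalize_gender(raw: str) -> str:
    """Map a raw gender value to a canonical label, or 'unknown' if unmapped."""
    return GENDER_SYNONYMS.get(raw.strip().lower(), "unknown")


# Two datasets that store the same category in different formats.
dataset_a = ["f", "M", "female"]
dataset_b = ["Female", "male", "n/a"]

normalized_a = [normalize_gender(value) for value in dataset_a]
normalized_b = [normalize_gender(value) for value in dataset_b]

print(normalized_a)  # ['female', 'male', 'female']
print(normalized_b)  # ['female', 'male', 'unknown']
```

Once both datasets use the same canonical labels, their categories can be compared directly.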

Once that step has been completed, the user (or collaborating users) are able to get a view on how well linked their data sets are, and thus to glimpse "the start of the art of the possible".

In practice this means they can choose to run different reports atop their linked datasets, such as if they want to enrich their data holdings by linking their own users across different products to gain new insights, such as for internal research purposes.

Or, where there are two InfoSum users linking different data sets, they could use it for propensity modeling or lookalike modeling of customers, says Halstead. So, for example, a company could link models of their users with models of the users of a third party that holds richer data on its users to identify potential new customer types to target marketing at.

"Because I've asked to look at the overlap I can literally say I only know the gender of these people but I would also like to know what their income is," he says, fleshing out another possible usage scenario. "You can't drill into this, you can't do really deep analytics; that's what we'll be launching later. But Link allows you to get this idea of what would it look like if I combine our datasets."

"The key here is it's opening up a whole load of industries where sensitivity around doing this, and where, even in industries that share a lot of data already but where GDPR is going to be a massive barrier to it in the future."

Halstead says he expects big demand from the marketing industry, which is of course having to scramble to rework its processes to ensure they don't fall foul of GDPR.

Engineering around privacy risks?

