When building a threat intelligence team you will face a range of challenges and problems. One of the most significant is how best to take on the ever-growing supply of threat intel. It might sound like a luxurious problem to have: the more intel the better! But if you take a closer look at what the available threat intelligence supply looks like, or rather, the way it is packaged, the problem becomes apparent.

Ideally, you would want to take this ever-growing field of threat intelligence supply and converge on a central data model – specifically, STIX (Structured Threat Information eXpression). STIX is an open standard language, supported by the OASIS open standards body, designed to represent structured information about cyber threats.

This isn't a solo effort, so first the intelligence team needs to align properly with the open standards bodies. I was thrilled to deliver our theories around STIX data modeling to the OASIS and FIRST communities at the Borderless Cyber Conference in Prague in 2017. (The slides from that talk are available for download here.) Our team took this to the next level as we started to include not just standard data structures in our work but standardized libraries, including MITRE's ATT&CK (Adversarial Tactics, Techniques & Common Knowledge) framework, which now forms a core part of our TTP (and, to some extent, threat actor) mapping across our knowledge base. We couldn't have done it without the awesome folk at OASIS and MITRE. Those communities are still our cultural home.

So far, so good… but largely academic. The one thing I always say to teams starting their CTI journeys is: "Deploy your theory to practice ASAP – because it will change." CTI suppliers know this all too well. In the months since our threat intel team formed, we have faced the challenge of merging these supplier sources into a centralized knowledge base. We're currently up to 38 unique source organizations (with 50+ unique feeds across those suppliers), around a third of them top-flight commercial suppliers. And, of course, even in this age of STIX and MISP, we still see the full spectrum of implementations from those suppliers.

Don't get me wrong – universal STIX adoption is a utopia (this is my version of 'memento mori' that I should get my team to say to me every time I go on my evangelism sprees), and we should not expect all suppliers to 'conform' in some totalitarian way. But here is my question to you: who designs your data model? I would love to meet them.

Now here's the thing: if you're anything like my boss, you probably don't care how the data model is implemented – so long as the customer can get the data fields they need from your feed, what does it matter? REST + JSON everywhere, right? But the future doesn't look like that. If the STIX standard teaches one thing better than most other structured languages, it is the importance of decentralization. I should be able to use the STIX model to build intelligence in one location and have it be semantically equivalent (though not necessarily identical) to the intelligence built by a different analyst in another location. The two outputs should be logically similar – recognizably so, by some form of automated interpretation that doesn't require polymorphism or a cryptomining rig to calculate – but different enough to capture the unique artistry of the analysts who created them.
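To make that concrete, here is a minimal sketch of the "semantically equivalent, not identical" idea, assuming the stix2 Python library (pip install stix2). The indicator contents, the dummy hash value and the naive matching function are all invented for illustration – nothing here is prescribed by the STIX standard:

```python
# A minimal sketch: two analysts describe the same threat independently,
# producing distinct STIX 2.1 objects that are still machine-comparable.
# The matching logic is deliberately naive, purely for illustration.
import re

from stix2 import Indicator

analyst_a = Indicator(
    name="Dropper seen in phishing wave",
    description="Spear-phishing against finance teams.",
    pattern="[file:hashes.'SHA-256' = 'aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f']",
    pattern_type="stix",
)
analyst_b = Indicator(
    name="Malicious payload hash",
    description="Delivered via a macro-enabled attachment.",
    pattern="[file:hashes.'SHA-256' = 'aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f']",
    pattern_type="stix",
)

HASH_RE = re.compile(r"hashes\.'SHA-256'\s*=\s*'([0-9a-f]{64})'")

def same_file_indicator(left, right):
    """Crude semantic check: do both indicators point at the same SHA-256?"""
    l, r = HASH_RE.search(left.pattern), HASH_RE.search(right.pattern)
    return bool(l and r) and l.group(1) == r.group(1)

print(same_file_indicator(analyst_a, analyst_b))  # True: logically similar
print(analyst_a.id == analyst_b.id)               # False: not the same object
```

A toy check like this only looks at one property; recent versions of the stix2 library also ship configurable object-similarity scoring, which is worth exploring before rolling your own.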
Those automatically discernible differences are the pinnacle of a shared, structured-intelligence knowledge base: they will keep our data relevant, allow for automated cross-referencing, and take the industry to the next level.

There is a downside, of course. The cost of implementation is the first hurdle – it may mean reengineering a data model, and maybe even complete rebuilds of knowledge repositories. With any luck, it can just be a semantic modelling exercise (similar to what I presented at Borderless Cyber, but instead of STIX 1.2 → STIX 2.1, just STIX 2.1 → STIX 2.1) that you can describe with some simple mapping, preserving continuity with everything you have already built.

But perhaps the biggest elephant in the room is that aligning all suppliers to a common data model leaves everyone open to de-duplication and cross-referencing. As we start to unify our data models, that "super-secret source" that was actually just a repackaging of some low-profile, open source feed is going to get doxed (a rough sketch of how such a cross-feed check might work follows at the end of this post). We think this is a good thing – data quality, uniqueness and provenance will speak for themselves, and those suppliers who vend noise will lose business. This should be an opportunity rather than a threat, and hopefully it will push supplier business models toward providing truly valuable intelligence to customers.

About the author: Chris O'Brien is the Director of Intelligence Operations at EclecticIQ. Prior to his current role, Chris held the post of Deputy Technical Director at NCSC UK, specialising in technical knowledge management to support rapid response to cyber incidents.
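For the curious, here is what the cross-feed check described above might look like once supplier feeds are normalized into a common model. This is a hypothetical sketch – the feed names, indicator values and the 0.5 threshold are all invented for illustration:

```python
# Hypothetical sketch: flag supplier feeds that are mostly repackaged
# copies of one another by measuring indicator overlap. All feed names
# and values are invented; real feeds would hold normalized STIX objects.
from itertools import combinations

feeds = {
    "open-source-feed":    {"hash-aaa", "hash-bbb", "hash-ccc", "hash-ddd"},
    "super-secret-source": {"hash-aaa", "hash-bbb", "hash-ccc"},
    "boutique-vendor":     {"hash-eee", "hash-fff"},
}

def jaccard(a, b):
    """Overlap between two indicator sets: 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

for (name_a, set_a), (name_b, set_b) in combinations(feeds.items(), 2):
    score = jaccard(set_a, set_b)
    if score > 0.5:  # arbitrary threshold, purely for illustration
        print(f"{name_a} vs {name_b}: {score:.0%} overlap - possible repackaging")
```

Run against real, normalized feeds, a check like this makes provenance visible almost for free – which is exactly why unifying data models is as uncomfortable for some suppliers as it is valuable for customers.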