The Role of Data Explosion and ML in the Technological Landscape
We are moving from the age of big data to the age of information overload. Enterprises have built their big data infrastructures, yet they don’t understand how to draw actionable insights from the data they collect. Machine learning (ML) can solve the problem of information overload. However, it succeeds only when an enterprise has the ML infrastructure to clean data, manage features, and deploy models in production. Some large enterprises, like Uber and Facebook, have successfully set up that infrastructure, but most are struggling to implement machine learning in production. I think that’s the real story: what it takes to actually use ML in production.
The challenges exist at every step of the design and development cycle. The first and foremost challenge is organizational: the enterprise needs to identify the problems that can be effectively solved with big data. Moving into the ML and data science development cycle, one sees challenges in the form of data cleaning. An enterprise may have a very high-quality training data set, yet if the production data is not clean, the training data is relatively useless. ML can actually help with data cleaning. For instance, Datalogue, a New York-based startup, uses transfer learning to help enterprises identify their personally identifiable information (PII) and standardize it.
The next challenge is transforming this clean data into model features. Some tech companies have feature catalogs: repositories of features that data scientists use to build new models. However, most companies need a better way to manage their features. For instance, if a data source is no longer available or becomes highly volatile, you need insight into how that affects features and model performance. Because data feeds change constantly, a lot of monitoring is involved.
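To make the feature-management point concrete, here is a minimal sketch of a feature catalog that records which data sources each feature depends on and which models consume each feature. It is an illustration only, not a description of any particular company's system: the class FeatureCatalog, its methods, and all source, feature, and model names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class FeatureCatalog:
    """Hypothetical in-memory catalog; a real one would be a shared service."""

    # feature name -> data sources it is derived from
    feature_sources: Dict[str, Set[str]] = field(default_factory=dict)
    # model name -> features it consumes
    model_features: Dict[str, Set[str]] = field(default_factory=dict)

    def register_feature(self, feature: str, sources: List[str]) -> None:
        self.feature_sources[feature] = set(sources)

    def register_model(self, model: str, features: List[str]) -> None:
        self.model_features[model] = set(features)

    def impact_of(self, source: str) -> Tuple[List[str], List[str]]:
        """Return the features and models that depend on `source`."""
        features = {
            name for name, sources in self.feature_sources.items()
            if source in sources
        }
        models = {
            name for name, used in self.model_features.items()
            if used & features
        }
        return sorted(features), sorted(models)


# Illustrative names only (assumptions, not taken from the article).
catalog = FeatureCatalog()
catalog.register_feature("avg_order_value_30d", ["orders_db"])
catalog.register_feature("clicks_per_session", ["clickstream_feed"])
catalog.register_feature("orders_per_click", ["orders_db", "clickstream_feed"])
catalog.register_model("churn_model", ["avg_order_value_30d", "orders_per_click"])
catalog.register_model("ranking_model", ["clicks_per_session"])

affected_features, affected_models = catalog.impact_of("clickstream_feed")
print("Features at risk:", affected_features)
print("Models at risk:", affected_models)
```

With a mapping like this, a monitoring job that flags a missing or volatile feed can immediately list the features and models to re-check. Measuring the actual effect on model performance would still require separate evaluation.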
ML-powered solutions require a huge investment in people: data analysts, data scientists, ML developers, and data engineers. Instead of staffing everything in-house, enterprises can use tools built by start-ups to make their data teams more productive, so that they do not need to build machine learning pipelines from scratch.
A lot of public companies are crippled by legacy systems that cannot easily adapt. To disrupt these disadvantaged incumbents, AI-enabled businesses that adopt a verticalized, full-stack approach are emerging. For instance, in the past couple of months, one can see companies that are not trying to sell products or SaaS solutions to advertising agencies; they are trying to build better advertising agencies themselves. Instead of selling property-management software, they are building better property managers using machine learning. More and more, we will see ML- and data-enabled companies that are not trying to sell into the Fortune 500 but are actually displacing them.
The early adopters of ML technology include advertising, healthcare, finance, and IT. These industries have high-volume, high-velocity datasets, which may be siloed in different sources. Often, this information is structured or semi-structured, so it’s easier to model.
From an analytical perspective, there are a lot of similarities between start-ups and insurgencies. Both are trying to disrupt the incumbent, and both have limited incentives to disclose information. Start-ups and insurgencies consist of groups of people who operate clandestinely while striving for rapid growth. Hence, the tools and analytical frameworks used to analyze start-ups and insurgencies are similar. In my experience, practices from defense and intelligence translate easily into investing. For instance, writing intelligence briefs is similar to writing investment memos. At Mattermark, not only did I build a sourcing product, I was also given the opportunity to work with multiple investors, understanding their workflow and how they think about data collection, sourcing, and portfolio tracking.
There’s a fight for quality as early-stage investing becomes increasingly competitive. Hence, investors must use data to identify the fastest-growing start-ups before they begin a formal fundraising process. This also means that investors must shift their approach toward outbound prospecting.
Enterprise IT needs to evolve in response to ML because it is a different programming paradigm. Our existing databases, data pipelines, monitoring tools, and crash-management solutions are therefore going to fall short as companies use ML more frequently. However, we can also apply ML to enterprise IT to improve and strengthen it. We can already see AI-driven monitoring, log management, and database solutions that are more adaptive and represent an interesting area for investment.