How DiversIQ Became a Skeptical Optimist About AI

My co-founder Hilary and I have each spent more than 20 years working on different aspects of building data products, tackling roughly the same challenge: discovering, filtering, organizing, and analyzing raw information, then delivering trusted, timely, decision-useful insights that help stakeholders make better-informed decisions. Over the years we have incorporated many technical advancements and tools that streamlined our systems and processes and drove efficiencies, but most of what we do is still very resource intensive and requires humans to understand, extract, and validate the information.

Our goal is to discover new in-scope sources of information as soon as they are published; extract, normalize, and contextualize them; and have the data and insights available in our products within 48 hours. We have a robust discovery engine that combines feeds, website pings, and saved searches with manual checks to ensure we know when there’s something new we care about. This has helped us scale our research coverage from 500 to over 1,000 companies, and we are now on our way to a target of 4,000+ companies in 2024.

Until recently, however, parsing, extracting, and analyzing that information was still very manual. For example:

  • We get an alert (press release, SEC filing, bio change on the corporate site, etc.) that someone joined, left, or changed roles at a company (Board of Directors or executive) and need to:
    • Parse the document to see if it contains information we care about
    • Extract the company name, person name, title, effective date, and check if the person already exists in our system
  • We get an alert that a company has published a new ESG report along with supplementary information (GRI/SASB indices, data sheet) and we need to:
    • Know whether and where those documents contain information we care about
    • Fill out our disclosure/transparency grid with the sources and locations where quantitative data is reported
    • Extract the in-scope quantitative and qualitative information and enter it in templates to allow for analysis and normalization
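The first workflow above (people changes) can be sketched roughly in Python. This is only a minimal illustration, not our production system: the `PersonChange` record, its field names, and the `known_people` lookup are all hypothetical, and the step that actually reads the document is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
import unicodedata


@dataclass(frozen=True)
class PersonChange:
    """One extracted leadership change (hypothetical field names)."""
    company: str
    person: str
    title: str
    effective_date: Optional[date]


def normalize_name(name: str) -> str:
    """Lowercase and strip accents and punctuation so name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch for ch in no_accents if ch.isalnum() or ch.isspace())
    return " ".join(kept.lower().split())


def find_existing(change: PersonChange, known_people: dict[str, int]) -> Optional[int]:
    """Return the stored person id if the normalized name is already known, else None.

    `known_people` maps normalized names to ids; it stands in for a real lookup.
    """
    return known_people.get(normalize_name(change.person))
```

A match on name alone is only a first pass. Two different people can share a name, so a researcher would still confirm the match before merging records.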

We were first pitched AI (and interchangeably ML and NLP) about 10 years ago, and as recently as mid-2020 we took a deep dive into potential solutions. There were some ‘slick’ tools and impressive demos, but we found that nearly all of them over-promised and under-delivered, or required a huge investment ($1 million+) to train, customize, and develop the application before it delivered real value rather than just helping at the margins.

Fast forward to the summer of 2023: everyone’s grandmother was talking about AI at the July 4th barbecue, and Hilary and I figured we could no longer ignore the hype. We asked our former data scientist colleague Mitch (who now has his own consulting firm) to build a simple prototype that indexes our source library and prompts ChatGPT to identify and extract entities.

We went in with low expectations but were blown away by the results. In just a week, v1 was able to parse and extract all relevant information about people changes at a 95% accuracy rate, and with some slight iterations it’s now over 99% accurate.

The next test involved adding all people (board members and executives) at a new company we want to cover, going back ~10 years. Today, this involves gathering (mostly manually) and deduplicating a list of people, their roles and titles, start dates and end dates, etc., from historical proxies, 10-Ks, and 8-Ks. Again, v1 was built in a few days, and Hilary said it saved 90% of the manual effort researchers were spending on building a clean list of people and their relevant information.
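To make the deduplication step concrete, here is a minimal Python sketch. It assumes mentions have already been extracted from filings as hypothetical `(name, title, filing_year)` tuples, and it treats the first and last filing year in which a person appears as a rough proxy for start and end dates, which is a simplification.

```python
from collections import defaultdict


def build_roster(mentions):
    """Collapse (name, title, filing_year) mentions into one row per person and title.

    Returns a dict mapping (normalized name, normalized title) to
    (first_year_seen, last_year_seen).
    """
    years = defaultdict(list)
    for name, title, filing_year in mentions:
        # Normalize case and whitespace so trivial variants land on the same key.
        key = (" ".join(name.lower().split()), " ".join(title.lower().split()))
        years[key].append(filing_year)
    return {key: (min(seen), max(seen)) for key, seen in years.items()}
```

For example, `build_roster([("Jane Doe", "Director", 2015), ("jane  doe", "director ", 2019)])` yields a single entry for Jane Doe spanning 2015 to 2019. Real filings need fuzzier matching (middle initials, nicknames, title changes), which is where researcher review still comes in.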

We are now testing some other proxy-specific tasks. With plans to cover 3,000+ US companies, and over 80% of proxies published between roughly March and June, there is an enormous amount of information to get through in a short amount of time. Results so far are promising for CEO pay ratio data, executive and director compensation, director diversity disclosures, and more qualitative information like board diversity policies and shareholder proposals related to diversity and human capital.

In just a few months, we have completely changed the way we think about scaling our business. We will never change what we do and how we are different:

  • The analyst team still needs to understand the information and add context, which may mean reading a footnote or reaching out directly to the company to explain what the ‘leadership’ level means (it could be the C-Suite at one company and manager and above at another)
  • Researchers still need to validate how individuals self-identify, including using public profile links (Wikipedia, LinkedIn, etc.), sources mentioning background (articles, interviews, podcasts, etc.), membership in organizations (e.g., Latino Corporate Directors Association), voter registration records, and obituaries of relatives

However, we believe that the 80-90% of our time currently spent on finding, parsing, filtering, extracting, and organizing information into a format that can be easily analyzed, validated, and uploaded can be replaced by this new AI-powered research tool. That frees up our time to go both broader (more companies) and deeper (more information like policies, benefits, goals, etc.) on coverage, and especially to derive insights from the sea of information and turn them into actionable intelligence.

What Else Are We Working On?

  • We have received multiple requests from clients and prospects to know whether companies have a Chief Diversity Officer, whether it’s a senior executive position (C-level or reporting to the CEO), and to track changes. We recently completed our first prototype for the S&P 100 and are working on an execution plan to roll this out to our full coverage universe
  • Similarly, people want to know whether companies have certain policies (Board diversity/Rooney Rule, Human Rights, collective bargaining, etc.) and how those policies align with certain frameworks. We have another prototype for the S&P 100 with a policy matrix linked to sources, and are working on both expanding coverage to our full universe and adding more granular coverage of policies
  • Last month we shared another prototype covering the diversity and human capital goals that S&P 100 companies are setting, and we are working on building that out for our entire universe (aided by our new friend AI)

If you want to discuss or get access to any of these prototypes, please reach out!
