Methodology

How the MarComm Hub directory is sourced, structured, enriched, and kept honest.

Sourcing

The directory is generated from public, structured knowledge about the global marketing services industry. The primary source is the Wikidata SPARQL endpoint at https://query.wikidata.org/sparql. We query for entities classified as advertising agencies (wd:Q1287945), marketing agencies (wd:Q2519449), public relations firms (wd:Q4201895), media agencies (wd:Q1130349), digital agencies (wd:Q2992826), and brand consultancies (wd:Q1751889).

For each match we pull the canonical name, English description, country, headquarters city, founding year, and official website where present. The exact source used for the current build is logged in the site's meta.json: Wikidata SPARQL (https://query.wikidata.org/).
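As an illustration of the kind of query described above, the following Python sketch assembles a SPARQL string for the six classes. It is not the seed script's actual query. The function name build_query, the variable names, and the mapping of fields to Wikidata properties (P31 instance of, P17 country, P159 headquarters location, P571 inception, P856 official website) are assumptions made for the example.

```python
# Hypothetical sketch: build a SPARQL query for the classes listed above.
# The Q-identifiers are copied from this page; everything else is assumed.
AGENCY_CLASSES = {
    "Q1287945": "advertising agency",
    "Q2519449": "marketing agency",
    "Q4201895": "public relations firm",
    "Q1130349": "media agency",
    "Q2992826": "digital agency",
    "Q1751889": "brand consultancy",
}


def build_query(class_ids, limit=None):
    """Return a SPARQL query string selecting entities of the given classes."""
    values = " ".join(f"wd:{qid}" for qid in class_ids)
    query = f"""
SELECT ?item ?itemLabel ?itemDescription ?matchedClass
       ?countryLabel ?hqLabel ?inception ?website
WHERE {{
  VALUES ?matchedClass {{ {values} }}
  ?item wdt:P31 ?matchedClass .
  # Optional fields are pulled only "where present".
  OPTIONAL {{ ?item wdt:P17 ?country . }}
  OPTIONAL {{ ?item wdt:P159 ?hq . }}
  OPTIONAL {{ ?item wdt:P571 ?inception . }}
  OPTIONAL {{ ?item wdt:P856 ?website . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}"""
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"
    return query.strip()
```

The resulting string would be sent to the endpoint named above as the query parameter of an HTTP request. Using OPTIONAL for every field except the class match keeps entities that lack, say, a founding year or website.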

Fallback sources

If the primary source returns fewer than 100 entries, the seed script falls back to publicly listed industry directories, including The Drum's agency network listings and the PRCA's PR agency directory, so that the directory has enough breadth to be useful as a starting point.
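The threshold rule can be sketched as below. This is a minimal Python illustration, not the seed script itself. The names choose_entries and load_fallback are hypothetical, and the sketch assumes the fallback replaces the primary result rather than being merged with it, which this page does not specify.

```python
# Hypothetical sketch of the fallback rule: use the primary result only if
# it reaches the stated threshold of 100 entries.
MIN_PRIMARY_ENTRIES = 100


def choose_entries(primary, load_fallback, minimum=MIN_PRIMARY_ENTRIES):
    """Return (entries, source_label) for the build.

    `load_fallback` is a zero-argument callable, so the fallback
    directories are only read when they are actually needed.
    """
    if len(primary) >= minimum:
        return primary, "primary"
    # Assumption: the fallback replaces the primary result entirely.
    return load_fallback(), "fallback"
```

Passing the fallback as a callable rather than a list means a healthy primary result never triggers any fallback fetching.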

Normalisation

Country names are normalised to a single canonical form (so "United States of America" and "United States" become one). Each country is mapped to one of five super-regions: North America, Europe, Asia-Pacific, Latin America, and Middle East & Africa. Each entity is assigned one or more specialization tags, one for each Wikidata class it matched.
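A minimal Python sketch of the country and region steps follows. The lookup tables are deliberately tiny and illustrative. The choice of "United States" as the canonical form, the sample countries, and the function names normalise_country and region_for are assumptions, not the directory's actual tables.

```python
# Hypothetical sketch: canonical country names and super-region lookup.
# Keys are lower-cased so matching ignores case and stray whitespace.
COUNTRY_ALIASES = {
    "united states of america": "United States",
    "united states": "United States",
}

# Illustrative subset only; a real table would cover every country.
REGION_BY_COUNTRY = {
    "United States": "North America",
    "United Kingdom": "Europe",
    "Japan": "Asia-Pacific",
    "Brazil": "Latin America",
    "South Africa": "Middle East & Africa",
}


def normalise_country(raw):
    """Collapse whitespace and map known aliases to one canonical name."""
    cleaned = " ".join(raw.split())
    return COUNTRY_ALIASES.get(cleaned.casefold(), cleaned)


def region_for(country):
    """Return the super-region for a canonical country, or None if unmapped."""
    return REGION_BY_COUNTRY.get(country)
```

Returning None for an unmapped country, instead of guessing a region, makes gaps in the table visible during a build.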

Editorial enrichment

Each agency entry is paired with a templated editorial profile that explains how an integrated engagement with that firm typically runs and where the firm sits in the broader integrated marketing communications (IMC) stack. The templates are explicit about what is structured data and what is interpretation, and they avoid claims the structured data does not support.

What is excluded

The directory excludes entities whose only public classification is a generic "organization" or "company" with no marketing-services tag. It also excludes entries whose canonical name resolves to a Wikidata Q-identifier rather than a human-readable label. These exclusions trade some recall for substantially better precision.
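The two exclusion rules can be expressed as a single predicate. The Python sketch below is an illustration under assumed field names ("name" and "tags"); the function name is_included is hypothetical, and the actual record layout may differ.

```python
import re

# A bare Wikidata identifier such as "Q123456" in place of a readable label.
Q_ID = re.compile(r"Q\d+")


def is_included(entry):
    """Apply the two exclusion rules to one candidate entry."""
    name = (entry.get("name") or "").strip()
    # Rule 1: the canonical name must not be a raw Q-identifier.
    if Q_ID.fullmatch(name):
        return False
    # Rule 2: the entry must carry at least one marketing-services tag.
    return bool(entry.get("tags"))
```

Using fullmatch rather than a prefix test keeps legitimate names that merely begin with "Q".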

Refresh cadence

The directory is regenerated by re-running the seed script (php seed.php). Every regeneration writes a fresh meta.json with a timestamp, source attribution, and counts.
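For readers who want a concrete picture of what such a file might hold, here is a minimal Python sketch. It is not the PHP seed script. The key names (generated_at, source, counts) and the per-region breakdown are assumptions; this page only states that a timestamp, source attribution, and counts are written.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def build_meta(entries, source):
    """Assemble the metadata record for one regeneration (assumed layout)."""
    by_region = Counter(e.get("region") or "Unknown" for e in entries)
    return {
        # UTC timestamp in ISO 8601 form, e.g. 2024-01-01T12:00:00+00:00
        "generated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source": source,
        "counts": {"total": len(entries), "by_region": dict(by_region)},
    }


def write_meta(path, entries, source):
    """Write the metadata record to `path` as JSON and return it."""
    meta = build_meta(entries, source)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(meta, fh, indent=2, ensure_ascii=False)
    return meta
```

Writing the file on every run, rather than only when the data changes, means the timestamp always reflects the most recent regeneration.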

Limitations to keep in mind

  • Wikidata coverage skews toward larger and more historically established firms; many strong independents are missing.
  • Specialization tags are derived from the agency's Wikidata classification, not from a current capability audit.
  • Region assignment is based on country of headquarters and does not reflect global delivery footprints.

Read this methodology as a working description of the current build, not as a permanent specification.