Developing useful ontological data requires categories that are narrow enough to reason about. At Indeed, we help people find jobs, and it’s hard to know who is a good fit for a generic “Engineering” job; it needs to be “Civil Engineering”, “Chemical Engineering”, or the like. Once categories get narrow enough to say something meaningful about, we start to talk about them as though we were reasoning about individuals: “A truck driver needs a commercial driver’s license.” That’s not quite right. More properly, “Each truck driver needs their own current commercial driver’s license for the state in which they reside.” But building a sub-graph for the qualifications of every individual truck driver, nurse, software engineer, etc. is both infeasible and redundant, so we’ve learned to stop before trying to make inferences about individuals. We can get specific enough to make reliable inferences without putting millions of jobs and job seekers in our knowledge graph.
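To make the category-versus-individual distinction concrete, here is a minimal sketch in Python. It assumes requirements are stored once per narrow category and that an individual is only checked against them on demand, never stored in the graph. The names `CATEGORY_REQUIREMENTS`, `JobSeeker`, and `missing_requirements` are hypothetical and are not taken from Indeed's system.

```python
from dataclasses import dataclass, field

# Hypothetical category-level knowledge: one entry per narrow category,
# not one entry per individual job seeker.
CATEGORY_REQUIREMENTS = {
    "Truck Driver": {"Commercial Driver's License"},
    # A category that is too broad has nothing reliable to say.
    "Engineering": set(),
}


@dataclass
class JobSeeker:
    """An individual, evaluated on demand and never added to the graph."""
    name: str
    credentials: set = field(default_factory=set)


def missing_requirements(category: str, seeker: JobSeeker) -> set:
    """Return category-level requirements the seeker does not yet hold."""
    required = CATEGORY_REQUIREMENTS.get(category, set())
    return required - seeker.credentials


licensed = JobSeeker("A", {"Commercial Driver's License"})
unlicensed = JobSeeker("B")
print(missing_requirements("Truck Driver", licensed))
print(missing_requirements("Truck Driver", unlicensed))
```

The statement "a truck driver needs a commercial driver's license" lives in one place, and the per-person reading of it ("each truck driver needs their own") is recovered at query time by comparing an individual against the category.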
Creating sensitivity to context is a related challenge. Various ways of categorizing and annotating jobs won’t have the same relevance in every context. If we see that a job seeker has a lot of building trades skills, we should really focus on jobs in the building trades sector, but accounting skills might indicate interest in jobs across almost any sector, because accountants are needed everywhere. Each job has many facets, and knowing which facets are relevant in a given context is critical to providing efficient and nuanced reasoning.
Our system distributes knowledge across a graph that includes taxonomies of SKOS concepts, which serve as what we might call “first-order” types, and an ontology that includes both customized relations that support inferences based on those concepts and RDFS classes that serve as “meta-types”, supporting reasoning based on contexts and use cases.
This knowledge graph allows us to reason about individual documents on demand and apply sophisticated metadata at scale. In building it, we’ve learned a lot about when to tolerate ambiguous boundaries between types, or even between types and individuals. Documents like job postings, résumés, and company profiles are written by humans in natural language, and reasoning about them cannot be perfect. To make matters worse, we have to operate at scale and, to a great extent, in real time. Leveraging distinct layers of type analysis, and having a sense of humor, are our best tools for maintaining the performance and precision of our scaled metadata application.
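The layering of first-order types and meta-types can be sketched roughly as follows. This is an illustration in plain Python dictionaries rather than an RDF store, and it assumes a `skos:broader`-style hierarchy plus two hypothetical meta-type labels, `SectorSpecificSkill` and `CrossSectorSkill`. None of these concept names, labels, or functions come from the actual knowledge graph.

```python
# First-order types: skill concepts linked by skos:broader-style edges
# (illustrative data only).
BROADER = {
    "Carpentry": "Building Trades Skills",
    "Drywall Installation": "Building Trades Skills",
    "Bookkeeping": "Accounting Skills",
}

# Meta-types: class-like labels attached to concepts that say how the
# concept should be used in a given context (hypothetical labels).
META_TYPE = {
    "Building Trades Skills": "SectorSpecificSkill",
    "Accounting Skills": "CrossSectorSkill",
}

# Hypothetical custom relation from a skill family to a job sector.
SECTOR_OF = {"Building Trades Skills": "Building Trades"}


def ancestors(concept):
    """Yield the concept itself, then each broader concept above it."""
    while concept is not None:
        yield concept
        concept = BROADER.get(concept)


def sector_focus(skills):
    """Return sectors to focus on; an empty set means no restriction."""
    sectors = set()
    for skill in skills:
        for concept in ancestors(skill):
            is_specific = META_TYPE.get(concept) == "SectorSpecificSkill"
            if is_specific and concept in SECTOR_OF:
                sectors.add(SECTOR_OF[concept])
    return sectors


print(sector_focus(["Carpentry", "Drywall Installation"]))
print(sector_focus(["Bookkeeping"]))
```

In this sketch the taxonomy answers "what kind of skill is this?", while the meta-type answers "does this kind of skill narrow the search?". Building trades skills restrict the sector, and accounting skills leave it open.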