Knowledge Base

Overview
Think of words in a query as textual symbols of information with a surplus of potential meanings. The invisible space that connects these words to form a query is where user intent lies. Our proprietary Knowledge Base stores and classifies the numerous meanings of words, applies machine learning to identify the connections between them, and draws on contextual data from across the web to return the most relevant answers.

While some queries are straightforward, others are ambiguous and open to a variety of interpretations. The ability to weigh these interpretations against a database that functions in a way that mirrors the human brain allows users to submit a query as the words form in their thoughts, independent of how they arrange those words in a textual phrase.

From the cognitive perspective, the KB provides an efficient approximation to general, domain-specific and fact-based human knowledge. Concepts are organized in layers of abstraction ranging from individuals to top-level categories and are interlinked via meaningfully labeled links such as “author of”, “capital of” and “has part”. These relations form a structured world model, the core of which is shared across multiple cultures and languages, making the language-independent core of the KB easy to maintain and reuse for multilingual applications.
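
As a rough illustration, this layered, labeled structure can be sketched as a small graph. The concept names and the Python representation below are our own simplification, not NTENT's internal format.

    # Minimal sketch (hypothetical concept IDs): concepts as nodes, with labeled,
    # directed edges for both "is a" abstraction links and semantic relations.
    ontology = {
        "Paris":              {"is_a": ["City"], "capital_of": ["France"]},
        "City":               {"is_a": ["GeopoliticalEntity"]},
        "GeopoliticalEntity": {"is_a": ["Entity"]},   # "Entity" is the top-level category
        "MarkTwain":          {"is_a": ["Person"], "author_of": ["TomSawyer"]},
    }

    def ancestors(concept):
        """Walk "is_a" edges from an individual up toward its top-level category."""
        out = []
        frontier = [concept]
        while frontier:
            node = frontier.pop()
            for parent in ontology.get(node, {}).get("is_a", []):
                out.append(parent)
                frontier.append(parent)
        return out

    print(ancestors("Paris"))   # ['City', 'GeopoliticalEntity', 'Entity']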

From the semantic perspective, the KB serves as a lingua franca for our Natural Language Understanding system, capable of expressing the meaning of any natural language input, whether queries or documents, in terms of ontological concepts. These semantic representations serve as structured, language-independent statements that are easily accessible to experts or any other question-answering and search components.
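
To make that concrete, here is a hedged sketch of what such a language-independent statement might look like; the Statement class and the relation names are hypothetical, not NTENT's actual schema.

    # Minimal sketch: equivalent queries in different languages map to the same
    # structured, language-independent statement over ontological concepts.
    from dataclasses import dataclass

    @dataclass
    class Statement:
        subject: str    # an ontological concept or a variable, not a word
        relation: str   # a labeled link from the ontology
        obj: str

    # "capital of France" / "capitale de la France" / "столица Франции"
    # would all yield the same representation:
    meaning = Statement(subject="?x", relation="capital_of", obj="France")
    print(meaning)   # Statement(subject='?x', relation='capital_of', obj='France')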

From the behavioral perspective, the KB provides the conceptual “scaffolding” necessary to support a gamut of practical reasoning tasks, from learning about a specific recipe to locating nearby tourist attractions, comparing currency values or car prices, answering trivia questions, or finding redundancies and contradictions in extracted facts.

Additionally, from the text processing perspective, the KB supports a specialized disambiguation module called the “ontological proximity” module. This component draws heavily on a well-known computational method called “spreading activation” and attempts to resolve lexical ambiguity by favoring concepts discovered to be proximate in the underlying KB.
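
The general idea behind spreading activation can be shown with a toy example. The graph, decay schedule and scoring below are hypothetical and greatly simplified; they stand in for the production module only to show how activation spreading outward from context concepts favors the nearest sense.

    # Simplified illustration of spreading activation over a toy concept graph.
    # Each candidate sense of an ambiguous word receives activation that spreads
    # (and decays) from the unambiguous context concepts; the sense accumulating
    # the most activation wins.
    graph = {
        "bond_finance":  {"interest_rate", "stock_market"},
        "bond_chemical": {"molecule", "electron"},
        "stock_market":  {"interest_rate"},
        "molecule":      {"electron"},
    }

    def neighbors(c):
        # Treat edges as undirected for proximity purposes.
        out = set(graph.get(c, set()))
        out |= {k for k, v in graph.items() if c in v}
        return out

    def activation(sense, context, decay=0.5, steps=2):
        """Spread activation from each context concept and sum what reaches `sense`."""
        score = 0.0
        for seed in context:
            frontier, weight = {seed}, 1.0
            for _ in range(steps):
                weight *= decay
                frontier = set().union(*(neighbors(c) for c in frontier)) if frontier else set()
                if sense in frontier:
                    score += weight
        return score

    context = {"interest_rate", "stock_market"}
    best = max(["bond_finance", "bond_chemical"], key=lambda s: activation(s, context))
    print(best)   # bond_finance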

Three Fundamental Layers
 – Ontology (language-independent graph)
 – Lexicon (language-specific list of expressions associated with concepts)
 – Onomasticon (list of named entities, such as persons, places and organizations)

Ontology
NTENT’s ontology is general-purpose, meaning it is not restricted to a particular domain. Branching happens at the top level, where entities, events, attributes and relations fan out into extensive information classification across multiple domains such as sports, automotive, geography, medicine and more.

Our large-scale ontology includes millions of concepts, relations and expressions. We refer to the core ontological units as “synsets” and “relations”. Synsets represent groups of concepts considered semantically synonymous for the purpose of collecting data. Relations represent connections between concepts via links, predicates and properties. Mathematically, concepts are modeled as nodes in a graph with labeled, directed edges.
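
A minimal sketch of these two core units, using hypothetical identifiers of our own rather than NTENT's internal IDs:

    # A synset groups synonymous expressions under one concept node; relations
    # are labeled, directed edges between concept nodes.
    synsets = {
        "c:automobile": {"car", "automobile", "motorcar"},
        "c:company":    {"company", "firm", "corporation"},
        "c:Toyota":     {"Toyota"},
    }

    # (source concept, relation label, target concept)
    relations = [
        ("c:Toyota", "instance_of",  "c:company"),
        ("c:Toyota", "manufactures", "c:automobile"),
    ]

    def outgoing(concept):
        """All labeled, directed edges leaving a concept node."""
        return [(label, target) for source, label, target in relations if source == concept]

    print(outgoing("c:Toyota"))
    # [('instance_of', 'c:company'), ('manufactures', 'c:automobile')]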

We capture knowledge using Semantic Web technologies such as the Resource Description Framework (RDF), its query language SPARQL, and the Web Ontology Language (OWL) to construct the knowledge base for various domains.
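
By way of illustration, the open-source rdflib library can store facts as RDF triples and retrieve them with SPARQL. The snippet below is a sketch under that assumption; the library choice and the namespace are ours, not a description of NTENT's actual tooling.

    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/kb/")   # hypothetical namespace
    g = Graph()
    g.add((EX.Paris,  EX.capital_of, EX.France))
    g.add((EX.Berlin, EX.capital_of, EX.Germany))

    # SPARQL: which concept is the capital of France?
    results = g.query(
        "SELECT ?city WHERE { ?city ex:capital_of ex:France }",
        initNs={"ex": EX},
    )
    for row in results:
        print(row.city)   # http://example.org/kb/Paris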

Language Independent: Our ontology represents knowledge independently of how languages express it. Our method argues against representing language-specific properties in the ontology because, while they are relevant in text processing, they are orthogonal to knowledge and semantics. For example, inflectional changes that alter the grammatical function of a word, derivational morphology that builds new words from existing ones, syntactic properties that govern the structure of sentences, and the phonetic patterns each language applies to words all operate independently of the collected meanings and the relationships between these words. Including them would undermine the purpose of creating a uniform ontology that can be separately tailored to suit any language.

Similar to the multitude of neural pathways in the human brain, our ontology is represented visually as a vast, interconnected graph of concepts composed of Individuals, Classes, Attributes and Relations. We use multiple types of relations to link concepts. For example, the fact “Microsoft is a company” connects to the facts that Microsoft also has a CEO, develops software and is headquartered in Redmond, Washington.

Language-Specific Lexicon
Language-specific lexicons map concepts to the different words that express them in each language, such as “shoe” in English, “chaussure” in French and “ботинок” in Russian. They also contain compositional rules that capture the potentially infinite combinations of concepts encoded in specific languages. For example, a single rule can represent thousands of possible ways to detect the relationship between two words that connect a nationality to a type of person, like “German athlete”, “Italian lawyer”, “Japanese writer” and “French actor.”
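
A minimal sketch of both ideas, assuming hypothetical concept IDs and a deliberately simplified rule format:

    # One concept, many language-specific expressions; plus a single
    # compositional rule covering the whole nationality + profession pattern.
    lexicon = {
        "c:shoe":    {"en": ["shoe"], "fr": ["chaussure"], "ru": ["ботинок"]},
        "c:German":  {"en": ["German"]},
        "c:Italian": {"en": ["Italian"]},
        "c:athlete": {"en": ["athlete"]},
        "c:lawyer":  {"en": ["lawyer"]},
    }

    nationalities = {"c:German", "c:Italian"}
    professions   = {"c:athlete", "c:lawyer"}

    def nationality_rule(first_concept, second_concept):
        """One rule matches every 'German athlete' / 'Italian lawyer' style pair."""
        if first_concept in nationalities and second_concept in professions:
            return ("has_nationality", second_concept, first_concept)
        return None

    print(nationality_rule("c:German", "c:athlete"))
    # ('has_nationality', 'c:athlete', 'c:German')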

Our sophisticated lexicon is able to automatically detect polysemy within a chain of text. Polysemy refers to a word, phrase or symbol that has more than one meaning, where each potential meaning is drawn from the computed relationships within the network of words themselves. For example, the word “bond” could mean a financial instrument, a type of glue, a character from a spy movie or a chemical phenomenon. By anchoring the word “bond” to all available and appropriate concepts within the ontology, we enable the system to access the various interpretations of that word immediately and directly during query analysis or document indexing.
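
Sketched with hypothetical concept IDs, that anchoring amounts to keeping every candidate sense reachable from the surface form, with the choice among them deferred to disambiguation (for example, an ontological-proximity approach like the one sketched earlier):

    # Anchor one surface form to every appropriate concept so that all readings
    # are immediately available at query time or indexing time.
    word_to_concepts = {
        "bond": [
            "c:bond_financial_instrument",
            "c:bond_adhesive",
            "c:JamesBond_character",
            "c:bond_chemical",
        ],
    }

    def candidate_senses(token):
        """Return every concept a token may denote; disambiguation happens later."""
        return word_to_concepts.get(token.lower(), [])

    print(candidate_senses("Bond"))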

Onomasticon
An onomasticon is a list or collection of proper names, or of specialized terms such as those used in a particular field or subject. The internal onomasticon is made up of imports from third-party sources. The platform leverages data from external repositories (both open and closed), giving us deep insight into specific topics that we do not have to maintain ourselves. These include sources like Wikipedia for information on famous people, events and organizations, or gazetteers for answers on geographic locations like mountains, rivers, cities and more.
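
A minimal sketch of what imported onomasticon entries might look like; the entity records, field names and source labels below are hypothetical:

    # Proper names resolved to entity records imported from external sources
    # such as Wikipedia or a gazetteer.
    onomasticon = {
        "Mount Everest":  {"type": "mountain",     "source": "gazetteer"},
        "Danube":         {"type": "river",        "source": "gazetteer"},
        "Marie Curie":    {"type": "person",       "source": "Wikipedia"},
        "United Nations": {"type": "organization", "source": "Wikipedia"},
    }

    def lookup(name):
        """Resolve a proper name to its imported entity record, if any."""
        return onomasticon.get(name)

    print(lookup("Danube"))   # {'type': 'river', 'source': 'gazetteer'}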

Conclusion
Our brains are wired to store knowledge, reason on it and put it to use. For example, we learn how a car works, the rules of the road and how to read traffic signs, and we put that knowledge into practice behind the wheel for the purpose of transportation. Our technology works the same way: it collects data, weighs the various possibilities of intent connecting that data, and offers the most likely answer to a query. In a query that contains the word ‘target,’ our technology knows ‘target’ can be a type of bullseye, a stated goal or a department store, and then chooses among them depending on the relationship to the rest of the query. NTENT uses its Knowledge Base to humanize queries through an extensive Ontology, Language-Specific Lexicons and Open Source Knowledge Repositories.