Query Understanding

Overview
Words are often ambiguous and can be interpreted in many ways, even by humans. NTENT uses Semantic Ranking, Natural Language Understanding and Machine Learning to establish complex, ontological relationships between words and their intended meaning. Consider the query “bond”. Query understanding could interpret it as a financial instrument, the movie character, a chemical bond or a term of endearment (see diagram below).

Query understanding’s primary objective is to understand the intention behind the query. This involves four steps: first, predicting the language used to express the query; second, parsing the query according to that language; third, extracting the entities and concepts mentioned in the query; and finally, based on all this information, predicting one or more possible intentions, each with a certain probability. These probabilities are particularly important for ambiguous queries. The result is one of the inputs of the semantic ranking module.
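
The four steps above can be sketched as a small pipeline. This is only an illustrative sketch, not NTENT's implementation: the lexicon, the sense names, the prior weights and every function name below are hypothetical, and the language detection and parsing steps are stubs.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> candidate senses with made-up prior weights.
TOY_SENSES = {
    "bond": [
        ("finance", 4.0),
        ("film", 3.0),
        ("chemistry", 2.0),
        ("endearment", 1.0),
    ],
}


@dataclass
class Interpretation:
    sense: str
    probability: float


def detect_language(query):
    """Step 1 (stub): a real system would predict the language; we assume English."""
    return "en"


def parse(query, language):
    """Step 2 (stub): lowercase and split on whitespace, ignoring the language."""
    return query.lower().split()


def extract_senses(tokens):
    """Step 3: collect the candidate senses of every token found in the toy lexicon."""
    candidates = []
    for token in tokens:
        candidates.extend(TOY_SENSES.get(token, []))
    return candidates


def predict_intents(candidates):
    """Step 4: turn the weights into probabilities, most likely interpretation first."""
    total = sum(weight for _, weight in candidates)
    if total == 0:
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [Interpretation(sense, weight / total) for sense, weight in ranked]


def understand(query):
    """Run the four steps in order and return the ranked interpretations."""
    language = detect_language(query)
    tokens = parse(query, language)
    return predict_intents(extract_senses(tokens))


if __name__ == "__main__":
    for interpretation in understand("bond"):
        print(f"{interpretation.sense}: {interpretation.probability:.2f}")
```

Returning every interpretation with its probability, rather than only the most likely one, is what lets a downstream ranking module handle ambiguous queries.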

Semantic Ranking
Semantic ranking refers to ranking search results using semantic information. In a standard search engine, a rank is computed by using signals or features coming from the search query, from the documents in the collection being searched and from the search context, such as the language and device being used. Semantic ranking adds semantic features derived from knowledge base concepts that appear in the query and semantically match documents in the collection. To do this efficiently, all documents are preprocessed semantically to build an index that includes semantic information.

To accomplish semantic ranking, we use machine learning in several stages. The first stage selects the data sources that should be used to answer the query. In the second stage, each data source generates a set of answers using “learning to rank,” a particular instance of machine learning applied to ranking. The third and final stage ranks these data sources, selecting and ordering the intentions as well as the answers inside each intention (e.g., news) that will appear in the final composite answer. All these techniques are language independent, but may use language-dependent features.

As an example of the concepts above, consider the query “capital of France.” A standard search engine returns the best documents that contain this phrase, hoping that the answer is included there, which is not always the case. When we couple query understanding with semantic ranking, the search system recognizes “France” as a country and “capital” as an attribute of a country, infers that the answer is Paris, and then ranks documents that also include the word “Paris” higher.
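
The “capital of France” example can be illustrated with a toy scorer that adds a semantic boost on top of plain keyword overlap. This is a minimal sketch under stated assumptions: the one-entry knowledge base, the boost value and all function names are hypothetical, and a hand-written additive score stands in for the learned ranking models described above.

```python
import re

# Hypothetical knowledge base: (entity, attribute) -> value.
TOY_KB = {("france", "capital"): "paris"}


def terms(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def resolve_answer(query):
    """Return the knowledge base value if the query names an entity and one of its attributes."""
    query_terms = terms(query)
    for (entity, attribute), value in TOY_KB.items():
        if entity in query_terms and attribute in query_terms:
            return value
    return None


def score(query, document, answer_boost=2.0):
    """Keyword overlap plus a boost when the document mentions the inferred answer."""
    doc_terms = terms(document)
    keyword_score = len(terms(query) & doc_terms)
    answer = resolve_answer(query)
    semantic_score = answer_boost if answer is not None and answer in doc_terms else 0.0
    return keyword_score + semantic_score


def rank(query, documents):
    """Order documents from highest to lowest combined score."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)


if __name__ == "__main__":
    docs = [
        "The capital of France has many museums.",
        "Paris is the capital and largest city of France.",
        "France exports wine.",
    ]
    for doc in rank("capital of France", docs):
        print(score("capital of France", doc), doc)
```

With keyword overlap alone, the first two documents tie. The boost breaks the tie in favor of the document that actually contains the inferred answer.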

Levels of Query Understanding
NTENT’s Search platform coordinates the interpretation of individual query constituents and the delivery of relevant answers through a combination of Language Detection, Linguistic Processing, Semantic Processing and Pragmatic Processing.

  1. Language Detection: The first step is to identify which language the user is using. Sometimes this is obvious, but several things can make it hard. For example, many users are forced to use a keyboard that makes accented characters difficult to type, so they “ascify” their query (replace accented characters with plain ASCII ones). Users also sometimes use multiple languages within a single query (“code switching”) or proper names that are the same in many languages.
  2. Linguistic Processing: Every language has its own rules for how text should be broken down into individual words (“tokenized”), whether distinctions of case and accent are significant, how to normalize words to a base form (“lemmatization” or “stemming”), and categorization of words and phrases by parts of speech (“POS tagging”).
  3. Semantic Processing: A traditional keyword search engine would stop after linguistic processing, but NTENT’s technology goes further, and determines what the user’s words actually mean. Many words have multiple meanings (“homonyms”), and many concepts have multiple ways to express them (“synonyms”). Drawing on many sources of information, such as a large-scale ontology, notability data, and the user’s context (e.g., location), we are able to determine all the possible interpretations of the user’s query, and assign a probability to each one. By distinguishing a particular sense of a word, and by knowing which phrases denote a single concept, we are able to improve the precision of our applications. At the same time, by recognizing that multiple expressions refer to the same concept, and also that broad terms encompass narrower ones (“hyponymy”), we are able to improve recall. Furthermore, NTENT is able to analyze the syntax of how multiple words are combined into composite concepts.
  4. Intent Detection (Pragmatic Processing): NTENT goes beyond just the surface semantics of the user’s utterance, and develops hypotheses about why they typed what they did: what their information need is; what transactions they intend to perform; what website they’re looking for; or what local facilities they’re trying to find. This inductive reasoning is key to harnessing NTENT’s extensive set of experts to give the user what they want.
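
Two of the ideas in the list above, tolerating “ascified” input and using synonyms and hyponyms to improve recall, can be sketched in a few lines. This is a hedged illustration rather than NTENT's method: the accent folding relies on Unicode decomposition from Python's standard library, and the tiny synonym and hyponym tables, like the function names, are hypothetical stand-ins for a large-scale ontology.

```python
import re
import unicodedata


def fold_accents(text):
    """Drop combining marks so accented and "ascified" spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Fold accents, ignore case, and split into word tokens."""
    return re.findall(r"\w+", fold_accents(text).casefold())


# Hypothetical mini-ontology: surface forms -> concept, and concept -> narrower concepts.
SYNONYMS = {"car": "car", "auto": "car", "automobile": "car", "sedan": "sedan"}
HYPONYMS = {"car": {"sedan", "hatchback"}}


def expand(tokens):
    """Map tokens to concepts, then add narrower concepts to improve recall."""
    concepts = set()
    for token in tokens:
        concept = SYNONYMS.get(token)
        if concept is not None:
            concepts.add(concept)
            concepts |= HYPONYMS.get(concept, set())
    return concepts


if __name__ == "__main__":
    print(tokenize("Café Automobile"))
    print(sorted(expand(tokenize("Café Automobile"))))
```

Mapping “automobile” to the concept “car” lets a query match documents that use a different word for the same thing, while adding “sedan” and “hatchback” lets a broad query also match documents about narrower concepts.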