Solr architecture
Solr is a search engine built on top of Apache Lucene that provides full-text search and indexing capabilities. It is used to implement fast and scalable search applications intended to work with large sets of structured and semi-structured data. Solr provides advanced search features like faceting, filtering, relevance ranking, results highlighting, and many others.
A few examples of Solr applicability:
- enterprise search applications;
- log and event search;
- e-commerce search with faceting and aggregation.
Features
The major Solr features are as follows:
- High-performance search. Using Lucene, Solr provides high-performance full-text search on large volumes of data.
- Powerful search capabilities. Solr has many built-in features like rich query syntax, complex filters, relevance ranking, results highlighting, spell checking, etc. It also provides language-specific capabilities like stemming, synonym matching, stop words exclusion, and so on.
- Faceting and filtering. With built-in faceted search, Solr can aggregate search results by a field, range, or any custom criteria. This produces more informative result sets, which is not only convenient for users but also extremely helpful for analytics.
- Scaling. In ADH, Solr scales horizontally by adding more Solr Server components via ADCM. By default, Solr runs in the high availability mode with the built-in replication mechanism enabled. Automatic failover and leader election are implemented via ZooKeeper.
- Flexible schema management. Solr supports both schema-based and schemaless approaches to data indexing. A well-designed schema with proper field types leads to a more compact index and significantly reduces search request time. For medium-sized datasets, where performance is not an issue, Solr can generate a schema automatically and ingest raw data without any predefined structure.
- Integration. Solr exposes a REST API, which is the primary communication channel for submitting indexing and search requests. Tools like Solr Cell, Apache Tika, and SolrJ allow programmatic interaction with a Solr core. In ADH, Solr is configured to work with HDFS, allowing you to run search operations over your HDFS storage.
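Because the REST API accepts all parameters in the URL query string, clients need to encode them correctly. The sketch below shows one way to build a Solr query URL with Python's standard library; the host and collection names are placeholders taken from the examples in this document, not fixed values:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; host and collection names are placeholders.
SOLR_BASE = "http://ka-adh-1.ru-central1.internal:8983/solr"
COLLECTION = "test_collection"

def build_query_url(base, collection, params):
    """Construct a Solr /query URL with properly encoded parameters."""
    return f"{base}/{collection}/query?{urlencode(params)}"

url = build_query_url(SOLR_BASE, COLLECTION, {
    "q": "acc_id:1001",  # main query expression
    "fq": "txn_date:[2026-01-01T00:00:00Z TO 2026-01-31T23:59:59Z]",  # filter query
    "facet": "true",
    "facet.field": "acc_id",
})
print(url)
```

`urlencode` takes care of the characters that must not appear raw in a URL (the `:` in `acc_id:1001`, the spaces and brackets in the range filter), which is exactly what the percent-escapes in the curl examples below do by hand.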
Components
Below is a high-level diagram of the Solr architecture.
The major Solr concepts and components are as follows:
- Document. The basic unit of information in Solr, a set of data that describes something, for example, an e-shop product card with fields like name, price, description, and so on.
- Field. A document may have one or more fields, which carry specific information, for example, a product’s price. Fields can store data of various types — numeric, text, binary, and so on. Defining an appropriate data type in a schema allows Solr to perform search queries faster.
- Index. Solr stores all data in a Lucene index. Lucene uses an inverted index as the data structure for storing documents. An inverted index is similar to how words are indexed at the end of a book: for each distinct word, there is a list of pages where this word is mentioned. Storing data this way makes indexing slightly more costly, but it allows for lightning-fast lookups. Adding a document to the index is called indexing. Once indexed, a document becomes searchable.
- Collection. One or more documents grouped together in a single logical index using the same configuration and schema.
- Shard. A logical partition of a collection that holds a subset of the collection’s documents. Splitting a collection into shards allows the index to be distributed across several nodes.
- Replica. A physical copy of a shard. Replicas enhance fault tolerance by providing additional copies of the data and improve scalability by providing additional capacity for searching.
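The inverted index described above can be sketched in a few lines of Python. This is a simplified illustration of the idea only, not Lucene's actual on-disk format:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each distinct term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "first transaction",
    2: "second transaction completed",
}
index = build_inverted_index(docs)
# Lookup is a single dictionary access, regardless of corpus size.
print(index["transaction"])  # → [1, 2]
```

Building the index requires a pass over every term of every document (the extra indexing cost mentioned above), but answering "which documents contain this word?" afterwards is a single lookup.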
Execution workflow
Below are step-by-step descriptions of how Solr processes indexing and search requests.
Indexing request
- A client submits an indexing request to a Solr Server component. For example:

$ curl -X POST 'http://ka-adh-3.ru-central1.internal:8983/solr/test_collection/update?commit=true' \
    -H 'Content-Type: application/json' \
    --data @transactions.json

The transactions.json document:

[
    {
        "id": 1,
        "txn_id": "1",
        "acc_id": 1001,
        "txn_value": 75.0,
        "txn_date": "2026-01-02",
        "comment": "The first transaction."
    }
]

- The receiving Solr Server component is called a coordinator. The coordinator parses the request payload, determines the target collection, and generates document IDs if they are not supplied.
- The coordinator contacts ZooKeeper to get cluster state information, including the list of live Solr nodes. Solr applies a routing function (hashing) to determine the shards to which the documents from the payload should be sent.
- Once the shards are identified, Solr requests ZooKeeper for the current leader replica of every shard. If the leader is unknown, ZooKeeper triggers a leader election. At this step, Solr already knows to which shard each document should be sent.
- Solr forwards documents to the shards on different ADH hosts. When a document reaches its target shard, it goes through the UpdateRequestProcessorChain, where it is validated, matched against the schema, and assigned a version.
- Before modifying the Lucene index, the leader replica writes updates to the transaction log (tlog). From this moment, the update is considered durable, meaning that it can be replayed even if Solr crashes.
- The leader replica forwards updates to all active replicas of the shard. Replicas update their tlogs and acknowledge the receipt of the new documents.
- Once the leader replica has logged the update and the required number of non-leader replicas have done the same, the coordinator responds to the client with 200 OK. However, the document is not yet searchable, and no writes have been made to the Lucene index so far (a so-called near-real-time update).
- Right after responding to the client, Solr adds the documents to a Lucene in-memory buffer.
- The last step is the commit operation, which flushes Lucene segments to disk, making the submitted documents searchable. When a commit is triggered is defined by Solr commit settings.
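The routing step above, hashing a document ID to pick a target shard, can be sketched as follows. This is a simplified illustration of the idea, not Solr's actual compositeId router; the shard names are placeholders:

```python
import hashlib

# Hypothetical shard list for a three-shard collection.
SHARDS = ["shard1", "shard2", "shard3"]

def route(doc_id, shards):
    """Pick a target shard by hashing the document ID into the shard list.

    A stable hash (here MD5) guarantees that the same ID always maps
    to the same shard, so updates and lookups find the same partition.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

# The same ID is always routed to the same shard.
assert route("1", SHARDS) == route("1", SHARDS)
print(route("1", SHARDS))
```

The key property is determinism: any coordinator node, given only the document ID and the cluster state from ZooKeeper, computes the same target shard without any central lookup table.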
Search request
- A client submits a search request with filtering and faceting parameters to a Solr Server component. For example:

$ curl -X GET 'http://ka-adh-1.ru-central1.internal:8983/solr/test_collection/query?q=acc_id:1001&q.op=OR&indent=true&fq=txn_date:%5B2026-01-01T00:00:00Z%20TO%202026-01-31T23:59:59Z%5D&facet=true&facet.field=acc_id'

- The receiving Solr Server component is called a query coordinator. The coordinator parses the request parameters and loads the schema and field types.
- The coordinator requests the Solr cluster state from ZooKeeper to identify the target shards and available replicas.
- Solr sends subqueries to replicas to run them in parallel. The following actions take place on each replica:
  - Every filter expression (fq) is processed, and the results are cached.
  - The main query expression is executed.
- Solr performs faceting. For this, it iterates over the cached results and computes interim facet counts. Then the coordinator receives the shard responses, merges the results, and sums up the per-shard facet values to get the final counts.
- The coordinator fetches matching documents from the Lucene index. It requests the fields of the required documents by ID and constructs the final response, for example:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":28,
    "params":{
      "q":"acc_id:1001",
      "facet.field":"acc_id",
      "indent":"true",
      "q.op":"OR",
      "fq":"txn_date:[2026-01-01T00:00:00Z TO 2026-01-31T23:59:59Z]",
      "facet":"true",
      "_":"1768832698829"}},
  "response":{"numFound":1,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[
      {
        "id":"1",
        "txn_id":[1],
        "acc_id":[1001],
        "txn_value":[75.0],
        "txn_date":["2026-01-02T00:00:00Z"],
        "comment":["The first transaction."],
        "_version_":1854745967899705344}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "acc_id":[
        "1001",1]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}
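The facet merge performed by the query coordinator, summing per-shard counts into the final values, can be sketched as follows. This is a simplified illustration with plain dictionaries, not Solr's internal data structures; the shard counts are hypothetical:

```python
from collections import Counter

def merge_facets(shard_responses):
    """Sum per-shard facet counts for one field into the final result,
    the way a query coordinator combines partial shard answers."""
    total = Counter()
    for counts in shard_responses:
        total.update(counts)  # Counter.update adds counts, it does not replace them
    return dict(total)

# Hypothetical per-shard facet counts for the acc_id field.
shard1 = {"1001": 1, "1002": 3}
shard2 = {"1001": 2}
print(merge_facets([shard1, shard2]))  # → {'1001': 3, '1002': 3}
```

Because every shard computes its counts independently over its own subset of the documents, a plain per-bucket sum is all the coordinator needs to produce the `facet_counts` section shown in the response above.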