Top Elasticsearch frequently asked interview questions | JavaInUse




In this post we will look at Elasticsearch interview questions. Examples are provided with explanations.


Q: What is Elasticsearch ?
A:
Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.

Q: What are the basic operations you can perform on a document ?
A:
The following operations can be performed on documents:
a. Indexing a document using Elasticsearch.
b. Fetching documents using Elasticsearch.
c. Updating documents using Elasticsearch.
d. Deleting documents using Elasticsearch.
For a walkthrough, see: Perform basic operations with Elasticsearch.
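As a rough illustration, the four operations map onto REST calls. The sketch below only builds the request descriptions (method, path, body) instead of sending them, so it runs without a cluster. It assumes the typeless `_doc` and `_update` endpoints of Elasticsearch 7 and later; the index name `products` and the document contents are made up for the example.

```python
import json

def index_request(index, doc_id, document):
    """Index (create or replace) a document: PUT /<index>/_doc/<id>."""
    return ("PUT", f"/{index}/_doc/{doc_id}", json.dumps(document))

def get_request(index, doc_id):
    """Fetch a document by id: GET /<index>/_doc/<id>."""
    return ("GET", f"/{index}/_doc/{doc_id}", None)

def update_request(index, doc_id, partial_document):
    """Partially update a document: POST /<index>/_update/<id>."""
    return ("POST", f"/{index}/_update/{doc_id}", json.dumps({"doc": partial_document}))

def delete_request(index, doc_id):
    """Delete a document by id: DELETE /<index>/_doc/<id>."""
    return ("DELETE", f"/{index}/_doc/{doc_id}", None)

# Hypothetical index name and document, used only for illustration.
requests_to_send = [
    index_request("products", 1, {"name": "laptop", "price": 900}),
    get_request("products", 1),
    update_request("products", 1, {"price": 850}),
    delete_request("products", 1),
]

for method, path, body in requests_to_send:
    print(method, path, body or "")
```

Each tuple could be handed to any HTTP client pointed at a running node.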

Q: What is inverted index in Elasticsearch ?
A:
An inverted index is the heart of a search engine. The primary goal of a search engine is to provide fast searches while finding the documents in which the search terms occur. An inverted index is a hashmap-like data structure that directs users from a word to the documents or web pages that contain it, which makes quick searches possible across millions of documents.
The index at the back of a book works the same way: based on a word, we can find the pages on which the word appears.
Consider the following statements:
  • javainuse is a good website
  • javainuse is one of the good websites.
For indexing purposes the above text is tokenized into separate terms, and all the unique terms are stored in the index together with information such as which documents each term appears in and the term's position in each document.
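So the inverted index for the document text (numbering the two statements as documents 1 and 2) will be as follows:
    Term       => Documents
    javainuse  => 1, 2
    is         => 1, 2
    a          => 1
    good       => 1, 2
    website    => 1
    one        => 2
    of         => 2
    the        => 2
    websites   => 2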
When you search for the terms website OR websites, the query is executed against the inverted index: the terms are looked up, and the documents in which they appear are quickly identified.
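The idea can be sketched in a few lines of Python. This is a simplified stand-in, not how Lucene stores its index: the tokenizer here just lowercases the text and keeps runs of letters and digits, and the function names are invented for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index

def search_any(index, terms):
    """Return the sorted ids of documents containing at least one of the terms (OR)."""
    hits = set()
    for term in terms:
        hits.update(index.get(term, {}))
    return sorted(hits)

docs = {
    1: "javainuse is a good website",
    2: "javainuse is one of the good websites.",
}
index = build_inverted_index(docs)

print(index["good"])                                # {1: [3], 2: [5]}
print(search_any(index, ["website", "websites"]))   # [1, 2]
```

The lookup never scans the document text; it only reads the entries for the requested terms, which is why the search stays fast as the number of documents grows.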

Q: What is a cluster in Elasticsearch ?
Q: What is a node in Elasticsearch ?
Q: What is an index in Elasticsearch ?
Q: What is a document in Elasticsearch ?
Q: What is a type in Elasticsearch ?
A:
Please refer to: Understanding Elasticsearch Cluster, Node, Index and Document using example.

  • Cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is "elasticsearch". This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
  • Node is a single server that is part of the cluster. It stores the data and participates in the cluster's indexing and search capabilities.
  • Index is like a ‘database’ in a relational database. It has a mapping which defines multiple types. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.
    MySQL => Databases
    ElasticSearch => Indices
  • Document is similar to a row in a relational database. The difference is that each document in an index can have a different structure (fields), but common fields should have the same data type.
    MySQL => Databases => Tables => Columns/Rows
    ElasticSearch => Indices => Types => Documents with Properties
  • Type is a logical category/partition of an index whose semantics are completely up to the user.
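
The rule that documents may differ in structure but should agree on the data type of common fields can be checked with a small sketch. Python types stand in for Elasticsearch field types here, and the function name and sample documents are invented for the example.

```python
def field_type_conflicts(documents):
    """Return {field: set of type names} for fields whose type differs between documents."""
    first_seen = {}
    conflicts = {}
    for doc in documents:
        for field, value in doc.items():
            type_name = type(value).__name__
            # Remember the first type seen for each field.
            expected = first_seen.setdefault(field, type_name)
            if expected != type_name:
                conflicts.setdefault(field, {expected}).add(type_name)
    return conflicts

docs = [
    {"title": "ES basics", "views": 10},
    {"title": "Shards", "author": "sam"},      # different fields: fine
    {"title": "Replicas", "views": "many"},    # same field, different type: conflict
]

print(field_type_conflicts(docs))
```

Only `views` is reported, because it is the only field that appears with two different types.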

Q: Does Elasticsearch have a schema ?
A:
Yes, Elasticsearch can have a schema. A schema is a description of one or more fields that describes the document type and how to handle the different fields of a document. The schema in Elasticsearch is a mapping that describes the fields in the JSON documents along with their data types, as well as how they should be indexed in the underlying Lucene indexes. Because of this, in Elasticsearch terms, we usually call this schema a “mapping”.
Elasticsearch has the ability to be schema-less, which means that documents can be indexed without explicitly providing a schema. If you do not specify a mapping, Elasticsearch will by default generate one dynamically when detecting new fields in documents during indexing.
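A rough sketch of what dynamic mapping does is given here. It follows the default rules for booleans, integers, floating-point numbers, strings and nested objects, but it deliberately ignores date detection, arrays and null values, so treat it as an approximation. The function names are invented for the example.

```python
def infer_field_mapping(value):
    """Guess a field mapping from a JSON value, loosely following the dynamic mapping defaults."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become analyzed text with an exact-match keyword sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": {k: infer_field_mapping(v) for k, v in value.items()}}
    raise TypeError(f"unsupported value in this sketch: {value!r}")

def infer_mapping(document):
    """Build a mapping for a whole document."""
    return {"properties": {k: infer_field_mapping(v) for k, v in document.items()}}

doc = {"name": "javainuse", "visits": 42, "rating": 4.5, "active": True,
       "owner": {"city": "Pune"}}
print(infer_mapping(doc))
```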

Q: What is a shard in Elasticsearch ?
A:
In most environments, each node runs on a separate box or virtual machine.
  • index – In Elasticsearch, an index is a collection of documents.
  • shard – Because Elasticsearch is a distributed search engine, an index is usually split into elements known as shards that are distributed across multiple nodes.
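How a document ends up on a particular shard can be sketched as a hash of its routing value (by default the document id) taken modulo the number of primary shards. This is a simplified form of the routing rule: Elasticsearch uses a Murmur3 hash internally, while the sketch substitutes `zlib.crc32` as a stand-in, so the shard numbers it prints will not match a real cluster.

```python
import zlib

def pick_shard(routing_value, number_of_primary_shards):
    """Choose a primary shard as hash(routing) % number_of_primary_shards.

    crc32 is only a stand-in for the hash function Elasticsearch really uses.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards

# Hypothetical document ids spread over an index with 3 primary shards.
for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the shard is derived from the number of primary shards, that number cannot simply be changed on an existing index without moving documents around.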


Q: What is a replica in Elasticsearch ?
A:
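An index is broken into shards in order to distribute them and scale. Replicas are copies of those shards: each primary shard can have zero or more replica shards, which provide redundancy if a node fails and can also serve search requests. A node is a running instance of Elasticsearch which belongs to a cluster. A cluster consists of one or more nodes which share the same cluster name.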
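A small sketch of how the shard and replica counts combine. `number_of_shards` and `number_of_replicas` are the index settings that control them; the helper functions and the chosen numbers are only illustrative.

```python
def index_settings(primary_shards, replicas_per_primary):
    """Build the settings part of a create-index request body."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }

def total_shard_copies(body):
    """Primaries plus all their replicas: primaries * (1 + replicas)."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])

body = index_settings(primary_shards=3, replicas_per_primary=1)
print(body)
print(total_shard_copies(body))  # 6: three primaries and three replicas
```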


Q: What is an Analyzer in Elasticsearch ?
A:
While indexing data in Elasticsearch, the data is transformed internally by the Analyzer defined for the index.
Analyzers are composed of a single Tokenizer and zero or more TokenFilters. The tokenizer may be preceded by one or more CharFilters. The analysis module allows you to register Analyzers under logical names which can then be referenced either in mapping definitions or in certain APIs.
Elasticsearch comes with a number of prebuilt analyzers which are ready to use. Alternatively, you can combine the built-in character filters, tokenizers and token filters to create custom analyzers.
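As an illustration, a custom analyzer could be declared in the index settings roughly like this. The body is built as a Python dictionary and printed as JSON. The analyzer name `my_custom_analyzer` is made up; `html_strip`, `standard`, `lowercase` and `stop` are built-in components.

```python
import json

# Settings for a create-index request: one char filter, one tokenizer, two token filters.
analysis_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {          # hypothetical logical name
                    "type": "custom",
                    "char_filter": ["html_strip"],   # runs before the tokenizer
                    "tokenizer": "standard",         # exactly one tokenizer
                    "filter": ["lowercase", "stop"]  # token filters, applied in order
                }
            }
        }
    }
}

print(json.dumps(analysis_settings, indent=2))
```

The logical name can then be referenced from a field's mapping through its `analyzer` attribute.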

Q: What is a Tokenizer in Elasticsearch ?
A:
Tokenizers are used to break a string down into a stream of terms or tokens. A simple tokenizer might split the string up into terms wherever it encounters whitespace or punctuation. Elasticsearch has a number of built-in tokenizers which can be used to build custom analyzers.
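A minimal sketch of such a simple tokenizer, written in plain Python. It is not one of the built-in tokenizers; it only shows the idea of splitting on whitespace and punctuation while recording each token's position and character offsets. The output field names loosely mirror what the `_analyze` API reports.

```python
import re

def simple_tokenizer(text):
    """Split text into word tokens, recording position and character offsets."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group(),
            "position": position,
            "start_offset": match.start(),
            "end_offset": match.end(),
        })
    return tokens

for token in simple_tokenizer("Quick brown-fox, jumps!"):
    print(token)
```

Note that the tokenizer does not lowercase anything; changing the tokens is left to the filters that run after it.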

Q: What is a Filter in Elasticsearch ?
A:
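After the data has been processed by the Tokenizer, the resulting tokens are processed by Filters before indexing. A token filter can modify tokens (for example lowercasing them), remove tokens (for example stop words) or add tokens (for example synonyms).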
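The filter stage can be sketched as a chain of functions applied to the token stream in order. These are plain Python stand-ins for the built-in `lowercase` and `stop` token filters, and the stop-word list is a small made-up one.

```python
STOP_WORDS = {"a", "is", "of", "one", "the"}  # hypothetical stop list for the example

def lowercase_filter(tokens):
    """Modify tokens: lowercase each one."""
    return [token.lower() for token in tokens]

def stop_filter(tokens, stop_words=STOP_WORDS):
    """Remove tokens that are stop words."""
    return [token for token in tokens if token not in stop_words]

def apply_filters(tokens, filters):
    """Run the token stream through each filter in order."""
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

tokens = ["JavaInUse", "is", "One", "of", "the", "good", "websites"]
print(apply_filters(tokens, [lowercase_filter, stop_filter]))
# ['javainuse', 'good', 'websites']
```

The order matters: with the stop filter first, "One" would survive because it is compared before being lowercased.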


Q: What is the use of the attributes enabled, index and store ?
A:
  • The enabled attribute applies to various ElasticSearch specific/created fields such as _index and _size. User-supplied fields do not have an "enabled" attribute.
  • Store means the data is stored by Lucene, which will return this data if asked. Stored fields are not necessarily searchable. By default, fields are not stored, but the full source is. The defaults are usually what you want, so in most cases simply do not set the store attribute.
  • The index attribute is used for searching. Only indexed fields can be searched. The reason for the differentiation is that indexed fields are transformed during analysis, so the original data cannot be retrieved from the index if it is required.
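
A hedged sketch of how these attributes might appear in a mapping, written as a Python dictionary in the typeless mapping format of Elasticsearch 7 and later. The field names are invented, and the helper only inspects the dictionary; it does not talk to a cluster.

```python
mapping = {
    "mappings": {
        "_source": {"enabled": True},   # "enabled" on a metadata field
        "properties": {
            # Defaults: indexed (searchable), not stored separately.
            "title": {"type": "text"},
            # Kept in _source but not searchable.
            "internal_note": {"type": "text", "index": False},
            # Searchable and also stored by Lucene as a separate stored field.
            "summary": {"type": "text", "store": True},
        },
    }
}

def searchable_fields(mapping_body):
    """Fields that can be searched: 'index' is true unless explicitly set to False."""
    properties = mapping_body["mappings"]["properties"]
    return sorted(name for name, spec in properties.items() if spec.get("index", True))

def stored_fields(mapping_body):
    """Fields stored separately by Lucene: 'store' is false unless explicitly set to True."""
    properties = mapping_body["mappings"]["properties"]
    return sorted(name for name, spec in properties.items() if spec.get("store", False))

print(searchable_fields(mapping))  # ['summary', 'title']
print(stored_fields(mapping))      # ['summary']
```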

