What is Elasticsearch

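Elasticsearch is a distributed, open source NoSQL database built with Java on top of the Apache Lucene search library. Instead of storing rows in relational tables and querying them with SQL, it stores JSON documents and is queried mainly through its JSON-based Query DSL. It runs as a server that processes JSON requests and returns JSON responses.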

Elasticsearch lets you store, search, and analyze large volumes of data quickly and in near real time, often returning answers in milliseconds. It organizes data as documents instead of tables with fixed schemas, and it comes with extensive REST APIs for storing and searching that data.
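To make the JSON-over-HTTP model concrete, here is a minimal Ruby sketch using only the standard library. It assumes a node at http://localhost:9200 and an index named articles (both set up later in this tutorial); build_search_request is a hypothetical helper. It only builds the request, so it runs without a server; the commented lines show how you could send it.

```ruby
require "json"
require "net/http"
require "uri"

# Builds (but does not send) a JSON search request for an Elasticsearch node.
# Assumption: the node listens on http://localhost:9200 and has an "articles" index.
def build_search_request(index, query_body, base_url: "http://localhost:9200")
  uri = URI.join(base_url, "/#{index}/_search")
  request = Net::HTTP::Post.new(uri) # the _search endpoint accepts a JSON body via POST (or GET)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(query_body)
  [uri, request]
end

uri, request = build_search_request("articles", { query: { match_all: {} } })

# With a running node, you could send it and read the JSON reply:
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
#   puts JSON.parse(response.body)["hits"]["total"]
```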

Installation and Setting Up

Install Elasticsearch on macOS with Homebrew

$ brew tap elastic/tap
$ brew install elastic/tap/elasticsearch-full

Check the Elasticsearch version

$ elasticsearch --version
=> (Version: 7.17.4, Build: default/tar/79878662c54c886ae89206c685d9f1051a9d6411/2022-05-18T18:04:20.964345128Z, JVM: 18.0.1.1)

Run Elasticsearch

$ elasticsearch

Then, in another terminal, check that the node responds:

$ curl localhost:9200 # or visit http://localhost:9200/
=>
{
  "name" : "Rafas-MacBook-Pro.local",
  "cluster_name" : "elasticsearch_rafaltrojanowski",
  "cluster_uuid" : "0EVLwc9hQVS2hPWnizmaqA",
  "version" : {
    "number" : "7.17.4",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "79878662c54c886ae89206c685d9f1051a9d6411",
    "build_date" : "2022-05-18T18:04:20.964345128Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

NOTE

Elasticsearch 8 is also available, but this tutorial uses version 7 because version 8 is not available via Homebrew. If you want the latest version, download it from the Elasticsearch website and install it from the archive.

Kibana

Kibana is data visualization and dashboard software for Elasticsearch. It provides a user-friendly interface for interacting with Elasticsearch and turning raw data into informative dashboards, charts, and reports. Kibana has many features, but here we will focus on using its Dev Tools console to query Elasticsearch and retrieve data: you write queries in the left pane and see the JSON responses in the right pane.

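Install Kibana

$ brew install elastic/tap/kibana-full

Start it with the kibana command and open http://localhost:5601 (Kibana’s default address) in your browser.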

Check the Kibana version

$ kibana --version
=> 7.17.4

Note: Kibana and Elasticsearch must run the same version.
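If you want to script that check, here is a small Ruby sketch. It assumes the version output formats shown above and simply compares the first x.y.z number found in each string; extract_version is a hypothetical helper.

```ruby
# Returns the first "major.minor.patch" version found in a command's output,
# or nil if there is none.
def extract_version(output)
  output[/\d+\.\d+\.\d+/]
end

# Outputs copied from the `elasticsearch --version` and `kibana --version` runs above.
# In a real script you could capture them with %x(elasticsearch --version) instead.
es_output     = "Version: 7.17.4, Build: default/tar/79878662c54c886ae89206c685d9f1051a9d6411/2022-05-18T18:04:20.964345128Z, JVM: 18.0.1.1"
kibana_output = "7.17.4"

es_version     = extract_version(es_output)
kibana_version = extract_version(kibana_output)

if es_version == kibana_version
  puts "versions match (#{es_version})"
else
  puts "version mismatch: Elasticsearch #{es_version}, Kibana #{kibana_version}"
end
```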

Rails project

$ gem install rails
$ rails new blog --database=postgresql
$ cd blog

Let’s suppose that we have Users and Articles with Comments in our database.
Every user can have many articles, and every article can have multiple comments.

To quickly generate the project structure, we can use Rails generators and Rake tasks:

$ rails g model User first_name:string last_name:string
$ rails g scaffold Article title:string body:text user:references
$ rails g model Comment body:text article:references
$ rake db:create && rake db:migrate

Then add the associations to the models:

app/models/article.rb
class Article < ApplicationRecord
  belongs_to :user
  has_many :comments, dependent: :destroy
end

app/models/user.rb
class User < ApplicationRecord
  has_many :articles, dependent: :destroy
end

app/models/comment.rb
class Comment < ApplicationRecord
  belongs_to :article
end

Elasticsearch integration

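Gemfile
# Add Elasticsearch gems
gem 'elasticsearch-model'
gem 'elasticsearch-rails'

Then install them:

$ bundle install

Because this tutorial runs Elasticsearch 7, consider pinning both gems to a 7.x release so that the client matches the server’s major version.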

Configure the Elasticsearch connection (with logging enabled)

config/initializers/elasticsearch.rb
# Log every Elasticsearch request for better visibility
Elasticsearch::Model.client = Elasticsearch::Client.new(
  url: ENV['ELASTICSEARCH_URL'] || 'http://localhost:9200',
  log: true
)

Indexing documents

Include a Searchable concern in the Article model:

app/models/article.rb
class Article < ApplicationRecord
  include Searchable
  # (...)
end

app/models/concerns/searchable.rb
module Searchable
  extend ActiveSupport::Concern

  included do
    # Add the Elasticsearch integration to the model
    # (model methods such as search, mapping, import, etc.)
    include Elasticsearch::Model

    # Automatically update the Elasticsearch index when the model changes
    include Elasticsearch::Model::Callbacks

    # Index mapping
    mapping do
      # Article fields
      indexes :title, type: :text
      indexes :body, type: :text
      # belongs_to :user
      indexes :user, type: :object do # object is the default type for inner JSON objects
        indexes :first_name, type: :text
        indexes :last_name, type: :text
      end
      # has_many :comments
      indexes :comments, type: :object do
        indexes :body, type: :text
      end
    end

    # JSON serialization of the model for the index
    def as_indexed_json(options = {})
      as_json(
        only: ['title', 'body'],
        include: {
          user: { only: [:first_name, :last_name] },
          comments: { only: :body }
        }
      )
    end

    def self.search(query)
      # ... (implemented in the Multi match section)
    end
  end
end
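To see the document shape that as_indexed_json is meant to produce, here is a Rails-free sketch. The Struct classes are hypothetical stand-ins for the ActiveRecord models, and the sample title and user names are borrowed from the document fetched in the Getting documents section.

```ruby
require "json"

# Hypothetical stand-ins for the ActiveRecord models (no Rails needed).
UserStub    = Struct.new(:first_name, :last_name)
CommentStub = Struct.new(:body)
ArticleStub = Struct.new(:title, :body, :user, :comments)

# Mirrors the shape of as_json(only: ..., include: ...) in as_indexed_json:
# the article's own fields, one embedded user object, and an array of comment objects.
def indexed_document(article)
  {
    "title"    => article.title,
    "body"     => article.body,
    "user"     => { "first_name" => article.user.first_name,
                    "last_name"  => article.user.last_name },
    "comments" => article.comments.map { |comment| { "body" => comment.body } }
  }
end

article = ArticleStub.new("Bloody Monster", "sample body",
                          UserStub.new("Willis", "Nitzsche"),
                          [CommentStub.new("sample comment")])
doc = indexed_document(article)
puts JSON.pretty_generate(doc)
```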

To generate fake seed data, add the ffaker gem:

Gemfile
group :development, :test do
  gem 'ffaker'
end

Now let’s create the index and seed the data, for example from db/seeds.rb (run it with $ rails db:seed):

Article.__elasticsearch__.create_index! force: true

# Create users
5.times do
  User.create!(first_name: FFaker::Name.first_name, last_name: FFaker::Name.last_name)
end
users = User.all.to_a # load the users once instead of on every iteration

# Create articles
amount = 100_000
amount.times do |i|
  article = Article.create!(
    title: FFaker::Book.title,
    body: FFaker::Tweet.body,
    user: users.sample
  )

  # Create 0 to 3 comments; << saves each comment because the article is already persisted
  rand(0..3).times do
    article.comments << Comment.new(body: FFaker::FreedomIpsum.paragraph)
  end
  puts i # progress
end

Article.import # (re)index all articles in Elasticsearch, now including their comments

Getting documents

Fetch a single document by its ID with the Get API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-get.html#docs-get-api-request

GET articles/_doc/1
{
  "_index" : "articles",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Bloody Monster",
    "body" : "Voluptate laudantium expedita commodi hic odio neque quisquam. Deleniti cupiditate mollitia animi aspernatur. Ex perferendis repudiandae.",
    "user" : {
      "first_name" : "Willis",
      "last_name" : "Nitzsche"
    },
    "comments" : [
      {
        "body" : "Dallas cowboys 7-Eleven MGD 18-wheeler Harley Davidson anti-metric system. Super bowl propane tanks NASA jean shorts potato salad drone strike MOPAR tomahawk cruise missile velcro. John cena juicy flame-grilled shock and awe extra beef national security monster truck rally Fox News Call of Duty Starbucks. Garth brooks Applebee's apple pie Championship Pro Bass Fishing John Cena condiments extra beef."
      }
    ]
  }
}
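Reading those fields from Ruby is plain hash access. A minimal sketch, assuming the response body has already been fetched (in the Rails app, Article.__elasticsearch__.client.get(index: 'articles', id: 1) should return it as a Hash); the JSON below is a shortened copy of the response above, with the long texts cut.

```ruby
require "json"

# Shortened copy of the GET articles/_doc/1 response shown above.
raw = <<~JSON
  {
    "_index": "articles",
    "_id": "1",
    "found": true,
    "_source": {
      "title": "Bloody Monster",
      "user": { "first_name": "Willis", "last_name": "Nitzsche" },
      "comments": [{ "body": "Dallas cowboys 7-Eleven MGD 18-wheeler ..." }]
    }
  }
JSON

response = JSON.parse(raw)

# "found" is false when no document has the requested ID, and "_source" is absent then.
source = response["found"] ? response.fetch("_source") : nil

title         = source&.fetch("title")
author        = source && "#{source.dig('user', 'first_name')} #{source.dig('user', 'last_name')}"
comment_count = source ? source.fetch("comments", []).size : 0
puts "#{title} by #{author} (#{comment_count} comments)"
```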

Searching

Match all

The simplest query: it matches all documents and gives each of them a _score of 1.0.

GET articles/_search
{
  "query": {
    "match_all": {}
  }
}

module Searchable
  included do
    def self.match_all
      params = {
        query: {
          match_all: {}
        }
      }

      self.__elasticsearch__.search(params)
    end
  end
end

Example:

query = Article.match_all
query.results.total
=> 10000
query.results.size
=> 10

Track total hits

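By default, Elasticsearch counts hits accurately only up to 10,000, which is why the example above reports 10000 instead of 100,000. Counting every match requires visiting all matching documents, which is costly for queries that match many of them. If you need the exact number of hits, set track_total_hits to true: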

module Searchable
  included do
    def self.match_all
      params = {
        track_total_hits: true,
        query: {
          match_all: {}
        }
      }

      self.__elasticsearch__.search(params)
    end
  end
end

Example:

query = Article.match_all
query.results.total
=> 100000

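In the raw Elasticsearch 7 response, hits.total is an object with a value and a relation; relation "gte" means counting stopped at the threshold, so value is only a lower bound. A small sketch for interpreting it, where describe_total is a hypothetical helper and the two hashes mirror the totals from the two match_all examples.

```ruby
# Interprets hits.total as returned by Elasticsearch 7.x:
# { "value" => n, "relation" => "eq" } is exact, "gte" means "at least n".
def describe_total(hits_total)
  value = hits_total.fetch("value")
  hits_total.fetch("relation") == "gte" ? "at least #{value}" : "exactly #{value}"
end

# Shapes corresponding to the two match_all examples (100,000 indexed articles).
default_total = { "value" => 10_000, "relation" => "gte" }  # without track_total_hits
tracked_total = { "value" => 100_000, "relation" => "eq" }  # with track_total_hits: true

puts describe_total(default_total)
puts describe_total(tracked_total)
```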
Size

The size parameter sets how many hits are returned in the response. It defaults to 10, and by default from + size cannot exceed 10,000 (the index.max_result_window index setting). You can request up to 10,000 hits by passing size in the query; going further requires raising index.max_result_window.

module Searchable
  included do
    def self.match_all
      params = {
        size: 10000,
        query: {
          match_all: {}
        }
      }

      self.__elasticsearch__.search(params)
    end
  end
end

query = Article.match_all
query.results.size
=> 10000
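The from/size limit can be checked before a request is sent. A minimal sketch, assuming the default index.max_result_window of 10,000; page_params is a hypothetical helper, not part of elasticsearch-model.

```ruby
# Default value of the index.max_result_window setting.
MAX_RESULT_WINDOW = 10_000

# Returns the from/size pair for a 1-based page number, refusing pages that
# Elasticsearch would reject because from + size exceeds the result window.
def page_params(page, per_page, max_window: MAX_RESULT_WINDOW)
  raise ArgumentError, "page must be >= 1" if page < 1

  from = (page - 1) * per_page
  if from + per_page > max_window
    raise ArgumentError, "from + size = #{from + per_page} exceeds #{max_window}"
  end

  { from: from, size: per_page }
end

p page_params(1, 10)   # first page with the default size
p page_params(400, 25) # last page of 25 hits that still fits the window
```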

Multi match

Next, let’s explore the multi_match query, which runs the same query against multiple fields. Here is the Ruby code that builds it:

module Searchable
  included do
    def self.search(query)
      params = {
        query: {
          multi_match: {
            query: query,
            fields: [
              "title",
              "body",
              "user.first_name",
              "user.last_name",
              "comments.body"
            ]
          }
        }
      }

      self.__elasticsearch__.search(params)
    end
  end
end

Example:

query = Article.search("ninjas")
query.results.total
=> 2668

Relationships

Our articles index contains data from related models, and we want to keep it up to date. Every change to a comment should update the corresponding article document, and every change to a user should update the documents of all of that user’s articles. Let’s dive in.

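Article has_many :comments

app/models/comment.rb
class Comment < ApplicationRecord
  # touch: true sets the associated article's updated_at to the current time
  # whenever this comment is saved or destroyed
  belongs_to :article, touch: true
  # (...)
end

Touching the article fires its after_touch callbacks. If comment changes do not show up in the index, an after_touch { __elasticsearch__.index_document } callback in Article (the approach used in the elasticsearch-model association examples) reindexes the whole article document, comments included.
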
Article belongs_to :user

app/models/user.rb
class User < ApplicationRecord
  # Touch every article of this user after the user is updated
  after_update { self.articles.find_each(&:touch) }
  # (...)
end

Note: this can be slow, depending on how many articles the user has.
It would be better to run this code asynchronously, for example with Sidekiq.
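The asynchronous version has this shape: the callback only enqueues work and returns immediately, and a worker runs the slow loop later. The sketch below illustrates it with Ruby's built-in Thread and Queue as a stand-in for Sidekiq (it is not Sidekiq's API); ARTICLES_BY_USER and the recorded IDs are hypothetical data in place of real article.touch calls.

```ruby
# Hypothetical data: which article IDs belong to which user.
ARTICLES_BY_USER = { 1 => [10, 11, 12], 2 => [20] }.freeze

jobs      = Queue.new # pending user IDs; the after_update callback would only push here
reindexed = []        # article IDs the worker processed (stands in for article.touch)

worker = Thread.new do
  # pop blocks until a job arrives; :stop tells the worker to exit
  while (user_id = jobs.pop) != :stop
    ARTICLES_BY_USER.fetch(user_id, []).each { |article_id| reindexed << article_id }
  end
end

jobs << 1     # e.g. user 1 was updated
jobs << 2     # e.g. user 2 was updated
jobs << :stop
worker.join   # wait until the worker has drained the queue
```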

Pagination

elasticsearch-model works with either Kaminari or will_paginate; pick one of them.

kaminari

Gemfile
gem 'kaminari'

app/controllers/articles_controller.rb
class ArticlesController < ApplicationController
  # GET /articles
  def index
    @articles = Article.match_all.page(params[:page]).records
  end
  # ...

app/views/articles/index.html.erb
<p style="color: green"><%= notice %></p>

<h1>Articles</h1>

<div id="articles">
  <% @articles.each do |article| %>
    <%= render article %>
    <p>
      <%= link_to "Show this article", article %>
    </p>
  <% end %>
</div>

<%= link_to "New article", new_article_path %>

<%= paginate @articles %>

will_paginate

Gemfile
gem 'will_paginate', '~> 4.0'

app/controllers/articles_controller.rb
class ArticlesController < ApplicationController
  # GET /articles or /articles.json
  def index
    @articles = Article.match_all.paginate(page: params[:page]).records
  end
  # ...

app/views/articles/index.html.erb
<p style="color: green"><%= notice %></p>

<h1>Articles</h1>

<div id="articles">
  <% @articles.each do |article| %>
    <%= render article %>
    <p>
      <%= link_to "Show this article", article %>
    </p>
  <% end %>
</div>

<%= link_to "New article", new_article_path %>

<%= will_paginate @articles %>
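Both paginators should take their page count from the hit total, so the 10,000 default discussed under Track total hits also shapes pagination, and from + size is capped by index.max_result_window as well. A rough sketch of how many pages are actually reachable, assuming the default window of 10,000; reachable_pages is a hypothetical helper. With Kaminari's default of 25 per page, that is 400 pages even though 100,000 articles are indexed.

```ruby
# Number of pages a paginator can actually serve, given the hit total it was told
# about and the default index.max_result_window of 10,000 (from + size must fit).
def reachable_pages(total_hits, per_page, max_window: 10_000)
  total_pages  = (total_hits + per_page - 1) / per_page # ceiling division
  window_pages = max_window / per_page                  # pages whose from + size fits the window
  [total_pages, window_pages].min
end

puts reachable_pages(10_000, 25)  # total capped at 10,000 without track_total_hits
puts reachable_pages(100_000, 25) # exact total, but the window still caps the pages
```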

URI search query (simple search query)

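Passing a string instead of a hash to search sends it as the q parameter of a URI search (Lucene query string syntax); extra options such as size and track_total_hits are sent along as request parameters. Without quotes, the terms are combined with OR by default, so any article containing at least one of them matches:

query = Article.__elasticsearch__.search("Season of the Champagne Ninjas", track_total_hits: true, size: 20)

# GET articles/_search?q=Season+of+the+Champagne+Ninjas
# {
#   "track_total_hits": true,
#   "size": 20
# }

# or

# GET articles/_search?q=Season+of+the+Champagne+Ninjas&size=20&track_total_hits=true

query.results.total
=> 80788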
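The ?q=... form is ordinary URL encoding, with spaces turned into +. A sketch with Ruby's URI.encode_www_form that rebuilds the request path from the comment above; uri_search_path is a hypothetical helper.

```ruby
require "uri"

# Builds the path of a URI search; spaces in the query become "+".
def uri_search_path(index, params)
  "/#{index}/_search?#{URI.encode_www_form(params)}"
end

path = uri_search_path("articles", { q: "Season of the Champagne Ninjas", size: 20, track_total_hits: true })
puts path
```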

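Phrase match (note the escaped double quotes in the search string). Wrapping the text in double quotes turns it into a phrase query: the words must appear next to each other and in this order, which is why far fewer articles match:

# GET articles/_search?q="Season of the Champagne Ninjas"
query = Article.__elasticsearch__.search("\"Season of the Champagne Ninjas\"")

query.results.total
=> 5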
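Quoting can also be done programmatically. A sketch, assuming you only need to wrap the phrase in double quotes and backslash-escape any quotes inside it (the query string escape for a literal quote); phrase_query is a hypothetical helper.

```ruby
require "uri"

# Wraps text in double quotes for query string syntax, escaping inner quotes as \".
def phrase_query(text)
  escaped = text.gsub('"') { '\\"' } # block form: the replacement is used literally
  "\"#{escaped}\""
end

q = phrase_query("Season of the Champagne Ninjas")
encoded = URI.encode_www_form({ q: q }) # the quotes become %22 in the URL
puts encoded
# In the Rails app the string could be passed on: Article.__elasticsearch__.search(q)
```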

Searching a specific field

# GET articles/_search?q=title:Champagne Ninjas
query = Article.__elasticsearch__.search("title:Champagne Ninjas")

query.results.total
=> 7511

# GET articles/_search?q=body:Champagne Ninjas
query = Article.__elasticsearch__.search("body:Champagne Ninjas")

query.results.total
=> 2668

# GET articles/_search?q=body:Hubert
user_first_name = User.first.first_name
query = Article.__elasticsearch__.search("body:#{user_first_name}")

query.results.total
=> 0

Note that field:Champagne Ninjas restricts only the first term to that field; Ninjas is still searched in the default fields. To scope every term, group them, for example title:(Champagne Ninjas). The last query most likely returns nothing because user names are indexed under user.first_name, not body.
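Small helpers make that scoping explicit. A sketch with hypothetical helper names: field_terms groups the terms so every one of them is restricted to the field, and field_phrase builds a field-scoped phrase.

```ruby
# Hypothetical helpers for field-scoped query string syntax.

# Restricts every term to the field by grouping them, e.g. title:(Champagne Ninjas)
def field_terms(field, text)
  "#{field}:(#{text})"
end

# Restricts a phrase to the field, e.g. title:"Season of the Champagne Ninjas"
def field_phrase(field, text)
  "#{field}:\"#{text}\""
end

title_query = field_terms("title", "Champagne Ninjas")
name_query  = field_terms("user.first_name", "Hubert") # names live under user.first_name
puts title_query, name_query
# In the Rails app: Article.__elasticsearch__.search(title_query)
```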

Source:
(https://gist.github.com/amulyakashyap09/e8442735fe576145dfbed809bb089056#simple-search-query)

Took