Elasticsearch (also called Elastic, abbreviated ES below) is built on the open source library Lucene. Lucene itself cannot be used directly: you have to write your own code to call its API. Elasticsearch wraps Lucene and exposes it through a REST API that works out of the box.
First, the basic concepts in ES
cluster: Represents a cluster. A cluster contains multiple nodes, one of which is the master node, chosen by election. The distinction between the master and the other nodes exists only inside the cluster. One of the ideas behind ES is decentralization, which literally means there is no central node. This describes the view from outside the cluster: from the outside, an ES cluster is logically a single whole, and communicating with any one node is equivalent to communicating with the whole cluster.
shards: Represents index shards. ES can divide a complete index into multiple shards. The advantage is that a large index can be split up and distributed across different nodes, forming a distributed search. The number of shards must be specified when the index is created and cannot be changed afterwards.
replicas: Represents index replicas. ES can keep multiple copies of an index. Replicas serve two purposes. The first is fault tolerance: when a shard on some node is damaged or lost, it can be recovered from a replica. The second is query efficiency: ES automatically load-balances search requests across the copies.
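As a rough illustration of where shards and replicas are set, the sketch below builds the settings body for an index-creation request. The index name demo and the counts 3 and 1 are placeholders, not values taken from this deployment; number_of_shards and number_of_replicas are the standard index settings.

```shell
#!/usr/bin/env bash
# Build the JSON settings body for creating an index.
# $1 = number of primary shards, $2 = number of replicas per primary shard.
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Example use against a running node (the index name "demo" is hypothetical):
#   curl -XPUT 'http://172.18.68.11:9200/demo' \
#        -H 'Content-Type: application/json' \
#        -d "$(index_settings 3 1)"
index_settings 3 1
```

The replica count can still be adjusted later through the index settings API, while the primary shard count is fixed at creation, which is why it is worth choosing before the index exists.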
recovery: Represents data recovery, also called data redistribution. When a node joins or leaves, ES reallocates index shards according to the load on the machines, and when a failed node restarts, its data is recovered.
river: Represents a data source for ES, and also a way of synchronizing data into ES from other storage systems (such as databases). A river is an ES service that exists as a plugin; it reads the data from its source and indexes it into ES. The official rivers cover CouchDB, RabbitMQ, Twitter, and Wikipedia. Rivers belong to early ES releases: they were deprecated in 1.5 and removed in 2.0.
gateway: Represents the storage mode for ES index snapshots. By default, ES first keeps the index in memory and persists it to the local hard disk when memory is full. The gateway stores the index snapshot; when the ES cluster is shut down and restarted, the index backup data is read back from the gateway. ES supports multiple types of gateway: the local file system (default), a distributed file system, Hadoop HDFS, and the Amazon S3 cloud storage service.
discovery.zen: Represents the automatic node discovery mechanism of ES. ES is a P2P-based system: it first looks for existing nodes by broadcast, then the nodes communicate through a multicast protocol; point-to-point interaction is also supported.
Transport: Represents how ES nodes interact inside the cluster, and how the cluster interacts with clients. Internal interaction uses TCP by default. Other transports are also supported: HTTP (JSON format), Thrift, servlet, memcached, ZeroMQ, and so on (integrated through plugins).
Second, the deployment environment
This deployment builds an Elasticsearch cluster on three CentOS 7.3 hosts, listed below. Deploying a cluster involves index sharding, so a brief introduction to sharding follows.
System      Node name   IP
CentOS 7.3  Els1        172.18.68.11
CentOS 7.3  Els2        172.18.68.12
CentOS 7.3  Els3        172.18.68.13
An index in an ES cluster may consist of multiple shards, and each shard can have multiple copies. Dividing a single index into multiple shards makes it possible to handle a large index that cannot run on a single server, typically because of memory or storage limits; put simply, an index that is too large causes efficiency problems. Since each shard can have multiple copies, assigning the copies to multiple servers also increases the query load the cluster can carry.
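To make the arithmetic concrete, here is a small sketch that counts how many shard copies an index produces and how many land on each node if they are spread evenly. The figures 3 primaries, 1 replica, and 3 nodes are example inputs only; real allocation is decided by ES and need not be perfectly even.

```shell
#!/usr/bin/env bash
# Total shard copies = primaries * (1 + replicas per primary).
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

# Copies per node if spread evenly over $3 nodes (rounded up).
copies_per_node() {
  local total
  total=$(total_shard_copies "$1" "$2")
  echo $(( (total + $3 - 1) / $3 ))
}

total_shard_copies 3 1   # prints 6
copies_per_node 3 1 3    # prints 2
```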
Third, deploy the Elasticsearch cluster
1. Install JDK
Elasticsearch is developed in Java and runs in the JVM, so the first step is to install the JDK.
yum install -y java-1.8.0-openjdk-devel
2. Download elasticsearch
https://artifacts.elastic.co/downloads/elasticsearch/ is the official download site for Elasticsearch. If you need the latest version, download it from there. You can download the package to a local computer and then copy it to CentOS, or download it directly on CentOS.
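A minimal sketch of the direct download on CentOS follows. The version number is a placeholder, and the file-name pattern elasticsearch-<version>.rpm is an assumption that matched the 5.x and 6.x releases; check the download page for the exact file name of the release you want.

```shell
#!/usr/bin/env bash
# Build the RPM download URL for a given version.
# Assumption: the package is named elasticsearch-<version>.rpm.
es_rpm_url() {
  printf 'https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-%s.rpm\n' "$1"
}

ES_VERSION="6.2.4"   # hypothetical choice, not taken from the article
es_rpm_url "$ES_VERSION"

# On each node, the download and install would then look like:
#   wget "$(es_rpm_url "$ES_VERSION")"
#   rpm -ivh "elasticsearch-${ES_VERSION}.rpm"
```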
3. Configuration directory
After the installation is complete, many files are generated, including configuration files, log files, and so on. The most important configuration file paths are the following.
/etc/elasticsearch/elasticsearch.yml # Elasticsearch configuration file
/etc/elasticsearch/jvm.options # JVM configuration file
4. Create a directory for storing data and logs
The data files grow quickly while the system is running, so the default log and data file paths may not meet our needs. Create the log and data directories manually; they can sit on NFS, RAID, and so on, which makes later management and expansion easier.
chown -R elasticsearch:elasticsearch /els/*
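The directory creation and the matching settings can be sketched as below. The sub-directory names data and log are assumptions made for illustration (the text above only fixes the /els prefix); path.data and path.logs are the elasticsearch.yml settings that point ES at custom locations.

```shell
#!/usr/bin/env bash
# Create data and log directories under a base directory and print the
# elasticsearch.yml lines that point at them.
# The names "data" and "log" are illustrative assumptions.
prepare_es_dirs() {
  local base="$1"
  mkdir -p "$base/data" "$base/log" || return 1
  printf 'path.data: %s/data\npath.logs: %s/log\n' "$base" "$base"
}

# On the real hosts, run as root:
#   prepare_es_dirs /els     # then copy the printed lines into elasticsearch.yml
#   chown -R elasticsearch:elasticsearch /els/*
```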
5. Cluster configuration
The two most important cluster settings are node.name and network.host, and each node must have its own value for both. node.name is the node name, used mainly in Elasticsearch's own logs to tell the nodes apart.
discovery.zen.ping.unicast.hosts lists the nodes in the cluster. You can use IP addresses or host names (host names must be resolvable).
discovery.zen.ping.unicast.hosts: ["172.18.68.11", "172.18.68.12", "172.18.68.13"]
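Putting the three settings together, a per-node fragment of elasticsearch.yml might be generated as in the sketch below. The helper name node_config is made up for this example, and pairing each node name with the IP from the host table is an assumption.

```shell
#!/usr/bin/env bash
# Print the per-node settings for elasticsearch.yml.
# $1 = node name, $2 = IP address this node binds to.
node_config() {
  local name="$1" ip="$2"
  cat <<EOF
node.name: ${name}
network.host: ${ip}
discovery.zen.ping.unicast.hosts: ["172.18.68.11", "172.18.68.12", "172.18.68.13"]
EOF
}

# Only node.name and network.host differ between nodes; the host list is shared.
node_config els1 172.18.68.11
```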
6. JVM configuration
Since Elasticsearch is developed in Java, its JVM settings can be changed in the configuration file /etc/elasticsearch/jvm.options. If you have no special requirements, keep the defaults.
The two most important settings are the maximum and minimum JVM heap sizes, -Xmx1g and -Xms1g. If the heap is too small, Elasticsearch stops as soon as it starts; if it is too large, it slows down the system itself.
-Xms1g # JVM minimum heap size
-Xmx1g # JVM maximum heap size
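One common rule of thumb, used here as an assumption rather than something this article prescribes, is to give the heap about half of the machine's RAM, keep it under roughly 31 GB, and set the minimum and maximum to the same value. A sketch of that calculation:

```shell
#!/usr/bin/env bash
# Suggest heap flags from total RAM in MB: half of RAM, capped at 31 GB,
# with the minimum (-Xms) equal to the maximum (-Xmx).
heap_flags() {
  local total_mb="$1" heap_mb
  heap_mb=$(( total_mb / 2 ))
  if [ "$heap_mb" -gt 31744 ]; then
    heap_mb=31744
  fi
  printf '%s\n' "-Xms${heap_mb}m" "-Xmx${heap_mb}m"
}

# A 2 GB machine yields the 1 GB heap used in this article.
heap_flags 2048

# To feed in the real amount of RAM on Linux:
#   heap_flags "$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)"
```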
7. Start Elasticsearch
A separate systemctl daemon-reload is normally unnecessary, because systemctl enable reloads the systemd configuration itself, so that step is omitted here.
systemctl enable elasticsearch.service
systemctl start elasticsearch
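The JVM needs some time after systemctl start before the HTTP port answers, so a script that continues straight away may fail. A minimal retry helper, sketched below, can cover the gap; the name wait_until and the attempt and delay values are arbitrary choices for this example.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds.
# Usage: wait_until <attempts> <delay_seconds> <command...>
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  local attempts="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$(( i + 1 ))
    sleep "$delay"
  done
  return 1
}

# Example: wait up to about 60 seconds for the first node to answer.
#   wait_until 30 2 curl -sf 'http://172.18.68.11:9200/'
```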
Elasticsearch listens on an HTTP interface directly, so you can use the curl command to view cluster-related information.
_cat is the family of endpoints for viewing information in a human-readable form.
nodes selects node information.
?pretty asks for a more readable display of the output. It matters mainly for JSON responses; _cat endpoints already print one line per node, and adding ?v to them shows a header row.
curl -XGET 'http://172.18.68.11:9200/_cat/nodes?pretty'
172.18.68.12 18 68 0 0.07 0.06 0.05 mdi - els2
172.18.68.13 25 67 0 0.01 0.02 0.05 mdi * els3
172.18.68.11 7 95 0 0.02 0.04 0.05 mdi - els1
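In this output the last column is the node name and the column before it marks the elected master with an asterisk; mdi in the role column means the node is master-eligible, a data node, and an ingest node. Assuming that default column layout, the master can be picked out with a short filter like this sketch:

```shell
#!/usr/bin/env bash
# Print the name of the elected master from _cat/nodes output on stdin.
# Assumes the default columns: the last field is the node name and the
# field before it is "*" for the master and "-" for every other node.
master_node() {
  awk '$(NF-1) == "*" { print $NF }'
}

# The three lines from the output above, used as sample input.
sample='172.18.68.12 18 68 0 0.07 0.06 0.05 mdi - els2
172.18.68.13 25 67 0 0.01 0.02 0.05 mdi * els3
172.18.68.11 7 95 0 0.02 0.04 0.05 mdi - els1'

printf '%s\n' "$sample" | master_node   # prints els3

# Against the live cluster:
#   curl -s 'http://172.18.68.11:9200/_cat/nodes' | master_node
```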
To see what else is available, such as cluster information and current node statistics, the following command lists every _cat endpoint you can query.
curl -XGET 'http://172.18.68.11:9200/_cat?pretty'
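Beyond the _cat listing, the cluster health API reports an overall status of green, yellow, or red in its JSON response. The sketch below extracts that field with sed; the JSON line fed to it is a made-up sample for illustration, not output captured from this cluster.

```shell
#!/usr/bin/env bash
# Extract the "status" value from a cluster health JSON response on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Illustrative sample input (hypothetical values).
printf '%s\n' '{"cluster_name":"demo","status":"green","number_of_nodes":3}' | health_status

# Against the live cluster:
#   curl -s 'http://172.18.68.11:9200/_cluster/health?pretty' | health_status
```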