newbie trying to understand terminology

jdepp

Guest
I have just started working with Linux, and my first project is pretty intense: it involves Logstash and Elasticsearch. While reading the documentation I came across the following terms, which may be specific to Elasticsearch, but I also see them mentioned a lot in Unix networking material. I'm hoping to get a better explanation here:

What are shards?
 


You might have already seen this, but elasticsearch.org offers a good definition:

shard
A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by elasticsearch. An index is a logical namespace which points to primary and replica shards. Other than defining the number of primary and replica shards that an index should have, you never need to refer to shards directly. Instead, your code should deal only with an index. Elasticsearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another in the case of node failure, or the addition of new nodes.

This seems similar to worker processes in a database that run parallel searches: each "worker" is assigned a task. Splitting a task into smaller parts, letting a worker handle each part individually, and then combining the results often leads to faster search and run times, much like multi-threaded processing.
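To make the worker analogy concrete, here is a rough Python sketch of the split-then-merge idea. It is only an illustration, not how Elasticsearch is actually implemented: the SHARDS data and the search_shard and search_all functions are made up for this example, and plain lists of strings stand in for real Lucene indexes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for one shard's documents.
SHARDS = [
    ["linux kernel notes", "logstash pipeline config"],
    ["elasticsearch shard basics", "unix network tools"],
    ["logstash grok patterns", "cooking recipes"],
]

def search_shard(docs, term):
    """Search a single shard and return the documents containing the term."""
    return [doc for doc in docs if term in doc]

def search_all(shards, term):
    """Query every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # One worker per shard; each returns its own partial hit list.
        partials = pool.map(lambda docs: search_shard(docs, term), shards)
        # Flatten the partial lists and sort them into one combined answer.
        return sorted(hit for partial in partials for hit in partial)

print(search_all(SHARDS, "logstash"))
```

The caller only ever sees search_all, which mirrors the point in the definition above: your code talks to the index as a whole, and the per-shard work happens behind it.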

Ultimately, as the Elasticsearch documentation itself says, shards are managed automatically and shouldn't be accessed directly in your code.
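The one place shards do show up is when you define how many an index should have. As a hedged sketch, the Python below builds the kind of settings body you would send when creating an index. The number_of_shards and number_of_replicas names are Elasticsearch index settings; the index_settings helper and the values 3 and 1 are assumptions chosen purely for illustration, and nothing here actually contacts a cluster.

```python
import json

def index_settings(primary_shards, replicas_per_primary):
    """Build an index-creation settings body (illustrative helper)."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_primary,
        }
    }

def total_shard_copies(primary_shards, replicas_per_primary):
    """Each primary shard has itself plus its replicas."""
    return primary_shards * (1 + replicas_per_primary)

# Assumed example values: 3 primary shards, 1 replica of each.
body = index_settings(3, 1)
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 1))
```

With those example values the cluster would hold six shard copies in total, which Elasticsearch then spreads across the available nodes on its own.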

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html
 
Thanks, MikeyD. Not sure how I missed that. Appreciate it.
 

