newbie trying to understand terminology

jdepp · Mar 18, 2014

I have just started working with Linux, and my first project is pretty intense: it involves Logstash and Elasticsearch. While reading the documentation I came across the following terms, which may be more specific to Elasticsearch, but I also see them mentioned in Unix networking material a lot. I hope I can get a better explanation here:

What are shards?

MikeyD · Mar 18, 2014

You might have already seen this, but elasticsearch.org offers a good definition:

shard
A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by elasticsearch. An index is a logical namespace which points to primary and replica shards. Other than defining the number of primary and replica shards that an index should have, you never need to refer to shards directly. Instead, your code should deal only with an index. Elasticsearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another in the case of node failure, or the addition of new nodes.

This seems similar to worker processes in a database that run searches in parallel: each "worker" is assigned one part of the task. Splitting a task into smaller parts, letting a worker handle each part individually, and then combining the results often leads to faster search/run times, much like multi-threaded processing.
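To make the "split the work, then combine the results" idea concrete, here is a toy sketch in Python. It is not how Elasticsearch is implemented and does not use its API: the shard count, the sample log lines, and the helper names (`shard_for`, `build_shards`, `search_shard`, `search`) are all made up for illustration, and each "shard" is just an in-memory dict.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SHARDS = 3  # hypothetical value, picked only for this example


def shard_for(doc_id: str) -> int:
    """Pick a shard from a deterministic hash of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def build_shards(docs: dict) -> list:
    """Distribute documents across NUM_SHARDS in-memory 'shards'."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id)][doc_id] = text
    return shards


def search_shard(shard: dict, term: str) -> list:
    """Search a single shard; each worker only sees its own slice of the data."""
    return [doc_id for doc_id, text in shard.items()
            if term.lower() in text.lower()]


def search(shards: list, term: str) -> list:
    """Query every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term), shards))
    return sorted(doc_id for part in partials for doc_id in part)


# Made-up sample documents
docs = {
    "log-1": "Disk error on node A",
    "log-2": "User login ok",
    "log-3": "Timeout error talking to node B",
    "log-4": "Service restarted",
}

shards = build_shards(docs)
hits = search(shards, "error")
print(hits)
```

The caller only ever talks to `search`, never to an individual shard, which mirrors the point in the glossary that your code deals with the index and leaves shard placement to the system.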

Ultimately, even the Elasticsearch docs say that shards are managed automatically and shouldn't be accessed directly from your code.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html

jdepp · Mar 18, 2014

Thanks, MikeyD. Not sure how I missed that. Appreciate it.
