Wednesday, August 27, 2014

SugarCRM 101: Elasticsearch

I recently found myself doing an extensive amount of work relating to Elasticsearch. While doing so, the usefulness of the O'Reilly Elasticsearch book came to mind, along with the thought that it would be useful to have a list of helpful commands handy.

So here it is: a list of commands that will hopefully help answer some of your questions about how Elasticsearch interacts with Sugar.

Before proceeding, there are some important notes to bear in mind. 

First, where referenced, replace localhost:9200 with the address and port number of your Elasticsearch server. Second, the provided examples use the curl command line utility. Lastly, the name of the Elasticsearch index is derived from the unique_key value stored in the config.php file of your Sugar instance, thus [index_name] = unique_key.
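For those who prefer to script these checks, the notes above can be sketched in Python. This is only an illustration: the helper names read_unique_key and es_url are made up for this post, and the regular expression assumes unique_key appears in config.php as a simple quoted 'unique_key' => '...' entry.

```python
import re
from urllib.parse import quote


def read_unique_key(config_text):
    """Return the unique_key value found in the text of config.php, or None.

    Assumption: the entry looks like  'unique_key' => 'some value',
    """
    match = re.search(
        r"""['"]unique_key['"]\s*=>\s*['"]([^'"]+)['"]""", config_text
    )
    return match.group(1) if match else None


def es_url(base, index_name=None, endpoint=None):
    """Build a URL such as http://localhost:9200/[index_name]/_stats."""
    segments = [
        quote(part.strip("/"), safe="")
        for part in (index_name, endpoint)
        if part
    ]
    return base.rstrip("/") + "/" + "/".join(segments)
```

For example, es_url('http://localhost:9200', read_unique_key(text), '_stats') yields the same address used by the single-index curl command later in this post, where text holds the contents of config.php.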


The first thing you might be wondering is how to check whether Elasticsearch is running. Here it is:

curl -XGET 'http://localhost:9200/'

...and here is the output:

{
  "ok": true,
  "status": 200,
  "name": "Sugar Elasticsearch",
  "version": {
    "number": "0.90.10",
    "build_hash": "0a5781f44876e8d1c30b6360628d59cb2a7a2bbb",
    "build_timestamp": "2014-01-10T10:18:37Z",
    "build_snapshot": false,
    "lucene_version": "4.6"
  },
  "tagline": "You Know, for Search"
}

The ok and status values help us confirm that the server is running. Pay special attention to the version number, as 0.90.10 is the only version currently supported by Sugar (as of Sugar 7.2.2).
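The same check can be automated. The sketch below is a rough illustration in Python, not an official Sugar or Elasticsearch tool: fetch_json and check_server are hypothetical helper names, and the response layout is assumed to match the 0.90.10 output shown above.

```python
import json
from urllib.request import urlopen

# The only Elasticsearch version this post lists as supported by Sugar 7.2.2.
SUPPORTED_VERSION = "0.90.10"


def fetch_json(url, timeout=5):
    """GET a URL and decode the JSON body (equivalent to curl -XGET)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def check_server(info, supported=SUPPORTED_VERSION):
    """Summarize the root response: is the server up, and on which version?"""
    version = info.get("version", {}).get("number")
    return {
        "running": info.get("status") == 200,
        "version": version,
        "supported": version == supported,
    }
```

Calling check_server(fetch_json('http://localhost:9200/')) against a live server returns a small dictionary with the running, version and supported values.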

Next, you may want to view the list of indexes defined on the server. To do so, use the following:

curl -XGET 'http://localhost:9200/_stats'

...here is a snippet of the output:

{
  "ok": true,
  "_shards": {
    "total": 22,
    "successful": 11,
    "failed": 0
  },
  ...
  "indices": {
    "df368b8be6406d1d4aa582ba7c44fa49": {
      "primaries": {
        "docs": {
          "count": 1586,
          "deleted": 0
        },
        "store": {
          "size": "386.1kb",
          "size_in_bytes": 395375,
          "throttle_time": "0s",
          "throttle_time_in_millis": 0
        },
        "indexing": {
          "index_total": 1586,
          "index_time": "369ms",
          "index_time_in_millis": 369
          ...

As indicated earlier in this post, Sugar index names are derived from the unique_key entry in the config.php file for a given instance. This usually takes the form of a GUID, as in the above example: df368b8be6406d1d4aa582ba7c44fa49
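If the server hosts many indexes, reading the raw _stats output gets tedious. Here is a minimal sketch in Python that reduces it to one line per index. The function name summarize_indexes is hypothetical, and the code assumes the response has the indices / primaries / docs / store layout shown in the snippet above.

```python
def summarize_indexes(stats):
    """Map each index name to its document count and size on disk.

    `stats` is the decoded JSON from /_stats, for example the result of
    json.loads() applied to the curl output.
    """
    summary = {}
    for name, data in stats.get("indices", {}).items():
        primaries = data.get("primaries", {})
        summary[name] = {
            "docs": primaries.get("docs", {}).get("count", 0),
            "size_in_bytes": primaries.get("store", {}).get("size_in_bytes", 0),
        }
    return summary
```

The keys of the returned dictionary are the index names, so for a Sugar instance you should see its unique_key among them.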

A variation of the previous curl command will allow us to focus on the stats pertaining to just a single index:

curl -XGET 'http://localhost:9200/df368b8be6406d1d4aa582ba7c44fa49/_stats'

You may change the name of an index by simply adjusting the unique_key value in config.php prior to creating it. Do note that a single Elasticsearch server can host multiple indexes (for Sugar or other purposes). This means you should take care when using custom index names as you would not want the name to be shared across multiple Sugar instances.

Also note the docs and store sections in the _stats output above, as they give us some insight into the contents of the index: the number of documents it holds and how much disk space it occupies.

It's important to understand that within the context of Elasticsearch, an index simply equates to a container. Once an index is defined, you must then populate it with documents (docs). Each Sugar record that is indexed is essentially a document within an index. This also means it is possible to have an index on the Elasticsearch server that is empty, that is, one that does not contain any documents.

Within Sugar, two distinct actions handle the creation of the index and its subsequent population. The former is handled via the Administration panel and the latter is a task run by cron. This underscores the importance of ensuring cron is configured properly; otherwise the index will not populate, leaving the Global Search feature in Sugar in a non-functional state.
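Because creation and population are separate steps, a quick way to troubleshoot Global Search is to tell apart an index that is missing, one that exists but is empty, and one that holds documents. The following is a sketch in Python under the assumption that the _stats response looks like the snippet shown earlier; index_state is a hypothetical helper name.

```python
def index_state(stats, index_name):
    """Classify an index as 'missing', 'empty' or 'populated'.

    `stats` is the decoded JSON from /_stats (or /[index_name]/_stats).
    """
    entry = stats.get("indices", {}).get(index_name)
    if entry is None:
        # The index was never created from the Administration panel.
        return "missing"
    count = entry.get("primaries", {}).get("docs", {}).get("count", 0)
    # An existing index with no documents may mean cron has not run yet.
    return "populated" if count > 0 else "empty"
```

A result of 'empty' some time after the index was created is a hint to review the cron configuration.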

What if we want to know more about a given index? To view the definition of the index itself, execute the following command:

curl -XGET 'http://localhost:9200/[index_name]/_mapping'

...said command will return something similar to the following:

{
  "df368b8be6406d1d4aa582ba7c44fa49": {
    "Notes": {
      "properties": {
        "doc_owner": {
          "type": "string",
          "index": "not_analyzed",
          "norms": {
            "enabled": false
          },
          "index_options": "docs"
        },
        "module": {
          "type": "string",
          "index": "not_analyzed",
          "norms": {
            "enabled": false
          }
          ...

Those of you familiar with the Sugar database structure or modules will notice there are some commonalities with the structure of the index. 

First, notice that the index includes module names. In the above, we see an excerpt of the definition as it relates to the Notes module. Within that definition we also see a listing of the fields that are indexed. In this snippet, doc_owner and module are among the indexed fields for the Notes module. When a user performs a search through the Global Search feature in Sugar, the search runs against those indexed fields. This list is for the most part static, but can be changed by enabling the Full Text Search Enabled option within the properties of a field in Studio.
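To see at a glance which fields are indexed for each module, the mapping output can be flattened. The sketch below is written in Python with a hypothetical function name, indexed_fields, and it assumes the layout shown in the excerpt above (index name, then module, then properties), which is how version 0.90.10 returned it here; other versions may nest the data differently.

```python
def indexed_fields(mapping, index_name):
    """Map each module in the index to a sorted list of its indexed fields.

    `mapping` is the decoded JSON from /[index_name]/_mapping.
    """
    modules = mapping.get(index_name, {})
    return {
        module: sorted(definition.get("properties", {}))
        for module, definition in modules.items()
    }
```

After changing Full Text Search Enabled on a field in Studio, comparing the output before and after is one way to check whether the field made it into the mapping.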

One final note of relevance: Elasticsearch saves the data that corresponds to the indexed fields within its datastore. This underscores the importance of not allowing any system other than the Sugar server to connect to the Elasticsearch server. Because anyone who can access the Elasticsearch server can query its indexes, exposing it more widely could inadvertently expose Sugar data to the world at large.

Hopefully that will get you started and help answer some of your questions pertaining to Sugar and its use of Elasticsearch.

In the next post on the topic we will look at some further examples of how one can interact with Elasticsearch, including how we go about querying the index for documents (records).

