You can use the Atlas Search autocomplete type to index text values in
string fields for autocompletion. You can query fields indexed as the
autocomplete type using the autocomplete operator.
You can also use the autocomplete type to index:
Fields whose value is an array of strings. To learn more, see How to Index the Elements of an Array.
String fields inside an array of documents indexed as the embeddedDocuments type.
Tip
If you have a large number of documents and a wide range of data
against which you want to run Atlas Search queries using the
autocomplete operator, building this index can take
some time. To reduce the impact on other indexes and queries while
the index builds, you can create a separate index that contains only
the autocomplete type.
To learn more, see Atlas Search Index Performance Considerations.
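The tip above can be sketched in Python as a way to keep autocomplete mappings in their own index definition. This is a minimal sketch under stated assumptions: the helper name split_autocomplete_fields and the plot field mapping are illustrative, both resulting definitions use static mappings, and actually creating the two indexes (for example, in the Atlas UI) is not shown.

```python
import json
from typing import Any, Dict, List, Tuple


def _as_list(mapping: Any) -> List[Dict[str, Any]]:
    """Normalize a field mapping: a single dict, or a list of dicts for multiple types."""
    return mapping if isinstance(mapping, list) else [mapping]


def _collapse(entries: List[Dict[str, Any]]) -> Any:
    """Use a plain dict for one type and a list for several, as index definitions do."""
    return entries[0] if len(entries) == 1 else entries


def split_autocomplete_fields(
    fields: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split static field mappings into two index definitions.

    The first definition holds only autocomplete mappings; the second holds
    everything else. Both use static mappings ("dynamic": False).
    """
    autocomplete_fields: Dict[str, Any] = {}
    other_fields: Dict[str, Any] = {}
    for name, mapping in fields.items():
        entries = _as_list(mapping)
        ac = [m for m in entries if m.get("type") == "autocomplete"]
        rest = [m for m in entries if m.get("type") != "autocomplete"]
        if ac:
            autocomplete_fields[name] = _collapse(ac)
        if rest:
            other_fields[name] = _collapse(rest)

    def definition(f: Dict[str, Any]) -> Dict[str, Any]:
        return {"mappings": {"dynamic": False, "fields": f}}

    return definition(autocomplete_fields), definition(other_fields)


# Illustrative mappings: title is indexed as two types, plot as string only.
fields = {
    "title": [
        {"type": "autocomplete", "tokenization": "edgeGram", "minGrams": 2, "maxGrams": 15},
        {"type": "string"},
    ],
    "plot": {"type": "string"},
}
ac_definition, main_definition = split_autocomplete_fields(fields)
print(json.dumps(ac_definition, indent=2))
print(json.dumps(main_definition, indent=2))
```

The design choice here is to split per type rather than per field, so a field mapped as both autocomplete and string keeps its string mapping in the main index.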
Atlas Search doesn't dynamically index
fields of type autocomplete. You must use static mappings to index autocomplete fields. You can
use the Visual Editor or the JSON Editor in the Atlas UI
to index fields of type autocomplete.
Define the Index for the autocomplete Type
To define the index for the autocomplete type, choose your preferred
configuration method in the Atlas UI and then select the
database and collection.
Click Refine Your Index to configure your index.
In the Field Mappings section, click Add Field to open the Add Field Mapping window.
Click Customized Configuration.
Select the field to index from the Field Name dropdown.
Note
You can't index fields that contain the dollar ($) sign at the start of the field name.
For field names that contain the term email or url, the Atlas Search Visual Editor recommends using a custom analyzer with the uaxUrlEmail tokenizer for indexing email addresses or URL values. Click Create urlEmailAnalyzer to create and apply the custom analyzer to the Autocomplete Properties for the field.
Click the Data Type dropdown and select Autocomplete.
(Optional) Expand and configure the Autocomplete Properties for the field. To learn more, see Configure
autocomplete Field Properties.
Click Add.
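The field-name rules in the steps above can be checked before you build a mapping. The following Python sketch is hypothetical: check_field_name is not an Atlas API, the case-insensitive substring match for email and url is an assumption about how the Visual Editor decides to offer urlEmailAnalyzer, and a naive substring match will also flag names such as curly.

```python
from typing import Optional


def check_field_name(path: str) -> Optional[str]:
    """Hypothetical pre-check that mirrors the field-name notes above.

    Raises ValueError if any segment of a dotted path starts with "$", which
    Atlas Search can't index. Returns a suggested analyzer name for fields whose
    name contains "email" or "url" (an assumed, case-insensitive match), else None.
    """
    if any(segment.startswith("$") for segment in path.split(".")):
        raise ValueError(f"cannot index {path!r}: field names must not start with '$'")
    lowered = path.lower()
    if "email" in lowered or "url" in lowered:
        # The Visual Editor offers a custom analyzer built on the uaxUrlEmail tokenizer.
        return "urlEmailAnalyzer"
    return None


for name in ["title", "email", "$price"]:
    try:
        print(name, "->", check_field_name(name))
    except ValueError as err:
        print(name, "->", err)
```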
The following is the JSON syntax for the autocomplete type.
If you use the JSON Editor, replace the default index definition with the following. To learn more
about the fields, see Configure autocomplete Field Properties.
{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "<field-name>": {
        "type": "autocomplete",
        "analyzer": "<lucene-analyzer>",
        "tokenization": "edgeGram|rightEdgeGram|nGram",
        "minGrams": <2>,
        "maxGrams": <15>,
        "foldDiacritics": true|false
      }
    }
  }
}
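To see the JSON syntax filled in, the following Python sketch builds a single-field autocomplete index definition as a dictionary. The helper name autocomplete_index is hypothetical, its keyword-argument defaults are assumed to match the Atlas Search defaults (pass explicit values if you need certainty), and the range checks are illustrative rather than the exact validation Atlas performs.

```python
import json
from typing import Any, Dict

TOKENIZATIONS = {"edgeGram", "rightEdgeGram", "nGram"}


def autocomplete_index(
    field: str,
    *,
    analyzer: str = "lucene.standard",
    tokenization: str = "edgeGram",
    min_grams: int = 2,
    max_grams: int = 15,
    fold_diacritics: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper that fills in the autocomplete JSON syntax for one field.

    Sets "dynamic": False so that only the listed field is indexed; the
    autocomplete field itself always needs a static mapping.
    """
    if tokenization not in TOKENIZATIONS:
        raise ValueError(f"tokenization must be one of {sorted(TOKENIZATIONS)}")
    if not 0 < min_grams <= max_grams:
        raise ValueError("expected 0 < min_grams <= max_grams")
    return {
        "mappings": {
            "dynamic": False,
            "fields": {
                field: {
                    "type": "autocomplete",
                    "analyzer": analyzer,
                    "tokenization": tokenization,
                    "minGrams": min_grams,
                    "maxGrams": max_grams,
                    "foldDiacritics": fold_diacritics,
                }
            },
        }
    }


definition = autocomplete_index("title", min_grams=3, max_grams=5, fold_diacritics=False)
print(json.dumps(definition, indent=2))
```

With min_grams=3, max_grams=5, and fold_diacritics=False, the output matches the title index definition in the example later on this page.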
Configure autocomplete Field Properties
The Atlas Search autocomplete type takes the following parameters:
| Option | Type | Necessity | Description | Default |
|---|---|---|---|---|
| type | string | required | Human-readable label that identifies this field type. Value must be autocomplete. | |
| analyzer | string | optional | Name of the analyzer to use with this autocomplete mapping. You can use any Atlas Search analyzer except the … | lucene.standard |
| maxGrams | int | optional | Maximum number of characters per indexed sequence. The value limits the character length of indexed tokens. When you search for terms longer than the … | 15 |
| minGrams | int | optional | Minimum number of characters per indexed sequence. We recommend … | 2 |
| tokenization | enum | optional | Tokenization strategy to use when indexing the field for autocompletion. Value can be one of the following: edgeGram, to create indexable tokens (grams) from variable-length character sequences starting at the left side of the words; rightEdgeGram, to create grams starting at the right side of the words; nGram, to create grams by sliding a variable-length character window over a word. When tokenized with a … Indexing a field for autocomplete with an … For the specified tokenization strategy, Atlas Search applies a process, sometimes referred to as "shingling", that concatenates sequential tokens before emitting them. Atlas Search emits tokens between … | edgeGram |
| foldDiacritics | boolean | optional | Flag that indicates whether to perform normalizations such as including or removing diacritics from the indexed text. Value can be one of the following: true, to ignore diacritic marks in the index and query text (for example, a search for cafè also matches cafe); false, to include diacritic marks in the index and query text. | true |
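To make the tokenization options concrete, the following Python sketch models which grams each strategy produces for a single token. It is a simplified illustration, not the Atlas Search implementation: it assumes tokens shorter than minGrams produce no grams, and it ignores the shingling of sequential tokens described for the tokenization option.

```python
from typing import List


def grams(token: str, tokenization: str, min_grams: int, max_grams: int) -> List[str]:
    """Simplified model of the grams each tokenization strategy emits for one token."""
    n = len(token)
    lengths = range(min_grams, min(max_grams, n) + 1)
    if tokenization == "edgeGram":
        # Left-anchored sequences (prefixes).
        return [token[:k] for k in lengths]
    if tokenization == "rightEdgeGram":
        # Right-anchored sequences (suffixes).
        return [token[n - k:] for k in lengths]
    if tokenization == "nGram":
        # Every window of each allowed length, sliding left to right.
        return [token[i:i + k] for k in lengths for i in range(n - k + 1)]
    raise ValueError(f"unknown tokenization: {tokenization}")


for strategy in ["edgeGram", "rightEdgeGram", "nGram"]:
    print(strategy, grams("godfather", strategy, 3, 5))
```

The nGram branch grows much faster than the other two, which is one way to see why nGram indexes can become large.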
Try an Example for the autocomplete Type
The following index definition example uses the sample_mflix.movies collection. If you have the sample data already loaded on your cluster, you can use the Visual Editor or JSON Editor in the Atlas UI to configure the index. After you select your preferred configuration method, select the database and collection, and refine your index to add field mappings.
The following index definition example indexes only the title
field as the autocomplete type to support search-as-you-type
queries against that field using the autocomplete
operator. The index definition also specifies the following:
Use the standard analyzer to divide text values into terms based on word boundaries.
Use the edgeGram tokenization strategy to index characters starting at the left side of the words.
Index a minimum of 3 characters per indexed sequence.
Index a maximum of 5 characters per indexed sequence.
Include diacritic marks in the index and query text.
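The settings listed above can be illustrated with a small Python model of what gets indexed for a title. This is a rough sketch under stated assumptions: approx_standard_tokens only approximates lucene.standard by lowercasing and splitting on word characters, the titles are illustrative inputs, and shingling of sequential tokens is not modeled.

```python
import re
from typing import Dict, List


def approx_standard_tokens(text: str) -> List[str]:
    """Rough stand-in for lucene.standard: lowercase, then split on word boundaries."""
    return re.findall(r"\w+", text.lower())


def title_edge_grams(title: str, min_grams: int = 3, max_grams: int = 5) -> Dict[str, List[str]]:
    """Map each token to the left-anchored grams an edgeGram 3-5 mapping would index.

    Diacritics are kept as-is, matching "foldDiacritics": false.
    """
    return {
        token: [token[:k] for k in range(min_grams, min(max_grams, len(token)) + 1)]
        for token in approx_standard_tokens(title)
    }


print(title_edge_grams("The Godfather: Part II"))
print(title_edge_grams("Amélie"))
```

In this model, no gram is longer than five characters, and the two-character token ii produces no grams because it is shorter than the minimum of three.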
In the Add Field Mapping window, select title from the Field Name dropdown.
Click the Data Type dropdown and select Autocomplete.
Make the following changes to the Autocomplete Properties:
Max Grams: Set value to 5.
Min Grams: Set value to 3.
Tokenization: Select edgeGram from the dropdown.
Fold Diacritics: Select false from the dropdown.
Click Add.
If you use the JSON Editor, replace the default index definition with the following index definition.
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "autocomplete",
        "analyzer": "lucene.standard",
        "tokenization": "edgeGram",
        "minGrams": 3,
        "maxGrams": 5,
        "foldDiacritics": false
      }
    }
  }
}
The following index definition example also uses the sample_mflix.movies collection. Configure it in the Visual Editor or JSON Editor in the same way as the previous example.
You can also index a field as other types by specifying those
types in an array. For example, the following index definition
indexes the title field as the following types:
autocomplete type to support autocompletion for queries using the autocomplete operator.
string type to support text search using operators such as text, phrase, and so on.
In the Add Field Mapping window, select title from the Field Name dropdown.
Click the Data Type dropdown and select Autocomplete.
Make the following changes to the Autocomplete Properties:
Max Grams: Set value to 15.
Min Grams: Set value to 2.
Tokenization: Select edgeGram from the dropdown.
Fold Diacritics: Select false from the dropdown.
Click Add.
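To add the second mapping, repeat the field mapping steps: in the Field Mappings section, click Add Field, click Customized Configuration, and select title from the Field Name dropdown.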
Click the Data Type dropdown and select String.
Accept the default String Properties settings and click Add.
If you use the JSON Editor, replace the default index definition with the following index definition.
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "analyzer": "lucene.standard",
          "tokenization": "edgeGram",
          "minGrams": 2,
          "maxGrams": 15,
          "foldDiacritics": false
        },
        {
          "type": "string"
        }
      ]
    }
  }
}
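With both the autocomplete and string mappings on title, the same field can serve two kinds of queries. The following Python sketch builds the two aggregation pipelines as plain data; it assumes the index is named default and that you would pass each list to an aggregate call on the sample_mflix.movies collection (for example, PyMongo's Collection.aggregate). Only pipeline construction is shown, so it runs without a cluster.

```python
import json
from typing import Any, Dict, List


def _search_stage(operator: str, query: str, index: str) -> Dict[str, Any]:
    """Build a $search stage that runs one operator against the title field."""
    return {"$search": {"index": index, operator: {"query": query, "path": "title"}}}


def autocomplete_pipeline(prefix: str, index: str = "default", limit: int = 5) -> List[Dict[str, Any]]:
    """Search-as-you-type: relies on the autocomplete mapping of title."""
    return [
        _search_stage("autocomplete", prefix, index),
        {"$limit": limit},
        {"$project": {"_id": 0, "title": 1}},
    ]


def text_pipeline(terms: str, index: str = "default", limit: int = 5) -> List[Dict[str, Any]]:
    """Full-text search: relies on the string mapping of the same title field."""
    return [
        _search_stage("text", terms, index),
        {"$limit": limit},
        {"$project": {"_id": 0, "title": 1}},
    ]


print(json.dumps(autocomplete_pipeline("godf"), indent=2))
print(json.dumps(text_pipeline("godfather"), indent=2))
```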
The following index definition example uses the sample_mflix.users collection. If you have the sample data already loaded on your cluster, you can use the Visual Editor or JSON Editor in the Atlas UI to configure the index. After you select your preferred configuration method, select the database and collection, and refine your index to add field mappings.
The following index definition example indexes only the email
field as the autocomplete type to support search-as-you-type
queries against that field using the autocomplete
operator. The index definition specifies the following:
Use the keyword analyzer to accept a string or array of strings as a parameter and index them as a single term (token).
Use the nGram tokenization strategy to tokenize text into chunks, or "n-grams", of given sizes.
Index a minimum of 3 characters per indexed sequence.
Index a maximum of 15 characters per indexed sequence.
Include diacritic marks in the index and query text.
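Combining the keyword analyzer with nGram tokenization means every substring of the whole email address, within the size limits above, becomes indexable. The following Python sketch is a simplified model, not the Atlas implementation, and the address ned@example.org is hypothetical.

```python
from typing import List


def keyword_ngrams(value: str, min_grams: int = 3, max_grams: int = 15) -> List[str]:
    """Simplified model of lucene.keyword plus nGram tokenization.

    The keyword analyzer keeps the whole value as a single token (no lowercasing),
    and nGram emits every substring whose length is between min_grams and max_grams.
    """
    token = value
    n = len(token)
    return [
        token[i:i + k]
        for k in range(min_grams, min(max_grams, n) + 1)
        for i in range(n - k + 1)
    ]


email_grams = keyword_ngrams("ned@example.org")  # hypothetical address
print(len(email_grams))
print("example" in email_grams, "@ex" in email_grams)
```

Because any substring is indexed, a partial query such as example can match the middle of an address. This also shows how nGram indexes grow: a 15-character value already yields 91 grams in this model.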
You can also use the uaxUrlEmail tokenizer to tokenize
URLs and email addresses. To learn more, see
uaxUrlEmail.
In the Add Field Mapping window, select email from the Field Name dropdown.
Click the Data Type dropdown and select Autocomplete.
Make the following changes to the Autocomplete Properties:
Analyzer: Select lucene.keyword from the dropdown.
Max Grams: Set value to 15.
Min Grams: Set value to 3.
Tokenization: Select nGram from the dropdown.
Fold Diacritics: Select false from the dropdown.
Click Add.
If you use the JSON Editor, replace the default index definition with the following index definition.
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "email": {
        "type": "autocomplete",
        "analyzer": "lucene.keyword",
        "tokenization": "nGram",
        "minGrams": 3,
        "maxGrams": 15,
        "foldDiacritics": false
      }
    }
  }
}
Learn More
To learn more about the autocomplete operator and see example queries, see autocomplete.