Working with the Vector Index
Introduction
The Vector Index is part of the TopBraid AI Services. It enables similarity search based on AI language models. This document describes how to enable the Vector Index for an asset collection and how to use it with the AutoClassifier, AI Linking, Crosswalks, and SPARQL.
Enable the Vector Index
Enabling the Vector Index for an Asset Collection
All TopBraid AI Service features, including the Vector Index, are bundled in the TopBraid AI Service collection, which needs to be included in the asset collection.
To do that, go to Settings → Includes:

At the top, search by name for AI Service, check it, and press Next:

It’s required to define the classes and properties that should be used by the Vector Index.
That configuration can be found on the start page of the asset collection.
On the right, select Vector Index Configuration.

Select the classes that should be indexed.
The screenshot shows an example for a taxonomy.
All instances of Concept will be indexed with the content of the properties preferred label and alternative label.
Additional properties that describe the instance, such as description, should be added if they are used.
Each instance requires a label for the indexing.
The label is taken from the first property, in the configured order, that has a value for the instance.
Properties marked as keyword are used by the keyword and hybrid search methods.
Only label properties should be configured as keyword properties.
Description properties may contain keywords of related resources and could distort the search results.
It’s required to mark at least one property as a keyword property.
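The label rule above (first property in the configured order that can be found) can be pictured with a short Python sketch. This is only an illustration, not the Vector Index implementation: the function name `pick_index_label`, the dictionary representation of an instance, and the choice of the first non-empty value when a property has several values are all assumptions.

```python
from typing import Dict, List, Optional


def pick_index_label(instance: Dict[str, List[str]],
                     label_properties: List[str]) -> Optional[str]:
    """Return the label from the first configured property that has a value.

    `instance` maps property names to their literal values (hypothetical
    representation). `label_properties` is the configured order.
    """
    for prop in label_properties:
        # Ignore empty or whitespace-only values.
        values = [v for v in instance.get(prop, []) if v.strip()]
        if values:
            return values[0]  # assumption: the first value wins
    return None  # no label found; such an instance could not be indexed


concept = {"alternative label": ["NYC"], "description": ["A city in the USA"]}
order = ["preferred label", "alternative label"]
print(pick_index_label(concept, order))
```

Here the concept has no preferred label, so the sketch falls back to the alternative label.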

If instances of the indexed classes already existed before the index was created, they must be pushed to the index once.
This can be done using the Push to Vector Index Modify action shown below.
All changes made after enabling the Vector Index are synchronized automatically.

Indexing can run in the foreground, which shows the progress, or in the background. If there are more than 200 instances to index, set run as background job to true. In that case, a notification is shown once the indexing is done or if any errors occurred.

Changing the Vector Index Configuration
The Vector Index must be rebuilt whenever the configuration changes, that is, when classes or properties have been added or removed. To reindex, perform the following steps:
1. Delete the Vector Index
2. Create the Vector Index
3. Push to Vector Index
Using the Vector Index in a Content Tag Set
Any asset collection for which the Vector Index has been enabled can be used by the AutoClassifier in the Content Tag Set. Unlike using Maui Server, this method doesn’t require a training step.
See also
See Content Classification in EDG for a detailed guide on content classification.
After creating a Content Tag Set, the AutoClassifier must be configured.
Go to Manage → Advanced → Configure AutoClassifier.

Under Content properties, select all properties with content that should be used by the AutoClassifier.
In this example, content and title are used, but other properties like filename can be of interest if the documents have meaningful filenames.
The Tag Selection Strategy acts as a filter on the concepts of the taxonomy.
In this example, only the most specific tags are used to ignore concepts with child nodes.
The Probability threshold must be adapted to the Content Tag Set.
Each combination of a corpus and a taxonomy has its own reasonable threshold.
Check some documents in the Taggings tab to find a good threshold value.
Once finished, press the Save Changes button.
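To get a feeling for how the probability threshold affects the result, it can help to compare several candidate values on a few documents. The following Python sketch does this on made-up data. The document names, concept names, and probabilities are invented for illustration, the function `tags_per_threshold` is hypothetical, and keeping tags whose probability is greater than or equal to the threshold is an assumption.

```python
from typing import Dict, List


def tags_per_threshold(doc_scores: Dict[str, Dict[str, float]],
                       thresholds: List[float]) -> Dict[float, Dict[str, List[str]]]:
    """For each candidate threshold, list the tags each document would keep."""
    result: Dict[float, Dict[str, List[str]]] = {}
    for threshold in thresholds:
        result[threshold] = {
            doc: sorted(tag for tag, prob in scores.items() if prob >= threshold)
            for doc, scores in doc_scores.items()
        }
    return result


# Invented sample: recommended concepts with probabilities per document.
sample = {
    "doc-1": {"Islands": 0.91, "Asia": 0.72, "Travel": 0.40},
    "doc-2": {"Volcanoes": 0.65, "Travel": 0.55},
}

for threshold, tags in tags_per_threshold(sample, [0.5, 0.7, 0.9]).items():
    print(threshold, tags)
```

A threshold that leaves many documents without any tag is probably too high for the given corpus and taxonomy, while one that keeps loosely related concepts is probably too low.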

The Taggings tab should show documents from the corpus.
Select one to see the concepts found by the AutoClassifier in Recommended Concepts.

Use the Vector Index for AI Linking
AI Linking leverages the Vector Index to add properties that refer to other asset collections using the search methods provided by the Vector Index. Based on properties given in the configuration, AI Linking will search for matching resources in the target asset collection. Any asset collection for which the Vector Index has been enabled can be used as a target for AI Linking.
Enable AI Linking for a property
AI Linking must be configured in the property shape in an ontology. The following example shows how to enable AI Linking for the related property of SKOS Concept. The content of the source properties is used to search for matching resources in the target asset collection. For the related property, definition and preferred label should contain the information that the target should match. In other cases, there could be a dedicated literal that matches better. For example, if there is already a literal for brand and AI Linking should add a property to a catalog of brands, that literal property should be used as the source. The asset collection that contains the target must be selected in target graph. A search options data structure must be added where further settings can be configured. The search options can be used by multiple AI Linking property configurations.

Search options can be used to tweak the search for better results. The most important settings are:
Parameter | Description |
---|---|
search alpha | The relative weight of keyword and vector search for the hybrid search, in the range between 0 and 1. 0 is pure keyword search, and 1 is pure vector search. As the hybrid search includes a normalization step, setting this value to 0 or 1 may not give the same result as changing the method to keyword or vector. |
search limit | The maximum number of results that will be shown. |
search method | The search method used by the Vector Index. - exactPhraseMatch uses the keyword search to find full matches of a phrase. For example, New York doesn’t match York, only New York. - hybrid combines the results of keyword and vector. It gives a high probability to exact matches and adds semantic similarity to the mix. As it gives the best results for most use cases, it’s used as the default. - keyword uses BM25 to rank exact matches. - vector uses the configured embedding model to calculate the cosine similarity as the basis for the probability. |
search threshold | A threshold value for the search score. Only results with a score greater than or equal to the threshold will be shown. |
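The interplay of search alpha, search threshold, and search limit can be illustrated with a small Python sketch. It is not the Vector Index implementation: the min-max normalization and the linear blend of the two scores are assumptions chosen for illustration, and the function names are hypothetical. Only the meaning of the three parameters is taken from the table above.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] (assumed normalization step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def hybrid_search(keyword_scores: Dict[str, float],
                  vector_scores: Dict[str, float],
                  alpha: float = 0.5,
                  threshold: float = 0.0,
                  limit: int = 10) -> List[Tuple[str, float]]:
    """Blend keyword and vector scores; alpha=0 is keyword only, alpha=1 vector only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    keyword = min_max_normalize(keyword_scores)
    vector = min_max_normalize(vector_scores)
    combined = {
        key: (1 - alpha) * keyword.get(key, 0.0) + alpha * vector.get(key, 0.0)
        for key in set(keyword) | set(vector)
    }
    # Highest score first; ties are broken by key for a stable order.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    # Keep scores greater than or equal to the threshold, then apply the limit.
    return [(key, score) for key, score in ranked if score >= threshold][:limit]


# Invented scores: BM25-like keyword scores and cosine similarities.
keyword_scores = {"a": 2.0, "b": 1.0, "c": 0.0}
vector_scores = {"a": 0.2, "b": 0.9, "c": 0.5}
print(hybrid_search(keyword_scores, vector_scores, alpha=0.5, threshold=0.5))
```

Because the scores are normalized before they are blended, the values returned for alpha 0 or 1 in this sketch differ from the raw keyword or vector scores, which mirrors the caveat in the search alpha description.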

Note
A separate ontology asset collection should be used for the AI Linking configuration if the underlying ontology is generic and not designed for a single target. The ontology for the configuration can be added under Settings → Includes, like in the following example where SKOS arXiv AI Linking contains the AI Linking configuration.

Applying AI Linking suggestions
AI Linking-based suggestions are shown in the Problems and Suggestions panel. Run AI Linking Suggestions must be enabled in the dropdown menu on the top right. It can be combined with other Run actions; to see only AI Linking suggestions, uncheck all other Run actions. The Apply button creates the suggested property. If the property shape allows multiple values, multiple suggestions can be applied.

Note
Problems and Suggestions can be triggered for smaller batches using batch actions. In the tree of the Taxonomy Concepts panel, batch actions can be triggered in the dropdown menu that opens on right click.
Use the Vector Index for Crosswalks
Any asset collection for which the Vector Index has been enabled can be used as a target in a Crosswalk.
See also
See Working with Crosswalks for a detailed guide on the crosswalk asset collection type.
After creating a Crosswalk, the matching method needs to be changed.
That configuration can be found on the start page of the asset collection.
On the right, select Crosswalk Configuration.

In the Crosswalk configuration, change the label matching method to vector index.

Run Problems and Suggestions to see the recommendations based on the Vector Index.
Use the Vector Index in Code
The Vector Index provides APIs for programmatic access.
SPARQL functions
Functions for the Vector Index are available in the AI service namespace (http://ai.topbraid.org/ai-service#).
You can leverage the Vector Index’s text search within a SPARQL query using the vectorIndexSearch function.
Below is a simple example that includes a filter to retrieve only results above a specified threshold.
This search is combined with a pattern to narrow the results to a subset of a taxonomy:
PREFIX ai: <http://ai.topbraid.org/ai-service#>
PREFIX g: <http://topquadrant.com/ns/examples/geography#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT * WHERE {
    "island" ai:vectorIndexSearch (?term ?score) .
    ?term skos:broader* g:Asia .
    FILTER(?score > 0.85)
}
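When the same query is needed for different search texts, subtrees, or thresholds, it can be generated from a template. The following Python sketch builds the query text shown above from parameters. The helper `build_vector_search_query` is hypothetical and only produces a string; how the query is sent to the server is left out. The ORDER BY clause is an addition that uses standard SPARQL to return the best matches first.

```python
def build_vector_search_query(text: str, root_concept: str, threshold: float) -> str:
    """Build a SPARQL query around ai:vectorIndexSearch (hypothetical helper).

    text         -- the search text passed to the Vector Index
    root_concept -- IRI of the concept whose subtree restricts the results
    threshold    -- only results with a score above this value are returned
    """
    if any(ch in root_concept for ch in '<>" \n\r\t'):
        raise ValueError("root_concept must be a plain IRI")
    # Escape characters that are not allowed inside a SPARQL string literal.
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f"""PREFIX ai: <http://ai.topbraid.org/ai-service#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?term ?score WHERE {{
    "{escaped}" ai:vectorIndexSearch (?term ?score) .
    ?term skos:broader* <{root_concept}> .
    FILTER(?score > {threshold})
}}
ORDER BY DESC(?score)"""


print(build_vector_search_query(
    "island",
    "http://topquadrant.com/ns/examples/geography#Asia",
    0.85,
))
```

Escaping the search text keeps user input from breaking out of the string literal, which matters as soon as the text does not come from a trusted source.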