Indexer - Milvus v2 (Recommended)

Milvus Vector Database Introduction

Milvus Vector Retrieval Service is a fully managed database service built on open-source Milvus. It provides efficient retrieval over unstructured data for a wide range of AI scenarios. Because the service is fully managed, customers do not need to manage the underlying hardware resources, which reduces usage costs and improves overall efficiency.

Because the company's internal Milvus service uses the standard SDK, the community version in EINO-ext can be used with it directly.

This package provides a Milvus 2.x (V2 SDK) indexer implementation for the EINO framework, supporting document storage and vector indexing.

Note: Server-side functions (such as BM25) require Milvus 2.5+. Basic functionality is compatible with lower versions.

Features

  • Milvus V2 SDK: Uses the latest milvus-io/milvus/client/v2 SDK
  • Flexible Index Types: Supports multiple index builders including Auto, HNSW, IVF series, SCANN, DiskANN, GPU indexes, and RaBitQ (Milvus 2.6+)
  • Hybrid Search Ready: Native support for hybrid storage of sparse vectors (BM25/SPLADE) and dense vectors
  • Server-side Vector Generation: Automatic sparse vector generation using Milvus Functions (BM25)
  • Automated Management: Automatic handling of collection schema creation, index building, and loading
  • Field Analysis: Configurable text analyzers (supporting Chinese Jieba, English, Standard, etc.)
  • Custom Document Conversion: Flexible mapping from Eino documents to Milvus columns

Installation

go get github.com/cloudwego/eino-ext/components/indexer/milvus2

Quick Start

package main

import (
        "context"
        "log"
        "os"

        "github.com/cloudwego/eino-ext/components/embedding/ark"
        "github.com/cloudwego/eino/schema"
        "github.com/milvus-io/milvus/client/v2/milvusclient"

        milvus2 "github.com/cloudwego/eino-ext/components/indexer/milvus2"
)

func main() {
        // Get environment variables
        addr := os.Getenv("MILVUS_ADDR")
        username := os.Getenv("MILVUS_USERNAME")
        password := os.Getenv("MILVUS_PASSWORD")
        arkApiKey := os.Getenv("ARK_API_KEY")
        arkModel := os.Getenv("ARK_MODEL")

        ctx := context.Background()

        // Create embedding model
        emb, err := ark.NewEmbedder(ctx, &ark.EmbeddingConfig{
                APIKey: arkApiKey,
                Model:  arkModel,
        })
        if err != nil {
                // log.Fatalf exits the process, so no return is needed
                log.Fatalf("Failed to create embedding: %v", err)
        }

        // Create indexer
        indexer, err := milvus2.NewIndexer(ctx, &milvus2.IndexerConfig{
                ClientConfig: &milvusclient.ClientConfig{
                        Address:  addr,
                        Username: username,
                        Password: password,
                },
                Collection: "my_collection",
                Vector: &milvus2.VectorConfig{
                        Dimension:    1024, // Match embedding model dimension
                        MetricType:   milvus2.COSINE,
                        IndexBuilder: milvus2.NewHNSWIndexBuilder().WithM(16).WithEfConstruction(200),
                },
                Embedding: emb,
        })
        if err != nil {
                log.Fatalf("Failed to create indexer: %v", err)
        }
        log.Printf("Indexer created successfully")

        // Store documents
        docs := []*schema.Document{
                {
                        ID:      "doc1",
                        Content: "Milvus is an open-source vector database",
                        MetaData: map[string]any{
                                "category": "database",
                                "year":     2021,
                        },
                },
                {
                        ID:      "doc2",
                        Content: "EINO is a framework for building AI applications",
                },
        }
        ids, err := indexer.Store(ctx, docs)
        if err != nil {
                log.Fatalf("Failed to store: %v", err)
        }
        log.Printf("Store success, ids: %v", ids)
}

Configuration Options

| Field | Type | Default | Description |
|---|---|---|---|
| Client | *milvusclient.Client | - | Pre-configured Milvus client (optional) |
| ClientConfig | *milvusclient.ClientConfig | - | Client configuration (required when Client is empty) |
| Collection | string | "eino_collection" | Collection name |
| Vector | *VectorConfig | - | Dense vector configuration (Dimension, MetricType, field name) |
| Sparse | *SparseVectorConfig | - | Sparse vector configuration (MetricType, field name) |
| IndexBuilder | IndexBuilder | AutoIndexBuilder | Index type builder |
| Embedding | embedding.Embedder | - | Embedder for vectorization (optional). If empty, documents must contain vectors (BYOV). |
| ConsistencyLevel | ConsistencyLevel | ConsistencyLevelDefault | Consistency level (ConsistencyLevelDefault uses Milvus default: Bounded; if not explicitly set, maintains collection-level setting) |
| PartitionName | string | - | Default partition for inserting data |
| EnableDynamicSchema | bool | false | Enable dynamic field support |
| Functions | []*entity.Function | - | Schema function definitions (e.g., BM25) for server-side processing |
| FieldParams | map[string]map[string]string | - | Field parameter configuration (e.g., enable_analyzer) |

Dense Vector Configuration (VectorConfig)

| Field | Type | Default | Description |
|---|---|---|---|
| Dimension | int64 | - | Vector dimension (required) |
| MetricType | MetricType | L2 | Similarity metric type (L2, IP, COSINE, etc.) |
| VectorField | string | "vector" | Dense vector field name |

Sparse Vector Configuration (SparseVectorConfig)

| Field | Type | Default | Description |
|---|---|---|---|
| VectorField | string | "sparse_vector" | Sparse vector field name |
| MetricType | MetricType | BM25 | Similarity metric type |
| Method | SparseMethod | SparseMethodAuto | Generation method (SparseMethodAuto or SparseMethodPrecomputed) |

Note: Method defaults to SparseMethodAuto only when MetricType is BM25. Auto means that sparse vectors are generated by Milvus server-side functions (remote functions). For other metric types (such as IP), the default is SparseMethodPrecomputed.
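
The defaulting rule in the note can be sketched as a small function. This is only an illustration of the documented behavior, not the package's code: the types below are local stand-ins, their string values are assumptions, and resolveSparseMethod is a hypothetical name.

```go
package main

import "fmt"

// Local stand-ins for the package's MetricType and SparseMethod types.
// The underlying string values are assumptions made for this sketch.
type MetricType string
type SparseMethod string

const (
	BM25 MetricType = "BM25"
	IP   MetricType = "IP"

	SparseMethodAuto        SparseMethod = "auto"
	SparseMethodPrecomputed SparseMethod = "precomputed"
)

// resolveSparseMethod is a hypothetical helper that mirrors the documented
// defaults: an explicitly set method is kept, an empty metric falls back to
// BM25, BM25 implies Auto, and every other metric implies Precomputed.
func resolveSparseMethod(metric MetricType, method SparseMethod) SparseMethod {
	if method != "" {
		return method
	}
	if metric == "" {
		metric = BM25
	}
	if metric == BM25 {
		return SparseMethodAuto
	}
	return SparseMethodPrecomputed
}

func main() {
	fmt.Println(resolveSparseMethod(BM25, "")) // auto
	fmt.Println(resolveSparseMethod(IP, ""))   // precomputed
}
```

In practice this means a configuration that only sets MetricType to IP is expected to receive sparse vectors from the documents themselves.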

Index Builders

Dense Index Builders

| Builder | Description | Key Parameters |
|---|---|---|
| NewAutoIndexBuilder() | Milvus automatically selects optimal index | - |
| NewHNSWIndexBuilder() | Graph-based high-performance index | M, EfConstruction |
| NewIVFFlatIndexBuilder() | Clustering-based search | NList |
| NewIVFPQIndexBuilder() | Product quantization, memory efficient | NList, M, NBits |
| NewIVFSQ8IndexBuilder() | Scalar quantization | NList |
| NewIVFRabitQIndexBuilder() | IVF + RaBitQ binary quantization (Milvus 2.6+) | NList |
| NewFlatIndexBuilder() | Brute-force exact search | - |
| NewDiskANNIndexBuilder() | Disk index for large datasets | - |
| NewSCANNIndexBuilder() | Fast search with high recall | NList, WithRawDataEnabled |
| NewBinFlatIndexBuilder() | Brute-force search for binary vectors | - |
| NewBinIVFFlatIndexBuilder() | Clustering search for binary vectors | NList |
| NewGPUBruteForceIndexBuilder() | GPU-accelerated brute-force search | - |
| NewGPUIVFFlatIndexBuilder() | GPU-accelerated IVF_FLAT | - |
| NewGPUIVFPQIndexBuilder() | GPU-accelerated IVF_PQ | - |
| NewGPUCagraIndexBuilder() | GPU-accelerated graph index (CAGRA) | IntermediateGraphDegree, GraphDegree |

Sparse Index Builders

| Builder | Description | Key Parameters |
|---|---|---|
| NewSparseInvertedIndexBuilder() | Inverted index for sparse vectors | DropRatioBuild |
| NewSparseWANDIndexBuilder() | WAND algorithm for sparse vectors | DropRatioBuild |

Example: HNSW Index

indexBuilder := milvus2.NewHNSWIndexBuilder().
        WithM(16).              // Maximum connections per node (4-64)
        WithEfConstruction(200) // Search width during index construction (8-512)

Example: IVF_FLAT Index

indexBuilder := milvus2.NewIVFFlatIndexBuilder().
        WithNList(256) // Number of cluster units (1-65536)

Example: IVF_PQ Index (Memory Efficient)

indexBuilder := milvus2.NewIVFPQIndexBuilder().
        WithNList(256). // Number of cluster units
        WithM(16).      // Number of sub-quantizers
        WithNBits(8)    // Bits per sub-quantizer (1-16)

Example: SCANN Index (Fast Search with High Recall)

indexBuilder := milvus2.NewSCANNIndexBuilder().
        WithNList(256).           // Number of cluster units
        WithRawDataEnabled(true)  // Enable raw data for reranking

Example: DiskANN Index (Large Datasets)

indexBuilder := milvus2.NewDiskANNIndexBuilder() // Disk-based, no additional parameters

Example: Sparse Inverted Index

indexBuilder := milvus2.NewSparseInvertedIndexBuilder().
        WithDropRatioBuild(0.2) // Ratio of small values to ignore during build (0.0-1.0)
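
The comments in the examples quote value ranges for several parameters. Whether the builders themselves reject out-of-range values is not stated here, so a caller may want a pre-flight check. The sketch below uses only the ranges quoted in those comments; all function names are hypothetical and not part of the milvus2 package.

```go
package main

import "fmt"

// inRange returns an error when v lies outside the closed interval [lo, hi].
func inRange(name string, v, lo, hi int) error {
	if v < lo || v > hi {
		return fmt.Errorf("%s=%d is outside [%d, %d]", name, v, lo, hi)
	}
	return nil
}

// checkHNSW validates M (4-64) and efConstruction (8-512).
func checkHNSW(m, efConstruction int) error {
	if err := inRange("M", m, 4, 64); err != nil {
		return err
	}
	return inRange("efConstruction", efConstruction, 8, 512)
}

// checkIVFFlat validates nlist (1-65536).
func checkIVFFlat(nlist int) error {
	return inRange("nlist", nlist, 1, 65536)
}

// checkNBits validates the IVF_PQ bits per sub-quantizer (1-16).
func checkNBits(nbits int) error {
	return inRange("nbits", nbits, 1, 16)
}

// checkDropRatio validates a sparse index drop ratio (0.0-1.0).
func checkDropRatio(r float64) error {
	if r < 0 || r > 1 {
		return fmt.Errorf("drop ratio %v is outside [0, 1]", r)
	}
	return nil
}

func main() {
	fmt.Println(checkHNSW(16, 200))  // <nil>
	fmt.Println(checkHNSW(128, 200)) // M=128 is outside [4, 64]
	fmt.Println(checkIVFFlat(256))   // <nil>
	fmt.Println(checkNBits(8))       // <nil>
	fmt.Println(checkDropRatio(0.2)) // <nil>
}
```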

Dense Vector Metrics

| Metric Type | Description |
|---|---|
| L2 | Euclidean distance |
| IP | Inner product |
| COSINE | Cosine similarity |
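
To make the three dense metrics concrete, the following sketch computes their textbook definitions for a pair of small vectors. It illustrates the math only and is not how Milvus evaluates them internally; the function names are made up for the example, and equal-length inputs are assumed.

```go
package main

import (
	"fmt"
	"math"
)

// dot returns the inner product of a and b (equal lengths assumed).
func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// l2 returns the Euclidean distance between a and b.
func l2(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return math.Sqrt(s)
}

// cosine returns the cosine similarity, or 0 if either vector has zero norm.
func cosine(a, b []float64) float64 {
	na, nb := math.Sqrt(dot(a, a)), math.Sqrt(dot(b, b))
	if na == 0 || nb == 0 {
		return 0
	}
	return dot(a, b) / (na * nb)
}

func main() {
	a := []float64{3, 4}
	b := []float64{4, 3}
	// Prints: L2=1.4142 IP=24.0 COSINE=0.96
	fmt.Printf("L2=%.4f IP=%.1f COSINE=%.2f\n", l2(a, b), dot(a, b), cosine(a, b))
}
```

For L2 a smaller value means a closer match, while for IP and COSINE a larger value does.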

Sparse Vector Metrics

| Metric Type | Description |
|---|---|
| BM25 | Okapi BM25 (SparseMethodAuto required) |
| IP | Inner product (for precomputed sparse vectors) |

Binary Vector Metrics

| Metric Type | Description |
|---|---|
| HAMMING | Hamming distance |
| JACCARD | Jaccard distance |
| TANIMOTO | Tanimoto distance |
| SUBSTRUCTURE | Substructure search |
| SUPERSTRUCTURE | Superstructure search |
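
As an illustration of the first two binary metrics, the sketch below computes Hamming and Jaccard distance over bit-packed vectors. It shows the standard definitions only; the function names are hypothetical, and treating two all-zero vectors as distance 0 is a choice made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

var errLength = errors.New("vectors must have the same length")

// hamming counts the bit positions where a and b differ.
func hamming(a, b []byte) (int, error) {
	if len(a) != len(b) {
		return 0, errLength
	}
	d := 0
	for i := range a {
		d += bits.OnesCount8(a[i] ^ b[i])
	}
	return d, nil
}

// jaccard returns 1 - |A and B| / |A or B| over the set bits of a and b.
// Two all-zero vectors are treated as identical (distance 0).
func jaccard(a, b []byte) (float64, error) {
	if len(a) != len(b) {
		return 0, errLength
	}
	inter, union := 0, 0
	for i := range a {
		inter += bits.OnesCount8(a[i] & b[i])
		union += bits.OnesCount8(a[i] | b[i])
	}
	if union == 0 {
		return 0, nil
	}
	return 1 - float64(inter)/float64(union), nil
}

func main() {
	a := []byte{0xF0} // 11110000
	b := []byte{0xAA} // 10101010
	h, _ := hamming(a, b)
	j, _ := jaccard(a, b)
	// Prints: HAMMING=4 JACCARD=0.6667
	fmt.Printf("HAMMING=%d JACCARD=%.4f\n", h, j)
}
```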

Sparse Vector Support

The indexer supports two sparse vector modes: Auto-Generation and Precomputed.

Auto-Generation (BM25)

Automatically generates sparse vectors from content fields using Milvus server-side functions.

  • Requirements: Milvus 2.5+
  • Configuration: Set MetricType: milvus2.BM25.

indexer, err := milvus2.NewIndexer(ctx, &milvus2.IndexerConfig{
    // ... basic configuration ...
    Collection:        "hybrid_collection",
    
    Sparse: &milvus2.SparseVectorConfig{
        VectorField: "sparse_vector",
        MetricType:  milvus2.BM25, 
        // Method defaults to SparseMethodAuto when using BM25
    },
    
    // Analyzer configuration for BM25
    FieldParams: map[string]map[string]string{
        "content": {
            "enable_analyzer": "true",
            "analyzer_params": `{"type": "standard"}`, // Use {"type": "chinese"} for Chinese
        },
    },
})

Precomputed (SPLADE, BGE-M3, etc.)

Allows storing sparse vectors generated by external models (such as SPLADE, BGE-M3) or custom logic.

  • Configuration: Set MetricType (usually IP) and Method: milvus2.SparseMethodPrecomputed.
  • Usage: Pass sparse vectors via doc.WithSparseVector().

indexer, err := milvus2.NewIndexer(ctx, &milvus2.IndexerConfig{
    Collection: "sparse_collection",
    
    Sparse: &milvus2.SparseVectorConfig{
        VectorField: "sparse_vector",
        MetricType:  milvus2.IP,
        Method:      milvus2.SparseMethodPrecomputed,
    },
})

// Store documents with sparse vectors
doc := &schema.Document{ID: "1", Content: "..."}
doc.WithSparseVector(map[int]float64{
    1024: 0.5,
    2048: 0.3,
})
if _, err = indexer.Store(ctx, []*schema.Document{doc}); err != nil {
    log.Fatalf("Failed to store: %v", err)
}
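
External models often return a dense slice of per-token weights, while the example above passes a map of index to weight. A minimal sketch of that conversion follows. toSparse is a hypothetical helper, not part of the package, and the threshold is an arbitrary illustration value.

```go
package main

import (
	"fmt"
	"math"
)

// toSparse is a hypothetical helper: it keeps the entries of a dense weight
// slice whose absolute value is greater than threshold, keyed by position.
func toSparse(weights []float64, threshold float64) map[int]float64 {
	sparse := make(map[int]float64)
	for i, w := range weights {
		if math.Abs(w) > threshold {
			sparse[i] = w
		}
	}
	return sparse
}

func main() {
	weights := []float64{0, 0.5, 0.01, 0.3}
	fmt.Println(toSparse(weights, 0.05)) // map[1:0.5 3:0.3]
}
```

The resulting map has the same shape as the argument to doc.WithSparseVector() shown above.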

Bring Your Own Vectors (BYOV)

If your documents already contain vectors, you can use the Indexer without configuring an Embedder.

// Create indexer without embedding
indexer, err := milvus2.NewIndexer(ctx, &milvus2.IndexerConfig{
    ClientConfig: &milvusclient.ClientConfig{
        Address: "localhost:19530",
    },
    Collection:   "my_collection",
    Vector: &milvus2.VectorConfig{
        Dimension:  128,
        MetricType: milvus2.L2,
    },
    // Embedding: nil, // Leave empty
})

// Store documents with precomputed vectors
docs := []*schema.Document{
    {
        ID:      "doc1",
        Content: "Document with existing vector",
    },
}

// Attach dense vector to document
// Vector dimension must match collection dimension
vector := []float64{0.1, 0.2 /* ..., 128 values in total */}
docs[0].WithDenseVector(vector)

// Attach sparse vector (optional, if Sparse is configured)
// Sparse vector is a mapping of index -> weight
sparseVector := map[int]float64{
    10: 0.5,
    25: 0.8,
}
docs[0].WithSparseVector(sparseVector)

ids, err := indexer.Store(ctx, docs)

For sparse vectors in BYOV mode, refer to the Precomputed section above for configuration.
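
Because BYOV vectors bypass the embedder, a dimension mismatch only surfaces when the data reaches the collection. A caller-side check such as the following sketch can catch it earlier. Both helpers are hypothetical, and toFloat64 is only relevant if the external model returns float32 values.

```go
package main

import "fmt"

// toFloat64 widens a float32 vector to the []float64 form used in the
// example above.
func toFloat64(v []float32) []float64 {
	out := make([]float64, len(v))
	for i, x := range v {
		out[i] = float64(x)
	}
	return out
}

// checkDimension is a hypothetical pre-flight check that compares a vector's
// length with the Dimension configured for the collection.
func checkDimension(vec []float64, dim int) error {
	if len(vec) != dim {
		return fmt.Errorf("vector has %d dimensions, collection expects %d", len(vec), dim)
	}
	return nil
}

func main() {
	vec := toFloat64([]float32{0.5, 0.25, 0.125})
	fmt.Println(checkDimension(vec, 3))   // <nil>
	fmt.Println(checkDimension(vec, 128)) // vector has 3 dimensions, collection expects 128
}
```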

Examples

See the examples directory at https://github.com/cloudwego/eino-ext/tree/main/components/indexer/milvus2/examples for complete example code:

  • demo - Basic collection setup using HNSW index
  • hnsw - HNSW index example
  • ivf_flat - IVF_FLAT index example
  • rabitq - IVF_RABITQ index example (Milvus 2.6+)
  • auto - AutoIndex example
  • diskann - DISKANN index example
  • hybrid - Hybrid search setup (dense + BM25 sparse) (Milvus 2.5+)
  • hybrid_chinese - Chinese hybrid search example (Milvus 2.5+)
  • sparse - Pure sparse index example (BM25)
  • byov - Bring Your Own Vectors example

Getting Help

If you have any questions or feature suggestions, feel free to join the oncall group.


Last modified January 20, 2026: feat(eino): sync En docs with zh docs (9da8ff724c)