NodeRefine Docs
NodeRefine is a performance layer for RAG systems. It transforms bloated vector databases into lean, high-precision knowledge assets through three progressive refinement stages: semantic de-duplication, contextual pruning, and topology re-linking.
This documentation covers SDK installation, API endpoints, core concepts, and industry-specific configuration templates.
Quickstart
Get NodeRefine running against your vector database in under 5 minutes.
1. Install the SDK
2. Initialize the Client
3. Run Your First Refinement
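The three steps above can be sketched end to end. Everything here is illustrative: the package name, client class, and method signatures are assumptions, since the SDK is distributed privately during the beta.

```python
# 1. Install the SDK (assumed package name):
#    pip install noderefine
from noderefine import NodeRefine

# 2. Initialize the client. nr_test_* keys enforce dry_run,
#    so this is safe to run against a real collection.
client = NodeRefine(api_key="nr_test_...")

# 3. Run your first refinement.
job = client.refine(
    collection="my-docs",
    strategies=["dedup", "prune", "relink"],
    dry_run=True,  # preview changes without applying them
)
print(job.id, job.status)
```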
Authentication
NodeRefine is currently in private beta. API keys are only issued to whitelisted users. Request access to receive your credentials.
All API requests require a Bearer token. Once your account is approved, you can generate API keys from the Lab console under Settings → API Keys.
API keys follow two conventions:
- `nr_live_*` — Production keys. All refinement operations are permanent.
- `nr_test_*` — Sandbox keys. Refinements are simulated (`dry_run` enforced).
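A minimal sketch of what every request must carry. The key value is a placeholder; real keys come from the Lab console as described above.

```python
API_KEY = "nr_live_example"  # placeholder; generate real keys in Settings -> API Keys

# Every API request must present the key as a Bearer token.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```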
Python SDK
The official Python SDK provides a high-level interface to all NodeRefine capabilities. Requires Python 3.9+.
Installation
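Assuming the package is published under the product name (unconfirmed during the private beta):

```shell
pip install noderefine
```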
Basic Usage
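A sketch of a typical synchronous workflow. The client class, `refine`, and `wait` are illustrative assumptions, not a confirmed API surface.

```python
from noderefine import NodeRefine  # hypothetical import path

client = NodeRefine(api_key="nr_test_...")

# Dedup and prune a collection, preview only.
job = client.refine(
    collection="support-kb",
    strategies=["dedup", "prune"],
    dry_run=True,
)

# Block until the job completes, then inspect the result.
result = client.wait(job.id)
print(result.summary)
```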
Async Usage
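The same workflow under asyncio, assuming a hypothetical async client that supports use as an async context manager:

```python
import asyncio
from noderefine import AsyncNodeRefine  # hypothetical async client

async def main() -> None:
    async with AsyncNodeRefine(api_key="nr_test_...") as client:
        job = await client.refine(
            collection="support-kb",
            strategies=["relink"],
        )
        result = await client.wait(job.id)
        print(result.summary)

asyncio.run(main())
```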
Rust SDK
The Rust SDK is optimized for high-throughput, low-latency refinement pipelines in production environments.
Installation
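Assuming the crate shares the product name (unconfirmed during the private beta):

```shell
cargo add noderefine
```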
Basic Usage
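A sketch of the equivalent workflow in Rust. The crate, types, and method names are assumptions; an async runtime such as Tokio is assumed given the SDK's high-throughput focus.

```rust
// Hypothetical API surface; verify names against the shipped crate.
use noderefine::{Client, RefineRequest, Strategy};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new(std::env::var("NODEREFINE_API_KEY")?);

    let job = client
        .refine(RefineRequest {
            collection: "support-kb".into(),
            strategies: vec![Strategy::Dedup, Strategy::Prune],
            dry_run: true,
            ..Default::default()
        })
        .await?;

    println!("job {} -> {:?}", job.id, job.status);
    Ok(())
}
```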
Semantic De-duplication
The first stage of the NodeRefine pipeline identifies chunks that are worded differently but carry overlapping semantic meaning. Unlike naive hash-based dedup, NodeRefine uses cross-encoder models to compute pairwise semantic similarity.
How It Works
- Candidate Selection — Fast bi-encoder pre-filtering narrows the comparison space from O(n²) to O(n·k) by selecting only the top-k nearest neighbors per node.
- Cross-Encoder Scoring — Each candidate pair is scored by a high-precision cross-encoder. Pairs above the threshold (default: 0.88) are marked as duplicates.
- Merge Resolution — The node with the highest aggregate retrieval score is promoted. Metadata from the demoted node is absorbed, preserving context breadth.
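The three steps can be sketched with toy vectors. This is a self-contained illustration, not NodeRefine's implementation: plain cosine similarity stands in for both the bi-encoder pre-filter and the cross-encoder scorer, and the node records are invented.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

nodes = {
    "a": {"vec": [1.0, 0.0], "retrieval_score": 0.9, "meta": {"src": "faq.md"}},
    "b": {"vec": [0.99, 0.05], "retrieval_score": 0.4, "meta": {"src": "kb.html", "lang": "en"}},
    "c": {"vec": [0.0, 1.0], "retrieval_score": 0.7, "meta": {"src": "guide.md"}},
}

# 1. Candidate selection: only the top-k nearest neighbours per node (k=1 here),
#    shrinking the pairwise comparison space from O(n^2) to O(n*k).
k = 1
candidates = set()
for nid, node in nodes.items():
    ranked = sorted(
        (o for o in nodes if o != nid),
        key=lambda o: cosine(node["vec"], nodes[o]["vec"]),
        reverse=True,
    )
    candidates.update(tuple(sorted((nid, o))) for o in ranked[:k])

# 2. Scoring: candidate pairs above the threshold are duplicates.
THRESHOLD = 0.88
duplicates = [
    (x, y) for x, y in candidates
    if cosine(nodes[x]["vec"], nodes[y]["vec"]) >= THRESHOLD
]

# 3. Merge resolution: promote the higher-scoring node and absorb the
#    demoted node's metadata without overwriting existing keys.
for x, y in duplicates:
    if x not in nodes or y not in nodes:
        continue  # already merged away in an earlier pair
    keep, drop = (x, y) if nodes[x]["retrieval_score"] >= nodes[y]["retrieval_score"] else (y, x)
    for key, val in nodes[drop]["meta"].items():
        nodes[keep]["meta"].setdefault(key, val)
    del nodes[drop]
```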
Configuration
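A possible shape for the strategy overrides passed via the `config` parameter of `/v1/refine`. Only the 0.88 default comes from the text above; the key names are illustrative assumptions.

```json
{
  "dedup": {
    "similarity_threshold": 0.88,
    "candidate_top_k": 20,
    "merge_strategy": "highest_retrieval_score"
  }
}
```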
Contextual Pruning
The second stage removes noise from individual chunks. This includes outdated metadata, formatting artifacts, conversion debris (from PDF/HTML extraction), and low-information filler text.
What Gets Pruned
- Dead metadata — Timestamps, file paths, page numbers that add no retrieval value.
- Format artifacts — HTML tags, markdown escapes, OCR errors from document conversion.
- Semantic filler — Repeated boilerplate, disclaimers, headers/footers that appear across multiple chunks.
- Orphan nodes — Chunks with fewer than N retrievals over a defined period and below a similarity floor.
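The first three categories can be sketched as a single cleanup pass over one chunk. This is a self-contained illustration: the regexes and boilerplate list are invented, not NodeRefine's heuristics, and orphan-node detection (a retrieval-statistics question) is out of scope here.

```python
import re

BOILERPLATE = {"confidential - do not distribute"}  # filler seen across chunks

def prune_chunk(text: str) -> str:
    text = re.sub(r"<[^>]+>", "", text)           # format artifacts: HTML tags
    text = re.sub(r"\\([*_#\[\]])", r"\1", text)  # format artifacts: markdown escapes
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.lower() in BOILERPLATE:       # semantic filler
            continue
        if re.fullmatch(r"page \d+ of \d+", stripped.lower()):  # dead metadata
            continue
        lines.append(stripped)
    return "\n".join(lines)

raw = "<p>Refunds take \\*5\\* days.</p>\nPage 3 of 10\nCONFIDENTIAL - DO NOT DISTRIBUTE"
print(prune_chunk(raw))
```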
Configuration
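A possible `config` fragment for the pruning stage. All key names are illustrative assumptions; the orphan settings mirror the retrieval-count, time-window, and similarity-floor criteria described above.

```json
{
  "prune": {
    "strip_format_artifacts": true,
    "boilerplate_min_occurrences": 3,
    "orphan_min_retrievals": 2,
    "orphan_window_days": 30,
    "orphan_similarity_floor": 0.2
  }
}
```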
Topology Re-linking
The third and most powerful stage builds logical dependency edges between semantically adjacent nodes. This transforms a flat vector store into a traversable knowledge graph.
Benefits
- Context enrichment — When an LLM retrieves node A, it also receives the most logically coupled nodes B and C, providing richer context without additional queries.
- Multi-hop reasoning — Edge traversal enables the LLM to follow logical chains across documents, dramatically improving answers to complex questions.
- Reduced hallucination — By providing structurally connected evidence, the LLM has less incentive to fabricate connections.
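The enrichment and multi-hop behaviours can be sketched with a toy edge map. The graph, function name, and traversal policy are illustrative, not NodeRefine's internals.

```python
# Logical dependency edges produced by re-linking (invented example).
edges = {
    "A": ["B", "C"],  # A's most logically coupled nodes
    "B": ["D"],
    "C": [],
    "D": [],
}

def retrieve_with_context(node_id: str, hops: int = 1) -> list[str]:
    """Return node_id plus every node reachable within `hops` edge traversals."""
    seen, frontier = [node_id], [node_id]
    for _ in range(hops):
        frontier = [n for f in frontier for n in edges.get(f, []) if n not in seen]
        seen.extend(frontier)
    return seen

# One hop enriches A with B and C at no extra query cost;
# two hops follow the logical chain onward to D for multi-hop reasoning.
```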
Configuration
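A possible `config` fragment for the re-linking stage; all key names are illustrative assumptions.

```json
{
  "relink": {
    "max_edges_per_node": 4,
    "edge_similarity_floor": 0.75,
    "max_traversal_hops": 2
  }
}
```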
API Reference
POST /v1/refine
Trigger a refinement job on a vector collection.
| Parameter | Type | Required | Description |
|---|---|---|---|
| collection | string | Yes | Target vector collection name |
| strategies | string[] | Yes | Array of strategies: "dedup", "prune", "relink" |
| dry_run | boolean | No | Preview changes without applying (default: false) |
| config | object | No | Strategy-specific configuration overrides |
| webhook_url | string | No | URL to receive completion callback |
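A request body built from the parameter table above. The base URL is a placeholder, since the beta endpoint host is not documented here.

```python
import json

payload = {
    "collection": "support-kb",
    "strategies": ["dedup", "prune", "relink"],
    "dry_run": True,
    "config": {"dedup": {"similarity_threshold": 0.9}},
}

body = json.dumps(payload)
# requests.post("https://<your-api-host>/v1/refine",
#               headers={"Authorization": "Bearer nr_test_..."},
#               data=body)
```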
POST /v1/query
Run a retrieval query with optional before/after comparison.
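A possible request body. The `compare` flag name is an assumption inferred from the before/after comparison described above, not a confirmed parameter.

```python
payload = {
    "collection": "support-kb",
    "query": "How do refunds work?",
    "compare": True,  # return results from both pre- and post-refinement indexes
}
```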
GET /v1/status/:job_id
Check the status of a refinement job.
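An illustrative response shape and completion check. The field names and status values are assumptions based on the endpoints above, not a confirmed schema.

```python
sample_response = {
    "job_id": "job_123",
    "status": "completed",  # assumed lifecycle: queued | running | completed | failed
    "strategies": ["dedup", "prune"],
    "stats": {"nodes_before": 1000, "nodes_after": 840},
}

def is_done(resp: dict) -> bool:
    """A job is terminal once it has completed or failed."""
    return resp["status"] in ("completed", "failed")
```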
Industry Templates
Pre-configured refinement templates for high-accuracy industries. Each template includes optimized thresholds, compliance guardrails, and domain-specific heuristics.
Legal
Law firms and legal tech platforms deal with massive document corpora where precision is non-negotiable. The Legal template enforces strict version arbitration and preserves all jurisdictional metadata.
Key features: Citation graph preservation, jurisdiction-aware dedup, ruling recency weighting, privilege-tag immunity.
Healthcare
Healthcare RAG systems require HIPAA-compliant refinement. The Medical template never prunes PHI-tagged nodes and maintains audit trails for every refinement action.
Key features: PHI auto-detection and protection, evidence-level weighting, ICD/CPT code preservation, HIPAA audit logging.
Engineering
Technical documentation, API specs, and code repositories benefit from version-aware refinement that respects semver and deprecation patterns.
Key features: Semver-aware conflict resolution, code block integrity, module-based re-linking, deprecation demotion.
LangChain Integration
Drop NodeRefine into your existing LangChain pipeline as a retriever wrapper.
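A sketch of such a wrapper. The LangChain side uses the real `BaseRetriever` interface from `langchain-core`; the NodeRefine client and its `query` method are hypothetical assumptions.

```python
from typing import Any, List
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

class NodeRefineRetriever(BaseRetriever):
    """Queries a NodeRefine-refined collection and returns LangChain Documents."""

    client: Any        # hypothetical NodeRefine client instance
    collection: str

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        hits = self.client.query(collection=self.collection, text=query)
        return [Document(page_content=h.text, metadata=h.metadata) for h in hits]
```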
LlamaIndex Integration
NodeRefine plugs into LlamaIndex as a node post-processor.
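A sketch of such a post-processor. The base class comes from the real `llama-index-core` package; the NodeRefine client and its `demoted_ids` method are hypothetical assumptions.

```python
from typing import Any, List, Optional
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle

class NodeRefinePostprocessor(BaseNodePostprocessor):
    """Drops retrieved nodes that NodeRefine has marked as demoted duplicates."""

    client: Any        # hypothetical NodeRefine client instance
    collection: str

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        demoted = set(self.client.demoted_ids(collection=self.collection))
        return [n for n in nodes if n.node.node_id not in demoted]
```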
Need Access or Help?
NodeRefine is invite-only during private beta. Request access to get your API credentials, or reach out if you need support.