
How Can We Find (Evaluate) a Good Agent?

· 5 min read
Damon Lee
Fission Team

Alright, But How the Hell Can We Find (Evaluate) a Good Agent?

What Makes a "Good Agent"? Evaluating Agents in the Era of LLM-Based RAG Systems


The rise of Large Language Model (LLM)-powered Retrieval-Augmented Generation (RAG) has led to an explosion of projects and services claiming to integrate "agents" into their systems. From task automation to advanced decision-making, these agents are reshaping industries. However, amid the wave of hype, a critical question emerges: What constitutes a good agent? As the AI community navigates this flood of agent-based solutions, it’s imperative to establish robust evaluation methods to differentiate effective agents from underperforming ones. This article explores how to evaluate agents using methods like "G-Eval" and "Hallucination + RAG Evaluation" and why this is critical for the future of agent-based systems.

The Current Challenge: Defining a Good Agent

An agent in the context of LLM-based RAG systems typically performs tasks by combining reasoning, retrieval, and interaction capabilities. However, the effectiveness of these agents varies widely due to:

  1. Ambiguous Standards: There is no universally agreed-upon metric for evaluating an agent’s performance.
  2. Complexity of Multi-Step Tasks: Many agents fail to maintain contextual accuracy across multi-turn or complex interactions.
  3. Hallucinations: Agents often generate factually incorrect or irrelevant responses, undermining trust and utility.
  4. Domain-Specific Demands: Agents must adapt to the nuances of specific fields, such as healthcare, finance, or Web3.

Without rigorous evaluation frameworks, it’s challenging to identify and improve truly effective agents.
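
As a concrete starting point, the sketch below shows a G-Eval-style evaluator in the LLM-as-judge spirit mentioned above: a judge model scores an agent's answer against explicit criteria (factual consistency with the retrieved context, relevance, absence of hallucinations). The criteria wording, the gpt-4o-mini model name, and the 1-5 scale are illustrative assumptions rather than a prescribed setup, and the probability-weighted scoring used in the full G-Eval method is omitted for brevity.

```python
# Minimal G-Eval-style scorer: an LLM judge rates an agent's answer against
# explicit criteria and returns an integer score.
# Assumptions: the OpenAI chat API, the "gpt-4o-mini" model name, the 1-5
# scale, and the criteria wording are illustrative, not a prescribed setup.
from openai import OpenAI

client = OpenAI()

CRITERIA = (
    "Rate the agent's answer from 1 (poor) to 5 (excellent) on: "
    "(a) factual consistency with the retrieved context, "
    "(b) relevance to the user's question, and "
    "(c) absence of hallucinated claims."
)

def g_eval_score(question: str, context: str, answer: str) -> int:
    prompt = (
        f"{CRITERIA}\n\n"
        f"Question: {question}\n"
        f"Retrieved context: {context}\n"
        f"Agent answer: {answer}\n\n"
        "Reason step by step about each criterion, then end your reply "
        "with a final line of the form 'Score: <integer>'."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    last_line = response.choices[0].message.content.strip().splitlines()[-1]
    digits = "".join(ch for ch in last_line if ch.isdigit())
    return int(digits) if digits else 0
```

The same pattern extends to a hallucination check in the "Hallucination + RAG Evaluation" sense: ask the judge whether each claim in the answer is supported by the retrieved context and aggregate the verdicts.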

Challenges and directions for crypto-based AI agents

· 10 min read
Damon Lee
Fission Team

1-1. Definition of a decentralized AI agent

A decentralized AI agent is a system that harnesses artificial intelligence for automation, learning, and reasoning, while simultaneously ensuring data sovereignty through distributed ledger technologies (blockchains) and consensus mechanisms. By doing so, these agents mitigate reliance on centralized servers or organizations, and empower individual users or communities to control the data they generate or consume. Potential applications range from automated asset management in decentralized finance (DeFi) to decision-support engines in Decentralized Autonomous Organizations (DAOs).

1-2. Why is decentralization important in AI?

🤖 Decentralized AI Agents for Trust, Security, and Next-Generation Applications

Background and Motivation

AI has become a cornerstone of modern industry, fueling innovation in areas like finance, healthcare, manufacturing, and education. However, most AI systems today are centralized, aggregating data and training resources under the jurisdiction of a few major entities. This arrangement has repeatedly raised concerns about data sovereignty, transparency, and equity.

In contrast, decentralized AI agents capitalize on distributed trust to enhance security and accountability, allowing individual stakeholders to define how, when, and to what extent their data is leveraged. By having the broader network verify data and model processes, these agents reduce reliance on traditional centralized platforms and create an ecosystem that is more horizontally structured and community-driven.

Hybrid Search = Sparse + Dense RAG

· 4 min read
Damon Lee
Fission Team

Why Do We Use Hybrid Search RAG (Sparse + Dense Embeddings + ReRanker) Instead of Naive RAG?

Problem Statement: Decentralized Web3 Agents and the Need for Efficient Data Retrieval

The emergence of decentralized Web3 agents has redefined the landscape of AI-driven automation. Unlike traditional centralized frameworks, these agents operate on decentralized platforms, emphasizing transparency, user ownership, and multi-modal data processing. However, managing and retrieving data in decentralized environments poses unique challenges:

  1. Data Fragmentation: Information is scattered across multiple decentralized nodes, making efficient retrieval complex.
  2. Diverse Data Modalities: Web3 agents require access to text, images, and structured metadata to function effectively.
  3. Performance Bottlenecks: Standard retrieval mechanisms struggle with scalability and semantic understanding in decentralized systems.

This is where Hybrid Search RAG—a sophisticated blend of sparse and dense embedding retrieval with re-ranking—becomes a game-changer. It not only addresses these challenges but also sets a new benchmark for data retrieval in decentralized frameworks.
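
To make the pipeline concrete, here is a minimal hybrid-retrieval sketch: BM25 supplies the sparse signal, sentence embeddings supply the dense signal, the two are fused, and a cross-encoder re-ranks the fused candidates. The toy corpus, the rank_bm25 and sentence-transformers packages, the model names, and the equal fusion weights are illustrative assumptions, not the exact stack described in this post.

```python
# Minimal hybrid-retrieval sketch: BM25 (sparse) + embedding similarity
# (dense), fused into one ranking, then re-scored by a cross-encoder.
# Assumptions: the rank_bm25 and sentence-transformers packages, the
# model names, and the 0.5/0.5 fusion weights are illustrative choices.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

corpus = [
    "Staking locks tokens to secure a proof-of-stake network.",
    "Gas fees compensate validators for executing transactions.",
    "A DAO coordinates decisions through on-chain governance votes.",
]

# Sparse index: tokenized corpus scored with BM25.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense index: sentence embeddings compared by cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(corpus, normalize_embeddings=True)

# Cross-encoder re-ranker scores (query, document) pairs jointly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def hybrid_search(query: str, top_k: int = 2) -> list[str]:
    sparse = np.array(bm25.get_scores(query.lower().split()))
    dense = doc_embeddings @ encoder.encode(query, normalize_embeddings=True)

    # Normalize each signal to [0, 1] before fusing so neither dominates.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    fused = 0.5 * norm(sparse) + 0.5 * norm(dense)
    candidates = [corpus[i] for i in np.argsort(fused)[::-1][: top_k * 2]]

    # Final ordering comes from the cross-encoder re-ranker.
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
    return ranked[:top_k]

print(hybrid_search("how do gas fees work?"))
```

Normalizing each signal before fusing keeps either retriever from dominating, and the re-ranker supplies the fine-grained ordering that single-vector retrieval lacks.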

What is Naive RAG?

Naive RAG integrates a generative AI model with a retrieval component that fetches relevant documents from a database. This retrieval is typically based on either sparse representations (keyword matching such as BM25 or TF-IDF) or dense vector embeddings (semantic similarity search).
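
For contrast with the hybrid sketch above, a minimal naive-RAG loop looks like the following: a single dense retriever picks the top-k documents, which are stuffed into the prompt for one generation call, with no fusion or re-ranking. The corpus, model names, and prompt format are illustrative assumptions.

```python
# Minimal naive-RAG sketch: single dense retriever, top-k documents stuffed
# into the prompt, one generation call, no fusion or re-ranking.
# Assumptions: sentence-transformers and the OpenAI chat API with the model
# names below are illustrative choices, not a required stack.
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
encoder = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "Staking locks tokens to secure a proof-of-stake network.",
    "Gas fees compensate validators for executing transactions.",
]
doc_embeddings = encoder.encode(corpus, normalize_embeddings=True)

def naive_rag(question: str, top_k: int = 1) -> str:
    query_emb = encoder.encode(question, normalize_embeddings=True)
    scores = doc_embeddings @ query_emb
    # Retrieve the highest-scoring documents and place them in the prompt.
    context = "\n".join(corpus[i] for i in scores.argsort()[::-1][:top_k])
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return response.choices[0].message.content
```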

While effective for basic applications, naive RAG has critical shortcomings:

  1. Limited Context Understanding: Sparse embeddings often fail to capture semantic nuances, especially in multi-modal data.
  2. Suboptimal Ranking: Dense embeddings can retrieve irrelevant documents due to lack of fine-grained ranking mechanisms.
  3. Scalability Issues: Naive implementations struggle to efficiently handle large-scale or multi-modal datasets.

RAFT - RAG-Based Fine-Tuning

· 4 min read
Damon Lee
Fission Team

Why We Need RAFT: Adapting Language Models to Domain-Specific RAG

The evolution of Retrieval-Augmented Generation (RAG) has unlocked unprecedented possibilities in AI, enabling generative models to retrieve and incorporate external data dynamically. However, as AI frameworks increasingly interface with domain-specific contexts like Web3, there is a growing need for a specialized adaptation mechanism—RAFT (Retrieval-Augmented Fine-Tuning). This blog explores why RAFT is essential for adapting language models to domain-specific RAG, enhancing real-time interactions with the Web3 community and its users.

The Challenge: Domain-Specificity in RAG

Web3 ecosystems are inherently dynamic and domain-specific, characterized by:

  1. Unique Jargon and Concepts: Terms like "staking," "DAO," "NFT minting," and "gas fees" are ubiquitous in Web3 but rarely encountered in general-purpose datasets.
  2. Rapidly Evolving Information: Web3 platforms are continuously updated with new protocols, smart contracts, and token standards.
  3. Decentralized Data Sources: Information is dispersed across blockchains, decentralized file systems, and community-managed repositories.

While RAG frameworks excel in retrieving relevant data, they often struggle with adapting generative outputs to these domain-specific requirements. Without fine-tuning, language models risk producing generic or irrelevant responses that fail to meet the expectations of Web3 users.
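
To illustrate what such adaptation can look like in practice, the sketch below assembles RAFT-style fine-tuning examples: each training instance pairs a question with the oracle (answer-bearing) document, a set of distractor documents, and a chain-of-thought answer grounded in the oracle. The JSONL schema, field names, and sample Web3 snippets are illustrative assumptions rather than a prescribed RAFT data format.

```python
# Minimal sketch of assembling RAFT-style fine-tuning data: each example
# mixes the oracle (answer-bearing) document with distractor documents so
# the model learns to cite relevant context and ignore the rest.
# Assumptions: the JSONL schema, field names, and sample content below are
# illustrative; they are not a prescribed RAFT data format.
import json
import random

def build_raft_example(question: str, oracle_doc: str,
                       distractor_docs: list[str], cot_answer: str) -> dict:
    # Shuffle the oracle in among distractors so position carries no signal.
    context_docs = distractor_docs + [oracle_doc]
    random.shuffle(context_docs)
    context = "\n\n".join(f"[doc {i}] {d}" for i, d in enumerate(context_docs))
    return {
        "prompt": f"Context:\n{context}\n\nQuestion: {question}",
        # The target answer reasons step by step and cites the oracle doc.
        "completion": cot_answer,
    }

example = build_raft_example(
    question="What do gas fees pay for?",
    oracle_doc="Gas fees compensate validators for executing transactions.",
    distractor_docs=[
        "Staking locks tokens to secure a proof-of-stake network.",
        "An NFT mint creates a new token from a smart contract.",
    ],
    cot_answer=(
        "The context states that gas fees compensate validators for "
        "executing transactions, so gas fees pay for transaction execution."
    ),
)

with open("raft_train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```

The RAFT paper additionally trains on a fraction of examples with the oracle document deliberately removed so the model learns when the context does not contain the answer; this sketch omits that step for brevity.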