Bulan lalu aku dapet project bikin chatbot internal yang bisa jawab pertanyaan dari ribuan dokumen perusahaan. Requirement-nya: akurat, cepat, dan harus production-ready.
Masalahnya? Ada tiga framework yang semuanya bisa: LangChain, LlamaIndex, dan Haystack. Dan aku gak tau mana yang paling cocok.
Jadi aku bikin prototype yang sama di ketiga framework. Berikut pengalaman dan perbandingannya secara jujur — including frustrations yang aku alami.
Overview Cepat
| LangChain | LlamaIndex | Haystack | |
|---|---|---|---|
| Fokus | General purpose AI framework | Data framework (RAG-focused) | NLP pipeline framework |
| Kompleksitas | Tinggi | Sedang | Sedang-Tinggi |
| Community | Terbesar | Besar & growing | Kecil tapi solid |
| Production-ready? | Ya, tapi butuh effort | Ya, untuk RAG | Ya, paling mature |
| Learning Curve | Curam | Moderate | Moderate |
| Maintainer | LangChain Inc | LlamaIndex Inc | deepset |
LangChain
Apa itu?
LangChain itu Swiss army knife-nya AI development. Dari RAG, agents, chatbots, sampai complex workflows — semua bisa. Tapi itu juga yang bikin dia overwhelming.
Contoh: Build RAG Sederhana
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
# 1. Load documents
loader = DirectoryLoader(
"./docs",
glob="**/*.txt",
loader_cls=TextLoader
)
documents = loader.load()
# 2. Split
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)
# 3. Embed & Store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings)
# 4. Create chain
llm = ChatOpenAI(model="gpt-4", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
return_source_documents=True
)
# 5. Query
result = qa_chain.invoke({"query": "Apa kebijakan cuti karyawan?"})
print(result["result"])
print("Sumber:", [doc.metadata for doc in result["source_documents"]])
Bisa dipersingkat pake LCEL (LangChain Expression Language):
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
retriever = vectorstore.as_retriever()
prompt = PromptTemplate.from_template("""
Jawab berdasarkan context:
{context}
Pertanyaan: {question}
""")
rag_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
result = rag_chain.invoke("Apa kebijakan cuti?")
Build Agent:
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
@tool
def search_docs(query: str) -> str:
"""Cari informasi di database dokumen internal."""
retriever = vectorstore.as_retriever()
docs = retriever.invoke(query)
return "\n".join([d.page_content for d in docs])
@tool
def calculator(expression: str) -> str:
"""Hitung matematika."""
return str(eval(expression)) # Jangan di production!
llm = ChatOpenAI(model="gpt-4")
prompt = ChatPromptTemplate.from_messages([
("system", "Kamu adalah asisten yang helpful. Pakai tools yang tersedia."),
("human", "{input}"),
("placeholder", "{agent_scratchpad}")
])
agent = create_tool_calling_agent(llm, [search_docs, calculator], prompt)
executor = AgentExecutor(agent=agent, tools=[search_docs, calculator], verbose=True)
result = executor.invoke({"input": "Berapa sisa cuti tahunan kalau dihitung 12 - 7?"})
Kelebihan:
- Ecosystem terbesar: 700+ integrasi (vector stores, LLMs, tools)
- Multi-purpose: RAG, agents, chatbots, extraction, all-in
- LCEL: Expression language yang elegan untuk chain composition
- Community besar: Mudah cari solusi kalau stuck
- LangSmith: Monitoring & debugging platform
Kekurangan:
- Over-engineered untuk kasus simple
- Abstraction berlapis — susah tau apa yang terjadi di balik hood
- Breaking changes sering terjadi antar versi
- Documentation kadang confusing (terlalu banyak konsep)
- Performance overhead dari abstraction layers
Cocok untuk:
- Complex AI applications dengan multiple tools
- Projects yang butuh banyak integrasi
- Teams yang familiar dengan Python
Skor: 7.5/10
LlamaIndex
Apa itu?
LlamaIndex fokus ke satu hal dan doing it really well: connecting your data to LLM. Kalau use case-mu RAG (query your data), LlamaIndex biasanya jadi pilihan paling clean.
Contoh: Build RAG Sederhana
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
Settings
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Settings global
Settings.llm = OpenAI(model="gpt-4", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.chunk_size = 1000
Settings.chunk_overlap = 200
# 1. Load documents (auto-detect format!)
documents = SimpleDirectoryReader("./docs").load_data()
print(f"Loaded {len(documents)} documents")
# 2. Build index
index = VectorStoreIndex.from_documents(documents)
# 3. Create query engine
query_engine = index.as_query_engine(
similarity_top_k=3,
response_mode="compact"
)
# 4. Query!
response = query_engine.query("Apa kebijakan cuti karyawan?")
print(response)
print("Sumber:", response.source_nodes)
Bandingin dengan LangChain: LlamaIndex cuma butuh 10 baris. LangChain butuh 30+.
Advanced: Multi-Document Query
from llama_index.core import SummaryIndex, VectorStoreIndex
# Load multiple document types
documents = SimpleDirectoryReader(
input_dir="./docs",
recursive=True,
required_exts=[".pdf", ".txt", ".docx"]
).load_data()
# Vector index untuk semantic search
vector_index = VectorStoreIndex.from_documents(documents)
# Summary index untuk overview questions
summary_index = SummaryIndex.from_documents(documents)
# Router — otomatis pilih index yang tepat
from llama_index.core.tools import QueryEngineTool, ToolMetadata
vector_tool = QueryEngineTool(
query_engine=vector_index.as_query_engine(similarity_top_k=3),
metadata=ToolMetadata(
name="vector_search",
description="Cari informasi spesifik dari dokumen"
)
)
summary_tool = QueryEngineTool(
query_engine=summary_index.as_query_engine(),
metadata=ToolMetadata(
name="summary_search",
description="Dapatkan ringkasan/overview dari dokumen"
)
)
# Router query engine
from llama_index.core.query_engine import RouterQueryEngine
router = RouterQueryEngine(
query_engine_tools=[vector_tool, summary_tool],
select_multi=True
)
# LlamaIndex otomatis pilih tool yang tepat
response = router.query("Beri saya ringkasan semua kebijakan perusahaan")
Persistence (Save & Load Index)
# Save index (biar gak re-process setiap kali)
storage_context = index.storage_context
storage_context.persist(persist_dir="./storage")
# Load index
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()
Kelebihan:
- RAG-focused: Paling clean untuk query data use cases
- Less code: 3x lebih sedikit dari LangChain untuk RAG
- Auto-detect: File types, parsing, chunking — mostly automatic
- Storage persistence: Bisa save/load index dengan gampang
- Excellent documentation: Clear dan well-organized
- Sub-question query: Auto-break complex questions
Kekurangan:
- Narrow focus: Bukan general-purpose
- Agent capability: Gak se-powerful LangChain
- Integrations: Gak selengkap LangChain
- Community: Lebih kecil, harder to find niche solutions
- Customization: Kadang terlalu high-level, susah tweak
Cocok untuk:
- RAG applications (the best choice!)
- Chatbot yang query your data
- Document Q&A systems
Skor: 8.5/10 untuk RAG, 6/10 untuk general AI apps
Haystack (by deepset)
Apa itu?
Haystack framework yang didesain untuk production NLP pipelines. Approach-nya “pipeline” — kamu bikin rangkaian node yang memproses data step by step. Mature, enterprise-tested, dan well-structured.
Contoh: Build RAG Pipeline
from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import (
OpenAIDocumentEmbedder,
OpenAITextEmbedder
)
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
# 1. Document Store
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
# 2. Indexing Pipeline
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", TextFileToDocument())
indexing_pipeline.add_component("splitter", DocumentSplitter(
split_by="word",
split_length=200,
split_overlap=50
))
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder())
indexing_pipeline.add_component("writer", document_store)
# Connect components
indexing_pipeline.connect("converter.documents", "splitter.documents")
indexing_pipeline.connect("splitter.documents", "embedder.documents")
indexing_pipeline.connect("embedder.documents", "writer.documents")
# Run indexing
from pathlib import Path
indexing_pipeline.run({"converter": {"sources": list(Path("./docs").glob("*.txt"))}})
# 3. Query Pipeline
prompt_template = """
Kamu adalah asisten yang menjawab berdasarkan dokumen.
Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Pertanyaan: {{ question }}
Jawaban:
"""
rag_pipeline = Pipeline()
rag_pipeline.add_component("embedder", OpenAITextEmbedder())
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(
document_store=document_store,
top_k=3
))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-4"))
# Connect
rag_pipeline.connect("embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.connect("prompt_builder", "llm")
# 4. Query
result = rag_pipeline.run({
"embedder": {"text": "Apa kebijakan cuti karyawan?"},
"prompt_builder": {"question": "Apa kebijakan cuti karyawan?"}
})
print(result["llm"]["replies"][0])
Advanced: Multi-Stage Pipeline
from haystack.components.rankers import TransformersSimilarityRanker
# Tambahin reranker untuk hasil lebih akurat
rag_pipeline.add_component(
"reranker",
TransformersSimilarityRanker(model="BAAI/bge-reranker-base")
)
# Wire it up
rag_pipeline.connect("retriever.documents", "reranker.documents")
rag_pipeline.connect("reranker.documents", "prompt_builder.documents")
Kelebihan:
- Production-tested: deepset pakai ini di enterprise clients
- Explicit pipelines: Kamu tau persis apa yang terjadi di setiap step
- Component-based: Gampang swap satu bagian tanpa ngerusak yang lain
- Built-in evaluation: Metrics dan evaluation framework
- REST API: Bisa deploy pipeline sebagai API langsung
Kekurangan:
- Lebih verbose dari LlamaIndex
- Community kecil di luar Eropa
- Learning curve: Konsep pipeline butuh penyesuaian
- Update lambat dibanding kompetitor
- Documentation: Kurang banyak contoh real-world
Cocok untuk:
- Enterprise NLP applications
- Projects yang butuh explicit control
- Multi-step processing pipelines
- Team yang value simplicity dan reliability
Skor: 7/10
Head-to-Head: Test Case yang Sama
Aku bikin task yang sama di ketiga framework: RAG chatbot dari 100 dokumen.
Metrics:
| Metric | LangChain | LlamaIndex | Haystack |
|---|---|---|---|
| Lines of code | 65 | 22 | 45 |
| Setup time | 45 min | 15 min | 30 min |
| Accuracy (test set) | 82% | 87% | 84% |
| Avg response time | 2.3s | 1.8s | 2.0s |
| Memory usage | 512MB | 380MB | 420MB |
| Time to first working prototype | 2 jam | 30 menit | 1.5 jam |
Catatan Penting:
Accuracy LlamaIndex paling tinggi karena indexing strategy-nya lebih pintar. Dia otomatis pilih chunk size dan overlap yang optimal.
LangChain paling lambat karena abstraction overhead. Tapi kalau kamu perlu add tools (search web, hitung kalkulator), LangChain jauh lebih flexible.
Haystack middle ground — gak secepat LlamaIndex untuk RAG, tapi paling mudah dideploy ke production.
Kapan Pakai yang Mana?
Pakai LangChain kalau:
- Butuh multiple tools dan integrasi
- Mau bikin complex agent system
- Project gak cuma RAG, tapi juga extraction, chatbot, dll
- Team-nya besar dan butuh framework yang flexible
Pakai LlamaIndex kalau:
- Use case-nya RAG murni
- Mau cepat prototype
- Butuh query berbagai jenis data (PDF, SQL, API)
- Prioritas accuracy dan speed
Pakai Haystack kalau:
- Enterprise environment
- Butuh explicit control atas setiap pipeline step
- Mau bikin custom NLP pipeline
- Butuh built-in evaluation dan monitoring
Bisa Dipadukan!
Yang orang jarang tau: kamu bisa combine framework ini.
# Pakai LlamaIndex untuk indexing, LangChain untuk agents
from llama_index.core import VectorStoreIndex
from langchain_openai import ChatOpenAI
# Index with LlamaIndex
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=3)
# Use in LangChain agent
from langchain_core.tools import tool
@tool
def search_docs(query: str) -> str:
"""Search internal docs"""
results = retriever.retrieve(query)
return "\n".join([r.node.text for r in results])
# Build agent with LangChain (using LlamaIndex retriever)
Pitfalls Umum
1. Version breaking changes LangChain paling parah. Banyak tutorial yang udah outdated. SELALU cek versi library dan compatibility.
2. Over-engineering dari awal Mulai dari yang simple! Jangan langsung bikin complex agent system kalau RAG sederhana udah cukup.
3. Gak test accuracy Bikin RAG gak cukup. Harus test accuracy-nya. Buat test set dengan 20-30 pertanyaan dan expected answers.
test_cases = [
{"question": "Berapa hari cuti tahunan?", "expected": "12 hari"},
{"question": "Bagaimana cara klaim BPJS?", "expected": "..."},
# ... dst
]
for case in test_cases:
result = query_engine.query(case["question"])
print(f"Q: {case['question']}")
print(f"A: {result}")
print(f"Expected: {case['expected']}")
print("---")
4. Chunk strategy yang asal-asalan Chunk size dan overlap bikin beda besar di quality. Test beberapa konfigurasi.
Conclusion
Kalau aku harus pilih SATU untuk pemula: LlamaIndex. Paling gampang, paling cepat, dan hasilnya paling bagus untuk RAG.
Tapi kalau project-nya complex dan butuh agents + tools + integrations: LangChain.
Untuk enterprise dengan explicit pipeline control: Haystack.
Yang terpenting: framework itu tool. Yang bikin bagus atau jelek itu cara kamu pakainya. Pahami fundamentals-nya dulu (embeddings, retrieval, prompt engineering), framework-nya ganti-ganti gampang.
Mau diskusi framework mana yang cocok buat project kamu? Chat aku di Telegram!