
Your AI Pipeline Doesn't Need Another Wrapper

[Image: "Stop Over-Engineering AI Apps" banner showing the LiteLLM logo surrounded by the logos of popular embedding models]

Posted by

Jascha Beste

"Consistently, the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns." — Anthropic, Building Effective Agents

The AI tooling landscape resembles a gold rush. New frameworks pop up daily, each claiming to be the solution for building AI applications. But in an attempt to solve every possible use case, they introduce layers of abstraction that make systems harder to understand, debug, and maintain.

This seems familiar: NoSQL, serverless, microservices. Each promised simplicity but just shifted complexity elsewhere. 

AI is no different. An application leveraging LLMs is still just an app. Ninety percent of what we’ve learned in software engineering still applies. The best solutions don’t replace what works—they build on it.

The Siren Song of Complete Solutions

LangChain, the most popular AI application framework today, promises to be a comprehensive solution for building LLM applications. While it's an impressive piece of engineering, it exemplifies the "do everything" approach that plagues the field.

A typical RAG (retrieval-augmented generation) app that retrieves relevant documents and uses them to answer a question looks like this, according to LangChain’s example docs:


# Imports and model setup, as used in LangChain's RAG tutorial
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model works here

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

This "simple" example introduces seven new concepts: StateGraphs, sequences, graph builders, vector stores, splitters, documents, and loaders. And that’s on top of concepts that engineers need to understand to build a RAG app in the first place: vector similarity search, chunks, embeddings, and, of course, large language models (LLMs). Yet, even these foundational concepts aren't left untouched: They're wrapped in LangChain's own implementations. OpenAIEmbeddings isn't the official OpenAI client but rather LangChain's wrapper. Even BeautifulSoup, a tried-and-true Python HTML parsing library, gets encapsulated in a custom Loader wrapper.

While LangChain lowers the barrier to entry for AI apps, many teams find that these abstractions become liabilities as projects scale. We’ve heard many stories of developers building MVPs of their RAG systems on LangChain, only to tear it out and rebuild their app without a framework a month or so later. And we’re not alone; many AI engineering teams have written about their decision to move away from frameworks like LangChain—see examples here, here, here, here, here, and here.

There’s a common refrain behind all these stories: LangChain (and similar tools) buries engineers in layers of abstraction before they even reach prompt engineering, evals, and data management. These abstractions make debugging difficult when issues span multiple layers, and they get in the way of customization and meeting specific application requirements. Moreover, they create an entire parallel vocabulary that maps onto actual LLM operations, forcing engineers to learn two sets of concepts instead of one.

The fundamental irony is that modern programming languages already handle LLM primitives perfectly well. Unlike web frameworks that abstract away complex networking concepts or object-relational mappers (ORMs) that simplify database interactions, LLM operations work with surprisingly simple data types. Messages are just strings. Embeddings are just lists of floating-point numbers. Python's built-in types and basic control flow are already ideal for handling these primitives. So why not keep it simple, stupid?
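
To make that concrete, here's a minimal, framework-free retrieval sketch using the official openai Python client and nothing but built-in types. The model names and sample documents are illustrative, not a recommendation:

# A framework-free retrieval sketch using the official openai client.
# Model names and the sample documents below are illustrative.
from openai import OpenAI

client = OpenAI()

docs = [
    "Reset your password from the account settings page.",
    "API keys can be rotated under the developer tab.",
]

def embed(texts: list[str]) -> list[list[float]]:
    # An embedding is just a list of floats.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

question = "How do I reset my password?"
doc_embeddings = embed(docs)
question_embedding = embed([question])[0]

# Plain comparison over plain lists: no vector store abstraction required.
best_doc, _ = max(
    zip(docs, doc_embeddings),
    key=lambda pair: cosine_similarity(pair[1], question_embedding),
)

# A prompt is just a string; a message is just a dict of strings.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Context: {best_doc}\n\nQuestion: {question}"}],
)
print(response.choices[0].message.content)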

Unix-like Tools in AI

Tools like LiteLLM exemplify good AI tooling: They solve a single, well-defined problem by providing a unified interface for LLM provider APIs. No embedding management, no caching, no workflow orchestration—just one focused task done well.
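
As an illustration, here's a short sketch of that unified interface: the same completion call and message format, with only the model string changing per provider (the model strings are examples):

# A sketch of LiteLLM's unified interface; the model strings are examples.
from litellm import completion

messages = [{"role": "user", "content": "Summarize our refund policy in one sentence."}]

# Same call, same message format: only the model string changes per provider.
openai_response = completion(model="gpt-4o-mini", messages=messages)
anthropic_response = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

print(openai_response.choices[0].message.content)
print(anthropic_response.choices[0].message.content)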

This approach mirrors Anthropic's findings about successful agent implementations. Instead of betting on monolithic frameworks, using simple, focused components leads to systems that are easier to understand, maintain, and evolve. We can then combine these basic building blocks to build solutions that exactly match our needs.

We recently added LiteLLM to pgai Vectorizer, our tool for automating embedding creation and synchronization in PostgreSQL. This move enabled our PostgreSQL extension to support essentially every embedding provider and saved us from integrating a new API for each one. Kudos to LiteLLM; this was really helpful.

Integrating AI Into Existing Software Stacks

The surge of AI-specific tools has led teams to overlook a fundamental truth: Most AI applications are still just applications at heart! And they need the same things that “non-AI applications” need: data persistence, authentication, business logic, and all the other components we've been building for decades. Successfully building an AI application today means choosing tools that complement and integrate with your existing infrastructure, not replacing your entire stack with AI-specific tools. 

PostgreSQL, the well-known and loved relational database, is a great example of this. PostgreSQL has been the backbone of countless applications for over 30 years. Thanks to PostgreSQL extensions like pgvector, pgai, and pgvectorscale, it can handle vector similarity search alongside relational queries.

When building RAG functionality, many developers choose to stick with PostgreSQL, rather than choosing from the myriad new specialized vector databases like Pinecone, Chroma, and Qdrant, to name but a few. This isn't just about reducing complexity; it's about leveraging battle-tested technology that your team already knows how to operate, monitor, and scale.

Here’s how to implement simple semantic search across your product documentation in PostgreSQL. Instead of setting up a separate vector database and building synchronization logic, you can use pgvector and pgai to auto-embed your data and add vector search capabilities directly to your existing database:

SELECT ai.create_vectorizer(
    'documentation'::regclass,
    destination => 'documentation_embeddings',
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    chunking => ai.chunking_recursive_character_text_splitter('content')
);

This single SQL command creates and maintains embeddings for your documentation automatically, similar to how PostgreSQL maintains an index. No separate service to deploy, no complex sync logic to debug, no additional failure modes to consider. Pgai is itself built with composability in mind, allowing you to combine different embedding, chunking, and other strategies through simple, well-defined interfaces. Even if you decide not to use pgai Vectorizer, using pgvector itself saves you from maintaining yet another piece of infrastructure. This is AI tooling done well, in our humble opinion.

Following the same principle of integrating with existing tools and frameworks, we also recently added SQLAlchemy support to pgai for folks whose native language isn’t SQL. Instead of learning yet another query language or API client, you can work with vector embeddings using the same query patterns as any Python application you’ve built before:

from pgai.sqlalchemy import vectorizer_relationship
from sqlalchemy import func
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Documentation(Base):
    __tablename__ = "documentation"
    id: Mapped[int] = mapped_column(primary_key=True)
    content: Mapped[str]

    # Add vector embeddings to your model via a SQLAlchemy relationship
    embeddings = vectorizer_relationship(dimensions=768)

# Semantic search via SQLAlchemy queries
similar_docs = (
    session.query(Documentation)
    .join(Documentation.embeddings)
    .order_by(
        Documentation.embeddings.embedding.cosine_distance(
            func.ai.openai_embed("text-embedding-3-small", "How do I set up authentication?")
        )
    )
    .limit(5)
    .all()
)

The nice thing about this is that your team can leverage their existing SQLAlchemy knowledge, and your vector search integrates seamlessly with your other queries and filters. To integrate this into your application, follow any existing SQLAlchemy + (insert web framework of your choice) guide. For any issues regarding query optimization, you can fall back on decades of PostgreSQL experience either in your team or in the community.
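
As a sketch of what that integration might look like, here's a hypothetical FastAPI endpoint wrapping the query above. The connection string, SessionLocal factory, and route are illustrative assumptions, not part of pgai; Documentation is the model defined earlier:

# A hypothetical FastAPI endpoint around the SQLAlchemy query above.
# The connection string, SessionLocal factory, and route are illustrative
# assumptions; Documentation is the model defined earlier.
from fastapi import FastAPI
from sqlalchemy import create_engine, func
from sqlalchemy.orm import sessionmaker

engine = create_engine("postgresql+psycopg://user:password@localhost/app")
SessionLocal = sessionmaker(bind=engine)
app = FastAPI()

@app.get("/search")
def search_docs(q: str, limit: int = 5):
    with SessionLocal() as session:
        results = (
            session.query(Documentation)
            .join(Documentation.embeddings)
            .order_by(
                Documentation.embeddings.embedding.cosine_distance(
                    func.ai.openai_embed("text-embedding-3-small", q)
                )
            )
            .limit(limit)
            .all()
        )
    return [{"id": doc.id, "content": doc.content} for doc in results]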

Conclusion

The AI tooling gold rush has created an ecosystem filled with abstractions looking for problems to solve. While the enthusiasm is understandable, we've seen how this pattern can lead to unnecessary complexity, technical debt, and, ultimately, harder-to-maintain systems. Instead of reinventing the wheel or building entire new ecosystems, the path forward lies in the thoughtful integration of AI capabilities into our existing, battle-tested tools. When evaluating AI tools for your stack, we suggest you follow these principles:

  • Choose boring technology: Favor tools that build upon existing, well-understood platforms rather than those that require wholesale replacement of working systems. PostgreSQL with pgvector might not be as shiny as the latest vector database, but it works just as well and brings decades of operational knowledge and reliability.
  • Embrace composability: Look for tools that solve specific problems well and can be easily integrated with others. LiteLLM's focused approach to LLM API abstraction exemplifies this philosophy, as does pgai's integration with SQLAlchemy.
  • Value developer experience: The best tools feel natural to use within your existing workflows. Tools that integrate well with your existing stack reduce your team’s cognitive overhead.
  • Beware of "all-in-one" solutions: Tools that promise to solve every AI-related problem often create more complexity instead of eliminating it.

We built pgai with these principles in mind. It focuses on solving specific problems well: automating embedding synchronization through vectorizers, providing a clean interface for LLM interactions, and, most importantly, integrating seamlessly with existing PostgreSQL deployments. We believe this approach—building on proven foundations while adding new capabilities—is the sustainable path forward for AI development.

Originally posted Feb 18, 2025. Last updated Feb 20, 2025.
