What Is RAG and Why It Matters
Large language models have broad general knowledge but know nothing about your company’s specific data: your products, policies, internal documents, and customer history. Retrieval Augmented Generation (RAG) addresses this by first searching your database for relevant information, then passing that information to the LLM along with the user’s question. The LLM generates an answer grounded in your actual data, which sharply reduces hallucinations and improves accuracy.
The RAG Pipeline
A RAG pipeline has four steps. First, chunk your documents into manageable pieces (paragraphs or sections). Second, generate vector embeddings for each chunk and store them in Oracle’s VECTOR column. Third, when a user asks a question, search for the most relevant chunks using vector similarity. Fourth, pass the retrieved chunks plus the question to the LLM and return the generated answer.
Steps 1 and 2: Document Ingestion
CREATE TABLE knowledge_base (
  chunk_id     NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  doc_name     VARCHAR2(500),
  chunk_text   CLOB,
  chunk_vector VECTOR(1536, FLOAT32),
  created_dt   DATE DEFAULT SYSDATE
);
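For the approximate similarity queries shown in Step 3 to use an index rather than a full scan, Oracle 23ai lets you create a vector index on the column. A sketch using an IVF (neighbor-partitions) index; the index name and target accuracy are illustrative choices:

```sql
-- IVF vector index supporting APPROXIMATE cosine-distance queries
CREATE VECTOR INDEX kb_chunk_vec_idx
  ON knowledge_base (chunk_vector)
  ORGANIZATION NEIGHBOR PARTITIONS
  DISTANCE COSINE
  WITH TARGET ACCURACY 95;
```

Target accuracy trades recall for speed; without a vector index, APPROXIMATE queries fall back to an exact scan.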
-- Ingest a document: chunk and embed
CREATE OR REPLACE PROCEDURE ingest_document (
  p_doc_name      IN VARCHAR2,
  p_document_text IN CLOB
) AS
  l_chunks APEX_T_VARCHAR2;
BEGIN
  -- Split the document into chunks on blank lines (paragraph boundaries);
  -- for documents over 32K characters, use APEX_STRING.SPLIT_CLOBS instead
  l_chunks := APEX_STRING.SPLIT(p_document_text, CHR(10) || CHR(10));
  FOR i IN 1 .. l_chunks.COUNT LOOP
    IF LENGTH(l_chunks(i)) > 50 THEN -- Skip very short chunks
      INSERT INTO knowledge_base (doc_name, chunk_text, chunk_vector)
      VALUES (
        p_doc_name,
        l_chunks(i),
        DBMS_VECTOR.UTL_TO_EMBEDDING(l_chunks(i),
          JSON('{"provider":"database","model":"doc_model_v1"}'))
      );
    END IF;
  END LOOP;
  COMMIT;
END ingest_document;
/
Step 3: Retrieval
-- Find the 5 most relevant chunks for the user's question
SELECT chunk_id, chunk_text,
VECTOR_DISTANCE(chunk_vector, :question_vector, COSINE) AS distance
FROM knowledge_base
ORDER BY VECTOR_DISTANCE(chunk_vector, :question_vector, COSINE)
FETCH APPROXIMATE FIRST 5 ROWS ONLY;
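The :question_vector bind must be produced with the same embedding model used at ingestion time, or the distances are meaningless. A minimal sketch in PL/SQL, reusing the model name from the ingestion example and the APEX page item from Step 4:

```sql
DECLARE
  l_question_vector VECTOR;
BEGIN
  -- Embed the user's question with the same in-database model
  l_question_vector := DBMS_VECTOR.UTL_TO_EMBEDDING(
    :P10_USER_QUESTION,
    JSON('{"provider":"database","model":"doc_model_v1"}'));
END;
```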
Step 4: Generation
-- Build a prompt with retrieved context
l_prompt := 'Answer the following question based ONLY on the provided context. '
|| 'If the context does not contain the answer, say "I don''t have that information."'
|| CHR(10) || CHR(10) || 'Context:' || CHR(10) || l_retrieved_chunks
|| CHR(10) || CHR(10) || 'Question: ' || :P10_USER_QUESTION;
-- Call the LLM
l_answer := DBMS_CLOUD_AI.GENERATE(
prompt => l_prompt,
profile_name => 'RAG_PROFILE',
action => 'chat'
);
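The generation fragment above assumes l_retrieved_chunks already holds the text of the retrieved chunks. One way to assemble it, a sketch that runs the Step 3 query in a cursor loop and concatenates the matches separated by blank lines:

```sql
DECLARE
  l_retrieved_chunks CLOB;
BEGIN
  FOR r IN (
    SELECT chunk_text
      FROM knowledge_base
     ORDER BY VECTOR_DISTANCE(chunk_vector, :question_vector, COSINE)
     FETCH APPROXIMATE FIRST 5 ROWS ONLY
  ) LOOP
    l_retrieved_chunks := l_retrieved_chunks || r.chunk_text
                          || CHR(10) || CHR(10);
  END LOOP;
END;
```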
Building the APEX Interface
Create a chat-style APEX page with a text input for questions and a response region that displays the LLM’s answer. Include a “Sources” collapsible section showing the retrieved document chunks so users can verify the answer and drill into the original documents. Add a feedback mechanism (thumbs up/down) to track answer quality and identify gaps in your knowledge base.
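The feedback mechanism can be as simple as a table that the thumbs up/down buttons insert into. An illustrative schema (table and column names are assumptions, not part of the pipeline above):

```sql
CREATE TABLE rag_feedback (
  feedback_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  question    VARCHAR2(4000),
  answer      CLOB,
  rating      NUMBER(1), -- 1 = thumbs up, 0 = thumbs down
  created_dt  DATE DEFAULT SYSDATE
);
```

A periodic report over this table surfaces the questions your knowledge base answers poorly.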
Improving RAG Quality
Chunk size matters: too small and you lose context; too large and you dilute relevance. Start with chunks of 500 to 1,000 characters and a 100-character overlap between consecutive chunks. Add metadata filtering: if the user is asking about HR policies, restrict the vector search to HR documents. Re-rank the retrieved chunks with a cross-encoder model for more accurate relevance scoring. Finally, monitor which questions receive poor feedback and add documentation to fill those gaps.
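The metadata-filtering idea can be expressed directly in the retrieval query. This sketch assumes a doc_category column has been added to knowledge_base (the table created earlier does not have one):

```sql
SELECT chunk_id, chunk_text,
       VECTOR_DISTANCE(chunk_vector, :question_vector, COSINE) AS distance
  FROM knowledge_base
 WHERE doc_category = 'HR' -- filter before similarity ranking
 ORDER BY VECTOR_DISTANCE(chunk_vector, :question_vector, COSINE)
 FETCH APPROXIMATE FIRST 5 ROWS ONLY;
```

Filtering first shrinks the candidate set, which both speeds up the search and keeps off-topic chunks out of the LLM's context.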