Load the data
1. Load the data from websites and GitHub repositories, or possibly search for answers on Google and use the results as additional source material.
Data processing
1. Text tokenization
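
To illustrate the tokenization and splitting step, here is a minimal sketch that splits text into overlapping word-based chunks. It is a simplified stand-in and not LangChain's splitter; the function name `split_into_chunks` and the `chunk_size` and `overlap` parameters are assumptions made for this example.

```python
def split_into_chunks(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()  # naive whitespace tokenization (assumption)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks
```

A real model tokenizer works on subword tokens rather than whitespace-separated words, so chunk sizes here are only approximate.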
Create embeddings
1. Use any embedding model provided by HuggingFace or OpenAI. The choice of model is critical for runtime, because creating the embeddings takes a lot of time.
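
Because embedding time drives the choice of model, a small timing helper can compare candidates on a sample of the data. The sketch below is illustrative only: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real HuggingFace or OpenAI model, and `time_embedding` and the dimension of 64 are assumptions for this example.

```python
import hashlib
import math
import time
from typing import Callable


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for a real embedding model.

    Maps each word to one of `dim` buckets with a stable hash, counts the
    hits, and L2-normalizes, so every text becomes a vector of fixed size.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def time_embedding(embed: Callable[[str], list[float]], texts: list[str]) -> float:
    """Return the wall-clock seconds spent embedding `texts` with `embed`."""
    start = time.perf_counter()
    for text in texts:
        embed(text)
    return time.perf_counter() - start
```

Passing each candidate model's embedding function to `time_embedding` with the same sample texts gives a rough basis for comparing speed before embedding the full data set.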
Create a vector database
1. Use a cloud vector database to keep the chat history and any data retrieved by previous searches. We may use Qdrant for this, or deploy another vector store such as ChromaDB, pgvector, or Faiss.
2. I have solid experience with vector databases: I took part in implementing a vector database indexer project, so I know how they work and which algorithms are used to implement them.
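
To make the retrieval idea behind a vector database concrete, here is a minimal in-memory sketch of top-k search by cosine similarity. It is not the Qdrant API; the class `InMemoryVectorStore` and its `add` and `search` methods are hypothetical names for this example, and the search is brute force.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as Qdrant."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], payload: str) -> None:
        """Store a vector together with the text it represents."""
        self._items.append((vector, payload))

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, str]]:
        """Return up to k (score, payload) pairs, most similar first."""
        scored = [(cosine_similarity(query, vec), payload)
                  for vec, payload in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

This scan compares the query against every stored vector. Production vector databases typically avoid that cost with approximate nearest-neighbor indexes.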
LLM
1. I want to discuss this point specifically because it is critical: we want an LLM that is cheap, efficient, and fast.
2. There are several good models on HuggingFace with an Apache 2.0 license that are built for question-answering tasks, such as Intel/dynamic_tinybert, distilbert/distilbert-base-cased-distilled-squad, FlagAlpha/Llama2-Chinese-13b-Chat.
Create back-end APIs
1. I recommend a Python framework such as Django for the back-end and its endpoints. The part that calls the LLM and handles its response will be implemented in Python, so building the back-end in Python as well keeps the interface between them simple.
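
As a framework-agnostic sketch of what a chat endpoint might do, the function below validates a JSON request and returns a status code with a JSON body. The function name `handle_chat_request` and the `query`, `answer`, and `error` field names are assumptions for this example; in Django, the same logic would sit inside a view.

```python
import json
from typing import Callable


def handle_chat_request(body: str, answer: Callable[[str], str]) -> tuple[int, str]:
    """Validate a JSON chat request and return (HTTP status, JSON response body).

    `answer` is whatever callable produces the LLM reply for a query; it is
    passed in so the handler can be exercised without a real model.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})

    query = payload.get("query") if isinstance(payload, dict) else None
    if not isinstance(query, str) or not query.strip():
        return 400, json.dumps({"error": "'query' must be a non-empty string"})

    return 200, json.dumps({"answer": answer(query.strip())})
```

Keeping the LLM call behind a plain callable means the endpoint logic does not depend on which model is chosen.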
Create front-end
1. React is a strong choice for the front-end in terms of performance.
2. I think the website should have the following features:
  1. Create an account: sign in / sign up
  2. Chat with the LLM
  3. Show search history
  4. Clear history
  5. Continue as a guest (without memory)
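
The history features and the guest mode above can be sketched as a small store that records nothing for guests. The class `ChatHistory`, its method names, and the use of `None` as the guest user ID are assumptions for this example. A real deployment would persist the history rather than keep it in memory.

```python
from typing import Optional


class ChatHistory:
    """Hypothetical per-user chat history with a memoryless guest mode."""

    def __init__(self) -> None:
        self._messages: dict[str, list[tuple[str, str]]] = {}

    def record(self, user_id: Optional[str], query: str, answer: str) -> None:
        """Store one exchange; guests (user_id is None) leave no trace."""
        if user_id is None:
            return
        self._messages.setdefault(user_id, []).append((query, answer))

    def show(self, user_id: Optional[str]) -> list[tuple[str, str]]:
        """Return a copy of the user's history, oldest exchange first."""
        if user_id is None:
            return []
        return list(self._messages.get(user_id, []))

    def clear(self, user_id: str) -> None:
        """Delete everything stored for the user."""
        self._messages.pop(user_id, None)
```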
Integrate the front end with the back end
Deploy the website

Block Diagram:

[Figure: block diagram]

Data workflow:

Load data from documentation, websites, GitHub, etc.
Process the data using LangChain tools such as tokenization and splitting.
Send the documents to the embedding model to create embeddings for the collected data.
Return fixed-size vectors that represent the documents to LangChain.
Send the embeddings to Qdrant to be stored.

Chat workflow

The user enters a query and the website sends an API call to the server.
The server calls LangChain functions on the user's query and the chat history.
Convert the user query and chat history to embeddings using HuggingFace.
Receive the vectorized data from HuggingFace.
Make API calls to send the embeddings to Qdrant.
Query Qdrant to get the top-k relevant documents.
Construct a well-defined prompt that contains the relevant data and send it to the LLM.
Get the generated response from the LLM.
Post-process the generated output on the back-end side into a user-friendly format.
Return the response to the client side.
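
The prompt-construction step in this workflow can be sketched as follows. This is one possible layout, not a prescribed template: the function name `build_prompt`, the `max_history` parameter, and the wording of the instructions are assumptions for this example. The retrieved documents and chat history are taken as plain Python values.

```python
def build_prompt(query: str,
                 documents: list[str],
                 history: list[tuple[str, str]],
                 max_history: int = 3) -> str:
    """Combine retrieved documents, recent chat turns, and the query."""
    # Number the retrieved documents so the context block is easy to read.
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))

    # Keep only the most recent turns to limit prompt length.
    recent = history[-max_history:] if max_history > 0 else []
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)

    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context or '(none)'}\n\n"
        f"Chat history:\n{turns or '(none)'}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

The returned string is what would be sent to the LLM; the top-k documents from the vector database go in as `documents`.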

Suggestions

Instead of using React to build the website, we can use Flutter to get a single code base that runs on mobile phones (Android or iOS), the web, and desktops.
Fine-tune a pre-trained model.

...
