This project started out as a research project with the goal of gaining new knowledge for Data Respons R&D Services within the field of AI and LLMs, and of studying how a local system running an open-source LLM compares to the state-of-the-art services on the market.
With such a wide research field, many different methods and systems were investigated: standard text-chat usage of an LLM, as with OpenAI’s ChatGPT; autonomous agents, where an AI model is placed in a system that automatically analyzes a problem, proposes a solution, evaluates the solution for improvements, and implements those improvements without human interference; training our own LLM on our own Confluence database, to turn it into a better and more advanced search function; and code generation in a code editor, similar to GitHub Copilot.
From all these topics, it was decided to first try running an LLM locally on an office computer. The goal was to make a Linux Command Line Helper (LCLH), a tool integrated into the Linux terminal that could answer short questions about terminal commands. The response from the LLM should be a command that accomplishes the user’s request. For example:
In a Linux terminal:
User: howto find my ubuntu version?
AI: lsb_release -a
Here, howto is used as a keyword to activate the LCLH. The question, also called the prompt, is sent to an LLM running locally on the computer. After some time, the LLM has generated an answer that is returned to the user. The locally run LCLH accomplished two things: 1) it worked as intended and returned useful answers most of the time, and 2) it removed all online third-party sources, and therefore did not breach any confidentiality regulations. A step in the right direction, although not without a new problem: subpar performance. Some prompts took up to 50 seconds to answer, which is not acceptable when the correct answer can be found on the internet in less than 15 seconds. The performance problem came from insufficient hardware.
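The report does not detail how the LCLH was wired together, but the idea can be sketched in a few lines of Python. The script below is a minimal illustration, not the actual implementation: the file name, the endpoint URL, and the payload fields are assumptions, standing in for whichever local LLM server was actually used.

```python
#!/usr/bin/env python3
"""howto.py -- minimal sketch of an LCLH client.

Assumes an LLM server is already running locally and accepts JSON
completion requests over HTTP; the endpoint URL and payload format
below are illustrative, not taken from the report.
"""
import json
import sys
import urllib.request

LLM_URL = "http://localhost:8080/completion"  # assumed local endpoint


def ask(question: str) -> str:
    # Instruct the model to answer with a single shell command only.
    prompt = (
        "You are a Linux command line helper. Reply with one shell "
        f"command and nothing else.\nQuestion: {question}\nCommand:"
    )
    payload = json.dumps({"prompt": prompt, "n_predict": 64}).encode()
    req = urllib.request.Request(
        LLM_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"].strip()


if __name__ == "__main__":
    # Usage: howto find my ubuntu version?
    print(ask(" ".join(sys.argv[1:])))
```

Bound to the keyword with a shell alias such as alias howto='python3 howto.py', this gives the interaction shown above: the user types a question after howto and gets a single command back.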
Hardware problems can mainly be solved in two ways: buy new, more powerful hardware, or rent it. For a product that was supposed to be available to all Data Respons SW developers, the best solution was to rent. Amazon Web Services (AWS) was chosen as the provider, and a server with a virtual machine running the LLM was set up. This reduced the response time to around 5 seconds, and at a later stage to only 3 seconds. Renting a server also opened the possibility of using a larger and better LLM, enabling more and different use cases.
Among the new use cases was a chat with an LLM specialized in code generation. Combining this capability with an open-source Visual Studio Code extension called Continue produced a mix between OpenAI’s ChatGPT and GitHub Copilot. To connect Continue to AWS, a custom-made VS Code extension was developed. The extension runs a Python program on startup, which is responsible for the HTTP connection between Continue and AWS and encrypts/decrypts all messages. The overall architecture can be viewed in the figure below.
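The report leaves the internals of that Python program out, but its role as an encrypting relay can be sketched as follows. Everything specific in this sketch is an assumption: the localhost port, the placeholder AWS endpoint, and the use of a pre-shared symmetric key (Fernet); the actual program may well differ.

```python
#!/usr/bin/env python3
"""proxy.py -- sketch of the extension's local relay process.

Assumptions (not from the report): Continue talks to this proxy on
localhost, the AWS endpoint URL below is a placeholder, and messages
are protected with a pre-shared symmetric key (Fernet).
"""
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

from cryptography.fernet import Fernet

AWS_URL = "https://example-llm.aws.invalid/generate"  # placeholder endpoint
# Stand-in key so the sketch runs; a real deployment would load a
# pre-shared key that the AWS side also knows.
SECRET_KEY = Fernet.generate_key()
fernet = Fernet(SECRET_KEY)


class RelayHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # 1. Receive the plaintext request from Continue.
        length = int(self.headers["Content-Length"])
        plaintext = self.rfile.read(length)
        # 2. Encrypt it and forward it to the AWS-hosted LLM.
        req = urllib.request.Request(AWS_URL, data=fernet.encrypt(plaintext))
        with urllib.request.urlopen(req) as resp:
            encrypted_reply = resp.read()
        # 3. Decrypt the reply and hand the plaintext back to Continue.
        reply = fernet.decrypt(encrypted_reply)
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


if __name__ == "__main__":
    # Continue is pointed at this local port instead of AWS directly.
    HTTPServer(("localhost", 8000), RelayHandler).serve_forever()
```

With a relay like this, every message leaves the machine encrypted and every reply is decrypted locally before it reaches the editor, which is what allows confidential code to pass through a rented server.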