Context-Aware Chat

Context-Aware Chat is a chat system that understands the context of your codebase and returns the most relevant matches for a query, so you can navigate the code without manual searches. Suppose you need to edit a specific section of your code but do not know where it lives among numerous files. Instead of searching by hand, you describe what you are looking for; the feature scans your codebase, retrieves the top matching results, and displays the names of the files where the matches were found, so you can quickly locate the section you want.
Once the relevant section is identified, you can edit it manually or pass it to another feature that modifies it according to the guidelines you provide, which helps keep changes accurate and consistent. Whether you are working on a large-scale project or collaborating with a team, this streamlines code navigation and modification.
Currently this feature supports files with the following extensions: .js, .c, .cpp, .cs, .py, .java, .ts, .go, .rb, .php, .html, .css. We plan to support more extensions in the future.
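As an illustration of the extension filter described above, here is a minimal sketch in Python. The names SUPPORTED_EXTENSIONS and find_supported_files are hypothetical and are not taken from the actual implementation; only the list of extensions comes from this document.

```python
from pathlib import Path

# Extensions listed in this document as currently supported.
SUPPORTED_EXTENSIONS = {
    ".js", ".c", ".cpp", ".cs", ".py", ".java",
    ".ts", ".go", ".rb", ".php", ".html", ".css",
}


def find_supported_files(root):
    """Return the sorted paths of all files under root with a supported extension."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Matching on the lowercased suffix means a file such as Main.JS is picked up as well; whether the real scanner is case-insensitive is an assumption.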

On the backend, a pre-trained Jina AI model is loaded as the embedding model, together with its tokenizer. A local Qdrant database is created for vector storage; it holds the embeddings produced by the Jina model. New embeddings are generated every time you search, so that any code you have changed since the last search is reflected in the results.
Next, the system scans the codebase, including every file with one of the currently supported extensions. After scanning, each function, tag, block, condition, and loop is extracted using regular expressions. The extracted text is passed through the tokenizer to create chunks. Each chunk is embedded, and the embeddings are stored in the Qdrant database.
When the user enters a natural language query, the query is also converted into an embedding. A search then retrieves the closest matching code snippets from the Qdrant database, and the top 3 matches are displayed.
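The chunking step can be sketched roughly as follows. This is only an illustration: it splits on whitespace as a stand-in for the Jina tokenizer, and the function name chunk_text and the max_tokens limit are assumptions rather than values from the actual system.

```python
def chunk_text(text, max_tokens=256):
    """Split text into chunks of at most max_tokens tokens.

    Whitespace-separated words stand in for real tokenizer output here.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]
```

Each returned chunk would then be embedded and stored. A real tokenizer counts subword tokens, so actual chunk boundaries would differ from this sketch.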

Example: Users can enter a query such as "Find the function that handles user authentication," and the system will return relevant code snippets. Results are ranked by cosine similarity in the Qdrant database. The system returns k results; with k = 3, as currently set, you get the top 3 matches for the function that handles user authentication and authorization.
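To make the ranking step concrete, the sketch below computes cosine similarity in plain Python and keeps the k best matches. In the actual system this is done by Qdrant; the functions cosine_similarity and top_k, and the (snippet id, vector) layout, are hypothetical and only illustrate the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=3):
    """Return the k (snippet_id, score) pairs most similar to query_vector.

    indexed is a list of (snippet_id, vector) pairs.
    """
    scored = [
        (snippet_id, cosine_similarity(query_vector, vector))
        for snippet_id, vector in indexed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The query embedding and the stored chunk embeddings must come from the same model for the scores to be comparable.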

Key Features

Multi-Language Code Support

Supports multiple programming languages, including Python, Java, C, C++, JavaScript, TypeScript, Go, PHP, Ruby, and more. Regex patterns are used to extract functions, methods, and classes for effective indexing.
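The per-language extraction could look something like the sketch below. The patterns are deliberately simplified and hypothetical; they are not the patterns used by the actual system and cover only a few of the supported languages.

```python
import re

# Hypothetical, simplified patterns keyed by file extension.
# Each pattern captures the function name in group 1.
FUNCTION_PATTERNS = {
    ".py": re.compile(r"^\s*def\s+(\w+)\s*\(", re.MULTILINE),
    ".js": re.compile(r"^\s*(?:async\s+)?function\s+(\w+)\s*\(", re.MULTILINE),
    ".go": re.compile(r"^func\s+(?:\([^)]*\)\s*)?(\w+)\s*\(", re.MULTILINE),
}


def extract_function_names(source, extension):
    """Return the function names found in source, or [] for an unknown extension."""
    pattern = FUNCTION_PATTERNS.get(extension)
    if pattern is None:
        return []
    return pattern.findall(source)
```

Regex-based extraction is fast and needs no parser, but it can miss unusual formatting such as arrow functions or declarations split across lines, which is one reason more patterns are listed as a future enhancement.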

Automated Code Search & Retrieval

The system allows users to find specific code sections without manually searching through files. It indexes code snippets and retrieves the most relevant results based on natural language queries.

Context-Aware Editing

After finding the relevant code section, users can either edit it manually or pass it to another AI-assisted editing feature that makes the edits according to the instructions they specify.

Future Enhancements

More regex patterns, covering more programming languages, for better search across the codebase.
Cloud-based Qdrant, or optimized embedding storage, for large-scale codebases.
A ranking mechanism that prioritizes the most useful results more effectively.

Usage Guidelines

I. The system works best when queries are descriptive, e.g., "Find the function that initializes the database connection."
II. Proper structuring and consistent naming of functions help improve search accuracy.
III. If you change your code frequently, rerun the search so that the stored embeddings are updated.