Internships are also available for tasks in any of the directions below. Contact us!
- Improved Storage & Retrieval (for faster processing, longer documents, and more documents)
- Select the embeddings and a vector DB or library (e.g. FAISS), or a combination of several methods
- Integrate it/them into Dream in place of the currently used DeepPavlov model
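
To make the storage and retrieval task above more concrete, here is a minimal sketch of top-k chunk retrieval by cosine similarity. It is an illustration only: `embed` is a toy bag-of-words stand-in for whichever embedding model gets selected, and `ChunkIndex` is a hypothetical in-memory class, not an existing Dream or FAISS API. A real vector library would replace the linear scan in `search`.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ChunkIndex:
    """Hypothetical in-memory index; a vector DB would take over this role."""

    def __init__(self) -> None:
        self._chunks: List[str] = []
        self._vectors: List[Counter] = []

    def add(self, chunk: str) -> None:
        # Store the chunk together with its (toy) embedding.
        self._chunks.append(chunk)
        self._vectors.append(embed(chunk))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str]]:
        # Score every stored chunk against the query, then keep the k best.
        q = embed(query)
        scored = [(cosine(q, v), c) for v, c in zip(self._vectors, self._chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = ChunkIndex()
index.add("dream is a dialog platform")
index.add("faiss is a library for vector search")
index.add("the weather in moscow is sunny")
best_score, best_chunk = index.search("vector search library", k=1)[0]
```

Keeping `add` and `search` as the only entry points means the embedding model and the storage backend can be swapped independently, which is useful when comparing several methods.
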
- Reasoning / planning for complex questions
- Multimodality: images in texts
- Extract images; process each to get textual representation
- Add textual information about images to the text
- Hint: start with the already integrated methods (image captioning), then research the methods used in SoTA systems
- Multimodality: audio / video QA
- Update ASR service in Dream
- Process the audio
- Feed the resulting text into the existing Doc Processing service
- Hint: https://scenex.jina.ai/
- More formats; enhanced methods of extracting & structuring text
- Add support for all popular text formats (epub, fb2) so that books can be processed as well
- Research best practices for structuring and processing texts of different formats and genres
- Complex task solving
- Create a pipeline that can plan and solve tasks over more than one iteration
- Add flexibility to the amount of context used
- Think of a way to extract useful information from previous utterances. We may want to handle the cases where the previous service is the same as the current one
- E.g.: "What is the weather in Moscow?" / "22 degrees." / "Will it rain today?"
- More and better APIs
- Review the APIs that already exist in the distribution and improve them where needed
- Add the possibility to customize the list of APIs for a distribution
- See LangChain: the developer can define the list of tools they want to use
- Create a universal skill
- Create a skill that can be adapted to different planning approaches (ReAct, DEPS, etc.)
- Add a feature that allows customizing logs
- Some people may want to see log entries such as thought, action, etc., while others may not
- Future: think of a way to pre-select the top-n APIs for the model to choose from
- Create a service that will retrieve the top-n most appropriate APIs
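
The two customization ideas above (a per-distribution list of APIs and configurable logs) could look roughly like the sketch below. Everything here is hypothetical: `ToolRegistry`, `select`, `filter_trace`, and the `type`/`text` step format are illustrative names, not part of Dream or LangChain.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Tool = Callable[[str], str]


@dataclass
class ToolRegistry:
    """Hypothetical registry of all APIs known to the system."""

    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, fn: Tool) -> None:
        self.tools[name] = fn

    def select(self, enabled: List[str]) -> Dict[str, Tool]:
        # Return only the tools a given distribution asked for, in that order.
        unknown = [name for name in enabled if name not in self.tools]
        if unknown:
            raise KeyError(f"unknown tools requested: {unknown}")
        return {name: self.tools[name] for name in enabled}


def filter_trace(trace: List[dict], visible: Set[str]) -> List[dict]:
    """Keep only the step types (thought, action, ...) the user wants to see."""
    return [step for step in trace if step["type"] in visible]


registry = ToolRegistry()
registry.register("weather", lambda query: "22 degrees")
registry.register("calculator", lambda query: "4")

# A distribution that only enables the weather API.
enabled_tools = registry.select(["weather"])

# An assumed ReAct-style trace; the field names are illustrative.
trace = [
    {"type": "thought", "text": "I need the weather API."},
    {"type": "action", "text": "weather('Moscow')"},
    {"type": "observation", "text": "22 degrees"},
]
visible_steps = filter_trace(trace, visible={"action", "observation"})
```

Raising on unknown tool names is a deliberate choice in this sketch: a typo in a distribution config fails at startup rather than silently shrinking the tool list.
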
- Hypotheses Ranking Model:
- Find a dataset for hypothesis ranking
- Train a model
- RL from HF (human feedback):
- Integrate a system for dialog markup
- Fine-tuning from HF
- Evaluation on Benchmarks:
- Create a service for evaluating a system on benchmarks
- Self-evaluation:
- Create a service for evaluating a system using a specific LLM
- Frame-based Approach
- Create a frame-based Template Skill
- Add the frame-based approach to Dream Builder
- Few-shot
- Integrate the few-shot classifier from DP
- Integrate few-shot entities from DP
- Flexible pipeline
- Add a one-node skill that can retrieve the most appropriate response from JSON using a sentence ranker
- Testing Pipeline for Dream
- Change the main distribution (the one that is tested)
- Improve the testing pipeline
- Testing Pipeline for Dream Builder
- Dialog Topic Recommendation
- Create a model for next subject prediction
- Hint: take recommendations based on KG by Ali Panesh (HSE student 2022-2023)
- Custom KGs
- Add custom KGs to store info
- Add personality info to dialog context
- Adaptation to the user’s and the system’s mood
- Summarization of dialog context
- Separate model for summarization
- LLM-based summarization of dialog context
- Contradiction detection
- Separate model for contradiction detection
- Dialog breakdown detection or Open (speech function) detection
- Quick hack for Multilingual support
- Add translation pre- and post-filters
- Hint: ChatGPT does not always handle other languages well, so a translation filter is needed
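
The translation pre- and post-filter idea above can be sketched as a thin wrapper around any response function. This is an assumption-laden illustration: `with_translation`, `toy_translate`, and `echo_skill` are hypothetical names, and the dictionary lookup stands in for a real translation model or service.

```python
from typing import Callable

# (text, source_lang, target_lang) -> translated text
Translator = Callable[[str, str, str], str]
Responder = Callable[[str], str]


def with_translation(
    respond: Responder,
    translate: Translator,
    user_lang: str,
    system_lang: str = "en",
) -> Responder:
    """Wrap a responder with a pre-filter (user -> system language)
    and a post-filter (system -> user language)."""

    def wrapped(utterance: str) -> str:
        if user_lang == system_lang:
            # No translation needed; call the responder directly.
            return respond(utterance)
        translated_in = translate(utterance, user_lang, system_lang)
        reply = respond(translated_in)
        return translate(reply, system_lang, user_lang)

    return wrapped


# Toy lookup table standing in for a real translation model (assumption).
_TABLE = {
    ("ru", "en"): {"привет": "hello"},
    ("en", "ru"): {"you said: hello": "вы сказали: привет"},
}


def toy_translate(text: str, source: str, target: str) -> str:
    # Fall back to the untranslated text when the phrase is unknown.
    return _TABLE.get((source, target), {}).get(text, text)


def echo_skill(text: str) -> str:
    """Stand-in for an English-only skill or LLM call."""
    return f"you said: {text}"


russian_skill = with_translation(echo_skill, toy_translate, user_lang="ru")
english_skill = with_translation(echo_skill, toy_translate, user_lang="en")
```

Because the wrapper only depends on the `Translator` signature, the same skill can be exposed in several languages without changing the skill itself.
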
- Image Processing
- Add image captions to user utterance
- Add an LLM that takes images into account
- Add an interface for sending images from the browser
- Voice processing
- Russian LLM
- Train Russian LLMs
- Train Russian Hypotheses Ranking Model
- Add Evaluation on Russian Benchmarks
- Feedback
- Create a feedback form
- Create a feedback analysis service (dashboard!)
- Automatic Evaluation and Statistics Service:
- Create a dashboard service
- Integrate available models for evaluation