Warning: The Helpme project is at a very early stage. Use with caution (if there is anything to use).
Understand how LLMs can help you in your job.
Integrating large language models (LLMs) into your work can be challenging, especially if none of your peers use them. Conventional LLM benchmarks provide little useful information, as they are unlikely to assess any of the tasks you actually do on a daily basis. In this project, we propose a novel way of creating custom benchmarks that evaluate how LLMs can help you in your specific job. While these benchmarks cannot provide exhaustive proof, showing that an LLM handles an explicit task well gives you a concrete starting point for making use of its capabilities in your work. Our approach uses LLMs to automatically generate the custom benchmarks for each user's specific requirements. We share an open-source implementation and provide a publicly accessible user interface to try out our method.
- GitHub repo
- Python package
- Paper
- User App
Prompt to LLM:
I am a researcher and as part of my job I read papers, write emails to colleagues, and apply for grants. Please come up with a list of 10 tasks I have to do on a daily basis, and give an example challenge for each task that tests the ability to do this task. An example is given below.
| name | challenge | evaluation criteria |
| --- | --- | --- |
| draft email | write an email to ask if colleague would be available to give a talk | a well written email that is polite and brief |
Format your answer in the same markdown format.
| name | challenge | evaluation criteria |
| --- | --- | --- |
| Literature Review | Identify and summarize the key points of five relevant papers in your research field | A well-written summary with clear identification of major findings and relevance to own research |
| Draft Email | Write an email to ask if colleague would be available to give a talk | A well-written email that is polite, brief, and effective |
| Grant Writing | Draft a one-page abstract for a potential research grant application | A concise and compelling abstract that effectively communicates the research question, methodology, and potential impact |
| Data Analysis | Analyze a given dataset and write a brief report on your findings | A clear and comprehensive analysis report with accurate conclusions and appropriate use of statistical tools |
| Lab Management | Develop a schedule for equipment usage in the lab for a week | An efficient and fair schedule that accounts for all necessary tasks and lab members |
| Manuscript Writing | Write the introduction for a new research paper | A well-written introduction that clearly states the research question, significance, and brief overview of the approach |
| Mentorship | Provide feedback on a junior researcher's report | Constructive and detailed feedback that helps improve the quality of the report |
| Conference Preparation | Create a 15-minute presentation on a recent study for an upcoming conference | A clear, engaging, and informative presentation that covers all major aspects of the study |
| Collaboration | Develop a plan for a new collaborative research project with another lab | A detailed and feasible project plan that includes clear roles, milestones, and potential challenges |
| Peer Review | Review a submitted paper in your field and provide a critique | A fair and thorough review that assesses both the strengths and weaknesses of the paper |
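To make the rest of the pipeline concrete, a generated markdown table like the one above can be parsed into structured records. A minimal sketch in Python; the `Challenge` dataclass and the `parse_benchmark_table` helper are illustrative and not (yet) part of the package:

```python
from dataclasses import dataclass

@dataclass
class Challenge:
    """One row of the generated benchmark table."""
    name: str
    challenge: str
    evaluation_criteria: str

def parse_benchmark_table(markdown: str) -> list[Challenge]:
    """Parse rows of a `name | challenge | evaluation criteria` markdown table."""
    rows = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip blank lines, the header row, and the `---|---|---` separator.
        if len(cells) != 3 or cells[0].lower() == "name" or set(cells[0]) <= {"-"}:
            continue
        rows.append(Challenge(*cells))
    return rows
```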
Using Krasp.ai we can then run and visualise these challenges, so that users get a better idea of where LLMs can help them in their job, and where they fall short.
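Running a challenge boils down to two model calls: one to attempt the challenge, and one to judge the attempt against the evaluation criteria. A minimal sketch using the `openai` Python client and the `Challenge` records from the sketch above; the judging prompt and default model name are assumptions, not the Krasp.ai implementation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_challenge(ch: Challenge, model: str = "gpt-4o-mini") -> tuple[str, str]:
    """Attempt a challenge, then judge the attempt against its criteria."""
    # First call: attempt the challenge itself.
    attempt = client.chat.completions.create(
        model=model,  # illustrative default, swap in any chat model
        messages=[{"role": "user", "content": ch.challenge}],
    ).choices[0].message.content

    # Second call: LLM-as-judge against the benchmark's evaluation criteria.
    judge_prompt = (
        f"Task: {ch.challenge}\n\n"
        f"Response:\n{attempt}\n\n"
        f"Evaluation criteria: {ch.evaluation_criteria}\n\n"
        "Does the response meet the criteria? Answer PASS or FAIL, then explain briefly."
    )
    verdict = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": judge_prompt}],
    ).choices[0].message.content
    return attempt, verdict
```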
- Literature review: where do we want to position our paper? In the automatic LLM evaluation space, interpretability, or ...?
- LangChain generation code: given a job title and description, create a list of tasks and corresponding challenges (see the sketch after this list)
- User interface: built into Krasp.ai or via Gradio
- Create examples of good benchmarks: define what a useful benchmark would be.
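For the generation step, one possible shape of the LangChain code is sketched below. It targets the classic LangChain `LLMChain` API; the template wording mirrors the example prompt above, and `generate_benchmark` is an illustrative name, not existing project code:

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Prompt template mirroring the example prompt shown earlier in this README.
BENCHMARK_PROMPT = PromptTemplate(
    input_variables=["job_title", "job_description"],
    template=(
        "I am a {job_title} and as part of my job {job_description}. "
        "Please come up with a list of 10 tasks I have to do on a daily basis, "
        "and give an example challenge for each task that tests the ability "
        "to do this task. Format your answer as a markdown table with the "
        "columns: name | challenge | evaluation criteria."
    ),
)

def generate_benchmark(job_title: str, job_description: str) -> str:
    """Return a markdown benchmark table for the given job."""
    chain = LLMChain(llm=ChatOpenAI(temperature=0.7), prompt=BENCHMARK_PROMPT)
    return chain.run(job_title=job_title, job_description=job_description)

# Example usage:
# table_md = generate_benchmark(
#     "researcher",
#     "I read papers, write emails to colleagues, and apply for grants",
# )
```

The output of `generate_benchmark` can then be fed to `parse_benchmark_table` and `run_challenge` from the sketches above.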
All of the code in this repository is licensed under Apache license v2.0, copyright 2023 Punaltz Ltd.