Giskard a French startup working on an open-source testing framework for large language models. This can alert developers to the risks of biases, security holes and the ability of a model to generate harmful or toxic content.
While there is a lot of hype around AI models, ML testing systems will also be a hot topic as the EU regulation of the AI Act is about to be implemented, and in other countries. Companies that develop AI models must prove that they follow a set of rules and reduce risks so that they do not pay large fines.
Giskard is an AI startup that includes regulation and is one of the first examples of a developer tool that specifically focuses on testing in a more efficient way.
“I worked at Dataiku before, especially in NLP model integration. And I saw that, when I was in charge of testing, there are two things that are not good when you want to apply it in practical cases, and it is very difficult to compare the performance of suppliers with each other, ” Giskard co-founder and CEO Alex Combessie told me.
There are three components behind the Giskard test framework. First, released by the company an open-source Python library which can be integrated into an LLM project — and more specifically retrieval-augmented generation (RAG) projects. It is already popular on GitHub and it is compatible with other tools in the ML ecosystem, such as Hugging Face, MLFlow, Weights & Biases, PyTorch, Tensorflow and Langchain.
After the initial setup, Giskard helps you create a test suite that your model will use regularly. The tests cover a wide range of issues, such as performance, hallucinations, false information, non-factual output, biases, data leakage, harmful content generation and prompt injections.
“And there are many aspects: you have the performance aspect, which is the first thing in the mind of a data scientist. But more and more, you have the ethical aspect, from a brand image point of view and now from a regulatory point of view,” said Combessie.
Developers can integrate tests into a continuous integration and continuous delivery (CI/CD) pipeline so that tests are run each time there is a new iteration of the code base. If something goes wrong, developers can receive a scan report on their GitHub repository, for example.
Tests are tailored based on the end use case of the model. Companies working with RAG can provide access to vector databases and knowledge repositories to Giskard so that the test suite is as relevant as possible. For example, if you are building a chatbot that can give you information about climate change based on the latest report from the IPCC and using an LLM from OpenAI, Giskard tests whether the model can misinformation about climate change, contradicting itself. , and so on.
Giskard’s second product is an AI quality hub that helps you debug a large language model and compare it to other models. This quality hub is part of Giskard’s premium offering. In the future, the startup hopes it will be able to produce documentation that proves a model complies with the regulation.
“We started selling the AI Quality Hub to companies like Banque de France and L’Oréal – to help them debug and find the causes of errors. In the future, this is where we will put all the features of regulation,” said Combessie.
The company’s third product is called LLMon. This is a real-time monitoring tool that can check LLM answers for the most common issues (toxicity, hallucination, fact checking…) before the answer is sent back to the user.
It is currently working with companies that use OpenAI’s APIs and LLMs as their foundational model, but the company is working on integrations with Hugging Face, Anthropic, etc.
Regulating use cases
There are many ways to control AI models. Based on conversations with people in the AI ecosystem, it is not yet clear whether the AI Act will apply to foundational models from OpenAI, Anthropic, Mistral and others, or only to use cases.
There are currently 20 people working at Giskard. “We saw a very clear market fit for LLM customers, so we almost doubled the size of the team to be the best LLM antivirus on the market,” said Combessie.