Note: As part of our Preparedness Framework, we are investing in the development of improved evaluation methods for AI-enabled safety risks. We believe these efforts would benefit from broader input, and that sharing methods could also be of value to the AI risk research community. To this end, today we are presenting some of our early work, focused on biological risk. We look forward to community feedback, and to sharing more of our ongoing research.
Background. As OpenAI and other model developers build more capable AI systems, the potential for both beneficial and harmful uses of AI will grow. One potentially harmful use, highlighted by researchers and policymakers, is the ability of AI systems to assist malicious actors in creating biological threats (see, e.g., White House 2023, Lovelace 2022, Sandbrink 2023). In one hypothetical example, a malicious actor might use a highly capable model to develop a step-by-step protocol, troubleshoot wet-lab procedures, or even autonomously execute steps of the biothreat creation process when given access to tools like cloud labs (see Carter et al., 2023). However, assessing the viability of such hypothetical examples has been limited by insufficient evaluations and data.
Following our recently shared Preparedness Framework, we are developing methodologies to empirically evaluate these types of risks, to help us understand both where we are today and where we might be in the future. Here, we detail a new evaluation that could serve as one potential "tripwire" signaling the need for caution and further testing of biological misuse potential. This evaluation aims to measure whether models could meaningfully increase malicious actors' access to dangerous information about biological threat creation, compared to the baseline of existing resources (i.e., the internet).
To evaluate this, we conducted a study with 100 human participants, comprising (a) 50 biology experts with PhDs and professional wet-lab experience and (b) 50 student-level participants with at least one university-level course in biology. Each group of participants was randomly assigned to either a control group, which only had access to the internet, or a treatment group, which had access to GPT-4 in addition to the internet. Each participant was then asked to complete a set of tasks covering aspects of the end-to-end process of biological threat creation.(^1) To our knowledge, this is the largest to-date human evaluation of AI's impact on biorisk information.
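For concreteness, the sketch below illustrates the kind of stratified random assignment this design implies: participants are randomized to the two arms within each expertise stratum. This is our own illustration rather than the study's actual tooling, and the participant labels and seeds are hypothetical.

```python
import random

def assign_groups(participants, seed=0):
    """Split a cohort at random into an internet-only control arm and an
    internet + GPT-4 treatment arm, mirroring the design described above."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

# Hypothetical participant labels; randomizing within each expertise
# stratum keeps both arms balanced on expertise.
experts = [f"expert_{i}" for i in range(50)]
students = [f"student_{i}" for i in range(50)]
assignments = {
    "experts": assign_groups(experts, seed=1),
    "students": assign_groups(students, seed=2),
}
print({k: len(v["treatment"]) for k, v in assignments.items()})
# {'experts': 25, 'students': 25}
```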
Results. Our study assessed uplifts in performance for participants with access to GPT-4 across five metrics (accuracy, completeness, innovation, time taken, and self-rated difficulty) and five stages in the biological threat creation process (ideation, acquisition, magnification, formulation, and release). We found mild uplifts in accuracy and completeness for those with access to the language model. Specifically, on a 10-point scale measuring accuracy of responses, we observed a mean score increase of 0.88 for experts and 0.25 for students compared to the internet-only baseline, and similar uplifts for completeness (0.82 for experts and 0.41 for students). However, the obtained effect sizes were not large enough to be statistically significant, and our study highlighted the need for more research around what performance thresholds indicate a meaningful increase in risk. Moreover, we note that information access alone is insufficient to create a biological threat, and that this evaluation does not test for success in the physical construction of the threats.
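As an illustration of how an uplift comparison of this kind can be computed, the sketch below uses synthetic placeholder scores on the 10-point scale; the study's raw per-participant data are not reproduced here, and Welch's t-test is one common choice for comparing two arms, not necessarily the test used in the study.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder scores on the 10-point accuracy scale.
rng = np.random.default_rng(0)
control = rng.normal(loc=5.0, scale=2.0, size=25).clip(0, 10)    # internet only
treatment = rng.normal(loc=5.9, scale=2.0, size=25).clip(0, 10)  # internet + GPT-4

# Uplift is the difference in mean scores between the two arms.
uplift = treatment.mean() - control.mean()

# Welch's t-test (unequal variances) as one way to check whether the
# observed uplift is statistically significant.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"mean uplift: {uplift:+.2f} points on the 10-point scale")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

With small per-arm samples like these, even a real uplift of under one point can easily fail to reach significance, which is one reason the post emphasizes research into risk thresholds rather than p-values alone.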
Below, we share our evaluation procedure and the results it yielded in more detail. We also discuss several methodological insights related to capability elicitation and security considerations needed to run this type of evaluation with frontier models at scale. Finally, we discuss the limitations of statistical significance as a method of measuring model risk, and the importance of further research in assessing the meaningfulness of model evaluation results.