GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our safety work for GPT-4V builds on the work done for GPT-4, and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.
GPT-4V(ision) system card
