GPT-4 with Vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs supplied by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in AI research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Our safety work for GPT-4V builds on the work done for GPT-4, and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.