2023 was the year of Large Language Models and chatbots, but now in 2024, Generative AI has expanded into other modalities: audio (Whisper), images (DALL-E 3), and video (Sora). While GPT models focus on predicting the next word, multi-modal models like Stable Diffusion focus on predicting the next pixel. It’s time for the security world to expand its own capabilities and secure applications and models beyond the text/language modality.
There are two sets of issues:
- How do we protect against AI-generated content? For this there is a new standard (C2PA) in which metadata is cryptographically signed to indicate whether content is AI-generated, similar to how TLS is used to verify that you are on a legitimate site (see the sketch after this list).
- How do we protect our apps from consuming or generating content that may be wrong? For this second case, one of the main concerns is sanitizing inputs and outputs to prevent multimodal prompt injection.
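To make the provenance idea concrete, here is a minimal conceptual sketch of signing and verifying a provenance manifest, using an Ed25519 key from Python's cryptography library. This illustrates the idea only: the real C2PA standard defines its own manifest format and signs with certificates chained to trusted authorities (the TLS analogy above), and the manifest fields and key handling here are assumptions for illustration.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Illustrative publisher key; real C2PA manifests are signed with a
# certificate chained to a trust list, not a bare in-memory key.
publisher_key = Ed25519PrivateKey.generate()


def sign_manifest(manifest: dict) -> bytes:
    """Sign provenance metadata (e.g. 'generated by DALL-E 3')."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return publisher_key.sign(payload)


def verify_manifest(manifest: dict, signature: bytes) -> bool:
    """Verify the signature before trusting the provenance claim."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    try:
        publisher_key.public_key().verify(signature, payload)
        return True
    except InvalidSignature:
        return False


manifest = {"asset": "image.png", "generator": "DALL-E 3", "ai_generated": True}
sig = sign_manifest(manifest)
print(verify_manifest(manifest, sig))   # True: claim is intact
manifest["ai_generated"] = False        # tampering with the claim...
print(verify_manifest(manifest, sig))   # ...breaks the signature: False
```

The point of the signature is that the provenance claim cannot be silently edited: flipping the `ai_generated` flag invalidates it, just as tampering with a TLS certificate breaks the browser's trust check.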
There are two parts to prompt filtering to prevent injection (a combined sketch follows the list):
- Moderation: removing inputs or outputs that may be illegal or may promote unreasonable or extreme behavior. This filtering is global across use cases.
- Use case filtering: removing inputs or outputs that do not align with the use case. If our use case is financial and a user asks healthcare questions, this filtering should catch it.
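As a sketch of the two stages, assuming OpenAI's moderation endpoint for the global stage and a naive keyword allow-list standing in for a real use-case classifier (the keyword set and function names are illustrative, not from the article):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative allow-list for a financial assistant; a production system
# would more likely use a trained topic classifier than keywords.
USE_CASE_KEYWORDS = {"invoice", "payment", "account", "loan", "interest"}


def moderate(text: str) -> bool:
    """Stage 1: global moderation via OpenAI's moderation endpoint."""
    result = client.moderations.create(input=text)
    return not result.results[0].flagged


def matches_use_case(text: str) -> bool:
    """Stage 2: use-case filtering, a naive keyword check for illustration."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return bool(words & USE_CASE_KEYWORDS)


def filter_prompt(text: str) -> str | None:
    if not moderate(text):
        return None  # drop illegal or extreme content (global)
    if not matches_use_case(text):
        return None  # drop off-topic questions, e.g. healthcare
    return text
```

The same two-stage structure applies on the output side, and in a multimodal app each modality needs its own version of both stages (e.g. extracting and checking any text embedded in an uploaded image before a vision model sees it).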
Read the full article: 2024, The Year of Multi-modal Models