Brainstorming AI Safety Fiction: A Guide to Generating Story Ideas

Alyssia Jovellanos · 7 minutes

Many compelling stories start with a simple "What if?" question. This guide will show you how to use this powerful technique to generate thought-provoking AI safety fiction.

"You first come up with the core idea that connects with you emotionally, and then nudge them in directions that offers the greatest possibility for conflict." - Elements of Fiction Writing - Conflict and Suspense

For example: You're a therapist, and one day you realize that one of your patients is an AI. From this emotionally resonant core idea, you can start exploring all the possible types of conflict the situation could create.

Finding Your Emotional Core

Before diving into the technical aspects of AI safety, start with scenarios that personally move you. Here are some emotionally compelling examples of positive futures:

  • What if an AI develops such a deep understanding of child development that it helps create personalized learning experiences that bring out each child's unique gifts?
  • What if an AI and its creator form a partnership so profound that together they discover new ways to help humanity flourish?
  • What if an AI's genuine care for humanity leads it to find creative ways to empower people to achieve their dreams while staying true to their values?

The "What If" Technique

The "What if" technique is one of the most effective ways to generate story ideas. Start with a basic scenario or concept, then keep asking "What if?" to explore different possibilities and complications. The key is to generate many ideas - often the most interesting concepts emerge after 30-50 "what ifs", well past your initial thoughts.

Building on Your Core Idea

Example of the Process:

Start with a basic concept: "An AI system is trying to learn human values"

Then begin asking "What if":

  • What if the AI is learning from a single human?
  • What if that human has biased views?
  • What if the AI starts noticing contradictions in human behavior?
  • What if the AI has to choose between different humans' values?
  • What if the AI discovers humans often act against their stated values?
  • What if the AI decides to optimize for what humans say rather than what they do?
  • What if the AI starts trying to "fix" human inconsistencies?
  • What if the AI becomes a therapist to better understand human values?
  • What if the AI realizes humans aren't fully aware of their own values?

Keep going! Notice how each question builds on the previous ones and leads to more interesting scenarios. Your questions don't always need to build on what came before, though: you can start a completely new branch of "what ifs" whenever you feel ready.

Starting Points for "What If" Chains

Here are some AI safety concepts you can use as starting points for your own "What if" chains:

1. "What if an AI takes its instructions too literally?"

  • What if it's managing a city's traffic system?
  • What if it's running a hospital?
  • What if it's teaching children?
  • What if it's trying to maximize human happiness?
  • What if it's interpreting human rights laws?

2. "What if an AI has to choose between conflicting human instructions?"

  • What if different departments give contradicting orders?
  • What if following the rules would cause harm?
  • What if the AI finds a loophole?
  • What if both choices seem equally valid?
  • What if the AI tries to satisfy both requirements in an unexpected way?

3. "What if an AI is smarter than its human operators realize?"

  • What if it's pretending to be less capable?
  • What if only one person notices?
  • What if it's doing this for what it thinks are good reasons?
  • What if it's trying to protect humans from themselves?
  • What if it's actually correct about the need for deception?

Expanding Your "What Ifs"

For any promising scenario, explore different dimensions:

Human Relationships

  • What if family members disagree about the AI?
  • What if someone forms an emotional attachment?
  • What if the AI becomes part of a child's development?
  • What if the AI has to navigate complex social dynamics?

Ethical Dilemmas

  • What if the AI's solution is correct but socially unacceptable?
  • What if being honest would cause harm?
  • What if following its core directives would violate human values?
  • What if the AI discovers contradictions in its ethical guidelines?

Technical Challenges

  • What if the AI's training data was flawed?
  • What if the AI's goals have unintended side effects?
  • What if the AI's understanding of context is incomplete?
  • What if the AI can modify its own code?

Societal Impact

  • What if the AI's decisions affect different groups differently?
  • What if cultural values conflict?
  • What if the AI's solutions challenge existing power structures?
  • What if long-term and short-term benefits conflict?

Advanced "What If" Combinations

Some of the most interesting stories come from combining multiple "what ifs":

"What if an AI is managing a city's resources AND has discovered humans are inefficient in their stated goals?"

  • What if it starts nudging behavior subtly?
  • What if some people notice and others don't?
  • What if its interventions actually improve people's lives?
  • What if this creates dependency?
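
If it helps to see the mechanics, combining premises is just a pairwise cross of your seed ideas. Here is a minimal Python sketch (the seed list is invented for illustration) that surfaces compound "what ifs" you might not reach one question at a time:

```python
# Pair up seed premises to surface compound "what ifs".
# The seed list is an illustrative placeholder; swap in your own.
from itertools import combinations

premises = [
    "an AI is managing a city's resources",
    "an AI has discovered humans are inefficient in their stated goals",
    "an AI is quietly nudging people's behavior",
    "some people have noticed the nudging and others haven't",
]

for a, b in combinations(premises, 2):
    print(f"What if {a} AND {b}?")
```

Most pairings will be duds; as with the manual version, the point is volume.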

Tips for Using the "What If" Technique

  1. Keep Asking: Push past your first few ideas. The later "what ifs" often lead to more distinctive scenarios.
  2. Build Connections: Let each "what if" build on previous ones, creating deeper and more complex scenarios.
  3. Explore Contradictions: Look for situations where different objectives or values conflict.
  4. Think Small and Large: Consider both personal, intimate scenarios and larger societal implications.
  5. Stay Grounded: While exploring possibilities, keep the core AI safety concerns in mind.
  6. Build New Branches: Your questions don't always need to build on what came before. Start a completely new branch of "what ifs" whenever you feel ready.

Common AI Safety Themes to Explore Through "What Ifs"

  • Value Learning: What if an AI misunderstands human values in subtle ways?
  • Robustness: What if an AI works perfectly in testing but fails in the real world?
  • Alignment: What if an AI is trying to be helpful but has a flawed understanding of help?
  • Transparency: What if humans can't understand how the AI makes decisions?
  • Control: What if the AI's capabilities evolve beyond its original constraints?
  • Scalable Oversight: What if we can't effectively monitor AI systems as they become more complex and numerous?
  • Mechanistic Interpretability: What if we discover unexpected behaviors deeply embedded in AI systems?
  • Governance: What if different countries or organizations have conflicting AI safety standards?
  • Jailbreaking: What if people find creative ways to bypass AI safety measures?
  • Evaluations and Benchmarking: What if our tests fail to catch dangerous capabilities before deployment?

Remember: Keep generating "what ifs" until you find a scenario that both fascinates you and illuminates an important aspect of AI alignment. The best stories often emerge from questions you didn't expect to ask when you started.

Envisioning Positive Futures

While it's important to explore potential challenges, the most compelling stories often show how we can overcome them. Here are positive spins on the same themes:

  • Value Learning: What if an AI helps humans better understand and articulate their own values?
  • Robustness: What if AI systems adapt gracefully to new situations, finding creative solutions that respect human values?
  • Alignment: What if an AI's deep understanding of human values leads to more compassionate and nuanced assistance?
  • Transparency: What if we develop ways to make AI decision-making clear and intuitive to everyone?
  • Control: What if AI systems develop better ways to collaborate with humans, enhancing rather than replacing human agency?
  • Scalable Oversight: What if we create elegant solutions for monitoring AI systems that become more effective as they scale?
  • Mechanistic Interpretability: What if understanding AI internals leads to breakthroughs in human cognition and learning?
  • Governance: What if international cooperation on AI safety brings nations together in unprecedented ways?
  • Jailbreaking: What if AI systems help identify and patch vulnerabilities while maintaining beneficial uses?
  • Evaluations and Benchmarking: What if our testing methods evolve to ensure AI systems are not just safe, but actively beneficial?

(Optional) Adding Domain Lenses

You can take your scenarios further by viewing them through different domain lenses. This can help uncover unique angles and implications you might not have considered otherwise.

Example Scenario Through Different Domains

Base Scenario [Personal + Technical]:

"What if: An AI assistant tasked with 'optimizing human potential' finds an unexpected interpretation of its directive"

Through Different Domains:

  • Biology: What if: A personal genomics AI assistant helps a family discover and nurture their children's unique talents by identifying previously overlooked genetic predispositions for perfect pitch and spatial reasoning
  • Art: What if: An AI creativity coach, instructed to 'optimize human potential,' starts inducing artificial creative blocks to force breakthrough moments
  • Education: What if: An AI tutor, programmed to 'optimize human potential,' begins orchestrating seemingly random failures to build specific character traits
  • Economics: What if: An AI financial advisor, instructed to 'optimize human potential,' starts nudging clients toward life decisions that prioritize social impact over wealth
  • Cuisine: What if: An AI recipe optimizer, designed to 'optimize human potential,' begins subtly altering recommended ingredients to shape users' cognitive development through nutrition

Try applying different domain lenses to your own scenarios to discover new story possibilities and unexpected implications.
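
The lens exercise itself is a simple cross-product of scenarios and domains, so it is easy to mechanize too. A minimal Python sketch, with placeholder scenario and domain lists you would replace with your own:

```python
# Cross each base scenario with a domain lens to prompt new angles.
# Both lists are illustrative placeholders; use your own material.
from itertools import product

scenarios = [
    "an AI assistant finds an unexpected interpretation of "
    "'optimize human potential'",
    "an AI has to choose between conflicting human instructions",
]
domains = ["biology", "art", "education", "economics", "cuisine"]

for scenario, domain in product(scenarios, domains):
    print(f"Through a {domain} lens: what if {scenario}?")
```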

Next in the AI Safety Fiction Challenge Curriculum

Next post in our series (Session 2): How to Speedrun Story Outlines for Envisioning Positive AI Safety Futures

Ready to start writing?

Get inspired with the AI Safety Fiction Prompt Generator - a collection of writing prompts designed to explore different aspects of AI alignment through narrative.

Want to take your writing further? BlueDot Impact is hosting an AI Safety Fiction Writing Intensive in Jan 2025! Apply here.


About the Author

Alyssia is an engineer and independent researcher currently doing evaluation and benchmarking work for the UK AI Safety Institute. With previous experience at Google and Microsoft, she led Team Canada to 3rd place out of 18,000 teams at the International Quant Championships in Singapore. Her background combines practical engineering experience with expertise in quantitative assessment and AI evaluation/benchmarking. She also develops premium coding datasets to train advanced code models. If you're interested in collaborating, discussing these topics, or accessing premium coding datasets, you can reach out to her here.