
@JackYPCOnline (Contributor) commented Jan 26, 2026

Description

When structured output is used with Bedrock Guardrails and the prompt attack filter is enabled, the internal framework message "You must format the previous response as structured output." is flagged as a potential prompt injection attack, so the request stops with guardrail_intervened instead of producing the structured output.

Based on the Bedrock Guardrails documentation on prompt attacks (https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-prompt-attack.html) and contextual grounding checks (https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-contextual-grounding-check.html), we can wrap this framework prompt in a guardContent block as a workaround.
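For reference, the Bedrock Converse API accepts a guardContent content block alongside plain text blocks. The sketch below shows one way to build the retry message with its text inside such a block. The helper name build_guarded_retry_message is hypothetical, and because the PR does not say which qualifier (if any) the fix uses, the sketch leaves qualifiers optional rather than assuming one.

```python
from typing import Optional, Sequence

# Qualifier values the Converse API accepts on guardContent text blocks.
ALLOWED_QUALIFIERS = {"grounding_source", "query", "guard_content"}

RETRY_TEXT = "You must format the previous response as structured output."


def build_guarded_retry_message(
    text: str, qualifiers: Optional[Sequence[str]] = None
) -> dict:
    """Hypothetical helper: wrap text in a Converse guardContent block.

    Qualifiers are omitted unless explicitly given, since the PR does not
    state which qualifier it relies on.
    """
    text_block: dict = {"text": text}
    if qualifiers is not None:
        unknown = set(qualifiers) - ALLOWED_QUALIFIERS
        if unknown:
            raise ValueError(f"unsupported qualifiers: {sorted(unknown)}")
        text_block["qualifiers"] = list(qualifiers)
    # Same user-turn shape as the plain-text retry message, but the text
    # lives in guardContent.text instead of a top-level "text" block.
    return {"role": "user", "content": [{"guardContent": {"text": text_block}}]}
```

Sending this message requires AWS credentials and a configured guardrail, so only the message construction is shown.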

Related Issues

#1288

Documentation PR

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I tested the example provided by the user in the issue with the correct guardrail configuration:

I'm here to help you with whatever you need! I'm a friendly assistant ready to answer questions, have conversations, or help you with tasks. 

Is there something specific you'd like help with today? Feel free to ask me anything!
Tool #1: output
Stop reason counts: {'tool_use': 1}
{'type': 'agent_result', 'message': {'role': 'assistant', 'content': [{'toolUse': {'toolUseId': 'tooluse_G90-cz1xRqmGzAm4TSW1EA', 'name': 'output', 'input': {'age': '<UNKNOWN>', 'user_message': "I'm here to help you with whatever you need! I'm a friendly assistant ready to answer questions, have conversations, or help you with tasks. Is there something specific you'd like help with today? Feel free to ask me anything!"}}}]}, 'stop_reason': 'tool_use'}

Also added a unit test and an integration test.
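The agent_result printed above can also be checked mechanically. The sketch below copies that output into a dict (with user_message truncated to its first sentence) and asserts that the run ended with a tool_use stop reason rather than guardrail_intervened. The function check_structured_output_result is hypothetical and is not the unit or integration test added in this PR.

```python
def check_structured_output_result(result: dict, tool_name: str = "output") -> dict:
    """Hypothetical check: return the structured tool input or raise.

    It does not call Bedrock; it only inspects an already-obtained result dict.
    """
    stop_reason = result.get("stop_reason")
    if stop_reason == "guardrail_intervened":
        raise AssertionError("guardrail blocked the structured output retry")
    if stop_reason != "tool_use":
        raise AssertionError(f"unexpected stop reason: {stop_reason!r}")
    # Find the toolUse block for the structured output tool.
    for block in result["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use and tool_use["name"] == tool_name:
            return tool_use["input"]
    raise AssertionError(f"no toolUse block named {tool_name!r}")


# Copied from the log above; user_message truncated for brevity.
sample = {
    "type": "agent_result",
    "message": {
        "role": "assistant",
        "content": [
            {
                "toolUse": {
                    "toolUseId": "tooluse_G90-cz1xRqmGzAm4TSW1EA",
                    "name": "output",
                    "input": {
                        "age": "<UNKNOWN>",
                        "user_message": "I'm here to help you with whatever you need!",
                    },
                }
            }
        ],
    },
    "stop_reason": "tool_use",
}

structured = check_structured_output_result(sample)
print(structured["age"])  # prints <UNKNOWN>
```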

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov bot commented Jan 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@JackYPCOnline JackYPCOnline marked this pull request as ready for review January 26, 2026 21:07
@Unshure (Member) left a comment

Have you tested whether this actually fixes the problem described in the issue? I think this might still cause the unwanted behavior.

Additionally, let's add an integ test for this to confirm it works as expected.

@github-actions github-actions bot added size/s and removed size/xs labels Jan 27, 2026
await agent._append_messages(
    {"role": "user", "content": [{"text": "You must format the previous response as structured output."}]}
)

# Use guardContent for Bedrock models with guardrails to avoid prompt attack filter
Member

This will always apply guardContent whenever a guardrail is configured for Bedrock. I don't think that should be the case. Instead, can we pass in a structured_output_retry_message so users can configure the retry message themselves?
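A minimal sketch of the reviewer's suggestion, assuming a hypothetical structured_output_retry_message option: the caller may override the retry text, and the default matches the current message. StructuredOutputOptions and build_retry_message are illustrative names only, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_RETRY_MESSAGE = "You must format the previous response as structured output."


@dataclass
class StructuredOutputOptions:
    # Hypothetical option; None (or an empty string) falls back to the default.
    structured_output_retry_message: Optional[str] = None


def build_retry_message(options: StructuredOutputOptions) -> dict:
    """Build the user turn appended when the model skipped the output tool."""
    text = options.structured_output_retry_message or DEFAULT_RETRY_MESSAGE
    return {"role": "user", "content": [{"text": text}]}
```

This only changes the wording of the message; it is still appended as a user turn during execution, which is the point the author raises in reply.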

Contributor Author

I'm not sure this would solve the issue. Regardless of what message the user passes, it gets injected during execution (not as an initial system prompt), which will still trigger the guardrail. I tested this by changing the message text, and it still triggered.



4 participants