-
Notifications
You must be signed in to change notification settings - Fork 616
fix: wrap structure_output system prompt in guardContent #1564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: wrap structure_output system prompt in guardContent #1564
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Unshure
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested if this actually fixes the issue provided in the issue? I think this might still cause the unwanted behavior.
Additionally, lets add an integ test for this to confirm it works as expected.
236ffaa to
8af4f21
Compare
| await agent._append_messages( | ||
| {"role": "user", "content": [{"text": "You must format the previous response as structured output."}]} | ||
|
|
||
| # Use guardContent for Bedrock models with guardrails to avoid prompt attack filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will always apply the guardContent when a guardrail is added for bedrock. I dont think that should be the case. Instead, can we pass in a structured_output_retry_message so the user can configure the below message on their own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this would solve the issue. Regardless of what message the user passes, it gets injected during execution (not as an initial system prompt), which will still trigger the guardrail. I tested this by changing the message text, and it still triggered.
Description
When using structured output with Bedrock Guardrails (prompt attack filter enabled), the internal framework message "You must format the previous response as structured output" was being flagged as a potential prompt injection attack, causing
guardrail_intervened.Based on https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-prompt-attack.html & https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-contextual-grounding-check.html
we could wrap this system prompt in guardcontent as workaround.
Related Issues
#1288
Documentation PR
Type of Change
Bug fix
New feature
Breaking change
Documentation update
Other (please describe):
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
I tested the example user provided under correct guardrail configurations:
Also added a unit test and an integration test.
hatch run prepareChecklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.