|
|
|
|
|
by pbgcp2026
58 days ago
|
|
" it supports prompt caching"
May I ask if you checked that?
I use "{"cachePoint": { "type": "default" }" and I found 2 things:
* 1) even if stated in the Doco, Bedrock Converse API does not allow 1hr expiry time, only 5m - gives error when attempted;
* 2) Bedrock Converse API does accept up to 4 cachePoint's but does NOT cache and returns zeroes. LOL. It was confirmed by some other people on Github.
(Note: VertexAI does cache properly reducing the bill drastically, so I use Vertex instead of OpenRouter.) |
|
Edit: I do see the first request shows 0 cache read, 7k cache write tokens. The next request shows 7k cache read, 900 cache write tokens. The agent run summary is:
usage {
cache_read_input_tokens 244586
cache_write_input_tokens 38399
completion_tokens 8131
input_tokens 1172
output_tokens 8131
prompt_tokens 1172
total_tokens 292288
}
I do see a recent issue in the Strands Agent issue tracker about 1hr TTL getting ignored and defaulting to 5m TTL. I haven't validated cache TTL but these agent runs take ~2-3m so a 5m TTL is sufficient.
I also checked the AWS bill and see separate Usage SKUs
USE1-MP:USE1_CacheWriteInputTokenCount-Units $0.34
USE1-MP:USE1_OutputTokenCount-Units $0.27
USE1-MP:USE1_CacheReadInputTokenCount-Units $0.16
USE1-MP:USE1_InputTokenCount-Units $0.01