
[terraform] identity service cloudwatch alarms
Closed · Public

Authored by will on May 1 2024, 8:10 PM.
Tags
None

Details

Summary

This introduces a CloudWatch alarm for each identity service metric filter.
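The following sketch shows one metric filter paired with its alarm. It is an illustration, not the code in this diff. The log group (/ecs/identity-service), the metric namespace (IdentityServiceErrors), the resource names, and the SNS topic are all hypothetical placeholders.

```hcl
# Hypothetical names throughout: log group, namespace, and SNS topic are
# placeholders, not the values used in this revision.
resource "aws_cloudwatch_log_metric_filter" "identity_search_error" {
  name           = "IdentitySearchError"
  log_group_name = "/ecs/identity-service" # assumed log group name
  pattern        = "\"Search Error\""      # quoted phrase: exact-phrase match

  metric_transformation {
    name          = "IdentitySearchErrorCount"
    namespace     = "IdentityServiceErrors" # assumed namespace
    value         = "1"                     # publish 1 per matching log event
    default_value = "0"
  }
}

resource "aws_sns_topic" "identity_errors" {
  name = "identity-service-errors" # hypothetical notification topic
}

resource "aws_cloudwatch_metric_alarm" "identity_search_error" {
  alarm_name          = "IdentitySearchErrorAlarm"
  namespace           = "IdentityServiceErrors"
  metric_name         = "IdentitySearchErrorCount" # same metric the filter publishes
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1 # fire on the first matching error in the period
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.identity_errors.arn]
}
```

The alarm refers to the filter's output only through the shared namespace and metric name. If the two strings drift apart, the alarm silently watches a metric that never receives data.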

Depends on D11853

Test Plan

Tested on staging by triggering an identity "Search Error" error log, which activated the CloudWatch alarm.

As a final testing step, I will test each log pattern individually before landing.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

will requested review of this revision. May 1 2024, 8:25 PM
varun added 1 blocking reviewer(s): bartek.

I think @bartek should take a look when he's back.

services/terraform/remote/aws_cloudwatch_alarms.tf
9 ↗(On Diff #39727)

If we go with my suggestion in D11852, I think the pattern will look like this. Not sure, though.

services/terraform/remote/aws_cloudwatch_alarms.tf
9 ↗(On Diff #39727)

Sorry, I should have left this comment on D11853 instead.

services/terraform/remote/aws_cloudwatch_alarms.tf
6 ↗(On Diff #39727)

Isn't this threshold too low? On staging, I get spammed even when the threshold is 5, 8, and sometimes even 20.
Of course, this happens when someone is testing something on staging and it fails.

services/terraform/remote/aws_cloudwatch_alarms.tf
6 ↗(On Diff #39727)

Varun and I discussed this. We plan to set the threshold low at first, then tune thresholds for specific error types as they come up on staging.

One idea is to put the threshold in identity_error_patterns itself, so we can easily configure a separate threshold for each error type.
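The following sketch shows how a per-error-type threshold could be stored in identity_error_patterns. The map shape (one object per error type with pattern and threshold), the keys, the threshold values, the log group, and the namespace are all hypothetical. Each alarm reads its own threshold from the map instead of using a shared constant.

```hcl
# Hypothetical shape: each error type carries its own pattern and threshold.
locals {
  identity_error_patterns = {
    SearchError = { pattern = "\"Search Error\"", threshold = 1 }
    DBError     = { pattern = "\"DB Error\"", threshold = 5 }
  }
}

resource "aws_cloudwatch_log_metric_filter" "identity_errors" {
  for_each       = local.identity_error_patterns
  name           = "Identity${each.key}"
  log_group_name = "/ecs/identity-service" # assumed log group name
  pattern        = each.value.pattern

  metric_transformation {
    name      = "Identity${each.key}Count"
    namespace = "IdentityServiceErrors" # assumed namespace
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "identity_errors" {
  for_each            = local.identity_error_patterns
  alarm_name          = "Identity${each.key}Alarm"
  namespace           = "IdentityServiceErrors"
  metric_name         = "Identity${each.key}Count" # matches the filter's metric name
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  # Per-error-type threshold taken from the map entry.
  threshold           = each.value.threshold
}
```

With this shape, raising the threshold for an error type that is noisy on staging means changing one value in the map.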

I think you raise an important point about how we test on staging. Perhaps, as we do in our Terraform channel, devs could notify everyone to ignore specific error types they might trigger on purpose?

9 ↗(On Diff #39727)

It looks like we'll be using a separate errorType field in our logs. I believe this means the pattern can just be something like "Search Error", without the ":" or any regex.

services/terraform/remote/aws_cloudwatch_alarms.tf
9 ↗(On Diff #39727)

The exception is the DB Error error types, which will start with a * because there are multiple kinds.
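The following sketch shows how the two pattern styles could be written with CloudWatch's JSON filter syntax. It assumes the identity logs are JSON with a top-level errorType field; the log format is not shown in this revision. The local names are hypothetical.

```hcl
# Assumes JSON-formatted identity logs with a top-level errorType field.
locals {
  # Exact match on a single error type.
  search_error_pattern = "{ $.errorType = \"Search Error\" }"

  # Leading * wildcard: matches any errorType value ending in "DB Error",
  # which covers the several DB error kinds.
  db_error_pattern = "{ $.errorType = \"*DB Error\" }"
}
```

If the logs turn out to be plain text instead, a quoted term such as "Search Error" matches that exact phrase anywhere in the log event.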

This revision is now accepted and ready to land. May 7 2024, 1:43 AM