
What is the MOST likely cause of the 5-minute connection outage?

Octavio
(@katterjohnoctavio)

A Database Specialist is troubleshooting an application connection failure on an Amazon Aurora DB cluster with multiple Aurora Replicas that had been running with no issues for the past 2 months. The connection failure lasted for 5 minutes and corrected itself after that. The Database Specialist reviewed the Amazon RDS events and determined a failover event occurred at that time. The failover process took around 15 seconds to complete.

What is the MOST likely cause of the 5-minute connection outage?

  • A. After a database crash, Aurora needed to replay the redo log from the last database checkpoint.
  • B. The client-side application is caching the DNS data and its TTL is set too high.
  • C. After failover, the Aurora DB cluster needs time to warm up before accepting client connections.
  • D. There were no active Aurora Replicas in the Aurora DB cluster.


Suggested Answer: B

Explanation:

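A failover promotes one of the Aurora Replicas to be the new writer, and Aurora then updates the cluster endpoint's DNS record to point at it. The failover itself finished in about 15 seconds, so the rest of the 5-minute outage is best explained on the client side: an application that caches the resolved address with a high TTL keeps trying the old address until its cached entry expires. When your application tries to establish a connection after a failover, the new Aurora PostgreSQL writer will be a previous reader, which can be found using the Aurora read-only endpoint before DNS updates have fully propagated. Setting the Java DNS TTL to a low value lets the application pick up the new address quickly and helps it cycle between reader nodes on subsequent connection attempts.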
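As a rough illustration of lowering the Java DNS TTL, the sketch below sets the JVM security property `networkaddress.cache.ttl`, which controls how many seconds successful lookups stay cached. The class name `DnsTtlConfig`, the helper `setPositiveTtl`, and the 5-second value are assumptions made for this example, not something stated in the question; pick a value that suits your own application.

```java
import java.security.Security;

// Hypothetical helper class; the name and the 5-second value are illustrative assumptions.
class DnsTtlConfig {

    // Sets the JVM-wide TTL (in seconds) for successful DNS lookups
    // and returns the value that is now in effect.
    static String setPositiveTtl(int seconds) {
        if (seconds < 0) {
            // A negative value would mean "cache forever", which is the opposite of the goal here.
            throw new IllegalArgumentException("TTL must be >= 0 seconds");
        }
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) {
        // Call this early in startup, before the application opens its first database connection.
        String ttl = setPositiveTtl(5);
        System.out.println("networkaddress.cache.ttl = " + ttl);
    }
}
```

The property is JVM-wide, so it should be set once at startup, before the first host name is resolved. Connection pools and drivers may keep already-open connections or apply their own caching, so a low JVM TTL is only one part of handling failover on the client.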

Option A does not fit, because Amazon Aurora is designed to recover from a crash almost instantaneously and continue to serve your application data. Unlike other databases, after a crash Amazon Aurora does not need to replay the redo log from the last database checkpoint before making the database available for operations. Amazon Aurora performs crash recovery asynchronously on parallel threads, so your database is open and available immediately after a crash. Because the storage is organized in many small segments, each with its own redo log, the underlying storage can replay redo records on demand, in parallel and asynchronously, as part of a disk read after a crash. This approach reduces database restart times to less than 60 seconds in most cases.

   