Elevated platform errors

Incident Report for 360Learning

Resolved

Update of April 14th.

- Since the last fixes were deployed, the platform has been stable with a low error rate.

- Over the next few weeks, we will keep working on clarifying the error messages and reinforcing our infrastructure.

- We now consider the incident resolved.
Posted 2 years ago. Apr 14, 2023 - 20:46 CEST

Update

Update of April 7th.

- On Wednesday around 4:15pm CET, we observed a slowdown for 1 minute, caused by the replication mechanism of our database. We applied a fix this morning and continue to monitor the behavior.

- As for error popups, we have fixed most causes, except for the following:

  - When users play SCORM courses that are included in paths (not programs), they might see an error popup on the results page. These popups can be safely dismissed. We plan to deploy a fix on Tuesday morning CET.

  - In some cases, accessing deleted items, or items for which users have lost visibility rights, may produce these generic error messages instead of a proper explanation of the cause.

We are fully committed to solving these issues and we apologize for the inconvenience.

Next update within 1 week.
Posted 2 years ago. Apr 07, 2023 - 17:12 CEST

Update

April 4th update.

- The platform is stable, with good latency and a low error rate. Our regular software update, planned for early tomorrow morning CET, will include fixes that should decrease the error rate even further. We will of course keep monitoring after this deployment.

- Regarding the incident of March 29th (9:02-9:37am CET), an incident report is available: https://drive.google.com/file/d/1wd4m0ODmHfSzfJg0_CTJDMTluoCu6JMZ/view?usp=share_link
Posted 2 years ago. Apr 04, 2023 - 12:27 CEST

Update

March 30th update. The number of errors stayed at a low level today and latency was good. We will deploy additional fixes with the next regular release, planned for Wednesday 5th. We continue to monitor. Next update here within 3 days.
Posted 2 years ago. Mar 30, 2023 - 18:58 CEST

Monitoring

We have resolved the issue, which was caused by a problem on 2 servers hosting the database. The situation is back to normal. We continue to monitor.
We apologize for the inconvenience.
Posted 2 years ago. Mar 29, 2023 - 09:48 CEST

Investigating

Since 9:00 am CET today (March 29th), we have been observing a high error rate and elevated latency. We're investigating the issue.
Posted 2 years ago. Mar 29, 2023 - 09:20 CEST

Update

According to our monitoring, the number of errors remains stable at a low level and latency is good.
We need a few more days to deploy new fixes and to make sure the error messages are minimized.
Next update before Wednesday 29th 5pm CET.
Posted 2 years ago. Mar 24, 2023 - 18:53 CET

Monitoring

The number of errors has stabilized at a low level. We will deploy fixes for the long tail of errors in the next few days and keep monitoring. We will keep updating this page daily. However, we will not send further e-mail notifications, as the situation is now under control.
Posted 2 years ago. Mar 23, 2023 - 19:49 CET

Update

A new batch of fixes was deployed today. We observed another 3-fold decrease in the number of errors compared to yesterday, with good server latency.

We will keep implementing our plan in the next few days, and will update this page at least daily.
Posted 2 years ago. Mar 22, 2023 - 19:08 CET

Identified

We confirm the issue. The causes have been identified, and we have built a plan to fix these errors.

Some fixes were deployed today, resulting in a 3-fold decrease in the number of errors.

We will keep implementing our plan in the next few days, and will update this page at least daily.
Posted 2 years ago. Mar 21, 2023 - 18:06 CET

Investigating

Some customers keep reporting an elevated number of error messages while using our platform.

We are currently looking into the issue.
Posted 2 years ago. Mar 21, 2023 - 10:42 CET
This incident affected: Web Application, Android App, and iOS App.