Time
2018/02/07 5:00PM EDT to 2018/02/07 9:00PM EDT (~4 hours)
Issue
Dynalist service was intermittent, especially with regard to syncing. Several users reported seeing internal server errors. No data was lost.
Cause
The disk on our web server was full, causing large web requests to be dropped because temp files couldn't be stored for processing. Most of the space was taken up by /tmp.
Mitigation
Cleaned /tmp and restarted the server.
Prevention going forward
Installed automated scripts to clean /tmp periodically.
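For illustration, here is a minimal sketch of what a periodic cleanup script could look like. It is not the actual script we run: the function name `clean_old_files` and the 24-hour age threshold are assumptions made for this example. It deletes regular files that have not been modified within the given age and leaves directories in place.

```python
import os
import time


def clean_old_files(directory, max_age_seconds, now=None):
    """Delete files under `directory` whose mtime is older than max_age_seconds.

    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                # lstat judges a symlink by its own mtime, not its target's.
                stat = os.lstat(path)
                if now - stat.st_mtime > max_age_seconds:
                    os.remove(path)
                    removed.append(path)
            except OSError:
                # The file vanished or we lack permission; skip it.
                continue
    return removed


if __name__ == "__main__":
    # Hypothetical policy: remove files in /tmp untouched for 24 hours.
    deleted = clean_old_files("/tmp", max_age_seconds=24 * 60 * 60)
    print(f"Removed {len(deleted)} file(s)")
```

A script like this could be run on a schedule, for example from cron. Choosing the age threshold is a trade-off: too short risks deleting temp files that are still in use, too long lets the directory fill up again.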
Why this wasn't caught/fixed sooner
Our disk space monitoring graph was just slightly out of view on our main dashboard, so we did not notice the disk filling up. We've since rearranged our dashboard to make disk space easier to monitor.
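A dashboard graph only helps when someone is looking at it, so an automated threshold check is a natural complement. The sketch below is an assumption-laden example, not something we have deployed: the function names and the 90% threshold are hypothetical. It uses Python's standard `shutil.disk_usage` to report how full a filesystem is.

```python
import shutil


def disk_usage_fraction(path="/"):
    """Return the used fraction (0.0 to 1.0) of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def is_disk_nearly_full(path="/", threshold=0.9):
    """Return True when the used fraction is at or above `threshold`."""
    return disk_usage_fraction(path) >= threshold


if __name__ == "__main__":
    # Hypothetical alert threshold of 90%.
    if is_disk_nearly_full("/", threshold=0.9):
        print(f"WARNING: disk is {disk_usage_fraction('/'):.0%} full")
```

The output of a check like this could feed an email or chat alert, so a filling disk gets noticed even when nobody is watching the dashboard.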
The service was down for ~4 hours because all of our team members were unavailable at the time of the outage.
Sorry for the inconvenience, and thanks to all those who reached out to report the problem!