
A series of mysterious occurrences

For the past few weeks, a set of mysterious bugs have been rearing their ugly heads. The site will work fine (and at the speed you'd expect from Gosora) for a time, but then, one day, it'll become laggy for no apparent reason. Restarting Gosora fixes it, which indicates it must be a blockage of some sort.

I'm investigating this, and I'm taking the step of implementing several features both to identify the determinants of this phenomenon and, hopefully, to mitigate its effects. It may even be connected to Cloudflare, which isn't known to be the best-behaved reverse proxy.

As part of this effort, I am going to add more settings to disable unwanted logging. This will have the side effect of improving performance in general, and of improving privacy. We may try to flip these settings one by one to see if any behaviors change.
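As a rough illustration, the kind of toggles I have in mind would look something like the struct below. The field names are hypothetical and won't necessarily match what ends up in Gosora.

```go
// Hypothetical sketch of per-category logging toggles; the real setting
// names in Gosora may well differ.
type LoggingSettings struct {
	LogRequests   bool // per-request access log entries
	LogRouteTimes bool // per-route timing entries
	LogReferrers  bool // referrer logging (disabling this is also a privacy win)
}
```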

We have also dramatically optimised logging in general, although some of this work started before we became aware of this weird phenomenon. We have also noticed that it has coincided with a large number of suspicious requests, so it's entirely possible it is a side effect of bad actors exploiting our resources to probe for vulnerabilities, although we will have to dig into this further.

If it is bad actors, I don't think it is anyone with a vendetta; it's more likely that increased traffic to Gosora is simply bringing the site more bad actors in general. This could be an interesting issue to explore, particularly with the possible bottlenecks which might be involved.

Another thing we're doing is rotating logs, so instead of gigantic logs growing up to a gigabyte as an instance runs indefinitely, they'll cap out at a megabyte. If paging the logs in and out of memory is a factor, this should help to mitigate that, and we're also exploring whether capping how long we keep file handles open has an effect here.
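For the curious, the rotation logic amounts to something like the sketch below: check the file's size, and swap in a fresh file once it passes the cap. The file names and the exact policy here are placeholders rather than Gosora's actual code.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

const maxLogSize = 1 << 20 // cap each log file at roughly one megabyte

// rotateIfNeeded swaps the current log file for a fresh one once it grows
// past maxLogSize. The naming scheme is a placeholder.
func rotateIfNeeded(f *os.File) (*os.File, error) {
	info, err := f.Stat()
	if err != nil {
		return f, err
	}
	if info.Size() < maxLogSize {
		return f, nil
	}
	if err := f.Close(); err != nil {
		return nil, err
	}
	name := fmt.Sprintf("operations-%d.log", time.Now().Unix())
	return os.OpenFile(name, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
}

func main() {
	f, err := os.OpenFile("operations.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		panic(err)
	}
	// Check before each batch of writes (or on a timer) whether to rotate.
	if f, err = rotateIfNeeded(f); err != nil {
		panic(err)
	}
	fmt.Fprintf(f, "log entry at %v\n", time.Now())
}
```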

We're also exploring improving our performance monitoring code. We already measure how long requests take on average, and even the duration of the longest requests. Oddly, this phenomenon doesn't show up there, which indicates it could be an issue with Cloudflare, but it has also pointed out a gap in our performance monitoring.
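For context, the kind of measurement I'm referring to is roughly the middleware sketched below, which records the running average and the longest request duration. This is a simplified illustration, not Gosora's actual monitoring code.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// requestStats records the running total/count (for the average) and the
// longest request duration seen so far.
type requestStats struct {
	mu      sync.Mutex
	count   int64
	total   time.Duration
	longest time.Duration
}

func (s *requestStats) record(d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.count++
	s.total += d
	if d > s.longest {
		s.longest = d
	}
}

// middleware wraps a handler and times every request that passes through it.
func (s *requestStats) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		s.record(time.Since(start))
	})
}

func main() {
	stats := &requestStats{}
	http.Handle("/", stats.middleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})))
	http.ListenAndServe(":8080", nil)
}
```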

To rectify this, we're planning to implement a feature which automatically pings the server every hour to see if it is responding within a reasonable timeframe (it would be most effective to run this from a separate server, but I don't currently have the time to set another one up).
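A minimal sketch of what that could look like, assuming we simply fetch the front page from within the same process; the URL and the "slow" threshold below are placeholders:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// pingLoop fetches the given URL once an hour and logs how long the round
// trip took, flagging anything slower than the given threshold. Running it
// from a separate machine would be more representative, as noted above.
func pingLoop(url string, slow time.Duration) {
	client := &http.Client{Timeout: 30 * time.Second}
	for range time.Tick(time.Hour) {
		start := time.Now()
		resp, err := client.Get(url)
		elapsed := time.Since(start)
		if err != nil {
			log.Printf("self-ping failed after %v: %v", elapsed, err)
			continue
		}
		resp.Body.Close()
		if elapsed > slow {
			log.Printf("self-ping slow: %v (status %d)", elapsed, resp.StatusCode)
		}
	}
}

func main() {
	// Placeholder URL and threshold; in practice this would point at the live site.
	pingLoop("https://example.com/", 2*time.Second)
}
```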

We're also going to track the number of open connections and other server metrics to see if any of them are a factor here. And finally, I'm going to do an audit of the performance analytics code to see if there are any flaws in the implementation which could allow an issue to go unseen, and to implement some unit tests to verify that it operates as intended.
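On the connection-tracking front, one option in the standard library is to hook http.Server's ConnState callback and keep a counter, roughly as sketched below. This is just one way to get the number; it isn't necessarily how it will land in Gosora.

```go
package main

import (
	"log"
	"net"
	"net/http"
	"sync/atomic"
	"time"
)

// openConns counts the connections the server currently has open.
var openConns int64

// connCounter is hooked into http.Server.ConnState to keep the counter up to date.
func connCounter(c net.Conn, state http.ConnState) {
	switch state {
	case http.StateNew:
		atomic.AddInt64(&openConns, 1)
	case http.StateClosed, http.StateHijacked:
		atomic.AddInt64(&openConns, -1)
	}
}

func main() {
	srv := &http.Server{Addr: ":8080", ConnState: connCounter}
	// Periodically log the current count; a real setup would feed it into the metrics system.
	go func() {
		for range time.Tick(time.Minute) {
			log.Printf("open connections: %d", atomic.LoadInt64(&openConns))
		}
	}()
	log.Fatal(srv.ListenAndServe())
}
```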
Cloudflare seems to be keeping a lot of connections open unnecessarily... although I'm not sure if that is related.

It isn't surprising that a reverse proxy like Cloudflare wants to keep some connections open in order to better utilize resources, but what is surprising is that Cloudflare doesn't appear to be using the connections it is holding open, and instead opts to open new ones (thereby pointlessly depleting the connection pool).