In the few weeks since we introduced the credit system, we have added a lot of new sites, but many of them remain on trial because their server error rates are too high. There are always a few sites that simply don’t perform well enough, but this time so many sites were affected that we decided to investigate further. We’ve been posting short updates on our Facebook page, but for those who don’t follow it, here’s a recap of what we’ve found so far:
First, over the Easter weekend the problem got noticeably worse. We found an issue with the external system we use to solve captchas, and since almost all of our credit sites have captchas, they all stopped working. Unfortunately, the problem was not resolved for a few days because we were not working over the Easter break.
Fixing this helped, but the problem remained. We continued to investigate, and a few days later we found that there had been a subtle change to the Pligg template. Pligg is a platform that many bookmarking sites use, and many of our sites run on it, so the change affected a large number of them. What was strange was that it did not break submissions outright; it simply made failure rates worse. Usually, when a site changes its template (as Tagza has right now), submissions stop working immediately until we can adapt our code to the new template.
We went through all of our Pligg sites and checked whether any of them needed changes. Many did, and updating them improved success rates further, but still not enough.
At this point we realised that there was an underlying issue in the submission engine that affected a large number of sites, but not all of them. That made the problem hard to track down: a global issue usually affects every site, not just a subset! What we have now discovered is that when a site is slow, running submissions through our engine seems to make its poor performance worse. This is because we run huge numbers of submissions simultaneously and because we use proxies.
So sites that are always fast go through very quickly and remain problem-free, but slow sites run even slower through our engine and, as a result, do not perform as well as we would like. We know we can never achieve a 100% success rate (though a few sites do!), but we aim for 90% overall.
So now we’re looking at engine performance. Last night we made a tweak that has helped, but it doesn’t scale: if we suddenly doubled the number of submissions we perform, it would no longer provide the same benefit it does now. We have a number of other strategies that we plan to explore over the coming weeks. Each one has to be tested in isolation over a period of days and, of course, has to be easy to revert if it causes problems.