As well as working on the high rate of server errors we have been experiencing since we started adding many more social bookmarking sites, we have also been tackling a related issue.
The submission process works in several steps: it logs into the website, submits the job data, handles a confirmation step on some sites, and then checks for success and pulls out the URL of the created bookmark where available.
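To make the steps concrete, here is a rough sketch in Python. Every name here (`login`, `submit_job`, `confirm`, and so on) is an illustrative stand-in, not our actual code, and the stubbed return values are placeholders:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Result:
    needs_confirmation: bool
    bookmark_url: Optional[str] = None

def login(site, account):
    # Step 1: authenticate against the bookmarking site (stubbed here).
    return {"site": site, "account": account}

def submit_job(session, job):
    # Step 2: post the job data; some sites then require confirmation.
    return Result(needs_confirmation=True)

def confirm(session, result):
    # Step 3 (only on some sites): complete the confirmation step.
    return Result(needs_confirmation=False,
                  bookmark_url="http://example.com/b/1")

def submit(site, account, job):
    session = login(site, account)
    result = submit_job(session, job)
    if result.needs_confirmation:
        result = confirm(session, result)
    # Step 4: check for success and pull out the created bookmark URL.
    return result.bookmark_url
```

The key point is that a timeout can interrupt this flow between any two steps, which is what the rest of this post is about.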
The server timeouts described earlier can strike at any point in this process. Sometimes the job has been submitted successfully and the timeout occurs on the final step, while we are checking for success. When that happens, the job is marked as failed even though the submission actually succeeded. Our internal retry system then kicks in: it retries any submission that failed due to a server error several times over a 24-48 hour period, with sites of higher PR given a higher number of retries.
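As an illustration, a retry policy of this shape might look like the following. The retry counts, PR thresholds, and even spacing within the window are made-up examples for the sketch, not our production values:

```python
import random

def retry_count(pagerank):
    # Higher-PR (more valuable) sites get more retry attempts.
    # These tiers are illustrative, not our real configuration.
    if pagerank < 4:
        return 2
    if pagerank < 7:
        return 4
    return 6

def retry_delays_hours(pagerank, window=(24, 48)):
    # Spread the retries evenly across a randomly chosen total
    # duration inside the 24-48 hour window.
    n = retry_count(pagerank)
    total = random.uniform(*window)
    return [total / n] * n
```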
The problem is that when the job is retried, the bookmark already exists (because the first submission succeeded), so the retry fails with the error “Duplicate URL”! This is an oversight on our part and a bug that has been in the system from the beginning, but we didn’t notice it before because the number of failures due to duplication errors was very small.
But since we started adding more sites and experiencing general performance issues, more and more submissions have been failing due to timeouts, which in turn raised the number of duplication errors. A true duplication error should be very rare, so as these numbers increased it raised a red flag.
So we have now investigated the issue and fixed it, partially. If a duplication error is found, the submission process now skips ahead to the final step and looks for the posted bookmark URL. However, this only works if the job is retried with the same account that originally submitted it, because we log into that account, look at its submission history and pull out the bookmark URL that was created.
We use a pool of accounts for submission and rotate through them randomly for all jobs, so a retry may well use a different account, in which case the URL cannot be extracted.
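In simplified Python, the recovery path looks roughly like this. The `history` dictionary is a stand-in for logging into the account and scanning its submission history on the site; the names are hypothetical:

```python
def recover_bookmark_url(site, account, job_url, history):
    # history maps (site, account) -> {submitted job_url: bookmark_url},
    # simulating what we would see in that account's submission history.
    # If a different account did the original submission, the URL is
    # simply not in this account's history and we get None back.
    posted = history.get((site, account), {})
    return posted.get(job_url)
```

So on a “Duplicate URL” error we call this instead of failing outright; when it returns `None` (wrong account), the job still succeeds but without a reported URL.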
The proper fix is to keep track of which account is used for every submission and, in the case of a failure, to use that same account for any retries. However, this is not a trivial fix and it cannot be applied retroactively.
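A sketch of what that fix could look like, under the assumption of a simple per-job account assignment (the class and its structure are hypothetical, not our implementation):

```python
import random

class AccountPool:
    def __init__(self, accounts):
        self.accounts = list(accounts)
        self.assigned = {}  # job_id -> account used for that job

    def account_for(self, job_id):
        # First attempt: pick a random account from the pool and
        # remember the choice. Any retry of the same job then reuses
        # the original account, so the duplicate-recovery path can
        # find the bookmark in that account's submission history.
        if job_id not in self.assigned:
            self.assigned[job_id] = random.choice(self.accounts)
        return self.assigned[job_id]
```

The catch, as noted above, is that jobs submitted before such tracking exists have no recorded account, which is why the fix cannot help retroactively.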
What all of this means is that from now on you will see fewer failures due to duplication errors (they should now be very rare), but until we can implement the full fix of tracking the account, reporting of the created URL will not work in every case, so that field may be blank.
I’ll post again when the next phase of the fix has been implemented and also on the progress of the underlying server timeout errors.