Tuesday, September 11, 2012

Processing batch/bulk requests

It's fairly common to get requests to create a batch version of a previously transactional, one-item-at-a-time API. Here are the main questions to ask when considering a batch implementation:

  1. Atomicity - what operations should be considered atomic, all-or-nothing (i.e. not admitting partial failures)?
     
  2. Partial failures - since sizable batches usually cannot be processed atomically, how should partial failures be detected, reported, and retried? This quickly leads to:
     
  3. Idempotency - how to prevent erroneously submitted duplicate requests from creating snowballing failures and data corruption throughout the system?
     
  4. Downstream effects - how to ensure that downstream systems that depend on asynchronous processes, such as ETL, work well with the different load patterns created by upstream batch requests?
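
Items 2 and 3 tend to be solved together: process the batch item by item, report a per-item outcome instead of failing the whole request, and deduplicate retries with an idempotency key. Here is a minimal sketch of that pattern; `BatchProcessor`, its `apply` stub, and the key derivation are all hypothetical illustrations, not any particular API:

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class BatchProcessor:
    """Processes a batch item by item, recording per-item outcomes and
    skipping items whose idempotency key was already seen.
    `apply` stands in for the real transactional operation."""
    seen: dict = field(default_factory=dict)  # idempotency key -> prior result

    def idempotency_key(self, item: dict) -> str:
        # Derive a key from stable item fields; real APIs often let the
        # caller supply the key explicitly instead.
        raw = f"{item['id']}:{item['op']}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def apply(self, item: dict) -> str:
        if item["op"] == "fail":  # simulate a per-item failure
            raise ValueError("cannot apply")
        return "applied"

    def process(self, batch: list) -> list:
        results = []
        for item in batch:
            key = self.idempotency_key(item)
            if key in self.seen:
                # Duplicate submission: return the prior result, do not redo work.
                results.append({"id": item["id"], "status": "duplicate",
                                "result": self.seen[key]})
                continue
            try:
                outcome = self.apply(item)
                self.seen[key] = outcome
                results.append({"id": item["id"], "status": "ok"})
            except ValueError as exc:
                # Partial failure: record it and keep going, don't abort the batch.
                results.append({"id": item["id"], "status": "error",
                                "detail": str(exc)})
        return results
```

Submitting the same item twice then yields one "ok" and one "duplicate" outcome, which is exactly what keeps an erroneous resubmission from snowballing.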

So introducing batch requests into a system without compromising consistency, while still meeting load and performance SLAs, is rarely a trivial task, which is what makes it interesting!
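
For the downstream-effects question, one common tactic is to keep the load pattern that downstream consumers see close to what they were tuned for: split a large batch into fixed-size chunks and pace their submission. A minimal chunking helper might look like this (the chunk size and pacing policy are deployment-specific assumptions):

```python
from itertools import islice

def chunks(items, size):
    """Yield successive lists of at most `size` elements from `items`,
    so a large batch can be fed downstream in bounded slices."""
    it = iter(items)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block
```

Each chunk can then be submitted with a delay, or through a rate limiter, so a 10,000-item batch arrives downstream as many small writes rather than one spike.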