It's fairly common to get requests for a batch version of a previously transactional API. The main questions to ask when considering a batch implementation are:
- Atomicity - what operations should be considered atomic, all-or-nothing (i.e. not admitting partial failures)?
- Partial failures - since sizable batches usually cannot be processed atomically, how should partial failures be reported to the caller? This quickly leads to:
- Idempotency - how to prevent erroneously submitted duplicate requests from creating snowballing failures and data corruption throughout the system?
- Downstream effects - how to ensure that downstream systems that depend on asynchronous processes, such as ETL, work well with different load patterns created by upstream batch requests?
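One common answer to the partial-failure question is to process each item independently and return per-item outcomes instead of a single pass/fail for the whole batch. A minimal sketch, with hypothetical names (`process_batch`, `ItemResult`, and the `handler` callback are illustrative, not any particular API):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ItemResult:
    """Outcome for a single item in the batch response."""
    item_id: str
    ok: bool
    error: Optional[str] = None

def process_batch(items: list, handler: Callable) -> list:
    """Apply handler to each item independently; one item's failure
    does not abort the rest of the batch."""
    results = []
    for item in items:
        try:
            handler(item)
            results.append(ItemResult(item["id"], ok=True))
        except Exception as exc:
            # Record the failure per item so the caller can retry
            # only the failed subset.
            results.append(ItemResult(item["id"], ok=False, error=str(exc)))
    return results
```

The per-item result list is what makes selective retries possible, which is exactly why idempotency becomes the next concern: retried items must not be applied twice.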
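A standard way to address the idempotency question is to have clients attach a unique key to each request and have the server replay the stored response for a duplicate key instead of re-executing side effects. A toy sketch, assuming an in-memory dict stands in for what would be a persistent store with a TTL in a real system (`handle_idempotent` and the store name are hypothetical):

```python
from typing import Callable

# In production this would be a durable store (e.g. a database table
# keyed by idempotency key) with an expiry policy; a dict is only a sketch.
_idempotency_store: dict = {}

def handle_idempotent(key: str, payload: dict, process: Callable) -> dict:
    """Execute process(payload) at most once per idempotency key;
    duplicates get the cached response with no new side effects."""
    if key in _idempotency_store:
        return _idempotency_store[key]
    response = process(payload)
    _idempotency_store[key] = response
    return response
```

With this in place, an erroneously resubmitted batch hits the cache rather than creating duplicate records, which contains the snowballing failures the idempotency bullet warns about.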