Describe the difference between batch and streaming data
When working in data analytics, you’ll often encounter two primary modes of data processing: batch and streaming. Understanding the distinction between the two is crucial to harnessing the power of your data.
Batch data
At its core, batch processing is about dealing with data in large chunks or batches. Imagine you’ve been on a trip and taken numerous photos. At the end of the day, you transfer all these photos at once to your computer. That’s batch processing. In data analytics, batch processing means you’re accumulating data over a specific period and then processing it all at once. This approach is ideal when dealing with vast volumes of data that don’t require immediate analysis. For example, if you’re analyzing monthly sales data, you’d typically wait for the month to end, collect all the sales records, and then process them together, as shown in Figure 4-14.
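The monthly sales example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not an Azure implementation: the `sales_records` list and the `process_monthly_batch` function are hypothetical names invented here, and a real workload would read from a data store rather than an in-memory list.

```python
from datetime import date

# Hypothetical sales records accumulated over time: (sale date, amount).
sales_records = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 1, 17), 80.5),
    (date(2024, 1, 29), 42.25),
    (date(2024, 2, 2), 200.0),
]


def process_monthly_batch(records, year, month):
    """Process every record for one month together, after the month has ended."""
    # Select the whole batch first, then compute the results in one pass.
    batch = [amount for day, amount in records
             if day.year == year and day.month == month]
    return {"count": len(batch), "total": sum(batch)}


# Run once, after January has closed and all of its records have arrived.
january = process_monthly_batch(sales_records, 2024, 1)
print(january)
```

Nothing is computed while the records arrive. The work happens in a single run over the accumulated data, which is what distinguishes this approach from processing each record as it comes in.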
Here are the characteristics of batch data:
- Volume-driven: Batch processing typically handles enormous volumes of data. It’s designed for efficiency and can process vast datasets at optimized costs.
- Scheduled: Operations in batch processing are often pre-scheduled, occurring at fixed times like the end of the day or the end of the week.
- Latency: Since data is processed after accumulation, there’s inherent latency in obtaining insights from batch data.
Here are the challenges of batch data:
- Data freshness: As data isn’t processed immediately upon arrival, insights might not reflect the most current state of affairs.
- Resource management: Handling the surge of processing needs when dealing with large batches requires optimal resource allocation.
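To make the latency and data-freshness points concrete, the sketch below estimates how long a record waits before a scheduled batch job picks it up. It assumes a single daily run at a fixed hour. The function names `next_batch_run` and `wait_until_processed`, and the 23:00 default, are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta


def next_batch_run(event_time, run_hour=23):
    """Return the first scheduled daily run at or after event_time."""
    run = event_time.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if run < event_time:
        # Today's run has already passed, so the record waits for tomorrow's.
        run += timedelta(days=1)
    return run


def wait_until_processed(event_time, run_hour=23):
    """Return how long a record sits unprocessed before the next batch run."""
    return next_batch_run(event_time, run_hour) - event_time


# A sale recorded at 09:30 is not reflected in any insight until the 23:00 run.
morning_sale = datetime(2024, 3, 1, 9, 30)
delay = wait_until_processed(morning_sale)
print(delay)
```

Under these assumptions, a record that arrives just after a run waits almost a full day, so the schedule interval sets the upper bound on how stale batch insights can be.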

FIGURE 4-14 Time-series data store