Foundation for Autonomous Driving: High-Performance, Massive Storage
As vehicles with L2 and higher levels of autonomy operate on roads, data collected from the on-board sensors will be used not just to operate the vehicles, but in data centers to train even more sophisticated inference models. Experts estimate that autonomous vehicles could generate as much as 40 TB of data per hour3 from the cameras, sensors and other technology the vehicles use to operate. The amount of data generated depends on the sensor technology, the number of sensors used and their resolution.
For example, a 720p camera encoding at roughly 0.7 Mbps generates about 5 MB of data every minute. Higher-resolution cameras – 1080p, for example – generate up to 10.3 GB per minute. Each vehicle may have six or more cameras, and drivers spend about one hour driving every day4. For 720p cameras, this translates to 1.8 GB of data generated per vehicle per day. Add in more sensors – RADAR and LiDAR for redundancy, or more cameras for better coverage – and the data generated grows substantially.
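The per-vehicle arithmetic above can be checked with a short back-of-the-envelope calculation. All inputs are the figures quoted in the text (roughly 5 MB per minute per 720p camera, six cameras, one hour of driving per day), not measured values:

```python
# Back-of-the-envelope estimate of daily per-vehicle camera data,
# using the illustrative figures from the text.
MB_PER_MIN_720P = 5          # compressed 720p stream, ~0.7 Mbps (assumed)
CAMERAS = 6                  # cameras per vehicle (assumed)
DRIVE_MINUTES_PER_DAY = 60   # ~1 hour of driving per day

daily_gb = MB_PER_MIN_720P * CAMERAS * DRIVE_MINUTES_PER_DAY / 1000
print(f"{daily_gb:.1f} GB per vehicle per day")  # 1.8 GB
```

Swapping in a higher per-minute figure for 1080p cameras or adding RADAR and LiDAR streams scales the total linearly.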
A small percentage of the data collected – roughly 30% – will be uploaded and used to train inference models. At that rate, a fleet of just 62,000 autonomous vehicles would generate 1 petabyte (PB) of uploaded data per month. Keep in mind, there are 276 million vehicles5 on the roads in the United States!
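The fleet-scale estimate follows directly from the per-vehicle figure. A sketch of the calculation, assuming a 30-day month and the 1.8 GB/day and 30% upload figures from the text:

```python
# How many vehicles does it take to upload 1 PB of training data per month?
DAILY_GB_PER_VEHICLE = 1.8   # from the 720p example above
UPLOAD_FRACTION = 0.30       # ~30% of collected data is uploaded
DAYS_PER_MONTH = 30          # assumed for simplicity
TARGET_GB = 1_000_000        # 1 PB expressed in GB

uploaded_gb_per_vehicle_month = (
    DAILY_GB_PER_VEHICLE * UPLOAD_FRACTION * DAYS_PER_MONTH
)
vehicles = TARGET_GB / uploaded_gb_per_vehicle_month
print(f"{vehicles:,.0f} vehicles")  # 61,728 – matching the ~62,000 in the text
```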
Furthermore, ADAS and AD developers are increasingly using the data collected during driving to generate synthetic video to quickly expand the training data set, as more data leads to better inferences. Developers can reconstruct and alter environments (e.g., an intersection or stretch of highway) and insert vehicles, pedestrians and other objects for any number of scenarios. Synthetic video also helps ensure that high-quality data for corner-case scenarios is included in the training data set.
Data center storage requirements are set to increase exponentially as ADAS and AD systems move to higher-resolution sensors, more vehicles equipped with these systems are sold, and the amount of data generated from synthetic video continues to grow.
In addition to scalability to address ever-growing datasets, key SSD requirements for training inference models include high I/O performance to quickly transfer large sets of data and low latency to minimize the time required to feed data to the CPUs and GPUs during training.
To accomplish this, data center architects will need to design systems that support the latest SSD interfaces while looking ahead to products on the horizon. For example, high-performance SSDs that support PCIe 5.0 can provide a significant improvement for data centers tasked with training inference models. Samsung recently released its PM1743 enterprise SSD with a top capacity of 15.36 TB, sequential read speeds of up to 13,000 MB/s and random read performance of 2,500K IOPS – roughly 2x the performance of the previous PCIe 4.0 generation. Built with Samsung's advanced sixth-generation V-NAND, the PM1743 is designed to process vast quantities of data to meet the advanced requirements of high-performance servers.
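A quick illustration of why that sequential read bandwidth matters for training: the time to stream a full dataset off a single drive scales inversely with read speed. The 100 TB dataset size and the ~6,500 MB/s PCIe 4.0 figure below are assumptions chosen for comparison; only the 13,000 MB/s PM1743 figure comes from the text:

```python
# Time to stream a dataset from a single drive at a given sequential
# read speed. Dataset size and PCIe 4.0 speed are illustrative assumptions.
DATASET_TB = 100

def hours_to_read(read_mb_s: float, dataset_tb: float = DATASET_TB) -> float:
    """Hours to read dataset_tb terabytes at read_mb_s megabytes/second."""
    return dataset_tb * 1_000_000 / read_mb_s / 3600

print(f"PCIe 5.0 (13,000 MB/s): {hours_to_read(13_000):.1f} h")  # ~2.1 h
print(f"PCIe 4.0 (~6,500 MB/s): {hours_to_read(6_500):.1f} h")   # ~4.3 h
```

In practice, training pipelines stripe reads across many drives, but the per-drive roughly 2x gain carries through at the array level.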