Fujitsu Laboratories has developed new parallel distributed data processing technology that enables pools of big data as well as continuous inflows of new data to be efficiently processed and put to use within minutes.
When speed is the priority, data is processed in memory, but very large volumes of data typically call for disk-based methods because the data is too large to fit in memory. With disk-based techniques, however, immediately reflecting newly received data in the analytical results requires many disk accesses. As a result, analytical processing cannot keep pace with the volume of data flowing in.
To address this problem, Fujitsu Laboratories has developed a technology it calls "adaptive locality-aware data reallocation," which dramatically reduces the number of disk accesses, along with distributed parallel middleware for incremental processing. With this adaptive data localization, data is optimally allocated in the following three steps:
- Record data-access history: Record which sets of data are accessed continuously.
- Calculate optimal allocation: Based on the recorded access history, group sets of data that tend to be accessed continuously.
- Reallocate data dynamically: Based on the calculated groups, specify a location on disk for the data belonging to each group and allocate it there.
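The three steps can be illustrated with a small sketch. This is not Fujitsu's implementation, whose details are not given here; it is a minimal illustration under stated assumptions. The function names (`record_history`, `group_keys`, `reallocate`, `count_runs`), the co-access `threshold`, the union-find grouping, and the model of a disk as a sequence of numbered blocks are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def record_history(sessions):
    """Step 1 (assumed form): count how often two keys are accessed together."""
    co_access = defaultdict(int)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            co_access[(a, b)] += 1
    return co_access


def group_keys(keys, co_access, threshold=2):
    """Step 2 (assumed form): merge keys co-accessed at least `threshold` times."""
    parent = {k: k for k in keys}

    def find(k):
        # Union-find lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for (a, b), count in co_access.items():
        if count >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for k in sorted(keys):
        groups[find(k)].append(k)
    return sorted(groups.values())


def reallocate(groups):
    """Step 3 (assumed form): give each group a contiguous range of block offsets."""
    layout = {}
    offset = 0
    for group in groups:
        for key in group:
            layout[key] = offset
            offset += 1
    return layout


def count_runs(layout, session):
    """Number of separate contiguous reads needed to fetch all keys in a session."""
    offsets = sorted(layout[k] for k in set(session))
    return sum(1 for i, off in enumerate(offsets)
               if i == 0 or off != offsets[i - 1] + 1)


# Hypothetical access history: "a", "c", "e" are usually read together, as are "b", "d".
sessions = [["a", "c", "e"], ["a", "c", "e"], ["b", "d"], ["b", "d"], ["a", "b"]]
keys = sorted({k for s in sessions for k in s})

naive_layout = {k: i for i, k in enumerate(keys)}  # a, b, c, d, e in key order
new_layout = reallocate(group_keys(keys, record_history(sessions)))

print(count_runs(naive_layout, ["a", "c", "e"]))  # separate reads before reallocation
print(count_runs(new_layout, ["a", "c", "e"]))    # separate reads after reallocation
```

In this toy model, data that is frequently read together ends up in adjacent blocks, so a request that previously touched several scattered locations can be served by one continuous read. A real system would also have to weigh the cost of moving data against the expected savings, which the sketch ignores.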
This makes it possible to acquire the desired data through a small number of continuous accesses rather than numerous random accesses, which vastly increases overall throughput in a distributed-processing system. Also, by monitoring and automatically recognizing patterns of data access, this technology can gradually adapt to the hard-to-anticipate data characteristics of social-infrastructure systems.
According to Fujitsu, the new technology cuts the number of disk accesses by approximately 90% compared to previous levels by dynamically reallocating data on disk to match trends in data access. Whereas producing analytic results from new data could previously take several hours, with this new technique results are available in minutes.
Fujitsu plans to apply the technology to commercial products and services in fiscal 2013.