
Researchers Present Efficient Distributed Deep Learning System For Automatic Speech Recognition


Enterprise & IT | Oct 21, 2019

Automatic speech recognition (ASR) based on deep learning has made great progress recently, thanks to the use of large amounts of training data, expressive models, and high computational power.

Researchers at IBM have developed an efficient distributed learning strategy for training acoustic models with deep architectures.

The researchers published their work at this year's ICASSP. They used a distributed training approach, Asynchronous Decentralized Parallel Stochastic Gradient Descent (ADPSGD), to shorten the training time of a deep LSTM acoustic model from one week to 11.5 hours on 32 NVIDIA V100 GPUs, with no degradation of recognition accuracy on the 2,000-hour Switchboard corpus, a well-established dataset in the speech community for benchmarking ASR performance. In a paper published at this year's INTERSPEECH, they further improved the efficiency of ADPSGD, reducing the training time from 11.5 hours to 5.2 hours using 64 NVIDIA V100 GPUs.

First, large batch sizes are critical for scaling distributed training to a large number of learners. The researchers observed that ADPSGD can sustain significantly larger batch sizes with good loss convergence than synchronous centralized parallel SGD (SCPSGD).

While a rigorous theory is still being developed to explain this phenomenon, the researchers speculate that, since SCPSGD is a special case of ADPSGD, the local model averaging among only neighboring learners in ADPSGD is equivalent to a noise perturbation of the global model averaging in SCPSGD. This noise perturbation may make feasible batch sizes in ADPSGD that are not possible for SCPSGD. This property gives ADPSGD great advantages when scaling out distributed training to a large number of learners.
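The intuition behind decentralized averaging can be sketched numerically. The following is a minimal NumPy simulation, not IBM's implementation: the ring topology, the pairwise averaging rule, and the choice of 8 learners are illustrative assumptions. It shows that purely local neighbor averaging preserves the global model average exactly while still driving all learners toward consensus:

```python
import numpy as np

rng = np.random.default_rng(0)
n_learners, dim = 8, 4

# Each learner keeps its own copy of the model parameters.
models = [rng.normal(size=dim) for _ in range(n_learners)]
init_mean = np.mean(models, axis=0)

def neighbor_average(models):
    """One ADPSGD-style mixing round: each learner averages only with
    its right neighbor on a ring, instead of a global all-reduce."""
    n = len(models)
    return [(models[i] + models[(i + 1) % n]) / 2 for i in range(n)]

# Repeated local mixing drives the learners toward consensus, and the
# average over all models is preserved at every round.
for _ in range(50):
    models = neighbor_average(models)

spread = max(np.linalg.norm(m - models[0]) for m in models)
```

In SCPSGD terms, replacing `neighbor_average` with a single global mean would synchronize all learners in one step; the decentralized rule reaches (approximate) consensus without any global barrier, which is what removes the synchronization bottleneck.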

Second, to improve communication efficiency on the same node while also reducing main memory traffic and CPU pressure among nodes, the researchers designed a hierarchical ADPSGD architecture (H-ADPSGD). The learners on the same computing node construct a super-learner via NVIDIA NCCL using a synchronous ring-based all-reduce implementation (Sync-Ring). The super-learners then form another ring under ADPSGD (ADPSGD-Ring). In addition, as gradient computation on GPUs overlaps with the ADPSGD communication, this design also significantly improves the computation/communication ratio in the distributed training.
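The Sync-Ring level of this hierarchy can be illustrated with a toy, single-process simulation of a ring-based all-reduce. This is a sketch under stated assumptions: the function `ring_allreduce` and its chunk-passing schedule are illustrative, and NCCL's production implementation runs across real devices with overlapped communication rather than in a Python loop:

```python
import numpy as np

def ring_allreduce(data):
    """Toy simulation of a synchronous ring all-reduce over n ranks.
    Each rank's vector is split into n chunks; after n-1 scatter-reduce
    steps and n-1 all-gather steps, every rank holds the elementwise
    sum of all inputs."""
    n = len(data)
    buf = [list(np.array_split(d.astype(float), n)) for d in data]
    # Scatter-reduce: pass partial sums around the ring.
    for step in range(n - 1):
        msgs = [((r + 1) % n, (r - step) % n, buf[r][(r - step) % n].copy())
                for r in range(n)]
        for dst, idx, chunk in msgs:
            buf[dst][idx] += chunk
    # All-gather: circulate the fully reduced chunks.
    for step in range(n - 1):
        msgs = [((r + 1) % n, (r + 1 - step) % n,
                 buf[r][(r + 1 - step) % n].copy()) for r in range(n)]
        for dst, idx, chunk in msgs:
            buf[dst][idx] = chunk
    return [np.concatenate(b) for b in buf]

# 4 "GPUs" on one node, each holding its own gradient vector.
grads = [np.arange(8) + r for r in range(4)]
reduced = ring_allreduce(grads)
```

After this step every GPU on the node holds the same reduced gradient, so the node can act as a single super-learner in the outer ADPSGD ring. The ring schedule keeps per-step traffic constant per link, which is why it scales well within a node.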

The distributed training of the LSTM acoustic model using the proposed H-ADPSGD is carried out on a cluster with eight nodes connected via 100 Gbit/s Ethernet. Each node has eight NVIDIA V100 GPUs. The batch size on each GPU is 128, which gives a global batch size of 8,192. The model was trained for 16 epochs and achieved 7.6% WER for the Switchboard task and 13.2% WER for the Callhome task.
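The cluster layout reported above fixes the global batch size; a one-line check (using only the figures stated in the text) confirms the arithmetic:

```python
nodes = 8          # cluster nodes, 100 Gbit/s Ethernet
gpus_per_node = 8  # NVIDIA V100 GPUs per node
per_gpu_batch = 128

global_batch = nodes * gpus_per_node * per_gpu_batch
print(global_batch)  # 8192
```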

While it took about one week to train the model on a single V100 GPU, and 11.5 hours using ADPSGD on 32 NVIDIA V100 GPUs in the ICASSP paper, training under H-ADPSGD took only 5.2 hours on 64 NVIDIA V100 GPUs. Overall, H-ADPSGD gives a 40x speedup without accuracy loss, an additional 50% reduction in training time over IBM's ICASSP result.

The researchers claim this is the first time an asynchronous distributed algorithm has been demonstrated to scale better with a large batch size than the synchronous approach for large-scale deep learning models. At 5.2 hours, it is also the fastest training time to date that reaches this level of recognition accuracy on the 2,000-hour Switchboard dataset.

Tags: IBM, Automatic Speech Recognition (ASR)