Description: In applications such as a digital assistant or a search engine, the input data are natural language tokens with short sequence lengths, e.g., smartphone assistant queries such as “What is the weather in San Francisco?” For these scenarios, reducing the sequence length typically has a negligible impact on the accuracy attained by compact models. This characteristic is deeply coupled with the latency advantage of RDUs: while a GPU’s latency saturates as sequence length shrinks for compact models, the RDU’s latency continues to improve. As shown in Figure 2, the TinyBERT model can match state-of-the-art model accuracies across sequence lengths from 64 to 256 on the MNLI benchmark task, which we use as a proxy.
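To make the sequence-length trade-off concrete, the sketch below pads or truncates a token-id sequence to a fixed length, as is done before feeding a query to a BERT-style model. This is an illustrative assumption, not the paper's pipeline: a real deployment would use a subword tokenizer (e.g., WordPiece), and whitespace splitting stands in for it here. Note how a short assistant query occupies only a handful of positions even at sequence length 64, so shrinking the fixed length mostly removes padding, not content.

```python
# Sketch (assumed setup): fix a token-id list to a given sequence length,
# padding short inputs and truncating long ones. PAD_ID and the whitespace
# "tokenizer" are stand-ins for a real subword vocabulary.

PAD_ID = 0

def pad_or_truncate(token_ids, seq_len):
    """Return a list of exactly seq_len token ids."""
    if len(token_ids) >= seq_len:
        return token_ids[:seq_len]
    return token_ids + [PAD_ID] * (seq_len - len(token_ids))

query = "What is the weather in San Francisco?"
# Stand-in for a tokenizer: map each whitespace token to a dummy nonzero id.
token_ids = [hash(tok) % 30000 + 1 for tok in query.split()]

for seq_len in (64, 128, 256):
    padded = pad_or_truncate(token_ids, seq_len)
    real = sum(1 for t in padded if t != PAD_ID)
    print(f"seq_len={seq_len}: {real} real tokens, {seq_len - real} padding")
```

For a seven-token query, all three lengths preserve the full input; only the padding overhead (and hence wasted compute on a batch-oriented accelerator) differs.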