🚀 Optimum Transformers: accelerated NLP pipelines with Infinity speed

Optimum Transformers

Accelerated NLP pipelines for fast inference :rocket: on CPU and GPU. Built with :hugs: Transformers, Optimum and ONNX Runtime.

How to use

Quick start

Usage is the same as with the original :hugs: Transformers pipelines, with a few minor additions:

from optimum_transformers import pipeline

pipe = pipeline("text-classification", use_onnx=True, optimize=True)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
  • use_onnx - converts the default model to an ONNX graph
  • optimize - optimizes the converted ONNX graph with Optimum

Optimum config

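You can also pass an Optimum ORTConfig through the ort_config argument, for example to apply dynamic quantization. Read the Optimum documentation for more details.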

from optimum_transformers import pipeline
from optimum.onnxruntime import ORTConfig

ort_config = ORTConfig(quantization_approach="dynamic")
pipe = pipeline("text-classification", use_onnx=True, optimize=True, ort_config=ort_config)
pipe("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
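Dynamic quantization (quantization_approach="dynamic") stores the weights as 8-bit integers ahead of time and computes the quantization scale for activations on the fly, from each actual input. The sketch below is a simplified, pure-Python illustration of that arithmetic, assuming symmetric per-tensor int8 quantization. The helpers quantize_int8 and dynamic_quantized_dot are hypothetical names, and this is not how ONNX Runtime's quantized kernels are actually implemented.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(x / scale), clipped to [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # An all-zero tensor has no range to map, so any non-zero scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dynamic_quantized_dot(weights_q: List[int], w_scale: float, activations: List[float]) -> float:
    """Dot product with pre-quantized weights and activations quantized at call time."""
    # "Dynamic": the activation scale is derived from this particular input, at inference time.
    acts_q, a_scale = quantize_int8(activations)
    # Accumulate in integers, then rescale once back to floating point.
    acc = sum(w * a for w, a in zip(weights_q, acts_q))
    return acc * w_scale * a_scale


# Weights are quantized once, ahead of time (hypothetical example values).
weights = [0.42, -1.27, 0.0, 0.6]
weights_q, w_scale = quantize_int8(weights)

activations = [0.3, -0.9, 1.0, 0.7]
approx = dynamic_quantized_dot(weights_q, w_scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
print(f"int8: {approx:.4f}  float: {exact:.4f}")
```

The int8 result lands close to the float result, which is the trade dynamic quantization makes: a small accuracy loss in exchange for cheaper integer arithmetic and smaller weights.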

Benchmark

With notebook

You can benchmark pipelines more easily with the benchmark_pipelines notebook.

With your own script

from optimum_transformers import Benchmark

task = "sentiment-analysis"
model_name = "philschmid/MiniLM-L6-H384-uncased-sst2"
num_tests = 100

benchmark = Benchmark(task, model_name)
results = benchmark(num_tests, plot=True)
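If you want to time something by hand instead, a latency benchmark of this kind essentially calls the pipeline repeatedly and summarizes the wall-clock timings. The sketch below assumes exactly that model. measure_latency and its warmup parameter are hypothetical and not part of optimum_transformers, and the package's own Benchmark may measure things differently.

```python
import math
import statistics
import time
from typing import Any, Callable, Dict


def measure_latency(fn: Callable[[], Any], num_tests: int = 100, warmup: int = 5) -> Dict[str, float]:
    """Call fn num_tests times and return latency statistics in milliseconds."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    # Warm-up calls are not timed: first runs often pay one-off costs (lazy init, caches).
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(num_tests):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(timings_ms)) - 1)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "p50_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without any model installed.
    # With the Quick start pipeline you would pass: lambda: pipe("This restaurant is awesome")
    stats = measure_latency(lambda: sum(range(10_000)), num_tests=50)
    print({name: round(value, 3) for name, value in stats.items()})
```

Reporting a percentile next to the mean is deliberate: for serving, tail latency matters, and a few slow outliers can hide behind a good average.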

Results

Note: these results were collected on my local machine, so if you have a high-performance machine to benchmark on, please contact me :hugs:

Benchmark sentiment-analysis pipeline

Almost the same as in the Infinity launch video :hugs:

AWS VM: g4dn.xlarge
GPU: NVIDIA T4
Sequence length: 128 tokens
Latency: 2.6 ms

More results are available in the project repository!
