AI Engineering Glossary

Quantization

Quantization reduces the precision of the numbers used in a model's calculations, typically from high-precision 32-bit floating point to a lower-precision format such as 8-bit integer. Lower-precision values take up less memory and are cheaper to compute with, so quantization cuts a model's memory footprint and speeds up inference. It is particularly useful for deploying models on edge devices where resources are limited. The trade-off is usually a slight drop in accuracy in exchange for significant gains in speed and efficiency over full-precision computation.
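As a concrete illustration, below is a minimal sketch of symmetric post-training quantization using only NumPy. The function names (quantize_int8, dequantize) and the single-scale scheme are illustrative assumptions, not the API of any particular framework; production toolchains use calibrated, often per-channel, quantizers.

```python
# Minimal sketch of symmetric int8 quantization (illustrative, not a library API).
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights to int8 with a single symmetric scale."""
    scale = np.abs(weights).max() / 127.0                      # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the int8 representation."""
    return q.astype(np.float32) * scale

# Example: quantize a random "weight matrix" and measure the cost of lower precision.
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"memory: {w.nbytes} bytes (fp32) -> {q.nbytes} bytes (int8)")  # 4x smaller
print(f"max abs error: {np.abs(w - w_hat).max():.5f}")                # small precision loss
```

Running the sketch shows the core trade-off: storage drops by 4x (32 bits down to 8 bits per value) while the reconstructed weights differ from the originals by at most about one quantization step (roughly the scale value).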
