AI Engineering Glossary

Weight quantization

Weight quantization is a technique for optimizing neural network models by reducing the numerical precision of the weights, the parameters learned during training. Instead of storing each weight as a 32-bit floating-point number, quantization might represent it as an 8-bit integer. This shrinks the model size and improves computational efficiency, which is especially beneficial when deploying models on edge devices with limited resources. Unlike related methods such as pruning, which removes weights entirely, quantization keeps every weight but stores it in a cheaper numerical representation while aiming to preserve accuracy.
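A minimal sketch of the idea, assuming symmetric per-tensor int8 quantization in NumPy; the function names and the per-tensor scaling choice are illustrative assumptions, not any particular library's API:

```python
import numpy as np

def quantize_weights(weights: np.ndarray, num_bits: int = 8):
    """Symmetric per-tensor quantization of float32 weights to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    scale = np.max(np.abs(weights)) / qmax  # map the largest-magnitude weight to qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_weights(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights for inference or error analysis."""
    return q.astype(np.float32) * scale

# Example: quantize a small random weight matrix and measure the rounding error.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_weights(w)
w_hat = dequantize_weights(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

In practice, libraries often use per-channel scales or asymmetric (zero-point) schemes to reduce the quantization error further; the per-tensor symmetric version above is just the simplest variant.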

