AI Engineering Glossary

by llatent.space

related:
Quantization

Inference Optimization

Inference optimization is the practice of making a trained machine learning model faster and cheaper to run when it makes predictions on new data. Common techniques include model pruning, quantization, and the use of more efficient hardware or software. It is crucial in real-world applications that require quick responses, such as autonomous vehicles or real-time recommendation systems. An optimized system can make decisions faster while using fewer computational resources.
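
As a rough illustration of two of the techniques named above, the sketch below applies symmetric int8 quantization and magnitude-based pruning to a plain list of weights. The function names (`quantize_int8`, `dequantize`, `prune_by_magnitude`) and the sample weights are hypothetical and chosen only for this example; production frameworks implement these ideas differently and with many more options.

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] with one shared scale.

    Assumption: symmetric, per-tensor quantization (a single scale for
    the whole list), which is only one of several possible schemes.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


def prune_by_magnitude(weights, fraction):
    """Set the smallest-magnitude `fraction` of the weights to zero."""
    k = int(len(weights) * fraction)
    # Indices of the k weights with the smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.01, 0.9]  # made-up values for illustration
    codes, scale = quantize_int8(weights)
    print("int8 codes:", codes)
    print("reconstructed:", dequantize(codes, scale))
    print("pruned:", prune_by_magnitude(weights, 0.5))
```

Quantization trades a small reconstruction error (at most half the scale per weight in this scheme) for storage in 8 bits rather than 32, while pruning introduces zeros that a sparse-aware runtime may be able to skip.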

a

Activation Function

Adversarial Attacks

AI Orchestration

AI safety systems

Anonymization Techniques

Application Programming Interface (API)

Artificial General Intelligence

Attention Mechanism

Autonomous Systems

Autoregressive Language Models

b

Backpropagation

Batch Processing

Bayesian Networks

c

Causal language models

Chain of Thought Prompting

ChatGPT https://openai.com/research/

Civitai https://civitai.com

Claude https://claude.ai/

Cohere https://cohere.ai

Column-Family Store

Compound System

computer vision

Conditional Generation

Constitutional AI

Contextual Embedding

Conversational Agents

Convolutional Layer

Cosine Similarity

Cost and Latency

Cross-Entropy Loss

Cross-Lingual Embeddings

Cross-Validation

d

Data Augmentation

Data Preprocessing

Dataset Engineering

Decoder-Only Architecture

DeepSeek https://www.deepseek.com/

Deeplearning.ai https://deeplearning.ai

Dense Retrieval

Deployment Optimization

Dimensionality Reduction

Document Chunking

Document Ranking

Domain-specific models

e

Eleven Labs https://elevenlabs.io/

embedding models

f

Fal https://fal.ai/

Feature Engineering

Feature Extraction

Few Shot Learning

Few-Shot Prompting

FireCrawl https://www.firecrawl.dev/

Flash Attention

Foundation Model

Function calling

g

Gemini https://gemini.google.com

Generative Adversarial Networks

Generative Models

Generative Pre-trained Transformer https://huggingface.co/docs/transformers

Google AI Studio https://aistudio.google.com/

Google Colab https://colab.research.google.com/

Gradient Boosting

Gradient Descent

h

Hugging Face https://huggingface.co

Human-in-the-Loop

i

Image Generation

image processing

Inference Optimization

Information Retrieval

Instruction Tuning

Interpretability

j

Jina AI https://jina.ai

k

k-nearest neighbors

Kaggle https://www.kaggle.com

Key-Value Store

Keyword Analysis

Kling https://www.klingai.com/

Knowledge Distillation

Knowledge Graph

l

Label Studio https://labelstud.io/

LangChain https://www.langchain.com/

Language Generation https://en.wikipedia.org/wiki/Natural_language_generation

Large Language Model

Lightning.ai https://www.lightning.ai/

Llama https://www.llama.com/

LlamaIndex https://www.llamaindex.ai/

LLM Visualisation https://bbycroft.net/llm

Logical Feature Store

Long Short-Term Memory

Low Precision Arithmetic

Low-Precision Computing

m

Masked language models

Methods for Structured Outputs

Midjourney https://www.midjourney.com

Mistral https://mistral.ai/en

Mixture-of-experts

Modal.com https://modal.com

Monte Carlo Methods

Multi-Hop Retrieval

n

Natural Language Processing

Neural Networks

Notebook LM https://notebooklm.google/

o

Ollama https://ollama.com

One Shot Prompting

OpenAI Application Programming Interface https://www.openai.com/

Open-ended Outputs

Open Source Models

Open Weight Models

p

parameter efficient fine tuning

Parameter-Efficient Fine-Tuning

Perplexity AI https://www.perplexity.ai/

Pinecone https://www.pinecone.io/

Policy Gradient

Post-Processing

Post-Training Quantization

Pre-trained model

Prompt Engineering

Prompt Injection

Prompt Optimization

PyTorch https://pytorch.org

q

Query Optimization

r

Reinforcement Learning

Resource Allocation

Retrieval Augmented Generation https://towardsdatascience.com/retrieval-augmented-generation-the-best-of-both-worlds-fb40b8abf64d

Reward Function

RunDiffusion https://rundiffusion.com/

Runway ML https://runwayml.com

s

Self-critique prompting

Self-Supervised Learning

Semantic Search

Semantic Similarity

Semi-supervised Learning

Softmax Function

Stability AI https://Stability.ai

Stable Diffusion

Stable Diffusion Art https://stable-diffusion-art.com/

Structured Output

Synthetic Data Generation

t

Text-to-Image Generation

Text-to-Text Generation

Transfer Learning

Transformer Architecture

u

Udio https://www.udio.com/

Uncertainty Estimation

Unstructured Data Processing

v

Variational Autoencoders https://arxiv.org/abs/1312.6114

Vector database

Vertex AI https://cloud.google.com/vertex-ai

Vision Transformers

w

Weaviate https://weaviate.io/

Weight quantization

z

Zero Shot Prompting