AI Engineering Glossary
a
Advanced RAG
Agent
Agentic AI
Agents
AI-as-a-Judge
AI Engineering
AI Judges
AI Orchestration
Alignment
Applications of Foundation Models
Automatic1111
Autoregressive language models
b
Benchmarks
BM25
c
Causal language models
Chain-of-thought (CoT)
Chain of Thought Prompting
Chains
ChatGPT
Chroma
Chunking
Chunking Module
Chunking Strategies
Civitai
Claude
Cleaning Layer
Cohere
Constraint Sampling
Context
Context Window
Continuous batching
Copyright Regurgitation
Cost and Latency
Cross entropy
d
Data Category
Data Collection Pipeline
Data Contamination
Data Extraction Module
Data Manipulation in Foundation Models
Data parallelism
Data Quality Concerns
Data Slicing
Data Synthesis
Data Theft
Dataset Engineering
Deterministic
Dispatcher
Domain-Specific Models
e
Eleven Labs
Embedding-based retrieval
Embedding Component
Embedding Handler
Embeddings
Evaluation Harness
Evaluation Pipeline
ExLlamaV2
EXL2
f
Factual Consistency
Faiss
Feature
Feature Pipeline
Few-shot learning
Finetuning
Finetuning for Structured Outputs
Flash attention
Foundation Models
Frontier Model
Function calling
g
Gemini
Generative AI
GGUF
GGUF Format
Google AI Studio
GPTQ
Gradient Descent
Greedy sampling
Grounding
Qwen
h
Hallucination
Handler
Hugging Face
Hybrid Search
Hypothetical Document Embeddings
i
In-context learning
Inference
Inference Optimization
Instruction Following
Instruction-Following Capability
Inverse Document Frequency
Inverted index
j
JSON mode
k
k-nearest neighbors
Kaggle
KV cache
l
Large Language Model
Latent Space
Lexical Similarity
Llama
LLM Engineer
LLM Twin
LLM Twin Architecture
LLM Twin Goal
Loading Module
Logical Feature Store
Logits
Logprobs
Luma Labs
m
Masked language models
Memory System
Methods for Structured Outputs
Midjourney
Mistral
Mixture-of-experts (MoE)
ML Engineering / MLOps
Model Adaptation
Model API Cons
Model API Pros
Model Build vs. Buy
Model Context Protocol
Model Development Process
Model Licenses
Model Optimization Strategies
Model parallelism
Model Quantization
Multilingual Models
Multimodal Embedding Models
n
N-grams
Natural Language Generation (NLG)
NoSQL Database
Notebook LM
o
Ollama
One Shot Prompting
Open-ended Outputs
Open Source Models
Open Weight Models
OVM
p
PagedAttention
Parameter
Perplexity
Perplexity AI
Pinecone
Pipeline parallelism
Post-Processing
Post-retrieval
Post-training
Post-Training Quantization (PTQ)
Pre-retrieval
Pre-trained model
Preference Models
Probabilistic
Prompt Engineering
Prompt Injection
Prompt Organization
Prompt Security
Prompting Guide
Proprietary Data
q
Quantization
Quantization-Aware Training
r
RAG
RAG Applications
RAG Ingestion Pipeline
Redundancy Removal
Reinforcement Learning from Human Feedback
Reranking
Retrieval Algorithms
Retrieval-Augmented Generation
Retrieval Pipeline
Reverse Prompt Engineering
Reward Model
Roleplaying
RunDiffusion
Runway ML
s
SAFE
Sampling
Self-critique prompting
Self-Hosting Cons
Self-Hosting Pros
Self-query
Self-supervision
Semantic Similarity
Speculative decoding
Stability.ai
Stable Diffusion
Stable Diffusion Art
Stochastic Parrot
Structured Outputs
Sub-queries
Supervised Finetuning
System Prompt
t
Temperature
Tensor Parallelism
Term-based retrieval
Term Frequency
Term Frequency-Inverse Document Frequency
Test Time Sampling
Text Generation Inference
Text-to-SQL
TGI
Time to first token
Token
Tokenization
Top-k sampling
Top-p sampling
Toxicity Benchmarks
Training Data Extraction
Training Data Sources
u
Udio
v
vLLM
Vector Database
Vocabulary
w
Weight quantization
z
ZenML Pipeline
zenml.io
Zero Shot Prompting