AI Engineering Glossary

Retrieval Algorithms


Retrieval methods for finding relevant documents, including term-based approaches such as BM25 and embedding-based (semantic) retrieval.
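The contrast between the two families of retrieval algorithms can be made concrete with a small sketch. The snippet below is a minimal, illustrative comparison, not a production implementation: the three-document corpus, the BM25 parameters (k1 = 1.5, b = 0.75), and the hashed bag-of-words toy_embedding stand-in are all assumptions made for this example; a real embedding-based retriever would use a trained text-embedding model and a vector index (e.g., Faiss), both of which appear elsewhere in this glossary.

```python
"""Minimal sketch contrasting term-based (BM25) and embedding-based retrieval.

Illustrative assumptions: tiny whitespace-tokenized corpus, default BM25
parameters, and a hashed bag-of-words vector standing in for a neural
embedding model.
"""
import math
from collections import Counter

corpus = [
    "bm25 ranks documents by term frequency and inverse document frequency",
    "embeddings map text into a vector space for semantic similarity",
    "hybrid search combines lexical and semantic retrieval signals",
]
tokenized = [doc.split() for doc in corpus]
avgdl = sum(len(d) for d in tokenized) / len(tokenized)
doc_freq = Counter(t for d in tokenized for t in set(d))  # docs containing each term
N = len(tokenized)

def bm25_score(query_terms, doc_terms, k1=1.5, b=0.75):
    """Score one document against a query with the standard BM25 formula."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        idf = math.log((N - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl))
        score += idf * norm
    return score

def toy_embedding(text, dim=64):
    """Placeholder for a trained embedding model: hashed bag-of-words vector."""
    vec = [0.0] * dim
    for token in text.split():
        vec[hash(token) % dim] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

query = "semantic similarity in vector space"
bm25_ranked = sorted(range(N), key=lambda i: bm25_score(query.split(), tokenized[i]), reverse=True)
query_vec = toy_embedding(query)
emb_ranked = sorted(range(N), key=lambda i: cosine(query_vec, toy_embedding(corpus[i])), reverse=True)
print("BM25 ranking:     ", [corpus[i][:40] for i in bm25_ranked])
print("Embedding ranking:", [corpus[i][:40] for i in emb_ranked])
```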


a

Advanced RAG, Agent, Agentic AI, Agents, AI-as-a-Judge, AI Engineering, AI Judges, AI Orchestration, Alignment, Applications of Foundation Models, Automatic1111, Autoregressive language models

b

Benchmarks, BM25

c

Causal language models, Chain-of-thought (CoT), Chain of Thought Prompting, Chains, ChatGPT, Chroma, Chunking, Chunking Module, Chunking Strategies, Civitai, Claude, Cleaning Layer, Cohere, Constraint Sampling, Context, Context Window, Continuous batching, Copyright Regurgitation, Cost and Latency, Cross entropy

d

Data Category, Data Collection Pipeline, Data Contamination, Data Extraction Module, Data Manipulation in Foundation Models, Data parallelism, Data Quality Concerns, Data Slicing, Data Synthesis, Data Theft, Dataset Engineering, Deterministic, Dispatcher, Domain-Specific Models

e

Eleven Labs, Embedding-based retrieval, Embedding Component, Embedding Handler, Embeddings, Evaluation Harness, Evaluation Pipeline, ExLlamaV2, EXL2

f

Factual Consistency, Faiss, Feature, Feature Pipeline, Few-shot learning, Finetuning, Finetuning for Structured Outputs, Flash attention, Foundation Models, Frontier Model, Function calling

g

Gemini, Generative AI, GGUF, GGUF Format, Google AI Studio, GPTQ, Gradient Descent, Greedy sampling, Grounding

h

Hallucination, Handler, Hugging Face, Hybrid Search, Hypothetical Document Embeddings

i

In-context learning, Inference, Inference Optimization, Instruction Following, Instruction-Following Capability, Inverse Document Frequency, Inverted index

j

JSON mode

k

k-nearest neighbors, Kaggle, KV cache

l

Large Language Model, Latent Space, Lexical Similarity, Llama, LLM Engineer, LLM Twin, LLM Twin Architecture, LLM Twin Goal, Loading Module, Logical Feature Store, Logits, Logprobs, Luma Labs

m

Masked language models, Memory System, Methods for Structured Outputs, Midjourney, Mistral, Mixture-of-experts (MoE), ML Engineering / MLOps, Model Adaptation, Model API Cons, Model API Pros, Model Build vs. Buy, Model Context Protocol, Model Development Process, Model Licenses, Model Optimization Strategies, Model parallelism, Model Quantization, Multilingual Models, Multimodal Embedding Models

n

N-grams, Natural Language Generation (NLG), NoSQL Database, Notebook LM

o

Ollama, One Shot Prompting, Open-ended Outputs, Open Source Models, Open Weight Models, OVM

p

PagedAttention, Parameter, Perplexity, Perplexity AI, Pinecone, Pipeline parallelism, Post-Processing, Post-retrieval, Post-training, Post-Training Quantization (PTQ), Pre-retrieval, Pre-trained model, Preference Models, Probabilistic, Prompt Engineering, Prompt Injection, Prompt Organization, Prompt Security, Prompting Guide, Proprietary Data

q

Quantization, Quantization-Aware Training, Qwen

r

RAG, RAG Applications, RAG Ingestion Pipeline, Redundancy Removal, Reinforcement Learning from Human Feedback, Reranking, Retrieval Algorithms, Retrieval-Augmented Generation, Retrieval Pipeline, Reverse Prompt Engineering, Reward Model, Roleplaying, RunDiffusion, Runway ML

s

SAFE, Sampling, Self-critique prompting, Self-Hosting Cons, Self-Hosting Pros, Self-query, Self-supervision, Semantic Similarity, Speculative decoding, Stability.ai, Stable Diffusion, Stable Diffusion Art, Stochastic Parrot, Structured Outputs, Sub-queries, Supervised Finetuning, System Prompt

t

Temperature, Tensor Parallelism, Term-based retrieval, Term Frequency, Term Frequency-Inverse Document Frequency, Test Time Sampling, Text Generation Inference, Text-to-SQL, TGI, Time to first token, Token, Tokenization, Top-k sampling, Top-p sampling, Toxicity Benchmarks, Training Data Extraction, Training Data Sources

u

Udio

v

vLLM, Vector Database, Vocabulary

w

Weight quantization

z

ZenML Pipeline, zenml.io, Zero Shot Prompting