Inference

Inference is the process of using a trained AI model to make predictions or generate outputs on new, previously unseen data, often in real time.

In Depth

While training creates the AI model, inference is where it actually does useful work. Every time an AI agent reads a customer message and generates a response, that is inference. Inference performance is measured by latency (how quickly the model responds to a single request), throughput (how many requests it can handle per unit of time), and accuracy (the quality of the output).
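
To make the first two metrics concrete, here is a minimal Python sketch that times per-request latency and overall throughput. It assumes a hypothetical `model_fn` standing in for any trained model's predict or generate call; neither name refers to a specific library API.

```python
import time

def measure_inference(model_fn, requests):
    """Time per-request latency and overall throughput.

    `model_fn` is a hypothetical stand-in for any trained model's
    predict/generate call; `requests` is a non-empty list of inputs.
    """
    latencies = []
    start = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        model_fn(request)  # one inference call on one new input
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "avg_latency_s": sum(latencies) / len(latencies),
        "throughput_rps": len(requests) / elapsed,  # requests per second
    }
```

Pointed at a real model endpoint, a harness like this reports average latency per request and sustained requests per second, the two numbers most inference dashboards track.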

In customer support, inference speed directly impacts customer experience: responses need to feel near-instantaneous in live chat, even when the underlying model is working through complex reasoning chains. Optimizing inference involves techniques like model quantization (reducing numerical precision for faster computation), caching (storing responses to recurring queries), batching (processing multiple requests together), and edge deployment (running models closer to users); caching and batching are sketched below. Cost management is also critical, because inference costs scale with usage volume.
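
Caching and batching are simple enough to sketch directly. In the Python snippet below, `model_generate` is a hypothetical stub standing in for a deployed model's batch call, and the helper names are illustrative assumptions rather than any real API.

```python
from functools import lru_cache

def model_generate(prompts):
    """Hypothetical stub for a deployed model's batch generate call;
    a real system would invoke the model server here."""
    return [f"response to: {p}" for p in prompts]

@lru_cache(maxsize=10_000)
def cached_generate(prompt):
    # Caching: repeated identical prompts skip the model call entirely.
    return model_generate([prompt])[0]

def batched_generate(prompts, batch_size=8):
    # Batching: group requests so the model handles several inputs in
    # one pass, trading a little latency for higher throughput.
    outputs = []
    for i in range(0, len(prompts), batch_size):
        outputs.extend(model_generate(prompts[i : i + batch_size]))
    return outputs
```

The design trade-off is visible in the code: caching helps most when the same questions recur, while batching helps most under heavy load, where grouping requests raises throughput at the cost of a small per-request delay.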
