Alex Credits and Limits
Alex uses a credit system that reflects model complexity, conversation length, and the amount of context the AI has to process. The goal is transparency: the panel shows your current limit state, consumption, and available options.
This page explains the principles in a simple, practical way. Exact limits can vary based on your package, service configuration, and active add-ons.
What Affects Consumption
Each request can include several parts:
| Part | What it means for consumption |
|---|---|
| Your prompt | The message you send to Alex, including any details you provide |
| Context | Conversation history, relevant memory, server state, tool outputs, or file content |
| Response | Alex’s answer, analysis, proposed solution, and the steps it performs |
| Tools | Work with the console, files, logs, web search, sub-agents, or other integrations |
| Cache | Repeated context that selected models can process more efficiently |
Consumption is therefore not only about the number of messages. A short question is usually lighter than a deep diagnostic task that reads logs, inspects files, and performs multiple steps.
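To make the table concrete, here is a minimal sketch of how the size of a request could be estimated as the sum of its parts. The names `RequestParts` and `estimate_request_size` and the four-characters-per-token heuristic are assumptions made for illustration. Alex's actual accounting is not described on this page and may differ.

```python
from dataclasses import dataclass

# Assumption: a rough 4-characters-per-token heuristic, not Alex's tokenizer.
CHARS_PER_TOKEN = 4


@dataclass
class RequestParts:
    """Hypothetical breakdown of one request, mirroring the table above."""
    prompt: str = ""
    context: str = ""      # history, memory, server state, file content
    response: str = ""
    tool_output: str = ""  # console, logs, web search results, and so on


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def estimate_request_size(parts: RequestParts) -> dict:
    """Return an estimated token count per part plus the total."""
    sizes = {
        "prompt": estimate_tokens(parts.prompt),
        "context": estimate_tokens(parts.context),
        "response": estimate_tokens(parts.response),
        "tools": estimate_tokens(parts.tool_output),
    }
    sizes["total"] = sum(sizes.values())
    return sizes


# A short question versus a diagnostic task that reads logs and files.
quick = estimate_request_size(RequestParts(prompt="Is nginx running?"))
deep = estimate_request_size(RequestParts(
    prompt="x" * 40,
    context="y" * 4000,
    response="z" * 800,
    tool_output="l" * 2000,
))
print(quick["total"], deep["total"])
```

In this sketch both requests are a single message, yet the diagnostic one is several hundred times larger because of the context and tool output it carries.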
How Cache Works
Some models support prompt caching. Repeated parts of the conversation or system context may be processed more efficiently in later requests.
Cache works automatically. You do not need to enable or configure it. If the selected model supports cache and the request is suitable for it, Alex uses it in the background.
When cache helps most
- You continue in the same conversation instead of starting a new one.
- You work on one area step by step, such as the same app, server, or issue.
- You follow up on a previous plan, the result of a check, or a file review.
- You avoid pasting the same long logs repeatedly after Alex has already read them.
- You let Alex reuse existing context instead of restating the same background every time.
When cache helps less
- Every request starts a new conversation.
- You frequently switch topics, servers, or projects.
- Each message contains a large amount of new text.
- The selected model does not support cache or cannot use it for the current request.
Cache does not mean zero consumption. It is an optimization that can reduce the cost of repeated context.
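The effect can be sketched with a toy cost model. Everything here is an assumption for illustration: the function names, the 0.25 rate for cached tokens, and the simplification that only a fixed block of stable context is reused. Real rates depend on the selected model and are not stated on this page.

```python
def request_cost(uncached_tokens: int, cached_tokens: int,
                 cached_rate: float = 0.25) -> float:
    """Cost in full-price token units.

    cached_rate is a hypothetical discount factor. It must stay above zero
    because cached context is cheaper, not free.
    """
    if not 0.0 < cached_rate <= 1.0:
        raise ValueError("cached_rate must be in (0, 1]")
    return uncached_tokens + cached_tokens * cached_rate


def total_cost(new_tokens_per_turn, stable_context: int,
               reuse_context: bool, cached_rate: float = 0.25) -> float:
    """Sum the cost of several turns that share the same stable context.

    With reuse_context=True (same conversation), the stable context is paid
    in full on the first turn and at the cached rate afterwards.
    With reuse_context=False (a new conversation every time), it is paid in
    full on every turn.
    """
    total = 0.0
    for turn_index, new_tokens in enumerate(new_tokens_per_turn):
        cached = stable_context if (reuse_context and turn_index > 0) else 0
        uncached = new_tokens + (stable_context - cached)
        total += request_cost(uncached, cached, cached_rate)
    return total


turns = [100, 100, 100]  # new text added in each message
same_thread = total_cost(turns, stable_context=4000, reuse_context=True)
new_threads = total_cost(turns, stable_context=4000, reuse_context=False)
print(same_thread, new_threads)
```

Under these assumed numbers, three follow-ups in one conversation cost roughly half of what three fresh conversations cost, and the saving shrinks as the share of new text per message grows.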
How To Save Credits and Limits
1. Send a clear prompt upfront
Instead of adding one sentence at a time, include the goal, constraints, and expected outcome in the first message.
Less effective:
Check the website.
More effective:
example.com returns 502 after deployment. Check nginx, PHP-FPM, and the latest logs, find the cause, and write a repair plan before making changes.
2. Continue in the same conversation
If you are solving one issue, stay in the same thread. Alex can continue from the history, plan, and previous results. On supported models, this also gives cache a better chance to work efficiently.
3. Do not resend data Alex already has
Once Alex has read a log, configuration, or part of a project, refer back to it:
Continue from the previous nginx log review and also check PHP-FPM.
You do not need to paste the same log again unless it changed.
4. Use Planning Mode for complex tasks
For incidents, migrations, or larger debugging sessions, turn on Planning Mode. Alex analyzes first and proposes a plan. This reduces blind trial-and-error and often avoids unnecessary consumption.
5. Choose the model based on task complexity
Routine checks, status requests, simple configuration, and quick explanations usually work well with standard models. Stronger models are best reserved for complex debugging, architecture, multi-file analysis, or higher-risk decisions.
6. Split very large work into logical stages
One well-prepared task is usually more efficient than many tiny follow-ups. Very large work is the exception: break it into stages such as plan, first part, review, next part. This makes both quality and consumption easier to control.
Types of Limits
Alex may track multiple limits at the same time:
| Limit | Purpose |
|---|---|
| Short-term window | Protects against sudden overuse or unexpectedly high consumption in a short time |
| Long-term window | Tracks overall consumption within your package |
| Model availability | Some models or features may be available only on selected tariffs |
| Special features | Image generation or selected tools may have their own rules |
The exact state is always visible in the panel. If you reach a limit, Alex will notify you and show the next available options.
Tariffs
Your available limit and features depend on your active package and account configuration. Higher tariffs usually provide more room for demanding work, team usage, or premium models.
If you are unsure which tariff fits you, use your workflow as a guide:
| Usage | Typical approach |
|---|---|
| Occasional questions | Standard tariff and basic models |
| Regular server management | Higher limit and cache-friendly workflow |
| Development, debugging, incidents | Higher tariff, Planning Mode, and stronger models when needed |
| Team or intensive work | Higher availability with a clear budget setup |
Pay-as-you-go
If pay-as-you-go (PAYG) is enabled, you can keep working after selected limits are exhausted, subject to your account settings. Overage is deducted from your balance, and you can set a cap to control it.
PAYG is useful when you do not want work to stop in the middle of an incident or longer analysis. We still recommend monitoring consumption and using cache-friendly workflows.
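The interaction between the included limit, PAYG, and the cap can be sketched as follows. This is an assumed model for illustration only: the function `split_charge`, the `LimitReached` exception, and the rule that the included limit is consumed before any overage are hypothetical, and the actual billing logic may differ.

```python
class LimitReached(Exception):
    """Raised when neither the package nor PAYG can cover a request."""


def split_charge(cost: float, included_remaining: float,
                 payg_enabled: bool, payg_spent: float, payg_cap: float):
    """Split a request's cost into (from_included, from_payg).

    Assumption: the included limit is used first, and only the overage
    is charged to PAYG, as long as it stays within the configured cap.
    """
    if cost < 0:
        raise ValueError("cost must not be negative")
    from_included = min(cost, included_remaining)
    overage = cost - from_included
    if overage == 0:
        return from_included, 0
    if not payg_enabled:
        raise LimitReached("limit exhausted and PAYG is disabled")
    if payg_spent + overage > payg_cap:
        raise LimitReached("PAYG cap would be exceeded")
    return from_included, overage


# 30 units remain in the package; the remaining 20 go to PAYG.
print(split_charge(50, included_remaining=30, payg_enabled=True,
                   payg_spent=0, payg_cap=100))
```

In this model the cap works as a hard stop: once the overage already spent plus the new overage would pass the cap, the request is refused rather than partially charged.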
Tracking Consumption
In chat
The chat shows your current limit state, remaining room, and warnings when you are close to reaching a limit.
In the panel
The service or account overview shows consumption history, the models used, PAYG entries where applicable, and other information available for your tariff.
For some models, the overview may also show the cache portion of consumption. Treat it as an indication of how much repeated context could be processed more efficiently.
What Happens When You Reach a Limit
If you reach a limit, Alex notifies you directly in the interface. Depending on your setup, you can:
- wait for the limit to refresh,
- choose a more efficient model or smaller task scope,
- use an available upgrade or PAYG,
- contact support if the situation is urgent.
FAQ
Why does consumption differ for similar requests?
In one case Alex may answer directly, while in another it needs to read history, logs, files, or run diagnostics. Consumption is also affected by the selected model and whether cache can be used.
Does cache always reduce consumption?
Not always. Cache helps most with repeated or stable context and models that support it. If every message adds a large amount of new content, cache has less room to help.
Should I start fewer conversations because of cache?
Yes, when you are solving the same issue or project. Continuing in the same conversation helps Alex reuse context and may improve efficiency. For unrelated topics, start a new conversation so context does not get mixed.
What if Alex hits an error?
If a request fails for a technical reason, consumption is counted based on the work that was actually performed. The overview shows what was counted. Contact support if anything looks unclear.
Next Steps
- Alex Models and Limits - How to choose the right model
- Best Practices - How to write prompts and save request units
- Alex Memory - How Alex remembers preferences and context
- Interactive Questions - How Alex asks questions in chat
Need to increase your limit or have a question? Open a support ticket.