Training Overview
Training is the process of teaching your AI agent about your business. Your training data feeds directly into the ChatSpark AI Engine, which uses it to find and deliver accurate answers. The more relevant content you provide, the better your agent can answer customer questions.
You can add training data from multiple sources:
- Files: Upload documents in various formats
- Websites: Crawl your site or specific URLs
- Text: Paste content directly using our editor
- More Sources: Pull resolved tickets and conversations from your helpdesk
Begin with your FAQ page and most common support topics. These will immediately make your agent useful for the majority of customer questions.
Supported File Types
Upload documents in any of these formats:
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text-based PDFs work best; text in scanned, image-only PDFs may not be extracted reliably |
| Word | .doc, .docx | Full support including tables and lists |
| PowerPoint | .ppt, .pptx | Text from slides is extracted |
| CSV | .csv | Great for product catalogs and structured data |
| Text | .txt | Plain text files |
Each uploaded file counts toward your training data limit. One page is approximately 750 words.
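To get a rough sense of how many pages a document will use before you upload it, the 750-words-per-page approximation can be applied to a plain word count. The sketch below is an illustration only: the function name is hypothetical, and both the whitespace-based word count and the rounding up to whole pages are assumptions, not a description of how ChatSpark actually measures usage.

```python
import math

WORDS_PER_PAGE = 750  # approximate figure quoted in this article


def estimate_pages(text: str) -> int:
    """Estimate page usage for a piece of text.

    Assumption: words are whitespace-separated and partial pages
    round up to the next whole page.
    """
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_PAGE)


if __name__ == "__main__":
    sample = "word " * 1600
    print(estimate_pages(sample))
```

Treat the result as an estimate; the usage figure shown in your dashboard is the authoritative one.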
Website Crawling
Let ChatSpark automatically learn from your website:
- Navigate to Training → Website in your agent settings
- Enter your website URL or specific page URLs
- Choose to crawl the entire site or just specific pages
- Click Crawl. ChatSpark extracts the text content from each page it can reach
Full Site Crawl
Enter your homepage URL and we'll follow links to discover all pages. This is great for comprehensive coverage.
Specific Pages
Add individual URLs to target specific content. Useful for:
- FAQ pages
- Product documentation
- Pricing pages
- Policy pages (returns, shipping, etc.)
We respect robots.txt and won't crawl pages that are blocked. Dynamic content that requires JavaScript may not be fully captured.
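If you want to preview which of your pages a robots.txt file would block before starting a crawl, Python's standard `urllib.robotparser` module can evaluate the rules locally. This is a minimal sketch and not ChatSpark's crawler: the helper name, the sample rules, and the example.com URLs are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs that the given robots.txt rules permit for user_agent."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical rules: everything under /account/ is blocked for all crawlers.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /account/
"""

if __name__ == "__main__":
    candidates = [
        "https://example.com/faq",
        "https://example.com/account/orders",
    ]
    print(allowed_urls(SAMPLE_ROBOTS, candidates))
```

Pages filtered out by a check like this would need to be added another way, for example by pasting their content into the text editor.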
Rich Text Editor
Use our built-in editor to add content directly:
- Perfect for FAQs: Format questions and answers clearly
- Quick updates: Add new information instantly
- Rich formatting: Headings, lists, bold, links
- No file needed: Just paste and save
The text editor is ideal for:
- Common Q&A pairs
- Quick policy updates
- Seasonal information
- Corrections or clarifications
- YouTube video transcripts
More Sources
The More Sources tab lets you pull resolved tickets and conversations directly from your helpdesk or support platform. Your agent learns from real customer interactions, which makes it better at answering the questions your customers actually ask.
Supported platforms:
- HappyFox
- Zendesk
- Freshdesk
- Salesforce
- Freshchat
- Intercom
How to import
- Go to Training → More Sources in your agent settings
- Set up the AI Action for your platform if you have not done so already
- Select your date range and how many records to import
- Choose any platform-specific filters such as category, group, or case type
- Click Import and your records will be queued for training
Ticket and conversation records are formatted automatically. You do not need to format anything manually.
Counts toward your plan limit
Each imported record counts toward your training data limit in the same way as content from any other source. One page is approximately 750 words.
HappyFox
HappyFox requires you to select at least one category before importing. Resolved tickets from those categories will be pulled and formatted as question and answer pairs for your agent.
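To picture what a question and answer pair might look like, the sketch below turns one resolved ticket into a short Q&A text block. It is purely illustrative: the `ResolvedTicket` fields and the `Q:`/`A:` layout are hypothetical, and the actual format ChatSpark produces is not documented here.

```python
from dataclasses import dataclass


@dataclass
class ResolvedTicket:
    # Hypothetical fields; real helpdesk exports will differ.
    customer_message: str
    agent_resolution: str


def to_qa_pair(ticket: ResolvedTicket) -> str:
    """Collapse extra whitespace and lay the ticket out as a Q&A pair."""
    question = " ".join(ticket.customer_message.split())
    answer = " ".join(ticket.agent_resolution.split())
    return f"Q: {question}\nA: {answer}"


if __name__ == "__main__":
    ticket = ResolvedTicket(
        customer_message="How do I  get a refund?\n",
        agent_resolution="Open Orders and choose Refund.",
    )
    print(to_qa_pair(ticket))
```

Because imported records are formatted automatically, you would only need something like this when preparing Q&A content by hand for the text editor.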
Zendesk and Freshdesk
Import solved tickets from your Zendesk account or resolved tickets from Freshdesk. You can optionally filter by group to target a specific team.
Salesforce
Pull closed cases from Salesforce. You can optionally filter by case type to focus on a specific category of support interactions.
Freshchat and Intercom
Freshchat and Intercom are conversation-based platforms. Each imported record contains the full back-and-forth dialogue between the customer and your team. You can filter by group or team inbox to focus on the most relevant conversations.
Every ticket and conversation imported through More Sources is automatically screened before it is added to your training data. Personal information including names, email addresses, phone numbers, and shared credentials is detected and removed. Your agent learns from the resolution patterns and product knowledge in your support history, not from the personal details of individual customers.
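If you paste support history into the text editor yourself, it does not pass through the More Sources import, so you may want to scrub obvious personal details first. The sketch below masks email addresses and phone-like numbers with regular expressions. It is a simplified illustration, not ChatSpark's screening logic: the patterns are assumptions, they will miss some formats, and they do not detect names or credentials.

```python
import re

# Simplified patterns chosen for illustration; real-world PII detection is broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[email]", text)
    return PHONE_RE.sub("[phone]", text)


if __name__ == "__main__":
    print(redact("Reach me at jane.doe@example.com or +1 (555) 010-9999."))
```

Short numbers such as order IDs are left alone because the phone pattern requires at least nine characters of digits and separators.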
Best Practices
Follow these guidelines for the best results:
- Be comprehensive: Include all information customers might ask about
- Use clear language: Write in plain English and avoid jargon
- Structure content well: Use headings, lists, and clear organization
- Include variations: If customers might phrase things differently, include those variations
- Keep it current: Update training data when policies or products change
- Review analytics: Check unanswered questions to find gaps
Review your customer support emails and tickets. The questions people actually ask are the best source of training content.
Retraining Your Agent
Your agent automatically retrains when you:
- Add new training data
- Update existing content
- Delete outdated information
- Re-crawl your website
Retraining typically takes 1-5 minutes depending on the amount of content. Your agent remains available during retraining.
Training Limits
Training data limits vary by plan:
| Plan | Training Data | Approx. Words |
|---|---|---|
| Starter | 50 pages | ~37,500 words |
| Pro | 200 pages | ~150,000 words |
| Enterprise | Unlimited | Unlimited |
View your current usage in the dashboard under Training Data Usage.
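As a planning aid, the plan limits in the table can be combined with the 750-words-per-page approximation to estimate which plan a given amount of content would fit. This is a rough sketch under stated assumptions: the function name is hypothetical, pages are rounded up, and actual usage is whatever your dashboard reports.

```python
import math

WORDS_PER_PAGE = 750  # approximate figure quoted in this article

# Page limits from the table above, smallest plan first; None means unlimited.
PLAN_PAGE_LIMITS = {"Starter": 50, "Pro": 200, "Enterprise": None}


def smallest_plan_for(word_count: int) -> str:
    """Return the first plan whose page limit covers the estimated page count."""
    pages = math.ceil(word_count / WORDS_PER_PAGE)  # assumption: round up
    for plan, limit in PLAN_PAGE_LIMITS.items():
        if limit is None or pages <= limit:
            return plan
    raise ValueError("no plan covers this amount of content")


if __name__ == "__main__":
    for words in (30_000, 120_000, 400_000):
        print(words, smallest_plan_for(words))
```

Leaving some headroom below the limit is sensible, since imported tickets and re-crawled pages also count toward the same total.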