Performance Tips
Optimize your validation performance with these recommendations.
Quick Optimization Checklist
- Install parallel compression tools
- Use appropriate validation levels
- Configure thread counts
Parallel Compression Tools
Install these tools for faster compression/decompression:
- pigz - parallel gzip
- pbzip2 - parallel bzip2

Performance gains: 2-6x faster compression/decompression (see Performance Summary).
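To confirm that the parallel tools are actually installed, you can check your `PATH`. The sketch below uses only the Python standard library. The `pick_compressor` helper and its fallback pairing (`pigz` to `gzip`, `pbzip2` to `bzip2`) are hypothetical and are not taken from the validator itself.

```python
import shutil

# Preferred parallel tool and its single-threaded counterpart for each format.
# This pairing is an illustrative assumption, not the validator's own logic.
PARALLEL_TOOLS = {
    "gz": ("pigz", "gzip"),
    "bz2": ("pbzip2", "bzip2"),
}


def pick_compressor(extension: str) -> str:
    """Return the parallel tool's name if it is on PATH, else the serial tool's name."""
    parallel, serial = PARALLEL_TOOLS[extension]
    return parallel if shutil.which(parallel) else serial


if __name__ == "__main__":
    for ext, (parallel, _) in PARALLEL_TOOLS.items():
        tool = pick_compressor(ext)
        status = "parallel" if tool == parallel else "serial fallback"
        print(f".{ext}: {tool} ({status})")
```

If a line reports "serial fallback", that tool is not on your `PATH`.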
Validation Level Selection
Choose the right validation level for your use case:
| Scenario | Recommended Level | Speed Gain |
|---|---|---|
| First-time data processing | `strict` | Baseline |
| Pre-validated data | `trust` | 10-15x faster |
| Files already in correct format | `minimal` | 100x+ faster |
Example configuration:
```json
{
  "ref_genome_filename": {
    "filename": "reference.fasta",
    "validation_level": "trust",
    "threads": 8
  }
}
```
See Validation Settings for detailed information on validation levels.
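If you generate configurations from a script, the scenario table can be encoded directly. The sketch below is a hypothetical helper and not part of the validator. `choose_validation_level` and its two boolean inputs are illustrative names. The per-file keys (`filename`, `validation_level`, `threads`) are the ones from the example configuration.

```python
import json


def choose_validation_level(previously_validated: bool, format_already_correct: bool) -> str:
    """Map the scenarios in the table to a validation level (hypothetical helper)."""
    if format_already_correct:
        return "minimal"  # files already in correct format: 100x+ faster
    if previously_validated:
        return "trust"    # pre-validated data: 10-15x faster
    return "strict"       # first-time data processing: baseline


def file_entry(filename: str, level: str, threads: int) -> dict:
    """Build one per-file entry using the keys from the example configuration."""
    return {"filename": filename, "validation_level": level, "threads": threads}


if __name__ == "__main__":
    level = choose_validation_level(previously_validated=True, format_already_correct=False)
    config = {"ref_genome_filename": file_entry("reference.fasta", level, 8)}
    print(json.dumps(config, indent=2))
```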
Feature File Parallel Processing
In strict mode, coordinate validation for feature (GFF/GTF/BED) files runs in parallel when threads > 1 and the file contains at least 1,000 features. This applies both to the normal gffread-based path and the direct-parse fallback.
| Condition | Behaviour |
|---|---|
| `strict` + threads > 1 + ≥ 1,000 features | Parallel coordinate validation |
| `strict` + threads = 1, or < 1,000 features | Sequential coordinate validation |
| `trust` or `minimal` | No coordinate validation |
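The decision table above can be written as a small function. The sketch below reproduces those rules and illustrates chunked coordinate checking. It is not the validator's implementation. The actual coordinate check is an assumption (GFF-style 1-based coordinates with start ≤ end), and `coordinate_mode`, `check_chunk` and `validate_coordinates` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

PARALLEL_MIN_FEATURES = 1000  # threshold from the table above


def coordinate_mode(level: str, threads: int, n_features: int) -> str:
    """Mirror the decision table: 'parallel', 'sequential' or 'none'."""
    if level != "strict":
        return "none"
    if threads > 1 and n_features >= PARALLEL_MIN_FEATURES:
        return "parallel"
    return "sequential"


def check_chunk(chunk):
    """Hypothetical check on (index, (start, end)) pairs: 1-based start, start <= end."""
    return [i for i, (start, end) in chunk if start < 1 or start > end]


def validate_coordinates(features, level="strict", threads=1):
    """Return the indices of features with invalid coordinates."""
    mode = coordinate_mode(level, threads, len(features))
    if mode == "none":
        return []
    indexed = list(enumerate(features))
    if mode == "sequential":
        return check_chunk(indexed)
    # One contiguous chunk per worker; per-chunk results are merged and sorted.
    size = -(-len(indexed) // threads)  # ceiling division
    chunks = [indexed[i:i + size] for i in range(0, len(indexed), size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sorted(i for bad in pool.map(check_chunk, chunks) for i in bad)


if __name__ == "__main__":
    features = [(1, 100)] * 1500 + [(50, 10)]  # last feature has start > end
    print(coordinate_mode("strict", 4, len(features)))
    print(validate_coordinates(features, threads=4))
```

A thread pool keeps the sketch simple and portable. In CPython, pure-Python checks running in threads are limited by the GIL, so a real speedup would more likely come from processes or from native code.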
Thread Configuration
Per-File Thread Settings
Configure threads for each file:
```json
{
  "ref_genome_filename": {
    "filename": "reference.fasta",
    "validation_level": "strict",
    "threads": 16
  }
}
```
Global Thread Settings
Or set globally for all files:
Thread performance:
- Strict mode: 3-7x faster with 8+ threads
- Trust mode: minimal benefit from threading
- Recommended: use 8-16 threads for strict mode
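A thread count that follows this guidance but never exceeds the machine's CPUs could be computed as shown below. `strict_mode_threads` is a hypothetical helper and not a validator option. Note that `os.cpu_count()` reports the machine's logical CPUs, which can be more than a container or job scheduler actually grants the process.

```python
import json
import os

STRICT_MAX_THREADS = 16  # upper end of the 8-16 recommendation above


def strict_mode_threads(cpu_count=None) -> int:
    """Hypothetical helper: strict-mode thread count, capped at 16 and at the CPUs present."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(STRICT_MAX_THREADS, cpus))


if __name__ == "__main__":
    config = {
        "ref_genome_filename": {
            "filename": "reference.fasta",
            "validation_level": "strict",
            "threads": strict_mode_threads(),
        }
    }
    print(json.dumps(config, indent=2))
```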
Performance Summary
| Optimization | Performance Gain | Effort |
|---|---|---|
| validation_level='trust' | 10-15x faster | Low (config change) |
| validation_level='minimal' | 100x+ faster | Low (config change) |
| threads=16 (strict mode) | 3-7x faster | Low (config change) |
| parallel compression tools | 2-6x faster | Medium (installation) |
Best Practices
- For production: Use `strict` mode first, then `trust` for subsequent runs
- For development: Use `trust` or `minimal` modes
- For large datasets: Always use parallel compression tools and maximum threads
- Monitor resources: Check CPU and memory usage with `htop`
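One way to automate "`strict` first, then `trust`" is to remember which file contents have already passed a strict run. The sketch below keys that memory on a SHA-256 checksum. The checksum approach, the `validated_hashes.json` cache file and all function names are assumptions for illustration; the validator may not provide anything like this.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash the file in 1 MiB blocks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def _load_seen(cache_file: Path) -> set:
    return set(json.loads(cache_file.read_text())) if cache_file.exists() else set()


def level_for_run(path: Path, cache_file: Path) -> str:
    """'strict' for contents never validated before, 'trust' for contents already seen."""
    return "trust" if file_sha256(path) in _load_seen(cache_file) else "strict"


def record_validated(path: Path, cache_file: Path) -> None:
    """Call only after a strict run has passed."""
    seen = _load_seen(cache_file)
    seen.add(file_sha256(path))
    cache_file.write_text(json.dumps(sorted(seen)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fasta = Path(tmp) / "reference.fasta"
        fasta.write_text(">chr1\nACGT\n")
        cache = Path(tmp) / "validated_hashes.json"
        print(level_for_run(fasta, cache))  # strict (first run)
        record_validated(fasta, cache)
        print(level_for_run(fasta, cache))  # trust (unchanged file)
```

Keying on content rather than on filename means an edited file automatically falls back to `strict`.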
Logging Performance
Reduce logging verbosity for a slight performance improvement.
Logging levels (from most to least verbose):
- `DEBUG` - Most detailed
- `INFO` - Standard
- `WARNING` - Warnings only
- `ERROR` - Errors only
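The level names above match Python's standard `logging` levels. If the validator logs through that module (an assumption, since this page does not show its own log-level setting), raising the threshold works as sketched below. The logger name `validation` is hypothetical.

```python
import logging

# Assumption: the validator emits messages through Python's standard logging module.
# force=True (Python 3.8+) replaces any handlers already attached to the root logger.
logging.basicConfig(level=logging.WARNING, force=True)

log = logging.getLogger("validation")  # hypothetical logger name
features_checked = 12000

# %-style arguments are only formatted if the record passes the level check,
# so this DEBUG call costs very little at WARNING level.
log.debug("checked %d features", features_checked)
log.warning("WARNING and ERROR records are still emitted")
```

The saving comes from skipping record creation and formatting for suppressed levels. Passing arguments separately instead of pre-building f-strings preserves that benefit.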
See Also
- Validation Settings - Detailed validation level information
- Configuration Guide - Complete configuration options