Big File Generator

A collection of Python CLI tools for creating and reading large binary files. Useful for testing disk I/O performance, storage systems, and file transfer mechanisms.

Tools

make_big_file.py - File Generator

Creates large binary files filled with zeros for testing purposes.

Features:

  • Configurable file size with human-readable units (GB, TB, MB, etc.)
  • Adjustable chunk size for write optimization
  • Disk space validation before writing
  • Real-time progress reporting with speed metrics
  • Prevents accidental file overwrites
  • Graceful interrupt handling with cleanup
  • Quiet mode for scripting
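
The disk space validation step could work along these lines (a sketch using the standard library's `shutil.disk_usage`; the helper name and exact behavior in make_big_file.py are assumptions):

```python
import os
import shutil

def check_free_space(target_path, needed_bytes):
    """Return True if the filesystem holding `target_path` has enough free space."""
    # Resolve the directory the file will live in, so disk_usage queries
    # the right filesystem even before the file exists.
    directory = os.path.dirname(os.path.abspath(target_path)) or "."
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes
```

Checking before writing avoids filling the disk partway through a multi-gigabyte run.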

Usage:

python make_big_file.py <output> <size> [options]

Arguments:

  • output - Output file path
  • size - File size (e.g., 15GB, 1.5TB, 500MB)

Options:

  • --chunk-size <size> - Chunk size for writing (default: 64MB)
  • --quiet, -q - Suppress progress output
  • --version - Show version information
  • --help, -h - Show help message
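
Size strings like those accepted by `size` and `--chunk-size` could be parsed roughly as follows (a sketch; the script's actual parser, helper name, and accepted units are assumptions):

```python
import re

# Binary multipliers; "15GB" here means 15 * 1024**3 bytes.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(text):
    """Convert a human-readable size ('15GB', '1.5TB', '500MB') to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([KMGT]?B)", text.strip().upper())
    if not match:
        raise ValueError(f"Unrecognized size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])
```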

Examples:

# Create a 15GB file
python make_big_file.py output.bin 15GB

# Create a 1.5TB file with 128MB chunks
python make_big_file.py bigfile.dat 1.5TB --chunk-size 128MB

# Create a 500MB file quietly
python make_big_file.py test.bin 500MB --quiet
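
The core write loop behind these commands can be sketched as follows (a minimal version without progress reporting; the actual implementation in make_big_file.py may differ):

```python
def write_zeros(path, total_bytes, chunk_bytes=64 * 1024**2):
    """Write `total_bytes` of zeros to `path` in fixed-size chunks."""
    chunk = b"\0" * chunk_bytes  # reused buffer; only the tail write is sliced
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
    return written
```

Writing in large fixed-size chunks keeps memory use bounded while amortizing syscall overhead.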

read_big_file.py - File Reader & Benchmark

Reads large files and measures I/O performance, optionally computing checksums.

Features:

  • Configurable chunk size for read optimization
  • Real-time progress reporting with speed metrics
  • SHA256 hash computation option
  • File validation before reading
  • Quiet mode for scripting
  • Graceful interrupt handling

Usage:

python read_big_file.py <input> [options]

Arguments:

  • input - Input file path to read

Options:

  • --chunk-size <size> - Chunk size for reading (default: 64MB)
  • --hash - Compute SHA256 hash of the file
  • --quiet, -q - Suppress progress output
  • --version - Show version information
  • --help, -h - Show help message

Examples:

# Read a large file
python read_big_file.py largefile.bin

# Read with 128MB chunks and compute hash
python read_big_file.py test.dat --chunk-size 128MB --hash

# Read quietly and compute hash
python read_big_file.py data.bin --hash --quiet
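
The chunked read with optional SHA256 computation could look like this (a sketch; read_big_file.py's actual function names and progress handling are assumptions):

```python
import hashlib

def read_and_hash(path, chunk_bytes=64 * 1024**2):
    """Stream a file in chunks; return (bytes_read, SHA256 hex digest)."""
    digest = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:  # EOF
                break
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()
```

Streaming through `hashlib` means the whole file never has to fit in memory, regardless of its size.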

Installation

No external dependencies required. Works with Python 3.6+.

# Clone or download the scripts
git clone <repository-url>
cd bigfilegen

# Make scripts executable (optional, Unix/Linux/Mac)
chmod +x make_big_file.py read_big_file.py

Requirements

  • Python 3.6 or higher
  • Sufficient disk space for file creation
  • Read/write permissions in target directories

Performance Tips

Chunk Size Optimization

  • SSDs: Use larger chunks (64-128MB) for better performance
  • HDDs: Use moderate chunks (32-64MB) to balance speed and memory
  • Network drives: Experiment with different sizes based on network speed
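
One way to find a good chunk size for a given drive is to time sequential reads directly (a hypothetical micro-benchmark, not part of either script):

```python
import time

def read_throughput(path, chunk_bytes):
    """Time a full sequential read of `path`; return bytes per second."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed else float("inf")
```

Running this over a test file at several chunk sizes (e.g. 16, 32, 64, 128 MiB) shows where throughput plateaus on your hardware; note that OS page-cache effects can inflate repeated runs.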

File System Considerations

  • NTFS (Windows): Supports files up to 16 EiB
  • exFAT: Good for large files on external drives
  • ext4 (Linux): Supports files up to 16 TiB
  • APFS/HFS+ (macOS): Supports very large files

Use Cases

  • Performance Testing: Benchmark disk I/O speeds
  • Storage Validation: Verify storage capacity and integrity
  • Transfer Testing: Test file transfer mechanisms and speeds
  • Application Testing: Test applications with large file handling
  • Disk Burn-in: Stress test new storage devices

Output Examples

Creating a file:

Creating file: test.bin
Target size: 15.00 GiB
Chunk size: 64.00 MiB

Progress: 5% (768.00 MiB written)
Written: 1.50 GiB, Speed: 1.23 GiB/s
Progress: 10% (1.50 GiB written)
...
✓ Successfully created test.bin (15.00 GiB)
Time taken: 12.34 seconds
Average speed: 1.22 GiB/s

Reading a file:

Reading file: test.bin
File size: 15.00 GiB
Chunk size: 64.00 MiB

Progress: 5% (768.00 MiB read)
Read: 1.50 GiB, Speed: 1.45 GiB/s
Progress: 10% (1.50 GiB read)
...
✓ Successfully read 15.00 GiB
Time taken: 10.12 seconds
Average speed: 1.48 GiB/s
SHA256: a3d5c... (if --hash was used)
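
The "GiB"-style figures in the output above could be produced by a formatter along these lines (a sketch; the scripts' actual formatting helper is an assumption):

```python
def format_size(n):
    """Render a byte count with binary units, e.g. 16106127360 -> '15.00 GiB'."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB"):
        if n < 1024 or unit == "PiB":
            return f"{n:.2f} {unit}"
        n /= 1024
```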

Error Handling

Both tools include comprehensive error handling:

  • File existence checks
  • Disk space validation
  • Permission verification
  • Interrupt handling (Ctrl+C)
  • Automatic cleanup on errors
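
Interrupt handling with cleanup, matching the exit codes listed below, could be structured like this (a sketch; the wrapper name and the `writer` callback are hypothetical):

```python
import os
import sys

def create_with_cleanup(path, writer):
    """Run `writer(path)`; on Ctrl+C, remove the partial file and exit 130."""
    try:
        writer(path)
    except KeyboardInterrupt:
        # Remove the partially written file so no corrupt output is left behind.
        if os.path.exists(path):
            os.remove(path)
        sys.exit(130)  # conventional exit code for SIGINT
```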

Exit Codes

  • 0 - Success
  • 1 - General error (file not found, permission denied, etc.)
  • 130 - Interrupted by user (Ctrl+C)

License

MIT License - Feel free to use and modify as needed.

Contributing

Contributions welcome! Feel free to submit issues or pull requests.

Author

Created for testing and benchmarking large file operations.