Rewrite big file CLI tools

2026-04-08 00:27:11 +02:00
parent ca1f80d470
commit 31b46d0dd5
4 changed files with 381 additions and 573 deletions

README.md

@@ -1,184 +1,69 @@
# big-file-gen

Small Python CLI tools for creating and reading large files.
Useful for storage testing, transfer checks, and dumb-fun disk abuse.

## What it does
- create large binary files filled with zeros
- optionally create sparse files instead
- read files back and measure throughput
- optionally compute SHA256 while reading
- no third-party dependencies
## Usage
### Create a file
```bash
python make_big_file.py <output> <size> [--chunk-size SIZE] [--sparse] [--quiet]
```
Examples:
```bash
python make_big_file.py test.bin 15GB
python make_big_file.py dump.dat 1.5TiB --chunk-size 128MB
python make_big_file.py tiny.bin 500MB --quiet
python make_big_file.py sparse.img 20GB --sparse
```
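The scripts' source isn't included in this README, but the core write loop can be sketched roughly as follows (the function name and signature here are illustrative, not the actual internals of `make_big_file.py`):

```python
import os

def write_zeros(path: str, total: int, chunk_size: int = 64 * 1024 * 1024) -> None:
    """Stream zero-filled chunks until `total` bytes are on disk.

    Illustrative sketch only; the real make_big_file.py may differ.
    """
    chunk = b"\x00" * min(chunk_size, total)
    written = 0
    with open(path, "wb") as f:
        while written < total:
            n = min(chunk_size, total - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # ensure timings reflect real disk writes, not page cache
```

Reusing one preallocated zero buffer keeps memory flat regardless of file size; only the final chunk is sliced shorter.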
### Read a file
```bash
python read_big_file.py <input> [--chunk-size SIZE] [--hash] [--quiet]
```
Examples:
```bash
python read_big_file.py test.bin
python read_big_file.py dump.dat --chunk-size 128MB --hash
python read_big_file.py tiny.bin --quiet
```
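The read-side loop, with optional hashing folded in, can be sketched like this (again a hypothetical function, not the script's actual code):

```python
import hashlib
import time

def read_and_hash(path: str, chunk_size: int = 64 * 1024 * 1024):
    """Stream a file in chunks, time the pass, and fold each chunk
    into a SHA-256 digest. Illustrative sketch only.
    """
    digest = hashlib.sha256()
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # b"" at EOF ends the loop
            digest.update(chunk)
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total, elapsed, digest.hexdigest()
```

Hashing incrementally per chunk means the whole file never needs to fit in memory, and throughput is simply `total / elapsed`.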
## Size formats
The following size units are supported:
- `B`
- `KB`, `MB`, `GB`, `TB`, `PB`
- `KiB`, `MiB`, `GiB`, `TiB`, `PiB`
Plain numbers are treated as bytes.
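A parser for these formats can be sketched as follows. This is hypothetical: in particular it assumes decimal prefixes for `KB`/`MB`/… and binary for `KiB`/`MiB`/…, while the actual scripts may treat both families as binary:

```python
import re

# Assumed semantics: KB/MB/... decimal, KiB/MiB/... binary. Sketch only.
_DECIMAL = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}
_BINARY = {"KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40, "PIB": 2**50}

def parse_size(text: str) -> int:
    """Turn '15GB' or '1.5TiB' (or a bare number) into a byte count."""
    m = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", text)
    if not m:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = float(m.group(1)), m.group(2).upper()
    if not unit:
        return int(value)  # plain numbers are bytes
    factor = _DECIMAL.get(unit) or _BINARY.get(unit)
    if factor is None:
        raise ValueError(f"unknown unit: {unit}")
    return int(value * factor)
```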
## Exit codes
- `0` success
- `1` failure
- `130` interrupted
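`130` follows the usual shell convention of 128 + signal number (SIGINT is signal 2). A hypothetical top-level skeleton mapping outcomes to these codes:

```python
def main() -> int:
    """Hypothetical skeleton; the real scripts' structure may differ."""
    try:
        ...  # create or read the file here
        return 0
    except KeyboardInterrupt:
        return 130  # 128 + SIGINT (signal 2), the usual shell convention
    except OSError:
        return 1  # covers not-found, permission denied, disk full, ...

# an entry point would then call: sys.exit(main())
```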
## Requirements

- Python 3.8+
- enough disk space for real writes
## Notes
- `--sparse` is handy when you want a huge file without actually burning the disk.
- `--hash` is SHA256, because anything weaker would be cosplay.
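For the sparse case, the trick is to extend the file's logical size without writing any data blocks. A minimal sketch, assuming the filesystem supports holes (on ones that don't, the full size gets allocated anyway):

```python
def make_sparse(path: str, size: int) -> None:
    """Create a file with `size` logical bytes but (ideally) no data blocks.

    Sketch only; the real --sparse implementation may differ.
    """
    with open(path, "wb") as f:
        f.truncate(size)  # extends with a hole on sparse-capable filesystems
```

`ls -l` and `os.path.getsize` report the logical size either way; `du` shows the blocks actually allocated, which is where a sparse file looks tiny.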