first commit

This commit is contained in:
Space-Banane
2026-01-16 21:30:17 +01:00
commit ca1f80d470
3 changed files with 596 additions and 0 deletions

README.md Normal file

@@ -0,0 +1,184 @@
# Big File Generator
A collection of Python CLI tools for creating and reading large binary files. Useful for testing disk I/O performance, storage systems, and file transfer mechanisms.
## Tools
### `make_big_file.py` - File Generator
Creates large binary files filled with zeros for testing purposes.
**Features:**
- Configurable file size with human-readable units (GB, TB, MB, etc.)
- Adjustable chunk size for write optimization
- Disk space validation before writing
- Real-time progress reporting with speed metrics
- Prevents accidental file overwrites
- Graceful interrupt handling with cleanup
- Quiet mode for scripting
**Usage:**
```bash
python make_big_file.py <output> <size> [options]
```
**Arguments:**
- `output` - Output file path
- `size` - File size (e.g., 15GB, 1.5TB, 500MB)
**Options:**
- `--chunk-size <size>` - Chunk size for writing (default: 64MB)
- `--quiet, -q` - Suppress progress output
- `--version` - Show version information
- `--help, -h` - Show help message
**Examples:**
```bash
# Create a 15GB file
python make_big_file.py output.bin 15GB
# Create a 1.5TB file with 128MB chunks
python make_big_file.py bigfile.dat 1.5TB --chunk-size 128MB
# Create a 500MB file quietly
python make_big_file.py test.bin 500MB --quiet
```
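Size strings are interpreted with binary multipliers (both `GB` and `GiB` mean 1024³ bytes here); a minimal sketch of that convention, mirroring the script's `parse_size` but not the script itself:
```python
def parse_size(size_str):
    # Binary multipliers; longer suffixes checked first so '15GB'
    # matches 'GB' rather than the bare 'B' unit.
    units = {"TIB": 1024**4, "GIB": 1024**3, "MIB": 1024**2, "KIB": 1024,
             "TB": 1024**4, "GB": 1024**3, "MB": 1024**2, "KB": 1024, "B": 1}
    s = size_str.upper().strip()
    for unit, mult in units.items():
        if s.endswith(unit):
            return int(float(s[:-len(unit)]) * mult)
    return int(s)  # bare number = raw bytes

print(parse_size("1.5TB"))  # 1649267441664
print(parse_size("500MB"))  # 524288000
```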
### `read_big_file.py` - File Reader & Benchmark
Reads large files and measures I/O performance, optionally computing checksums.
**Features:**
- Configurable chunk size for read optimization
- Real-time progress reporting with speed metrics
- SHA256 hash computation option
- File validation before reading
- Quiet mode for scripting
- Graceful interrupt handling
**Usage:**
```bash
python read_big_file.py <input> [options]
```
**Arguments:**
- `input` - Input file path to read
**Options:**
- `--chunk-size <size>` - Chunk size for reading (default: 64MB)
- `--hash` - Compute SHA256 hash of the file
- `--quiet, -q` - Suppress progress output
- `--version` - Show version information
- `--help, -h` - Show help message
**Examples:**
```bash
# Read a large file
python read_big_file.py largefile.bin
# Read with 128MB chunks and compute hash
python read_big_file.py test.dat --chunk-size 128MB --hash
# Read quietly and compute hash
python read_big_file.py data.bin --hash --quiet
```
## Installation
No external dependencies required. Works with Python 3.6+.
```bash
# Clone or download the scripts
git clone <repository-url>
cd bigfilegen
# Make scripts executable (optional, Unix/Linux/Mac)
chmod +x make_big_file.py read_big_file.py
```
## Requirements
- Python 3.6 or higher
- Sufficient disk space for file creation
- Read/write permissions in target directories
## Performance Tips
### Chunk Size Optimization
- **SSDs**: Use larger chunks (64-128MB) for better performance
- **HDDs**: Use moderate chunks (32-64MB) to balance speed and memory
- **Network drives**: Experiment with different sizes based on network speed
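The best chunk size depends on your hardware, so it is worth measuring directly; a rough benchmark sketch (the sizes tried are illustrative, and the temporary file is deleted afterwards):
```python
import os
import tempfile
import time

def bench_write(chunk_size, total=64 * 1024**2):
    # Write `total` zero bytes in `chunk_size` pieces and return MiB/s.
    chunk = b"\x00" * chunk_size
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            written = 0
            while written < total:
                f.write(chunk)
                written += chunk_size
            f.flush()
            os.fsync(f.fileno())  # include flush-to-disk in the timing
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (total / (1024**2)) / elapsed

for mib in (4, 16, 64):  # candidate chunk sizes in MiB
    print(f"{mib} MiB chunks: {bench_write(mib * 1024**2):.0f} MiB/s")
```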
### File System Considerations
- **NTFS** (Windows): Supports files up to 16 EiB (theoretical limit)
- **exFAT**: Good for large files on external drives
- **ext4** (Linux): Supports files up to 16 TiB (with the default 4 KiB block size)
- **APFS/HFS+** (macOS): Supports very large files (up to 8 EiB on APFS)
## Use Cases
- **Performance Testing**: Benchmark disk I/O speeds
- **Storage Validation**: Verify storage capacity and integrity
- **Transfer Testing**: Test file transfer mechanisms and speeds
- **Application Testing**: Test applications with large file handling
- **Disk Burn-in**: Stress test new storage devices
## Output Examples
### Creating a file:
```
Creating file: test.bin
Target size: 15.00 GiB
Chunk size: 64.00 MiB
Progress: 5% (768.00 MiB written)
Written: 1.50 GiB, Speed: 1.23 GiB/s
Progress: 10% (1.50 GiB written)
...
✓ Successfully created test.bin (15.00 GiB)
Time taken: 12.34 seconds
Average speed: 1.22 GiB/s
```
### Reading a file:
```
Reading file: test.bin
File size: 15.00 GiB
Chunk size: 64.00 MiB
Progress: 5% (768.00 MiB read)
Read: 1.50 GiB, Speed: 1.45 GiB/s
Progress: 10% (1.50 GiB read)
...
✓ Successfully read 15.00 GiB
Time taken: 10.12 seconds
Average speed: 1.48 GiB/s
SHA256: a3d5c... (if --hash was used)
```
## Error Handling
Both tools include comprehensive error handling:
- File existence checks
- Disk space validation
- Permission verification
- Interrupt handling (Ctrl+C)
- Automatic cleanup on errors
## Exit Codes
- `0` - Success
- `1` - General error (file not found, permission denied, etc.)
- `130` - Interrupted by user (Ctrl+C)
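When driving the tools from another script, these exit codes can be checked programmatically; a minimal sketch (the commented-out invocation is hypothetical and assumes `make_big_file.py` is in the working directory):
```python
import subprocess
import sys

def describe_exit(code):
    # Map the documented exit codes to outcomes.
    return {0: "success", 130: "interrupted"}.get(code, "error")

# Hypothetical invocation; uncomment where make_big_file.py is available:
# result = subprocess.run(
#     [sys.executable, "make_big_file.py", "out.bin", "1MB", "--quiet"])
# print(describe_exit(result.returncode))

print(describe_exit(0))    # success
print(describe_exit(130))  # interrupted
print(describe_exit(1))    # error
```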
## License
MIT License - Feel free to use and modify as needed.
## Contributing
Contributions welcome! Feel free to submit issues or pull requests.
## Author
Created for testing and benchmarking large file operations.

make_big_file.py Normal file

@@ -0,0 +1,218 @@
#!/usr/bin/env python3
"""Generate large binary files for testing purposes."""
import argparse
import os
import sys
import time
def parse_size(size_str):
    """Parse size string like '15GB', '1.5TB', '500MB' to bytes."""
    size_str = size_str.upper().strip()
    units = {
        'B': 1,
        'KB': 1024,
        'MB': 1024**2,
        'GB': 1024**3,
        'TB': 1024**4,
        'KIB': 1024,
        'MIB': 1024**2,
        'GIB': 1024**3,
        'TIB': 1024**4,
    }
    # Check longer suffixes first so '15GB' matches 'GB' rather than the bare 'B' unit.
    for unit, multiplier in sorted(units.items(), key=lambda kv: len(kv[0]), reverse=True):
        if size_str.endswith(unit):
            try:
                value = float(size_str[:-len(unit)])
                return int(value * multiplier)
            except ValueError:
                raise ValueError(f"Invalid size format: {size_str}")
    # Try parsing as raw bytes
    try:
        return int(size_str)
    except ValueError:
        raise ValueError(f"Invalid size format: {size_str}. Use format like '15GB', '1.5TB', '500MB'")


def format_bytes(bytes_val):
    """Format bytes to human-readable string."""
    for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB']:
        if bytes_val < 1024.0:
            return f"{bytes_val:.2f} {unit}"
        bytes_val /= 1024.0
    return f"{bytes_val:.2f} PiB"


def create_file(output_path, total_bytes, chunk_size, quiet=False):
    """Create a file filled with zeros."""
    # Check if file already exists
    if os.path.exists(output_path):
        print(f"Error: File '{output_path}' already exists", file=sys.stderr)
        return 1
    # Check if directory exists
    output_dir = os.path.dirname(output_path)
    if output_dir and not os.path.exists(output_dir):
        print(f"Error: Directory '{output_dir}' does not exist", file=sys.stderr)
        return 1
    # Check available disk space
    try:
        if hasattr(os, 'statvfs'):  # Unix
            stat = os.statvfs(output_dir or '.')
            free_space = stat.f_bavail * stat.f_frsize
        else:  # Windows
            import ctypes
            free_bytes = ctypes.c_ulonglong(0)
            ctypes.windll.kernel32.GetDiskFreeSpaceExW(
                ctypes.c_wchar_p(output_dir or '.'),
                None, None,
                ctypes.pointer(free_bytes)
            )
            free_space = free_bytes.value
        if free_space < total_bytes:
            print(f"Error: Insufficient disk space. Required: {format_bytes(total_bytes)}, Available: {format_bytes(free_space)}", file=sys.stderr)
            return 1
    except Exception as e:
        print(f"Warning: Could not check disk space: {e}", file=sys.stderr)
    chunk = b"\x00" * chunk_size
    if not quiet:
        print(f"Creating file: {output_path}")
        print(f"Target size: {format_bytes(total_bytes)}")
        print(f"Chunk size: {format_bytes(chunk_size)}")
        print()
    start_time = time.time()
    last_gb_log_time = start_time
    last_gb_written = 0
    try:
        with open(output_path, "wb") as f:
            written = 0
            last_reported_percent = -1
            while written + chunk_size <= total_bytes:
                f.write(chunk)
                written += chunk_size
                if not quiet:
                    percent = int((written / total_bytes) * 100)
                    if percent != last_reported_percent and percent % 5 == 0:
                        print(f"Progress: {percent}% ({format_bytes(written)} written)")
                        last_reported_percent = percent
                    # Per-second throughput log
                    now = time.time()
                    if now - last_gb_log_time >= 1.0:
                        gb_written = written / (1024**3)
                        gb_per_sec = (written - last_gb_written) / (1024**3) / (now - last_gb_log_time)
                        print(f"Written: {gb_written:.2f} GiB, Speed: {gb_per_sec:.2f} GiB/s")
                        last_gb_log_time = now
                        last_gb_written = written
            # Write leftover
            leftover = total_bytes - written
            if leftover:
                f.write(b"\x00" * leftover)
                written += leftover
        end_time = time.time()
        elapsed = end_time - start_time
        if not quiet:
            print()
            print(f"✓ Successfully created {output_path} ({format_bytes(written)})")
            if elapsed > 0:
                print(f"Time taken: {elapsed:.2f} seconds")
                print(f"Average speed: {format_bytes(written / elapsed)}/s")
        return 0
    except KeyboardInterrupt:
        print("\n\nInterrupted by user", file=sys.stderr)
        # Clean up partial file
        if os.path.exists(output_path):
            print(f"Cleaning up partial file: {output_path}", file=sys.stderr)
            try:
                os.remove(output_path)
            except Exception as e:
                print(f"Warning: Could not remove partial file: {e}", file=sys.stderr)
        return 130
    except IOError as e:
        print(f"Error writing file: {e}", file=sys.stderr)
        # Clean up partial file
        if os.path.exists(output_path):
            try:
                os.remove(output_path)
            except Exception:
                pass
        return 1


def main():
    parser = argparse.ArgumentParser(
        description='Generate large binary files filled with zeros for testing purposes.',
        epilog='Examples:\n'
               '  %(prog)s output.bin 15GB\n'
               '  %(prog)s test.dat 1.5TB --chunk-size 128MB\n'
               '  %(prog)s small.bin 500MB --quiet',
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        'output',
        help='Output file path'
    )
    parser.add_argument(
        'size',
        help='File size (e.g., 15GB, 1.5TB, 500MB, 1073741824)'
    )
    parser.add_argument(
        '--chunk-size',
        default='64MB',
        help='Chunk size for writing (default: 64MB)'
    )
    parser.add_argument(
        '--quiet', '-q',
        action='store_true',
        help='Suppress progress output'
    )
    parser.add_argument(
        '--version',
        action='version',
        version='%(prog)s 1.0.0'
    )
    args = parser.parse_args()
    try:
        total_bytes = parse_size(args.size)
        chunk_size = parse_size(args.chunk_size)
    except ValueError as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1
    if total_bytes <= 0:
        print("Error: Size must be greater than 0", file=sys.stderr)
        return 1
    if chunk_size <= 0 or chunk_size > 1024**3:  # Max 1GB chunk
        print("Error: Chunk size must be between 1 byte and 1GB", file=sys.stderr)
        return 1
    return create_file(args.output, total_bytes, chunk_size, args.quiet)


if __name__ == "__main__":
    sys.exit(main())

read_big_file.py Normal file

@@ -0,0 +1,194 @@
#!/usr/bin/env python3
"""Read and benchmark large file I/O performance."""
import argparse
import hashlib
import os
import sys
import time
def parse_size(size_str):
    """Parse size string like '64MB', '128KB' to bytes."""
    size_str = size_str.upper().strip()
    units = {
        'B': 1,
        'KB': 1024,
        'MB': 1024**2,
        'GB': 1024**3,
        'TB': 1024**4,
        'KIB': 1024,
        'MIB': 1024**2,
        'GIB': 1024**3,
        'TIB': 1024**4,
    }
    # Check longer suffixes first so '64MB' matches 'MB' rather than the bare 'B' unit.
    for unit, multiplier in sorted(units.items(), key=lambda kv: len(kv[0]), reverse=True):
        if size_str.endswith(unit):
            try:
                value = float(size_str[:-len(unit)])
                return int(value * multiplier)
            except ValueError:
                raise ValueError(f"Invalid size format: {size_str}")
    # Try parsing as raw bytes
    try:
        return int(size_str)
    except ValueError:
        raise ValueError(f"Invalid size format: {size_str}. Use format like '64MB', '128KB'")


def format_bytes(bytes_val):
    """Format bytes to human-readable string."""
    for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB']:
        if bytes_val < 1024.0:
            return f"{bytes_val:.2f} {unit}"
        bytes_val /= 1024.0
    return f"{bytes_val:.2f} PiB"


def read_file(input_path, chunk_size, compute_hash=False, quiet=False):
    """Read a file and optionally compute its hash."""
    if not os.path.exists(input_path):
        print(f"Error: File '{input_path}' does not exist", file=sys.stderr)
        return 1
    if not os.path.isfile(input_path):
        print(f"Error: '{input_path}' is not a file", file=sys.stderr)
        return 1
    try:
        total_bytes = os.path.getsize(input_path)
    except OSError as e:
        print(f"Error: Cannot get file size: {e}", file=sys.stderr)
        return 1
    if total_bytes == 0:
        print("Warning: File is empty", file=sys.stderr)
        return 0
    if not quiet:
        print(f"Reading file: {input_path}")
        print(f"File size: {format_bytes(total_bytes)}")
        print(f"Chunk size: {format_bytes(chunk_size)}")
        if compute_hash:
            print("Computing: SHA256 hash")
        print()
    start_time = time.time()
    last_gb_log_time = start_time
    last_gb_read = 0
    hash_obj = hashlib.sha256() if compute_hash else None
    try:
        with open(input_path, "rb") as f:
            bytes_read = 0
            last_reported_percent = -1
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                bytes_read += len(chunk)
                if compute_hash:
                    hash_obj.update(chunk)
                if not quiet:
                    percent = int((bytes_read / total_bytes) * 100)
                    if percent != last_reported_percent and percent % 5 == 0:
                        print(f"Progress: {percent}% ({format_bytes(bytes_read)} read)")
                        last_reported_percent = percent
                    # Per-second throughput log
                    now = time.time()
                    if now - last_gb_log_time >= 1.0:
                        gb_read = bytes_read / (1024**3)
                        gb_per_sec = (bytes_read - last_gb_read) / (1024**3) / (now - last_gb_log_time)
                        print(f"Read: {gb_read:.2f} GiB, Speed: {gb_per_sec:.2f} GiB/s")
                        last_gb_log_time = now
                        last_gb_read = bytes_read
        end_time = time.time()
        elapsed = end_time - start_time
        if not quiet:
            print()
            print(f"✓ Successfully read {format_bytes(bytes_read)}")
            if elapsed > 0:
                print(f"Time taken: {elapsed:.2f} seconds")
                print(f"Average speed: {format_bytes(bytes_read / elapsed)}/s")
        if compute_hash:
            print(f"SHA256: {hash_obj.hexdigest()}")
        return 0
    except KeyboardInterrupt:
        print("\n\nInterrupted by user", file=sys.stderr)
        return 130
    except IOError as e:
        print(f"Error reading file: {e}", file=sys.stderr)
        return 1


def main():
    parser = argparse.ArgumentParser(
        description='Read and benchmark large file I/O performance.',
        epilog='Examples:\n'
               '  %(prog)s largefile.bin\n'
               '  %(prog)s test.dat --chunk-size 128MB\n'
               '  %(prog)s data.bin --hash --quiet',
        formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument(
        'input',
        help='Input file path to read'
    )
    parser.add_argument(
        '--chunk-size',
        default='64MB',
        help='Chunk size for reading (default: 64MB)'
    )
    parser.add_argument(
        '--hash',
        action='store_true',
        help='Compute SHA256 hash of the file'
    )
    parser.add_argument(
        '--quiet', '-q',
        action='store_true',
        help='Suppress progress output'
    )
    parser.add_argument(
        '--version',
        action='version',
        version='%(prog)s 1.0.0'
    )
    args = parser.parse_args()
    try:
        chunk_size = parse_size(args.chunk_size)
    except ValueError as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1
    if chunk_size <= 0 or chunk_size > 1024**3:  # Max 1GB chunk
        print("Error: Chunk size must be between 1 byte and 1GB", file=sys.stderr)
        return 1
    return read_file(args.input, chunk_size, args.hash, args.quiet)


if __name__ == "__main__":
    sys.exit(main())