diff --git a/README.md b/README.md
index c89fb39..377071e 100644
--- a/README.md
+++ b/README.md
@@ -1,184 +1,69 @@
-# Big File Generator
+# big-file-gen
 
-A collection of Python CLI tools for creating and reading large binary files. Useful for testing disk I/O performance, storage systems, and file transfer mechanisms.
+Small Python CLI tools for creating and reading large files.
+Useful for storage testing, transfer checks, and dumb-fun disk abuse.
 
-## Tools
+## What it does
 
-### `make_big_file.py` - File Generator
+- create large binary files filled with zeros
+- optionally create sparse files instead
+- read files back and measure throughput
+- optionally compute SHA256 while reading
+- no third-party dependencies
 
-Creates large binary files filled with zeros for testing purposes.
+## Usage
 
-**Features:**
-- Configurable file size with human-readable units (GB, TB, MB, etc.)
-- Adjustable chunk size for write optimization
-- Disk space validation before writing
-- Real-time progress reporting with speed metrics
-- Prevents accidental file overwrites
-- Graceful interrupt handling with cleanup
-- Quiet mode for scripting
-
-**Usage:**
-```bash
-python make_big_file.py <output> <size> [options]
-```
-
-**Arguments:**
-- `output` - Output file path
-- `size` - File size (e.g., 15GB, 1.5TB, 500MB)
-
-**Options:**
-- `--chunk-size <size>` - Chunk size for writing (default: 64MB)
-- `--quiet, -q` - Suppress progress output
-- `--version` - Show version information
-- `--help, -h` - Show help message
-
-**Examples:**
-```bash
-# Create a 15GB file
-python make_big_file.py output.bin 15GB
-
-# Create a 1.5TB file with 128MB chunks
-python make_big_file.py bigfile.dat 1.5TB --chunk-size 128MB
-
-# Create a 500MB file quietly
-python make_big_file.py test.bin 500MB --quiet
-```
-
-### `read_big_file.py` - File Reader & Benchmark
-
-Reads large files and measures I/O performance, optionally computing checksums.
-
-**Features:**
-- Configurable chunk size for read optimization
-- Real-time progress reporting with speed metrics
-- SHA256 hash computation option
-- File validation before reading
-- Quiet mode for scripting
-- Graceful interrupt handling
-
-**Usage:**
-```bash
-python read_big_file.py <input> [options]
-```
-
-**Arguments:**
-- `input` - Input file path to read
-
-**Options:**
-- `--chunk-size <size>` - Chunk size for reading (default: 64MB)
-- `--hash` - Compute SHA256 hash of the file
-- `--quiet, -q` - Suppress progress output
-- `--version` - Show version information
-- `--help, -h` - Show help message
-
-**Examples:**
-```bash
-# Read a large file
-python read_big_file.py largefile.bin
-
-# Read with 128MB chunks and compute hash
-python read_big_file.py test.dat --chunk-size 128MB --hash
-
-# Read quietly and compute hash
-python read_big_file.py data.bin --hash --quiet
-```
-
-## Installation
-
-No external dependencies required. Works with Python 3.6+.
+### Create a file
 
 ```bash
-# Clone or download the scripts
-git clone 
-cd bigfilegen
-
-# Make scripts executable (optional, Unix/Linux/Mac)
-chmod +x make_big_file.py read_big_file.py
+python make_big_file.py <output> <size> [--chunk-size SIZE] [--sparse] [--quiet]
 ```
 
+Examples:
+
+```bash
+python make_big_file.py test.bin 15GB
+python make_big_file.py dump.dat 1.5TiB --chunk-size 128MB
+python make_big_file.py tiny.bin 500MB --quiet
+python make_big_file.py sparse.img 20GB --sparse
+```
+
+### Read a file
+
+```bash
+python read_big_file.py <input> [--chunk-size SIZE] [--hash] [--quiet]
+```
+
+Examples:
+
+```bash
+python read_big_file.py test.bin
+python read_big_file.py dump.dat --chunk-size 128MB --hash
+python read_big_file.py tiny.bin --quiet
+```
+
+## Size formats
+
+All size suffixes are interpreted as powers of 1024:
+
+- `B`
+- `K`, `M`, `G`, `T`, `P`
+- `KB`, `MB`, `GB`, `TB`, `PB`
+- `KiB`, `MiB`, `GiB`, `TiB`, `PiB`
+
+Plain numbers are treated as bytes.
+
+## Exit codes
+
+- `0` success
+- `1` failure
+- `130` interrupted
+
 ## Requirements
 
-- Python 3.6 or higher
-- Sufficient disk space for file creation
-- Read/write permissions in target directories
+- Python 3.8+
+- enough disk space for real writes
 
-## Performance Tips
+## Notes
 
-### Chunk Size Optimization
-- **SSDs**: Use larger chunks (64-128MB) for better performance
-- **HDDs**: Use moderate chunks (32-64MB) to balance speed and memory
-- **Network drives**: Experiment with different sizes based on network speed
-
-### File System Considerations
-- **NTFS** (Windows): Supports files up to 16 EiB
-- **exFAT**: Good for large files on external drives
-- **ext4** (Linux): Supports files up to 16 TiB
-- **APFS/HFS+** (macOS): Supports very large files
-
-## Use Cases
-
-- **Performance Testing**: Benchmark disk I/O speeds
-- **Storage Validation**: Verify storage capacity and integrity
-- **Transfer Testing**: Test file transfer mechanisms and speeds
-- **Application Testing**: Test applications with large file handling
-- **Disk Burn-in**: Stress test new storage devices
-
-## Output Examples
-
-### Creating a file:
-```
-Creating file: test.bin
-Target size: 15.00 GiB
-Chunk size: 64.00 MiB
-
-Progress: 5% (768.00 MiB written)
-Written: 1.50 GiB, Speed: 1.23 GiB/s
-Progress: 10% (1.50 GiB written)
-...
-✓ Successfully created test.bin (15.00 GiB)
-Time taken: 12.34 seconds
-Average speed: 1.22 GiB/s
-```
-
-### Reading a file:
-```
-Reading file: test.bin
-File size: 15.00 GiB
-Chunk size: 64.00 MiB
-
-Progress: 5% (768.00 MiB read)
-Read: 1.50 GiB, Speed: 1.45 GiB/s
-Progress: 10% (1.50 GiB read)
-...
-✓ Successfully read 15.00 GiB
-Time taken: 10.12 seconds
-Average speed: 1.48 GiB/s
-SHA256: a3d5c... (if --hash was used)
-```
-
-## Error Handling
-
-Both tools include comprehensive error handling:
-- File existence checks
-- Disk space validation
-- Permission verification
-- Interrupt handling (Ctrl+C)
-- Automatic cleanup on errors
-
-## Exit Codes
-
-- `0` - Success
-- `1` - General error (file not found, permission denied, etc.)
-- `130` - Interrupted by user (Ctrl+C)
-
-## License
-
-MIT License - Feel free to use and modify as needed.
-
-## Contributing
-
-Contributions welcome! Feel free to submit issues or pull requests.
-
-## Author
-
-Created for testing and benchmarking large file operations.
+- `--sparse` is handy when you want a huge file without actually burning the disk.
+- `--hash` is SHA256, because anything weaker would be cosplay.
diff --git a/big_file_gen.py b/big_file_gen.py
new file mode 100755
index 0000000..b065d20
--- /dev/null
+++ b/big_file_gen.py
@@ -0,0 +1,319 @@
+#!/usr/bin/env python3
+"""Shared logic for generating and reading large files."""
+
+from __future__ import annotations
+
+import argparse
+import hashlib
+import os
+import re
+import shutil
+import sys
+import time
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Iterable, Optional
+
+VERSION = "2.0.0"
+MAX_CHUNK_SIZE = 1024 ** 3
+_PROGRESS_STEP = 5
+_SIZE_RE = re.compile(r"^\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]*)\s*$")
+
+_BINARY_UNITS = {
+    "": 1,
+    "B": 1,
+    "K": 1024,
+    "KB": 1024,
+    "KIB": 1024,
+    "M": 1024 ** 2,
+    "MB": 1024 ** 2,
+    "MIB": 1024 ** 2,
+    "G": 1024 ** 3,
+    "GB": 1024 ** 3,
+    "GIB": 1024 ** 3,
+    "T": 1024 ** 4,
+    "TB": 1024 ** 4,
+    "TIB": 1024 ** 4,
+    "P": 1024 ** 5,
+    "PB": 1024 ** 5,
+    "PIB": 1024 ** 5,
+}
+
+
+@dataclass(frozen=True)
+class ProgressState:
+    total_bytes: int
+    last_reported_percent: int = -1
+    last_log_time: float = 0.0
+    last_log_bytes: int = 0
+
+
+def parse_size(value: str) -> int:
+    match = _SIZE_RE.match(value)
+    if not match:
+        raise ValueError(f"invalid size: {value!r}")
+
+    amount = float(match.group("value"))
+    unit = match.group("unit").upper()
+    if unit not in _BINARY_UNITS:
+        raise ValueError(f"unknown size unit: {unit or 'bytes'}")
+
+    result = int(amount * _BINARY_UNITS[unit])
+    if result <= 0:
+        raise ValueError("size must be greater than zero")
+    return result
+
+
+def format_bytes(value: int) -> str:
+    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
+    size = float(value)
+    for unit in units:
+        if size < 1024.0 or unit == units[-1]:
+            return f"{size:.2f} {unit}"
+        size /= 1024.0
+    return f"{size:.2f} PiB"
+
+
+def _disk_free_bytes(path: Path) -> int:
+    usage = shutil.disk_usage(path)
+    return usage.free
+
+
+def _ensure_parent_dir(path: Path) -> None:
+    if path.parent and not path.parent.exists():
+        raise FileNotFoundError(f"directory does not exist: {path.parent}")
+
+
+def _write_progress(prefix: str, current: int, total: int) -> None:
+    percent = min(100, int(current * 100 / total)) if total else 100
+    print(f"{prefix}: {percent}% ({format_bytes(current)} of {format_bytes(total)})")
+
+
+def _maybe_log_throughput(prefix: str, bytes_done: int, started_at: float, state: ProgressState) -> ProgressState:
+    now = time.time()
+    if state.last_log_time == 0.0:
+        return ProgressState(state.total_bytes, state.last_reported_percent, now, bytes_done)
+    elapsed = now - state.last_log_time
+    if elapsed < 1.0:
+        return state
+    delta = bytes_done - state.last_log_bytes
+    speed = delta / elapsed if elapsed > 0 else 0
+    print(f"{prefix}: {format_bytes(bytes_done)} at {format_bytes(int(speed))}/s")
+    return ProgressState(state.total_bytes, state.last_reported_percent, now, bytes_done)
+
+
+def create_file(output: str | Path, total_bytes: int, chunk_size: int, quiet: bool = False, sparse: bool = False) -> int:
+    output_path = Path(output)
+
+    if output_path.exists():
+        print(f"Error: file already exists: {output_path}", file=sys.stderr)
+        return 1
+
+    _ensure_parent_dir(output_path)
+
+    if chunk_size <= 0 or chunk_size > MAX_CHUNK_SIZE:
+        print("Error: chunk size must be between 1 byte and 1 GiB", file=sys.stderr)
+        return 1
+
+    free_bytes = _disk_free_bytes(output_path.parent if output_path.parent != Path("") else Path("."))
+    if not sparse and free_bytes < total_bytes:
+        print(
+            f"Error: not enough disk space, need {format_bytes(total_bytes)}, have {format_bytes(free_bytes)}",
+            file=sys.stderr,
+        )
+        return 1
+
+    if not quiet:
+        print(f"Creating: {output_path}")
+        print(f"Size: {format_bytes(total_bytes)}")
+        print(f"Chunk: {format_bytes(chunk_size)}")
+        if sparse:
+            print("Mode: sparse")
+        print()
+
+    started_at = time.time()
+    progress = ProgressState(total_bytes=total_bytes, last_log_time=started_at, last_log_bytes=0)
+    buffer = b"\x00" * chunk_size
+
+    try:
+        with output_path.open("wb") as handle:
+            if sparse:
+                handle.truncate(total_bytes)
+                written = total_bytes
+            else:
+                written = 0
+                while written < total_bytes:
+                    step = min(chunk_size, total_bytes - written)
+                    handle.write(buffer[:step])
+                    written += step
+
+                    if quiet:
+                        continue
+
+                    percent = int(written * 100 / total_bytes)
+                    if percent >= progress.last_reported_percent + _PROGRESS_STEP:
+                        _write_progress("Progress", written, total_bytes)
+                        progress = ProgressState(total_bytes, percent, progress.last_log_time, progress.last_log_bytes)
+
+                    progress = _maybe_log_throughput("Speed", written, started_at, progress)
+
+        elapsed = time.time() - started_at
+        print(f"Done, wrote {format_bytes(written)} to {output_path}")
+        print(f"Elapsed: {elapsed:.2f}s")
+        if elapsed > 0:
+            print(f"Average: {format_bytes(int(written / elapsed))}/s")
+        return 0
+    except KeyboardInterrupt:
+        print("Interrupted, cleaning up partial file", file=sys.stderr)
+        try:
+            if output_path.exists():
+                output_path.unlink()
+        except OSError:
+            pass
+        return 130
+    except OSError as exc:
+        print(f"Error writing file: {exc}", file=sys.stderr)
+        try:
+            if output_path.exists():
+                output_path.unlink()
+        except OSError:
+            pass
+        return 1
+
+
+def read_file(input_path: str | Path, chunk_size: int, compute_hash: bool = False, quiet: bool = False) -> int:
+    path = Path(input_path)
+
+    if not path.exists():
+        print(f"Error: file not found: {path}", file=sys.stderr)
+        return 1
+    if not path.is_file():
+        print(f"Error: not a file: {path}", file=sys.stderr)
+        return 1
+
+    if chunk_size <= 0 or chunk_size > MAX_CHUNK_SIZE:
+        print("Error: chunk size must be between 1 byte and 1 GiB", file=sys.stderr)
+        return 1
+
+    try:
+        total_bytes = path.stat().st_size
+    except OSError as exc:
+        print(f"Error reading file metadata: {exc}", file=sys.stderr)
+        return 1
+
+    if not quiet:
+        print(f"Reading: {path}")
+        print(f"Size: {format_bytes(total_bytes)}")
+        print(f"Chunk: {format_bytes(chunk_size)}")
+        if compute_hash:
+            print("Hash: sha256")
+        print()
+
+    started_at = time.time()
+    progress = ProgressState(total_bytes=total_bytes, last_log_time=started_at, last_log_bytes=0)
+    hasher = hashlib.sha256() if compute_hash else None
+    bytes_read = 0
+
+    try:
+        with path.open("rb") as handle:
+            while True:
+                chunk = handle.read(chunk_size)
+                if not chunk:
+                    break
+                bytes_read += len(chunk)
+
+                if hasher is not None:
+                    hasher.update(chunk)
+
+                if quiet:
+                    continue
+
+                percent = int(bytes_read * 100 / total_bytes) if total_bytes else 100
+                if percent >= progress.last_reported_percent + _PROGRESS_STEP:
+                    _write_progress("Progress", bytes_read, total_bytes)
+                    progress = ProgressState(total_bytes, percent, progress.last_log_time, progress.last_log_bytes)
+
+                progress = _maybe_log_throughput("Speed", bytes_read, started_at, progress)
+
+        elapsed = time.time() - started_at
+        print(f"Done, read {format_bytes(bytes_read)} from {path}")
+        print(f"Elapsed: {elapsed:.2f}s")
+        if elapsed > 0:
+            print(f"Average: {format_bytes(int(bytes_read / elapsed))}/s")
+        if hasher is not None:
+            print(f"SHA256: {hasher.hexdigest()}")
+        return 0
+    except KeyboardInterrupt:
+        print("Interrupted", file=sys.stderr)
+        return 130
+    except OSError as exc:
+        print(f"Error reading file: {exc}", file=sys.stderr)
+        return 1
+
+
+def build_create_parser(prog: str) -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog=prog,
+        description="Create large binary files for storage and transfer testing.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=(
+            "Examples:\n"
+            f"  {prog} output.bin 15GB\n"
+            f"  {prog} dump.dat 1.5TB --chunk-size 128MB\n"
+            f"  {prog} test.bin 500MB --quiet"
+        ),
+    )
+    parser.add_argument("output", help="Path to the file to create")
+    parser.add_argument("size", help="Target size, for example 15GB or 1.5TiB")
+    parser.add_argument("--chunk-size", default="64MB", help="Write chunk size (default: 64MB)")
+    parser.add_argument("--sparse", action="store_true", help="Create a sparse file instead of writing zeros")
+    parser.add_argument("--quiet", "-q", action="store_true", help="Suppress progress output")
+    parser.add_argument("--version", action="version", version=f"{prog} {VERSION}")
+    return parser
+
+
+def build_read_parser(prog: str) -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog=prog,
+        description="Read large files and benchmark I/O throughput.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=(
+            "Examples:\n"
+            f"  {prog} largefile.bin\n"
+            f"  {prog} test.dat --chunk-size 128MB --hash\n"
+            f"  {prog} data.bin --quiet"
+        ),
+    )
+    parser.add_argument("input", help="Path to the file to read")
+    parser.add_argument("--chunk-size", default="64MB", help="Read chunk size (default: 64MB)")
+    parser.add_argument("--hash", action="store_true", help="Compute SHA256 while reading")
+    parser.add_argument("--quiet", "-q", action="store_true", help="Suppress progress output")
+    parser.add_argument("--version", action="version", version=f"{prog} {VERSION}")
+    return parser
+
+
+def create_main(argv: Optional[Iterable[str]] = None) -> int:
+    parser = build_create_parser("make_big_file.py")
+    args = parser.parse_args(list(argv) if argv is not None else None)
+    try:
+        total_bytes = parse_size(args.size)
+        chunk_size = parse_size(args.chunk_size)
+    except ValueError as exc:
+        print(f"Error: {exc}", file=sys.stderr)
+        return 1
+    return create_file(args.output, total_bytes, chunk_size, args.quiet, args.sparse)
+
+
+def read_main(argv: Optional[Iterable[str]] = None) -> int:
+    parser = build_read_parser("read_big_file.py")
+    args = parser.parse_args(list(argv) if argv is not None else None)
+    try:
+        chunk_size = parse_size(args.chunk_size)
+    except ValueError as exc:
+        print(f"Error: {exc}", file=sys.stderr)
+        return 1
+    return read_file(args.input, chunk_size, args.hash, args.quiet)
+
+
+if __name__ == "__main__":
+    raise SystemExit(create_main())
diff --git a/make_big_file.py b/make_big_file.py
old mode 100644
new mode 100755
index f3a5047..d9d271c
--- a/make_big_file.py
+++ b/make_big_file.py
@@ -1,218 +1,8 @@
 #!/usr/bin/env python3
-"""Generate large binary files for testing purposes."""
+"""CLI wrapper for big file creation."""
 
-import argparse
-import os
-import sys
-import time
-
-
-def parse_size(size_str):
-    """Parse size string like '15GB', '1.5TB', '500MB' to bytes."""
-    size_str = size_str.upper().strip()
-    units = {
-        'B': 1,
-        'KB': 1024,
-        'MB': 1024**2,
-        'GB': 1024**3,
-        'TB': 1024**4,
-        'KIB': 1024,
-        'MIB': 1024**2,
-        'GIB': 1024**3,
-        'TIB': 1024**4,
-    }
-
-    for unit, multiplier in units.items():
-        if size_str.endswith(unit):
-            try:
-                value = float(size_str[:-len(unit)])
-                return int(value * multiplier)
-            except ValueError:
-                raise ValueError(f"Invalid size format: {size_str}")
-
-    # Try parsing as raw bytes
-    try:
-        return int(size_str)
-    except ValueError:
-        raise ValueError(f"Invalid size format: {size_str}. Use format like '15GB', '1.5TB', '500MB'")
-
-
-def format_bytes(bytes_val):
-    """Format bytes to human-readable string."""
-    for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB']:
-        if bytes_val < 1024.0:
-            return f"{bytes_val:.2f} {unit}"
-        bytes_val /= 1024.0
-    return f"{bytes_val:.2f} PiB"
-
-
-def create_file(output_path, total_bytes, chunk_size, quiet=False):
-    """Create a file filled with zeros."""
-    # Check if file already exists
-    if os.path.exists(output_path):
-        print(f"Error: File '{output_path}' already exists", file=sys.stderr)
-        return 1
-
-    # Check if directory exists
-    output_dir = os.path.dirname(output_path)
-    if output_dir and not os.path.exists(output_dir):
-        print(f"Error: Directory '{output_dir}' does not exist", file=sys.stderr)
-        return 1
-
-    # Check available disk space
-    try:
-        if hasattr(os, 'statvfs'):  # Unix
-            stat = os.statvfs(output_dir or '.')
-            free_space = stat.f_bavail * stat.f_frsize
-        else:  # Windows
-            import ctypes
-            free_bytes = ctypes.c_ulonglong(0)
-            ctypes.windll.kernel32.GetDiskFreeSpaceExW(
-                ctypes.c_wchar_p(output_dir or '.'),
-                None, None,
-                ctypes.pointer(free_bytes)
-            )
-            free_space = free_bytes.value
-
-        if free_space < total_bytes:
-            print(f"Error: Insufficient disk space. Required: {format_bytes(total_bytes)}, Available: {format_bytes(free_space)}", file=sys.stderr)
-            return 1
-    except Exception as e:
-        print(f"Warning: Could not check disk space: {e}", file=sys.stderr)
-
-    chunk = b"\x00" * chunk_size
-
-    if not quiet:
-        print(f"Creating file: {output_path}")
-        print(f"Target size: {format_bytes(total_bytes)}")
-        print(f"Chunk size: {format_bytes(chunk_size)}")
-        print()
-
-    start_time = time.time()
-    last_gb_log_time = start_time
-    last_gb_written = 0
-
-    try:
-        with open(output_path, "wb") as f:
-            written = 0
-            last_reported_percent = -1
-
-            while written + chunk_size <= total_bytes:
-                f.write(chunk)
-                written += chunk_size
-
-                if not quiet:
-                    percent = int((written / total_bytes) * 100)
-                    if percent != last_reported_percent and percent % 5 == 0:
-                        print(f"Progress: {percent}% ({format_bytes(written)} written)")
-                        last_reported_percent = percent
-
-                    # Per second GB log
-                    now = time.time()
-                    if now - last_gb_log_time >= 1.0:
-                        gb_written = written / (1024**3)
-                        gb_per_sec = (written - last_gb_written) / (1024**3) / (now - last_gb_log_time)
-                        print(f"Written: {gb_written:.2f} GiB, Speed: {gb_per_sec:.2f} GiB/s")
-                        last_gb_log_time = now
-                        last_gb_written = written
-
-            # Write leftover
-            leftover = total_bytes - written
-            if leftover:
-                f.write(b"\x00" * leftover)
-                written += leftover
-
-        end_time = time.time()
-        elapsed = end_time - start_time
-
-        if not quiet:
-            print()
-            print(f"✓ Successfully created {output_path} ({format_bytes(written)})")
-            if elapsed > 0:
-                print(f"Time taken: {elapsed:.2f} seconds")
-                print(f"Average speed: {format_bytes(written / elapsed)}/s")
-
-        return 0
-
-    except KeyboardInterrupt:
-        print("\n\nInterrupted by user", file=sys.stderr)
-        # Clean up partial file
-        if os.path.exists(output_path):
-            print(f"Cleaning up partial file: {output_path}", file=sys.stderr)
-            try:
-                os.remove(output_path)
-            except Exception as e:
-                print(f"Warning: Could not remove partial file: {e}", file=sys.stderr)
-        return 130
-
-    except IOError as e:
-        print(f"Error writing file: {e}", file=sys.stderr)
-        # Clean up partial file
-        if os.path.exists(output_path):
-            try:
-                os.remove(output_path)
-            except Exception:
-                pass
-        return 1
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description='Generate large binary files filled with zeros for testing purposes.',
-        epilog='Examples:\n'
-               '  %(prog)s output.bin 15GB\n'
-               '  %(prog)s test.dat 1.5TB --chunk-size 128MB\n'
-               '  %(prog)s small.bin 500MB --quiet',
-        formatter_class=argparse.RawDescriptionHelpFormatter
-    )
-
-    parser.add_argument(
-        'output',
-        help='Output file path'
-    )
-
-    parser.add_argument(
-        'size',
-        help='File size (e.g., 15GB, 1.5TB, 500MB, 1073741824)'
-    )
-
-    parser.add_argument(
-        '--chunk-size',
-        default='64MB',
-        help='Chunk size for writing (default: 64MB)'
-    )
-
-    parser.add_argument(
-        '--quiet', '-q',
-        action='store_true',
-        help='Suppress progress output'
-    )
-
-    parser.add_argument(
-        '--version',
-        action='version',
-        version='%(prog)s 1.0.0'
-    )
-
-    args = parser.parse_args()
-
-    try:
-        total_bytes = parse_size(args.size)
-        chunk_size = parse_size(args.chunk_size)
-    except ValueError as e:
-        print(f"Error: {e}", file=sys.stderr)
-        return 1
-
-    if total_bytes <= 0:
-        print("Error: Size must be greater than 0", file=sys.stderr)
-        return 1
-
-    if chunk_size <= 0 or chunk_size > 1024**3:  # Max 1GB chunk
-        print("Error: Chunk size must be between 1 byte and 1GB", file=sys.stderr)
-        return 1
-
-    return create_file(args.output, total_bytes, chunk_size, args.quiet)
+from big_file_gen import create_main
 
 
 if __name__ == "__main__":
-    sys.exit(main())
+    raise SystemExit(create_main())
diff --git a/read_big_file.py b/read_big_file.py
old mode 100644
new mode 100755
index 2751a99..061ffd3
--- a/read_big_file.py
+++ b/read_big_file.py
@@ -1,194 +1,8 @@
 #!/usr/bin/env python3
-"""Read and benchmark large file I/O performance."""
+"""CLI wrapper for big file reading and benchmarking."""
 
-import argparse
-import hashlib
-import os
-import sys
-import time
-
-
-def parse_size(size_str):
-    """Parse size string like '64MB', '128KB' to bytes."""
-    size_str = size_str.upper().strip()
-    units = {
-        'B': 1,
-        'KB': 1024,
-        'MB': 1024**2,
-        'GB': 1024**3,
-        'TB': 1024**4,
-        'KIB': 1024,
-        'MIB': 1024**2,
-        'GIB': 1024**3,
-        'TIB': 1024**4,
-    }
-
-    for unit, multiplier in units.items():
-        if size_str.endswith(unit):
-            try:
-                value = float(size_str[:-len(unit)])
-                return int(value * multiplier)
-            except ValueError:
-                raise ValueError(f"Invalid size format: {size_str}")
-
-    # Try parsing as raw bytes
-    try:
-        return int(size_str)
-    except ValueError:
-        raise ValueError(f"Invalid size format: {size_str}. Use format like '64MB', '128KB'")
-
-
-def format_bytes(bytes_val):
-    """Format bytes to human-readable string."""
-    for unit in ['B', 'KiB', 'MiB', 'GiB', 'TiB']:
-        if bytes_val < 1024.0:
-            return f"{bytes_val:.2f} {unit}"
-        bytes_val /= 1024.0
-    return f"{bytes_val:.2f} PiB"
-
-
-def read_file(input_path, chunk_size, compute_hash=False, quiet=False):
-    """Read a file and optionally compute its hash."""
-    if not os.path.exists(input_path):
-        print(f"Error: File '{input_path}' does not exist", file=sys.stderr)
-        return 1
-
-    if not os.path.isfile(input_path):
-        print(f"Error: '{input_path}' is not a file", file=sys.stderr)
-        return 1
-
-    try:
-        total_bytes = os.path.getsize(input_path)
-    except OSError as e:
-        print(f"Error: Cannot get file size: {e}", file=sys.stderr)
-        return 1
-
-    if total_bytes == 0:
-        print(f"Warning: File is empty", file=sys.stderr)
-        return 0
-
-    if not quiet:
-        print(f"Reading file: {input_path}")
-        print(f"File size: {format_bytes(total_bytes)}")
-        print(f"Chunk size: {format_bytes(chunk_size)}")
-        if compute_hash:
-            print(f"Computing: SHA256 hash")
-        print()
-
-    start_time = time.time()
-    last_gb_log_time = start_time
-    last_gb_read = 0
-
-    hash_obj = hashlib.sha256() if compute_hash else None
-
-    try:
-        with open(input_path, "rb") as f:
-            bytes_read = 0
-            last_reported_percent = -1
-
-            while True:
-                chunk = f.read(chunk_size)
-                if not chunk:
-                    break
-
-                bytes_read += len(chunk)
-
-                if compute_hash:
-                    hash_obj.update(chunk)
-
-                if not quiet:
-                    percent = int((bytes_read / total_bytes) * 100)
-
-                    if percent != last_reported_percent and percent % 5 == 0:
-                        print(f"Progress: {percent}% ({format_bytes(bytes_read)} read)")
-                        last_reported_percent = percent
-
-                    # Per second GB log
-                    now = time.time()
-                    if now - last_gb_log_time >= 1.0:
-                        gb_read = bytes_read / (1024**3)
-                        gb_per_sec = (bytes_read - last_gb_read) / (1024**3) / (now - last_gb_log_time)
-                        print(f"Read: {gb_read:.2f} GiB, Speed: {gb_per_sec:.2f} GiB/s")
-                        last_gb_log_time = now
-                        last_gb_read = bytes_read
-
-        end_time = time.time()
-        elapsed = end_time - start_time
-
-        if not quiet:
-            print()
-            print(f"✓ Successfully read {format_bytes(bytes_read)}")
-            if elapsed > 0:
-                print(f"Time taken: {elapsed:.2f} seconds")
-                print(f"Average speed: {format_bytes(bytes_read / elapsed)}/s")
-
-        if compute_hash:
-            print(f"SHA256: {hash_obj.hexdigest()}")
-
-        return 0
-
-    except KeyboardInterrupt:
-        print("\n\nInterrupted by user", file=sys.stderr)
-        return 130
-
-    except IOError as e:
-        print(f"Error reading file: {e}", file=sys.stderr)
-        return 1
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description='Read and benchmark large file I/O performance.',
-        epilog='Examples:\n'
-               '  %(prog)s largefile.bin\n'
-               '  %(prog)s test.dat --chunk-size 128MB\n'
-               '  %(prog)s data.bin --hash --quiet',
-        formatter_class=argparse.RawDescriptionHelpFormatter
-    )
-
-    parser.add_argument(
-        'input',
-        help='Input file path to read'
-    )
-
-    parser.add_argument(
-        '--chunk-size',
-        default='64MB',
-        help='Chunk size for reading (default: 64MB)'
-    )
-
-    parser.add_argument(
-        '--hash',
-        action='store_true',
-        help='Compute SHA256 hash of the file'
-    )
-
-    parser.add_argument(
-        '--quiet', '-q',
-        action='store_true',
-        help='Suppress progress output'
-    )
-
-    parser.add_argument(
-        '--version',
-        action='version',
-        version='%(prog)s 1.0.0'
-    )
-
-    args = parser.parse_args()
-
-    try:
-        chunk_size = parse_size(args.chunk_size)
-    except ValueError as e:
-        print(f"Error: {e}", file=sys.stderr)
-        return 1
-
-    if chunk_size <= 0 or chunk_size > 1024**3:  # Max 1GB chunk
-        print("Error: Chunk size must be between 1 byte and 1GB", file=sys.stderr)
-        return 1
-
-    return read_file(args.input, chunk_size, args.hash, args.quiet)
+from big_file_gen import read_main
 
 
 if __name__ == "__main__":
-    sys.exit(main())
+    raise SystemExit(read_main())
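
For reviewers who want to sanity-check the size grammar this patch introduces, here is a minimal standalone sketch of the same parsing rules (`parse_size` in `big_file_gen.py`): every suffix, whether `KB`- or `KiB`-style, maps to a power of 1024, and a bare number is taken as bytes. The helper below is illustrative only, not part of the patch:

```python
import re

# Same shape as the patch's _SIZE_RE: a number followed by an optional suffix.
_SIZE_RE = re.compile(r"^\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]*)\s*$")

# Build the unit table: K/KB/KiB all mean 1024, M/MB/MiB mean 1024**2, and so on.
_UNITS = {"": 1, "B": 1}
for i, letter in enumerate("KMGTP", start=1):
    _UNITS[letter] = _UNITS[letter + "B"] = _UNITS[letter + "IB"] = 1024 ** i

def parse_size(value: str) -> int:
    """Parse a human-readable size like '1.5TiB' or '500MB' into bytes."""
    match = _SIZE_RE.match(value)
    if not match:
        raise ValueError(f"invalid size: {value!r}")
    unit = match.group("unit").upper()
    if unit not in _UNITS:
        raise ValueError(f"unknown size unit: {unit}")
    result = int(float(match.group("value")) * _UNITS[unit])
    if result <= 0:
        raise ValueError("size must be greater than zero")
    return result

print(parse_size("1.5TiB"))  # 1649267441664
print(parse_size("500MB"))   # 524288000
print(parse_size("4096"))    # 4096
```

Note that `1KB` and `1KiB` are deliberately equal here, matching the patched `_BINARY_UNITS` table rather than SI decimal semantics.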