Files later in the list always open much more slowly when reading a large number of small files


Problem description


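As the title says: when reading a large number of small files, files later in the list always take much longer to open than earlier ones. I have already switched to a faster SD card (currently a 64 GB card). Opening is somewhat faster, but it still clearly gets slower and slower. How can this be solved?
As a workaround, I currently merge all the small files into a single file on the PC and open it once on the K230; that way there is no noticeable speed problem.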
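For readers who want to try the same workaround, here is a minimal sketch of one way to do the merge. The record layout (a 2-byte name length and a 4-byte data length, followed by the name and the data) and the names `pack_dir` and `iter_packed` are assumptions made for illustration, not the format used in the original post. It assumes the source directory is flat and that the output file is written outside of it.

```python
import os
import struct

# Hypothetical record layout: <name_len: uint16><data_len: uint32><name><data>
HEADER_FMT = "<HI"
HEADER_SIZE = 6  # 2 + 4 bytes, no padding because of "<"


def pack_dir(src_dir, out_path):
    """Run on the PC: concatenate every file in src_dir into one file."""
    with open(out_path, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            with open(src_dir + "/" + name, "rb") as f:
                data = f.read()
            raw_name = name.encode("utf-8")
            out.write(struct.pack(HEADER_FMT, len(raw_name), len(data)))
            out.write(raw_name)
            out.write(data)


def iter_packed(path):
    """Run on the board: yield (name, data) pairs using a single open()."""
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER_SIZE)
            if len(header) < HEADER_SIZE:
                break  # end of file
            name_len, data_len = struct.unpack(HEADER_FMT, header)
            name = f.read(name_len).decode("utf-8")
            yield name, f.read(data_len)
```

On the PC one would call something like `pack_dir("D:/test", "D:/test.bin")`, copy `test.bin` to the card, and then loop over `iter_packed("/sdcard/test.bin")` on the K230, so only one directory lookup and one `open()` are needed.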

Steps to reproduce


First, create the small test files on the PC:

import os

base_dir = "D:/test"
os.makedirs(base_dir, exist_ok=True)
for i in range(3000):
    with open("{}/{}.txt".format(base_dir, i), "w") as file:
        for j in range(700):
            file.write("A")

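Then copy the test folder to the SDCARD partition of the K230's memory card.
Finally, put the card back into the K230 and run the script below in the IDE. The numbers are printed more and more slowly.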

import os

base_dir = "/sdcard/test"

def get_file_list(dir_path: str):
    filelist_str = []
    try:
        filelist_byte = os.listdir(dir_path.encode("utf-8"))
        for filename in filelist_byte:
            try:
                filelist_str.append(filename.decode("utf-8"))
            except:
                print("file name decode err " + str(filename))
    except Exception as e:
        print("get file list err " + str(e))
        return []
    return filelist_str


filelist = get_file_list(base_dir)
for i in range(len(filelist)):
# for i in range(int(len(filelist) / 2), len(filelist)):
    with open("{}/{}".format(base_dir, filelist[i]), "r") as file:
        # print(i, len(file.readlines()[0]))
        print(i)

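In addition, if I start opening files from the middle of the file list, opening is not as fast as it is when starting from the beginning. It seems that files later in the list are always slow to open.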
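To put numbers on the slowdown rather than judging it from the print rate, the opens can be timed in batches. The following is a rough sketch; the helper names `time_opens`, `_now_ms`, and `_elapsed_ms` and the batch size of 100 are assumptions chosen for illustration. It uses `time.ticks_ms()` and `time.ticks_diff()` when they exist (as on MicroPython) and falls back to `time.time()` elsewhere.

```python
import time


def _now_ms():
    # MicroPython provides time.ticks_ms(); fall back to time.time() elsewhere.
    if hasattr(time, "ticks_ms"):
        return time.ticks_ms()
    return int(time.time() * 1000)


def _elapsed_ms(start, end):
    # ticks_diff handles counter wrap-around on MicroPython.
    if hasattr(time, "ticks_diff"):
        return time.ticks_diff(end, start)
    return end - start


def time_opens(base_dir, names, batch=100):
    """Return a list of (first_index, elapsed_ms), one entry per batch of open() calls."""
    results = []
    for first in range(0, len(names), batch):
        start = _now_ms()
        for name in names[first:first + batch]:
            # Open and immediately close; only the open cost is of interest.
            with open("{}/{}".format(base_dir, name), "r"):
                pass
        results.append((first, _elapsed_ms(start, _now_ms())))
    return results
```

With the script above, this could be called as `time_opens(base_dir, filelist)` and the resulting pairs printed; if the problem is present, the elapsed time per batch should grow as the first index increases.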

Hardware board


Yahboom K230

Software version


CanMV_K230_YAHBOOM_micropython_v1.5-legacy-1-g7843f7c_nncase_v2.9.0

Hello, we will review the storage driver.

2 Answers

Hello, this issue has now been fixed. Download tomorrow's daily build. After flashing the firmware, either use a tool to fully format the SD card or run mkfs sd02 in msh, then reboot.

import os
import time

BASE_DIR = "/sdcard"   # change to "/data" to compare
FILE_PATH = BASE_DIR + "/bench.bin"

TOTAL_SIZE = 8 * 1024 * 1024   # 8MB

# sweep: 1KiB → 1MiB
BLOCK_SIZES = [
    1024, 2048, 4096,
    8*1024, 16*1024, 32*1024,
    64*1024, 128*1024, 256*1024, 512*1024,
    1024*1024
]


def remove_file(path):
    try:
        os.remove(path)
    except OSError:
        pass  # file does not exist yet; nothing to remove


def write_test(path, total_size, block_size):
    buf = b'\x55' * block_size
    written = 0

    start = time.ticks_ms()

    with open(path, "wb") as f:
        while written < total_size:
            f.write(buf)
            written += block_size

    end = time.ticks_ms()
    elapsed = time.ticks_diff(end, start) / 1000

    speed = (total_size / (1024 * 1024)) / elapsed
    return speed


def read_test(path, block_size):
    total = 0

    start = time.ticks_ms()

    with open(path, "rb") as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            total += len(data)

    end = time.ticks_ms()
    elapsed = time.ticks_diff(end, start) / 1000

    speed = (total / (1024 * 1024)) / elapsed
    return speed


def run_sweep():
    print("Block Size Sweep Benchmark")
    print("Path:", FILE_PATH)
    print("Total size:", TOTAL_SIZE // (1024 * 1024), "MB\n")

    print("{:>8} | {:>10} | {:>10}".format("Block", "Write MB/s", "Read MB/s"))
    print("-" * 36)

    for bs in BLOCK_SIZES:
        remove_file(FILE_PATH)

        w = write_test(FILE_PATH, TOTAL_SIZE, bs)
        r = read_test(FILE_PATH, bs)

        if bs < 1024:
            label = "{}B".format(bs)
        else:
            label = "{}KB".format(bs // 1024)

        print("{:>8} | {:>10.2f} | {:>10.2f}".format(label, w, r))


run_sweep()

# Block Size Sweep Benchmark
# Path: /sdcard/bench.bin
# Total size: 8 MB
# 
#    Block | Write MB/s |  Read MB/s
# ------------------------------------
#      1KB |       0.41 |       2.23
#      2KB |       0.87 |       3.75
#      4KB |       1.82 |       6.18
#      8KB |       3.24 |       9.35
#     16KB |       8.09 |      11.70
#     32KB |      10.11 |      14.06
#     64KB |      10.14 |      14.04
#    128KB |      10.13 |      15.16
#    256KB |      10.15 |      15.37
#    512KB |       9.94 |      14.92
#   1024KB |      10.17 |      15.10