Dataset loading (Python, Keras, PyTorch)

Processing a dataset from scratch

Image data with corresponding CSV data (using the IDRiD dataset as an example)

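import os
import cv2
import pandas as pd
from sklearn.utils import shuffle

train_data = []
train_labels = []

def get_images(image_dir, labels_dir):
    # Read the label file once rather than once per image
    labels = pd.read_csv(labels_dir)
    for image_file in os.listdir(image_dir):
        image = cv2.imread(os.path.join(image_dir, image_file))
        image = cv2.resize(image, (227, 227))
        train_data.append(image)
        # Select the grade of the row whose image name matches this file
        label = list(labels[labels['Image_name'] + '.jpg' == image_file]['Retinopathy grade'])
        train_labels.append(label)
    return shuffle(train_data, train_labels, random_state=7)

# Supply the image directory and the path to the label csv
train_data, train_labels = get_images('', '')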
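The lookup inside the loop filters the whole label table for every image. A minimal sketch of an alternative, assuming the same two column names as above, builds a file-name-to-grade dictionary once. The helper `build_label_map` and the two sample rows are made up for illustration; they are not part of the IDRiD dataset or of the original code.

```python
import csv
import io

def build_label_map(csv_file, name_col="Image_name",
                    grade_col="Retinopathy grade", ext=".jpg"):
    """Map each image file name (with extension) to its integer grade."""
    reader = csv.DictReader(csv_file)
    return {row[name_col] + ext: int(row[grade_col]) for row in reader}

# In-memory stand-in for the label file (made-up rows, not real IDRiD data)
sample_csv = io.StringIO(
    "Image_name,Retinopathy grade\n"
    "sample_001,3\n"
    "sample_002,0\n"
)
label_map = build_label_map(sample_csv)
print(label_map["sample_001.jpg"])
```

With such a dictionary, the label for each file in the loop becomes a single lookup, and a file with no entry raises a `KeyError` instead of silently producing an empty list.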

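Image data with corresponding TXT data

Change the extension of the txt file to csv, then process it the same way as the image-plus-CSV case above.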

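TXT data only

Datasets that provide only txt data are relatively rare. If you encounter one, change the extension of the txt file to csv and then process it the way you would any csv file.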
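Renaming the file works only when its content is already comma-separated. As a hedged alternative, `pd.read_csv` reads delimited text regardless of the file extension when the separator is passed through `sep`. The sketch below assumes a whitespace-separated file with hypothetical columns `name` and `label`, simulated in memory rather than read from disk.

```python
import io
import pandas as pd

# Stand-in for the content of a whitespace-separated txt file (made-up rows)
txt_content = "name label\na.jpg 1\nb.jpg 0\n"

# sep=r"\s+" splits on any run of whitespace; a real path such as
# 'labels.txt' (hypothetical) could be passed instead of the StringIO object
df = pd.read_csv(io.StringIO(txt_content), sep=r"\s+")
print(df)
```

The result is an ordinary DataFrame, so everything that applies to csv data applies here as well.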

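Image data only (using the Intel classification competition as an example)

First, how to read, label, and use the labeled training set. You only need to supply the required path.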
import os
import cv2
import numpy as np
from sklearn.utils import shuffle

def get_images(directory):
    Images = []
    Labels = []
    label = 0
    for labels in os.listdir(directory):  # you should give the dir of the train data
        if labels == '':  # you can change the name of labels and the number of labels
            label = 2
        elif labels == '':
            label = 4
        elif labels == '':
            label = 0
        elif labels == '':
            label = 1
        elif labels == '':
            label = 5
        elif labels == '':
            label = 3
        for image_file in os.listdir(os.path.join(directory, labels)):
            # read your image and change the size of your image
            image = cv2.imread(os.path.join(directory, labels, image_file))
            image = cv2.resize(image, (150, 150))
            Images.append(image)
            Labels.append(label)
    return shuffle(Images, Labels, random_state=817328462)

Images, Labels = get_images('')
Images = np.array(Images)
Labels = np.array(Labels)
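The if/elif chain above can also be written as a dictionary from folder name to label. The following is a minimal sketch under that assumption: `list_labeled_files`, the folder names `class_a` and `class_b`, and the temporary directory are hypothetical and only serve to make the example runnable without the competition data.

```python
import os
import tempfile

def list_labeled_files(directory, class_to_label):
    """Return (file_path, label) pairs for every file in each known class folder."""
    pairs = []
    for class_name in sorted(os.listdir(directory)):
        if class_name not in class_to_label:
            continue  # skip folders that have no assigned label
        class_dir = os.path.join(directory, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            pairs.append((os.path.join(class_dir, file_name),
                          class_to_label[class_name]))
    return pairs

# Build a throwaway folder tree with one empty file per class
with tempfile.TemporaryDirectory() as root:
    for class_name, file_name in [("class_a", "1.jpg"), ("class_b", "2.jpg")]:
        os.makedirs(os.path.join(root, class_name), exist_ok=True)
        open(os.path.join(root, class_name, file_name), "w").close()
    pairs = list_labeled_files(root, {"class_a": 0, "class_b": 1})
    labels = [label for _, label in pairs]

print(labels)
```

Each returned path can then be read with `cv2.imread` and resized as in the function above; keeping the mapping in one dictionary makes it easier to change the class names or the number of classes.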
Next, how to read the prediction set:
import os
import cv2
import numpy as np
from sklearn.utils import shuffle

def Get_images(directory):  # function for image detection
    Images = []
    path = os.path.join(directory)
    for img in os.listdir(path):
        img_array = cv2.imread(os.path.join(path, img))
        img_array = cv2.resize(img_array, (150, 150))
        Images.append(img_array)
    return shuffle(Images, random_state=817328462)

pred_images = Get_images('')
pred_images = np.array(pred_images)
pred_images.shape
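To show what the shape check reports, the sketch below stacks synthetic arrays in place of real images. It assumes four colour images already resized to 150x150; the zero-filled arrays are placeholders, not data from the competition.

```python
import numpy as np

# Synthetic stand-ins for images already resized to 150x150 (not real data)
images = [np.zeros((150, 150, 3), dtype=np.uint8) for _ in range(4)]

batch = np.array(images)
print(batch.shape)  # (4, 150, 150, 3)
print(batch.dtype)  # uint8
```

The axes are (number of images, height, width, channels). Images read with `cv2.imread` in its default mode have three channels in BGR order.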

CSV data only

Reading csv data is straightforward: use the command below, and the result can be handled as a DataFrame in all later processing.
pd.read_csv('')
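As a small illustration of the later processing, the sketch below reads a csv from an in-memory string and inspects it as a DataFrame. The column names `image` and `label` and the three rows are hypothetical.

```python
import io
import pandas as pd

# Made-up csv content standing in for a real file path
csv_text = "image,label\na.jpg,0\nb.jpg,2\nc.jpg,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# Count how many rows carry each label
counts = df["label"].value_counts().to_dict()

# Extract the label column as a NumPy array for model training
labels = df["label"].to_numpy()
print(df.shape, counts, labels)
```

Checking the label counts right after loading is a quick way to spot class imbalance before training.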
