Dataset vs. DataLoader

in PyTorch and fastai

dataset = [all data]
sampler = [subset of dataset indices] # usually used to split train, valid and test
batch = [subset of samples drawn via the sampler]
dataloader = [batch_1, batch_2, ... batch_n]
dataBunch = {train_dl: deviceDataLoader, valid_dl: deviceDataLoader, test_dl: deviceDataLoader}
deviceDataLoader = {dl: dataloader}

len(dataset) = "number of samples in the whole dataset"
len(sampler) = "number of indices the sampler draws from"
len(dataLoader) = "number of batches"
len(dataBunch) = "not defined; check the individual loaders, e.g. len(dataBunch.train_dl)"
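The chain above can be sketched directly with `torch.utils.data`. This is a minimal, self-contained example (a `TensorDataset` of random tensors stands in for `ImageFolder`, and the sizes are made up) showing what each `len()` reports:

```python
# Minimal sketch of the dataset -> sampler -> dataloader chain,
# using TensorDataset as a stand-in for torchvision.datasets.ImageFolder.
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

# 100 fake 3x8x8 "images" with integer labels.
images = torch.randn(100, 3, 8, 8)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# The sampler restricts the loader to a subset of indices (e.g. a train split).
train_sampler = SubsetRandomSampler(range(80))
train_dl = DataLoader(dataset, batch_size=16, sampler=train_sampler)

print(len(dataset))        # 100 -> samples in the whole dataset
print(len(train_sampler))  # 80  -> indices the sampler draws from
print(len(train_dl))       # 5   -> batches per epoch (80 / 16)
```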

dataset = torchvision.datasets.ImageFolder
sampler = torch.utils.data.sampler.SubsetRandomSampler
dataloader = torch.utils.data.DataLoader
databunch = fastai.DataBunch
deviceDataLoader = fastai.DeviceDataLoader

learner = {data: dataBunch, model: module, opt_func: 'adam', loss_func: l2Loss}
learner.fit(epochs=int, lr=float)
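Conceptually, `learner.fit(epochs, lr)` is a loop over the training dataloader. This is a hedged plain-PyTorch sketch of what fastai does under the hood (the tiny linear model and random data are made-up stand-ins, and fastai's real `fit` also handles device placement, callbacks, and validation):

```python
# Hedged sketch: roughly what learner.fit(epochs, lr) boils down to in plain PyTorch.
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(64, 4)
y = torch.randn(64, 1)
train_dl = DataLoader(TensorDataset(x, y), batch_size=16)

model = nn.Linear(4, 1)                                # module
opt = torch.optim.Adam(model.parameters(), lr=1e-2)    # opt_func + lr
loss_func = nn.MSELoss()                               # l2Loss

def fit(epochs, model, opt, loss_func, train_dl):
    for epoch in range(epochs):
        for xb, yb in train_dl:            # iterate batches from the dataloader
            loss = loss_func(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()

fit(2, model, opt, loss_func, train_dl)
```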

Because DataLoader defines `__iter__` (and the iterator it returns defines `__next__`), calling `next(iter(dataloader))` pulls out one batch at a time, starting from the beginning. Handy when debugging.

images, labels = next(iter(train_dl))               # plain PyTorch
images, labels = next(iter(dataBunch.train_dl.dl))  # fastai: train_dl is a DeviceDataLoader, .dl is the raw DataLoader
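A runnable version of the debugging trick above, with assumed toy data (the shapes are made up for illustration); note that `next(iter(dl))` is the idiom, since the `.next()` method was removed in Python 3:

```python
# Hedged sketch: grabbing a single batch for debugging with next(iter(...)).
import torch
from torch.utils.data import TensorDataset, DataLoader

ds = TensorDataset(torch.randn(32, 3, 8, 8), torch.randint(0, 10, (32,)))
dl = DataLoader(ds, batch_size=8)

images, labels = next(iter(dl))  # first batch only; the fresh iterator is then discarded
print(images.shape)  # torch.Size([8, 3, 8, 8])
print(labels.shape)  # torch.Size([8])
```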

REF

  • https://qiita.com/tomp/items/3bf6d040bbc89a171880