Academic emotions can strongly affect learning outcomes. Emotions are normally expressed externally through students' facial expressions, speech and behaviour. This paper focuses on the automatic inference of academic emotions from facial expressions in online learning. Because training samples for such inference algorithms are scarce, a spontaneous facial expression database is established. It covers the facial expressions of five common academic emotions and consists of two subsets: a video clip database and an image database. In total, the database contains 1,274 video clips and 30,184 images from 82 students. The samples are labelled by both the participants and external coders. An extensive analysis is carried out on the image database using a convolutional neural network (CNN)-based algorithm to infer the participants' self-annotated labels. Several data augmentation algorithms are applied to improve the algorithm's performance. Additionally, an adaptive data augmentation algorithm based on a spatial transformer network is introduced, which can remove some confounding factors in the original images. A comparison of evaluation indicators before and after its adoption shows that this algorithm improves the inference performance. Such a database is expected to accelerate the application of affective computing in education.