Image Colorization Algorithm based on Dense Neural Network
School of Data Science and Technology, North University of China, Taiyuan, 030051, China
Corresponding authors:
Accepted: 2018-12-24 Online: 2019-01-01
About authors
Na Zhang received her undergraduate degree from the School of Information and Statistics, Guangxi University of Finance and Economics, Nanning, Guangxi, P.R. China, in 2015. She is currently pursuing a master's degree at NUC; her areas of interest are digital image processing, medical image processing, and computer vision.
Pinle Qin received the PhD degree in computer application technology from Dalian University of Technology (DLUT), Dalian, Liaoning, P.R. China, in 2008. He is currently an associate professor with the School of Data Science and Technology, North University of China (NUC). His current research interests include computer vision, medical image processing and deep learning.
Jianchao Zeng received the PhD degree from Xi’an Jiaotong University, Xi’an, Shanxi, P.R. China, in 1990. He is currently the vice president of North University of China (NUC). His current research interests include computer vision, medical image processing and deep learning.E-mail: zjc@nuc.edu.cn.
Yulong Song is pursuing a master's degree at NUC; his research areas are digital image processing and computer vision.
In most scenes, color images carry richer information than grayscale images. This paper presents a pseudo-coloring method for grayscale images that constructs and trains an end-to-end deep learning model based on a dense neural network, aiming to extract several kinds of information and features (such as classification information and detail features). Feeding a grayscale picture into the trained network generates a full and vivid color picture. By continually training the entire network on a wide variety of data sets, a highly adaptable, high-performance pseudo-coloring network is obtained. Experiments show that the proposed method makes higher use of image features and achieves a satisfactory coloring effect. Compared with current advanced pseudo-coloring methods, it also achieves remarkable improvements and, to a certain extent, alleviates common problems of the coloring process such as color overflow, loss of details, and low contrast.
Keywords:
1. Introduction
Color information is an important part of an image; combined with the semantic and surface texture information of the scene, it can convey a rich spatial hierarchy. Research shows that the human eye is highly sensitive to color intensity and its changes, while its sensitivity to changes in grayscale value is far lower. Psychologically, color images give observers a more pleasant and enjoyable impression, which helps them understand the content of images and obtain more comprehensive and richer information from them, enhancing the practical value of the images.
Grayscale image coloring (i.e., pseudo-color processing) is a technology developed to meet the above needs: it assigns colors to grayscale values [1] according to a specified rule to restore, enhance, or change the color information of an image. Currently, there are mainly two kinds of image coloring methods: traditional methods that rely on user interaction or reference images [1-10], and data-driven methods based on deep learning.
The development of deep learning methods and the advent of high-performance GPUs have opened new directions for data-driven image colorization. This kind of method uses neural networks: by building different network architectures, it extracts and analyses the content and features of the image, looks for the mapping relation between grayscale and color images, trains a corresponding model, and thereby realizes image colorization. Cheng et al. [11] used large-scale data modelling, post-processing based on joint bilateral filtering, and adaptive image clustering to integrate the global information of the image. Deshpande et al. [12] used the LEARCH framework to train a quadratic objective function over chromaticity maps and achieved image colorization by minimizing that objective. The structure of such networks is relatively simple, and the colorization effect is limited. Zhang et al. [13] proposed extracting image features with the VGG convolutional neural network [14] and predicting a color histogram for each pixel to color the image. Later, they put forward a new idea that extracts information with a U-Net [15] combined with user interaction [16]. Iizuka et al. [17] constructed a dual-stream network that extracts the global classification information and the local feature information of the image simultaneously and fuses the two to predict pixel colors. These three methods improved greatly on earlier work; however, because of the down-sampling and up-sampling they apply during image processing, a certain degree of information is lost. Qin et al. [18] used a residual neural network [19] to extract detail features, combined with the guidance of classification information. Their method reduced the information loss to some extent, but problems such as incomplete coloring of details and color overflow remain. In summary, the existing grayscale image colorization techniques mainly have the following problems:
This paper proposes a new method based on DenseNets [20] to solve the problems above. The method builds on the idea of skip-connections and uses information at a high rate; the loss of low-level semantic information and object contour information is also small. We use these characteristics to extract the texture and detail features of the image, and at the same time obtain its classification information through a VGG sub-network. The network combines the texture details and classification information for feature re-extraction and predicts colors from the integrated features to produce the output. We compare the output image with the original color image and calculate the mean squared error; after several rounds of optimization training, we obtain a final colorization model that converts grayscale images into color images. The network does not change the size of the feature maps while extracting detail information. Feature information that is gradually discarded or lost in traditional networks is reused through the dense connections, which effectively reduces the gradient-vanishing problem and enhances the transmission and utilization of features. As a result, the network uses the information of the original image at a higher rate, the resulting colored images are better, and their details are more complete and abundant.
2. Related Work
Figure 1.
Figure 2.
In a convolutional neural network, each layer l corresponds to a nonlinear mapping ${{H}_{l}}\left( \cdot \right)$. Each mapping usually comprises Batch Normalization, ReLU activation, pooling, and convolution. When an image ${{X}_{0}}$ enters the network, the output of the lth layer is ${{X}_{l}}$. A traditional feed-forward network connects the output of the (l-1)th layer directly to the lth layer only, that is, ${{X}_{l}}={{H}_{l}}\left( {{X}_{l-1}} \right)$. ResNet adds a skip-connection, as shown in Equation (1):

${{X}_{l}}={{H}_{l}}\left( {{X}_{l-1}} \right)+{{X}_{l-1}}$ (1)
In DenseNets, every convolutional layer is connected directly to all subsequent layers, i.e., the lth layer receives the feature maps of all preceding convolutional layers as its input, as shown in Equation (2):

${{X}_{l}}={{H}_{l}}\left( \left[ {{X}_{0}},{{X}_{1}},\cdots ,{{X}_{l-1}} \right] \right)$ (2)
where $\left[ {{X}_{0}},{{X}_{1}},\cdots ,{{X}_{l-1}} \right]$ denotes the concatenation of the feature maps produced by layers $0,1,\cdots ,l-1$.
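To make Equation (2) concrete, the following minimal sketch builds a dense block with Keras layers; the BN-ReLU-Conv ordering of ${{H}_{l}}\left( \cdot \right)$ follows the description above, and the growth rate of 12 matches Section 3.1.1 (the function name and the TensorFlow 2 API are illustrative choices, not the paper's released code).

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers, growth_rate=12):
    # Each layer H_l sees the concatenation of the block input and all
    # earlier layer outputs (Equation (2)) and adds growth_rate new maps.
    features = [x]
    for _ in range(num_layers):
        h = layers.Concatenate()(features) if len(features) > 1 else x
        h = layers.BatchNormalization()(h)
        h = layers.ReLU()(h)
        h = layers.Conv2D(growth_rate, 3, padding="same")(h)
        features.append(h)
    return layers.Concatenate()(features)
```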
3. The Image Colorization Algorithm Based on Dense Neural Network
3.1. The Network Structures
Existing deep-learning-based grayscale image colorization networks mainly extract the detailed texture features of the image with convolutional neural networks, and the coloring effect is acceptable. However, without a proper way to learn the global context of the image correctly (such as whether the scene is indoor or outdoor), the network may make obvious errors. Iizuka et al. and Qin et al. incorporated the category information of the images into the network and used this information to co-train the model, so that it plays a guiding role for the entire colorization network.
Drawing on the advantages of the methods of Iizuka et al. and Qin et al., this paper constructs a dual-stream structure based on the dense neural network that learns the classification information and the detailed texture information of the image at the same time. We use the CIE Lab color space: when an image enters the network, the network learns and predicts only the color information of channels a and b, and then combines it with the L channel taken from the grayscale image to achieve the coloring. The network structure is shown in Figure 3.
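The recombination of the predicted chrominance with the input luminance can be sketched as follows, assuming the standard CIE Lab value ranges and using scikit-image for the color-space conversion (the helper name is ours, not from the paper):

```python
import numpy as np
from skimage import color

def assemble_lab(gray_l, pred_ab):
    # gray_l: H x W luminance (L channel, roughly in [0, 100]);
    # pred_ab: H x W x 2 chrominance predicted by the network.
    lab = np.concatenate([gray_l[..., np.newaxis], pred_ab], axis=-1)
    return color.lab2rgb(lab)  # H x W x 3 RGB image in [0, 1]
```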
Figure 3.
The whole colorization network is composed of three parts: the feature extraction part, the classification guidance part, and the fusion and output part, described in Sections 3.1.1 to 3.1.3.
Figure 4.
Images with category labels will be converted from the RGB color space to the CIE Lab color space before entering the network.
3.1.1. Feature Extraction
An image of H×W×1 enters the feature extraction section, which is based on the dense neural network. It passes through one convolution layer before entering four dense blocks one after another. The convolution layers in each dense block are densely connected to all subsequent convolution layers, and every layer in each block outputs 12 feature maps. In view of the denseness of DenseNets, we place a 1×1 convolution before each 3×3 convolution (as shown in the dense-block part of Table 1(a)). This processing decreases the number of input feature maps, reduces the dimensionality, cuts back the required computation to a large extent, and blends the features of the channels.
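A single layer of such a block might look like the sketch below; the paper states only that a 1×1 convolution precedes each 3×3 convolution, so the 1×1 filter count of 4×growth_rate is an assumption borrowed from the standard DenseNet-B bottleneck:

```python
from tensorflow.keras import layers

def bottleneck_layer(x, growth_rate=12):
    # 1x1 convolution compresses the concatenated input and blends
    # channels; the 3x3 convolution then emits growth_rate new maps.
    h = layers.BatchNormalization()(x)
    h = layers.ReLU()(h)
    h = layers.Conv2D(4 * growth_rate, 1, padding="same")(h)  # assumed width
    h = layers.BatchNormalization()(h)
    h = layers.ReLU()(h)
    return layers.Conv2D(growth_rate, 3, padding="same")(h)
```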
Between every two dense blocks, we add a 1×1 convolution (the transition layer). In this way, the number of feature maps output by the previous dense block can be reduced (in this paper, to half), which effectively keeps the network from becoming too large and reduces the computational burden of the next dense block.
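A corresponding transition layer can be sketched as follows; note that, unlike the standard DenseNet transition, no pooling is applied, since the network deliberately keeps the spatial size of the feature maps unchanged:

```python
from tensorflow.keras import layers

def transition_layer(x):
    # 1x1 convolution halving the channel count between dense blocks;
    # no pooling, so the spatial resolution stays H x W throughout.
    channels = x.shape[-1] // 2
    h = layers.BatchNormalization()(x)
    h = layers.ReLU()(h)
    return layers.Conv2D(channels, 1, padding="same")(h)
```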
After the image passes through the above network, a large number of features and much texture information are extracted. Because the convolutions in the dense blocks are connected to all preceding layers, low-level information is reused, which effectively reduces information loss and alleviates the gradient-vanishing problem.
3.1.2. Classification Guidance
When the image enters the classification guidance network, the network extracts the classification information of the image. The fully connected layer fc1 reshapes the extracted features into a 1×4096 feature vector, which is then integrated via fc2 and fc3 into a 1×64 feature vector. This helps the entire network determine the category of the image content.
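The guidance head can be sketched as below; the text specifies the 1×4096 and 1×64 vector sizes, while the width of fc2 and the placement of the softmax classifier used for the classification loss are assumptions:

```python
from tensorflow.keras import layers

def classification_head(features, num_classes):
    # fc1 reshapes the extracted features into a 4096-d vector;
    # fc2 and fc3 integrate it down to the 64-d guidance vector.
    h = layers.Flatten()(features)
    fc1 = layers.Dense(4096, activation="relu")(h)
    fc2 = layers.Dense(512, activation="relu")(fc1)   # width assumed
    fc3 = layers.Dense(64, activation="relu")(fc2)    # guidance vector
    y_out = layers.Dense(num_classes, activation="softmax")(fc3)
    return fc3, y_out                                 # guidance, prediction
```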
3.1.3. Fusion and Output
After the feature extraction network and the classification guidance network both complete their information extraction, the two kinds of information are reconstructed into feature maps of the same dimensions and fused in the fusion layer. After feature re-extraction by Dense-block4, the network finally generates an H×W×2 output through one more convolution operation.
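One plausible reading of the fusion step is to broadcast the 1×64 guidance vector over every spatial position of the detail feature maps, concatenate, and mix with a 1×1 convolution, as sketched below (the broadcast-and-concatenate scheme follows the fusion layer of Iizuka et al.; the mixing width is an assumption):

```python
from tensorflow.keras import layers

def fusion_layer(detail_maps, guidance_vec):
    # Tile the 64-d classification vector across all H x W positions,
    # concatenate with the detail maps, and fuse with a 1x1 convolution.
    h, w = detail_maps.shape[1], detail_maps.shape[2]
    tiled = layers.RepeatVector(h * w)(guidance_vec)     # (B, h*w, 64)
    tiled = layers.Reshape((h, w, 64))(tiled)            # (B, h, w, 64)
    fused = layers.Concatenate()([detail_maps, tiled])
    return layers.Conv2D(128, 1, activation="relu")(fused)  # width assumed
```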
3.2. Loss
As described in Section 3.1, the loss of the network serves as an important reference for adjusting the weights. The loss consists of two parts: the feature extraction loss (L1) from the feature extraction network and the classification loss (L2) from the classification guidance network. The two losses are fed back to the network independently and do not interact with each other.
During training, the network reads n images per batch. After obtaining the output, we compare the color prediction with the original image and use the Mean Squared Error (MSE) to measure the disparity between the network output and the true value, as shown in Equations (3) and (4):

$MSE=\frac{1}{w\times h}\sum\nolimits_{p=1}^{w\times h}{{{\left( {{Y}_{p}}-{{X}_{p}} \right)}^{2}}}$ (3)

${{L}_{1}}=\frac{1}{n}\sum\nolimits_{i=1}^{n}{MS{{E}_{i}}}$ (4)

where w and h are the width and height of the sample, ${{Y}_{p}}$ is the ab-channel color of the original image at pixel p, ${{X}_{p}}$ is the value predicted by the network, and n is the number of images contained in a training batch.
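In code, L1 reduces to a mean of squared ab-channel differences over the pixels and the batch; a minimal sketch:

```python
import tensorflow as tf

def feature_loss(y_true_ab, y_pred_ab):
    # Equations (3) and (4): per-pixel squared error on the ab channels,
    # averaged over the w x h pixels and the n images of the batch.
    return tf.reduce_mean(tf.square(y_true_ab - y_pred_ab))
```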
For the classification guidance part, let the classification information of the input image be the guidance label ${{y}^{\text{label}}}$, while the predicted classification of the network is ${{y}^{\text{out}}}$. Cross-entropy is then used to measure the disparity between the classification prediction of the network and the real classification, as shown in Equation (5):

${{L}_{2}}=-\sum\nolimits_{i}{y_{i}^{\text{label}}\log \left( y_{i}^{\text{out}} \right)}$ (5)
When calculating the logarithm of $y_{i}^{\text{out}}$, a value of 0 would make $\log (y_{i}^{\text{out}})$ diverge and the loss become infinite. So, whenever the value is smaller than $1e-10$, it is set equal to $1e-10$.
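The clipping described above maps directly to code; a minimal sketch of the classification loss:

```python
import tensorflow as tf

def classification_loss(y_label, y_out):
    # Equation (5): cross-entropy between the one-hot guidance label
    # and the predicted distribution, clipped so log(0) never occurs.
    y_out = tf.clip_by_value(y_out, 1e-10, 1.0)
    return -tf.reduce_sum(y_label * tf.math.log(y_out), axis=-1)
```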
3.3. Result Evaluation: Information Loss
As stated in Section 1, the main purpose of coloring grayscale images is to obtain more information from the colored results than from the grayscale originals. Whether the result is clear and contains a sufficient amount of information can therefore be regarded as an important index for measuring the merits of a colorization algorithm.
This paper adopts an evaluation criterion based on information entropy, defined in Equation (6):

$InEn=-\sum\nolimits_{i=0}^{c}{P\left( i \right)\log P\left( i \right)}$ (6)

where $InEn$ is the information entropy value of the picture, $P\left( i \right)$ represents the frequency with which the color value i appears in the whole image, and c represents the range of color values in the Lab color space. We compare the information entropy of the colorization result with that of the original image and calculate the discrepancy between them: a smaller discrepancy means less color information is lost, that is, the colorization network is more effective.
The proposed network estimates only the values of the ab channels of the Lab color space; it does not recompute or process the value of channel L (which carries the grayscale information of the image). Therefore, to improve efficiency, we consider only the information contained in the ab channels when evaluating the colorization effect, and compare the information entropy of the ab channels of the output image against that of the original image, as shown in Equation (7):

$Info\_loss=\left| InE{{n}_{C}}-InE{{n}_{O}} \right|$ (7)
where $Info\_loss$, $InE{{n}_{C}}$, and $InE{{n}_{O}}$ denote the degree of color information loss, the information entropy of the original image, and the information entropy of the output image, respectively. It can be seen from this definition that the smaller the Info_loss value, the better the colorization result.
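A small sketch of the metric follows; the 256-bin histogram quantization and the base of the logarithm are assumptions, since the text does not fix them:

```python
import numpy as np

def channel_entropy(channel, bins=256):
    # Equation (6): Shannon entropy of one color channel, with P(i)
    # estimated as the frequency of each quantized value i.
    hist, _ = np.histogram(channel, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # empty bins contribute 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

def info_loss(original_ab, output_ab):
    # Equation (7): discrepancy between the ab-channel entropies of the
    # original image and the colorized output (smaller is better).
    ent = lambda ab: channel_entropy(ab[..., 0]) + channel_entropy(ab[..., 1])
    return abs(ent(original_ab) - ent(output_ab))
```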
4. Experiments and Analysis
4.1. Experimental Data Set and Environment
As a supervised colorization method, the proposed network needs a large number of color images with classification labels as its training data. We therefore chose the MIT Places Database (205 scene categories, more than 2.5 million images) [26] and ImageNet (1000 categories, more than 1.2 million images) [27] to train the network. HDF5 is used to process the data set and generate a single data file of the ".h5" type, so that large numbers of individual pictures no longer have to be read in sequence, which simplifies operation and maintenance. The proposed colorization network requires a great deal of matrix computation; to improve training efficiency, we trained on a GPU (Graphics Processing Unit), an NVIDIA Tesla M40. The method is implemented in the Python programming environment, with TensorFlow [28] chosen to build the network.
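Packing the training set into one ".h5" file might look like the sketch below; the dataset names, image size, and dtypes are illustrative assumptions, not the paper's actual layout:

```python
import h5py

num_images, size = 10000, 224  # illustrative values
with h5py.File("train.h5", "w") as f:
    f.create_dataset("gray",   (num_images, size, size, 1), dtype="f4")
    f.create_dataset("ab",     (num_images, size, size, 2), dtype="f4")
    f.create_dataset("labels", (num_images,),               dtype="i4")
    # the datasets are then filled image by image, e.g. f["gray"][i] = ...
```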
4.2. Coloring Results and Comparison with Advanced Methods
4.2.1. Coloring Effect Comparison with Reference to the Original Images
To verify the effectiveness of the proposed algorithm, we chose some representative images and compared our results with existing excellent algorithms, such as those of Zhang et al., Iizuka et al., and Qin et al., as shown in Figure 5. We compared the coloring effect in the following aspects: color overflow, contrast, and the richness of details.
Figure 5.
4.2.2. Coloring Effect Comparison on Old Photos and Grayscale Images
To verify the universality of our algorithm, we selected some old photos and grayscale images and compared the coloring effects, as shown in Figure 6. From this comparison we can see that our method exhibits less color overflow, better contrast, and more abundant details than the algorithms of Zhang et al., Iizuka et al., and Qin et al. (for example, in group (1) our method gave the tree trunk its proper brown color).
Figure 6.
4.2.3. Algorithm Comparison (with Qin et al.)
As shown above, compared with prior advanced deep-learning-based colorization methods, the residual-network-based method proposed by Qin et al. reduced the information loss to a certain degree and improved the coloring effect. We compared that method with ours by calculating the Info_loss values of both on 5000 randomly chosen pictures and taking the averages. The mean of Qin et al.'s method is 2.31481, and the mean of the method proposed in this paper is 1.92628. Figure 7 shows the comparison: the blue line refers to our method and the red line to the method of Qin et al.; the horizontal axis indicates the Info_loss value and the vertical axis the frequency of images.
Figure 7.
5. Conclusion and Prospect
5.1. Conclusion
In this paper, we proposed a grayscale image colorization algorithm based on a dense neural network. The algorithm comprises a dense-block sub-network and a VGG sub-network, which extract detailed texture information and classification information, respectively. The two kinds of information are fused to generate the network output as the prediction of the color picture. The experimental results show that the proposed method surpasses existing excellent grayscale image colorization algorithms in the richness of detail information and in contrast, and that color overflow is also significantly reduced; applied to old photos and grayscale images, the method likewise performs well.
5.2. Shortcomings and Future Research Directions
Due to the denseness of the network, the performance requirements on the running equipment are high, and training the network takes a long time. At the same time, the coloring effect of our method may not be ideal for images of categories that have not been learned, since the data set does not cover all image categories. In the next phase of research, we plan to enhance the universality and utility of our approach by optimizing the whole network architecture and training on more types of images.
References

[1] A. Levin, D. Lischinski, and Y. Weiss, "Colorization using Optimization," ACM Transactions on Graphics, Vol. 23, No. 3, pp. 689-694, 2004.
[2] "Medical Image Colorization."
[3] "Medical Image Colorization using Optimization Technique."
[4] "Pseudo-Colorization of Medical Images Based on Two-Stage Transfer Model."
[5] T. Welsh, M. Ashikhmin, and K. Mueller, "Transferring Color to Grayscale Images," ACM Transactions on Graphics, Vol. 21, No. 3, pp. 277-280, 2002.
[6] X. Liu, L. Wan, Y. Qu, T.-T. Wong, S. Lin, C.-S. Leung, and P.-A. Heng, "Intrinsic Colorization," ACM Transactions on Graphics, Vol. 27, No. 5, 2008.
[7] "AutoStyle: Automatic Style Transfer from Image Collections to Users' Images," Computer Graphics Forum, Vol. 33, No. 4, 2014. DOI: 10.1111/cgf.12409.
[8] Y. Morimoto, Y. Taguchi, and T. Naemura, "Automatic Colorization of Grayscale Images using Multiple Images on the Web," in ACM SIGGRAPH Posters, 2009. DOI: 10.1145/1599301.1599333.
[9] R. Irony, D. Cohen-Or, and D. Lischinski, "Colorization by Example," in Proceedings of the Eurographics Symposium on Rendering, pp. 201-210, 2005.
[11] Z. Cheng, Q. Yang, and B. Sheng, "Deep Colorization," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 415-423, 2015.
[12] A. Deshpande, J. Rock, and D. Forsyth, "Learning Large-Scale Automatic Image Colorization," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 567-575, 2015. DOI: 10.1109/ICCV.2015.72.
[13] R. Zhang, P. Isola, and A. A. Efros, "Colorful Image Colorization," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 649-666, 2016.
[14] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv:1409.1556, September 2014.
[15] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234-241, 2015.
[16] R. Zhang, J.-Y. Zhu, P. Isola, X. Geng, A. S. Lin, T. Yu, and A. A. Efros, "Real-Time User-Guided Image Colorization with Learned Deep Priors," ACM Transactions on Graphics, Vol. 36, No. 4, 2017. DOI: 10.1145/3072959.3073703.
[17] S. Iizuka, E. Simo-Serra, and H. Ishikawa, "Let There Be Color!: Joint End-to-End Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification," ACM Transactions on Graphics, Vol. 35, No. 4, 2016. DOI: 10.1145/2897824.2925974.
[18] P. Qin et al., "Research on Image Colorization Algorithm Based on Residual Neural Network." DOI: 10.1007/978-981-10-7299-4_51.
[19] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016. DOI: 10.1109/CVPR.2016.90.
[20] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700-4708, 2017.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems (NIPS), pp. 1097-1105, 2012.
[22] C. Szegedy et al., "Going Deeper with Convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
[23] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, "Deep Networks with Stochastic Depth," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 646-661, 2016.
[24] G. Larsson, M. Maire, and G. Shakhnarovich, "FractalNet: Ultra-Deep Neural Networks without Residuals," arXiv:1605.07648, May 2016.
[26] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, "Learning Deep Features for Scene Recognition using Places Database," in Advances in Neural Information Processing Systems (NIPS), pp. 487-495, 2014.
[27] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-Scale Hierarchical Image Database," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, 2009.
[28] M. Abadi et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," arXiv:1603.04467, March 2016.