MODIFICATION OF NON-LOCAL MEAN ALGORITHM USING PARALLEL CALCULATION FOR IMAGE NOISE REDUCTION

Noise in a digital image is a random variation in the intensity of its pixel values. Common noise models include Gaussian noise, speckle noise, impulse noise, and Poisson noise. Image data must be denoised before they are processed. One noise reduction algorithm for the Gaussian noise model is Non-local Means. This algorithm performs its calculations sequentially for each pixel in the search block. Because an image contains many pixels and each search block covers a large area, noise reduction with the Non-local Means algorithm is very slow. This study proposes parallel calculation for the Non-local Means algorithm: the search block is divided into three parts, and the calculations for each part are performed simultaneously. The experimental results show that the Non-local Means algorithm with parallel calculation can reduce noise up to 30% faster when the noise standard deviation is above 30.


INTRODUCTION
Noise reduction should be performed before image data are analyzed [1]. Denoising is an image preprocessing step that reduces noise while preserving the texture of a digital image [2]. A suitable denoising method removes noise without degrading the image's texture [3]. Existing digital image denoising methods include the Median Filter, Non-local Means, and Edge Preserving methods [4]. The Median Filter method was applied by [5] to reduce impulsive noise of very high intensity. [6] proposed the Non-local Means algorithm, which is based on the concept of self-similarity. [7] proposed an Edge Preserving algorithm that reduces salt-and-pepper noise using parallel computing. The Non-local Means algorithm combines two essential attributes of digital image denoising: preserving texture and edges, and reducing noise [8]. However, the self-similarity concept makes the Non-local Means algorithm computationally intensive [9]: the large number of weight calculations, performed serially, results in a relatively long denoising process [10]. The algorithm uses self-similarity to calculate the weights used to reduce the noise in each pixel of a digital image [11]. These weights are calculated for all pixels within a search block of 21 x 21 pixels when the noise standard deviation is less than 30, and of 35 x 35 pixels when it is greater than 30 [12]. Calculating the weights serially makes the algorithm slow, because the weights for a whole search block must be computed for every pixel in the image [13]. For example, for a 64 x 64 pixel image with a noise standard deviation of 50, the search block is 35 x 35 pixels and the patch is 5 x 5 pixels, so denoising the image requires 125,440,000 iterations (64 x 64 x 35 x 35 x 5 x 5).
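The iteration count in the example above is the number of image pixels multiplied by the number of search block positions and the number of pixels per patch. The short Java sketch below reproduces it; the class and method names are hypothetical, color channels are not counted (matching the figure quoted above), and the 21 x 21 case is only a comparison that keeps the same 5 x 5 patch.

```java
/**
 * Hypothetical helper counting the inner-loop iterations of a serial
 * Non-local Means pass: every pixel visits every search block position,
 * and every visit compares two patches pixel by pixel.
 */
class NlmIterationCount {

    // width, height: image size; searchSide, patchSide: side lengths in pixels
    static long iterations(int width, int height, int searchSide, int patchSide) {
        long pixels = (long) width * height;
        long searchPositions = (long) searchSide * searchSide;
        long patchPixels = (long) patchSide * patchSide;
        return pixels * searchPositions * patchPixels;
    }

    public static void main(String[] args) {
        // 64 x 64 image, sigma = 50: 35 x 35 search block, 5 x 5 patch
        System.out.println(iterations(64, 64, 35, 5)); // prints 125440000
        // Comparison: a 21 x 21 search block with the same 5 x 5 patch
        System.out.println(iterations(64, 64, 21, 5)); // prints 45158400
    }
}
```

The ratio between the two counts, (35 x 35) / (21 x 21), is about 2.8, which indicates how much more work each pixel requires once the larger search block is used.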
In the original Non-local Means algorithm, the new value that replaces a noisy pixel value is computed serially over the search block [14]. Parallel computation can overcome this limitation. In this study, parallel computation is performed by dividing the search block into three parts, each of which independently calculates the weights used to reduce noise.
This research proposes a digital image denoising method that applies parallel computation to the Non-local Means algorithm. The serial iterations of the Non-local Means algorithm are divided into several parts that are calculated independently, which can shorten the time required for digital image denoising.

METHOD
The Non-local Means algorithm reduces the noise in a pixel of an image by calculating a weighted average of the pixel values within a search block [14,15]. The search block sizes used are shown in Table 1 and Table 2; Table 2 gives the patch and search block sizes for each noise standard deviation when the input is a grayscale image. The first step is to create a search block centered on the pixel to be denoised. Next, the similarity between two patches is calculated using the Euclidean distance: the first patch is centered on the pixel to be denoised, and the second patch is centered on each pixel in the search block in turn. Each similarity value is then used to weight the pixel in the search block at the center of the second patch. Figure 1 shows the two patches within a search block, where patch p is centered on the pixel to be denoised and patch q is centered, in turn, on each pixel in the search block. Referring to Figure 2, patch similarity is calculated with Equation (1), the squared Euclidean distance d^2(B(p,f), B(q,f)) between two patches B(p,f) and B(q,f) of size (2f+1) x (2f+1) pixels centered on pixels p and q:
$$d^2\big(B(p,f),B(q,f)\big)=\frac{1}{3(2f+1)^2}\sum_{i=1}^{3}\sum_{j\in B(0,f)}\big(u_i(p+j)-u_i(q+j)\big)^2 \qquad (1)$$
where u_i is the i-th color channel (red, green, or blue) and B(0,f) is the set of offsets within a patch of radius f.
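The patch distance of Equation (1) can be sketched in Java as follows. This is a minimal illustration, not the authors' code: it assumes the image is stored as a hypothetical img[channel][y][x] array, normalizes by the number of summed terms (channels x (2f+1)^2), and clamps coordinates at the image border, which is an assumption about border handling.

```java
/**
 * Sketch of Equation (1): normalized squared Euclidean distance between the
 * (2f+1) x (2f+1) patches centered on p = (px, py) and q = (qx, qy).
 * Assumptions: img[channel][y][x] storage (3 channels for RGB) and
 * border clamping for patch pixels that fall outside the image.
 */
class PatchDistance {

    static int clamp(int v, int max) {
        return Math.max(0, Math.min(v, max - 1));
    }

    static double distance(double[][][] img, int px, int py, int qx, int qy, int f) {
        int channels = img.length;
        int height = img[0].length;
        int width = img[0][0].length;
        double sum = 0.0;
        for (int c = 0; c < channels; c++) {
            for (int dy = -f; dy <= f; dy++) {
                for (int dx = -f; dx <= f; dx++) {
                    double a = img[c][clamp(py + dy, height)][clamp(px + dx, width)];
                    double b = img[c][clamp(qy + dy, height)][clamp(qx + dx, width)];
                    sum += (a - b) * (a - b);
                }
            }
        }
        // Divide by the number of summed terms: channels * (2f+1)^2
        int side = 2 * f + 1;
        return sum / (channels * side * side);
    }

    public static void main(String[] args) {
        // Gray 5 x 5 test image: all three channels hold the same values
        double[][][] img = new double[3][5][5];
        for (int c = 0; c < 3; c++)
            for (int y = 0; y < 5; y++)
                for (int x = 0; x < 5; x++)
                    img[c][y][x] = x * 10.0;
        System.out.println(distance(img, 2, 2, 2, 2, 1)); // identical patches: 0.0
        System.out.println(distance(img, 1, 2, 2, 2, 1)); // one column apart: 100.0
    }
}
```

Because every channel of the gray test image holds the same values, the result equals the single-channel distance, which is the grayscale case in which all three channels give the same similarity value.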
For grayscale images, the red, green, and blue channels hold identical values, so the similarity value is the same for all three channels. For color images, the channel values differ, so the similarity value of each channel depends on the values of that channel in both patches.
The proposed method modifies the Non-local Means algorithm using parallel computation. The search block is divided into three parts, and the similarity calculations for the three parts run as three independent processes. After all three processes have finished, the values from the three parts are added together and divided by three. The process is illustrated in Figure 2.
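The three-way split of the search block can be sketched in Java with a fixed thread pool. This is a minimal, grayscale-only illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, the search block is split into three horizontal bands of rows, search positions outside the image are skipped, and the weight uses the form exp(-max(d^2 - 2σ^2, 0) / h^2) from the IPOL reference description of Non-local Means. The text describes adding the values of the three parts and dividing by three; this sketch instead adds the three partial weighted sums and the three partial normalization coefficients before dividing, a swapped-in choice that reproduces the serial result up to floating-point rounding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch: denoise one pixel of a grayscale image (img[y][x]) by splitting its
 * search block into three row bands processed concurrently. Assumed weight:
 * exp(-max(d2 - 2*sigma^2, 0) / h^2).
 */
class ParallelSearchBlock {

    static int clamp(int v, int max) { return Math.max(0, Math.min(v, max - 1)); }

    // Mean squared difference between the patches centered on (px, py) and (qx, qy)
    static double dist2(double[][] img, int px, int py, int qx, int qy, int f) {
        int h = img.length, w = img[0].length, side = 2 * f + 1;
        double sum = 0.0;
        for (int dy = -f; dy <= f; dy++) {
            for (int dx = -f; dx <= f; dx++) {
                double d = img[clamp(py + dy, h)][clamp(px + dx, w)]
                         - img[clamp(qy + dy, h)][clamp(qx + dx, w)];
                sum += d * d;
            }
        }
        return sum / (side * side);
    }

    // Weighted pixel sum and weight sum over search rows [rowFrom, rowTo)
    static double[] partialSums(double[][] img, int px, int py, int r, int f,
                                double sigma, double hFilt, int rowFrom, int rowTo) {
        int h = img.length, w = img[0].length;
        double num = 0.0, den = 0.0;
        for (int qy = Math.max(rowFrom, 0); qy < Math.min(rowTo, h); qy++) {
            for (int qx = Math.max(px - r, 0); qx <= Math.min(px + r, w - 1); qx++) {
                double d2 = dist2(img, px, py, qx, qy, f);
                double wgt = Math.exp(-Math.max(d2 - 2 * sigma * sigma, 0.0) / (hFilt * hFilt));
                num += wgt * img[qy][qx];
                den += wgt;
            }
        }
        return new double[] {num, den};
    }

    // Serial reference: one pass over the whole (2r+1) x (2r+1) search block
    static double serial(double[][] img, int px, int py, int r, int f, double sigma, double hFilt) {
        double[] s = partialSums(img, px, py, r, f, sigma, hFilt, py - r, py + r + 1);
        return s[0] / s[1];
    }

    // Parallel version: three row bands submitted as independent tasks
    static double parallel(double[][] img, int px, int py, int r, int f,
                           double sigma, double hFilt, ExecutorService pool) {
        int first = py - r, rows = 2 * r + 1;
        List<Future<double[]>> parts = new ArrayList<>();
        for (int k = 0; k < 3; k++) {
            int from = first + k * rows / 3;
            int to = first + (k + 1) * rows / 3;
            parts.add(pool.submit(() -> partialSums(img, px, py, r, f, sigma, hFilt, from, to)));
        }
        double num = 0.0, den = 0.0;
        for (Future<double[]> part : parts) {
            try {
                double[] s = part.get(); // blocks until this part has finished
                num += s[0];
                den += s[1];
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException("search block part failed", e);
            }
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Hypothetical 16 x 16 image with a deterministic noise-like pattern
        double[][] img = new double[16][16];
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                img[y][x] = 100 + ((x * 7 + y * 13) % 11) - 5;
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            double sigma = 5.0, hFilt = 0.4 * sigma; // hypothetical parameters
            System.out.println(serial(img, 8, 8, 5, 1, sigma, hFilt) + " "
                    + parallel(img, 8, 8, 5, 1, sigma, hFilt, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The `Future.get()` loop is where the calling thread waits for all three parts; that waiting, together with task start-up, is overhead that the serial version does not pay, so the split only pays off when each part carries enough work.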

Figure 2. Illustration of the Search Block Process with Parallel Computation
The final step in computing the similarity value with the Euclidean distance is to add the squared differences for the red, green, and blue channels together and divide the sum by 3(2f+1)^2, that is, the number of channels multiplied by the number of pixels in a patch. This value is then used to compute the weight with Equation (2).
The weight w(p,q) depends on the similarity between the patch centered at pixel p and the patch centered at pixel q, where h is the degree of filtering and σ is the standard deviation of the noise added to the image. Each patch is a two-dimensional matrix of the RGB values of its pixels. The weight is calculated with an exponential function of the standard deviation, the similarity value, and the degree of filtering; the degree of filtering is given in Table 1 and Table 2. Once the weights are obtained, each pixel value in the search block is multiplied by its weight, and the products are summed. This sum is divided by the normalization coefficient, which is the sum of all the weights in the search block and is calculated using Equation (3). The result of this division is the new value that replaces the old pixel value. This process is repeated for every pixel of the input image, from the first to the last.
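The weighting and normalization steps for a single pixel can be illustrated as follows. This Java sketch assumes the weight form w(p,q) = exp(-max(d^2 - 2σ^2, 0) / h^2) used in the IPOL reference description of Non-local Means; it is consistent with the description above but may not match the paper's Equation (2) exactly. The class name, the filtering parameter h = 0.4σ, and the example values are hypothetical, and the patch distances are taken as given so that only the weight, multiply, sum, and normalize steps are shown.

```java
/**
 * Sketch of the weighting and averaging step for one pixel.
 * Assumed weight: w = exp(-max(d2 - 2*sigma^2, 0) / h^2).
 */
class NlmWeightedAverage {

    // Weight for one search block position, given its patch distance d2
    static double weight(double d2, double sigma, double h) {
        return Math.exp(-Math.max(d2 - 2.0 * sigma * sigma, 0.0) / (h * h));
    }

    // New pixel value: sum of w(p,q) * u(q), divided by C(p) = sum of w(p,q)
    static double denoise(double[] distances, double[] values, double sigma, double h) {
        double weightedSum = 0.0;
        double normalization = 0.0; // C(p), the normalization coefficient
        for (int i = 0; i < values.length; i++) {
            double w = weight(distances[i], sigma, h);
            weightedSum += w * values[i];
            normalization += w;
        }
        return weightedSum / normalization;
    }

    public static void main(String[] args) {
        double sigma = 20.0, h = 0.4 * sigma; // hypothetical filtering parameters
        // Three search block positions: the pixel itself (d2 = 0),
        // a similar patch, and a very dissimilar patch
        double[] d2 = {0.0, 500.0, 5000.0};
        double[] u = {120.0, 110.0, 30.0};
        // Approximately 115: the dissimilar patch gets a negligible weight
        System.out.println(denoise(d2, u, sigma, h));
    }
}
```

With these values the first two positions fall below the 2σ^2 threshold and receive weight 1, while the dissimilar patch is weighted by roughly e^-66, so the new value is essentially the average of 120 and 110.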

RESULT AND DISCUSSION
The experiments were run on a computer with an Intel Core i7 2.4 GHz CPU, 12 GB of RAM, and the Microsoft Windows 10 64-bit operating system. The software was developed in the Java programming language using the NetBeans 7 IDE.
The effectiveness of the proposed Non-local Means algorithm with parallel calculation in reducing noise was evaluated using MSE and PSNR, which are calculated from the differences in pixel values between the input image and the noise-reduced image. MSE and PSNR results were obtained for both the serial and the parallel Non-local Means algorithms.
Table 4 and Figure 3 show the noise reduction results for 64 x 64 pixel images. When the noise standard deviation is 20, the serial Non-local Means algorithm is 1 second faster than the parallel Non-local Means algorithm. When the noise standard deviation is 40, the parallel algorithm is 5 seconds faster, and when the noise standard deviation is 60, 80, and 100, it is 3, 9, and 10 seconds faster, respectively. Figure 3 shows that the curve for the parallel Non-local Means algorithm lies below the curve for the serial algorithm when the noise standard deviation is greater than 30.
Table 5 shows the noise reduction results for 128 x 128 pixel images. When the noise standard deviation is 20, the serial Non-local Means algorithm is 2 seconds faster than the parallel algorithm. When the noise standard deviation is 40, the parallel algorithm is 23 seconds faster, and when the noise standard deviation is 60, 80, and 100, it is 27, 40, and 41 seconds faster, respectively. As shown in Figure 4, the curve for the parallel algorithm lies below the curve for the serial algorithm when the noise standard deviation is greater than 30.
Table 6 and Figure 5 show the noise reduction results for 256 x 256 pixel images. At a noise standard deviation of 20, the serial Non-local Means algorithm is 11 seconds faster than the parallel algorithm. When the noise standard deviation is 40, the parallel algorithm is 131 seconds faster, and when the noise standard deviation is 60, 80, and 100, it is 102, 132, and 135 seconds faster, respectively. Figure 5 shows that the curve for the parallel algorithm lies below the curve for the serial algorithm when the noise standard deviation is greater than 30.
Figure 6 shows the noise reduction results for 512 x 512 pixel images. At a noise standard deviation of 20, the serial Non-local Means algorithm is 26 seconds faster than the parallel algorithm. When the noise standard deviation is 40, the parallel algorithm is 452 seconds faster, and when the noise standard deviation is 60, 80, and 100, it is 492, 756, and 796 seconds faster, respectively. Figure 6 shows that the curve for the parallel algorithm lies below the curve for the serial algorithm when the noise standard deviation is greater than 30.
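The MSE and PSNR measures used to compare the serial and parallel versions can be computed as in the following sketch. It assumes 8-bit images with a peak value of 255 and a single channel stored as a hypothetical int[y][x] array; for RGB images, the squared errors of all three channels would be averaged in the same way.

```java
/**
 * Sketch of MSE and PSNR for 8-bit images (peak value 255), one channel
 * stored as int[y][x]. Both images must have the same size.
 */
class ImageQuality {

    static double mse(int[][] reference, int[][] denoised) {
        double sum = 0.0;
        int count = 0;
        for (int y = 0; y < reference.length; y++) {
            for (int x = 0; x < reference[y].length; x++) {
                double d = reference[y][x] - denoised[y][x];
                sum += d * d;
                count++;
            }
        }
        return sum / count;
    }

    // PSNR in decibels; infinite when the two images are identical
    static double psnr(int[][] reference, int[][] denoised) {
        double m = mse(reference, denoised);
        return m == 0.0 ? Double.POSITIVE_INFINITY : 10.0 * Math.log10(255.0 * 255.0 / m);
    }

    public static void main(String[] args) {
        int[][] ref = {{100, 100}, {100, 100}};
        int[][] out = {{110, 90}, {100, 100}};
        System.out.println(mse(ref, out)); // (100 + 100 + 0 + 0) / 4 = 50.0
        System.out.println(psnr(ref, out));
    }
}
```

A lower MSE corresponds to a higher PSNR, so similar MSE and PSNR values for the serial and parallel versions indicate similar denoising quality.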

Mandrill.jpg, 256 x 256 pixels
The experimental results indicate that the serial Non-local Means algorithm reduces noise relatively faster when the noise standard deviation is 30 or below. In this case, the search block used to calculate the weights is only 21 x 21 pixels, which is too small to benefit from parallel processing. When the noise standard deviation is above 30, the search block is 35 x 35 pixels, and parallel computation is more effective than serial computation. In parallel computation, the weights are calculated in three parts of the search block, and each part must wait for the other parts to finish before the weights can be summed. With a 21 x 21 search block, this waiting makes the parallel computation slower than the serial computation. With a 35 x 35 search block, however, the time spent waiting for each part to finish is small compared with the time saved over serial computation.
Table 8 shows the visual difference between the noisy images and the images denoised with the Non-local Means algorithm. The serial and parallel versions of the algorithm produce similar MSE and PSNR values; the difference between them lies in the time taken for noise reduction. The parallel version tends to be faster when the noise standard deviation is sufficiently high. This is because, in the Non-local Means algorithm, a higher noise standard deviation requires a larger search block, and the processing time grows in proportion to the search block size. Dividing the search block into three parts that are computed independently therefore reduces the noise reduction time. The processing time is also affected by the computer specifications and by the number of other processes running during noise reduction.

CONCLUSION
In conclusion, images denoised with the Non-local Means algorithm using either serial or parallel computation have similar MSE and PSNR values. The noise standard deviation of the image affects the time required for noise reduction. The Non-local Means algorithm with serial computation reduces noise faster when the noise standard deviation is 30 or below, whereas the algorithm with parallel computation is up to 30% faster when the noise standard deviation exceeds 30. Because the noise standard deviation determines the search block and patch sizes, dividing the search block into three parts for parallel computation can reduce the noise reduction time for images with high noise levels. The computer's specifications and other running processes can also influence the processing time.