Accepted Papers

  • Wihidum: Distributed Complex Event Processing
    Isuru Ranawaka, Sachini Jayasekara, Sameera Kannangara, Tishan Dahanayakage, Srinath Perera and Vishaka Nanayakkara, University of Moratuwa, Sri Lanka
    ABSTRACT

    Complex event processing (CEP) is becoming increasingly popular in enterprise-level systems. Its ability to analyse series of events in real time and deduce results from them is highly sought after in today's competitive enterprise culture. CEP is used in scenarios such as data processing, transportation and logistics, business applications and business activity monitoring, where its ability to analyse and summarize data supports vital decisions. However, as the volume of events and the complexity of queries grow, it is difficult to meet industrial needs with standalone complex event processing servers. This paper describes how a standalone complex event processing engine can be distributed to achieve significant performance improvements and high availability by combining the resources (processing power, memory, etc.) available in a distributed setting. Wihidum mainly focuses on handling large volumes of data, handling high incoming event rates, and processing complex queries that cannot be processed efficiently on a single node due to limitations of memory and processing power. Furthermore, Wihidum discusses how to balance the workload among nodes efficiently, how complex event processing queries can be broken into simpler sub-queries, and how queries can be deployed efficiently in the cluster.
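    The decomposition the abstract describes can be illustrated with a toy pipeline (a hypothetical sketch, not Wihidum's actual API): a compound query such as "windowed average price of events for one symbol" splits into a stateless filter sub-query, which parallelizes trivially across nodes, and a stateful windowed aggregate, which stays on one node.

```python
# Hypothetical sketch of splitting one CEP query into two sub-queries
# that could be deployed on separate nodes. Not Wihidum's real interface.
from collections import deque

def filter_node(events, symbol):
    """Sub-query 1: stateless filter -- easy to replicate across nodes."""
    return (e for e in events if e["symbol"] == symbol)

def window_avg_node(events, size):
    """Sub-query 2: stateful sliding-window average -- kept on one node."""
    window, out = deque(maxlen=size), []
    for e in events:
        window.append(e["price"])
        out.append(sum(window) / len(window))
    return out

events = [{"symbol": "X", "price": p} for p in (10, 20, 30)] + \
         [{"symbol": "Y", "price": 99}]
print(window_avg_node(filter_node(events, "X"), size=2))  # [10.0, 15.0, 25.0]
```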

  • HPC Energy Optimization: Process Migration and Optimal Operating Point Predictive Model
    Manisha Chauhan, Nazia Parveen, Sumit Kumar Saurav and B. S. Bindhumadhava, Centre for Development of Advanced Computing, India
    ABSTRACT

    Energy optimization has become an important agenda as the High Performance Computing (HPC) world moves towards "green computing". Several surveys indicate that the energy used for computation and communication within an HPC system contributes considerably to its operational costs. This paper aims at reducing energy consumption in HPC clusters. Our proposed energy optimization approach is based on analysis of an efficiency matrix through which we can predict the optimal operating point (operating voltage and frequency) for different applications in order to achieve the target performance. Here, we introduce an optimization technique which migrates processes to fully utilize some of the systems within the cluster and switches off the unused systems. The idea is to run an application at its optimal operating point and, whenever possible, migrate the executing processes based on the availability of required resources, the optimal operating condition of the application and maximum system utilization. We use a knowledge base to predict the optimal operating level for a particular application: it provides the voltage and frequency level at which the application's energy consumption is minimal. Earlier work has explored process migration in other contexts such as load balancing, but the novelty of our technique lies in considering the optimal operating point as a migration parameter alongside resource availability and maximum system utilization.
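    The optimal-operating-point selection can be sketched as follows (an illustrative model, not the paper's knowledge base; the voltage/frequency pairs and the CMOS-style cost model are assumptions): dynamic power scales roughly as V²f and runtime as work/f, so among the points that meet the deadline, the lowest feasible voltage minimizes energy.

```python
# Illustrative sketch: choose the (voltage, frequency) pair that minimizes
# energy while still meeting a target runtime. Numbers are made up.
def optimal_operating_point(points, work, deadline):
    """points: (voltage_V, freq_GHz) pairs. Runtime ~ work / f;
    dynamic energy ~ V^2 * f * runtime = V^2 * work."""
    feasible = [(v, f) for v, f in points if work / f <= deadline]
    # Energy ~ V^2 * work, so the lowest feasible voltage wins.
    return min(feasible, key=lambda p: p[0] ** 2 * work)

points = [(0.9, 1.0), (1.0, 1.5), (1.2, 2.0)]
print(optimal_operating_point(points, work=3.0, deadline=2.5))  # (1.0, 1.5)
```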

  • Design and Implementation Of A Cache Hierarchy-Aware Task Scheduling For Parallel Loops On Multicore Architectures
    Nader Khammassi and Jean-Christophe Le Lann, ENSTA-BRETAGNE, France
    ABSTRACT

    Effective cache utilization is critical for performance in chip-multiprocessor (CMP) systems. Modern CMP architectures are based on hierarchical cache topologies with varying private and shared cache configurations at different levels. Cache-aware scheduling has become a great design challenge, and many scheduling strategies have been designed to target specific cache configurations. In this paper we introduce a cache hierarchy-aware task scheduling (CHATS) algorithm which adapts to the underlying architecture and its cache topology. The proposed scheduling policy aims to improve cache performance by optimizing spatial and temporal data locality and reducing communication overhead, without neglecting load balancing. CHATS has been implemented in the parallel loop construct of the XPU framework introduced in previous works [1,7]. We compare CHATS to several popular scheduling policies, including dynamic and static scheduling and task-stealing. Experimental results on synthetic and real workloads show that our scheduling policy achieves up to 25% execution speedup compared to the OpenMP, TBB and Cilk++ parallel loop implementations. We use our parallel loop implementation in two popular applications from the PARSEC benchmark suite and compare it to the provided OpenMP, TBB and PThreads versions on different architectures.
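    One ingredient of cache-topology-aware loop scheduling can be sketched like this (an illustrative partitioner, not the CHATS algorithm itself): give each worker a contiguous chunk of iterations, and order the workers so that cores sharing a cache receive adjacent chunks, keeping neighbouring data in the shared level.

```python
# Illustrative static partitioner: contiguous iteration chunks, assigned in
# cache-adjacent core order. Not the actual CHATS policy.
def chunk_iterations(n_iters, cache_groups):
    """cache_groups: lists of core ids that share one cache level."""
    cores = [c for group in cache_groups for c in group]  # cache-adjacent order
    base, extra = divmod(n_iters, len(cores))
    plan, start = {}, 0
    for i, core in enumerate(cores):
        size = base + (1 if i < extra else 0)  # spread the remainder
        plan[core] = range(start, start + size)
        start += size
    return plan

# Two shared L2 caches: cores 0,1 behind one, cores 2,3 behind the other.
plan = chunk_iterations(10, [[0, 1], [2, 3]])
print({c: list(r) for c, r in plan.items()})
```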

  • Real Time Face Detection On GPU Using OpenCL
    Narmada Naik and Rathna G. N., Indian Institute of Science, India
    ABSTRACT

    The paper presents a novel approach to low-computation real-time face detection using heterogeneous GPU-CPU computing with the OpenCL API. Our approach is to capture the real-time image from the camera and convert it to grayscale before preprocessing. In preprocessing, gamma correction and a Difference of Gaussians (DoG) operation are performed so that the image can be handled under different illumination conditions. The image is then divided into blocks (cells); the Local Binary Pattern (LBP) and histogram are computed on the GPU, while the training is done on the CPU [5]. Here, we compute each cell on a separate compute unit instead of assigning one compute unit per pixel. This is because the histogram calculation depends on all the pixels within a block, so it is better to do the whole calculation within one compute unit. Additionally, the amount of computation per compute unit should not be too small, otherwise the overhead of managing a compute unit exceeds the actual computation. Since the whole computation is done on the GPU and only the input image and the final histogram are transferred between the CPU and GPU, the data transfer overheads are minimal. As a result, the processing time is 20 ms for a 640x480 input image. For other resolutions, scaling is done before the LBP computation.
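    The per-cell work that the abstract maps onto one compute unit can be shown on the CPU in a few lines (a minimal sketch of the standard 8-neighbour LBP and per-cell histogram, not the paper's OpenCL kernel):

```python
# Minimal CPU sketch of the per-cell GPU work: an 8-neighbour Local Binary
# Pattern code per pixel, accumulated into a 256-bin histogram for the cell.
def lbp_code(img, y, x):
    c = img[y][x]
    nbrs = [img[y-1][x-1], img[y-1][x], img[y-1][x+1], img[y][x+1],
            img[y+1][x+1], img[y+1][x], img[y+1][x-1], img[y][x-1]]
    # Set bit i when neighbour i is at least as bright as the centre.
    return sum((1 << i) for i, p in enumerate(nbrs) if p >= c)

def cell_histogram(img, y0, y1, x0, x1):
    hist = [0] * 256
    for y in range(y0, y1):
        for x in range(x0, x1):
            hist[lbp_code(img, y, x)] += 1
    return hist

img = [[0, 0, 0],
       [0, 5, 0],
       [0, 9, 0]]
h = cell_histogram(img, 1, 2, 1, 2)  # one interior pixel in this tiny cell
print(h[1 << 5])  # only the neighbour below (index 5) is >= the centre
```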

  • On Demand-based Frequency Allocation to Mitigate Interference in Femto-Macro LTE Cellular Network
    Shahadate Rezvy, Middlesex University, United Kingdom
    ABSTRACT

    Long Term Evolution (LTE) has introduced femtocell technology into cellular mobile communication systems in order to enhance indoor coverage. A femtocell is a low-power, very small and cost-effective cellular base station used in indoor environments. However, the impact of femtocells on the performance of the conventional macrocell system leads to interference between femtocells and pre-existing macrocells, as they share the same licensed frequency spectrum. In this paper, we propose an efficient method to mitigate interference and improve system capacity in existing Femto-Macro two-tier networks. In our proposed scheme, we use a novel frequency plan for two-tier cellular networks based on a frequency reuse technique in which macro base stations allocate frequency sub-bands to femtocell users on a demand basis through the femtocell base stations. This frequency reuse technique aims to mitigate interference while improving system throughput.

  • Positive Impression of Low-Ranking MicroRNAs in Human Cancer Classification
    Feifei Li, Yongjun Piao, Minghao Piao and Keun Ho Ryu, Chungbuk National University, South Korea
    ABSTRACT

    Recently, many studies based on microRNAs (miRNAs) have shown a new aspect of cancer classification, and feature selection methods are used to reduce the high dimensionality of miRNA expression data. These methods only consider the cases where the feature-to-class relation is 1:1 or n:1, but one miRNA may influence more than one type of cancer. Such miRNAs are ranked low by traditional feature selection methods and are removed most of the time; it is therefore necessary to also consider the 1:n and m:n relations during feature selection. In our work, we considered both high- and low-ranking features to cover all cases (1:1, n:1, 1:n, m:n) in cancer classification. After numerous tests, the information gain and chi-squared feature selection methods were chosen to select the high- and low-ranking features forming the m-to-n feature subset, and the LibSVM classifier was used for the multi-class classification. Our results demonstrate that the m-to-n features make a positive impression of low-ranking microRNAs in cancer classification, since they achieve higher classification accuracy than the traditional feature selection methods.
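    One of the two rankings the abstract names, information gain, can be computed in a few lines (a textbook sketch with toy data; the chi-squared ranking and the LibSVM training are omitted):

```python
# Information gain of a feature with respect to class labels:
# gain = H(labels) - sum over feature values of weighted H(labels | value).
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """feature: one (here binary) feature value per sample."""
    n, gain = len(labels), entropy(labels)
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

labels = ["tumourA", "tumourA", "tumourB", "tumourB"]  # toy classes
perfect = [1, 1, 0, 0]   # splits the classes exactly -> gain 1 bit
useless = [1, 0, 1, 0]   # independent of the class   -> gain 0 bits
print(round(info_gain(perfect, labels), 3), round(info_gain(useless, labels), 3))
```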

  • Computationally Efficient Implementation of a Hamming Code Decoder using a GPU
    Md Shohidul Islam and Jong-Myon Kim, University of Ulsan, Korea
    ABSTRACT

    Hamming code is one of the efficient error-coding algorithms widely used in wireless data communication. Existing decoders are mostly hardware based, suffering from critical bottlenecks in scalability, programmability and flexibility. Of late, software defined radio (SDR) is a promising technology that implements a range of communication protocols in software using central processing units (CPUs) and graphics processing units (GPUs). Thus, in this paper, we present a computationally efficient implementation of a Hamming code decoder on a GPU. A GPU offers an extremely high-throughput parallel computing platform by employing hundreds of processors concurrently. However, the Hamming algorithm is challenging to parallelize effectively on a GPU because it operates on sparsely located data items with several conditional statements, leading to non-coalesced, long-latency global memory accesses and heavy thread divergence. In spite of this, the proposed approach provides insights into how to produce a high-performance GPU implementation. When executed on a 336-core GPU, the achieved speedup is 99x over an equivalent CPU implementation. Moreover, the implementation yields a significant reduction in computational complexity, from O(n) for the sequential algorithm to O(1) for the GPU-based approach. Furthermore, the GPU-based decoder greatly outperforms the CPU-based approach in terms of energy efficiency. The proposed approach is validated on a Compute Unified Device Architecture (CUDA) enabled NVIDIA GeForce GTX 560 graphics card.
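    The constant-work-per-codeword step that makes the decoder map well onto one GPU thread per word can be shown with the standard Hamming(7,4) syndrome construction (a textbook decoder, not the paper's CUDA kernel): the syndrome of a received word directly names the erroneous bit position.

```python
# Standard Hamming(7,4) syndrome decoder. Bit positions 1..7 run MSB-first,
# so position p lives at int bit (7 - p). The syndrome value read as a
# binary number equals the 1-based position of a single-bit error.
H = [0b0001111,   # parity check for positions 4,5,6,7 (syndrome bit 4)
     0b0110011,   # parity check for positions 2,3,6,7 (syndrome bit 2)
     0b1010101]   # parity check for positions 1,3,5,7 (syndrome bit 1)

def decode(word):
    """word: 7-bit int. Returns the corrected codeword."""
    syndrome = 0
    for row in H:
        syndrome = (syndrome << 1) | (bin(word & row).count("1") & 1)
    if syndrome:                      # nonzero syndrome = error position
        word ^= 1 << (7 - syndrome)   # flip the erroneous bit
    return word

clean = 0b1010101             # a valid codeword: every parity check is 0
corrupted = clean ^ (1 << 4)  # flip position 3
print(decode(corrupted) == clean)  # True
```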

  • Detection of renal cysts by a method of SVM classification
    Mekhaldi Nadia and Benyettou Mohamed, Laboratory Modeling and Optimization of Industrial Systems, Algeria
    ABSTRACT

    In recent years medical imaging has become vital for the detection, diagnosis and treatment of cancer. In the majority of cases medical imaging is the first stage in the prevention of certain cancers. It allows fast treatment, effective storage and rapid transmission of images for diagnosis at remote sites. In this work, we present a renal CT image processing system based on the split-and-merge method for the segmentation stage, with an SVM (support vector machine) applied for the detection of lesions in these images.
