Final year Projects | Real Time Projects | Ns2 Projects
We offer final year projects based on Java, .NET, and NS2.
Data Leakage Detection
- Knowledge and Data Engineering, January 2011
We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or somebody’s laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and identifying the guilty party.
Usher: Improving Data Quality with Dynamic Forms
- Knowledge and Data Engineering, January 2011
Data quality is a critical problem in modern databases. Data-entry forms present the first and arguably best opportunity for detecting and mitigating errors, but there has been little research into automatic methods for improving data quality at entry time. In this paper, we propose Usher, an end-to-end system for form design, entry, and data quality assurance. Using previous form submissions, Usher learns a probabilistic model over the questions of the form. Usher then applies this model at every step of the data-entry process to improve data quality. Before entry, it induces a form layout that captures the most important data values of a form instance as quickly as possible and reduces the complexity of error-prone questions. During entry, it dynamically adapts the form to the values being entered by providing real-time interface feedback, reasking questions with dubious responses, and simplifying questions by reformulating them. After entry, it revisits question responses that it deems likely to have been entered incorrectly by reasking the question or a reformulation thereof. We evaluate these components of Usher using two real-world data sets. Our results demonstrate that Usher can improve data quality considerably at a reduced cost when compared to current practice.
Nymble: Blocking Misbehaving Users in Anonymizing Networks
- Dependable and Secure Computing, March-April 2011
Anonymizing networks such as Tor allow users to access Internet services privately by using a series of routers to hide the client’s IP address from the server. The success of such networks, however, has been limited by users employing this anonymity for abusive purposes such as defacing popular Web sites. Web site administrators routinely rely on IP-address blocking for disabling access to misbehaving users, but blocking IP addresses is not practical if the abuser routes through an anonymizing network. As a result, administrators block all known exit nodes of anonymizing networks, denying anonymous access to misbehaving and behaving users alike. To address this problem, we present Nymble, a system in which servers can “blacklist” misbehaving users, thereby blocking users without compromising their anonymity. Our system is thus agnostic to different servers’ definitions of misbehavior: servers can blacklist users for whatever reason, and the privacy of blacklisted users is maintained.
Privacy-Preserving Updates to Anonymous and Confidential Databases
- Dependable and Secure Computing, July-August 2011
Suppose Alice owns a k-anonymous database and needs to determine whether her database, when inserted with a tuple owned by Bob, is still k-anonymous. Also, suppose that access to the database is strictly controlled because, for example, the data are used for certain experiments that need to be kept confidential. Clearly, allowing Alice to directly read the contents of the tuple breaks the privacy of Bob (e.g., a patient’s medical record); on the other hand, the confidentiality of the database managed by Alice is violated once Bob has access to the contents of the database. Thus, the problem is to check whether the database inserted with the tuple is still k-anonymous, without letting Alice and Bob know the contents of the tuple and the database respectively. In this paper, we propose two protocols solving this problem on suppression-based and generalization-based k-anonymous and confidential databases. The protocols rely on well-known cryptographic assumptions, and we provide theoretical analyses to prove their soundness and experimental results to illustrate their efficiency.
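The condition that the protocols verify can be illustrated with a plain, non-private check. The sketch below is only an assumption-laden illustration: it is not one of the paper's protocols, it uses no cryptography, and it assumes each row already holds generalized or suppressed quasi-identifier values. The function names and the sample table are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, k):
    """Return True if every distinct row value occurs at least k times.

    Each row is assumed to contain only quasi-identifier values that
    have already been generalized or suppressed.
    """
    counts = Counter(tuple(row) for row in rows)
    return all(count >= k for count in counts.values())


def stays_k_anonymous(rows, new_row, k):
    """Check whether the table is still k-anonymous after inserting new_row.

    This version reads both the table and the tuple in the clear, which is
    exactly what a privacy-preserving protocol has to avoid.
    """
    return is_k_anonymous(list(rows) + [new_row], k)


# Hypothetical table of (generalized age, sex) pairs.
table = [("3*", "M"), ("3*", "M"), ("4*", "F"), ("4*", "F")]
print(stays_k_anonymous(table, ("3*", "M"), 2))  # joins an existing group
print(stays_k_anonymous(table, ("5*", "F"), 2))  # would form a group of one
```

The first insertion joins a group that already has two members, so the table stays 2-anonymous; the second would create a singleton group and break it.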
Dynamics of Malware Spread in Decentralized Peer-to-Peer Networks
- Dependable and Secure Computing, July-August 2011
In this paper, we formulate an analytical model to characterize the spread of malware in decentralized, Gnutella-type peer-to-peer (P2P) networks and study the dynamics associated with the spread of malware. Using a compartmental model, we derive the system parameters or network conditions under which the P2P network may reach a malware-free equilibrium. The model also evaluates the effect of control strategies like node quarantine on stifling the spread of malware. The model is then extended to consider the impact of P2P networks on the malware spread in networks of smart cell phones.
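To make the idea of a compartmental model and a malware-free equilibrium concrete, here is a minimal sketch using a generic susceptible/infected/removed model with an added quarantine rate. It is not the paper's model: the compartments, the parameters `beta`, `gamma`, and `quarantine`, and the threshold `beta / (gamma + quarantine)` are assumptions taken from the textbook SIR setting.

```python
def basic_reproduction_number(beta, gamma, quarantine):
    """Threshold of the generic model: below 1, the infection dies out."""
    return beta / (gamma + quarantine)


def simulate(beta, gamma, quarantine, i0=0.01, dt=0.01, steps=20000):
    """Integrate the model with forward Euler steps.

    s, i, r are fractions of susceptible, infected, and removed
    (cleaned or quarantined) peers. Returns the final fractions and
    the peak infected fraction seen during the run.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(steps):
        new_infections = beta * s * i * dt
        removals = (gamma + quarantine) * i * dt
        s -= new_infections
        i += new_infections - removals
        r += removals
        peak = max(peak, i)
    return s, i, r, peak


# Hypothetical rates: the same malware with and without node quarantine.
for q in (0.0, 0.6):
    r0 = basic_reproduction_number(0.5, 0.1, q)
    _, final_i, _, peak = simulate(0.5, 0.1, q)
    print(f"quarantine={q}: R0={r0:.2f}, peak={peak:.3f}, final={final_i:.2e}")
```

With these illustrative rates, quarantine pushes the threshold below 1, so the infected fraction only shrinks; without quarantine the infection grows to a large peak before burning out.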
Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems
- Parallel and Distributed Systems, 2011
Although anonymizing Peer-to-Peer (P2P) systems often incurs extra traffic costs, many systems try to mask the identities of their users for privacy considerations. Existing anonymity approaches are mainly path-based: peers have to pre-construct an anonymous path before transmission. The overhead of maintaining and updating such paths is significantly high. We propose Rumor Riding (RR), a lightweight and non-path-based mutual anonymity protocol for decentralized P2P systems. Employing a random walk mechanism, RR takes advantage of lower overhead by mainly using the symmetric cryptographic algorithm. We conduct comprehensive trace-driven simulations to evaluate the effectiveness and efficiency of this design, and compare it with previous approaches. We also introduce some early experiences on RR implementations.
Dynamic Conflict-Free Transmission Scheduling for Sensor Network Queries
- Mobile Computing, May 2011
With the emergence of high data rate sensor network applications, there is an increasing demand for high-performance query services. To meet this challenge, we propose Dynamic Conflict-free Query Scheduling (DCQS), a novel scheduling technique for queries in wireless sensor networks. In contrast to earlier TDMA protocols designed for general-purpose workloads, DCQS is specifically designed for query services in wireless sensor networks. DCQS has several unique features. First, it optimizes the query performance through conflict-free transmission scheduling based on the temporal properties of queries in wireless sensor networks. Second, it can adapt to workload changes without explicitly reconstructing the transmission schedule. Furthermore, DCQS also provides predictable performance in terms of the maximum achievable query rate. We provide an analytical capacity bound for DCQS that enables DCQS to handle overload through rate control. NS2 simulations demonstrate that DCQS significantly outperforms a representative TDMA protocol (DRAND) and 802.11b in terms of query latency and throughput.
A Privacy-Preserving Location Monitoring System for Wireless Sensor Networks
- Mobile Computing, 2011
Monitoring personal locations with a potentially untrusted server poses privacy threats to the monitored individuals. To this end, we propose a privacy-preserving location monitoring system for wireless sensor networks. In our system, we design two in-network location anonymization algorithms, namely, resource and quality-aware algorithms, that aim to enable the system to provide high-quality location monitoring services for system users, while preserving personal location privacy. Both algorithms rely on the well-established k-anonymity privacy concept, that is, a person is indistinguishable among k persons, to enable trusted sensor nodes to provide the aggregate location information of monitored persons for our system. Each aggregate location is in the form of a monitored area A along with the number of monitored persons residing in A, where A contains at least k persons. The resource-aware algorithm aims to minimize communication and computational cost, while the quality-aware algorithm aims to maximize the accuracy of the aggregate locations by minimizing their monitored areas. To utilize the aggregate location information to provide location monitoring services, we use a spatial histogram approach that estimates the distribution of the monitored persons based on the gathered aggregate location information. Then, the estimated distribution is used to provide location monitoring services through answering range queries. We evaluate our system through simulated experiments. The results show that our system provides high-quality location monitoring services for system users and guarantees the location privacy of the monitored persons.
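The aggregate-location idea can be sketched as follows. This is a simplified illustration, not the paper's resource-aware or quality-aware algorithm and not its spatial histogram: it assumes rectangular areas, publishes an area only when it holds at least k persons, and answers a range query by assuming persons are spread uniformly inside each area. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def contains(self, x, y):
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

    def overlap_area(self, other):
        """Area of the intersection of two rectangles (0 if disjoint)."""
        width = min(self.x2, other.x2) - max(self.x1, other.x1)
        height = min(self.y2, other.y2) - max(self.y1, other.y1)
        return max(0.0, width) * max(0.0, height)


def make_aggregate(area, positions, k):
    """Return (area, count) if the area holds at least k persons, else None."""
    count = sum(1 for x, y in positions if area.contains(x, y))
    return (area, count) if count >= k else None


def estimate_range_query(aggregates, query):
    """Estimate the persons in `query`, assuming uniform spread per area."""
    total = 0.0
    for area, count in aggregates:
        if area.area() > 0:
            total += count * area.overlap_area(query) / area.area()
    return total


# Two hypothetical monitored areas with 8 and 4 persons.
aggregates = [(Rect(0, 0, 4, 4), 8), (Rect(4, 0, 8, 4), 4)]
print(estimate_range_query(aggregates, Rect(2, 0, 6, 4)))
```

The server only ever sees areas and counts, never individual positions; the uniform-spread assumption is the crudest possible estimator and stands in for whatever distribution estimate the system actually maintains.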
Fast Detection of Mobile Replica Node Attacks in Wireless Sensor Networks Using Sequential Hypothesis Testing
- Mobile Computing, June 2011
Due to the unattended nature of wireless sensor networks, an adversary can capture and compromise sensor nodes, make replicas of them, and then mount a variety of attacks with these replicas. These replica node attacks are dangerous because they allow the attacker to leverage the compromise of a few nodes to exert control over much of the network. Several replica node detection schemes have been proposed in the literature to defend against such attacks in static sensor networks. However, these schemes rely on fixed sensor locations and hence do not work in mobile sensor networks, where sensors are expected to move. In this work, we propose a fast and effective mobile replica node detection scheme using the Sequential Probability Ratio Test. To the best of our knowledge, this is the first work to tackle the problem of replica node attacks in mobile sensor networks. We show analytically and through simulation experiments that our scheme detects mobile replicas in an efficient and robust manner at the cost of reasonable overheads.
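The Sequential Probability Ratio Test itself is a standard procedure (Wald's test), and a minimal version for binary observations can be sketched as below. How the scheme turns sensor reports into observations is not shown here: the binary samples, the probabilities `p0` and `p1`, and the error rates `alpha` and `beta` are illustrative assumptions.

```python
import math


def sprt(samples, p0, p1, alpha=0.01, beta=0.01):
    """Wald's SPRT for Bernoulli samples.

    H0: each sample is 1 with probability p0 (e.g. a benign node).
    H1: each sample is 1 with probability p1 > p0 (e.g. a replica).
    alpha is the tolerated false positive rate, beta the false negative rate.
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0                              # running log-likelihood ratio
    used = 0
    for sample in samples:
        used += 1
        if sample:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", used
        if llr <= lower:
            return "H0", used
    return "undecided", used


# Hypothetical observation streams (1 = suspicious observation).
print(sprt([1, 1, 1, 1, 1], p0=0.1, p1=0.9))
print(sprt([0, 0, 0, 0, 0], p0=0.1, p1=0.9))
print(sprt([1, 0, 1, 0], p0=0.1, p1=0.9))
```

The appeal of a sequential test is that it stops as soon as the evidence crosses either threshold, so a clear-cut case needs only a few samples, while mixed evidence keeps the test running.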
Approaching Throughput-Optimality in Distributed CSMA Scheduling Algorithms With Collisions
- Networking, June 2011
It was shown recently that carrier sense multiple access (CSMA)-like distributed algorithms can achieve the maximal throughput in wireless networks (and task processing networks) under certain assumptions. One important but idealized assumption is that the sensing time is negligible, so that there is no collision. In this paper, we study more practical CSMA-based scheduling algorithms with collisions. First, we provide a Markov chain model and give an explicit throughput formula that takes into account the cost of collisions and overhead. The formula has a simple form since the Markov chain is “almost” time-reversible. Second, we propose transmission-length control algorithms to approach throughput-optimality in this case. Sufficient conditions are given to ensure the convergence and stability of the proposed algorithms. Finally, we characterize the relationship between the CSMA parameters (such as the maximum packet lengths) and the achievable capacity region.
Heuristics Based Query Processing for Large RDF Graphs Using Cloud Computing
- Knowledge and Data Engineering, 2011
Semantic Web is an emerging area to augment human reasoning for which various technologies are being developed. These technologies have been standardized by W3C. One such standard is the RDF. With the explosion of semantic web technologies, large RDF graphs are commonplace. Current frameworks do not scale for large RDF graphs and, as a result, do not address these challenges. In this paper, we describe a framework that we built using Hadoop to store and retrieve large numbers of RDF triples by exploiting the cloud computing paradigm. We describe a scheme to store RDF data in the Hadoop Distributed File System. More than one Hadoop job may be needed to answer a query because a triple pattern in a query cannot take part in more than one join in a Hadoop job. To determine the jobs, we present an algorithm to generate a query plan, whose worst case cost is bounded, based on a greedy approach to answer a SPARQL query. We use Hadoop’s MapReduce framework to answer the queries. Our results show that we can store large RDF graphs in Hadoop clusters built with cheap commodity class hardware. Furthermore, we show that our framework is scalable and efficient and can handle large amounts of RDF data, unlike traditional approaches.
Publishing Search Logs – A Comparative Study of Privacy Guarantees
Search engine companies collect the “database of intentions,” the histories of their users’ search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensitive information. In this paper we analyze algorithms for publishing frequent keywords, queries and clicks of a search log. We first show how methods that achieve variants of k-anonymity are vulnerable to active attacks. We then demonstrate that the stronger guarantee ensured by epsilon-differential privacy unfortunately does not provide any utility for this problem. We then propose a novel algorithm ZEALOUS and show how to set its parameters to achieve (epsilon, delta)-probabilistic privacy. We also contrast our analysis of ZEALOUS with an analysis by Korolova et al. that achieves (epsilon’, delta’)-indistinguishability. Our paper concludes with a large experimental study using real applications where we compare ZEALOUS and previous work that achieves k-anonymity in search log publishing. Our results show that ZEALOUS yields comparable utility to k-anonymity while at the same time achieving much stronger privacy guarantees.