research

Computer systems are increasingly interconnected. From the Internet as a whole to individual online communities, the value of a system grows superlinearly with its size: the larger the system, the higher not only its total utility, but also its per-user utility.

Large-scale systems are often organizationally-distributed, i.e., created through an agreement between independent entities. The classic notions of a provider, a consumer, and a payment are blurred in such systems. For instance, in a computational grid, an organization (a university or a laboratory) provides resources (processors) for others to use, but also submits jobs to be computed on the shared infrastructure. In file-sharing systems (like BitTorrent), a user downloads data from other users, but also makes data available for others to download.

These systems require a new approach to resource management. Classic approaches optimize a single goal for the whole system: for instance, the length of the schedule (the makespan) or the average download speed. Such approaches ignore the complex, heterogeneous nature of organizationally-distributed systems. Even if the average user is served well, some users might be better off isolating themselves from the system; the distribution of utility might be unfair given users' contributions (for instance, a laboratory provides access to a large number of processors through the computational grid, yet jobs submitted by this laboratory queue longer than other jobs); or some users might be better off misreporting the characteristics of their workload or their resources.

The goal of my research is to develop methods of resource management that make it possible to build large-scale, organizationally-distributed systems.

The following list is organized by areas of interest; if you need a list organized by publication type (journal, conference, etc.), get my CV.

resource management

 
Pawel Janus and Krzysztof Rzadca. SLO-aware colocation of data center tasks based on instantaneous processor requirements. In ACM SoCC'17, ACM Symposium on Cloud Computing (accepted). ACM, 2017. [ bib | DOI | http | .pdf ]
In a cloud data center, a single physical machine simultaneously executes dozens of highly heterogeneous tasks. Such colocation results in more efficient utilization of machines, but, when tasks' requirements exceed available resources, some of the tasks might be throttled down or preempted. We analyze version 2.1 of the Google cluster trace that shows short-term (1 second) task CPU usage. Contrary to the assumptions taken by many theoretical studies, we demonstrate that the empirical distributions do not follow any single distribution. However, high percentiles of the total processor usage (summed over at least 10 tasks) can be reasonably estimated by the Gaussian distribution. We use this result for a probabilistic fit test, called the Gaussian Percentile Approximation (GPA), for standard bin-packing algorithms. To check whether a new task will fit into a machine, GPA checks whether the resulting distribution's percentile corresponding to the requested service level objective (SLO) is still below the machine's capacity. In our simulation experiments, GPA resulted in colocations exceeding the machines' capacity with a frequency similar to the requested SLO.
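
A minimal sketch of the GPA fit test described above, assuming each task's instantaneous CPU usage is summarized by a mean and a variance estimated from its history (the task statistics, SLO level and capacity below are made-up inputs, not values from the paper):

```python
# Gaussian Percentile Approximation (GPA) sketch: approximate the machine's
# total CPU usage as a Gaussian and check whether its SLO-percentile fits.
from statistics import NormalDist

def gpa_fits(tasks, new_task, capacity, slo=0.99):
    """tasks, new_task: (mean_cpu, var_cpu) of instantaneous usage.
    slo=0.99 means usage may exceed capacity at most 1% of the time."""
    candidate = tasks + [new_task]
    total_mean = sum(m for m, _ in candidate)
    total_var = sum(v for _, v in candidate)      # assumes independent tasks
    usage_at_slo = NormalDist(total_mean, total_var ** 0.5).inv_cdf(slo)
    return usage_at_slo <= capacity

# ten running tasks averaging 0.05 CPU each; can an eleventh be colocated?
running = [(0.05, 0.01)] * 10
print(gpa_fits(running, (0.05, 0.01), capacity=1.0))
```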

 
Fanny Pascual and Krzysztof Rzadca. Optimizing egalitarian performance in the side-effects model of colocation for data center resource management. In Euro-Par 2017 Proceedings, volume 10417 of LNCS, pages 206-219. Springer, 2017. [ bib | DOI | http | .pdf ]
In data centers, up to dozens of tasks are colocated on a single physical machine. Machines are used more efficiently, but tasks' performance deteriorates, as colocated tasks compete for shared resources. As tasks are heterogeneous, the resulting performance dependencies are complex. In our previous work [18] we proposed a new combinatorial optimization model that uses two parameters of a task - its size and its type - to characterize how a task influences the performance of other tasks allocated to the same machine.

In this paper, we study the egalitarian optimization goal: maximizing the worst-off performance. This problem generalizes the classic makespan minimization on multiple processors (P||Cmax). We prove that variants that are polynomially-solvable in classic multiprocessor scheduling become NP-hard and hard to approximate when the number of types is not constant. For a constant number of types, we propose a PTAS, a fast approximation algorithm, and a series of heuristics. We simulate the algorithms on instances derived from a trace of one of Google's clusters. Algorithms aware of jobs' types lead to better performance compared with algorithms solving P||Cmax.

The notion of type enables us to model degeneration of performance caused by using standard combinatorial optimization methods. Types add a layer of additional complexity. However, our results - approximation algorithms and good average-case performance - show that types can be handled efficiently.

 
Piotr Skowron, Krzysztof Rzadca, and Anwitaman Datta. Cooperation and competition when bidding for complex projects: Centralized and decentralized perspectives. IEEE Intelligent Systems, 32(1):17-23, 2017. [ bib | DOI | http ]
To successfully complete a complex project, be it the construction of an airport or of a backbone IT system, agents (companies or individuals) must form a team having the required competences and resources. A team can be formed either by the project issuer based on individual agents' offers (centralized formation); or by the agents themselves (decentralized formation) bidding for a project as a consortium; in that case many feasible teams compete for the contract. We investigate rational strategies of the agents (what salary should they ask? with whom should they team up?). We propose concepts to characterize the stability of the winning teams and study their computational complexity.

 
Piotr Skowron and Krzysztof Rzadca. Geographically distributed load balancing with (almost) arbitrary load functions. In HiPC 2015, 22nd IEEE/ACM International Conference on High Performance Computing. IEEE, 2015. [ bib | DOI | .pdf ]
In geographically-distributed systems, communication latencies are non-negligible. The perceived processing time of a request is thus composed of the time needed to route the request to the server and the true processing time. Once a request reaches a target server, the processing time depends on the total load of that server; this dependency is described by a load function. We consider a broad class of load functions; we only require that they are convex and twice differentiable. In particular, our model can be applied to heterogeneous systems in which every server has a different load function. We present centralized and decentralized optimization algorithms for load balancing. We prove bounds on the algorithms' convergence. To the best of our knowledge these bounds were not known even for the special cases studied previously (queuing theory and batches of requests). Both algorithms are any-time and self-stabilizing.
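
A toy illustration of the load model studied above (not the paper's algorithms): the perceived latency of a request equals the network delay plus a convex load function of the target server's total load. The delays, load functions and assignments below are invented for illustration:

```python
def total_latency(assignment, delay, load_funcs):
    """assignment[i][j]: load that site i routes to server j;
    delay[i][j]: network delay; load_funcs[j]: convex load function."""
    n_servers = len(load_funcs)
    server_load = [sum(row[j] for row in assignment) for j in range(n_servers)]
    return sum(amount * (delay[i][j] + load_funcs[j](server_load[j]))
               for i, row in enumerate(assignment)
               for j, amount in enumerate(row))

delay = [[0.0, 5.0], [5.0, 0.0]]                  # two sites, two servers
funcs = [lambda x: x ** 2, lambda x: 2 * x ** 2]  # heterogeneous, convex
print(total_latency([[5.0, 5.0], [5.0, 5.0]], delay, funcs))    # balanced
print(total_latency([[10.0, 0.0], [0.0, 10.0]], delay, funcs))  # all local
```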

 
Fanny Pascual and Krzysztof Rzadca. Partition with side effects. In HiPC 2015, 22nd IEEE/ACM International Conference on High Performance Computing. IEEE, 2015. [ bib | DOI | .pdf ]
In data centers, many tasks (services, virtual machines or computational jobs) share a single physical machine. We propose a new resource management model for such colocation. Our model uses two parameters of a task, its size and its type, to characterize how a task influences the performance of the other tasks allocated on the same machine. As typically a data center hosts many similar, recurring tasks (e.g.: a webserver, a database, a CPU-intensive computation), the resource manager should be able to construct these types and their performance interactions. Moreover, realistic variants of our model are polynomially-solvable, in contrast to the NP-hard vector packing used previously. In particular, we minimize the total cost in a model in which each task's cost is a function of the total sizes of tasks allocated on the same machine (each type is counted separately). We show that for a linear cost function the problem is strongly NP-hard, but polynomially-solvable in some particular cases. We propose an algorithm polynomial in the number of tasks (but exponential in the number of types and machines); and another algorithm polynomial in the number of tasks and machines (but exponential in the number of types and admissible sizes of tasks). When there is a single type, we give a polynomial time algorithm. We also prove that, even for a single type, the problem becomes NP-hard for convex costs.
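
A hedged worked example of the cost model: each task's cost depends on the per-type total sizes of tasks colocated on its machine; the linear coefficients, sizes and types below are hypothetical:

```python
from collections import defaultdict

def total_cost(machines, alpha):
    """machines: each machine is a list of (size, type) tasks.
    alpha[(t, u)]: how much one unit of type-u size on the same machine
    costs a type-t task (the linear-cost variant)."""
    cost = 0.0
    for tasks in machines:
        per_type = defaultdict(float)
        for size, ttype in tasks:
            per_type[ttype] += size
        for _, ttype in tasks:
            cost += sum(alpha[(ttype, u)] * tot for u, tot in per_type.items())
    return cost

alpha = {("cpu", "cpu"): 1.0, ("cpu", "io"): 0.2,
         ("io", "cpu"): 0.5, ("io", "io"): 1.5}
print(total_cost([[(2, "cpu"), (1, "io")], [(3, "cpu"), (2, "io")]], alpha))
```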

 
Yiannis Georgiou, David Glesser, Krzysztof Rzadca, and Denis Trystram. A scheduler-level incentive mechanism for energy efficiency in HPC. In CCGrid 2015, 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 617-626, 2015. [ bib | DOI | .pdf ]
Energy consumption has become one of the most important factors in High Performance Computing platforms. However, while there are various algorithmic and programming techniques to save energy, a user currently has no incentive to employ them, as they might result in worse performance. We propose to manage the energy budget of a supercomputer through EnergyFairShare (EFS), a FairShare-like scheduling algorithm. FairShare is a classic scheduling rule that prioritizes jobs belonging to users who were assigned a small amount of CPU-seconds in the past. Similarly, EFS keeps track of users' consumption of watt-seconds and prioritizes those whose jobs consumed less energy. Therefore, EFS incentivizes users to optimize their code for energy efficiency. Having higher priority, jobs have smaller queuing times and, thus, smaller turn-around times. To validate this principle, we implemented EFS in a scheduling simulator and processed workloads from various HPC centers. The results show that, by reducing her energy consumption, a user will reduce her stretch (slowdown), compared to increasing her energy consumption. To validate the general feasibility of our approach, we also implemented EFS as an extension for SLURM, a popular HPC resource and job management system. We validated our plugin both by emulating a large-scale platform, and by experiments on a real cluster with monitored energy consumption. We observed smaller waiting times for energy-efficient users.
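
A minimal sketch of the EnergyFairShare bookkeeping: track each user's accumulated watt-seconds and give the waiting jobs of frugal users priority. The decay factor and data structures are simplifying assumptions, not the SLURM plugin's implementation:

```python
class EnergyFairShare:
    def __init__(self, decay=0.99):
        self.energy = {}      # user -> accumulated watt-seconds
        self.decay = decay    # periodic aging of past consumption

    def record(self, user, watts, seconds):
        self.energy[user] = self.energy.get(user, 0.0) + watts * seconds

    def age(self):
        for user in self.energy:
            self.energy[user] *= self.decay

    def pick_next(self, queue):
        """queue: list of (user, job_id); lower past energy wins."""
        return min(queue, key=lambda job: self.energy.get(job[0], 0.0))

efs = EnergyFairShare()
efs.record("alice", watts=200, seconds=3600)   # energy-hungry code
efs.record("bob", watts=120, seconds=3600)     # energy-optimized code
print(efs.pick_next([("alice", 7), ("bob", 8)]))   # bob's job runs first
```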

 
Piotr Skowron, Krzysztof Rzadca, and Anwitaman Datta. People are processors: Coalitional auctions for complex projects (extended abstract). In AAMAS 2014, 13th International Conference on Autonomous Agents and Multiagent Systems, pages 1525-1526, 2014. [ bib | http | .pdf ]
To successfully complete a complex project, be it the construction of an airport, of a backbone IT system, or a crowd-sourced project, agents (companies or individuals) must form a team (a coalition) having the required competences and resources. A team can be formed either by the project issuer based on individual agents' offers (centralized formation); or by the agents themselves (decentralized formation) bidding for a project as a consortium; in that case many feasible teams compete for the employment contract. In these models, we investigate rational strategies of the agents (what salary should they ask? with whom should they team up?) under different organizations of the market. We propose various concepts that characterize the stability of the winning teams. We show that there may be no (rigorously) strongly winning coalition, but the weakly winning and the auction-winning coalitions are guaranteed to exist. In a general setting, with an oracle that decides whether a coalition is feasible, we show how to find winning coalitions with a polynomial number of calls to the oracle. We also determine the complexity of the problem in a special case in which a project is a set of independent tasks. Each task must be processed by a single agent, but processing speeds differ between agents and tasks.

 
Joseph Emeras, Vinicius Pinheiro, Krzysztof Rzadca, and Denis Trystram. Ostrich: Fair scheduling for multiple submissions. In PPAM 2013, International Conference on Parallel Processing and Applied Mathematics, volume 8385 of LNCS, pages 26-37. Springer, 2014. [ bib | .pdf ]
Campaign Scheduling is characterized by multiple job submissions issued from multiple users over time. This model suits today's systems well, since most available parallel environments have multiple users sharing a common infrastructure. When individually scheduling the jobs submitted by various users, a crucial issue is to ensure fairness. This work presents a new fair scheduling algorithm called OStrich whose principle is to maintain a virtual time-sharing schedule in which the same number of processors is assigned to each user. The completion times in the virtual schedule determine the execution order on the physical processors. The campaigns are thus interleaved in a fair way by OStrich. For independent sequential jobs, we show that OStrich guarantees the stretch of a campaign to be proportional to the campaign's size and the total number of users. The stretch measures by what factor a workload is slowed down relative to the time it takes on an unloaded system. The theoretical performance of our solution is assessed by simulating OStrich and the classical FCFS algorithm on synthetic workload traces generated from two different user profiles, demonstrating how OStrich benefits both types of users, in contrast to FCFS.
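
A deliberately simplified sketch of the OStrich ordering principle, assuming sequential jobs all submitted at time 0: in the virtual schedule every user gets an equal share of the m processors, and jobs are released for real execution in order of their virtual completion times:

```python
def ostrich_order(campaigns, m):
    """campaigns: dict user -> list of job processing times (all at time 0)."""
    share = m / len(campaigns)    # processors per user in the virtual schedule
    virtual = []
    for user, jobs in campaigns.items():
        done = 0.0
        for j, p in enumerate(jobs):
            done += p
            virtual.append((done / share, user, j))   # virtual completion time
    return [(user, j) for _, user, j in sorted(virtual)]

# small campaigns finish early in the virtual schedule, so they go first
print(ostrich_order({"u1": [10, 10], "u2": [1, 1, 1], "u3": [5]}, m=3))
```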

 
Piotr Skowron and Krzysztof Rzadca. Fair share is not enough: measuring fairness in scheduling with cooperative game theory. In PPAM 2013, International Conference on Parallel Processing and Applied Mathematics, volume 8385 of LNCS, pages 38-48. Springer, 2014. [ bib | .pdf ]
We consider the problem of fair scheduling in a multi-organizational system in which organizations contribute their own resources to the global pool as well as jobs to be processed on the common resources. We consider on-line, non-clairvoyant scheduling of sequential jobs without preemption. The organizations must agree on the order of executing the jobs, i.e., on the scheduling algorithm. To ensure that the organizations are willing to cooperate, the scheduling algorithm must be fair.

To characterize fairness, we use a cooperative game theory approach. The contribution of an organization is computed based on how this organization influences the utility (which can be any metric, e.g., flow time, turnaround, resource allocation, etc.) of all organizations. Formally, the contribution of an organization is its Shapley value in the cooperative game. The scheduling algorithm should ensure that the contributions of the organizations are close to their utilities. Our previous work proves that this problem is NP-hard and hard to approximate.

In this paper we propose a heuristic scheduling algorithm for the fair scheduling problem. We experimentally evaluate the heuristic and compare its fairness to other algorithms (fair share, round robin) and to the exact exponential algorithm. Our results show that the fairness of the heuristic algorithm is close to optimal. The difference between our heuristic and the fair share algorithm is more visible on longer traces with more organizations. We believe that these results show that fair share might not be an optimal solution for multi-organizational systems.

 
Piotr Skowron and Krzysztof Rzadca. Non-monetary fair scheduling - a cooperative game theory approach. In SPAA 2013, 25th ACM Symposium on Parallelism in Algorithms and Architectures, pages 288-297. ACM, 2013. [ bib | DOI | http | .pdf ]
We consider a multi-organizational system in which each organization contributes processors to the global pool but also jobs to be processed on the common resources. The fairness of the scheduling algorithm is essential for the stability and even for the existence of such systems (as organizations may refuse to join an unfair system).

We consider on-line, non-clairvoyant scheduling of sequential jobs. The started jobs cannot be stopped, canceled, preempted, or moved to other processors. We consider identical processors, but most of our results can be extended to related or unrelated processors.

We model the fair scheduling problem as a cooperative game and we use the Shapley value to determine the ideal fair schedule. In contrast to the current literature, we do not use money to assess the relative utilities of jobs. Instead, to calculate the contribution of an organization, we determine how the presence of this organization influences the performance of other organizations. Our approach can be used with an arbitrary utility function (e.g., flow time, tardiness, resource utilization), but we argue that the utility function should be strategy-resilient: the organizations should be discouraged from splitting, merging or delaying their jobs. We present the unique (up to multiplicative and additive constants) strategy-resilient utility function.

We show that the problem of fair scheduling is NP-hard and hard to approximate. However, for unit-size jobs, we present a fully polynomial-time randomized approximation scheme (FPRAS). We also show that the problem parametrized with the number of organizations is fixed parameter tractable (FPT). In cooperative game theory, the Shapley value is considered in many contexts as “the” fair solution. Our results show that, although the problem is computationally hard for a large number of organizations, this solution concept can be used in scheduling (for instance, as a benchmark for measuring the fairness of heuristic algorithms).
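
A minimal sketch of an exact Shapley value computation over organizations. The characteristic function v below is a hypothetical placeholder; in the paper the value of a coalition is derived from scheduling utilities (e.g., flow time), and computing it exactly is the hard part:

```python
from itertools import permutations

def shapley(players, v):
    """Average marginal contribution over all orderings; exponential in the
    number of players, so only usable for a handful of organizations."""
    value = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            value[p] += v(frozenset(coalition)) - before
    return {p: value[p] / len(orders) for p in players}

# toy characteristic function: coalition utility grows with pooled processors
procs = {"orgA": 10, "orgB": 10, "orgC": 1}
print(shapley(list(procs), lambda c: sum(procs[p] for p in c) ** 0.5))
```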

 
Piotr Skowron and Krzysztof Rzadca. Network delay-aware load balancing in selfish and cooperative distributed systems. In HCW 2013, 22nd International Heterogeneity in Computing Workshop (in conjunction with IPDPS 2013), IPDPSW, pages 7-18. IEEE, 2013. [ bib | http | .pdf ]
We consider a geographically distributed request processing system composed of various organizations and their servers connected by the Internet. The latency a user observes is a sum of communication delays and the time needed to handle the request on a server. The handling time depends on the server congestion, i.e., the total number of requests a server must handle. We analyze the problem of balancing the load in a network of servers in order to minimize the total observed latency. We consider both cooperative and selfish organizations (each organization aiming to minimize the latency of the locally-produced requests). The problem can be generalized to task scheduling in a distributed cloud, or to content delivery in organizationally-distributed CDNs.

In a cooperative network, we show that the problem is polynomially solvable. We also present a distributed algorithm iteratively balancing the load. We show how to estimate the distance between the current solution and the optimum based on the amount of load exchanged by the algorithm. During the experimental evaluation, we show that the distributed algorithm is efficient, therefore it can be used in networks with dynamically changing loads.

In a network of selfish organizations, we prove that the price of anarchy (the worst-case loss of performance due to selfishness) is low when the network is homogeneous and the servers are loaded (the request handling time is high compared to the communication delay). After relaxing these assumptions, we assess the loss of performance caused by the selfishness experimentally, showing that it remains low.

Our results indicate that a set of servers handling requests, connected in a heterogeneous network, can be efficiently managed by a distributed algorithm. Additionally, even if the network is organizationally distributed, with individual organizations optimizing performance of their requests, the network remains efficient.

 
Pierre-Francois Dutot, Fanny Pascual, Krzysztof Rzadca, and Denis Trystram. Approximation algorithms for the multi-organization scheduling problem. IEEE Transactions on Parallel and Distributed Systems, 22:1888-1895, 2011. [ bib | DOI | .pdf ]
The distributed nature of new computing platforms results in the problem of scheduling parallel jobs produced by several independent organizations that have each their own rules. They have no direct control over the whole system, thus it is necessary to revisit classical scheduling with locality constraints. In this article, we consider distributed computing systems in which each organization has its own resources. Each organization aims at minimizing the execution times of its own jobs. We introduce a global centralized mechanism for designing a collaborative solution that improves the global performance of the system while respecting organizations' selfish objectives. The proposed algorithm is proved to have an approximation ratio equal to 3 over the global optimal makespan and this bound is shown to be asymptotically tight (when the number of organizations is large). Several variants of this problem are also studied. Then, we derive another algorithm that improves in practice these solutions by further balancing the schedules. Finally, we provide some experiments based on simulations that demonstrate a very good efficiency of this last algorithm on typical instances.

 
P.-F Dutot, K. Rzadca, E. Saule, and D. Trystram. Multi-objective scheduling. In Y. Robert and F. Vivien, editors, Introduction to Scheduling, Computational Science, pages 219-251. Chapman & Hall/CRC, 2009. [ bib | http ]
 
F. Pascual, K. Rzadca, and D. Trystram. Cooperation in multi-organization scheduling. Concurrency and Computation: Practice and Experience, 21:905-921, 2009. [ bib | DOI | .pdf ]
The distributed nature of the grid results in the problem of scheduling parallel jobs produced by several independent organizations that have partial control over the system. We consider systems in which each organization owns a cluster of processors. Each organization wants its tasks to be completed as soon as possible. In this paper, we model an off-line system consisting of N identical clusters of m processors. We show that it is always possible to produce a collaborative solution that respects participants' selfish goals, at the same time improving the global performance of the system. We propose an algorithm (called MOLBA) with a guaranteed worst-case performance ratio on the global makespan, equal to 4. Next, we show that a better bound (equal to 3) can be obtained in a specific case when the last completed job requires at most m-2 processors. Then, we derive another algorithm (called ILBA) that in practice improves the proposed, guaranteed solution by further balancing the schedules. Finally, by an extensive evaluation by simulation, we show that the algorithms are efficient on typical instances.

 
K. Rzadca and D. Trystram. Promoting cooperation in selfish computational grids. European Journal of Operational Research, 199:647-657, 2009. [ bib | DOI | .pdf ]
In distributed computing, the recent paradigm shift from centrally-owned clusters to organizationally distributed computational grids introduces a number of new challenges in resource management and scheduling. In this work, we study the problem of Selfish Load Balancing which extends the well-known Load Balancing (LB) problem to scenarios in which each processor is concerned only with the performance of its local jobs. We propose a simple mathematical model for such systems and a novel function for computing the cost of the execution of foreign jobs. Then, we use the game-theoretic framework to analyze the model in order to compute the expected result of LB performed in a grid formed by two clusters. We show that, firstly, LB is a socially-optimal strategy, and secondly, for similarly loaded clusters, it is sufficient to collaborate during longer time periods in order to make LB the dominant strategy for each cluster. However, we show that if we allow clusters to make decisions depending on their current queue length, LB will never be performed. Then, we propose a LB algorithm which balances the load more equitably, even in the presence of overloaded clusters. Our algorithms do not use any external forms of compensation (such as money). The load is balanced only by considering the parameters of execution of jobs. This analysis is assessed experimentally by simulation, involving scenarios with multiple clusters and heterogeneous load.
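
A toy illustration of the game-theoretic framing, with invented payoffs (e.g., minus the expected completion time of local jobs): in a one-shot game, keeping jobs local can dominate balancing, which is why the analysis above relies on collaboration over longer time periods to make load balancing attractive:

```python
# Hypothetical payoffs for two clusters; higher is better for that cluster.
payoffs = {
    ("balance", "balance"):       (8, 8),
    ("balance", "keep_local"):    (3, 9),
    ("keep_local", "balance"):    (9, 3),
    ("keep_local", "keep_local"): (5, 5),
}

def is_dominant(action, player):
    """True if 'action' is never worse for 'player', whatever the other does."""
    actions = ["balance", "keep_local"]
    for opp in actions:
        for alt in actions:
            mine, other = [None, None], [None, None]
            mine[player], mine[1 - player] = action, opp
            other[player], other[1 - player] = alt, opp
            if payoffs[tuple(mine)][player] < payoffs[tuple(other)][player]:
                return False
    return True

print(is_dominant("balance", 0), is_dominant("keep_local", 0))  # False True
```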

 
M. Krystek, K. Kurowski, A. Oleksiak, and K. Rzadca. Comparison of centralized and decentralized scheduling algorithms using the GSSIM simulation environment. In Paraskevi Fragopoulou, Sergei Gorlatch, and Thierry Priol, editors, Grid Computing: Achievements and Prospects, pages 185-196. Springer, 2008. [ bib | DOI | .pdf ]
 
F. Pascual, K. Rzadca, and D. Trystram. Cooperation in multi-organization scheduling. In Euro-Par 2007 Proceedings, volume 4641 of LNCS, pages 224-233. Springer, 2007. Conference version of [14]. [ bib | DOI ]
 
K. Rzadca. Scheduling in multi-organization grids: Measuring the inefficiency of decentralization. In PPAM 2007 Proceedings, number 4967 in LNCS, pages 1048-1058. Springer, 2007. [ bib | DOI | .pdf ]
We present a novel, generic model of the grid that emphasises the roles of individual organizations that form the system. The model allows us to study the global behaviour of the system without introducing external forms of recompense. Using game-theory and equitable multicriteria optimization, we study three diverse types of computational grids: an off-line system with dedicated uniprocessors, an on-line system with divisible load and an off-line system with parallel jobs. Results show that, unless strong assumptions are made, the complete decentralization leads to a significant loss of performance.

 
K. Rzadca, D. Trystram, and A. Wierzbicki. Fair game-theoretic resource management in dedicated grids. In IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2007), Proceedings. IEEE Computer Society, 2007. [ bib | DOI | .pdf ]
We study two problems directly resulting from the organizational decentralization of the Grid: firstly, the problem of fair scheduling in systems in which the grid scheduler has complete control of processors' schedules; secondly, the problem of fair and feasible scheduling in the decentralized case, in which the grid scheduler can only suggest a schedule, which can be later modified by a processor's owner. Using game theory, we show that scheduling in the decentralized case is analogous to the Prisoner's Dilemma game. Moreover, the Nash equilibrium results in a significant performance drop. Therefore, a strong community control is required to achieve acceptable performance.

 
T. Roeblitz and K. Rzadca. On the placement of reservations into job schedules. In Euro-Par 2006 Proceedings, volume 4128 of LNCS, pages 198-210. Springer, 2006. [ bib | DOI | .pdf ]
We present a new method for determining placements of flexible reservation requests into a schedule. For each considered placement, the what-if method inserts a placeholder into the schedule and simulates the processing of batch jobs currently known to the system. Each placement is evaluated with respect to well-known scheduling metrics. This information may be used by a Grid reservation service to choose the most likely successful placement of a reservation. According to the results of extensive simulations, the what-if method grants more reservations and improves the performance of local jobs compared to our previously used load method.

 
K. Rzadca and D. Trystram. Brief announcement: Promoting cooperation in selfish grids. In SPAA 2006 (Annual ACM Symposium on Parallelism in Algorithms & Architectures) Proceedings, page 332. ACM Press, 2006. Conference version of [15]. [ bib ]
 
G. Wojtyla, K. Rzadca, and F. Seredynski. Artificial immune systems applied to multiprocessor scheduling. In PPAM 2005 Proceedings, volume 3911 of LNCS, pages 904-911. Springer, 2006. [ bib ]
 
K. Rzadca and F. Seredynski. Heterogeneous multiprocessor scheduling with differential evolution. In IEEE CEC 2005 (Congress of Evolutionary Computation) Proceedings, pages 2840-2847. IEEE Computer Society, 2005. [ bib | DOI | .pdf ]

distributed systems

 
Grzegorz Milka and Krzysztof Rzadca. Dfuntest: A testing framework for distributed applications. In PPAM 2017, International Conference on Parallel Processing and Applied Mathematics (accepted), LNCS. Springer, 2017. [ bib ]
 
Piotr Skowron and Krzysztof Rzadca. Flexible replica placement for optimized p2p backup on heterogeneous, unreliable machines. Concurrency and Computation: Practice and Experience, 28(7):2166-2186, 2016. [ bib | DOI | http | .pdf ]
P2P architecture is a viable option for enterprise backup. In contrast to dedicated backup servers, nowadays a standard solution, making backups directly on an organization's workstations should be cheaper, as existing hardware is used; more efficient, as there is no single bottleneck server; and more reliable, as the machines can be geographically dispersed.

We present an architecture of a p2p backup system that uses pairwise replication contracts between a data owner and a replicator. In contrast to a standard p2p storage system using a DHT directly, the contracts allow our system to optimize replicas' placement depending on a specific optimization strategy, and so to take advantage of the heterogeneity of the machines and the network. Such optimization is particularly appealing in the context of backup: replicas can be geographically dispersed, the load sent over the network can be minimized, or the optimization goal can be to minimize the backup/restore time. However, managing the contracts, keeping them consistent and adjusting them in response to a dynamically changing environment is challenging.

We built a scientific prototype and ran experiments on 150 workstations in our university's computer laboratories and, separately, on 50 PlanetLab nodes. We found out that the main factor affecting the performance of the system is the availability of the machines. Yet, our main conclusion is that it is possible to build an efficient and reliable backup system on highly unavailable machines, as our computers had just 13% average availability.

 
Krzysztof Rzadca, Anwitaman Datta, Gunnar Kreitz, and Sonja Buchegger. Game-theoretic mechanisms to increase data availability in decentralized storage systems. ACM Transactions on Autonomous and Adaptive Systems, 10(3):14:1-14:32, 2015. [ bib | DOI | .pdf ]
In a decentralized storage system, agents replicate each other's data in order to increase availability. Compared to organizationally-centralized solutions, such as cloud storage, a decentralized storage system requires less trust in the provider and may result in smaller monetary costs. Our system is based on reciprocal storage contracts which allow the agents to adapt to changes in their replication partners' availability (by dropping inefficient contracts and forming new contracts with other partners). The data availability provided by the system is a function of the participating agents' availability. However, a straightforward system in which agents' matching is decentralized uses the given agent availability inefficiently. As agents are autonomous, the highly-available agents form cliques replicating data between each other, which makes the system too hostile for the weakly-available newcomers. In contrast, a centralized, equitable matching is not incentive-compatible: it does not reward users for keeping their software running.

We solve this dilemma by a mixed solution: an “adoption” mechanism in which highly-available agents donate some replication space, which in turn is used to help the worst-off agents. We show that the adoption mechanism motivates agents to increase their availability (it is incentive-compatible), but also that it suffices to provide acceptable data availability to weakly-available agents.

 
Piotr Skowron and Krzysztof Rzadca. Exploring heterogeneity of unreliable machines for p2p backup. In HPCS 2013, International Conference on High Performance Computing & Simulation, pages 91-98. IEEE, 2013. [ bib | DOI | http | .pdf ]
P2P architecture is a viable option for enterprise backup. In contrast to dedicated backup servers, nowadays a standard solution, making backups directly on an organization's workstations should be cheaper (as existing hardware is used), more efficient (as there is no single bottleneck server) and more reliable (as the machines are geographically dispersed).

We present the architecture of a p2p backup system that uses pairwise replication contracts between a data owner and a replicator. In contrast to standard p2p storage systems using a DHT directly, the contracts allow our system to optimize replicas' placement depending on a specific optimization strategy, and so to take advantage of the heterogeneity of the machines and the network. Possible goals include geographical dispersion of the replicas, minimization of the backup or restore time, or of the usage of long-range links. However, managing the contracts, keeping them consistent and adjusting them in response to a dynamically changing environment is challenging.

We built a scientific prototype and ran the experiments on 150 workstations in the university's computer laboratories and, separately, on 50 PlanetLab nodes. We found out that the main factor affecting the quality of the system is the availability of the machines. Yet, our main conclusion is that it is possible to build an efficient and reliable backup system on highly unreliable machines (our computers had just 13% average availability).

 
Anwitaman Datta, Sonja Buchegger, Le-Hung Vu, Thorsten Strufe, and Krzysztof Rzadca. Decentralized online social networks. In B. Furht, editor, Handbook of Social Network Technologies and Applications, pages 349-378. Springer, 2011. [ bib | http ]
 
Anwitaman Datta, Krzysztof Rzadca, Sally Ang, and Goh Chee Hong. Serverless social software for nomadic collaboration. In A. Murugan, editor, e-Research Collaboration: Theory, Techniques and Challenges, pages 85-103. Springer, New York Dordrecht Heidelberg London, 2010. [ bib | http ]
Recent portable devices, from sophisticated mobile phones to netbooks, thanks to wireless networking and powerful batteries, give hardware support for collaborative work on the go, even when an Internet connection is not available. Yet, current collaboration software, from Concurrent Versions System (CVS) or Subversion (SVN) repositories to Google Docs and collaborative versions of commercial applications (CoWord, CoMaya), requires a dedicated server to synchronize clients, and thus a stable network connection. In this chapter, we present two tools that use the peer-to-peer paradigm to build serverless collaboration networks. PBDMS enables users to share, search and review bibliographic databases. SharedMind provides collaborative document editing to FreeMind, a popular, open-source mind-mapping software. Both tools handle disconnections and network divisions, enabling users to continue their work and to synchronize with their reachable peers. Both tools have been implemented and tested at a small scale. PBDMS is available for download at http://code.google.com/p/bibliographicsocialinfosys/; SharedMind is available at http://code.google.com/p/sharedmind . We believe that such seamless, flexible collaboration applications provide the degree of freedom promised by recent portable devices, yet not fully exploited by current applications.

 
Krzysztof Rzadca, Anwitaman Datta, and Sonja Buchegger. Replica placement in p2p storage: Complexity and game theoretic analyses. In ICDCS 2010, The 30th International Conference on Distributed Computing Systems, Proceedings, pages 599-609. IEEE Computer Society, 2010. [ bib | DOI | .pdf ]
In peer-to-peer storage systems, peers replicate each others' data in order to increase availability. If the matching is done centrally, the algorithm can optimize data availability in an equitable manner for all participants. However, if matching is decentralized, the peers' selfishness can greatly alter the results, leading to performance inequities that can render the system unreliable and thus ultimately unusable. We analyze the problem using both theoretical approaches (complexity analysis for the centralized system, game theory for the decentralized one) and simulation. We prove that the problem of optimizing availability in a centralized system is NP-hard. In decentralized settings, we show that the rational behavior of selfish peers will be to replicate only with similarly-available peers. Compared to the socially-optimal solution, highly available peers have their data availability increased at the expense of decreased data availability for less available peers. The price of anarchy is high: unbounded in one model, and linear with the number of time slots in the second model. We also propose centralized and decentralized heuristics that, according to our experiments, converge fast in the average case. The high price of anarchy means that a completely decentralized system could be too hostile for peers with low availability, who could never achieve satisfying replication parameters. Moreover, we experimentally show that even explicit consideration and exploitation of diurnal patterns of peer availability has a small effect on the data availability, except when the system has a truly global scope. Yet a fully centralized system is infeasible, not only because of problems in information gathering, but also because of the complexity of optimizing availability. The solution to this dilemma is to create system-wide cooperation rules that allow a decentralized algorithm, but also limit the selfishness of the participants.
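
A worked example of the underlying availability arithmetic (assuming independent peer up-times): data replicated on a set of peers is available unless all replicas are down at once, so pairing peers of similar availability boosts the strong and leaves the weak with poor parameters, as described above. The availabilities are made-up numbers:

```python
def data_availability(replica_availabilities):
    """1 - probability that every replica is offline at the same time."""
    unavailable = 1.0
    for a in replica_availabilities:
        unavailable *= 1.0 - a
    return 1.0 - unavailable

strong, weak = 0.9, 0.3
print(data_availability([strong, strong]))   # 0.99: strong peers pair up
print(data_availability([weak, weak]))       # 0.51: weak peers left together
print(data_availability([strong, weak]))     # ~0.93: mixed, more equitable
```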

 
Krzysztof Rzadca, Jackson Tan Teck, and Anwitaman Datta. Multi-objective optimization of multicast overlay for collaborative applications. Computer Networks, 54(12):1986-2005, 2010. [ bib | DOI | .pdf ]
Current implementations of real-time collaborative applications rely on a dedicated infrastructure to carry out all synchronizing and communication functions, and require all end nodes to communicate directly with and through the central server. In this paper, we investigate scenarios in which the most resource-intensive functionality, the continuous communication among collaborators to disseminate changes, is decentralized, utilizing the end users as relays. We observe that the communication characteristics of real-time collaboration make existing multicast mechanisms unsuitable. As collaborative editing sessions are typically long, we are able to gather and then use additional parameters of nodes (their instabilities and frequency of sending updates) and communication links (latencies and average costs). We identify several criteria to determine the quality of a multicast tree: cost, latency and instability. We analyze the complexity of these problems and propose algorithms to optimize the communication topology. We also consider the multiobjective problem in which we search for a tree that results in a good trade-off between these measures. Validation of the algorithms on numerous graphs shows that it is important to consider the multiobjective problem, as optimal solutions for one performance measure can be far from the optimal values of the others. Finally, we briefly present an implementation of a communication library that uses the proposed algorithms to periodically adjust the dissemination tree.
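
A hedged sketch of the three tree-quality criteria named above, computed for a dissemination tree rooted at the update source; the exact definitions in the paper may differ (here: cost is the sum of edge costs, latency the deepest root-to-node delay, instability the summed instability of relay nodes):

```python
def tree_metrics(children, root, edge_cost, edge_delay, instability):
    cost, max_latency, instab = 0.0, 0.0, 0.0
    stack = [(root, 0.0)]
    while stack:
        node, delay = stack.pop()
        if children.get(node):
            instab += instability[node]   # a relay leaving disrupts its subtree
        for child in children.get(node, []):
            cost += edge_cost[(node, child)]
            max_latency = max(max_latency, delay + edge_delay[(node, child)])
            stack.append((child, delay + edge_delay[(node, child)]))
    return cost, max_latency, instab

children = {"src": ["a", "b"], "a": ["c"]}        # hypothetical overlay tree
edge_cost = {("src", "a"): 1, ("src", "b"): 2, ("a", "c"): 1}
edge_delay = {("src", "a"): 10, ("src", "b"): 40, ("a", "c"): 25}
instability = {"src": 0.0, "a": 0.2, "b": 0.5, "c": 0.1}
print(tree_metrics(children, "src", edge_cost, edge_delay, instability))
```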

 
Adam Wierzbicki, Anwitaman Datta, Lukasz Zaczek, and Krzysztof Rzadca. Supporting collaboration and creativity through mobile p2p computing. In X. Shen, H. Yu, J. Buford, and M. Akon, editors, Handbook of Peer-to-Peer Networking, pages 1367-1400. Springer, 2010. [ bib | DOI ]
 
K. Rzadca, J. Tan Teck Yong, and A. Datta. Multicast trees for collaborative applications. In 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, Proceedings, pages 60-67. IEEE Computer Society, 2009. [ bib | DOI | .pdf ]
Current implementations of real-time collaborative applications rely on a dedicated infrastructure to carry out all synchronizing and communication functions, and require all end nodes to communicate directly with and through the central server. In this paper, we investigate an architecture in which the most resource-intensive functionality, the continuous communication among collaborators to disseminate changes, is decentralized, utilizing the end users as relays. We observe that the communication characteristics of real-time collaboration make existing multicast mechanisms unsuitable. As collaborative editing sessions are typically long, we are able to gather and then use additional parameters of nodes (their instabilities and frequency of sending updates) and communication links (latencies and average costs). We identify several criteria to determine the quality of a multicast tree: cost, latency and instability. We analyze the complexity of these problems and propose algorithms to optimize the communication topology. We also consider the multiobjective problem in which we search for a tree that results in a good trade-off between these measures. Validation of the algorithms on numerous graphs shows that it is important to consider the multiobjective problem, as optimal solutions for one performance measure can be far from the optimal values of the others.

trust

 
Liu Xin, Anwitaman Datta, and Krzysztof Rzadca. Trust beyond reputation: A computational trust model based on stereotypes. Electronic Commerce Research and Applications, 12:24-39, January-February 2013. [ bib | DOI | http ]
Models of computational trust support users in taking decisions. They are commonly used to guide users' judgements in online auction sites, or to determine the quality of contributions in Web 2.0 sites. However, most existing systems require historical information about the past behavior of the specific agent being judged. In contrast, in real life, to anticipate and to predict a stranger's actions in the absence of knowledge of such behavioral history, we often use our "instinct" - essentially stereotypes developed from our past interactions with other "similar" persons. In this paper, we propose StereoTrust, a computational trust model inspired by stereotypes as used in real life. A stereotype contains certain features of agents and an expected outcome of the transaction. When facing a stranger, an agent derives its trust by aggregating stereotypes matching the stranger's profile. Since stereotypes are formed locally, recommendations stem from the trustor's own personal experiences and perspective. Historical behavioral information, when available, can be used to refine the analysis. According to our experiments using the Epinions.com dataset, StereoTrust compares favorably with existing trust models that use different kinds of information and more complete historical information.
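
A minimal sketch of the stereotype idea: the trustor groups its own past transactions by partner features and estimates a stranger's trust from the groups that match the stranger's profile. The grouping, weighting and example data below are simplified assumptions, not the paper's model:

```python
from collections import defaultdict

def build_stereotypes(history):
    """history: list of (features, outcome), outcome 1 = good, 0 = bad."""
    groups = defaultdict(lambda: [0, 0])   # feature -> [good, total]
    for features, outcome in history:
        for f in features:
            groups[f][0] += outcome
            groups[f][1] += 1
    return groups

def stereo_trust(groups, stranger_features, prior=0.5):
    matching = [groups[f] for f in stranger_features if f in groups]
    if not matching:
        return prior                       # no matching stereotype: fall back
    good = sum(g for g, _ in matching)
    total = sum(t for _, t in matching)
    return good / total                    # groups weighted by their size

history = [({"seller", "electronics"}, 1), ({"seller", "books"}, 1),
           ({"new_account", "electronics"}, 0)]
print(stereo_trust(build_stereotypes(history), {"seller", "electronics"}))
```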

 
A. Hupa, K. Rzadca, A. Wierzbicki, and A. Datta. Interdisciplinary matchmaking: Choosing collaborators by skill, acquaintance and trust. In Ajith Abraham, Aboul-Ella Hassanien, and Vaclav Snasel, editors, Computational Social Network Analysis, Computer Communication and Networks, pages 319-347. Springer, 2010. [ bib | DOI | http ]
 
Liu Xin, A. Datta, K. Rzadca, and Lim Ee-Peng. Stereotrust: A group based personalized trust model. In CIKM 2009, The 18th ACM Conference on Information and Knowledge Management, Proceedings, pages 7-16. ACM Press, 2009. [ bib | DOI | .pdf ]
Trust plays important roles in diverse decentralized environments, including our society at large. Computational trust models help to, for instance, guide users' judgements in online auction sites about other users, or determine the quality of contributions in Web 2.0 sites. Most of the existing trust models, however, require historical information about the past behavior of a specific agent being evaluated - information that is not always available. In contrast, in real-life interactions among users, in order to make a first guess about the trustworthiness of a stranger, we commonly use our "instinct" - essentially stereotypes developed from our past interactions with "similar" people. We propose StereoTrust, a computational trust model inspired by real-life stereotypes. A user forms stereotypes using her previous transactions with other agents. A stereotype contains certain features of agents and an expected outcome of the transaction. These features can be taken from agents' profile information, or agents' observed behavior in the system. When facing a stranger, the stereotypes matching the stranger's profile are aggregated to derive his expected trust. Additionally, when some information about the stranger's previous transactions is available, StereoTrust uses it to refine the stereotype matching. According to our experiments, StereoTrust compares favorably with existing trust models that use different kinds of information and more complete historical information. Moreover, because evaluation is done according to the user's personal stereotypes, the system is completely distributed and the result obtained is personalized. StereoTrust can be used as a complementary mechanism to provide the initial trust value for a stranger, especially when there are no trusted, common third parties.

 
T. Kaszuba, K. Rzadca, and A. Wierzbicki. Discovering the most trusted agents without central control. In Embedded and Ubiquitous Computing, IEEE/IFIP International Conference on, volume 2, pages 616-621. IEEE Computer Society, 2008. [ bib | DOI ]

data clustering

 
K. Rzadca. Evaluating the quality of maximum variance cluster algorithms. In ICCVG 2004 (International Conference on Computer Vision and Graphics) Proceedings, volume 32 of Computational Imaging and Vision, pages 981-986. Springer, 2006. [ bib ]
 
K. Rzadca and F. Ferri. Incrementally assessing cluster tendencies with a maximum variance cluster algorithm. In IbPRIA (Iberian Conference on Pattern Recognition and Image Analysis) Proceedings, volume 2653 of LNCS, pages 868-875. Springer, 2003. [ bib | .pdf ]

generated by bibtex2html 1.96.