Very few companies run 24/7 Hadoop/HDFS clusters at the scale Facebook does. As scale increases, HDFS shows its limitations in scalability, performance, and availability. HDFS was designed more than ten years ago, based on assumptions that were valid in 2005. Over the years, numerous significant improvements have been made to the system; however, storage, network, and compute technology have changed at different rates during the last decade. In this paper we describe Warm Storage, a ground-up solution that exploits technology inflection points that exist in 2015 and that we believe will continue into the future. We describe the architecture and some implementation details of this novel, more efficient, and more available storage system optimized for data warehouse (DW) workloads.
Kestutis Patiejunas is a software engineer at Facebook. He works with a team of engineers on Facebook Warm and Hot Storage – an internal replacement for HDFS. Before that, he worked on Cold Storage – a system for long-term data archival, where he created hard-drive-based cold storage and...
RADOS is the layer of the Ceph storage system responsible for data storage and replication. In this presentation, Sam Just, the PTL of the Ceph core team, will discuss developments in RADOS that are new in the recent Jewel release, as well as exciting upcoming work.
Ceph is the leading open-source storage management platform for private clouds and large-scale clustered systems. As flash-based storage has come into the mainstream, many industry best practices must be re-examined to realize the full value of flash while remaining cost-effective.
Ceph has been rapidly evolving to support large-scale deployments of flash. This presentation will examine the history of, and current best practices for, deploying flash with Ceph. Future developments in the Ceph platform will also be described, along with their impact on flash deployments.
GlusterFS is an open-source, (mostly) POSIX-compliant distributed filesystem originally written by Gluster Inc. and now maintained by Red Hat Inc. Here at Facebook it had humble beginnings: a single rack of machines, serving only a single use case. Over the next four years it grew to thousands of machines, hosting tens of petabytes of data. This presentation is the story of how this transformation occurred and what we did to make it happen.
I'll cover how we manage and automate thousands of GlusterFS bricks, as well as dive into some of the patches we've contributed to the community to make GlusterFS easier to manage, monitor, and scale.
Richard Wareing has been a Production Engineer at Facebook for over 7 years, with a passion for storage engineering. During the course of his career there, he helped scale their GlusterFS (POSIX) install base from nothing to one of the largest in the world. From there...
Ceph is a popular storage backend for OpenStack deployments, in use for block devices via RBD/Cinder and for object storage with RGW. Ceph also includes a filesystem, CephFS, which is suitable for integration with Manila, the OpenStack shared filesystem service.
This presentation introduces the new CephFS native driver for Manila, describing the implementation of the driver and how to deploy and use it. Areas of particular interest for Ceph users will include how CephFS snapshots map to Manila's snapshot API and how Ceph's authentication scheme can be used with Manila.
In addition, the direction of future work will be described, including how the driver can be extended to provide an NFS share service based on CephFS, and how this work would integrate with VSOCK-based hypervisor-mediated filesystem access.
GlusterFS's new tiering feature helps ensure that data is available and accessible at the correct performance level, satisfying both cost and performance concerns. GlusterFS tiering monitors the activity level of data and automatically promotes active data to hot, SSD-based storage while demoting inactive data to cold, SATA-based storage; data movement can therefore happen in either direction, driven by access frequency. This presentation will explain the automated tiering feature, follow it with a live demonstration, and provide performance benchmark results from different hardware configurations. The demonstration from Quanta/QCT labs (commodity x86 hardware) will show how to implement a hot tier based on SSD drives (optimized for performance) and attach it to an erasure-coded GlusterFS volume based on SATA drives (optimized for cost and capacity).
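To make the promote/demote behavior concrete, here is a minimal Python sketch of a tier-migration decision loop. The thresholds, file-record fields, and function names are illustrative assumptions, not GlusterFS internals, which track access counters inside the filesystem itself.

    import time

    # Hypothetical thresholds; GlusterFS exposes analogous tunables
    # (watermarks, promote/demote frequencies), but these names are made up.
    PROMOTE_HITS = 10          # accesses that mark a file as "hot"
    DEMOTE_IDLE_SECS = 3600    # idle time after which a hot file is demoted

    def plan_migrations(files, now=None):
        """files: dicts with 'name', 'tier', 'hits', and 'last_access'."""
        now = time.time() if now is None else now
        moves = []
        for f in files:
            if f["tier"] == "cold" and f["hits"] >= PROMOTE_HITS:
                moves.append((f["name"], "cold", "hot"))   # promote to SSD tier
            elif f["tier"] == "hot" and now - f["last_access"] > DEMOTE_IDLE_SECS:
                moves.append((f["name"], "hot", "cold"))   # demote to SATA tier
        return moves

    sample = [
        {"name": "a.db", "tier": "cold", "hits": 42, "last_access": 0},
        {"name": "b.bak", "tier": "hot", "hits": 0, "last_access": 0},
    ]
    for name, src, dst in plan_migrations(sample, now=7200):
        print(f"migrate {name}: {src} -> {dst}")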
I am a Senior Storage Architect at Red Hat with extensive experience with Red Hat's software-defined storage products. I help the business unit create reference architectures for the storage products with various software and hardware partners. I have architected and launched the test...
Replication is an important aspect of any storage solution, both for backing up data and for ensuring high availability. It is non-trivial in distributed systems because achieving consistency and HA with only 2x replication is difficult: with two copies there is no notion of quorum (majority voting). But not everyone can afford to store 3x copies just to achieve consensus in case of a disagreement, so they end up using 2x replication and face frequent split-brains due to flaky networks and the like – a common complaint on the gluster-users mailing list.
The presentation will focus on the arbiter configuration in GlusterFS replica volumes and explain how it provides the same consistency guarantees as a full-blown 3-way replica, but without the 3x storage cost. It describes how the arbitration logic works to prevent files from ending up in split-brain, and how to deploy and monitor arbiter volumes.
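A minimal Python sketch of the quorum idea behind the arbiter follows. The brick names and the acknowledgment model are simplified illustrations, not GlusterFS's actual AFR code; the key point is that the metadata-only arbiter acts as a third voter.

    # Two data bricks plus a metadata-only arbiter form a 3-voter quorum,
    # even though file data is stored only twice.
    BRICKS = ["data-1", "data-2", "arbiter"]

    def write_allowed(acks):
        """acks: set of bricks that acknowledged the write."""
        return len(acks) >= 2   # a majority of the 3 voters is required

    # Under a network partition, at most one side can hold two voters, so
    # conflicting writes can never be accepted on both sides at once.
    print(write_allowed({"data-1"}))             # False: no quorum
    print(write_allowed({"data-2", "arbiter"}))  # True: majority holds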
Ravishankar, a.k.a. Ravi, is a believer in Linux and OSS. He started out as a Linux user circa 2004, when he got his hands on a Knoppix live CD after buying a PC, followed shortly by the Red Hat 9 three-CD pack. Since 2009, he has been working as a developer on Linux in multiple domains...
Erasure coding has traditionally been limited to archival workloads due to its performance and computational requirements. All-flash storage changes that, making Ceph with erasure coding a viable solution for active workloads as well. You will hear how erasure coding and other techniques can be used with Ceph on all-flash storage to provide the benefits of flash at affordable cost. We will discuss recent improvements in Ceph that make it a high-performance Cinder block storage solution on flash while lowering overall storage costs.
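The storage-cost argument is easy to quantify. The back-of-the-envelope Python sketch below compares the raw capacity consumed by triple replication against a k-data/m-parity erasure-coded layout, using the common 4+2 profile as an example; the numbers are illustrative, not from the talk.

    def raw_bytes_needed(user_bytes, k, m):
        """Raw capacity for a k-data/m-parity erasure-coded layout."""
        return user_bytes * (k + m) / k

    user_tb = 100  # TB of user data
    print(f"3x replication: {user_tb * 3:.0f} TB raw")                     # 300 TB
    print(f"EC 4+2:         {raw_bytes_needed(user_tb, 4, 2):.0f} TB raw") # 150 TB
    # Both layouts survive any 2 failures, but EC 4+2 uses half the raw capacity.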
Allen joined SanDisk in 2013 as an Engineering Fellow; he is responsible for directing software development for SanDisk's system-level products. He has previously served as Chief Architect at Weitek Corp. and Citrix, and founded several companies, including AMKAR Consulting, Orbital...
The increasing popularity of cloud storage is leading more and more companies to move their data from traditional datacenters to the cloud and to build their storage solutions with Ceph. Building scale-out storage solutions has recently become popular in the Chinese storage market, ranging from IPDC and OEM/ODM to research institutes. In this session we present the challenges and opportunities that Ceph-based storage solutions face with those customers. We will first introduce the advantages Ceph has demonstrated in real-life production clusters, then shift to the common issues encountered in different scenarios, including stability concerns such as OSD flapping and slow requests, missing features such as high-performance caching, and performance problems. Finally, we will focus on areas we can improve or optimize for more opportunities, such as smart failure reporting and detection.
Jian is a senior software engineer in the Cloud Storage Engineering group at Intel Asia Pacific Research & Development Ltd. He is responsible for developing software storage architectures and solutions based on Intel platforms. He has 7 years of software development, performance analysis...
In a typical distributed system, managing n nodes effectively is always a challenge. Distributed systems must contend with the CAP theorem, which is hard to satisfy while maintaining good performance. All the nodes participating in the cluster should have consistent data, which speaks to the consistency criterion of the CAP theorem. This could be achieved by keeping the configuration details on every node in the cluster; however, that approach doesn't scale (consider the n × n exchange of information) and can end up in split-brain situations. It can be avoided by maintaining a distributed, consistent store across m nodes designated as leaders, where m < n. Existing technologies like etcd and consul provide good abstractions and APIs for such a centralized store.
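A minimal sketch of the leader-based store idea, assuming the third-party python-etcd3 client and an etcd endpoint on localhost:2379; the key names are made up for illustration.

    import etcd3

    client = etcd3.client(host="localhost", port=2379)

    # Any node writes a configuration value once; etcd's raft quorum (the m
    # leader nodes) makes it durable and consistent for the whole cluster.
    client.put("/cluster/volumes/vol0/replica-count", "3")

    # All other nodes read the same authoritative value instead of gossiping
    # n x n copies of the configuration among themselves.
    value, _meta = client.get("/cluster/volumes/vol0/replica-count")
    print(value.decode())  # -> "3"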
Atin works at Red Hat India Pvt. Ltd. as a senior software engineer in its storage business unit. His key responsibility is to maintain high-quality code for GlusterD, the management daemon of Gluster. Prior to storage, Atin worked in different domains such as telecom and BFS...
Gluster is a scale-out storage system that works on commodity hardware. Storage as a Service is an evolving paradigm that enables organizations to provision, consume, and de-provision storage on demand. In this session, Vijay will provide an introduction to the architecture of Gluster, the Storage-as-a-Service paradigm, and recent changes in Gluster that better enable Storage as a Service. Vijay will also detail integrations with projects like Docker, Kubernetes, and OpenShift that allow Gluster storage to be consumed as a service.
Vijay Bellur is a co-maintainer of the upstream GlusterFS project and was an architect at Gluster before its acquisition by Red Hat in 2011. He has been involved with building enterprise storage and scalable, distributed systems for the past decade. Vijay works out of the Red Hat...
The Massachusetts Open Cloud (MOC) initiative is a consortium of private and public institutions across Massachusetts dedicated to the creation of publicly available cloud computing that will drive Big Data innovation. In this session, we look at running Hadoop over Ceph object storage in a large MOC-style cloud environment and identify several common problems. On the storage deployment side, we use a new mid-tier cache architecture with Intel NVMe SSDs as the warm tier to speed up performance. We will show the hardware configurations and performance results of the reference architecture.
Jian is a senior software engineer in the Cloud Storage Engineering group at Intel Asia Pacific Research & Development Ltd. He is responsible for developing software storage architectures and solutions based on Intel platforms. He has 7 years of software development, performance analysis...
IBM mainframes can be a mystery to those more familiar with x86 Linux clusters, and high availability Linux clustering can be a mystery to those more familiar with mainframe systems. In this presentation, Steven Whitehouse gives an overview of how the pieces fit together to enable an s390x HA Linux cluster for a shared GFS2 filesystem.
Steven Whitehouse currently manages the RHEL Filesystems team at Red Hat. His introduction to Linux kernel development came in 1993, when he wrote a small patch for AX.25; he is also the previous maintainer of Linux DECnet and the GFS2 filesystem. Steven has spoken at a number of conferences...