The Complete File System/ File System Virtualization Using Python.notes
Thursday, March 24, 2005
TITLE OF PAPER: The Complete File System: File System Virtualization Using Python
URL OF PRESENTATION: _URL_of_powerpoint_presentation_
PRESENTED BY: Christopher Gillett
REPRESENTING: Compete, Inc.
CONFERENCE: PyCon 2005
DATE: Thursday, March 24, 2005
LOCATION: GWU Cafritz Conference Center, Grand Ballroom
--------------------------------------------------------------------------
REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}
Compete Inc.
How to drive advertising, marketing, and sales
Analyzing lots of data
Need to manage lots of files
CFS: Compete File System
Application level (runs in user space)
Uses MySQL with Python
A few seconds per transaction
But jobs run for hours, so it doesn't matter
Still it may matter for real-time/fast applications
File System Virtualization
Mapping multiple and potentially disparate file systems into a monolithic view, so that applications, scripts, etc. need not know the physical location of the files being manipulated.
Common directory structure for all participating file systems
CFS uses Scheduler to select "next" device
Database handles logical file name mapping to physical filenames
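A minimal sketch of the logical-to-physical mapping described above. Assumptions, none of them from the talk: sqlite3 stands in for MySQL, and the table name cfs_files, the functions register and resolve, and the /mnt/nfs* mount points are all hypothetical. The real Scheduler is replaced here by a simple rotation.

```python
import itertools
import sqlite3

# Hypothetical mount points of the participating file systems.
DEVICES = ["/mnt/nfs1", "/mnt/nfs2", "/mnt/nfs3"]


def make_catalog():
    """Create an in-memory catalog (sqlite3 stands in for MySQL here)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cfs_files ("
        " logical TEXT PRIMARY KEY,"
        " physical TEXT NOT NULL)"
    )
    return db


def register(db, logical, device):
    """Record a new logical file on `device` and return its physical path.

    Every device shares the same directory structure, so the physical
    path is just the device root plus the logical path.
    """
    physical = device.rstrip("/") + "/" + logical.lstrip("/")
    db.execute("INSERT INTO cfs_files VALUES (?, ?)", (logical, physical))
    return physical


def resolve(db, logical):
    """Map a logical filename to its physical filename."""
    row = db.execute(
        "SELECT physical FROM cfs_files WHERE logical = ?", (logical,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(logical)
    return row[0]


db = make_catalog()
next_device = itertools.cycle(DEVICES)  # placeholder for the Scheduler
register(db, "/logs/2005/03/24.dat", next(next_device))
register(db, "/logs/2005/03/25.dat", next(next_device))
print(resolve(db, "/logs/2005/03/25.dat"))  # /mnt/nfs2/logs/2005/03/25.dat
```

Callers only ever see the logical name; which device holds the file is decided once, at creation, and looked up afterwards.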
End User perspective
Has to integrate cleanly with Unix scripts
We solved this using cfsopen command
cfsopen maps logical filenames to physical filenames for use in scripts
Must have a programmable API
Compete Data Access Layer manages many different data sources
Wrap CFS functionality behind CDAL
Then users can use CDAL as they always do.
Module that maps virtual filenames to real filenames; all file opening and closing is done through this module
Users must be able to catalog and delete files as needed
cfsls and cfsrm commands
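A rough sketch of what the end-user operations above might look like as a Python module. The notes give only the command names (cfsopen, cfsls, cfsrm); the Catalog class, its method names, and the table layout are assumptions made for illustration, with sqlite3 standing in for MySQL and a temporary directory standing in for an NFS mount.

```python
import os
import sqlite3
import tempfile


class Catalog:
    """Toy stand-in for the CFS database (sqlite3 instead of MySQL)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE files (logical TEXT PRIMARY KEY, physical TEXT)"
        )

    def physical_path(self, logical):
        """Logical -> physical name, as a cfsopen-style command might print."""
        row = self.db.execute(
            "SELECT physical FROM files WHERE logical = ?", (logical,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(logical)
        return row[0]

    def create(self, logical, device):
        """Register a new logical file on `device`; return its physical path."""
        physical = os.path.join(device, logical.lstrip("/"))
        os.makedirs(os.path.dirname(physical), exist_ok=True)
        self.db.execute("INSERT INTO files VALUES (?, ?)", (logical, physical))
        return physical

    def ls(self, prefix="/"):
        """List logical names under a prefix (cfsls-style)."""
        rows = self.db.execute("SELECT logical FROM files ORDER BY logical")
        return [name for (name,) in rows if name.startswith(prefix)]

    def rm(self, logical):
        """Delete the physical file and its catalog entry (cfsrm-style)."""
        os.remove(self.physical_path(logical))
        self.db.execute("DELETE FROM files WHERE logical = ?", (logical,))


catalog = Catalog()
device = tempfile.mkdtemp()  # pretend this directory is one NFS mount

with open(catalog.create("/reports/a.txt", device), "w") as f:
    f.write("hello\n")

print(catalog.ls("/reports"))  # ['/reports/a.txt']
catalog.rm("/reports/a.txt")
print(catalog.ls())  # []
```

In a shell script, a cfsopen-style command would presumably print the result of physical_path so that ordinary Unix tools can be pointed at the real file.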
Compete File System architecture
Scheduler
Which real filesystem to use next?
Database manager
Maps virtual filenames to real filenames
Two of these, since replication wasn't available at the time
Straightforward implementation
Just map filenames
Allow access to CFS file as "normally" as possible for end users
Stateless model of sorts:
No daemons to worry about
Better portability for CFS
Simple code
Scheduling and Load Balancing
Version 1
Multiple file systems on individual NFS servers
Assume all file systems are about the same size
Used a round robin approach
Worked OK, but the mix of large and small files caused problems
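A small illustration of why round robin struggles when file sizes vary. The device names and the workload are made up for the example; only the round-robin policy itself comes from the notes.

```python
import itertools


def round_robin(devices):
    """Return an endless rotation over the devices, ignoring file sizes."""
    return itertools.cycle(devices)


devices = ["fs1", "fs2", "fs3"]
used = {d: 0 for d in devices}  # GB placed on each device

# Hypothetical workload: every third file is large.
sizes_gb = [100, 1, 1, 100, 1, 1, 100, 1, 1]

scheduler = round_robin(devices)
for size in sizes_gb:
    used[next(scheduler)] += size

print(used)  # {'fs1': 300, 'fs2': 3, 'fs3': 3}
```

Each device receives the same number of files, yet one of them ends up holding nearly all of the data.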
Version 1.1
Best fit scheduler - selects based on size of files
Problem with many small/temp files that grow after creation: they end up poorly distributed across devices
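A sketch of one possible reading of "best fit". The notes do not spell out the rule, so this uses the textbook version (the device with the least free space that still holds the file) as an assumption. The function name, device names, and numbers are hypothetical.

```python
def best_fit(free, size):
    """Return the device with the least free space that still fits `size`."""
    candidates = [(space, name) for name, space in free.items() if space >= size]
    if not candidates:
        raise OSError("no device can hold a file of size %d" % size)
    return min(candidates)[1]


free = {"fs1": 500, "fs2": 120, "fs3": 300}  # free GB per device

# A file whose size is known up front is placed sensibly.
print(best_fit(free, 200))  # fs3

# Temp files are created empty and grow later, so the scheduler sees
# size 0 for each of them and keeps choosing the same device.
placements = {name: best_fit(free, 0) for name in ["tmp_a", "tmp_b", "tmp_c"]}
print(placements)  # {'tmp_a': 'fs2', 'tmp_b': 'fs2', 'tmp_c': 'fs2'}
```

If each of those temp files later grows to 50 GB, the 120 GB device overflows while the others sit mostly empty, which matches the problem noted above.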
Future
Predictive Scheduler
Score-boarding file system sizing
Predicts sizes of new files based on average size in directory
Allows applications to pre-allocate space as a hint to CFS
Open file reaper tracks implicit file closes and updates stats
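A speculative sketch of how the planned predictive scheduler could fit together. Everything here is an assumption built from the four bullet points above: the class and method names are invented, the "scoreboard" is modeled as a dict of predicted free space, and placing on the device with the most predicted free space is a choice made for this example, not something stated in the talk.

```python
import posixpath
from collections import defaultdict


class PredictiveScheduler:
    def __init__(self, free):
        self.free = dict(free)  # scoreboard: device -> predicted free space
        self.totals = defaultdict(lambda: [0, 0])  # dir -> [total size, count]
        self.open_files = {}  # logical name -> (device, predicted size)

    def predict(self, logical, hint=None):
        """Use the application's hint if given, else the directory average."""
        if hint is not None:
            return hint
        total, count = self.totals.get(posixpath.dirname(logical), (0, 0))
        return total / count if count else 0

    def place(self, logical, hint=None):
        """Pick a device and reserve the predicted size on the scoreboard."""
        predicted = self.predict(logical, hint)
        device = max(self.free, key=self.free.get)
        self.free[device] -= predicted
        self.open_files[logical] = (device, predicted)
        return device

    def record_close(self, logical, actual):
        """What a reaper might do on close: reconcile and update the stats."""
        device, predicted = self.open_files.pop(logical)
        self.free[device] += predicted - actual
        stats = self.totals[posixpath.dirname(logical)]
        stats[0] += actual
        stats[1] += 1


sched = PredictiveScheduler({"fs1": 100, "fs2": 100})
first = sched.place("/tmp/a")      # no history yet, predicted size 0
sched.record_close("/tmp/a", 40)   # actual size becomes the /tmp average
second = sched.place("/tmp/b")     # predicted 40, goes to the emptier device
third = sched.place("/tmp/c")      # predicted 40 again
print(first, second, third)  # fs1 fs2 fs1
```

Because space is reserved at creation time, files that start empty and grow no longer pile onto a single device the way they do under the best-fit sketch.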
CFS Internal State
Stateless processes
State information stored in database
Losing the CFS Database due to ___
Role of Python
Easy to think about algorithms, functionality
Other languages (C, C++) were considered for deployment
but rejected for the "usual" reasons
Whining and ranting
Python VM footprint and speed are good, but a compiled language could be better
I/O in Python is a problem (dealing with files that are hundreds of GB)
Results of Building and Deploying CFS
Happy management
Zero Worries
Effective storage resource management
Hundreds of thousands of files across multiple devices
Multiple terabytes of data under coherent structure
--------------------------------------------------------------------------
REFERENCES: {as documents / sites are referenced add them below}
--------------------------------------------------------------------------
QUOTES:
--------------------------------------------------------------------------
CONTRIBUTORS: {add your name, e-mail address and URL below}
Linden Wright <lwright@mac.com>
Abhay Saxena <ark3@email.com>
--------------------------------------------------------------------------
E-MAIL BOUNCEBACK: {add your e-mail address separated by commas }
--------------------------------------------------------------------------
Copyright shared between all the participants unless otherwise stated...