Secure Cloud Computing Center

Introduction

There is a critical need to securely store, manage, share and analyze massive amounts of complex (e.g., semi-structured and unstructured) data to determine patterns and trends in order to improve the quality of healthcare, better safeguard the nation and explore alternative energy. The emerging cloud computing model attempts to handle such massive amounts of data. Google introduced the MapReduce framework for processing large amounts of data on commodity hardware, and Apache’s Hadoop Distributed File System (HDFS), combined with integrated components such as Hadoop's MapReduce engine, is emerging as a leading software platform for cloud computing. However, state-of-the-art cloud computing systems are not sufficient because (i) they do not provide adequate security mechanisms to protect sensitive data and (ii) they do not have the capability to process massive amounts of semantic web and geospatial data.

To address the limitations of current cloud computing platforms, we at The University of Texas at Dallas (UTD) have been developing, since 2007, a secure cloud computing framework for multiple agencies, including the Air Force Office of Scientific Research, using state-of-the-art hardware, software and data components based on Hadoop and MapReduce technologies. In particular, we have used modern hardware components (e.g., secure coprocessors) to offset the performance cost of the additional security functionality, integrated open source software components, and developed custom software components to support secure cloud query operations on complex data, provide fine-grained access control and reference monitor support, and provide strong authentication mechanisms.

We have also established a strong education program in Information Assurance (IA) at UTD since 2000. We were designated an NSA/DHS National Center of Academic Excellence in IA Education in 2004 and in IA Research in 2008. We received the NSF Scholarship for Service (SFS) award in 2010 and are training students to obtain master's degrees in IA. Our course offerings include Systems Security, Network Security, Data/Applications Security, Trustworthy Web Services/Semantic Web, Cryptography, Data Mining for Security and Digital Forensics.

To complement our research efforts in assured cloud computing with a strong education program, we obtained a capacity building grant from NSF in September 2011. The objective of our NSF project is to leverage the extensive investments we have made in assured cloud computing research and IA education at UTD to develop courses in Assured Cloud Computing. In particular, we are developing new courses related to building and assuring the cloud, and we are enhancing our existing courses on Network Security, Data and Applications Security, Data Mining for Security Applications, Systems Security and Digital Forensics by introducing a major secure cloud computing component into each of them. We are also enhancing the cloud computing framework we have developed so that students (i) use this framework for their course projects and (ii) add features to it as part of their class programming projects. Our courses on assured cloud computing are included in the curriculum for the SFS program. In addition, we participate in a number of cloud computing initiatives, including the cloud computing SIG (Special Interest Group) of the DFW Metroplex Technology Business Council (MTBC). Our book on building and securing the cloud is expected to be published in late 2012 and will be used as a textbook for our flagship Assured Cloud Computing course, to be taught in 2013.

Courses Offered

Information Assurance Track

The Department of Computer Science offers several courses related to assured cloud computing:

  • Secure Cloud Data Storage
  • Data and Applications Security – Units on Secure Cloud Data Management
  • Advanced Digital Forensics and Data Reverse Engineering – Unit on Cloud Forensics
  • Secure Web Services and Cloud Computing
  • Systems Security and Binary Code Analysis – Unit on Virtualization Security
  • Information and Security Analytics – Units on the Impact of Cloud Computing



Fall 2011

Secure Cloud Data Storage:
This course provides a comprehensive overview of cloud data security and storage issues. Students will learn various cryptographic techniques for securing data in the cloud, as well as the storage schemes used in the cloud. Finally, students will gain a thorough knowledge of the various cloud offerings and their data storage aspects.

Learning outcome: Students will have a solid understanding of data storage and security strategies for the cloud.


Data and Applications Security – Units on Secure Cloud Data Management:
This course taught principles, technologies, tools and trends for data and applications security. Topics covered include: Confidentiality, Privacy and Trust Management, Secure Databases, Secure Distributed Systems, Secure Multimedia and Object Systems, Secure Data Warehouses, Data Mining for Security Applications, Assured Information Sharing, Secure Knowledge Management, Trustworthy Semantic Web and Secure Social Networks. In addition, several units on secure cloud data management were introduced.

Learning outcome: Students have a thorough understanding of the principles, practice and technologies of secure data management.


Advanced Digital Forensics and Data Reverse Engineering – Unit on Cloud Forensics:

The course covered the underlying technical details of digital forensics and data reverse engineering, discussed various security applications, analyzed potential limitations of existing systems and proposed solutions for developing more secure systems. A unit on cloud forensics was also introduced.

Learning outcome: Students have a good understanding of the fundamentals of digital forensics through reverse engineering.


Spring 2012

Secure Web Services and Cloud Computing:
The first half of the course explores secure web services and semantic web services, which are fundamental to cloud computing. The second half of the course is devoted entirely to secure cloud computing. Topics include secure virtualization, secure cloud data storage, identity management in the cloud and secure cloud computing technologies, tools and standards. A book based on this course is expected to be published in Fall 2012.

Learning outcome: Students understand the various cloud technologies and the use of the Hadoop/MapReduce framework to develop assured cloud computing tools.


Systems Security and Binary Code Analysis – Unit on Virtualization Security:
This course explained low-level system details, from the compiler, linker and loader to the OS kernel and computer architecture; examined the weakest link in each system component; explored the bits and bytes left after all these transformations; and studied state-of-the-art offenses and defenses. Attacks due to virtualization (related to secure cloud computing) were also included in the course.

Learning outcome: Students will be able to understand how an attack is launched (e.g., how an exploit is created) and how to defend against it (e.g., by developing OS patches, analyzing binary code and detecting intrusions).


Cloud Computing:
This course covers current cloud computing technologies, including technologies for Infrastructure as a Service, Platform as a Service, Software as a Service and Physical Systems as a Service. For each layer of the cloud, practical solutions (e.g., from Google, Amazon, Microsoft and Salesforce.com) as well as theoretical solutions (covered through a set of papers) are introduced.

Learning outcome: By engaging in hands-on exploration of existing cloud technologies as well as development of new technologies, students develop an in-depth understanding of cloud computing.

Summer 2012

Information and Security Analytics – Units on the Impact of Cloud Computing:

This course covers the ten CISSP modules: Security Governance and Risk, Access Control, Security Architecture, Cryptography, Network Security, Physical Security, Applications Security, Business Recovery Management, Operations Security, and Legal Aspects and Forensics. We have introduced units on the impact of the cloud on each of the ten modules.

Learning outcome: At the end of the course in August 2012, students will have a solid understanding of the CISSP modules as they apply to the cloud, for example, the governance issues the cloud raises and its access control and identity management issues.

Course Details

CS 7301 – Secure Cloud Data Storage


CS 4389 – Data and Applications Security

CS 6v81 – Advanced Digital Forensics and Data Reverse Engineering

CS 6v81 – Secure Web Services/Cloud Computing


CS 6v81 – Systems Security and Binary Code Analysis

CS 6v81 – Information and Security Analytics

CS 6v81 – Cloud Computing

Research

We have defined a layered framework for assured cloud computing consisting of the secure virtual machine layer, the secure cloud storage layer, the secure cloud data layer and the secure virtual network monitor layer. Cross-cutting services are provided by the policy layer, the cloud monitoring layer, the reliability layer and the risk analysis layer.

For the Secure Virtual Machine (VM) Monitor, we are examining Xen, developed at the University of Cambridge, and exploring security enhancements to meet the needs of our applications (e.g., secure distributed storage and data management).

For Secure Cloud Storage Management, we are developing a storage infrastructure with Hadoop and MapReduce technologies. For Secure Cloud Data Management, we have developed secure query processing algorithms for RDF (Resource Description Framework) and SQL (Hive) data in clouds, with an XACML-based (eXtensible Access Control Markup Language) policy manager, utilizing the Hadoop/MapReduce framework.
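
The idea of policy-filtered RDF query processing can be illustrated on a single machine with the minimal Python sketch below. It is not the XACML standard, its XML syntax, or our policy manager: the `Rule` structure, the role and predicate attributes, and the deny-by-default behavior are illustrative assumptions, loosely modeled on XACML's first-applicable rule combining. Only triples whose predicate the requester's role may read survive the match.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Rule:
    """Hypothetical XACML-style rule: `effect` applies when role and predicate match."""
    effect: str     # "Permit" or "Deny"
    role: str       # subject attribute to match ("*" matches any role)
    predicate: str  # resource attribute: the RDF predicate ("*" matches any)


def decide(rules: list[Rule], role: str, predicate: str) -> str:
    """First-applicable combining: the first matching rule wins; deny if none match."""
    for rule in rules:
        if rule.role in ("*", role) and rule.predicate in ("*", predicate):
            return rule.effect
    return "Deny"


def secure_match(triples: list[Triple], pattern, rules: list[Rule], role: str) -> list[Triple]:
    """Match an (s, p, o) pattern, where None is a wildcard, then drop triples the role may not read."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
        and decide(rules, role, t[1]) == "Permit"
    ]


# Hypothetical data and policy: analysts may not read diagnoses.
triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:diagnosis", "flu"),
    ("ex:bob", "ex:name", "Bob"),
]
rules = [
    Rule("Deny", "analyst", "ex:diagnosis"),
    Rule("Permit", "*", "*"),
]
print(secure_match(triples, ("ex:alice", None, None), rules, "analyst"))    # [('ex:alice', 'ex:name', 'Alice')]
print(secure_match(triples, ("ex:alice", None, None), rules, "physician"))  # both Alice triples
```

In a Hadoop deployment, the same per-triple filter could in principle run inside each map task so that unauthorized triples never reach the reducers; that placement is an assumption for illustration, not a description of our implementation.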

For Secure Cloud Network Management, our goal is to implement a Secure Virtual Network Monitor (VNM) that will create end-to-end virtual links with the requested bandwidth, create virtual nodes, and monitor the computing resources. Below we give an overview of a sample of our research projects that contribute to our cloud infrastructure. We use our cloud computing framework and our research projects as the basis for our students' course programming projects; that is, our education efforts are tightly integrated with our research efforts.
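
The VNM's link-creation step can be illustrated with a simple admission-control sketch in Python. This is a hypothetical illustration rather than our VNM design: the class and method names, the undirected topology and the breadth-first path search are assumptions chosen for clarity. A virtual link is admitted only if some path has enough residual bandwidth on every hop, and that bandwidth is then reserved along the path.

```python
from collections import deque


class VirtualNetworkMonitor:
    """Hypothetical sketch: reserves end-to-end virtual links over a physical topology."""

    def __init__(self, capacities: dict[tuple[str, str], float]):
        # Residual bandwidth per undirected physical link, keyed by the sorted node pair.
        self.residual = {tuple(sorted(k)): v for k, v in capacities.items()}
        self.adj: dict[str, set[str]] = {}
        for a, b in self.residual:
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)

    def _path(self, src: str, dst: str, bw: float):
        """Breadth-first search using only links with at least `bw` residual bandwidth."""
        prev = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            for nxt in sorted(self.adj.get(node, ())):  # sorted for deterministic results
                if nxt not in prev and self.residual[tuple(sorted((node, nxt)))] >= bw:
                    prev[nxt] = node
                    queue.append(nxt)
        return None

    def create_link(self, src: str, dst: str, bw: float):
        """Admit a virtual link if a feasible path exists; reserve `bw` on every hop."""
        path = self._path(src, dst, bw)
        if path is None:
            return None  # reject: no path can carry the requested bandwidth
        for a, b in zip(path, path[1:]):
            self.residual[tuple(sorted((a, b)))] -= bw
        return path


# Hypothetical topology with capacities in Mbps.
vnm = VirtualNetworkMonitor({("vm1", "sw"): 100, ("sw", "vm2"): 100, ("vm1", "vm2"): 10})
print(vnm.create_link("vm1", "vm2", 50))  # routed through the switch
print(vnm.create_link("vm1", "vm2", 60))  # rejected: remaining capacity is too low
```

Breadth-first search returns the feasible path with the fewest hops; a real monitor would also release reservations and track node resources, which this sketch omits.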

AFOSR: Assured Cloud Computing, $2.2m, 2008-13, PI: B. Thuraisingham, Co-PI: L. Khan, M. Kantarcioglu, K. Hamlen, I. Yen
UTD is leading an effort to design and develop a secure cloud based on a Service-Oriented Architecture (SOA) that will host resource management services (e.g., scheduling), security services (e.g., attribute-based access control and accountability), storage services and information management services. Our goal is to develop technologies to support the DoD’s Global Information Grid. We are developing a layered framework for an assured cloud consisting of network, virtual machine, storage and data management layers.

AFOSR: A Framework for Assured Information Sharing Lifecycle, $1.0m, 2008-13, PI: M. Kantarcioglu (PI: 2010-13, Co-PI: 2008-10), B. Thuraisingham (PI: 2008-10, Co-PI: 2010-13), Co-PIs: L. Khan, N. Berg (School of Social Sciences), A. Bensoussan (School of Management)
The objective is to define, design and develop an Assured Information Sharing Lifecycle that realizes the DoD’s information sharing value chain. We have developed flexible policies for social networks to facilitate assured information sharing. In addition, we developed social network mining techniques to leverage multiple social relationship types. We also developed an evolutionary game-theoretic framework to simulate various data sharing scenarios under different incentive and trust models. Our recent work includes mechanisms that give organizations incentives for information sharing, using concepts from contract theory to determine appropriate rewards such as ranking or monetary benefits. Together with our European partners (King's College London and the University of Insubria, Italy), we are using our assured cloud for information sharing experiments.
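
The flavor of an evolutionary game-theoretic simulation can be shown with standard two-strategy replicator dynamics. The payoff structure below (a sharing cost, a benefit to the partner and an incentive reward for sharing) is a simplified assumption for illustration, not the incentive and trust models used in the project. The variable `x` is the fraction of organizations that share.

```python
def payoff_matrix(benefit: float, cost: float, reward: float) -> list[list[float]]:
    """Row player's payoffs; strategy 0 = share, 1 = withhold (hypothetical model).
    Sharing costs `cost`, earns the incentive `reward`, and gives the partner `benefit`."""
    return [
        [benefit - cost + reward, reward - cost],  # share against (share, withhold)
        [benefit, 0.0],                            # withhold against (share, withhold)
    ]


def replicator(A: list[list[float]], x0: float, steps: int = 2000, dt: float = 0.01) -> float:
    """Euler-integrate dx/dt = x(1 - x)(f_share - f_withhold), x = fraction of sharers."""
    x = x0
    for _ in range(steps):
        f_share = x * A[0][0] + (1 - x) * A[0][1]
        f_withhold = x * A[1][0] + (1 - x) * A[1][1]
        x += dt * x * (1 - x) * (f_share - f_withhold)
    return x


# With a reward above the sharing cost, sharing spreads; without it, sharing dies out.
print(replicator(payoff_matrix(benefit=3, cost=1, reward=2), x0=0.5))
print(replicator(payoff_matrix(benefit=3, cost=1, reward=0), x0=0.5))
```

With these payoffs the fitness gap works out to exactly `reward - cost`, independent of `benefit`, which makes the role of the incentive easy to see; models in which trust changes the value of shared data would produce richer dynamics.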

NSF CAREER: An Integrated Approach For Efficient Privacy Preserving Distributed Data Analytics, $400K, 2009-14, PI: M. Kantarcioglu
Organizations need to securely share their private data to execute critical tasks. Due to the limitations of the current approaches, efficient and accurate privacy-preserving solutions are needed for handling large distributed data sets. To address this challenge, we are designing and developing a novel framework where sanitization and SMC (secure multiparty computation) techniques are integrated to develop efficient privacy-preserving solutions under resource constraints. We are using our assured cloud to scale our algorithms.
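
Secure multiparty computation can be illustrated with its textbook building block, a secure sum based on additive secret sharing. This Python sketch is a generic example, not the project's integrated sanitization/SMC framework; it simulates all parties in one process and assumes honest-but-curious participants with non-negative integer inputs smaller than the modulus.

```python
import secrets

MODULUS = 2**61 - 1  # public prime modulus (a Mersenne prime); inputs must be smaller


def share(value: int, n_parties: int, modulus: int = MODULUS) -> list[int]:
    """Split `value` into n additive shares; any n-1 of them are uniformly random."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values: list[int], modulus: int = MODULUS) -> int:
    """Simulate n parties computing the sum of their inputs without exchanging them."""
    n = len(private_values)
    # Party i splits its input and sends share j to party j.
    dealt = [share(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received; each partial sum looks random on its own.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % modulus for j in range(n)]
    # Combining the published partial sums yields only the total.
    return sum(partial_sums) % modulus


# Hypothetical example: three hospitals compute a joint patient count.
print(secure_sum([120, 45, 300]))
```

Sums are only one primitive; analytics such as classification or clustering need further protocols, which is where combining SMC with cheaper sanitization becomes attractive under resource constraints.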

AFOSR (Young Investigator Program): Automated, Certified, In-lined Reference Monitors, $280K, 2008-10, PI: K. Hamlen
In-lined Reference Monitors (IRMs) implement traditional Reference Monitors by injecting runtime security checks directly into untrusted binary code. This facilitates efficient enforcement of application-specific, history-based software security policies in settings where it is undesirable or infeasible to modify the OS. Automated certification applies type-checking, model-checking and other software verification technologies to formally guarantee that IRMs are policy-adherent. Such verification allows the producer of the IRM to remain untrusted. We have developed the first fully declarative, aspect-oriented, XML-based IRM policy specification language, a complete formal semantics for the language, and a suite of policy enforcement and policy analysis tools for Java bytecode programs. We are examining ways to apply the results to our assured cloud.
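
The in-lining idea can be approximated in Python by wrapping functions with a guard that consults a security automaton before the original code runs. Real IRMs rewrite untrusted binaries or Java bytecode, and certification would then verify the rewritten code; this sketch only mimics the runtime check, and the policy ("no network send after reading secret data") and all names are hypothetical.

```python
import functools


class SecurityViolation(Exception):
    """Raised when a guarded call would violate the policy."""


class NoSendAfterSecret:
    """Hypothetical two-state security automaton for a history-based policy."""

    def __init__(self) -> None:
        self.state = "clean"

    def check(self, event: str) -> None:
        if event == "send" and self.state == "tainted":
            raise SecurityViolation("send after reading secret data")
        if event == "read_secret":
            self.state = "tainted"  # conservatively taint before the read executes


def inline_monitor(policy, event: str):
    """Wrap a function so the policy check runs before its original body."""
    def wrap(fn):
        @functools.wraps(fn)
        def guarded(*args, **kwargs):
            policy.check(event)
            return fn(*args, **kwargs)
        return guarded
    return wrap


# Hypothetical untrusted functions, instrumented with the monitor.
policy = NoSendAfterSecret()


@inline_monitor(policy, "send")
def send(msg: str) -> str:
    return f"sent {msg}"


@inline_monitor(policy, "read_secret")
def read_secret() -> str:
    return "classified"


print(send("hello"))  # allowed: nothing secret has been read yet
read_secret()
try:
    send("report")
except SecurityViolation as err:
    print("blocked:", err)
```

Because the decision depends on the sequence of earlier events rather than on the current call alone, this is a history-based policy of the kind that OS-level access control cannot express directly.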

AFOSR: Reactively Adaptive Malware: Attacks and Defenses, $0.5m, 2010-14, PI: K. Hamlen, Co-PI: L. Khan
Reactively adaptive malware chooses its mutations strategically by identifying, analyzing, and adapting to signature-matching defenses fully automatically in the wild. Such malware can adapt and mutate in response to new signature databases far more quickly than human analysts can create them, removing the advantage currently exploited by most antivirus products. This project investigates the feasibility of such reactively adaptive malware by using machine learning technologies to augment malware with anti-antivirus defenses in a secure testing environment. We are examining ways to apply our techniques for cloud monitoring.

Infrastructure

Software: A major part of the software component of our cloud is HDFS, a distributed Java-based file system with the capacity to handle a large number of nodes storing petabytes of data. On top of the file system sits the MapReduce engine, which consists of a Job Tracker to which client applications submit MapReduce jobs. The Job Tracker attempts to place the work near the data by pushing it out to the available Task Tracker nodes in the cluster. We have devised a number of programming projects for our courses based on the infrastructure we are developing. With funds from our research projects, we have made (and are continuing to make) the following enhancements to the software infrastructure to support research and education in assured cloud computing:
  • Handle encrypted sensitive data: Sensitive data ranging from medical records to credit card transactions need to be stored using encryption for additional protection. HDFS currently cannot perform secure and efficient query processing over encrypted data; we are addressing this limitation in our research.
  • Semantic web data management: There is a need for viable solutions to improve the performance and scalability of queries against semantic web data such as RDF (Resource Description Framework). The number of RDF datasets is increasing, and the problem of storing billions of RDF triples and querying them efficiently is yet to be solved. HDFS offers no built-in support for storing and retrieving RDF data; we have addressed this limitation.
  • Fine-grained access control: HDFS does not provide fine-grained access control. Yahoo recently released a version of HDFS that provides access control lists. Unfortunately, for many applications such as assured information sharing, access control lists are not sufficient, and more complex policies must be supported. We are addressing this limitation in our current work.
  • Strong authentication: Yahoo's version of HDFS supports network authentication protocols such as Kerberos for user authentication and encryption of data transfers. However, some assured information sharing scenarios may require a Public Key Infrastructure (PKI) for digital signature support.
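
For the first enhancement listed above, handling encrypted sensitive data, one common way to support equality queries without revealing plaintext is a keyed-token index. The Python sketch below is an illustration under stated assumptions, not our HDFS extension: the client derives HMAC-SHA256 tokens from field values with a secret key, and a hypothetical server-side index matches tokens to record ids without seeing the values. Encryption of the record bodies themselves is omitted, and deterministic tokens do leak which records share a value.

```python
import hashlib
import hmac
import secrets


def token(key: bytes, field: str, value: str) -> str:
    """Client side: deterministic keyed token for a (field, value) pair."""
    msg = f"{field}\x00{value}".encode("utf-8")  # NUL separator avoids field/value ambiguity
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


class TokenIndex:
    """Server side (hypothetical): maps opaque tokens to ids of encrypted records."""

    def __init__(self) -> None:
        self.postings: dict[str, list[str]] = {}

    def add(self, tok: str, record_id: str) -> None:
        self.postings.setdefault(tok, []).append(record_id)

    def lookup(self, tok: str) -> list[str]:
        return self.postings.get(tok, [])


# The key stays with the data owner; the server only ever sees tokens.
key = secrets.token_bytes(32)
index = TokenIndex()
records = {"r1": {"diagnosis": "flu"}, "r2": {"diagnosis": "asthma"}, "r3": {"diagnosis": "flu"}}
for rid, fields in records.items():
    for field, value in fields.items():
        index.add(token(key, field, value), rid)

print(index.lookup(token(key, "diagnosis", "flu")))  # ['r1', 'r3']
```

The design trades some leakage (equal values produce equal tokens) for queries that cost a single hash-table lookup on the server.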

Hardware: At UTD we already have substantial hardware to support our research and education in assured cloud computing. Our current hardware includes four major clusters with different configurations.

The first cluster is small and is generally used as our test cluster. It consists of 4 nodes, each with a Pentium-IV processor, an 80 GB hard drive and 1 GB of main memory.

The second cluster is housed in the SAIAL (Security Analysis and Information Assurance Lab) and has a total of 22 nodes. All the nodes in this cluster are commodity-class machines on which Hadoop runs. The hardware is mixed: 7 nodes each have a Pentium-IV processor, 360 GB of hard disk space and 4 GB of main memory, and the remaining 15 nodes each have a Pentium-IV processor, about 290 GB of hard disk space and 4 GB of main memory.

The third cluster is also housed in the SAIAL and consists of 10 nodes. Each node in this cluster has a Pentium-IV processor with 500 GB of disk space and 4 GB of main memory.

All these nodes are connected to each other via a 48-port Cisco switch on an internal network, and on each cluster only the master node is accessible from the public network. The fourth cluster, to which we have access, is the Open Cirrus testbed from HP Labs. We have also incorporated two solid-state disks into the existing clusters.
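
To give a sense of scale, the per-node figures for the first three clusters can be totaled. The Python snippet uses only the numbers stated above (the 290 GB figure is approximate) and excludes the Open Cirrus testbed and the solid-state disks; the division by three assumes HDFS's default replication factor, which may not match our actual configuration.

```python
# (node count, disk GB per node, RAM GB per node) for each node group described above
clusters = {
    "cluster 1 (test)": [(4, 80, 1)],
    "cluster 2 (SAIAL)": [(7, 360, 4), (15, 290, 4)],  # 290 GB is approximate
    "cluster 3 (SAIAL)": [(10, 500, 4)],
}
REPLICATION = 3  # assumption: HDFS default replication factor


def totals(groups: list[tuple[int, int, int]]) -> tuple[int, int, int]:
    """Return (nodes, raw disk GB, RAM GB) for a list of node groups."""
    nodes = sum(n for n, _, _ in groups)
    disk = sum(n * d for n, d, _ in groups)
    ram = sum(n * r for n, _, r in groups)
    return nodes, disk, ram


for name, groups in clusters.items():
    nodes, disk, ram = totals(groups)
    print(f"{name}: {nodes} nodes, {disk} GB raw disk "
          f"(~{disk / REPLICATION:.0f} GB usable with {REPLICATION}x replication), {ram} GB RAM")
```

For example, the second cluster works out to 7 × 360 + 15 × 290 = 6,870 GB of raw disk and 88 GB of RAM across its 22 nodes.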

Future Plans

We will continue to introduce units on assured cloud computing into the following courses:

  • Data Mining for Security Applications
  • Computer Systems Security
  • Network Security
  • Language Security
  • Cryptography
  • Privacy

Based on feedback from students, we will enhance our course on Secure Web Services and Cloud Computing. This course will evolve into our flagship course, to be offered in 2013.

We are also discussing suitable dates with our partner, Texas Southern University, for presenting our course on Assured Cloud Computing there. We had hoped to deliver this course during Summer 2012; since we could not agree on a suitable date, we are scheduling it for the 2012-2013 academic year.

We will make our courses available to partner universities.

We have received funding from AFOSR to enhance our cloud infrastructure. We have also received additional research funding for assured cloud data storage. We will use the results from these projects to enhance our assured cloud computing courses.