Wednesday, February 17, 2021

How to Measure Exposure of Access to a System

 

Objective

Describe a metric that captures the level of exposure due to access to a system. The intended use is to measure how close an access management process is to achieving the principle of least privilege.

Definitions

  • Access - An actor's ability to read or modify data in or through a system.

  • Access Management - A process to ensure appropriate actors have access to systems. 

  • System - A catchall term for things like a database, a tool, or an API.

  • User Data - A catchall term for data an end user is storing in the system.

  • Personally Identifiable Information (PII) - The subset of user data that directly identifies the end user, such as a name, address, phone number, or email.

  • Sensitive Personally Identifiable Information (SPII) - The subset of PII that is considered sensitive, typically defined by the regulatory environment in which the data is being processed. Typical examples in the United States include credit card numbers, Social Security numbers, etc.

  • Principle of least privilege - Actors have the ability to access only the data needed to achieve their business objective.

  • End user - The person entrusting their user data to the system. This is often referred to as the customer.

Background

As systems gain user data, there is a general need to ensure appropriate access is being granted. This is achieved through the creation of an access management process. Appropriate access typically depends on the sensitivity of the data being stored in the system. For example, SPII typically has more restrictive access than PII.


A difficulty in building an access management program is deciding how to define success. As with many processes, metrics around efficiency are fairly common, such as:

  • Average time to provision an actor with access to the system -- less is better

  • Headcount required to manage the process -- less is better

  • Number of moments someone needs access but doesn’t have it -- less is better


These are great metrics from the perspective of minimizing impact on business purposes. However, none of them has an explicit goal of ensuring the access management process achieves the principle of least privilege. In fact, focusing only on these metrics will lead to preemptively granting many actors access to the system, since that is the easiest way to drive all of those metrics to their ideal state.

Access Exposure Metric

I propose measuring access exposure as a proxy for how close we are to achieving the principle of least privilege. I define access exposure as:


(Number of actors) x (Average number of end users each actor has access to)


Number of actors - The count of actors with the ability to access end users' user data.

Average number of end users they have access to - The average count of end users the actors have access to. Note that the exact granularity of the subset of the end users' data they have access to can be set as needed when measuring a system's access exposure.


As an example, assume we are trying to measure the access exposure of a database with 100 million rows, where each row contains an end user's name, mailing address, and shopping history. To ensure only appropriate access, we have an access management process that verifies business need before granting access. Once verified, actors are added to a group with read access. This group has 100 actors in it. So we would say this system has an access exposure of 10 billion (100 x 100 million).
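
To make the two terms concrete, here is a minimal sketch of the calculation; the function name and types are my own and not part of any particular access management tooling.

#include <cstdint>
#include <iostream>

// Access exposure = (number of actors) x (average number of end users each actor can reach).
// Hypothetical helper; in practice the inputs would come from group membership
// and a data inventory of the system being measured.
std::uint64_t AccessExposure(std::uint64_t numActors, double avgEndUsersPerActor)
{
    return static_cast<std::uint64_t>(numActors * avgEndUsersPerActor);
}

int main()
{
    // The database example above: 100 actors, each able to read all 100 million rows.
    std::cout << AccessExposure(100, 100000000.0) << std::endl; // 10000000000
    return 0;
}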


Now that we have the ability to measure access exposure, we can see where our resources will be best spent. Let's say it is decided that access to the database needs to be restricted as much as possible. Typically the access management process would focus on spending more cycles vetting the 100 actors with access: putting more of them through training, having management chains confirm the business need, and so on. Assume these measures are able to reduce the number of actors down to 50. That would be a 50% reduction in the access exposure metric, now at 5 billion.


Eventually, though, we will see diminishing returns on reducing the number of actors; some number of actors simply need to access this data. However, note that the metric has two terms, and so far access management has focused on just one of them, the number of actors. We can shift focus to the other term by creating a dynamic access mechanism. For example, suppose the actors are a group of support agents that only need access to the one customer they are supporting. Imagine a chat support system that grants the connected agent access to a customer's data only once connected. In such a dynamic access model we would have an access exposure of ~50 (50 actors x 1 average end user each actor has access to). This represents a >99.999% reduction in access exposure.
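
To picture the dynamic model, here is a minimal sketch of an authorization check keyed on an active support session rather than on static group membership. The session store and function names are illustrative assumptions, not a reference to any particular system.

#include <set>
#include <string>
#include <utility>

// Hypothetical session store: (agent id, end user id) pairs with an open support chat.
// Granting access only through this table keeps the average number of end users
// each actor can reach near 1, instead of the entire database.
std::set<std::pair<std::string, std::string> > activeSessions;

void OnChatConnected(const std::string& agentId, const std::string& endUserId)
{
    activeSessions.insert(std::make_pair(agentId, endUserId));
}

void OnChatClosed(const std::string& agentId, const std::string& endUserId)
{
    activeSessions.erase(std::make_pair(agentId, endUserId));
}

bool CanReadUserData(const std::string& agentId, const std::string& endUserId)
{
    // Access is scoped to the single customer the agent is currently connected to.
    return activeSessions.count(std::make_pair(agentId, endUserId)) > 0;
}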

Saturday, April 21, 2012

I'm switching over to Blogger for hosting of my blog. I'm mostly doing this for historical preservation purposes, as I post almost everything on Google+.

If you need to move your WordPress blog to Blogger, you should check out this very useful tool: http://wordpress2blogger.appspot.com/

Monday, August 29, 2011

Google+ Invite

If anyone is still looking for a Google+ invite, I still have quite a few invites left. Feel free to use this link to get in:

Google+ Invite

Enjoy!

Wednesday, June 22, 2011

Demajoring the Major Version Number

PCWorld recently ran an article stating that Firefox's new release cycle will fail. The rationale stems from the idea that version numbers actually need to mean something. Unfortunately, the business world strongly feels this is the case. Typically, IT organizations only allow minor updates to be pushed out to users, with major version changes needing approval.

There is an upcoming shift in the way software is versioned. Chrome was one of the first major pieces of software to switch to the new model. With the shift to agile development, it was inevitable for software updates to become more frequent. Once updates become the norm instead of the exception, does a major version number still make sense?

Businesses will need to adapt to the changing world. Technology is always changing, and for a magazine focused on technology to suggest that it shouldn't is a bit ridiculous. Someone needs to tell them not to be a dodo bird and to evolve.

Repo Command Missing

I was recently following along with a tutorial to download the ChromeOS source when it told me to use the repo command to sync the source to my computer. Well, I couldn't seem to find the repo command anywhere on my machine. It turns out that repo is a tool available on Google Code to facilitate using Git. You can download the command as follows:

mkdir -p ~/bin
curl https://android.git.kernel.org/repo > ~/bin/repo
chmod a+x ~/bin/repo


Enjoy!

Thursday, March 31, 2011

My Quest: Sort 100 Million Integers With OpenMP

Over the past couple of weeks I have been teaching myself OpenMP. I wanted to see how much faster I could make my previous Quick Sort with Insertion Sort implementation. What is so great about OpenMP is that it allows you to transform single-threaded code into multi-threaded code with just a few lines. Here is my implementation:


void QuickSortImpl::Sort(vector< int >::iterator start, vector< int >::iterator end)
{
    // Use as many threads as there are processors.
    int numProcs = omp_get_num_procs();
    omp_set_num_threads(numProcs);
    #pragma omp parallel
    {
        // One thread kicks off the recursion; the others pick up the tasks it spawns.
        #pragma omp single nowait
        Sort_h(start, end - 1);
    }
}

void QuickSortImpl::Sort_h(vector< int >::iterator start, vector< int >::iterator end)
{
    // Note: end points at the last element (inclusive), not one past it.
    if(start < end)
    {
        if(end - start < 20)
        {
            // Small ranges are faster with Insertion Sort.
            InsertionSort(start, end);
        }
        else
        {
            int q = Partition(start, end);
            // Hand the left half to another thread as a task...
            #pragma omp task
            {
                Sort_h(start, start + (q - 1));
            }
            // ...and keep sorting the right half on this thread.
            Sort_h(start + (q + 1), end);
        }
    }
}

int QuickSortImpl::Partition(vector< int >::iterator start, vector< int >::iterator end)
{
    // Use the middle element as the pivot and move it to the end.
    int partitionIndex = (end - start) / 2;
    int pivotValue = start[partitionIndex];
    int tmp = start[partitionIndex];
    start[partitionIndex] = *end;
    *end = tmp;

    // Move everything less than or equal to the pivot to the front of the range.
    int swapIndex = 0;
    for(int i = 0; i < end - start; i++)
    {
        if(start[i] <= pivotValue)
        {
            tmp = start[swapIndex];
            start[swapIndex] = start[i];
            start[i] = tmp;
            swapIndex++;
        }
    }

    // Put the pivot into its final position and return its index.
    tmp = start[swapIndex];
    start[swapIndex] = *end;
    *end = tmp;
    return swapIndex;
}

void QuickSortImpl::InsertionSort(vector< int >::iterator start, vector< int >::iterator end)
{
    // end is inclusive, so j runs through end - start as well.
    for(int j = 1; j <= end - start; j++)
    {
        int key = start[j];
        int i = j - 1;
        while(i >= 0 && start[i] > key)
        {
            start[i + 1] = start[i];
            i = i - 1;
        }
        start[i + 1] = key;
    }
}


I chose to only task off one of the recursive calls in the Sort_h routine. This is because there is a small overhead associated with sending a task to the queue of waiting threads. The actual improvement to performance from this was fairly minimal, about 0.2-0.3 seconds. This trick would have been more noticeable if there were more iterations of Sort_h.

This implementation sorted my random list of integers in roughly 5.5 seconds. That is roughly a factor of three speedup over the single-threaded version. It also better utilizes my desktop's CPUs by spiking all four cores for a few seconds to complete the operation.

Besides the pretty impressive speedup, what strikes me as most interesting is how little code changed from my previous implementation. Literally four additional lines of code were required to turn a single-threaded Quick Sort into a parallel Quick Sort that scales pretty well with the number of CPUs you have (on most desktops). This just goes to show you the power of OpenMP. I highly recommend that anyone needing to write multi-threaded C++ code take a look at OpenMP.
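
For anyone wanting to try this, a minimal driver might look something like the sketch below. The header name and the shape of the QuickSortImpl class are assumptions based on the attached download; the important part is compiling with -fopenmp so the pragmas take effect.

// main.cpp -- sketch of a driver for the attached code; the header name is assumed.
// Compile with OpenMP enabled, e.g.: g++ -O2 -fopenmp main.cpp QuickSort.cpp -o sort
#include <cstddef>
#include <cstdlib>
#include <vector>
#include "QuickSort.h" // assumed to declare QuickSortImpl as used above

int main()
{
    // Fill a vector with 100 million random integers, then sort it in place.
    std::vector<int> data(100000000);
    for(std::size_t i = 0; i < data.size(); i++)
    {
        data[i] = std::rand();
    }
    QuickSortImpl sorter;
    sorter.Sort(data.begin(), data.end());
    return 0;
}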

Download: OpenMPQuickSort

Wednesday, March 30, 2011

My Quest: Sort 100 Million Integers

I have decided to do some research into sorting algorithms to see how fast I can sort a random list of 100 million integers. I picked 100 million integers because it is big enough that the GNU C++ implementation of sort took over 15 seconds on my desktop, and it still fits pretty easily in RAM. I also believe that most techniques I develop to sort an array of this size would apply to larger arrays.

As I started doing research I realized there is actually a lot more to it than figuring out which sorting algorithm is the fastest. When sorting a list of this size, things like ease of parallelism and amount of locality become big factors in determining speed. Over the coming weeks I will be posting updates to this quest with improvements to my algorithm. I am going to explore different types of sorting algorithms and methods to parallelize them. My parallelism adventures will likely stick to OpenMP and CUDA.

My method of timing will not be perfect. Times will include the time to generate a random array of 100 million integers, plus the time to sort the integers, plus the time to check the result. I will run all tests on my desktop, which is a Core i5 (4 cores) with 8 GB of RAM. I will use the time command to measure the run time and report the real time as the amount of time required. As this is just me experimenting, there is no requirement to have accurate measures of the actual algorithms. Relative time comparisons will be enough to get a sense of the run time of one technique over another.
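
As a sketch of that harness (the manual verification loop is just one way to check the result; nothing here is prescribed), the whole run can be wrapped in one program and executed under the time command:

// timing_harness.cpp -- sketch only: generate, sort, and verify, then run with `time ./harness`
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

int main()
{
    // 1. Generate 100 million random integers (counted in the measured time).
    std::vector<int> data(100000000);
    for(std::size_t i = 0; i < data.size(); i++)
    {
        data[i] = std::rand();
    }

    // 2. Sort them -- here with std::sort as the baseline being compared against.
    std::sort(data.begin(), data.end());

    // 3. Verify the result (also counted in the measured time).
    for(std::size_t i = 1; i < data.size(); i++)
    {
        if(data[i - 1] > data[i])
        {
            std::cout << "Not sorted!" << std::endl;
            return 1;
        }
    }
    std::cout << "Sorted." << std::endl;
    return 0;
}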

To kick it off, I have implemented a Quick Sort that switches to Insertion Sort once the list being sorted is small enough. I apologize if there are any mistakes in the code below; I had to fight with the editor to get it to format correctly. Please see the attached files for the actual code and header files.
void QuickSortImpl::Sort(vector< int >::iterator start, vector< int >::iterator end)
{
    // Sort_h treats its end argument as the last element (inclusive).
    Sort_h(start, end - 1);
}

void QuickSortImpl::Sort_h(vector< int >::iterator start, vector< int >::iterator end)
{
    if(start < end)
    {
        if(end - start < 15)
        {
            // Small ranges are handled by Insertion Sort.
            InsertionSort(start, end);
        }
        else
        {
            int q = Partition(start, end);
            Sort_h(start, start + (q - 1));
            Sort_h(start + (q + 1), end);
        }
    }
}

int QuickSortImpl::Partition(vector< int >::iterator start, vector< int >::iterator end)
{
    // Use the middle element as the pivot and move it to the end.
    int partitionIndex = (end - start) / 2;
    int pivotValue = start[partitionIndex];
    int tmp = start[partitionIndex];
    start[partitionIndex] = *end;
    *end = tmp;

    // Move everything less than or equal to the pivot to the front of the range.
    int swapIndex = 0;
    for(int i = 0; i < end - start; i++)
    {
        if(start[i] <= pivotValue)
        {
            tmp = start[swapIndex];
            start[swapIndex] = start[i];
            start[i] = tmp;
            swapIndex++;
        }
    }

    // Put the pivot into its final position and return its index.
    tmp = start[swapIndex];
    start[swapIndex] = *end;
    *end = tmp;
    return swapIndex;
}

void QuickSortImpl::InsertionSort(vector< int >::iterator start, vector< int >::iterator end)
{
    // end is inclusive, so j runs through end - start as well.
    for(int j = 1; j <= end - start; j++)
    {
        int key = start[j];
        int i = j - 1;
        while(i >= 0 && start[i] > key)
        {
            start[i + 1] = start[i];
            i = i - 1;
        }
        start[i + 1] = key;
    }
}


These are pretty common implementations of Quick Sort and Insertion Sort. There are, however, a few notable things. The Partition method uses the middle element instead of a random index. This is because my list to be sorted was already randomized; calling rand() again would have just been overhead. Skipping rand() actually saves a few seconds, but I would still recommend a random pivot for ordinary implementations of Quick Sort. I also decided that lists of size 15 or smaller are best handled by Insertion Sort instead of Quick Sort. This number was experimentally derived; it appeared to give the best results.

This implementation performed rather well on my 100 million integers, taking on average 15 seconds to complete. Let's see how much better we can make this; more to come.

Download File: QuickSort.zip