Dataguise is a data security company that enables organizations to identify, locate and secure their sensitive data. Manmeet Singh and Adrian Booth co-founded the company in 2007 when they realized that while most companies tend to focus on external threats to their data, a bigger danger resides within the firms themselves. In an interview with India Knowledge at Wharton, Singh points out that, using a device as small as a flash drive, employees can easily walk away with thousands, or even millions, of a company’s confidential records. Singh, who holds a master’s degree in computer science and previously worked for companies including Oracle, HCL Technologies, Miri Infotech and Zeneb, says that in the coming years, data and security will no longer be seen as two different aspects and will be merged together.
An edited transcript of the conversation follows.
India Knowledge at Wharton: Data has been around for a long time so why is data security attracting so much attention now?
Manmeet Singh: Data has been around for hundreds of years. If you go back and look, people have been collecting data one way or the other [since then]. But in the last 10 years, data has been revolutionized because the storage [of data] has become cheaper. People are now collecting data to spot trends — to find out what people are doing at present, and what they want to do in the future. With the proliferation of data, one key problem is that it is tough to define what [characterizes] sensitive data, or the data that is subject to compliance regulations, and which [types of] data you can share with other people. There is a need to protect the sensitive data and hence there is a lot of attention from everybody including [venture capitalists] and entrepreneurs.
India Knowledge at Wharton: At a very basic level, can you explain the difference between sensitive and non-sensitive data?
Singh: Sensitive data is determined by regulatory requirements and corporate security policy. There are specific mandates such as PII [Personally Identifiable Information] from the government and PCI [Payment Card Industry] from companies like Visa and MasterCard, regarding the protection of Social Security numbers of individuals, their addresses, zip codes, credit card numbers [and] health records. In financial information, you can still put in some checks and balances, but when you lose health care data [for example], the illness of a person is disclosed to others and you cannot take it back. You cannot change the fact that others now know this information.
India Knowledge at Wharton: Looking ahead, where do you see the biggest threat coming from with respect to data breaches?
Singh: If you look at all the data security companies that have come into the picture to date, they have been protecting data from the outside: the peripheral, the external threat. Everybody’s worried about some hacker getting into the system and taking the data out. That threat is there, but there are thousands of employees working within a company who can walk away with your data today. They are the internal threat: Your developers, your testers, your contractors, who are sitting within the company, have access to non-production data [during development, testing, quality assurance, etc.], which is [significant information]. And companies are not doing anything about it because they think they’re safe just because the employees have signed a letter saying, “I’ll protect my data.” Today, with something as small as a flash drive, internal people can copy your data (thousands and thousands, and even millions of records) and walk away.
India Knowledge at Wharton: Can you share a specific example of a data security breach that highlights this issue?
Singh: People are willing to talk about external threats, but very little information is available in the market regarding internal threats because people don’t talk about them. Our products have been bought by companies, educational institutions [and] financial institutions that have lost a lot of data internally, but they don’t talk about it.
India Knowledge at Wharton: How did you get the idea for setting up Dataguise?
Singh: I’ve been in the data business and doing the architecture of data since I graduated from college in 1990, and have seen since then the increase in data proliferation. Initially, 1GB of data [was considered to be] a lot. Then it became 100GB and then [terabytes] of data [or 1,000GB]. I saw this problem of data security going from small to big. People talk about databases and security in the databases, but there is no real marriage of the two. People are thinking of two different things. But that’s not the way to look at it. In the next five years, data and security are going to be merged and everybody will start focusing on it.
I co-founded the company [with Adrian Booth] in 2007. We put our own money into it, but came to a point where we were hitting a wall. We needed more money. We needed the company to grow bigger because we realized that we had to go across databases. We had to go across Oracle, DB2, Sybase, Teradata. We went to an Oracle OpenWorld event and started talking to investors. One of them, Herb Maden, got very interested in the company. He’s been an investor as well as a guide and mentor for us. We raised a total of US$3.2 million from Herb and other friends and family.
India Knowledge at Wharton: Can you explain what exactly Dataguise offers?
Singh: We offer an enterprise solution for proactive data privacy, detection, protection and management. What we are offering is to protect the data. We first discover the data and tell you where it’s located on the network. We then identify the sensitive data; for example [we] tell you which data relates to PCI, HIPAA [the U.S. Health Insurance Portability and Accountability Act, which established national standards in the U.S. for electronic health insurance transactions and other privacy protections], PII or custom, and then mask it to protect it.
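[Editor's illustration: The discover-and-identify step Singh describes can be pictured with a small sketch. This is not Dataguise's implementation. The patterns, the function names (`classify`, `discover`) and the label names are assumptions made for illustration only; a real discovery tool would use far richer rules and would scan live databases rather than in-memory rows.]

```python
import re

# Hypothetical detection patterns (assumptions, not taken from any product).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str) -> set[str]:
    """Label a single value as PII and/or PCI using the patterns above."""
    labels = set()
    if SSN_RE.search(value):
        labels.add("PII")
    for match in CARD_RE.finditer(value):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            labels.add("PCI")
    return labels


def discover(rows: list[dict[str, str]]) -> dict[str, set[str]]:
    """Map each column name to the sensitive-data labels found in it."""
    findings: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            labels = classify(value)
            if labels:
                findings.setdefault(column, set()).update(labels)
    return findings


sample = [{"name": "Alice", "ssn": "123-45-6789", "card": "4111 1111 1111 1111"}]
findings = discover(sample)
```

[Here `findings` reports the `ssn` column as PII and the `card` column as PCI, which is the kind of inventory that a later masking step would work from.]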
India Knowledge at Wharton: What does “masking” mean?
Singh: Masking is something that is very new. People sometimes confuse it with encryption, but it is completely different from that. Masking works like this: Your Social Security number, your name and address are three things that make you personally identifiable [to obtain] a credit card or for anything else. For applications to work [effectively] data has to be realistic and people need to be able to see it. But not everyone needs to see the real information. So what we do is we move data from production to non-production classification. We change the Social Security number of an individual to a different Social Security number and similarly with ZIP code and address. It sounds simple but it is not: the alteration has to be consistent and in certain cases, it has to be unique and synchronized. And it has to be the same across databases. So that’s what masking is: completely obfuscating the data, changing the look and feel of the data, but keeping the value and everything [else] the same so that all your financial applications still run. You can also continue to develop and test [using] this masked data.
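[Editor's illustration: The consistency requirement Singh mentions, where the same real value always becomes the same masked value in every database, can be sketched with a keyed hash. This is an assumption-laden example, not the technique Dataguise uses. The key, the function name `mask_ssn` and the choice of HMAC-SHA256 are all hypothetical. Note that this approach gives consistency but does not guarantee uniqueness, since two inputs could in principle map to the same output; a strict one-to-one mapping would need something like format-preserving encryption.]

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept out of non-production systems.
SECRET_KEY = b"demo-key-change-me"


def mask_ssn(ssn: str, key: bytes = SECRET_KEY) -> str:
    """Replace an SSN with a different, realistic-looking SSN.

    The same input and key always give the same output, so the masked
    value stays consistent wherever the original value appears.
    """
    digits = ssn.replace("-", "")
    digest = hmac.new(key, digits.encode(), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % 10**9  # keep nine digits
    masked = f"{number:09d}"
    return f"{masked[:3]}-{masked[3:5]}-{masked[5:]}"


first = mask_ssn("123-45-6789")
second = mask_ssn("123-45-6789")  # e.g. the same person in another database
```

[Because `first` and `second` are equal, a test application that joins records on the masked number continues to work, while the real number never leaves production.]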
India Knowledge at Wharton: What about other companies in this field – are they not also looking at masking?
Singh: They are doing masking. But their masking approach is different from ours. They are all archiving companies. They take the data out, mask it, and then they put it back into the databases. They need huge storage [infrastructures] and since data is backed up, leakage is also possible. What we do is “in-place” masking.
India Knowledge at Wharton: Can you elaborate on the process a bit more?
Singh: … If we have to mask 100GB of data we don’t need 200GB of space to mask it. We use the relational power of the database. Before we even get to the masking stage, we use our discover tool, called DgDiscover, to find what people need to mask and which information is PII- or PCI-relevant. Our ability to leverage the inherent advantages of database technology has enabled us to transform data masking, turning what used to take months and weeks into hours and minutes. We also make data migration to the cloud easier.
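[Editor's illustration: One way to picture masking "in place" is an `UPDATE` that rewrites sensitive columns inside the database, instead of exporting the data, masking it elsewhere and loading it back. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in. The table names, column names, key and masking function are hypothetical, and nothing here describes how DgMasker itself works.]

```python
import hashlib
import hmac
import sqlite3

KEY = b"demo-key"  # hypothetical secret for the illustration


def mask_ssn(ssn: str) -> str:
    """Deterministically map an SSN to another SSN-shaped string."""
    digest = hmac.new(KEY, ssn.replace("-", "").encode(), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % 10**9
    d = f"{number:09d}"
    return f"{d[:3]}-{d[3:5]}-{d[5:]}"


def mask_in_place(conn: sqlite3.Connection) -> None:
    """Mask SSN columns with UPDATE statements; no staging copy is made."""
    conn.create_function("mask_ssn", 1, mask_ssn)
    with conn:  # one transaction covering both tables
        conn.execute("UPDATE customers SET ssn = mask_ssn(ssn)")
        conn.execute("UPDATE claims SET customer_ssn = mask_ssn(customer_ssn)")


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (ssn TEXT, name TEXT);
    CREATE TABLE claims (id INTEGER PRIMARY KEY, customer_ssn TEXT, amount REAL);
    INSERT INTO customers VALUES ('123-45-6789', 'A'), ('987-65-4321', 'B');
    INSERT INTO claims VALUES (1, '123-45-6789', 10.0),
                              (2, '123-45-6789', 5.0),
                              (3, '987-65-4321', 7.5);
    """
)
mask_in_place(conn)

# The join still matches every claim to its customer after masking.
joined = conn.execute(
    "SELECT COUNT(*) FROM claims JOIN customers ON claims.customer_ssn = customers.ssn"
).fetchone()[0]
```

[Because the same function is applied to both tables, the three claims still join to their customers afterwards, which is a small-scale picture of keeping related data synchronized while masking.]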
India Knowledge at Wharton: Dataguise currently has two products — DgDiscover and DgMasker. You spoke about DgDiscover briefly. Can you also explain what DgMasker does?
Singh: As I mentioned [earlier], for every copy of production data in the typical enterprise, there may be dozens of non-production copies. Non-production data serves a variety of purposes, including offshore or outsourced development, testing, support, and quality assurance. DgMasker transparently masks sensitive information in application data sets with next generation masking-in-place technology. DgMasker generates data that meets the business logic requirements of downstream applications, while automatically preserving relational integrity of data sets.
India Knowledge at Wharton: Apart from DgDiscover and DgMasker, are you planning to come out with any other products?
Singh: In the first quarter of this year we are announcing some new products — Discover for SharePoint and a DgDashboard. DgDashboard will be a great differentiator for the company as the security and the IT personnel will get a holistic view of the IT and the security issues. This will have open APIs and will take feed from any existing system.
India Knowledge at Wharton: Apart from the enterprise customers, Dataguise is also trying to provide services to the U.S. government. Can you tell us how you got into the government space?
Singh: Alan Thompson, the executive vice president of operations for our company, had federal experience before he joined us and during my discussions with him, I realized that the [U.S.] government has a huge amount of data — IRS, INS, security, terrorist, financial accessing and other information that they would like to protect. So we decided to focus on the federal space also. Sales cycles for government procurements typically have a long lead time and I expect to see federal business coming in from this year onwards.
India Knowledge at Wharton: What about masking data in gaming, online social networking, retail and e-commerce? Do you see big opportunities in these areas?
Singh: Gaming and social media are yet to evolve as a market for our business. Until the government mandates a certain kind of regulation in these areas, I don’t think companies in these sectors will look at masking data. The biggest markets I personally see are financial, health care, retail, insurance and education. Universities, specifically, are going to be a big market since they have a huge amount of student information.
India Knowledge at Wharton: Are you also planning to look for customers in other markets outside of the U.S.?
Singh: We are planning on expanding, but not until 2012. For this coming year, our focus is on North America as well as the U.S. federal markets.
India Knowledge at Wharton: Moving to a personal note, do you come from an entrepreneurial family background?
Singh: No, not at all. My father retired as an officer from the Delhi [government] administration and my mother is a school teacher.
India Knowledge at Wharton: What has been the biggest challenge for you as an entrepreneur?
Singh: Initially, the biggest challenge I had was finding senior technical people who had the right mixture of database and security expertise. Despite the fact that there has been quite a bit of recent hype surrounding database security, I found it tough to effectively recruit people with the right blend of expertise, especially when it comes to masking, which conceptually relates more to the database side than the security side.
I used my own connections to find these people. [They were] people that I or the other founders had known or worked with in the past. LinkedIn and other professional social networks at the time were not as powerful or evolved as they are today, so our efforts consisted of culling through hundreds of business cards, Rolodexes and e-mails, and targeting the people I thought would be the best fit. Then came the challenge of selling them on Dataguise and convincing them to join the company.
The first year we did not have a lot of money, so retaining technical talent was tough. It was no easy feat to convince them to leave companies such as Oracle and VMware. Once we raised some money, it became easier to recruit the needed talent and I was able to hire developers, architects and product managers. However, I had to then raise more money, which was challenging given the economic climate of the past two years. Next I needed a sales team, which put me in an area where I did not have any connections. In order to find good sales people, I had to use headhunters.