Inference and Aggregation
Inference and aggregation occur when a user can use lower-level access to learn restricted information. These issues occur in multiple realms, including database security.
Inference requires deduction: there is a mystery to be solved, and lower-level details provide the clues. Aggregation is a mathematical process: a user asks every question, receives every answer, and derives restricted information.
Learn by Example
Pentagon Pizza Inference
The United States Pentagon ordered a lot of pizza on the evening of January 16, 1991, far more than normal. The sheer volume of pizza delivery cars allowed many people without United States Military clearances to see that a lot of people were working long hours, and therefore infer that something big was going on. They were correct; Operation Desert Storm (aka Gulf War I) was about to launch: “Outside of technology, Maj. Ceralde cited an example of how ‘innocuous’ bits of information can give a snapshot of a bigger picture. He described how the Pentagon parking lot had more parked cars than usual on the evening of January 16, 1991, and how pizza parlors noticed a significant increase of pizza to the Pentagon and other government agencies. These observations are indicators, unclassified information available to all, Maj. Ceralde said. That was the same night that Operation Desert Storm began” [30].
Inference requires deduction: clues are available, and a user makes a logical deduction. It is like a detective solving a crime: “Why are there so many pizza delivery cars in the Pentagon parking lot? A lot of people must be working all night … I wonder why?” In our database example, polyinstantiation is required to prevent the manager from inferring that a layoff is already planned for John Doe.
Aggregation is similar to inference, but there is a key difference: no deduction is required. Aggregation asks every question, receives every answer, and the user assembles restricted information.
Imagine you have an online phone database. Normal users can resolve a name, like Jane Doe, to a number, like 555-1234. They may also perform a reverse lookup, resolving 555-1234 to Jane Doe. Normal users cannot download the entire database; only phone administrators can do so. This restriction prevents salespeople from downloading the entire phone database and cold calling everyone in the organization.
Aggregation allows a normal user to download the entire database and receive information normally restricted to the phone administrators. The aggregation attack is launched when a normal user performs a reverse lookup for 555-0000, then 555-0001, then 555-0002, etc., until 555-9999. The user asks every question (reverse lookup for every number in a phone exchange), receives every answer, and aggregates the entire phone database.
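The attack described above can be sketched in a few lines of Python. This is a minimal illustration, not a real directory service: the `DIRECTORY` contents, the `reverse_lookup` function, and the `aggregate_exchange` helper are all hypothetical names invented for this example.

```python
# Hypothetical in-memory directory standing in for the phone database.
DIRECTORY = {
    "555-1234": "Jane Doe",
    "555-0002": "John Doe",
    "555-9999": "Alex Roe",
}


def reverse_lookup(number):
    """The single-record query that every normal user is allowed to run."""
    return DIRECTORY.get(number)


def aggregate_exchange(prefix="555"):
    """Ask every question: try all 10,000 numbers in one exchange."""
    harvested = {}
    for suffix in range(10_000):
        number = f"{prefix}-{suffix:04d}"  # 555-0000 through 555-9999
        name = reverse_lookup(number)
        if name is not None:
            harvested[number] = name
    return harvested


# The normal user now holds the same data as a full database download.
stolen = aggregate_exchange()
```

No single lookup exceeds the user's privileges. The restricted information appears only once every answer has been collected.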
Inference and Aggregation Controls
Databases may require inference and aggregation controls. A real-world inference control for the “Pentagon Pizza” example above would be to place food service vendors under contract with an NDA and require them to securely deliver flexible amounts of food on short notice.
An example of a database inference control is polyinstantiation. Database aggregation controls may include restricting normal users to a limited number of queries.
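A query limit of this kind might look like the following sketch. It assumes a simple per-user counter that is never reset; a real control would more likely count queries per time period. The class name `LimitedDirectory`, the exception `QueryLimitExceeded`, and the limit of three queries are hypothetical choices made for illustration.

```python
class QueryLimitExceeded(Exception):
    """Raised when a user has used up the allowed number of lookups."""


class LimitedDirectory:
    """Hypothetical phone directory that caps lookups per user."""

    def __init__(self, records, max_queries):
        self._records = dict(records)
        self._max_queries = max_queries
        self._counts = {}  # user name -> number of lookups so far

    def reverse_lookup(self, user, number):
        used = self._counts.get(user, 0)
        if used >= self._max_queries:
            raise QueryLimitExceeded(f"{user} exceeded {self._max_queries} queries")
        self._counts[user] = used + 1
        return self._records.get(number)


directory = LimitedDirectory({"555-1234": "Jane Doe"}, max_queries=3)
answers = [directory.reverse_lookup("alice", f"555-{n:04d}") for n in (1232, 1233, 1234)]

# A fourth lookup by the same user is refused, which stops the enumeration.
try:
    directory.reverse_lookup("alice", "555-1235")
    blocked = False
except QueryLimitExceeded:
    blocked = True
```

The limit does not stop a legitimate user from resolving a handful of numbers, but it makes asking every question impractical.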
Data Mining
Data mining searches large amounts of data to determine patterns that would otherwise get “lost in the noise.” Credit card issuers have become experts in data mining, searching millions of credit card transactions stored in their databases to discover signs of fraud. Simple data mining rules, such as “X or more purchases, in Y time, in Z places,” can be used to discover credit cards that have been stolen and used fraudulently.
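One way to express the “X or more purchases, in Y time, in Z places” rule is sketched below. It reads “in Z places” as “in at least Z distinct places,” which is an interpretation rather than something the rule states. The function name `flag_card`, the thresholds, and the sample purchases are all hypothetical.

```python
from datetime import datetime, timedelta


def flag_card(purchases, x, y, z):
    """Return True if some window of length y holds at least x purchases
    made in at least z distinct places.

    purchases is a list of (timestamp, place) pairs for a single card.
    """
    ordered = sorted(purchases)
    for i, (start, _) in enumerate(ordered):
        # Every purchase within y of the purchase that opens this window.
        window = [p for p in ordered[i:] if p[0] - start <= y]
        places = {place for _, place in window}
        if len(window) >= x and len(places) >= z:
            return True
    return False


t0 = datetime(2024, 1, 1, 12, 0)
suspicious = [
    (t0, "Boston"),
    (t0 + timedelta(minutes=20), "Denver"),
    (t0 + timedelta(minutes=40), "Miami"),
]
ordinary = [
    (t0, "Boston"),
    (t0 + timedelta(days=1), "Boston"),
    (t0 + timedelta(days=2), "Boston"),
]

suspicious_flagged = flag_card(suspicious, x=3, y=timedelta(hours=1), z=3)
ordinary_flagged = flag_card(ordinary, x=3, y=timedelta(hours=1), z=3)
```

Three purchases in three cities within one hour trip the rule; three purchases in one city over three days do not.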
Data mining raises privacy concerns: imagine if life insurance companies used data mining to track purchases such as cigarettes and alcohol, and denied claims based on those purchases.
Data Analytics
Data analytics can play a role in database security by allowing the organization to better understand the typical use cases and establish a baseline of what constitutes typical or normal interaction with the database. Understanding what normal operations look like can potentially allow the organization to more proactively identify abuse from insider threats or compromised accounts. Given the rather high likelihood that significant and/or sensitive data is housed within a database, any tools that can improve the organization’s facility for detecting misuse could be a significant boon to security.
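As a rough illustration of baselining, the sketch below compares one account's query count for today against its own history. The threshold of three standard deviations, the minimum spread of 1.0, the function name `is_anomalous`, and the sample counts are assumptions chosen for the example, not recommended values.

```python
from statistics import mean, pstdev


def is_anomalous(history, today, k=3.0):
    """Flag today's query count if it is far above the account's baseline.

    history is a list of past daily query counts for one account.
    """
    baseline = mean(history)
    spread = max(pstdev(history), 1.0)  # avoid a zero-width threshold
    return today > baseline + k * spread


# Hypothetical daily query counts for one account.
history = [40, 55, 48, 52, 45]

typical_day = is_anomalous(history, today=50)
bulk_extraction = is_anomalous(history, today=5000)
```

A day of 50 queries falls within the account's normal range, while 5,000 queries stands out and might indicate an insider or a compromised account pulling data in bulk.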