# Technical details for data integration

Blockmetry produces anonymized web analytics hit-level data that is stored in customer-controlled databases. This document describes the details of that data integration.

# Sending measurements to Blockmetry

Blockmetry can receive measurements through a number of channels, including the webpage (JavaScript) integration described below.

# Blockmetry measurement processing

Once a measurement is received, it is processed in datacenters in the EU (currently Ireland). We use Amazon Web Services (AWS) as our infrastructure provider. The diagram below describes the data flow using the webpage integration option, i.e. using JavaScript to send measurements to Blockmetry:

*Diagram: Blockmetry data flow and integration*
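
The internals of this pipeline are not exposed to customers, but its overall shape can be sketched in a few lines. The following is a minimal, purely illustrative Python sketch of a serverless processing step, assuming a hypothetical API Gateway-style event and toy `looks_like_bot` and `anonymize` helpers; it is not Blockmetry's actual code.

```python
import json


def looks_like_bot(measurement: dict) -> bool:
    """Toy stand-in for bot/spammer detection (the real logic is internal to Blockmetry)."""
    return not measurement.get("user_agent")


def anonymize(measurement: dict) -> dict:
    """Toy stand-in for anonymization: keep only coarse, non-identifying fields."""
    return {
        "PropertyType": measurement.get("property_type"),
        "URLHost": measurement.get("url_host"),
        "UAFamily": measurement.get("ua_family"),
    }


def lambda_handler(event, context):
    """Illustrative shape of a serverless processing step on AWS Lambda.

    A measurement arrives, bot/spam traffic is dropped, and only then is an
    anonymized analytics record (BMRecord) produced for delivery to the
    customer-owned data store (S3 or Kinesis; delivery is omitted here).
    """
    measurement = json.loads(event["body"])  # hypothetical event shape

    if looks_like_bot(measurement):
        return {"statusCode": 204}  # bot/spam traffic: no analytics record is produced

    bm_record = anonymize(measurement)
    return {"statusCode": 200, "body": json.dumps(bm_record)}
```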

The Blockmetry pipeline is built on AWS products such as AWS Lambda. This has technical and user-facing benefits, including high availability and low, user-visible response times.

# Storing hit-level data records in customer databases

The final step is to produce the analytics record with all the (anonymized) parameters. The analytics record, called BMRecord, is produced only if the measurement is detected not to have come from a bot or spammer. **This means that not all Blockmetry processing will produce an analytics record.**

If produced, the analytics record is stored in a customer-owned data asset such as:

* An S3 bucket, storing a separate file for each record, or
* A data pipeline, such as AWS Kinesis, which can send the data to Redshift, Elastic, or on for further processing.

**Blockmetry is given write-only access to these customer-owned data assets** (see the sketch at the end of the Data security section below). You really own your data. Part of Blockmetry's privacy promise is that we cannot read back the analytics data.

# Sample Blockmetry record

This is a simplified Blockmetry analytics record that is sent to a customer's database, rendered as JSON:

```
{
    "AnonymitySetID": ,
    "PropertyType": ,
    "Datetime": {
        "DatetimeUTC": "YYYY-MM-DD HH:MM",
        "DateUTC": "YYYY-MM-DD",
        "TimeUTC": "HH:MM",
        "HourUTC": _H,
        "MonthUTC": _M,
        "WeekdayUTC": ,
        "ISOWeekUTC": ,
        "DatetimeLocal": "YYYY-MM-DD HH:MM",
        "DateLocal": "YYYY-MM-DD",
        "TimeLocal": "HH:MM",
        "HourLocal": _H,
        "MonthLocal": _M,
        "WeekdayLocal": ,
        "ISOWeekLocal":
    },
    "PageProperties": {
        "Title": ,
        "URLHost": ,
        "DocumentURL": ,
        "CanonicalURL": ,
        "URLScheme": ,
        "URLPath": ,
        "PathSegment1": ,
        "PathSegment2": ,
        "PathSegment3":
    },
    "DeviceProperties": {
        "UAFamily": ,
        "UAMajor": ,
        "OS": ,
        "OSMajor": ,
        "DeviceFamily": ,
        "DeviceClass": ,
        "ScreenWidth": ,
        "ScreenHeight": ,
        "JSEnabled": ,
        "IsGoogleWeblight": ,
        "IsInAppBrowser": ,
        "HostApp":
    },
    "Locale": {
        "Country": ,
        "CityGeoNameID": ,
        "CityNameEN": ,
        "Languages": {
            "": ,
            ...
        },
        "TimeZone":
    },
    "Acquisition": {
        "Referrer": ,
        "ReferrerHost": ,
        "AcquisitionSource": ,
        "AcquisitionMedium": ,
        "AcquisitionCampaign": ,
        "AcquisitionTerm": ,
        "AcquisitionContent": ,
        "AcquisitionCampaignID":
    },
    "PerfMetrics": {
        "NavTimingsRaw": {
            : ,
            ...
        },
        "NavTimingsComputed": {
            : ,
            ...
        }
    }
}
```

Note that not all properties may be populated in the final record. For example, if using a web data integration configuration and the webpage does not have a [rel=canonical](https://support.google.com/webmasters/answer/139066?hl=en) link in the `<head>` element (as per the spec), then the `PageProperties.CanonicalURL` property will be an empty string.

# Data security

Blockmetry writes to S3 or AWS Kinesis using HTTPS endpoints, so all data is encrypted in transit. Once the data reaches your data stores, you as the customer are responsible for encryption at rest, access controls, logging, auditing, and other obligations under the GDPR.
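
As a concrete example of what write-only access and the customer's security responsibilities can look like on the S3 side, the sketch below uses Python and boto3 to enable default encryption at rest on the destination bucket and to attach a bucket policy that allows a principal to write objects but not read them back. The bucket name and the AWS account ID standing in for Blockmetry are hypothetical, and the exact mechanism for granting Blockmetry write access (cross-account role, bucket policy, etc.) is agreed as part of your integration setup.

```python
import json

import boto3

BUCKET = "my-analytics-bucket"                        # hypothetical destination bucket
WRITER_PRINCIPAL = "arn:aws:iam::111122223333:root"   # hypothetical account standing in for Blockmetry

s3 = boto3.client("s3")

# Encryption at rest: encrypt every object written to the bucket by default (SSE-S3).
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Write-only access: the writer may put objects (one file per BMRecord) but gets
# no s3:GetObject or s3:ListBucket permissions, so it cannot read the data back.
# For a Kinesis destination, the analogous write-only permissions are
# kinesis:PutRecord / kinesis:PutRecords on the stream.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "WriteOnlyAnalyticsRecords",
            "Effect": "Allow",
            "Principal": {"AWS": WRITER_PRINCIPAL},
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```

Access logging and auditing (for example S3 server access logs or CloudTrail data events) can be layered on in the same way and remain entirely under your control.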