DataStori is a SaaS product which automates the ingestion of data from cloud applications (ERP, CRM, HR, finance and others) to a central data store. DataStori ensures the availability of regular, clean and reliable data to businesses, which is the first step in data-driven decision-making.
DataStori is used by large and mid-market enterprises worldwide to understand their data and make the best use of it to achieve their business goals.
DataStori offers major benefits in functionality, security and pricing over other data connectors.
- DataStori creates data pipelines in real time by reading source API documentation. Other tools maintain a library of pre-built connectors, with processes and lead times to add new ones. This makes it possible for customers to access thousands of applications that other connectors do not serve.
- DataStori executes data pipelines in the customer's cloud. Data source and destination are both in the customer's cloud, and data never leaves their environment. This ensures that data handled by DataStori is always in compliance with the customer's data security and privacy policies.
- DataStori has transparent and cost-effective pricing. Because it runs serverless, DataStori spins up and shuts down infrastructure on demand, keeping costs low. Unlike other connectors, DataStori executes pipelines in the customer's cloud and so eliminates the extra data hop through its own cloud. This is a significant benefit for both cost and data security.
A data pipeline is a component that copies specified data from a source to a destination using an integration. For example, a data pipeline can be built to copy the General Ledger table from NetSuite (source) to Azure SQL (destination). The integration specifies the data copy and automation parameters, including data deduplication, pipeline run schedule, source columns, data backload and many others.
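For illustration, the integration for such a pipeline could be captured in a definition like the sketch below. The field names (source, destination, dedupe_keys, schedule, backload_from) are hypothetical, not DataStori's actual configuration schema.

```python
# Hypothetical pipeline definition, for illustration only: the field names
# below are not DataStori's actual configuration schema.
pipeline = {
    "name": "netsuite_general_ledger_to_azure_sql",
    "source": {"application": "NetSuite", "object": "GeneralLedger"},
    "destination": {"warehouse": "Azure SQL", "table": "general_ledger"},
    "integration": {
        "columns": ["transaction_id", "account", "amount", "posting_date"],
        "dedupe_keys": ["transaction_id"],  # rows with the same key are deduplicated
        "schedule": "0 2 * * *",            # run daily at 02:00 (cron syntax)
        "backload_from": "2021-01-01",      # how far back to backload historical data
    },
}
```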
DataStori orchestrates data pipelines from its cloud (AWS US East-1 region) but executes them in the customer's cloud. Data source and destination are both in the customer's cloud, and data never leaves their environment. This ensures that customer data handled by DataStori is always in compliance with the customer's data security and privacy policies.
DataStori creates a Lakehouse in the customer's cloud and follows the Medallion architecture for data management. Files are written in the delta format and pushed to a data warehouse of the customer's choice, e.g., Azure SQL, Snowflake, PostgreSQL or any SQLAlchemy-supported database.
Users can consume the ingested data from the Lakehouse or from the data warehouse. In addition to delta, DataStori can store data in Iceberg, Parquet or CSV formats in the Lakehouse.
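As a rough sketch of how a delta table in the Lakehouse can be written and read back, here is an example using the open-source deltalake Python package; the paths are placeholders and this is not DataStori's internal code.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Placeholder path in the customer's Lakehouse; layer names follow the
# Medallion convention (bronze = raw, silver = cleaned). Cloud credentials
# are taken from the environment.
table_path = "s3://customer-lakehouse/silver/general_ledger"

# Append a batch of ingested rows to a delta table.
batch = pd.DataFrame({"transaction_id": [101, 102], "amount": [250.0, 99.5]})
write_deltalake(table_path, batch, mode="append")

# Consumers, or the push to the data warehouse, can read the same table back.
df = DeltaTable(table_path).to_pandas()
```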
DataStori performs a limited set of data transformations. It dedupes and flattens incoming data and encrypts user-specified columns.
This makes the data ready for enrichment, AI-based querying, custom analytics, reporting and any other end use the customer needs it for. These business processes are all downstream of DataStori and not part of the product.
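A minimal sketch of the flattening and deduplication steps, using pandas with illustrative records and column names:

```python
import pandas as pd

# Nested records as they might arrive from a source API (illustrative data).
records = [
    {"id": 1, "customer": {"name": "Acme", "country": "US"}, "amount": 120.0},
    {"id": 1, "customer": {"name": "Acme", "country": "US"}, "amount": 120.0},  # duplicate
    {"id": 2, "customer": {"name": "Globex", "country": "DE"}, "amount": 75.0},
]

# Flatten nested JSON into columns, then drop duplicate rows.
flat = pd.json_normalize(records)   # columns: id, amount, customer.name, customer.country
deduped = flat.drop_duplicates(subset=["id"])

print(deduped)
```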
No, customers do not need to buy any IT infrastructure upfront to run DataStori. They need to set up a cloud account with AWS, Microsoft Azure or GCP to provision servers, storage, security and other services required to comply with their data policies. All these components are directly licensed by the customer from the cloud services provider.
DataStori is built using serverless architecture. It spins up servers and other components in the customer's cloud to run pipelines and shuts them down after execution. This ensures that the provisioning matches demand, with minimal fixed cost when pipelines aren't running.
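As an illustration of this on-demand pattern (not DataStori's actual orchestration code), a one-off pipeline run can be launched as an ephemeral AWS Fargate task that stops, and stops billing, as soon as the run finishes:

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Launch a one-off Fargate task for a single pipeline run; the cluster,
# task definition and subnet names are placeholders. The task terminates
# when the pipeline process exits, so nothing stays running between runs.
ecs.run_task(
    cluster="customer-datastori-cluster",
    taskDefinition="pipeline-runner:1",
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],
            "assignPublicIp": "DISABLED",
        }
    },
)
```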
No, DataStori is designed to scale server and storage infrastructure on demand. It has run production data pipelines on tables as large as 100 GB and 30 million rows. Pipelines in DataStori can be configured and scheduled to backload multi-year data.
The only constraints on data load throughput are the rate limits imposed by source APIs or database connections. Breaking large loads into smaller datasets resolves this.
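A hedged sketch of that chunking pattern, assuming a hypothetical paginated source API with a requests-per-minute limit:

```python
import time
import requests

BASE_URL = "https://api.example-erp.com/v1/general_ledger"  # hypothetical source API
PAGE_SIZE = 10_000           # pull the table in smaller chunks
REQUESTS_PER_MINUTE = 30     # stay under the source's rate limit

def fetch_in_chunks(session: requests.Session):
    """Yield the table in pages, throttled to respect the API's rate limit."""
    offset = 0
    while True:
        resp = session.get(BASE_URL, params={"limit": PAGE_SIZE, "offset": offset})
        resp.raise_for_status()
        rows = resp.json()["items"]
        if not rows:
            break
        yield rows
        offset += PAGE_SIZE
        time.sleep(60 / REQUESTS_PER_MINUTE)
```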
Users can set up and run as many data pipelines as they want. The number of data pipelines is only constrained by the parameters defined by the user's cloud services provider.
DataStori charges based on the number of application instances connected. This fee has two components: a one-time onboarding fee and a monthly license fee.
DataStori does not charge users on the volume of data ingested or the number of pipelines created and executed - these are part of the infrastructure costs that customers directly pay their cloud services provider.
At all times, customer data resides and moves within their environment - source applications, SharePoint, email, SFTP folders, and destination storage.
DataStori orchestrates data pipelines from its cloud (AWS US East-1 region) but executes them in the customer's cloud. Data source and destination are both in the customer's cloud, and data never leaves their environment. A further level of security is that DataStori can encrypt user-specified columns from a data source or drop them from the final output.
All this ensures that customer data handled by DataStori is always in compliance with the customer's data security and privacy policies.
Data sources include application APIs and databases, emailed CSV files and SFTP folders.
Yes, users need to allow DataStori access to their:
- Cloud infrastructure, to spin up servers and other components
- Source application APIs or databases, from which data is to be ingested
All credentials are secure with DataStori. Application API tokens are encrypted using AES-256 and stored in the application database; they cannot be read by the DataStori admin or anyone else.
DataStori does not need any credentials for the destination storage, because the required permissions are assigned to the servers spun up for pipeline execution.
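For illustration only, AES-256 encryption of an API token can be sketched with the Python cryptography package; key management and storage are assumptions, not DataStori's internal implementation.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Illustrative only: how an API token can be AES-256 encrypted before being
# written to an application database. Key storage (e.g. a KMS or secrets
# manager) is outside this sketch.
key = AESGCM.generate_key(bit_length=256)   # 256-bit key
aesgcm = AESGCM(key)

token = b"source-application-api-token"
nonce = os.urandom(12)                      # unique nonce per encryption
ciphertext = nonce + aesgcm.encrypt(nonce, token, None)

# Only a holder of the key can recover the token.
recovered = aesgcm.decrypt(ciphertext[:12], ciphertext[12:], None)
assert recovered == token
```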
Security elements in DataStori include data encryption, virtual network, multi-factor authentication, detailed alerts and logging.
No, DataStori cannot view business data. While DataStori orchestrates data pipelines from its cloud, the data movement from source application to storage destination is entirely in the customer's cloud. DataStori can only create and access the metadata on pipeline setup and execution.
DataStori runs the following checks on all ingested data for every pipeline execution:
- Data freshness test, to check when the data was last refreshed
- Primary key not-null test, to ensure the primary key has no missing values
- Primary key uniqueness test, to ensure there are no duplicates in the primary key
In addition, DataStori has automated retries, logging and alerts to make data pipelines more robust.
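Those three checks amount to simple assertions on each ingested table. A minimal sketch in pandas, with illustrative column names:

```python
import pandas as pd

def run_quality_checks(df: pd.DataFrame, primary_key: str,
                       loaded_at_column: str, max_staleness: pd.Timedelta) -> None:
    # Data freshness: the most recent load timestamp must be within the allowed window.
    staleness = pd.Timestamp.now(tz="UTC") - df[loaded_at_column].max()
    assert staleness <= max_staleness, f"data is stale by {staleness}"

    # Primary key not null.
    assert df[primary_key].notna().all(), "null values found in primary key"

    # Primary key uniqueness.
    assert df[primary_key].is_unique, "duplicate values found in primary key"

# Illustrative usage on a tiny table.
run_quality_checks(
    pd.DataFrame({"transaction_id": [1, 2, 3],
                  "loaded_at": pd.Timestamp.now(tz="UTC")}),
    primary_key="transaction_id",
    loaded_at_column="loaded_at",
    max_staleness=pd.Timedelta(hours=24),
)
```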
By default, all data pipelines in DataStori have a concurrency of 1, i.e., only one instance of a given pipeline can run at a time. All other triggered instances of that pipeline are queued. In addition, the output data is saved in the delta format, which provides ACID guarantees.
In DataStori, schema evolution and tracking are automated. In addition, data and schema changes can be rolled back to a defined restore point if required.
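Because the output is stored in the delta format, every write is a versioned commit and earlier versions of a table remain readable. The sketch below uses the open-source deltalake package with a placeholder table path; it illustrates the rollback idea rather than DataStori's actual restore mechanism.

```python
from deltalake import DeltaTable

table_path = "s3://customer-lakehouse/silver/general_ledger"  # placeholder path

# Every write is a versioned commit; the history shows what changed and when.
current = DeltaTable(table_path)
for commit in current.history():
    print(commit)

# Reading the table as of an earlier version gives the data (and schema) at
# that restore point, which can then be re-written as the current state.
restore_point = DeltaTable(table_path, version=5)
df_at_restore_point = restore_point.to_pandas()
```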
DataStori supports the following API authentication mechanisms:
1. API Key
2. Basic Authentication
3. OAuth2 - Client Credentials and Authorization Grant flow
In addition, DataStori can be extended to support custom authentication flows that a source application may have implemented.
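As an example, the OAuth2 client credentials flow exchanges a client ID and secret for a short-lived bearer token, which is then used to call the source API. The sketch below uses the requests library with placeholder endpoints and credentials:

```python
import requests

TOKEN_URL = "https://auth.example-app.com/oauth2/token"  # placeholder endpoint

# OAuth2 client credentials flow: exchange the client ID and secret for an
# access token, then call the source API with the bearer token.
resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",
        "client_id": "my-client-id",
        "client_secret": "my-client-secret",
    },
    timeout=30,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]

api_resp = requests.get(
    "https://api.example-app.com/v1/invoices",   # placeholder source endpoint
    headers={"Authorization": f"Bearer {access_token}"},
    timeout=30,
)
```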