Frequently Asked Questions
Q: What is Data Commons?
Please see About Data Commons.
Q: What is the purpose of India Data Commons?
India Data Commons is an effort to highlight the India-specific data in the ever-growing Data Commons
knowledge graph. Indian data prompts many interesting insights and questions, and India Data
Commons presents those insights in a user-friendly way. You can read more about India Data Commons on our
About page.
Q: What is the difference between Data Commons and public dataset projects like Dataverse,
Kaggle datasets, Google BigQuery Public Datasets, Dataset Search, etc.?
These collections of datasets provide a great service by aggregating topical
open datasets. However, although the data is open, using it to answer specific
questions often involves tedious 'foraging': finding the data, cleaning the
data, reconciling different formats and schemas, figuring out how to merge data
about the same entity from different sources, and so on. This error-prone and tedious
process is repeated once (or more) by each organization. This is a problem in
almost every area of study involving data, from the social sciences and physical
sciences to public policy.
Data Commons is an attempt to ameliorate some of this tedium by doing the work once, on a large scale, and providing cloud-accessible APIs to the cleaned, normalized, and joined data. There are millions of datasets, and it will be a while before Data Commons includes a substantial fraction of them. In every domain, however, some collections of data are used more frequently than others. We have started with a core set of these in the hope that useful applications can be built on top of them.
Q: What is the difference between Data Commons and Wikidata?
The focus in Data Commons is on aggregating external, already available data (with an emphasis on statistical data) from government agencies and other authoritative sources, as opposed to creating a corpus of structured data from scratch.
Q: What is the relation between DataCommons.org and Schema.org?
DataCommons.org builds upon the vocabularies defined by Schema.org,
with additional terms defined to cover concepts (e.g. "citizenship") that are important to the
data in Data Commons but which have not been a priority
for Schema.org-based Web markup. The Data Commons schemas constitute an "external extension" to
Schema.org, similar to that provided by GS1.
Some schemas could migrate into Schema.org if the community finds value in them.
Q: What are the usage rights of the data in Data Commons?
The Data Commons knowledge graph and the compilation of the datasets are licensed under CC BY.
The Data Commons REST API and the R and Python libraries are released under the Apache License 2.0.
The data included in the Data Commons Graph comes from different sources. The source of the
data (provenance) is provided for all the data.
Provenance includes the URL of the source of the data. While effort is made to obtain data from
sources that offer unrestricted usage of the underlying data, use of this data may be
subject to different licenses and terms, as specified at the provenance URL.
Q: How can we access data in Data Commons?
The data in the knowledge graph can be accessed through the India Data Commons Knowledge Graph and through APIs for Python, REST, and Google Sheets.
Q: How can we add our own data to the knowledge graph?
Data Commons is intended to be a community project and seeks your involvement.
To learn more about publishing data that can be included in India Data Commons, check out our
documentation section. You can also contact datacommons@rbcdsai.org
if you have an interesting dataset that you think should be included in India Data Commons and would
like to help.
In the future we plan to allow users to ingest data into the Data Commons Knowledge Graph using
an upload tool. We will update the community when this functionality is released.
Q: What does per capita mean in the Time Series tool?
Different variables are measured over different populations. For example,
the number of people whose gender is male (in a given place) is
measured over the population of all people. On the other hand, the number
of people whose educational attainment is High School is measured over
the set of people whose age is 25 years or higher. Per capita values are
therefore calculated against the population over which the variable was
measured, not always against the total population.
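As a rough illustration of the calculation described above, the sketch below divides each count by the population it was measured over. The function name `per_capita` and all figures are hypothetical and made up for the example; they are not Data Commons values, and this is not the Time Series tool's actual implementation.

```python
def per_capita(count: float, measured_population: float) -> float:
    """Divide a count by the population the variable was measured over."""
    if measured_population <= 0:
        raise ValueError("measured population must be positive")
    return count / measured_population


# Made-up figures for a hypothetical place (assumptions, not real data).
total_population = 1_000_000       # all people
population_25_and_over = 600_000   # people aged 25 years or higher
male_count = 510_000               # measured over all people
high_school_count = 150_000        # measured over people aged 25+

# Each variable uses its own denominator.
male_per_capita = per_capita(male_count, total_population)
high_school_per_capita = per_capita(high_school_count, population_25_and_over)

print(f"Male: {male_per_capita:.2f}")
print(f"High School attainment: {high_school_per_capita:.2f}")
```

Dividing the High School count by the total population instead would understate the share, because people under 25 are not part of the measured group.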
Q: How long will you store the data for?
Data Commons is not an archival service. We collect the data, build the knowledge graph and
provide access to the Graph.
As with any website, long-term storage and safekeeping of the data is the responsibility of the
primary publisher.
Q: Where can I download all the data?
Given the size and evolving nature of the Data Commons Knowledge Graph, we prefer you access
it via the APIs.
If your project needs local access to a large fraction of the Data Commons Knowledge Graph, please contact
datacommons@rbcdsai.org.
Q: How much does this service cost to use?
The public data in the Data Commons Knowledge Graph is hosted on Google Cloud Platform by
Data Commons and is made available to users. There is no cost for the data itself when it is
publicly available for free.
Usage limits for the service beyond the free-tier quota will be in line with the pricing of the
BigQuery Public Datasets program.
In the future, as users add more data to the knowledge graph, we expect that, just as on the Web,
some data will be free, some will be private, and some may have an associated cost
to access.
Q: How do we know if the data is accurate?
Data Commons provides an access mechanism to data and makes no
commitment about the accuracy of the data.
Answers to queries will include the provenance (source of the data). The choice of which data to
use, based on its source, is in the developer's control. Errors may
also be introduced during cleaning or other processing of the data. If you find something you
think is in error, we would love to hear from you.
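One way a developer might act on provenance is to keep only observations whose source URL belongs to a host they trust. The sketch below assumes a hypothetical record layout (a dict with `value` and `provenance_url` keys) and placeholder `example.org` sources; the actual shape of API responses may differ.

```python
from typing import Dict, List, Set
from urllib.parse import urlparse


def filter_by_source(observations: List[Dict], trusted_hosts: Set[str]) -> List[Dict]:
    """Keep observations whose provenance URL host is in trusted_hosts."""
    return [
        obs for obs in observations
        if urlparse(obs["provenance_url"]).hostname in trusted_hosts
    ]


# Placeholder records (assumed layout and made-up values, not real data).
observations = [
    {"value": 120, "provenance_url": "https://stats.example.org/census"},
    {"value": 118, "provenance_url": "https://survey.example.com/report"},
]

trusted = filter_by_source(observations, {"stats.example.org"})
print(trusted)
```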
Q: How often is the data refreshed?
Different data sources refresh at different frequencies. We try to keep the
data updated as the sources publish new versions of their data.
Q: What are the SLAs / Performance levels we can expect?
The service is provided on an as-is basis with no SLA or commitments on
availability or uptime.
Q: How do I cite datacommons.org?
To cite charts and tools on this site, please use the following format.
Data Commons 2024, Data Commons, viewed 8 Dec 2024, <https://datacommons.org>.
If citing data from a particular dataset, e.g. CDC Places, then use:
Data Commons 2024, CDC Places, electronic dataset, Data Commons, viewed 8 Dec 2024, <https://datacommons.org>.
In both cases, please use the date you viewed the site (in the examples above, we used 8 Dec 2024).
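If you generate citations programmatically, the format above can be sketched as a small helper. The function name `cite_data_commons` is hypothetical, and the sketch assumes the leading year in the citation is the year of viewing, as in the examples above.

```python
from datetime import date
from typing import Optional

# Fixed English abbreviations so the output does not depend on the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def cite_data_commons(viewed: date, dataset: Optional[str] = None) -> str:
    """Build a citation for the site, or for a named dataset if one is given."""
    viewed_text = f"{viewed.day} {MONTHS[viewed.month - 1]} {viewed.year}"
    # Assumption: the citation year is the year the site was viewed.
    prefix = f"Data Commons {viewed.year}"
    suffix = f"Data Commons, viewed {viewed_text}, <https://datacommons.org>."
    if dataset is None:
        return f"{prefix}, {suffix}"
    return f"{prefix}, {dataset}, electronic dataset, {suffix}"


site_citation = cite_data_commons(date(2024, 12, 8))
dataset_citation = cite_data_commons(date(2024, 12, 8), dataset="CDC Places")
print(site_citation)
print(dataset_citation)
```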
Q: I have a question / feedback. Whom do I contact?
You can post your question on the GitHub forum or contact
datacommons@rbcdsai.org.