What is a dataset? Part 2

In an earlier post I introduced the concept of datasets and how they are becoming more valuable through crowdsourcing tools. Tech guru Tim O’Reilly has suggested that “Data is the next Intel Inside,” meaning that the next major commodity in our economy will be specific bodies of information or data. Here’s the money quote from the O’Reilly post:
“The race is on to own certain classes of core data: location, identity, calendaring of public events, product identifiers and namespaces. In many cases, where there is significant cost to create the data, there may be an opportunity for an Intel Inside style play, with a single source for the data. In others, the winner will be the company that first reaches critical mass via user aggregation, and turns that aggregated data into a system service.”
The goal of the Tagged Tanakh is to create such a dataset of information around the Jewish Bible and then produce services that enhance the experience of text study. JPS wants to shepherd this discrete body of data and serve as a hub for the multitude of digital Jewish resources that connect to the Bible and related texts (~ahem~ On1foot, MediaMidrash, OpenSiddurProject).
What would happen if JPS sponsored a contest to use data visualization tools to help generate new biblical commentaries? Or imagine if the mitzvah projects for all of the Bar/Bat Mitzvahs in 2012 contributed to a crowd-sourced music video of 7th graders rapping the Bible. New methods for engaging Jewish text are at our fingertips with a dataset of the Tanakh.
As Bill Moyers put it: “The more each of us knows and understands (the Bible), the better our chances for living purposeful lives, creating strong families, building solid communities, and forging a more tolerant and vibrant democracy … together.”
So to answer the question, what is a dataset? The answer for some is money, for others it is power, and for us it is a conversation.


I agree with most of what you’re saying about crowdsourcing and datasets. However, I don’t think that closed data on closed platforms will make it a reality. Open standards, open platforms, and open sharing arrangements are the only way it will work. Otherwise, you just end up creating smaller data-fiefdoms that are incompatible, whether by legal or technical constraints.
Web 1.0 is instructive in both how it can go right and how it can go wrong. The fact that the Internet/web exists as-is and that nearly every platform can access it on identical terms is quite amazing. On the other hand, remember the bad old days of the “designed for Internet Explorer”/“designed for Netscape” web?
Open Is The Way
I 100% agree: open platforms and sharing arrangements are the way of the future, at least if you want to succeed and aren’t already a major player, IMO.
Microsoft believes data = $$
Just saw this and had to share:
http://gigaom.com/2009/11/17/microsofts-future-lies-in-software-and-data...
Looks like the Tagged Tanakh could possibly benefit from the DALLAS data collection.
Thanks for the link. Quite interesting.
Really? Thanks for the explanation.
Open source vs otherwise
One can certainly do more with open source, but so far there aren’t many credible players in the open source area.