What is NYCFacets?
- A: NYCFacets is a Smart Open Data Exchange for All Data NYC. In its first iteration, we've catalogued all the datasets in NYC Open Data as exposed by Socrata. As we go forward, we will also catalog and correlate additional NYC-related datasources (both structured and unstructured), including external databases, Web 2.0 APIs, Linked Data sources, applications and websites. From this comprehensive metacatalog, we will expose and publish Federated feeds.
Smart Open Data Exchange?
- A: We don't just catalog the metadata for each datasource. We squeeze out additional metadata - extrametadata, as we call it - and correlate all the datasources to allow Open Data Users to see the "forest for the trees". Or in the case of NYC - the "city for the streets"? (TODO: find urban equivalent of "See Forest for the Trees")
- The "Smart" comes from a process we call "Crowdknowing" - leveraging metadata + extrametadata to score each dataset from various perspectives, automatically correlate them, and in the near future, perform semi-automatic domain mapping.
Extrametadata?
- A: Derived Metadata - Statistics (Quantitative and Qualitative), Ontologies, Semantic Mappings, Inferences, Federated Queries, Scores, Curations, Annotations plus various other Machine and Human-powered signals through a process we call "Crowdknowing".
Crowdknowing?
- A: Human-powered, machine-accelerated, collective knowledge systems that catalog metadata + derived extrametadata (derived using semantics, statistics, algorithms and the crowd). At this stage, the human-powered aspect is not emphasized because we found that the NYC Data Catalog community is still in its infancy - there are very few comments and ratings. But we hope to help improve that over time as we crawl secondary signals (e.g. votes and comments in NYCBigApps, Challengepost and Appstores; Facebook likes; Tweets, etc.).
What kind of information will you expose in the NYCFacets Smart Open Data Exchange?
- A: For NYCBigApps 3.0, we did an extensive treatment of the NYC Open Data catalog. Not only can you query the dataset metadata; we also sampled each dataset, extracted top values, computed some basic statistics and implemented some inferencing rules. (As of February 14, 2012, we had about 1.3 million facts/triples in our datastore.)
- In 1Q 2012, we will also expose some Query Federation capabilities, Federated Feeds (across datasets inside NYC Open Data and external sources like the 2010 US Census and Linked Data Web) and a SPARQL endpoint.
- We will also collate and catalog resources that make use of these datasources and feeds - apps across all platforms, websites, etc. so it also becomes a discovery mechanism for finding apps.
- See our Roadmap for more information.
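The sampling step described above (top values plus basic statistics, stored as facts/triples) can be illustrated with a minimal sketch. This is not the NYCFacets implementation: the function name `profile_column`, the predicate names (`sampleSize`, `emptyCount`, `topValue`, `min`, `max`, `mean`) and the sample data are all hypothetical, chosen only to show how a column sample might be turned into (subject, predicate, object) facts.

```python
from collections import Counter
from statistics import mean


def profile_column(dataset_id, column, values, top_n=3):
    """Return (subject, predicate, object) facts derived from one sampled column.

    All predicate names here are illustrative assumptions, not a real vocabulary.
    """
    subject = f"{dataset_id}/{column}"
    # Treat None and empty strings as missing cells.
    present = [v for v in values if v not in (None, "")]
    facts = [
        (subject, "sampleSize", len(values)),
        (subject, "emptyCount", len(values) - len(present)),
    ]
    # Top values: the most frequent non-empty values in the sample.
    for value, count in Counter(present).most_common(top_n):
        facts.append((subject, "topValue", (value, count)))
    # Basic statistics, emitted only when every non-empty value is numeric.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if present and len(numeric) == len(present):
        facts.append((subject, "min", min(numeric)))
        facts.append((subject, "max", max(numeric)))
        facts.append((subject, "mean", mean(numeric)))
    return facts


# Made-up sample of a text column.
sample = ["Brooklyn", "Queens", "Brooklyn", "", "Bronx", "Brooklyn"]
for fact in profile_column("example-dataset", "borough", sample):
    print(fact)
```

Each tuple plays the role of one fact; a real catalog would presumably use URIs and a shared vocabulary for subjects and predicates rather than plain strings.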
What problem are you trying to solve?
- SHORTEST ANSWER: We answer the question - "How do I make sense of all this Data?" - an oft-repeated question asked by many developers during the NYCBigApps meetup...
- SHORT ANSWER: Data is the lifeblood of the Information Economy. And in this day and age of pervasive connectivity, Google, and Everyone as a Publisher, we have long crossed the threshold from Information Scarcity to Information Overload.
- And that's why Data Transparency alone is not enough. For exposing more data actually exacerbates Overload.
- NYCFacets aims to help Open Data users, primarily Developers and Publishers, navigate All Data NYC as they build the Digital City of the Future.
- LONG, BUZZWORD-COMPLIANT, ANSWER: Pediacities aims to create a Smart City Open Data Exchange using open standards. We aim to create a hyperlocal "River of Data" - a "Data Subway" of sorts, whereby city data innovators can build interchangeable solutions using their preferred technology stack while minimizing data balkanization. We’ll start by co-existing with NYC’s existing Web 2.0 API powered by Socrata, enlarging the Exchange as we go along by registering additional data sources and APIs, from both public and private data publishers.
- We take a pragmatic approach to building this Data Exchange: layering a thin semantic interoperability layer over existing data silos. This layer allows semantic data mashups, replete with federation and inferencing, without forcing cities to revamp their application portfolio, standards and practices.
- Building a Data Exchange, however, is not enough as evinced by the relatively slow adoption of Linked Data.
- We also aim to address usability, discoverability and community - creating an environment that accelerates socio-semantic data innovation.
- We believe that the Smart City is the gateway towards practical Linked Data adoption. The major stakeholders are localized, the domain boundaries are readily identifiable and bound by a common theme, and the market is quite large as urbanization accelerates - large enough to create and sustain a viable data innovation ecosystem within and across cities. (C2C - with various combinations/permutations of City, Citizens, City Users & Commerce. E.g. City 2 City, Citizen 2 City, Citizen 2 Commerce, etc. etc.)
Is it just an online data dictionary?
- A: NO! NO! NO!!! NYCFacets is much more than that. Though it is an awesome data dictionary :), it can do soooo much more - query the metadata AND the data ACROSS datasets; transform query results in various ways (CSV, JSON, Charts, Sortable tables, etc.); Visualizations; etc. etc. etc. See our Tours section for more info.
Why?
- A: We believe that Open Data innovators need more than just access to the data to build the Digital Metropolis of the Future. They also need help to navigate and explore the huge data trove that is and will be NYC Open Data[1]. And that is before we tap into external NYC-related datasources.
- Much the same way steel scaffolding and elevators made skyscrapers possible[2], Pediacities aims to equip Open Data Innovators to become NYC's Digital Ironworkers of the Future.
- We'd like to think of it as "An Open Data Innovation Accelerator". ;)
Who is your Target Audience?
- A: The primary audiences are "Developers that are trying to use the NYC Open Data Catalog" and "NYC Open Data publishers" (City agencies, Socrata, etc.). The secondary audiences are "NYC Open Data Researchers" (journalists, researchers) and the General Public.
- For this early version, NYCFacets is heavily slanted towards developers. Ultimately, we want to expose entry points that are appropriate for other audience types.
What are the supported browsers?
- A: NYCFacets is best viewed in Chrome, Firefox, Safari and IE8, in order of preference. IE9 is still problematic.
How often do you update your data?
- A: At this stage, we crawl the NYC Open Data Catalog once a week, during the weekend.
What has it got to do with the Semantic Web? Linked Open Data?
- A: A lot! But we've done much of the heavy lifting (ontology mapping, curation, annotation, etc.) and made it more approachable and accessible to people (developers and regular folks alike) who may not be versed in Web 3.0 technologies.
What is Pediacities Rank?
- A: Think of it as Google PageRank for Open Data. In this early implementation, we're using it to score each dataset along these dimensions: Freshness, Sparseness, Social, Views and Downloads. It is a simple way to create a positive feedback loop that helps Open Data Publishers and Consumers cooperatively refine the data.
- It helps promote SEO for Open Data - Open Data Optimization. Read more about it here.
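To make the idea of a multi-dimension score concrete, here is a minimal sketch of one way such a rank could be combined. The answer above names the dimensions but not the formula, so everything else is an assumption: the weights, the 0.0 to 1.0 normalization of each signal, the 0 to 100 scale, and the names `DatasetSignals` and `rank_score` are hypothetical and do not describe the actual Pediacities Rank.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0); the real weighting is not published here.
WEIGHTS = {
    "freshness": 0.30,
    "sparseness": 0.20,
    "social": 0.20,
    "views": 0.15,
    "downloads": 0.15,
}


@dataclass
class DatasetSignals:
    """Per-dataset signals, each assumed to be pre-normalized to 0.0-1.0."""
    freshness: float   # 1.0 = just updated, 0.0 = stale
    sparseness: float  # fraction of empty cells (lower is better)
    social: float      # comments, ratings and similar signals
    views: float
    downloads: float


def rank_score(s: DatasetSignals) -> float:
    """Weighted sum of the five dimensions, scaled to 0-100."""
    parts = {
        "freshness": s.freshness,
        # Invert sparseness so that a fully populated dataset scores highest.
        "sparseness": 1.0 - s.sparseness,
        "social": s.social,
        "views": s.views,
        "downloads": s.downloads,
    }
    return round(100 * sum(WEIGHTS[k] * v for k, v in parts.items()), 1)


# Two made-up datasets that differ only in freshness and sparseness.
fresh = DatasetSignals(freshness=0.9, sparseness=0.1, social=0.2, views=0.5, downloads=0.5)
stale = DatasetSignals(freshness=0.1, sparseness=0.6, social=0.2, views=0.5, downloads=0.5)
print(rank_score(fresh), rank_score(stale))
```

A linear weighted sum is only the simplest option; it keeps each dimension's contribution easy to explain to a publisher, which suits the feedback loop described above.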
What about other Socrata powered sites?
- A: Yes. Our approach can be easily ported to other Socrata-powered sites, but we're focused on NYC at the moment. We are also keenly following the Cities.gov Multi-Jurisdictional Unified Data collaboration.
What technologies did you use to create NYCFacets?
- A: As evinced by the numerous logos in our footer, we tapped a whole slew of technologies to construct NYCFacets. But we primarily used Semantic MediaWiki and SMW+ for the front-end, along with a generous helping of various standards-based, mostly open-source technologies.
- For the backend — I'm afraid we can't really talk too much about the backend ;)
- It also goes without saying that we stood on the Shoulders of many Friends to whip this early version into shape.
Who is Pediacities?
- A: Pediacities is the Smart Cities brand of Ontodia. Ontodia is a little Metro NYC semtech startup focused on building "human-powered, machine-accelerated, collective knowledge systems", or "crowdknowing" as we call it.
- NYCFacets is but our first offering in this space - a small, but necessary step of cataloging and characterizing the rich data catalog that NYC has to offer, on top of which we aim to create a rich ecosystem for Open Data innovation.
- In 2012, we'll launch an exciting Smart City solution geared to the general public - watch this space :)
Notes
- [1] Open Government/Innovation: Open Data http://www.nyc.gov/html/doitt/html/open/data.shtml
- [2] Skyscraper design and construction http://en.wikipedia.org/wiki/Skyscraper_design_and_construction

