I am moving this site over to Squarespace. If you’d like to catch my semi-annual posts, you can check them out at http://www.seanoc.com/blog
Recently Justin Lilly and I have taken up the day-to-day running of the Django-NYC users group to take some load off its original founders, Loren Davie and Kevin Fricovsky. Loren and Kevin have done a great job of getting the group off the ground and building a community around Django in NYC. Unfortunately, as with most user groups, running Django-NYC has become a very time-consuming venture. Accordingly, they asked Justin and me to come in and help out for a while.
As part of this changing of the guard, Justin and I have started a few new initiatives for the group to try to get more people involved. These initiatives include recording talks, holding hack nights, providing tutorial events, and reaching out to more Django-friendly organizations around New York.
The simplest and most accessible of these initiatives is the recording of talks from our meetings. We now have the recordings available as a podcast (iTunes, RSS) and a Vimeo channel. We are also experimenting with streaming the meetings live at livestream.com.
Hack Nights provide a great way for those of us with active Django projects to work in a collaborative environment. The rules for a hack night are that you have to come with a project to work on and you will be strongly pushed to show off your work at the end of the night. So far we’ve held only one hack night, but we are in the process of planning more and refining the event format. These are great events for our more advanced members since they are able to get some work done and talk to others about their specific challenges/solutions.
Tutorials are the yin to the hack nights’ yang. They will provide a venue for new Django users to sink their teeth into Django and learn about its awesomeness. Our first tutorial is coming up this week and we are very excited to see how it goes.
Finally, we are reaching out to various Django-friendly organizations around the city to participate in all of these new events. HUGE has been and will continue to be an excellent host and supporter of the group. That being said, Brooklyn is kind of a pain for many potential members to get to. Accordingly, we will be moving events around the city in an effort to better engage Djangonauts from all around New York. The first such event will be our tutorial session this week, which is being hosted by Six Apart. Additionally, we are working to collaborate better with other NYC user groups, in particular NYCPython. This December we will be holding a joint meeting and we are discussing other possible collaborative events.
So that’s all of the new stuff happening with Django-NYC for now. Stay tuned for updates on even more that we have lined up. Have any other ideas? Belong to an organization that would like to host an event? Have any questions or comments? Let me know in the comments or contact me at firstname.lastname@example.org.
For the last week or so I’ve been wrapping my brain around Solr and Solango. The whole time I’ve had the feeling that they can do awesome, powerful things, but their documentation is so poor that I couldn’t figure anything out beyond the basic examples. Ultimately I had to dig through a bunch of code and do some experimentation. Now that I’ve finally figured out how to do what I was trying to do and have wrapped my brain around some of the trickier bits, I’m going to share some of the gotchas and solutions I’ve found.
This is one thing which is reasonably well documented for Solango. Getting solr installed isn’t the easiest process for anybody who isn’t a Java dev but there’s also only so much one can do to help the inexperienced deal with the finicky creature which is Tomcat. To that end here is the first bunch of tips related to install/Tomcat:
- Don’t try to use Tomcat installed by a package manager – Unless you’re a Java person, this will almost certainly install Tomcat with a bunch of non-stock configs that you will spend a huge amount of time adjusting to get Solr to work.
- Be mindful of your working directory while starting Tomcat – If you don’t install Tomcat via a package manager you will probably start and stop it with the `startup.sh` and `shutdown.sh` scripts which come with Tomcat. This generally works fine, but be aware that they are very sensitive to your working directory. If you start Tomcat from your home directory, then all of your Solr data files will end up in your home directory. If you’re not consistent about where you start Tomcat from, things are going to get very confused and very broken, very quickly.
Once you’re up and running you will need to set up your search documents and get Solango talking to Solr. Again, in this area the project documentation is pretty decent, but here are my relevant tips:
- Make sure that your SOLR_ROOT and related settings are set to the solr child directory of whatever your working path was when you started Tomcat. If you don’t, things will not be happy.
- Know that all the SearchDocument classes you create get aggregated into a common document. This is mentioned in the Solango documentation and makes sense, since in the end you’re doing a search across your entire site, but it can lead to some confusing errors/problems. Just keep this in mind while designing your documents and debugging. In particular, be careful about using the same field name across SearchDocument classes with different definitions.
Now that you have some simple indexing and searching going, you probably want to do something more interesting. This is where nearly all of the documentation out there either doesn’t exist or boils down to the eternally useful Javadoc style. From here on I’ll break things down by the more advanced topics I dealt with.
Facets are more or less a way to filter the results of a search by certain document attributes. The best resource I’ve found for reading about facets was this article. To be honest, I had to read it a few times and play with things before I really understood what was going on. Once you understand what they are, you can do some pretty neat things with facets. Here are my realizations with facets:
- Read Faceted Search with Solr again, seriously.
- You will probably want to have facets operate on multiple properties. Sometimes you may be able to do so by simply faceting on each of the properties. Other times you will need to normalize the values of the various properties into a single new property and facet on that (especially if you want to do nested facets).
Even if you think you want nested facets, you probably don’t really need nested facets. Read this again and think about it for a while. If you decide that you really do want nested facets here’s what you will need to know:
- The concept of nested facets is purely a Solango construct; Solr does not currently have any formal concept of nested facets.
- Nested facets are referenced in the Solango documentation but are not actually described or explained. This means at some point you will probably need to dig into the Solango source to get them working.
- Since nested facets are purely a Solango concept, they work by populating a document field with entries which look like “parent value__child value__grand child value”.
- You must generate these entries using a transform method on your SearchDocument classes.
- The separator (“__” in the example above) is defined by your settings file as FACET_SEPERATOR.
- The default setting from Solango for this separator is not URL safe and will break things.
- If you want to have nice display versions of your nested facets you will need to patch solango. I have done so in my fork at GitHub. Eventually I’ll try to contribute this back.
- If your documents need multiple facet property values (they probably do), you will need to use a multivalued field. Keep reading to learn about those.
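To make the entry format above concrete, here is a minimal sketch in plain Python. It does not use Solango's API: the helper names (`nested_facet_value`, `nested_facet_values`, `is_url_safe`) and the sample category paths are made up for illustration, and in a real project the joining logic would live in a transform method on your SearchDocument class. The separator is assumed to be the “__” from the example above.

```python
from urllib.parse import quote

# Assumed separator, taken from the "__" example in the post.
FACET_SEPARATOR = "__"


def is_url_safe(separator):
    """Return True if the separator survives URL quoting unchanged."""
    return quote(separator, safe="") == separator


def nested_facet_value(path, separator=FACET_SEPARATOR):
    """Join one parent-to-leaf path into a single facet entry."""
    parts = [str(part).strip() for part in path]
    for part in parts:
        # A value containing the separator would be split incorrectly later.
        if separator in part:
            raise ValueError(f"{part!r} contains the separator {separator!r}")
    return separator.join(parts)


def nested_facet_values(paths, separator=FACET_SEPARATOR):
    """Build one entry per path, suitable for a multivalued field."""
    return [nested_facet_value(path, separator) for path in paths]


# Hypothetical category paths for a single document.
paths = [
    ("Books", "Fiction", "Science Fiction"),
    ("Books", "Non-fiction"),
]
print(nested_facet_values(paths))
print(is_url_safe("__"), is_url_safe(" > "))
```

The `is_url_safe` check is one cheap way to catch a separator that will break facet links before it ever reaches the index.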
Frequently you will need to index documents which have a ManyToMany or ForeignKey relation that you would like to search or facet by. This is problematic at first glance, since Solr doesn’t support anything like a join. The solution, however, comes in the form of a bit of de-normalization called multi-value fields. All Solr field types take an optional “multivalued” property. If this property is true, you are able to provide Solr with multiple field entries in your data XML, and it will use all of the provided values when searching or faceting. Solr does not do anything to munge the values together into a CSV or anything like that, so you will see the discrete values in your search results or while you are viewing your Solr indexes. Here are the gotchas I’ve found with multivalued fields:
- Solango doesn’t currently support them – You can set the “multivalued” property on your search documents and Solango properly uses that information when generating a schema.xml file, but it doesn’t have any way to actually populate a multivalued field with multiple values. If you need multivalued fields you can check out my fork of Solango at GitHub. With my version of Solango you can return any iterator from your transform methods and it will be handled properly. Eventually I’ll work to get this merged back into mainline Solango.
- Generally you need to store the literal string, numeric, or datetime value of your related fields in the Solr index. Since you can’t do joins, anything but the literal value is generally pretty useless.
- Since Solango doesn’t have any way to know where the data for related fields comes from beyond the transform method, and since you are storing literal values, your Solr index won’t automatically be kept up to date when you change related objects. To keep your index up to date you will need to either live with stale data for a while and periodically do a full reindex, or set up code using signals to hook onto changes of related objects and update the appropriate Solr documents.
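As a rough illustration of what a multivalued field looks like in the data sent to Solr, here is a small sketch using only the Python standard library. The `Article` class, the `tags` field name, and the `transform_tags` helper are hypothetical stand-ins rather than Solango code; the point is only that the transform yields literal values and that each value becomes its own field entry in the add XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical stand-in for a Django model with a ManyToMany to tags."""
    id: int
    title: str
    tag_names: list = field(default_factory=list)


def transform_tags(article):
    """Yield the literal tag names, since Solr cannot join on related ids."""
    for name in article.tag_names:
        yield name


def to_solr_add_xml(article):
    """Build a Solr <add> message, repeating the field once per value."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = str(article.id)
    ET.SubElement(doc, "field", name="title").text = article.title
    for value in transform_tags(article):
        # One <field name="tags"> element per related value.
        ET.SubElement(doc, "field", name="tags").text = value
    return ET.tostring(add, encoding="unicode")


article = Article(id=1, title="Solr tips", tag_names=["django", "solr"])
print(to_solr_add_xml(article))
```

Because the tag names are copied into the document as literals, renaming a tag leaves every indexed article stale until it is reindexed, which is the problem described in the last gotcha above.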
Hopefully this helps you avoid some pain and suffering. Given a bit more time/energy I might even work towards turning these into a bit more formal documentation.
If you find any better solutions or have any questions bring them up in the comments!
After a bit of a break, there is now a bit of new functionality in sPaste. Now when creating a snippet users can select a date and a number of views, after which the snippet will self-destruct. This update adds a great extra layer of security by only allowing information to exist in sPaste just as long as it needs to, and no longer. All while keeping sPaste very simple and easy to use.
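For the curious, the idea can be sketched in a few lines of Python. This is not sPaste's actual code: the `Snippet` class and its field names are invented for illustration, and it assumes a snippet is destroyed as soon as either limit (the date or the view count) is reached.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Snippet:
    """Hypothetical snippet record; field names are illustrative only."""
    text: str
    expires_at: Optional[datetime] = None  # destroy at or after this moment
    max_views: Optional[int] = None        # destroy after this many views
    views: int = 0

    def is_expired(self, now):
        """Return True once either limit has been reached."""
        if self.expires_at is not None and now >= self.expires_at:
            return True
        if self.max_views is not None and self.views >= self.max_views:
            return True
        return False

    def read(self, now):
        """Return the text and count the view, or None once destroyed."""
        if self.is_expired(now):
            self.text = ""  # drop the sensitive content
            return None
        self.views += 1
        return self.text


now = datetime(2010, 1, 1, 12, 0)
snippet = Snippet("s3cret", expires_at=now + timedelta(days=1), max_views=2)
print(snippet.read(now))  # first view
print(snippet.read(now))  # second view
print(snippet.read(now))  # view limit reached, nothing returned
```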
Check it out and get a copy of sPaste to run on your own server at https://spaste.com!
A few months ago, I found myself with a frequent itch at work. Often I needed to send sensitive information such as access credentials to co-workers and clients. Obviously email isn’t an acceptable way to send this information, but anything much more complex than email overly frustrates whoever I am sending information to.
My solution to this problem was to create a new site called sPaste. sPaste is a secure pastebin where small snippets of text can be easily submitted, secured, and sent to other people.
At a personal level, sPaste was a great success. sPaste took me less than 40 hours to write (thanks to the magic of Django and jQuery) and it perfectly met my needs. Unfortunately, in its existing form it could never really be useful to most other people. Since sPaste is running on my server, anybody using sPaste would have to trust that I’m not doing anything evil with their data. Accordingly, for the past couple of weeks I’ve been working on cleaning things up and putting together an open source release.
Today I finally finished that work and have released sPaste|source. sPaste|source is a ready-to-go site, based on Django, which gives you all of the features and functionality of sPaste.com on your own server. Simply grab the source, set it up like any other full Django project, and you will be ready to go.
Check it out! Feedback and comments are always welcome.
Recently, while playing around with a Django model in the always awesome IPython shell, I discovered a neat feature of the Django ORM. It’s basically a way to get the id of a related object without actually triggering a query to get all of the related object’s data.
Frequently when working with a model which has a foreign key, I simply want to access the id of the related item and I don’t care about any of the related item’s other information. Situations where this comes up include generating links and building queryset filters. Unfortunately if I follow the normal Django style and do something like “item.related.id”, the Django ORM will fire off an extra query to get all of the information of the related object (which I don’t care about). While this is far from tragic it is still unnecessary work since all I care about is the object’s id and that is already contained in the “item” object.
Fortunately there is an alternative! Instead of getting the id via “item.related.id” one can say “item.related_id”. Using this method, no extra query is performed and I get just the id value I was looking for.
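To show why the two spellings behave differently, here is a toy model in plain Python. It is not Django's ORM code: `FakeDatabase` and `Item` are invented for illustration, and the sketch only mimics the general idea that the foreign key value is already stored on the loaded row while the related object is fetched lazily on first access.

```python
class FakeDatabase:
    """Stand-in for a database that counts the queries it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def fetch(self, pk):
        self.query_count += 1
        return self.rows[pk]


class Item:
    """Toy model row: the foreign key value is stored on the row itself."""

    def __init__(self, db, related_id):
        self._db = db
        self.related_id = related_id  # already loaded along with the item
        self._related_cache = None

    @property
    def related(self):
        # Loading the full related object costs a query the first time.
        if self._related_cache is None:
            self._related_cache = self._db.fetch(self.related_id)
        return self._related_cache


db = FakeDatabase({7: {"id": 7, "name": "Related thing"}})
item = Item(db, related_id=7)

print(item.related_id, db.query_count)     # id read straight from the item
print(item.related["id"], db.query_count)  # same id, after fetching the row
```

In a real project you could confirm the same behaviour by comparing the number of queries Django reports before and after each attribute access.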
There are two things to be aware of with this trick. First, I have not found this feature documented anywhere with a cursory search of the Django docs. This means that I am not sure how much this feature is actually supported and how permanent it may or may not be. Second, while I have not tested it, I suspect that if your foreign key has a custom database field name, the field on the model will match that custom field name.
As I come across any other quick Django tricks I may start making them a regular feature here on the blog. If you found this interesting or have your own quick Django trick, let me know and leave a comment!
Tonight I am officially launching my new side project, sPaste.com! sPaste is a simple tool to help you quickly and securely send small snippets of data around the web. As usual, the site is written entirely in Django on the server side and jQuery on the client side. At the moment it is little more than a pastebin which forces the use of SSL but soon it will offer additional features and tools.
Check it out at https://sPaste.com and let me know what you think!